Applying Phonetics. Murray J. Munro
much differently when we speak than when we are quiet. For one thing, the inspiratory phase is faster during speech, and expiration is much more prolonged. We are normally unaware of the difference, however, and we usually manage to utter all the words we have planned without needing an awkward break to inhale partway through.
2.2.2 Modifying the Airstream: Phonation
During speech, we make important modifications to the egressive airstream using structures inside the larynx, which consists of cartilage, muscle, and bone (see Figure 2.3). At the bottom of the larynx is the trachea; at the top is the hyoid bone, which has the shape of a horseshoe opening toward the back of the neck. The thyroid cartilage, which is connected to the hyoid bone by the thyrohyoid membrane, forms most of the front of the larynx and protects the structures inside. Part of the thyroid cartilage juts out and is often visible on the neck as the laryngeal prominence (Adam's apple). It is typically less visible in women than in men because the larynx tends to be smaller in adult females. Notice also the V‐shaped laryngeal notch. Another major structure of the lower larynx is the cricoid cartilage, which attaches to the thyroid cartilage.
Figure 2.3 Structure of the human larynx
(Source: Adapted from https://commons.wikimedia.org/wiki/File:Larynx_external_en.svg)
Figure 2.4 The vocal folds viewed from above
Inside the larynx and above the trachea are the VOCAL FOLDS, two pieces of tissue that function as a valve to control airflow. The space between the vocal folds is the GLOTTIS. While the air is on its way out of the lungs, the speaker has the option of allowing it to pass through the glottis unimpeded, as during quiet breathing. For this to occur, the vocal folds are kept wide open or ABDUCTED in a V‐shape, as shown in Figure 2.4. Notice that the opening of the V faces the back of the larynx, and each end is attached at one of two pyramidal cartilages known as the arytenoid cartilages. When the vocal folds are tightly shut or ADDUCTED, no air flows at all. Alternatively, the speaker can choose to PHONATE, or generate sound in the larynx. This is achieved through one of the GLOTTAL SETTINGS, as illustrated in Figure 2.5.
The glottal state for the phonation type called MODAL VOICING entails rapid and repeated opening and closing of the vocal folds. This vibratory mode occurs in some speech sounds, such as /n/ as in nine and /z/ as in zoo. If you place your fingers gently on your throat and sustain these sounds for several seconds, you will feel the vibration. In contrast, if you produce a sustained /s/, as at the beginning of Sue, you will feel no vibration at all, because this sound has no voicing.
To get the vocal folds to vibrate, the brain does not simply command them to open and close rapidly. Instead, the speaker configures them to take advantage of aerodynamic effects to achieve the vibratory state. Here is a simplified breakdown of the steps:
1. The speaker intentionally adducts (brings together) the vocal folds to achieve a “sweet spot” (neither too tight nor too loose) that will make vibration possible.
2. The speaker initiates expiration, and the air pressure in the thoracic cavity therefore increases.
3. The vocal folds abduct (move apart) as a result of the increased pressure, and air flows up through the upper part of the vocal tract.
4. As a result of the rapid airflow, the air pressure in the glottis drops.
5. With the drop in air pressure, the vocal folds snap shut, but then immediately reopen thanks to the air pressure in the thoracic cavity (due to expiration).
Figure 2.5 States of the glottis: traditional characterization of phonation types
The cycle repeats itself many times over at a rate of around 100–150 times per second for an adult male and 180–250 times per second for an adult female. You might compare this opening and closing to what happens when you pinch the neck of an inflated balloon so that the escaping air makes a noise.
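The vibration rates just cited translate directly into the duration of a single open-and-close cycle, because the duration of one cycle is the reciprocal of the rate. The short Python sketch below does that arithmetic for the ranges quoted above; the function name `cycle_duration_ms` is invented here for illustration.

```python
def cycle_duration_ms(cycles_per_second: float) -> float:
    """Return the duration of one vocal fold cycle in milliseconds."""
    if cycles_per_second <= 0:
        raise ValueError("vibration rate must be positive")
    return 1000.0 / cycles_per_second


# Ranges quoted in the text, in cycles per second.
typical_rates = {
    "adult male": (100, 150),
    "adult female": (180, 250),
}

for speaker, (low, high) in typical_rates.items():
    slowest = cycle_duration_ms(low)   # lowest rate gives the longest cycle
    fastest = cycle_duration_ms(high)  # highest rate gives the shortest cycle
    print(f"{speaker}: {fastest:.1f}-{slowest:.1f} ms per cycle")
```

At 100 cycles per second, each cycle lasts 10 ms; at 250 cycles per second, it lasts only 4 ms.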
Of course, as a speaker, you complete the above steps without conscious effort; no one ever had to teach you how to create voicing! Moreover, you make a variety of subtle adjustments to your vocal fold configuration as you speak. In the production of normal voiced speech, the rate of vibration does not remain constant. If it did, human voices would have only a single PITCH and would sound robotic. Vocal pitch (how “high” or “low” the speaker sounds) is determined by how rapidly the vocal folds open and close: the more rapid the vibration, the higher the pitch. During an utterance, the speaker raises and lowers vocal pitch to help express meaning and convey different attitudes. Pitch is raised by activating the cricothyroid muscles, which causes the larynx to rock forward and increases the tension on the vocal folds, so that they vibrate more rapidly.
If the folds are kept partially abducted during expiration, the result is a different phonation type, WHISPER. In this case, friction between the moving airstream and the vocal folds generates noise instead of the regular vibratory pattern of voicing. While whisper is useful for certain communicative purposes, such as telling a secret or creating mystique, it is not an especially effective phonatory type. For one thing, whispered speech doesn't travel very far; for another, whispered utterances tend to be less intelligible than normally voiced ones.
A third type of phonation, known as BREATHY VOICE, sounds intermediate between voicing and whisper. It is produced by abducting the vocal folds just enough to allow a combination of vibration and friction as air moves through the glottis. Breathy voice is used to distinguish speech sounds in some languages, such as Hindi. In addition, it is possible to use breathy voice extensively to convey a particular attitude or personality type. The actress Marilyn Monroe, in particular, is often cited as an example of a breathy speaker.
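One way to picture the three phonation types described so far is as different mixtures of a periodic component (vocal fold vibration) and a noise component (friction at the glottis). The Python sketch below is a toy illustration only, not an acoustic model of the larynx. The function `glottal_source`, the mixing weights in `MIX`, and the use of a sine wave to stand in for vibration are all assumptions made for the sake of the example.

```python
import math
import random

# Hypothetical mixing weights: (periodic share, noise share).
MIX = {
    "modal": (1.0, 0.0),    # regular vibration only
    "whisper": (0.0, 1.0),  # friction noise only
    "breathy": (0.6, 0.4),  # vibration plus friction
}


def glottal_source(phonation, rate_hz=100.0, sample_rate=8000, num_samples=400, seed=0):
    """Return num_samples values of a toy sound source for one phonation type."""
    if phonation not in MIX:
        raise ValueError(f"unknown phonation type: {phonation!r}")
    periodic_w, noise_w = MIX[phonation]
    rng = random.Random(seed)  # seeded so the noise is reproducible
    samples = []
    for n in range(num_samples):
        t = n / sample_rate
        periodic = math.sin(2 * math.pi * rate_hz * t)  # stand-in for vibration
        noise = rng.uniform(-1.0, 1.0)                  # stand-in for friction
        samples.append(periodic_w * periodic + noise_w * noise)
    return samples


modal = glottal_source("modal")
whisper = glottal_source("whisper")
period = 8000 // 100  # samples per cycle at the default settings

# Compare one sample with the sample exactly one cycle later.
print("modal repeats:", abs(modal[10] - modal[10 + period]) < 1e-9)
print("whisper repeats:", abs(whisper[10] - whisper[10 + period]) < 1e-9)
```

The modal source repeats itself from one cycle to the next, whereas the whisper source has no such regular pattern; the breathy setting combines the two.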
How Do We Know What Goes on in the Larynx?
Early speech investigators had to rely mainly on close observations of the lips, tongue, and teeth in their attempts to understand speech production. Strategic placement of mirrors could also allow limited viewing of the vocal folds and some of the other parts of the larynx and lower pharynx. You might find it surprising that special instruments resembling the modern endoscope (used for viewing internal body structures, including the urinary and digestive tracts) appear to have existed since Roman times and are found among artifacts from Pompeii. However, systematic use of such devices did not occur until the advent of the field of endoscopy (viewing and imaging internal structures through insertion of instruments) in the 1800s. The rigid endoscope could be introduced through the mouth to illuminate pharyngeal and laryngeal tissues, allowing a doctor to examine them through an eyepiece. A more recent innovation is the flexible transnasal endoscope (Figure 2.6). After lubrication, a thin spaghetti‐like tube with an attached digital camera and fiberoptic light source is threaded through a nostril until it enters the pharynx. This permits viewing and