HTHS - Acoustic Phonetics

How to handle speech

Thorsten Trippel

Universität Bielefeld

Material provided by a pool of colleagues: Dafydd Gibbon, Vivian Gramley, Alexandra Thies

Acoustic Phonetics & Praat: Theoretical Background

When we speak to each other, the sounds that we make have to travel from the mouth of the speaker to the ear of the listener. This is true whether we are speaking face to face, or by telephone over thousands of miles. What is important for us in our study of speech is that this acoustic signal is completely observable: we can capture everything the listener hears in the form of a recording, and then measure whichever aspect of the signal that we want to know about. (Roach, P. (2001), p. 39)

Goals

At the end of this lesson, you will know about

Today's session summed up in .pdf format!

Recommended reading

... in preparation for this week's session on Acoustic Phonetics:

Terminology

Below, please find a concise illustrated glossary of terms that you may encounter throughout the course of studying Acoustic Phonetics.

Acoustics
- the scientific study of sound and how we hear it
Acoustic phonetics
- deals with the capturing and description of the speech signal as it is produced and perceived; acoustic phonetics is part of acoustics
Wave
- a disturbance (vibration) propagated from point to point in a medium, such as air, or in space
Sound wave
- small areas of high and low pressure propagating outward from a sound source; this pressure wave is what we hear as sound

Terminology (2)

Sine wave
- the simplest kind of pressure wave (as created by an ideal tuning fork). Interesting things to measure for a sine wave are the following:
  • amplitude (closely related term: intensity; perceived as loudness) - displacement of the vibrating medium from its rest position; results from pressure differences: high pressure produces (high) peaks, low pressure produces (low) valleys; usually expressed in decibels (dB)
  • frequency (perceived as pitch) - number of complete vibration cycles per second, usually measured in hertz (Hz)
  • duration (perceived as speech tempo) - length of a sound, measured in some time unit (e.g. seconds)
See a sine wave.
Complex wave
- a combination of simple sine waves; every complex wave can be decomposed into the simple sine waves it consists of by means of a spectral analysis (see below)
See a complex wave. Listen to a composition of simple sine waves.
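
To make the measurable properties of a sine wave, and the way simple waves add up to a complex wave, a little more concrete, here is a minimal Python sketch. The sample rate of 8000 samples per second, the function names, and the chosen amplitudes and frequencies are illustrative assumptions, not values from the course material; amplitudes are plain numbers here rather than decibels.

```python
import math

def sine_wave(amplitude, frequency_hz, duration_s, sample_rate=8000):
    """Return the samples of a sine wave as a list of floats."""
    n_samples = int(round(duration_s * sample_rate))
    return [amplitude * math.sin(2 * math.pi * frequency_hz * n / sample_rate)
            for n in range(n_samples)]

def add_waves(*waves):
    """Sum several equally long waves sample by sample."""
    return [sum(samples) for samples in zip(*waves)]

# Three simple sine waves that differ in amplitude and frequency,
# each 0.1 s long (assumed example values).
w1 = sine_wave(1.0, 100, 0.1)
w2 = sine_wave(0.5, 200, 0.1)
w3 = sine_wave(0.25, 300, 0.1)

# The complex wave is simply the sum of the simple ones.
complex_wave = add_waves(w1, w2, w3)

print("samples:", len(complex_wave))
print("largest displacement:", max(complex_wave))
```

Because all three components are multiples of 100 Hz, the resulting complex wave repeats every 1/100 s, which at this sample rate is every 80 samples.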

Terminology (3)

Spectral or Fourier analysis
- analysis that is used to break down any waveform, however complex it might be, into simple waveforms of different frequencies; comparable to breaking down white light into the rainbow pattern of colors that make up its color spectrum
Fundamental frequency (also: f0, or first harmonic)
- the lowest frequency in a complex wave (fundamental); in speech it results from the vibration of the vocal folds in the larynx during phonation and is heard as the pitch of the voice
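
The idea of breaking a complex wave down into its sine components can be sketched with a naive discrete Fourier transform. This is only a toy version under stated assumptions: the sample rate, the signal length, and the three components are made up for the example, and real analysis tools use much faster algorithms and windowing.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Naive discrete Fourier transform; returns amplitudes for bins 0..N/2."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        # Scaled so that a sine of amplitude A shows up as roughly A
        # (bins 0 and N/2 would need a different factor; they are empty here).
        mags.append(abs(acc) * 2 / n)
    return mags

sample_rate = 1000   # samples per second (assumed)
n = 200              # 0.2 s of signal, so bins are 1000 / 200 = 5 Hz apart
components = {100: 1.0, 200: 0.5, 300: 0.25}   # frequency (Hz) -> amplitude

# Build the complex wave as a sum of simple sine waves.
wave = [sum(a * math.sin(2 * math.pi * f * t / sample_rate)
            for f, a in components.items())
        for t in range(n)]

# Decompose it again and keep only the bins that carry energy.
mags = dft_magnitudes(wave)
found = {k * sample_rate // n: round(m, 3)
         for k, m in enumerate(mags) if m > 0.01}
f0 = min(found)      # the lowest component is the fundamental

print(found)
print("fundamental frequency:", f0, "Hz")
```

The analysis should recover the three components at 100, 200 and 300 Hz with their original amplitudes, the lowest of which is the fundamental frequency.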

Terminology (4)

Harmonic
- a frequency that is an integer multiple of the fundamental frequency (so if f0 is 100 Hz, then possible harmonics would be 200 Hz, 300 Hz, etc.)
See a harmonic sound.
Overtones
- while harmonics are integer multiples of f0, overtones refer to any frequency above f0; so while not every overtone is a harmonic, all harmonics above f0 are overtones (if f0 = 100 Hz, then 150 Hz is an overtone, but not a harmonic)
Formant
- energy peaks that determine the quality of sounds (esp. vowel sounds) and which are the result of resonances in the vocal tract; they are a consequence of resonance but not resonance itself. Note that a formant may be a harmonic (see above), but doesn't have to be! (For further information on formants, see below!)
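
The difference between harmonics and overtones can be spelled out in a few lines of Python. The function names and the small tolerance are assumptions made for this sketch; following the glossary, f0 itself is counted as the first harmonic.

```python
def is_overtone(freq_hz, f0_hz):
    """An overtone is any frequency above the fundamental."""
    return freq_hz > f0_hz

def is_harmonic(freq_hz, f0_hz, tolerance_hz=1e-6):
    """A harmonic is an integer multiple of the fundamental (f0 = first harmonic)."""
    if freq_hz <= 0 or f0_hz <= 0:
        return False
    multiple = round(freq_hz / f0_hz)
    return multiple >= 1 and abs(freq_hz - multiple * f0_hz) <= tolerance_hz

def harmonics(f0_hz, up_to_hz):
    """List all harmonics of f0 up to and including up_to_hz."""
    return [k * f0_hz for k in range(1, int(up_to_hz // f0_hz) + 1)]

f0 = 100
for freq in (150, 200, 300):
    print(freq, "Hz: overtone =", is_overtone(freq, f0),
          "| harmonic =", is_harmonic(freq, f0))
print(harmonics(f0, 500))
```

With f0 = 100 Hz, 150 Hz comes out as an overtone but not a harmonic, while 200 Hz and 300 Hz are both.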

Terminology (5)

Oscillogram
- graphical depiction of sound pressure/amplitude (dB) (vertical) and time (horizontal)
See an oscillogram.
Spectrum
- graphical depiction of frequency (Hz) (horizontal) and amplitude (dB) (vertical); depicts decomposed complex waves, "listing" all the frequencies of the simple sine waves involved as well as their respective amplitudes (with vowels, peaks in the spectrum constitute the sound-characteristic formant frequencies of the vowel; see the explanation of formants below to find out what that means)
See a spectrum.
Spectrogram
- result of a spectral analysis of some waveform (oscillogram); frequency (Hz) is represented on the vertical axis of the display and time (s) on the horizontal axis, while the darkness or brightness of the display shows the amplitude (intensity, in dB) at different frequencies at a particular point in time (a spectrogram is thus three-dimensional: time, frequency, and amplitude)
See a spectrogram.
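
How a spectrogram combines time, frequency and amplitude can be sketched by analysing a signal frame by frame. This is a deliberately simplified sketch: the test signal, the sample rate and the frame length are assumptions, the frames do not overlap, and no window function is applied, unlike in real spectrogram displays.

```python
import cmath
import math

def frame_spectrum(frame):
    """Amplitude spectrum of one frame (naive DFT, bins 0..N/2)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) * 2 / n
            for k in range(n // 2 + 1)]

def spectrogram(samples, frame_len):
    """Cut the signal into consecutive frames and analyse each one."""
    starts = range(0, len(samples) - frame_len + 1, frame_len)
    return [frame_spectrum(samples[s:s + frame_len]) for s in starts]

sample_rate = 1000   # samples per second (assumed)
frame_len = 100      # 0.1 s per frame, so bins are 10 Hz apart

# Test signal: 0.2 s of a 100 Hz tone followed by 0.2 s of a 300 Hz tone.
signal = [math.sin(2 * math.pi * (100 if t < 200 else 300) * t / sample_rate)
          for t in range(400)]

grid = spectrogram(signal, frame_len)   # grid[frame][bin] = amplitude

# Strongest frequency in each frame.
peak_hz = [max(range(len(spec)), key=spec.__getitem__) * sample_rate / frame_len
           for spec in grid]
print(peak_hz)

# Crude text display: time runs left to right, frequency bottom to top,
# and '#' marks a strong amplitude (the "dark" parts of a spectrogram).
for k in (40, 30, 20, 10):
    row = "".join("#" if spec[k] > 0.5 else "." for spec in grid)
    print(f"{k * sample_rate // frame_len:4d} Hz  {row}")
```

The text display should show the energy at 100 Hz in the first two frames and at 300 Hz in the last two.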

Terminology (6)

Silent sound
- the silence that occurs during the closed phase of plosives
See a silent sound.
Plosive sound
- the burst of sound produced when the closure is released and the air pressure built up behind it is big enough
See a spectrogram of voiced and voiceless plosives in the words "a toe," "a doe," and "otto."
Fricative sound
- noise produced by air turbulence at a narrow constriction in the vocal tract
See an oscillogram of ʃ and a spectrogram of both voiceless and voiced fricative sounds (top row, left to right: f, θ, s, ʃ; bottom row, left to right: v, ð, z, ʒ).

The source-filter theory of sound production

Please note: The following material (on the source-filter model) is taken from http://www.ling.upenn.edu/courses/Spring_2001/ling001/phonetics.html (last visited: November 23, 2005).

Between the larynx and the world at large is about 15 centimeters of throat and mouth. This passageway acts as an acoustic resonator, enhancing some frequencies and attenuating others. The properties of this resonator depend on the position of the tongue and lips, and also on whether the velum is lowered so as to open a side passage to the nasal cavities.

A useful way to view the vocal tract is as an acoustic filter on sounds originating at the larynx: The vibrating larynx creates the buzz [the source], and the vocal tract shape determines the way this buzz is modified [the filter]. (It's best to view this diagram starting at the bottom.)

Source-filter-model

In the example above, the tract is in a neutral shape, roughly the vowel of "up." Different positions of the tongue and lips make the difference between one vowel sound and another. This filtering effect can be seen by comparing other vowels.
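
The source-filter idea can be imitated in a few lines of Python: one and the same "buzz" is sent through two differently tuned filters, and the strongest harmonic in the output moves with the filter. Everything specific here is an assumption made for illustration and not taken from the material quoted above: the pulse train standing in for the larynx buzz, the single two-pole resonator standing in for the vocal tract (a real tract has several resonances), and the numbers (100 Hz source, resonances at 500 Hz and 1500 Hz, 80 Hz bandwidth).

```python
import cmath
import math

SAMPLE_RATE = 8000   # samples per second (assumed)

def pulse_train(f0_hz, n_samples):
    """Source: one unit pulse per cycle, a crude stand-in for the larynx buzz."""
    period = SAMPLE_RATE // f0_hz
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def resonator(signal, centre_hz, bandwidth_hz):
    """Filter: a two-pole resonator that enhances frequencies near centre_hz."""
    r = math.exp(-math.pi * bandwidth_hz / SAMPLE_RATE)
    a1 = 2 * r * math.cos(2 * math.pi * centre_hz / SAMPLE_RATE)
    a2 = -r * r
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out

def amplitude_at(signal, freq_hz):
    """Amplitude of a single frequency component (one-bin DFT)."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq_hz * t / SAMPLE_RATE)
              for t, s in enumerate(signal))
    return abs(acc) * 2 / len(signal)

def strongest_harmonic(signal, f0_hz, max_hz=3000):
    """Harmonic of f0 with the largest amplitude in the last 0.1 s of the signal."""
    tail = signal[-800:]   # a whole number of cycles for f0 = 100 Hz
    return max(range(f0_hz, max_hz + 1, f0_hz),
               key=lambda f: amplitude_at(tail, f))

source = pulse_train(100, 4000)          # the same source for both "vowels"
shape_a = resonator(source, 500, 80)     # tract shape A (assumed resonance)
shape_b = resonator(source, 1500, 80)    # tract shape B (assumed resonance)

print("strongest harmonic, shape A:", strongest_harmonic(shape_a, 100), "Hz")
print("strongest harmonic, shape B:", strongest_harmonic(shape_b, 100), "Hz")
```

In the unfiltered source all harmonics are equally strong; after filtering, the harmonic closest to the resonance dominates, although the fundamental frequency of 100 Hz is the same in both cases.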

From the vocal tract to the spectrum

The "spectra" at the right represent the sound waves that we interpret as the vowels [i, a, u].

Formants - Close Up

What they are

(Please note: This introduction to the concept of formants is based on Ladefoged's A Course in Phonetics (1975, 3rd edition).)
  • quality of a sound, e.g. a vowel sound, depends upon its overtone structure
  • vowel sound contains a number of different pitches simultaneously:
    • pitch at which it is actually spoken, plus
    • various overtone pitches that give it its distinctive quality
  • vowels distinguished from each other by differences in audible overtones
  • normally, separate overtones cannot be heard; what we hear is only sensation of pitch = note on which the vowel is actually said (depending on the rate of vibration / frequency of vocal cords)

Task:

  1. Say the vowels as in the words heed, hid, head, had, hod, hawed, hood, who'd.
  2. Now whisper these vowels. (Alternatively, use creaky voice, which also reduces the vibration of the vocal folds.)
What does this show you?
  • in a whispered sound, vocal cords are not vibrating; hence no regular pitch of the voice
  • nevertheless, when whispered, it can be heard that these sounds form a series of sounds on a continuously descending pitch = overtones that characterize the vowels
  • this particular overtone highest for [i] and lowest for [u]

Task: Now whistle a very high note, and then the lowest note that you can.

What you should find:
  • for the high note: tongue in position for [i]
  • for the low note: tongue in position for [u]
  • intermediate notes: tongue positions of the other vowels in the series

In sum:

  • vowels largely distinguished by two characteristic pitches:
    1. one of them (the higher of the two) goes downward through the series
    2. the other one goes up for the first four vowels and down for the next four
  • characteristic overtones = formants of the vowels
    • lower of the two: first formant (F1)
    • higher of the two: second formant (F2)
  • (even a third overtone, but its pitch cannot be demonstrated easily)

Please see the distribution of the formant frequencies F1 and F2 for a series of synthesized German vowels.

Visual representation of formants:

In a spectrum we can see the formants as maxima, and in a spectrogram they appear as the bands with the darkest shading.
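
Finding formants as maxima in a spectrum amounts to simple peak picking. The sketch below assumes a smoothed spectrum envelope; the frequency grid and the dB values are made up for the example and do not come from a real recording. In a raw spectrum of voiced speech the individual harmonics would also show up as small local maxima, which is why the envelope is used here.

```python
def spectral_peaks(freqs_hz, levels_db):
    """Return (frequency, level) pairs that are local maxima of the spectrum."""
    peaks = []
    for i in range(1, len(levels_db) - 1):
        if levels_db[i - 1] < levels_db[i] >= levels_db[i + 1]:
            peaks.append((freqs_hz[i], levels_db[i]))
    return peaks

# Hypothetical smoothed spectrum, sampled every 250 Hz from 0 to 3000 Hz.
freqs = list(range(0, 3001, 250))
levels = [20, 35, 28, 22, 30, 38, 27, 20, 15, 18, 24, 19, 12]

peaks = spectral_peaks(freqs, levels)
for number, (freq, level) in enumerate(peaks, start=1):
    print(f"peak {number} (candidate for F{number}): {freq} Hz at {level} dB")
```

For the made-up values above, the maxima lie at 250 Hz, 1250 Hz and 2500 Hz.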

Another example for a spectrographic view - vowels and their formants:

Vowels in a spectrographic view

(Wideband spectrograms of the vowels of American English in a /b__d/ context. Top row, left to right: "bead" "bid" "bade" "bed" "bad". Bottom row, left to right: "bod" "bawd" "bode" "buhd" "booed".)

Formants are the result of different shapes of the vocal tract:

Position of vocal organs

See here the position of the vocal organs and the spectra of the vowel sounds in the middle of the words heed, hid, head, had and hod, hawed, hood, who'd. The peaks in each of the spectra, again, are the formants characteristic of the respective vowels.

Don't forget: Formants are a feature of the vocal tract and completely independent of any source signal. How is that? When you form an [o] with your mouth, don't let out any air from your lungs, and simply tap against your cheeks/jaw/larynx with your fingers, you'll be able to hear an [o]: this is the vocal tract resonance. This is why formants are also called resonance frequencies.

How do formants relate to articulation?

The positions for the first (F1) and second formant (F2) of a vowel aren't random! Let's have a look at the following chart of formant values for vowels of (Canadian) English.

Please note: Most of the following material has been borrowed from http://www.umanitoba.ca/faculties/arts/linguistics/russell/138/sec4/form2.htm.

Formants

Vowel    [i]   [ɪ]   [e]   [æ]   [ɑ]   [ɔ]   [ʊ]   [u]   [ʌ]
F1 (Hz)  280   370   405   860   830   560   400   330   680
F2 (Hz)  2230  2090  2080  1550  1170  820   1100  1260  1310

(vowels as in: bleed, hid, head, had, father, saw, put, shoe, cut)

Each of these vowels can be placed on a graph, where

Formants

Formants F1 and F2

If you pay close attention, you will note that this is just a mirror image of the familiar vowel chart! If we, however, change the axes of the graph so that

Vowel chart in terms of formants

Formants F1 and F2

So, the formant frequencies are inversely related to the traditional articulatory parameters (see above).
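
The inverse relation can be checked directly on the formant values from the table above. The following Python sketch only re-sorts those values; the variable names are chosen for this example.

```python
# (F1, F2) in Hz for each vowel, copied from the table above.
FORMANTS = {
    "i": (280, 2230), "ɪ": (370, 2090), "e": (405, 2080),
    "æ": (860, 1550), "ɑ": (830, 1170), "ɔ": (560, 820),
    "ʊ": (400, 1100), "u": (330, 1260), "ʌ": (680, 1310),
}

# Sorting by rising F1 runs from close (high) vowels to open (low) vowels.
by_f1 = sorted(FORMANTS, key=lambda v: FORMANTS[v][0])

# Sorting by falling F2 runs from front vowels to back vowels.
by_f2 = sorted(FORMANTS, key=lambda v: FORMANTS[v][1], reverse=True)

print("low F1 -> high F1:", " ".join(by_f1))
print("high F2 -> low F2:", " ".join(by_f2))
```

The vowel with the lowest F1, [i], is the closest vowel in the set, and the one with the highest F1, [æ], is the most open; likewise the front vowels [i ɪ e æ] have the highest F2 values and the back vowels the lowest.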

Another visualization of the formants in the vowel quadrilateral (same thing as above!).

Blend the above chart (the second, here a reduced version of it!) into an image of the vocal tract, and you will see where each of the vowels is produced and how the vowel formants are related to the shape of the human vocal tract!

Formants and Vowel-Quadrilateral

This means that a listener can essentially "hear" the position of the speaker's tongue body:

Homework: Revision exercises

... in preparation for next week's session on Auditory Phonetics: