HTHS - Acoustic Phonetics
How to handle speech
Thorsten Trippel
Universität Bielefeld
Material provided by a pool of colleagues: Dafydd Gibbon, Vivian
Gramley, Alexandra Thies
Acoustic Phonetics & Praat: Theoretical Background
When we speak to each other, the sounds that we make
have to travel from the mouth of the speaker to the ear of
the listener. This is true whether we are speaking face to
face, or by telephone over thousands of miles. What is
important for us in our study of speech is that this
acoustic signal is completely observable: we can capture
everything the listener hears in the form of a recording,
and then measure whichever aspect of the signal that we want
to know about. (Roach, 2001, p. 39)
Goals
At the end of this lesson, you will know about
- important terms such as amplitude, frequency, and
duration
- wave forms
- formants
Recommended reading
... in preparation for this week's session on Acoustic Phonetics:
- Chapter 6 on "Acoustics of speech sounds" in Roach, P.
(2001). Phonetics. Oxford: OUP.
Terminology
Below, please find a concise illustrated glossary of terms that
you may encounter throughout the course of studying Acoustic
Phonetics.
- Acoustics
- - the scientific study of sound and how we hear it
- Acoustic phonetics
- - deals with the capturing and description of the speech
signal as it is produced and perceived; acoustic
phonetics is part of acoustics
- Wave
- - a disturbance of air (vibration) propagated from
point to point in a medium or in space
- Sound wave
- - sound is caused by small areas of high and low
pressure propagating outward from its source.
Terminology (2)
- Sine wave
- - the simplest kind of pressure wave (as created by an
ideal tuning fork). Interesting things to
measure for a sine wave are the following:
- amplitude (alternative term: intensity; perceived as
loudness) - displacement of the vibrating medium from its
rest position; the result of pressure differences: high
pressure results in (high) peaks, low pressure in (low)
valleys; usually measured in decibels (dB)
- frequency (perceived as pitch) - number of complete
vibration cycles per second, usually measured in Hertz (Hz)
- duration (perceived as speech tempo) - length of a sound,
measured in some time unit (e.g. seconds)
See a sine wave.
- Complex wave
- - a combination of simple sine waves; every complex
wave can be decomposed into the simple sine waves it
consists of by means of a spectral analysis (see below)
See a complex wave. Listen to a combination of simple sine waves.
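The terms above can be tried out in a few lines of Python. This is only a minimal sketch: the sample rate of 8000 Hz, the function names and the chosen frequencies and amplitudes are assumptions made for illustration, not part of the course material.

```python
import math

def sine_wave(frequency_hz, amplitude, duration_s, sample_rate=8000):
    """Return the samples of a sine wave as a list of floats."""
    n_samples = int(round(duration_s * sample_rate))
    return [amplitude * math.sin(2 * math.pi * frequency_hz * n / sample_rate)
            for n in range(n_samples)]

def add_waves(*waves):
    """Sum equally long waves sample by sample to form a complex wave."""
    return [sum(samples) for samples in zip(*waves)]

# Hypothetical example: a 100 Hz sine wave plus two weaker components.
simple = sine_wave(100, 1.0, 0.05)
complex_wave = add_waves(
    sine_wave(100, 1.0, 0.05),
    sine_wave(200, 0.5, 0.05),
    sine_wave(300, 0.25, 0.05),
)

print(len(simple))                   # number of samples = duration * sample rate
print(max(abs(s) for s in simple))   # peak amplitude of the simple wave
```

The complex wave still repeats once every 1/100 of a second, because each added component completes a whole number of cycles in that time.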
Terminology (3)
- Spectral or Fourier analysis
- - analysis that is used to break down any waveform,
however complex it might be, into simple waveforms
of different frequencies; comparable to breaking
down white light into the rainbow pattern of colors
that make up its color spectrum
- Fundamental frequency (also: f0, or first harmonic)
- - the lowest frequency component of a periodic complex wave
(the fundamental); in speech it results from the vibration
of the vocal folds in the larynx during phonation and is
heard as the pitch of the voice
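A spectral analysis can be sketched with a plain discrete Fourier transform. The example below is an illustration under assumptions (a made-up signal with components at 100, 200 and 300 Hz, a sample rate of 800 Hz, and a hypothetical helper `dft_magnitudes`); real tools such as Praat use faster and more refined methods.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Return the amplitude found in each frequency bin up to half the sample rate."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        total = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        mags.append(abs(total) * 2 / n)  # scaled so a sine of amplitude A gives about A
    return mags

sample_rate = 800   # assumed, in Hz
n = 80              # 0.1 s of signal, so the bins are 10 Hz apart
signal = [1.0 * math.sin(2 * math.pi * 100 * t / sample_rate)
          + 0.5 * math.sin(2 * math.pi * 200 * t / sample_rate)
          + 0.25 * math.sin(2 * math.pi * 300 * t / sample_rate)
          for t in range(n)]

mags = dft_magnitudes(signal)
bin_hz = sample_rate / n
components = [(k * bin_hz, round(m, 3)) for k, m in enumerate(mags) if m > 0.01]
print(components)        # [(100.0, 1.0), (200.0, 0.5), (300.0, 0.25)]

f0 = components[0][0]    # the lowest component present is the fundamental
```

The analysis recovers exactly the simple waves that were added together, and the lowest of them is the fundamental frequency.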
Terminology (4)
- Harmonic
- - a frequency that is an integer multiple of the
fundamental frequency (so if f0 is 100 Hz,
then possible harmonics would be 200 Hz, 300 Hz,
etc.)
See a harmonic sound.
- Overtones
- - while harmonics are integer multiples of f0,
overtones are any frequency components above f0; so
while not every overtone is a harmonic, all
harmonics above f0 are overtones (if f0 = 100 Hz, then
150 Hz is an overtone, but not a harmonic)
- Formant
- - energy peaks that determine the quality of sounds
(esp. vowel sounds) and which are the result of
resonances in the vocal tract; they are a
consequence of resonance but not resonance itself.
Note that a formant may be a harmonic (see
above), but doesn't have to be! (For further
information on formants, see below!)
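The difference between harmonics and overtones can be written down as two small checks. The function names and the tolerance are hypothetical and only serve to illustrate the definitions above.

```python
def harmonics(f0_hz, count):
    """Return the first `count` harmonics; the first harmonic is f0 itself."""
    return [f0_hz * k for k in range(1, count + 1)]

def is_harmonic(freq_hz, f0_hz, tolerance_hz=1e-6):
    """True if freq_hz is an integer multiple of f0_hz."""
    multiple = round(freq_hz / f0_hz)
    return multiple >= 1 and abs(freq_hz - multiple * f0_hz) <= tolerance_hz

def is_overtone(freq_hz, f0_hz):
    """True if freq_hz lies above the fundamental."""
    return freq_hz > f0_hz

print(harmonics(100, 4))                              # [100, 200, 300, 400]
print(is_overtone(150, 100), is_harmonic(150, 100))   # True False
print(is_overtone(300, 100), is_harmonic(300, 100))   # True True
```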
Terminology (5)
- Oscillogram
- - graphical depiction of sound pressure/amplitude (dB)
(vertical) and time (horizontal)
See an oscillogram.
- Spectrum
- - graphical depiction of frequency (Hz) (horizontal)
and amplitude (dB) (vertical); depicts decomposed
complex waves, "listing" all the frequencies of the
simple sine waves involved as well as their
respective amplitudes (with vowels, peaks in the
spectrum constitute the sound-characteristic formant
frequencies of the vowel; see the close-up on formants
below to find out what that means!)
See a spectrum.
- Spectrogram
- - result of a spectral analysis of some waveform
(oscillogram); frequency (Hz) is represented on the
vertical axis of the display and time (s) on the
horizontal axis, while the intensity (darkness or
brightness) of the display shows the amplitude
(intensity, in dB) at different frequencies at a
particular point in time (a spectrogram is
three-dimensional)
See a spectrogram.
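The three dimensions of a spectrogram (time, frequency, amplitude) can be made concrete with a toy computation. This is a simplified sketch under assumptions: non-overlapping frames, no window function, a made-up signal that jumps from 100 Hz to 300 Hz, and a sample rate of 1000 Hz. Real spectrograms use overlapping, windowed frames.

```python
import cmath
import math

def frame_spectrum_db(frame):
    """Amplitude spectrum of one frame in dB relative to an amplitude of 1."""
    n = len(frame)
    levels = []
    for k in range(n // 2):
        total = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        magnitude = abs(total) * 2 / n
        levels.append(20 * math.log10(max(magnitude, 1e-12)))  # floor avoids log(0)
    return levels

def spectrogram(samples, frame_len):
    """Cut the signal into consecutive frames and analyse each one."""
    return [frame_spectrum_db(samples[start:start + frame_len])
            for start in range(0, len(samples) - frame_len + 1, frame_len)]

def tone(frequency_hz, n_samples, sample_rate):
    return [math.sin(2 * math.pi * frequency_hz * t / sample_rate)
            for t in range(n_samples)]

sample_rate = 1000   # assumed, in Hz
frame_len = 50       # 50 ms frames, so the bins are 20 Hz apart
signal = tone(100, 100, sample_rate) + tone(300, 100, sample_rate)
spec = spectrogram(signal, frame_len)   # spec[time frame][frequency bin] = level in dB

strongest_hz = [max(range(len(frame)), key=frame.__getitem__) * sample_rate / frame_len
                for frame in spec]
print(strongest_hz)   # [100.0, 100.0, 300.0, 300.0]
```

Each row of `spec` is one point in time, each column one frequency, and the stored value is the amplitude that a spectrogram display would show as darkness.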
Terminology (6)
- Silent sound
- - during the closed phase of plosives
See a silent sound.
- Plosive sound
- - when the closure is released and the air pressure
behind it is high enough
Click here in order to see a spectrogram of voiced and voiceless
plosives in the words "a toe," "a doe," and "otto."
- Fricative sound
- - due to a constriction in the vocal tract
See an oscillogram of ʃ and a spectrogram of both voiceless and
voiced fricative sounds (top row, left to right: f, θ, s, ʃ;
bottom row, left to right: v, ð, z, ʒ).
The source-filter theory of sound production
Between the larynx and the world at large is about 15 centimeters
of throat and mouth. This passageway acts as an acoustic
resonator, enhancing some frequencies and
attenuating others. The properties of this resonator
depend on the position of the tongue and lips,
and also on whether the velum is lowered so as to
open a side passage to the nasal cavities.
A useful way to view the vocal tract is as an acoustic
filter on sounds originating at the larynx: The
vibrating larynx creates the buzz [the source], and
the vocal tract shape determines the way this buzz is
modified [the filter]. (It's best to view this
diagram starting at the bottom.)
Source-filter-model
In the example above, the tract is in a
neutral shape, roughly the vowel of "up." Different
positions of the tongue and lips make the difference between one
vowel sound and another. This filtering effect can be seen by
comparing other vowels.
The "spectra" at the right represent the sound waves that we
interpret as the vowels [i, a, u].
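The source-filter idea can be sketched numerically: the source supplies harmonics of f0, and the filter multiplies each harmonic by a gain that depends on the formants. Everything specific here is an assumption for illustration: the simple second-order resonance formula, the bandwidth of 80 Hz, the source amplitudes falling off as 1/k, and the formant values for [i] (280 Hz and 2230 Hz, taken from the formant table later in this lesson).

```python
import math

def resonance_gain(freq_hz, formant_hz, bandwidth_hz):
    """Gain of one simple second-order resonance at freq_hz (assumed model)."""
    numerator = formant_hz ** 2
    denominator = math.sqrt((formant_hz ** 2 - freq_hz ** 2) ** 2
                            + (bandwidth_hz * freq_hz) ** 2)
    return numerator / denominator

def vocal_tract_gain(freq_hz, formants_hz, bandwidth_hz=80.0):
    """The filter: the product of the gains of all formant resonances."""
    gain = 1.0
    for formant in formants_hz:
        gain *= resonance_gain(freq_hz, formant, bandwidth_hz)
    return gain

def vowel_spectrum(f0_hz, formants_hz, max_hz=3000):
    """Return (frequency, amplitude) for each harmonic after filtering."""
    result = []
    k = 1
    while k * f0_hz <= max_hz:
        freq = k * f0_hz
        source_amplitude = 1.0 / k   # assumed: higher harmonics are weaker at the source
        result.append((freq, source_amplitude * vocal_tract_gain(freq, formants_hz)))
        k += 1
    return result

formants_i = (280, 2230)   # F1 and F2 of [i]
gains = {f: vocal_tract_gain(f, formants_i) for f in range(100, 3001, 100)}
print(max(gains, key=gains.get))   # harmonic that the filter boosts most

for freq, amplitude in vowel_spectrum(100, formants_i)[:5]:
    print(freq, round(amplitude, 2))
```

With f0 = 100 Hz the filter boosts the 300 Hz harmonic most, although F1 itself lies at 280 Hz: the formant belongs to the filter and does not have to coincide with a harmonic of the source.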
Formants - Close Up
What they are
(Please note: This introduction to the concept of formants is
based on Ladefoged's A Course in Phonetics (1975, 3rd edition).)
- quality of a sound, e.g. a vowel sound, depends upon its
overtone structure
- vowel sound contains a number of different pitches
simultaneously:
- - pitch at which it is actually spoken, plus
- - various overtone pitches that give it its
distinctive quality
- vowels distinguished from each other by differences in
audible overtones
- normally, separate overtones cannot be heard; what we
hear is only sensation of pitch = note on which the
vowel is actually said (depending on the rate of
vibration / frequency of vocal cords)
Task:
- Say the vowels as in the words heed, hid, head, had,
hod, hawed, hood, who'd.
- Now whisper these vowels. (Alternatively, use creaky
voice, which also reduces the vibration of the vocal
folds.)
What does this show you?
- in a whispered sound, vocal cords are not vibrating;
hence no regular pitch of the voice
- nevertheless, when whispered, it can be heard that these
sounds form a series of sounds on a continuously
descending pitch = overtones that characterize the
vowels
- this particular overtone highest for [i] and lowest for
[u]
Task: Now whistle a very high note, and then the lowest
note that you can.
What you should find:
- for the high note: tongue in position for [i]
- for the low note: tongue in position for [u]
- intermediate notes: tongue positions of the other vowels
in the series
Visual representation of formants:
In a spectrum we can see the formants as maxima (peaks),
and in a spectrogram we see them as the darkest bands.
Another example of a spectrographic view - vowels and their
formants:
(Wideband spectrograms of the vowels of American English in a
/b__d/ context. Top row, left to right: "bead" "bid" "bade"
"bed" "bad". Bottom row, left to right: "bod" "bawd" "bode"
"buhd" "booed".)
Formants are the result of different shapes of the vocal tract:
- any body of air will vibrate in a way that depends on its
size and shape (e.g. blowing across the top of an
empty vs. a full bottle of water)
- smaller bodies of air (like smaller piano strings,
smaller organ pipes) produce higher pitches
- in case of the vowel sounds: vocal tract has a complex
shape so that the different bodies of air produce a
number of overtones
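The link between the size of a body of air and its resonances can be illustrated with a strongly simplified model that is not part of the material above: a uniform tube, closed at one end and open at the other, whose resonances lie at odd multiples of c / (4L). The speed of sound of 343 m/s and the function name are assumptions; the length of 0.15 m follows the rough figure of 15 centimeters given for the vocal tract.

```python
def tube_resonances(length_m, count=3, speed_of_sound=343.0):
    """Resonance frequencies (Hz) of a uniform tube closed at one end."""
    return [(2 * n - 1) * speed_of_sound / (4 * length_m)
            for n in range(1, count + 1)]

print([round(f) for f in tube_resonances(0.15)])   # tube of about vocal tract length
print([round(f) for f in tube_resonances(0.10)])   # a shorter tube resonates higher
```

A real vocal tract is not a uniform tube, which is exactly why moving the tongue and lips shifts the resonances and changes the vowel.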
Position of vocal organs
See here the position of the vocal organs and the spectra of the
vowel sounds in the middle of the words heed, hid, head, had
and hod, hawed, hood, who'd.
The peaks in each of the spectra, again, are the
formants characteristic of the respective vowels.
Don't forget: Formants are a feature of the vocal
tract and completely independent of any source signal. How is
that? When you form an [o] with your mouth without letting out
air from your lungs, and simply tap against your cheeks, jaw or
larynx with your fingers, you will be able to hear an [o]: this
is the vocal tract resonance. This is why formants are also
called resonance frequencies.
Formants
| Vowel | [i] | [ɪ] | [e] | [æ] | [ɑ] | [ɔ] | [ʊ] | [u] | [ʌ] |
| F1 (Hz) | 280 | 370 | 405 | 860 | 830 | 560 | 400 | 330 | 680 |
| F2 (Hz) | 2230 | 2090 | 2080 | 1550 | 1170 | 820 | 1100 | 1260 | 1310 |
(vowels as in: bleed, hid, head, had, father, saw, put, shoe,
cut)
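Because each vowel in the table has its own F1/F2 pair, a measured pair of formants can be matched against the table. The sketch below uses the table values; the function name and the use of a plain straight-line distance in Hz are assumptions made for simplicity (perceptually motivated scales would be a more realistic choice).

```python
import math

# F1 and F2 in Hz, copied from the table above.
FORMANTS_HZ = {
    "i": (280, 2230), "ɪ": (370, 2090), "e": (405, 2080),
    "æ": (860, 1550), "ɑ": (830, 1170), "ɔ": (560, 820),
    "ʊ": (400, 1100), "u": (330, 1260), "ʌ": (680, 1310),
}

def nearest_vowel(f1_hz, f2_hz):
    """Return the table vowel whose (F1, F2) pair is closest to the measurement."""
    return min(FORMANTS_HZ,
               key=lambda vowel: math.dist((f1_hz, f2_hz), FORMANTS_HZ[vowel]))

print(nearest_vowel(300, 2200))   # i
print(nearest_vowel(700, 1300))   # ʌ
```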
Each of these vowels can be placed on a graph, where
- the horizontal dimension represents the frequency of the
first formant (F1), and
- the vertical dimension represents the frequency of the
second formant (F2):
Formants
If you pay close attention, you will note that this is just a
mirror image of the familiar vowel chart! If we, however, change the
axes of the graph so that
- the horizontal dimension shows (decreasing) F2, i.e. high F2
on the left,
- and the vertical dimension shows (decreasing) F1, i.e. low F1
at the top,
the graph takes the shape of the familiar vowel chart:
Vowel chart in terms of formants
So, the formant frequencies are inversely related to the
traditional articulatory parameters (see above).
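The inverse relation can be checked directly on the table values. In this sketch, sorting by rising F1 is taken to order the vowels from close (high) to open (low), and sorting by falling F2 from front to back; the variable and function names are made up for the example.

```python
# F1 and F2 in Hz, copied from the formant table above.
FORMANTS_HZ = {
    "i": (280, 2230), "ɪ": (370, 2090), "e": (405, 2080),
    "æ": (860, 1550), "ɑ": (830, 1170), "ɔ": (560, 820),
    "ʊ": (400, 1100), "u": (330, 1260), "ʌ": (680, 1310),
}

def chart_position(vowel):
    """Vowel chart coordinates: x grows towards the back, y towards the top."""
    f1, f2 = FORMANTS_HZ[vowel]
    return (-f2, -f1)

# Low F1 = high tongue body, high F2 = front tongue body.
by_height = sorted(FORMANTS_HZ, key=lambda v: FORMANTS_HZ[v][0])
by_frontness = sorted(FORMANTS_HZ, key=lambda v: FORMANTS_HZ[v][1], reverse=True)

print(by_height)      # from the highest vowel to the lowest
print(by_frontness)   # from the most front vowel to the most back
```

With these values [i] comes first in both orderings, matching its place in the top left corner of the vowel chart.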
Another visualization of the formants
in the vowel quadrilateral (same thing as above!).
Superimpose the second chart above (shown here in a reduced
version) on an image of the vocal tract, and you will see where
each of the vowels is produced and how the vowel formants
are related to the shape of the human vocal tract!
Formants and Vowel-Quadrilateral
This means that a listener can essentially "hear" the position
of the speaker's tongue body:
- F1 is influenced by tongue body height.
- F2 is influenced by tongue body frontness/backness.
Homework: Revision exercises
... in preparation for next week's session on Auditory Phonetics: