Institute of Phonetic Sciences & IFOTT, University of Amsterdam
Earlier studies have investigated acoustic and articulatory consonant reduction in relation to the corresponding vowel reduction, but these were generally limited to a few classes of consonants and to small amounts of speech material, e.g. [1, 4, 5, 6, 11]. From these studies it is difficult to discern the general effects of consonant reduction in "normal" speech situations.
        Velar  Pal  Alve  Lab
Plos    k g         t d   p b
Fric    x V    J S  s z   f v
Nasal   N           n     m
V-like  r      j    l   ~ w
At the moment, any understanding of the way reduction affects the spectro-temporal structure of consonants and the way it influences consonant identification is seriously lacking. Therefore, it is difficult to point to specific features of articulation where reduction will affect the phonemic distinction of consonants. In this paper, we will limit ourselves to an inventory of consonant acoustics that parallel the vowel characteristics that are affected by vowel reduction. One important question that we want to answer is whether acoustic consonant reduction is indeed similar to vowel reduction.
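Four aspects of vowels and consonants are studied to characterise consonant reduction: the F2 slope difference, segment duration, the spectral centre of gravity, and the intervocalic sound energy difference.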
        Velar  Pal  Alve  Lab  Total
Plos       63    -    65   61    189
Fric       77    3    63   75    218
Nasal      14    -    72   63    149
V-like     60   21    94   60    235
Total     214   24   294  259    791
From the phonetic transcription, all Vowel-Consonant-Vowel (VCV) segments were located in the speech recordings, including those crossing word boundaries. For 1847 VCV pairs, both realizations (read and spontaneous) originated from corresponding positions in the utterances, with identical syllable structure, syllable boundary type, and sentence and word stress. Of these VCV pairs, 791 were analyzed in detail for this paper (see tables 1 and 2).
Phoneme boundaries were placed using a waveform display with audio feedback [2], combined with synchronized displays of the harmonicity-to-noise ratio, the total energy, and the spectral balance, i.e., the energy in the high (above 3 kHz) versus low (below 750 Hz), high versus mid (between 750 and 3000 Hz), and mid versus low frequencies. In cases where none of the displays suggested a boundary, audio cues were used exclusively. The boundaries between vowels and consonants were preferably placed on waveform zero-crossings that corresponded to "visible" changes in the spectral composition of the waveform. If present, priority was given to spectral changes that indicated the start or end of a constriction (e.g., abrupt changes in the spectral balance). LPC formant tracks were extracted using the Split-Levinson algorithm (after downsampling to 10 kHz, using 5 pole zero pairs).
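The three spectral-balance displays can be illustrated with a minimal sketch. It assumes a single analysis frame, a naive DFT, and ratios expressed in dB; the function names, the frame handling, and the small `eps` guard are illustrative assumptions and not the tooling used for the boundary placement. Only the band edges (750 Hz and 3 kHz) come from the text.

```python
import cmath
import math


def power_spectrum(frame, rate):
    """Naive DFT power spectrum of one frame, from 0 Hz up to Nyquist."""
    n = len(frame)
    freqs, powers = [], []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        freqs.append(k * rate / n)
        powers.append(abs(acc) ** 2)
    return freqs, powers


def band_energy(freqs, powers, lo_hz, hi_hz):
    """Sum the power of all bins with lo_hz <= f < hi_hz."""
    return sum(p for f, p in zip(freqs, powers) if lo_hz <= f < hi_hz)


def spectral_balance_db(frame, rate):
    """Band-energy ratios in dB, using the band edges given in the text."""
    freqs, powers = power_spectrum(frame, rate)
    eps = 1e-12  # assumption: guards against log(0) in empty bands
    low = band_energy(freqs, powers, 0.0, 750.0) + eps
    mid = band_energy(freqs, powers, 750.0, 3000.0) + eps
    high = band_energy(freqs, powers, 3000.0, rate / 2 + 1.0) + eps
    return {
        "high_vs_low": 10 * math.log10(high / low),
        "high_vs_mid": 10 * math.log10(high / mid),
        "mid_vs_low": 10 * math.log10(mid / low),
    }
```

Tracking these three ratios frame by frame would give curves in which an abrupt jump marks a candidate constriction onset or release.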
The formant transitions in the vowel off- and onset bordering a consonant, especially of the F2, are both sensitive to coarticulation and are important cues for consonant identification [3, 9]. To quantify the extent of acoustic coarticulation we determined the difference between the F2 slopes at the CV- and the VC-boundaries (i.e., the F2 slope difference). We used formant track slopes normalized for vowel duration because formant track shapes are largely invariant with speaking rate [10] and because in perception one also normalizes for speaking rate [8]. The slopes were calculated from the coefficients of a 4th order polynomial fit of the F2 tracks of the vowels with the duration normalized to 1.
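The slope measure can be sketched as follows. This is an illustration under stated assumptions, not the original analysis code: the F2 track of each vowel is given as a list of equally spaced values in Hz, time is normalized to run from 0 to 1 over the vowel, and the difference is taken as the CV slope minus the VC slope (the sign convention is an assumption). The least-squares fit is written out with the standard library only; all function names are hypothetical.

```python
def polyfit(xs, ys, order):
    """Least-squares polynomial fit; returns c with y ~ c[0] + c[1]*x + ..."""
    m = order + 1
    # Normal equations: a[i][j] = sum x^(i+j), b[i] = sum y * x^i.
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            factor = a[row][col] / a[col][col]
            for k in range(col, m):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back-substitution on the upper-triangular system.
    coefs = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(a[i][k] * coefs[k] for k in range(i + 1, m))
        coefs[i] = (b[i] - tail) / a[i][i]
    return coefs


def slope_at(coefs, x):
    """First derivative of the fitted polynomial at x."""
    return sum(i * c * x ** (i - 1) for i, c in enumerate(coefs) if i > 0)


def f2_slope_difference(f2_v1, f2_v2, order=4):
    """CV slope minus VC slope, in Hz per normalized vowel duration."""
    def norm_time(track):
        return [i / (len(track) - 1) for i in range(len(track))]

    vc = slope_at(polyfit(norm_time(f2_v1), f2_v1, order), 1.0)  # end of V1
    cv = slope_at(polyfit(norm_time(f2_v2), f2_v2, order), 0.0)  # start of V2
    return cv - vc
```

Each track needs at least five points for a 4th-order fit. Because time is normalized per vowel, a vowel spoken twice as fast with the same track shape yields the same slope.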
For the fricatives and plosives, as well as for all consonants pooled (not shown), the F2 slope difference is significantly lower in spontaneous speech than in read speech (p ≤ 0.001, two-tailed sign test). The behaviour of individual phonemes is very erratic (figure 2); none reaches statistical significance.
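For readers unfamiliar with the test, a two-tailed sign test on paired read and spontaneous values can be sketched as below. The sketch assumes that ties are dropped and that an exact binomial distribution is used; the function name and the pairing of the inputs are illustrative assumptions.

```python
from math import comb


def sign_test_two_tailed(read_values, spont_values):
    """Exact two-tailed sign test on paired observations (ties dropped)."""
    diffs = [s - r for r, s in zip(read_values, spont_values) if s != r]
    n = len(diffs)
    if n == 0:
        return 1.0
    negatives = sum(1 for d in diffs if d < 0)
    tail = min(negatives, n - negatives)
    # Probability of a split at least this lopsided under p = 0.5.
    p_one_sided = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * p_one_sided)
```

The test uses only the direction of each paired difference, not its size, which makes it robust against the erratic per-phoneme values mentioned above.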
Both vowels and consonants become shorter when spoken spontaneously. Furthermore, they shorten by the same proportion: the relative duration of the consonants in the VCV segments, i.e., their duration as a fraction of the total, does not change when the speaking style changes (not shown).
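A small worked example may make the relative-duration argument concrete. The durations below are invented for illustration and are not measurements from this study; the function names are hypothetical.

```python
def consonant_fraction(v1_ms, c_ms, v2_ms):
    """Consonant duration as a fraction of the whole VCV segment."""
    return c_ms / (v1_ms + c_ms + v2_ms)


# Hypothetical read-speech durations (ms) for V1, C, V2.
read = (100.0, 80.0, 120.0)

# Case 1: every segment shortens by the same factor.
proportional = tuple(d * 0.8 for d in read)

# Case 2: every segment loses the same number of milliseconds.
absolute = tuple(d - 20.0 for d in read)

fraction_read = consonant_fraction(*read)
fraction_proportional = consonant_fraction(*proportional)
fraction_absolute = consonant_fraction(*absolute)
```

Only proportional shortening leaves the consonant fraction unchanged; subtracting a fixed number of milliseconds from every segment shifts it.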
For Dutch (and English), a more level spectral slope, i.e., a higher spectral centre of gravity (COG), strongly correlates with perceived sentence accent [12, 13]. As the de-accentuation of vowels strongly correlates with vowel reduction, we can predict that reduction will show up as a lower COG. In figure 4 this prediction is borne out for the vowel realizations. For each vowel, the spontaneous realizations have a lower COG than the read realizations (only shown for pooled data). For the sonorants and fricatives we see a similar picture (a lower COG for spontaneous realizations). For the release bursts of the plosives we see erratic behaviour that does not seem to indicate a definite difference in the COG with respect to speaking style.
Quite low COG frequencies are found for the sonorants (vowels and consonants), with vowels having higher values than nasals and vowel-like consonants. For these sonorant consonants, the COG is dominated by the damping of the higher frequencies due to their closed articulation.
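The link between spectral slope and COG can be illustrated with a short sketch. It assumes the common definition of the COG as the power-weighted mean frequency on a linear Hz scale; the exact definition and frequency scale used for the figures may differ, and the synthetic spectra are invented for illustration.

```python
def centre_of_gravity(freqs_hz, powers):
    """Power-weighted mean frequency (first spectral moment) in Hz."""
    total = sum(powers)
    if total <= 0:
        raise ValueError("spectrum has no energy")
    return sum(f * p for f, p in zip(freqs_hz, powers)) / total


# Synthetic harmonic spectra from 100 Hz to 5 kHz (illustrative only).
freqs = [100.0 * k for k in range(1, 51)]
level_slope = [f ** -2 for f in freqs]  # power falls about 6 dB per octave
steep_slope = [f ** -4 for f in freqs]  # power falls about 12 dB per octave

cog_level = centre_of_gravity(freqs, level_slope)
cog_steep = centre_of_gravity(freqs, steep_slope)
```

The steeper spectrum concentrates its energy in the lowest harmonics and therefore has the lower COG, which is the direction predicted for reduced realizations.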
Figure 6 displays the sound energy differences for read and spontaneous speech. For all consonants, except for the nasals, the intervocalic sound energy difference is smaller in spontaneous speech. Altogether, the effects of speaking style changes on the intervocalic sound energy differences seem to be small, on the order of 1 dB. Therefore, changes in the sound level of the vowels seem to be largely matched by corresponding changes in the intervocalic consonants.
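Why matched level changes leave this measure untouched can be shown with a minimal sketch. It assumes that the intervocalic sound energy difference is the mean level of the two vowels minus the level of the consonant, in dB; that definition, the function names, and the `eps` guard are assumptions made for illustration only.

```python
import math


def level_db(samples):
    """Mean-square level of a segment in dB (arbitrary reference)."""
    eps = 1e-12  # assumption: avoids log(0) for silent segments
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10 * math.log10(mean_square + eps)


def intervocalic_energy_difference(v1, consonant, v2):
    """Mean vowel level minus consonant level, in dB."""
    vowel_level = (level_db(v1) + level_db(v2)) / 2
    return vowel_level - level_db(consonant)


def scale(samples, gain):
    """Apply the same linear gain to every sample."""
    return [gain * x for x in samples]
```

If the vowels and the consonant are all scaled by the same gain, the difference stays the same; only a consonant that weakens more or less than its neighbouring vowels changes it.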
The generally lower F2 slope differences in spontaneous speech indicate a decrease of coarticulation strength. This is equivalent to the spectral effect of articulatory reduction found in vowel space.
Except for the plosives, all consonants and vowels showed a decrease in COG. This indicates that both the vowels and the non-plosive consonants show a diminishing source strength in spontaneous speech. This, in turn, implies a decrease in vocal and articulatory effort. As the COG is strongly linked to the spectral slope at high frequencies, this lowering might be expected to correlate with a decrease in the perceived stress of the vowels and, if consonants contribute to stress perception, of the consonants [12, 13].
In spontaneous speech, the nasal consonants "weaken" somewhat more than the neighbouring vowels whereas other consonants "weaken" somewhat less than the vowels (figure 6).