The Handbook of Language and Speech Disorders
improvements in overall performance, with ceiling‐level performance reached more often in recent studies than in past ones. This was noted by Blamey et al. (2013), who compared results from their retrospective, multi‐center study of 2,251 postlingually deafened adult CI recipients with results from their study of a similar cohort 15 years earlier (Blamey et al., 1996). They attributed this overall improvement to better device characteristics, advances in speech processing strategies, and a clinical focus on preserving residual hearing after implantation. Candidacy criteria had also become less restrictive over the intervening years. However, despite more instances of ceiling performance and improved results, there is still considerable variation in word recognition across cohorts of CI listeners. For instance, Holden et al. (2013) tested 114 postlingually deafened adults longitudinally over 2 years on the identification of CNC (Consonant‐Nucleus‐Consonant) words. Outcomes at 2 weeks after CI activation (CNC Initial) were compared with the asymptotic score to which performance converged over 2 years (CNC Final). CNC Initial ranged from 0 to 73.6% and CNC Final from 2.9 to 89.3%. Correlational statistics showed that, of a large number of biographical, audiological, cognitive, and device‐related variables, the most important ones explaining this variation were duration of hearing loss, CI sound‐field threshold levels, the percentage of electrodes in the scala vestibuli, age at implantation, and cognitive functioning such as working memory.
Moberly, Lowenstein, and Nittrouer (2016) tested a variety of tasks, including “perceptual sensitivity” (labeling “cop” vs. “cob” and “sa” vs. “sha” based on durational or spectral cues), “perceptual attention” (discriminating quasi‐syllabic sinusoidal glides based on duration or “spectral” cues), and word recognition (CID [Central Institute for the Deaf]‐22 word lists; Hirsh et al., 1952) in 30 postlingually deafened adult CI recipients and 20 NH controls. The two groups had similar perceptual sensitivity and attention for duration cues, but the CI group was less sensitive to spectral cues. Word recognition varied between 20 and 96% correct, with a mean accuracy of 66.5% in the clinical group, whereas the task posed little perceptual challenge to the NH group, whose mean accuracy was 97.1%. These word recognition scores were predicted by spectral cue sensitivity and attention, suggesting that speech perception deficits at the phonetic (i.e., sub‐segmental) level affect those at the word level.
In a gated word recognition study, Patro and Mendel (2018) found that CI users needed on average around 35% more speech information to recognize words than NH controls, while NH participants listening to vocoded speech needed approximately 25% more. The fact that both of these groups performed relatively poorly suggests that CI users’ disadvantage is due, at least in part, to the spectrotemporal signal degradation caused by the electrical–neuronal perceptual bottleneck, and not merely to extra‐auditory factors such as demographic group characteristics. When contextual information was provided by embedding the target words in either semantically relevant (e.g., “Paul took a bath in the TUB”) or semantically neutral (e.g., “Smith knows about the TUB”) sentences, words were recognized more easily (cf. Holt, Yuen, & Demuth, 2017). Moreover, CI users benefited more from this top‐down information than the controls did. This shows that signal degradation affects word recognition and that CI users often rely more on contextual cues.
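The gating paradigm behind such studies can be sketched in a few lines: listeners hear successively longer onset‐aligned fragments ("gates") of a word until they can identify it, and the amount of signal needed at that point indexes how much speech information recognition required. The sketch below is illustrative only; the 50 ms gate increment and 16 kHz sampling rate are assumptions, not parameters from Patro and Mendel (2018).

```python
def gates(samples, gate_ms=50, sr=16000):
    """Yield onset-aligned gates of a word, each gate_ms longer than the last.

    samples: the word's waveform as a sequence of samples
    gate_ms: duration added at each gate (assumed value, for illustration)
    sr:      sampling rate in Hz (assumed value, for illustration)
    """
    step = int(sr * gate_ms / 1000)  # samples added per gate
    # The final slice clamps at the end of the list, so the last gate
    # always contains the whole word.
    for end in range(step, len(samples) + step, step):
        yield samples[:end]
```

With these settings, a 500 ms word would be presented in ten gates of 50, 100, …, 500 ms; a listener's recognition point can then be expressed as the proportion of the word heard so far.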
A number of conclusions can be drawn from the studies summarized above. Firstly, CI users can attain a high level of word recognition. Secondly, there is a great deal of individual variation in these recognition abilities, for which a wide range of factors may be responsible. Thirdly, there is evidence that lower‐level problems stemming from signal degradation are partly responsible for higher‐level problems in speech perception. Finally, top‐down information supports speech perception, improving CI listeners’ understanding of speech by allowing partial compensation for the spectral degradation that use of their implant necessarily entails.
3.3.5 Prosody Perception
Another informational layer of speech that is affected by spectral degradation is prosody. The functions of prosody in language are threefold. First of all, prosody signals the meaning, or the morphological and syntactic structure, of linguistic elements at several levels, such as words, sentences, and larger units of discourse. This is commonly referred to as linguistic prosody. For instance, it distinguishes certain segmentally identical words, such as REcord vs. reCORD (lexical stress); marks word grouping (phrasing), such as blue bottle vs. bluebottle; and marks given as opposed to new information (topic vs. focus), such as My COLleague was supposed to do this, as opposed to, My colleague was supposed to do THIS, where capitals indicate sentential accents. Secondly, prosody reflects the emotional state of the speaker, or their attitude in relation to their utterance, and this attribute of speech is termed emotional prosody. For example, any utterance may, in principle, be pronounced in a sad, happy, angry, or fearful way, or with any other emotion. Attitudes such as surprise, irony, and sarcasm may also be employed to signal a speaker’s stance with regard to the truthfulness of the utterance. Finally, indexical prosody is suprasegmental information about the identity of the speaker, such as age, health, and provenance (Lehiste, 1970; Rietveld & van Heuven, 2016). Prosody is mainly conveyed by means of variation in intensity (stress), voice fundamental frequency (F0, whose variation is perceived as intonation), duration (of any linguistic unit, as well as of pauses in speech), and voice quality, for example, harshness, strain, and creakiness (Rietveld & van Heuven, 2016). The current discussion will focus on the ability of CI listeners to perceive linguistic and emotional prosody.
An important investigation of linguistic prosody was performed by Meister et al. (2015). They presented CI users and NH controls with incrementally manipulated F0, intensity, and duration cues for word stress, and also measured the participants’ just‐noticeable difference (discrimination) thresholds for these phonetic dimensions. The clinical group’s performance was compromised by the F0 and intensity cue manipulations, but not by manipulation of the duration cue, suggesting that these listeners relied more on duration than on the other cues. A similar pattern was observed in the discrimination thresholds, which, comparing CI to NH listeners, were least elevated for duration (51 ms vs. 40 ms), more elevated for intensity (3.9 dB vs. 1.8 dB), and most elevated for F0 (5.8 semitones vs. 1.5 semitones) (cf. Kalathottukaren, Purdy, & Ballard, 2015; See, Driscoll, Gfeller, Kliethermes, & Oleson, 2013). O’Halpin (2009) found that school‐aged children with CIs were outperformed by their NH peers on phrase/compound word discrimination (blue bottle vs. bluebottle) and on identification of two‐way (It’s a BLUE book vs. It’s a blue BOOK) and three‐way sentence accent positions (The BOY is painting a boat vs. The boy is PAINTING a boat vs. The boy is painting a BOAT). Furthermore, the CI children had larger discrimination thresholds for F0, but relatively smaller elevations for intensity and duration, when tested with manipulated nonsense disyllables. These discrimination thresholds correlated, per cue, with the scores on linguistic prosody, indicating that prosody perception in CI children may be supported by psychophysical capabilities.
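To put the F0 thresholds in perspective, a semitone corresponds to a frequency ratio of 2^(1/12), so thresholds reported in semitones can be converted to a proportional change in F0. The back‐of‐the‐envelope sketch below is an illustration of that arithmetic, not part of the cited study: the CI threshold of 5.8 semitones amounts to roughly a 40% change in F0, whereas the NH threshold of 1.5 semitones is about a 9% change.

```python
def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio for a pitch interval in semitones (12-tone equal temperament)."""
    return 2 ** (semitones / 12)

def percent_f0_change(semitones: float) -> float:
    """Proportional F0 change, in percent, corresponding to a semitone interval."""
    return (semitones_to_ratio(semitones) - 1) * 100

# Thresholds reported by Meister et al. (2015):
ci = percent_f0_change(5.8)  # CI listeners
nh = percent_f0_change(1.5)  # NH listeners
print(f"CI: {ci:.0f}% F0 change; NH: {nh:.0f}% F0 change")
# → CI: 40% F0 change; NH: 9% F0 change
```

In other words, the average CI listener in that study needed the voice pitch to move by more than a third of its value before the change became reliably detectable, whereas NH listeners detected changes of under a tenth.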
The relationship between psychophysical and speech perceptual performance may not be as straightforward in postlingually deafened adults. Morris, Magnusson, Faulkner, Jönsson, and Juul (2013) entered discrimination thresholds for F0, intensity, duration, and vowel quality (first‐formant frequency) into a logistic regression analysis with prosody identification as the dependent variable. The prosodic tasks were vowel length, word stress, and phrase/compound word identification, performed both in quiet and in a 10 dB SNR noise background. Only the discrimination threshold for intensity emerged as a significant variable, indicating that adult CI recipients who can better make use of intensity changes are also better at these types of linguistic‐prosodic tasks.
In a recent review of emotional prosody processing, Jiam, Caldwell, Deroche, Chatterjee, and Limb (2017) concluded that CI users have considerable difficulty perceiving and producing emotional prosody. For example, in Gilbers et al. (2015), on a four‐way emotion identification test using nonsense words, CI users scored around 45% correct, NH listeners using vocoders around 70%, and NH