The Concise Encyclopedia of Applied Linguistics. Carol A. Chapelle
and assessment is possible (to some extent) for minimal pairs, and that automatic ratings of pronunciation accuracy can correlate with human ratings. However, the feedback given to learners is not usually very helpful. Systems that attempt to provide feedback have two options: giving a global pronunciation rating or identifying specific errors. To reach either of these goals, ASR systems need to identify word boundaries, accurately align speech to intended targets, and compare the segments produced with those that should have been produced. A variety of systems have been designed to provide global evaluations of pronunciation using automatic measures including speech rate, duration, and spectral analyses (e.g., Neumeyer, Franco, Digalakis, & Weintraub, 2000; Witt & Young, 2000). All of the studies have found that automatic measures are never as good as human ratings, but a combination of automatic measures is always better than a single measure.
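The comparison between automatic measures and human ratings can be illustrated with a small sketch. The Python below is only an illustration under stated assumptions: the equal-weight combination of standardized measures, the function names, and the toy values are all invented for this example and do not come from the studies cited above.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def zscores(values):
    """Standardize a measure so that different units become comparable."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def combined_score(*measures):
    """Equal-weight average of standardized measures, one value per speaker.

    Equal weighting is an assumption made for this sketch; a real system
    would normally estimate the weights from rated speech.
    """
    standardized = [zscores(m) for m in measures]
    return [mean(per_speaker) for per_speaker in zip(*standardized)]


# Invented toy values for five speakers (not data from any study).
speech_rate = [2.1, 3.0, 3.4, 4.2, 4.8]           # syllables per second
spectral_match = [0.40, 0.55, 0.50, 0.75, 0.90]   # hypothetical 0-1 score
human_rating = [1, 2, 3, 4, 5]

print(pearson(speech_rate, human_rating))
print(pearson(combined_score(speech_rate, spectral_match), human_rating))
```

With real data, the same two calls would show how well a single measure and a combined score each track the human ratings.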
ASR systems also have trouble precisely identifying specific errors in articulation: they sometimes flag correct speech as containing errors while failing to detect errors that actually occur. Neri, Cucchiarini, Strik, and Boves (2002) found that only 25% of pronunciation errors were detected by their ASR system, while some correct productions were identified as errors. Truong, Neri, de Wet, Cucchiarini, and Strik (2005) studied whether an ASR system could identify mispronunciations of three sounds typically mispronounced by learners of Dutch. Errors were successfully detected for one of the three sounds, but the ASR system was less successful for the other sounds. However, even modest success in error detection has led to a reduced number of pronunciation errors in comparison to a control group (Cucchiarini, Neri, & Strik, 2009).
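The step of comparing the segments produced with those that should have been produced can be sketched as a sequence alignment. The Python below is a simplified illustration, not the method of any system cited here: it assumes the recognizer has already output a sequence of phone symbols, and it aligns that sequence with the target by minimum edit distance. The function names `align` and `find_errors` are hypothetical, and real systems typically work from acoustic scores rather than from symbols alone.

```python
def align(target, produced):
    """Align two phone sequences by minimum edit distance.

    Returns a list of (target_phone, produced_phone) pairs, where None
    marks a deleted or inserted phone.
    """
    n, m = len(target), len(produced)
    # cost[i][j] = edit distance between target[:i] and produced[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if target[i - 1] == produced[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,   # match or substitution
                             cost[i - 1][j] + 1,         # deletion
                             cost[i][j - 1] + 1)         # insertion
    # Trace back from the bottom-right corner to recover the alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        sub = 1
        if i > 0 and j > 0:
            sub = 0 if target[i - 1] == produced[j - 1] else 1
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + sub:
            pairs.append((target[i - 1], produced[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((target[i - 1], None))
            i -= 1
        else:
            pairs.append((None, produced[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def find_errors(target, produced):
    """List the aligned positions where the produced phone differs."""
    return [(t, p) for t, p in align(target, produced) if t != p]


# "think" with the first consonant replaced by /s/ (informal phone notation).
print(find_errors(["θ", "ɪ", "ŋ", "k"], ["s", "ɪ", "ŋ", "k"]))
# prints [('θ', 's')]
```

A sketch like this also shows why detection is hard: if the recognizer mislabels a correctly produced phone, the alignment reports an error that the learner never made.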
Other ASR Applications in Applied Linguistics
Applied linguists have used ASR in two other areas: reading instruction and dialogue systems in language‐learning software. One use of ASR that seems to have been particularly successful has been in teaching children to read. Mostow and Aist (1999) found that ASR used in conjunction with an understanding of teacher–student classroom behavior was successful in teaching oral reading skills and word recognition. In a later study, Poulsen, Hastings, and Allbritton (2007) found that reading interventions for young learners of English were far more effective when an ASR system was included.
Another use of ASR technology is in spoken CALL dialogue systems. If a software program for practicing spoken language provides the first line of a dialogue, learners give one of two responses. If these responses are dissimilar, the ASR system can recognize which sentence has been spoken (even with pronunciation errors or missing words). The computer can then respond, allowing the learner to respond again from a menu of possible responses (see Bernstein, Najmi, & Ehsani, 1999; Harless et al., 1999). O'Brien (2006) reviews a number of such programs. With the recent advent of deep learning systems, which take advantage of established techniques for modeling phone recognition and for making use of spectral information from speech, it is likely that machine feedback will soon become more accurate and flexible in L2 speech recognition.
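The menu-based recognition described above can be sketched in a few lines. The Python below is a minimal illustration, assuming the recognizer returns a plain text hypothesis; the word-overlap measure, the 0.5 threshold, the function names, and the example menu are all assumptions made for this sketch rather than features of the programs cited.

```python
def word_overlap(hypothesis, candidate):
    """Share of the candidate's words that appear in the ASR hypothesis."""
    hyp_words = set(hypothesis.lower().split())
    cand_words = candidate.lower().split()
    return sum(w in hyp_words for w in cand_words) / len(cand_words)


def choose_response(hypothesis, menu, threshold=0.5):
    """Return the menu sentence that best matches the hypothesis, or None."""
    best = max(menu, key=lambda sentence: word_overlap(hypothesis, sentence))
    if word_overlap(hypothesis, best) < threshold:
        return None  # too uncertain; the program could ask the learner to repeat
    return best


# Hypothetical menu for one dialogue turn.
menu = ["I would like a cup of coffee", "Where is the train station"]

# A hypothesis with missing words still selects the intended sentence.
print(choose_response("i like cup coffee", menu))
# prints I would like a cup of coffee
```

Because the two menu sentences share no words, even a partial hypothesis is enough to tell them apart, which is why dissimilar responses make the task tractable.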
Future Directions
Automatic speech recognition holds great promise for applied linguistics. ASR research and usability testing are taking place in areas likely to impact applied linguistics (e.g., Anderson, Davidson, Morton, & Jack, 2008). For example, the International Conference on Acoustics, Speech and Signal Processing (ICASSP); the annual INTERSPEECH conference (held through the International Speech Communication Association, or ISCA); and the ISCA Special Interest Group on Speech and Language Technology in Education (SLaTE; see http://hstrik.ruhosting.nl/slate/) bring together those working in areas that will eventually influence linguistic applications.
The connections between ASR and text‐to‐speech software have been insufficiently explored in applied linguistic circles, but both are regularly examined in cutting‐edge work tied to other areas of speech sciences. We expect that the ubiquity of mobile devices that use ASR‐based applications will eventually allow L2 learners to practice their L2 speaking skills and receive effective feedback on their pronunciation. Further progress in ASR will likely result in interactive language‐learning systems capable of providing authentic interaction opportunities with real or virtual interlocutors. These systems will also become able to give learners specific, corrective feedback on their pronunciation errors. Additionally, the development of noise‐robust ASR technologies will allow language learners to use ASR‐based products in noise‐prone environments such as classrooms, transportation, and other public places. Finally, the performance of ASR systems will improve as emotion recognition and visual speech recognition (based, for instance, on a webcam capturing learners' lip movements and facial expressions) become more effective and widespread.
SEE ALSO: Computer‐Assisted Pronunciation Teaching; Foreign Accent; Innovation in Language Teaching and Learning; Pronunciation Assessment; Pronunciation Teaching Methods and Techniques
References
1 Anagnostopoulos, C.‐N., Iliou, T., & Giannoukos, I. (2015). Features and classifiers for emotion recognition from speech: A survey from 2000 to 2011. Artificial Intelligence Review, 43(2), 155–77.
2 Anderson, J. N., Davidson, N., Morton, H., & Jack, M. A. (2008). Language learning with interactive virtual agent scenarios and speech recognition: Lessons learned. Computer Animation and Virtual Worlds, 19, 605–19.
3 Bernstein, J., Najmi, A., & Ehsani, F. (1999). Subarashii: Encounters in Japanese spoken language education. CALICO Journal, 16(3), 361–84.
4 Burileanu, D. (2008). Spoken language interfaces for embedded applications. In D. Gardner‐Bonneau & H. E. Blanchard (Eds.), Human factors and voice interactive systems (2nd ed., pp. 135–61). Norwell, MA: Springer.
5 Cucchiarini, C., Neri, A., & Strik, H. (2009). Oral proficiency training in Dutch L2: The contribution of ASR‐based corrective feedback. Speech Communication, 51(10), 853–63.
6 Dalby, J., & Kewley‐Port, D. (1999). Explicit pronunciation training using automatic speech recognition technology. CALICO Journal, 16(3), 425–45.
7 Davis, K. H., Biddulph, R., & Balashek, S. (1952). Automatic recognition of spoken digits. The Journal of the Acoustical Society of America, 24(6), 637–42.
8 Deng, L., Li, J., Huang, J.‐T., Yao, K., Yu, D., Seide, F., . . . & Acero, A. (2013). Recent advances in deep learning for speech research at Microsoft. In Acoustics, Speech and Signal Processing (ICASSP), IEEE International Conference (pp. 8604–8). Piscataway, NJ: IEEE.
9 Deng, L., & Yu, D. (2014). Deep learning: Methods and applications. Foundations and Trends® in Signal Processing, 7(3–4), 197–387.
10 Derwing, T. M., Munro, M. J., & Carbonaro, M. (2000). Does popular speech recognition software work with ESL speech? TESOL Quarterly, 34, 592–603.
11 Duan, R., Kawahara, T., Dantsuji, M., & Zhang, J. (2017). Effective articulatory modeling for pronunciation error detection of L2 learner without non‐native training data. In Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference (pp. 5815–19). Piscataway, NJ: IEEE.
12 Eskenazi, M. (1999). Using a computer in foreign language pronunciation training: What advantages? CALICO Journal, 16(3), 447–69.
13 Forgie, J. W., & Forgie, C. D. (1959). Results obtained from a vowel recognition computer program. The Journal of the Acoustical Society of America, 31(11), 1480–9.
14 Harless, W., Zier, M., & Duncan, R. (1999). Virtual dialogues with native speakers: The evaluation of an interactive multimedia method. CALICO Journal, 16(3), 313–37.
15 Lai, J., Karat, C.‐M., & Yankelovich, N. (2008). Conversational speech interfaces and technologies. In A. Sears & J. A. Jacko (Eds.), The human‐computer interaction handbook: Fundamentals, evolving technologies, and emerging applications (2nd ed., pp. 381–91). New York, NY: Erlbaum.
16 Liakin, D., Cardoso, W., & Liakina, N. (2017). The pedagogical use of