The Concise Encyclopedia of Applied Linguistics. Carol A. Chapelle


The battery included metapragmatic judgments of short conversations, speech acts, and responses in short dialogs, as well as productive correction of inappropriate dialog responses and completion of DCTs with multiple gaps. Figure 2 shows an example of a metapragmatic judgment item for the speech act of refusal. Roever et al. (2014) found an acceptable reliability of .8 but noted that most of their tasks were easy for their test takers, who were restricted to intermediate level and above to avoid strong proficiency effects.

Figure 1 DCT item

Figure 2 Metapragmatic judgment item

While speech acts have been the central focus of L2 pragmatics assessment in this tradition, not all work has concentrated exclusively on them. Bouton (1988, 1994, 1999) did pioneering work in the assessment of implicature, that is, how speakers convey meaning beyond the literal meaning of the words uttered. He distinguished two types of implicature, idiosyncratic and formulaic: the former encompasses conversational implicature (Grice, 1975), whereas the latter includes specific types such as indirect criticism, variations on the Pope Q (“Is the pope Catholic?”), and irony. Using his test, Bouton found that idiosyncratic implicature is fairly easy to learn on one's own but difficult to teach in the classroom, whereas the reverse holds for formulaic implicature. Taguchi (2005, 2007, 2008a, 2008b) employed a similar instrument but took a psycholinguistic perspective, investigating the accuracy of learners' interpretations of implicature in conjunction with their processing speed. Taguchi, Li, and Liu (2013) developed an implicature test for Mandarin as a target language.

A small number of studies have combined assessment of different aspects of pragmatic competence. Roever (2005, 2006) developed a Web‐based test of implicature, routine formulas, and speech acts, and validated it following Messick's (1989) approach to validation. Unlike Hudson et al.'s (1995) test, Roever's (2006) instrument focused on pragmalinguistic rather than sociopragmatic knowledge. Figure 3 shows an implicature item from Roever's (2005) test, and Figure 4 shows a routines item.

Roever obtained an overall reliability of .91 for his test. It covered the construct of L2 pragmatic knowledge in considerable breadth and was highly practical owing to its Web‐based delivery. However, it was clearly situated in the speech act tradition and did not assess discursive abilities.

Itomitsu (2009) developed a pragmalinguistically focused instrument for Japanese as a target language. Using Web‐delivered multiple‐choice tasks, he assessed learners' knowledge of routine formulas and speech styles as well as their understanding of the illocutionary force of speech acts. The test also included a grammar section. Itomitsu attained high overall reliability, similar to that of Roever's (2005) test, with an instrument that is arguably more practical, as it requires neither written responses nor rater scoring.

Figure 3 Implicature item from Roever (2001)

Figure 4 Routines item from Roever (2001)

The instruments discussed above represent a significant step in testing pragmatics, demonstrating that aspects of learners' pragmatic ability can be assessed practically and with satisfactory reliability. However, the speech act framework underlying these tests has come under severe criticism (Kasper, 2006): it relied heavily on the discourse‐external context factors identified by Brown and Levinson (1987), treated speech acts as isolated units rather than considering them in their discursive context, and used DCTs, which have been shown to be highly problematic (Golato, 2003). This criticism has led to the emergence of tests taking an interactional view.

Measurement of Interactional Abilities

In an early attempt to measure interactional abilities, Walters (2007, 2009) adopted a conversation analytic framework to assess test takers' receptive and productive knowledge of features of sequence organization and of responses to social actions. He obtained only very low reliabilities, illustrating the difficulty of measuring fine‐grained features of interaction.

In a groundbreaking study, Youn (2013, 2015) took a different approach. Adopting an interactional competence perspective, she had 102 test takers of different proficiency levels perform a monologue and two role plays with a trained interlocutor. She scored the performances on the following criteria:

content delivery: smooth and fluid turn taking;

language use: deployment of pragmalinguistic tools;

sensitivity to the situation: tailoring contributions to the recipient;

engagement with the interaction: displaying understanding of interlocutor talk;

turn organization: providing responses without excessive pausing.

A Rasch analysis showed that the test spread test takers out well and that the criteria functioned independently and were easy for raters to implement. Youn's study was a significant step forward because it was the first to demonstrate clearly that interactional competence can feasibly be assessed.

Ikeda (2017) also investigated the measurement of interactional competence but employed three role plays and three monologues with six rating criteria. Like Youn, he found a good spread of test takers and high inter‐rater reliability. Scores on the monologic and dialogic tasks overlapped substantially, raising the possibility that monologue tasks could capture much of the variance attributable to interactional competence, which would greatly increase practicality.

Focusing on another aspect of interaction, Galaczi (2014) described differences in topic management, listener contributions, and turn‐taking management between learners at different levels of the Common European Framework of Reference (Council of Europe, 2001). She found that these interactional abilities improved with increasing proficiency and argued for their greater inclusion in rating scales. It should be noted that a feature like “topic management” was more likely to figure prominently in Galaczi's data, which involved test taker dyads discussing a set topic, than in Youn's and Ikeda's work, where interactions were based around requests.

Two other interaction‐focused assessment studies did not situate themselves in an interactional competence framework. Grabowski (2009, 2013) employed role plays and rated test taker performance on criteria derived from Purpura's (2004) model of communicative language ability. Timpe (2013) employed Skype‐delivered role plays as part of a larger battery testing intercultural competence (Byram, 1997). She scored test taker performance on two broad holistic criteria: discourse management and pragmatic competence.

Challenges in Testing L2 Pragmatics

Fundamentally, tests of L2 pragmatics have the same requirements and pose the same development challenges as other language tests. They must be standardized to allow comparisons between test takers, they must be reliable to ensure precise measurement, they must be practical so that they do not overtax resources, and, above all, they must allow defensible inferences to be drawn from scores that can inform real‐world decisions (Messick, 1989; Kane, 2006).
