The Concise Encyclopedia of Applied Linguistics. Carol A. Chapelle
the situation. In other words, examinees use functional knowledge to express language functions in context. Consider how the proposition I'm Italian is used to convey two contextualized language functions, as shown in Figure 6.
Figure 5 Extended model of semantico‐grammatical knowledge (adapted from Purpura, 2004, reproduced with permission of Cambridge University Press through PLSclear, and Purpura, 2016, used with permission)
Figure 6 Speech acts and functional meaning
Figure 7 Meaning‐oriented model of L2 proficiency (adapted from Purpura, 2016, used with permission)
Semantico‐grammatical resources are used to form literal propositions, which embody a speaker's intended functional meaning (e.g., offer) within a communicative context. The literal meaning of an utterance may coincide with its intended functional goal (direct speech act), or may differ significantly from it depending on context (indirect speech act). Finally, a well‐formed proposition can also be used to simultaneously encode a range of other meanings implied within the situational context. In the indirect speech act in Figure 6, the use of the expression I'm Italian to respond to an offer also encodes the sociocultural assumption that Italians like drinking red wine. The proposition in this context also encodes informality (sociolinguistic meanings) and even playfulness (psychological meanings). Purpura (2016) proposed a meaning‐oriented model of L2 knowledge that characterizes a learner's implicational knowledge as a pragmatic resource for understanding or expressing implied meanings derivable only from context (Figure 7).
In sum, the ability to perform real‐world competencies depends on the learners' semantico‐grammatical knowledge and their ability to use context to accurately form utterances that communicate not only propositions, but also contextualized pragmatic meanings. From an assessment perspective, these components are all implicated in real‐life language use. However, depending on the purpose of the test, these components can also be measured separately (see Grabowski, 2009; Kim, 2009), especially when finer‐grained information is needed.
Measuring the Linguistic Resources of Communication
L2 educators have proposed three approaches to measuring the linguistic resources of communication. These include a trait/task‐based approach, a production features approach, and a developmental approach. The trait/task‐based approach can be based on a conception of L2 proficiency as a mental trait involving L2 knowledge, skills, and abilities, similar to those mentioned in the meaning‐oriented model of L2 knowledge. Alternatively, it can be based on the view that L2 proficiency is determined by the knowledge, skills, and abilities underlying task completion. Regardless of the basis for the trait/task‐based approach, the assessment method includes a single task or a carefully sequenced set of tasks that allows test takers to display their receptive, emergent, or productive knowledge of the L2; the responses are then scored, mostly by human raters, using scoring rubrics.
Assessment specialists have found it useful to categorize tasks according to the elicitation method (Bachman & Palmer, 2010) (Figure 8). Selected‐response (SR) tasks present input in the form of an item, requiring examinees to choose the response from two or more options. These tasks aim to measure recognition or recall. While SR tasks are typically designed to measure one area of L2 knowledge, they may actually engage more than one (e.g., form and meaning). Constructed‐response tasks elicit L2 production. Limited‐production (LP) tasks present input in the form of an item, requiring examinees to produce anywhere from a single word to a sentence or two. These tasks aim to measure emergent production and are typically designed to measure one or more areas of knowledge. Extended‐production (EP) tasks present input in the form of a prompt, requiring examinees to produce language varying in quantity. EP tasks aim to measure full production, eliciting several areas of L2 knowledge simultaneously.
Figure 8 Task types and scoring (adapted from Purpura, 2004, reproduced with permission of Cambridge University Press through PLSclear)
SR (and some LP) tasks are typically scored right/wrong for L2 features based on one criterion for correctness (e.g., accurate form). Scoring criteria might involve accuracy, precision, range, complexity, fluency, acceptability, meaningfulness, appropriateness, naturalness, or conventionality. Dichotomous scoring such as this assumes that an item elicits only one underlying dimension of knowledge (e.g., form), that it measures full or no knowledge of the feature, and that item difficulty resides in the interaction between the input and the response key, and not with the distractors.
In other SR or LP tasks, response choices may represent complete knowledge of the feature (e.g., form), partial knowledge, misinformation, or a total lack of knowledge. If the distractors represent “partial” knowledge of the feature, then the use of partial credit scoring should be considered, as dichotomous scoring would deflate test scores by failing to reward examinees for partial knowledge (e.g., Purpura, Dakin, Ameriks, & Grabowski, 2010). In the case of grammaticality judgments, for example, grammatical acceptability depends on what feature is being measured. If knowledge of both form and meaning is required for an acceptable response, then dichotomous scoring would be inappropriate, which is the case in several studies of second language acquisition (SLA) using grammaticality judgment tasks. In these cases, right/wrong scoring with multiple areas of correctness or a partial credit scoring method could be used. Partial credit scores are assigned according to the dimensions of knowledge being measured (e.g., 1 point for form + 1 for meaning = 2 points). The measurement of different levels of knowledge can also be accomplished by using an analytic or holistic rubric based on a rating scale such as the following: 0, .3, .6, or 1.
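The contrast between dichotomous and partial credit scoring can be sketched in a few lines of code. The sketch below is illustrative only, not drawn from the source: the response data and the dimension names (form, meaning) are hypothetical, following the example of one point per dimension given above.

```python
# Illustrative sketch (assumed, not from the source): comparing dichotomous
# and partial credit scoring of an item that measures both form and meaning.

def dichotomous_score(response):
    """Full credit (1) only when every measured dimension is correct; else 0."""
    return 1 if all(response.values()) else 0

def partial_credit_score(response):
    """One point per correctly handled dimension (e.g., form + meaning)."""
    return sum(1 for correct in response.values() if correct)

# A learner whose answer is accurate in form but inaccurate in meaning:
response = {"form": True, "meaning": False}

print(dichotomous_score(response))     # 0 - no reward for partial knowledge
print(partial_credit_score(response))  # 1 of a possible 2 points
```

Under dichotomous scoring this examinee's partial knowledge is invisible; under partial credit scoring it contributes half of the item's maximum, which is the deflation effect the paragraph above describes.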
EP tasks vary considerably in the quantity and quality of the response. As a result, they are typically scored with a more comprehensive rating scale (e.g., five‐point rubric: 1 to 5 or five bands from 1 to 10). Rating scales provide hierarchical descriptions of observed behavior associated with different levels of performance related to some construct. The more focused the descriptors are for each rating scale, the greater the potential for measurement precision and feedback utility (see Purpura, 2004). Although rating scales are efficient for many contexts, the information they provide may be too coarse‐grained for other assessment contexts, where detailed feedback is required.
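The use of an analytic rating scale for an EP task can be sketched as follows. This is a minimal illustration under assumed details: the band descriptors, the rated criteria (accuracy, range, meaningfulness), and the averaging rule are hypothetical, not taken from Purpura (2004); operational rubrics define descriptors per criterion and per band.

```python
# Illustrative sketch (assumed, not from the source): applying a five-point
# analytic rubric to an extended-production (EP) response.

# Hypothetical holistic descriptors for each band of a 1-5 scale.
RUBRIC_BANDS = {
    1: "little or no evidence of the measured construct",
    2: "limited, inconsistent evidence",
    3: "adequate evidence with noticeable lapses",
    4: "strong evidence with minor lapses",
    5: "consistent, full evidence of the construct",
}

def analytic_score(ratings):
    """Average per-criterion ratings into an overall band (rounded)."""
    for rating in ratings.values():
        if rating not in RUBRIC_BANDS:
            raise ValueError(f"rating {rating} is outside the 1-5 scale")
    return round(sum(ratings.values()) / len(ratings))

# Hypothetical ratings on three of the criteria named earlier in the text:
ratings = {"accuracy": 4, "range": 3, "meaningfulness": 4}
overall = analytic_score(ratings)
print(overall, "-", RUBRIC_BANDS[overall])
```

Because each criterion keeps its own rating before aggregation, an analytic design of this kind can feed back finer-grained information than a single holistic band, at the cost of longer rating time.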
In sum, countless studies in L2 assessment have used these techniques to measure the linguistic resources of L2 communication. These same methods have also been used in mainstream educational measurement to measure other learner characteristics.
A second approach to measuring the linguistic resources of communication is by analyzing selected features of learners' spoken or written production, the assumption being that a characterization of L2 production features can provide evidence not only of the learner's L2 knowledge and in some cases, their ability to communicate propositions, but also of their acquisitional processes, especially if the data are elicited under controlled processing conditions (e.g., planning/no planning; integrated/independent tasks) (Ellis, 2005). This approach has a long tradition in SLA