The New Art and Science of Classroom Assessment. Robert J. Marzano
which we are 95 percent sure the true score actually falls. But if we consider the pattern of these scores, we can have a relatively high degree of confidence in the scores, particularly as more time passes and we collect more scores.
The pattern makes it clear that, over time, the student’s scores have been gradually increasing. This makes intuitive sense. If the student is learning and the assessments are accurate, we would expect to see the scores continually go up. The more scores that precede any given score, the better one can judge the accuracy of that score. In the previous series, the first score is 70. In judging its accuracy, we would have to treat it like an individual assessment—we wouldn’t have much confidence in its accuracy. But with the second score of 72, we now have two data points. Since we can reasonably assume that the student is learning, it makes sense that his or her score would increase. We now have more confidence in the score of 72 than we did in the single score of 70. By the time we have the fifth score of 81, we have amassed a good deal of antecedent information with which to judge its accuracy. Although we can’t say that 81 is precisely accurate, we can say the student’s true score is probably close to it. In subsequent chapters, we present techniques for specifying the accuracy of this final score of 81.
It’s important to note that some data patterns would indicate a lack of accuracy in the test scores. To illustrate, consider the following pattern of scores.
70, 76, 65, 82, 71
Assuming that the student who exhibited these scores is learning over time, the pattern doesn’t make much sense. The student began and ended the grading period with about the same score. In between, the student exhibited some scores that were significantly higher and some scores that were significantly lower. This pattern implies that there was probably a great deal of error in the assessments. (Again, we discuss how to interpret such aberrant patterns in subsequent chapters.) This scenario illustrates the need for a new view of summative scores.
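The contrast between the two patterns can also be quantified. The following is a minimal sketch (our illustration, not a method from the book): it fits a least-squares line to each sequence of scores, assuming evenly spaced assessments, and reports how far, on average, the scores stray from their trend line. A large spread suggests that much of the variation is measurement error rather than learning.

```python
# Fit a least-squares trend line to a sequence of scores (assessments
# assumed evenly spaced) and measure how far scores stray from it.

def linear_fit(scores):
    """Return (slope, intercept) of the least-squares line through
    the points (0, scores[0]), (1, scores[1]), ..."""
    n = len(scores)
    x_mean = (n - 1) / 2
    y_mean = sum(scores) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(scores)) / \
            sum((x - x_mean) ** 2 for x in range(n))
    return slope, y_mean - slope * x_mean

def residual_spread(scores):
    """Average absolute distance of each score from its trend line."""
    slope, intercept = linear_fit(scores)
    return sum(abs(y - (intercept + slope * x))
               for x, y in enumerate(scores)) / len(scores)

steady = [70, 72, 75, 77, 81]   # consistent growth
erratic = [70, 76, 65, 82, 71]  # the aberrant pattern above

print(round(residual_spread(steady), 2))   # 0.4  -- scores hug the trend line
print(round(residual_spread(erratic), 2))  # 4.96 -- much of the variation looks like error
```

The steady pattern deviates from its trend line by less than half a point on average, while the erratic pattern deviates by almost five points, which matches the intuition that the second set of assessments contains a great deal of error.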
The New View of Summative Scores
The practice of examining the mounting evidence that multiple assessments provide is a veritable sea change in the way we think of summative assessments for individual students. More specifically, we have seen school leaders initiate policies in which they make a sharp distinction between formative assessments and summative assessments. Within these policies, educators consider formative assessments as practice only, and they do not record scores from these assessments. They consider summative tests as the “real” assessments, and the scores from them play a substantive role in a student’s final grade.
As the previous discussion illustrates, this makes little sense for at least two reasons. First, the single score educators derive from the summative assessment is not precise enough to support absolute decisions about individual students. Second, not recording formative scores is tantamount to ignoring all the historical assessment information that teachers can use to estimate a student’s current status. We take the position that educators should use the terms formative and summative scores, as opposed to formative and summative assessments, to meld the two types of assessments into a unified continuum.
Also, teachers should periodically estimate students’ current summative scores by examining the pattern of the antecedent scores. We describe this process in depth in chapter 6 (page 91). Briefly, though, consider the pattern of five scores we described previously: 70, 72, 75, 77, 81. A teacher could use this pattern to assign a current summative score without administering another assessment. The pattern clearly indicates steady growth for the student and makes the last score of 81 appear quite reasonable.
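As a rough illustration of this idea (a simple linear-trend sketch, not the chapter 6 procedure itself), one could fit a straight line to the five antecedent scores, assuming evenly spaced assessments and roughly linear growth, and read off the trend’s value at the most recent assessment.

```python
# Estimate a current summative score from antecedent scores alone by
# fitting a least-squares trend line and evaluating it at the most
# recent assessment (assessments assumed evenly spaced).

scores = [70, 72, 75, 77, 81]
n = len(scores)
x_mean = (n - 1) / 2
y_mean = sum(scores) / n
slope = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(scores)) / \
        sum((x - x_mean) ** 2 for x in range(n))
intercept = y_mean - slope * x_mean

# Trend value at the last assessment (index n - 1)
current_estimate = intercept + slope * (n - 1)
print(round(current_estimate, 1))  # 80.4 -- close to the observed final score of 81
```

The trend-based estimate of 80.4 lands close to the observed final score of 81, which is what we would expect when the antecedent scores show steady growth.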
The process of estimating a summative score as opposed to relying only on the score from a single summative test works best if the teacher uses a scale that automatically communicates what students already know and what they still have to learn. A single score of 81 (or 77 or pretty much any score on a one hundred–point scale) doesn’t communicate much about a student’s knowledge of specific content. However, a score on a proficiency scale does and greatly increases the precision with which a teacher can estimate an individual student’s summative score.
The Need for Proficiency Scales
We discuss the nature and function of proficiency scales in depth in chapter 3. For now, figure I.4 provides an example of a proficiency scale.
Figure I.4: Sample proficiency scale for fourth-grade science.
Notice that the proficiency scale in figure I.4 has three levels of explicit content. It is easiest to understand the nature of a proficiency scale if we start with the content at the score 3.0 level. It reads, The student will explain how vision (sight) is a product of light reflecting off objects and entering the eye. This is the desired level of expertise for students. When students can demonstrate this level of competence, teachers consider them to be proficient.
The score 2.0 content is simpler content that teachers will directly teach to students and that students must understand in order to demonstrate competency on the score 3.0 content. In the proficiency scale, score 2.0 content reads, The student will recognize or recall specific vocabulary (for example, brain, cone, cornea, image, iris, lens, light, optic nerve, perpendicular angle, pupil, reflection, retina, rod, sight, dilate) and perform basic processes, such as:
• Describe physical changes that happen in the eye as a reaction to light (for example, the pupil dilates and contracts)
• Trace the movement of light as it moves from a source, reflects off an object, and enters the eye
• Diagram the human eye and label its parts (cornea, iris, pupil, lens, retina, optic nerve)
• Describe the function of rods and cones in the eye
• Recognize that the optic nerve carries information from both eyes to the brain, which processes the information to create an image
The score 4.0 content requires students to make inferences and applications that go above and beyond the score 3.0 content. In the proficiency scale, it reads, In addition to score 3.0 performance, the student will demonstrate in-depth inferences and applications that go beyond what was taught. For example, the student will explain how distorted light impacts vision (for example, explain why a fish in clear water appears distorted due to light refraction). The example provides one way in which the student might demonstrate score 4.0 performance.
The other scores in the scale do not contain new content but do represent different levels of understanding relative to the content. For example, score 1.0 means that with help, the student has partial understanding of some of the simpler details and processes and some of the more complex ideas and processes. And score 0.0 means that even with help, the student demonstrates no understanding or skill. The scale also contains half-point scores, which signify achievement between two whole-point scores. Again, we address proficiency scales in depth in chapter 2 (page 25).
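The half-point structure of the scale can be captured with a small helper. The following sketch (our illustration, not a procedure from the book) snaps any estimated value to the nearest half point and clamps it to the 0.0 to 4.0 range of the scale.

```python
# Snap an estimated score to the half-point increments a proficiency
# scale allows (0.0, 0.5, 1.0, ..., 4.0).

def snap_to_half(value):
    """Round to the nearest 0.5 and clamp to the 0.0-4.0 range."""
    snapped = round(value * 2) / 2
    return min(max(snapped, 0.0), 4.0)

print(snap_to_half(2.3))   # 2.5
print(snap_to_half(2.24))  # 2.0
print(snap_to_half(4.6))   # 4.0 (clamped to the top of the scale)
```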
With a series of scores on a proficiency scale as opposed to a one hundred–point scale, a teacher can more accurately estimate a summative score using antecedent formative scores. This is because we can reference a score on a proficiency scale to a continuum of knowledge, regardless of the test format. A score of 3.0 on a test means that the student has demonstrated competence regardless of the type of test. This is not the case with the one hundred–point scale. For example, a teacher can only interpret a score of 85 in terms of levels of knowledge if he or she examines the items on the test. This characteristic makes proficiency scales well suited to examining trends in learning. To illustrate, consider the following pattern of proficiency scale scores for a student on a specific topic.
1.0, 2.0, 2.0, 3.0, 2.5
The