Formative Assessment & Standards-Based Grading. Robert J. Marzano

href="#ulink_ef059e1b-e8c8-5854-a281-a9517e02eccd">table 1.6.

Image

      Note: 95% confidence interval based on the assumption of a standard deviation of 12 points.

      Table 1.6 shows what can be expected in terms of the amount of error that surrounds a score of 70 when an assessment has reliabilities that range from 0.85 to 0.45. In all cases, the student is assumed to have received a score of 70 on the assessment. That is, the student’s observed score is 70.

First, let us consider the precision of an observed score of 70 when the reliability of the assessment is 0.85. This is the typical reliability one would expect from a standardized test or a state test (Lou et al., 1996). Using statistical formulas, it is possible to compute a range of scores in which you are 95 percent sure the true score actually falls. Columns three, four, and five of table 1.6 report that range. In the first row of table 1.6, we see that for an assessment with a reliability of 0.85 and an observed score of 70, one would be 95 percent sure the student’s true score falls anywhere between 60 and 80. That is, the student’s true score might really be as low as 60 or as high as 80 even though he or she receives a score of 70. This is a range of 20 points. But this assumes that the reliability of the assessment is 0.85, which, again, is what you would expect from a state test or a standardized test.

      Next, let us consider the range with classroom assessments. To do so, consider the second row of table 1.6, which pertains to the reliability of 0.75. This is probably the highest reliability you could expect from an assessment designed by a teacher, school, or district (see Lou et al., 1996). Now the low score is 58 and the high score is 82—a range of 24 points. To obtain the full impact of the information presented in table 1.6, consider the last row, which depicts the range of possible true scores when the reliability is 0.45. This reliability is, in fact, probably more typical of what you could expect from a teacher-designed classroom assessment (Marzano, 2002). The lowest possible true score is 52 and the highest possible true score is 88—a range of 36 points.
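      The ranges reported in table 1.6 can be approximated with the conventional standard error of measurement: the test's standard deviation multiplied by the square root of one minus its reliability, with the 95 percent range being the observed score plus or minus 1.96 standard errors. The following sketch is illustrative only; it assumes the 12-point standard deviation from the table note, and the outward rounding of the bounds to whole scores is an assumption made here to reproduce the whole-number ranges described above. The function name is invented for the example.

```python
import math

# Illustrative sketch of the computation behind the ranges in table 1.6.
# Assumes the conventional standard error of measurement,
#   SEM = SD * sqrt(1 - reliability),
# a standard deviation of 12 points (per the table note), and a 95 percent
# interval of observed score +/- 1.96 * SEM, with bounds rounded outward.

def true_score_range(observed, reliability, sd=12.0, z=1.96):
    """Return a conservative 95 percent range for the true score."""
    sem = sd * math.sqrt(1.0 - reliability)
    low = math.floor(observed - z * sem)
    high = math.ceil(observed + z * sem)
    return low, high

# The three reliabilities discussed in the text, each with an observed score of 70.
for r in (0.85, 0.75, 0.45):
    low, high = true_score_range(70, r)
    print(f"reliability {r:.2f}: true score between {low} and {high} "
          f"(a range of {high - low} points)")
```

      Run as written, this prints the 60 to 80, 58 to 82, and 52 to 88 ranges discussed above.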

      Quite obviously, no single assessment can ever be relied on as an absolute indicator of a student’s status. Gregory Cizek (2007) added a perspective on the precision of assessments in his discussion of the mathematics section of the state test in a large midwestern state. He explained that the total score reliability for the mathematics portion of that state’s test at the fourth grade is 0.87—certainly an acceptable level of reliability. That test also reports students’ scores in subareas using the National Council of Teachers of Mathematics categories: algebra, data analysis and probability, estimation and mental computation, geometry, and problem-solving strategies. Unfortunately, the reliability of these subscale scores ranges from 0.33 to 0.57 (p. 103). As evidenced by table 1.6, reliabilities this low would translate into a wide range of possible true scores.

      Imprecision in assessments can come in many forms. It can be a function of poorly constructed items on a test, or it can come from students’ lack of attention or effort when taking a test. Imprecision can also be a function of teachers’ interpretations of assessments. A study by Herman and Choi (2008) asked two questions: How accurate are teachers’ judgments of student learning, and how does the accuracy of teachers’ judgments relate to student performance? They found that “the study results show that the more accurate teachers are in their knowledge of where students are, the more effective they may be in promoting subsequent subject learning” (p. 18). Unfortunately, they also found that “average accuracy was less than 50%” (p. 19). Margaret Heritage, Jinok Kim, Terry Vendlinski, and Joan Herman (2008) added that “inaccurate analyses or inappropriate inference about students’ learning status can lead to errors in what the next instructional steps will be” (p. 1). They concluded that “using assessment information to plan subsequent instruction tends to be the most difficult task for teachers as compared to other tasks (for example, assessing student responses)” (p. 14).

      One very important consideration when interpreting scores from assessments or making inferences about a student based on an assessment is the native language of the student. Christy Kim Boscardin, Barbara Jones, Claire Nishimura, Shannon Madsen, and Jae-Eun Park (2008) conducted a review of performance assessments administered in high school biology courses. They focused their review on English language learners, noting that “the language demand of content assessments may introduce construct-irrelevant components into the testing process for EL students” (p. 3). Specifically, they found that the students with a stronger grasp of the English language would perform better on the tests even though they might not have had any better understanding of the science content. The same concept holds true for standardized tests in content-specific areas. They noted that “the language demand of a content assessment is a potential threat to the validity of the assessment” (p. 3).

      At the classroom level, any discussion of assessment ultimately ends up in a discussion of grading. As its title indicates, Formative Assessment and Standards-Based Grading focuses on grading as well as on formative assessment. Not only are teachers responsible for evaluating a student’s level of knowledge or skill at one point in time through classroom assessments, but they are also responsible for translating all of the information from assessments into an overall evaluation of a student’s performance over some fixed period of time (usually a quarter, trimester, or semester). This overall evaluation takes the form of a single grade, commonly referred to as an “omnibus grade.” Unfortunately, grades add a whole new layer of error to the assessment process.

      Brookhart (2004) discussed the difficulties associated with grading:

      Grades have been used to serve three general purposes simultaneously: ranking (for sorting students into those eligible for higher education and those not eligible); reporting results (accounting to parents the degree to which students learned the lessons prescribed for them); and contributing to learning (providing feedback and motivating students). (p. 23)

      While all three purposes are valid, they provide very different perspectives on student achievement.

      Because teachers in many schools and districts have not agreed on any one grading philosophy, they are forced to design their own systems. To illustrate, consider the following grading criteria that Thomas Guskey (2009, p. 17) listed as elements frequently included in teachers’ grading practices:

      • Major exams or compositions

      • Class quizzes

      • Reports or projects

      • Student portfolios

      • Exhibits of students’ work

      • Laboratory projects

      • Students’ notebooks or journals

      • Classroom observations

      • Oral presentations

      • Homework completion

      • Homework quality

      • Class participation

      • Work habits and neatness

      • Effort

      • Attendance

      • Punctuality of assignments

      • Class behavior or attitude

      • Progress made

      He made the point that because of their different philosophies, different teachers rely on different combinations of these elements to construct an overall grade.
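      To make the consequence of that point concrete, the following hypothetical sketch (the element names, weights, and scores are invented for illustration and are not taken from Guskey or from this book) shows how two teachers who record identical evidence for a student can arrive at noticeably different omnibus grades simply because they weight the elements differently.

```python
# Hypothetical illustration: two teachers translate the same recorded evidence
# (all scores on a 0-100 scale) into an omnibus grade, but weight the elements
# differently. All names, scores, and weights here are invented.

scores = {
    "major exams": 65,
    "quizzes": 70,
    "projects": 95,
    "homework completion": 100,
    "class participation": 100,
    "effort": 100,
}

# Teacher A grades almost entirely on assessments of academic content.
weights_a = {"major exams": 0.60, "quizzes": 0.25, "projects": 0.15,
             "homework completion": 0.00, "class participation": 0.00, "effort": 0.00}

# Teacher B folds homework, participation, and effort into the grade.
weights_b = {"major exams": 0.20, "quizzes": 0.10, "projects": 0.20,
             "homework completion": 0.20, "class participation": 0.15, "effort": 0.15}

def omnibus_grade(scores, weights):
    """Weighted average of the recorded scores (weights are assumed to sum to 1)."""
    return sum(scores[element] * weights[element] for element in weights)

print(f"Teacher A's omnibus grade: {omnibus_grade(scores, weights_a):.1f}")  # about 70.8
print(f"Teacher B's omnibus grade: {omnibus_grade(scores, weights_b):.1f}")  # 89.0
```

      On a typical grading scale, the same body of evidence yields roughly a C from the first teacher and a B+ from the second. That variability in how evidence is combined is precisely the additional layer of error that grades introduce on top of assessment error.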
