Collaborative Common Assessments. Cassandra Erkens
assessments are employed in a criterion-referenced manner against a shared set of standards, educators can use the data formatively to monitor their current reality, explore areas that require attention, and ultimately ensure equity and success for all. Schools cannot improve if they do not have evidence and data regarding what and how well students learn.
The question isn’t “Do we need large-scale assessments?” Rather, the question should be “Are large-scale assessments working in a manner that is accurate, supportive, and valuable to the schools and learners they impact?” A single test cannot cover everything, and so tests are used to gather random samples of domains of interest. Consequently, the summary of the findings can help schools and districts monitor for success. Internationally recognized assessment expert Dylan Wiliam (1998) states:
It has become increasingly clear over the past twenty years that the contents of standardised tests and examinations are not a random sample from the domain of interests. In particular, these timed written assessments can assess only limited forms of competence, and teachers are quite able to predict which aspects of competence will be assessed. Especially in high-stakes assessments, therefore, there is an incentive for teachers and students to concentrate only on those aspects of competence that are likely to be assessed. Put crudely, we start out with the intention of making the important measurable, and end up making the measurable important. The effect of this has been to weaken the correlation between standardised test scores and the wider domains for which they are claiming to be an adequate proxy. (p. 1)
Assessment is a tool, and any tool can be used to either build something up or tear something down. While large-scale assessments can be used to help build better educational systems, they have sometimes been used in destructive ways. When the tests are shallow, the results are norm referenced, or the stakes are high, the costs can be significant.
Shallow Testing
The items on any test provide, at best, a representative sampling of what a student knows at any given moment in time. How valid and reliable are the assessments themselves?
Several studies, using several different methodologies, have shown that the state tests do not measure the higher-order thinking, problem-solving, and creativity needed for students to succeed in the 21st century. These tests, with only a few exceptions, systematically over-represent basic skills and knowledge and omit the complex knowledge and reasoning we are seeking for college and career readiness. (Resnick & Berger, 2010, p. 4)
Assessments are shallow when they simply test knowledge or basic application through algorithms or procedural knowledge. If the answer to an assessment question or performance prompt could be googled, it probably shouldn’t be on a summative assessment. Knowledge is necessary for reasoning, and it’s helpful to check for understanding in the formative phases. But by the time learners are immersed in a summative experience, they should be applying the knowledge and reasoning in meaningful ways: to solve a problem or create something new. Summative assessments need to be robust, engaging learners in provocative tasks that require deep thinking, the application of skills, and practice with experiences that resemble 21st century demands.
Norm-Referenced Assessments
Norm-referenced tests are employed to draw comparisons. When based on criteria, they are called criterion-referenced assessments, and such assessments work in a standards-based system. However, when they compare rank order of individual performance and generate the well-known bell curve, they are called cohort-referenced assessments. These assessments do not work in a standards-based system because they measure learners against each other instead of measuring learners against a set of standards. Imagine that all of the learners are getting As, but the rules state that not everyone can get an A; now the A is sliced into whose A is higher versus whose A is lower, and the results are reported in a manner that shows who is at the top and who is not. Now the A learner who is at the bottom of the As is recognized as less than his or her peers and is not acknowledged for mastery of the expectations—which is the intended message of the A itself.
The results of schools are often normed as well, creating winners and losers in an accountability system that demands everyone be winners. The practice of norming something (customer service, marketing, scheduling, policies, and so on) is best kept as an internal decision-making strategy; in other words, an organization might norm a common practice within the industry to separate “better” from “best” and then make appropriate decisions about what it can do to improve. But norming is not appropriate as an assessment strategy, especially in a standards-based system, because it labels and sorts people.
High-Stakes Assessments
An assessment is considered to have high stakes when it generates significant consequences—positive or negative—for stakeholders. High-stakes assessments are not appropriate in a system of compulsory education: a system in which education is imposed by law, thus making participation mandatory. The concept of high stakes is patterned after credential programs in which individuals or organizations must meet certain criteria to earn or maintain licensure (such as getting a driver’s license or becoming a doctor, lawyer, pilot, teacher, or any other licensed professional). Unfortunately, the fundamental difference between credential systems and compulsory education is choice.
Credentialing works in a system of choice because the risks are lower. First, the individual chose to participate, so he or she is highly engaged during the learning, motivated to succeed, and willing to persist in multiple attempts if needed. Second, the risks are significantly reduced: should the learner fail, he or she is often offered multiple re-testing opportunities. Like the license to drive, the license to practice in a profession is open to prospective candidates on a repeat basis. And, because those studying to pass such exams often have a college degree already behind them, the absence of the licensure may cause initial unhappiness or discomfort, but it will not be detrimental to their future opportunities or overall success. Other career pathways, often within the same field of interest, remain available. However, in a compulsory system where choice is removed, the difference can be crippling: the risks are too high, and failure eliminates future pathways entirely. The learner who does not pass high school is severely limited in future career pathways, and the stigma is too debilitating for many. Motivation and efficacy, the ingredients for success in life, can be irrevocably impaired when high stakes are applied to compulsory experiences.
All in all, when large-scale assessments are low in quality or are issued in norm-referenced or high-stakes environments, the only changes they inspire are superficial and short-lived. Worse, teachers end up teaching to the specific content of the test rather than increasing rigor or engaging in sustainable, quality instructional practices. Such a system creates visible winners and losers for both educators and their learners. What’s left behind is the invisible, residual, but palpable impact of a fixed mindset for both the losers and the winners.
Internal or Medium-Scale Assessments
Though collaborative common assessment work is exclusive to individual teams, the practice of common assessments is not. Schools and districts have an obligation to make certain their learners are ready to perform well on outside measures and are receiving equitable educational experiences and background from school to school within a larger district. Educational leaders at the district level often strive to create common assessments that will monitor progress along the way in all of the tested areas: reading, writing, mathematics, and sometimes science and social studies. Interim assessments, often known as progress monitoring assessments or benchmark assessments, are assessments given over time and with equidistant spacing (every six weeks, end of every quarter, each trimester, and so on) to monitor student progress in achieving standard expectations on a districtwide basis. The primary function of such assessments when developed within a school system is to offer teachers and administrators information about student readiness for upcoming large-scale assessments. The big picture of when the various assessments take place in the system can be found in figure 2.1.
Figure 2.1 is merely