New Learning’s Updates

The Test is Dead!

... But long live assessment!

The following is an extract from the introduction to our book, Cope and Kalantzis (eds), e-Learning Ecologies, New York: Routledge, 2016.

What evidence do we have that a learner has learned? In didactic pedagogy, the classical answer is to be found in the result of a test. At the end of a period of learning, there is a test, typically “closed book”, to see what the student has retained in long-term memory. The focus is essentially cognitive, to draw inferences about an individual’s mind. Classical testing logic runs along these lines: cognition developed in learning => observation in a test => interpretation of the test results as evidence of cognition (Pellegrino et al., 2001). Cognition itself is inaccessible, so we construct instruments with which we can develop an interpretative argument based on indirect evidence. The process is linear: learn => test. The test is “summative”, or retrospective and judgmental. The result is an individualized, “mentalist” construct (Dixon-Román & Gergen, 2013). Such tests are peculiar artifacts and processes, quite different from the other artifacts and processes of learning, inside and outside of school. They are external to the learning process. There is a sharp distinction between times of learning and the time of the test. They are also “standardized”, to ensure that all learners are being tested for the same things. Their frame of reference is “normative”, to compare students with each other on the assumption that some will prove themselves smarter and others dumber. A “normal” distribution guarantees inequality. In order for the few to be smart, most have to be at least mediocre and some dumb. Comparative inequality among learners is statistically guaranteed.

Standardized, Norm-Referenced Assessment

Educational technologies can be used to deliver classical tests with no change in their underlying pedagogical and social presuppositions. In fact, they can intensify the process by mechanizing select-response assessments (using computer-supported psychometrics) and supply-response assessments (using natural language processing). The “standardization” of inequality persists, albeit with ever more obscure algorithmic foundations. Mechanization means that educational systems can offer more tests, so teaching comes to be dominated by test prep and the peculiar logic of the test.

But what could be different? How could educational technologies support other ways of measuring evidence of learning? If tests are linear, how could we create assessment processes that are more reflexive and recursive? In answering this question, we might learn from digital media. Not only are these intrinsically dialogical (captured in the difference between Web 2.0 and its predecessors), but the underlying data systems are recursive. Take, for instance, the mechanisms that underlie “web reputation systems” (Farmer and Glass 2010)—the recursive reviewing processes that drive eBay, Amazon, or YouTube, with their incessant rating, commenting, commenting on comments, and upvoting of comments that others find useful. They are also dialogical. The “stickiness” of social media lies in the feedback that comes with quick responses in the form of likes and retweets, and then the response to the response. Mass media (for instance, newspapers and television) were transmissive rather than dialogical, linear rather than recursive. So was didactic pedagogy. What is going to happen with education and training if we fail to address the disjunction between the traditional didactic discourses of school and the recursive “stickiness” that keeps us engaged with social media? In these media, not only have we become active media creators, but we always have a responsive audience. We are always adapting based on friends’ or followers’ responses. If we don’t change our pedagogical ways, students will become (even more) disaffected with school.
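
To make this recursive structure concrete, here is a minimal sketch in Python. The class and field names are our own invention, not drawn from any particular platform: feedback can itself receive feedback, comments carry upvotes, comments can be made on comments, and a simple aggregate score counts the whole thread.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Comment:
        """A piece of feedback that can itself receive feedback."""
        author: str
        text: str
        upvotes: int = 0
        replies: List["Comment"] = field(default_factory=list)

        def reply(self, author: str, text: str) -> "Comment":
            child = Comment(author, text)
            self.replies.append(child)
            return child

        def score(self) -> int:
            # A comment's reputation includes the reputation of the
            # conversation it provokes: the structure is recursive.
            return self.upvotes + sum(r.score() for r in self.replies)

    # A comment, a comment on the comment, and votes at both levels.
    review = Comment("peer_1", "The argument needs evidence in paragraph 2.")
    review.upvotes = 3
    reply = review.reply("author", "Added a citation; is this what you meant?")
    reply.upvotes = 1
    print(review.score())  # 4: feedback and feedback-on-feedback both count

The point of the sketch is only the shape of the data: unlike a one-off test score, every response can become the occasion for a further response.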

Learners of today will not want to wait until the end of the course or the unit of work to be told “B-”, which is simply to say something like, “you’re a bad person, try harder next time.” They want and need continuous feedback. Rather than feedback that is merely retrospective and judgmental, they require feedback that is prospective, constructive, and constitutive of their learning. This may be a peer comment against the criterion of a rubric, or a comment in an online discussion. This builds upon an older tradition and literature on “formative assessment”, or assessment for learning—though all agree that formative assessment has been badly neglected given the longstanding and ongoing domination of our education systems by summative assessments (Armour-Thomas and Gordon 2013; Gorin 2013; Kaestle 2013; Ryan and Shepard 2008). The formative/summative distinction was first named by Michael Scriven in 1967 to describe educational evaluation, then applied by Benjamin Bloom and colleagues to the assessment of learning (Airasian, Bloom, and Carroll 1971; Bloom 1968). The subsequent literature on formative assessment has consistently argued for its effectiveness (Baker 2007; Bass and Glaser 2004; Black and Wiliam 1998; OECD Centre for Educational Research and Innovation 2005; Shepard 2008; Wiliam 2011).

Moreover, instead of norm-referenced assessment, we might return to some other old but neglected notions. With rich, on-the-fly feedback from multiple sources and perspectives (machine, peers, teacher, self-reflection), it may be more possible for all students to achieve “mastery” (Bloom 1968). There is no reason why, against the measure of criterion-referenced assessment, all students in a class should not achieve the criterion—particularly with a lot of formative feedback or interim assessment designed to bring all students up to criterion. In this context, moreover, it is not so relevant whether students meet the criterion at a different pace, as long as they do. The measure then is self-referenced, or progress assessment.

Criterion-Referenced Assessment (where the aim is that every learner should reach the standard)
Self-Referenced Assessment (where progress may be at the learner's own pace and in their own way)
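
To make these frames of reference concrete, here is a minimal sketch in Python with invented scores: norm-referencing positions each learner against the group, criterion-referencing checks each learner against a fixed standard that everyone can in principle meet, and self-referencing measures each learner against their own earlier work.

    from statistics import mean, stdev

    # Hypothetical current and previous scores for a small class.
    scores = {"Ana": 62, "Ben": 71, "Chi": 85, "Dev": 90, "Eli": 78}
    previous_scores = {"Ana": 50, "Ben": 70, "Chi": 80, "Dev": 88, "Eli": 60}
    criterion = 75  # an absolute standard every learner is meant to reach

    # Norm-referenced: each learner is positioned relative to the group,
    # so someone must always sit at the bottom of the curve.
    mu, sigma = mean(scores.values()), stdev(scores.values())
    z_scores = {name: (s - mu) / sigma for name, s in scores.items()}

    # Criterion-referenced: every learner can, in principle, meet the standard.
    met_criterion = {name: s >= criterion for name, s in scores.items()}

    # Self-referenced: each learner is measured against their own prior work.
    progress = {name: scores[name] - previous_scores[name] for name in scores}

    print(z_scores, met_criterion, progress, sep="\n")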

Could we create a no-failure educational paradigm, where you can keep taking feedback until you are as good as you are supposed to be? Instead of the “B-” on the test at the end of the term, over the course of that term a student may receive tens of thousands of small, incremental pieces of feedback that are responsive to their needs.

In Scholar, over the course of a single project (a piece of writing, documentation of a science experiment, a worked mathematical example, a case study of a workplace practice), students may receive many hundreds or even thousands of pieces of feedback in a process that is carefully designed by the teacher or the creator of the learning module: a comment from a peer against a criterion in a peer review rubric, a coded annotation, machine feedback from the natural language processor, an answer to a question in a survey, a comment in a class discussion. It’s not just the teacher who is offering feedback, and not only at the end. The sources are multiple—in fact, there are many more items of peer and teacher feedback than a teacher alone could realistically offer. In the context of Web 2.0, this phenomenon is called “crowdsourcing” (Surowiecki 2004)—in this case, crowdsourcing assessment. We have shown that average peer review ratings across multiple raters in Scholar align with expert ratings (Cope, Kalantzis, Abd-El-Khalick, and Bagley 2013).
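
As a rough illustration of what crowdsourcing assessment can mean in practice (this is not Scholar’s actual rating algorithm, and the rubric criteria and numbers are invented), the sketch below averages several peer ratings criterion by criterion and compares the averages with a single expert rating.

    from statistics import mean

    # Hypothetical rubric ratings (1-5 scale) from several peer reviewers.
    peer_ratings = {
        "focus":     [4, 5, 4],
        "evidence":  [3, 4, 3],
        "structure": [5, 4, 4],
    }
    expert_rating = {"focus": 4, "evidence": 3, "structure": 4}

    # Average across raters, criterion by criterion.
    crowd_rating = {criterion: mean(scores)
                    for criterion, scores in peer_ratings.items()}

    for criterion in crowd_rating:
        gap = crowd_rating[criterion] - expert_rating[criterion]
        print(f"{criterion}: peers {crowd_rating[criterion]:.1f}, "
              f"expert {expert_rating[criterion]}, difference {gap:+.1f}")

The logic is the familiar one of multiple raters: individual peer judgments vary, but their average tends to settle close to an expert judgment.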

Recursive Feedback in Scholar

In projects, for example, feedback is embedded, constructively contributing to the creation of a work during its draft phases. This involves a reframing of learning outcomes as described in standards, from retrospective and judgmental to prospective and constructive, suggesting to reviewers the kinds of feedback that might be most helpful in the revision of the work.

The result is an enormous amount of data in the Scholar Analytics area, in different forms and from multiple sources. The image below is a snapshot of the analytics area for an open-plan learning environment where approximately 100 students are writing and offering peer feedback on each other’s projects. We have data showing version development, peer/self/teacher assessments, reviews written, annotations made—hundreds of thousands of words, generated over a week of work. It is possible for the teacher to drill down to see every detail, including every piece of feedback and every change the student makes. They can do this at any time during the learning process, not just at the end when papers are turned in. Red warning signs might alert the teacher to a student in need of attention. Says the teacher in this space: “Analytics is allowing us to have insights that we never had, when with one teacher and a bunch of papers, it was just too overwhelming.”
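The “red warning sign” idea can be illustrated with a simple rule applied to per-student activity counts. The record fields and thresholds below are hypothetical, chosen only to show the pattern, and are not Scholar’s.

    # Hypothetical per-student activity records from a learning-analytics store.
    students = [
        {"name": "A", "drafts": 3, "reviews_written": 2, "feedback_received": 14},
        {"name": "B", "drafts": 1, "reviews_written": 0, "feedback_received": 2},
        {"name": "C", "drafts": 2, "reviews_written": 3, "feedback_received": 9},
    ]

    def needs_attention(record, min_drafts=2, min_reviews=1, min_feedback=5):
        """Flag students whose activity falls below illustrative thresholds."""
        return (record["drafts"] < min_drafts
                or record["reviews_written"] < min_reviews
                or record["feedback_received"] < min_feedback)

    flagged = [s["name"] for s in students if needs_attention(s)]
    print("Students who may need the teacher's attention:", flagged)  # ['B']
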

Scholar Learning Analytics Overview Screen

The larger context for these educational technologies has been public discussion of the issue of “big data” in society (Mayer-Schönberger and Cukier 2013; Podesta, Pritzker, Moniz, Holdren, and Zients 2014) and in education (Cope and Kalantzis 2015; Cope and Kalantzis 2016; DiCerbo and Behrens 2014; Piety 2013). (For a long and technical article on the potential role of big data in education, see Cope and Kalantzis, “Big Data Comes to School”.)

We would like to make a series of propositions towards an agenda for the future of assessment:

  1. Assessment can increasingly be embedded in instruction, allowing us to realize long-held ambitions to offer richer formative assessment.
  2. We may now have so much interim learning or progress data that we need to ask: why do we even need these strange artifacts, summative assessments? With the help of data mashups and visualizations, the data points need only be those located within the learning process.
  3. Now that we can assess everything, and there is no learning without reflexive, recursive machine feedback, peer and teacher feedback, and structured self-reflection, do we even need a distinction between instruction and assessment? There should be no instruction without embedded recursive feedback, and no feedback that does not directly and incrementally contribute to learning. Reflexive pedagogy ends the assessment/instruction distinction.
  4. The focus of what is assessable now shifts from individual cognition, to the artifacts of knowledge representation and their social provenance. It’s not what you can remember, but the knowledge artifact you can create, recognizing its sources in collective memory via links and citations, and tracing the collaborative construction process via the feedback offered by peers and teachers, and the revisions made in response.
  5. The focus of what is assessable moves from the repetition of facts and the correct application of theorems to what we call complex epistemic performance, or the kinds of analytical thinking that characterize disciplinary practices—being a scientist, or a writer, or applying mathematics to a problem.

The test is dead! Long live assessment!

References

Airasian, Peter W., Benjamin S. Bloom, and John B. Carroll. 1971. Mastery Learning: Theory and Practice, edited by J. H. Block. New York: Holt, Rinehart & Winston.

Armour-Thomas, Eleanor and Edmund W. Gordon. 2013. "Toward an Understanding of Assessment as a Dynamic Component of Pedagogy." The Gordon Commission, Princeton NJ.

Baker, Eva L. 2007. "Moving to the Next Generation System Design: Integrating Cognition, Assessment, and Learning." National Center for Research on Evaluation, Standards, and Student Testing (CRESST), University of California, Los Angeles.

Bass, Kristin M. and Robert Glaser. 2004. "Developing Assessments To Inform Teaching and Learning." National Center for Research on Evaluation, Standards, and Student Testing, Los Angeles CA.

Black, Paul and Dylan Wiliam. 1998. "Assessment and Classroom Learning." Assessment in Education 5:7-74.

Bloom, Benjamin S. 1968. "Learning For Mastery." Evaluation Comment 1:1-2.

Cope, Bill and Mary Kalantzis. 2015. "Sources of Evidence-of-Learning: Learning and Assessment in the Era of Big Data." Open Review of Educational Research 2:194–217.

—. 2016. "Big Data Comes to School: Implications for Learning, Assessment and Research." AERA Open 2:1-19.

Cope, Bill, Mary Kalantzis, Fouad Abd-El-Khalick, and Elizabeth Bagley. 2013. "Science in Writing: Learning Scientific Argument in Principle and Practice." e-Learning and Digital Media 10:420-441.

DiCerbo, Kristen E. and John T. Behrens. 2014. "Impacts of the Digital Ocean on Education." Pearson, London.

Farmer, F. Randall and Bryce Glass. 2010. Building Web Reputation Systems. Sebastopol CA: O'Reilly.

Gorin, Joanna S. 2013. "Assessment as Evidential Reasoning." The Gordon Commission, Princeton NJ.

Kaestle, Carl. 2013. "Testing Policy in the United States: A Historical Perspective." The Gordon Commission, Princeton NJ.

Mayer-Schönberger, Viktor and Kenneth Cukier. 2013. Big Data: A Revolution That Will Transform How We Live, Work, and Think. New York: Houghton Mifflin Harcourt.

OECD Centre for Educational Research and Innovation. 2005. "Formative Assessment: Improving Learning in Secondary Classrooms." Organisation for Economic Co-operation and Development, Paris.

Piety, Phillip J. 2013. Assessing the Big Data Movement. New York: Teachers College Press.

Podesta, John, Penny Pritzker, Ernest Moniz, John Holdren, and Jeffrey Zients. 2014. "Big Data: Seizing Opportunities, Preserving Values." Executive Office of the President.

Ryan, Katherine E. and Lorrie A. Shepard, eds. 2008. The Future of Test-based Accountability. New York: Routledge.

Shepard, Lorrie A. 2008. "Formative Assessment: Caveat Emptor." Pp. 279-304 in The Future of Assessment, edited by C. A. Dwyer. Mahwah NJ: Lawrence Erlbaum.

Surowiecki, James. 2004. The Wisdom of Crowds: Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies and Nations. New York: Doubleday.

Wiliam, Dylan. 2011. Embedded Formative Assessment. Bloomington IN: Solution Tree Press.

 
