As educators, we look for the best methods to assess our students. Assessment takes many forms, each with its pros and cons. In Foundations of Education and Instructional Assessment/Assessment Strategies/Essays, the authors state that "By utilizing essays as a mean of assessments, teachers are able to better survey what the student has learned. Multiple choice questions, by their very design, can be worked around. The student can guess, and has decent chance of getting the question right, even if they did not know the answer."
To get a true understanding of students' accomplishments or knowledge, essay tests are often preferred over methods such as multiple choice or fill-in-the-blank, which demand less deductive reasoning and offer no opportunity to discuss a topic or problem. The issue, as stated by Diki (2006), is that "Revision and feedback are essential aspects of the writing process. Students need to receive feedback to increase their writing quality. However, responding to student papers can be a burden for teachers." In this vein, we need a tool that can offer both feedback, to improve writing style, and assessment of mastery of the skill or topic described.
Automated Essay Assessment (AEA) is evaluated by the same rules applied to any assessment tool: validity, fairness, and reliability.
Machine grading of essays, later called Automated Essay Assessment, traces its roots to Ellis Page in 1966. Page created the forerunner of Project Essay Grade (PEG), one of the main tools still in use today. The first operational use of computer scoring came in 1997 with a tool called the Intelligent Essay Assessor (IEA).
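Page's core insight was that measurable surface features of an essay ("proxes") correlate with the intrinsic qualities human raters reward ("trins"), so a simple linear model over those features can approximate a human score. As an illustration only, a minimal Page-style scorer might look like the sketch below; the features and weights are hypothetical, since PEG's actual proxies and model are proprietary:

```python
def extract_proxies(essay: str) -> list[float]:
    """Compute a few surface features ("proxes") of the kind Page proposed:
    essay length, average word length, and average sentence length."""
    words = essay.split()
    sentences = [s for s in essay.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    n_words = len(words)
    avg_word_len = sum(len(w) for w in words) / n_words if n_words else 0.0
    avg_sent_len = n_words / len(sentences) if sentences else 0.0
    return [float(n_words), avg_word_len, avg_sent_len]


def score_essay(essay: str, weights=(0.01, 0.5, 0.1), bias=1.0) -> float:
    """Predict a score as a linear combination of proxies. These weights are
    invented for illustration; a real system fits them by regression against
    a corpus of human-scored essays."""
    return bias + sum(w * p for w, p in zip(weights, extract_proxies(essay)))
```

A longer essay with longer words and sentences scores higher under this toy model, which is exactly the behavior critics of surface-feature scoring point to.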
Statistics on the dissemination and use of AEA are difficult to find. There are case studies, but no full district-wide or education-system-wide studies are available at this time. Some of this is due to the proprietary nature of the software, and some to the criticism of using AEA.
Measurement Incorporated provides the following metrics for Project Essay Grade at http://www.measurementinc.com/Solutions/AssessmentTechnologies:
- PEG has been used by MI to provide over two million scores to students over the past five years.
- PEG is currently being used by one state as the sole scoring method on its summative writing assessment, and MI has conducted pilot studies with three other states.
- PEG is currently being used in 1,000 schools and 3,000 public libraries as a formative assessment tool.
The number of tools available to those who wish to automate essay assessment is growing. The list below shows the more developed tools in both the proprietary and open-source realms.
Current Automated Essay Assessment Tools

| Tool | Publisher | Type | Notes |
|------|-----------|------|-------|
| eRater | Educational Testing Service | Proprietary | https://www.ets.org/erater/about |
| Intellimetric | Vantage Learning | Proprietary | http://www.mccanntesting.com/products-services/intellimetric/ |
| Project Essay Grade | Measurement, Inc | Proprietary | http://www.measurementinc.com/Solutions/AssessmentTechnologies |
| LightBox | Lightside Labs | Open Source | https://www.getlightbox.com/#/ |
| EASE (Enhanced AI Scoring Engine) | EdX | Open Source | https://github.com/edx/ease |
To truly evaluate AEA, it is critical to understand the capabilities these programs have and how they relate to the overall "burden on faculty" that created the need for these tools. For most software we could simply list the main capabilities of each tool; this field, however, is so far from maturity that they are not fully outlined. To underscore this, in 2012 the Hewlett Foundation created a competition in AEA (https://www.kaggle.com/c/asap-aes) to develop an automated scoring algorithm.
This challenge had three main parts:
The products above neither conform to a common set of capabilities nor fully disclose their methods, so no rubric can be applied to assess which tool is best for which application.
While not an exhaustive list or a full compare-and-contrast, below are the self-described capabilities of each tool. These lists are derived from the information provided on the websites listed above; many read more like corporate press releases than detailed capability statements.
As with any technological advance, there are critics, and Automated Essay Assessment is no exception. Criticism falls into three major areas:
There are those who say that these programs achieve the main assumptions and goals set out at the beginning of this work: that these tools reduce the burden on instructors while giving students faster, more reliable feedback and helping improve their writing. Scott Jaschik (2011) reported that computer scoring is more consistent than fallible human raters. Similarly, Peter Foltz promoted the idea that these tools give instant feedback as formative assessment for students. In a New York Times article (2013), Dr. Agarwal, president of EdX, said he believed the software was nearing the capability of human graders: “This is machine learning and there is a long way to go, but it’s good enough and the upside is huge. We found that the quality of the grading is similar to the variation you find from instructor to instructor.”
In a 2012 press release, the Automated Student Assessment Prize showed how automated assessment tools stack up against human evaluation.
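Comparisons of this kind are typically reported with quadratic weighted kappa (QWK), the agreement statistic used in the Kaggle ASAP competition to compare machine scores against human scores. A minimal sketch, assuming integer scores on a shared scale (a production implementation would also guard against a degenerate denominator when all scores are identical):

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Agreement between two raters on an integer score scale, weighting
    disagreements by the square of their distance. 1.0 is perfect
    agreement; 0.0 is agreement no better than chance."""
    n = max_score - min_score + 1
    # Observed confusion matrix between the two raters
    observed = [[0.0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1
    total = len(rater_a)
    # Marginal histograms give the expected matrix under independence
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2
            expected = hist_a[i] * hist_b[j] / total
            num += weight * observed[i][j]
            den += weight * expected
    return 1.0 - num / den
```

Identical score vectors yield a kappa of 1.0, while two raters whose scores are statistically unrelated yield roughly 0.0, which is what makes the statistic useful for claims that machine scores "stack up" against human ones.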
The rise of these programs in high-stakes assessment has even led to petitions to stop the use of automated essay assessment. A humanreaders.org petition calls on legislators and policy makers to "stop mandating essay scores generated by machines to make crucial decisions such as grade promotion, academic placement, graduation, school ranking, school accreditation, or teacher qualification, promotion, and pay".
There are both proponents and critics of the automated essay assessment movement. As classrooms grow larger and require more writing samples, these tools will continue to be deployed. Because this is a very young area, there are more questions than answers. A few of the most pressing are:
To satisfy proponents and assuage critics, additional research into AEA tools, their effectiveness, and alternative human methods should be completed.