Bill Walsh, Contributing Writer
I actually felt a chill run down my spine when I saw the headline on the front page of the education newspaper I was reading: "Pennsylvania tests essay-grading software." The sub-head was even scarier: "Officials mull using artificial-intelligence system to score state exams."
It was not a joke. It was frighteningly real.
The Pennsylvania Department of Education is experimenting with a new artificial-intelligence program called Intellimetric, put out by Vantage Technologies. Here is how it works: human teachers score a few hundred essays on a scale of 1 to 5. Those essays and grades are then fed into a computer that "learns" the characteristics of, say, a 5 rating or a 2 rating. Once the computer has mastered the standards for grading the essays, it can be let loose to grade thousands of essays on the same criteria, in practically no time (three to six seconds each) and at a greatly reduced cost.
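The article does not say what Intellimetric actually measures; its method is proprietary. The sketch below only illustrates the general "learn from human-scored essays, then score new ones" workflow. It makes several assumptions. The three surface features (essay length, average word length, share of distinct words) are invented for illustration. The nearest-centroid rule is a deliberately simple stand-in for whatever the vendor uses. The function names `features`, `train` and `grade` are hypothetical.

```python
import math


def features(essay: str) -> list[float]:
    """Hypothetical surface features: log word count, average word length,
    and share of distinct words. None of these measure meaning."""
    words = essay.split()
    n = len(words) or 1
    avg_len = sum(len(w) for w in words) / n
    distinct = len({w.lower() for w in words}) / n
    return [math.log1p(len(words)), avg_len, distinct]


def train(scored: list[tuple[str, int]]) -> dict[int, list[float]]:
    """Average the feature vectors of the teacher-scored essays for each score."""
    sums: dict[int, list[float]] = {}
    counts: dict[int, int] = {}
    for essay, score in scored:
        f = features(essay)
        sums.setdefault(score, [0.0] * len(f))
        sums[score] = [a + b for a, b in zip(sums[score], f)]
        counts[score] = counts.get(score, 0) + 1
    return {s: [v / counts[s] for v in vec] for s, vec in sums.items()}


def grade(essay: str, centroids: dict[int, list[float]]) -> int:
    """Give a new essay the score whose average feature vector is closest."""
    f = features(essay)
    return min(
        centroids,
        key=lambda s: sum((a - b) ** 2 for a, b in zip(f, centroids[s])),
    )


short = "Dogs are nice."
wordy = ("Nevertheless, comprehensive investigation demonstrates considerable "
         "complexity regarding educational assessment methodologies.")
centroids = train([(short, 1), (wordy, 5)])
print(grade(short, centroids), grade(wordy, centroids))
```

Because a scorer like this sees only measurable surface traits, longer words and longer essays pull it toward higher scores whether or not the writing says anything.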
Vantage Technologies claims it can prove that "after 30 studies over three years, it's more accurate than expert scorers." Already the College Board, Edison Schools, and the Thomson Learning Company are using the program to grade student essays. And the state of Pennsylvania is currently conducting field tests to see whether it can use the program to grade student writing on the Pennsylvania version of the MCAS tests.
Now I know that a computer program can correct (or catch) spelling errors, and I've even used a program that claims to correct grammar (on my own writing - not on that of my students). The particular program I used disliked dashes and apostrophes (which I tend to use a lot) and treated each of them as an error (which is why I don't use it anymore). It also wanted me to "correct" every sentence written in the passive voice. And not to use fragments - even for effect.
But I've yet to see (or even imagine) a computer program that can truly understand what a writer is trying to say. No program can grade humor or an imaginatively expressed idea. No computer can appreciate subtlety or fine shades of distinction between words. It can't follow a thought.
This is not just about an English teacher trying to protect his job against technology. This is about realizing that writing is about communicating - expressing ideas. It is a medium which requires an active writer and an active reader as well. Writing needs to be processed - thought about and understood. At the very least, writing requires a breathing reader.
Ideally, grading a student's essay takes many factors into consideration. Graders might want to know whether the writer knows what he's talking about, whether he's sincere or sarcastic, whether he writes with clarity or merely to impress the reader. The best correcting is done by someone who knows the writer personally - knows how he thinks and how he expresses himself. Actually, the very best correcting is more than simply slapping a number from 1 to 5 on an essay; it requires a personal response from the reader. Suggestions, arguments, comments, praise for a well-expressed thought and correction for sloppy expression are all necessary.
Think back to your best English teacher. Weren't the best ones the ones who actually wrote comments on your paper as well as a grade or who talked to you about what you had written?
The funny thing is that later in the day (after I had read the news article), a fellow English teacher offered me a job. It seems that some college or university had approached him and asked him to grade student essays online - via his computer. For various reasons, he decided not to take them up on their offer, so he passed it along to me.
Grading essays on the Internet? Without knowing the kid (or even seeing him)? I, of course, thanked him but declined the offer (as I hope any English teacher would). What makes this particularly frightening and discouraging is that the offer came from a major college that (presumably) thinks anonymous essays can be graded by a disembodied corrector miles away.
We're in the 21st century, and I've actually gotten used to people trying to find a cheaper or a faster way to do something - I just never thought colleges or schools or Departments of Education would succumb to this very bad idea.
Writing is personal. We call it that because it involves people - one to write the words which represent thoughts, ideas, realizations or explanations and at least one to read them. A machine can't do it.
I'm certainly not anti-technology. I use computers and programs and the Internet. They're great for many things. But I've yet to meet one that could understand an idea, be shocked, smile, or be impressed. Never met one that appreciated innovation or imagination. Never seen one that could read - only process.
Maybe I ought to open a coaching school to teach kids how to ace these computer-graded essays. I could teach them to use large, multi-syllabic words where easier ones would do ("nevertheless" instead of "but," "at that point in time" rather than "then"). If the program wanted precise grammar, I could teach them that a preposition is a bad word to end a sentence with. If it valued punctuation, I'd stress semicolons - a VERY impressive punctuation mark. (I'd teach them to avoid parentheses, too.)
I could teach them how to write an essay which a computer would love.
But that wouldn't be teaching them how to write for people, how to write with feeling or understanding, how to express their thoughts in creative ways or even to explain themselves plainly.
Grading essays with a computer is a bad idea. Educators ought to know better, and (frankly) I'm ashamed of those in my profession who are even considering the scheme.
Good writing is (with apologies to President Lincoln) "of people, by people, and for people."
“There is a huge value in learning with instant feedback,” Dr. Agarwal said. “Students are telling us they learn much better with instant feedback.”
But skeptics say the automated system is no match for live teachers. One longtime critic, Les Perelman, has drawn national attention several times for putting together nonsense essays that have fooled software grading programs into giving high marks. He has also been highly critical of studies that purport to show that the software compares well to human graders.
“My first and greatest objection to the research is that they did not have any valid statistical test comparing the software directly to human graders,” said Mr. Perelman, a retired director of writing and a current researcher at M.I.T.
He is among a group of educators who last month began circulating a petition opposing automated assessment software. The group, which calls itself Professionals Against Machine Scoring of Student Essays in High-Stakes Assessment, has collected nearly 2,000 signatures, including some from prominent figures.
“Let’s face the realities of automatic essay scoring,” the group’s statement reads in part. “Computers cannot ‘read.’ They cannot measure the essentials of effective written communication: accuracy, reasoning, adequacy of evidence, good sense, ethical stance, convincing argument, meaningful organization, clarity, and veracity, among others.”
But EdX expects its software to be adopted widely by schools and universities. EdX offers free online classes from Harvard, M.I.T. and other universities; this fall, it will add classes from Wellesley, Georgetown and others. In all, 12 universities participate in EdX, which offers certificates for course completion and has said that it plans to continue to expand next year, including adding international schools.
The EdX assessment tool requires human teachers, or graders, to first grade 100 essays or essay questions. The system then uses a variety of machine-learning techniques to train itself to be able to grade any number of essays or answers automatically and almost instantaneously.
The software will assign a grade depending on the scoring system created by the teacher, whether it is a letter grade or numerical rank. It will also provide general feedback, like telling a student whether an answer was on topic or not.
Dr. Agarwal said he believed that the software was nearing the capability of human grading.
“This is machine learning and there is a long way to go, but it’s good enough and the upside is huge,” he said. “We found that the quality of the grading is similar to the variation you find from instructor to instructor.”
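Claims like Dr. Agarwal's, and Mr. Perelman's objection to how they are tested, come down to measuring agreement between two graders scoring the same essays. The article does not say which statistic EdX used. One statistic widely used in automated essay-scoring research is quadratic weighted kappa. It gives full credit for exact matches and penalizes disagreements by the square of their distance on the scale, relative to what chance would produce. The sketch below is a plain implementation under those assumptions. The function name and the integer-score interface are illustrative choices.

```python
def quadratic_weighted_kappa(a: list[int], b: list[int],
                             min_score: int, max_score: int) -> float:
    """Agreement between two raters' integer scores on the same essays.
    1.0 = perfect agreement, 0.0 = chance level, below 0 = worse than chance."""
    n = max_score - min_score + 1
    # Observed matrix: how often rater a gave score i while rater b gave j.
    observed = [[0] * n for _ in range(n)]
    for x, y in zip(a, b):
        observed[x - min_score][y - min_score] += 1
    total = len(a)
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2  # squared distance penalty
            expected = hist_a[i] * hist_b[j] / total  # chance co-occurrence
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected
    if weighted_expected == 0:
        return 1.0  # both raters used a single identical score throughout
    return 1.0 - weighted_observed / weighted_expected


human = [1, 2, 3, 4, 5]
machine = [1, 2, 3, 4, 5]
print(quadratic_weighted_kappa(human, machine, 1, 5))
```

Comparing the machine-versus-human kappa with the human-versus-human kappa on the same essays is one way to test the "similar to the variation from instructor to instructor" claim directly.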
EdX is not the first to use automated assessment technology, which dates to early mainframe computers in the 1960s. There is now a range of companies offering commercial programs to grade written test answers, and four states are using some form of the technology in secondary schools; a fifth has experimented with it. In some cases the software is used as a "second reader," to check the reliability of the human graders.
But the growing influence of the EdX consortium to set standards is likely to give the technology a boost. On Tuesday, Stanford announced that it would work with EdX to develop a joint educational system that will incorporate the automated assessment technology.
Two start-ups, Coursera and Udacity, recently founded by Stanford faculty members to create “massive open online courses,” or MOOCs, are also committed to automated assessment systems because of the value of instant feedback.
“It allows students to get immediate feedback on their work, so that learning turns into a game, with students naturally gravitating toward resubmitting the work until they get it right,” said Daphne Koller, a computer scientist and a founder of Coursera.
Last year the Hewlett Foundation, a grant-making organization set up by one of the founders and his wife, sponsored two $100,000 prizes aimed at improving software that grades essays and short answers. More than 150 teams entered each category. A winner of one of the Hewlett contests, Vik Paruchuri, was hired by EdX to help design its assessment software.
“One of our focuses is to help kids learn how to think critically,” said Victor Vuchic, a program officer at the Hewlett Foundation. “It’s probably impossible to do that with multiple-choice tests. The challenge is that this requires human graders, and so they cost a lot more and they take a lot more time.”
Mark D. Shermis, a professor at the University of Akron, supervised the Hewlett Foundation's contest on automated essay scoring and wrote a paper about the experiment. In his view, the technology, though imperfect, has a place in educational settings.
With increasingly large classes, it is impossible for most teachers to give students meaningful feedback on writing assignments, he said. Plus, he noted, critics of the technology have tended to come from the nation’s best universities, where the level of pedagogy is much better than at most schools.
“Often they come from very prestigious institutions where, in fact, they do a much better job of providing feedback than a machine ever could,” Dr. Shermis said. “There seems to be a lack of appreciation of what is actually going on in the real world.”