I’d rather work late than let machines grade essays

Poor marks for a new technology

Sometimes when I’m halfway through a pile of 40 essays, I get tired. At these moments, if I had a grading machine, I would probably be tempted to insert the remaining essays and watch them pop out all freshly marked.

However, after 20 years of grading university essays, I know this would be terribly misguided. I’m here to help my students learn how to thrive in university and beyond. In order to do that, they need to have strong thinking, reading, and writing skills. Machine-generated grades will not help them develop these skills. With this in mind, I pick up my pen and go back to providing the constructive human feedback that will help them.

Just to be clear, I am not some remote professor in an ivory tower above Lake Ontario. In fact, I am a non-tenured “gun-for-hire,” fighting hard in the trenches of the humanities. This past year alone I have taught 914 students in six classes in writing, literature, and film. I have graded hundreds of exercises and essays, some brilliant, some hard to understand. I have worked with 19 teaching assistants, generated over 20 evaluation rubrics, and assigned 12 essays. Basically, I have been battling to keep my head up, to keep teaching as well as I am able, and to keep grading the many essays that come my way.

This is nothing exceptional, actually. It’s what most humanities instructors do these days as classes expand and funds contract. While it can be extremely challenging, I think giving up this battle by letting machines grade students’ essays would have far greater negative consequences than even I would like to imagine.

Those who champion machine grading (most of whom, not coincidentally, also develop software) would rather focus on the inevitability of such automated education. The argument goes like this. Traditional higher education is old-fashioned and expensive. Our digital age demands highly specialized skills—not all those airy-fairy critical thinking, reading, and writing skills that are touted by elite, graying humanities professors. It follows that essays in the humanities—and perhaps the humanities altogether—must share in the pain that comes from such inevitable change. Of course there will be those who struggle against new pedagogical technologies, but they must be cranky luddites, frozen in the path of an avalanche that will destroy them if they do not jump on the digital bus… and fast.

Of course, the idea of machine grading essays is both economic and political. It is not simply a shortsighted idea generated by this or that brilliant engineer at M.I.T. or Harvard. It is an effect of a perceived demand for cheaper, faster, and more scientific (in other words, more consistent and objective) grading methods. But, as is the case with every effect, this effect will become the cause of some other effect: that is, machine grading will prioritize quantifiable aspects of writing (such as spelling and grammar), making those aspects most important to students who are writing-to-the-machine. This will cause students to focus on mechanics over content. This focus on mechanics will then lead students to generate shallow—though technically correct—essays, for which they will receive A’s. These A’s will shore up students’ sense of their own exceptionalism, even as they are really only signs of exceptional conformism. Sufficiently skilled and intellectually myopic, these workers will be ideal cogs in the wheels of progress because they won’t question their position in the machine.

The point is that this causal chain is reversible. We need to do five things to shift the course.

1. Challenge the emotionally loaded term, “progress.”

These days, the term “progress” is just a sound bite that evokes a jolt of emotions. For some, progress means Tesla electric cars; for others it means the Keystone Pipeline. For some involved in higher education, the term “progress” means cutting costs by reducing brick-and-mortar redundancies. The question we have to ask and answer is this: What values lie behind our particular use of the term “progress”?

Few even try to explore the values behind machine grading, unless you count the half-baked gesture of Dr. Anant Agarwal (a professor at M.I.T. and the president of edX, which is the biggest machine grading software producer in North America) who alludes to some vague, “huge value in learning with instant feedback.”

What is this “huge value”? Well, Dr. Agarwal says, students like it. They say they “learn much better with instant feedback.”

Better? In comparison to what? And what is it they are actually learning, anyway? The answer is hard to miss: at best, they are learning basic literacy skills. Period.

Herein lies the Orwellian politics of machine grading: it’s for second-tier schools, not “prestigious universities,” which, as Professor Shermis confirms in the article, “do a much better job of providing feedback than a machine ever could.”

The circle is complete: the richest, best universities are spearheading technology that will enable the poorer schools to deal with their relative poverty by using grading software that will encourage dull, connect-the-dots writing without insight or critical thought. The elite will still be able to think outside the box, and so to make the rules of the game, while the poor will, increasingly, neither want to think outside the box nor be able to do so. They will be the ideal Proles, docile subjects who can manage to do the skilled white-collar work that the Inner Party Members find dull, but who can never quite formulate any clear thoughts about how they themselves are positioned in and by the system in the first place.

2. Define what we mean by “writing.”

University writing is critical writing. In other words, university students write essays in which they make thoughtful claims, which evolve and develop in a reciprocal relationship with the evidence they find and analyze. Ideally, a really strong essay will also explore what is at stake in the claims in the first place. Such claim-based writing is not about the blind regurgitation of received ideas. It is about seeing outside the given frame, researching beyond what pops up first on Google, evaluating credible (and in-credible) sources, figuring out how these sources appeal to readers (even if the sources seem to lack logical reasoning), understanding the different positions at play, synthesizing what you analyze, and arriving at thoughtful interpretations. Finally, it is about articulating your own arguments clearly and coherently.

This kind of critical thinking and writing is as challenging to do as it is to teach. Instructors spend years honing pedagogical approaches that work; they then adapt these approaches in response to the knowledge, needs, and biases of each class and of individual students. Commenting on prewriting work (like outlines and rough drafts) is often a critical part of this process. There is no question that this gets harder with larger classes, but if instructors stick to their core definition of writing as a process, then they can come up with different commenting options (like online responses to outlines) that let them continue to engage in the teaching process, so they can continue to help their students engage in the thinking and writing process.

3. Consider what is really gained (and lost) by using a new technology.

A lot of new technologies seem cool at first. Clickers, for example, were going to be the rage a few years ago. With clickers you can ask questions and get immediate data on how many students answered one way or another. The problem? The questions have to be multiple choice, prepared in advance to generate useful data on the spot. This means endless hours of scripting on the instructor’s part, and rather dull—yet quantifiable—results on everyone’s part. Clickers are still used in some large classes, but to me they seem better suited for training dogs than for engaging students.

More recently, programmers and hobbyists everywhere have been toying with peer-review and editing software. The problem? To date, these programs either act like glorified grammar checkers or demand so much from human beings that we humans are still much better off using our own brains to do the work. By using our brains we also develop them, even as we hone our reviewing, editing, and communicating skills, all of which are increasingly necessary in today’s online, matrix-structured work environments, where folks review one another’s proposals and drafts on a daily, if not hourly, basis.

4. Address human readers, not machines.

We are people who write to other people. In the process, we learn how to consider our audience as a person or people with particular needs, attitudes, and knowledge, and, in so doing, we learn how to accept that each of us is a person who both speaks and listens. Machines take away this all-important human element. We become passive observers in a truncated, virtual communication process that just doesn’t feel real… because it isn’t real.

5. Be alert when numbers replace logic.

Corporate innovators often dodge the work of logical reasoning by offering up numbers that seem to prove their point. This makes sense: if they want to make money, they need to find and present reassuring numbers to investors. As long as the numbers are accurate and objective, everything is fine. Right? Not so fast. What if the corporation is the one doing the research to figure out the numbers? What if they are designing the questions, picking the study subjects, and setting up the experiment? Of course, they will construct the study in a way that gets them the data they want—and they will publish the findings that they find informative.

This is precisely the case for edX, as noted by M.I.T. professor Les Perelman. The problem? Faulty methodology. EdX never actually compared “the software directly to human graders.” Why? Because they are not comparable. Certainly machines can grade faster, but this raises the question of just what kind of grading they have done. I am reminded of Woody Allen’s joke: “I just finished a course in speed reading. It was fantastic. Afterwards, I was able to read War and Peace in twenty minutes. It’s about Russia.”

I want to end with one final bonus point in celebration of all us human graders, teachers, students, and writers.

6. “Break any of these rules sooner than [do] anything outright barbarous.” (George Orwell)

The machine grading of essays may not seem like an earth-shattering issue, but as I have tried to suggest in this commentary, it may have serious consequences. The points I have outlined above are meant to explore some of the potential consequences of our increasing acceptance of lateral thinking and regurgitative writing, both of which are reinforced, or at the very least left unchallenged, by digitalized pedagogy. My hope is that when we are considering automated grading and other such high-tech options, we will remember what Marshall McLuhan put so well: “We shape our tools, and our tools shape us.” We have shaped the grading machine, and my comments here are intended to highlight how this tool will, in turn, shape us… perhaps rendering us obsolete in the process.

Flynn lectures on English at the University of Toronto and co-authored The McGraw-Hill Handbook.