Saturday, May 17, 2014

Computer Scoring of Children:

A Tower of Babel?

After I got myself fired from my job in the Dept. of Education, Marie Eldridge, the former Director of the National Center for Education Statistics, who had become a friend of mine while I was in the Department (a personal friend, not a professional one!), called me at the time such computer assessment of "writing/essays" was being proposed by the Department. She was appalled and, being quite knowledgeable herself about how computers work, said she couldn't believe such an assessment policy could possibly be accepted.

A May 3, 2014 article in Education Week is titled "Computerized Grading: Purloining the Analysis, the Most Fundamental Exposition of Humanity." The Ed Week article begins:

Les Perelman, a researcher at the Massachusetts Institute of Technology, has been doing some interesting experiments to test the capacity of computerized grading systems to accurately judge the quality of written work.
Below are some comments by education researcher Anita Hoge concerning this Ed Week article:

An Experiment in Babel ... but does it make sense?
This may be another aspect of liability that parents may be able to use to fight the testing, or the collection of data. Computer scoring of essays cannot adequately or fairly judge an essay because the algorithms are limited. Machines that score essays are not truly scoring human communication.

Here is the point. Computers CANNOT adequately SCORE essays. Machines DO NOT know how to score or understand true human communication. Why? Because the algorithms score within narrow constraints, and kids are smart enough to "beat the machines": they can figure out the algorithm the computer is using to score their work. This algorithm prizes the use of obscure vocabulary, along with length. Throw enough big words into an essay, write long enough, and you will get a good score, proficiency plus. 
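To make that critique concrete, here is a minimal sketch of the kind of surface-level scoring being described. This is not any vendor's actual algorithm; it is a hypothetical toy scorer that rewards only word count and long, "obscure" vocabulary, which is exactly the behavior that makes such systems gameable.

```python
# Hypothetical toy essay scorer illustrating the critique above.
# It rewards length and long ("obscure") words and never checks meaning,
# so a long, big-word essay scores well even if it is incoherent.

def toy_essay_score(essay: str) -> float:
    words = essay.split()
    if not words:
        return 0.0

    # Longer essays score higher, capped at 500 words.
    length_score = min(len(words) / 500, 1.0)

    # Count "obscure" words (10+ letters after stripping punctuation).
    obscure = [w for w in words if len(w.strip('.,;:!?"')) >= 10]
    vocab_score = min(len(obscure) / len(words) * 5, 1.0)

    # Nothing here examines whether the essay is coherent or true.
    return round(50 * length_score + 50 * vocab_score, 1)


if __name__ == "__main__":
    coherent = "The dog sat by the door and waited for the boy to come home."
    gamed = ("Notwithstanding multitudinous epistemological considerations, "
             "the perspicacious individual countenances interminable "
             "obfuscation regarding pedagogical paradigms. ") * 40

    print(toy_essay_score(coherent))  # low score despite being clear
    print(toy_essay_score(gamed))     # high score despite being nonsense
```

Any scoring rule that looks only at surface features like these will behave the same way: length and vocabulary stand in for meaning, and students who notice that can write to the formula rather than to a reader.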


According to the Ed Week article, Perelman explained his purpose:

I did this as an experiment to show that what these computers are grading does not have anything to do with human communication. If you think about writing or any kind of human communication as the transfer of thoughts from one mind to another mind, then if the machine takes something that anyone would say is complete incoherent nonsense, and scores it highly, and we know that it's not, then we know that it's not grading human communication.
Students will quickly learn how to game the system instead of learning how to write intelligibly. Use big words and long sentences. Impress the machine. Meaning doesn't matter. The Ed Week article admits:

Unfortunately, the system breaks down when we do what Mr. Perelman has done. He has figured out the algorithm the computer is using to score the student work. This algorithm prizes the use of obscure vocabulary, along with length. Throw enough big words in an essay, and write long enough, and you will get a good score. Given human capacity to do what Mr. Perelman has done with his software, it is likely that once students figure out these algorithms, they can similarly generate essays that are loquacious without being elucidating. [emphasis added]
Read the whole Ed Week article HERE.