Saturday, June 16, 2007


THE FALLACY OF "HARD" TESTS



(Ithacaunleashed.blogspot.com)

A great deal of fuss is often made about failing the bar exam. The news a few weeks ago was that Governor Pataki's daughter passed the exam, but it is always mentioned that it was her second try. Similarly, John Kennedy, Jr. failed the New York bar exam twice before finally passing it on his third try.

As one who took several medical licensure and specialist exams, and the Virginia bar exam, passing all, I might be inclined to pat myself on the back, but my former background as a mathematician won’t let me do that. I do remember, however, some remarks from a noted orthopedic surgeon about his own specialty exam: “It was a hellishly hard test, and went on for hours,” he said, “but I’m really glad I passed the first time I took it. Only about 35 percent who took it passed the exam.”

He was describing, with only the slightest tinge of boastfulness, the qualifying exam for specialists in orthopedic surgery. Passing the exam entitled one to join the “college” of orthopedic surgeons, and list oneself as a specialist.

“Was it all multiple choice?” I asked. “And how did they grade it?” I was thinking of my own exams. “Did they count only the right answers?”

When he said Yes to all the questions, I did not have the heart to tell him what I knew as a mathematical certainty: that the exam was, like most graduate medical exams, and large parts of legal licensing bar exams in most states, virtually a complete fraud.

The reason these tests are fraudulent—and the harder they are, the more they are fraudulent—is that for an extremely difficult test graded in that way, guessing tends to count much more than knowledge.

A simple example, taken to the extreme, will show why this is the case.

Suppose you and I take a test, and you know twice as much as I do. For simplicity (this is the extreme case) suppose the test consists of 100 questions, each True or False, and moreover (this is the key point), let us agree that the test will be graded by only counting the number right.

Naturally, both of us will guess at an answer for those questions that stump us.

Now suppose the test is very hard. As hard as it could be actually. Suppose the test is so hard that I, with lesser knowledge, can only answer one question based on actual knowledge. I answer that question, and guess at the other 99. You, who know twice as much as I, can answer two questions based on knowledge. So you guess at 98 answers.

As you can readily imagine, your edge over me is very slight. In fact, in repeated trials, I would outscore you about 44 percent of the time and tie you about 6 percent of the time, so you would come out ahead only about half the time, even though my knowledge is half that of yours.
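The odds in this extreme case can be worked out exactly rather than imagined. What follows is only a sketch under the assumptions of the example: 100 True-False questions, every unknown question guessed blindly, and only the number right counted. The names `score_distribution` and `compare` are invented here for illustration.

```python
from fractions import Fraction
from math import comb


def score_distribution(known, total=100, choices=2):
    """Exact distribution of the raw score: `known` sure answers plus blind guesses."""
    guesses = total - known
    p = Fraction(1, choices)  # chance that a single blind guess is right
    return {
        known + k: comb(guesses, k) * p**k * (1 - p) ** (guesses - k)
        for k in range(guesses + 1)
    }


def compare(known_low, known_high, total=100, choices=2):
    """Return (P(weaker scores higher), P(tie), P(stronger scores higher))."""
    low = score_distribution(known_low, total, choices)
    high = score_distribution(known_high, total, choices)
    low_wins = sum(
        pl * ph
        for sl, pl in low.items()
        for sh, ph in high.items()
        if sl > sh
    )
    tie = sum(pl * high.get(sl, 0) for sl, pl in low.items())
    return float(low_wins), float(tie), float(1 - low_wins - tie)


# I know 1 answer, you know 2, and we both guess at the rest.
low_wins, tie, high_wins = compare(1, 2)
print(f"weaker outscores: {low_wins:.3f}, tie: {tie:.3f}, stronger outscores: {high_wins:.3f}")
```

Run as written, this puts the weaker candidate ahead roughly 44 percent of the time, with a tie roughly 6 percent of the time. Changing `choices` to 4 or 5 lets you try the multiple choice version of the same test.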

I chose a True-False test for this example, but it would make no real difference were the test multiple choice, with several choices in each question. The only thing that makes a difference is how hard the test is. Your advantage would grow substantially as the test was made easier.

For further example, if the test was so easy, and you so well-versed in the subject that you could get a perfect score, and I knew half as much, I would answer 50 questions based on knowledge, and guess at 50. In the long run, I would get half of those 50 correct, for a final score of 75. So you get 100, and I get 75, on the average.

Were the test multiple choice, with four choices for each question, and your knowledge again 100 percent and mine half that, I would then (guessing at 50) get a score of 50 + (1/4 times 50), or 62.5, on the average.
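The averages in these examples follow one small formula: the answers known, plus the guessed questions divided by the number of choices. A minimal sketch, assuming blind guessing on every unknown question (the function name `expected_raw_score` is made up for illustration):

```python
def expected_raw_score(known, total=100, choices=2):
    """Average number right: sure answers plus the expected share of lucky guesses."""
    guesses = total - known
    return known + guesses / choices


# Easy test: you know everything, I know half.
print(expected_raw_score(100))            # 100.0
print(expected_raw_score(50, choices=2))  # 75.0
print(expected_raw_score(50, choices=4))  # 62.5

# Hardest test: I know 1 answer, you know 2 (True-False).
print(expected_raw_score(1))              # 50.5
print(expected_raw_score(2))              # 51.0
```

On the easy test our average scores sit 25 or more points apart. On the hardest test they sit half a point apart.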

These extreme cases demonstrate the point, that truly hard multiple choice tests, graded by counting only the number right and ignoring guessing, are fraudulent.

But suppose the grading attempts to adjust for guessing. There is no way of knowing what is in the mind of the test-taker, so the customary approach is to subtract, from the number correct, some fraction of the number wrong.

For True-False exams, for example, the number subtracted would most likely be the Number Wrong itself (in general, Number Wrong ÷ (number of choices - 1)). Let’s see how that would work out for the sample case above. You, answering two questions from knowledge and guessing at 98, would be likely, on the average, to get 49 of the guesses right and 49 wrong, and so have a final score of 2 + 49 - 49, or 2, while I, again on the average, answering only 1 from knowledge and guessing at 99, would get a final score of 1 + (99 ÷ 2) - (99 ÷ 2), which comes out to be 1. Here, on the average, the guessing washes out, and our scores reflect the two-fold difference in our actual knowledge.
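Averages under a guessing penalty can be checked with a short sketch. The function name and the `penalty` parameter are invented here for illustration. When no penalty is given, the sketch assumes the conventional correction of Number Wrong ÷ (choices - 1), which is an assumption on my part and not a rule any particular board is known to use.

```python
def expected_corrected_score(known, total=100, choices=2, penalty=None):
    """Average score when each wrong answer costs `penalty` points.

    Assumes every unknown question is guessed blindly. If `penalty` is None,
    the conventional 1 / (choices - 1) per wrong answer is assumed.
    """
    guesses = total - known
    right = known + guesses / choices
    wrong = guesses - guesses / choices
    if penalty is None:
        return right - wrong / (choices - 1)
    return right - penalty * wrong


# Hardest True-False test, full penalty: guessing cancels out on the average.
print(expected_corrected_score(2))               # 2.0
print(expected_corrected_score(1))               # 1.0

# Same test with only half a point taken off per wrong answer.
print(expected_corrected_score(2, penalty=0.5))  # 26.5
print(expected_corrected_score(1, penalty=0.5))  # 25.75
```

With the full penalty the averages come out at exactly what each of us knows. With a half-point penalty they stay within a point of each other. These are averages only, and luck on any one sitting still moves an individual score.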

The situation is only a bit more complex for multiple choice tests with four or five choices per question, and you can readily calculate the variation between the knowledgeable you and the ignorant me. As an old math teacher might say, we leave that for the reader to work out by himself or herself.