This morning, I was directed to an online test that claimed to be able to estimate the size of one’s vocabulary. I did take the test, but I also spotted several problems with it:
- It’s self-selecting. People who value large vocabularies are more likely to take it, and more likely to pass it along to friends.
- There’s no extrinsic verification. People are expected to check off words based on the Honor System. More on this later.
Entertaining, it is. Scientifically valid in its own right, not so much, although it’s not entirely useless, either.
One of their demographic questions asks for the respondent’s SAT score. They note on their blog that there’s a strong correlation between SAT verbal scores and the results of the test. They take this to mean that the SAT verbal test is valid.
However, that’s not necessarily true. Let’s say that my “accurate” score on this test is 35,000 words; that is, the score I would get if there were some way to tap into my brain, bypass my various filters, and find out what I really know.
What are some reasons why I would score lower than that? Two that come immediately to mind:
- I have an unrealistically high threshold for “knowing” a definition. Being able to use it in a sentence or pick out a definition from a list is not sufficient; I expect myself to be able to phrase my own definition.
- I’m risk-averse. I don’t want to go out on a limb and risk being “called on it” later.
What are some reasons why I would score higher than that? Pretty much, the complements of the points above:
- I have an unrealistically low threshold for “knowing” a definition. I’ve seen the word around, I think I know what it means, there’s no double check, and so on.
- I’m over-confident. I’m willing to give myself credit for knowing things I don’t really know.
So, okay, what about the correlation with the SATs?
Simple enough. Standardized tests of any sort measure two things, commingled in a way that makes them nigh impossible to disentangle: mastery of the topic being tested, and the test subject’s test-taking strategies. The latter includes one’s ability to override natural inhibitions and go with “gut instinct.” Logically, at least, risk-averse people do more poorly on standardized tests because they talk themselves out of answers.
It wouldn’t be surprising for two different standardized tests of vocabulary to show similar results, and that similarity wouldn’t necessarily mean that either test measured vocabulary accurately.
However, the correlation with the SATs points to something very interesting, albeit something that would need rigorous scientific exploration to confirm it and gauge its impact. If we handed a student an SAT test and said, “Just tell us which ones you think you’d get right, and that’s the score you’ll get,” students wouldn’t be honest, because there’s a clear, extrinsic reward for lying. In this case, there’s no real extrinsic reward beyond bragging rights, and there’s no confirmation, as there is with the SAT. The correlation suggests that removing the extrinsic threat of being formally graded leads people to be fairly honest about how well they’d perform on the test if they were being graded.
To that point, it doesn’t particularly matter whether they’d do well because of mastery or because of test-taking skills. What’s of the most value is that even if you don’t (formally) grade people, you can get a sense of their knowledge.
Reflecting on this, I thought of the difference between ACT/SAT style questions and MTTC math endorsement questions. For instance, consider these questions:
- Farmer Joe has 200 feet of fencing, and wants to make a corral. To maximize the area, he’s building the corral along the straight stone wall on his property line. Which of these is the largest area he can build?
- Farmer Joe has 200 feet of fencing, and wants to make a corral. To maximize the area, he’s building the corral along the straight stone wall on his property line. Which of these is the formula for the largest area he can build?
The first is the sort of question I recall from standardized tests like the ACT or the SAT. The second is the sort of question that appears on the MTTC, Michigan’s test for teacher certification. When I first started studying for the MTTC, I was astounded: the second sort of question is easier.
It later occurred to me, though, that the second question is basically testing what needs to be tested: Can you set up the problem correctly? Whether you get the correct answer is somewhat ancillary to that. You can set up a problem incorrectly and accidentally come up with the right answer, and you can set up a problem correctly and accidentally come up with the wrong answer.
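For concreteness, here’s a sketch of the setup both questions are driving at, under the usual reading of the problem (a rectangular corral with the stone wall as one side, so the 200 feet of fencing covers only the other three sides):

```latex
% Let x be the depth of the corral (the two sides perpendicular to the wall);
% the side parallel to the wall then uses the remaining fencing, 200 - 2x.
\[ A(x) = x(200 - 2x) = 200x - 2x^2 \]
% A downward-opening parabola with its vertex at x = 50, so the maximum area is
\[ A(50) = 50 \cdot (200 - 2 \cdot 50) = 50 \cdot 100 = 5000 \text{ square feet.} \]
```

The second question stops at the line defining A(x); the first requires carrying the arithmetic through to 5,000 square feet.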
On a formative level, there’s a yet more primitive question to ask. Starting with the assumptions in the questions above, students could be asked, “Could you figure out the answer?” And, beyond that, “What information do you need to have to answer this question?”
At this point, these ideas are merely rattling around in my brain. However, the correlation between SAT scores and self-reported “yeah, I know that word” scores is indeed intriguing. It opens up possibilities for measuring student mastery that don’t necessarily involve careful grading of student work.