Rupert Goodwins' IQ Is Lower than ChatGPT's: Or, Yes, IQ Tests Are a Useful Measure of AI Performance
In a recent article, Rupert Goodwins dismissed the value of making AIs like ChatGPT take IQ tests. In doing so, he suggested that he might be suffering from some cognitive deficiencies of his own.
If IQ is a complete pseudoscience, then assuredly you should oppose its use in determining whether individuals qualify for the death penalty. Yet I seriously doubt that anyone who utters this bromide would support that position. At the very least, IQ tests are useful tools for diagnosing cognitive deficiency: most of the individuals who call IQ a pseudoscience not only support reduced sentencing for those who score poorly on such tests but also support disability payments and increased government services for those whom IQ tests diagnose as “mentally challenged.” If IQ really were a pseudoscience, then it certainly should not be used to determine whether people have access to special education programs and disability payments. Not even the world’s most dedicated New Ager would want astrology to determine who can receive government assistance. People who make these sorts of statements would do well to think through their real-world consequences and ask whether they would stand by all of them. Of course, some will retort that Down syndrome can be diagnosed chromosomally, but many people who receive payments because of their mentally challenged status do not have an easily diagnosed underlying disorder.
Psychometrics, which includes IQ tests as well as tests to diagnose depression, autism, etc., is a complicated field. It is not, however, a pseudoscience. The statistical findings that underpin it are far more robust, and far more replicable, than the vast majority of findings in the social sciences---including economics, which is the most successful of the social sciences despite its many crises.
It is perfectly valid to test AI against human IQ tests---especially if your goal is to find deficiencies in the AI. Most people who philosophically oppose IQ nevertheless accept that it is a valid measure of deficiency below some threshold---whether they place that point at a score of 75 or at 60.
Indeed, even if you think IQ is a pseudoscience, the fact that we have a series of tasks that humans have performed over and over again---and whose relationship to other task types has been studied in depth---makes IQ testing a useful repository of field tests. If IQ is valid, applying it to AI is certainly valid. However, the converse is not true: the fact that one thinks IQ is not a valid construct when applied to humans does not mean IQ tests are not a useful way of field testing AI. After all, the machines must be tested in some way or other---and the main criticism leveled against IQ tests, that they are culturally biased, cannot be leveled against AIs, since they do not partake in human culture (and the forms of culture they do partake in can be carefully controlled during their construction).
Intelligent people have put a great deal of effort into creating tasks meant to challenge others and into carefully collating statistics about what percentage of the population can, for better or worse, solve those tasks: why not use them to challenge and test AI? It would be wasteful not to draw on this vast repository.
Here is the link to Goodwins' article: https://www.theregister.com/2022/12/12/chatgpt_has_mastered_the_confidence/