Rationality Quotient

Book review: The Rationality Quotient: Toward a Test of Rational Thinking, by Keith E. Stanovich, Richard F. West and Maggie E. Toplak.

This book describes an important approach to measuring individual rationality: an RQ test that loosely resembles an IQ test. But it pays inadequate attention to the most important problems with tests of rationality.

Coachability

My biggest concern about rationality testing is what happens when people anticipate the test and are motivated to maximize their scores (as is the case with IQ tests). Do they:

learn to score high by “cheating” (i.e. learn what answers the test wants, without learning to apply that knowledge outside of the test)?
learn to score high by becoming more rational?
not change their score much, because they’re already motivated to do as well as their aptitudes allow (as is mostly the case with IQ tests)?

Alas, the book treats these issues as an afterthought. The authors’ test knowingly uses questions for which cheating would be straightforward, such as asking whether the test subject believes in science, and whether they prefer to get $85 now rather than $100 in three months. (If the test used real money, that would drastically reduce my concerns about cheating. I’m almost tempted to advocate doing that, but it would hinder widespread adoption of the test, even if using real money added enough value to pay for itself.)
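
To see why the $85-versus-$100 question is so easy to game, here’s a quick sketch of the discount rate implied by taking the $85. I’m assuming annual compounding and treating three months as a quarter of a year. The function name is my own, and I’m not reproducing the book’s actual scoring key for this question.

```python
def implied_annual_rate(amount_now: float, amount_later: float, years: float) -> float:
    """Annual discount rate at which amount_now and amount_later are equally attractive.

    Assumes annual compounding: amount_now * (1 + r) ** years == amount_later.
    """
    if amount_now <= 0 or amount_later <= 0 or years <= 0:
        raise ValueError("amounts and delay must be positive")
    return (amount_later / amount_now) ** (1 / years) - 1


# Preferring $85 now over $100 in three months implies discounting the future
# at a rate above this threshold.
rate = implied_annual_rate(85, 100, 0.25)
print(f"{rate:.1%}")
```

That comes out to roughly 92% a year, far above any ordinary interest rate. So anyone who knows what the test wants will pick the $100, and with hypothetical money nothing stops them from doing so regardless of their real preferences.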

There will be difficult issues over coachability even if the test is carefully designed to minimize cheating.

For example, given the scoring rules described in the book, the knowledge calibration subtest requires a modest amount of rationality in order to score well: anyone who can deliberately be underconfident can score perfectly. That may be hard enough that many people will score poorly even if they’re well coached.

It would be simple to change the scoring so as to penalize underconfidence as well as overconfidence, in which case a high score would require a pretty rational evaluation of one’s confidence. If I were using the test to hire the most rational employees, I’d certainly want this improvement, so that I could distinguish the top 1% from the rest of the top 10%. I’d guess that scoring as described in the book would give perfect scores to the most rational 5 or 10% of test takers if they knew what to expect.
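
Here’s a minimal sketch of the difference, assuming a simple calibration score based on the gap between average stated confidence and the fraction of answers that were correct. The book’s actual formula may differ; the names Answer, one_sided_score and two_sided_score, and the 0-to-1 score scale, are my own hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    confidence: float  # stated probability of being correct, 0.5 to 1.0
    correct: bool


def confidence_gap(answers: list[Answer]) -> float:
    """Mean stated confidence minus proportion correct.

    Positive means overconfident, negative means underconfident.
    """
    n = len(answers)
    mean_conf = sum(a.confidence for a in answers) / n
    accuracy = sum(a.correct for a in answers) / n
    return mean_conf - accuracy


def one_sided_score(answers: list[Answer]) -> float:
    """Hypothetical score that only penalizes overconfidence (1.0 = perfect)."""
    return 1.0 - max(0.0, confidence_gap(answers))


def two_sided_score(answers: list[Answer]) -> float:
    """Hypothetical score that penalizes over- and underconfidence equally."""
    return 1.0 - abs(confidence_gap(answers))


# Hypothetical respondents who each get 8 of 10 questions right.
results = [True] * 8 + [False] * 2
honest = [Answer(0.8, c) for c in results]
sandbagger = [Answer(0.5, c) for c in results]  # deliberately underconfident
braggart = [Answer(1.0, c) for c in results]  # overconfident

for name, answers in [("honest", honest), ("sandbagger", sandbagger), ("braggart", braggart)]:
    print(name, round(one_sided_score(answers), 2), round(two_sided_score(answers), 2))
```

The honest respondent gets a perfect score either way, and the overconfident one is penalized either way. The sandbagger gets a perfect one-sided score but only about 0.7 on the two-sided one. A mean-gap score can still be gamed by mixing over- and underconfident answers, so a serious test would want something finer-grained, such as comparing confidence to accuracy within confidence bins, or a proper scoring rule like the Brier score.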

Even with a really good calibration test, there’s plenty of uncertainty about how well it generalizes to real-world problems. Being well calibrated requires both an understanding of how to evaluate one’s confidence and a desire to be well calibrated. A nontrivial part of why overconfidence is common is that it’s rewarded. So I expect some people to avoid overconfidence while taking a test such as this, yet eagerly embrace it in other contexts.

The probability matching questions look easy to coach for. Learning to answer them rationally would cause a very real increase in potentially valuable knowledge, and it seems quite hard to predict how well that would transfer to real-world situations.
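
For readers who haven’t seen such questions, here’s the arithmetic behind the standard textbook version, which I’m assuming the book’s items resemble: one outcome happens with probability p on each independent trial, and you predict each trial in advance. Probability matching means predicting that outcome a fraction p of the time; the rational strategy is to always predict the more likely outcome. The function names are hypothetical, and the simulation is just a sanity check on the formulas.

```python
import random


def matching_accuracy(p: float) -> float:
    """Expected hit rate when predictions copy the outcome frequencies."""
    # Hit when prediction and outcome are both A (p * p) or both B.
    return p * p + (1 - p) * (1 - p)


def maximizing_accuracy(p: float) -> float:
    """Expected hit rate when always predicting the more likely outcome."""
    return max(p, 1 - p)


def simulate(p: float, strategy: str, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of a strategy's hit rate."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        outcome_a = rng.random() < p
        if strategy == "match":
            guess_a = rng.random() < p  # guess A with probability p
        elif strategy == "maximize":
            guess_a = p >= 0.5  # always guess the likelier outcome
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        hits += int(guess_a == outcome_a)
    return hits / trials


p = 0.7
print(matching_accuracy(p), simulate(p, "match"))
print(maximizing_accuracy(p), simulate(p, "maximize"))
```

With p = 0.7, matching is right about 58% of the time versus 70% for always picking the likelier outcome. The rule is simple enough to teach in a few minutes, which is part of why I expect these questions to be easy to coach for.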

On the other hand, the anchoring subtest looks fairly hard to coach for.

Other limitations

The authors ended up with four different versions of their test, with typical completion times ranging from under 30 minutes to several hours. Much of the book discusses the longest version. But that version is clearly less valuable than the shorter ones. The costs of administering it are enough to make it relatively unimportant, but my main complaint is that it’s more heavily weighted toward culturally biased questions and questions that are easier to cheat on.

The test length seems to result from a strong desire to be comprehensive. I saw no clear explanation of why being comprehensive is a valuable enough goal to justify the costs of long tests.

Another important shortcoming of the test is that it’s more culture-dependent than I’d want. Understanding the term “no-load mutual fund” may be good evidence of rationality in some important contexts. But the authors don’t seem to have a coherent idea of how it will add value to the test. My guess is that the evidence it provides for genuine rationality is moderately redundant with the rest of the test, and that a large part of the non-redundant information that it adds will be about cultural circumstances that have only a modest connection to rationality.

The conspiracy theory subtest rewards subjects for strongly rejecting conspiracy claims. That will end up measuring some weird mixture of: (1) overconfidence; (2) use of scientific reasoning; and (3) loyalty to pro-science subcultures. They could easily have avoided rewarding overconfidence, by scoring weak disagreement with conspiracy claims at least as highly as they score strong disagreement [1]. But I suspect that even with that improvement, it wouldn’t do as well as some of the other subtests at measuring the rationality failures that underlie typical conspiracy beliefs.
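
A sketch of what that scoring change could look like, assuming a six-point agree/disagree scale. The response labels, the function names and the cap value are hypothetical, not taken from the book.

```python
# Hypothetical six-point response scale, from most to least agreement.
RESPONSES = [
    "strongly agree",
    "moderately agree",
    "slightly agree",
    "slightly disagree",
    "moderately disagree",
    "strongly disagree",
]


def monotone_score(response: str) -> int:
    """Stronger disagreement always scores higher (1 to 6)."""
    return RESPONSES.index(response) + 1


def capped_score(response: str, cap: int = 4) -> int:
    """Any disagreement earns the top score, so extra confidence gains nothing."""
    return min(monotone_score(response), cap)


for r in RESPONSES:
    print(f"{r:20} monotone={monotone_score(r)} capped={capped_score(r)}")
```

Under the capped key, “slightly disagree” and “strongly disagree” earn the same score, so a respondent gains nothing by expressing more confidence than the evidence supports, while agreement is still penalized.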

Even with highly optimistic assumptions about how well those questions are designed, the “correct” response will be harder to objectively determine than with IQ test questions. IQ tests focus on fairly artificial domains in order to minimize disputes about which response is correct. The RQ test has many questions whose quality is similar to the quality of IQ tests. But the RQ test also includes a number of questions which can’t be answered so rigorously.

The argument evaluation subtest has potential ideological biases in how it is scored. It involves evaluating how strong several political arguments are. The “correct” answers were selected by a not-very-diverse-looking set of experts. The answers have enough subjectivity built into them that it would be easy for the experts’ political opinions to influence their choices. This isn’t a very serious problem if the current versions of the RQ test are treated as prototypes. And this subtest is only included in the (flawed) long version of the RQ test. But it would have been reassuring if the authors had expressed some concern for the rigor of this subtest.

What does it accomplish?

The book shows evidence that RQ scores increase with years in college, but the authors suggest that might just be due to selection effects. One possibility that I thought of is that scores tend to increase with age. Yet in a book that is filled with reports of mostly unimportant correlations, they don’t seem to mention how RQ scores correlate with age (except to note that subjects from Mechanical Turk were much older than student subjects, and scored higher – that’s likely influenced by selection effects).

The authors are clearer at explaining what their goals aren’t (e.g. making a better test of intelligence) than at describing what does motivate them.

I was fairly confused about how much they think they’ve accomplished, until almost the end of the book. I recommend reading chapter 14 before reading about the individual components, in order to reduce that confusion.

The authors claim that their test should be more credible with the general public than IQ tests, because they “included measures saturated with specific knowledge domains that are relevant to everyday life”. That seems naive: including such measures makes the test more likely to be criticized for cultural bias, and it doesn’t eliminate the aspects that make IQ tests controversial.

I expect any rationality test to become controversial when used outside of academic research. Many people won’t want to be ranked by their rationality. Better tests of rationality will cause those people to become more hostile.

Conclusion

Psychology seems ripe for a Kuhnian paradigm shift. This book seems heavily focused on the old paradigm, with a strategy of finding as many real phenomena as possible, and paying little attention to their importance. Roughly half of the research that went into the book could form part of a research program under a better paradigm in which researchers care more about distinguishing important effects from trivial ones.

The book is mostly written to appeal to fairly specialized researchers (much more so than Stanovich’s prior books), and is inadequately focused on the value of information. In spite of that, it represents important progress.

I hope it will catalyze more research into rationality testing by people with more ambition for promoting rationality training and/or for helping employers to hire the best workers.

Excessively Long, Trollish(?) Footnote

[1]

One example is:

The pharmaceutical industry has conspired with the medical industry to fabricate new diseases in order to make money.

I agree with the authors that there’s something wrong with strongly agreeing with that claim. But I object to scoring an answer of “strongly disagree” as more rational than “disagree moderately”.

Part of my objection is that it’s hard to translate that into something clear enough that it could be evaluated at all rigorously. Does “fabricate” mean create new pathogens, or new disease categories?

Testing for conspiracy beliefs seems to naturally tempt test designers to create ambiguous language, because precise reference to a specific crackpot theory might exclude many similar theories (there are many more crazy models of the world than there are sane ones).

I’ll analyze the “new disease category” interpretation.

It’s fairly easy to find evidence that the medical system has been influenced by drug companies’ incentives.

There are plenty of things that have been classified as diseases, but which many people consider to be not a disease:

homosexuality
Hypoactive sexual desire disorder: similar to homosexuality, except they found a drug for it?
Drapetomania and Dysaesthesia aethiopica: ok, maybe I’m exaggerating a bit here – for all I know it could have been mainly one doctor using these classifications.
conjoined twins: often treated as a disease (although I don’t see much discussion of that classification), in spite of the fact that most people with the condition don’t consider it a disease, and don’t want it “cured”.
hyperlipidemia: maybe this condition contributes to heart disease, but after massive study by the statin industry, the scientific literature still treats the subject as debatable (beware that the linked article is somewhat one-sided).

When the medical system admits it was wrong about one of these, it doesn’t appear to adopt a general rule to prevent itself from creating other fake diseases. So it’s unclear to me why we should expect it to have stopped creating them.

The motives for creating fake diseases seem sufficiently complex that I don’t expect a clear-cut case of creating one in order for drug companies to profit, but I won’t be very shocked if I do find a case where that looks like an important motive.

So I’d consider expressing weak opinions about this conspiracy theory to be more rational than any strong opinion.

Bayesian Investor Blog

Ramblings of a somewhat libertarian stock market speculator