The Flynn Effect

Book review: The Measure of All Minds: Evaluating Natural and Artificial Intelligence, by José Hernández-Orallo.

Much of this book consists of surveys of the psychometric literature. But the best parts of the book involve original results that bring more rigor and generality to the field. Those parts approach the quality that I saw in Judea Pearl’s Causality, and E.T. Jaynes’ Probability Theory, but Measure of All Minds achieves a smaller fraction of its author’s ambitions, and is sometimes poorly focused.

Hernández-Orallo has an impressive ambition: measure intelligence for any agent. The book mentions a wide variety of agents, such as normal humans, infants, deaf-blind humans, human teams, dogs, bacteria, Q-learning algorithms, etc.

The book is aimed at a narrow and fairly unusual target audience. Much of it reads like it’s directed at psychology researchers, but the more original parts of the book require thinking like a mathematician.

The survey part seems pretty comprehensive, but I wasn’t satisfied with his ability to distinguish the valuable parts (although he did a good job of ignoring the politicized rants that plague many discussions of this subject).

For nearly the first 200 pages of the book, I was mostly wondering whether the book would address anything important enough for me to want to read to the end. Then I reached an impressive part: a description of an objective IQ-like measure. Hernández-Orallo offers a test (called the C-test) which:

  • measures a well-defined concept: sequential inductive inference,
  • defines the correct responses using an objective rule (based on Kolmogorov complexity),
  • with essentially no arbitrary cultural bias (the main feature that looks like an arbitrary cultural bias is the choice of alphabet and its order)[1],
  • and gives results in objective units (based on Levin’s Kt).

Yet just when I got my hopes up for a major improvement in real-world IQ testing, he points out that what the C-test measures is too narrow to be called intelligence: there’s a 960-line Perl program that exhibits human-level performance on this kind of test, without resembling a breakthrough in AI.
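
To make the "objective units" concrete, here is a minimal sketch of the standard definition of Levin's Kt: the cost of a program is its length in bits plus the base-2 logarithm of its running time, and Kt of a string is the minimum cost over programs that output it. This is not the book's C-test procedure, which this review doesn't spell out. The function names and the candidate programs are hypothetical, and a real Kt computation would search over all programs on a fixed reference machine rather than a hand-picked list.

```python
import math


def kt_cost(program_length_bits: int, steps: int) -> float:
    """Kt cost of one program: length in bits plus log2 of its running time."""
    return program_length_bits + math.log2(steps)


def levin_kt(candidates) -> float:
    """Minimum Kt cost over (length_bits, steps) pairs.

    Assumes every candidate program outputs the same target string.
    """
    return min(kt_cost(length, steps) for length, steps in candidates)


# Hypothetical programs that all print the same sequence:
# a short but slow one, and a longer but faster one.
candidates = [(20, 1024), (24, 16)]
print(levin_kt(candidates))  # 28.0 (the longer, faster program wins)
```

The time term is what separates Kt from plain Kolmogorov complexity: a very short program that takes enormously long to run does not count as a simple explanation.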

Book review: Hive Mind: How your nation’s IQ matters so much more than your own, by Garett Jones.

Hive Mind is a solid and easy-to-read discussion of why high IQ nations are more successful than low IQ nations.

There’s a pretty clear correlation between national IQ and important results such as income. It’s harder to tell how much of the correlation is caused by IQ differences. The Flynn Effect hints that high IQ could instead be a symptom of increased wealth.

The best evidence for IQ causing wealth (more than being caused by wealth) is that Hong Kong and Taiwan had high IQs back in the 1960s, before becoming rich.

Another piece of similar evidence (which Hive Mind doesn’t point to) is that Saudi Arabia is the most conspicuous case of a country that became wealthy via luck. Its IQ is lower than that of countries of comparable wealth, and lower than that of neighbors of similar culture/genes.

Much of the book is devoted to speculations about how IQ could affect a nation’s success.

High IQ is associated with more patience, probably due to better ability to imagine the future:

Imagine two societies: one in which the future feels like a dim shadow, the other in which the future seems as real as now. Which society will have more restaurants that care about repeat customers? Which society will have more politicians who turn down bribes because they worry about eventually getting caught?

Hive Mind describes many possible causes of the Flynn Effect, without expressing much of a preference between them. Flynn’s explanation still seems strongest to me. The most plausible alternative that Hive Mind mentions is anxiety and stress from poverty-related problems distracting people during tests (and possibly also from developing abstract cognitive skills). But anxiety / stress explanations seem less likely to produce the Hong Kong/Taiwan/Saudi Arabia results.

Hive Mind talks about the importance of raising national IQ, especially in less-developed countries. That goal would be feasible if differences in IQ were mainly caused by stress or nutrition. Flynn’s cultural explanation points to causes that are harder for governments or charities to influence (how do you legislate an increased desire to think abstractly?).

What about the genetic differences that contribute to IQ differences? The technology needed to fix that contributing factor to low IQs is not ready today, but looks near enough that we should pay attention. Hive Mind implies [but avoids saying] that potentially large harm from leaving IQ unchanged could outweigh the risks of genetic engineering. Fears about genetic engineering of IQ often involve fears of competition, but Hive Mind shows that higher IQ means more cooperation. More cooperation suggests less war, less risk of dangerous nanotech arms races, etc.

It shouldn’t sound paradoxical to say that aggregate IQ matters more than individual IQ. It should start to seem ordinary if more people follow the example of Hive Mind and focus more attention on group success than on individual success as they relate to IQ.

I’d like to see more discussion of uploaded ape risks.

There is substantial disagreement over how fast an uploaded mind (em) would improve its abilities or the abilities of its progeny. I’d like to start by analyzing a scenario where it takes between one and ten years for an uploaded bonobo to achieve human-level cognitive abilities. This scenario seems plausible, although I’ve selected it more to illustrate a risk that can be mitigated than because of arguments about how likely it is.

I claim we should anticipate at least a 20% chance a human-level bonobo-derived em would improve at least as quickly as a human that uploaded later.

Considerations that weigh in favor of this are: that bonobo minds seem to be about as general-purpose as humans, including near-human language ability; and the likely ease of ems interfacing with other software will enable them to learn new skills faster than biological minds will.

The most concrete evidence that weighs against this is the modest correlation between IQ and brain size. It’s somewhat plausible that it’s hard to usefully add many neurons to an existing mind, and that bonobo brain size represents an important cognitive constraint.

I’m not happy about analyzing what happens when another species develops more powerful cognitive abilities than humans, so I’d prefer to have some humans upload before the bonobos become superhuman.

A few people worry that uploading a mouse brain will generate enough understanding of intelligence to quickly produce human-level AGI. I doubt that biological intelligence is simple / intelligible enough for that to work. So I focus more on small tweaks: the kind of social pressures which caused the Flynn Effect in humans, selective breeding (in the sense of making many copies of the smartest ems, with small changes to some copies), and faster software/hardware.

The risks seem dependent on the environment in which the ems live and on the incentives that might drive their owners to improve em abilities. The most obvious motives for uploading bonobos (research into problems affecting humans, and into human uploading) create only weak incentives to improve the ems. But there are many other possibilities: military use, interesting NPCs, or financial companies looking for interesting patterns in large databases. No single one of those looks especially likely, but with many ways for things to go wrong, the risks add up.
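
As a rough illustration of how small risks add up, the sketch below computes the chance that at least one of several independent events happens. The probabilities are hypothetical placeholders, not estimates of the scenarios above, and independence is a simplifying assumption.

```python
import math


def prob_at_least_one(probabilities) -> float:
    """P(at least one event occurs), assuming the events are independent."""
    return 1.0 - math.prod(1.0 - p for p in probabilities)


# Hypothetical, purely illustrative numbers: three separate ways for
# things to go wrong, each with a 5% chance.
combined = prob_at_least_one([0.05, 0.05, 0.05])
print(f"{combined:.3f}")  # 0.143
```

Three unlikely paths at 5% each already give a combined risk of about 14%, which is the sense in which no single motive needs to look especially likely.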

What could cause a long window between bonobo uploading and human uploading? Ethical and legal barriers to human uploading, motivated by risks to the humans being uploaded and by concerns about human ems driving human wages down.

What could we do about this risk?

Political activism may mitigate the risks of hostility to human uploading, but if done carelessly it could create a backlash which worsens the problem.

Conceivably safety regulations could restrict em ownership/use to people with little incentive to improve the ems, but rules that looked promising would still leave me worried about risks such as irresponsible people hacking into computers that run ems and stealing copies.

A more sophisticated approach is to improve the incentives to upload humans. I expect the timing of the first human uploads to be fairly sensitive to whether we have legal rules which enable us to predict who will own em labor. But just writing clear rules isn’t enough – how can we ensure political support for them at a time when we should expect disputes over whether ems are people?

We could also find ways to delay ape uploading. But most ways of doing that would also delay human uploading, which creates tradeoffs that I’m not too happy with (partly due to my desire to upload before aging damages me too much).

If a delay between bonobo and human uploading is dangerous, then we should also ask about dangers from other uploaded species. My intuition says the risks are much lower, since it seems like there are few technical obstacles to uploading a bonobo brain shortly after uploading mice or other small vertebrates.

But I get the impression that many people associated with MIRI worry about risks of uploaded mice, and I don’t have strong evidence that I’m wiser than they are. I encourage people to develop better analyses of this issue.

Book review: The Beginning of Infinity by David Deutsch.

This is an ambitious book centered around the nature of explanation, why it has been an important part of science (misunderstood by many who think of science as merely prediction), and why it is important for the future of the universe.

He provides good insights on the jump during the Enlightenment to thinking in universals (e.g. laws of nature that apply to a potentially infinite scope). But he overstates some of its implications. He seems confident that greater-than-human intelligences will view his concept of “universal explainers” as the category that identifies which beings have the rights of people. I find this about as convincing as attempts to find a specific time when a fetus acquires the rights of personhood. I can imagine AIs deciding that humans fail often enough at universalizing their thought to be less than a person, or deciding that monkeys are on a trajectory toward the same kind of universality.

He neglects to mention some interesting evidence of the spread of universal thinking – James Flynn’s explanation of the Flynn Effect documents that low IQ cultures don’t use the abstract thought that we sometimes take for granted, and describes IQ increases as an escape from concrete thinking.

Deutsch has a number of interesting complaints about people who attempt science but are confused about the philosophy of science, such as people who imagine that measuring heritability of a trait tells us something important without further inquiry – he notes that being enslaved was heritable in 1860, but that was useless for telling us how to change slavery.

He has interesting explanations for why anthropic arguments, the simulation argument, and the doomsday argument are weaker in a spatially infinite universe. But I was disappointed that he didn’t provide good references for his claim that the universe is infinite – a claim which I gather is controversial and hasn’t gotten as much attention as it deserves.

He sometimes gets carried away with his ambition and seems to forget his rule that explanations should be hard to vary in order to make it hard to fool ourselves.

He focuses on the beauty of flowers in an attempt to convince us that beauty is partially objective. But he doesn’t describe this objective beauty in a way that would make it hard to alter to fit whatever evidence he wants it to fit. I see an obvious alternative explanation for humans finding flowers beautiful – they indicate where fruit will be.

He argues that creativity evolved to help people find better ways of faithfully transmitting knowledge (understanding someone can require creative interpretation of the knowledge that they are imperfectly expressing). That might be true, but I can easily create other explanations that fit the evidence he’s trying to explain, such as that creativity enabled people to make better choices about when to seek a new home.

He imagines that he has a simple way to demonstrate that hunter-gatherer societies could not have lived in a golden age (the lack of growth of their knowledge):

Since static societies cannot exist without effectively extinguishing the growth of knowledge, they cannot allow their members much opportunity to pursue happiness.

But that requires implausible assumptions such as that happiness depends more on the pursuit of knowledge than on the availability of sex. And it’s not clear that hunter-gatherer societies were stable – they may have been just a few mistakes away from extinction, and accumulating knowledge faster than any previous species had. (I think Deutsch lives in a better society than hunter-gatherers did, but it would take a complex argument to show that the average person today does.)

But I generally enjoyed his arguments even when I thought they were wrong.

See also the review in the New York Times.

Book review: Create Your Own Economy: The Path to Prosperity in a Disordered World by Tyler Cowen.

This somewhat misleadingly titled book is mainly about the benefits of neurodiversity and how changing technology is changing our styles of thought, and how we ought to improve our styles of thought.

His perspective on these subjects usually reflects a unique way of ordering his thoughts about the world. Few things he says seem particularly profound, but he persistently provides new ways to frame our understanding of the human mind that will sometimes yield better insights than conventional ways of looking at these subjects. Even if you think you know a good deal about autism, he’ll illuminate some problems with your stereotypes of autistics.

Even though it is marketed as an economics book, it only has about one page about financial matters, but that page is an eloquent summary of two factors that are important causes of our recent problems.

He’s an extreme example of an infovore who processes more information than most people can imagine. E.g. “Usually a blog will fail if the blogger doesn’t post … at least every weekday.” His idea of failure must be quite different from mine, as I more often stop reading a blog because it has too many posts than because it goes a few weeks without a post.

One interesting tidbit hints that healthcare costs might be high because telling patients their treatment was expensive may enhance the placebo effect, much like charging more for a given bottle of wine makes it taste better.

The book’s footnotes aren’t as specific as I would like, and sometimes leave me wondering whether he’s engaging in wild speculation or reporting careful research. His conjecture that “self-aware autistics are especially likely to be cosmopolitans in their thinking” sounds like something that results partly from the selection biases that come from knowing more autistics who like economics than autistics who hate economics. I wish he’d indicated whether he found a way to avoid that bias.

This review by Cosma Shalizi of James Flynn’s book What Is Intelligence? provides some interesting criticisms of Flynn (while agreeing with much of what Flynn says).

Shalizi’s most important argument is that Flynn and others who attach a good deal of importance to g haven’t made much of an argument that it measures a single phenomenon.

After a century of IQ testing, there is still no theory which says which questions belong on an intelligence test, just correlational analyses and tradition.

Flynn and others have good arguments that whatever g measures is important. But Shalizi leaves me with the impression that the only way to decide whether it’s a single phenomenon is to compare its usefulness to models which describe multiple flavors of intelligence. So far those attempts that I’ve looked at seem underwhelming. Maybe that means trying to break down intelligence into components which deserve separate measures isn’t fruitful, but it might also mean that the people who might do a good job of it have been scared away by the political controversies over IQ.

HT Kenny Easwaran.

Book review: What is Intelligence?: Beyond the Flynn Effect by James Flynn
This book may not be the final word on the Flynn Effect, but it makes enough progress in that direction that it is no longer reasonable to describe the Flynn Effect as a mystery. I’m surprised at how much Flynn has changed since the last essay of his I’ve read (a somewhat underwhelming chapter in The Rising Curve (edited by Ulric Neisser)).
Flynn presents evidence of very divergent trends in subsets of IQ tests, and describes a good hypothesis about how that divergence might be explained by increasing cultural pressure for abstract, scientific thought that could create increasing effort to develop certain kinds of cognitive skills that were less important in prior societies.
This helps explain the puzzle of why the Flynn Effect doesn’t imply that 19th century society consisted primarily of retarded people – there has been relatively little change in how people handle concrete problems that constituted the main challenges to average people then. He presents an interesting example of how to observe cognitive differences between modern U.S. society and societies that are very isolated, showing big differences in how they handle some abstract questions.
He also explains why we see very different results for IQ differences over time from what we see when using tests such as twin studies to observe the effects of changes in environment on IQ: the twin studies test unimportant things such as different parenting styles, but don’t test major cultural changes that distinguish the 19th century from today.
None of this suggests that the concept of g is unimportant or refers to something unreal, but a strong focus on g has helped blind some people to the ideas that are needed to understand the Flynn Effect.
Flynn also reports that the rise in IQs is, at least by some measures, fairly uniform across the entire range of IQs (contrary to The Bell Curve’s report that it appeared to affect mainly the low end of the IQ spectrum). This weakens one of the obvious criticisms of David Friedman’s conjecture that modern obstetrics caused the Flynn Effect by reducing the birth related obstacles to large skulls (although if that were the main cause of the Flynn Effect, I’d expect the IQ increase to be largest at the high end of the IQ spectrum).
It also weakens the inference I drew from Fogel’s book on malnutrition. Flynn does little to directly address Fogel’s argument that the benefits of improved nutrition show up with longer delays than most people realize, but he does report some evidence that the Flynn Effect continues even when the height increases that Fogel relies on to measure the benefits of nutrition stop.
Flynn reports that the Flynn Effect has probably stopped in Scandinavia but hasn’t shown signs of stopping in the U.S. His comments on the future of IQ gains are unimpressive.
There are a few disappointing parts of the book near the end where he wanders into political issues where he has relatively little expertise, and his relatively ordinary opinions are no better than a typical academic discussion of politics. In spite of that, the book is fairly short and can be read quickly.
One interesting experiment that Flynn discusses tested whether students preferred one dollar now or two dollars next week. The results were twice as useful in predicting their grades as IQ tests. Flynn infers that this is a test of self control. I presume that is part of what it tests, but I wonder whether it also tests whether the students were able to realize that the testers’ word could be trusted (due to a better ability to analyze the relevant incentives? or due to a general willingness to trust strangers, because the ways they had met people selected for trustworthy ones?).
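
A quick sketch of the arithmetic behind that choice, assuming simple exponential discounting and full trust that the two dollars will be paid (the function names are my own, not Flynn's):

```python
def breakeven_weekly_discount_factor(now: float, later: float) -> float:
    """Weekly discount factor at which `now` today equals `later` in one week."""
    return now / later


def implied_weekly_rate(now: float, later: float) -> float:
    """Weekly rate of return offered by waiting for `later` instead of taking `now`."""
    return later / now - 1.0


delta = breakeven_weekly_discount_factor(1.0, 2.0)
rate = implied_weekly_rate(1.0, 2.0)
print(delta, rate)  # 0.5 1.0
```

A student who takes the dollar now is acting as if next week's money is worth less than half its face value, i.e. turning down a 100% weekly return. A discount that steep is hard to explain by impatience alone, which is one reason to suspect that doubt about actually being paid contributes to the result.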

Book review: A Farewell to Alms: A Brief Economic History of the World by Gregory Clark
This book provides very interesting descriptions of the Malthusian era, and a surprising explanation of how parts of the world escaped Malthusian conditions starting around 1800. The process involved centuries of wealthier people outreproducing the poor, and passing on traits/culture which were better adapted to modern living. This process almost certainly made some contribution to the industrial revolution, but I can’t find a plausible way to guess the magnitude of the contribution. Clark is not the kind of author I trust to evaluate that magnitude.
His arguments against other explanations of the industrial revolution are unconvincing. His criticisms of institutional explanations imply at most that those explanations are incomplete. But combining those explanations with a normal belief that knowledge/technology matters produces a model against which his criticisms are ineffective. (See Bryan Caplan for more detailed replies about institutional explanations).
He makes interesting claims about how differently we should think about the effects in a Malthusian world of phenomena that would be obviously bad today. E.g. he thinks the black plague had good long-term effects. He made me rethink those effects, but he only convinced me that the effects weren’t as bad as commonly believed. His confidence that they were good depends on some unlikely quantitative assumptions about benefits of increased income per capita, and he seems oblivious to the numerous problems with evaluating these assumptions. His comments in the last few pages of the book about how little average happiness has changed over time lead me to doubt that his beliefs are consistent on this subject.
While many parts of the book appear at first glance to be painting a very unpleasant picture of the Malthusian era, he ends up concluding it wasn’t a particularly bad era, and he describes people as being farther from starvation than Robert Fogel indicates in The Escape from Hunger and Premature Death, 1700-2100. Their ability to reach somewhat different conclusions by looking at different sets of evidence implies that there’s more uncertainty than they admit.
He does a neat job of pointing out that economists have often overstated the comparative advantage argument against concerns that labor will be replaced by machines: horses were a clear example of laborers who suffered massive unemployment a century ago when the value of their labor dropped below the cost of their food.

Book Review: Race, Evolution, and Behavior: A Life History Perspective by J. Philippe Rushton
Rushton has a plausible theory that some human populations are more k-selected than others. He presents lots of marginal-quality evidence, but that’s no substitute for what he should be able to show if his theory is true.
Much of the book is devoted to evidence about IQs and brain sizes, but he fails to provide much of an argument for his belief that k-selected humans ought to have higher intelligence. It’s easy to imagine that it might work that way. But I can come up with an alternative based on the sexual selection theory in Geoffrey Miller’s book The Mating Mind that seems about as plausible: r-selected humans have more of their reproductive fitness determined by success at competition for mates (as opposed to k-selected humans for whom child support has a higher contribution to reproductive fitness). Since The Mating Mind presents a strong argument that human intelligence evolved largely due to such competition for mates, it is easy to imagine that r-selected humans had stronger selection for the kind of social intelligence needed to compete for mates. Note that this theory suggests the intelligence of k-selected humans might be easier to measure via standardized tests than that of r-selected humans.
Rushton’s analysis of the genetic aspects of IQ makes the usual mistake of failing to do much to control for the effects of motivation on IQ scores (see pages 249-251 of Judith Rich Harris’s book The Nurture Assumption for evidence that this matters for Rushton’s goals).
He also devotes a good deal of space to evidence such as crime rates where it’s very hard to distinguish genetic from cultural differences, and there’s no reason to think he has succeeded in controlling for culture here.
Rushton mentions a number of other traits that are more directly connected to degree of k-selection and less likely to be culturally biased. It’s disappointing that he provides little evidence of the quality of the data he uses. The twinning data seem most interesting to me, as the high twin rates of the supposedly r-selected population follow quite clearly from his theory, it’s hard to come up with alternative theories that would explain such twinning rates, and the numbers he gives look surprisingly different from random noise. But Rushton says so little about these data that I can’t have much confidence that they come from representative samples of people. (He failed to detect problems with the widely used UN data on African AIDS rates, which have recently been shown to have been strongly biased by poor sampling methods, so it’s easy to imagine that he uses equally flawed data for more obscure differences). (Aside – the book’s index is poor enough that page 214, which is where he lists most of his references for the twinning data, is not listed under the entry for twins/twinning).
Rushton occasionally produces some interesting but irrelevant tidbits, such as that Darwin “affirmed human unity” by ending the debate over whether all humanity had a common origin, or that there’s evidence that “introverts are more punctual, absent less often, and stay longer at a job”.
Edward M. Miller has a theory that is similar to but slightly more convincing than Rushton’s in a paper titled Paternal Provisioning versus Mate Seeking in Human Populations.