Book review: The Measure of All Minds: Evaluating Natural and Artificial Intelligence, by José Hernández-Orallo.
Much of this book consists of surveys of the psychometric literature. But the best parts involve original results that bring more rigor and generality to the field. Those parts approach the quality that I saw in Judea Pearl’s Causality and E.T. Jaynes’ Probability Theory, but Measure of All Minds achieves a smaller fraction of its author’s ambitions, and is sometimes poorly focused.
Hernández-Orallo has an impressive ambition: measure intelligence for any agent. The book mentions a wide variety of agents, such as normal humans, infants, deaf-blind humans, human teams, dogs, bacteria, Q-learning algorithms, etc.
The book is aimed at a narrow and fairly unusual target audience. Much of it reads like it’s directed at psychology researchers, but the more original parts of the book require thinking like a mathematician.
The survey part seems pretty comprehensive, but I wasn’t satisfied with his ability to distinguish the valuable parts (although he did a good job of ignoring the politicized rants that plague many discussions of this subject).
For nearly the first 200 pages of the book, I was mostly wondering whether the book would address anything important enough for me to want to read to the end. Then I reached an impressive part: a description of an objective IQ-like measure. Hernández-Orallo offers a test (called the C-test) which:
- measures a well-defined concept: sequential inductive inference,
- defines the correct responses using an objective rule (based on Kolmogorov complexity),
- has essentially no arbitrary cultural bias (the main feature that looks like an arbitrary cultural bias is the choice of alphabet and its order),
- and gives results in objective units (based on Levin’s Kt).
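For reference, the two complexity notions named in the list are usually written as below. These are the standard textbook definitions, not necessarily the book's exact notation: $U$ is a fixed universal machine, $|p|$ is the length of program $p$, and $t(p)$ is its running time (all three symbols are chosen here for illustration).

```latex
K_U(x) = \min_{p} \{\, |p| : U(p) = x \,\}
\qquad
Kt_U(x) = \min_{p} \{\, |p| + \log_2 t(p) : U(p) = x \,\}
```

Unlike $K$, Levin's $Kt$ is computable, because the time penalty lets a search bound how long each candidate program is run. That is presumably part of why it can serve as a unit of difficulty.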
Yet just when I got my hopes up for a major improvement in real-world IQ testing, he points out that what the C-test measures is too narrow to be called intelligence: there’s a 960-line Perl program that exhibits human-level performance on this kind of test, without resembling a breakthrough in AI.
Book review: Superforecasting: The Art and Science of Prediction, by Philip E. Tetlock and Dan Gardner.
This book reports on the Good Judgment Project (GJP).
Much of the book recycles old ideas: 40% of the book is a rerun of Thinking Fast and Slow, 15% of the book repeats Wisdom of Crowds, and 15% of the book rehashes How to Measure Anything. Those three books were good enough that it’s very hard to improve on them. Superforecasting nearly matches their quality, but most people ought to read those three books instead. (Anyone who still wants more after reading them will get decent value out of reading the last 4 or 5 chapters of Superforecasting.)
The book is very readable, written in an almost Gladwell-like style (a large contrast to Tetlock’s previous, more scholarly book), at a moderate cost in substance. It contains memorable phrases, such as “a fox with the bulging eyes of a dragonfly” (to describe looking at the world through many perspectives).
Book review: Notes on a New Philosophy of Empirical Science (Draft Version), by Daniel Burfoot.
Standard views of science focus on comparing theories by finding examples where they make differing predictions, and rejecting the theory that made worse predictions.
Burfoot describes a better view of science, called the Compression Rate Method (CRM), which replaces the “make prediction” step with “make a compression program”, and compares theories by how much they compress a standard (large) database.
These views of science produce mostly equivalent results(!), but CRM provides a better perspective.
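The comparison step of CRM can be sketched in a few lines. This is a minimal illustration, not Burfoot's own tooling: the general-purpose codecs from the Python standard library stand in for competing "theories", the repeated sentence is a made-up database, and `program_size` is a hypothetical parameter for the compressor's own length (left at 0 here, although a real comparison would need to count it).

```python
import bz2
import lzma
import zlib


def compression_score(compress, data: bytes, program_size: int = 0) -> int:
    """Total bytes needed: compressed data plus the size of the compressor itself."""
    return len(compress(data)) + program_size


def rank_theories(theories: dict, data: bytes) -> list:
    """Return (name, score) pairs sorted from best (smallest) to worst."""
    scores = {name: compression_score(fn, data) for name, fn in theories.items()}
    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical stand-in for a large shared benchmark database.
    database = b"the quick brown fox jumps over the lazy dog. " * 2000
    theories = {
        "no theory (raw copy)": lambda d: d,
        "zlib": zlib.compress,
        "bz2": bz2.compress,
        "lzma": lzma.compress,
    }
    for name, score in rank_theories(theories, database):
        print(f"{name:>22}: {score} bytes")
```

A theory that captures more of the database's real structure yields a smaller total, which is the sense in which compression rate stands in for predictive accuracy.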
Machine Learning (ML) is potentially science, and this book focuses on how ML will be improved by viewing its problems through the lens of CRM. Burfoot complains about the toolkit mentality of traditional ML research, arguing that the CRM approach will turn ML into an empirical science.
This should generate a Kuhnian paradigm shift in ML, with more objective measures of research quality than any branch of science has achieved so far.
Burfoot focuses on compression as encoding empirical knowledge of specific databases / domains. He rejects the standard goal of a general-purpose compression tool. Instead, he proposes creating compression algorithms that are specialized for each type of database, to reflect what we know about topics (such as images of cars) that are important to us.
Book review: The Human Advantage: A New Understanding of How Our Brain Became Remarkable, by Suzana Herculano-Houzel.
I used to be uneasy about claims that the human brain was special because it is large for our body size: relative size just didn’t seem like it could be the best measure of whatever enabled intelligence.
At last, Herculano-Houzel has invented a replacement for that measure. Her impressive technique for measuring the number of neurons in a brain has revolutionized this area of science.
We can now see an important connection between the number of cortical neurons and cognitive ability. I’m glad that the book reports on research that compares the cognitive abilities of enough species to enable moderately objective tests of the relevant hypotheses (although the research still has much room for improvement).
We can also see that the primate brain is special, in a way that enables large primates to be smarter than similarly sized nonprimates. And that humans are not very special for a primate of our size, although energy constraints make it tricky for primates to reach our size.
I was able to read the book quite quickly. Much of it is arranged in an occasionally suspenseful story about how the research was done. It doesn’t have lots of information, but the information it does have seems very new (except for the last two chapters, where Herculano-Houzel gets farther from her area of expertise).
Wikipedia has a List of animals by number of neurons which lists the long-finned pilot whale as having 37.2 billion cortical neurons, versus 21 billion for humans.
The paper reporting that result disagrees somewhat with Herculano-Houzel:
“Our results underscore that correlations between cognitive performance and absolute neocortical neuron numbers across animal orders or classes are of limited value, and attempts to quantify the mental capacity of a dolphin for cross-species comparisons are bound to be controversial.”
But I don’t see much of an argument against the correlation between intelligence and cortical neuron numbers. The lack of good evidence about long-finned pilot whale intelligence mainly implies we ought to be uncertain.
The Quantified Self 2013 Global Conference attracted many interesting people.
There were lots of new devices to measure the usual things more easily or to integrate multiple kinds of data.
Airo is an ambitious attempt to detect a wide variety of things, including food via sensing metabolites.
TellSpec plans to detect food nutrients and allergens through Raman spectroscopy.
OMsignal has a t-shirt with embedded sensors.
The M1nd should enable users to find more connections and spurious correlations between electromagnetic fields and health.
iOS is becoming a more important platform for trendy tools. As an Android user who wants to stick to devices with a large screen and traditional keyboard, I feel a bit left out.
The Human Locomotome Project is an ambitious attempt to produce an accurate and easy-to-measure biomarker of aging, using accelerometer data from devices such as FitBit. They’re measuring something that was previously not well measured, but there doesn’t appear to be any easy way to tell whether that information is valuable.
The hug brigade that was at last year’s conference (led by Paul Grasshoff?) was missing this year.
Attempts to attract a critical mass to the QS Forum seem to be having little effect.
Book review: How to Measure Anything, by Douglas Hubbard.
I procrastinated about reading this book because it appeared to be only relevant to a narrow type of business problem. But it is much more ambitious, and aims to convince us that anything that matters can be measured. It should be a good antidote to people who give up on measuring important values on grounds such as it’s too hard or too subjective (i.e. it teaches people to do Fermi estimates).
A key part of this is to use a sensible definition of the word measurement:
“A quantitatively expressed reduction of uncertainty based on one or more observations”
He urges us to focus on figuring out what observations are most valuable, because there are large variations in the value of different pieces of information. If we focus on valuable observations, the first few observations are much more valuable than subsequent ones.
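The diminishing value of later observations can be illustrated with a textbook model. This is a sketch under stated assumptions, not Hubbard's own method: the quantity is treated as normally distributed with a known spread `sigma` (a hypothetical value below), and uncertainty is summarized as the half-width of an approximate 90% interval for the mean, which shrinks in proportion to one over the square root of the number of observations.

```python
import math


def interval_half_width(sigma: float, n: int, z: float = 1.645) -> float:
    """Half-width of an approximate 90% interval for a mean after n observations."""
    return z * sigma / math.sqrt(n)


def marginal_reduction(sigma: float, n: int) -> float:
    """How much the half-width shrinks when going from n to n + 1 observations."""
    return interval_half_width(sigma, n) - interval_half_width(sigma, n + 1)


if __name__ == "__main__":
    sigma = 10.0  # hypothetical spread of the quantity being measured
    for n in (1, 2, 5, 10, 50, 100):
        width = interval_half_width(sigma, n)
        gain = marginal_reduction(sigma, n)
        print(f"n={n:>3}  half-width={width:6.2f}  gain from one more={gain:6.3f}")
```

Under this model the second observation removes far more uncertainty than the hundred-and-first, which matches the claim that the first few observations carry most of the value.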
He emphasizes the importance of calibration training which, in addition to combating overconfidence, makes it hard for people to claim they don’t know how to assign numbers to possible observations.
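A calibration check of the kind such training relies on can be sketched as follows. The function names and the sample answers are hypothetical, and the exercise format (stating a 90% interval for each trivia-style question) is an assumption about how the training is run rather than a quote from the book.

```python
def hit_rate(intervals):
    """Fraction of (low, high, truth) triples whose interval contains the truth."""
    hits = sum(1 for low, high, truth in intervals if low <= truth <= high)
    return hits / len(intervals)


def calibration_gap(intervals, stated_confidence=0.9):
    """Positive gap means overconfidence: fewer hits than the stated confidence."""
    return stated_confidence - hit_rate(intervals)


if __name__ == "__main__":
    # Hypothetical answers: each entry is (low, high, true value) for a 90% interval.
    answers = [
        (1900, 1950, 1912), (10, 50, 72), (100, 300, 206), (5, 15, 8),
        (1000, 5000, 6650), (20, 40, 37), (1, 3, 5), (50, 90, 63),
        (200, 400, 451), (30, 60, 46),
    ]
    print(f"hit rate: {hit_rate(answers):.2f}")
    print(f"gap vs. stated 90%: {calibration_gap(answers):.2f}")
```

A well-calibrated estimator would see roughly nine of ten intervals contain the true value; a large positive gap is the overconfidence that the training is meant to remove.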
He succeeds in convincing me that anything that matters to a business can be measured. There are a few goals for which his approach doesn’t seem useful (e.g. going to heaven), but they’re rarer than our intuition tells us. Even vague-sounding concepts such as customer satisfaction can either be observed (possibly with large errors) via customer behavior or surveys, or they don’t matter.
It will help me avoid the temptation of making Quantified-Self type measurements to show off how good I am at quantifying things, and focus instead on being proud to get valuable information out of a minimal number of observations.
The recent Quantified Self conference was my first QS event, and was one of the best conferences I’ve attended.
I had been hesitant to attend QS events because they seem to attract large crowds, where I usually find it harder to be social. But this conference was arranged so that there was no real center where crowds gathered, so people spread out into smaller groups where I found it easier to join a conversation.
Kevin Kelly called this “The Measured Century”. People still underestimate how much improved measurement contributed to the industrial revolution. If we’re seeing a much larger improvement in measurement, people will likely underestimate the importance of that for quite a while.
The conference had many more ideas than I had time to hear, and I still need to evaluate many of the ideas I did hear. Here are a few:
I finally got around to looking at DIYgenomics, and have signed up for their empathy study (not too impressive so far) and their microbiome study (probiotics) which is waiting for more people before starting.
LUMOback looks like it will be an easy way to improve my posture. The initial version will require a device I don’t have, but it sounds like they’ll have an Android version sometime next year.
Steve Fowkes’ talk about urine pH testing sounds worth trying out.
The most interesting talk at the Singularity Summit 2010 was Shane Legg’s description of an Algorithmic Intelligence Quotient (AIQ) test that measures something intelligence-like automatically in a way that can test AI programs (or at least the Monte-Carlo AIXI that he uses) on 1000+ environments.
He had a mathematical formula which he thinks rigorously defines intelligence. But he didn’t specify what he meant by the set of possible environments, saying that would be a 50 page paper (he said a good deal of the work on the test had been done last week, so presumably he’s still working on the project). He also included a term that applies Occam’s razor which I didn’t completely understand, but it seems likely that that should have a fairly non-controversial effect.
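The formula was presumably along the lines of the universal intelligence measure that Legg published with Marcus Hutter. The slide is not reproduced in this post, so what follows is that published definition rather than a transcription of the talk:

```latex
\Upsilon(\pi) = \sum_{\mu \in E} 2^{-K(\mu)} \, V_{\mu}^{\pi}
```

Here $\pi$ is the agent being scored, $E$ is the set of environments, $K(\mu)$ is the Kolmogorov complexity of environment $\mu$, and $V_{\mu}^{\pi}$ is the expected total reward that $\pi$ earns in $\mu$. If this is the formula in question, the weight $2^{-K(\mu)}$ would be the Occam's razor term: simpler environments count for more.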
The environments sound like they imitate individual questions on an IQ test, but with a much wider range of difficulties. We need a more complete description of the set of environments he uses in order to evaluate whether they’re heavily biased toward what Monte-Carlo AIXI does well or whether they closely resemble the environments an AI will find in the real world. He described two reasons for having some confidence in his set of environments: different subsets provided roughly similar results, and a human taking a small subset of the test found some environments easy, some very challenging, and some too hard to understand.
It sounds like with a few more months’ worth of effort, he could generate a series of results that show a trend in the AIQ of the best AI program in any given year, and also the AIQ of some smart humans (although he implied it would take a long time for a human to complete a test). That would give us some idea of whether AI workers have been making steady progress, and if so when the trend is likely to cross human AIQ levels. An educated guess about when AI will have a major impact on the world should help a bit in preparing for it.
A more disturbing possibility is that this test will be used as a fitness function for genetic programming. Given sufficient computing power, that looks likely to generate superhuman intelligence that is almost certainly unfriendly to humans. I’m confident that sufficient computing power is not available yet, but my confidence will decline over time.
Brian Wang has a few more notes on this talk.
Book review: Freakonomics: A Rogue Economist Explores the Hidden Side of Everything, by Steven D. Levitt and Stephen J. Dubner.
This book does a pretty good job of tackling subjects that are worth thinking about but which few would think to tackle. Their answers are interesting but not always as rigorous as I hoped.
The implication that this is an economics book is a bit misleading. While it is occasionally guided by the principle that incentives matter, it is at least as much about Kelvinism (the belief that we ought to quantify knowledge whenever possible), and some of the book consists of stories which have little to do with either.
My favorite parts of the book explore the extent to which experts’ incentives cause them to pursue goals that don’t coincide with their clients’ interests. But his arguments about realtors exaggerate the extent of their conflict of interest – it is likely that part of the reason they are quick to sell a client’s house more cheaply than they would sell their own is that the realtor is less likely to need to sell by a deadline.
I am left puzzled by the claim that crack gang leaders want to avoid gang wars, but gang members are rewarded for starting violence with promotions. Who’s controlling the rewards if it isn’t the leader? Why can’t he ensure that members who engage in nondefensive violence aren’t promoted?