Idea Futures

Book review: Superforecasting: The Art and Science of Prediction, by Philip E. Tetlock and Dan Gardner.

This book reports on the Good Judgment Project (GJP).

Much of the book recycles old ideas: 40% of the book is a rerun of Thinking Fast and Slow, 15% of the book repeats Wisdom of Crowds, and 15% of the book rehashes How to Measure Anything. Those three books were good enough that it’s very hard to improve on them. Superforecasting nearly matches their quality, but most people ought to read those three books instead. (Anyone who still wants more after reading them will get decent value out of reading the last 4 or 5 chapters of Superforecasting.)

The book is very readable, with an almost Gladwell-like style (a large contrast to Tetlock’s previous, more scholarly book), at a moderate cost in substance. It contains memorable phrases, such as “a fox with the bulging eyes of a dragonfly” (to describe looking at the world through many perspectives).


Automated market-making software agents have been used in many prediction markets to deal with problems of low liquidity.

The simplest versions provide a fixed amount of liquidity. This either causes excessive liquidity when trading starts, or too little later.

For instance, in the first year that I participated in the Good Judgment Project, the market maker provided enough liquidity that there was lots of money to be made pushing the market maker price from its initial setting in a somewhat obvious direction toward the market consensus. That meant much of the reward provided by the market maker went to low-value information.

The next year, the market maker provided less liquidity, so the prices moved more readily to a crude estimate of the traders’ beliefs. But then there wasn’t enough liquidity for traders to have an incentive to refine that estimate.

One suggested improvement is to have liquidity increase with increasing trading volume.

I present some sample Python code below (inspired by equation 18.44 in E.T. Jaynes’ Probability Theory) which uses the prices at which traders have traded against the market maker to generate probability-like estimates of how likely a price is to reflect the current consensus of traders.

This works more like human market makers, in that it provides the most liquidity near prices where there’s been the most trading. If the market settles near one price, liquidity rises. When the market is not trading near prices of prior trades (due to lack of trading or news that causes a significant price change), liquidity is low and prices can change more easily.

I assume that the possible prices a market maker can trade at are integers from 1 through 99 (percent).

When traders are pushing the price in one direction, this is taken as evidence that increases the weight assigned to the most recent price and all others farther in that direction. When traders reverse the direction, that is taken as evidence that increases the weight of the two most recent trade prices.

The resulting weights (p_px in the code) are fractions which should be multiplied by the maximum number of contracts the market maker is willing to offer when liquidity ought to be highest. There is one weight for each price at which the market maker might position itself (yes, there will actually be two prices; maybe the two weights ought to be averaged).

There is still room for improvement in this approach, such as giving less weight to old trades after the market acts like it has responded to news. But implementers should test simple improvements before worrying about finding the optimal rules.

trades = [(1, 51), (1, 52), (1, 53), (-1, 52), (1, 53), (-1, 52), (1, 53), (-1, 52), (1, 53), (-1, 52),]
p_px = {}
num_agree = {}

probability_list = range(1, 100)
num_probabilities = len(probability_list)

for i in probability_list:
    p_px[i] = 1.0/num_probabilities
    num_agree[i] = 0

num_trades = 0
last_trade = 0
for (buy, price) in trades: # test on a set of made-up trades
    num_trades += 1
    for i in probability_list:
        if last_trade * buy < 0: # change of direction
            # a reversal supports the two most recent trade prices
            if buy < 0 and (i == price or i == price+1):
                num_agree[i] += 1
            if buy > 0 and (i == price or i == price-1):
                num_agree[i] += 1
        else: # same direction as the previous trade (or the first trade)
            # supports this price and all prices farther in that direction
            if buy < 0 and i <= price:
                num_agree[i] += 1
            if buy > 0 and i >= price:
                num_agree[i] += 1
        p_px[i] = (num_agree[i] + 1.0)/(num_trades + num_probabilities)
    last_trade = buy

for i in probability_list:
    print(i, num_agree[i], '%.3f' % p_px[i])
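
As a minimal sketch of how the weights might be turned into quantities, the snippet below restates the loop above as a function (treating the reversal and same-direction cases as alternatives, as the description says) and scales the result. The names trade_weights, offer_size, and max_contracts, and the choice to average the weights at the two quoted prices, are illustrative assumptions rather than a tested design.

```python
def trade_weights(trades, low=1, high=99):
    """Return {price: weight} from (direction, price) trades.

    direction is assumed to be exactly +1 (buy) or -1 (sell).
    """
    prices = range(low, high + 1)
    num_agree = {i: 0 for i in prices}
    last_trade = 0
    for buy, price in trades:
        for i in prices:
            if last_trade * buy < 0:
                # reversal: credit the two most recent trade prices
                if i == price or i == price - buy:
                    num_agree[i] += 1
            elif (buy > 0 and i >= price) or (buy < 0 and i <= price):
                # same direction: credit this price and all prices beyond it
                num_agree[i] += 1
        last_trade = buy
    total = len(trades) + len(prices)
    return {i: (num_agree[i] + 1.0) / total for i in prices}


def offer_size(weights, bid, ask, max_contracts):
    """Contracts to offer at a bid/ask pair: average of the two weights times the cap."""
    return int(round(max_contracts * (weights[bid] + weights[ask]) / 2.0))


# Same made-up trades as in the listing above.
trades = [(1, 51), (1, 52), (1, 53), (-1, 52), (1, 53), (-1, 52),
          (1, 53), (-1, 52), (1, 53), (-1, 52)]
weights = trade_weights(trades)
# max_contracts=1000 is a hypothetical cap chosen only for illustration.
print(offer_size(weights, 52, 53, max_contracts=1000))
```

Since the weights sum to roughly one across all 99 prices, any single weight is small; the cap should be chosen with that in mind.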

The CFTC is suing Intrade for apparently allowing U.S. residents to trade contracts on gold, unemployment rates and a few others that it had agreed to prevent U.S. residents from trading. The CFTC is apparently not commenting on whether Intrade’s political contracts violate any laws.

U.S. traders will need to close our accounts.

The email I got says

In the near future we’ll announce plans for a new exchange model that will allow legal participation from all jurisdictions – including the US.

(no statement about whether it will involve real money, which suggests that it won’t).

I had already been considering closing my account because of the hassle of figuring out my Intrade income for tax purposes.

Book review: The Signal and the Noise: Why So Many Predictions Fail-but Some Don’t by Nate Silver.

This is a well-written book about the challenges associated with making predictions. But nearly all the ideas in it were ones I was already familiar with.

I agree with nearly everything the book says. But I’ll mention two small disagreements.

He claims that 0 and 100 percent are probabilities. Many Bayesians dispute that. He has a logically consistent interpretation and doesn’t claim it’s ever sane to believe something with probability 0 or 100 percent, so I’m not sure the difference matters, but rejecting the idea that those can represent probabilities seems at least like a simpler way of avoiding mistakes.

When pointing out the weak correlation between calorie consumption and obesity, he says he doesn’t know of an “obesity skeptics” community that would be comparable to the global warming skeptics. In fact there are people (e.g. Dave Asprey) who deny that excess calories cause obesity (with better tests than the global warming skeptics).

It would make sense to read this book instead of alternatives such as Moneyball and Tetlock’s Expert Political Judgment, but if you’ve been reading books in this area already this one won’t seem important.

[See here and here for some context.]

John Salvatier has drawn my attention to a paper describing A Practical Liquidity-Sensitive Automated Market Maker [pdf] which fixes some of the drawbacks of the Automated Market Maker that Robin Hanson proposed.

Most importantly, it provides a good chance that the market maker makes money in roughly the manner that a profit-oriented human market maker would.

It starts out by providing a small amount of liquidity, and increases the amount of liquidity it provides as it profits from providing liquidity. This allows markets to initially make large moves in response to a small amount of trading volume, and then as a trading range develops that reflects agreement among traders, it takes increasingly large amounts of money to move the price.
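
A minimal sketch of that behavior, assuming the paper's cost function has the form C(q) = b(q) * log(sum_i exp(q_i / b(q))) with b(q) = alpha * sum_i q_i, where q_i is the number of shares outstanding on outcome i. The function names and the value of alpha are my own illustrative choices, not taken from the paper.

```python
import math


def liquidity(q, alpha):
    """b(q) = alpha * (total shares outstanding); assumes sum(q) > 0."""
    return alpha * sum(q)


def cost(q, alpha):
    """C(q) = b(q) * log(sum_i exp(q_i / b(q))), computed in a numerically stable way."""
    b = liquidity(q, alpha)
    m = max(q)
    return m + b * math.log(sum(math.exp((qi - m) / b) for qi in q))


def average_price(q, outcome, shares, alpha):
    """Average price per share paid to buy `shares` of `outcome` starting from state q."""
    new_q = list(q)
    new_q[outcome] += shares
    return (cost(new_q, alpha) - cost(q, alpha)) / shares


alpha = 0.05  # illustrative value, not taken from the paper
thin = average_price([10.0, 10.0], 0, 10.0, alpha)      # few shares outstanding
deep = average_price([1000.0, 1000.0], 0, 10.0, alpha)  # many shares outstanding
print("thin market: %.3f  deep market: %.3f" % (thin, deep))
```

In this sketch the same 10-share purchase pushes the price much further when few shares are outstanding than when many are.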

A disadvantage of following this approach is that it provides little reward to being one of the first traders. If traders need to do a fair amount of research to evaluate the contract being traded, it may be that nobody is willing to inform himself without an expectation that trading volume will become significant. Robin Hanson’s version of the market maker is designed to subsidize this research. If we can predict that several traders will actively trade the contract without a clear-cut subsidy, then the liquidity-sensitive version of the market maker is likely to be appropriate. If we can predict that a subsidy is needed to generate trading activity, then the best approach is likely to be some combination of the two versions. The difficulty of predicting how much subsidy is needed to generate trading volume leaves much uncertainty.

[Updated 2010-07-01:
I’ve reread the paper more carefully in response to John’s question, and I see I was confused by the reference to “a variable b(q) that increases with market volume”. It seems that it is almost unrelated to what I think of as market volume, and is probably better described as related to the market maker’s holdings.

That means that the subsidy is less concentrated on later trading than I originally thought. If the first trader moves the price most of the way to the final price, he gets most of the subsidy. If the first trader is hesitant and wants to see that other traders don’t quickly find information that causes them to bet much against the first trader, then the first trader probably gets a good deal less subsidy under the new algorithm. The latter comes closer to describing how I approach trading on an Intrade contract where I’m the first to place orders.

I also wonder about the paper’s goal of preserving path independence. It seems to provide some mathematical elegance, but I suspect the market maker can do better if it is allowed to make a profit if the market cycles back to a prior state.]

Some comments on last weekend’s Foresight Conference:

At lunch on Sunday I was in a group dominated by a discussion between Robin Hanson and Eliezer Yudkowsky over the relative plausibility of new intelligences having a variety of different goal systems versus a single goal system (as in a society of uploads versus Friendly AI). Some of the debate focused on how unified existing minds are, with Eliezer claiming that dogs mostly don’t have conflicting desires in different parts of their minds, and Robin and others claiming such conflicts are common (e.g. when deciding whether to eat food the dog has been told not to eat).

One test Eliezer suggested for the power of systems with a unified goal system is that if Robin were right, bacteria would have outcompeted humans. That got me wondering whether there’s an appropriate criterion by which humans can be said to have outcompeted bacteria. The most obvious criterion on which humans and bacteria are trying to compete is how many copies of their DNA exist. Using biomass as a proxy, bacteria are winning by several orders of magnitude. Another possible criterion is impact on large-scale features of Earth. Humans have not yet done anything that seems as big as the catastrophic changes to the atmosphere (“the oxygen crisis”) produced by bacteria. Am I overlooking other appropriate criteria?

Kartik Gada described two humanitarian innovation prizes that bear some resemblance to a valuable approach to helping the world’s poorest billion people, but will be hard to turn into something with a reasonable chance of success. The Water Liberation Prize would be pretty hard to judge. Suppose I submit a water filter that I claim qualifies for the prize. How will the judges test the drinkability of the water and the reusability of the filter under common third world conditions (which I suspect vary a lot and which probably won’t be adequately duplicated where the judges live)? Will they ship sample devices to a number of third world locations and ask whether it produces water that tastes good, or will they do rigorous tests of water safety? With a hoped for prize of $50,000, I doubt they can afford very good tests. The Personal Manufacturing Prizes seem somewhat more carefully thought out, but need some revision. The “three different materials” criterion is not enough to rule out overly specialized devices without some clear guidelines about which differences are important and which are trivial. Setting specific award dates appears to assume an implausible ability to predict how soon such a device will become feasible. The possibility that some parts of the device are patented is tricky to handle, as it isn’t cheap to verify the absence of crippling patents.

There was a debate on futarchy between Robin Hanson and Mencius Moldbug. Moldbug’s argument seems to boil down to the absence of a guarantee that futarchy will avoid problems related to manipulation/conflicts of interest. It’s unclear whether he thinks his preferred form of government would guarantee any solution to those problems, and he rejects empirical tests that might compare the extent of those problems under the alternative systems. Still, Moldbug concedes enough that it should be possible to incorporate most of the value of futarchy within his preferred form of government without rejecting his views. He wants to limit trading to the equivalent of the government’s stockholders. Accepting that limitation isn’t likely to impair the markets much, and may make futarchy more palatable to people who share Moldbug’s superstitions about markets.

I once proposed using life expectancy as the primary indicator of what society should try to maximize.

Recently there have been reports that life expectancy is negatively correlated with standard measures of economic growth. I accept the conclusion that depressions and recessions are less harmful than is commonly believed, but I want to point out the dangers of looking at only the life expectancy in the same year as an event that influences life expectancy. Depressions may have harmful effects that take a decade to show up in life expectancy figures (e.g. long-term wealth effects, effects on willingness to wage war, etc.). So I’d like to see how life expectancy averaged over the ensuing 10 or 15 years correlates with a year’s GDP change.

Book review: The Soulful Science: What Economists Really Do and Why It Matters by Diane Coyle.

This book provides a nice overview of economic theory, with an emphasis on how it has been changing recently. The style is eloquent, but the author is too nerdy to appeal to as wide an audience as she hopes. How many critics of economics will put up with quips such as “my Hamiltonian is bigger than yours!”?

The most thought-provoking part of the book, where she argues that economics has a soul, convinced me she’s rather confused about why economics makes people uncomfortable.
One of her few good analogies mentions the similarities between critics of evolution and critics of economics. I wished she had learned more about the motives of her critics from this. Both sciences disturb people because their soulless autistic features destroy cherished illusions.
Evolutionary theory tells us that the world is crueler than we want it to be, and weakens beliefs about humans having something special and immaterial that makes us noble.
Likewise, economics tells us that people aren’t as altruistic as we want them to be, and encourages a mechanistic view of people that interferes with attempts to see mystical virtues in humans.

Some of her defenses of mainstream economics from “post-autistic” criticism deal with archaic uses of the word autistic (abnormal subjectivity, acceptance of fantasy). These disputes seem to be a disorganized mix of good and bad criticisms of mainstream economics that don’t suggest any wholesale rejection of mainstream economics. It’s the uses of autistic that resemble modern medical uses of the term that generate important debates.

She repeats the misleading claim that Malthusian gloom caused Carlyle to call economics the dismal science. This suggests she hasn’t studied critics of economics as well as she thinks. Carlyle’s real reason (defending racism from an assault by economists) shows the benefits of economists’ autistic tendencies. Economists’ mechanistic models and lack of empathy for slaveowners foster a worldview in which having different rules for slaves seemed unnatural (even to economists who viewed slaves as subhuman).

I just happened to run across this thought from an economist describing his autistic child: “his utter inability to comprehend why Jackie Robinson wasn’t welcomed by every major league team”.

She tries to address specific complaints about what economists teach without seeing a broad enough picture to see when those are just symptoms of a broader pattern of discomfort. Hardly anyone criticizes physics courses that teach Newtonian mechanics for their less-accurate-than-Einstein simplifications. When people criticize economics for simplifications in ways that resemble creationists’ complaints about simplifications made in teaching evolution, it seems unwise (and autistic) to avoid modeling deeper reasons that would explain the broad pattern of complaints.
She points to all the effort that economists devote to analyzing empirical data as evidence that economists are in touch with the real world. I’ll bet that analyzing people as numbers confirms critics’ suspicions about how cold and mechanistic economists are.

She seems overconfident about the influence economists have had on monetary and antitrust policies. Anyone familiar with public choice economics would look harder for signs that the agencies in question aren’t following economists’ advice as carefully as they want economists to think.

I’m puzzled by this claim:

The straightforward policy implication [of happiness research] is that to increase national well-being, more people need to have more sex. This doesn’t sound like a reasonable economic policy prescription

She provides no explanation of why we shouldn’t conclude that sex should replace some other leisure activities. It’s not obvious that there are policies which would accomplish this goal, but it sure looks like economists aren’t paying as much attention to this issue as they ought to.

She appears wrong when she claims that it’s reasonable to assume prediction market traders are risk neutral, and that that is sufficient to make prediction market prices reflect probabilities. Anyone interested in this should instead follow her reference to Manski’s discussion and see the response by Justin Wolfers and Eric Zitzewitz.

A number of people have compared the final forecasts for the election (e.g. this), but I’m more interested in longer term forecasting, so I’m comparing the state-by-state predictions of Intrade and FiveThirtyEight on the dates for which I saved FiveThirtyEight data a month or more before the election.

Here is a table of states where Intrade disagreed with FiveThirtyEight on one of the first four dates for which I saved FiveThirtyEight data or where they were both wrong on July 24. The numbers are probability of a Democrat winning the state’s electoral votes, with the Intrade forecast first and the FiveThirtyEight forecast second.

State  2008-07-24  2008-08-22  2008-09-14  2008-10-01
CO     71/68       60/53       54.5/46     67.5/84
FL     42/29       34.5/28     30/14       55.2/70
IN     38/26       34.1/15     20/11       38/51
MO     50/26       32.9/13     22.1/11     42.5/48
NC     30/22       25/21       14/7        51/50
NV     51.2/49     49/45       44.9/32     55/66
OH     65/53       50/38       40/29       53.5/68
VA     60.5/50     52.3/36     42/22       59/79

On July 24, both sites predicted Florida, Indiana, and North Carolina wrong. FiveThirtyEight got Indiana right on Oct 1 when Intrade was still wrong, but Intrade got North Carolina right on that date (just barely) while FiveThirtyEight rated it a toss-up.
Intrade got Nevada right on July 24 (just barely) while FiveThirtyEight got it wrong (just barely).
For Virginia, Intrade was right in July and August while FiveThirtyEight was undecided and then wrong.
FiveThirtyEight got Colorado wrong on September 14, but Intrade didn’t.
FiveThirtyEight got Ohio wrong on August 22, while Intrade got it right.
Intrade rated Missouri a toss-up on July 24, while FiveThirtyEight got it right.

On September 14, FiveThirtyEight was fooled by McCain’s post-convention bounce by a larger margin than Intrade, but by Oct 1 FiveThirtyEight was more confident about correcting those errors.
For states that were not closely contested, there were numerous examples where Intrade prices were closer to 50 than FiveThirtyEight’s. It’s likely that this represents long-shot bias on Intrade.

In sum, Intrade made slightly better forecasts for the closely contested states through at least mid September, but after that FiveThirtyEight was at least as good and more decisive. Except for Intrade’s Missouri forecast on July 24, the errors seem largely due to underestimating the effects of economic problems – errors which were also widespread in most forecasts for other things affected by the recession.
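
One way to put numbers on this comparison is a Brier score (the mean squared error of the probability forecasts; lower is better) over the states in the table. The sketch below is only illustrative: the outcomes are inferred from the comments above (the Democrat won every listed state except Missouri), and eight hand-picked states are too few to support strong conclusions.

```python
# Probability (percent) of a Democratic win: (Intrade, FiveThirtyEight) for each date.
dates = ["2008-07-24", "2008-08-22", "2008-09-14", "2008-10-01"]
forecasts = {
    "CO": [(71, 68), (60, 53), (54.5, 46), (67.5, 84)],
    "FL": [(42, 29), (34.5, 28), (30, 14), (55.2, 70)],
    "IN": [(38, 26), (34.1, 15), (20, 11), (38, 51)],
    "MO": [(50, 26), (32.9, 13), (22.1, 11), (42.5, 48)],
    "NC": [(30, 22), (25, 21), (14, 7), (51, 50)],
    "NV": [(51.2, 49), (49, 45), (44.9, 32), (55, 66)],
    "OH": [(65, 53), (50, 38), (40, 29), (53.5, 68)],
    "VA": [(60.5, 50), (52.3, 36), (42, 22), (59, 79)],
}
# 1 = Democrat won the state, 0 = Republican won (inferred from the commentary above).
outcome = {"CO": 1, "FL": 1, "IN": 1, "MO": 0, "NC": 1, "NV": 1, "OH": 1, "VA": 1}


def brier(date_index, source_index):
    """Mean squared error of one source's forecasts on one date (0 = Intrade, 1 = FiveThirtyEight)."""
    errors = [
        (forecasts[state][date_index][source_index] / 100.0 - outcome[state]) ** 2
        for state in forecasts
    ]
    return sum(errors) / len(errors)


scores = {d: (brier(k, 0), brier(k, 1)) for k, d in enumerate(dates)}
for d in dates:
    print("%s  Intrade %.3f  FiveThirtyEight %.3f" % (d, scores[d][0], scores[d][1]))
```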

In the Senate races, I didn’t save FiveThirtyEight forecasts from before November 1. It looks like both Intrade and FiveThirtyEight made similar errors on the Alaska and Minnesota races.
[Update on 2009-01-13: contrary to initial reports, they apparently got the Alaska and Minnesota races right, although there’s still some doubt about Minnesota.]

On the other hand, Intrade had been fairly consistently (but not confidently) saying since early July that California’s Proposition 8 (banning same-sex marriage) would be defeated. Pollsters as a group did a somewhat better job there by issuing conflicting reports.