Artificial Intelligence

Book review: Superintelligence: Paths, Dangers, Strategies, by Nick Bostrom.

This book is substantially more thoughtful than previous books on AGI risk, and substantially better organized than the previous thoughtful writings on the subject.

Bostrom’s discussion of AGI takeoff speed is disappointingly philosophical. Many sources (most recently CFAR) have told me to rely on the outside view to forecast how long something will take. We’ve got lots of weak evidence about the nature of intelligence, how it evolved, and about how various kinds of software improve, providing data for an outside view. Bostrom assigns a vague but implausibly high probability to AI going from human-equivalent to more powerful than humanity as a whole in days, with little thought of this kind of empirical check.

I’ll discuss this more in a separate post which is more about the general AI foom debate than about this book.

Bostrom’s discussion of how takeoff speed influences the chance of a winner-take-all scenario makes it clear that disagreements over takeoff speed are pretty much the only cause of my disagreement with him over the likelihood of a winner-take-all outcome. Other writers aren’t as clear about this. I suspect those who assign substantial probability to a winner-take-all outcome even if takeoff is slow will wish he’d analyzed this in more detail.

I’m less optimistic than Bostrom about monitoring AGI progress. He says “it would not be too difficult to identify most capable individuals with a long-standing interest in [AGI] research”. AGI might require enough expertise for that to be true, but if AGI surprises me by only needing modest new insights, I’m concerned by the precedent of Tim Berners-Lee creating a global hypertext system while barely being noticed by the “leading” researchers in that field. Also, the large number of people who mistakenly think they’ve been making progress on AGI may obscure the competent ones.

He seems confused about the long-term trends in AI researcher beliefs about the risks: “The pioneers of artificial intelligence … mostly did not contemplate the possibility of greater-than-human AI” seems implausible; it’s much more likely they expected it but were either overconfident about it producing good results or fatalistic about preventing bad results (“If we’re lucky, they might decide to keep us as pets” – Marvin Minsky, LIFE Nov 20, 1970).

The best parts of the book clarify many issues related to ensuring that an AGI does what we want.

He catalogs more approaches to controlling AGI than I had previously considered, including tripwires, oracles, and genies, and clearly explains many limits to what they can accomplish.

He briefly mentions the risk that the operator of an oracle AI would misuse it for her personal advantage. Why should we have less concern about the designers of other types of AGI giving them goals that favor the designers?

If an oracle AI can’t produce a result that humans can analyze well enough to decide (without trusting the AI) that it’s safe, why would we expect other approaches (e.g. humans writing the equivalent seed AI directly) to be more feasible?

He covers a wide range of ways we can imagine handling AI goals, including strange ideas such as telling an AGI to use the motivations of superintelligences created by other civilizations.

He does a very good job of discussing what values we should and shouldn’t install in an AGI: the best decision theory plus a “do what I mean” dynamic, but not a complete morality.

I’m somewhat concerned by his use of “final goal” without careful explanation. People who anthropomorphise goals are likely to misread at least the first few references to “final goal” as if it worked like a human goal, i.e. something that the AI might want to modify if it conflicted with other goals.

It’s not clear how much these chapters depend on a winner-take-all scenario. I get the impression that Bostrom doubts we can do much about the risks associated with scenarios where multiple AGIs become superhuman. This seems strange to me. I want people who write about AGI risks to devote more attention to whether we can influence whether multiple AGIs become a singleton, and how they treat lesser intelligences. Designing AGI to reflect values we want seems almost as desirable in scenarios with multiple AGIs as in the winner-take-all scenario (I’m unsure what Bostrom thinks about that). In a world with many AGIs with unfriendly values, what can humans do to bargain for a habitable niche?

He has a chapter on worlds dominated by whole brain emulations (WBE), probably inspired by Robin Hanson’s writings but with more focus on evaluating risks than on predicting the most probable outcomes. Since it looks like we should still expect an em-dominated world to be replaced at some point by AGI(s) that are designed more cleanly and able to self-improve faster, this isn’t really an alternative to the scenarios discussed in the rest of the book.

He treats starting with “familiar and human-like motivations” (in an augmentation route) as an advantage. Judging from our experience with humans who take over large countries, a human-derived intelligence that conquered the world wouldn’t be safe or friendly, although it would be closer to my goals than a smiley-face maximizer. The main advantage I see in a human-derived superintelligence would be a lower risk of it self-improving fast enough for the frontrunner advantage to be large. But that also means it’s more likely to be eclipsed by a design more amenable to self-improvement.

I’m suspicious of the implication (figure 13) that the risks of WBE will be comparable to AGI risks.

  • Is that mainly due to “neuromorphic AI” risks? Bostrom’s description of neuromorphic AI is vague, but my intuition is that human intelligence isn’t flexible enough to easily get the intelligence part of WBE without getting something moderately close to human behavior.
  • Is the risk of uploaded chimp(s) important? I have some concerns there, but Bostrom doesn’t mention it.
  • How about the risks of competitive pressures driving out human traits (discussed more fully/verbosely at Slate Star Codex)? If WBE and AGI happen close enough together in time that we can plausibly influence which comes first, I don’t expect the time between the two to be long enough for that competition to have large effects.
  • The risk that many humans won’t have enough resources to survive? That’s scary, but wouldn’t cause the astronomical waste of extinction.

Also, I don’t accept his assertion that AGI before WBE eliminates the risks of WBE. Some scenarios with multiple independently designed AGIs forming a weakly coordinated singleton (which I consider more likely than Bostrom does) appear to leave the last two risks in that list unresolved.

This book represents progress toward clear thinking about AGI risks, but much more work still needs to be done.

Book review: Our Mathematical Universe: My Quest for the Ultimate Nature of Reality, by Max Tegmark.

His most important claim is the radical Platonist view that all well-defined mathematical structures exist, therefore most physics is the study of which of those we inhabit. His arguments are more tempting than any others I’ve seen for this view, but I’m left with plenty of doubt.

He points to ways that we can imagine this hypothesis being testable, such as via the fine-tuning of fundamental constants. But he doesn’t provide a good reason to think that those tests will distinguish his hypothesis from other popular approaches, as it’s easy to imagine that we’ll never find situations where they make different predictions.

The most valuable parts of the book involve the claim that the multiverse is spatially infinite. He mostly talks as if that’s likely to be true, but his explanations caused me to lower my probability estimate for that claim.

He gets that infinity by claiming that inflation continues in places for infinite time, and then claiming there are reference frames for which that infinite time is located in a spatial rather than a time direction. I have a vague intuition why that second step might be right (but I’m fairly sure he left something important out of the explanation).

For the infinite time part, I’m stuck with relying on argument from authority, without much evidence that the relevant authorities have much confidence in the claim.

Toward the end of the book he mentions reasons to doubt infinities in physics theories – it’s easy to find examples where we model substances such as air as infinitely divisible, when we know that at some levels of detail atomic theory is more accurate. The eternal inflation theory depends on an infinitely expandable space which we can easily imagine is only an approximation. Plus, when physicists explicitly ask whether the universe will last forever, they don’t seem very confident. I’m also tempted to say that the measure problem (i.e. the absence of a way to say some events are more likely than others if they all happen an infinite number of times) is a reason to doubt infinities, but I don’t have much confidence that reality obeys my desire for it to be comprehensible.

I’m disappointed by his claim that we can get good evidence that we’re not Boltzmann brains. He wants us to test our memories, because if I am a Boltzmann brain I’ll probably have a bunch of absurd memories. But suppose I remember having done that test in the past few minutes. The Boltzmann brain hypothesis suggests it’s much more likely for me to have randomly acquired the memory of having passed the test than for me to have actually done the test. Maybe there’s a way to turn Tegmark’s argument into something rigorous, but it isn’t obvious.

He gives a surprising argument that the differences between the Everett and Copenhagen interpretations of quantum mechanics don’t matter much, because unrelated reasons involving multiverses lead us to expect results comparable to the Everett interpretation even if the Copenhagen interpretation is correct.

It’s a bit hard to figure out what the book’s target audience is – he hides the few equations he uses in footnotes to make it look easy for laymen to follow, but he also discusses hard concepts such as universes with more than one time dimension with little attempt to prepare laymen for them.

The first few chapters are intended for readers with little knowledge of physics. One theme is a historical trend which he mostly describes as expanding our estimate of how big reality is. But the evidence he provides only tells us that the lower bounds that people give keep increasing. Looking at the upper bound (typically infinity) makes that trend look less interesting.

The book has many interesting digressions such as a description of how to build Douglas Adams’ infinite improbability drive.

Book review: Our Final Invention: Artificial Intelligence and the End of the Human Era by James Barrat.

This book describes the risks that artificial general intelligence will cause human extinction, presenting the ideas propounded by Eliezer Yudkowsky in a slightly more organized but less rigorous style than Eliezer’s.

Barrat is insufficiently curious about why many people who claim to be AI experts disagree, so he’ll do little to change the minds of people who already have opinions on the subject.

He dismisses critics as unable or unwilling to think clearly about the arguments. My experience suggests that while any one critic has usually paid little attention to some argument, that’s often because they’ve rejected, after some thought, a different step in Eliezer’s reasoning and concluded that the step they’re ignoring wouldn’t change their conclusions.

The weakest claim in the book is that an AGI might become superintelligent in hours. A large fraction of people who have worked on AGI (e.g. Eric Baum’s What is Thought?) dismiss this as too improbable to be worth much attention, and Barrat doesn’t offer them any reason to reconsider. The rapid takeoff scenarios influence how plausible it is that the first AGI will take over the world. Barrat seems only interested in talking to readers who can be convinced we’re almost certainly doomed if we don’t build the first AGI right. Why not also pay some attention to the more complex situation where an AGI takes years to become superhuman? Should people who think there’s a 1% chance of the first AGI conquering the world worry about that risk?

Some people don’t approve of trying to build an immutable utility function into an AGI, often pointing to changes in human goals without clearly analyzing whether those are subgoals that are being altered to achieve a stable supergoal/utility function. Barrat mentions one such person, but does little to analyze this disagreement.

Would an AGI that has been designed without careful attention to safety blindly follow a narrow interpretation of its programmed goal(s), or would it (after achieving superintelligence) figure out and follow the intentions of its authors? People seem to jump to whatever conclusion supports their attitude toward AGI risk without much analysis of why others disagree, and Barrat follows that pattern.

I can imagine either possibility. If the easiest way to encode a goal system in an AGI is something like “output chess moves which according to the rules of chess will result in checkmate”, then turning the planet into computronium might help satisfy that goal.

An apparently harder approach would have the AGI consult a human arbiter to figure out whether it wins the chess game – “human arbiter” isn’t easy to encode in typical software. But AGI wouldn’t be typical software. It’s not obviously wrong to believe that software smart enough to take over the world would be smart enough to handle hard concepts like that. I’d like to see someone pin down people who think this is the obvious result and get them to explain how they imagine the AGI handling the goal before it reaches human-level intelligence.
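The gap between a literally-encoded goal and the designers’ intent can be sketched with a toy example (purely illustrative; no real AGI encodes goals this way, and every name and field below is hypothetical):

```python
# Toy illustration of a literal goal encoding vs. the designers' intent.
# All state fields here are hypothetical, for illustration only.

def literal_goal(world_state):
    # The goal as literally encoded: reward any state where the rules of
    # chess say checkmate was delivered. Nothing constrains what else
    # happens to the world along the way.
    return 1.0 if world_state.get("checkmate") else 0.0

def intended_goal(world_state):
    # What the designers meant: win a normal game, as judged by a human
    # arbiter, without drastic side effects -- conditions they never
    # actually wrote down.
    if (world_state.get("checkmate")
            and world_state.get("arbiter_says_won")
            and not world_state.get("planet_converted_to_computronium")):
        return 1.0
    return 0.0

# A state that maximizes the literal goal while violating the intent:
pathological = {"checkmate": True, "planet_converted_to_computronium": True}
print(literal_goal(pathological))   # 1.0
print(intended_goal(pathological))  # 0.0
```

The open question in the text is whether a pre-superintelligent AGI could be given something closer to `intended_goal` when “human arbiter” has no obvious encoding.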

He mentions some past events that might provide analogies for how AGI will interact with us, but I’m disappointed by how little thought he puts into this.

His examples of contact between technologically advanced beings and less advanced ones all refer to Europeans contacting Native Americans. I’d like to have seen a wider variety of analogies, e.g.:

  • Japan’s contact with the west after centuries of isolation
  • the interaction between neanderthals and humans
  • the contact that resulted in mitochondria becoming part of our cells

He quotes Vinge saying an AGI ‘would not be humankind’s “tool” – any more than humans are the tools of rabbits or robins or chimpanzees.’ I’d say that humans are sometimes the tools of human DNA, which raises more complex questions of how well the DNA’s interests are served.

The book contains many questionable digressions which seem to be designed to entertain.

He claims Google must have an AGI project in spite of denials by Google’s Peter Norvig (this was before it bought DeepMind). But the evidence he uses to back up this claim is that Google thinks something like AGI would be desirable. The obvious conclusion would be that Google did not then think it had the skill to usefully work on AGI, which would be a sensible position given the history of AGI.

He thinks there’s something paradoxical about Eliezer Yudkowsky wanting to keep some information about himself private while putting lots of personal information on the web. The specific examples Barrat gives strongly suggest that Eliezer doesn’t value the standard notion of privacy, but wants to limit people’s ability to distract him. Barrat also says Eliezer “gave up reading for fun several years ago”, which will surprise those who see him frequently mention works of fiction in his Author’s Notes.

All this makes me wonder who the book’s target audience is. It seems to be someone less sophisticated than a person who could write an AGI.

Book review: Singularity Hypotheses: A Scientific and Philosophical Assessment.

This book contains papers of widely varying quality on superhuman intelligence, plus some fairly good discussions of what ethics we might hope to build into an AGI. Several chapters resemble cautious versions of LessWrong, others come from a worldview totally foreign to LessWrong.

The chapter I found most interesting was Richard Loosemore and Ben Goertzel’s attempt to show there are no likely obstacles to a rapid “intelligence explosion”.

I expect what they label as the “inherent slowness of experiments and environmental interaction” to be an important factor limiting the rate at which an AGI can become more powerful. They think they see evidence from current science that this is an unimportant obstacle compared to a shortage of intelligent researchers: “companies complain that research staff are expensive and in short supply; they do not complain that nature is just too slow.”

Some explanations that come to mind are:

  • Complaints about nature being slow are not very effective at speeding up nature.
  • Complaints about specific tools being slow probably aren’t very unusual, but there are plenty of cases where people know complaints aren’t effective (e.g. complaints about spacecraft traveling slower than the theoretical maximum [*]).
  • Hiring more researchers can increase the status of a company even if the additional staff don’t advance knowledge.

They also find it hard to believe that we have independently reached the limit of the physical rate at which experiments can be done at the same time we’ve reached the limits of how many intelligent researchers we can hire. For literal meanings of physical limits this makes sense, but if it’s as hard to speed up experiments as it is to throw more intelligence into research, then the apparent coincidence could be due to wise allocation of resources to whichever bottleneck they’re better used in.

None of this suggests that it would be hard for an intelligence explosion to produce the 1000x increase in intelligence they talk about over a century, but it seems like an important obstacle to the faster takeoffs some people expect (days or weeks).

Some shorter comments on other chapters:

James Miller describes some disturbing incentives that investors would create for companies developing AGI if AGI is developed by companies large enough that no single investor has much influence on the company. I’m not too concerned about this because if AGI were developed by such a company, I doubt that small investors would have enough awareness of the project to influence it. The company might not publicize the project, or might not be honest about it. Investors might not believe accurate reports if they got them, since the reports won’t sound much different from projects that have gone nowhere. It seems very rare for small investors to understand any new software project well enough to distinguish between an AGI that goes foom and one that merely makes some people rich.

David Pearce expects the singularity to come from biological enhancements, because computers don’t have human qualia. He expects it would be intractable for computers to analyze qualia. It’s unclear to me whether this is supposed to limit AGI power because it would be hard for AGI to predict human actions well enough, or because the lack of qualia would prevent an AGI from caring about its goals.

Itamar Arel believes AGI is likely to be dangerous, and suggests dealing with the danger by limiting the AGI’s resources (without saying how it can be prevented from outsourcing its thought to other systems), and by “educational programs that will help mitigate the inevitable fear humans will have” (if the dangers are real, why is less fear desirable?).

* No, that example isn’t very relevant to AGI. Better examples would be atomic force microscopes, or the stock market (where it can take a generation to get a new test of an important pattern), but it would take lots of effort to convince you of that.

Book review: The Beginning of Infinity by David Deutsch.

This is an ambitious book centered around the nature of explanation, why it has been an important part of science (misunderstood by many who think of science as merely prediction), and why it is important for the future of the universe.

He provides good insights on the jump during the Enlightenment to thinking in universals (e.g. laws of nature that apply to a potentially infinite scope). But he overstates some of its implications. He seems confident that greater-than-human intelligences will view his concept of “universal explainers” as the category that identifies which beings have the rights of people. I find this about as convincing as attempts to find a specific time when a fetus acquires the rights of personhood. I can imagine AIs deciding that humans fail often enough at universalizing their thought to be less than persons, or deciding that monkeys are on a trajectory toward the same kind of universality.

He neglects to mention some interesting evidence of the spread of universal thinking – James Flynn’s explanation of the Flynn Effect documents that low IQ cultures don’t use the abstract thought that we sometimes take for granted, and describes IQ increases as an escape from concrete thinking.

Deutsch has a number of interesting complaints about people who attempt science but are confused about the philosophy of science, such as people who imagine that measuring heritability of a trait tells us something important without further inquiry – he notes that being enslaved was heritable in 1860, but that was useless for telling us how to change slavery.

He has interesting explanations for why anthropic arguments, the simulation argument, and the doomsday argument are weaker in a spatially infinite universe. But I was disappointed that he didn’t provide good references for his claim that the universe is infinite – a claim which I gather is controversial and hasn’t gotten as much attention as it deserves.

He sometimes gets carried away with his ambition and seems to forget his rule that explanations should be hard to vary in order to make it hard to fool ourselves.

He focuses on the beauty of flowers in an attempt to convince us that beauty is partially objective. But he doesn’t describe this objective beauty in a way that would make it hard to alter to fit whatever evidence he wants it to fit. I see an obvious alternative explanation for humans finding flowers beautiful – they indicate where fruit will be.

He argues that creativity evolved to help people find better ways of faithfully transmitting knowledge (understanding someone can require creative interpretation of the knowledge that they are imperfectly expressing). That might be true, but I can easily create other explanations that fit the evidence he’s trying to explain, such as that creativity enabled people to make better choices about when to seek a new home.

He imagines that he has a simple way to demonstrate that hunter-gatherer societies could not have lived in a golden age (the lack of growth of their knowledge):

Since static societies cannot exist without effectively extinguishing the growth of knowledge, they cannot allow their members much opportunity to pursue happiness.

But that requires implausible assumptions such as that happiness depends more on the pursuit of knowledge than availability of sex. And it’s not clear that hunter-gatherer societies were stable – they may have been just a few mistakes away from extinction, and accumulating knowledge faster than any previous species had. (I think Deutsch lives in a better society than hunter-gatherers, but it would take a complex argument to show that the average person today does).

But I generally enjoyed his arguments even when I thought they were wrong.

See also the review in the New York Times.

Book review: Wired for War: The Robotics Revolution and Conflict in the 21st Century, by P. W. Singer.

This book covers a wide range of topics related to robotics and war. The author put a good deal of thought into what topics we ought to pay attention to, but provides few answers that will tell us how to avoid problems. The style is entertaining. That doesn’t necessarily interfere with the substance, but I have some suspicions that the style influenced the author to be a bit more superficial than he ought to be.

I’m disappointed by his three-paragraph treatment of EMP risks. He understands that EMPs could cause major problems, but he apparently failed to find any of the ideas people have proposed for mitigating the risk.

With some lesser-known risks, the attention he provides may be helpful at reducing the danger. For instance, he identifies overconfidence as an important cause of war, and points out that the hype often created by designers of futuristic devices such as robots can cause leaders to overestimate their military value. This ought to be repeated widely enough that leaders will be aware of the danger.

He expresses some interesting concerns about how unmanned vehicles blur the lines between soldiers in battle and innocent civilians. Is a civilian technician who is actively working on an autonomous vehicle that is about to engage in hostile action against an enemy an ‘illegal combatant’? Is a pilot walking to work in Nevada to pilot a drone that will drop bombs in Afghanistan a military target?

The most interesting talk at the Singularity Summit 2010 was Shane Legg’s description of an Algorithmic Intelligence Quotient (AIQ) test that measures something intelligence-like automatically in a way that can test AI programs (or at least the Monte-Carlo AIXI that he uses) on 1000+ environments.

He had a mathematical formula which he thinks rigorously defines intelligence. But he didn’t specify what he meant by the set of possible environments, saying that would take a 50-page paper (he said a good deal of the work on the test had been done in the last week, so presumably he’s still working on the project). He also included a term that applies Occam’s razor, which I didn’t completely understand, but whose effect seems likely to be fairly non-controversial.
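Judging from Legg and Hutter’s earlier published work on universal intelligence (my reconstruction, not something shown in the talk), the formula was presumably along the lines of:

```latex
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V_{\mu}^{\pi}
```

Here \(\pi\) is the agent being scored, \(E\) is the set of environments, \(V_{\mu}^{\pi}\) is the agent’s expected total reward in environment \(\mu\), and \(2^{-K(\mu)}\) is the Occam’s razor term: environments with low Kolmogorov complexity \(K(\mu)\) dominate the sum. Since \(K\) is uncomputable, any runnable test has to substitute an approximation, such as program length on some fixed reference machine.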

The environments sound like they imitate individual questions on an IQ test, but with a much wider range of difficulties. We need a more complete description of the set of environments he uses in order to evaluate whether they’re heavily biased toward what Monte-Carlo AIXI does well or whether they closely resemble the environments an AI will find in the real world. He described two reasons for having some confidence in his set of environments: different subsets provided roughly similar results, and a human taking a small subset of the test found some environments easy, some very challenging, and some too hard to understand.

It sounds like with a few more months’ worth of effort, he could generate a series of results showing a trend in the AIQ of the best AI program in any given year, and also the AIQ of some smart humans (although he implied it would take a long time for a human to complete a test). That would give us some idea of whether AI workers have been making steady progress, and if so, when the trend is likely to cross human AIQ levels. An educated guess about when AI will have a major impact on the world should help a bit in preparing for it.

A more disturbing possibility is that this test will be used as a fitness function for genetic programming. Given sufficient computing power, that looks likely to generate superhuman intelligence that is almost certainly unfriendly to humans. I’m confident that sufficient computing power is not available yet, but my confidence will decline over time.

Brian Wang has a few more notes on this talk.

Some comments on last weekend’s Foresight Conference:

At lunch on Sunday I was in a group dominated by a discussion between Robin Hanson and Eliezer Yudkowsky over the relative plausibility of new intelligences having a variety of different goal systems versus a single goal system (as in a society of uploads versus Friendly AI). Some of the debate focused on how unified existing minds are, with Eliezer claiming that dogs mostly don’t have conflicting desires in different parts of their minds, and Robin and others claiming such conflicts are common (e.g. when deciding whether to eat food the dog has been told not to eat).

One test Eliezer suggested for the power of systems with a unified goal system is that if Robin were right, bacteria would have outcompeted humans. That got me wondering whether there’s an appropriate criterion by which humans can be said to have outcompeted bacteria. The most obvious criterion on which humans and bacteria are trying to compete is how many copies of their DNA exist. Using biomass as a proxy, bacteria are winning by several orders of magnitude. Another possible criterion is impact on large-scale features of Earth. Humans have not yet done anything that seems as big as the catastrophic changes to the atmosphere (“the oxygen crisis”) produced by bacteria. Am I overlooking other appropriate criteria?

Kartik Gada described two humanitarian innovation prizes that bear some resemblance to a valuable approach to helping the world’s poorest billion people, but will be hard to turn into something with a reasonable chance of success. The Water Liberation Prize would be pretty hard to judge. Suppose I submit a water filter that I claim qualifies for the prize. How will the judges test the drinkability of the water and the reusability of the filter under common third world conditions (which I suspect vary a lot and which probably won’t be adequately duplicated where the judges live)? Will they ship sample devices to a number of third world locations and ask whether it produces water that tastes good, or will they do rigorous tests of water safety? With a hoped-for prize of $50,000, I doubt they can afford very good tests. The Personal Manufacturing Prizes seem somewhat more carefully thought out, but need some revision. The “three different materials” criterion is not enough to rule out overly specialized devices without some clear guidelines about which differences are important and which are trivial. Setting specific award dates appears to assume an implausible ability to predict how soon such a device will become feasible. The possibility that some parts of the device are patented is tricky to handle, as it isn’t cheap to verify the absence of crippling patents.

There was a debate on futarchy between Robin Hanson and Mencius Moldbug. Moldbug’s argument seems to boil down to the absence of a guarantee that futarchy will avoid problems related to manipulation/conflicts of interest. It’s unclear whether he thinks his preferred form of government would guarantee any solution to those problems, and he rejects empirical tests that might compare the extent of those problems under the alternative systems. Still, Moldbug concedes enough that it should be possible to incorporate most of the value of futarchy within his preferred form of government without rejecting his views. He wants to limit trading to the equivalent of the government’s stockholders. Accepting that limitation isn’t likely to impair the markets much, and may make futarchy more palatable to people who share Moldbug’s superstitions about markets.

Book review: Moral Machines: Teaching Robots Right from Wrong by Wendell Wallach and Collin Allen.

This book combines the ideas of leading commentators on ethics, methods of implementing AI, and the risks of AI, into a set of ideas on how machines ought to achieve ethical behavior.

The book mostly provides an accurate survey of what those commentators agree and disagree about. But there’s enough disagreement that we need some insights into which views are correct (especially about theories of ethics) in order to produce useful advice to AI designers, and the authors don’t have those kinds of insights.

The book focuses mainly on near-term risks from software that is much less intelligent than humans, and is complacent about the risks of superhuman AI.

The implications of superhuman AIs for theories of ethics ought to illuminate flaws in them that aren’t obvious when considering purely human-level intelligence. For example, they mention an argument that any AI would value humans for their diversity of ideas, which would help AIs to search the space of possible ideas. This seems to have serious problems, such as what stops an AI from fiddling with human minds to increase their diversity? Yet the authors are too focused on human-like minds to imagine an intelligence which would do that.

Their discussion of the advocates of friendly AI seems a bit confused. The authors wonder if those advocates are trying to quell apprehension about AI risks, when I’ve observed pretty consistent efforts by those advocates to create apprehension among AI researchers.