Science and Technology

Book review: The AI Does Not Hate You: Superintelligence, Rationality and the Race to Save the World, by Tom Chivers.

This book is a sympathetic portrayal of the rationalist movement by a quasi-outsider. It includes a well-organized explanation of why some people expect tha AI will create large risks sometime this century, written in simple language that is suitable for a broad audience.

Caveat: I know many of the people who are described in the book. I’ve had some sort of connection with the rationalist movement since before it became distinct from transhumanism, and I’ve been mostly an insider since 2012. I read this book mainly because I was interested in how the rationalist movement looks to outsiders.

Chivers is a science writer. I normally avoid books by science writers, due to an impression that they mostly focus on telling interesting stories, without developing a deep understanding of the topics they write about.

Chivers’ understanding of the rationalist movement doesn’t quite qualify as deep, but he was surprisingly careful to read a lot about the subject, and to write only things he did understand.

Many times I reacted to something he wrote with “that’s close, but not quite right”. Usually when I reacted that way, Chivers did a good job of describing the the rationalist message in question, and the main problem was either that rationalists haven’t figured out how to explain their ideas in a way that a board audience can understand, or that rationalists are confused. So the complaints I make in the rest of this review are at most weakly directed in Chivers direction.

I saw two areas where Chivers overlooked something important.

Rationality

One involves CFAR.

Chivers wrote seven chapters on biases, and how rationalists view them, ending with “the most important bias”: knowing about biases can make you more biased. (italics his).

I get the impression that Chivers is sweeping this problem under the rug (Do we fight that bias by being aware of it? Didn’t we just read that that doesn’t work?). That is roughly what happened with many people who learned rationalism solely via written descriptions.

Then much later, when describing how he handled his conflicting attitudes toward the risks from AI, he gives a really great description of maybe 3% of what CFAR teaches (internal double crux), much like a blind man giving a really clear description of the upper half of an elephant’s trunk. He prefaces this narrative with the apt warning: “I am aware that this all sounds a bit mystical and self-helpy. It’s not.”

Chivers doesn’t seem to connect this exercise with the goal of overcoming biases. Maybe he was too busy applying the technique on an important problem to notice the connection with his prior discussions of Bayes, biases, and sanity. It would be reasonable for him to argue that CFAR’s ideas have diverged enough to belong in a separate category, but he seems to put them in a different category by accident, without realizing that many of us consider CFAR to be an important continuation of rationalists’ interest in biases.

World conquest

Chivers comes very close to covering all of the layman-accessible claims that Yudkowsky and Bostrom make. My one complaint here is that he only give vague hints about why one bad AI can’t be stopped by other AI’s.

A key claim of many leading rationalists is that AI will have some winner take all dynamics that will lead to one AI having a decisive strategic advantage after it crosses some key threshold, such as human-level intelligence.

This is a controversial position that is somewhat connected to foom (fast takeoff), but which might be correct even without foom.

Utility functions

“If I stop caring about chess, that won’t help me win any chess games, now will it?” – That chapter title provides a good explanation of why a simple AI would continue caring about its most fundamental goals.

Is that also true of an AI with more complex, human-like goals? Chivers is partly successful at explaining how to apply the concept of a utility function to a human-like intelligence. Rationalists (or at least those who actively research AI safety) have a clear meaning here, at least as applied to agents that can be modeled mathematically. But when laymen try to apply that to humans, confusion abounds, due to the ease of conflating subgoals with ultimate goals.

Chivers tries to clarify, using the story of Odysseus and the Sirens, and claims that the Sirens would rewrite Odysseus’ utility function. I’m not sure how we can verify that the Sirens work that way, or whether they would merely persuade Odysseus to make false predictions about his expected utility. Chivers at least states clearly that the Sirens try to prevent Odysseus (by making him run aground) from doing what his pre-Siren utility function advises. Chivers’ point could be a bit clearer if he specified that in his (nonstandard?) version of the story, the Sirens make Odysseus want to run aground.

Philosophy

“Essentially, he [Yudkowsky] (and the Rationalists) are thoroughgoing utilitarians.” – That’s a bit misleading. Leading rationalists are predominantly consequentialists, but mostly avoid committing to a moral system as specific as utilitarianism. Leading rationalists also mostly endorse moral uncertainty. Rationalists mostly endorse utilitarian-style calculation (which entails some of the controversial features of utilitarianism), but are careful to combine that with worry about whether we’re optimizing the quantity that we want to optimize.

I also recommend Utilitarianism and its discontents as an example of one rationalist’s nuanced partial endorsement of utilitarianism.

Political solutions to AI risk?

Chivers describes Holden Karnofsky as wanting “to get governments and tech companies to sign treaties saying they’ll submit any AGI designs to outside scrutiny before switching them on. It wouldn’t be iron-clad, because firms might simply lie”.

Most rationalists seem pessimistic about treaties such as this.

Lying is hardly the only problem. This idea assumes that there will be a tiny number of attempts, each with a very small number of launches that look like the real thing, as happened with the first moon landing and the first atomic bomb. Yet the history of software development suggests it will be something more like hundreds of attempts that look like they might succeed. I wouldn’t be surprised if there are millions of times when an AI is turned on, and the developer has some hope that this time it will grow into a human-level AGI. There’s no way that a large number of designs will get sufficient outside scrutiny to be of much use.

And if a developer is trying new versions of their system once a day (e.g. making small changes to a number that controls, say, openness to new experience), any requirement to submit all new versions for outside scrutiny would cause large delays, creating large incentives to subvert the requirement.

So any realistic treaty would need provisions that identify a relatively small set of design choices that need to be scrutinized.

I see few signs that any experts are close to developing a consensus about what criteria would be appropriate here, and I expect that doing so would require a significant fraction of the total wisdom needed for AI safety. I discussed my hope for one such criterion in my review of Drexler’s Reframing Superintelligence paper.

Rationalist personalities

Chivers mentions several plausible explanations for what he labels the “semi-death of LessWrong”, the most obvious being that Eliezer Yudkowsky finished most of the blogging that he had wanted to do there. But I’m puzzled by one explanation that Chivers reports: “the attitude … of thinking they can rebuild everything”. Quoting Robin Hanson:

At Xanadu they had to do everything different: they had to organize their meetings differently and orient their screens differently and hire a different kind of manager, everything had to be different because they were creative types and full of themselves. And that’s the kind of people who started the Rationalists.

That seems like a partly apt explanation for the demise of the rationalist startups MetaMed and Arbital. But LessWrong mostly copied existing sites, such as Reddit, and was only ambitious in the sense that Eliezer was ambitious about what ideas to communicate.

Culture

I guess a book about rationalists can’t resist mentioning polyamory. “For instance, for a lot of people it would be difficult not to be jealous.” Yes, when I lived in a mostly monogamous culture, jealousy seemed pretty standard. That attititude melted away when the bay area cultures that I associated with started adopting polyamory or something similar (shortly before the rationalists became a culture). Jealousy has much more purpose if my partner is flirting with monogamous people than if he’s flirting with polyamorists.

Less dramatically, We all know people who are afraid of visiting their city centres because of terrorist attacks, but don’t think twice about driving to work.

This suggests some weird filter bubbles somewhere. I thought that fear of cities got forgotten within a month or so after 9/11. Is this a difference between London and the US? Am I out of touch with popular concerns? Does Chivers associate more with paranoid people than I do? I don’t see any obvious answer.

Conclusion

It would be really nice if Chivers and Yudkowsky could team up to write a book, but this book is a close substitute for such a collaboration.

See also Scott Aaronson’s review.

Book review: Prediction Machines: The Simple Economics of Artificial Intelligence, by Ajay Agrawal, Joshua Gans, and Avi Goldfarb.

Three economists decided to write about AI. They got excited about AI, and that distracted them enough that they only said a modest amount about the standard economics principles that laymen need to better understand. As a result, the book ended up mostly being simple descriptions of topics on which the authors had limited expertise. I noticed fewer amateurish mistakes than I expected from this strategy, and they mostly end up doing a good job of describing AI in ways that are mildly helpful to laymen who only want a very high-level view.

The book’s main goal is to advise business on how to adopt current types of AI (“reading this book is almost surely an excellent predictor of being a manager who will use prediction machines”), with a secondary focus on how jobs will be affected by AI.

The authors correctly conclude that a modest extrapolation of current trends implies at most some short-term increases in unemployment.

Continue Reading

Book review: The Finders, by Jeffery A Martin.

This book is about the states of mind that Martin labels Fundamental Wellbeing.

These seem to be what people seek through meditation, but Martin carefully avoids focusing on Buddhism, and says that other spiritual approaches produce similar states of mind.

Martin approaches the subject as if he were an anthropologist. I expect that’s about as rigorous as we should hope for on many of the phenomena that he studies.

The most important change associated with Fundamental Wellbeing involves the weakening or disappearance of the Narrative-Self (i.e. the voice that seems to be the center of attention in most human minds).

I’ve experienced a weak version of that. Through a combination of meditation and CFAR ideas (and maybe The Mating Mind, which helped me think of the Narrative-Self as more of a press secretary than as a leader), I’ve substantially reduced the importance that my brain attaches to my Narrative-Self, and that has significantly reduced how much I’m bothered by negative stimuli.

Some more “advanced” versions of Fundamental Wellbeing also involve a loss of “self” – something along the lines of being one with the universe, or having no central locus or vantage point from which to observe the world. I don’t understand this very well. Martin suggests an analogy which describes this feeling as “zoomed-out”, i.e. the opposite extreme from Hyperfocus or a state of Flow. I guess that gives me enough hints to say that I haven’t experienced anything that’s very close to it.

I’m tempted to rephrase this as turning off what Dennett calls the Cartesian Theater. Many of the people that Martin studied seem to have discarded this illusion.

Alas, the book says little about how to achieve Fundamental Wellbeing. The people who he studied tend to have achieved it via some spiritual path, but it sounds like there was typically a good deal of luck involved. Martin has developed an allegedly more reliable path, available at FindersCourse.com, but that requires a rather inflexible commitment to a time-consuming schedule, and a fair amount of money.

Should I want to experience Fundamental Wellbeing?

Most people who experience it show a clear preference for remaining in that state. That’s a clear medium strength reason to suspect that I should want it, and it’s hard to see any counter to that argument.

The weak version of Fundamental Wellbeing that I’ve experienced tends to confirm that conclusion, although I see signs that some aspects require continuing attention to maintain, and the time required to do so sometimes seems large compared to the benefits.

Martin briefly discusses people who experienced Fundamental Wellbeing, and then rejected it. It reminds me of my reaction to an SSRI – it felt like I got a nice vacation, but vacation wasn’t what I wanted, since it conflicted with some of my goals for achieving life satisfaction. Those who reject Fundamental Wellbeing disliked the lack of agency and emotion (I think this refers only to some of the harder-to-achieve versions of Fundamental Wellbeing). That sounds like it overlaps a fair amount with what I experienced on the SSRI.

Martin reports that some of the people he studied have unusual reactions to pain, feeling bliss under circumstances that appear to involve lots of pain. I can sort of see how this is a plausible extreme of the effects that I understand, but it still sounds pretty odd.

Will the world be better if more people achieve Fundamental Wellbeing?

The world would probably be somewhat better. Some people become more willing and able to help others when they reduce their own suffering. But that’s partly offset by people with Fundamental Wellbeing feeling less need to improve themselves, and feeling less bothered by the suffering of others. So the net effect is likely just a minor benefit.

I expect that even in the absence of people treating each other better, the reduced suffering that’s associated with Fundamental Wellbeing would mean that the world is a better place.

However, it’s tricky to determine how important that is. Martin mentions a clear case of a person who said he felt no stress, but exhibited many physical signs of being highly stressed. Is that better or worse than being conscious of stress? I think my answer is very context-dependent.

If it’s so great, why doesn’t everyone learn how to do it?

  • Achieving Fundamental Wellbeing often causes people to have diminished interest in interacting with other people. Only a modest fraction of people who experience it attempt to get others to do so.
  • I presume it has been somewhat hard to understand how to achieve Fundamental Wellbeing, and why we should think it’s valuable.
  • The benefits are somewhat difficult to observe, and there are sometimes visible drawbacks. E.g. one anecdote of a manager who became more generous with his company’s resources – that was likely good for some people, but likely at some cost to the company and/or his career.

Conclusion

The ideas in this book deserve to be more widely known.

I’m unsure whether that means lots of people should read this book. Maybe it’s more important just to repeat simple summaries of the book, and to practice more meditation.

[Note: I read a pre-publication copy that was distributed at the Transformative Technology conference.]

Book review: Principles: Life and Work, by Ray Dalio.

Most popular books get that way by having an engaging style. Yet this book’s style is mundane, almost forgetable.

Some books become bestsellers by being controversial. Others become bestsellers by manipulating reader’s emotions, e.g. by being fun to read, or by getting the reader to overestimate how profound the book is. Principles definitely doesn’t fit those patterns.

Some books become bestsellers because the author became famous for reasons other than his writings (e.g. Stephen Hawking, Donald Trump, and Bill Gates). Principles fits this pattern somewhat well: if an obscure person had published it, nothing about it would have triggered a pattern of readers enthusiastically urging their friends to read it. I suspect the average book in this category is rather pathetic, but I also expect there’s a very large variance in the quality of books in this category.

Principles contains an unusual amount of wisdom. But it’s unclear whether that’s enough to make it a good book, because it’s unclear whether it will convince readers to follow the advice. Much of the advice sounds like ideas that most of us agree with already. The wisdom comes more in selecting the most underutilized ideas, without being particularly novel. The main benefit is likely to be that people who were already on the verge of adopting the book’s advice will get one more nudge from an authority, providing the social reassurance they need.

Advice

Some of why I trust the book’s advice is that it overlaps a good deal with other sources from which I’ve gotten value, e.g. CFAR.

Key ideas include:

  • be honest with yourself
  • be open-minded
  • focus on identifying and fixing your most important weaknesses

Continue Reading

Eric Drexler has published a book-length paper on AI risk, describing an approach that he calls Comprehensive AI Services (CAIS).

His primary goal seems to be reframing AI risk discussions to use a rather different paradigm than the one that Nick Bostrom and Eliezer Yudkowsky have been promoting. (There isn’t yet any paradigm that’s widely accepted, so this isn’t a Kuhnian paradigm shift; it’s better characterized as an amorphous field that is struggling to establish its first paradigm). Dueling paradigms seems to be the best that the AI safety field can manage to achieve for now.

I’ll start by mentioning some important claims that Drexler doesn’t dispute:

  • an intelligence explosion might happen somewhat suddenly, in the fairly near future;
  • it’s hard to reliably align an AI’s values with human values;
  • recursive self-improvement, as imagined by Bostrom / Yudkowsky, would pose significant dangers.

Drexler likely disagrees about some of the claims made by Bostrom / Yudkowsky on those points, but he shares enough of their concerns about them that those disagreements don’t explain why Drexler approaches AI safety differently. (Drexler is more cautious than most writers about making any predictions concerning these three claims).

CAIS isn’t a full solution to AI risks. Instead, it’s better thought of as an attempt to reduce the risk of world conquest by the first AGI that reaches some threshold, preserve existing corrigibility somewhat past human-level AI, and postpone need for a permanent solution until we have more intelligence.

Continue Reading

Descriptions of AI-relevant ontological crises typically choose examples where it seems moderately obvious how humans would want to resolve the crises. I describe here a scenario where I don’t know how I would want to resolve the crisis.

I will incidentally ridicule express distate for some philosophical beliefs.

Suppose a powerful AI is programmed to have an ethical system with a version of the person-affecting view. A version which says only persons who exist are morally relevant, and “exist” only refers to the present time. [Note that the most sophisticated advocates of the person-affecting view are willing to treat future people as real, and only object to comparing those people to other possible futures where those people don’t exist.]

Suppose also that it is programmed by someone who thinks in Newtonian models. Then something happens which prevents the programmer from correcting any flaws in the AI. (For simplicity, I’ll say programmer dies, and the AI was programmed to only accept changes to its ethical system from the programmer).

What happens when the AI tries to make ethical decisions about people in distant galaxies (hereinafter “distant people”) using a model of the universe that works like relativity?

Continue Reading

Book review: Artificial Intelligence Safety and Security, by Roman V. Yampolskiy.

This is a collection of papers, with highly varying topics, quality, and importance.

Many of the papers focus on risks that are specific to superintelligence, some assuming that a single AI will take over the world, and some assuming that there will be many AIs of roughly equal power. Others focus on problems that are associated with current AI programs.

I’ve tried to arrange my comments on individual papers in roughly descending order of how important the papers look for addressing the largest AI-related risks, while also sometimes putting similar topics in one group. The result feels a little more organized than the book, but I worry that the papers are too dissimilar to be usefully grouped. I’ve ignored some of the less important papers.

The book’s attempt at organizing the papers consists of dividing them into “Concerns of Luminaries” and “Responses of Scholars”. Alas, I see few signs that many of the authors are even aware of what the other authors have written, much less that the later papers are attempts at responding to the earlier papers. It looks like the papers are mainly arranged in order of when they were written. There’s a modest cluster of authors who agree enough with Bostrom to constitute a single scientific paradigm, but half the papers demonstrate about as much of a consensus on what topic they’re discussing as I would expect to get from asking medieval peasants about airplane safety.

Continue Reading

Book(?) review: The Great Stagnation: How America Ate All The Low-Hanging Fruit of Modern History, Got Sick, and Will (Eventually) Feel Better, by Tyler Cowen.

Tyler Cowen wrote what looks like a couple of blog posts, and published them in book form.

The problem: US economic growth slowed in the early 1970s, and hasn’t recovered much. Median family income would be 50% higher if the growth of 1945-1970 had continued.

Continue Reading

Book review: Where Is My Flying Car? A Memoir of Future Past, by J. Storrs Hall (aka Josh).

If you only read the first 3 chapters, you might imagine that this is the history of just one industry (or the mysterious lack of an industry).

But this book attributes the absence of that industry to a broad set of problems that are keeping us poor. He looks at the post-1970 slowdown in innovation that Cowen describes in The Great Stagnation[1]. The two books agree on many symptoms, but describe the causes differently: where Cowen says we ate the low hanging fruit, Josh says it’s due to someone “spraying paraquat on the low-hanging fruit”.

The book is full of mostly good insights. It significantly changed my opinion of the Great Stagnation.

The book jumps back and forth between polemics about the Great Strangulation (with a bit too much outrage porn), and nerdy descriptions of engineering and piloting problems. I found those large shifts in tone to be somewhat disorienting – it’s like the author can’t decide whether he’s an autistic youth who is eagerly describing his latest obsession, or an angry old man complaining about how the world is going to hell (I’ve met the author at Foresight conferences, and got similar but milder impressions there).

Josh’s main explanation for the Great Strangulation is the rise of Green fundamentalism[2], but he also describes other cultural / political factors that seem related. But before looking at those, I’ll look in some depth at three industries that exemplify the Great Strangulation.

Continue Reading

Book review: The Book of Why, by Judea Pearl and Dana MacKenzie.

This book aims to turn the ideas from Pearl’s seminal Causality into something that’s readable by a fairly wide audience.

It is somewhat successful. Most of the book is pretty readable, but parts of it still read like they were written for mathematicians.

History of science

A fair amount of the book covers the era (most of the 20th century) when statisticians and scientists mostly rejected causality as an appropriate subject for science. They mostly observed correlations, and carefully repeated the mantra “correlation does not imply causation”.

Scientists kept wanting to at least hint at causal implications of their research, but statisticians rejected most attempts to make rigorous claims about causes.

Continue Reading