kelvinism

All posts tagged kelvinism

Book review: Noise: A Flaw in Human Judgment, by Daniel Kahneman, Olivier Sibony, and Cass R. Sunstein.

Doctors are more willing to order a test for patients they see in the morning than for those they see late in the day.

Asylum applicants’ chances of prevailing may be as low as 5% or as high as 88% purely due to which judge hears their case.

Clouds Make Nerds Look Good, in the sense that university admissions officers give higher weight to academic attributes on cloudy days.

These are examples of what the authors describe as an important and neglected problem.

A more precise description of the book’s topic is variations in judgment, with judgment defined as “measurement in which the instrument is a human mind”.

Continue Reading

Oura

I’ve been using an Oura sleep tracking ring for six months.

In some ways it’s an impressive piece of technology. It’s small enough to not distract me much, and they went overboard in making the user interface simple. Simple, as in there basically aren’t any controls. I just put it on my finger, and occasionally put it on the charger.

Yet it does a poor job of what I expected it to do: track how long I sleep. It occasionally thinks I’m in bed when I’m not wearing it. If I get up to use the bathroom, it’s hard to predict whether it will decide that’s the start or end of my time in bed.

But the Oura reminded me that “8 hours of sleep” isn’t a good description of what I want – that’s just a crude heuristic for “slept well enough that further sleep won’t improve my productivity / health”. The Oura observes other relevant evidence: body temperature, breathing rate, heart rate, and heart rate variability. I.e. things I ignored because they were too hard to evaluate, rather than because I decided they weren’t important.

If I did a strenuous hike yesterday, it will tell me that 7.5 hours of sleep wasn’t enough, whereas if I’d spent yesterday relaxing, it might have told me that 7 hours was plenty, and that I should be ambitious.

It’s somewhat obvious that I need more sleep when a cold raises my body temperature. The Oura convinced me that there’s a much more general pattern of above average body temperature indicating an increased need for sleep.

I’ve tried comparing the Oura’s heart rate variability measurements with those of the emWave2, and I couldn’t see much correlation. I’m inclined to trust the emWave2 more, but I’m not aware of good evidence on the subject.
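
One way to put a number on "not much correlation" is a Pearson coefficient over paired readings. The following is a minimal sketch, not the comparison I actually ran: it assumes the two devices' readings have already been exported and paired by time, and the function name `pearson` is just an illustrative choice.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of paired readings."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 readings")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / sqrt(var_x * var_y)
```

A result near 0 would match my impression of little agreement between the devices; a result near 1 would suggest they track each other even if their absolute numbers differ.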

The Oura also helps track exercise, at least for hiking (it doesn’t seem to do much for weightlifting, but most of my exercise comes from walking/hiking). It reports slightly fewer calories burned than what I calculate from a cheap Garmin GPS and this calculator. I’m unsure which of those 2 measures is more accurate. If I were only using the GPS to measure calories burned, I’d give up on the GPS, because the Oura doesn’t have problems such as poor reception, or me forgetting to turn it on or off at the start and end of a hike.

It said I slept 3 hours on a red eye flight. My subjective impression was that it was somewhat debatable whether any of that ought to be classified as sleep. But what do I know? I have some evidence that I can sleep without being aware of sleeping (mainly from people reporting that I was snoring, at a time when I thought I was awake and not snoring).

My ring isn’t quite the right size for my ring finger. I ordered it based on prior information about what ring size worked for me, rather than using Oura’s measuring procedure. I’ve ended up wearing it on the middle segment of my middle finger instead. That works well enough that the difference seems unimportant.

See this comparison with several alternatives for a more detailed analysis.

Mostly, the Oura simply reassured me that I don’t have significant sleep problems, other than the times when it’s obvious that I took too long to fall asleep, or woke up too early. I suspect that the Oura would have been moderately valuable if I had had sleep problems that were hard for me to detect.

No, this isn’t about cutlery.

I’m proposing to fork science in the sense that Bitcoin was forked, into an adversarial science and a crowdsourced science.

As with Bitcoin, I have no expectation that the two branches will be equal.

These ideas could apply to most fields of science, but some fields need change more than others. Controversies over p-values and p-hacking are signs that a field needs change. Fields that don’t care much about p-values don’t need as much change, e.g. physics and computer science. I’ll focus mainly on medicine and psychology, and leave aside the harder-to-improve social sciences.

What do we mean by the word Science?

The term “science” has a range of meanings.

One extreme focuses on “perform experiments in order to test hypotheses”, as in The Scientist In The Crib. I’ll call this the personal knowledge version of science.

A different extreme includes formal institutions such as peer review, RCTs, etc. I’ll call this the authoritative knowledge version of science.

Both of these meanings of the word science are floating around, with little effort to distinguish them [1]. I suspect that promotes confusion about what standards to apply to scientific claims. And I’m concerned that people will use the high status of authoritative science to encourage us to ignore knowledge that doesn’t fit within its paradigm.

Continue Reading

I got interested in trying ashwagandha due to The End of Alzheimer’s. That book also caused me to wonder whether I should optimize my thyroid hormone levels. And one of the many features of ashwagandha is that it improves thyroid levels, at least in hypothyroid people – I found conflicting reports about what it does to hyperthyroid people.

I had plenty of evidence that my thyroid levels were lower than optimal, e.g. TSH levels measured at 2.58 in 2012, 4.69 in 2013, and 4.09 this fall [1]. And since starting alternate day calorie restriction, I saw increasing hypothyroid symptoms: on calorie restriction days my feet felt much colder around bedtime, my pulse probably slowed a bit, my body burned fewer calories, and I got vague impressions of having less energy. Presumably my body was lowering my thyroid levels to keep my weight from dropping.

I researched the standard treatments for hypothyroidism, but was discouraged by the extent of disagreement among doctors about the wisdom of treating hypothyroidism when it’s as mild as mine was. It seems like mainstream medical opinion says the risks slightly outweigh the rewards, and a sizable minority of doctors, relying on more subjective evidence, say the rewards are large, and don’t say much about the risks. Plus, the evidence for optimal thyroid levels protecting against Alzheimer’s seems to come mainly from correlations that are seen only in women.

Also, the standard treatments for hypothyroidism require a prescription (probably for somewhat good reasons), which may have deterred me by more than a rational amount.

So I decided to procrastinate any attempt to optimize my thyroid hormones, and since I planned to try ashwagandha and DHEA for other reasons, I hoped to get some evidence from the small increases to thyroid hormones that I expected from those two supplements.

I decided to try ashwagandha first, due mainly to the large number of problems it may improve – anxiety, inflammation, stress, telomeres, cholesterol, etc.
Continue Reading

[Warning: long post, of uncertain value, with annoyingly uncertain conclusions.]

This post will focus on how hardware (CPU power) will affect AGI timelines. I will undoubtedly overlook some important considerations; this is just a model of some important effects that I understand how to analyze.

I’ll make some effort to approach this as if I were thinking about AGI timelines for the first time, and focusing on strategies that I use in other domains.

I’m something like 60% confident that the most important factor in the speed of AI takeoff will be the availability of computing power.

I’ll focus here on the time to human-level AGI, but I suspect this reasoning implies getting from there to superintelligence at speeds that Bostrom would classify as slow or moderate.
Continue Reading

In this post, I’ll describe features of the moral system that I use. I expect that it’s similar enough to Robin Hanson’s views that I’ll use his name, dealism, to describe it, but I haven’t seen a well-organized description of dealism. (See a partial description here).

It’s also pretty similar to the system that Drescher described in Good and Real, combined with Anna Salamon’s description of causal models for Newcomb’s problem (which describes how to replace Drescher’s confused notion of “subjunctive relations” with a causal model). Good and Real eloquently describes why people should want to follow a dealist-like moral system; my post will be easier to understand if you understand Good and Real.

The most similar mainstream system is contractarianism. Dealism applies to a broader set of agents, and depends less on the initial conditions. I haven’t read enough about contractarianism to decide whether dealism is a special type of contractarianism or whether it should be classified as something separate. Gauthier’s writings look possibly relevant, but I haven’t found time to read them.

Scott Aaronson’s eigenmorality also overlaps a good deal with dealism, and is maybe a bit easier to understand.

Under dealism, morality consists of rules / agreements / deals, especially those that can be universalized. We become more civilized as we coordinate better to produce more cooperative deals. I’m being somewhat ambiguous about what “deal” and “universalized” mean, but those ambiguities don’t seem important to the major disagreements over moral systems, and I want to focus in this post on high-level disagreements.
Continue Reading

[Another underwhelming book; I promise to get out of the habit of posting only book reviews Real Soon Now.]

Book review: Seeing like a State: How Certain Schemes to Improve the Human Condition Have Failed, by James C. Scott.

Scott begins with a history of the tension between the desire for legibility and the desire for local control. E.g. central governments wanted to know how much they could tax peasants without causing famine or revolt. Yet even in the optimistic case where they got an honest tax collector to report how many bushels of grain John produced, they had problems due to John’s village having an idiosyncratic meaning of “bushel” that the tax collector couldn’t easily translate to something the central government knew. And it was hard to keep track of whether John had paid the tax, since the central government didn’t understand how the villagers distinguished that John from the John who lived a mile away.

So governments that wanted to grow imposed lots of standards on people. That sometimes helped peasants by making their taxes fairer and more predictable, but often trampled over local arrangements that had worked well (especially complex land use agreements).

I found that part of the book to be a fairly nice explanation of why an important set of conflicts was nearly inevitable. Scott gives a relatively balanced view of how increased legibility had both good and bad effects (more efficient taxation, diseases tracked better, Nazis found more Jews, etc.).

Then Scott becomes more repetitive and one-sided when describing high modernism, which carried the desire for legibility to a revolutionary, authoritarian extreme (especially between 1920 and 1960). I didn’t want 250 pages of evidence that Soviet style central planning was often destructive. Maybe that conclusion wasn’t obvious to enough people when Scott started writing the book, but it was painfully obvious by the time the book was published.

Scott’s complaints resemble the Hayekian side of the socialist calculation debate, except that Scott frames them in terms that minimize associations with socialism and capitalism. E.g. he manages to include Taylorist factory management in his cluster of bad ideas.

It’s interesting to compare Fukuyama’s description of Tanzania with Scott’s description. They both agree that villagization (Scott’s focus) was a disaster. Scott leaves readers with the impression that villagization was the most important policy, whereas Fukuyama only devotes one paragraph to it, and gives the impression that the overall effects of Tanzania’s legibility-increasing moves were beneficial (mainly via a common language causing more cooperation). Neither author provides a balanced view (but then they were both drawing attention to neglected aspects of history, not trying to provide a complete picture).

My advice: read the SlateStarCodex review, don’t read the whole book.

Book review: The Measure of All Minds: Evaluating Natural and Artificial Intelligence, by José Hernández-Orallo.

Much of this book consists of surveys of the psychometric literature. But the best parts of the book involve original results that bring more rigor and generality to the field. The best parts of the book approach the quality that I saw in Judea Pearl’s Causality, and E.T. Jaynes’ Probability Theory, but Measure of All Minds achieves a smaller fraction of its author’s ambitions, and is sometimes poorly focused.

Hernández-Orallo has an impressive ambition: measure intelligence for any agent. The book mentions a wide variety of agents, such as normal humans, infants, deaf-blind humans, human teams, dogs, bacteria, Q-learning algorithms, etc.

The book is aimed at a narrow and fairly unusual target audience. Much of it reads like it’s directed at psychology researchers, but the more original parts of the book require thinking like a mathematician.

The survey part seems pretty comprehensive, but I wasn’t satisfied with his ability to distinguish the valuable parts (although he did a good job of ignoring the politicized rants that plague many discussions of this subject).

For nearly the first 200 pages of the book, I was mostly wondering whether the book would address anything important enough for me to want to read to the end. Then I reached an impressive part: a description of an objective IQ-like measure. Hernández-Orallo offers a test (called the C-test) which:

  • measures a well-defined concept: sequential inductive inference,
  • defines the correct responses using an objective rule (based on Kolmogorov complexity),
  • with essentially no arbitrary cultural bias (the main feature that looks like an arbitrary cultural bias is the choice of alphabet and its order)[1],
  • and gives results in objective units (based on Levin’s Kt).
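
For readers unfamiliar with the units in that last bullet, Levin’s Kt is usually defined as below. This is the standard textbook form, not necessarily the notation the book uses; here U is a fixed universal machine, ℓ(p) is the length of program p in bits, and time_U(p) is the number of steps U takes to output x when running p (these symbol names are my choice).

```latex
Kt_U(x) \;=\; \min_{p} \bigl\{\, \ell(p) + \log_2 \mathrm{time}_U(p) \;:\; U(p) = x \,\bigr\}
```

Unlike plain Kolmogorov complexity, which counts only program length and is uncomputable, Kt also charges for the logarithm of running time, and that makes it computable in principle.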

Yet just when I got my hopes up for a major improvement in real-world IQ testing, he points out that what the C-test measures is too narrow to be called intelligence: there’s a 960 line Perl program that exhibits human-level performance on this kind of test, without resembling a breakthrough in AI.
Continue Reading

Book review: Superforecasting: The Art and Science of Prediction, by Philip E. Tetlock and Dan Gardner.

This book reports on the Good Judgment Project (GJP).

Much of the book recycles old ideas: 40% of the book is a rerun of Thinking Fast and Slow, 15% of the book repeats Wisdom of Crowds, and 15% of the book rehashes How to Measure Anything. Those three books were good enough that it’s very hard to improve on them. Superforecasting nearly matches their quality, but most people ought to read those three books instead. (Anyone who still wants more after reading them will get decent value out of reading the last 4 or 5 chapters of Superforecasting).

The book is very readable, written in an almost Gladwell-like style (a large contrast to Tetlock’s previous, more scholarly book), at a moderate cost in substance. It contains memorable phrases, such as “a fox with the bulging eyes of a dragonfly” (to describe looking at the world through many perspectives).

Continue Reading