14 comments on “Moore’s Law and AGI Timelines”

  1. Pingback: Rational Feed – deluks917

  2. Yes, it is plausible that data on algorithms improving with hardware are selected for cases where the algorithms don’t run out.

    On hardware, we’ve already seen timing gains stop, but cost gains have not slowed as a result. It may be that cost gains won’t stop when feature-size limits are hit either. But even assuming cost per gate keeps falling at the same speed, after the “Landauer limit” such gains will have to be split between more time per operation and more operations. So after that point effective progress may go half as fast (a toy version of this split is sketched at the end of this comment).

    On learning, it seems that you are assuming that a simple general learning mechanism is sufficient; that there won’t be lots of specifics to discover about how to learn particular things. I’m skeptical of that.
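    To make the hardware point concrete, here is a toy version of that split (a rough sketch, which assumes that past the Landauer limit the energy per irreversible operation scales roughly with switching speed):

    ```python
    # Toy model (an assumption, not settled physics): past the Landauer limit,
    # energy per operation scales roughly with switching speed, so a fixed
    # power budget P gives  P ~ N * f**2  for N gates each doing f ops/sec.
    # Throughput R = N * f then grows only as sqrt(N): doubling gates per
    # dollar raises ops/sec per dollar by sqrt(2), i.e. half the old rate of
    # exponential progress.

    def throughput(gates, power_budget=1.0):
        """Ops/sec under the toy constraint power ~ gates * speed**2."""
        speed = (power_budget / gates) ** 0.5
        return gates * speed

    for doublings in range(5):
        n = 2 ** doublings
        print(f"{n:2d}x gates per dollar -> {throughput(n):.2f}x ops/sec per dollar")
    # 1x -> 1.00, 2x -> 1.41, 4x -> 2.00, 8x -> 2.83, 16x -> 4.00
    ```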

  3. Hey, people!
    Landauer’s limit allows us to have ~100 exaflops per Watt at room temperature.
    Moreover, it allows us to have ~10,000 exaflops per Watt at the cosmic microwave background temperature (rough arithmetic below).

    Why do you mention that “restriction” at all?
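    A rough back-of-the-envelope check of those figures (the number of irreversible bit operations per flop is just an assumption):

    ```python
    import math

    k_B = 1.380649e-23     # Boltzmann constant, J/K
    bit_ops_per_flop = 3   # assumption: a few irreversible bit operations per flop

    def landauer_flops_per_watt(temperature_kelvin):
        """Upper bound on flops/Watt if each bit erasure costs k_B * T * ln(2)."""
        joules_per_bit = k_B * temperature_kelvin * math.log(2)
        return 1.0 / (joules_per_bit * bit_ops_per_flop)

    for label, T in [("room temperature", 300.0), ("CMB temperature", 2.7)]:
        print(f"{label} ({T} K): {landauer_flops_per_watt(T):.3g} flops per Watt")
    # room temperature (300.0 K): 1.16e+20 flops per Watt  (~100 exaflops)
    # CMB temperature (2.7 K):    1.29e+22 flops per Watt  (~10,000 exaflops)
    ```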

  4. I think that for AI safety we need to look not at the median arrival time but at the earliest 10 percent of the probability distribution. By the median arrival time we will be dead with 50 percent probability.

    I analysed other evidence, like neural net performance growth and dataset size growth measured against a “human-size dataset”, in a not-yet-finished article and in the presentation here: https://www.slideshare.net/avturchin/near-term-ai-safety

    And I came to similar conclusions about the timing of the first infra-human AIs: the 2020s.

    However, after the first infra-human AI I expect that the curve will change, because higher investment and falling prices for specialised hardware will increase the number of tested infra-human designs very quickly, so thousands will be tested within 1-2 years of the first one.

  5. Robin,
    I think I agree with you (and disagree with Eliezer) that an AGI will need lots of specifics. I get the impression that we disagree mainly about the extent to which those specifics require human work to implement.

    I see a trend toward increased use of learning as opposed to having knowledge built in. E.g. humans learning from culture what to eat, versus some innate knowledge of what to eat in other primates; or Google Neural Machine Translation replacing software which depended more on humans telling it how language works. I expect this pattern to continue, so that systems with more computing power and data will acquire increasing fractions of what you call specifics without human intervention.

    Questioner,
    I mentioned the “Landauer limit” because it’s expected to have some important effects on how Moore’s law works. I’m unsure why I mentioned Koomey’s law – FLOPS/watt doesn’t seem important in the most likely AGI timeframes. Also, experts seem confident that cooling below room temperature won’t save money.

  6. Thanks for writing this. Great read.

    Your footnote #3 notwithstanding (maybe I missed the argument): if the learnt algorithm becomes easily reproducible/transferable across new AI instances, then shouldn’t the cost of learning be somehow amortized across all present and future AI instances that use that algorithm?

    Another factor to consider is the notion of collective intelligence. That is, the idea that multiple weakly communicating instances can be better problem solvers than a single instance with the same computing resources. If this turns out to be the case (as some recent papers suggest), then there’d be every incentive to make this learning transferability work and deploy many (still learning) AI instances from “pre-learnt” instances.

  7. Thanks for the detailed analysis, and apologies for the nitpick on an otherwise very interesting post, but I think you mean to say that cerebral cortex mass scales polynomially, not exponentially, with the number of neurons. If the power is fixed and it is the base that varies (e.g. mass ∝ n^1.5 rather than mass ∝ 1.5^n), then you have polynomial growth. It looks to me like the rest of your discussion treats this correctly, so probably this is just a matter of terminology.

  8. It’s just an aside, largely irrelevant to the main point, but Footnote 2 seems wrong to me, though I’m not quite sure what you’re saying. Yes, there are no hard limits on clock speed, and actual choices depend on tradeoffs, but the tradeoff curve really has changed. It is true that phones sacrifice clock speed for other goals. Laptops, too, to a lesser degree. But desktop and supercomputer clock speeds ceased increasing because of technical problems. Dennard scaling meant that every time transistors got smaller, you could increase clock speed while keeping the power consumption (per area) the same. But it broke down.
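    A rough sketch of the Dennard-scaling arithmetic behind that, using the textbook dynamic-power relation P ≈ C·V²·f (an illustration under those textbook scaling assumptions, not taken from the post):

    ```python
    # Classic Dennard scaling: shrink linear dimensions by s, so per-transistor
    # capacitance C and supply voltage V each drop by ~1/s, transistor density
    # rises by s**2, and clock frequency f can rise by s -- with power density
    # (per unit area) staying flat.  Once V stops scaling (due to leakage), the
    # same shrink with the same frequency gain blows the power budget, which is
    # roughly why desktop clock speeds stalled.

    def power_density(cap, volt, freq, density):
        """Relative dynamic power per unit area: density * C * V**2 * f."""
        return density * cap * volt ** 2 * freq

    s = 1.4  # one classic ~0.7x linear shrink

    classic = power_density(cap=1 / s, volt=1 / s, freq=s, density=s ** 2)
    no_v_scaling = power_density(cap=1 / s, volt=1.0, freq=s, density=s ** 2)

    print(f"classic Dennard shrink:   {classic:.2f}x power density")       # 1.00x
    print(f"same shrink, V held flat: {no_v_scaling:.2f}x power density")  # 1.96x
    ```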

  9. Babak,
    When I said “hardest to integrate with the rest of the system”, I meant to suggest difficulties which will recur each time a moderately new AI version is created. But my intuitions are fairly vague here.

    The collective intelligence approach seems likely to offer some benefits, but I find it hard to imagine they’ll have a big effect on the timing of human-level AGI compared to the other sources of uncertainty.

    David,
    Thanks, I’ve fixed that.

    Douglas,
    Yes, it looks like I overstated the importance of changing user preferences.

  10. Yes, a system that starts out with a simple learning algorithm of course ends up with lots of specifics. But my claim is that to learn well, a system needs to start out with a lot of the right sort of specifics at the learning level. Yes, even with the right learning approach it could take a system many years to reach maturity. But it could take many times longer to search for the right sort of learning approach. The more detail the right approach needs, the longer that search should take.

  11. I have an argument about evolution to make. Current deep neural networks are evolving in a few weeks what took biological systems millions or billions of years to evolve. They are basically learning the lower visual pathways and upwards far more efficiently than is possible with biological evolution, mainly because digital systems are more finely quantized than biological systems.
    I made some comments on the Numenta website about it:

    https://discourse.numenta.org/t/omg-ai-winter/3195

    The thing that is missing from current deep neural networks is a connection to vast (associative) memory reserves. Also, backpropagation may be a little too weak for wiring everything together, but very simple evolutionary algorithms will do where there is a large excess of parameters (a toy example is sketched below).
    If you can accept that digital evolution is far more efficient than biological evolution then AI can push way beyond human capabilities in fairly short order, even on current hardware.
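    As a toy illustration of that point (my own sketch, not from the Numenta thread): a bare-bones (1+1) evolutionary loop (mutate every parameter with Gaussian noise, keep the mutant only if it scores better) makes steady progress on a deliberately overparameterized fit.

    ```python
    import random

    random.seed(0)

    # Toy target: a noise-free linear relation y = 2x + 1, fit with far more
    # parameters than the problem needs.
    xs = [i / 20.0 for i in range(40)]
    ys = [2.0 * x + 1.0 for x in xs]

    def loss(params):
        # Overparameterized model: 49 redundant additive weights plus a bias.
        return sum((sum(p * x for p in params[:-1]) + params[-1] - y) ** 2
                   for x, y in zip(xs, ys))

    params = [random.gauss(0, 0.1) for _ in range(50)]  # large excess of parameters
    best = loss(params)

    for step in range(2000):
        # (1+1) evolution: mutate everything slightly, keep the child only if better.
        child = [p + random.gauss(0, 0.02) for p in params]
        child_loss = loss(child)
        if child_loss < best:
            params, best = child, child_loss

    print(f"final loss: {best:.4f}")  # far below the starting loss
    ```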

  12. I’d like to build on Babak’s observation about collaborative intelligence. Disclosure: I’m Chairman and lead investor in a company that just released an open source architecture for building collaborative intelligence networks, so I’ve drunk the Koolaid.

    To me, AGI is misguided from the start because in fact humans don’t have “general intelligence.” Our entire physiology is a network of special-purpose cells and subsystems. Our brains are kinda general purpose but through learning develop specialist areas, and they interconnect to a network of nerves that become specialized subsystems.

    And we’ve evolved as a species because of language, collaboration and specialization. Nature, nurture, culture, education and economics foster specialization across the population.

    An important part of our evolution is due to abstraction at various fractal layers of physiology and culture. Our consciousness doesn’t process every pixel our eyes see, every sound wave our ears hear. Our consciousness isn’t aware of everything our body does. Only abstractions of concepts or emotions rise all the way up. That’s an efficient architecture.

    And that’s why ML isn’t the path to AGI, IMO. Networks of specialized MLs, combined with other types of AI, bring computational efficiency through distributed network architecture rather than through centralized algorithms.

    Those networks may include Bayesian learning trees, fractal abstraction networks, and competitive evolutionary ecosystems. Math, economics and ecology offer a variety of architectures. None of them are “general purpose” in the sense that they provide “most likely the best” answers in all contexts.

    But contextual specialization does lead to efficiency of the system.

    Babak referred to “weakly communicating instances.” A microservices architecture configured in a Bayesian or fractal network can create “strongly communicating instances.” Or, if self-organized criticality yields emergent “most likely best” choices, then a weakly communicating network is a better network architecture.

    It’s possible (likely?) this principle also applies to hardware. Way out of my league here, but from afar it seems like special-purpose architectures outperform general-purpose ones in well-defined contexts. CPU/GPU/TPU is just one of many examples. Perhaps increasingly specialized AI hardware will approach the cost of general-purpose AI hardware over time through automation, yielding price/performance improvements not captured by a forecast that extrapolates the general-purpose approach.

    Not sure what that means for your forecast.

    If this stirs curiosity, you can learn more at https://www.introspectivesystems.com/

  13. Pingback: Where Is My Flying Car? | Bayesian Investor Blog

Comments are closed.