10. Compute does not scale like you think it does

One argument for why AGI might be unimaginably smarter than humans is that the physical limits of computation are so large. The idea is that if humans achieve some amount of intelligence with some amount of compute, then an AGI with many times more compute will be many times more intelligent. This line of thought does not match modern thinking on computation.

The first obvious obstacle is that not every problem is solvable in linear time. If intelligence scales as log(compute), then adding more compute will hardly affect the intelligence of a system.[1] But if you believe in AI Risk then this likely won't convince you.

[1] Whatever 'intelligence' might mean, let alone representing it by a number. Principal component analysis is bullshit.
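To see how flat logarithmic scaling is, here is a throwaway sketch (my own toy model, not something from the post): if a hypothetical "capability score" grew like log2 of available compute, a billion-fold jump from a teraflop to a zettaflop would move that score by only about 30 points.

```python
import math

# Toy model for illustration only: suppose "capability" grows like log2 of compute.
# The units are meaningless; the point is how little the score moves.
for flops in [1e12, 1e15, 1e18, 1e21]:
    print(f"{flops:.0e} FLOPS -> capability score {math.log2(flops):.1f}")
```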

The second, more concrete, obstacle is architecture. Let's compare two computing devices. Device A is a cluster consisting of one billion first-generation Raspberry Pis, for a total of 41 PFLOPS. Device B is a single PlayStation 4, coming in at 1.84 TFLOPS. Although the cluster has 22,000 times more FLOPS, there are plenty of problems that we can solve faster on the single PlayStation 4. Not all problems can be solved quicker through parallelization.[2]

[2] In theory, this is the open problem of P vs NC. In practice, you can easily see it to be true by imagining that the different Raspberry Pis are all on different planets across the galaxy, which wouldn't change their collective FLOPS but would affect their communication delay and hence their ability to compute anything together.
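One standard way to put a number on "not all problems parallelize" is Amdahl's law, which the post doesn't invoke by name but which captures the same intuition: if a fraction s of the work is inherently serial, no number of processors can deliver a speedup beyond 1/s. A minimal sketch:

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Best-case speedup when `serial_fraction` of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)

# Even a billion Raspberry Pis cannot beat a ~100x speedup if 1% of the work is serial,
# and this ignores communication delay entirely (the planets-across-the-galaxy problem).
for processors in [1, 8, 1_000, 1_000_000_000]:
    print(f"{processors:>13} processors -> speedup {amdahl_speedup(0.01, processors):6.1f}x")
```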

Modern computers are only as fast as they are because of very specific properties of existing software. Locality of reference is probably the biggest one. There is spatial locality of reference: if a processor accesses memory location x, it is likely to access location x+1 soon after that. Modern RAM exploits this fact by optimizing for sequential access, and slows down considerably when you do actual random access. There is also temporal locality of reference: if a processor accesses value x now, it is likely to access value x again in a short while. This is why processor cache provides a speedup over just having RAM, and why having RAM provides a speedup over just having flash memory.[3]

[3] There has been some nice theory on this in the past decades. I quite like Albers, Favrholdt and Giel's On paging with locality of reference (2005) in the Journal of Computer and System Sciences.
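Here is a quick way to feel spatial locality from Python (the exact numbers depend on your hardware, so treat this as a sketch rather than a benchmark): gather the same array elements once in sequential order and once in shuffled order. The arithmetic is identical; only the memory access pattern changes.

```python
import time
import numpy as np

n = 20_000_000
data = np.arange(n, dtype=np.int64)
sequential = np.arange(n)                # visit x, x+1, x+2, ...: prefetcher-friendly
shuffled = np.random.permutation(n)      # same indices in random order: mostly cache misses

for name, indices in [("sequential", sequential), ("shuffled", shuffled)]:
    start = time.perf_counter()
    total = data[indices].sum()          # identical work, different access pattern
    print(f"{name:>10}: {time.perf_counter() - start:.2f}s (sum = {total})")
```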

Brains don’t exhibit such locality nearly as much. As a result, it is much easier to simulate a small “brain” than a large “brain”. Adding neurons increases the practical difficulty of simulation much more than linearly.[4] It might be possible that this would not be an obstacle for AGI, but it might also be possible for the ocean to explode, so that doesn’t tell us anything.[5]

[4] One caveat here is that this does not apply so much to artificial neural networks. Those can be optimized quickly partly because they are so structured. This is because of specific features of GPUs that are outside the scope of this post.

[5] New cause area: funding a Fluid Intelligence Research Institute to prevent the dangers from superintelligent bodies of water.
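A crude back-of-the-envelope model (every constant below is an assumption I picked for illustration, not a measurement) of why simulation cost grows faster than neuron count: each neuron's synapses are reached through essentially random pointers, so once the working set no longer fits in cache, every access pays full RAM latency.

```python
# All constants here are assumed, round numbers for illustration only.
SYNAPSES_PER_NEURON = 1_000
BYTES_PER_SYNAPSE = 8                      # an index plus a weight, roughly
CACHE_BYTES = 32 * 1024 * 1024             # a 32 MB last-level cache
NS_PER_HIT, NS_PER_MISS = 1, 80            # rough cache-hit vs RAM-miss latencies

for neurons in [1_000, 10_000, 100_000, 1_000_000]:
    working_set = neurons * SYNAPSES_PER_NEURON * BYTES_PER_SYNAPSE
    latency = NS_PER_HIT if working_set <= CACHE_BYTES else NS_PER_MISS
    step_ms = neurons * SYNAPSES_PER_NEURON * latency / 1e6
    print(f"{neurons:>9} neurons: {working_set / 2**20:8.0f} MiB working set, "
          f"~{step_ms:9.0f} ms per simulated step")
```

In this toy picture, a 10x increase in neurons can cost far more than 10x in time once the working set spills out of cache; footnote 4's caveat is that dense, regularly laid out artificial networks largely avoid this random pointer chasing.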

Post-scripts

  1. Maybe this post needs more examples. Here is one:
    If you want to solve a 9×9 sudoku puzzle, it doesn’t matter whether you have 10 people or 10,000 people working on it: the puzzle will get solved in pretty much the same amount of time. Sudoku solving won’t parallelize beyond a certain point.

    Moore’s law used to effectively also mean that our computers were getting faster, but only because clock speeds were going up. These days there is no clear connection between the number of transistors and the speed at which a plethora of real-world problems get solved.
