Moore’s Law
Article
Moore’s Law is a recurring concept in the Astral Codex Ten archive, appearing 2 times across 2 issues between February 23, 2022 and June 17, 2022. The archive places it in contexts such as “It’s not easy to graph as exactly as Moore’s Law” and “which is even faster than Moore’s Law”. It most often appears alongside AGI, AI Impacts, and AIXI.
Metadata
- Category: Concepts
- Mention count: 2
- Issue count: 2
- First seen: February 23, 2022
- Last seen: June 17, 2022
Appears In
- Biological Anchors: A Trick That Might Or Might Not Work
- Your Book Review: The Future Of Fusion Energy
Related Pages
- AGI (1 shared issue)
- AI Impacts (1 shared issue)
- AIXI (1 shared issue)
- Ajeya (1 shared issue)
- Ajeya Cotra (1 shared issue)
- Ajeya et al (1 shared issue)
- Ajeya’s Evolutionary Anchor (1 shared issue)
- Ajeya’s report (1 shared issue)
- Alcator C-Mod (1 shared issue)
- Alignment Newsletter (1 shared issue)
- alpha-beta pruning (1 shared issue)
- AlphaGo (1 shared issue)
Source Context
Recovered passages from the original issue text. When the raw archive preserved outbound links inside the source passage, they are listed directly under the quote.
SS Great Eastern, the extreme outlier large steamship from 1858. This has become sort of a mascot for quantitative technological progress forecasters. What is this scientist’s error? The big one is thinking that spaceship progress depends on some easily-measured quantity (size) instead of on fundamental advances (eg figuring out how rockets work). You can make the same accusation against Ajeya et al: you can have all the FLOPs in the world, but if you don’t understand how to make a machine think, your AI will be, well, a flop. Ajeya discusses this a bit on page 143 of her report. There is some sense in which FLOPs and knowing-what-you’re-doing trade off against each other. If you have literally no idea what you’re doing, you can sort of kind of re-run evolution until it comes up with something that looks good. If things are somehow even worse than that, you could always run AIXI, a hypothetical AI design guaranteed to get excellent results as long as you have infinite computation. You could run a Go engine by searching the entire branching tree structure of Go - you shouldn’t, and it would take a zillion times more compute than exists in the entire world, but you could. So in some sense what you’re doing, when you’re figuring out what you’re doing, is coming up with ways to do already-possible things more efficiently. But that’s just algorithmic progress, which Ajeya has already baked into her model. (our Victorian scientist: “As a reductio ad absurdum, you could always stand the ship on its end, and then climb up it to reach space. We’re just trying to make ships that are more efficient than that.”) Part II: Biology-Inspired AI Timelines: The Trick That Never Works Eliezer Yudkowsky presents a more subtle version of these kinds of objection in an essay called Biology-Inspired AI Timelines: The Trick That Never Works, published December 2021. Ajeya’s report is a 169-page collection of equations, graphs, and modeling assumptions. 
Yudkowsky’s rebuttal is a fictional dialogue between himself, younger versions of himself, famous AI scientists, and other bit players. At one point, a character called “Humbali” shows up begging Yudkowsky to be more humble, and Yudkowsky defeats him with devastating counterarguments. Still, he did found the field, so I guess everyone has to listen to him. He starts: in 1988, famous AI scientist Hans Moravec predicted human-level AI by 2010. He was using the same methodology as Ajeya: extrapolate how quickly processing power would grow (in FLOP/S), and see when it would match some estimate of the human brain. Moravec got the processing power almost exactly right (it hit his 2010 projection in 2008) and his human brain estimate pretty close (he says 10^13 FLOP/S, Ajeya says 10^15, this 2 OOM difference only delays things a few years), yet there was not human-level AI in 2010. What happened? Ajeya's answer could be: Moravec didn't realize that, in the modern ML paradigm, any given size of program requires a much bigger program to train. Ajeya, who has a 35-year advantage on Moravec, estimates approximately the same power for the finished program (10^16 vs. 10^13 FLOP/S) but says that training the 10^16 FLOP/S program will require 10^33ish FLOPs. Eliezer agrees as far as it goes, but says this points to a much deeper failure mode, which was that Moravec had no idea what he was doing. He was assuming processing power of human brain = processing power of computer necessary for AGI. Why? The human brain consumes around 20 watts of power. Can we thereby conclude that an AGI should consume around 20 watts of power, and that, when technology advances to the point of being able to supply around 20 watts of power to computers, we'll get AGI? […] You say that AIs consume energy in a very different way from brains? Well, they'll also consume computations in a very different way from brains! 
The only difference between these two cases is that you know something about how humans eat food and break it down in their stomachs and convert it into ATP that gets consumed by neurons to pump ions back out of dendrites and axons, while computer chips consume electricity whose flow gets interrupted by transistors to transmit information. Since you know anything whatsoever about how AGIs and humans consume energy, you can see that the consumption is so vastly different as to obviate all comparisons entirely. You are ignorant of how the brain consumes computation, you are ignorant of how the first AGIs built would consume computation, but "an unknown key does not open an unknown lock" and these two ignorant distributions should not assert much internal correlation between them. Cars don’t move by contracting their leg muscles and planes don’t fly by flapping their wings like birds. Telescopes do form images the same way as the lenses in our eyes, but differ by so many orders of magnitude in every important way that they defy comparison. Why should AI be different? You have to use some specific algorithm when you’re creating AI; why should we expect it to be anywhere near the same efficiency as the ones Nature uses in our brains? The same is true for arguments from evolution, eg Ajeya’s Evolutionary Anchor, ie “it took evolution 10^43 FLOPs of computation to evolve the human brain so maybe that will be the training cost”. AI scientists sitting in labs trying to figure things out, and nematodes getting eaten by other nematodes, are such different methods for designing things that it’s crazy to use one as an estimate for the other. Algorithmic Progress vs. Algorithmic Paradigm Shifts This post is a dialogue, so (Eliezer’s hypothetical model of) OpenPhil gets a chance to respond. They object: this is why we put a term for algorithmic progress in our model. The model isn’t very sensitive to changes in that term. 
If you want you can set it to some kind of crazy high value and see what happens, but you can’t say we didn’t consider it. OpenPhil: We did already consider that and try to take it into account: our model already includes a parameter for how algorithmic progress reduces hardware requirements. It's not easy to graph as exactly as Moore's Law, as you say, but our best-guess estimate is that compute costs halve every 2-3 years […] Eliezer: The makers of AGI aren't going to be doing 10,000,000,000,000 rounds of gradient descent, on entire brain-sized 300,000,000,000,000-parameter models, algorithmically faster than today. They're going to get to AGI via some route that you don't know how to take, at least if it happens in 2040. If it happens in 2025, it may be via a route that some modern researchers do know how to take, but in this case, of course, your model was also wrong. They're not going to be taking your default-imagined approach algorithmically faster, they're going to be taking an algorithmically different approach that eats computing power in a different way than you imagine it being consumed. OpenPhil: Shouldn't that just be folded into our estimate of how the computation required to accomplish a fixed task decreases by half every 2-3 years due to better algorithms? Eliezer: Backtesting this viewpoint on the previous history of computer science, it seems to me to assert that it should be possible to: Train a pre-Transformer RNN/CNN-based model, not using any other techniques invented after 2017, to GPT-2 levels of performance, using only around 2x as much compute as GPT-2;
Inline links: AIXI, Biology-Inspired AI Timelines: The Trick That Never Works
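The backtest at the end of the passage above rests on simple arithmetic: if the compute needed for a fixed task halves every 2-3 years, then forgoing about two years of algorithmic progress should cost only about 1.6x to 2x more compute. A minimal sketch of that calculation follows. It assumes GPT-2 dates from 2019, two years after the 2017 cutoff the passage names; the function name and the two-year gap are illustrative assumptions, not taken from the report or the dialogue.

```python
def compute_penalty(years_of_progress_forgone: float, halving_time_years: float) -> float:
    """Factor by which compute must grow to offset forgone algorithmic progress,
    assuming the compute needed for a fixed task halves every `halving_time_years`."""
    if halving_time_years <= 0:
        raise ValueError("halving time must be positive")
    return 2.0 ** (years_of_progress_forgone / halving_time_years)


# Assumption (not from the source): GPT-2 is dated 2019, the cutoff in the passage is 2017.
GAP_YEARS = 2019 - 2017

# The passage quotes a halving time of 2-3 years, so evaluate both ends of the range.
penalties = {h: compute_penalty(GAP_YEARS, h) for h in (2.0, 3.0)}

for halving_time, penalty in penalties.items():
    print(f"halving every {halving_time:.0f} years -> about {penalty:.2f}x compute")
```

Under these assumptions the penalty comes out between roughly 1.6x and 2x, which is where the "around 2x as much compute as GPT-2" figure in the dialogue comes from.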
Figure 3: This looks great! The fusion triple product has grown exponentially. It has doubled every 1.8 years, which is even faster than Moore's Law. The best triple product we've gotten is five orders of magnitude better than what we started with in 1970. But wait. This data only goes up to 2000. If we extrapolate the trend line, we would have built a commercial fusion reactor in 2005. The world is not awash in fusion energy, so this trend clearly did not continue. There has been little progress towards a larger triple product since 2000. Why did this trendline stop? Why do I think that this is about to get started again? I will answer these questions, but first, a few words on how we've made progress so far. Plasma Basics Fusion occurs at such high temperatures that everything is ionized: The electrons and nuclei cannot stick together as atoms and instead move independently. Matter in this state is called a ‘plasma’ [8]. Plasma is by far the most common state of matter in the universe. Stars are made of plasma, as well as the low density matter in the space between stars. When a fusion plasma comes in contact with anything solid (or liquid or even gas), either the solid will vaporize or the plasma will cool down. Both of these are very bad for achieving controlled fusion on Earth. We can't just put our fusion plasma in a container. How do we bottle the core of the sun? With a magnetic field. The electrons and ions in a fusion plasma are charged. Charged particles spiral around magnetic field lines and will not move freely perpendicular to the magnetic field. This confines the plasma in two dimensions. To confine the plasma in the third dimension, loop the magnetic field around in the shape of a doughnut [9]. The particles can move around the doughnut, but stay confined within it. Figure 4: A charged particle spiraling around a doughnut-shaped magnetic field. This is still not quite enough. 
Charged particles will drift in a curved magnetic field, which causes them to leak out the outer side of the doughnut. We can solve this problem by making the magnetic field twist, like a French cruller. Particles near the outer edge, drifting outwards, will follow the magnetic field line around to the inner edge, where they will drift back towards the core. The easiest way to make the magnetic field twist is to run a current through the plasma. You don't need to (and can't) run a wire there. Plasmas are full of charged particles that can move. When more of the electrons move in one direction around the doughnut than in the other direction, it will create a current. So a fusion experiment should (1) create an extremely strong magnetic field pointing around the doughnut, (2) heat deuterium and tritium to 100 million degrees inside the doughnut, and (3) drive a current around the doughnut. The magnetic field can be created by superconducting electromagnetic coils which go around and through the doughnut. Turning on the coils provides some initial heating and current, but to sustain it, you need to inject accelerated particles or waves from the side. This kind of fusion experiment is called a tokamak [10]. Figure 5: The coils and magnetic fields of a tokamak. Small, Medium, and Large Experiments I find it helpful to classify fusion experiments by their size. This is not standardized, so different people will classify them differently. The larger the experiment, the farther the particles have to move (perpendicular to the magnetic field) to get from the core to the outer edge. Larger experiments inherently have a longer confinement time. Small fusion experiments are sometimes called ‘tabletop’ experiments. This doesn't always mean that they fit on a tabletop, but they can fit in the physics building of a research university without too much disruption. The doughnut has a radius of about 1 m. 
The support requirements (power supply, control systems, measuring equipment, etc.) aren't too different from other physics labs. Figure 6: The first tokamak, T-1, did fit on a tabletop. Medium fusion experiments have a radius of about 1.5 - 3 m. They require their own facility for all of their support systems, but they typically fit in a single building. One prominent medium experiment is JET [11]. Figure 7: Someone inside JET. They have to wear a protective suit because tritium is nasty stuff. Large fusion experiment means ITER [12], an experiment currently under construction in southern France. ITER has a diameter of over 6 meters. The experiment itself has a five-story building. Supporting buildings cover about 100 acres or 0.5 km². Figure 8: Construction at ITER as of May 2021. ITER We can now answer some of our earlier questions. The reason progress has stalled is that we did as much as we could do on medium experiments. No country has been willing to provide enough money to build its own large experiment. So the fusion community has been gathering money from all around the world for decades for a single project [13]. ITER is supported by Europe (EU + UK + Switzerland), the US (which withdrew in 1999 and rejoined in 2003 [14]), Russia, Japan, China, South Korea, and India. Figure 9: There are three people in this diagram. Can you find them? ITER is designed to get Q=10. Despite getting 10 times as much energy from fusion as we put into the plasma, ITER is not designed to get engineering breakeven. ITER is designed as an experiment, not as a power plant. There will be tons of measuring devices pointed inwards. There are four different ways to heat the plasma and drive the current. This all allows you to learn more, but it requires extra power and lowers the overall plant efficiency. ITER will be followed by a demonstration power plant, named DEMO [15]. A fully optimized power plant should be able to reach engineering breakeven as long as Q>5. 
This is why I chose Q=5 as my criterion for ‘getting fusion’. ITER is also testing multiple designs for the tritium breeding blanket. Tritium is expensive and radioactive, so you want to produce it on site. The D-T fusion reaction produces a neutron, which we want to absorb, so we can use it to produce tritium. ‘Breeding’ is when we use a neutron to produce a more useful isotope. It is a ‘blanket’ because it surrounds the entire plasma, keeping the neutrons from going anywhere else. The best reaction to produce tritium involves lithium-6: ${}^{6}_{3}\mathrm{Li} + {}^{1}_{0}\mathrm{n} \rightarrow {}^{4}_{2}\mathrm{He} + {}^{3}_{1}\mathrm{T}$. This reaction also releases energy, which increases the power produced by about 25%. The tritium breeding blanket needs to make this reaction occur as much as possible, to efficiently carry the heat away so it can be used to generate electricity, and to provide a way to extract the tritium produced. ITER is scheduled to begin its first experiments in 2025. Part of why I think that we are about to make rapid progress again is that we are finally getting a large experiment. There have been problems with ITER staying on schedule and under budget. This isn't surprising for a collaboration between governments representing over half the world's population. In 2014, ITER got a new director, recalculated its expected cost, and underwent a major restructuring. Since then, ITER has largely stuck to this schedule and budget. Recently, there has been a 6-month delay because the French nuclear agency did what nuclear regulatory agencies do best, but this has been the longest delay since 2014. It is still possible for ITER to fail. The biggest risk involves disruptions. Sometimes, the plasma in a tokamak becomes unstable and all of the plasma hits the wall at once. This could melt some extremely expensive equipment and take years to repair. If ITER cannot get disruptions under control, then it would be a failed experiment. This is especially challenging because pushing for higher Q makes disruptions more likely. 
ITER is planning on being extremely cautious: Experiments begin in 2025, but it won't operate at full capacity until 2035. ITER has been the focus of the fusion community now for decades. The Future of Fusion Energy similarly makes ITER the centerpiece of the book. Things. Have. Changed. ITER by itself is not enough to justify the high level of confidence I express at the start. When Parisi & Ball finished writing this book in April 2018, ITER was basically the only game in town. Since then, Things. Have. Changed. Historically, private fusion companies were almost entirely jokes or frauds. They make outlandish claims, use completely different designs so they can't build on the progress of Figure 3, and they can be safely ignored. For example, Lockheed Martin [16] claims that it will take them five years to build a prototype of a fusion power plant that will fit in a truck. They have yet to publish evidence that they have produced a fully ionized plasma. Maybe they're just being secretive, but their design has solid components in the plasma. That won't work. A new generation of private companies has surged into fusion. Leading the charge are Commonwealth Fusion Systems and their tokamak SPARC [17]. Recent advances in high temperature superconductors have been a game changer. They can produce a much stronger magnetic field which allows for better confinement in a smaller experiment. We should now be able to get Q=10 in a medium experiment, which costs ten times less than ITER [18] and is within the reach of private venture capital. Figure 10: Finding the person here is much easier. When the Department of Energy decided to close the third largest plasma experiment in the US, the MIT group which ran it found itself adrift. They founded Commonwealth Fusion Systems in 2018 with a goal of getting fusion within 10 years [19]. 
Since then, they have built the first ever high temperature superconducting coil in 2019, released their engineering plans for SPARC in 2020, began construction in 2021, and plan on finishing construction in 2025. Commonwealth Fusion had just been founded when Parisi & Ball wrote in 2018. Now they're leading the race to fusion. Several other startups are following SPARC's strategy of using stronger magnetic fields to get fusion in a smaller experiment. They use a variety of designs. Alternative Designs To understand how the alternative designs are different, we need to make sure we understand the basic strategy for getting fusion in a tokamak. Let's run through it again: (A) We want to get lots of fusion reactions … … so we want a large triple product (density * temperature * confinement time). (B) The fusion plasma is too hot to touch solid objects … … so we put it in a magnetic bottle shaped like a doughnut. (C) The particles drift outwards, leaving the bottle … … so we twist the magnetic field with a current in the plasma. I will start with the alternatives that are most similar to a tokamak. For each one, I will list the best experiments that currently exist, where they're located, and the year they began operation. Tokamaks have been better researched than any other strategy. There are currently 10 medium tokamaks: T-10 (Russia, 1975)
Inline links: https://substackcdn.com/image/fetch/$s_!2BEX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbf3cd22-d0d7-4085-8c82-53b6a0044077_1537x921.jpeg, https://substackcdn.com/image/fetch/$s_!DPSJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2818f967-a7b1-4c43-abe5-0fb9d31652d0_600x414.png, https://substackcdn.com/image/fetch/$s_!kowm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbc65bfe-e234-463b-beab-b8253808cfba_900x658.jpeg, https://substackcdn.com/image/fetch/$s_!-mJo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fc8133323-617f-4c9a-92c2-b0859184ef77_850x623.jpeg, https://substackcdn.com/image/fetch/$s_!DEp_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0c8e44c-275b-4056-93f7-9ed10fb79bd8_1600x987.jpeg, https://substackcdn.com/image/fetch/$s_!RLSc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4d89bbdd-da25-4297-afd6-871f116f2355_1600x783.jpeg, https://substackcdn.com/image/fetch/$s_!n_7L!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F80bdc472-0b30-4841-8b08-9527c70f16f4_820x369.jpeg
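As a consistency check on the Figure 3 claims in the passage above, the sketch below compounds a 1.8-year doubling time over 1970-2000 and compares the result with the quoted "five orders of magnitude". The function name is hypothetical, and the 2-year Moore's Law doubling time is a commonly quoted figure assumed here for comparison; it does not come from the source.

```python
import math


def growth_factor(years: float, doubling_time_years: float) -> float:
    """Total multiplicative growth after `years` of steady exponential doubling."""
    if doubling_time_years <= 0:
        raise ValueError("doubling time must be positive")
    return 2.0 ** (years / doubling_time_years)


# The passage says the data runs from 1970 to 2000.
SPAN_YEARS = 2000 - 1970

# 1.8 years is the triple-product doubling time quoted in the passage.
triple_product_growth = growth_factor(SPAN_YEARS, 1.8)

# Assumption (not from the source): a 2-year doubling time for Moore's Law.
moore_growth = growth_factor(SPAN_YEARS, 2.0)

triple_product_orders = math.log10(triple_product_growth)
moore_orders = math.log10(moore_growth)

print(f"triple product: about 10^{triple_product_orders:.1f} over {SPAN_YEARS} years")
print(f"2-year doubling: about 10^{moore_orders:.1f} over {SPAN_YEARS} years")
```

Under these assumptions, thirty years of doubling every 1.8 years gives a factor of roughly 10^5, which matches the quoted five orders of magnitude, while a 2-year doubling time gives about 10^4.5 over the same span.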