Paul

Article

Paul is a recurring person in the Astral Codex Ten archive, appearing 11 times across 11 issues between February 23, 2022 and February 12, 2026. The archive places him in contexts such as “chess ones Paul recently commissioned”; “Paul’s table of how effectively Nature tends to outperform humans”; and “Paul doesn’t give specifics”. He most often appears alongside Eliezer Yudkowsky, ACX, and AGI.

Metadata

  • Category: People
  • Mention count: 11
  • Issue count: 11
  • First seen: February 23, 2022
  • Last seen: February 12, 2026

Appears In

Source Context

Recovered passages from the original issue text. When the raw archive preserved outbound links inside the source passage, they are listed directly under the quote.

February 23, 2022 · Original source
Ajeya Cotra is a senior research analyst at OpenPhil. She's assisted by her fiancé Paul Christiano (compsci PhD, OpenAI veteran, runs an AI alignment nonprofit) and to a lesser degree by other leading lights. Although not everyone involved has formal ML training, if you care a lot about whether efforts are “establishment” or “contrarian”, this one is probably more establishment.
Source: This document by Paul Christiano. Ajeya combines this with another metric where they see how existing AI compares to animals with apparently similar computational capacity; for example, she says that DeepMind’s Starcraft engine has about as much inferential compute as a honeybee and seems about equally subjectively impressive. I have no idea what this means. Impressive at what? Winning multiplayer online games? Stinging people? In any case, they decide to penalize AI by one order of magnitude compared to Nature, so a human-level AI would need to do 10^16 floating point operations per second. How Much Compute Would It Take To Train A Model That Does 10^16 Floating Point Operations Per Second? So an AI could potentially equal the human brain with 10^16 FLOP/S. Good news! There’s a supercomputer in Japan that can do 10^17 FLOP/S! It looks like this (source) So why don’t we have AI yet? Why don’t we have ten AIs? In the modern paradigm of machine learning, it takes very big computers to train relatively small end-product AIs. If you tried to train GPT-3 on the same kind of medium-sized computers you run it on, it would take between tens and hundreds of years. Instead, you train GPT-3 on giant supercomputers like the ones above, get results in a few months, then run it on medium-sized computers, maybe ~10x better than the average desktop. But our hypothetical future human-level AI is 10^16 FLOP/S in inference mode. It needs to run on a giant supercomputer like the one in the picture. Nothing we have now could even begin to train it. There’s no direct and obvious way to convert inference requirements to training requirements. Ajeya tries assuming that each parameter will contribute about 10 FLOPs, which would mean the model would have about 10^15 parameters (GPT-3 has about 10^11 parameters). 
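As a rough check on the arithmetic in this passage, the sketch below recomputes the inference target and the implied parameter count. The variable names are illustrative, not taken from the report, and the 10^15 FLOP/S brain figure is the one the post uses elsewhere for the human brain.

```python
# Figures quoted in the text; all names here are illustrative assumptions.
BRAIN_FLOP_PER_S = 10**15   # human brain estimate used in the post
NATURE_PENALTY = 10         # AI penalized one order of magnitude relative to Nature
FLOP_PER_PARAM = 10         # assumed FLOPs contributed by each parameter
GPT3_PARAMS = 10**11        # approximate GPT-3 parameter count quoted in the text

# Inference compute a human-level AI would need under these assumptions.
inference_flop_per_s = BRAIN_FLOP_PER_S * NATURE_PENALTY

# Implied model size, and how much larger that is than GPT-3.
params = inference_flop_per_s // FLOP_PER_PARAM
ratio_to_gpt3 = params // GPT3_PARAMS

print(f"inference target: 10^{len(str(inference_flop_per_s)) - 1} FLOP/S")
print(f"implied parameters: 10^{len(str(params)) - 1}")
print(f"times larger than GPT-3: {ratio_to_gpt3}")
```

Under these stated assumptions the hypothetical model comes out about four orders of magnitude larger than GPT-3.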
Finally, she uses some empirical scaling laws derived from looking at past machine learning projects to estimate that training 10^15 parameters would require H*10^30 FLOPs, where H represents the model’s “horizon”. If I understand this correctly, “horizon” is a reinforcement learning concept: how long does it take to learn how much reward you got for something? If you’re playing a slot machine, the answer is one second. If you’re starting a company, the answer might be ten years. So what horizon do you need for human level AI? Who knows? It probably depends on what human-level task you want the AI to do, plus how well an AI can learn to do that task from things less complex than the entire task. If writing a good book is mostly about learning to write good sentences and then stringing them together, a book-writing AI can get away with a short horizon. If nothing short of writing an entire book and then evaluating it to see whether it is good or bad can possibly teach you book-writing, the AI will need a long time horizon. Ajeya doesn’t claim to have a great answer for this, and considers three models: horizons of a few minutes, a few hours, and a few years. Each step up adds another three orders of magnitude, so she ends up with three estimates of 10^30, 10^33, and 10^36 FLOPs. (for reference, the lowest training estimate - 10^30 - would take the supercomputer pictured above 300,000 years to complete; the highest, 300 billion.) Or What If We Ignore All Of That And Do Something Else? This is piling a lot of assumptions atop each other, so Ajeya tries three other methods of figuring out how hard this training task is. Humans seem to be human-level AIs. How much training do we need? You can analogize our childhood to an AI’s training period. We receive a stream of sense-data. We start out flailing kind of randomly. Some of what we do gets rewarded. Some of what we do gets punished. Eventually our behavior becomes more sophisticated. 
We subject our new behavior to reward or punishment, fine-tune it further. Rent asks us: how do you measure the life of a woman or man? It answers: “in daylights, in sunsets, in midnights, in cups of coffee; in inches, in miles, in laughter, in strife.” But you can also measure in floating point operations, in which case the answer is about 10^24. This is actually trivial: multiply the 10^15 FLOP/S of the human brain by the ~10^9 seconds of childhood and adolescence. This new estimate of 10^24 is much lower than our neural net estimate of 10^30 - 10^36 above. In fact, it’s only a hair above the amount it took to train GPT-3! If human-level AI was this easy, we should have hit it by accident sometime in the process of making a GPT-4 prototype. Since OpenAI hasn’t mentioned this, probably it’s harder than this and we’re missing something. Probably we’re missing that humans aren’t blank slates. We don’t start at zero and then only use our childhood to train us further. The very structure of our brain encodes certain assumptions about what kinds of data we should be looking out for and how we should use it. Our training data isn’t just what we observed during childhood, it’s everything that any of our ancestors observed during evolution. How many floating-point operations is the evolutionary process? Ajeya estimates 10^41. I can’t believe I’m writing this. I can’t believe someone actually estimated the number of floating point operations involved in jellyfish rising out of the primordial ooze and eventually becoming fish and lizards and mammals and so on all the way to the Ascent of Man. Still, the idea is simple. You estimate how long animals with neurons have been around for (10^16 seconds), total number of animals at any given second (10^20) times average number of FLOPS per animal (10^5) and you can read more here but it comes out to 10^41 FLOPs. 
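The anchors computed so far are all simple products of powers of ten, so they are easy to re-derive. The following is a minimal sketch using only figures quoted in the text; the dictionary keys, function name, and the 365.25-day year are illustrative choices rather than anything from the report.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # assumed year length for the conversion
SUPERCOMPUTER_FLOP_PER_S = 10**17       # the Japanese supercomputer mentioned in the text

# Neural-net anchors: H * 10^30 FLOPs, each longer horizon adding three orders of magnitude.
neural_net_flop = {
    "short horizon": 10**30,
    "medium horizon": 10**33,
    "long horizon": 10**36,
}

# Lifetime anchor: brain FLOP/S times seconds of childhood and adolescence.
lifetime_flop = 10**15 * 10**9

# Evolution anchor: seconds animals with neurons have existed,
# times animals alive at any moment, times FLOP/S per (nematode-like) animal.
evolution_flop = 10**16 * 10**20 * 10**5


def years_on_supercomputer(total_flop):
    """Wall-clock years for the quoted supercomputer to perform total_flop operations."""
    return total_flop / SUPERCOMPUTER_FLOP_PER_S / SECONDS_PER_YEAR


for name, flop in neural_net_flop.items():
    print(f"{name}: {years_on_supercomputer(flop):.2e} supercomputer-years")
print(f"lifetime anchor: 10^{len(str(lifetime_flop)) - 1} FLOPs")
print(f"evolution anchor: 10^{len(str(evolution_flop)) - 1} FLOPs")
```

The short-horizon figure lands a little above the 300,000 years quoted in the text, which is consistent with the text rounding to one significant figure.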
I would not call this an exact estimate - for one thing, it assumes that all animals are nematodes, on the grounds that non-nematode animals are basically a rounding error in the grand scheme of things. But it does justify this bizarre assumption, and I don’t feel inclined to split hairs here - surely the total amount of computation performed by evolution is irrelevant except as an extreme upper bound? Surely the part where Australia got all those weird marsupials wasn’t strictly necessary for the human brain to have human-level intelligence? One more weird human training data estimate attempt: what about the genome? If in some sense a bit of information in the genome is a “parameter”, how many parameters does that suggest humans have, and how does it affect training time? Ajeya calculates that the genome has about 7.5x10^8 parameters (compared to 10^15 parameters in our neural net calculation, and 10^11 for GPT-3). So we can… Okay, I’ve got to admit, this doesn’t have quite the same “huh?!” factor as trying to calculate the number of FLOPs in evolution, but it is in a lot of ways even crazier. The Japanese canopy plant has a genome fifty times larger than ours, which suggests that genome size doesn’t correspond very well to organism awesomeness. Also, most of the genome is coding for weird proteins that stabilize the shape of your kidney tubule or something, why should this matter for intelligence? The Japanese canopy plant. I think it is very pretty, but probably low prettiness per megabyte of DNA. I think Ajeya would answer that she’s debating orders of magnitude here, and each of these weird things costs only a few OOMs and probably they all even out. 
That still leaves the question of why she thinks this approach is interesting at all, to which she answers that: The motivating intuition is that evolution performed a search over a space of small, compact genomes which coded for large brains rather than directly searching over the much larger space of all possible large brains, and human researchers may be able to compete with evolution on this axis. So maybe instead of having to figure out how to generate a brain per se, you figure out how to generate some short(er) program that can output a brain? But this would be very different from how ML works now. Also, you need to give each short program the chance to unfold into a brain before you can evaluate it, which evolution has time for but we probably don’t. Ajeya sort of mentions these problems and counters with an argument that maybe you could think of the genome as a reinforcement learner with a long horizon. I don’t quite follow this but it sounds like the sort of thing that almost might make sense. Anyway, when you apply the scaling laws to a 7.5*10^8 parameter genome and penalize it for a long horizon, you get about 10^33 FLOPs, which is weirdly similar to some of the other estimates. So now we have six different training cost estimates. First, neural nets with short, medium, and long horizons, which are 10^30, 10^33, and 10^36 FLOPs, respectively. Next, the amount of training data in a human lifetime - 10^24 FLOPs - and in all of evolutionary history - 10^41 FLOPs. And finally, this weird genome thing, which is 10^33 FLOPs. An optimist might say “Well, our lowest estimate is 10^24 FLOPs, our highest is 10^41 FLOPs, those sound like kind of similar numbers, at least there’s no “5 FLOPs” or “10^9999 FLOPs” in there.” A pessimist might say “The difference between 10^24 and 10^41 is seventeen orders of magnitude, ie a factor of 100,000,000,000,000,000 times. 
This barely constrains our expectations at all!” Before we decide who to trust, let’s remember that we’re still only at Step 2 of our eight step Methodology, and continue. How Do We Adjust For Algorithmic Progress? So today, in 2022 (or in 2020 when this was written, or whenever), assume it would take about 10^33 FLOPs to train a human-level AI. But technology constantly advances. Maybe we’ll discover ways to train AIs faster, or run AIs more efficiently, or something like that. How does that factor into our estimate? Ajeya draws on Hernandez & Brown’s Measuring The Algorithmic Efficiency Of Neural Networks. They look at how many FLOPs it took to train various image recognition AIs to an equivalent level of performance between 2012 and 2019, and find that over those seven years it decreased by a factor of 44x, ie training efficiency doubles every sixteen months! Ajeya assumes a doubling time slightly longer than that, because it’s easier to make progress in simple well-understood fields like image recognition than in the novel task of human-level AI. She chooses a doubling time of “merely” 2 - 3 years. If training efficiency doubles every 2-3 years, it would dectuple in about 10 years. So although it might take 10^33 FLOPs to train a human level AI today, in ten years or so it may take only 10^32, in twenty years 10^31, and so on. When Will Anyone Have Enough Computational Resources To Train A Human-Level AI? In 2020, AI researchers could buy computational resources at about $1 for 10^17 FLOPs. That means the 10^33 FLOPs you’d need to train a human-level AI would cost $10^16, ie ten quadrillion dollars. This is about twenty times more money than exists in the entire world. But compute costs fall quickly. Some formulations of Moore’s Law suggest it halves every eighteen months. These no longer seem to hold exactly, but it does seem to be halving maybe once every 2.5 years. 
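The doubling-time and cost figures in this passage follow from a few lines of arithmetic. Here is a minimal sketch, with hypothetical helper names, that converts a total efficiency gain into a doubling time, a doubling time into a time-to-tenfold, and a FLOP budget into dollars at the quoted 2020 price.

```python
import math


def doubling_time_months(total_factor, years):
    """Doubling time implied by an overall improvement of total_factor over `years`."""
    return years * 12 / math.log2(total_factor)


def years_to_tenfold(doubling_years):
    """Years needed for a 10x improvement at a given doubling time."""
    return doubling_years * math.log2(10)


# Hernandez & Brown figure quoted in the text: 44x over the seven years 2012-2019.
print(f"implied doubling time: {doubling_time_months(44, 7):.1f} months")

# Ajeya's slower assumption of a 2-3 year doubling time.
for doubling_years in (2, 3):
    print(f"doubling every {doubling_years} years -> 10x in "
          f"{years_to_tenfold(doubling_years):.1f} years")

# Training cost at the quoted 2020 price of $1 per 10^17 FLOPs.
TRAINING_FLOP = 10**33
FLOP_PER_DOLLAR = 10**17
cost_dollars = TRAINING_FLOP // FLOP_PER_DOLLAR
print(f"training cost: $10^{len(str(cost_dollars)) - 1}")
```

The 44x figure works out to a bit over fifteen months per doubling, which the text rounds to sixteen, and the "about 10 years" to dectuple corresponds to the slow end of the 2-3 year range.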
The exact number is kind of controversial: Ajeya admits it’s been more like once every 3-4 years lately, but she heard good things about some upcoming chips and predicted it might revert back to the longer-term faster trend (it’s been two years now, some new chips have come out, and this prediction is looking pretty good). So as time goes on, algorithmic progress will cut the cost of training (in FLOPs), and hardware progress will also cut the cost of FLOPs (in dollars). So training will become gradually more affordable as time goes on. Once it reaches a cost somebody is willing to pay, they’ll buy human-level AI, and then that will be the year human-level AI happens. What is the cost that somebody (company? government? billionaire?) is willing to pay for human-level AI? The most expensive AI training in history was AlphaStar, a DeepMind project that spent over $1 million to train an AI to play StarCraft (in their defense, it won). But people have been pouring more and more money into AI lately: Source here. This is about compute rather than cost, but most of the increase seen here has been companies willing to pay for more compute over time, rather than algorithmic or hardware progress. The StarCraft AI was kind of a vanity project, or science for science’s sake, or whatever you want to call it. But AI is starting to become profitable, and human-level AI would be very profitable. Who knows how much companies will be willing to pay in the future? Ajeya extrapolates the line on the graph forward to 2025 and gets $1 billion. This is starting to sound kind of absurd - the entire company OpenAI was founded with $1 billion in venture capital, it seems like a lot to expect them to spend more than $1 billion on a single training run. So Ajeya backs off from this after 2025 and predicts a “two year doubling time”. This is not much of a concession. It still means that in 2040 someone might be spending $100 billion to train one AI. Is this at all plausible? 
At the height of the Manhattan Project, the US was investing about 0.5% of its GDP into the effort; a similar investment today would be worth $100 billion. And we’re about twice as rich as 2000, so 2040 might be twice as rich as we are. At that point, $100 billion for training an AI is within reach of Google and maybe a few individual billionaires (though it would still require most or all of their fortune). Ajeya creates a complicated function to assess how much money people will be willing to pay on giant AI projects per year. This looks like an upward-sloping curve. The line representing the likely cost of training a human-level AI looks like a downward sloping curve. At some point, those two curves meet, representing when human-level AI will first be trained. So When Will We Get Human-Level AI? The report gives a long distribution of dates based on weights assigned to the six different models, each of which has really wide confidence intervals and options for adjusting the mean and variance based on your assumptions. But the median of all of that is 10% chance by 2031, 50% chance by 2052, and almost 80% chance by 2100. Ajeya takes her six models and decides to weigh them like so, based on how plausible she thinks each one is: 20% neural net, short horizon 30% neural net, medium horizon 15% neural net, long horizon 5% human lifetime as training data 10% evolutionary history as training data 10% genome as parameter number She ends up with this: How Sensitive Is This To Changes In Assumptions? She very helpfully gives us a Colab notebook and Google spreadsheet to play around with. The notebook lets you change some of the more detailed parameters of the individual models, and the spreadsheet lets you change the big picture. I leave the notebook to people more dedicated to forecasting than I am, and will talk about the spreadsheet here. 
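The "two curves meet" idea can be illustrated with a deliberately crude point-estimate version. This is only a toy sketch: it uses the single 10^33 FLOPs figure, the 2.5-year hardware halving time, the ten-year algorithmic tenfold, and the $1 billion in 2025 with two-year spending doubling quoted in the text, and it ignores the probability distributions, the GDP-based spending cap, and the other five anchors that the report actually relies on. All names are hypothetical.

```python
START_YEAR = 2020
TRAINING_FLOP_2020 = 1e33        # single point estimate used in the text
FLOP_PER_DOLLAR_2020 = 1e17      # quoted 2020 price of compute
HARDWARE_HALVING_YEARS = 2.5     # dollars per FLOP halve this often
ALGO_TENFOLD_YEARS = 10.0        # FLOPs required fall 10x over this many years
SPEND_2025 = 1e9                 # extrapolated largest training run in 2025
SPEND_DOUBLING_YEARS = 2.0       # assumed doubling time of spending after 2025


def training_cost_dollars(year):
    """Downward-sloping curve: dollars needed to train the model in a given year."""
    elapsed = year - START_YEAR
    flop_needed = TRAINING_FLOP_2020 / 10 ** (elapsed / ALGO_TENFOLD_YEARS)
    flop_per_dollar = FLOP_PER_DOLLAR_2020 * 2 ** (elapsed / HARDWARE_HALVING_YEARS)
    return flop_needed / flop_per_dollar


def max_spend_dollars(year):
    """Upward-sloping curve: largest training budget anyone is assumed willing to pay."""
    return SPEND_2025 * 2 ** ((year - 2025) / SPEND_DOUBLING_YEARS)


def first_affordable_year(start=2025, end=2200):
    """First whole year in which the assumed budget covers the assumed cost."""
    for year in range(start, end + 1):
        if max_spend_dollars(year) >= training_cost_dollars(year):
            return year
    return None


print("toy crossing year:", first_affordable_year())
```

With these inputs the curves cross in the early 2040s, earlier than the report's 2052 median. That gap is expected rather than a contradiction, since this toy drops the uncertainty and the weighting across models that drive the report's result.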
If you’re following along at home, the default spreadsheet won’t reflect Ajeya’s findings until you fill in the table in the bottom left like so: Great. Now that we’ve got that, let’s try changing some stuff. I like the human childhood training data argument (Lifetime Anchor) more than Ajeya does, and I like the size-of-the-genome argument less. I’m going to change the weights to 20-20-0-20-20-20. Also, Ajeya thinks that someone might be willing to spend 1% of national GDP on training AIs, but that sounds really high to me, so I’m going to lower it to 0.1%. Also, Ajeya’s estimate of 3% GDP growth sounds high for the sort of industrialized nations who might do AI research, I’m going to lower it to 2%. Since I’m feeling mistrustful today, let’s use the Hernandez & Brown estimate for compute halving (1.5 years) in place of Ajeya’s ad hoc adjustments. And let’s use the current compute halving time (3.5 years) instead of Ajeya’s overly rosy version (2.5 years). All these changes… …don’t really do much. The median goes from 2052 to about 2065. Four of the models give results between 2030 and 2070. The last two, Neural Net With Long Horizon and Evolution, suggest probably no AI this century (although Neural Net With Long Horizon does think there’s a 40% chance by 2100). Ajeya doesn’t really like either of these models and they’re not heavily weighted in her main result. Does The Truth Point To Itself? Back up a second. Here’s something that makes me kind of nervous. Most of Ajeya’s numbers are kind of made up, with several order-of-magnitude error bars and simplifying assumptions like “all animals are nematodes”. For a single parameter, we get estimates spanning seventeen different orders of magnitude: the upper bound is one hundred quadrillion times the lower bound. And yet four of the six models, including two genuinely exotic ones, manage to get dates within twenty years of 2050. And 2050 is also the date everyone else focuses on. 
Here’s the prediction-market-like site Metaculus: Their distribution looks a lot like Ajeya’s, and even has the same median, 2052 (though forecasters could have read Ajeya’s report). Katja Grace et al surveyed 352 AI experts, and they gave a median estimate of 2062 for an AI that could “outperform humans at all tasks” (though with many caveats and high sensitivity to question framing). This was before Ajeya’s report, so they definitely didn’t read it. So lots of Ajeya’s different methods and lots of other people presumably using different methodologies or no methodology at all, all converge on this same idea of 2050 give or take a decade or two. An optimist might say “The truth points to itself! There are 371 known proofs of the Pythagorean Theorem, and they all end up in the same place. That’s because no matter what methodology you use, if you use it well enough you get to the correct answer.” A pessimist might be more suspicious; we’ll return to this part later. FLOPS Alone Turn The Wheel Of History One more question: what if this is all bullshit? What if it’s an utterly useless total garbage steaming pile of grade A crap? Imagine a scientist in Victorian Britain, speculating on when humankind might invent ships that travel through space. He finds a natural anchor: the moon travels through space! He can observe things about the moon: for example, it is 220 miles in diameter (give or take an order of magnitude). So when humankind invents ships that are 220 miles in diameter, they can travel through space! Ships have certainly grown in size tremendously, from primitive kayaks to Roman triremes to Spanish galleons to the great ocean liners of the (Victorian) present. The AI forecasting organization AI Impacts actually has a whole report on historical ship size trends to prove an unrelated point about technological progress, so I didn’t even have to make this graph up. Suppose our Victorian scientist lived in 1858, right when the Great Eastern was launched. 
The trend line for ship size crossed 100m around 1843, and 200m in 1858, so doubling time is 15 years - but perhaps they notice this is going to be an outlier, so let’s round up a bit and say 18 years. The (one order of magnitude off estimate for the size of the) Moon is 350,000m, so you’d need ships to scale up by 350,000/200 = 1,750x before they’re as big as the Moon. That’s about 10.8 doublings, and a doubling time is 18 years, so we’ll get spaceships in . . . 2052 exactly. (fudging numbers to land where you want is actually fun and easy) SS Great Eastern, the extreme outlier large steamship from 1858. This has become sort of a mascot for quantitative technological progress forecasters. What is this scientist’s error? The big one is thinking that spaceship progress depends on some easily-measured quantity (size) instead of on fundamental advances (eg figuring out how rockets work). You can make the same accusation against Ajeya et al: you can have all the FLOPs in the world, but if you don’t understand how to make a machine think, your AI will be, well, a flop. Ajeya discusses this a bit on page 143 of her report. There is some sense in which FLOPs and knowing-what-you’re-doing trade off against each other. If you have literally no idea what you’re doing, you can sort of kind of re-run evolution until it comes up with something that looks good. If things are somehow even worse than that, you could always run AIXI, a hypothetical AI design guaranteed to get excellent results as long as you have infinite computation. You could run a Go engine by searching the entire branching tree structure of Go - you shouldn’t, and it would take a zillion times more compute than exists in the entire world, but you could. So in some sense what you’re doing, when you’re figuring out what you’re doing, is coming up with ways to do already-possible things more efficiently. But that’s just algorithmic progress, which Ajeya has already baked into her model. 
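The Victorian ship extrapolation earlier in this passage is easy to reproduce, which is part of the joke. A minimal sketch using the figures quoted in the text (the constant names are illustrative):

```python
import math

LAUNCH_YEAR = 1858          # launch of the Great Eastern
SHIP_LENGTH_M = 200         # trend-line ship length in 1858
MOON_DIAMETER_M = 350_000   # the deliberately wrong "220 mile" Moon
DOUBLING_YEARS = 18         # rounded-up doubling time for ship length

# Number of doublings needed for a ship to reach the (mis-estimated) Moon's size.
doublings = math.log2(MOON_DIAMETER_M / SHIP_LENGTH_M)

# Year in which the trend line reaches that size.
spaceship_year = LAUNCH_YEAR + doublings * DOUBLING_YEARS

print(f"scale-up needed: {MOON_DIAMETER_M / SHIP_LENGTH_M:.0f}x")
print(f"doublings: {doublings:.1f}")
print(f"forecast year: {spaceship_year:.0f}")
```

Changing `DOUBLING_YEARS` from 18 back to the measured 15 moves the answer by more than three decades, which is the point the parenthetical about fudging numbers is making.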
(our Victorian scientist: “As a reductio ad absurdum, you could always stand the ship on its end, and then climb up it to reach space. We’re just trying to make ships that are more efficient than that.”) Part II: Biology-Inspired AI Timelines: The Trick That Never Works Eliezer Yudkowsky presents a more subtle version of these kinds of objection in an essay called Biology-Inspired AI Timelines: The Trick That Never Works, published December 2021. Ajeya’s report is a 169-page collection of equations, graphs, and modeling assumptions. Yudkowsky’s rebuttal is a fictional dialogue between himself, younger versions of himself, famous AI scientists, and other bit players. At one point, a character called “Humbali” shows up begging Yudkowsky to be more humble, and Yudkowsky defeats him with devastating counterarguments. Still, he did found the field, so I guess everyone has to listen to him. He starts: in 1988, famous AI scientist Hans Moravec predicted human-level AI by 2010. He was using the same methodology as Ajeya: extrapolate how quickly processing power would grow (in FLOP/S), and see when it would match some estimate of the human brain. Moravec got the processing power almost exactly right (it hit his 2010 projection in 2008) and his human brain estimate pretty close (he says 10^13 FLOP/S, Ajeya says 10^15, this 2 OOM difference only delays things a few years), yet there was not human-level AI in 2010. What happened? Ajeya's answer could be: Moravec didn't realize that, in the modern ML paradigm, any given size of program requires a much bigger program to train. Ajeya, who has a 35-year advantage on Moravec, estimates approximately the same power for the finished program (10^16 vs. 10^13 FLOP/S) but says that training the 10^16 FLOP/S program will require 10^33ish FLOPs. Eliezer agrees as far as it goes, but says this points to a much deeper failure mode, which was that Moravec had no idea what he was doing. 
He was assuming processing power of human brain = processing power of computer necessary for AGI. Why? The human brain consumes around 20 watts of power. Can we thereby conclude that an AGI should consume around 20 watts of power, and that, when technology advances to the point of being able to supply around 20 watts of power to computers, we'll get AGI? […] You say that AIs consume energy in a very different way from brains? Well, they'll also consume computations in a very different way from brains! The only difference between these two cases is that you know something about how humans eat food and break it down in their stomachs and convert it into ATP that gets consumed by neurons to pump ions back out of dendrites and axons, while computer chips consume electricity whose flow gets interrupted by transistors to transmit information. Since you know anything whatsoever about how AGIs and humans consume energy, you can see that the consumption is so vastly different as to obviate all comparisons entirely. You are ignorant of how the brain consumes computation, you are ignorant of how the first AGIs built would consume computation, but "an unknown key does not open an unknown lock" and these two ignorant distributions should not assert much internal correlation between them. Cars don’t move by contracting their leg muscles and planes don’t fly by flapping their wings like birds. Telescopes do form images the same way as the lenses in our eyes, but differ by so many orders of magnitude in every important way that they defy comparison. Why should AI be different? You have to use some specific algorithm when you’re creating AI; why should we expect it to be anywhere near the same efficiency as the ones Nature uses in our brains? The same is true for arguments from evolution, eg Ajeya’s Evolutionary Anchor, ie “it took evolution 10^41 FLOPs of computation to evolve the human brain so maybe that will be the training cost”. 
AI scientists sitting in labs trying to figure things out, and nematodes getting eaten by other nematodes, are such different methods for designing things that it’s crazy to use one as an estimate for the other. Algorithmic Progress vs. Algorithmic Paradigm Shifts This post is a dialogue, so (Eliezer’s hypothetical model of) OpenPhil gets a chance to respond. They object: this is why we put a term for algorithmic progress in our model. The model isn’t very sensitive to changes in that term. If you want you can set it to some kind of crazy high value and see what happens, but you can’t say we didn’t consider it. OpenPhil: We did already consider that and try to take it into account: our model already includes a parameter for how algorithmic progress reduces hardware requirements. It's not easy to graph as exactly as Moore's Law, as you say, but our best-guess estimate is that compute costs halve every 2-3 years […] Eliezer: The makers of AGI aren't going to be doing 10,000,000,000,000 rounds of gradient descent, on entire brain-sized 300,000,000,000,000-parameter models, algorithmically faster than today. They're going to get to AGI via some route that you don't know how to take, at least if it happens in 2040. If it happens in 2025, it may be via a route that some modern researchers do know how to take, but in this case, of course, your model was also wrong. They're not going to be taking your default-imagined approach algorithmically faster, they're going to be taking an algorithmically different approach that eats computing power in a different way than you imagine it being consumed. OpenPhil: Shouldn't that just be folded into our estimate of how the computation required to accomplish a fixed task decreases by half every 2-3 years due to better algorithms? 
Eliezer: Backtesting this viewpoint on the previous history of computer science, it seems to me to assert that it should be possible to: Train a pre-Transformer RNN/CNN-based model, not using any other techniques invented after 2017, to GPT-2 levels of performance, using only around 2x as much compute as GPT-2;
Play pro-level Go using 8-16 times as much computing power as AlphaGo, but only 2006 levels of technology. For reference, recall that in 2006, Hinton and Salakhutdinov were just starting to publish that, by training multiple layers of Restricted Boltzmann machines and then unrolling them into a "deep" neural network, you could get an initialization for the network weights that would avoid the problem of vanishing and exploding gradients and activations. At least so long as you didn't try to stack too many layers, like a dozen layers or something ridiculous like that. This being the point that kicked off the entire deep-learning revolution. Your model apparently suggests that we have gotten around 50 times more efficient at turning computation into intelligence since that time; so, we should be able to replicate any modern feat of deep learning performed in 2021, using techniques from before deep learning and around fifty times as much computing power. OpenPhil: No, that's totally not what our viewpoint says when you backfit it to past reality. Our model does a great job of retrodicting past reality. Eliezer: How so? OpenPhil: <Eliezer cannot predict what they will say here.> I think the argument here is that OpenPhil is accounting for normal scientific progress in algorithms, but not for paradigm shifts. Directional Error These are the two arguments Eliezer makes against OpenPhil that I find most persuasive. First, that you shouldn’t be using biological anchors at all. Second, that unpredictable paradigm shifts are more realistic than gradual algorithmic progress. These mostly add uncertainty to OpenPhil’s model, but Eliezer ends his essay making a stronger argument: he thinks OpenPhil is directionally wrong, and AI will come earlier than they think. Mostly this is the paradigm argument again. Five years from now, there could be a paradigm shift that makes AI much easier to build. 
It’s happened before; from GOFAI’s pre-programmed logical rules to Deep Blue’s tree searches to the sorts of Big Data methods that won the Netflix Prize to modern deep learning. Instead of just extrapolating deep learning scaling thirty years out, OpenPhil should be worried about the next big idea. Hypothetical OpenPhil retorts that this is a double-edged sword. Maybe the deep learning paradigm can’t produce AGI, and we’ll have to wait decades or centuries for someone to have the right insight. Or maybe the new paradigm you need for AGI will take more compute than deep learning, in the same way deep learning takes more compute than whatever Moravec was imagining. This is a pretty strong response, since it would have been true for every previous forecaster: remember, Moravec erred in thinking AI would come too soon, not too late. So although Eliezer is taking the cheap shot of saying OpenPhil’s estimate will be wrong just as everyone else’s was wrong before, he’s also giving himself the much harder case of arguing it might be wrong in the opposite direction as all its predecessors. Eliezer takes this objection seriously, but feels like on balance probably new paradigms will speed up AI rather than slow it down. Here he grudgingly and with suitable embarrassment does try to make an object-level semi-biological-anchors-related argument: Moravec was wrong because he ignored the training phase. And the proper anchor for the training phase is somewhere between evolution and a human childhood, where evolution represents “blind chance eventually finding good things” and human childhood represents “an intelligent cognitive engine trying to squeeze as much data out of experience as possible”. And part of what he expects paradigm shifts to do is to move from more evolutionary processes to more childhood-like processes, and that’s a net gain in efficiency. 
So he still thinks OpenPhil’s methods are more likely to overestimate the amount of time until AGI rather than underestimate it. What Moore’s Law Giveth, Platt’s Law Taketh Away Eliezer’s other argument is kind of a low blow: he refers to Platt’s Law Of AI Forecasting: “any AI forecast will put strong AI thirty years out from when the forecast is made.” This isn’t exact. Hans Moravec, writing in 1988, said 2010 - so 22 years. Ray Kurzweil, writing in 2001, said 2023 - another 22 years. Vernor Vinge, in a 1993 speech, said 2023, and that was exactly 30 years, but Vinge knew about Platt’s Law and might have been joking. The point is: OpenPhil wrote a report in 2020 that predicted strong AI in 2052, isn’t that kind of suspicious? I’d previously mentioned it as a plus that Ajeya got around the same year everyone else got. The forecasters on Metaculus. The experts surveyed in Grace et al. Lots of other smart experts with clever models. But what if all of these experts and models and analyses are just fudging the numbers for the same Platt’s-Law-related reasons? Hypothetical OpenPhil is BTFO: OpenPhil: That part about Charles Platt's generalization is interesting, but just because we unwittingly chose literally exactly the median that Platt predicted people would always choose in consistent error, that doesn't justify dismissing our work, right? We could have used a completely valid method of estimation which would have pointed to 2050 no matter which year it was tried in, and, by sheer coincidence, have first written that up in 2020. In fact, we try to show in the report that the same methodology, evaluated in earlier years, would also have pointed to around 2050 - Eliezer: Look, people keep trying this. It's never worked. It's never going to work. 2 years before the end of the world, there'll be another published biologically inspired estimate showing that AGI is 30 years away and it will be exactly as informative then as it is now. 
I'd love to know the timelines too, but you're not going to get the answer you want until right before the end of the world, and maybe not even then unless you're paying very close attention. Timing this stuff is just plain hard. Part III: Responses And Commentary Response 1: Less Wrong Comments Less Wrong is a site founded by Eliezer Yudkowsky for Eliezer Yudkowsky fans who wanted to discuss Eliezer Yudkowsky’s ideas. So, for whatever it’s worth - the comments on his essay were pretty negative. Carl Shulman, an independent researcher with links to both OpenPhil and MIRI (Eliezer’s org), writes the top-voted comment. He works from a model where there is hardware progress, software progress downstream of hardware progress, and independent (ie unrelated to algorithms) software progress, and where the first two make up most progress on the margin. Researchers generally develop new paradigms once they have enough compute available to tinker with them. Progress in AI has largely been a function of increasing compute, human software research efforts, and serial time/steps. Throwing more compute at researchers has improved performance both directly and indirectly (e.g. by enabling more experiments, refining evaluation functions in chess, training neural networks, or making algorithms that work best with large compute more attractive). Historically compute has grown by many orders of magnitude, while human labor applied to AI and supporting software by only a few. And on plausible decompositions of progress (allowing for adjustment of software to current hardware and vice versa), hardware growth accounts for more of the progress over time than human labor input growth. 
So if you're going to use an AI production function for tech forecasting based on inputs (which do relatively OK by the standards of tech forecasting), it's best to use all of compute, labor, and time, but it makes sense for compute to have pride of place and take in more modeling effort and attention, since it's the biggest source of change (particularly when including software gains downstream of hardware technology and expenditures). […] A perfectly correlated time series of compute and labor would not let us say which had the larger marginal contribution, but we have resources to get at that, which I was referring to with 'plausible decompositions.' This includes experiments with old and new software and hardware, like the chess ones Paul recently commissioned, and studies by AI Impacts, OpenAI, and Neil Thompson. There are AI scaling experiments, and observations of the results of shocks like the end of Dennard scaling, the availability of GPGPU computing, and Besiroglu's data on the relative predictive power of compute and labor in individual papers and subfields. In different ways those tend to put hardware as driving more log improvement than software (with both contributing), particularly if we consider software innovations downstream of hardware changes. Vanessa Kosoy makes the obvious objection, which echoes a comment of Eliezer’s in the dialogue above: I'm confused how can this pass some obvious tests. For example, do you claim that alpha-beta pruning can match AlphaGo given some not-crazy advantage in compute? Do you claim that SVMs can do SOTA image classification with not-crazy advantage in compute (or with any amount of compute with the same training data)? Can Eliza-style chatbots compete with GPT3 however we scale them up? Mark Xu answers: My model is something like: For any given algorithm, e.g. SVMs, AlphaGo, alpha-beta pruning, convnets, etc., there is an "effective compute regime" where dumping more compute makes them better. 
If you go above this regime, you get steep diminishing marginal returns.
April 04, 2022 · Original source
For transhumanists, this debate has a kind of iconic status, like Lincoln-Douglas or the Scopes Trial. But Robin’s ideas seem a bit weird now (they also seemed a bit weird in 2008) - he thinks AIs will start out as uploaded human brains, and even wrote an amazing science-fiction-esque book of predictions about exactly how that would work. Since machine learning has progressed a lot faster than brain uploading has, this is looking less likely and probably makes his position less relevant than in 2008. The gradualist torch has passed to Paul Christiano, who wrote a 2018 post Takeoff Speeds revisiting some of Hanson’s old arguments and adding new ones.
(I didn’t realize this until talking to Paul, but “holder of the gradualist torch” is a relative position - Paul still thinks there’s about a 1/3 chance of a fast takeoff.)
Around the end of last year, Paul and Eliezer had a complicated, protracted, and indirect debate, culminating in a few hours on the same Discord channel. Although the real story is scattered over several blog posts and chat logs, I’m going to summarize it as if it all happened at once.
August 13, 2022 · Original source
The wolves belong to Leto Atreides II, the grandson of Duke Leto Atreides and son of Paul Muad’Dib Atreides, the Kwisatz Haderach and protagonist of Dune I: The One You’ve Probably Read. At the end of the third book, Leto fused his body with Arrakeen sandtrout, the larval form of the Sandworms on which the plot of the series mostly hangs. This symbiosis gave Leto super-human physical powers to match the clairvoyance already enjoyed by his family and allowed him to seize control of the galactic empire.
Though Leto spends the better part of 3500 years doing this, time is not the most vital aspect of his plan. The Bene Gesserit spent over 10,000 years on their breeding program, but focused on power alone; they succeeded in the form of Paul, but quickly lost control of their creation. Nor does the difference lie in creativity. The Tleilaxu are short-term genetic engineers who exert a great deal of control in their alterations. In doing so they create wonders, but also bend humanity in unnatural and inhuman directions. They are universally hated, a sort of society-sized uncanny valley reviled by all. The Ixians put their faith in technology, speeding towards instead of away from humanity’s demise.
Herbert’s books all have a theme. Dune is about teaching the reader about the untapped power of the desert, while Dune Messiah seeks to show them that power in action. Children of Dune is a warning that success can make you soft - since the harshness of the desert is what brings you power in this universe, bringing it under your control domesticates it and cuts into the very power base you rely on. When Paul Atreides blinks in the face of an overwhelming fate and is usurped by a younger, more vital generation, it’s no surprise.
October 31, 2022 · Original source
Paul T writes:
Thank you, Paul, for this very thoughtful explanation.
January 11, 2023 · Original source
Did anyone in your family (as per your best guess) die of COVID vaccine side effects? I got 917 responses so far. On Kirsch’s original poll, the answers were 3.5% and 7.9%; on my survey, they were 6.8% and 0.9%. I think my higher rate of COVID deaths was because I carelessly changed “household” to “family”, which includes eg extended family. But why did I get so many fewer vaccine deaths? Looking at these people's other responses, they did not show a consistent tendency to make things up or say outrageous things (except for one who listed their religion as “Satanist”). That having been said, they did have an atypical response pattern; most ACX readers are white male Westerners, but these people were 38% female, 38% nonwhite, and 88% non-American. Highest degree was 12% high school, 25% college grad, and 63% postgrad; IQs were listed as extremely high, just like everyone else who gives their IQs on my survey. Politics were significant for 25% Marxist (otherwise a rarity in my survey), but otherwise typical, and did not lean right-wing. They were slightly, but not overwhelmingly, more likely to distrust the media and dislike strong COVID responses than other survey respondents. Overall I don't feel like I learned too much from examining them. The survey is still open (take it now if you haven’t already!) and I'm hoping to get more data on this later. 5: Comments Pointing Out Very Clear Examples Of Media Lies Several people agreed with the wider point, but tried to find a counterexample - a media lie so explicit that nobody could ever deny it. Some people noted that the term “fake news”, when invented in 2016, was originally applied to a very specific kind of fake article, often from weird Macedonian article mills, that were saying utterly fake stuff in a way that even Infowars didn’t. 
Robert Stadler: This was what was interesting about the phenomenon of "fake news" during the 2016 election, before that term was successfully hijacked by Donald Trump to mean "news stories I don't like." There was a wave of what looked like news articles, spread largely via Facebook, that were entirely fictitious. The people writing those "articles" were not journalists and were not trying to be journalists. They made up the stories out of a mix of rumor and complete fabrications, either for political purposes or just as click-bait (this has never been entirely clear to me). It's unfortunate that the term "fake news" has been so thoroughly tainted, because the existence of those articles was genuinely noteworthy, and it's now harder to talk about them . . . I don't remember any myself (since it's been 6 years), but here's a study which has some specifics - http://web.stanford.edu/~gentzkow/research/fakenews.pdf After some searching, Benjamin Jest (writes As Fair A Name) was finally able to produce a specific example - Nancy Pelosi Hanged At Gitmo - which does, indeed, claim that leading US Democrat Nancy Pelosi was hanged at Guantanamo Bay for “treason and conspiracy” on December 27, 2022. It seems to suggest that the order was given by Donald Trump, who is still President, and that Hillary Clinton had already been executed in the same manner in April 2021. I will admit this is definitely an example of a “news source” making things up rather than just stretching the truth. The source, RealRawNews, claims on its About Page to be a “parody site”, but this outside article about them says they go back and forth between claiming to be a parody and claiming to be real. Some of their claims are more plausible than the Gitmo one - for example, that many Air Force pilots were resigning because of the COVID vaccine mandate - but equally false. 
They seem to go back and forth between “things that some conservatives might believe to be true” and “things that are obviously false but maybe gratify conservatives’ id”, adding or subtracting the “parody” label based on which one they’re doing at the time. It’s a fascinating business model, and I guess the term “fake news” fairly applies to it. Yug Gnirob writes: I don't know how to find them, but I definitely remember several completely fake articles about Trump during and immediately after the election. One of them was him citing "an ancient law" that prevented President Obama from doing... some liberal thing, I don't remember what. The most memorable one was immediately after the "Muslim Ban", where they claimed it had resulted in the arrest of a high-priority terrorist on day 1. I feel like that one showed up on one of the fact check sites, but I'm not seeing it on Snopes. I remember Stephen Colbert reporting the articles had been tracked down to a couple of Macedonian teens, who had discovered that writing fabricated pro-Trump articles was an easy way to make money. 6: Comments Making Other Claims Of Media Lies And Misdeeds — Beowulf888 on the LA Times and COVID: Well, there are media outlets that propagandize—but I think it boils down to if it bleeds it leads. Most corporate media outlets have the economic incentive to increase the readership by grabbing one's attention with scary headlines and articles. The perfect example of this phenomenon was in April 2020 when the LA Times interviewed an atmospheric chemist at Scripps. She made the claim that SARS2 virus particles in sewage were being carried back to land by sea spray. The reporters and editors uncritically relayed her comments as if she were an expert with the same credentialled expertise as a virologist or epidemiologist. There are numerous reasons why this would be very very low on the threat level even with what little we knew about the SARS2 virus at that time. 
This story was picked up by the media everywhere, and county health officials (either because there was public pressure to do so, or because they really believed her) shut down beaches up and down the coast of California. Did the LA Times and the news media really have any motivation to promote the closure of public beaches? I can't imagine they did. But they did have a scary headline that would promote readership and spread LA Times as a news source. Some weeks later the LA Times did a retraction, but by that time it had entered the popular imagination that beaches were a potential vector for COVID infection. I’m developing an allergy to the word “uncritically”. Being able to fact-check scientists is a rare skill - I’m not surprised nobody at the LA Times had it ready to deploy for this exact article. — Mike Mulligan writes: The pushback is largely because you are doing a false equivocation between the New York Times (who you hate and have a vendetta against) and Infowars (who you are pretending does basically the same thing as other outlets). And you know this, but on your own metric it won't count as a lie, because you just selectively misrepresented things. On the two articles in this series, I’ve included phrases like “This doesn’t mean these establishment papers are exactly as bad as Infowars; just that when they do err, it’s by committing a more venial version of the same sin Infowars commits” and “Again, my goal here isn’t to . . . say NYT is exactly as bad as Infowars” and tried to explain the exact way that two things can both commit a similar error without one being exactly as the other (Hitler and someone who shot a robber in self-defense both committed a similar action called “killing people”, but this doesn’t mean they both killed exactly the same people with exactly the same level of justification). 
Still, I got numerous comments getting angry at me for saying that I was calling NYT exactly as bad as Infowars, and saying I was being deceptive / lying because of this. This is why I’m so convinced people are erring on the side of too mistrustful - you can fill your articles with sentences about how you’re not claiming X, and people will still find ways to accuse you of lying because you said X. — Garrett writes: [The way Infowars covered Obama’s birth certificate] isn't any different from eg. mainstream media coverage of anything which involves firearms. They make (or promulgate) so many stupid technical errors I've stopped paying attention to them at all. They could have 1 person on staff whose responsibility is to understand firearms and run everything past them. But they don't. To what should I attribute this continual stream of errors? Is mainstream media coverage of firearms honestly flawed? Is it “reckless disregard for truth?” Is it a “lie of egregious sloppiness?” I think your answer to this question will depend more on how bad you want to accuse the mainstream media of being, relative to other forms of media, than on how you define these inherently slippery terms. — Jeremy Goldberg writes: There's an outright lie right now on the Washington Post homepage. A caption above a graph showing the inflation rate over time states, "Elevated prices coming down, annualized rate shows." The chart shows the current inflation rate is 7.1 percent, down from a high of around 9 percent. Elevated prices are not coming down at all. They just aren't elevating as fast anymore. I asked Jeremy to guess the probability that this was an honest mistake vs. malice. He said (thanks for giving a clear answer!) 60-40 in favor of malice. I think this is pretty high, given that I had to read Jeremy’s comment several times before I realized what the error was supposed to be, but I’ve already said I lean towards the “all the rest of you are extremely paranoid” side of things. 
— Jiro writes: I opened a thread on dsl: https://www.datasecretslox.com/index.php/topic,8430.0.html People brought up several examples there. You can read the thread. One of the more famous examples was saying that Kyle Rittenhouse crossed state lines with a weapon. There are also a bunch of cases where the media says there's "no evidence" for something that has evidence. Someone also brought up your own example of people "tested for drugs" when they were actually just asked if they used drugs. I would count that as an outright lie, even though you don't. I disagree that being asked if someone used drugs is a "test". Oh god, if saying there’s “no evidence” for something counts as a lie, then every media source in the country stands hopelessly condemned. I did write an article (here) on what the people who use that phrase might be thinking (if you can call it that). I agree the Rittenhouse situation was pretty egregious, though commenters bring up that since he went across state lines and had a weapon, it wasn’t unreasonable for people to assume he brought the weapon across state lines. Still, you wonder whether news sources would have repeated reasonable-sounding-but-didn’t-actually-check slanders about someone they liked. I do think this is a good antidote to some of the “mainstream media is actually very careful and fact-checks everything in their original reporting” takes in the comments section. — David Riceman says: How about Richard Landes's new book "Can the whole world be wrong?" about the many lies in the cognitive war against Israel (e.g. Muhammad Al Dura) See his discussion here for why he thinks this is a good example. — FractalCycle writes: I'm collecting examples from other people, will post ones that seem like real counterexamples as I get them. 
Here's one from recently: https://forum.effectivealtruism.org/posts/jsByfxvNA4x23stLY/a-letter-to-the-bulletin-of-atomic-scientists Yes, I included this issue with the Bulletin Of Atomic Scientists in my last links post, and they really do come out looking very bad here. See here for more discussion. — Hank Wilbon (writes Partial Magic) writes: I think the false Rolling Stone story a decade ago about the frat gang rape counts as the media explicitly lying, particularly as Rolling Stone is historically known for good fact checking (It is a plot point in the movie Almost Famous), however I think that counts as a "very rare" case and that Scott's claim is correct. I asked “Why? A woman said she had been raped, and Rolling Stone believed her. The woman was making it up, but Rolling Stone wasn't” and Deepa commented “Isn't it the job of a reporter to investigate? And be good at it?” I don’t want to pick on Deepa, but this is what happens when you have an overly expansive definition of “lie”! — TorontoLLB writes: The most straightforward counterexample I can think of is the NBC manipulation of the George Zimmerman 911 call. For example, this: "The 9-1-1 operator then asked: 'OK, and this guy, is he black, white or Hispanic?', and Zimmerman answered, 'He looks black.'" was changed to: "This guy looks like he's up to no good. He looks black." In another segment they combined completely separate parts of the call to create an audio clip that presents him as saying "This guy looks like he's up to no good or he's on drugs or something. He's got his hand in his waistband, and he's a black male." There were other bits of reporting from the major networks that appear to be closer to fraud than selective amplification or choosing what not to report. Enough so that in Twitter threads asking people how they got "red-pilled" person after person refers to the media response to the incident. I haven’t looked into this and I can’t confirm or deny that this is true. 
I hope everyone finds at least one of these comments obviously fair, and at least another obviously unfair, in a way that encourages you to think more about these issues. 7: Other Comments — Paul writes: What's funny is the Weekly World News - the supermarket tabloid with headlines declaring Bigfoot had been found, and married to a local man's sister!; JFK was still alive, etc. - would pass muster under this analysis. They always had sources report stories to them. Those sources were just batshit crazy. Their strategy was simply not to question them skeptically to poke holes in their story as an ordinary reporter/person would, but to encourage them - "Wow, really, a wedding; what was Bigfoot wearing?" I don't mean to entirely dismiss the distinction you make. But in insisting that not a single story - not even one of the most egregious stories by the most irresponsible, disreputable, of barely-extant publications - is a lie, I think you try to prove too much. In doing so, you retreat so far that you defend only a weak and emasculated position, not any of the broader or more meaningful points implicated by your piece. Thanks for this - I always wondered what those tabloids thought they were doing, and for some reason this matches my model of human psychology better than my previous theories about “maybe they just made it up” - though I bet they do some of that too. — John Buridan writes: I used to have very low priors against conspiracy theories and so was willing to hear out the arguments at length and go back and forth for many weeks and months on a single theory. I would say my conspiracy theory expertise is in creationism and government conspiracies, especially ones involving either Catholicism or Judaism. And I'm okay on ones involving fluoridation, chemtrails, and GMOs etc. One of my housemates, who was a senior when I was a freshman in college, gave me the Adobe illustrator birth certificate shtick, and we went through it together. 
We downloaded the birth certificate, uploaded it to Adobe illustrator, and saw the weird things. Then I went back to my day job where I was learning Adobe Illustrator. This is maybe 2 weeks later. And what do I find but that when I do this with any PDF, Illustrator renders it in the same janky way? Conspiracy dissolved. I grew up surrounded by people who believed conspiracy theories, although none of those people were my parents. And I have to say that the fact that so few people know other people who believe conspiracy theories kind of bothers me. It's like their epistemic immune system has never really been at risk of infection. If your mind hasn't been very sick at least sometimes, how can you be sure you've developed decent priors this time? Of course, this just all goes back to the dark matter beliefs of people in our outgroup. And the eternal question of where do good priors come from? How do some people's beliefs get so messed up? Thanks for this. I agree that a little bit of experience personally believing conspiracy theories, or knowing people who do, goes a long way. When I was a teenager, I flirted with a lot of pseudoarchaeology theories - think Graham Hancock, underwater pyramids, that kind of thing. I got better, but it left me with a visceral understanding of how people can genuinely believe weird things - not be lying about it, not be secretly making some kind of emotional point about how they hate the system, not be deliberately trying to be as sloppy as possible because you’re a bad person - just genuinely believe it because you tried to reason about it and failed. I think if you haven’t had that experience, then it’s really hard to understand people who have. 8: My Actual Thoughts I should probably try to say, as clearly as possible, what I think. It seems like all of these are different things: Reasoning well, and getting things right
December 05, 2023 · Original source
Specifically: Paul said 50% of severe problems but only ~15% extinction. The “average AI engineer” number is from a survey with likely response bias. The extinction tournament numbers given in the original are for catastrophe, not extinction. I cannot find a source for the average American number - it doesn’t seem to be in the linked Rethink Priorities report. Let me know if you can find it. 3: New Metaculus tournaments opening, including respiratory illnesses and the Global Pulse Tournament (with $1500 in prizes).
August 29, 2024 · Original source
Contact: Paul Contact Info: saturndoesmars[plus]acx[at]gmail[dot]com Time: Thursday, September 26th, 06:00 PM Location: We'll be hosted by an awesome event venue with a 4-season timber frame "barn" w/ recreation/games/patio/fires and lodging for 12 available in 2 cottages. Dinner for purchase. Full bar inside. rain/shine. We are 45min from both Portland, ME and Portsmouth, NH. Coordinates: https://plus.codes/87MFG3HM+75 Notes: Please RSVP and I'll send a simple Square $0 ticket link. Meeting up and hanging out is free. Buying a dinner is optional. Max capacity 150 people inside.
Contact: Victor Contact Info: wooddellv[at]yahoo[dot]com Time: Friday, September 27th, 06:00 PM Location: The Panera Bread at the corner of 13 mile and Woodward Ave, in Royal Oak, MI. There will be a sign indicating the section of the restaurant reserved for us. Coordinates: https://plus.codes/86JRGR87+X3 Notes: RSVP Required (so that I can reserve enough space) Contact me at wooddellv@yahoo.com Minnesota ST. PAUL, MINNESOTA, USA Contact: Aaron Contact Info: ironlordbyron[at]gmail[dot]com Time: Sunday, October 13th, 04:00 PM Location: Davanni's Pizza: 41 Cleveland Ave S, St Paul, MN 55105 Coordinates: https://plus.codes/86P8WRQ6+XX Group Link: Discord link for the MSP ACX meetup group: https://discord.gg/m2xJcuC937 Notes: I'll be providing pizza! Vegans are free to bring their own food (Davanni's selection here isn't great); I'll be getting a vegetarian and gluten-free pizza along with other kinds of pizza. RSVPs on lesswrong would be great.
November 12, 2024 · Original source
Of inscriptions on the Jewish catacombs in Rome, 76% are in Greek, 22% in Latin, and only 2% in Hebrew or Aramaic. Reform Judaism is unstable. The Law of Moses is central to the Jewish faith; relax it too much, and believers can justly wonder what’s left. In America, Reform Jews are over-represented not only among atheists and agnostics, but among every cult under the sun. 33% of American Buddhists come from a Jewish background, and even the Moonies were 30% Jewish at one point! (they’re now down to 6%) As the Jews were assimilating into Greeks, some Greeks were assimilating into Judaism. They were impressed enough with monotheism and the Jews’ upright behavior to adopt some of the rituals, but they couldn’t take the final step and circumcise themselves. Instead, they hung around the fringes of Jewish society, admiring it from without. The Bible and the historical record call them “God-fearers”, but by analogy I can’t help but think of them as “weajoos”. These weajoos would have been easy prey for the first semi-Jewish sect to shed the circumcision requirement and explicitly pivot away from being an ethnic religion. The Apostles and other early Christians, leaving Palestine to minister to the wider world, would have made use of existing Jewish networks and connections. They would have found themselves in the middle of the spiritually-disaffected, half-assimilated pseudo-Reform Jewish communities of the Roman world, plus their half-assimilated-the-other direction Greek hangers-on. They would have preached that Judaism was basically true, but that you can drop the restrictive Law of Moses and avoid getting circumcised. They would have sliced through the cultural angst of these in-between communities, saying that Jews could join together with Gentiles in a big friendly tent under the leadership of the God of Abraham, Isaac, and Jacob. Here, says Stark, were the early Christians’ first few million converts. 
Because, I Regret To Inform You, The Pronatalists Are Right About Everything We found above that the Christian population needed to grow at 40% per decade, and assumed this meant conversion. But you could also do this through a fertility advantage. If a generation lasts thirty years, and Christians have 3x more children than pagans per generation, they can get 40%/decade growth without converting anyone at all. In reality, it was probably a mix: some conversion plus some fertility advantage. Here I start to worry that some right-wing pronatalist organization bribed Rodney Stark to abandon his usual scholarly attitude and write some kind of over-the-top pronatalist fanfic. I was waiting for the part where the eagle named MORE BIRTHS perches on the blackboard and the childfree professor was tossed into the lake of fire for all eternity. Still, let’s take it at face value and see what the fanfic has to say. By the Imperial era, Roman fertility was plummeting. Partly this was because the Romans practiced sex-selective infanticide, there were 130 men for every 100 women, and so many men would never be able to find a wife. But partly this was because the men who could find wives dragged their feet. (Male) Roman culture took it as a given that women were terrible, that you couldn’t possibly enjoy interacting with them, and that there was no reason besides duty that you would ever marry one. In 131 BC, the Roman censor Quintus Caecilius Metellus Macedonicus proposed that the senate make marriage compulsory because so many men, especially in the upper classes, preferred to stay single. 
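A quick check of the fertility arithmetic at the start of this passage, as a minimal sketch rather than anything taken from Stark or the review: it compounds the 40%-per-decade figure over a thirty-year generation and, separately, converts a 3x-per-generation multiplier back into a per-decade rate. The function names are hypothetical, and the second calculation assumes the pagan population exactly replaces itself each generation, which the passage does not state.

```python
# Rough check of the growth arithmetic; all inputs are the passage's round numbers.
DECADE_GROWTH = 1.40     # 40% growth per decade, expressed as a factor
GENERATION_YEARS = 30    # generation length used in the passage
FERTILITY_RATIO = 3.0    # Christian children relative to pagan children, per generation


def growth_per_generation(decade_growth: float, generation_years: int) -> float:
    """Compound a per-decade growth factor over one generation."""
    return decade_growth ** (generation_years / 10)


def decade_growth_from_fertility(ratio: float, generation_years: int) -> float:
    """Per-decade growth factor if a group multiplies by `ratio` each generation.

    Assumption (not from the source): the comparison population is exactly
    at replacement, so `ratio` is also the group's absolute growth factor.
    """
    return ratio ** (10 / generation_years)


needed = growth_per_generation(DECADE_GROWTH, GENERATION_YEARS)
implied = decade_growth_from_fertility(FERTILITY_RATIO, GENERATION_YEARS)

print(f"growth needed per generation: {needed:.2f}x")
print(f"per-decade growth implied by a 3x ratio: {implied - 1:.0%}")
```

Under those assumptions, 40% per decade compounds to roughly 2.7x per generation, and a 3x-per-generation multiplier works out to roughly 44% per decade, so the passage's round numbers are consistent with each other.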
Acknowledging that “we cannot have a really harmonious life with our wives”, the censor pointed out that since “we cannot have any sort of life without them,” the long term welfare of the state must be served… As Beryl Rawson has reported, “one theme that recurs in Latin literature is that wives are difficult and therefore men do not care much for marriage.” The Romans understood that this was long-term fatal for their empire, and tried all sorts of schemes to increase family formation. In the mid-first-century BC, Cicero re-proposed Metellus’ scheme to make marriage compulsory, but it failed once again. Augustus contented himself with punitive taxes and second-class citizenship for unmarried and childless couples, combined with subsidies and affirmative action for men with at least three children. Formal and informal social pressure eventually convinced most Roman men to take wives, but no amount of love or money could make them have children. Dense cities discouraged large families, Roman children were expensive (nobles would have to spend immense effort and political favors grooming them for high positions), and (the scourge of all nobilities) too many children risked splitting the inheritance. Also, if you had a girl you’d probably just kill her (she would consume resources without continuing the family line), and half of children died before adulthood from some disease or another anyway. It was just a really bad value proposition. Nor did the sex drive force the matter. Horny Roman men had their choice of a wide variety of male and female slaves and prostitutes - despite Augustus and his spiritual heirs’ fuming about monogamy, this was never really enforced on the male half of the population. When men did have sex with women, it was usually oral or anal sex, specifically to avoid procreation. 
When they did have vaginal sex, they had a wide variety of birth control methods available, including the famous silphium but also proto-condoms and spermicidal ointments. If a child was conceived despite these efforts, abortion was common albeit unsanitary (maternal death rates were extremely high, but this was not really a deal-breaker for the Roman men making the decision). If a baby was born in spite of all this, infanticide was legal and extremely common: Far more babies were born than were allowed to live. Seneca regarded the drowning of children at birth as both reasonable and commonplace. Tacitus charged that the Jewish teaching that it is “a deadly sin to kill an unwanted child” was but another of their “sinister and revolting practices” . . . not only was the exposure of infants a common practice, it was justified by law and advocated by philosophers.” Christians followed the opposite of all these practices. They recommended that men love their wives, and held this as a plausible and expected outcome. This was not exactly unprecedented, but it was a dramatic reversal of Roman custom. From Ephesians 5: Husbands, love your wives, just as Christ loved the church and gave himself up for her to make her holy, cleansing her by the washing with water through the word, and to present her to himself as a radiant church, without stain or wrinkle or any other blemish, but holy and blameless. In this same way, husbands ought to love their wives as their own bodies. He who loves his wife loves himself. After all, no one ever hated their own body, but they feed and care for their body, just as Christ does the church — for we are members of his body. “For this reason a man will leave his father and mother and be united to his wife, and the two will become one flesh.” This is a profound mystery — but I am talking about Christ and the church. However, each one of you also must love his wife as he loves himself, and the wife must respect her husband. 
The Christians banned adultery (and, unlike the Roman bans, gave it teeth), meaning that married men who wanted sex had no choice but to go to their wives. They held that sex had to be procreative, banning anal sex, oral sex, homosexual sex, and birth control. And obviously they banned infanticide (many of these bans weren’t active decisions, but carry-overs from the movement’s Jewish roots). Also, I regret to say I fell for the liberal meme that Republicans tricked Christians into being anti-abortion in 1960, and previous generations of Christians had thought abortion was fine. This is absolutely not true. The Didache, the first Christian text outside the New Testament itself, probably dating from about 90 AD, says that “Thou shalt not murder a child by abortion nor kill them when born”. The second-century church father Athenagoras wrote: We say that women who use drugs to bring on an abortion commit murder, and will have to give an account to God for the abortion . . . for we regard the very foetus in the womb as a created being, and therefore an object of God’s care . . . and [we do not] expose an infant, because those who expose them are chargeable with child-murder. The end result is that while pagans delayed marriage, cheated, had nonprocreative sex, used birth control, performed abortions, and committed infanticide, Christians did none of these things. This section gave me a new appreciation for conservative Christian purity culture: it was obviously suited for the environment in which it evolved, and it’s also obvious why its founders would etch it so deeply into its memetic DNA that it’s still going strong millennia later. But I’ll end this section with a note of caution - I’m not sure how relevant any of this is. Stark refuses to speculate on pagan vs. 
Christian fertility rates, but when I look up modern scholarship, they reasonably point out that pagan rates must have been around “replacement”, given that the Roman population stayed steady (or slowly increased) for hundreds of years. “Replacement” is in quotes because Romans were constantly dying of plague, warfare, fire, and a million other causes; since only a third to half of people survived to reproduce, “replacement” here is something like 4-6 children per woman. This doesn’t sound like the antinatalist disaster Stark describes! I think Stark is mostly talking about Roman elites - the group who Augustus kept pestering to have at least three children - and more broadly about the urban population. These people were constantly dying and being replaced by commoners and villagers. Early Christianity was primarily an urban and upper-class movement (does this surprise you? Stark urges us to think of modern cults and new religions, like American Buddhism, which predominantly recruit disillusioned children of the upper classes). So perhaps it did better than its urban upper-class pagan comparison group. Still, since the urban upper-class pagans were constantly being replaced by village lower-class pagans as soon as they died out, how much, in numerical terms, can this contribute to Christianity’s growth? A possible synthesis: if you imagine a city as having a constant population (because it’s walled, plus its hinterland can only support a certain number of non-food-producing urbanites), and villagers as replacing urbanites on a one-to-one basis as they die, then greater Christian urban fertility rates can at least contribute to the cities and upper classes becoming Christian. And once the cities and upper classes are Christian, you get Constantine, and the lower classes can be forced to comply. Remember, “pagan” originally meant “rural”! 
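The two pieces of fertility arithmetic in this section can be checked with a rough sketch. It rests on simplifying assumptions that are not in Stark or the passage itself: pagans exactly replace themselves each generation (so a 3x Christian advantage means Christians triple per generation), the sex ratio is treated as even, and the function names are purely illustrative.

```python
def growth_per_decade(multiple_per_generation: float,
                      generation_years: float = 30.0) -> float:
    """Convert a per-generation population multiple into a per-decade growth rate.

    Assumes smooth compounding across the generation (an illustrative simplification).
    """
    return multiple_per_generation ** (10.0 / generation_years) - 1.0


def replacement_children_per_woman(survival_to_reproduce: float) -> float:
    """Children per woman needed for two of them to survive to reproduce.

    Assumes an even sex ratio, so replacement means two surviving children per woman.
    """
    return 2.0 / survival_to_reproduce


# Christians tripling each 30-year generation, pagans assumed flat.
christian_decade_growth = growth_per_decade(3.0)

# "Only a third to half of people survived to reproduce."
replacement_if_half_survive = replacement_children_per_woman(1 / 2)
replacement_if_third_survive = replacement_children_per_woman(1 / 3)

print(f"growth per decade: {christian_decade_growth:.1%}")
print(f"replacement range: {replacement_if_half_survive:.0f}-"
      f"{replacement_if_third_survive:.0f} children per woman")
```

Under these assumptions, tripling per generation works out to a bit over 40% per decade, and the survival figures give the 4-6 children per woman quoted above. A skewed sex ratio like the one described for Rome would push the replacement number higher.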
Because Where Women Go, Men Will Follow

One thing Stark did not mention discovering in his study of cults, but which I have heard anecdotally - a lot of male cult members join because the cult has hot girls. This seems to have been a big factor in the spread of early Christianity as well. Stark collects various forms of evidence that early Christians were predominantly women. Paul’s Epistle to the Romans greets thirty-three prominent Christians by name, of whom 15 were women and 18 men; if (as seems likely) men were more likely to become prominent than women, this near-equality at the upper ranks suggests a female predominance at the lower. A third-century inventory of property at a Christian church includes “sixteen men’s tunics and eighty-two women’s tunics”. The book quotes historian Adolf von Harnack, who says: [Ancient sources] simply swarm with tales of how women of all ranks were converted in Rome and in the provinces; although the details of these stories are untrustworthy, they express correctly enough the general truth that Christianity was laid hold of by women in particular, and also that the percentage of Christian women, especially among the upper classes, was larger than that of men. Why were women converted in such disproportionate numbers? Again, Stark’s sociological background serves him well: he is able to find reports of the same phenomenon in modern religions: By examining manuscript census returns for the latter half of the nineteenth century, Bainbridge (1983) found that approximately two-thirds of the Shakers were female. Data on religious movements included in the 1926 census of religious bodies show that 75% of Christian Scientists were women, as were more than 60% of Theosophists, Swedenborgians, and Spiritualists. The same is true of the immense wave of Protestant conversions taking place in Latin America. But along with a general tendency for women to convert, Stark notes that Christianity was especially attractive to women. 
The pagan world treated women as their husbands’ property, and not particularly well-liked property at that. The book cites the Athenian laws as typical: The status of Athenian women was very low. Girls received little or no education. Typically, Athenian females were married at puberty and often before. Under Athenian law, a woman was classified as a child, regardless of age, and therefore was the legal property of some man at all stages of her life. Males could divorce by simply ordering a wife out of the household. Moreover, if a woman was seduced or raped, her husband was legally compelled to divorce her. If a woman wanted a divorce, she had to have her father or some other man bring her case before a judge. Finally, Athenian women could own property, but control of the property was always vested in the male to whom she “belonged”. Meanwhile, Christian women had relatively high status, sometimes rising to the position of deacon within a church. Christian men were ordered to treat their wives kindly, were prohibited from cheating on them, and mostly could not divorce. Christianity, unlike paganism, did not especially pressure widows to remarry (important since a remarrying widow lost all her property to her new husband). Christian women were only a third as likely as Roman women to be married off before age 13. Women noticed all these benefits and flocked to Christianity. Aside from all of this, the Romans were practicing sex-selective infanticide, reducing their female numbers still further, and making the Christians even more proportionally female-heavy. If the Christians, like many modern cults, were 65% female, and the Romans (as some sources attest) were about 40-45% female, this is a pretty profound difference. The Romans grumbled about marriage, but in the end most Roman men did want wives (if only to avoid government penalties). 
But 1.4 men per woman - maybe even less among the upper classes - puts young men seeking wives in a difficult situation (for comparison, modern San Francisco is only 1.05 men per woman, and dating is already hell). To any remotely heterosexual Roman man, the 65% female Christian community must have started looking pretty good. Meanwhile, the Christians had the opposite problem: too many women, not enough men. There’s an obvious solution, and it sounds like the pagans and Christians had also figured it out: From 1 Peter 3: Wives ... submit yourselves to your own husbands so that, if any of them do not believe the Word, they may be won over without words by the behavior of their wives, when they see the purity and reverence of your lives. History records many such intermarriages, almost always ending with the conversion of the pagan husband. If you are a Christian of English descent, you may owe your religion to Queen Bertha of Kent, who convinced her husband, one of the early Anglo-Saxon kings, to take her faith. But Ruxandra Teslo has a great post reviewing the work of historian Michele Salzman, who disagrees with all of this. Salzman has a database of 400 aristocratic Romans during the 4th century period of Christianity’s fastest growth. She finds few intermarriages, few examples of women converting their husbands, and equal (or slightly male-biased) conversion ratios. Granted, this is only a small sample from one period. But it makes us question how good our evidence really is. Doesn’t all this hinge on one passage from Paul which, technically, named more men than women, plus one inventory of tunics which was so female-biased that it couldn’t possibly have been representative of even a very woman-heavy church? Are we sure that we can make the leap from “Christianity promised women more rights” to “Therefore, women flocked to Christianity?” Wasn’t that the same argument that pundits used last week to predict a blue wave for Kamala? 
Didn’t white women actually go for Trump, 53-46? Salzman has one more concern, which is that women had so few rights in ancient Roman society that it’s hard to see how they could have converted at all. When unmarried, they were under the care of their father, who would hardly have let them go out visiting churches full of strange men. When married, they were under the care of their husband, who likewise. A typical Roman man wouldn’t have cared about his wife’s religious opinions, which is maybe why so many of our stories about intermarriages and conversions come from later periods like the Anglo-Saxons. I don’t know enough about history to referee this dispute, except to say that I think the answer could easily have been different for each of early Romans, late Romans, Hellenized-Jewish-Romans, pagan Romans, upper-class Romans, and lower-class Romans, plus all combinations thereof.

Because Of The Testimony Of The Martyrs

The martyrs are one of the most dramatic parts of the early Christian story. Men and women would endure seemingly-unbearable tortures, continuing to praise God the whole time, sometimes in spite of Roman officials who promised to let them go free if they would just make the tiniest concession to praising Jupiter. These martyrdoms impressed their contemporaries as much as they impress us, and were a major factor driving pagans to Christianity.

The Christian Martyrs’ Last Prayer, by Jean-Leon Gerome (maybe slight nominative determinism?)

Stark is writing in the 1990s, and martyrology c. 1995 does not exactly cover itself in glory. At the time of writing, the most popular theory among scholars (claims Stark) was that the martyrs were masochists. 
He considers this dumb and offensive theory a natural consequence of historians being reluctant to accept anything that sounds too miraculous or amazing, and there being few other hard-headed rational explanations of the martyrs’ behavior (for some reason, the obvious one - that they believed in God and Heaven - impresses neither Stark’s foils nor himself). He sets out to build an alternative theory: the martyrs were rationally seeking the approval of their community. Martyrdom not only occurred in public, often before a large audience, but it was often the culmination of a long period of preparation during which those faced with martyrdom were the object of intense, face-to-face adulation. Consider the case of Ignatius of Antioch … Ignatius was condemned to death as a Christian. But instead of being executed in Antioch, he was sent off to Rome in the custody of ten Roman soldiers. Thus began a long, leisurely journey during which local Christians came out to meet him all along the route, which passed through many of the more important sites of early Christianity in Asia Minor on its way to the West. At each stop Ignatius was allowed to preach to and meet with those who gathered, none of whom was in any apparent danger although their Christian identity was obvious. 
Moreover, his guards allowed Ignatius to write letters to many Christian congregations in cities bypassed along the way, such as Ephesus and Philadelphia … As William Schoedel remarked, “It is no doubt as a conquering hero that Ignatius thinks of himself as he looks back on part of his journey and says that the churches who received him dealt with him not as a ‘transient traveller,’ noting that ‘even churches that do not lie on my way according to the flesh went before me city by city.’” What Ignatius feared was not death in the arena, but that well-meaning Christians might gain him a pardon…He expected to be remembered through the ages, and compares himself to martyrs gone before him, including Paul, “in whose footsteps I wish to be found when I come to meet God.” It soon was clear to all Christians that extraordinary fame and honor attached to martyrdom. Nothing illustrates this better than the description of the martyrdom of Polycarp, contained in a letter sent by the church in Smyrna to the church in Philomelium. Polycarp was the bishop of Smyrna who was burned alive in about 156. After the execution his bones were retrieved by some of his followers - an act witnessed by Roman officials, who took no action against them. The letter spoke of “his sacred flesh” and described his bones as “being of more value than precious stones and more esteemed than gold.” The letter-writer reported that the Christians in Smyrna would gather at the burial place of Polycarp’s bones every year “to celebrate with great gladness and joy the birthday of his martyrdom.” The letter concluded, “The blessed Polycarp ... to whom be glory, honour, majesty, and a throne eternal, from generation to generation. 
Amen.” It also included the instruction: “On receiving this, send on the letter to the more distant brethren that they may glorify the Lord who makes choice of his own servants.” In fact, today we actually know the names of nearly all of the Christian martyrs because their contemporaries took pains that they should be remembered for their very great holiness. I don’t know, I’m not putting too much effort into writing up this section, because it doesn’t feel like as much of a mystery as some of the others. Maybe all of this was weird in 1996. But since then, we’ve seen plenty of suicide bombers willing to die for their faith. I accept that the Christian martyrs were more impressive - a slow death in the Colosseum takes more grit than the quick detonation of an explosive vest, and dying for peace is more impressive than dying in war - but it hardly seems like as much of a leap. Honestly, Stark’s “social approval” theory seems only slightly less objectifying than the masochism theory. Some people just have a tendency towards self-sacrifice. I know many effective altruists who, for example, deliberately let themselves be infected with malaria to help speed vaccine research. If someone told them a way that they could help the neediest people in the world by feeding themselves to lions, the lions would no doubt eat well.

Because They Survived The Plagues

However bad you imagine daily life in ancient Rome, it was worse. Historians estimate that ancient Rome had a population density of 300 people per acre. That’s almost ten times denser than modern New York City, two thousand years before anyone invented the skyscraper. How did they do it? By cramming people together in unbearable filth and misery: Most people lived in tiny cubicles in multistoried tenements…“there was only one private house for every 26 blocks of apartments”. Within these tenements, the crowding was extreme - the tenants rarely had more than one room in which “entire families were herded together”. 
Thus, as Stambaugh tells us, privacy was “a hard thing to find”. Not only were people terribly crowded within these buildings, the streets were so narrow that if people leaned out their window they could chat with someone living across the street without having to raise their voices… To make matters worse, Greco-Roman tenements lacked both furnaces and fireplaces. Cooking was done over wood or charcoal braziers, which were also the only source of heat; since tenements lacked chimneys, the rooms were always smoky in winter. Because windows could be “closed” only by “hanging cloths or skins blown by rain”, the tenements were sufficiently drafty to prevent frequent asphyxiation. But the drafts increased the danger of rapidly spreading fires, and “dread of fire was an obsession among rich and poor alike.” Packer (1967) doubted that people could actually spend much time in quarters so cramped and squalid. Thus he concluded that the typical residents of Greco-Roman cities spent their lives mainly in public places and that the average “domicile must have served only as a place to sleep and store possessions.” These tenements had no plumbing. Waste was eliminated by pouring it onto the street, often to the detriment of people walking underneath. Water was brought home from public wells; if you were out, you either walked back to the well or made do. The total public baths capacity of Rome was about 30,000; the total population of Rome was about a million; in practice, the upper classes used the “public” baths and the average citizen had never bathed in their life. Soap had been invented a century or two earlier but was limited to a small pool of early adopters. The cities buzzed with flies, mosquitos, and other insects. It would be eighteen hundred years before anyone invented germ theory. Tenements were six stories high and frequently collapsed, killing everyone inside. 
Fires consumed the city on a regular basis, giving rise to colorful legends like Nero fiddling while Rome burnt. Police were limited, and it was understood that you would be robbed immediately if you set foot outside at nighttime. This kind of smart, walkable, mixed-use urbanism is illegal to build in most American cities. How did people survive? Mostly they didn’t. Cities were destroyed regularly - multiple times within a single human lifetime! - then rebuilt and replenished with rural population. Stark focuses on Antioch, a Syrian city which was a center of early Christianity. During “six hundred years of intermittent Roman rule”, he finds: It was conquered 11 times
The Christian Martyrs’ Last Prayer, by Jean-Leon Gerome (maybe slight nominative determinism?)

Stark is writing in the 1990s, and martyrology c. 1995 does not exactly cover itself in glory. At the time of writing, the most popular theory among scholars (claims Stark) was that the martyrs were masochists. He considers this dumb and offensive theory a natural consequence of historians being reluctant to accept anything that sounds too miraculous or amazing, and there being few other hard-headed rational explanations of the martyrs’ behavior (for some reason, the obvious one - that they believed in God and Heaven - impresses neither Stark’s foils nor himself). He sets out to build an alternative theory: the martyrs were rationally seeking the approval of their community.

Martyrdom not only occurred in public, often before a large audience, but it was often the culmination of a long period of preparation during which those faced with martyrdom were the object of intense, face-to-face adulation. Consider the case of Ignatius of Antioch … Ignatius was condemned to death as a Christian. But instead of being executed in Antioch, he was sent off to Rome in the custody of ten Roman soldiers. Thus began a long, leisurely journey during which local Christians came out to meet him all along the route, which passed through many of the more important sites of early Christianity in Asia Minor on its way to the West. At each stop Ignatius was allowed to preach to and meet with those who gathered, none of whom was in any apparent danger although their Christian identity was obvious. 
Moreover, his guards allowed Ignatius to write letters to many Christian congregations in cities bypassed along the way, such as Ephesus and Philadelphia … As William Schoedel remarked, “It is no doubt as a conquering hero that Ignatius thinks of himself as he looks back on part of his journey and says that the churches who received him dealt with him not as a ‘transient traveller,’ noting that ‘even churches that do not lie on my way according to the flesh went before me city by city.’” What Ignatius feared was not death in the arena, but that well-meaning Christians might gain him a pardon…He expected to be remembered through the ages, and compares himself to martyrs gone before him, including Paul, “in whose footsteps I wish to be found when I come to meet God.” It soon was clear to all Christians that extraordinary fame and honor attached to martyrdom. Nothing illustrates this better than the description of the martyrdom of Polycarp, contained in a letter sent by the church in Smyrna to the church in Philomelium. Polycarp was the bishop of Smyrna who was burned alive in about 156. After the execution his bones were retrieved by some of his followers - an act witnessed by Roman officials, who took no action against them. The letter spoke of “his sacred flesh” and described his bones as “being of more value than precious stones and more esteemed than gold.” The letter-writer reported that the Christians in Smyrna would gather at the burial place of Polycarp’s bones every year “to celebrate with great gladness and joy the birthday of his martyrdom.” The letter concluded, “The blessed Polycarp ... to whom be glory, honour, majesty, and a throne eternal, from generation to generation. 
Amen.” It also included the instruction: “On receiving this, send on the letter to the more distant brethren that they may glorify the Lord who makes choice of his own servants.” In fact, today we actually know the names of nearly all of the Christian martyrs because their contemporaries took pains that they should be remembered for their very great holiness.

I don’t know, I’m not putting too much effort into writing up this section, because it doesn’t feel like as much of a mystery as some of the others. Maybe all of this was weird in 1996. But since then, we’ve seen plenty of suicide bombers willing to die for their faith. I accept that the Christian martyrs were more impressive - a slow death in the Colosseum takes more grit than the quick detonation of an explosive vest, and dying for peace is more impressive than dying in war - but it hardly seems like as much of a leap. Honestly, Stark’s “social approval” theory seems only slightly less objectifying than the masochism theory. Some people just have a tendency towards self-sacrifice. I know many effective altruists who, for example, deliberately let themselves be infected with malaria to help speed vaccine research. If someone told them a way that they could help the neediest people in the world by feeding themselves to lions, the lions would no doubt eat well.

Because They Survived The Plagues

However bad you imagine daily life in ancient Rome, it was worse. Historians estimate that ancient Rome had a population density of 300 people per acre. That’s almost ten times denser than modern New York City, two thousand years before anyone invented the skyscraper. How did they do it? By cramming people together in unbearable filth and misery:

Most people lived in tiny cubicles in multistoried tenements…“there was only one private house for every 26 blocks of apartments”. Within these tenements, the crowding was extreme - the tenants rarely had more than one room in which “entire families were herded together”. 
November 14, 2024 · Original source
Then do whatever your opponent did last round. This was so boring that Axelrod sponsored a second tournament specifically for strategies that could displace TIT-FOR-TAT. When the dust cleared, TIT-FOR-TAT still won - although some strategies could beat it in head-to-head matches, they did worst against each other, and when all the points were added up TIT-FOR-TAT remained on top. In certain situations, this strategy is dominated by a slight variant, TIT-FOR-TAT-WITH-FORGIVENESS. That is, in situations where a bot can “make mistakes” (eg “my finger slipped”), two copies of TIT-FOR-TAT can get stuck in an eternal DEFECT-DEFECT equilibrium against each other; the forgiveness-enabled version will try cooperating again after a while to see if its opponent follows. Otherwise, it’s still state-of-the-art. The tournament became famous because - well, you can see how you can sort of round it off to morality. In a wide world of people trying every sort of con, the winning strategy is to be nice to people who help you out and punish people who hurt you. But in some situations, it’s also worth forgiving someone who harmed you once to see if they’ve become a better person. I find the occasional claims to have successfully grounded morality in self-interest to be facile, but you can at least see where they’re coming from here. And pragmatically, this is good, common-sense advice. For example, compare it to one of the losers in Axelrod’s tournament. COOPERATE-BOT always cooperates. A world full of COOPERATE-BOTS would be near-utopian. But add a single instance of its evil twin, DEFECT-BOT, and it folds immediately. A smart human player, too, will easily defeat COOPERATE-BOT: the human will start by testing its boundaries, find that it has none, and play DEFECT thereafter (whereas a human playing against TIT-FOR-TAT would soon learn not to mess with it). Again, all of this seems natural and common-sensical. 
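The strategies named in this passage are simple enough to sketch in a few lines. What follows is a minimal illustration, not Axelrod's tournament code. The payoff values (5, 3, 1, 0) are the conventional ones and are an assumption here, since the passage does not state them. The names `play` and `slip_round` are hypothetical, and the forgiving variant uses a "retaliate only after two defections in a row" rule as a stand-in for the passage's looser "try cooperating again after a while".

```python
# Payoffs as (player A, player B) for each pair of moves.
# Assumed conventional values: temptation 5, reward 3, punishment 1, sucker 0.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(own_history, opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not opponent_history else opponent_history[-1]


def forgiving_tit_for_tat(own_history, opponent_history):
    """Stand-in forgiving variant: defect only after two defections in a row."""
    return "D" if opponent_history[-2:] == ["D", "D"] else "C"


def cooperate_bot(own_history, opponent_history):
    return "C"


def defect_bot(own_history, opponent_history):
    return "D"


def play(strategy_a, strategy_b, rounds=10, slip_round=None):
    """Play an iterated match and return (score_a, score_b).

    If slip_round is given, player A's move in that round (0-indexed) is
    flipped, modelling a one-off "my finger slipped" mistake.
    """
    history_a, history_b = [], []
    score_a = score_b = 0
    for round_index in range(rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        if round_index == slip_round:
            move_a = "D" if move_a == "C" else "C"
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


if __name__ == "__main__":
    print(play(tit_for_tat, defect_bot))                # (9, 14)
    print(play(cooperate_bot, defect_bot))              # (0, 50)
    print(play(tit_for_tat, tit_for_tat, slip_round=1))  # (28, 23)
    print(play(forgiving_tit_for_tat, forgiving_tit_for_tat, slip_round=1))  # (32, 27)
```

In this sketch DEFECT-BOT extracts far more from COOPERATE-BOT than from TIT-FOR-TAT, which concedes only the first round. A single slip sends two TIT-FOR-TAT copies into alternating retaliation for the rest of the match, while the forgiving variant is back to mutual cooperation on the next round.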
Infinitely-trusting people, who will always be nice to everyone no matter what, are easily exploited by the first sociopath to come around. You don’t want to be a sociopath yourself, but prudence dictates being less-than-infinitely nice, and reserving your good nature for people who deserve it. Reality is more complicated than a game theory tournament. In Iterated Prisoners’ Dilemma, everyone can either benefit you or harm you an equal amount. In the real world, we have edge cases like poor people, who haven’t done anything evil but may not be able to reciprocate your generosity. Does TIT-FOR-TAT help the poor? Stand up for the downtrodden? Care for the sick? Domain error; the question never comes up. Still, even if you can’t solve every moral problem, it’s at least suggestive that, in those domains where the question comes up, you should be TIT-FOR-TAT and not COOPERATE-BOT. This is why I’m so fascinated by the early Christians. They played the doomed COOPERATE-BOT strategy and took over the world. II. Matthew 5: You have heard that it was said, ‘Love your neighbor and hate your enemy.’ But I tell you, love your enemies and pray for those who persecute you . . . If you love those who love you, what reward will you get? Are not even the tax collectors doing that? And if you greet only your own people, what are you doing more than others? Do not even pagans do that? Talk is cheap, but The Rise Of Christianity suggests the early Christians pulled it off. For example, even though pagan institutions would not help indigent Christians, Christians tried to give charity to Christian and pagan alike, even going so far as to help nurse pagans during the plague (when nursing a victim conferred a high risk of contagion and death). Even Emperor Julian, an enemy of Christianity, admitted it lived up to its own standards: When the poor happened to be neglected and overlooked by the priests, the impious Galileans observed this and devoted themselves to benevolence . . . 
[they] support not only their poor, but ours as well, [when] everyone can see that our people lack aid from us.” In 1 Corinthians 6, Paul is asked whether it is acceptable for one Christian to pursue a lawsuit against another Christian in a pagan court. He answers: The very fact that you have lawsuits among you means you have been completely defeated already. Why not rather be wronged? Why not rather be cheated? We get a similar picture from the stories of the martyrs. Many of them prayed for the Romans while the Romans were in the process of torturing and killing them; Polycarp even cooked them a meal. If the Christians had merely been TIT-FOR-TAT, it would be easy to tell a story of their victory. The Roman Empire was corrupt and decadent to the core. People were looking for a community they could trust. Christianity offered access to a better class of friends who wouldn’t immediately rob or betray you when your guard was down. By providing a superior alternative to the low-trust pagan world, it was irresistible on a purely rational economic basis. But this story sounds more worthy of the mystery cults. Mystery cults are a great structure for mutual aid; we see this today in groups like the Freemasons (cf. Backscratcher Clubs). Everybody knows who’s on the inside (and needs to be mutually aided) and who’s on the outside (and can be ignored). The initiatory structure holds off freeloaders and makes sure the people on the inside are of approximately equal rank (so that you get as many benefits as you give) and can be held accountable if they don’t contribute. Since Christianity did better than the mystery cults, there must have been some reason that COOPERATE-BOT beat TIT-FOR-TAT in the particular environment of Roman religion, defying all normal game theoretic logic. III. Is this a consistent feature of COOPERATE-BOT strategies, or was it just luck? 
This is hard to say, because in all normal cases it’s impossible to follow a COOPERATE-BOT strategy at scale and for any period of time. Consider the Quakers, who gave it a better try than most. They were persecuted by the British and fled to America (is this kosher? it sort of seems like resisting evil). There they founded the colony of Pennsylvania, intended to be a utopia of pacifism and benevolence. They were very serious about this; history records many Quakers who were arrested or even killed rather than compromise their principles, and the British Crown seized Pennsylvania from the Quakers a few times because they wouldn’t make extremely cheap gestures like pay taxes or swear oaths. But in the end, the Crown frog-boiled the Quakers into compliance. They promised to return self-government if the Quakers would budge an inch - in one compromise, if they agreed to pay taxes that could go to non-combat functions of the military. The Quakers eventually agreed, and the British ratcheted up their demands the next time. Finally, in 1755, some Indians launched a major assault on Pennsylvania, and all the Quakers voluntarily resigned from government to let the non-Quaker Pennsylvanians (who by this time outnumbered them) conduct the war without restraint. The Quakers performed better than most COOPERATE-BOTs. They stuck to their principles most of the time, and in the end their religion survived. But look deeper, and you see a gradual process of surrender to reality. First was the flight to America, an implicit admission that living was better than being martyred for the faith. Then came the various compromises; an implicit admission that getting to keep self-government while being 99% pure was better than being subjects while 100% pure. Finally, they gave up Pennsylvania itself rather than be wiped out, again choosing the practical option over martyrdom. 
My point isn’t to knock the Quakers, who may come in a close 2nd in “historical groups that stuck to their cooperative principles despite all odds” and were certainly more ethical than I am. My point is that even very committed groups of religious fanatics fail the non-violent COOPERATE-BOT strategy eventually. Or maybe the ones who didn’t fail were wiped out? I hear good things about the Cathars, but we can’t know for sure because they were very thoroughly killed off - unrepentant to the last. Are there any other groups who deserve mention in this section besides early Christians, Quakers, and Cathars? I think some German and Russian sects have tried similar strategies, though they mostly failed and I don’t know much about them. Not exactly the same, but maybe rhyming: what about modern liberalism? To the monarchs and dictators of the past, free speech might seem kind of like COOPERATE-BOT in a limited domain: the idea that elites shouldn’t make any forceful/legal effort to protect their ideological and spiritual position must sound almost as crazy as them not making any forceful/legal effort to protect themselves if attacked, or to prevent themselves from getting cheated. It is, in some sense, a unilateral surrender in the war of ideas; fascists and communists will do their best to crush liberalism, but liberals cannot ban discussion of fascism or communism. The fact that this, too, has worked, makes me think early Christianity wasn’t just a one-off, but suggests some larger point. IV. Still, I don’t really know what it is. Here are some weak theories: Advertisement: Being kind to outsiders is good PR and encourages those outsiders to join you. This effect is stronger than the corresponding disincentive (that they won’t get much better treatment than they’re getting already, and they will have to be nice to other outsiders in their turn).
February 12, 2026 · Original source
Epoch/Croxton are current best estimates, and can probably fairly be read as the “real” answer against which Cotra and Davidson’s earlier guesses should be judged. All numbers are yearly multiples, so 1.4 means that willingness to spend grows 1.4x per year, ie 40%.

  • Willingness To Spend: How much money are companies willing to spend on AI, in the form of chips and data centers?
  • $/FLOP: How quickly do Moore’s Law, economies of scale, and other factors bring down the price of AI compute?
  • Training Run Length: How long are companies spending on AI training runs for frontier models (instead of using those chips for smaller models, experiments, or consumer services)?
  • Real Compute: The product of the three parameters above.
  • Algorithmic Progress: How effectively do researchers discover new algorithms that make training AIs cheaper and more efficient?
  • Total Effective Compute: The product of real compute and algorithmic progress.

So for example, the Epoch column’s 10.7x means that in any given year, you can train an AI 10.7x better than the last year, because you have 3.6x more compute available, and that compute is 3.0x more efficient.

Cotra and Davidson were pretty close on willingness to spend and on $/FLOP. This is an impressive achievement; they more or less predicted the giant data center buildout of the past few years. They ignored training run length, which probably seemed like a reasonable simplification at the time. But they got killed on algorithmic progress, which was 200% per year instead of 30%. How did they get this one so wrong? Here’s Cotra’s section on algorithmic progress:

Algorithmic progress forecasts

Note: I have done very little research into algorithmic progress trends. Of the four main components of my model (2020 compute requirements, algorithmic progress, compute price trends, and spending on computation) I have spent the least time thinking about algorithmic progress. 
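The yearly-multiple bookkeeping described before the quoted note can be checked with a short sketch. The function names and the five-year horizon are hypothetical choices for illustration; the 3.6x, 3.0x, 1.4x, 200% and 30% figures come from the passage. The product of the rounded inputs is 10.8 rather than the reported 10.7, presumably because the source table multiplies unrounded values.

```python
import math


def percent_growth(yearly_multiple: float) -> float:
    """Convert a yearly multiple (1.4x) into percent growth per year (40%)."""
    return (yearly_multiple - 1.0) * 100.0


def total_effective_compute(real_compute: float, algorithmic_progress: float) -> float:
    """Total effective compute is the product of the two yearly multiples."""
    return real_compute * algorithmic_progress


def orders_of_magnitude(yearly_multiple: float, years: float) -> float:
    """Orders of magnitude gained after compounding a yearly multiple."""
    return years * math.log10(yearly_multiple)


if __name__ == "__main__":
    # Epoch column: 3.6x real compute times 3.0x algorithmic progress.
    print(round(total_effective_compute(3.6, 3.0), 1))  # 10.8 (passage reports 10.7)

    # Multiples versus percentages.
    print(round(percent_growth(1.4)))  # 40
    print(round(percent_growth(3.0)), round(percent_growth(1.3)))  # 200 30

    # Hypothetical horizon: algorithmic progress alone, compounded for 5 years.
    YEARS = 5
    print(round(orders_of_magnitude(3.0, YEARS), 2))  # 2.39
    print(round(orders_of_magnitude(1.3, YEARS), 2))  # 0.57
```

Because these are multiples rather than increments, the gap between 200% and 30% per year widens by roughly a third of an order of magnitude for every year it compounds.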
I consider two types of algorithmic progress: relatively incremental and steady progress from iteratively improving architectures and learning algorithms, and the chance of “breakthrough” progress which brings the technical difficulty of training a transformative model down from “astronomically large” / “impossible” to “broadly feasible.” For incremental progress, the main source I used was Hernandez and Brown 2020, ”Measuring the Algorithmic Efficiency of Neural Networks”. The authors reimplemented open source state-of-the-art (SOTA) ImageNet models between 2012 and 2019 (six models in total). They trained each model up to the point that it achieved the same performance as AlexNet achieved in 2012, and recorded the total FLOP that required. They found that the SOTA model in 2019, EfficientNet B0, required ~44 times fewer training FLOP to achieve AlexNet performance than AlexNet did; the six data points fit a power law curve with the amount of computation required to match AlexNet halving every ~16 months over the seven years in the dataset.² They also show that linear programming displayed a similar trend over a longer period of time: when hardware is held fixed, the time in seconds taken to solve a standard basket of mixed integer programs by SOTA commercial software packages halved every ~13 months over the 21 years from 1996 to 2017.³ Grace 2013 (”Algorithmic Progress in Six Domains”) is the only other paper attempting to systematically quantify algorithmic progress that I am currently aware of, although I have not done a systematic literature review and may be missing others. I have chosen not to examine it in detail because a) it was written largely before the deep learning boom and mostly does not focus on ML tasks, and b) it is less straightforward to translate Grace’s results into the format that I am most interested in (”How has the amount of computation required to solve a fixed task decreased over time?”). 
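The halving times quoted here can be converted into the yearly multiples used elsewhere in this passage. The sketch below is only an arithmetic check under the assumption of smooth exponential improvement; the function names are hypothetical. A straight two-endpoint calculation from "~44 times fewer over seven years" gives about 15.4 months, close to the ~16 months that the authors obtained from a fit to all six data points.

```python
import math


def halving_time_months(total_reduction: float, span_years: float) -> float:
    """Months per halving implied by an overall reduction factor over a span."""
    halvings = math.log2(total_reduction)
    return span_years * 12.0 / halvings


def yearly_multiple(halving_months: float) -> float:
    """Yearly efficiency multiple implied by a halving time in months."""
    return 2.0 ** (12.0 / halving_months)


if __name__ == "__main__":
    # ~44x less training compute to match AlexNet, over 2012-2019.
    print(round(halving_time_months(44, 7), 1))  # 15.4

    # Yearly multiples implied by the quoted halving times.
    print(round(yearly_multiple(16), 2))  # 1.68 (ImageNet models)
    print(round(yearly_multiple(13), 2))  # 1.9  (mixed integer programs)
```

A 16-month halving time therefore corresponds to roughly 68% efficiency growth per year before any adjustment for harder tasks.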
Paul is familiar with the results, and he believes that algorithmic progress across the six domains studied in Grace 2013⁴ is consistent with a similar but slightly slower rate of progress, ranging from 13 to 36 months to halve the computation required to reach a fixed level of performance. Additionally, it seems plausible to me that both sets of results would overestimate the pace of algorithmic progress on a transformative task, because they are both focusing on relatively narrow problems with simple, well-defined benchmarks that large groups of researchers could directly optimize.⁵ Because no one has trained a transformative model yet, to the extent that the computation required to train one is falling over time, it would have to happen via proxies rather than researchers directly optimizing that metric (e.g. perhaps architectural innovations that improve training efficiency for image classifiers or language models would translate to a transformative model). Additionally, it may be that halving the amount of computation required to train a transformative model would require making progress on multiple partially-independent sub-problems (e.g. vision and language and motor control). I have attempted to take the Hernandez and Brown 2020 halving times (and Paul’s summary of the Grace 2013 halving times) as anchoring points and shade them upward to account for the considerations raised above. There is massive room for judgment in whether and how much to shade upward; I expect many readers will want to change my assumptions here, and some will believe it is more reasonable to shade downward. Cotra’s estimate comes primarily from one paper, Hernandez & Brown, which looks at algorithmic progress on a task called AlexNet. But later research demonstrated that the apparent speed of algorithmic progress varies by an order of magnitude based on whether you’re looking at an easy task (low-hanging fruit already picked) or a hard task (still lots of room to improve). 
AlexNet was an easy task, but pushing the frontier of AI is a hard task, so algorithmic progress in frontier AI has been faster than the AlexNet paper estimated. In Cotra’s defense, she admitted that this was the area where she was least certain, and that she had rounded the progress rate down based on various considerations when other people might round it up based on various other considerations. But the sheer extent of the error here, compounded with a few smaller errors that unfortunately all shared the same direction, was enough to throw off the estimate entirely. Since Cotra and Davidson were expecting AI to get 3.6x more effective compute each year, but it actually got 10.7x more, it’s no mystery why their timelines were off. When John recalculates Davidson’s model with Epoch’s numbers, he finds that it estimates AGI in 2030, which matches the current vibes. IV. With this information in place, it’s worth looking at some prominent contemporaneous critiques of Bio Anchors. Various people criticized Bio Anchors’ many strange anchors for how much compute it would take to produce AGI. For example, one anchor estimated that it would take 10^45 FLOPs, because that was how many calculations happened in all the brains of all animals throughout the evolutionary history (which eventually produced the human brain that AIs are trying to imitate). To make things even weirder, this anchor assumed away all animals other than nematodes as a rounding error (fact check: true!) All of these seemed to detract from the main show, an attempt to estimate the compute involved in the human brain. But even this more sober anchor was complicated by time horizons - it’s not enough to imitate the human brain for one second; AIs need to be able to imitate the human brain’s capacity for long-term planning. Cotra calculated how much compute AGI would require if it needed a planning horizon of seconds, weeks, or years. 
Thanks to METR, we now know that existing AIs have already passed a point where they can do most tasks that take humans seconds, are moving through the hour range, and are just about to touch one day. So the “seconds” anchor is ruled out. But it also seems unlikely that AGI will require years, because most human projects don’t take years, or at least can be split into tasks that take less than one year each (intuition pump: are we sure the average employee stays at an AI lab for more than a year? If not, that proves that a chain of people with sub-one-year time horizons can do valuable work). The AI Futures team guessed that the time horizon necessary for AIs to really start serious recursive self-improvement was between a few weeks and a few months (though this might look like a totally different number on the METR graph, which doesn’t translate perfectly into real life). If this is true, then all three anchors (seconds, hours, years) were off by at least an order of magnitude. But it turns out that none of this matters very much. The highest and lowest anchors cancel out, so that the most plausible anchor - human brain with time horizon of hours to days - is around the average. If you remove all the other anchors and just keep that one, the model’s estimates barely change. But also, we’re talking about crossing twelve orders of magnitude here. The difference between the different time horizon anchors doesn’t register much on that level, compared to things like algorithmic progress which have exponential effects. Maybe this is the model basically working as intended. You try lots of different anchors, put more weight on the more plausible ones, take a weighted average of each of them, and hopefully get something close to the real value. Bio Anchors did. Or maybe it was just good luck. Still hard to tell. Eliezer Yudkowsky argued that the whole methodology was fundamentally flawed. 
Partly because of the argument above - he didn’t trust the anchors - but also partly because he expected the calculations to be obviated by some sort of paradigm shift that couldn’t be shoehorned into “algorithmic progress” (like how you couldn’t build an airplane in 1900 but you could in 1920). As of 2026 - still before AGI has been invented and we get a good historical perspective - no such shift has occurred. The scaling laws have mostly held; whatever artificial space you try to measure models in, the measurement has mostly worked in a predictable way. There have really only been two kinks in the history of AI so far. First, a kink in training run size around 2010: Second, a kink in time horizons around 2024 and the invention of test-time compute: The 2010 kink was before Cotra’s forecast and priced in. The 2024 kink is interesting and relevant - but since it was on a parameter Cotra wasn’t measuring, and probably too small to show up on the orders-of-magnitude scale we’re talking about, it’s probably not a major cause of the model’s inaccuracy. Other things have been even more predictable: So Cotra’s bet on progress being smooth and measurable has mostly paid off so far. But Yudkowsky further explained that his timelines were shorter than Bio Anchors because people would be working hard to discover new paradigms, and if the current paradigm would only pay off in the 2050s, then probably they would discover one before then. You could think of this as a disjunction: timelines will be shorter than Cotra thinks, either because deep learning pays off quickly, or because a new paradigm gets invented in the interim. It turned out to be the first one. So although Yudkowsky’s new paradigm has yet to materialize, his disjunctive reasoning in favor of shorter-than-2050 timelines was basically on the mark. Nostalgebraist argued that Cotra’s whole model was a wrapper for an assumption that Moore’s Law will continue indefinitely. 
If it does, obviously you get enough compute for AI at some point, even if it requires some absurd process like simulating all 500 million years of multicellular evolution. I never entirely understood this objection, because - although Bio Anchors does depend on a story where Moore’s Law doesn’t break before we get the relevant amount of compute - this is only one of many background assumptions (like that a meteor doesn’t hit Earth before we get the relevant amount of compute). Given those assumptions, it does a useful not-just-assumption-repeating job of calculating when transformative AI will happen. As Cotra implicitly predicted, we seem on track to get AGI before Moore’s Law breaks down, and so Moore’s Law didn’t end up mattering very much. And if all of Cotra’s non-Moore’s-Law parameter estimates had been correct, her model would have given about the same timelines we have now, and surprised everyone with a revolutionary claim about the AI future. But Nostalgebraist added, almost as an aside: Cotra has a whole other forecast I didn’t mention for “algorithmic progress,” and the last number is what you get from just algorithmic progress and no Moore’s Law. So depending on how much you trust that forecast, you might want to take all these numbers with an even bigger grain of salt than you’d expected from everything else we’ve seen. How much should you trust Cotra’s algorithmic progress forecast? She writes: “I have done very little research into algorithmic progress trends. Of the four main components of my model (2020 compute requirements, algorithmic progress, compute price trends, and spending on computation) I have spent the least time thinking about algorithmic progress.” ...and bases the forecast on one paper about ImageNet classifiers. I want to be clear that when I quote these parts about Cotra not spending much time on something, I’m not trying to make fun of her. It’s good to be transparent about this kind of thing! I wish more people would do that. 
My complaint is not that she tells us what she spent time on, it’s that she spent time on the wrong things. Like Cotra herself, I think Nostalgebraist was spiritually correct even if his bottom line (about Moore’s Law) was wrong. His meta-level point was that a seemingly complicated model could actually hinge on one or two parameters, and that many of Cotra’s parameter values were vague hand-wavey best guess estimates. He gave algorithmic progress as a secondary example of this to shore up his Moore’s Law case, but in fact it turned out to be where all the action was. V. Those were the rare good critiques. The bad critiques were the same ones everyone in this space gets: You’re just trying to build hype.