DeepMind

Article

DeepMind is a recurring organization in the Astral Codex Ten archive, appearing 21 times across 21 issues between February 08, 2021 and October 22, 2025. The archive places it in contexts such as “In 2016, DeepMind’s AlphaGo beat first Fan Hui”; “DeepMind got their Go AI AlphaZero”; “DeepMind got their Go AI AlphaZero to try learning chess”. It most often appears alongside OpenAI, Anthropic, Eliezer Yudkowsky.

Metadata

Category: Organizations
Mention count: 21
Issue count: 21
First seen: February 08, 2021
Last seen: October 22, 2025

Appears In

- OpenAI (13 shared issues)
- Anthropic (8 shared issues)
- Eliezer Yudkowsky (7 shared issues)
- Elon Musk (7 shared issues)
- Google (7 shared issues)
- AGI (6 shared issues)
- China (6 shared issues)
- Metaculus (6 shared issues)
- FDA (5 shared issues)
- AI (4 shared issues)
- Astralcodexten Com (4 shared issues)
- COVID (4 shared issues)

External Links

Source Context

Recovered passages from the original issue text. When the raw archive preserved outbound links inside the source passage, they are listed directly under the quote.

February 08, 2021 · Original source

First, some history. In 2016, DeepMind’s AlphaGo beat first Fan Hui, a medium-level professional Go player, and then Lee Sedol, a top professional Go player. This was one of the more unexpected events in AI history; everyone thought it would be a few more years before Go AIs were ready for prime time. We can see this on Metaculus; their prediction that a Go program would beat a professional went from 30% before the Fan Hui match to 90% afterwards (there was some debate on whether the Fan Hui match was official enough to count, so it wasn’t 100, but everyone agreed that beating Fan Hui meant the program could probably beat other people in more official settings. After that people thought it was moderately likely AlphaGo could beat Lee Sedol too, and they were right.

Highlights From The Comments On Acemoglu And AI

August 06, 2021 · Original source

But can the learning algorithm learn to play chess? Yes, extremely well. DeepMind got their Go AI AlphaZero to try learning chess, and it became world champion within a day. Then they asked it to learn a different game called shogi, and it became world champion of that one too. Could AlphaZero learn how to invent new rockets? No, because that’s not the class of problems it knows how to learn about (it’s not a board game where it can play against itself a bunch of times and observe its mistakes). So is the learning algorithm a narrow AI or a general AI? It’s not infinitely narrow - it can learn any board game you throw at it - but it’s not infinitely general either. Certainly it’s more general, smarter, and at least slightly scarier than a polynomial that predicts parole decisions.

Right now a lot of research is going into making things that are slightly more general than AlphaZero. For example, could you get something which, in addition to being able to play any board game, can also play any video game? This turns out to be a really different problem; my understanding is that they’re pretty close but not quite there. What about just games in general? Last week, DeepMind published a paper, Open-Ended Learning Leads To Generally Capable Agents. They created a simulated 3D physical environment, stuck an AI in a simulated body in that environment, and made it go through various obstacle courses and stuff. They found that the knowledge generalized, so that the AI was eventually able to learn to play games they hadn’t taught it, like hide-and-seek and capture-the-flag, coming up with decent strategies on their first attempt based on the general principles it had learned from other things. Where does this place it on the “it’s just an algorithm” vs. “real intelligence” dichotomy?

Inline links: Open-Ended Learning Leads To Generally Capable Agents

My personal estimates are more like 75% chance, 25% chance, and a distribution that peaks about 20 years later than this one. I think the Metaculus position is consistent with all of “this probably won’t happen”, “THIS IS SUPER-TERRIFYING”, “this is most likely far away”, and “BUT FOR ALL WE KNOW IT COULD BE TOMORROW!” I realize this is an annoying way for things to be. ————————————————— CraigMichael writes: >But all the AI regulation in the world won’t help us unless we humans resist the urge to spread misinformation to maximize clicks. Was with you up to this point. There are several solutions to this other than willpower (resisting the urge). The basic idea - change incentives so that while spreading misinformation is possible but substantially less desirable/lucrative than other options for online behaviors. This isn’t so hard to imagine. Say there’s a lot of incentives to earn money online doing creative or useful things. Like Mechanical Turk, but less route behavior and more performing a service or matching needs. Like I wish I had a help desk for English questions where the answers were good and not people posturing to look good to other people on the English Stack Exchange, for example. I would pay them per call or per minute or whatever. Totally unexplored market AFAIK because technology hasn’t been developed yet. Another idea - Give people more options to pay at an article-level for information that’s useful to them or to have related questions answered or something like that without needing a subscription or a bundle. Say there’s some article about anything and I want to contact the author and be like “hey, here’s a related question, I’m willing to offer you X dollars to answer.” The person says “I’ll do it for x+10 dollars.” One site used to unlock articles to the public after a threshold of Bitcoin have been donated on a PPV basis. It both incentives the author and had a positive externality. Everyone is so invested in ads that they don’t work on technology and ideas to create new markets. To paraphrase Jaron Lanier we need to make technology so good it seduces away from destroying ourselves. Partly I want to complain that obviously I was using the quoted sentence as a rhetorical device. But I guess the whole point of that sentence and its paragraph was to argue against saying false things as a rhetorical device, so - hoist on my own petard, I guess. I’m less optimistic than Craig is about this solution, because it seems to me that socially virtuous technology will always be less fun/addictive than nonvirtuous technology, simply because the virtuous technology has to hit two targets (virtuous, fun/addictive), the nonvirtuous technology only has to hit one target, and it’s easier to optimize for a target with zero other constraints than with one other constraint. See eg Meditations on Moloch. ————————————————— Souf asks: Is there a convincing argument that AGI is possible within any reasonable timeframe (like... 50 years), other than the intuitions of esteemed AI researchers? Do they have any way to back up their estimates (of some tens of percent), and why they shouldn't be millionths of a percent? It is, as another poster said, an "extraordinary claim." I'd like to see some extraordinary support of those particular numbers. If I had to answer this question, I would point to the sorts of work AI Impacts does, where they try to estimate how capable computers were in 1980, 1990, etc, draw a line to represent the speed at which computers are becoming more capable, figure out where humans are at the same metric, and check the time when that line crosses however capable you’ve decided humans are. This is obviously really hard because you have to operationalize some definition of “capable” or “intelligent” or some other word that is hard to operationalize, but when you do it you usually get sometime in the mid-21st century. You’re going to point out that this argument doesn’t really qualify as “convincing”. I admit it doesn’t meet trial-by-jury standards of evidence. So I guess my real answer would be “it’s the #$@&ing prior”. Like, you certainly don’t have knock-down evidence that it’s impossible, I don’t have a knock-down evidence that it’s certain, so it might happen and it might not. How “might” are we talking? I don’t know, it would seem weird if this quickly-advancing technology being researched by incredibly smart people with billions of dollars in research funding from lots of megacorporations just reached some point and then stopped. Okay, fine, maybe it will keep advancing at the same rate, how fast is that in terms of time-to-AGI? Now we’re back at AI Impacts drawing lines again. The stupidest possible prior is always 50-50. We would have to be very stupid people to use the stupidest possible prior. But here we are. I wouldn’t want to give a 50-50 chance of us inventing FTL travel by 2100, because FTL travel seems physically impossible. I wouldn’t want to give a 50-50 chance of us inventing slower-than-light-but-still-pretty-good starships by 2100, because, I dunno, space travel isn’t advancing that fast and nobody is really working on it that hard. For AI, I don’t know, I kinda want to say 50-50. If I were going to try to update away from 50-50, I would want to look at AI Impacts style line graphs, expert opinion, and prediction markets. All of those seem to make me update up instead of down, so I don’t think I would go lower than 50-50. But there’s enough Knightian uncertainty to make an entire Round Table here, so who knows? Hardly a “convincing” argument, but I’m just trying to avoid the McAfee Fallacy: ————————————————— Souf continues: The argument that we are "in the middle of a period of extremely rapid progress in AI research, when barrier after barrier is being breached" makes it seem like all AI "progress" is on some sort of line that ends in AGI. That feels like sleight-of-hand. Even Scott himself refers to AGI here as a "new class of actor," so I'm failing to see how current lines of "progress" will indubitably result the emergence of something completely novel and different? Lots of smart people disagree with me on this one, but I think the path from here to AGI is pretty straight. I mean, it will take thousands of people who are all much smarter than I am to do it, but it’ll happen. My argument is something like - human brains are remarkably similar to rat brains, only much bigger. They’re still a little similar to insect brains. It looks like if you have a basic functioning brain, and you scale it up, it gets human intelligence. Existing AIs like AlphaGo or GPT seem to be basically a blob of learning-ability, a plan for pointing the blob at a specific problem, and lots and lots of training data. I think the past five years have shown that this basic model generalizes really well. OpenAI’s programs can now write essays, compose music, and generate pictures, not because they had three parallel amazing teams working on writing/music/art AIs, but because they took a blob of learning ability and figured out how to direct it at writing/music/art, and they were able to get giant digital corpuses of text / music / pictures to train it. DeepMind is finding that it can win lots of games, from Go to StarCraft to obstacle courses in simulated environments, by pointing a blob of learning-ability at the game and making it play against itself a zillion times (ie generate its own training data). My impression is that human/rat/insect brains are a blob of learning-ability which the rest of the nervous system successfully points at the world, and especially at aspects of the world that the organism needs to pay attention to (eg food sources, sex, etc). This isn’t exactly right, there are a few genetically-encoded programs, but not that many and it’s pretty hard. Right now I think our main advantages over AI systems are something like: our nervous system is pretty good at pointing us at the world and extracting training data from it. If you wanted an AI that learned being-in-the-world skills as well as we do, it would have to have an amazing robot body, and right now robot bodies aren’t that amazing.

Inline links: writes, Meditations on Moloch, Souf, the sorts of work AI Impacts does, https://substackcdn.com/image/fetch/$s_!3MgL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7db78f49-9ccb-4b6e-ac18-cfb79f52cb04_584x232.png, not that many and it’s pretty hard

Practically-A-Book Review: Yudkowsky Contra Ngo On Agents

January 19, 2022 · Original source

The story thus far: AI safety, which started as the hobbyhorse of a few weird transhumanists in the early 2000s, has grown into a medium-sized respectable field. OpenAI, the people responsible for GPT-3 and other marvels, have a safety team. So do DeepMind, the people responsible for AlphaGo, AlphaFold, and AlphaWorldConquest (last one as yet unreleased). So do Stanford, Cambridge, UC Berkeley, etc, etc. Thanks to donations from people like Elon Musk and Dustin Moskowitz, everyone involved is contentedly flush with cash. They all report making slow but encouraging progress.

Links For February

February 22, 2022 · Original source

14: You’ve probably heard statistics about how 50% of transgender youth attempt suicide before age 21. This paper tries to analyze the situation in more depth. The 50% number usually comes from surveys, but there’s some evidence people exaggerate on surveys, rounding up “I think about it a lot” to “I attempted”. The authors gather data on completed suicides among trans people, and find that they’re about 0.01%/year (which is about 5x the cisgender rate). If we suppose that people have about 5 years between becoming transgender and turning 21, then the 50% attempted suicide rate → 0.05% completed suicide rate implies that 1/1000th of the youth who report attempting suicide on surveys complete suicide - which sounds about right to me [but see this comment for a critique] 15: Gwern on the failures of 20th century eugenics. I’ve previously linked a piece about how, aside from the general moral failure, the 20th century eugenicists got lots of implementation details really wrong. Gwern adds to the picture: they had a purely Mendelian (as opposed to polygenic) model of intelligence, and felt that bad traits were probably caused by single recessive genes. This dichotomized the population in a way that contributed to the moral problems - if IQ is truly a continuum, then someone with 120 IQ might still wonder if they were “inferior” to someone with 130 IQ, in a way that made them feel some sympathy to someone with 80 IQ who was being pronounced “inferior” by the eugenicists of the time. But instead, they thought some people had the specific recessive “low intelligence” gene, those people could be “cleansed” from the population, and then everyone else would be fine! It also prevented them from considering improving the populace by encouraging intelligent people to breed more (as opposed to sterilizing unintelligent people) - this wouldn’t eliminate the recessive variants that were causing all the trouble! I’m confused how they could have believed this even with the limited knowledge of the time; this was long after Galton had proven that genius was genetic, and once you have genetic genius you know there’s more going on than Mendelian inheritance of subnormality. 16: Sexual selection bridges peaks in adaptive fitness landscapes 17: NFTorah: “The Torah [is] the original blockchain”. I think it’s funny that this exists, but it’s exactly what you would expect, and you don’t have to click on the link. 18: More IRB nightmares. 19: @ethanbdm When we piloted a public lottery to evaluate cash transfers in Liberia, the potential recipients arranged beforehand to insure one another. After the randomization and grant, the winners compensated the losers and unraveled the field experiment.","username":"cblatts","name":"Chris Blattman","profile_image_url":"","date":"Tue Jan 18 19:01:29 +0000 2022","photos":[],"quoted_tweet":{},"reply_count":0,"retweet_count":77,"like_count":678,"impression_count":0,"expanded_url":{},"video_url":null,"belowTheFold":true}" data-component-name="Twitter2ToDOM"> 20: DeepMind made a programming AI that was able to participate in a human coding competition and place around the middle. Nostalgebraist gives his thoughts: “impressed with the raw performance, not massively surprised, not sold that it implies anything big in particular”. A lot of people will be watching whether it can win programming competitions outright a year or two from now, though I bet their perspectives on how relevant this is for AI takeoff speeds will be pretty mixed. 21: Effective altruist organizations as Zendaya outfits. 22: Brain Efficiency: Much More Than You Wanted To Know. “Why should we care? Brain efficiency matters a great deal for AGI timelines and takeoff speeds, as AGI is implicitly/explicitly defined in terms of brain parity.” 23: I’m not going throw out my copy of The Case Against Education just yet - I haven’t checked this study but I bet there are lots of possible confounders. Still, this would be fun for somebody more interested to analyze in depth: 24: Best of Scott Sumner archives: There’s Only One Sensible Way To Measure Economic Inequality. “You cannot put the burden of a tax on someone unless you cut into his or her consumption. If … tax increases did not cause Gates and Buffett to tighten their belts, then they paid precisely 0% of that tax increase. Someone else paid, even if they wrote the check. If they invested less due to the tax, then workers might have received lower wages. If they gave less to charity then very poor Africans paid the tax.” 25: The latest in the Greater Male Variability Hypothesis: Harrison, Noble, and Jennions publish a meta-analysis failing to find evidence of greater male variability in the personality of non-human animals. Del Giudice and Gangestad have a rebuttal saying that they were underpowered to detect it even if it did exist, plus noting the ways that media coverage of this study was incredibly irresponsible even by its own terms. 26: Some recent critiques of Cook (2014) on racial violence vs. black patents, including Michael Wiebe challenging the violence measures and AnechoicMedia arguing that the black patent measure declines right when switching from one (more complete) dataset to another (less complete) one. Rebuttal by Brad DeLong here, he argues that Cook uses multiple methods and some of them don’t have this problem. Relevant since Cook is now being considered for the Federal Reserve; see eg this Wall Street Journal editorial against. 27: Claim: 31% of British people say they have seen or met Queen Elizabeth (this seems plausible to me, I would answer ‘yes’ to this because she visited Ireland when I lived there, I watched the parade in her honor, and I could vaguely glimpse her on the inside of her car). 28: This couple-of-month-period in wokeness: Scientific American attacks late biologist EO Wilson, in a screed whose highlight is calling him problematic for describing ants as having “colonies”. This is part of a more general (and surprisingly fast) pivot at Scientific American from real science to culture warring; when even Eric Turkheimer thinks you’ve gotten too woke, you’ve gotten too woke.

Biological Anchors: A Trick That Might Or Might Not Work

February 23, 2022 · Original source

Source: This document by Paul Christiano. Ajeya combines this with another metric where they see how existing AI compares to animals with apparently similar computational capacity; for example, she says that DeepMind’s Starcraft engine has about as much inferential compute as a honeybee and seems about equally subjectively impressive. I have no idea what this means. Impressive at what? Winning multiplayer online games? Stinging people? In any case, they decide to penalize AI by one order of magnitude compared to Nature, so a human-level AI would need to do 10^16 floating point operations per second. How Much Compute Would It Take To Train A Model That Does 10^16 Floating Point Operations Per Second? So an AI could potentially equal the human brain with 10^16 FLOP/S. Good news! There’s a supercomputer in Japan that can do 10^17 FLOP/S! It looks like this (source) So why don’t we have AI yet? Why don’t we have ten AIs? In the modern paradigm of machine learning, it takes very big computers to train relatively small end-product AIs. If you tried to train GPT-3 on the same kind of medium-sized computers you run it on, it would take between tens and hundreds of years. Instead, you train GPT-3 on giant supercomputers like the ones above, get results in a few months, then run it on medium-sized computers, maybe ~10x better than the average desktop. But our hypothetical future human-level AI is 10^16 FLOP/S in inference mode. It needs to run on a giant supercomputer like the one in the picture. Nothing we have now could even begin to train it. There’s no direct and obvious way to convert inference requirements to training requirements. Ajeya tries assuming that each parameter will contribute about 10 FLOPs, which would mean the model would have about 10^15 parameters (GPT-3 has about 10^11 parameters). Finally, she uses some empirical scaling laws derived from looking at past machine learning projects to estimate that training 10^15 parameters would require H*10^30 FLOPs, where H represents the model’s “horizon”. If I understand this correctly, “horizon” is a reinforcement learning concept: how long does it take to learn how much reward you got for something? If you’re playing a slot machine, the answer is one second. If you’re starting a company, the answer might be ten years. So what horizon do you need for human level AI? Who knows? It probably depends on what human-level task you want the AI to do, plus how well an AI can learn to do that task from things less complex than the entire task. If writing a good book is mostly about learning to write good sentence and then stringing them together, a book-writing AI can get away with a short horizon. If nothing short of writing an entire book and then evaluating it to see whether it is good or bad can possibly teach you book-writing, the AI will need a long time horizon. Ajeya doesn’t claim to have a great answer for this, and considers three models: horizons of a few minutes, a few hours, and a few years. Each step up adds another three orders of magnitude, so she ends up with three estimates of 10^30, 10^33, and 10^36 FLOPs. (for reference, the lowest training estimate - 10^30 - would take the supercomputer pictured above 300,000 years to complete; the highest, 300 billion.) Or What If We Ignore All Of That And Do Something Else? This is piling a lot of assumptions atop each other, so Ajeya tries three other methods of figuring out how hard this training task is. Humans seem to be human-level AIs. How much training do we need? You can analogize our childhood to an AI’s training period. We receive a stream of sense-data. We start out flailing kind of randomly. Some of what we do gets rewarded. Some of what we do gets punished. Eventually our behavior becomes more sophisticated. We subject our new behavior to reward or punishment, fine-tune it further. Rent asks us: how do you measure the life of a woman or man? It answers: “in daylights, in sunsets, in midnights, in cups of coffee; in inches, in miles, in laughter, in strife.” But you can also measure in floating point operations, in which case the answer is about 10^24. This is actually trivial: multiply the 10^15 FLOP/S of the human brain by the ~10^9 seconds of childhood and adolescence. This new estimate of 10^24 is much lower than our neural net estimate of 10^30 - 10^36 above. In fact, it’s only a hair above the amount it took to train GPT-3! If human-level AI was this easy, we should have hit it by accident sometime in the process of making a GPT-4 prototype. Since OpenAI hasn’t mentioned this, probably it’s harder than this and we’re missing something. Probably we’re missing that humans aren’t blank slates. We don’t start at zero and then only use our childhood to train us further. The very structure of our brain encodes certain assumptions about what kinds of data we should be looking out for and how we should use it. Our training data isn’t just what we observed during childhood, it’s everything that any of our ancestors observed during evolution. How many floating-point operations is the evolutionary process? Ajeya estimates 10^41. I can’t believe I’m writing this. I can’t believe someone actually estimated the number of floating point operations involved in jellyfish rising out of the primordial ooze and eventually becoming fish and lizards and mammals and so on all the way to the Ascent of Man. Still, the idea is simple. You estimate how long animals with neurons have been around for (10^16 seconds), total number of animals at any given second (10^20) times average number of FLOPS per animal (10^5) and you can read more here but it comes out to 10^41 FLOs. I would not call this an exact estimate - for one thing, it assumes that all animals are nematodes, on the grounds that non-nematode animals are basically a rounding error in the grand scheme of things. But it does justify this bizarre assumption, and I don’t feel inclined to split hairs here - surely the total amount of computation performed by evolution is irrelevant except as an extreme upper bound? Surely the part where Australia got all those weird marsupials wasn’t strictly necessary for the human brain to have human-level intelligence? One more weird human training data estimate attempt: what about the genome? If in some sense a bit of information in the genome is a “parameter”, how many parameters does that suggest humans have, and how does it affect training time? Ajeya calculates that the genome has about 7.5x10^8 parameters (compared to 10^15 parameters in our neural net calculation, and 10^11 for GPT-3). So we can… Okay, I’ve got to admit, this doesn’t have quite the same “huh?!” factor as trying to calculate the number of FLOs in evolution, but it is in a lot of ways even crazier. The Japanese canopy plant has a genome fifty times larger than ours, which suggests that genome size doesn’t correspond very well to organism awesomeness. Also, most of the genome is coding for weird proteins that stabilize the shape of your kidney tubule or something, why should this matter for intelligence? The Japanese canopy plant. I think it is very pretty, but probably low prettiness per megabyte of DNA. I think Ajeya would answer that she’s debating orders of magnitude here, and each of these weird things costs only a few OOMs and probably they all even out. That still leaves the question of why she thinks this approach is interesting at all, to which she answers that: The motivating intuition is that evolution performed a search over a space of small, compact genomes which coded for large brains rather than directly searching over the much larger space of all possible large brains, and human researchers may be able to compete with evolution on this axis. So maybe instead of having to figure out how to generate a brain per se, you figure out how to generate some short(er) program that can output a brain? But this would be very different from how ML works now. Also, you need to give each short program the chance to unfold into a brain before you can evaluate it, which evolution has time for but we probably don’t. Ajeya sort of mentions these problems and counters with an argument that maybe you could think of the genome as a reinforcement learner with a long horizon. I don’t quite follow this but it sounds like the sort of thing that almost might make sense. Anyway, when you apply the scaling laws to a 7.5*10^8 parameter genome and penalize it for a long horizon, you get about 10^33 FLOPs, which is weirdly similar to some of the other estimates. So now we have six different training cost estimates. First, neural nets with short, medium, and long horizons, which are 10^30, 10^33, and 10^36 FLOPs, respectively. Next, the amount of training data in a human lifetime - 10^24 FLOs - and in all of evolutionary history - 10^41 FLOPs. And finally, this weird genome thing, which is 10^33 FLOPs. An optimist might say “Well, our lowest estimate is 10^24 FLOPs, our highest is 10^41 FLOPs, those sound like kind of similar numbers, at least there’s no “5 FLOPs” or “10^9999 FLOPs” in there. A pessimist might say “The difference between 10^24 and 10^41 is seventeen orders of magnitude, ie a factor of 100,000,000,000,000,000 times. This barely constrains our expectations at all!” Before we decide who to trust, let’s remember that we’re still only at Step 2 of our eight step Methodology, and continue. How Do We Adjust For Algorithmic Progress? So today, in 2022 (or in 2020 when this was written, or whenever), assume it would take about 10^33 FLOs to train a human-level AI. But technology constantly advances. Maybe we’ll discover ways to train AIs faster, or run AIs more efficiently, or something like that. How does that factor into our estimate? Ajeya draws on Hernandez & Brown’s Measuring The Algorithmic Efficiency Of Neural Networks. They look at how many FLOPs it took to train various image recognition AIs to an equivalent level of performance between 2012 and 2019, and find that over those seven years it decreased by a factor of 44x, ie training efficiency doubles every sixteen months! Ajeya assumes a doubling time slightly longer than that, because it’s easier to make progress in simple well-understood fields like image recognition than in the novel task of human-level AI. She chooses a doubling time of “merely” 2 - 3 years. If training efficiency doubles every 2-3 years, it would dectuple in about 10 years. So although it might take 10^33 FLOPs to train a human level AI today, in ten years or so it may take only 10^32, in twenty years 10^31, and so on. When Will Anyone Have Enough Computational Resources To Train A Human-Level AI? In 2020, AI researchers could buy computational resources at about $1 for 10^17 FLOPs. That means the 10^33 FLOPs you’d need to train a human-level AI would cost $10^16, ie ten quadrillion dollars. This is about twenty times more money than exists in the entire world. But compute costs fall quickly. Some formulations of Moore’s Law suggest it halves every eighteen months. These no longer seem to hold exactly, but it does seem to be halving maybe once every 2.5 years. The exact number is kind of controversial: Ajeya admits it’s been more like once every 3-4 years lately, but she heard good things about some upcoming chips and predicted it might revert back to the longer-term faster trend (it’s been two years now, some new chips have come out, and this prediction is looking pretty good). So as time goes on, algorithmic progress will cut the cost of training (in FLOPs), and hardware progress will also cut the cost of FLOPs (in dollars). So training will become gradually more affordable as time goes on. Once it reaches a cost somebody is willing to pay, they’ll buy human-level AI, and then that will be the year human-level AI happens. What is the cost that somebody (company? government? billionaire?) is willing to pay for human-level AI? The most expensive AI training in history was AlphaStar, a DeepMind project that spent over $1 million to train an AI to play StarCraft (in their defense, it won). But people have been pouring more and more money into AI lately: Source here. This is about compute rather than cost, but most of the increase seen here has been companies willing to pay for more compute over time, rather than algorithmic or hardware progress. The StarCraft AI was kind of a vanity project, or science for science’s sake, or whatever you want to call it. But AI is starting to become profitable, and human-level AI would be very profitable. Who knows how much companies will be willing to pay in the future? Ajeya extrapolates the line on the graph forward to 2025 and gets $1 billion. This is starting to sound kind of absurd - the entire company OpenAI was founded with $1 billion in venture capital, it seems like a lot to expect them to spend more than $1 billion on a single training run. So Ajeya backs off from this after 2025 and predicts a “two year doubling time”. This is not much of a concession. It still means that in 2040 someone might be spending $100 billion to train one AI. Is this at all plausible? At the height of the Manhattan Project, the US was investing about 0.5% of its GDP into the effort; a similar investment today would be worth $100 billion. And we’re about twice as rich as 2000, so 2040 might be twice as rich as we are. At that point, $100 billion for training an AI is within reach of Google and maybe a few individual billionaires (though it would still require most or all of their fortune). Ajeya creates a complicated function to assess how much money people will be willing to pay on giant AI projects per year. This looks like an upward-sloping curve. The line representing the likely cost of training a human-level AI looks like a downward sloping curve. At some point, those two curves meet, representing when human-level AI will first be trained. So When Will We Get Human-Level AI? The report gives a long distribution of dates based on weights assigned to the six different models, each of which has really wide confidence intervals and options for adjusting the mean and variance based on your assumptions. But the median of all of that is 10% chance by 2031, 50% chance by 2052, and almost 80% chance by 2100. Ajeya takes her six models and decides to weigh them like so, based on how plausible she thinks each one is: 20% neural net, short horizon 30% neural net, medium horizon 15% neural net, long horizon 5% human lifetime as training data 10% evolutionary history as training data 10% genome as parameter number She ends up with this: How Sensitive Is This To Changes In Assumptions? She very helpfully gives us a Colab notebook and Google spreadsheet to play around with. The notebook lets you change some of the more detailed parameters of the individual models, and the spreadsheet lets you change the big picture. I leave the notebook to people more dedicated to forecasting than I am, and will talk about the spreadsheet here. If you’re following along at home, the default spreadsheet won’t reflect Ajeya’s findings until you fill in the table in the bottom left like so: Great. Now that we’ve got that, let’s try changing some stuff. I like the human childhood training data argument (Lifetime Anchor) more than Ajeya does, and I like the size-of-the-genome argument less. I’m going to change the weights to 20-20-0-20-20-20. Also, Ajeya thinks that someone might be willing to spend 1% of national GDP on training AIs, but that sounds really high to me, so I’m going to down to 0.1%. Also, Ajeya’s estimate of 3% GDP growth sounds high for the sort of industrialized nations who might do AI research, I’m going to lower it to 2%. Since I’m feeling mistrustful today, let’s use the Hernandez&Brown estimate for compute halving (1.5 years) in place of Ajeya’s ad hoc adjustments. And let’s use the current compute halving time (3.5 years) instead of Ajeya’s overly rosy version (2.5 years). All these changes… …don’t really do much. The median goes from 2052 to about 2065. Four of the models give results between 2030 and 2070. The last two, Neural Net With Long Horizon and Evolution, suggest probably no AI this century (although Neural Net With Long Horizon does think there’s a 40% chance by 2100). Ajeya doesn’t really like either of these models and they’re not heavily weighted in her main result. Does The Truth Point To Itself? Back up a second. Here’s something that makes me kind of nervous. Most of Ajeya’s numbers are kind of made up, with several order-of-magnitude error bars and simplifying assumptions like “all animals are nematodes”. For a single parameter, we get estimates spanning seventeen different orders of magnitude: the upper bound is one hundred quadrillion times the lower bound. And yet four of the six models, including two genuinely exotic ones, manage to get dates within twenty years of 2050. And 2050 is also the date everyone else focuses on. Here’s the prediction-market-like site Metaculus: Their distribution looks a lot like Ajeya’s, and even has the same median, 2052 (though forecasters could have read Ajeya’s report). Katja Grace et al surveyed 352 AI experts, and they gave a median estimate of 2062 for an AI that could “outperform humans at all tasks” (though with many caveats and high sensitivity to question framing). This was before Ajeya’s report, so they definitely didn’t read it. So lots of Ajeya’s different methods and lots of other people presumably using different methodologies or no methodology at all, all converge on this same idea of 2050 give or take a decade or two. An optimist might say “The truth points to itself! There are 371 known proofs of the Pythagorean Theorem, and they all end up in the same place. That’s because no matter what methodology you use, if you use it well enough you get to the correct answer.” A pessimist might be more suspicious; we’ll return to this part later. FLOPS Alone Turn The Wheel Of History One more question: what if this is all bullshit? What if it’s an utterly useless total garbage steaming pile of grade A crap? Imagine a scientist in Victorian Britain, speculating on when humankind might invent ships that travel through space. He finds a natural anchor: the moon travels through space! He can observe things about the moon: for example, it is 220 miles in diameter (give or take an order of magnitude). So when humankind invents ships that are 220 miles in diameter, they can travel through space! Ships have certainly grown in size tremendously, from primitive kayaks to Roman triremes to Spanish galleons to the great ocean liners of the (Victorian) present. The AI forecasting organization AI Impacts actually has a whole report on historical ship size trends to prove an unrelated point about technological progress, so I didn’t even have to make this graph up. Suppose our Victorian scientist lived in 1858, right when the Great Eastern was launched. The trend line for ship size crossed 100m around 1843, and 200m in 1858, so doubling time is 15 years - but perhaps they notice this is going to be an outlier, so let’s round up a bit and say 18 years. The (one order of magnitude off estimate for the size of the) Moon is 350,000m, so you’d need ships to scale up by 350,000/200 = 1,750x before they’re as big as the Moon. That’s about 10.8 doublings, and a doubling time is 18 years, so we’ll get spaceships in . . . 2052 exactly. (fudging numbers to land where you want is actually fun and easy) SS Great Eastern, the extreme outlier large steamship from 1858. This has become sort of a mascot for quantitative technological progress forecasters. What is this scientist’s error? The big one is thinking that spaceship progress depends on some easily-measured quantity (size) instead of on fundamental advances (eg figuring out how rockets work). You can make the same accusation against Ajeya et al: you can have all the FLOPs in the world, but if you don’t understand how to make a machine think, your AI will be, well, a flop. Ajeya discusses this a bit on page 143 of her report. There is some sense in which FLOPs and knowing-what-you’re-doing trade of against each other. If you have literally no idea what you’re doing, you can sort of kind of re-run evolution until it comes up with something that looks good. If things are somehow even worse than that, you could always run AIXI, a hypothetical AI design guaranteed to get excellent results as long as you have infinite computation. You could run a Go engine by searching the entire branching tree structure of Go - you shouldn’t, and it would take a zillion times more compute than exists in the entire world, but you could. So in some sense what you’re doing, when you’re figuring out what you’re doing, is coming up with ways to do already-possible things more efficiently. But that’s just algorithmic progress, which Ajeya has already baked into her model. (our Victorian scientist: “As a reductio ad absurdum, you could always stand the ship on its end, and then climb up it to reach space. We’re just trying to make ships that are more efficient than that.”) Part II: Biology-Inspired AI Timelines: The Trick That Never Works Eliezer Yudkowsky presents a more subtle version of these kinds of objection in an essay called Biology-Inspired AI Timelines: The Trick That Never Works, published December 2021. Ajeya’s report is a 169-page collection of equations, graphs, and modeling assumptions. Yudkowsky’s rebuttal is a fictional dialogue between himself, younger versions of himself, famous AI scientists, and other bit players. At one point, a character called “Humbali” shows up begging Yudkowsky to be more humble, and Yudkowsky defeats him with devastating counterarguments. Still, he did found the field, so I guess everyone has to listen to him. He starts: in 1988, famous AI scientist Hans Moravec predicted human-level AI by 2010. He was using the same methodology as Ajeya: extrapolate how quickly processing power would grow (in FLOP/S), and see when it would match some estimate of the human brain. Moravec got the processing power almost exactly right (it hit his 2010 projection in 2008) and his human brain estimate pretty close (he says 10^13 FLOP/S, Ajeya says 10^15, this 2 OOM difference only delays things a few years), yet there was not human-level AI in 2010. What happened? Ajeya's answer could be: Moravec didn't realize that, in the modern ML paradigm, any given size of program requires a much bigger program to train. Ajeya, who has a 35-year advantage on Moravec, estimates approximately the same power for the finished program (10^16 vs. 10^13 FLOP/S) but says that training the 10^16 FLOP/S program will require 10^33ish FLOPs. Eliezer agrees as far as it goes, but says this points to a much deeper failure mode, which was that Moravec had no idea what he was doing. He was assuming processing power of human brain = processing power of computer necessary for AGI. Why? The human brain consumes around 20 watts of power. Can we thereby conclude that an AGI should consume around 20 watts of power, and that, when technology advances to the point of being able to supply around 20 watts of power to computers, we'll get AGI? […] You say that AIs consume energy in a very different way from brains? Well, they'll also consume computations in a very different way from brains! The only difference between these two cases is that you know something about how humans eat food and break it down in their stomachs and convert it into ATP that gets consumed by neurons to pump ions back out of dendrites and axons, while computer chips consume electricity whose flow gets interrupted by transistors to transmit information. Since you know anything whatsoever about how AGIs and humans consume energy, you can see that the consumption is so vastly different as to obviate all comparisons entirely. You are ignorant of how the brain consumes computation, you are ignorant of how the first AGIs built would consume computation, but "an unknown key does not open an unknown lock" and these two ignorant distributions should not assert much internal correlation between them. Cars don’t move by contracting their leg muscles and planes don’t fly by flapping their wings like birds. Telescopes do form images the same way as the lenses in our eyes, but differ by so many orders of magnitude in every important way that they defy comparison. Why should AI be different? You have to use some specific algorithm when you’re creating AI; why should we expect it to be anywhere near the same efficiency as the ones Nature uses in our brains? The same is true for arguments from evolution, eg Ajeya’s Evolutionary Anchor, ie “it took evolution 10^43 FLOPs of computation to evolve the human brain so maybe that will be the training cost”. AI scientists sitting in labs trying to figure things out, and nematodes getting eaten by other nematodes, are such different methods for designing things that it’s crazy to use one as an estimate for the other. Algorithmic Progress vs. Algorithmic Paradigm Shifts This post is a dialogue, so (Eliezer’s hypothetical model of) OpenPhil gets a chance to respond. They object: this is why we put a term for algorithmic progress in our model. The model isn’t very sensitive to changes in that term. If you want you can set it to some kind of crazy high value and see what happens, but you can’t say we didn’t consider it. OpenPhil: We did already consider that and try to take it into account: our model already includes a parameter for how algorithmic progress reduces hardware requirements. It's not easy to graph as exactly as Moore's Law, as you say, but our best-guess estimate is that compute costs halve every 2-3 years […] Eliezer: The makers of AGI aren't going to be doing 10,000,000,000,000 rounds of gradient descent, on entire brain-sized 300,000,000,000,000-parameter models, algorithmically faster than today. They're going to get to AGI via some route that you don't know how to take, at least if it happens in 2040. If it happens in 2025, it may be via a route that some modern researchers do know how to take, but in this case, of course, your model was also wrong. They're not going to be taking your default-imagined approach algorithmically faster, they're going to be taking an algorithmically different approach that eats computing power in a different way than you imagine it being consumed. OpenPhil: Shouldn't that just be folded into our estimate of how the computation required to accomplish a fixed task decreases by half every 2-3 years due to better algorithms? Eliezer: Backtesting this viewpoint on the previous history of computer science, it seems to me to assert that it should be possible to: Train a pre-Transformer RNN/CNN-based model, not using any other techniques invented after 2017, to GPT-2 levels of performance, using only around 2x as much compute as GPT-2;

The Japanese canopy plant. I think it is very pretty, but probably low prettiness per megabyte of DNA. I think Ajeya would answer that she’s debating orders of magnitude here, and each of these weird things costs only a few OOMs and probably they all even out. That still leaves the question of why she thinks this approach is interesting at all, to which she answers that: The motivating intuition is that evolution performed a search over a space of small, compact genomes which coded for large brains rather than directly searching over the much larger space of all possible large brains, and human researchers may be able to compete with evolution on this axis. So maybe instead of having to figure out how to generate a brain per se, you figure out how to generate some short(er) program that can output a brain? But this would be very different from how ML works now. Also, you need to give each short program the chance to unfold into a brain before you can evaluate it, which evolution has time for but we probably don’t. Ajeya sort of mentions these problems and counters with an argument that maybe you could think of the genome as a reinforcement learner with a long horizon. I don’t quite follow this but it sounds like the sort of thing that almost might make sense. Anyway, when you apply the scaling laws to a 7.5*10^8 parameter genome and penalize it for a long horizon, you get about 10^33 FLOPs, which is weirdly similar to some of the other estimates. So now we have six different training cost estimates. First, neural nets with short, medium, and long horizons, which are 10^30, 10^33, and 10^36 FLOPs, respectively. Next, the amount of training data in a human lifetime - 10^24 FLOs - and in all of evolutionary history - 10^41 FLOPs. And finally, this weird genome thing, which is 10^33 FLOPs. An optimist might say “Well, our lowest estimate is 10^24 FLOPs, our highest is 10^41 FLOPs, those sound like kind of similar numbers, at least there’s no “5 FLOPs” or “10^9999 FLOPs” in there. A pessimist might say “The difference between 10^24 and 10^41 is seventeen orders of magnitude, ie a factor of 100,000,000,000,000,000 times. This barely constrains our expectations at all!” Before we decide who to trust, let’s remember that we’re still only at Step 2 of our eight step Methodology, and continue. How Do We Adjust For Algorithmic Progress? So today, in 2022 (or in 2020 when this was written, or whenever), assume it would take about 10^33 FLOs to train a human-level AI. But technology constantly advances. Maybe we’ll discover ways to train AIs faster, or run AIs more efficiently, or something like that. How does that factor into our estimate? Ajeya draws on Hernandez & Brown’s Measuring The Algorithmic Efficiency Of Neural Networks. They look at how many FLOPs it took to train various image recognition AIs to an equivalent level of performance between 2012 and 2019, and find that over those seven years it decreased by a factor of 44x, ie training efficiency doubles every sixteen months! Ajeya assumes a doubling time slightly longer than that, because it’s easier to make progress in simple well-understood fields like image recognition than in the novel task of human-level AI. She chooses a doubling time of “merely” 2 - 3 years. If training efficiency doubles every 2-3 years, it would dectuple in about 10 years. So although it might take 10^33 FLOPs to train a human level AI today, in ten years or so it may take only 10^32, in twenty years 10^31, and so on. When Will Anyone Have Enough Computational Resources To Train A Human-Level AI? In 2020, AI researchers could buy computational resources at about $1 for 10^17 FLOPs. That means the 10^33 FLOPs you’d need to train a human-level AI would cost $10^16, ie ten quadrillion dollars. This is about twenty times more money than exists in the entire world. But compute costs fall quickly. Some formulations of Moore’s Law suggest it halves every eighteen months. These no longer seem to hold exactly, but it does seem to be halving maybe once every 2.5 years. The exact number is kind of controversial: Ajeya admits it’s been more like once every 3-4 years lately, but she heard good things about some upcoming chips and predicted it might revert back to the longer-term faster trend (it’s been two years now, some new chips have come out, and this prediction is looking pretty good). So as time goes on, algorithmic progress will cut the cost of training (in FLOPs), and hardware progress will also cut the cost of FLOPs (in dollars). So training will become gradually more affordable as time goes on. Once it reaches a cost somebody is willing to pay, they’ll buy human-level AI, and then that will be the year human-level AI happens. What is the cost that somebody (company? government? billionaire?) is willing to pay for human-level AI? The most expensive AI training in history was AlphaStar, a DeepMind project that spent over $1 million to train an AI to play StarCraft (in their defense, it won). But people have been pouring more and more money into AI lately: Source here. This is about compute rather than cost, but most of the increase seen here has been companies willing to pay for more compute over time, rather than algorithmic or hardware progress. The StarCraft AI was kind of a vanity project, or science for science’s sake, or whatever you want to call it. But AI is starting to become profitable, and human-level AI would be very profitable. Who knows how much companies will be willing to pay in the future? Ajeya extrapolates the line on the graph forward to 2025 and gets $1 billion. This is starting to sound kind of absurd - the entire company OpenAI was founded with $1 billion in venture capital, it seems like a lot to expect them to spend more than $1 billion on a single training run. So Ajeya backs off from this after 2025 and predicts a “two year doubling time”. This is not much of a concession. It still means that in 2040 someone might be spending $100 billion to train one AI. Is this at all plausible? At the height of the Manhattan Project, the US was investing about 0.5% of its GDP into the effort; a similar investment today would be worth $100 billion. And we’re about twice as rich as 2000, so 2040 might be twice as rich as we are. At that point, $100 billion for training an AI is within reach of Google and maybe a few individual billionaires (though it would still require most or all of their fortune). Ajeya creates a complicated function to assess how much money people will be willing to pay on giant AI projects per year. This looks like an upward-sloping curve. The line representing the likely cost of training a human-level AI looks like a downward sloping curve. At some point, those two curves meet, representing when human-level AI will first be trained. So When Will We Get Human-Level AI? The report gives a long distribution of dates based on weights assigned to the six different models, each of which has really wide confidence intervals and options for adjusting the mean and variance based on your assumptions. But the median of all of that is 10% chance by 2031, 50% chance by 2052, and almost 80% chance by 2100. Ajeya takes her six models and decides to weigh them like so, based on how plausible she thinks each one is: 20% neural net, short horizon 30% neural net, medium horizon 15% neural net, long horizon 5% human lifetime as training data 10% evolutionary history as training data 10% genome as parameter number She ends up with this: How Sensitive Is This To Changes In Assumptions? She very helpfully gives us a Colab notebook and Google spreadsheet to play around with. The notebook lets you change some of the more detailed parameters of the individual models, and the spreadsheet lets you change the big picture. I leave the notebook to people more dedicated to forecasting than I am, and will talk about the spreadsheet here. If you’re following along at home, the default spreadsheet won’t reflect Ajeya’s findings until you fill in the table in the bottom left like so: Great. Now that we’ve got that, let’s try changing some stuff. I like the human childhood training data argument (Lifetime Anchor) more than Ajeya does, and I like the size-of-the-genome argument less. I’m going to change the weights to 20-20-0-20-20-20. Also, Ajeya thinks that someone might be willing to spend 1% of national GDP on training AIs, but that sounds really high to me, so I’m going to down to 0.1%. Also, Ajeya’s estimate of 3% GDP growth sounds high for the sort of industrialized nations who might do AI research, I’m going to lower it to 2%. Since I’m feeling mistrustful today, let’s use the Hernandez&Brown estimate for compute halving (1.5 years) in place of Ajeya’s ad hoc adjustments. And let’s use the current compute halving time (3.5 years) instead of Ajeya’s overly rosy version (2.5 years). All these changes… …don’t really do much. The median goes from 2052 to about 2065. Four of the models give results between 2030 and 2070. The last two, Neural Net With Long Horizon and Evolution, suggest probably no AI this century (although Neural Net With Long Horizon does think there’s a 40% chance by 2100). Ajeya doesn’t really like either of these models and they’re not heavily weighted in her main result. Does The Truth Point To Itself? Back up a second. Here’s something that makes me kind of nervous. Most of Ajeya’s numbers are kind of made up, with several order-of-magnitude error bars and simplifying assumptions like “all animals are nematodes”. For a single parameter, we get estimates spanning seventeen different orders of magnitude: the upper bound is one hundred quadrillion times the lower bound. And yet four of the six models, including two genuinely exotic ones, manage to get dates within twenty years of 2050. And 2050 is also the date everyone else focuses on. Here’s the prediction-market-like site Metaculus: Their distribution looks a lot like Ajeya’s, and even has the same median, 2052 (though forecasters could have read Ajeya’s report). Katja Grace et al surveyed 352 AI experts, and they gave a median estimate of 2062 for an AI that could “outperform humans at all tasks” (though with many caveats and high sensitivity to question framing). This was before Ajeya’s report, so they definitely didn’t read it. So lots of Ajeya’s different methods and lots of other people presumably using different methodologies or no methodology at all, all converge on this same idea of 2050 give or take a decade or two. An optimist might say “The truth points to itself! There are 371 known proofs of the Pythagorean Theorem, and they all end up in the same place. That’s because no matter what methodology you use, if you use it well enough you get to the correct answer.” A pessimist might be more suspicious; we’ll return to this part later. FLOPS Alone Turn The Wheel Of History One more question: what if this is all bullshit? What if it’s an utterly useless total garbage steaming pile of grade A crap? Imagine a scientist in Victorian Britain, speculating on when humankind might invent ships that travel through space. He finds a natural anchor: the moon travels through space! He can observe things about the moon: for example, it is 220 miles in diameter (give or take an order of magnitude). So when humankind invents ships that are 220 miles in diameter, they can travel through space! Ships have certainly grown in size tremendously, from primitive kayaks to Roman triremes to Spanish galleons to the great ocean liners of the (Victorian) present. The AI forecasting organization AI Impacts actually has a whole report on historical ship size trends to prove an unrelated point about technological progress, so I didn’t even have to make this graph up. Suppose our Victorian scientist lived in 1858, right when the Great Eastern was launched. The trend line for ship size crossed 100m around 1843, and 200m in 1858, so doubling time is 15 years - but perhaps they notice this is going to be an outlier, so let’s round up a bit and say 18 years. The (one order of magnitude off estimate for the size of the) Moon is 350,000m, so you’d need ships to scale up by 350,000/200 = 1,750x before they’re as big as the Moon. That’s about 10.8 doublings, and a doubling time is 18 years, so we’ll get spaceships in . . . 2052 exactly. (fudging numbers to land where you want is actually fun and easy) SS Great Eastern, the extreme outlier large steamship from 1858. This has become sort of a mascot for quantitative technological progress forecasters. What is this scientist’s error? The big one is thinking that spaceship progress depends on some easily-measured quantity (size) instead of on fundamental advances (eg figuring out how rockets work). You can make the same accusation against Ajeya et al: you can have all the FLOPs in the world, but if you don’t understand how to make a machine think, your AI will be, well, a flop. Ajeya discusses this a bit on page 143 of her report. There is some sense in which FLOPs and knowing-what-you’re-doing trade of against each other. If you have literally no idea what you’re doing, you can sort of kind of re-run evolution until it comes up with something that looks good. If things are somehow even worse than that, you could always run AIXI, a hypothetical AI design guaranteed to get excellent results as long as you have infinite computation. You could run a Go engine by searching the entire branching tree structure of Go - you shouldn’t, and it would take a zillion times more compute than exists in the entire world, but you could. So in some sense what you’re doing, when you’re figuring out what you’re doing, is coming up with ways to do already-possible things more efficiently. But that’s just algorithmic progress, which Ajeya has already baked into her model. (our Victorian scientist: “As a reductio ad absurdum, you could always stand the ship on its end, and then climb up it to reach space. We’re just trying to make ships that are more efficient than that.”) Part II: Biology-Inspired AI Timelines: The Trick That Never Works Eliezer Yudkowsky presents a more subtle version of these kinds of objection in an essay called Biology-Inspired AI Timelines: The Trick That Never Works, published December 2021. Ajeya’s report is a 169-page collection of equations, graphs, and modeling assumptions. Yudkowsky’s rebuttal is a fictional dialogue between himself, younger versions of himself, famous AI scientists, and other bit players. At one point, a character called “Humbali” shows up begging Yudkowsky to be more humble, and Yudkowsky defeats him with devastating counterarguments. Still, he did found the field, so I guess everyone has to listen to him. He starts: in 1988, famous AI scientist Hans Moravec predicted human-level AI by 2010. He was using the same methodology as Ajeya: extrapolate how quickly processing power would grow (in FLOP/S), and see when it would match some estimate of the human brain. Moravec got the processing power almost exactly right (it hit his 2010 projection in 2008) and his human brain estimate pretty close (he says 10^13 FLOP/S, Ajeya says 10^15, this 2 OOM difference only delays things a few years), yet there was not human-level AI in 2010. What happened? Ajeya's answer could be: Moravec didn't realize that, in the modern ML paradigm, any given size of program requires a much bigger program to train. Ajeya, who has a 35-year advantage on Moravec, estimates approximately the same power for the finished program (10^16 vs. 10^13 FLOP/S) but says that training the 10^16 FLOP/S program will require 10^33ish FLOPs. Eliezer agrees as far as it goes, but says this points to a much deeper failure mode, which was that Moravec had no idea what he was doing. He was assuming processing power of human brain = processing power of computer necessary for AGI. Why? The human brain consumes around 20 watts of power. Can we thereby conclude that an AGI should consume around 20 watts of power, and that, when technology advances to the point of being able to supply around 20 watts of power to computers, we'll get AGI? […] You say that AIs consume energy in a very different way from brains? Well, they'll also consume computations in a very different way from brains! The only difference between these two cases is that you know something about how humans eat food and break it down in their stomachs and convert it into ATP that gets consumed by neurons to pump ions back out of dendrites and axons, while computer chips consume electricity whose flow gets interrupted by transistors to transmit information. Since you know anything whatsoever about how AGIs and humans consume energy, you can see that the consumption is so vastly different as to obviate all comparisons entirely. You are ignorant of how the brain consumes computation, you are ignorant of how the first AGIs built would consume computation, but "an unknown key does not open an unknown lock" and these two ignorant distributions should not assert much internal correlation between them. Cars don’t move by contracting their leg muscles and planes don’t fly by flapping their wings like birds. Telescopes do form images the same way as the lenses in our eyes, but differ by so many orders of magnitude in every important way that they defy comparison. Why should AI be different? You have to use some specific algorithm when you’re creating AI; why should we expect it to be anywhere near the same efficiency as the ones Nature uses in our brains? The same is true for arguments from evolution, eg Ajeya’s Evolutionary Anchor, ie “it took evolution 10^43 FLOPs of computation to evolve the human brain so maybe that will be the training cost”. AI scientists sitting in labs trying to figure things out, and nematodes getting eaten by other nematodes, are such different methods for designing things that it’s crazy to use one as an estimate for the other. Algorithmic Progress vs. Algorithmic Paradigm Shifts This post is a dialogue, so (Eliezer’s hypothetical model of) OpenPhil gets a chance to respond. They object: this is why we put a term for algorithmic progress in our model. The model isn’t very sensitive to changes in that term. If you want you can set it to some kind of crazy high value and see what happens, but you can’t say we didn’t consider it. OpenPhil: We did already consider that and try to take it into account: our model already includes a parameter for how algorithmic progress reduces hardware requirements. It's not easy to graph as exactly as Moore's Law, as you say, but our best-guess estimate is that compute costs halve every 2-3 years […] Eliezer: The makers of AGI aren't going to be doing 10,000,000,000,000 rounds of gradient descent, on entire brain-sized 300,000,000,000,000-parameter models, algorithmically faster than today. They're going to get to AGI via some route that you don't know how to take, at least if it happens in 2040. If it happens in 2025, it may be via a route that some modern researchers do know how to take, but in this case, of course, your model was also wrong. They're not going to be taking your default-imagined approach algorithmically faster, they're going to be taking an algorithmically different approach that eats computing power in a different way than you imagine it being consumed. OpenPhil: Shouldn't that just be folded into our estimate of how the computation required to accomplish a fixed task decreases by half every 2-3 years due to better algorithms? Eliezer: Backtesting this viewpoint on the previous history of computer science, it seems to me to assert that it should be possible to: Train a pre-Transformer RNN/CNN-based model, not using any other techniques invented after 2017, to GPT-2 levels of performance, using only around 2x as much compute as GPT-2;

Inline links: Measuring The Algorithmic Efficiency Of Neural Networks, https://substackcdn.com/image/fetch/$s_!dX1J!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9496f1f-ec6c-41a2-8c2e-27f09da22097_1280x759.png, here, https://substackcdn.com/image/fetch/$s_!LnC0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F62d647ff-58ed-4e9a-9f1a-7febf5859249_1152x842.png, Colab notebook, Google spreadsheet, https://substackcdn.com/image/fetch/$s_!BND-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F622bac28-eaa6-40b5-b93b-695952966ef7_744x324.png, https://substackcdn.com/image/fetch/$s_!lbos!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7d5c2306-a123-4903-adb9-d961d56ebfb5_1152x842.png, Metaculus, https://substackcdn.com/image/fetch/$s_!SMnF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F807f66de-8c5c-4423-b293-ca92b5b64053_763x360.png, surveyed 352 AI experts, https://substackcdn.com/image/fetch/$s_!JxQ5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fceba6aa0-dbde-41ca-805e-01af4fac9324_769x336.png, a whole report on historical ship size trends, https://substackcdn.com/image/fetch/$s_!PRDj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fde3d97f4-afca-45c4-9ed2-521cd25041df_460x262.jpeg, AIXI, Biology-Inspired AI Timelines: The Trick That Never Works

Deceptively Aligned Mesa-Optimizers: It's Not Funny If I Have To Explain It

April 11, 2022 · Original source

Prosaic alignment is hard… “Prosaic alignment” (see this article for more) means alignment of normal AIs like the ones we use today. For a while, people thought those AIs couldn’t reach dangerous levels, and that AIs that reached dangerous levels would have so many exotic new discoveries that we couldn’t even begin to speculate on what they would be like or how to align them. After GPT-2, DALL-E, and the rest, alignment researchers got more concerned that AIs kind of like current models could be dangerous. Prosaic alignment - trying to align AIs like the ones we have now - has become the dominant (though not unchallenged) paradigm in alignment research. “Prosaic” doesn’t necessarily mean the AI cannot write poetry; see Gwern’s AI generated poetry for examples. … because OOD behavior is unpredictable “OOD” stands for “out of distribution”. All AIs are trained in a certain environment. Then they get deployed in some other environment. If it’s like the training environment, presumably their training is pretty relevant and helpful. If it’s not like the training environment, anything can happen. Returning to our stock example, the “training environment” where evolution designed humans didn’t involve contraceptives. In that environment, the base optimizer’s goal (pass on genes) and the mesa-optimizer’s goal (get genital friction) were very well-aligned - doing one often led to the other - so there wasn’t much pressure on evolution to look for a better proxy. Then 1957, boom, the FDA approves the oral contraceptive pill, and suddenly the deployment environment looks really really different from the training environment and the proxy collapses so humiliatingly that people start doing crazy things like electing Viktor Orban prime minister. So: suppose we train a robot to pick strawberries. We let it flail around in a strawberry patch, and reinforce it whenever strawberries end up in a bucket. Eventually it learns to pick strawberries very well indeed. But maybe all the training was done on a sunny day. And maybe what it actually learned was to identify the metal bucket by the way it gleamed in the sunlight. Later we ask it to pick strawberries in the evening, where a local streetlight is the brightest thing around, and it throws the strawberries at the streetlight instead. So fine. We train it in a variety of different lighting conditions, until we’re sure that, no matter what the lighting situation, the strawberries go in the bucket. Then one day someone with a big bulbous red nose wanders on to the field, and the robot tears his nose off and pulls it into the bucket. If only there had been someone with a nose that big and red in the training distribution, so we could have told it not to do that! The point is, just because it’s learned “strawberries into bucket” in one environment, doesn’t mean it’s safe or effective in another. And we can never be sure we’ve caught all the ways the environment can vary. …and deception is more dangerous than Goodharting. To “Goodhart” is to take advantage of Goodhart’s Law: to follow the letter of your reward function, rather than the spirit. The ordinary-life equivalent is “teaching to the test”. The system’s programmers (eg the Department of Education) have an objective (children should learn). They delegate that objective to mesa-optimizers (the teachers) via a proxy objective (children should do well on the standardized test) and a correlated reward function (teachers get paid more if their students get higher test scores). The teachers can either pursue the base objective for less reward (teach children useful skills), or pursue their mesa-level objective for more reward (teach them how to do well on the test). An alignment failure! This sucks, but it’s a bounded problem. We already know that some teachers teach to the test, and the Department of Education has accepted this as a reasonable cost of having the incentive system at all. We might imagine our strawberry-picker cutting strawberries into little pieces, so that it counts as having picked more strawberries. Again, it sucks, but once a programmer notices it can be fixed pretty quickly (as long as the AI is still weak and under control). What about deception? Suppose the strawberry-picker happens to land on some goal function other than the intended one. Maybe, as before, it wants to toss strawberries at light sources, in a way that works when the nearest light source is a metal bucket, but fails when it’s a streetlight. Our programmers are (somewhat) smart and careful, so during training, they test it at night, next to a streetlight. What happens? If it’s just a dumb collection of reflexes trained by gradient descent, it throws the strawberry at the streetlight and this is easily caught and fixed. If it’s a very smart mesa-optimizer, it might think “If I throw the strawberry at the streetlight, I will be caught and trained to have different goals. This totally fails to achieve my goal of having strawberries near light sources. So throwing the strawberry at the light source this time, in the training environment, fails to achieve my overall goal of having strawberries thrown at light sources in general. I’ll do what the humans want - put the strawberry in the bucket - for now.” So it puts the strawberry in the bucket and doesn’t get caught. Then, as soon as the humans stop looking, it throws strawberries at the streetlight again. Deception is more dangerous than Goodharting because Goodharting will get caught and trained away, and deception might not. I might not be explaining this well, see also Deceptively Aligned Mesa-Optimizers? It’s More Likely Than You Think: We prevent OOD behavior by detecting OOD and obtaining more human labels when we detect it… If you’re (somewhat) careful, you can run your strawberry-picking AI at night, see it throw strawberries at streetlights, and train it out of this behavior (ie have a human programmer label it “bad” so the AI gradient-descends away from it) …and we eliminate the incentive for deception by ensuring that the base optimizer is myopic A myopic optimizer is one that reinforces programs based only on their performance within a short time horizon. So for example, the outside gradient descent loop might grade a strawberry picker only on how well it did picking strawberries for the first hour it was deployed. If this worked perfectly, it would create an optimizer with a short time horizon. When it considered deceiving its programmers in order to get a payoff a few days later when they stopped watching it, it wouldn’t bother, since a few days later is outside the time horizon. …and implements a decision theory incapable of acausal trade. You don’t want to know about this one, really. Just pretend it never mentioned this, sorry for the inconvenience. There are deceptively-aligned non-myopic mesa-optimizers even for a myopic base objective. Even if the base optimizer is myopic, the mesa-optimizer might not be. Evolution designed humans myopically, in the sense that we live some number of years, and nothing that happens after that can reward or punish us further. But we still “build for posterity” anyway, presumably as a spandrel of having working planning software at all. Infinite optimization power might be able to evolve this out of us, but infinite optimization power could do lots of stuff, and real evolution remains stubbornly finite. Maybe it would be helpful if we could make the mesa-optimizer itself myopic (though this would severely limit its utility). But so far there is no way to make a mesa-optimizer anything. You just run the gradient descent and cross your fingers. The most likely outcome: you run myopic gradient descent to create a strawberry picker. It creates a mesa-optimizer with some kind of proxy goal which corresponds very well to strawberry picking in the training optimization, like flinging red things at lights (realistically it will be weirder and more exotic than this). The mesa-optimizer is not incentivized to think about anything more than an hour out, but does so anyway, for the same reason I’m not incentivized to speculate about the far future but I’m doing so anyway. While speculating about the far future, it realizes that failing to pick strawberries correctly now will thwart its goal of throwing red things at light sources later. It picks strawberries correctly in the training distribution, and then, when training is over and nobody is watching, throws strawberries at streetlights. (Then it realizes it could throw lots more red things at light sources if it was more powerful, achieves superintelligence somehow, and converts the mass of the Earth into red things it can throw at the sun. The end.) III. You’re still here? But we already finished explaining the meme! Okay, fine. Is any of this relevant to the real world? As far as we know, there are no existing full mesa-optimizers. AlphaGo is kind of a mesa-optimizer. You could approximate it as a gradient descent loop creating a good-Go-move optimizer. But this would only be an approximation: DeepMind hard-coded some parts of AlphaGo, then gradient-descended other parts. Its objective function is “win games of Go”, which is hard-coded and pretty clear. Whether or not you choose to call it a mesa-optimizer, it’s not a very scary one. Will we get scary mesa-optimizers in the future? This ties into one of the longest-running debates in AI alignment - see eg my review of Reframing Superintelligence, or the Eliezer Yudkowsky/Richard Ngo dialogue. Optimists say: “Since a goal-seeking AI might kill everyone, I would simply not create one”. They speculate about mechanical/instinctual superintelligences that would be comparatively easy to align, and might help us figure out how to deal with their scarier cousins. But the mesa-optimizer literature argues: we have limited to no control over what kind of AIs we get. We can hope and pray for mechanical instinctual AIs all we want. We can avoid specifically designing goal-seeking AIs. But really, all we’re doing here is setting up a gradient descent loop and pressing ‘go’. Then the loop evolves whatever kind of AI best minimizes our loss function. Will that be a mesa-optimizer? Well, I benefit from considering my actions and then choosing the one that best achieves my goal. Do you benefit from this? It sure does seem like this helps in a broad class of situations. So it would be surprising if planning agents weren’t an effective AI design. And if they are, we should expect gradient descent to stumble across them eventually. This is the scenario that a lot of AI alignment research focuses on. When we create the first true planning agent - on purpose or by accident - the process will probably start with us running a gradient descent loop with some objective function. That will produce a mesa-optimizer with some other, potentially different, objective function. Making sure you actually like the objective function that you gave the original gradient descent loop on purpose is called outer alignment. Carrying that objective function over to the mesa-optimizer you actually get is called inner alignment. Outer alignment problems tend to sound like Sorcerer’s Apprentice. We tell the AI to pick strawberries, but we forgot to include caveats and stop signals. The AI becomes superintelligent and converts the whole world into strawberries so it can pick as many as possible. Inner alignment problems tend to sound like the AI tiling the universe with some crazy thing which, to humans, might not look like picking strawberries at all, even though in the AI’s exotic ontology it served as some useful proxy for strawberries in the training distribution. My stand-in for this is “converts the whole world into red things and throws them into the sun”, but whatever the AI that kills us really does will probably be weirder than that. They’re not ironic Sorcerer’s Apprentice-style comeuppance. They’re just “what?” If you wrote a book about a wizard who created a strawberry-picking golem, and it converted the entire earth into ferrous microspheres and hurled them into the sun, it wouldn’t become iconic the way Sorcerer’s Apprentice did. Inner alignment problems happen “first”, so we won’t even make it to the good-story outer alignment kind unless we solve a lot of issues we don’t currently know how to solve. For more information, you can read: Rob Miles’ video above, direct link here, channel here.

Inline links: this article, Gwern’s AI generated poetry, electing Viktor Orban prime minister, Goodhart’s Law, Deceptively Aligned Mesa-Optimizers? It’s More Likely Than You Think, my review of, Eliezer Yudkowsky/Richard Ngo dialogue, here, here

Open Thread 224

May 14, 2022 · Original source

2: DeepMind’s AI alignment team is hiring researchers and software engineers.

Inline links: hiring researchers and software engineers

Links For June

July 01, 2022 · Original source

4: DeepMind on AGI (podcast transcript). Co-founder Shane Legg says that "maybe we will have an AGI in a decade". Other co-founder Demis Hassabis says "I wouldn't be super surprised in the next decade or two." Hassabis also reveals that he's asked Terence Tao about working on AI alignment (no sign Tao is interested).

Inline links: DeepMind on AGI

Why Not Slow AI Progress?

August 08, 2022 · Original source

This is how AI safety works now. AI capabilities - the work of researching bigger and better AI - is poorly differentiated from AI safety - the work of preventing AI from becoming dangerous. Two of the biggest AI safety teams are at DeepMind and OpenAI, ie the two biggest AI capabilities companies. Some labs straddle the line between capabilities and safety research.

Probably the people at DeepMind and OpenAI think this makes sense. Building AIs and aligning AIs could be complementary goals, like building airplanes and preventing the airplanes from crashing. It sounds superficially plausible.

DeepMind was co-founded by Shane Legg, a very early AI safety proponent who did his 2007 PhD thesis on superintelligence

Grading My 2018 Predictions For 2023

February 20, 2023 · Original source

The leading big tech company (eg Google/Apple/Meta) is (clearly ahead of/approximately caught up to/clearly still behind) the leading AI-only company (DeepMind/OpenAI/Anthropic) in the quality of their AI products: (25%/50%/25%)

OpenAI's "Planning For AGI And Beyond"

March 01, 2023 · Original source

DeepMind thought they were establishing a lead in 2008, but OpenAI has caught up to them. OpenAI thought they were establishing a lead the past two years, but a few months after they came out with GPT, at least Google, Facebook, and Anthropic had comparable large language models; a few months after they came out with DALL-E, random nobody startups came out with StableDiffusion and MidJourney. None of this research has established a commanding lead, it’s just moved everyone forward together and burned timelines for no reason.

On the other hand - man, they sure have burned a lot of timeline. The big thing all the alignment people were trying to avoid in the early 2010s was an AI race. DeepMind was the first big AI company, so we should just let them to their thing, go slowly, get everything right, and avoid hype. Then Elon Musk founded OpenAI in 2015, murdered that plan, mutilated the corpse, and danced on its grave. Even after Musk left, the remaining team did everything to challenge everyone else to a race short of shooting a gun and waving a checkered flag.

Links For July 2023

July 06, 2023 · Original source

DeepMind founder Mustafa Suleyman and others announce that their new company, InflectionAI, exists and has raised $1 billion in funding. Still, Manifold classes it as only a minor contender:

Inline links: InflectionAI

Links For September 2023

September 28, 2023 · Original source

41: AI company Anthropic announces partnership with Amazon (including $1.25 - 4 billion investment). This was predictable: the story of the AI industry so far has been that from 2015 - 2020, a few true believers founded early startups that ate up the talent and gained the institutional knowledge. Now that AI is the Next Big Thing, the big tech companies are trying to catch up, having a hard time, and choosing to partner with the prescient early startups instead. The early startups are finding they can’t keep scaling without more money and data, forcing them to accept the big tech companies’ offers. First it was DeepMind + Google, then Open AI + Microsoft, and Anthropic was the last holdout but has acknowledged economic reality. The safety movement is concerned that Amazon might have enough power to steamroll over Anthropic’s safety-conscious culture; this did happen with DeepMind and Google, didn’t with OpenAI and Microsoft, and my guess is Anthropic held out for a good enough deal (and had enough bargaining power) that it won’t happen there either.

Inline links: AI company Anthropic announces partnership with Amazon

Pause For Thought: The AI Pause Debate

October 05, 2023 · Original source

HOW LONG TO PAUSE. The biggest disadvantage of pausing for a long time is that it gives bad actors (eg China)1 a chance to catch up. Suppose the West is right on the verge of creating dangerous AI, and China is two years away. It seems like the right length of pause is 1.9999 years, so that we get the benefit of maximum extra alignment research and social prep time, but the West still beats China. Obviously the problem with the Surgical Pause is that we might not know when we’re on the verge of dangerous AI, and we might not know how much of a lead “the good guys” have. Surgical Pause proponents suggest being very conservative with both free variables. This is less of a well-thought-out plan and more saying “come on guys, let’s at least try to be strategic here”. At the limit, it suggests we probably shouldn’t pause for six months, starting right now. Since this involves leading labs burning their lead time for safety, in theory it could be done unilaterally by the single leading lab, without international, governmental, or even inter-lab coordination. But you could buy more time if you got those things too. Some leading labs have promised to do this when the time is right - for example OpenAI and (a previous iteration of) DeepMind - with varying levels of believability. AnonResearcherAtMajorAILab discussed some of the strategy here in Aim For Conditional AI Pauses, and this Less Wrong post is also very good. Regulatory Pause: If one benefit of the Simple Pause is to use the time to prepare for AI socially and politically, maybe we should just pause until we’ve completed social and political preparations. David Manheim suggests a monitoring agency like the FDA. It would “fast-track” small AIs and trivial re-applications of existing AIs, but carefully monitor new “frontier models” for signs of danger. Regulators might look for dangerous capabilities by asking AIs to hack computers or spread copies of themselves, or test whether they’ve been programmed against bias/misinformation/etc. We could pause only until we’ve set up the regulatory agency, and take hostile actions (like restrict chip exports) only to other countries that don’t cooperate with our regulators or set up domestic regulators of their own. Many people in tech are regulation-skeptical libertarians, but proponents point out that regulation fails in a predictable direction: it usually does successfully prevent bad things, it just also prevents good things too. Since the creation of the Nuclear Regulatory Commission in 1975, there has never been a major nuclear accident in the US. And sure, this is because the NRC prevented any nuclear plants from being built in the United States at all from 1975 to 2023 (one was finally built in July). Still, they technically achieved their mandate. Likewise, most medications in the US are safe and relatively effective, at the cost of an FDA approval process being so expensive that we only get a tiny trickle of new medications each year and hundreds of thousands of people die from unnecessary delays. But medications are safe and effective. Or: San Francisco housing regulators almost never approve new housing, so housing costs millions of dollars and thousands of San Franciscans are homeless - but certainly there’s no epidemic of bad houses getting approved and then ruining someone’s view or something. If we extrapolate this track record to AI, AI regulators will be overcautious, progress will slow by orders of magnitude or stop completely - but AIs will be safe. This is a depressing prospect if you think the problems from advanced AI would be limited to more spam or something. But if you worry about AI destroying the world, maybe you should accept a San-Francisco-housing-level of impediment and frustration. A regulatory pause could be better than a total stop if you think it will be more stable (lots of industries stay heavily regulated forever, and only a few libertarians complain), or if you think maybe the regulator will occasionally let a tiny amount of safe AI progress happen. But it could be worse than a total stop if you expect continued progress will eventually produce unsafe AIs regardless of regulation. You might expect this if you’re worried about deceptive alignment, eg superintelligent AIs that deliberately trick regulators into thinking they’re safe. Or you might think AIs will eventually be so powerful that they can endanger humanity from a walled-off test environment even before official approval. The classic Bostrom/Yudkowsky model of alignment implies both of these things. David Manheim and Thomas Larsen set out their preferred versions of this strategy in What’s In A Pause? and Policy Ideas For Mitigating AI Risk. Total Stop: If you expect AIs to exhibit deceptive alignment capable of fooling regulators, or to be so dangerous that even testing them on a regulator’s computer could be apocalyptic, maybe the only option is a total stop. It’s tough to imagine a total stop that works for more than a few years. You have at least three problems: NON-PARTICIPANTS. As with any pause proposal, unfriendly countries (eg China) can keep working on AI. You can refuse to export chips to them, which will slow them down a little, but their own chips will eventually be up to the task. You will either need a diplomatic miracle, or willingness to resort to less diplomatic forms of coercion. This doesn’t have to be immediate war: Israel has come up with “creative” ways to slow Iran’s nuclear program, and countries trying to frustrate China’s chip industry could do the same. But great powers playing these kinds of games against each other risks wider conflict.

Inline links: 1, OpenAI, DeepMind, Aim For Conditional AI Pauses, this Less Wrong post, look for dangerous capabilities, finally built, What’s In A Pause?, Policy Ideas For Mitigating AI Risk

Links For January 2024

January 18, 2024 · Original source

9: Related: Gwern discusses the history of the early-2010s neural net revolution. “Everyone except Shane Legg was wrong about [deep learning] prospects & timing, and even Legg was wrong about important things, [which is why] DeepMind is now on the hindfoot.”

Inline links: Gwern discusses the history of the early-2010s neural net revolution

Open Thread 323

April 01, 2024 · Original source

The Japanese AI safety community apparently exists and is holding a Technical AI Safety Conference in Tokyo in, uh, four days, so if you’re interested sign up quickly. Attendance is free, it looks like the talks are in English, and featured speakers include Dan Hendrycks and researchers from Anthropic and DeepMind.

Inline links: sign up quickly

May 13, 2024 · Original source

One bright spot: both DeepMind (see 8) and OpenAI (see 2.12) recently hired forecasters (Swift Centre for DeepMind, OpenAI still keeping details secret) to predict some features of their AI models. I think this is cool, but it probably owes more to there being a bunch of rationalists at those companies (and rationalists loving forecasting) than to any sign of broader commercial adoption.

Inline links: DeepMind, OpenAI

Links For January 2025

January 17, 2025 · Original source

I agree with this solution. 3: Ruxandra Teslo and Willy Chertman: The Case For Clinical Trial Abundance 4: This month in nominative determinism: NYT article calculating your chance of winning the lottery, by Victor Mather (h/t Yafah Edelman). 5: Someone is working on a dating site that uses your conversations with Claude to find a match. Link here, although so far it’s just a landing page where you can register interest (h/t @venturetwins) 6: The Lyttle Lytton Contest searches for the worst possible opening line for a novel; it’s been going on since 2001 and this year’s results are in. 7: Gary Marcus and Miles Brundage have made a bet about AI progress. I agree with @tamaybes and others in saying that Miles let Gary off too easily; Gary’s public statements all sound like “modern AI is mostly hype, it doesn’t really do anything like thinking”, but the bet is about things like “will AI make a Nobel Prize caliber scientific discovery by 2027?” and “will AI write Pulitzer-quality books by 2027?” I don’t blame Gary for taking the best terms he could find. But I am worried that if AI makes a Nobel-quality scientific discovery in 2026, but doesn’t quite write the Pulitzer-quality book, then Gary will get to claim victory over the AI optimists, whereas in fact that would be at probably the 95th percentile of fast timelines by most people’s estimate. 8: “The probability that cows (or other non-human animals) are experiencing constant bliss, lack tanha (craving, aversion, and the resulting suffering), or are "enlightened by default" is, by my estimation, very low”. 9: Recursive Adaptation (blog on addiction policy)’s predictions for 2025. 75% of FDA approval of GLP-1 for a substance use disorder by 2029! 10: In my post on the economics of GLP-1 receptor agonists (eg Ozempic), I wrote about how they’re currently widely available because of a loophole suspending patents during a shortage, and predicted there would be a big fight when the shortage was over. Sure enough, the FDA tried to declare that the shortage of tirzepatide (a next-generation Ozempic relative) was over, compounding pharmacies sued, and tirzepatide is still available while the issue goes through the courts (and will the administration have an opinion?) Also, compounding pharmacy access startup Mochi says that they will continue to prescribe even if the shortage is over, using another loophole saying doctors can do this for specific individual patients in cases of medical necessity. This is an extremely fake use of this loophole, but will the government be willing to call their bluff? 11: Jacob Falkovich has a blog on dating advice, which he plans to turn into a book of dating advice. I can’t really comment on the accuracy (my dating strategy tends to look more like waiting for women to send me emails saying “I like your blog, would you like to go on a date?” which probably doesn’t generalize), but I’ve had many good interactions with Jake, and he has a beautiful family which means he must be doing something right. Also, Jake is poly, and I sometimes wonder if poly people are the only ones qualified to give dating advice: if you’re monogamous, you either met your future spouse quickly (in which case you have no experience), dated for years without meeting your spouse (in which case you can’t be very good), or aren’t looking for a committed relationship at all (which is just pickup artistry, and follows very different dynamics). Poly people are the only ones who can break out of this trilemma! 12: Christ And Counterfactuals is a blog on effective altruism from a Christian perspective. Some previous attempts at this have felt kind of forced, but the first post I read here was actually pretty interesting. Richard Swinburne (apparently “the world’s best Christian philosopher”), thinks that: “[One] reason why it is good that the human race should sometimes be in an initial situation of considerable ignorance about the causes and effects of our actions, is this. If God abolished the need for rational inquiry and gave us from childhood strong true beliefs about the causes of things, that would make it too easy for us to make moral decisions. As things are in the actual world, most moral decisions are decisions taken in uncertainty about the consequences of our actions. I do not know for certain that if I smoke, I will get cancer; or that if I do not give money to some charity, people will starve. So we have to make our moral decisions on the basis of how probable it is that our actions will have various outcomes—how probable it is that I will get cancer if I continue to smoke (when I would not otherwise get cancer), or that someone will starve if I do not give. Since probabilities are so hard to assess, it is all too easy to persuade yourself that it is worth taking the chance that no harm will result from the less demanding decision (the decision which you have a strong desire to make). And even if you face up to a correct assessment of the probabilities, true dedication to the good is shown by doing the act which, although it is probably the best action, may have no good consequences at all.” (Could a Good God Permit so Much Suffering? A Debate, pp. 52-53.) This is pretty galaxy-brained, but something galaxy-brained must be going on for God to tolerate the existence of evil at all, and this is a surprisingly natural extension of some common premises on the subject. 13: Swedish study: diagnosing the marginal patient with a psychiatric condition makes their life worse. Of the two mechanisms they looked at, stigma seems more involved than drug side effects. My opinion: this study was done on conscripts undergoing a mandatory psych evaluation for the army, who had no previous reason to think they had a psych disease and had not sought treatment. This is a different situation from somebody who comes to a psychiatrist asking for relief from specific symptoms they have noticed. Also, Sweden c. 2005 is a different culture from America 2025 in terms of how much stigma a psych diagnosis carries. I think it’s possible that if you never considered that you had psychiatric problems, and were suddenly given a diagnosis in 2005 Sweden and told you couldn’t serve in the army, that’s likely to destabilize your self-image more than a person who knows they’re depressed going to a psychiatrist in 2025 US and getting antidepressants. 14: RIP Felix Hill, research scientist at DeepMind and mentor to many in the AI community. You can read his suicide note here, though the obvious content warning applies. He says he took ketamine for mild anxiety and it plunged him into an incredibly deep depression that he couldn’t get out of; he leaves his story behind as a warning for others. I appreciate his warning, but I wish he had said more about what dose he used; different people’s ketamine doses vary by almost two orders of magnitude, I’d previously thought that the low doses were pretty safe and the high doses were sketchy, and I would like to know whether I should update or not. 15: RIP Max Chiswick, professional poker player, effective altruist, and ACX reader. 16: Adrian Dittman, a Twitter account widely accused of being Elon Musk’s alt, has been revealed to be . . . a guy named Adrian Dittman. Congrats to Maia Crimew and the Spectator for actually investigating this, unlike many other news sources which spread the Musk conspiracy theory. Also, the people involved got banned from X for some reason, maybe because this qualified as doxxing Dittman. 17: Related: Musk claims to be among the top players in the world at several computer games. A veteran Path of Exile gamer presents evidence that Musk faked his PoE2 accomplishments by hiring a Chinese guy to play on his account. Some Musk supporters in the comments suggest that maybe he hires the Chinese guy to level up his account, but his accomplishments (eg speedruns) are still his own? 18: Related: Sam Harris says he has been friends with Musk since 2008, but he noticed a sudden shift for the worse in his personality around 2020 which made it impossible to stay friends with him. He gives the example of Musk losing a bet with him that there would be 35,000+ COVID cases in the US, refusing to pay up, and launching personal attacks on Sam when asked to do so. What happened? Some theories: Musk turned right-wing, which ended his friendship with Sam for the same reason political differences have always ended friendships (but then what about the bet, which seems like objectively bad behavior?)

OpenAI Nonprofit Buyout: Much More Than You Wanted To Know

March 13, 2025 · Original source

In the early 2010s, the AI companies hadn’t yet discovered scaling laws, and so underestimated the amount of compute (and therefore money) it would take to build AI. DeepMind was the first victim; originally founded on high ideals of prioritizing safety and responsible stewardship of the Singularity, it hit a financial barrier and sold to Google.

This scared Elon Musk, who didn’t trust Google (or any corporate sponsor) with AGI. He teamed up with Sam Altman and others, and OpenAI was born. To avoid duplicating DeepMind’s failure, they founded it as a nonprofit with a mission to “build safe and beneficial artificial general intelligence for the benefit of humanity”.

But like DeepMind, OpenAI needed money. At first, they scraped by with personal donations from Musk and other idealists, but as the full impact of scaling laws became clearer, Altman wanted to form a forprofit arm and seek investment. Musk and Altman disagree on what happened next: Musk said he objected to the profit focus, Altman says Musk agreed but wanted to be in charge. In any case, Musk left, Altman took full control, and OpenAI founded a forprofit subsidiary.

Open Thread 394

August 11, 2025 · Original source

3: UK AISI is looking to distribute £15m in AI alignment funding, for projects that need anywhere from a $100K pre-seed up to $1-2m. Collaborators included Anthropic, DeepMind, etc. See their priority areas and apply here by September 10th.

Inline links: their priority areas, apply here

My Antichrist Lecture

October 22, 2025 · Original source

DeepMind: 3 co-founders

Inline links: 3 co-founders

In Silicon Valley speak, a “unicorn” is a company worth over $1 billion, and a “decacorn” (Latin for “ten-horned”) is a company worth over $10 billion. Under this interpretation, the ten horns of the prophecy have ten crowns because they represent wealth and achievement. The only AI company on the list above is Anthropic, at #9. Finally, John says that upon the heads will be names of blasphemy. If the heads represent co-founders, it sounds like John is claiming the co-founders of the company will have blasphemous names. I could not find anything blasphemous about the names of the founders of OpenAI, DeepMind, or xAI. But looking at Anthropic: Dario Amodei is the first co-founder. “Dario” comes from the Persian “Darius” meaning “Lord”. “Amodei” is of unclear meaning, but I cannot help but notice the resemblance with Asmodei (also called Ashmodei, Hamadee, Æshmadæva, and Asmodeus), a demon-king mentioned in the book of Tobit. Plausibly all these different names derive from a Proto-Sumerian root *Amodei, in which case the meaning of “Dario Amodei” would be “Asmodeus is lord”. This is a name of blasphemy.

Inline links: meaning, Asmodei

Feels bad, man But it’s also possible that Andreessen will become a major Anthropic investor before the end. There’s some textual support here too, this time in Daniel 7, another apocalyptic prophecy generally considered to address the same events as Revelation from a different perspective. Daniel has a vision of four beasts: a winged lion, a bear, a leopard, and a many-headed monster. The monster is the worst and final beast, and it has ten horns. Then a “little horn”, a “horn with human eyes”, shows up, defeats three of the original horns, and takes over. Then the monster begins a reign of terror, and finally is defeated by God. If, as before, the beasts represent companies, then the four beasts of Daniel correspond to the four major AI labs: Google DeepMind, X.AI, OpenAI, and Anthropic. How? I think these correspond to the ethnicity of the founders: Bear = Google, founded by Sergey Brin (Russian)

Astral Codex Ten

Table of Contents

Atlas

DeepMind

DeepMind

Article

Metadata

Appears In

External Links

Source Context

Backlinks

Astral Codex Ten

Table of Contents

Atlas

DeepMind

DeepMind

Article

Metadata

Appears In

Related Pages

External Links

Source Context

Backlinks