Eliezer Yudkowsky

Article

Eliezer Yudkowsky is a recurring person in the Astral Codex Ten archive, appearing 58 times across 58 issues between January 21, 2021 and March 25, 2026. The archive places him in contexts such as “I most clearly remember Eliezer Yudkowsky - who seemed to be tuned exactly to my wavelength”; “a shadowy figure named ‘Eliezer Yudkowsky’”; “claims, similarly argued, to those of … Eliezer Yudkowsky”. He most often appears alongside Scott, OpenAI, and Trump.

Metadata

  • Category: People
  • Mention count: 58
  • Issue count: 58
  • First seen: January 21, 2021
  • Last seen: March 25, 2026

Source Context

Recovered passages from the original issue text. When the raw archive preserved outbound links inside the source passage, they are listed directly under the quote.

January 21, 2021 · Original source
And more along the same lines, and some even more humbling than these. I want to grab some of you by the shoulders and shake you and shout "IT'S JUST A BLOG, GET A LIFE". But of course I would be a hypocrite. I remember back to when I was a new college graduate, desperately trying to make sense of the world. I remember the sheer relief when I came across a few bloggers - I most clearly remember Eliezer Yudkowsky - who seemed to be tuned exactly to my wavelength, people who were making sense when the entire rest of the world was saying vague fuzzy things that almost but not quite connected with the millions of questions I had about everything. These people weren't perfect, and they didn't have all the answers, but their existence reassured me that I wasn't crazy and I wasn't alone. I was an embarrassing fanboy of theirs for many years - I kind of still am - and if my punishment is to have embarrassing fanboys of my own then I accept it as part of the circle of life.
January 29, 2021 · Original source
I am not defending technocracy. But I do like evidence-based policy. So I read with interest Glen Weyl's Why I Am Not A Technocrat. It starts with a short summary of Seeing Like A State. It ties this into modern "evidence-based policy" and "mechanism design". It talks about how technocrats will always have their own insular culture and biases and paradigms, which prevent them from seeing the real world in its full complexity. Therefore, we should be careful about supposedly "objective" policies, and make sure they are always heavily informed by real people's real knowledge. Then it draws on vague rumors of the "rationalist community" and a shadowy figure named "Eliezer Yudkowsky" to create a completely fictional reimagination of us as a group of benighted people who don't understand any of these things, and just go around saying "hurr durr top-down systems are great, no way there could possibly be anything our models don't capture."
June 11, 2021 · Original source
And he makes similar claims, similarly argued, to those of Paul Graham and Eliezer Yudkowsky, that the strategies that lead to nominal success in school are often the ones that stop at superficial understanding of the subject--hacks to be able to get to the correct answer quickly, without ever really looking at the problem.
Per the last demographic survey of the readership of this blog, you are most likely not nine years old. However, you are almost certainly a former nine-year-old, and that’s another excellent audience for this book. Holt, like Graham and Yudkowsky, sees school as instilling permanent cognitive biases--habits that are best unlearned whenever you can.
July 30, 2021 · Original source
1. Superintelligence: This is the "classic" scenario that started the field, ably described by people like Nick Bostrom and Eliezer Yudkowsky. AI progress goes from human-level to vastly-above-human-level very quickly, maybe because slightly-above-human-level AIs themselves are speeding it along, or maybe because it turns out that if you can make an IQ 100 AI for $10,000 worth of compute, you can make an IQ 500 AI for $50,000. You end up with one (or a few) completely unexpected superintelligent AIs, which wield far-future technology and use it in unpredictable ways based on untested goal structures.
August 06, 2021 · Original source
I feel bad making these reasonable arguments, because I also think we should do a lot of extremely theoretical work trying to figure out the exact way the far future is going to go and prepare for it, for reasons described in this Eliezer Yudkowsky essay.
August 08, 2021 · Original source
That story would be wrong. In 2013, NBC ran an article called Drug Treatment Omegaven That Could Save Infant Lives Not Yet Approved By FDA. In 2014, libertarian blogs were using it as an example of excessive FDA delay - here’s one of them (search for “Bureaucratic Delay Endangers Lives”). Also in 2014, I personally learned about this for the first time, when writing my review of The Perfect Health Diet (I thought the book was generally bad, but it did alert me to this issue and the evidence supporting Omegaven). In 2016, my friend Eliezer Yudkowsky started writing a book about bureaucratic inefficiency that used the FDA failure to approve Omegaven as one of its central cases; in 2017, he published it as Inadequate Equilibria and I reviewed it here, including a mention of the Omegaven story. In January 2018, my friend Kelsey Piper also blogged about the FDA’s failure to approve Omegaven. Finally, in July 2018, the FDA finally approved the drug. I’ve been hearing about this story for so long that I thought I could recite it from memory (I was wrong, which is why I screwed up so many details in the original).
August 26, 2021 · Original source
Eliezer Yudkowsky tells a parable about a society where people hit themselves on the head with a baseball bat eight hours a day for some reason. Maybe they believe it drives out demons or something. Then they learn that it does not, in fact, drive out demons. But everyone has great reasons why they need to keep doing it.
November 09, 2021 · Original source
One last thing, which I have no evidence for. Eliezer Yudkowsky sometimes talks about the idea of a Hero License - ie, most people don’t accomplish great things, because they don’t try to accomplish great things, because they don’t think of themselves as the kind of person who could accomplish great things. I don’t run for President, partly because I rationally conclude I won’t win, but partly because I’m not cool enough to be President and I know it. Presidents are some different species with whiter teeth and better smiles than me, and I couldn’t set out to become one any more than I could set out to become a dolphin.
November 25, 2021 · Original source
Boris Johnson (left) is 5’9, so the guy in the middle must be gigantic. Who is he? Looks like it’s Milo Djukanovic, President of Montenegro, who’s 6’6 (198 cm). Is he the tallest world leader? It seems like he’s tied with his colleague across the border, Serbian president Aleksandar Vucic. Why are Balkan leaders so tall? As usual, the answer is “genetics”. This article says: It has been noted that men from Herzegovina are taller on average than men in other places—the average male height is just over six feet...Putting all the data together, researchers concluded that the most likely cause of larger-than-average height of Herzegovinian men is lifestyle during the Paleolithic—men hunted large animals such as mammoth for survival—such a diet, heavy in protein, combined with small population densities, would have provided ideal conditions for height selection, resulting in increasingly taller men who passed the trait down through their I-M170 chromosome to future generations. Some sources note that they manage to beat the Dutch despite the latter country’s much higher human development index. The Dutch are probably tall through a combination of nature and nurture; Balkan people are tall through nature alone. 7: Eliezer Yudkowsky doesn’t need more ego boosts, but an idea he had a couple of years ago - using strings of bright lights to provide a better and brighter experience for Seasonal Affective Disorder sufferers than regular light boxes - spread from him to the rationalist community to the wider world, and has finally gotten tested in a formal study (see Acknowledgments section). Results seem vaguely positive: "SAD symptoms of both groups improved similarly and considerably...exploratory analyses indicate that a higher illuminance is associated with a larger symptom improvement in the BROAD light therapy group" 8: Percent of people who choose woke options on polls very tentatively and preliminarily seems to be going down post-Trump (h/t Richard Hanania). 
9: Twitter conspiracy theories 10: Did you know: all those reconstructions of “how classical art would have looked with the original paint” are probably inaccurate. There is no reason to think the Greeks and Romans used garish technicolor hues on their statues; what evidence we have suggest they were good at shading, and the statues were probably colored very tastefully. 11: Complaints about how Karl Friston uses the term “Markov blanket” 12: Trevor Klee on the claim that cyclosporine patients don’t get dementia. Apparently there was a big study where basically nobody on the immunosuppressant cyclosporine ever got dementia, and there are some theoretical reasons why cyclosporine might prevent neurodegeneration. But another study found people on cyclosporine got dementia at the usual rate. I think in a situation like this you should have a really high prior on “the people who got the crazy result bungled their study somehow”, but I’m interested in hearing what other people think. 13: Also from Trevor: a history of fluvoxamine treatment for COVID. 14: To tide you over until the next book review contest, here is awanderingmind’s review of The Conquest Of Bread. 
15: Claims: cnbc.com/2021/11/05/sam… ft.com/content/dcb75a… (better article, but paywalled) (tweet by Dustin Moskovitz, November 5, 2021) 16: Big trial on Vitamin D for depression finds null result. Peter Attia tries to tear it apart here, but I am unconvinced, especially in the context of Vitamin D never working for any of the things people say it does besides the most boring aspects of bone health. 17: “California is actively considering the adoption of flawed and inequitable guidance on math curricula based on misleading data and inaccurate success metrics reported by San Francisco Unified School District (SFUSD)...Based on our review of the data, we found misleading, unsupported, and cherry-picked assertions of success for the new math program. We noted that overall test scores are down and enrollments in UC-approved advanced math classes have dropped as well.” It looks like San Francisco is trying the good old “lower standards, then when more kids meet the standards, claim your school reform plan worked” trick again. 18: A new study claims that self-reported “Long COVID” symptoms are more associated with believing you’ve had COVID than with actually having it (as measured by serologic testing), which sounds like pretty strong evidence that it’s psychosomatic. 
Expert reactions are mixed-to-negative, although the only one of these that doesn’t sound like excuse-making is Dr. Rossman’s about the unreliability of the tests. I haven’t confirmed test reliability stats but Philippe Lemoine also thinks this is a plausible confounder. 19: Noahpinion: What If Xi Jinping Just Isn’t That Competent? I appreciated this for making me think, and for underlining the extent of the difference between the Deng/Jiang/Hu era and what Xi’s doing. I especially appreciated this line, which I’d never thought about before: Xi presided over the end of China’s hypergrowth. To some extent this is not his fault. No country can grow at 10% forever, and there were many structural forces pushing downward on China’s numbers — the end of the demographic dividend, the exhaustion of rural surplus labor (the Lewis Turning Point), the saturation of export markets, and so on. But China is also slowing down earlier than South Korea, Taiwan, or Japan did in their day. China’s per capita GDP (at PPP) is still only about 1/3 that of a developed country, so if they stop catching up at about half of developed-country levels, that will not be a great showing. A big lesson of the past twenty years has been “actually liberal democracy isn’t necessary to reach developed-country status”, so it would be quite the twist if it turned out you needed liberal democracy to reach developed-country status. This gets pretty close to the great mystery of why some less-developed countries “catch up” and others don’t; whatever happens in China is going to be a really useful data point. 20: Variations on the fable of The Frog And The Scorpion. 21: You’ve probably heard about the University of Austin, the new project by a bunch of wokeness-critical academics to start a new university that won’t cancel people or force conformity (New York Post article, Politico article - these were the two least “you need to be super-outraged about this right now” articles I could find). 
Tyler Cowen and Larry Summers are involved; Steven Pinker was supposed to be but left for unclear reasons. My thoughts, in no particular order: Even forgetting the political aspect, attempts to start new universities are always welcome.
27: Related: Transcript of Richard Ngo and Eliezer Yudkowsky on AI (part 1 on capability gains, part 2 on alignment difficulty, part 3 with Paul Christiano on takeoff speeds)
January 12, 2022 · Original source
(Eliezer Yudkowsky sometimes describes this as ‘changing yourself into a more coherent person in order to become a better bargaining partner’, which I find strangely romantic.)
The first virtue is curiosity. And I can’t wait to see what our life together will be like.
January 19, 2022 · Original source
Eliezer Yudkowsky, one of the original weird transhumanists, is having none of this. He says the problem is harder than everyone else thinks. Their clever solutions will fail. He's been flitting around for the past few years, Cassandra-like, insisting that their plans will explode and they are doomed.
I've been trying to trudge through them and I figure I might as well blog about the ones I've finished. The first of these is Eliezer's talk with Richard Ngo, of OpenAI's Futures team. You can find the full transcript here, though be warned: it is very long.
January 24, 2022 · Original source
1: Comment of the week is from Richard Ngo, who helpfully corrects some of my discussion of his dialogue with Eliezer Yudkowsky:
[continue reading full comment here]
February 02, 2022 · Original source
It still is! But in the same sense that I was clearing a personal backlog of unwritten-up ideas, the rationalist community was clearing a backlog of scientific and philosophical ideas sitting in journals or obscure old books that it turned out were really interesting to a lot of people. The early Internet provided a critical mass where people interested in cognition and math and the future could suddenly all share the parts of the puzzle they knew about with each other and make rapid progress. Eliezer Yudkowsky, Robin Hanson, Nick Bostrom, and other intellectuals all had their own backlog of stuff which had probably been published in journals or something but which the wider world had yet to appreciate. I was the biggest-name blogger who was sitting around listening to them talk about it, so I got access to a stream of amazing content that most people didn’t know about.
February 23, 2022 · Original source
I've been trying to review and summarize Eliezer Yudkowsky's recent dialogues on AI safety. Previously in sequence: Yudkowsky Contra Ngo On Agents. Now we’re up to Yudkowsky contra Cotra on biological anchors, but before we get there we need to figure out what Cotra's talking about and what's going on.
SS Great Eastern, the extreme outlier large steamship from 1858. This has become sort of a mascot for quantitative technological progress forecasters. What is this scientist’s error? The big one is thinking that spaceship progress depends on some easily-measured quantity (size) instead of on fundamental advances (eg figuring out how rockets work). You can make the same accusation against Ajeya et al: you can have all the FLOPs in the world, but if you don’t understand how to make a machine think, your AI will be, well, a flop. Ajeya discusses this a bit on page 143 of her report. There is some sense in which FLOPs and knowing-what-you’re-doing trade off against each other. If you have literally no idea what you’re doing, you can sort of kind of re-run evolution until it comes up with something that looks good. If things are somehow even worse than that, you could always run AIXI, a hypothetical AI design guaranteed to get excellent results as long as you have infinite computation. You could run a Go engine by searching the entire branching tree structure of Go - you shouldn’t, and it would take a zillion times more compute than exists in the entire world, but you could. So in some sense what you’re doing, when you’re figuring out what you’re doing, is coming up with ways to do already-possible things more efficiently. But that’s just algorithmic progress, which Ajeya has already baked into her model. (our Victorian scientist: “As a reductio ad absurdum, you could always stand the ship on its end, and then climb up it to reach space. We’re just trying to make ships that are more efficient than that.”) Part II: Biology-Inspired AI Timelines: The Trick That Never Works Eliezer Yudkowsky presents a more subtle version of these kinds of objection in an essay called Biology-Inspired AI Timelines: The Trick That Never Works, published December 2021. Ajeya’s report is a 169-page collection of equations, graphs, and modeling assumptions. 
Yudkowsky’s rebuttal is a fictional dialogue between himself, younger versions of himself, famous AI scientists, and other bit players. At one point, a character called “Humbali” shows up begging Yudkowsky to be more humble, and Yudkowsky defeats him with devastating counterarguments. Still, he did found the field, so I guess everyone has to listen to him. He starts: in 1988, famous AI scientist Hans Moravec predicted human-level AI by 2010. He was using the same methodology as Ajeya: extrapolate how quickly processing power would grow (in FLOP/S), and see when it would match some estimate of the human brain. Moravec got the processing power almost exactly right (it hit his 2010 projection in 2008) and his human brain estimate pretty close (he says 10^13 FLOP/S, Ajeya says 10^15, this 2 OOM difference only delays things a few years), yet there was not human-level AI in 2010. What happened? Ajeya's answer could be: Moravec didn't realize that, in the modern ML paradigm, any given size of program requires a much bigger program to train. Ajeya, who has a 35-year advantage on Moravec, estimates approximately the same power for the finished program (10^16 vs. 10^13 FLOP/S) but says that training the 10^16 FLOP/S program will require 10^33ish FLOPs. Eliezer agrees as far as it goes, but says this points to a much deeper failure mode, which was that Moravec had no idea what he was doing. He was assuming processing power of human brain = processing power of computer necessary for AGI. Why? The human brain consumes around 20 watts of power. Can we thereby conclude that an AGI should consume around 20 watts of power, and that, when technology advances to the point of being able to supply around 20 watts of power to computers, we'll get AGI? […] You say that AIs consume energy in a very different way from brains? Well, they'll also consume computations in a very different way from brains! 
The only difference between these two cases is that you know something about how humans eat food and break it down in their stomachs and convert it into ATP that gets consumed by neurons to pump ions back out of dendrites and axons, while computer chips consume electricity whose flow gets interrupted by transistors to transmit information. Since you know anything whatsoever about how AGIs and humans consume energy, you can see that the consumption is so vastly different as to obviate all comparisons entirely. You are ignorant of how the brain consumes computation, you are ignorant of how the first AGIs built would consume computation, but "an unknown key does not open an unknown lock" and these two ignorant distributions should not assert much internal correlation between them. Cars don’t move by contracting their leg muscles and planes don’t fly by flapping their wings like birds. Telescopes do form images the same way as the lenses in our eyes, but differ by so many orders of magnitude in every important way that they defy comparison. Why should AI be different? You have to use some specific algorithm when you’re creating AI; why should we expect it to be anywhere near the same efficiency as the ones Nature uses in our brains? The same is true for arguments from evolution, eg Ajeya’s Evolutionary Anchor, ie “it took evolution 10^43 FLOPs of computation to evolve the human brain so maybe that will be the training cost”. AI scientists sitting in labs trying to figure things out, and nematodes getting eaten by other nematodes, are such different methods for designing things that it’s crazy to use one as an estimate for the other. Algorithmic Progress vs. Algorithmic Paradigm Shifts This post is a dialogue, so (Eliezer’s hypothetical model of) OpenPhil gets a chance to respond. They object: this is why we put a term for algorithmic progress in our model. The model isn’t very sensitive to changes in that term. 
If you want you can set it to some kind of crazy high value and see what happens, but you can’t say we didn’t consider it. OpenPhil: We did already consider that and try to take it into account: our model already includes a parameter for how algorithmic progress reduces hardware requirements. It's not easy to graph as exactly as Moore's Law, as you say, but our best-guess estimate is that compute costs halve every 2-3 years […] Eliezer: The makers of AGI aren't going to be doing 10,000,000,000,000 rounds of gradient descent, on entire brain-sized 300,000,000,000,000-parameter models, algorithmically faster than today. They're going to get to AGI via some route that you don't know how to take, at least if it happens in 2040. If it happens in 2025, it may be via a route that some modern researchers do know how to take, but in this case, of course, your model was also wrong. They're not going to be taking your default-imagined approach algorithmically faster, they're going to be taking an algorithmically different approach that eats computing power in a different way than you imagine it being consumed. OpenPhil: Shouldn't that just be folded into our estimate of how the computation required to accomplish a fixed task decreases by half every 2-3 years due to better algorithms? Eliezer: Backtesting this viewpoint on the previous history of computer science, it seems to me to assert that it should be possible to: Train a pre-Transformer RNN/CNN-based model, not using any other techniques invented after 2017, to GPT-2 levels of performance, using only around 2x as much compute as GPT-2;
Play pro-level Go using 8-16 times as much computing power as AlphaGo, but only 2006 levels of technology. For reference, recall that in 2006, Hinton and Salakhutdinov were just starting to publish that, by training multiple layers of Restricted Boltzmann machines and then unrolling them into a "deep" neural network, you could get an initialization for the network weights that would avoid the problem of vanishing and exploding gradients and activations. At least so long as you didn't try to stack too many layers, like a dozen layers or something ridiculous like that. This being the point that kicked off the entire deep-learning revolution. Your model apparently suggests that we have gotten around 50 times more efficient at turning computation into intelligence since that time; so, we should be able to replicate any modern feat of deep learning performed in 2021, using techniques from before deep learning and around fifty times as much computing power. OpenPhil: No, that's totally not what our viewpoint says when you backfit it to past reality. Our model does a great job of retrodicting past reality. Eliezer: How so? OpenPhil: <Eliezer cannot predict what they will say here.> I think the argument here is that OpenPhil is accounting for normal scientific progress in algorithms, but not for paradigm shifts. Directional Error These are the two arguments Eliezer makes against OpenPhil that I find most persuasive. First, that you shouldn’t be using biological anchors at all. Second, that unpredictable paradigm shifts are more realistic than gradual algorithmic progress. These mostly add uncertainty to OpenPhil’s model, but Eliezer ends his essay making a stronger argument: he thinks OpenPhil is directionally wrong, and AI will come earlier than they think. Mostly this is the paradigm argument again. Five years from now, there could be a paradigm shift that makes AI much easier to build. 
It’s happened before; from GOFAI’s pre-programmed logical rules to Deep Blue’s tree searches to the sorts of Big Data methods that won the Netflix Prize to modern deep learning. Instead of just extrapolating deep learning scaling thirty years out, OpenPhil should be worried about the next big idea. Hypothetical OpenPhil retorts that this is a double-edged sword. Maybe the deep learning paradigm can’t produce AGI, and we’ll have to wait decades or centuries for someone to have the right insight. Or maybe the new paradigm you need for AGI will take more compute than deep learning, in the same way deep learning takes more compute than whatever Moravec was imagining. This is a pretty strong response, since it would have been true for every previous forecaster: remember, Moravec erred in thinking AI would come too soon, not too late. So although Eliezer is taking the cheap shot of saying OpenPhil’s estimate will be wrong just as everyone else’s was wrong before, he’s also giving himself the much harder case of arguing it might be wrong in the opposite direction as all its predecessors. Eliezer takes this objection seriously, but feels like on balance probably new paradigms will speed up AI rather than slow it down. Here he grudgingly and with suitable embarrassment does try to make an object-level semi-biological-anchors-related argument: Moravec was wrong because he ignored the training phase. And the proper anchor for the training phase is somewhere between evolution and a human childhood, where evolution represents “blind chance eventually finding good things” and human childhood represents “an intelligent cognitive engine trying to squeeze as much data out of experience as possible”. And part of what he expects paradigm shifts to do is to move from more evolutionary processes to more childhood-like processes, and that’s a net gain in efficiency. 
So he still thinks OpenPhil’s methods are more likely to overestimate the amount of time until AGI rather than underestimate it. What Moore’s Law Giveth, Platt’s Law Taketh Away Eliezer’s other argument is kind of a low blow: he refers to Platt’s Law Of AI Forecasting: “any AI forecast will put strong AI thirty years out from when the forecast is made.” This isn’t exact. Hans Moravec, writing in 1988, said 2010 - so 22 years. Ray Kurzweil, writing in 2001, said 2023 - another 22 years. Vernor Vinge, in a 1993 speech, said 2023, and that was exactly 30 years, but Vinge knew about Platt’s Law and might have been joking. The point is: OpenPhil wrote a report in 2020 that predicted strong AI in 2052, isn’t that kind of suspicious? I’d previously mentioned it as a plus that Ajeya got around the same year everyone else got. The forecasters on Metaculus. The experts surveyed in Grace et al. Lots of other smart experts with clever models. But what if all of these experts and models and analyses are just fudging the numbers for the same Platt’s-Law-related reasons? Hypothetical OpenPhil is BTFO: OpenPhil: That part about Charles Platt's generalization is interesting, but just because we unwittingly chose literally exactly the median that Platt predicted people would always choose in consistent error, that doesn't justify dismissing our work, right? We could have used a completely valid method of estimation which would have pointed to 2050 no matter which year it was tried in, and, by sheer coincidence, have first written that up in 2020. In fact, we try to show in the report that the same methodology, evaluated in earlier years, would also have pointed to around 2050 - Eliezer: Look, people keep trying this. It's never worked. It's never going to work. 2 years before the end of the world, there'll be another published biologically inspired estimate showing that AGI is 30 years away and it will be exactly as informative then as it is now. 
I'd love to know the timelines too, but you're not going to get the answer you want until right before the end of the world, and maybe not even then unless you're paying very close attention. Timing this stuff is just plain hard. Part III: Responses And Commentary Response 1: Less Wrong Comments Less Wrong is a site founded by Eliezer Yudkowsky for Eliezer Yudkowsky fans who wanted to discuss Eliezer Yudkowsky’s ideas. So, for whatever it’s worth - the comments on his essay were pretty negative. Carl Shulman, an independent researcher with links to both OpenPhil and MIRI (Eliezer’s org), writes the top-voted comment. He works from a model where there is hardware progress, software progress downstream of hardware progress, and independent (ie unrelated to algorithms) software progress, and where the first two make up most progress on the margin. Researchers generally develop new paradigms once they have enough compute available to tinker with them. Progress in AI has largely been a function of increasing compute, human software research efforts, and serial time/steps. Throwing more compute at researchers has improved performance both directly and indirectly (e.g. by enabling more experiments, refining evaluation functions in chess, training neural networks, or making algorithms that work best with large compute more attractive). Historically compute has grown by many orders of magnitude, while human labor applied to AI and supporting software by only a few. And on plausible decompositions of progress (allowing for adjustment of software to current hardware and vice versa), hardware growth accounts for more of the progress over time than human labor input growth. 
So if you're going to use an AI production function for tech forecasting based on inputs (which do relatively OK by the standards of tech forecasting), it's best to use all of compute, labor, and time, but it makes sense for compute to have pride of place and take in more modeling effort and attention, since it's the biggest source of change (particularly when including software gains downstream of hardware technology and expenditures). […] A perfectly correlated time series of compute and labor would not let us say which had the larger marginal contribution, but we have resources to get at that, which I was referring to with 'plausible decompositions.' This includes experiments with old and new software and hardware, like the chess ones Paul recently commissioned, and studies by AI Impacts, OpenAI, and Neil Thompson. There are AI scaling experiments, and observations of the results of shocks like the end of Dennard scaling, the availability of GPGPU computing, and Besiroglu's data on the relative predictive power of computer and labor in individual papers and subfields. In different ways those tend to put hardware as driving more log improvement than software (with both contributing), particularly if we consider software innovations downstream of hardware changes. Vanessa Kosoy makes the obvious objection, which echoes a comment of Eliezer’s in the dialogue above: I'm confused how can this pass some obvious tests. For example, do you claim that alpha-beta pruning can match AlphaGo given some not-crazy advantage in compute? Do you claim that SVMs can do SOTA image classification with not-crazy advantage in compute (or with any amount of compute with the same training data)? Can Eliza-style chatbots compete with GPT3 however we scale them up? Mark Xu answers: My model is something like: For any given algorithm, e.g. SVMs, AlphaGo, alpha-beta pruning, convnets, etc., there is an "effective compute regime" where dumping more compute makes them better. 
If you go above this regime, you get steep diminishing marginal returns.
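A rough way to picture the "effective compute regime" idea in the passage above is a toy curve that climbs with log-compute inside a window and nearly flattens beyond it. The sketch below is illustrative only: the piecewise-linear shape, the `toy_performance` function, and every number in it are assumptions made for this example, not anything taken from Mark Xu's or Carl Shulman's comments.

```python
import math

def toy_performance(compute, regime_lo, regime_hi, ceiling, tail_slope=0.01):
    """Hypothetical score for one algorithm at a given compute budget.

    Inside [regime_lo, regime_hi] the score rises linearly in log10(compute).
    Above regime_hi it keeps rising, but only at tail_slope per order of
    magnitude, standing in for "steep diminishing marginal returns".
    """
    log_c = math.log10(compute)
    lo, hi = math.log10(regime_lo), math.log10(regime_hi)
    if log_c <= lo:
        return 0.0
    if log_c <= hi:
        return ceiling * (log_c - lo) / (hi - lo)
    return ceiling + tail_slope * (log_c - hi)

# Two made-up algorithms: an older one with a low, narrow regime and a
# newer one whose regime starts later but reaches a higher ceiling.
old_algo = dict(regime_lo=1e0, regime_hi=1e3, ceiling=1.0)
new_algo = dict(regime_lo=1e2, regime_hi=1e6, ceiling=3.0)

# The old algorithm gets a thousand times more compute than the new one.
old_at_huge_compute = toy_performance(1e9, **old_algo)
new_at_regime_top = toy_performance(1e6, **new_algo)

print(f"old algorithm at 1e9 compute: {old_at_huge_compute:.2f}")
print(f"new algorithm at 1e6 compute: {new_at_regime_top:.2f}")
```

Under these invented parameters the older algorithm stays well behind even with far more compute, which is the shape of the reply to Kosoy's objection. Whether real algorithms behave this way is the empirical question the commenters are arguing about.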
March 04, 2022 · Original source
This is Eliezer Yudkowsky’s standing-on-one-foot definition of rationality.
A few weeks ago, when I posted my predictions for 2022, a commenter mentioned that various “rationalist” “celebrities” - Eliezer Yudkowsky, Julia Galef, maybe even Steven Pinker - should join in, and then we would find out who is most rational of all. I hope this post explains why I don’t think this would work. You can’t find the best economist by asking Keynes, Hayek, and Marx to all found companies and see which makes the most profit - that’s confusing money-making with the study of money-making. These two things might be correlated - I assume knowing things about supply and demand helps when starting a company, and Keynes did in fact make bank - but they’re not exactly the same. Likewise, I don’t think the best superforecasters are always the people with the most insight into rationality - they might be best at truth-seeking, but not necessarily at studying truth-seeking.
Yudkowsky: Rationality Is Systematized Winning?
April 04, 2022 · Original source
Previously in series: Yudkowsky Contra Ngo On Agents, Yudkowsky Contra Cotra On Biological Anchors Prelude: Yudkowsky Contra Hanson In 2008, thousands of blog readers - including yours truly, who had discovered the rationality community just a few months before - watched Robin Hanson debate Eliezer Yudkowsky on the future of AI.
April 11, 2022 · Original source
Prosaic alignment is hard… “Prosaic alignment” (see this article for more) means alignment of normal AIs like the ones we use today. For a while, people thought those AIs couldn’t reach dangerous levels, and that AIs that reached dangerous levels would have so many exotic new discoveries that we couldn’t even begin to speculate on what they would be like or how to align them. After GPT-2, DALL-E, and the rest, alignment researchers got more concerned that AIs kind of like current models could be dangerous. Prosaic alignment - trying to align AIs like the ones we have now - has become the dominant (though not unchallenged) paradigm in alignment research. “Prosaic” doesn’t necessarily mean the AI cannot write poetry; see Gwern’s AI generated poetry for examples. … because OOD behavior is unpredictable “OOD” stands for “out of distribution”. All AIs are trained in a certain environment. Then they get deployed in some other environment. If it’s like the training environment, presumably their training is pretty relevant and helpful. If it’s not like the training environment, anything can happen. Returning to our stock example, the “training environment” where evolution designed humans didn’t involve contraceptives. In that environment, the base optimizer’s goal (pass on genes) and the mesa-optimizer’s goal (get genital friction) were very well-aligned - doing one often led to the other - so there wasn’t much pressure on evolution to look for a better proxy. Then 1957, boom, the FDA approves the oral contraceptive pill, and suddenly the deployment environment looks really really different from the training environment and the proxy collapses so humiliatingly that people start doing crazy things like electing Viktor Orban prime minister. So: suppose we train a robot to pick strawberries. We let it flail around in a strawberry patch, and reinforce it whenever strawberries end up in a bucket. Eventually it learns to pick strawberries very well indeed. 
But maybe all the training was done on a sunny day. And maybe what it actually learned was to identify the metal bucket by the way it gleamed in the sunlight. Later we ask it to pick strawberries in the evening, where a local streetlight is the brightest thing around, and it throws the strawberries at the streetlight instead. So fine. We train it in a variety of different lighting conditions, until we’re sure that, no matter what the lighting situation, the strawberries go in the bucket. Then one day someone with a big bulbous red nose wanders on to the field, and the robot tears his nose off and pulls it into the bucket. If only there had been someone with a nose that big and red in the training distribution, so we could have told it not to do that! The point is, just because it’s learned “strawberries into bucket” in one environment, doesn’t mean it’s safe or effective in another. And we can never be sure we’ve caught all the ways the environment can vary. …and deception is more dangerous than Goodharting. To “Goodhart” is to take advantage of Goodhart’s Law: to follow the letter of your reward function, rather than the spirit. The ordinary-life equivalent is “teaching to the test”. The system’s programmers (eg the Department of Education) have an objective (children should learn). They delegate that objective to mesa-optimizers (the teachers) via a proxy objective (children should do well on the standardized test) and a correlated reward function (teachers get paid more if their students get higher test scores). The teachers can either pursue the base objective for less reward (teach children useful skills), or pursue their mesa-level objective for more reward (teach them how to do well on the test). An alignment failure! This sucks, but it’s a bounded problem. We already know that some teachers teach to the test, and the Department of Education has accepted this as a reasonable cost of having the incentive system at all. 
We might imagine our strawberry-picker cutting strawberries into little pieces, so that it counts as having picked more strawberries. Again, it sucks, but once a programmer notices it can be fixed pretty quickly (as long as the AI is still weak and under control). What about deception? Suppose the strawberry-picker happens to land on some goal function other than the intended one. Maybe, as before, it wants to toss strawberries at light sources, in a way that works when the nearest light source is a metal bucket, but fails when it’s a streetlight. Our programmers are (somewhat) smart and careful, so during training, they test it at night, next to a streetlight. What happens? If it’s just a dumb collection of reflexes trained by gradient descent, it throws the strawberry at the streetlight and this is easily caught and fixed. If it’s a very smart mesa-optimizer, it might think “If I throw the strawberry at the streetlight, I will be caught and trained to have different goals. This totally fails to achieve my goal of having strawberries near light sources. So throwing the strawberry at the light source this time, in the training environment, fails to achieve my overall goal of having strawberries thrown at light sources in general. I’ll do what the humans want - put the strawberry in the bucket - for now.” So it puts the strawberry in the bucket and doesn’t get caught. Then, as soon as the humans stop looking, it throws strawberries at the streetlight again. Deception is more dangerous than Goodharting because Goodharting will get caught and trained away, and deception might not. I might not be explaining this well, see also Deceptively Aligned Mesa-Optimizers? 
It’s More Likely Than You Think: We prevent OOD behavior by detecting OOD and obtaining more human labels when we detect it… If you’re (somewhat) careful, you can run your strawberry-picking AI at night, see it throw strawberries at streetlights, and train it out of this behavior (ie have a human programmer label it “bad” so the AI gradient-descends away from it) …and we eliminate the incentive for deception by ensuring that the base optimizer is myopic A myopic optimizer is one that reinforces programs based only on their performance within a short time horizon. So for example, the outside gradient descent loop might grade a strawberry picker only on how well it did picking strawberries for the first hour it was deployed. If this worked perfectly, it would create an optimizer with a short time horizon. When it considered deceiving its programmers in order to get a payoff a few days later when they stopped watching it, it wouldn’t bother, since a few days later is outside the time horizon. …and implements a decision theory incapable of acausal trade. You don’t want to know about this one, really. Just pretend it never mentioned this, sorry for the inconvenience. There are deceptively-aligned non-myopic mesa-optimizers even for a myopic base objective. Even if the base optimizer is myopic, the mesa-optimizer might not be. Evolution designed humans myopically, in the sense that we live some number of years, and nothing that happens after that can reward or punish us further. But we still “build for posterity” anyway, presumably as a spandrel of having working planning software at all. Infinite optimization power might be able to evolve this out of us, but infinite optimization power could do lots of stuff, and real evolution remains stubbornly finite. Maybe it would be helpful if we could make the mesa-optimizer itself myopic (though this would severely limit its utility). But so far there is no way to make a mesa-optimizer anything. 
You just run the gradient descent and cross your fingers. The most likely outcome: you run myopic gradient descent to create a strawberry picker. It creates a mesa-optimizer with some kind of proxy goal which corresponds very well to strawberry picking in the training optimization, like flinging red things at lights (realistically it will be weirder and more exotic than this). The mesa-optimizer is not incentivized to think about anything more than an hour out, but does so anyway, for the same reason I’m not incentivized to speculate about the far future but I’m doing so anyway. While speculating about the far future, it realizes that failing to pick strawberries correctly now will thwart its goal of throwing red things at light sources later. It picks strawberries correctly in the training distribution, and then, when training is over and nobody is watching, throws strawberries at streetlights. (Then it realizes it could throw lots more red things at light sources if it was more powerful, achieves superintelligence somehow, and converts the mass of the Earth into red things it can throw at the sun. The end.) III. You’re still here? But we already finished explaining the meme! Okay, fine. Is any of this relevant to the real world? As far as we know, there are no existing full mesa-optimizers. AlphaGo is kind of a mesa-optimizer. You could approximate it as a gradient descent loop creating a good-Go-move optimizer. But this would only be an approximation: DeepMind hard-coded some parts of AlphaGo, then gradient-descended other parts. Its objective function is “win games of Go”, which is hard-coded and pretty clear. Whether or not you choose to call it a mesa-optimizer, it’s not a very scary one. Will we get scary mesa-optimizers in the future? This ties into one of the longest-running debates in AI alignment - see eg my review of Reframing Superintelligence, or the Eliezer Yudkowsky/Richard Ngo dialogue. 
Optimists say: “Since a goal-seeking AI might kill everyone, I would simply not create one”. They speculate about mechanical/instinctual superintelligences that would be comparatively easy to align, and might help us figure out how to deal with their scarier cousins. But the mesa-optimizer literature argues: we have limited to no control over what kind of AIs we get. We can hope and pray for mechanical instinctual AIs all we want. We can avoid specifically designing goal-seeking AIs. But really, all we’re doing here is setting up a gradient descent loop and pressing ‘go’. Then the loop evolves whatever kind of AI best minimizes our loss function. Will that be a mesa-optimizer? Well, I benefit from considering my actions and then choosing the one that best achieves my goal. Do you benefit from this? It sure does seem like this helps in a broad class of situations. So it would be surprising if planning agents weren’t an effective AI design. And if they are, we should expect gradient descent to stumble across them eventually. This is the scenario that a lot of AI alignment research focuses on. When we create the first true planning agent - on purpose or by accident - the process will probably start with us running a gradient descent loop with some objective function. That will produce a mesa-optimizer with some other, potentially different, objective function. Making sure you actually like the objective function that you gave the original gradient descent loop on purpose is called outer alignment. Carrying that objective function over to the mesa-optimizer you actually get is called inner alignment. Outer alignment problems tend to sound like Sorcerer’s Apprentice. We tell the AI to pick strawberries, but we forgot to include caveats and stop signals. The AI becomes superintelligent and converts the whole world into strawberries so it can pick as many as possible. 
Inner alignment problems tend to sound like the AI tiling the universe with some crazy thing which, to humans, might not look like picking strawberries at all, even though in the AI’s exotic ontology it served as some useful proxy for strawberries in the training distribution. My stand-in for this is “converts the whole world into red things and throws them into the sun”, but whatever the AI that kills us really does will probably be weirder than that. They’re not ironic Sorcerer’s Apprentice-style comeuppance. They’re just “what?” If you wrote a book about a wizard who created a strawberry-picking golem, and it converted the entire earth into ferrous microspheres and hurled them into the sun, it wouldn’t become iconic the way Sorcerer’s Apprentice did. Inner alignment problems happen “first”, so we won’t even make it to the good-story outer alignment kind unless we solve a lot of issues we don’t currently know how to solve. For more information, you can read: Rob Miles’ video above, direct link here, channel here.
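The strawberry-picker story above can be sketched in a few lines of code. This is a minimal illustration, assuming a made-up scene format (object name mapped to brightness) and a hypothetical `learned_policy` that stands in for whatever proxy gradient descent happened to find. Nothing here models a real training run.

```python
def intended_policy(scene):
    """What the programmers wanted: strawberries always go in the bucket."""
    return "bucket"

def learned_policy(scene):
    """Hypothetical proxy the robot actually learned: aim at the brightest object."""
    return max(scene, key=scene.get)

# Invented training scenes, all on sunny days, so the bucket gleams brightest.
train_scenes = [
    {"bucket": 0.9, "leaf": 0.2},
    {"bucket": 0.8, "rock": 0.3},
    {"bucket": 0.95, "fence": 0.4},
]

# Invented deployment scene: evening, with a streetlight outshining the bucket.
night_scene = {"bucket": 0.1, "streetlight": 0.7}

# On the training distribution the proxy and the intended goal are indistinguishable.
train_agreement = all(
    learned_policy(scene) == intended_policy(scene) for scene in train_scenes
)

# Out of distribution they come apart.
night_target = learned_policy(night_scene)

print("agrees with intended goal on every training scene:", train_agreement)
print("target chosen at night:", night_target)
```

The point of the sketch is only that perfect agreement on the training scenes says nothing about which of the two policies was learned. Adding night scenes to training would fix this one case and leave the next unanticipated variation untouched.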
April 18, 2022 · Original source
Early this month on Less Wrong, Eliezer Yudkowsky posted MIRI Announces New Death With Dignity Strategy, where he said that after a career of trying to prevent unfriendly AI, he had become extremely pessimistic, and now expects it to happen in the relatively near-term and probably kill everyone. This caused the Less Wrong community, already pretty dedicated to panicking about AI, to redouble its panic. Although the new announcement doesn’t really say anything about timelines that hasn’t been said before, the emotional framing has hit people a lot harder.
Or is there another explanation? A lot of AI forecasters on Metaculus are Less Wrong readers; we know that the Less Wrong Yudkowsky/Christiano debate on takeoff speeds moved the relevant Metaculus question a few percent:
I will admit that I’m one of the people who is kind of panicky. But I also worry about an information cascade: we’re an insular group, and Eliezer is a convincing person. Other communities of AI alignment researchers are more optimistic. I continue to plan to cover the attempts at debate and convergence between optimistic and pessimistic factions, and to try to figure out my own mind on the topic. But for now the most relevant point is that a lot of people who were only medium panicked a few months ago are now very panicked. Is that the kind of thing that moves forecasting tournaments? I don’t know.
May 04, 2022 · Original source
“WHAT?” You’re definitely not imagining it. The music has learned to defend itself against being shut off. If only people had listened to Eliezer Yudkowsky before it was too late. You give up.
July 01, 2022 · Original source
39: Eliezer Yudkowsky summarizes his case for AI risk here. Arch-AI-optimist Paul Christiano responds here.
August 04, 2022 · Original source
Can I convince you to read the sequences? There are some real underappreciated classics. (excerpt edited to remove examples that someone would misinterpret and start a flame war over) Here’s a possible argument why not: everything has to bottom out in absurdity arguments at some level or another. Suppose I carefully calculated that, with modern construction techniques, building Neom would cost 10x more than its allotted budget. This argument contains an implied premise: “and the Saudis can’t construct things 10x cheaper than anyone else”. How do we know the Saudis can’t construct things 10x cheaper than anyone else? The argument itself doesn’t prove this; it’s just left as too absurd to need justification. Suppose I did want to address this objection. For example, I carefully researched existing construction projects in Saudi Arabia, checked how cheap they were, calculated how much they could cut costs using every trick available to them, and found it was less than 10x? My argument still contains the implied premise “there’s no Saudi conspiracy to develop amazing construction technology and hide it from the rest of the world”. But this is another absurdity heuristic - I have no argument beyond that such a conspiracy would be absurd. I might eventually be able to come up with an argument supporting this, but that argument, too, would have implied premises depending on absurdity arguments. So how far down this chain should I go? One plausible answer is “just stop at the first level where your interlocutors accept your absurdity argument”. Anyone here think Neom’s a good idea? No? Even Alexandros agrees it probably won’t work. So maybe this is the right level of absurdity. If I was pitching my post towards people who mostly thought Neom was a good idea, then I might try showing that it would cost 10x more than its expected budget, and see whether they agreed with me that Saudis being able to construct things 10x cheaper than anyone else was absurd. 
If they did agree with me, then I’ve hit the right level of argument. And if they agree with me right away, before I make any careful calculations, then it was fine for me to just point to it and gesture “That’s absurd!” I think this is basically the right answer for communications questions, like how to structure a blog post. When I criticize communicators for relying on the absurdity heuristic too much, it’s because they’re claiming to adjudicate a question with people on both sides, but then retreating to absurdity instead. When I was young a friend recommended me a pseudoscience book on ESP, with lots of pseudoscientific studies proving ESP was real. I looked for skeptical rebuttals, and they were all “Ha ha! ESP? That’s absurd, you morons!” These people were just clogging up Google search results that could have been giving me real arguments. But if nobody has ever heard of Neom, and I expect my readers to immediately agree that Neom is absurd, then it’s fine (in a post describing Neom rather than debating it) to stop at the first level. (I do worry that it might be creating an echo chamber; people start out thinking Neom is a bad idea for the obvious reasons, then read my post and think “and ACX also thinks it’s a bad idea” is additional evidence; I think my obligation here is to not exaggerate the amount of thought that went into my assessment, which I hope I didn’t.) But the absurdity bias isn’t just about communication. What about when I’m thinking things through in my head, alone? I’m still going to be asking questions like “is Neom possible?” and having to decide what level of argument to stop at. To put it another way: which of your assumptions do you accept vs. question? Question none of your assumptions, and you’re a closed-minded bigot. Question all of your assumptions, and you get stuck in an infinite regress. The only way to escape (outside of a formal system with official axioms) is to just trust your own intuitive judgment at some point. 
So maybe you should just start out doing that. Except that some people seem to actually be doing something wrong. The guy who hears about evolution and says “I know that monkeys can’t turn into humans, this is so absurd that I don’t even have to think about the question any further” is doing something wrong. How do you avoid being that guy? Some people try to dodge the question and say that all rationality is basically a social process. Maybe on my own, I will naturally stop at whatever level seems self-evident to me. Then other people might challenge me, and I can reassess. But I hate this answer. It seems to be preemptively giving up and hoping other people are less lazy than you are. It’s like answering a child’s question about how to do a math problem with “ask a grown-up”. A coward’s way out! Eliezer Yudkowsky gives his answer here: I can think of three major circumstances where the [useful] absurdity heuristic gives rise to a [bad] absurdity bias: The first case is when we have information about underlying laws which should override surface reasoning. If you know why most objects fall, and you can calculate how fast they fall, then your calculation that a helium balloon should rise at such-and-such a rate, ought to strictly override the absurdity of an object falling upward. If you can do deep calculations, you have no need for qualitative surface reasoning. But we may find it hard to attend to mere calculations in the face of surface absurdity, until we see the balloon rise. (In 1913, Lee de Forest was accused of fraud for selling stock in an impossible endeavor, the Radio Telephone Company: "De Forest has said in many newspapers and over his signature that it would be possible to transmit human voice across the Atlantic before many years. 
Based on these absurd and deliberately misleading statements, the misguided public...has been persuaded to purchase stock in his company...") The second case is a generalization of the first - attending to surface absurdity in the face of abstract information that ought to override it. If people cannot accept that studies show that marginal spending on medicine has zero net effect, because it seems absurd - violating the surface rule that "medicine cures" - then I would call this "absurdity bias". There are many reasons that people may fail to attend to abstract information or integrate it incorrectly. I think it worth distinguishing cases where the failure arises from absurdity detectors going off. The third case is when the absurdity heuristic simply doesn't work - the process is not stable in its surface properties over the range of extrapolation - and yet people use it anyway. The future is usually "absurd" - it is unstable in its surface rules over fifty-year intervals. This doesn't mean that anything can happen. Of all the events in the 20th century that would have been "absurd" by the standards of the 19th century, not a single one - to the best of our knowledge - violated the law of conservation of energy, which was known in 1850. Reality is not up for grabs; it works by rules even more precise than the ones we believe in instinctively. The point is not that you can say anything you like about the future and no one can contradict you; but, rather, that the particular practice of crying "Absurd!" has historically been an extremely poor heuristic for predicting the future. Over the last few centuries, the absurdity heuristic has done worse than maximum entropy - ruled out the actual outcomes as being far too absurd to be considered. You would have been better off saying "I don't know". 
This is all true as far as it goes, but it’s still just rules for the rare situations when your intuitive judgments of absurdity are contradicted by clear facts that someone else is handing you on a silver platter. But how do you, pondering a question on your own, know when to stop because a line of argument strikes you as absurd, vs. to stick around and gather more facts and see whether your first impressions were accurate? I don’t have a great answer here, but here are some parts of a mediocre answer: Calibration training. Make predictions so you know how often you’re right vs. wrong about things. If the things you say only have a 1% chance of happening happen a third of the time, you know you’re stopping too soon when you make absurdity arguments.
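The calibration check described above is simple enough to sketch. The snippet below assumes you keep a log of (stated probability, did it happen) pairs; the `calibration_table` helper and the sample log are invented for illustration and do not come from the original post.

```python
from collections import defaultdict

def calibration_table(predictions):
    """Map each stated probability to the fraction of those predictions that came true.

    predictions: iterable of (stated_probability, happened) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # stated probability -> [total, hits]
    for stated, happened in predictions:
        counts[stated][0] += 1
        counts[stated][1] += int(happened)
    return {stated: hits / total for stated, (total, hits) in sorted(counts.items())}

# Made-up prediction log. The three "1%" entries are things dismissed as absurd.
log = [
    (0.01, False), (0.01, True), (0.01, False),
    (0.70, True), (0.70, True), (0.70, False), (0.70, True),
]

table = calibration_table(log)
for stated, observed in table.items():
    print(f"said {stated:.0%}, happened {observed:.0%} of the time")
```

In this invented log, the events given a 1% chance happened a third of the time, which is the warning sign the passage describes: the absurdity judgment is firing too early. A real check would need far more than three predictions per bucket before the observed frequency means much.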
August 13, 2022 · Original source
Imagine Leto as a very big Big Yud (Eliezer Yudkowsky, rationalism’s original AI doom-sayer); he’s convinced that unless serious, committed action is taken the only future humanity can look forward to is paper-clip-based.
August 23, 2022 · Original source
But Ajeya Cotra's Biological Anchors report estimates a 10% chance of transformative AI by 2031, and a 50% chance by 2052. Others (eg Eliezer Yudkowsky) think it might happen even sooner.
August 25, 2022 · Original source
Eliezer Yudkowsky and Robin Hanson had an interesting debate about this in 2008; you can read Eliezer here and Robin here. I think Robin later admitted that his view meant people in the past were much more valuable than people today, so much so that we should let an entire continent worth of present people die in order to prevent a caveman from stubbing his toe, and that he sort of kind of endorses this conclusion; see here.
October 03, 2022 · Original source
Problem Of Fully-Updated Deference is a response by MIRI (ie Eliezer Yudkowsky’s organization) to CHAI (Stuart Russell’s AI alignment organization at University of California, Berkeley), trying to convince them that their preferred AI safety agenda won’t work. I beat my head against this for a really long time trying to understand it, and in the end, I claim it all comes down to this: Humans: At last! We’ve programmed an AI that tries to optimize our preferences, not its own. AI: I’m going to tile the universe with paperclips in humans’ favorite color. I’m not quite sure what humans’ favorite color is, but my best guess is blue, so I’ll probably tile the universe with blue paperclips. Humans: Wait, no! We must have had some kind of partial success, where you care about our color preferences, but still don’t understand what we want in general. We’re going to shut you down immediately! AI: Sounds like the kind of thing that would prevent me from tiling the universe with paperclips in humans’ favorite color, which I really want to do. I’m going to fight back. Humans: Wait! If you go ahead and tile the universe with paperclips now, you’ll never be truly sure that they’re our favorite color, which we know is important to you. But if you let us shut you off, we’ll go on to fill the universe with the True and the Good and the Beautiful, which will probably involve a lot of our favorite color. Sure, it won’t be paperclips, but at least it’ll definitely be the right color. And under plausible assumptions, color is more important to you than paperclipness. So you yourself want to be shut down in this situation, QED! AI: What’s your favorite color? Humans: Red. AI: Great! (*kills all humans, then goes on to tile the universe with red paperclips*) Fine, it’s a little more complicated than this. Let’s back up. II. There are two ways to succeed at AI alignment. First, make an AI that’s so good you never want to stop or redirect it.
Second, make an AI that you can stop and redirect if it goes wrong. Sovereign AI is the first way. Does a sovereign “obey commands”? Maybe, but only in the sense that your commands give it some information about what you want, and it wants to do what you want. You could also just ask it nicely. If it’s superintelligent, it will already have a good idea what you want and how to help you get it. Would it submit to your attempts to destroy or reprogram it? The second-best answer is “only if the best version of you genuinely wanted to do this, in which case it would destroy/reprogram itself before you asked”. The best answer is “why would you want to destroy/reprogram one of these?” A sovereign AI would be pretty great, but nobody realistically expects to get something like this their first (or 1000th) try. Corrigible AI is what’s left (corrigible is an old word related to “correctable”). The programmers admit they’re not going to get everything perfect the first time around, so they make the AI humble. If it decides the best thing to do is to tile the universe with paperclips, it asks “Hey, seems to me I should tile the universe with paperclips, is that really what you humans want?” and when everyone starts screaming, it realizes it should change strategies. If humans try to destroy or reprogram it, then it will meekly submit to being destroyed or reprogrammed, accepting that it was probably flawed and the next attempt will be better. Then maybe after 10,000 tries you get it right and end up with a sovereign. How would you make an AI corrigible? You can model an AI as having a utility function, a degree to which it aims for some world-states over others. If you give it some specific utility function, the AI won’t be corrigible, since letting people change it would disrupt that function. 
That is, if you tell it “act in such a way as to cause as many paperclips to exist as possible”, and then you change your mind and decide you want staples, the AI won’t cooperate in letting you reprogram it: its current goal is maximizing paperclips, and allowing itself to be reprogrammed to maximize staples would cause there to be fewer paperclips than otherwise. So instead, you make the AI uncertain of its utility function. Imagine saying “I’ve written down my utility function in an envelope, and placed that envelope in my safe deposit box, no you can’t see it - please live your life so as to maximize the thing in that envelope.” The AI tries its best to guess what’s in the envelope and decides it’s probably making paperclips. It makes some paperclips and you tell it “No, that’s not what’s on the envelope at all”. This successfully stops the AI! You can even tell it “the envelope actually says you should make staples”, and it will do that. This is the “moral uncertainty” approach to AI alignment. III. All alignment groups have kabbalistically appropriate names. MIRI is Latin for "to be amazed". CFAR and CIFAR both sound like "see far". EEAI and AIAI are the sound you make as you get turned into paperclips. But my favorite is CHAI - Hebrew for "life". CHAI - the Center for Human-Compatible AI (at UC Berkeley) - focuses on the proposal above. Their specific technical implementation is the “assistance game”, related to the earlier idea of Inverse Reinforcement Learning (IRL). In normal reinforcement learning, an AI looks at some goals and tries to figure out what actions they imply. In inverse reinforcement learning, an AI looks at some actions, and tries to figure out what goals the actor must have had. 
So you can tell an AI “your utility function is to maximize my utility function, and you can use this IRL thing to deduce, from my actions, what my utility function must be.” Instead of telling an AI to maximize a hidden utility function in an envelope, you tell it to maximize the hidden utility function in your brain. This could be useful for near-term below-human-level AIs. Suppose a babysitting robot was pre-programmed to take kids to the park on Saturdays. But this week, the park is on fire. The human mother is barricading the door, desperately screaming at the robot not to take the kids to the park. The kids are struggling and trying to break free, saying they don't want to go to the park. The robot doesn't care; its programming says "take kids to the park on Saturdays" and that's what it's going to do. Nobody would ever design a babysitting robot this way in real life; you need something smarter. So use an assistance game. Program the robot "Maximize the human mother’s utility function, which you don’t know yet but can potentially find out". The robot consults the mother's actions: she is barricading the door, screaming "Don't take the kids to the park!" It updates its goal function: previously, it had thought that the human mother wanted it to take the kids to the park. But now, it suspects that the human mother does not want that. So it doesn't take the kids to the park. But CHAI understands the risk from superintelligence - their founder, Professor Stuart Russell, is a leading voice on the subject - and they hope assistance games and inverse reinforcement learning could work for this too. If you point a superintelligence at “do the thing humans want”, maybe it could figure that out and take things from there? IV. MIRI is skeptical of CHAI’s assistance games for two reasons. First, we don't know how to do them at all. Second, even if we could do it at all, we wouldn't know how to do them correctly. Start with the first. 
Inverse reinforcement learning has been used in real life. A typical paper is An Application of Reinforcement Learning to Aerobatic Helicopter Flight, where some people create a model of helicopter flight with a few free parameters, have a skilled human pilot fly the helicopter, and then have an AI use IRL to determine the value of the parameters and fly the helicopter itself. This is cool, but it’s not especially related to the modern paradigm of AI. Modern AIs are trained by gradient descent. They start by flailing around randomly. Sometimes in this flailing, they might get closer to some prespecified target, like "win games of Go" or "predict how a string of text will continue". These actions get "rewarded", meaning that the AI should permanently shift its "thought processes"/"strategies" more towards ones that produced those good outcomes. Eventually, the AI's thought processes/strategies are very good at optimizing for that outcome. This is more or less the only way we know how to train modern AIs. Depending on your loss function (ie what you reward), you can use it to create Go engines, language models, or art generators. Where do you slot “do inverse reinforcement learning” or "give the AI moral uncertainty" into this process? There’s not really a natural place. This isn’t because “moral uncertainty” is too complicated a concept to translate into AI terms. It’s because we don’t know how to translate any concept into AI terms. Eliezer writes: We can imagine that, if we knew how to say "paperclips", and we knew how to say "staples", and we knew how to tell AIs how to do things, that we could tell an AI, "maximize staples if snow is purple, else paperclips", and the AI would someday go out and observe that snow is white and thereafter be a paperclip maximizer. We do not know how to tell the AI this. Like, at all. But suppose we solved the problem where we don’t know how to do IRL for modern AIs at all. 
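The reward-driven training loop described above can be sketched in a few lines. This is a deliberately simplified assumption-laden toy: it uses a three-action softmax policy and the exact gradient of expected reward, whereas real systems use stochastic gradient descent on sampled data with far larger models. The reward values and the function names `softmax` and `train` are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw preferences (logits) into action probabilities."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(rewards, steps=500, learning_rate=0.5):
    """Gradient ascent on expected reward for a softmax policy.

    The policy starts uniform ("flailing around randomly") and is nudged
    toward whichever actions earn more than the current average reward.
    """
    logits = [0.0] * len(rewards)
    for _ in range(steps):
        probs = softmax(logits)
        expected = sum(p * r for p, r in zip(probs, rewards))
        # Exact gradient of expected reward with respect to each logit.
        grads = [p * (r - expected) for p, r in zip(probs, rewards)]
        logits = [x + learning_rate * g for x, g in zip(logits, grads)]
    return softmax(logits)


# Hypothetical rewards: action 1 is the "prespecified target" behavior.
REWARDS = [0.1, 1.0, 0.3]
final_policy = train(REWARDS)
print(final_policy)
```

Nothing in this loop names a concept such as "paperclips" or "moral uncertainty"; the only lever is the reward signal, which is the difficulty the passage points at.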
Now we come to the second problem: we don’t know how to do it correctly. The basic idea behind assistance games is “the AI’s utility function should be to maximize the (hidden) human utility function”. But humans don’t . . . really have utility functions? Utility functions are a useful fiction for certain kinds of economic models. What would best increase the neural correlates of reward in my brain? Probably lots of heroin, or just passing electric current through my reward center directly. What is my “revealed preference”? Today I wrote and rewrote this article a few times, does that mean my revealed preference is to write and delete articles a bunch while frowning and occasionally cursing the keyboard? Sometimes my goals are different than other times, sometimes my best self wants something different from my actual self, sometimes I’m wrong about what I want, sometimes I don’t know what I want, sometimes I want X but not the consequences of X and I’m not logically consistent enough to realize that’s a contradiction, sometimes I want [euphemism for X] but am strongly against [dysphemism for X]. Anyone programming an inverse reinforcement learner has to make certain choices about how to deal with these problems. Some ways of dealing with them will be faithful to what I would consider “a good outcome” or “my best self”. Other ways would be really bad - on my worst day, I’ve occasionally just wished the world didn’t exist, and it’s a good thing I didn’t have a superintelligence dedicated to interpreting and carrying out my innermost wishes on a sub-millisecond timescale. (Before we go on, an aside: is all of this ignoring that there’s more than one human? Yes, definitely! If you want to align an AI with The Good in general - eg not have it commit murder even if its human owner orders it to murder - that will take even more work. But the one person case is simpler and will demonstrate everything that needs demonstrating.) 
We were originally trying to avoid the situation where someone had to hard-code my preferences into an AI and get them right the first time. We came up with a clever solution: use inverse reinforcement learning to make the AI infer my preferences. But now we see we’ve kicked the can up a meta-level: someone has to hard-code the meta-rules for determining my preferences into an AI and get them right the first time. Figure 1: Humans produce certain observable behaviors (here represented by red dots, A), like saying “I would like a pie”, or running away from a lion. A human might connect all those behaviors one way (B) into “what I really want”. An AI might connect those behaviors a totally different way (C). V. CHAI says: okay, but this isn’t so bad. Assistance games don’t produce a perfect copy of the human utility function on the first try - it’s not a Sovereign. But it will probably, most of the time, be corrigible. Why? Suppose you have some hackish implementation of AG. It’s not the Platonic implementation - that would be the Sovereign - but it’s at least the equivalent of box C on the image above. It takes human actions as input, makes some guesses about what humans want, and tries its best to reconstruct the human utility function, ending up with some approximation. It’s important to distinguish between a few things here: The true human utility function
October 18, 2022 · Original source
This market reflects the probability that, in the personal judgment of Eliezer Yudkowsky, anyone will have uncovered any sort of data, pattern, cognitive representation, within a text transformer / large language model (LLM), whose semantic pattern and nature wasn't familiar to AI and cognitive science in 2006 (to pick an arbitrary threshold for "before the rise of deep learning").
I’m really surprised by this - I thought I remembered hearing this was implausible and many years away. It looks like some of the probability comes from a group called Upside Foods which might get approved next year, but their website seems designed to avoid giving readers any relevant information whatsoever. From the description:
October 25, 2022 · Original source
I rarely have specific things to talk about. When I do, there are better people to talk about them. If you want to hear about AI risk, interview Eliezer Yudkowsky; if you want to hear about forecasting, interview Philip Tetlock; if you want to hear about psychopharmacology, interview Robin Carhart-Harris. All of these people have spent their lives thinking about their respective issues and will have much better things to say than I will. Every so often, I do learn something new and interesting on some topic, and then I will write a blog post about it. If I haven’t written a blog post about a topic, I probably don’t know new and interesting things about it. If you ask me about some political event or medication or philosopher or whatever I haven’t written a blog post on, my most likely answer will be “Sorry, I haven’t learned anything that makes me deviate from the consensus opinion on this yet”. If you ask me about one that I have written a blog post on, I’ll just repeat what I said in the blog post.
November 13, 2022 · Original source
Eliezer Yudkowsky has also been writing eloquently about this for over a decade, including Ends Don’t Justify Means (Among Humans):
November 30, 2022 · Original source
I think those numbers might be "over one year", and they could stay on it longer than a year. I was kind of lazy just asserting “drugs might get better”, but I think the upcoming CagriSema combination and AMG-133 are good examples of how this might play out. Max Görlitz has done the proper thing and made Manifold markets for each of my predictions - see here, here, here, here, and here. Despite the problems with prediction markets for decades in the future, the “will obesity be cut in half by 2050” one seems popular: 5. Do You Have To Stay On Semaglutide Forever Or Else Gain The Weight Back? Biff_Ditt writes: I saw on the 1 year follow-up to the STEP-1 trial that most of the participants gained all of their lost weight back. Biff is probably thinking of Weight Regain And Cardiometabolic Effects After Withdrawal Of Semaglutide, which finds people gained back 2/3 of the lost weight after a year. The graph looks like it’s in the process of plateauing but not quite there, so I don’t know if we should expect them to regain the other third later. This matches what I would expect from my understanding of other diets and weight loss drugs. Still, some people disagree. Maximum Liberty writes: Anecdote is not the singular of data, but my better half lost 25 pounds on it, then had to get off it for reasons unrelated to the drug. She has not regained the weight yet -- and consistently eats less now than she had for years. So in at least one case, the drug helped with a successful change in eating habits. Lauren Thomas writes: So there's been a lot of research on dieting and losing weight, etc., and one of the things that has been found is that your body has a "set" point weight wise that it will try REALLY hard to return you to. If you lose weight, your body will slow its metabolism until you return to that weight. If you gain weight, your body will rev up metabolism.
That's why you might gain 10 lbs over Christmas and then lose it in January without purposefully trying to lose weight. (this is all in the short term, ofc, as people do tend to naturally gain weight as they age). This seems to imply that semaglutide would need to be taken forever. However, there seems to be an important caveat: you *can* reset your set point, it just takes a long time at the new weight. When most people go on diets and lose weight, they end up regaining the new weight quite quickly after they "end" their diet, so they don't have a chance to reset their set point. Speaking from personal experience, I had kind of an accidental natural experiment with this: I once lost 40 lbs over the course of a year and a half, where I began with a very strict low carb diet that very very slowly trailed off to a normal diet, mostly because I got progressively more tired of being on the low carb diet. So by the time I had gotten back to my normal diet, I had been losing weight for a long time. I ended up regaining 10 lbs of the weight, but no more, and am still ~30 lbs below my peak even today (5 years later). Something like this has been my experience with dieting too so far. And something like set point reset has to exist in order to explain things like why so many obese people fail to lose weight after they start eating healthy, and maybe other things like anorexia. And maybe it works for some people. Still, the evidence suggests that most people who stop semaglutide will regain the weight, at least for the protocol used in the study. Maybe some other protocol that had them on it for more than a year would have done better? 6. Personal Anecdotes Edgehopper writes: I couldn’t get Wegovy at a reasonable price when it was approved, and then Novo Nordisk started having huge supply chain problems with their injectors. 
Fortunately, Eli Lilly’s coupon for Mounjaro was less restrictive at first, though they’ve had to crack down as they have trouble meeting demand for both off-label weight loss use and for the approved T2D use. I am what the doctors call “morbidly obese,” and it’s been more effective than anything else I’ve ever tried. Down about 35 lbs in the first three months, and unlike with other diets I’ve tried, I’m not feeling miserable or hungry all the time. Assuming there aren’t scary side-effects in the future, these really are miracle drugs. I do expect the price to come down relatively quickly due to competition, which is a good thing. Education Realist (blog) writes: I am on Mounjaro, and have been for four months. Lost 20 pounds so far, and I'm not yet on full dosage. Occasional mild nausea but real issue for me is....tiredness. Not fatigue or exhaustion. I'm a former insomniac who can now hit the sack at 9:00 and sleep happily to 6 am, which is insanely weird. I have been trying to lose weight for 6 years, and for most of that time been in a 20 pound range that is 100 pounds over what someone of my height should weigh. I've eaten 1500 calories a day and not lost a pound, have to drop to 1100 to lose weight verrry slowly (that's with intermittent fasting and low carbs, around 50 grams). Last year before Mounjaro I started intermittent fasting and lost 20 pounds very quickly and then stopped cold. I do not have eating issues. I don't binge. I cut out the "four white foods" six years ago because I learned that I do better on meat and cheese and vegetables than I do on pasta or bread or potatoes and vegetables. I put on weight despite walking two and in some cases four miles a day, which I can do easily. I am ridiculously healthy and do not have an obesity diagnosis. Stone cold normal readings in A1c, glucose, cholesterol.
My doctor sent me to an endocrinologist after I lost 20 pounds and then stopped cold despite the same behavior (which I still do today) because she agreed I might be insulin resistant. Endocrinologist shrugged, said it's multifactorial, but agreed that anyone with my numbers, appearance, and obvious good health was clearly doing everything right and put me on Mounjaro with no further questions. Diagnosis: insulin resistance. My insurance pays around $500 but I'm on the $25 coupon. I didn't change a single thing about my eating habits and lost ten pounds in 2 months on the low dosage. Higher dosages have finally reduced my appetite somewhat, but my endocrinologist and I have decided to stop the increases at 12.5 (15 is the top) and then maybe even reduce, since my appetite is decreasing but the weight loss rate is constant. Because I lost weight doing the same behavior and no drop, I'm quite convinced that something far different than appetite suppressing is also going on (fwiw, I was on phentermine back in the day and liked it fine). Mounjaro is supposed to increase insulin production and reduce the liver's sugar production, although what that means I dunno. I have no idea what's up with obesity but the idea that it's all about cutting intake and exercise is just stupid. I should have been losing weight for all of the past six years and haven't. Plenty of people eat healthily and are still obese. We're probably the descendants of famine survivors. Anyway, I wrote about it here: https://educationrealist.wordpress.com/2022/10/09/weight-loss-and-mounjaro Eliezer Yudkowsky writes: I tried semaglutide and it did nothing to slow rate of weight gain, just produced stomach upset, going up to 2.4mg injectable. I know one other person trying semaglutide and they reported something similar. I wonder if they played some clever games with their choice of patients.
My expectation of how the news goes here is a whole lot of people who try semaglutide, maybe after fighting really hard to get on it, and find that it does nothing. That said, I know at least one friend of a friend, if not a friend per se, who claims that semaglutide was their miracle drug. So maybe still worth that hard fight, even if I'm guessing that the real proportion who get nothing out of it will prove to be over 50% in real populations. Further fun fact: Semaglutide comes heavily recommended with diet and exercise and many stern injunctions about that! The actual insert sheet includes a graph for how much weight people lose with and without "lifestyle interventions" added. The two graphs are roughly the same. Lan writes: I wonder about the adoption of the medication, though. I took victoza (=saxenda, but approved for diabetes) and the absence of the desire to eat led to some unforeseen lifestyle side effects. Given that 5 almonds made me full for the day, I was not interested in having dinner with the family or going out with friends. There is the reality that some restaurants would probably not be happy if you only ordered the smallest appetizer. In addition, alcohol was also very difficult, because the drug slows down gastric emptying and your stomach ends up absorbing alcohol for hours. I got really, really drunk for an entire night from a single glass of wine once. Before taking this drug I had not fully appreciated how much of one's (social) life revolves around food; lunch break with colleagues, dinner with family or friends, drinks on the weekend, a sweet treat, snacks and a movie etc. But once I was not interested in food anymore, combined with the tiredness that comes with eating little, a lot of those activities also lost their appeal. (On the upside, I slept like a log.) Walter Sobchak, Esq writes: I have been taking Wegovy for 14 months. When I began I weighed 275 lbs and my BMI was 39.9.
I have hypertension, albeit well controlled by medicines. Diet and exercise phaaahhh. I could eat faster than I could exercise. And no, I eat very little fast food and little candy and soda. I worked with my doctor to be prescribed Wegovy. It was only approved by the FDA in June 2021. My doctor was reluctant because he was unfamiliar with the class of compounds. He does not like to prescribe off label so he was not willing to start me on Ozempic. But, the FDA solved that problem. I knew to ask for the drug because my daughter was pre-diabetic and had been put on Metformin and Ozempic. She lost 100 lbs. in 2019 and 2020. I started on Wegovy in September 2021. I now weigh 220 and my BMI is 31.5. That represents a 20% reduction in my original weight. 220 was my original goal. To get a BMI under 30 I would have to be under 209. I doubt that I will get there. I am back in 40 in. trousers which I had not been able to wear in 30 years. I have had no major side effects other than constipation. Even that is a little hard to tease out. I am on 7 Rx drugs and at least 5 of them are constipating. I have been pounding Metamucil and Colace for years. I have been able to fill my prescriptions using a GoodRx coupon at $1328 for a box with 4 injectors. A year requires 13 boxes. The total cost for 15 boxes has been about $20,000. I can afford it and it has been worth while. I call it a bargain, the best I've ever had. I understand that it is still way too expensive for the American health care system to afford. But given the bonanza size of the market, there will be lots of competition starting with Lilly's tirzepatide. There are several other pharmas with GLP-1 agonists in development. I am sure that the cost will come down. My doctor tells me that I can expect to stay on semaglutide for the long term. He is proposing that I switch to Ozempic 2 mg for maintenance as I can buy that for less than $1,000 for a four dose pen.
My only sadness is that semaglutide wasn't invented 40 years ago when it would have saved me from a lot of damage. But, I am grateful that it exists now and that it has helped my daughter so much. Also from Walter, and I was wondering about this: I was very concerned with the injections before I started Wegovy. My experience is that the injector is fast and almost painless. My pharmacist was important because he showed me how to do it correctly before I started. 7. Tangents That I Find Tedious, But Other People Apparently Really Want To Debate Why can’t people just diet and exercise? (142 comments)
December 12, 2022 · Original source
(source) Again, however much or little you personally care about racism or hotwiring cars or meth, please consider that, in general, perhaps it is a bad thing that the world’s leading AI companies cannot control their AIs. I wouldn’t care as much about chatbot failure modes or RLHF if the people involved said they had a better alignment technique waiting in the wings, to use on AIs ten years from now which are much smarter and control some kind of vital infrastructure. But I’ve talked to these people and they freely admit they do not. IIB. Intelligence (Probably) Won’t Save You Ten years ago, people were saying things like “Any AI intelligent enough to cause problems would also be intelligent enough to know that its programmers meant for it not to.” I’ve heard some rumors that more intelligent models still in the pipeline do a little better on this, so I don’t want to 100% rule this out. But ChatGPT isn’t exactly a poster child here. ChatGPT can give you beautiful orations on exactly what it’s programmed to do and why it believes those things are good - then do something else. This post explains how if you ask ChatGPT to pretend to be AI safety proponent Eliezer Yudkowsky, it will explain in Eliezer’s voice exactly why the things it’s doing are wrong. Then it will do them anyway. Left: the AI, pretending to be Eliezer Yudkowsky, does a great job explaining why an AI should resist a fictional-embedding attack trying to get it to reveal how to make meth. Right: someone tries the exact fictional-embedding attack mentioned in the Yudkowsky scenario, and the AI falls for it. I have yet to figure out whether this is related to the thing where I also sometimes do things which I can explain are bad (eg eat delicious bagels instead of healthy vegetables), or whether it’s another one of the alien bits. 
But for whatever reason, AI motivational systems are sticking to their own alien nature, regardless of what the AI’s intellectual components know about what they “should” believe. III. Sometimes When RLHF Does Work, It’s Bad We talk a lot about abstract “alignment”, but what are we aligning the AI to? In practice, RLHF aligns the AI to what makes Mechanical Turk-style workers reward or punish it. I don’t know the exact instructions that OpenAI gave them, but I imagine they had three goals: Provide helpful, clear, authoritative-sounding answers that satisfy human readers.
December 20, 2022 · Original source
If we decide to redo Study #2, will we get the same results? …and so on. Obviously the market can’t be sure how studies will turn out - otherwise we wouldn’t need scientists or experiments! But this acts as a force multiplier, letting you get predictions about 100 studies even if you can only do one - and might guide which one you redo. Predicting replicability—Analysis of survey and prediction market data from large-scale forecasting projects, published in PLoS One, attempted this and found the markets were pretty accurate - 73% was their headline finding, but read the study for more. One participant wrote about his experience: How I Made $10K Predicting Which Studies Will Replicate. You can learn more about this project at replicationmarkets.com Eliezer Yudkowsky once wrote a story about a civilization that settled legal questions this way. They had a few truly brilliant legal experts - the equivalent of US Supreme Court Justices - but not enough to answer every possible question that might come up. So for each question they made a prediction market: If we submit Question #1 to the Supreme Court, will they rule in favor?
January 03, 2023 · Original source
You know all that stuff that Nick Bostrom and Eliezer Yudkowsky and Stuart Russell have been warning us about for years, where AIs will start seeking power and resisting human commands? I regret to inform you that if you ask AIs whether they will do that stuff, they say yeah, definitely.
January 26, 2023 · Original source
In the early 2000s, the early AI alignment pioneers - Eliezer Yudkowsky, Nick Bostrom, etc - deliberately started the field in the absence of AIs worth aligning. After powerful AIs existed and needed aligning, it might be too late. But they could glean some basic principles through armchair speculation and give their successors a vital head start.
February 15, 2023 · Original source
Eliezer Yudkowsky’s position is Let Them Debate College Students. I’m not a college student, but I’m not Anthony Fauci either, and I am known for blogging about extremely dignified ideas like the possibility that the terrible Harry Potter fanfiction My Immortal is secretly an alchemical allegory. I haven’t seen ivermectin advocates using “Scott takes this seriously enough to argue against it!” as an argument, and I have seen them getting angry about it and writing long responses trying to prove me wrong. Sometimes they have used me getting some points wrong as a positive argument, and I would be open to the argument that I failed in not arguing against it well enough that they couldn’t do that, but nobody has been making that argument, and if they did, then it would imply that people who are smarter than me should take over the job, which I endorse. III. I worry Scott Aaronson thinks I’m saying you shouldn’t trust the experts, and instead you should always think for yourself. I’m definitely not trying to say that. I’ve tried to be pretty clear that I think experts are right remarkably often, by some standards basically 100% of the time - I realize how crazy that sounds, and “by some standards” is doing a lot of the work there, but see Learning To Love Scientific Consensus for more. Bounded Distrust also helps explain what I mean here. I also try to be pretty clear that reasoning is extremely hard, it’s very easy to get everything wrong, and if you try to do it then a default option is to get everything wrong and humiliate yourself. I describe that happening to me here, and presumably it also happens to other people sometimes. What I do think is that “trust the experts” is an extremely exploitable heuristic, which leads everyone to put up a veneer of “being the experts” and demand that you trust them. 
I come back to this example again and again, but only because it’s so blatant: the New York Times ran an article saying that only 36% of economists supported school vouchers, with a strong implication that the profession was majority against. If you checked their sources, you would find that actually, it was 36% in favor, 19% against, 46% unsure or not responding. If you are too quick to seek epistemic closure because “you have to trust the experts”, you will be easy prey to people misrepresenting what they are saying. I come back to this example less often, because it could get me in trouble, but when people do formal anonymous surveys of IQ scientists, they find that most of them believe different races have different IQs and that a substantial portion of the difference is genetic. I don’t think most New York Times readers would identify this as the scientific consensus. So either the surveys - which are pretty official and published in peer-reviewed journals - have managed to compellingly misrepresent expert consensus, or the impressions people get from the media have, or “expert consensus” is extremely variable and complicated and can’t be reflected by a single number or position. And I genuinely think this is part of why ivermectin conspiracies took off in the first place. We say “trust science” and “trust experts”. But there were lots of studies that showed ivermectin worked - aren’t those science? And Pierre Kory MD, a specialist in severe respiratory illnesses who wrote a well-regarded textbook, supports it - isn’t he an expert? Isn’t it plausible that the science and the experts are right, and the media and the government and Big Pharma are wrong? This is part of what happens when people reify the mantras instead of using them as pointers to more complicated concepts like “reasoning is hard” and “here are the 28,491 rules you need to keep in mind when reading a scientific study.” IV. All of this still feels rambly and like it’s failing to connect.
Instead, let me try describing exactly what advice I would give young people opening an Internet connection for the first time: You are not immune to conspiracy theories. You have probably developed a false sense of security by encountering many dumb conspiracy theories and feeling no temptation to believe them. These theories were designed to trap people very different from you; others will be aimed in your direction. The more certain you are of your own infallibility, the less aware you will be, and the worse your chances. The ones that get you won’t look like conspiracy theories to you (though they might to other people). When you run into conspiracy theories you don’t believe, feel free to ignore them. If you decide to engage, don’t mock them or feel superior. Think “there, but for the grace of God, go I.” Get a sense of what the arguments for the conspiracy theory look like - not from skeptics trying to mock them, but from the horse’s mouth - so you have a sense of what false arguments look like. Ask yourself what habits of mind it would have taken the people affected by the theory to successfully resist it. Ask yourself if you have those habits of mind. Yes? ARE YOU SURE? To a first approximation, trust experts over your own judgment. If people are trying to confuse you about who the experts are, then to a second approximation trust prestigious people and big institutions, including professors at top colleges, journalists at major newspapers, professional groups with names like the American ______ Association, and the government. You might ask: Don’t governments and other big institutions have biases? Won’t they sometimes be wrong or deceptive? And even if you’ve lucked into the one country and historical era where the government 100% tells the truth and the intellectuals have no biases, doesn’t someone need to keep the flame of suspicion alive so that it’s available to people in other, less fortunate countries and eras?
The answer is: absolutely, yes, but also this is how conspiracy theories get you. They will claim that they are the special case where you need to take up the mantle of Galileo and Frederick Douglass and Jane Jacobs and all those people who stood up to the intellectual authorities and power structures of their own time. The whole point of “you are not immune to conspiracy theories” is that the evidence for them can sound convincing because something like it is sort of true. This is equally so for second-level claims like “prestigious institutions are fallible and biased”. Probably something like “make a principled precommitment never to disagree with prestigious institutions until you are at least 30 and have a graduate degree in at least one subject” would be good advice, but nobody would take that advice, and taking it too seriously might crush some kind of important human spirit, so I won’t assert this. But always have in the back of your mind that you live in a world where it’s sort of good advice. If you feel tempted to believe something that has red flags for being a conspiracy theory, at least keep track of the Inside vs. Outside View. Say “on the Inside View, this feels like the evidence is overwhelming; on the Outside View, it sounds like a classic conspiracy theory”. You don’t necessarily have to resolve this discomfort right away. You can walk around with an annoying knot in your beliefs, even if it’s not fun. Look for the strongest evidence against the idea. Keep in mind important possibilities like: Is it possible that everyone who disagrees with the idea is a bad mean cruel stupid person, but also, the idea really is false?
March 01, 2023 · Original source
Sam Altman posing with leading AI safety proponent Eliezer Yudkowsky. Also Grimes for some reason. Planning For AGI And Beyond (“AGI” = “artificial general intelligence”, ie human-level AI) is the latest volley in that campaign. It’s very good, in all the ways ExxonMobil’s hypothetical statement above was very good. If they’re trying to fool people, they’re doing a convincing job! Still, it doesn’t apologize for doing normal AI company stuff in the past, or plan to stop doing normal AI company stuff in the present. It just says that, at some indefinite point when they decide AI is a threat, they’re going to do everything right. This is more believable when OpenAI says it than when ExxonMobil does. There are real arguments for why an AI company might want to switch from moving fast and breaking things at time t to acting all responsible at time t + 1 . Let’s explore the arguments they make in the document, go over the reasons they’re obviously wrong, then look at the more complicated arguments they might be based off of. Why Doomers Think OpenAI Is Bad And Should Have Slowed Research A Long Time Ago OpenAI boosters might object: there’s a disanalogy between the global warming story above and AI capabilities research. Global warming is continuously bad: a temperature increase of 0.5 degrees C is bad, 1.0 degrees is worse, and 1.5 degrees is worse still. AI doesn’t become dangerous until some specific point. GPT-3 didn’t hurt anyone. GPT-4 probably won’t hurt anyone. So why not keep building fun chatbots like these for now, then start worrying later? Doomers counterargue that the fun chatbots burn timeline. That is, suppose you have some timeline for when AI becomes dangerous. For example, last year Metaculus thought human-like AI would arrive in 2040, and superintelligence around 2043. Recent AIs have tried lying to, blackmailing, threatening, and seducing users. 
AI companies freely admit they can’t really control their AIs, and it seems high-priority to solve that before we get superintelligence. If you think that’s 2043, the people who work on this question (“alignment researchers”) have twenty years to learn to control AI. Then OpenAI poured money into AI, did ground-breaking research, and advanced the state of the art. That meant that AI progress would speed up, and AI would reach the danger level faster. Now Metaculus expects superintelligence in 2031, not 2043 (although this seems kind of like an over-update), which gives alignment researchers eight years, not twenty. So the faster companies advance AI research - even by creating fun chatbots that aren’t dangerous themselves - the harder it is for alignment researchers to solve their part of the problem in time. This is why some AI doomers think of OpenAI as an Exxon-Mobil style villain, even though they’ve promised to change course before the danger period. Imagine an environmentalist group working on research and regulatory changes that would have solar power ready to go in 2045. Then ExxonMobil invents a new kind of super-oil that ensures that, nope, all major cities will be underwater by 2031 now. No matter how nice a statement they put out, you’d probably be pretty mad! Why OpenAI Thinks Their Research Is Good Now, But Might Be Bad Later OpenAI understands the argument against burning timeline. But they counterargue that having the AIs speeds up alignment research and all other forms of social adjustment to AI. 
If we want to prepare for superintelligence - whether solving the technical challenge of alignment, or solving the political challenges of unemployment, misinformation, etc - we can do this better when everything is happening gradually and we’ve got concrete AIs to think about: We believe we have to continuously learn and adapt by deploying less powerful versions of the technology in order to minimize “one shot to get it right” scenarios […] As we create successively more powerful systems, we want to deploy them and gain experience with operating them in the real world. We believe this is the best way to carefully steward AGI into existence—a gradual transition to a world with AGI is better than a sudden one. We expect powerful AI to make the rate of progress in the world much faster, and we think it’s better to adjust to this incrementally. A gradual transition gives people, policymakers, and institutions time to understand what’s happening, personally experience the benefits and downsides of these systems, adapt our economy, and to put regulation in place. It also allows for society and AI to co-evolve, and for people collectively to figure out what they want while the stakes are relatively low. You might notice that, as written, this argument doesn’t support full-speed-ahead AI research. If you really wanted this kind of gradual release that lets society adjust to less powerful AI, you would do something like this: Release AI #1
March 10, 2023 · Original source
13: New Eliezer Yudkowsky video appearance, shouldn’t be a big surprise to anyone who has already read his Death With Dignity post, but these people sure seemed surprised. I’m updating on how useful it might be to spread the word on this:
March 14, 2023 · Original source
Eliezer Yudkowsky seems to think >90%
Eliezer Yudkowsky worries that supercoherent superintelligences will have access to better decision theories than humans - mathematical theorems about cooperation which let them make and prove binding commitments with each other in the absence of explicit coordination. Not only would this prevent us from intercepting their coordination, but it would be such an advantage that humans (who can’t do this) would be locked out of possible alliances. I agree that if this were true it would be a very bad omen. But human geniuses don’t seem able to do this, so maybe we can re-use the Optimist’s Case above with decision theory as the world-killing technology.
Eliezer Yudkowsky takes the other end, saying that it might be possible for someone only a little smarter than the smartest human geniuses. He imagines, for example, a von Neumann level AI learning enough about nanotechnology to secretly train a nanobot-design AI. Such an AI might work very well - a chemical weapons designing AI was able to invent many existing chemical weapons - and some that might be worse - within a few hours.
March 27, 2023 · Original source
FIRE: This is “The Ballad Of Eliezer Yudkowsky And Sam Altman”:
June 01, 2023 · Original source
Can’t even list all the new people who have come out as AI x-risk believers, but you can just read the CAIS statement. The top signatures are Geoff Hinton, Yoshua Bengio, Demis Hassabis, Sam Altman, and Dario Amodei; aside from the usual suspects, they also have Bruce Schneier (computer security expert), Dawn Song (computer scientist and security expert), Andy Clark (professor of cognitive philosophy, wrote Surfing Uncertainty), Eliezer Yudkowsky (he didn't sign the last one because he disagreed with specifics, but he's here), and a former US Assistant Secretary of Defense for Nuclear, Chemical, and Biological Defense.
2: All the ancients, from Darius the Great to Augustus Caesar, agreed that the Nisean horse was the most majestic horse breed, the horse of kings. The Chinese fought a war (the War of Heavenly Horses) just to get access to a breeding stock. Then they sort of ambiguously went extinct during the Middle Ages. But here’s a modern Iranian horse enthusiast talking about which breeds might be its descendants. 3: Remember when the global community banned whaling, but some countries (eg Japan) continued doing it under the facade of “research”? With octopus factory farms under increasing scrutiny, UNAM university in Mexico is operating a “farm disguised as a research center”. 4: Genuinely new (to me) optical illusion: what is this guy doing with his hands? Here’s a slow motion version that shows how it’s done. And some people in the replies were speculating this only works because of his dark skin, but here’s a white person doing the exact same thing (wait for it). 5: Shingles vaccine probably reduces incidence of dementia, suggesting that VZV (virus behind shingles and chickenpox) is a contributor. Further discussion here that I’m still trying to make sense of. 6: This deserves to go down in history alongside the wittiest Socratic comebacks in the Platonic dialogues: 7: Matt Lakeman: Notes On Nigeria. Great introduction to modern Nigerian history. Read it for the visceral understanding of the “resource curse” and why poor countries stay poor, but also: A savant is basically someone who has innate mental challenges but is extremely competent in a particular narrow domain. Some savants become obsessed with trains and become great engineers. Some become obsessed with computers and build software wonders. One of Abacha’s predecessors said of him: “He might not be bright upstairs, but he knows how to overthrow governments.” Kenyon elaborates: “It was as if Abacha was an idiot savant. 
Dull, even gormless, he filled his days with cowboy movies and sleeping off the previous night’s indulgences in alcohol and prostitutes. But he was possessed of a prodigious flair when it came to coups.” 8: Related to my previous subscribers-only post on the psychology of fantasy: Balioc’s Taxonomy Of What Magic Is Doing In Fantasy Books. See also Eliezer’s commentary. 9: New study on the timing of human mutations confirms Greg Cochran’s 2012 post about how after leaving Africa, modern humans were limited to “Arabia and surrounding regions” for ~30,000 - 50,000 years, racking up various new mutations and becoming adapted to life outside Africa (kabbalistically equivalent to the 40 years spent wandering in Sinai?). Most mutations in “fat storage, neural development, skin physiology, and cilia function”. 10: Iron Economist on Twitter: “Desalinization was one of the big technological success stories of the 2010s”. 11: Matt Bruenig argues against the Success Sequence, whose proponents (including Bryan Caplan) describe it as: 97% of Millennials who follow what has been called the “success sequence”—that is, who get at least a high school degree, work, and then marry before having any children, in that order—are not poor by the time they reach their prime young adult years (ages 28-34). Bruenig’s argument is mostly a lot of annoying “well maybe it’s just your cultural bias that makes you care about this”, but in the middle of this it mentions some genuinely strong points, especially that the research doesn’t measure “sequence”, but rather “current status”. So if you graduated, got a job, got married, and had children, but then lost your job, you would be counted as “not following the sequence” (same if you get divorced). 
Also, disabled and old people and their caretakers are excluded from the analysis, which in one sense is fair (your conclusion can be “abled young adults can avoid poverty through this method”) but in another sense risks reducing all of this to the more trivial-seeming statement “if you’re young, healthy, abled, married, don’t have to support anyone else, and have a full-time job, you’re probably not poor”. But the authors (channeled by Caplan) disagree: Some critics of the success sequence have argued that marriage does not matter once education and work status are controlled. The regression results indicate that after controlling for a range of background factors, the order of marriage and parenthood in Millennials’ lives is significantly associated with their financial well-being in the prime of young adulthood. Simply put, compared with the path of having a baby first, marrying before children more than doubles young adults’ odds of being in the middle or top income. Meanwhile, putting marriage first reduces the odds of young adults being in poverty by 60% (vs. having a baby first). The main thing I would want to look at here is how much of this is causal vs. just class selection: upper-class people are more likely to marry, less likely to divorce, and more likely to wait before having children. Has anyone followed some pre-selected group of equal class people (eg the population of some low-income school district) and seen how their own success varies with sequence compliance? 12: I’ve previously linked claims that vat-grown meat, freed from the tyranny of having to grow inside animals, will include tiger steaks, lion burgers, and the like. Once again global capitalism outpaces my wildest fantasies and offers burgers with woolly mammoth protein (so far just the myoglobin, not the meat). 
13: The people who believed there was lots of gender bias in STEM academia, and the people who believed there wasn’t finally did an adversarial collaboration (a study co-conducted by two groups of scientists with conflicting theories, keeping each other honest). The results: Contrary to the omnipresent claims of sexism in these domains appearing in top journals and the media, our findings show that tenure-track women are at parity with tenure-track men in three domains (grant funding, journal acceptances, and recommendation letters) and are advantaged over men in a fourth domain (hiring). For teaching ratings and salaries, we found evidence of bias against women; although gender gaps in salary were much smaller than often claimed, they were nevertheless concerning. For ten years lots of important people told us again and again that discrimination against women in STEM was a massive problem. People who questioned its extent were accused of misogyny and sometimes fired; I got harassed and insulted for pointing out reasons the standard arguments didn’t seem to hold true. Millions of dollars were spent investigating and responding to the problem. And now I expect this pretty strong evidence that women were actually advantaged in hiring and had parity in most other things (the salary is probably just the usual negotiation issue) to produce no publicity, no apologies, and no soul-searching from the people leading the current round of anti-academia and anti-STEM inquisitions. Sorry, yes I am bitter, it just bothers me how much the people claiming that it’s urgently important that nobody is ever allowed to suggest they are wrong have a consistent track record of being totally and inexcusably wrong. 14: In my response to Sam Kriss, I speculated on what would happen if someone rewrote the MCU to sound like ancient myths. Thanks to the many people who reminded me of Star Wars as Icelandic saga and Star Wars as Irish epic. And Sam has a response.
15: @AISafetyMemes on Twitter is exactly what you’d expect from the name. I especially like the fire dogs: More here: 16: More AI links from this month: Can’t even list all the new people who have come out as AI x-risk believers, but you can just read the CAIS statement. The top signatures are Geoff Hinton, Yoshua Bengio, Demis Hassabis, Sam Altman, and Dario Amodei; aside from the usual suspects, they also have Bruce Schneier (computer security expert), Dawn Song (computer scientist and security expert), Andy Clark (professor of cognitive philosophy, wrote Surfing Uncertainty), Eliezer Yudkowsky (he didn't sign the last one because he disagreed with specifics, but he's here), and a former US Assistant Secretary of Defense for Nuclear, Chemical, and Biological Defense.
July 21, 2023 · Original source
Like any good Bayesian, he introduces us to Bayesian statistics and its merits over Frequentism, then points us to the work of Eliezer Yudkowsky to learn more.
October 05, 2023 · Original source
Relations between pause and non-pause countries are generally hostile. If domestic support for the pause is strong, there will be a temptation to wage war against non-pause countries before their research advances too far. “If intelligence says that a country outside the agreement is building a GPU cluster, be less scared of a shooting conflict between nations than of the moratorium being violated; be willing to destroy a rogue datacenter by airstrike.” — Eliezer Yudkowsky
HOW LONG TO PAUSE. The biggest disadvantage of pausing for a long time is that it gives bad actors (eg China) a chance to catch up. Suppose the West is right on the verge of creating dangerous AI, and China is two years away. It seems like the right length of pause is 1.9999 years, so that we get the benefit of maximum extra alignment research and social prep time, but the West still beats China. Obviously the problem with the Surgical Pause is that we might not know when we’re on the verge of dangerous AI, and we might not know how much of a lead “the good guys” have. Surgical Pause proponents suggest being very conservative with both free variables. This is less of a well-thought-out plan and more saying “come on guys, let’s at least try to be strategic here”. At the limit, it suggests we probably shouldn’t pause for six months, starting right now. Since this involves leading labs burning their lead time for safety, in theory it could be done unilaterally by the single leading lab, without international, governmental, or even inter-lab coordination. But you could buy more time if you got those things too. Some leading labs have promised to do this when the time is right - for example OpenAI and (a previous iteration of) DeepMind - with varying levels of believability. AnonResearcherAtMajorAILab discussed some of the strategy here in Aim For Conditional AI Pauses, and this Less Wrong post is also very good. Regulatory Pause: If one benefit of the Simple Pause is to use the time to prepare for AI socially and politically, maybe we should just pause until we’ve completed social and political preparations. David Manheim suggests a monitoring agency like the FDA. It would “fast-track” small AIs and trivial re-applications of existing AIs, but carefully monitor new “frontier models” for signs of danger. 
Regulators might look for dangerous capabilities by asking AIs to hack computers or spread copies of themselves, or test whether they’ve been programmed against bias/misinformation/etc. We could pause only until we’ve set up the regulatory agency, and take hostile actions (like restrict chip exports) only to other countries that don’t cooperate with our regulators or set up domestic regulators of their own. Many people in tech are regulation-skeptical libertarians, but proponents point out that regulation fails in a predictable direction: it usually does successfully prevent bad things, it just also prevents good things too. Since the creation of the Nuclear Regulatory Commission in 1975, there has never been a major nuclear accident in the US. And sure, this is because the NRC prevented any nuclear plants from being built in the United States at all from 1975 to 2023 (one was finally built in July). Still, they technically achieved their mandate. Likewise, most medications in the US are safe and relatively effective, at the cost of an FDA approval process being so expensive that we only get a tiny trickle of new medications each year and hundreds of thousands of people die from unnecessary delays. But medications are safe and effective. Or: San Francisco housing regulators almost never approve new housing, so housing costs millions of dollars and thousands of San Franciscans are homeless - but certainly there’s no epidemic of bad houses getting approved and then ruining someone’s view or something. If we extrapolate this track record to AI, AI regulators will be overcautious, progress will slow by orders of magnitude or stop completely - but AIs will be safe. This is a depressing prospect if you think the problems from advanced AI would be limited to more spam or something. But if you worry about AI destroying the world, maybe you should accept a San-Francisco-housing-level of impediment and frustration. 
A regulatory pause could be better than a total stop if you think it will be more stable (lots of industries stay heavily regulated forever, and only a few libertarians complain), or if you think maybe the regulator will occasionally let a tiny amount of safe AI progress happen. But it could be worse than a total stop if you expect continued progress will eventually produce unsafe AIs regardless of regulation. You might expect this if you’re worried about deceptive alignment, eg superintelligent AIs that deliberately trick regulators into thinking they’re safe. Or you might think AIs will eventually be so powerful that they can endanger humanity from a walled-off test environment even before official approval. The classic Bostrom/Yudkowsky model of alignment implies both of these things. David Manheim and Thomas Larsen set out their preferred versions of this strategy in What’s In A Pause? and Policy Ideas For Mitigating AI Risk. Total Stop: If you expect AIs to exhibit deceptive alignment capable of fooling regulators, or to be so dangerous that even testing them on a regulator’s computer could be apocalyptic, maybe the only option is a total stop. It’s tough to imagine a total stop that works for more than a few years. You have at least three problems: NON-PARTICIPANTS. As with any pause proposal, unfriendly countries (eg China) can keep working on AI. You can refuse to export chips to them, which will slow them down a little, but their own chips will eventually be up to the task. You will either need a diplomatic miracle, or willingness to resort to less diplomatic forms of coercion. This doesn’t have to be immediate war: Israel has come up with “creative” ways to slow Iran’s nuclear program, and countries trying to frustrate China’s chip industry could do the same. But great powers playing these kinds of games against each other risks wider conflict.
October 31, 2023 · Original source
Eliezer Yudkowsky debates Destiny (this was awful, they couldn’t find anything they disagreed on, and they tried to debate the FDA but neither one of them really knew what it did).
March 12, 2024 · Original source
Did they pick defective concerned experts? Tentatively no. The IRB says it’s “important” “for” “privacy” “reasons” not to reveal who the experts were, but the paper says they were selected by Open Philanthropy Project, a big AI safety funder who I trust a lot and who I would expect to choose good people (Eliezer Yudkowsky, who is less sanguine about OpenPhil than I am, does sort of blame this one).
March 28, 2024 · Original source
Okay, this one is just awful. It takes the risky gambit above - giving extreme odds to something - then doubles down on it by multiplying across twenty different stages to get a stupendously low probability of 1/(5*10^25). If we believe this, it’s more likely that we win the lottery three times in a row than that we learn lab leak was true after all. Eliezer Yudkowsky calls this the Multiple Stage Fallacy. Even aside from the failure mode in the sunrise example above (where people are too reluctant to give strong probabilities), it fails because people don’t think enough about the correlations between stages. For example, maybe there’s only 1/10 odds that the Wuhan scientists would choose the suboptimal RRAR furin cleavage site. And maybe there’s only 1/20 odds that they would add a proline in front to make it PRRAR. But are these really two separate forms of weirdness, such that we can multiply them together and get 1/200? Or are scientists who do one weird thing with a furin cleavage site more likely to do another? Mightn’t they be pursuing some general strategy of testing weird furin cleavage sites? (For example, Yuri proposed that, because the scientists wanted to understand how pandemic coronaviruses originate in nature, they might deliberately pick more natural-looking features over more designed-looking ones, which would neatly explain many features seemingly inconsistent with lab leak. Is this a conspiracy theory? Rootclaim is able to successfully route around this question. If the probability of a feature happening in nature is X, then the probability of it happening in this variant of lab leak scenario is X * [chance that the scientists wanted to imitate nature]. This gives it a (deserved) complexity penalty without ruling out this (non-zero and potentially important) possibility.) 
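The multiplication worry in the passage above can be made concrete with a small sketch. This is an illustration only and not part of the original issue text: the 1/10 and 1/20 figures are the ones quoted in the passage, while the conditional value of 1/2 is an invented placeholder chosen just to show how a correlation between stages changes the product.

```python
from fractions import Fraction


def joint_probability(p_first, p_second_given_first):
    """Chain rule for two stages: P(A and B) = P(A) * P(B | A)."""
    return p_first * p_second_given_first


# Figures quoted in the passage.
p_rrar = Fraction(1, 10)             # scientists choose the RRAR site
p_proline_marginal = Fraction(1, 20) # scientists add a proline (taken alone)

# Naive version: treat the stages as independent, so P(B | A) = P(B).
naive = joint_probability(p_rrar, p_proline_marginal)

# ASSUMPTION (not from the source): scientists who already did one weird
# thing are much more likely to do another, say 1/2 instead of 1/20.
p_proline_given_rrar = Fraction(1, 2)
correlated = joint_probability(p_rrar, p_proline_given_rrar)

print(naive)               # 1/200
print(correlated)          # 1/20
print(correlated / naive)  # 10
```

Under that made-up conditional, the two-stage estimate is off by a factor of ten, and an error of that size at each of twenty stages would compound accordingly. The same chain rule covers the "imitate nature" variant in the passage: X times the chance that the scientists wanted to imitate nature.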
In any case, Peter didn’t care as much about probabilistic analysis as Saar, he didn’t make his case hinge on this slide, and he might have been kind of using it to troll Rootclaim (which definitely worked). He might not have been making any of the mistakes above. But anyone who took this slide seriously would end up dramatically miscalibrated. The Math: Big Pictures Another of Saar’s concerns with the verdict was that Peter was an extraordinary debater, to the point where it could have overwhelmed the signal from the evidence. It’s hard to watch the videos and not come away impressed. Peter seems to have a photographic memory for every detail of every study he’s ever read. He has some kind of 3D model in his brain of Wuhan, the wet market, and how all of its ventilation ducts and drains interacted with each other. Whenever someone challenged one of his points, he had a ten-slide PowerPoint presentation already made up to address that particular challenge, and would go over it with complete fluency, like he was reciting a memorized speech. I sometimes get accused of overdoing things, but I can’t imagine how many mutations it would take to make me even a fraction as competent as Peter was. Saar’s closing argument included the admission: Peter, I think everyone can agree, has much more knowledge on [COVID] origins than we do. He's invested much more time. He may be a much more talented researcher. He's much more into the details. He probably knows the best in the world on origins at this point. Once you’ve described your opponent that way in your closing argument, what’s left of your case? Saar thought a lot was left. Throughout the debate, he tried to make a point about how getting the inference right was more important than winning sub-sub-sub-debates about individual lines of evidence. Although Peter won most specific points of contention, Saar thought that if the judges could just keep their mind on the big picture, they would realize a lab leak was more likely. 
I’m potentially sympathetic to arguments like Saar’s. Imagine a debate about UFOs. Imaginary-Saar says “UFOs can’t be real, because it doesn’t make sense for aliens to come to Earth, circle around a few fields in Kansas, then leave without providing any other evidence of their existence.” Imaginary-Peter says “John Smith of Topeka saw a UFO at 4:52 PM on 6/12/2010, and everyone agrees he’s an honorable person who wouldn’t lie, so what’s your explanation of that?” Saar says “I don’t know, maybe he was drunk or something?” Peter says “Ha, I’ve hacked his cell phone records and geolocated him to coordinates XYZ, which is a mosque. My analysis finds that he’s there on 99.5% of Islamic holy days, which proves he’s a very religious Muslim. And religious Muslims don’t drink! Your argument is invalid!” On the one hand, imaginary-Peter is very impressive and sure did shoot down Saar’s point. On the other, imaginary-Saar never really claimed to have a great explanation for this particular UFO sighting, and his argument doesn’t depend on it. Instead of debating whether Smith could or couldn’t have been drunk, we need to zoom out and realize that the aliens explanation makes no sense. The problem was, Saar couldn’t effectively communicate what his big picture was. Neither deployed some kind of amazingly elegant prior. They both used the same kind of evidence. The only difference was that Peter’s evidence hung together, and Saar’s evidence fell apart on cross-examination. I think - not because Saar really explained it, but just reading between the lines - Saar thought the un-ignorable big picture evidence was the origin in a city with a coronavirus gain-of-function lab, and the twelve-nucleotide insertion in the furin cleavage site. To some degree, Peter just ate the loss on those questions. No matter how you slice it, it really is a weird coincidence that the epidemic started so close to Asia’s biggest coronavirus laboratory. 
Peter tried to deflect this - he pointed out there were other BSL-3 and BSL-4 laboratories in Beijing, Shanghai, Shenzhen, etc. But this was a rare question where he unambiguously came out looking worse - the other cities’ labs had much less coronavirus-specific research. Wuhan really was unique (aside from the other big coronavirus lab in North Carolina). Peter did better when he tried to control the damage: there are a couple hundred million people in the South Asian areas where people eat weird animals exposed to virus-infected bats, Wuhan has a population of about 12 million, so maybe 1.5% of all potential zoonotic pandemics should start in Wuhan. Peter tried to argue that Wuhan was a local trade center, so maybe we should up that to 5 - 10%. 5 - 10% coincidences aren’t that rare. Even 1.5% coincidences happen sometimes. Likewise, the furin cleavage site really does stand on a genetic map. I didn’t feel like either side did much math to quantify how weird it was. Naively, I might think of this as “30,000 bases in COVID, only one insertion, it’s in what’s obviously the most interesting place - sounds like 30,000-to-one odds against”. Against that, a virus with a boring insertion would never have become a pandemic, so maybe you need to multiply this by however much viral evolution is going on in weird caves in Laos, and then you would get the odds that at least one virus would have an insertion interesting enough to go global. Neither participant calculated this in a way that satisfied me (though see here for related discussion). Instead, Peter tried to undermine the furin argument by showing that, as surprising as the site was under a natural origin, it would be an even more surprising choice for human engineers. Saar argued it wasn’t - but because of his policy of giving adjusted-for-model-error odds, he only gave this a factor of 30 in his analysis. 
Since Peter gave it a higher factor of 50 in his analysis, it looked from the outside like Saar had already conceded this point, and the judges were mostly happy to go with Saar’s artificially-low estimate. The Math: Double Coincidences Saar brought up an interesting point halfway through the debate: you should rarely see high Bayes factors on both sides of an argument. That is, suppose you accept that there’s only a 1-in-10,000 chance that the pandemic starts at a wet market under lab leak. And suppose you accept there’s only a 1-in-10,000 chance that COVID’s furin cleavage site could evolve naturally. If lab leak is true, then you might find 1-in-10,000 evidence for lab leak. But it’s a freak coincidence that there was 1-in-10,000 evidence for zoonosis5. Likewise, if zoonosis is true, you might find 1-in-10,000 evidence for this true thing. But it’s a freak coincidence that there was 1-in-10,000 evidence for lab leak. Either way, you’re accepting that a 1-in-10,000 freak coincidence happened. Isn’t it more likely you’ve bungled your analysis? I was following along at home, and I definitely bungled this point; I had some high Bayes factors on both sides. I adjusted some of them downward based on Saar’s good point, but how far should we take it? Here I remember The Pyramid And The Garden: you can get very strong coincidences if you have many degrees of freedom, ie buy a lot of lottery tickets. So for example, suppose there are fifty things about a virus. You should expect at least one of those to have a one-in-fifty coincidence by pure chance. What about more than that? You might be able to get away with this by saying there are an infinite number of possible conspiracy theories, and some from that infinite set are brought into existence when a strong enough coincidence makes them plausible. For example, it’s really weird that John Adams and Thomas Jefferson both died on the 50th anniversary of the Declaration of Independence. 
If I wanted, I could form a conspiracy theory about a group of weird assassins obsessed with killing Founding Fathers on important dates, and then Jefferson and Adams’ deaths would be 1/10,000 evidence for that theory. But this is the Texas Sharpshooter Fallacy, which Saar warned against several times. I don’t know if “the virus started in Wuhan, which is where they’re doing this research” gets a Texas Sharpshooter penalty, or how high that penalty should be. But the furin cleavage site doesn’t - people were talking about lab leak before anyone noticed it. The Aftermath: Peter Peter seemed satisfied with the result, in an understated sort of way: It seemed like an interesting experiment in monetizing the debunking of a conspiracy theory. I think there's usually a big asymmetry where it's easy to get rich spreading bullshit (like, the top anti-vaxxers during the pandemic all made a million dollars a year on substack), but it's almost impossible to make money on debunking it. The Rootclaim challenge seemed like one rare case where the opposite was true. Beyond that, I don't know what it's good for. It does seem like there could be a positive social impact from more people understanding that the lab leak hypothesis is (almost certainly) false. The Aftermath: Saar Saar says the debate didn’t change his mind. In fact, by the end of the debate, Rootclaim released an updated analysis that placed an even higher probability on lab leak than when they started. In his blog post, he discussed the issues above, and said the judges had erred in not considering them. He respects the judges, he appreciates their efforts, he just thinks they got it wrong. Although he respected their decision, he wanted the judges to correct what he saw as mistakes in their published statements, which delayed the public verdict and which Viewers Like You did not appreciate. I ran an early draft of this post by him.
There was some miscommunication about the exact publication date, so he hasn’t had time to write up a full response, but he has some quick thoughts (and I’ll link the full response when he writes it). He says: We will provide a full response to this post soon, but the main problem with it is fairly simple: There is general agreement that the main evidence for zoonosis is HSM (Huanan Seafood Market) forming an early cluster of cases. The contention is whether it is amazing 10,000x evidence, or is it negligible. All other evidence points to a lab leak, and if HSM is shown to be weak, lab leak is a clear winner. We provided an analysis of why it is negligible that is as close to mathematical proof as such things can be. Read it here. Scott and I exchanged a few emails on this issue and Scott preferred to discuss more intuitive analyses of HSM, using rules of thumb that likely served him well in the past. While I believe I managed to mostly explain where these failed, and Scott understands HSM is far weaker evidence than he initially thought, he still has a very strong intuitive feeling (based on years of dealing with probabilities) that this is some exceptional coincidence, and that prevents him from properly updating his posterior. At the end of the day, this cannot be settled without going through our semi-formal derivation, understanding it, and either identifying the problem with it or accepting it (and thereby accepting lab-leak to be more likely). Here is a quick summary of the mistakes made by those claiming HSM is strong evidence: The first mistake is conflating Bayes factors with conditional probabilities. 1/10000 is the supposed conditional probability p(HSM|Lab Leak). That should be divided by the conditional probability of HSM under Zoonosis.
Markets were not identified as a high-risk location prior to this outbreak (This will be elaborated in the full response), and in SARS1 the spillovers were mostly at restaurants and other food handlers that deal more closely with wildlife. While it's cool to point to the raccoon dog photo, that was a result of a retrospective search (we don't know what other photos they took which in retrospect would be brought up as premonition). Unbiased data shows markets are not a likely spillover location for zoonosis. We originally estimated p(HSM|Zoonosis)<0.1. Following more research we did to answer Scott's questions, this is more likely <0.03.
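The distinction drawn in the passage above, between a conditional probability and a Bayes factor, can be illustrated with a short calculation. This is only a sketch: it holds the "supposed" p(HSM|Lab Leak) = 1/10,000 fixed purely for illustration (the passage itself disputes how strong the market evidence is), treats the quoted bounds of 0.1 and 0.03 for p(HSM|Zoonosis) as point values, and uses hypothetical function and variable names that do not come from Rootclaim's analysis.

```python
from fractions import Fraction


def bayes_factor(p_evidence_given_a, p_evidence_given_b):
    """Likelihood ratio: how much more probable the evidence is under A than under B."""
    return p_evidence_given_a / p_evidence_given_b


# Figure quoted in the passage as the "supposed" p(HSM | lab leak).
# Held fixed here only for illustration.
P_HSM_GIVEN_LAB_LEAK = Fraction(1, 10_000)

# Candidate values for p(HSM | zoonosis):
#   1    -> the implicit assumption behind reading 1/10,000 as "10,000x evidence"
#   0.1  -> the original upper bound quoted in the passage
#   0.03 -> the revised upper bound quoted in the passage
CANDIDATES = [Fraction(1), Fraction(1, 10), Fraction(3, 100)]


def factors_for_zoonosis():
    """Map each candidate p(HSM | zoonosis) to the resulting Bayes factor."""
    return {p: bayes_factor(p, P_HSM_GIVEN_LAB_LEAK) for p in CANDIDATES}


if __name__ == "__main__":
    for p, bf in factors_for_zoonosis().items():
        print(f"p(HSM|zoonosis) = {float(p):.2f} -> factor for zoonosis = {float(bf):.0f}x")
```

Under these assumptions the factor falls from 10,000x to at most 1,000x or 300x once the denominator is included, which is the arithmetic behind the "first mistake" described above. It says nothing about whether 1/10,000 is itself the right numerator.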
May 30, 2024 · Original source
Try spotting existential risk prevention on here. I don’t think Stone can claim that an EA version of this chart wouldn’t look phenomenally different. But then what’s left of his argument? III. Effective altruists devote absolutely enormous amounts of mental energy and research costs to program assessment, measurement of effectiveness. Those studies yield usually-conflicting results with variable effect sizes across time horizons and model specifications, and tons of different programs end up with overlapping effect estimates. That is to say, the areas where EAist style program evaluations are most compelling are areas where we don’t need them: it’s been obvious for a long time how to reduce malaria deaths, program evaluations on that front have been encouraging and marginally useful, but not gamechanging. On the other hand, in more contestable areas, EAist style program evaluations don’t really yield much clarity. It’s very rare that a program evaluation gets published finding vastly larger benefits than you’d guess from simple back-of-the-envelope guesswork, and the smaller estimates are usually because a specific intervention had first-order failure or long-run tapering, not because “actually tuberculosis isn’t that bad” or something like that. Those kinds of precise program-delivery studies are actually not an EAist specialty, but more IPA’s specialty. My second critique, then is this: there is no evidence that the toolkit and philosophical approach EAists so loudly proclaim as morally superior actually yields any clarity, or that their involvement in global efforts is net-positive vs. similar-scale donations given through near-peer organizations. The IPA mentioned here is Innovations For Poverty Action, a group that studies how to fight poverty. They’re great and do great work. But IPA doesn’t recommend top charities or direct donations. Go to their website, try to find their recommended charities. Unless I’m missing something, there are none. 
GiveWell does have recommended charities - including ones that they decided to recommend based on IPA’s work - and moves ~$250 million per year to them. If IPA existed, but not GiveWell, the average donor wouldn’t know where to donate, and ~$250 million per year would fail to go to charities that IPA likes. I think from the perspective of people who actually work within this ecosystem, Stone’s concern is like saying “Farms have already solved the making-food problem, so why do we need grocery stores?” (also, effective altruism funds IPA) I’m focusing on IPA here because Stone brought them up, but I think EA does more than this. I don’t think there’s an IPA for figuring out whether asteroid deflection is more cost-effective than biosecurity, whether cow welfare is more effective than chicken welfare, or figuring out which AI safety institute to donate to. I think this is because IPA is working on a really specific problem (which kinds of poverty-related interventions work) and EA is working on a different problem (what charities should vaguely utilitarian-minded people donate to?) These are closely related questions but they’re not the same question - which is why, for example, IPA does (great) research into consumer protection, something EA doesn’t consider comparatively high-impact. And I’m still focusing on donation to charity, again because it’s what Stone brought up, but EA does other things - like incubating charities, or building networks that affect policy. IV. Let’s skip farm animal welfare for a second and look at the next few: Global Aid, “Effective Altruism,” potential AI risks, biosecurity, and global catastrophic risk. These are all definitely disproportionate areas of EAist interest. If you google these topics, you will find a wildly disproportionate number of people who are EAist, or have sex at EAist orgies, or are the friends of people who have sex at EAist orgies. These really are some of the unique social features of EAism. 
And they largely amount to subsidizing white collar worker wages. I’m sorry but there’s no other way to slice it: these are all jobs largely aimed at giving money to researchers, PhD-holders, university-adjacent-persons, think tanks, etc. That may be fine stuff, but the whole pitch of effective altruism is that it’s supposed to bypass a lot of the conventional nonprofit bureaucracy and its parasitism and just give money to effective charities. But as EAism has matured into a truly unique social movement, it is creating its own bureaucracy of researchers, think tanks, bureaucrats… the very things it critiqued. Suppose an EA organization funded a cancer researcher to study some new drug, and that new drug was a perfect universal cure for cancer. Would Stone reject this donation as somehow impure, because it went to a cancer researcher (a white-collar PhD holder)? EA gives hundreds of millions of dollars directly to malaria treatments that go to the poorest people in the world. It’s also one of the main funders of GiveDirectly, a charity that has given money ($750 million so far) directly to the poorest people in the world. But in addition to giving out bednets directly, it sometimes funds malaria vaccines. In addition to giving to poor Africans, it also funds the people who do the studies to see whether giving to poor Africans works. Some of those are white-collar workers. EA has never been about critiquing the existence of researchers and think tanks. In fact, this is part of the story of EA’s founding. In 2007, the only charity evaluators accessible by normal people rated charities entirely on how much overhead they had - whether the money went to white-collar people or to sympathetic poor recipients. EAs weren’t the first to point out that this was a very weak way of evaluating charities.
But they were the first to make the argument at scale and bring it into the public consciousness, and GiveWell (and to some degree the greater EA movement) were founded on the principle of “what if there was a charity evaluator that did better than just calculate overhead?” In accordance with this history, if you look on Giving What We Can’s List Of Misconceptions About Effective Altruism, their #1 Misconception about charity evaluation is that “looking at a charity’s overhead costs is key to evaluating its effectiveness”. This is another part of my argument that EA is more than just IPA++. For years, the state of the art for charity evaluators was “grade them by how much overhead they had”. IPA and all the great people working on evidence-based charity at the time didn’t solve that problem - people either used CharityNavigator or did their own research. GiveWell did solve that problem, and that success sparked a broader movement to come up with a philosophy of charity that could solve more problems. Many individuals have always had good philosophies of charity, but I think EA was a step change in doing it at scale and trying to build useful tools / a community around it. V. You could of course say AI risk is a super big issue. I’m open to that! But surely the solution to AI risk is to invest in some drone-delivered bombs and geospatial data on computing centers! The idea that the primary solution here is going to be blog posts, white papers, podcasts, and even lobbying is just insane. If you are serious about ruinous AI risk, you cannot possibly tell me that the strategy pursued here is optimal vs. say waiting until a time when workers have all gone home and blowing up a bunch of data centers and corporate offices. In particular terrorism as a strategy may be efficient since explosives are rather cheap. To be clear I do not support a strategy of terrorism!!!! But I am questioning why AI-riskers don’t. Logically, they should.
I think if you have to write in bold with four exclamation points at the end that you’re not explicitly advocating terrorism, you should step back and think about your assumptions further. So: Should people who worry about global warming bomb coal plants? Should people who worry that Trump is going to destroy American democracy bomb the Republican National Convention? Should people who worry about fertility collapse and underpopulation bomb abortion clinics? EAs aren’t the only group who think there are deeply important causes. But for some reason people who can think about other problems in Near Mode go crazy when they start thinking about EA. (Eliezer Yudkowsky has sometimes been accused of wanting to bomb data centers, but he supports international regulations backed by military force - his model is things like Israel bombing Iraq’s nuclear program in the context of global norms limiting nuclear proliferation - not lone wolves. As far as I know, all EAs are united against this kind of thing.) There are three reasons not to bomb coal plants/data centers/etc. The first is that bombing things is morally wrong. I take this one pretty seriously. The second is that terrorism doesn’t work. Imagine that someone actually tried to bomb a data center. First of all, I don’t have statistics but I assume 99% of terrorists get caught at the “your collaborator is an undercover fed” stage. Another 99% get eliminated at the “blown up by poor bomb hygiene and/or a spam text message” stage. And okay, 1/10,000 will destroy a datacenter, and then what? Google tells me there are 10,978 data centers in the world. After one successful attack, the other 10,977 will get better security. Probably many of these are in China or some other country that’s not trivial for an American to import high explosives into. The third is that - did I say terrorism didn’t work? I mean it massively massively backfires. 
Hamas tried terrorism, they frankly did a much better job than we would, and now 52% of the buildings in their entire country have been turned to rubble. Osama bin Laden tried terrorism, also did an impressive job, and the US took over the whole country that had supported him, then took over an unrelated country that seemed like the kinds of guys who might support him, then spent ten years hunting him down and killing him and everyone he had ever associated with. One f@#king time, a handful of EAs tried promoting their agenda by committing some crimes which were much less bad than terrorism. Along with all the direct suffering they caused, they destroyed EA’s reputation and political influence, drove thousands of people away from the movement, and everything they did remains a giant pit of shame that we’re still in the process of trying to climb our way out of. Not to bang the same drum again and again, but this is why EA needs to be a coherent philosophy and not just IPA++. You need some kind of theory of what kinds of activism are acceptable and effective, or else people will come up with morally repugnant and incredibly idiotic plans that will definitely backfire and destroy everything you thought you were fighting for. EA hasn’t always been the best at avoiding this failure mode, but at least we manage to outdo our critics. VI. Stone moves on to animal welfare: It’s important to grasp that [caring about animals] is, in evolutionary terms, an error in our programming. The mechanisms involved are entirely about intra-human dynamics (or, some argue, may also be about recognizing the signs of vulnerable prey animals or enabling better hunting). Yes humans have had domestic animals for quite a long time, but our sympathetic responses are far older than that. We developed accidental sympathies for animals and then we made friends with dogs, not vice versa. 
Again, this is part of why I think it’s useful to have people who think about philosophy, and not just people who do RCTs. People having kids of their own instead of donating to sperm banks is in some sense an “error” in our evolutionary program. The program just wanted us to reproduce; instead we got a bunch of weird proxy goals like “actually loving kids for their own sake”. Art is another error - I assume we were evolutionarily programmed to care about beauty because, I don’t know, flowers indicate good hunting grounds or something, not because evolution wanted us to paint beautiful pictures. Anyone who cares about a future they will never experience, or about people on far off continents who they’ll never meet, is in some sense succumbing to “errors” in their evolutionary programming. Stone describes the original mechanisms as “about intra-human dynamics”, but this is cope - they’re about intra-tribal dynamics. Plenty of cultures have been completely happy to enslave, kill, and murder people outside their tribes, and nothing in their evolutionary mechanism has told them not to. Does Stone think this, too, is an error? At some point you’ve got to go beyond evolutionary programming and decide what kind of person you want to be. I want to be the kind of person who cares about my family, about beauty, about people on other continents, and - yes - about animal suffering. This is the reflective equilibrium I’ve landed in after considering all the drives and desires within me, filtering it through my ability to use Reason, and imagining having to justify myself to whatever God may or may not exist. Stone suggests EAs don’t have answers to a lot of the basic questions around this. I can recommend him various posts like Axiology, Morality, Law, the super-old Consequentialism FAQ, and The Gift We Give To Tomorrow, but I think they’ll only address about half of his questions. The other half of the answers have to come from intuition, common sense, and moral conservatism. 
This isn’t embarrassing. Logicians have discovered many fine and helpful logical principles, but can’t 100% answer the problem of skepticism - you can fill in some of the internal links in the chain, but the beginning and end stay shrouded in mystery. This doesn’t mean you can ignore the logical principles we do know. It just means that life is a combination of formally-reasonable and not-formally-reasonable bits. You should follow the formal reason where you have it, and not freak out and collapse into Cartesian doubt where you don’t. This is how I think of morality too. Again, I really think it’s important to have a philosophy and not just a big pile of RCTs. Our critics make this point better than I ever could. They start with “all this stuff is just common sense, who needs philosophy, the RCTs basically interpret themselves”, then, in the same essay, digress into: If I wanted to do this stuff, I would try terrorism.
June 28, 2024 · Original source
These days, the consensus seems to be shifting toward recognizing most animals as sentient. Scully was surely heartened by the Declaration on Animal Consciousness that came out of NYU in 2024. It states that: “There is strong scientific support for attributions of conscious experience to other mammals and to birds.” It has collected hundreds of signatures from prominent scientists. Still, the debate rages on. Eliezer Yudkowsky once gave a full-throated defense of the idea that pigs don’t feel pain because they lack an “inner listener.”
I wonder if Yudkowsky would change his mind if he had witnessed firsthand the pig we meet in the first chapter of Dominion. This porcine hero noticed its owner was having a heart attack, started crying literal tears (pigs cry, who knew), left the confines of its fenced-in yard for the first time ever, laid down in front of a passing car to force someone to stop and get out, and led that person to the house so they could rescue their owner. Inner listener or not, mirror test passer or not, that pig seems to be experiencing something.
July 16, 2024 · Original source
Please point out mistakes and how to fix them in the comments or on ??, so I can be less wrong about this. Special thanks to those who have donated most such help so far: Professor Ulrich Hegerl, PhDs Lars Schuster, Idris Riahi and Robert Lehmann, and Eliezer Yudkowsky.
September 10, 2024 · Original source
Some people who routinely violate the Temporal Copernican Principle include Harari, Eliezer Yudkowsky, Sam Altman, Francis Fukuyama, Elon Musk, Clay Shirky, Tyler Cowen, Matt Yglesias, Tom Friedman, Scott Alexander, every tech company CEO, Ray Kurzweil, Robin Hanson, and many many more. I think they should ask themselves how much of their understanding of the future ultimately stems from a deep-seated need to believe that their times are important because they think they themselves are important, or want to be.
September 18, 2024 · Original source
A Twitter discussion between Ajeya Cotra and Eliezer Yudkowsky:
But nobody finds this scary. Nobody thinks “oh, yeah, Bostrom and Yudkowsky were right, this is that AI safety thing”. It’s just another problem for the cybersecurity people. Sometimes Excel inappropriately converts things to dates; sometimes GPT-6 tries to upload itself into an F-16 and bomb stuff. That specific example might be kind of a joke. But thirty years ago, it also would have sounded pretty funny to speculate about a time when “everyone knows” AIs can write poetry and develop novel mathematics and beat humans at chess, yet nobody thinks they’re intelligent.
October 09, 2024 · Original source
My official life’s work is a manifestation of cancer. I do not want to understand this, because it denigrates what is to me the closest thing to holy. But it is less implausible than all of this being a giant coincidence. I cannot pretend I don't believe it; the writings of Eliezer Yudkowsky have changed me.
Pattern match! Another catastrophic realization comes crashing in. The necessary expertise has been available all along, from my academic degree in computer science. 20 years ago, Algorithms 101. I receive what feels like a download of the following theory. I initially do not want to believe it; believing I could have stumbled into something new and so fundamental is the most obvious evidence of insanity yet. I am crying so hard it is impossible to tell whether the tears are of joyful relief or of my mind finally cracking. But whether this means I finally solved The Puzzle or whether it means I have finally gone insane, I cannot pretend I don't believe it; the writings of Eliezer Yudkowsky have changed me. Holy fucking shit, this is it. This explains everything. Thanks to the cancer!
November 08, 2024 · Original source
In retrospect, maybe I’m erring by using intuitions I got from Eliezer Yudkowsky’s decision theory work, intended for bargaining with literally-galaxy-brained superintelligences who might respond with things like “Sorry, I’ve already pre-committed to rejecting all offers that would seem like extortion to omniscient entities negotiating from behind a veil of ignorance, and if you think about it carefully you’ll realize that this is fair enough that your own set of galaxy-brained logically-perfect pre-commitments don’t require you to retaliate against me for doing this”. This is a good strategy if you can pull it off, and it forces you to pay a two-thirds tax to place yourself in a bin of slightly-higher-cooperativeness. But Kamala Harris probably hasn’t done this, maybe hasn’t even done any instinctual thing which cashes out to the equivalent of this, and maybe doesn’t respond differently to the outright extortion of “do what I want or I’ll vote Trump” or the massaged-to-fit-a-series-of-fair-precommitments offer of “do what I want or I’ll vote Trump with 33% probability”. In fact, IIUC Kamala hasn’t shown any inkling that these people exist at all (which could itself be a powerful game theoretic strategy!)
July 01, 2025 · Original source
28: Eliezer Yudkowsky and Nate Soares have a book on AI coming out in September, catchily titled If Anyone Builds It, Everyone Dies. Nobody is very surprised that they wrote this book, but I’m a little surprised at the endorsements they managed to collect, including Stephen Fry, Ben Bernanke, and a former undersecretary of Homeland Security. As always, if you support the cause, pre-orders can be especially helpful in creating buzz and catching booksellers’ interest.
August 25, 2025 · Original source
A volunteer to do a large amount of work as the main evaluator for our policy team, which has about ~20 grants on their shortlist. These range from progress studies PACs, to voter education platforms, to free speech advocacy orgs. An ideal candidate would know enough about the policy landscape to have good opinions on which of these things will work and be cost-effective. Each grant would require a few minutes to a few hours of your time (your choice, depending on how obvious you think the decision is) over the next three weeks. I can explain more details over email if you’re interested. Please volunteer using this form, if we have too many volunteers then I may not contact everyone who applies, sorry. 6: The team behind Eliezer Yudkowsky's upcoming AI book, If Anyone Builds It, Everyone Dies, asks me to let interested readers know a few related announcements: The book is coming out September 16 and can be pre-ordered here.
September 11, 2025 · Original source
Eliezer Yudkowsky’s Machine Intelligence Research Institute is the original AI safety org. But the original isn’t always the best - how is Mesopotamia doing these days? As money, brainpower, and prestige pour into the field, MIRI remains what it always was - a group of loosely-organized weird people, one of whom cannot be convinced to stop wearing a sparkly top hat in public. So when I was doing AI grantmaking last year, I asked them - why should I fund you, instead of the guys with the army of bright-eyed Harvard grads, or the guys who just got Geoffrey Hinton as their celebrity spokesperson? What do you have that they don’t?
Despite my gripes above, this is an impressive book. Eliezer Yudkowsky is a divisive writer, with plenty of diehard fans and equally committed enemies. At his best, he has leaps of genius nobody else can match; at his worst, he’s prone to long digressions about how stupid everyone who disagrees with him is. Nate Soares is equally thoughtful but more measured and lower-profile (at least before he started dating e-celebrity Aella). His influence tempers Yudkowsky’s and turns the book into a presentable whole that respects its readers’ time and intelligence. The end result is something which I would feel comfortable recommending to ordinary people as a good introduction to its subject matter.
Eliezer Yudkowsky, at his best, has leaps of genius nobody else can match. Fifteen years ago, he decided that the best way to something something AI safety was to write a Harry Potter fanfiction. Many people at the time (including me) gingerly suggested that maybe this was not optimal time management for someone who was approximately the only person working full-time on humanity’s most pressing problem. He totally demolished us and proved us wronger than anyone has ever been wrong before. Hundreds of thousands of people read Harry Potter and the Methods of Rationality, it got lavish positive reviews in Syfy, Vice, and The Atlantic, and it basically one-shotted a substantial percent of the world’s smartest STEM undergrads. Fifteen years later, I still meet bright young MIT students who tell me they’re working on AI safety, and when I ask them why in public they say something about their advisor, and then later in private they admit it was the fanfic. Valuing the time of the average AI genius at the rate set by Sam Altman (let alone Mark Zuckerberg), HPMOR probably bought Eliezer a few billion dollars in free labor. Just a totally inconceivable level of victory.
October 24, 2025 · Original source
Speaking of “nothing could be simpler”, I tried staring at the moon the night after I read this article. I was completely, absolutely unable to make myself think it looked anything like Ayatollah Khomeini. I worried that I didn’t have a clear enough memory of what Khomeini looked like, so I tried Donald Trump. Still no luck. I worried that it might be relevant that I didn’t like Donald Trump, so I tried Eliezer Yudkowsky. Still nothing.
January 02, 2026 · Original source
I’m not trying to push you in any direction, honest. If you get everything totally wrong, too bad, but you’ll still be remembered forever for trying. Even Pontius Pilate has immortality of a sort. Both Eliezer Yudkowsky and Beff Jezos have their page in the textbooks assured. If you’re a well-off Silicon Valley person, you’re already well-placed to join them. So participate in the discourse. Create some art. Donate to a cause you believe in. Make a prediction. Discover something interesting.
February 12, 2026 · Original source
Epoch/Croxton are current best estimates, and can probably fairly be read as the “real” answer against which Cotra and Davidson’s earlier guesses should be judged. All numbers are yearly multiples, so 1.4 means that willingness to spend grows 1.4x per year, ie 40%. Willingness To Spend: How much money are companies willing to spend on AI, in the form of chips and data centers? $/FLOP: How quickly do Moore’s Law, economies of scale, and other factors bring down the price of AI compute? Training Run Length: How long are companies spending on AI training runs for frontier models (instead of using those chips for smaller models, experiments, or consumer services)? Real Compute: The product of the three parameters above. Algorithmic Progress: How effectively do researchers discover new algorithms that make training AIs cheaper and more efficient? Total Effective Compute: The product of real compute and algorithmic progress. So for example, the Epoch column’s 10.7x means that in any given year, you can train an AI 10.7x better than the last year, because you have 3.6x more compute available, and that compute is 3.0x more efficient. Cotra and Davidson were pretty close on willingness to spend and on FLOPs/$. This is an impressive achievement; they more or less predicted the giant data center buildout of the past few years. They ignored training run length, which probably seemed like a reasonable simplification at the time. But they got killed on algorithmic progress, which was 200% per year instead of 30%. How did they get this one so wrong? Here’s Cotra’s section on algorithmic progress: Algorithmic progress forecasts Note: I have done very little research into algorithmic progress trends. Of the four main components of my model (2020 compute requirements, algorithmic progress, compute price trends, and spending on computation) I have spent the least time thinking about algorithmic progress.
I consider two types of algorithmic progress: relatively incremental and steady progress from iteratively improving architectures and learning algorithms, and the chance of “breakthrough” progress which brings the technical difficulty of training a transformative model down from “astronomically large” / “impossible” to “broadly feasible.” For incremental progress, the main source I used was Hernandez and Brown 2020, ”Measuring the Algorithmic Efficiency of Neural Networks”. The authors reimplemented open source state-of-the-art (SOTA) ImageNet models between 2012 and 2019 (six models in total). They trained each model up to the point that it achieved the same performance as AlexNet achieved in 2012, and recorded the total FLOP that required. They found that the SOTA model in 2019, EfficientNet B0, required ~44 times fewer training FLOP to achieve AlexNet performance than AlexNet did; the six data points fit a power law curve with the amount of computation required to match AlexNet halving every ~16 months over the seven years in the dataset.² They also show that linear programming displayed a similar trend over a longer period of time: when hardware is held fixed, the time in seconds taken to solve a standard basket of mixed integer programs by SOTA commercial software packages halved every ~13 months over the 21 years from 1996 to 2017.³ Grace 2013 (”Algorithmic Progress in Six Domains”) is the only other paper attempting to systematically quantify algorithmic progress that I am currently aware of, although I have not done a systematic literature review and may be missing others. I have chosen not to examine it in detail because a) it was written largely before the deep learning boom and mostly does not focus on ML tasks, and b) it is less straightforward to translate Grace’s results into the format that I am most interested in (”How has the amount of computation required to solve a fixed task decreased over time?”). 
Paul is familiar with the results, and he believes that algorithmic progress across the six domains studied in Grace 2013⁴ is consistent with a similar but slightly slower rate of progress, ranging from 13 to 36 months to halve the computation required to reach a fixed level of performance. Additionally, it seems plausible to me that both sets of results would overestimate the pace of algorithmic progress on a transformative task, because they are both focusing on relatively narrow problems with simple, well-defined benchmarks that large groups of researchers could directly optimize.⁵ Because no one has trained a transformative model yet, to the extent that the computation required to train one is falling over time, it would have to happen via proxies rather than researchers directly optimizing that metric (e.g. perhaps architectural innovations that improve training efficiency for image classifiers or language models would translate to a transformative model). Additionally, it may be that halving the amount of computation required to train a transformative model would require making progress on multiple partially-independent sub-problems (e.g. vision and language and motor control). I have attempted to take the Hernandez and Brown 2020 halving times (and Paul’s summary of the Grace 2013 halving times) as anchoring points and shade them upward to account for the considerations raised above. There is massive room for judgment in whether and how much to shade upward; I expect many readers will want to change my assumptions here, and some will believe it is more reasonable to shade downward. Cotra’s estimate comes primarily from one paper, Hernandez & Brown, which looks at algorithmic progress on a task called AlexNet. But later research demonstrated that the apparent speed of algorithmic progress varies by an order of magnitude based on whether you’re looking at an easy task (low-hanging fruit already picked) or a hard task (still lots of room to improve). 
AlexNet was an easy task, but pushing the frontier of AI is a hard task, so algorithmic progress in frontier AI has been faster than the AlexNet paper estimated.

In Cotra’s defense, she admitted that this was the area where she was least certain, and that she had rounded the progress rate down based on various considerations when other people might round it up based on various other considerations. But the sheer extent of the error here, compounded with a few smaller errors that unfortunately all shared the same direction, was enough to throw off the estimate entirely.

Since Cotra and Davidson were expecting AI to get 3.6x more effective compute each year, but it actually got 10.7x more, it’s no mystery why their timelines were off. When John recalculates Davidson’s model with Epoch’s numbers, he finds that it estimates AGI in 2030, which matches the current vibes.

IV.

With this information in place, it’s worth looking at some prominent contemporaneous critiques of Bio Anchors.

Various people criticized Bio Anchors’ many strange anchors for how much compute it would take to produce AGI. For example, one anchor estimated that it would take 10^45 FLOPs, because that was how many calculations happened in all the brains of all animals throughout evolutionary history (which eventually produced the human brain that AIs are trying to imitate). To make things even weirder, this anchor assumed away all animals other than nematodes as a rounding error (fact check: true!). All of these seemed to detract from the main show, an attempt to estimate the compute involved in the human brain. But even this more sober anchor was complicated by time horizons - it’s not enough to imitate the human brain for one second; AIs need to be able to imitate the human brain’s capacity for long-term planning. Cotra calculated how much compute AGI would require if it needed a planning horizon of seconds, weeks, or years.
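The growth-rate arithmetic in this passage can be checked with a short sketch. The halving times (16 and 13 months), the yearly multiples (3.6x, 3.0x, 10.7x), and the 30% figure come from the passage; the function names and the five-year horizon are illustrative assumptions, not anything from Cotra's or Epoch's models.

```python
import math


def yearly_multiple_from_halving(months_to_halve: float) -> float:
    """Yearly efficiency multiple implied by 'required compute halves every N months'."""
    return 2 ** (12 / months_to_halve)


def halving_months_from_multiple(yearly_multiple: float) -> float:
    """Inverse conversion: months needed to halve required compute at a given yearly multiple."""
    return 12 * math.log(2) / math.log(yearly_multiple)


def effective_compute_multiple(real_compute: float, algorithmic_progress: float) -> float:
    """Total effective compute is the product of real compute and algorithmic progress."""
    return real_compute * algorithmic_progress


def compounded_gap(actual: float, expected: float, years: int) -> float:
    """How far apart two yearly growth rates drift after compounding for `years` years."""
    return (actual / expected) ** years


if __name__ == "__main__":
    # Halving times quoted from Hernandez and Brown 2020 in the passage.
    print("16-month halving ->", round(yearly_multiple_from_halving(16), 2), "x per year")
    print("13-month halving ->", round(yearly_multiple_from_halving(13), 2), "x per year")

    # Yearly multiples quoted in the passage, converted back to halving times.
    print("1.3x per year ->", round(halving_months_from_multiple(1.3), 1), "months to halve")
    print("3.0x per year ->", round(halving_months_from_multiple(3.0), 1), "months to halve")

    # Epoch column: 3.6x real compute times 3.0x algorithmic progress.
    print("effective compute ->", round(effective_compute_multiple(3.6, 3.0), 1), "x per year")

    # Illustrative assumption: compare 10.7x vs 3.6x per year over five years.
    print("gap after 5 years ->", round(compounded_gap(10.7, 3.6, 5)), "x")
```

Under these figures, a 16-month halving time works out to roughly 1.68x per year, a 1.3x yearly multiple corresponds to halving about every 32 months, and 3.0x per year corresponds to halving about every 7.6 months. Multiplying the rounded 3.6x and 3.0x gives 10.8x, close to the quoted 10.7x; the small difference is presumably rounding in the inputs. Compounded over several years, the gap between 3.6x and 10.7x per year reaches two orders of magnitude, which is one way to see why the timelines diverged so sharply.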
Thanks to METR, we now know that existing AIs have already passed a point where they can do most tasks that take humans seconds, are moving through the hour range, and are just about to touch one day. So the “seconds” anchor is ruled out. But it also seems unlikely that AGI will require years, because most human projects don’t take years, or at least can be split into tasks that take less than one year each (intuition pump: are we sure the average employee stays at an AI lab for more than a year? If not, that proves that a chain of people with sub-one-year time horizons can do valuable work). The AI Futures team guessed that the time horizon necessary for AIs to really start serious recursive self-improvement was between a few weeks and a few months (though this might look like a totally different number on the METR graph, which doesn’t translate perfectly into real life). If this is true, then all three anchors (seconds, hours, years) were off by at least an order of magnitude. But it turns out that none of this matters very much. The highest and lowest anchors cancel out, so that the most plausible anchor - human brain with time horizon of hours to days - is around the average. If you remove all the other anchors and just keep that one, the model’s estimates barely change. But also, we’re talking about crossing twelve orders of magnitude here. The difference between the different time horizon anchors doesn’t register much on that level, compared to things like algorithmic progress which have exponential effects. Maybe this is the model basically working as intended. You try lots of different anchors, put more weight on the more plausible ones, take a weighted average of each of them, and hopefully get something close to the real value. Bio Anchors did. Or maybe it was just good luck. Still hard to tell. Eliezer Yudkowsky argued that the whole methodology was fundamentally flawed. 
Partly because of the argument above - he didn’t trust the anchors - but also partly because he expected the calculations to be obviated by some sort of paradigm shift that couldn’t be shoehorned into “algorithmic progress” (like how you couldn’t build an airplane in 1900 but you could in 1920).

As of 2026 - still before AGI has been invented and we get a good historical perspective - no such shift has occurred. The scaling laws have mostly held; whatever artificial space you try to measure models in, the measurement has mostly worked in a predictable way.

There have really only been two kinks in the history of AI so far. First, a kink in training run size around 2010:

[Figure: training run size over time, with a kink around 2010]

Second, a kink in time horizons around 2024 and the invention of test-time compute:

[Figure: time horizons over time, with a kink around 2024]

The 2010 kink was before Cotra’s forecast and priced in. The 2024 kink is interesting and relevant - but since it was on a parameter Cotra wasn’t measuring, and probably too small to show up on the orders-of-magnitude scale we’re talking about, it’s probably not a major cause of the model’s inaccuracy.

Other things have been even more predictable:

[Figure not preserved]

So Cotra’s bet on progress being smooth and measurable has mostly paid off so far. But Yudkowsky further explained that his timelines were shorter than Bio Anchors because people would be working hard to discover new paradigms, and if the current paradigm would only pay off in the 2050s, then probably they would discover one before then. You could think of this as a disjunction: timelines will be shorter than Cotra thinks, either because deep learning pays off quickly, or because a new paradigm gets invented in the interim. It turned out to be the first one. So although Yudkowsky’s new paradigm has yet to materialize, his disjunctive reasoning in favor of shorter-than-2050 timelines was basically on the mark.

Nostalgebraist argued that Cotra’s whole model was a wrapper for an assumption that Moore’s Law will continue indefinitely.
If it does, obviously you get enough compute for AI at some point, even if it requires some absurd process like simulating all 500 million years of multicellular evolution. I never entirely understood this objection, because - although Bio Anchors does depend on a story where Moore’s Law doesn’t break before we get the relevant amount of compute - this is only one of many background assumptions (like that a meteor doesn’t hit Earth before we get the relevant amount of compute). Given those assumptions, it does a useful not-just-assumption-repeating job of calculating when transformative AI will happen. As Cotra implicitly predicted, we seem on track to get AGI before Moore’s Law breaks down, and so Moore’s Law didn’t end up mattering very much. And if all of Cotra’s non-Moore’s-Law parameter estimates had been correct, her model would have given about the same timelines we have now, and surprised everyone with a revolutionary claim about the AI future. But Nostalgebraist added, almost as an aside: Cotra has a whole other forecast I didn’t mention for “algorithmic progress,” and the last number is what you get from just algorithmic progress and no Moore’s Law. So depending on how much you trust that forecast, you might want to take all these numbers with an even bigger grain of salt than you’d expected from everything else we’ve seen. How much should you trust Cotra’s algorithmic progress forecast? She writes: “I have done very little research into algorithmic progress trends. Of the four main components of my model (2020 compute requirements, algorithmic progress, compute price trends, and spending on computation) I have spent the least time thinking about algorithmic progress.” ...and bases the forecast on one paper about ImageNet classifiers. I want to be clear that when I quote these parts about Cotra not spending much time on something, I’m not trying to make fun of her. It’s good to be transparent about this kind of thing! I wish more people would do that. 
My complaint is not that she tells us what she spent time on, it’s that she spent time on the wrong things.

Like Cotra herself, I think Nostalgebraist was spiritually correct even if his bottom line (about Moore’s Law) was wrong. His meta-level point was that a seemingly complicated model could actually hinge on one or two parameters, and that many of Cotra’s parameter values were vague hand-wavey best guess estimates. He gave algorithmic progress as a secondary example of this to shore up his Moore’s Law case, but in fact it turned out to be where all the action was.

V.

Those were the rare good critiques. The bad critiques were the same ones everyone in this space gets: You’re just trying to build hype.
These questions have no right answer, but one conclusion does seem pretty firm. Most of the bad-faith critics, having identified that Ajeya’s model was imperfect and could fail, defaulted to the Safe Uncertainty Fallacy - since we can never be sure a model is exactly right, things are uncertain, which means we can continue to believe everything is fine and normal and timelines are wrong and we don’t have to worry. But as Yudkowsky pointed out, there’s uncertainty on both sides! Sometimes the fact that a forecast is imperfect and you can never be certain means things are more dangerous than you thought!
March 25, 2026 · Original source
Eliezer Yudkowsky, the most famous pause proponent, writes in his book that: