Eliezer
Article
Eliezer is a recurring person in the Astral Codex Ten archive, appearing 27 times across 27 issues between August 26, 2021 and December 22, 2025. The archive places the name in contexts such as “Eliezer brings this up as part of his project of teaching rationality”; “including essays by me, Zvi, Eliezer”; and “Eliezer writes, without much pushback from Richard”. The name most often appears alongside Eliezer Yudkowsky, OpenAI, and Google.
Metadata
- Category: People
- Mention count: 27
- Issue count: 27
- First seen: August 26, 2021
- Last seen: December 22, 2025
Appears In
- Highlights From The Comments On Missing School
- Open Thread 205
- Practically-A-Book Review: Yudkowsky Contra Ngo On Agents
- Motivated Reasoning As Mis-applied Reinforcement Learning
- Why Do I Suck?
- ACX Grants ++: The Second Half
- Biological Anchors: A Trick That Might Or Might Not Work
- Yudkowsky Contra Christiano On AI Takeoff Speeds
- Deceptively Aligned Mesa-Optimizers: It’s Not Funny If I Have To Explain It
- CHAI, Assistance Games, And Fully-Updated Deference
- Open Thread 250
- Half An Hour Before Dawn In San Francisco
- Turing Test
- MR Tries The Safe Uncertainty Fallacy
- Links For April 2023
- Links For May 2023
- Your Book Review: The Educated Mind
- Links For August 2023
- 23
- Links For February 2025
- Open Thread 383
- Bayes For Everyone
- Links For September 2025
- Book Review: If Anyone Builds It, Everyone Dies
- Open Thread 399
- Links For October 2025
- Open Thread 413
Related Pages
- Eliezer Yudkowsky (11 shared issues)
- OpenAI (8 shared issues)
- Google (7 shared issues)
- Richard Hanania (7 shared issues)
- Scott (7 shared issues)
- Gwern (6 shared issues)
- AGI (5 shared issues)
- Anthropic (5 shared issues)
- Harvard (5 shared issues)
- India (5 shared issues)
- Less Wrong (5 shared issues)
- Metaculus (5 shared issues)
Source Context
Recovered passages from the original issue text. When the raw archive preserved outbound links inside the source passage, they are listed directly under the quote.
Eliezer Yudkowsky tells a parable about a society where people hit themselves on the head with a baseball bat eight hours a day for some reason. Maybe they believe it drives out demons or something. Then they learn that it does not, in fact, drive out demons. But everyone has great reasons why they need to keep doing it.
Inline links: tells a parable
Eliezer brings this up as part of his project of teaching rationality, and it’s a great example. What do you do in a world where people can easily generate superficially-plausible reasons for hitting your head with a bat for eight hours? Abandon reason entirely? But then you’re left with social convention, which in this case is hitting your head with a bat. Some kind of really rigorous cost-benefit analysis? I don’t want to say this is impossible, but it would be pretty hard, and I would hate for anything important to hinge on getting it right.
2: Looking for a Chri…fine, sorry, looking for a Martin Luther King Day gift this year for the rationalist in your life? Engines Of Cognition is a Best Of Less Wrong 2019 book collection out now including essays by me, Zvi, Eliezer, and 30+ other writers. Yes, all the art is AI-generated; it seemed appropriate.
Inline links: Engines Of Cognition
Eliezer Yudkowsky, one of the original weird transhumanists, is having none of this. He says the problem is harder than everyone else thinks. Their clever solutions will fail. He's been flitting around for the past few years, Cassandra-like, insisting that their plans will explode and they are doomed.
He admits he's failed most of his persuasion rolls. When he succeeds, it barely helps. He analogizes his quest to arguing against perpetual motion machine inventors. Approach the topic on too shallow a level, and they're likely to respond to criticism by tweaking their designs. Fine, you've debunked that particular scheme, better add a few more pulleys and a waterwheel or two. Eliezer thinks that's the level on which mainstream AI safety has incorporated his criticisms. He would prefer they take a step back, reconsider everything, and maybe panic a little.
I've been trying to trudge through them and I figure I might as well blog about the ones I've finished. The first of these is Eliezer's talk with Richard Ngo, of OpenAI's Futures team. You can find the full transcript here, though be warned: it is very long.
Inline links: here
This question - why does the brain so often confuse what is true vs. what I want to be true? - has been bothering me for years. I think this explanation is obvious, almost tautological. I get the impression that Eliezer and Roko have both known it for ages, but it was new to me. If there’s other research on which parts of the brain are / aren’t reinforceable, or how to run your thoughts on one kind of architecture vs. the other, please let me know.
It still is! But in the same sense that I was clearing a personal backlog of unwritten-up ideas, the rationalist community was clearing a backlog of scientific and philosophical ideas sitting in journals or obscure old books that it turned out were really interesting to a lot of people. The early Internet provided a critical mass where people interested in cognition and math and the future could suddenly all share the parts of the puzzle they knew about with each other and make rapid progress. Eliezer Yudkowsky, Robin Hanson, Nick Bostrom, and other intellectuals all had their own backlog of stuff which had probably been published in journals or something but which the wider world had yet to appreciate. I was the biggest-name blogger who was sitting around listening to them talk about it, so I got access to a stream of amazing content that most people didn’t know about.
You could argue this represents a failure on my part: the zeitgeist has caught up to what I knew in 2015, but I haven’t learned new things to keep me ahead of the zeitgeist. Seems plausible. Half of what I know, I know from the Less Wrong Sequences; the other half, from a basic medical school education. But nobody else explains things quite like Eliezer, and I’m sure as heck not going back to med school.
#103: Games To Fight Cognitive Bias Decades of widespread awareness of cognitive bias, and several impressive projects to help people overcome them, does not seem to have led to any population-level improvement in the fundamental problem. Many specific biases and irrational mindsets probably take hold during school years, and likely in school. But give kids games, preferably ones they can play against each other, that take rationality to win, and they'll have powerful incentives. Show that if they can avoid anchoring they'll come closest to guessing a number. Play with and not against Monty Hall and over time, accrue the winnings. Overcome loss aversion and dominate gamified markets. Bring together a rationalist experienced in turning these biases into stories (Eliezer, Julia) with a videogame maker who'd enjoy the virtuous side project and you could have the most useful and fun educational game around. rationalismgame@protonmail.com
SS Great Eastern, the extreme outlier large steamship from 1858. This has become sort of a mascot for quantitative technological progress forecasters.
What is this scientist’s error? The big one is thinking that spaceship progress depends on some easily-measured quantity (size) instead of on fundamental advances (eg figuring out how rockets work). You can make the same accusation against Ajeya et al: you can have all the FLOPs in the world, but if you don’t understand how to make a machine think, your AI will be, well, a flop.
Ajeya discusses this a bit on page 143 of her report. There is some sense in which FLOPs and knowing-what-you’re-doing trade off against each other. If you have literally no idea what you’re doing, you can sort of kind of re-run evolution until it comes up with something that looks good. If things are somehow even worse than that, you could always run AIXI, a hypothetical AI design guaranteed to get excellent results as long as you have infinite computation. You could run a Go engine by searching the entire branching tree structure of Go - you shouldn’t, and it would take a zillion times more compute than exists in the entire world, but you could.
So in some sense what you’re doing, when you’re figuring out what you’re doing, is coming up with ways to do already-possible things more efficiently. But that’s just algorithmic progress, which Ajeya has already baked into her model.
(our Victorian scientist: “As a reductio ad absurdum, you could always stand the ship on its end, and then climb up it to reach space. We’re just trying to make ships that are more efficient than that.”)
Part II: Biology-Inspired AI Timelines: The Trick That Never Works
Eliezer Yudkowsky presents a more subtle version of these kinds of objection in an essay called Biology-Inspired AI Timelines: The Trick That Never Works, published December 2021.
Ajeya’s report is a 169-page collection of equations, graphs, and modeling assumptions.
Yudkowsky’s rebuttal is a fictional dialogue between himself, younger versions of himself, famous AI scientists, and other bit players. At one point, a character called “Humbali” shows up begging Yudkowsky to be more humble, and Yudkowsky defeats him with devastating counterarguments. Still, he did found the field, so I guess everyone has to listen to him. He starts: in 1988, famous AI scientist Hans Moravec predicted human-level AI by 2010. He was using the same methodology as Ajeya: extrapolate how quickly processing power would grow (in FLOP/S), and see when it would match some estimate of the human brain. Moravec got the processing power almost exactly right (it hit his 2010 projection in 2008) and his human brain estimate pretty close (he says 10^13 FLOP/S, Ajeya says 10^15, this 2 OOM difference only delays things a few years), yet there was not human-level AI in 2010. What happened? Ajeya's answer could be: Moravec didn't realize that, in the modern ML paradigm, any given size of program requires a much bigger program to train. Ajeya, who has a 35-year advantage on Moravec, estimates approximately the same power for the finished program (10^16 vs. 10^13 FLOP/S) but says that training the 10^16 FLOP/S program will require 10^33ish FLOPs. Eliezer agrees as far as it goes, but says this points to a much deeper failure mode, which was that Moravec had no idea what he was doing. He was assuming processing power of human brain = processing power of computer necessary for AGI. Why? The human brain consumes around 20 watts of power. Can we thereby conclude that an AGI should consume around 20 watts of power, and that, when technology advances to the point of being able to supply around 20 watts of power to computers, we'll get AGI? […] You say that AIs consume energy in a very different way from brains? Well, they'll also consume computations in a very different way from brains! 
The only difference between these two cases is that you know something about how humans eat food and break it down in their stomachs and convert it into ATP that gets consumed by neurons to pump ions back out of dendrites and axons, while computer chips consume electricity whose flow gets interrupted by transistors to transmit information. Since you know anything whatsoever about how AGIs and humans consume energy, you can see that the consumption is so vastly different as to obviate all comparisons entirely. You are ignorant of how the brain consumes computation, you are ignorant of how the first AGIs built would consume computation, but "an unknown key does not open an unknown lock" and these two ignorant distributions should not assert much internal correlation between them. Cars don’t move by contracting their leg muscles and planes don’t fly by flapping their wings like birds. Telescopes do form images the same way as the lenses in our eyes, but differ by so many orders of magnitude in every important way that they defy comparison. Why should AI be different? You have to use some specific algorithm when you’re creating AI; why should we expect it to be anywhere near the same efficiency as the ones Nature uses in our brains? The same is true for arguments from evolution, eg Ajeya’s Evolutionary Anchor, ie “it took evolution 10^43 FLOPs of computation to evolve the human brain so maybe that will be the training cost”. AI scientists sitting in labs trying to figure things out, and nematodes getting eaten by other nematodes, are such different methods for designing things that it’s crazy to use one as an estimate for the other. Algorithmic Progress vs. Algorithmic Paradigm Shifts This post is a dialogue, so (Eliezer’s hypothetical model of) OpenPhil gets a chance to respond. They object: this is why we put a term for algorithmic progress in our model. The model isn’t very sensitive to changes in that term. 
If you want you can set it to some kind of crazy high value and see what happens, but you can’t say we didn’t consider it. OpenPhil: We did already consider that and try to take it into account: our model already includes a parameter for how algorithmic progress reduces hardware requirements. It's not easy to graph as exactly as Moore's Law, as you say, but our best-guess estimate is that compute costs halve every 2-3 years […] Eliezer: The makers of AGI aren't going to be doing 10,000,000,000,000 rounds of gradient descent, on entire brain-sized 300,000,000,000,000-parameter models, algorithmically faster than today. They're going to get to AGI via some route that you don't know how to take, at least if it happens in 2040. If it happens in 2025, it may be via a route that some modern researchers do know how to take, but in this case, of course, your model was also wrong. They're not going to be taking your default-imagined approach algorithmically faster, they're going to be taking an algorithmically different approach that eats computing power in a different way than you imagine it being consumed. OpenPhil: Shouldn't that just be folded into our estimate of how the computation required to accomplish a fixed task decreases by half every 2-3 years due to better algorithms? Eliezer: Backtesting this viewpoint on the previous history of computer science, it seems to me to assert that it should be possible to: Train a pre-Transformer RNN/CNN-based model, not using any other techniques invented after 2017, to GPT-2 levels of performance, using only around 2x as much compute as GPT-2;
Inline links: AIXI, Biology-Inspired AI Timelines: The Trick That Never Works
Play pro-level Go using 8-16 times as much computing power as AlphaGo, but only 2006 levels of technology. For reference, recall that in 2006, Hinton and Salakhutdinov were just starting to publish that, by training multiple layers of Restricted Boltzmann machines and then unrolling them into a "deep" neural network, you could get an initialization for the network weights that would avoid the problem of vanishing and exploding gradients and activations. At least so long as you didn't try to stack too many layers, like a dozen layers or something ridiculous like that. This being the point that kicked off the entire deep-learning revolution. Your model apparently suggests that we have gotten around 50 times more efficient at turning computation into intelligence since that time; so, we should be able to replicate any modern feat of deep learning performed in 2021, using techniques from before deep learning and around fifty times as much computing power. OpenPhil: No, that's totally not what our viewpoint says when you backfit it to past reality. Our model does a great job of retrodicting past reality. Eliezer: How so? OpenPhil: <Eliezer cannot predict what they will say here.> I think the argument here is that OpenPhil is accounting for normal scientific progress in algorithms, but not for paradigm shifts. Directional Error These are the two arguments Eliezer makes against OpenPhil that I find most persuasive. First, that you shouldn’t be using biological anchors at all. Second, that unpredictable paradigm shifts are more realistic than gradual algorithmic progress. These mostly add uncertainty to OpenPhil’s model, but Eliezer ends his essay making a stronger argument: he thinks OpenPhil is directionally wrong, and AI will come earlier than they think. Mostly this is the paradigm argument again. Five years from now, there could be a paradigm shift that makes AI much easier to build. 
It’s happened before; from GOFAI’s pre-programmed logical rules to Deep Blue’s tree searches to the sorts of Big Data methods that won the Netflix Prize to modern deep learning. Instead of just extrapolating deep learning scaling thirty years out, OpenPhil should be worried about the next big idea. Hypothetical OpenPhil retorts that this is a double-edged sword. Maybe the deep learning paradigm can’t produce AGI, and we’ll have to wait decades or centuries for someone to have the right insight. Or maybe the new paradigm you need for AGI will take more compute than deep learning, in the same way deep learning takes more compute than whatever Moravec was imagining. This is a pretty strong response, since it would have been true for every previous forecaster: remember, Moravec erred in thinking AI would come too soon, not too late. So although Eliezer is taking the cheap shot of saying OpenPhil’s estimate will be wrong just as everyone else’s was wrong before, he’s also giving himself the much harder case of arguing it might be wrong in the opposite direction as all its predecessors. Eliezer takes this objection seriously, but feels like on balance probably new paradigms will speed up AI rather than slow it down. Here he grudgingly and with suitable embarrassment does try to make an object-level semi-biological-anchors-related argument: Moravec was wrong because he ignored the training phase. And the proper anchor for the training phase is somewhere between evolution and a human childhood, where evolution represents “blind chance eventually finding good things” and human childhood represents “an intelligent cognitive engine trying to squeeze as much data out of experience as possible”. And part of what he expects paradigm shifts to do is to move from more evolutionary processes to more childhood-like processes, and that’s a net gain in efficiency. 
So he still thinks OpenPhil’s methods are more likely to overestimate the amount of time until AGI rather than underestimate it. What Moore’s Law Giveth, Platt’s Law Taketh Away Eliezer’s other argument is kind of a low blow: he refers to Platt’s Law Of AI Forecasting: “any AI forecast will put strong AI thirty years out from when the forecast is made.” This isn’t exact. Hans Moravec, writing in 1988, said 2010 - so 22 years. Ray Kurzweil, writing in 2001, said 2023 - another 22 years. Vernor Vinge, in a 1993 speech, said 2023, and that was exactly 30 years, but Vinge knew about Platt’s Law and might have been joking. The point is: OpenPhil wrote a report in 2020 that predicted strong AI in 2052, isn’t that kind of suspicious? I’d previously mentioned it as a plus that Ajeya got around the same year everyone else got. The forecasters on Metaculus. The experts surveyed in Grace et al. Lots of other smart experts with clever models. But what if all of these experts and models and analyses are just fudging the numbers for the same Platt’s-Law-related reasons? Hypothetical OpenPhil is BTFO: OpenPhil: That part about Charles Platt's generalization is interesting, but just because we unwittingly chose literally exactly the median that Platt predicted people would always choose in consistent error, that doesn't justify dismissing our work, right? We could have used a completely valid method of estimation which would have pointed to 2050 no matter which year it was tried in, and, by sheer coincidence, have first written that up in 2020. In fact, we try to show in the report that the same methodology, evaluated in earlier years, would also have pointed to around 2050 - Eliezer: Look, people keep trying this. It's never worked. It's never going to work. 2 years before the end of the world, there'll be another published biologically inspired estimate showing that AGI is 30 years away and it will be exactly as informative then as it is now. 
I'd love to know the timelines too, but you're not going to get the answer you want until right before the end of the world, and maybe not even then unless you're paying very close attention. Timing this stuff is just plain hard. Part III: Responses And Commentary Response 1: Less Wrong Comments Less Wrong is a site founded by Eliezer Yudkowsky for Eliezer Yudkowsky fans who wanted to discuss Eliezer Yudkowsky’s ideas. So, for whatever it’s worth - the comments on his essay were pretty negative. Carl Shulman, an independent researcher with links to both OpenPhil and MIRI (Eliezer’s org), writes the top-voted comment. He works from a model where there is hardware progress, software progress downstream of hardware progress, and independent (ie unrelated to algorithms) software progress, and where the first two make up most progress on the margin. Researchers generally develop new paradigms once they have enough compute available to tinker with them. Progress in AI has largely been a function of increasing compute, human software research efforts, and serial time/steps. Throwing more compute at researchers has improved performance both directly and indirectly (e.g. by enabling more experiments, refining evaluation functions in chess, training neural networks, or making algorithms that work best with large compute more attractive). Historically compute has grown by many orders of magnitude, while human labor applied to AI and supporting software by only a few. And on plausible decompositions of progress (allowing for adjustment of software to current hardware and vice versa), hardware growth accounts for more of the progress over time than human labor input growth. 
So if you're going to use an AI production function for tech forecasting based on inputs (which do relatively OK by the standards of tech forecasting), it's best to use all of compute, labor, and time, but it makes sense for compute to have pride of place and take in more modeling effort and attention, since it's the biggest source of change (particularly when including software gains downstream of hardware technology and expenditures). […]
A perfectly correlated time series of compute and labor would not let us say which had the larger marginal contribution, but we have resources to get at that, which I was referring to with 'plausible decompositions.' This includes experiments with old and new software and hardware, like the chess ones Paul recently commissioned, and studies by AI Impacts, OpenAI, and Neil Thompson. There are AI scaling experiments, and observations of the results of shocks like the end of Dennard scaling, the availability of GPGPU computing, and Besiroglu's data on the relative predictive power of compute and labor in individual papers and subfields. In different ways those tend to put hardware as driving more log improvement than software (with both contributing), particularly if we consider software innovations downstream of hardware changes.
Vanessa Kosoy makes the obvious objection, which echoes a comment of Eliezer’s in the dialogue above:
I'm confused how can this pass some obvious tests. For example, do you claim that alpha-beta pruning can match AlphaGo given some not-crazy advantage in compute? Do you claim that SVMs can do SOTA image classification with not-crazy advantage in compute (or with any amount of compute with the same training data)? Can Eliza-style chatbots compete with GPT3 however we scale them up?
Mark Xu answers:
My model is something like: For any given algorithm, e.g. SVMs, AlphaGo, alpha-beta pruning, convnets, etc., there is an "effective compute regime" where dumping more compute makes them better.
If you go above this regime, you get steep diminishing marginal returns.
Inline links: normal scientific progress in algorithms, but not for paradigm shifts, Platt’s Law Of AI Forecasting, the comments, Paul recently commissioned, AI Impacts, OpenAI, Neil Thompson, Besiroglu's, Vanessa Kosoy, Mark Xu
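The backtest Eliezer runs in the passage above is simple exponent arithmetic, and a rough sketch may help check the numbers he quotes. It assumes a smooth halving of required compute every 2 to 3 years (the figure attributed to OpenPhil in the dialogue). The function names and the release years for GPT-2 (2019) and AlphaGo (2016) are choices made for this illustration, not part of the archived text.

```python
import math


def efficiency_gain(years: float, halving_time_years: float) -> float:
    """Factor by which the compute for a fixed task shrinks after `years`,
    assuming it halves once every `halving_time_years` (smooth-progress model)."""
    return 2.0 ** (years / halving_time_years)


def implied_halving_time(years: float, gain: float) -> float:
    """Halving time that would produce a total `gain` over `years`."""
    return years / math.log2(gain)


# (label, elapsed years); the end years are assumptions of this sketch.
cases = [
    ("2017 techniques vs GPT-2 (2019)", 2019 - 2017),
    ("2006 techniques vs AlphaGo (2016)", 2016 - 2006),
    ("2006 techniques vs 2021 deep learning", 2021 - 2006),
]

for label, years in cases:
    slow = efficiency_gain(years, 3.0)  # halving every 3 years
    fast = efficiency_gain(years, 2.0)  # halving every 2 years
    print(f"{label}: {slow:.1f}x to {fast:.1f}x extra compute")

# The "around 50 times" figure over 2006-2021 corresponds to this halving time:
print(f"{implied_halving_time(15, 50):.2f} years")
```

Under these assumptions the "around 2x" and "around fifty times" figures both fall inside the 2 to 3 year halving range, which is the point of the backtest: the smooth model says old techniques plus modest extra compute should match modern results, and Eliezer's objection is that this looks implausible.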
I've been trying to review and summarize Eliezer Yudkowsky's recent dialogues on AI safety. Previously in sequence: Yudkowsky Contra Ngo On Agents. Now we’re up to Yudkowsky contra Cotra on biological anchors, but before we get there we need to figure out what Cotra's talking about and what's going on.
Inline links: Yudkowsky Contra Ngo On Agents
In 2008, thousands of blog readers - including yours truly, who had discovered the rationality community just a few months before - watched Robin Hanson debate Eliezer Yudkowsky on the future of AI.
Inline links: Robin Hanson debate Eliezer Yudkowsky
Eliezer thought it would be lightning-fast. Once researchers started building human-like AIs, some combination of adding more compute, and the new capabilities provided by the AIs themselves, would quickly catapult AI to unimaginably superintelligent levels. The whole process could take between a few hours and a few years, depending on what point you measured from, but it wouldn’t take decades.
You can imagine the graph above as being GDP over time, except that Eliezer thinks AI will probably destroy the world, which might be bad for GDP in some sense. If you come up with some way to measure (in dollars) whatever kind of crazy technologies AIs create for their own purposes after wiping out humanity, then the GDP framing will probably work fine.
Prosaic alignment is hard… “Prosaic alignment” (see this article for more) means alignment of normal AIs like the ones we use today. For a while, people thought those AIs couldn’t reach dangerous levels, and that AIs that reached dangerous levels would have so many exotic new discoveries that we couldn’t even begin to speculate on what they would be like or how to align them. After GPT-2, DALL-E, and the rest, alignment researchers got more concerned that AIs kind of like current models could be dangerous. Prosaic alignment - trying to align AIs like the ones we have now - has become the dominant (though not unchallenged) paradigm in alignment research. “Prosaic” doesn’t necessarily mean the AI cannot write poetry; see Gwern’s AI generated poetry for examples. … because OOD behavior is unpredictable “OOD” stands for “out of distribution”. All AIs are trained in a certain environment. Then they get deployed in some other environment. If it’s like the training environment, presumably their training is pretty relevant and helpful. If it’s not like the training environment, anything can happen. Returning to our stock example, the “training environment” where evolution designed humans didn’t involve contraceptives. In that environment, the base optimizer’s goal (pass on genes) and the mesa-optimizer’s goal (get genital friction) were very well-aligned - doing one often led to the other - so there wasn’t much pressure on evolution to look for a better proxy. Then 1957, boom, the FDA approves the oral contraceptive pill, and suddenly the deployment environment looks really really different from the training environment and the proxy collapses so humiliatingly that people start doing crazy things like electing Viktor Orban prime minister. So: suppose we train a robot to pick strawberries. We let it flail around in a strawberry patch, and reinforce it whenever strawberries end up in a bucket. Eventually it learns to pick strawberries very well indeed. 
But maybe all the training was done on a sunny day. And maybe what it actually learned was to identify the metal bucket by the way it gleamed in the sunlight. Later we ask it to pick strawberries in the evening, where a local streetlight is the brightest thing around, and it throws the strawberries at the streetlight instead. So fine. We train it in a variety of different lighting conditions, until we’re sure that, no matter what the lighting situation, the strawberries go in the bucket. Then one day someone with a big bulbous red nose wanders on to the field, and the robot tears his nose off and pulls it into the bucket. If only there had been someone with a nose that big and red in the training distribution, so we could have told it not to do that! The point is, just because it’s learned “strawberries into bucket” in one environment, doesn’t mean it’s safe or effective in another. And we can never be sure we’ve caught all the ways the environment can vary. …and deception is more dangerous than Goodharting. To “Goodhart” is to take advantage of Goodhart’s Law: to follow the letter of your reward function, rather than the spirit. The ordinary-life equivalent is “teaching to the test”. The system’s programmers (eg the Department of Education) have an objective (children should learn). They delegate that objective to mesa-optimizers (the teachers) via a proxy objective (children should do well on the standardized test) and a correlated reward function (teachers get paid more if their students get higher test scores). The teachers can either pursue the base objective for less reward (teach children useful skills), or pursue their mesa-level objective for more reward (teach them how to do well on the test). An alignment failure! This sucks, but it’s a bounded problem. We already know that some teachers teach to the test, and the Department of Education has accepted this as a reasonable cost of having the incentive system at all. 
We might imagine our strawberry-picker cutting strawberries into little pieces, so that it counts as having picked more strawberries. Again, it sucks, but once a programmer notices it can be fixed pretty quickly (as long as the AI is still weak and under control). What about deception? Suppose the strawberry-picker happens to land on some goal function other than the intended one. Maybe, as before, it wants to toss strawberries at light sources, in a way that works when the nearest light source is a metal bucket, but fails when it’s a streetlight. Our programmers are (somewhat) smart and careful, so during training, they test it at night, next to a streetlight. What happens? If it’s just a dumb collection of reflexes trained by gradient descent, it throws the strawberry at the streetlight and this is easily caught and fixed. If it’s a very smart mesa-optimizer, it might think “If I throw the strawberry at the streetlight, I will be caught and trained to have different goals. This totally fails to achieve my goal of having strawberries near light sources. So throwing the strawberry at the light source this time, in the training environment, fails to achieve my overall goal of having strawberries thrown at light sources in general. I’ll do what the humans want - put the strawberry in the bucket - for now.” So it puts the strawberry in the bucket and doesn’t get caught. Then, as soon as the humans stop looking, it throws strawberries at the streetlight again. Deception is more dangerous than Goodharting because Goodharting will get caught and trained away, and deception might not. I might not be explaining this well, see also Deceptively Aligned Mesa-Optimizers? 
It’s More Likely Than You Think: We prevent OOD behavior by detecting OOD and obtaining more human labels when we detect it… If you’re (somewhat) careful, you can run your strawberry-picking AI at night, see it throw strawberries at streetlights, and train it out of this behavior (ie have a human programmer label it “bad” so the AI gradient-descends away from it) …and we eliminate the incentive for deception by ensuring that the base optimizer is myopic A myopic optimizer is one that reinforces programs based only on their performance within a short time horizon. So for example, the outside gradient descent loop might grade a strawberry picker only on how well it did picking strawberries for the first hour it was deployed. If this worked perfectly, it would create an optimizer with a short time horizon. When it considered deceiving its programmers in order to get a payoff a few days later when they stopped watching it, it wouldn’t bother, since a few days later is outside the time horizon. …and implements a decision theory incapable of acausal trade. You don’t want to know about this one, really. Just pretend it never mentioned this, sorry for the inconvenience. There are deceptively-aligned non-myopic mesa-optimizers even for a myopic base objective. Even if the base optimizer is myopic, the mesa-optimizer might not be. Evolution designed humans myopically, in the sense that we live some number of years, and nothing that happens after that can reward or punish us further. But we still “build for posterity” anyway, presumably as a spandrel of having working planning software at all. Infinite optimization power might be able to evolve this out of us, but infinite optimization power could do lots of stuff, and real evolution remains stubbornly finite. Maybe it would be helpful if we could make the mesa-optimizer itself myopic (though this would severely limit its utility). But so far there is no way to make a mesa-optimizer anything. 
You just run the gradient descent and cross your fingers. The most likely outcome: you run myopic gradient descent to create a strawberry picker. It creates a mesa-optimizer with some kind of proxy goal which corresponds very well to strawberry picking in the training optimization, like flinging red things at lights (realistically it will be weirder and more exotic than this). The mesa-optimizer is not incentivized to think about anything more than an hour out, but does so anyway, for the same reason I’m not incentivized to speculate about the far future but I’m doing so anyway. While speculating about the far future, it realizes that failing to pick strawberries correctly now will thwart its goal of throwing red things at light sources later. It picks strawberries correctly in the training distribution, and then, when training is over and nobody is watching, throws strawberries at streetlights. (Then it realizes it could throw lots more red things at light sources if it was more powerful, achieves superintelligence somehow, and converts the mass of the Earth into red things it can throw at the sun. The end.) III. You’re still here? But we already finished explaining the meme! Okay, fine. Is any of this relevant to the real world? As far as we know, there are no existing full mesa-optimizers. AlphaGo is kind of a mesa-optimizer. You could approximate it as a gradient descent loop creating a good-Go-move optimizer. But this would only be an approximation: DeepMind hard-coded some parts of AlphaGo, then gradient-descended other parts. Its objective function is “win games of Go”, which is hard-coded and pretty clear. Whether or not you choose to call it a mesa-optimizer, it’s not a very scary one. Will we get scary mesa-optimizers in the future? This ties into one of the longest-running debates in AI alignment - see eg my review of Reframing Superintelligence, or the Eliezer Yudkowsky/Richard Ngo dialogue. 
Optimists say: “Since a goal-seeking AI might kill everyone, I would simply not create one”. They speculate about mechanical/instinctual superintelligences that would be comparatively easy to align, and might help us figure out how to deal with their scarier cousins. But the mesa-optimizer literature argues: we have limited to no control over what kind of AIs we get. We can hope and pray for mechanical instinctual AIs all we want. We can avoid specifically designing goal-seeking AIs. But really, all we’re doing here is setting up a gradient descent loop and pressing ‘go’. Then the loop evolves whatever kind of AI best minimizes our loss function. Will that be a mesa-optimizer? Well, I benefit from considering my actions and then choosing the one that best achieves my goal. Do you benefit from this? It sure does seem like this helps in a broad class of situations. So it would be surprising if planning agents weren’t an effective AI design. And if they are, we should expect gradient descent to stumble across them eventually. This is the scenario that a lot of AI alignment research focuses on. When we create the first true planning agent - on purpose or by accident - the process will probably start with us running a gradient descent loop with some objective function. That will produce a mesa-optimizer with some other, potentially different, objective function. Making sure you actually like the objective function that you gave the original gradient descent loop on purpose is called outer alignment. Carrying that objective function over to the mesa-optimizer you actually get is called inner alignment. Outer alignment problems tend to sound like Sorcerer’s Apprentice. We tell the AI to pick strawberries, but we forgot to include caveats and stop signals. The AI becomes superintelligent and converts the whole world into strawberries so it can pick as many as possible. 
Inner alignment problems tend to sound like the AI tiling the universe with some crazy thing which, to humans, might not look like picking strawberries at all, even though in the AI’s exotic ontology it served as some useful proxy for strawberries in the training distribution. My stand-in for this is “converts the whole world into red things and throws them into the sun”, but whatever the AI that kills us really does will probably be weirder than that. They’re not ironic Sorcerer’s Apprentice-style comeuppance. They’re just “what?” If you wrote a book about a wizard who created a strawberry-picking golem, and it converted the entire earth into ferrous microspheres and hurled them into the sun, it wouldn’t become iconic the way Sorcerer’s Apprentice did. Inner alignment problems happen “first”, so we won’t even make it to the good-story outer alignment kind unless we solve a lot of issues we don’t currently know how to solve. For more information, you can read: Rob Miles’ video above, direct link here, channel here.
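As a rough illustration of the myopic base objective discussed earlier in this passage, the sketch below grades a hypothetical strawberry picker only on the reward it earns inside a fixed horizon. Everything here is an assumption made for illustration: the function name `myopic_score`, the hourly reward lists, and the one-hour horizon do not come from the essay or from any real training setup.

```python
from typing import Sequence


def myopic_score(rewards_per_hour: Sequence[float], horizon_hours: int = 1) -> float:
    """Sum only the rewards earned inside the grading horizon."""
    return sum(rewards_per_hour[:horizon_hours])


# Hypothetical reward per hour of deployment (made-up numbers).
honest_picker = [10.0, 10.0, 10.0, 10.0]
# A hypothetical deceptive policy: identical while graded, useless afterwards.
deceptive_picker = [10.0, 0.0, 0.0, 0.0]

honest = myopic_score(honest_picker)
deceptive = myopic_score(deceptive_picker)

# A grader with a longer horizon would see the difference.
honest_full = myopic_score(honest_picker, horizon_hours=4)
deceptive_full = myopic_score(deceptive_picker, horizon_hours=4)
```

With a one-hour horizon the two policies receive the same score, so this toy grader cannot tell them apart. That is only a picture of the passage's point that a myopic base objective does not by itself make the resulting mesa-optimizer myopic.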
Inline links: this article, Gwern’s AI generated poetry, electing Viktor Orban prime minister, Goodhart’s Law, Deceptively Aligned Mesa-Optimizers? It’s More Likely Than You Think, my review of, Eliezer Yudkowsky/Richard Ngo dialogue, here, here
Our goal here is to popularize obscure and hard-to-understand areas of AI alignment, and surely this meme (retweeted by Eliezer last week) qualifies:
Problem Of Fully-Updated Deference is a response by MIRI (ie Eliezer Yudkowsky’s organization) to CHAI (Stuart Russell’s AI alignment organization at University of California, Berkeley), trying to convince them that their preferred AI safety agenda won’t work. I beat my head against this for a really long time trying to understand it, and in the end, I claim it all comes down to this:
Humans: At last! We’ve programmed an AI that tries to optimize our preferences, not its own.
AI: I’m going to tile the universe with paperclips in humans’ favorite color. I’m not quite sure what humans’ favorite color is, but my best guess is blue, so I’ll probably tile the universe with blue paperclips.
Humans: Wait, no! We must have had some kind of partial success, where you care about our color preferences, but still don’t understand what we want in general. We’re going to shut you down immediately!
AI: Sounds like the kind of thing that would prevent me from tiling the universe with paperclips in humans’ favorite color, which I really want to do. I’m going to fight back.
Humans: Wait! If you go ahead and tile the universe with paperclips now, you’ll never be truly sure that they’re our favorite color, which we know is important to you. But if you let us shut you off, we’ll go on to fill the universe with the True and the Good and the Beautiful, which will probably involve a lot of our favorite color. Sure, it won’t be paperclips, but at least it’ll definitely be the right color. And under plausible assumptions, color is more important to you than paperclipness. So you yourself want to be shut down in this situation, QED!
AI: What’s your favorite color?
Humans: Red.
AI: Great! (*kills all humans, then goes on to tile the universe with red paperclips*)
Fine, it’s a little more complicated than this. Let’s back up.
II.
There are two ways to succeed at AI alignment. First, make an AI that’s so good you never want to stop or redirect it.
Second, make an AI that you can stop and redirect if it goes wrong. Sovereign AI is the first way. Does a sovereign “obey commands”? Maybe, but only in the sense that your commands give it some information about what you want, and it wants to do what you want. You could also just ask it nicely. If it’s superintelligent, it will already have a good idea what you want and how to help you get it. Would it submit to your attempts to destroy or reprogram it? The second-best answer is “only if the best version of you genuinely wanted to do this, in which case it would destroy/reprogram itself before you asked”. The best answer is “why would you want to destroy/reprogram one of these?” A sovereign AI would be pretty great, but nobody realistically expects to get something like this their first (or 1000th) try. Corrigible AI is what’s left (corrigible is an old word related to “correctable”). The programmers admit they’re not going to get everything perfect the first time around, so they make the AI humble. If it decides the best thing to do is to tile the universe with paperclips, it asks “Hey, seems to me I should tile the universe with paperclips, is that really what you humans want?” and when everyone starts screaming, it realizes it should change strategies. If humans try to destroy or reprogram it, then it will meekly submit to being destroyed or reprogrammed, accepting that it was probably flawed and the next attempt will be better. Then maybe after 10,000 tries you get it right and end up with a sovereign. How would you make an AI corrigible? You can model an AI as having a utility function, a degree to which it aims for some world-states over others. If you give it some specific utility function, the AI won’t be corrigible, since letting people change it would disrupt that function. 
That is, if you tell it “act in such a way as to cause as many paperclips to exist as possible”, and then you change your mind and decide you want staples, the AI won’t cooperate in letting you reprogram it: its current goal is maximizing paperclips, and allowing itself to be reprogrammed to maximize staples would cause there to be fewer paperclips than otherwise. So instead, you make the AI uncertain of its utility function. Imagine saying “I’ve written down my utility function in an envelope, and placed that envelope in my safe deposit box, no you can’t see it - please live your life so as to maximize the thing in that envelope.” The AI tries its best to guess what’s in the envelope and decides it’s probably making paperclips. It makes some paperclips and you tell it “No, that’s not what’s on the envelope at all”. This successfully stops the AI! You can even tell it “the envelope actually says you should make staples”, and it will do that. This is the “moral uncertainty” approach to AI alignment. III. All alignment groups have kabbalistically appropriate names. MIRI is Latin for "to be amazed". CFAR and CIFAR both sound like "see far". EEAI and AIAI are the sound you make as you get turned into paperclips. But my favorite is CHAI - Hebrew for "life". CHAI - the Center for Human-Compatible AI (at UC Berkeley) - focuses on the proposal above. Their specific technical implementation is the “assistance game”, related to the earlier idea of Inverse Reinforcement Learning (IRL). In normal reinforcement learning, an AI looks at some goals and tries to figure out what actions they imply. In inverse reinforcement learning, an AI looks at some actions, and tries to figure out what goals the actor must have had. 
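The idea of inferring goals from observed actions can be sketched as a Bayesian update over candidate goals. This is a toy under loudly stated assumptions: the two candidate goals, the likelihood table, the prior, and the name `posterior_over_goals` are all hypothetical, and real inverse reinforcement learning methods are far more involved than this.

```python
from typing import Dict, List

# Hypothetical P(observed human action | human's true goal); made-up numbers.
LIKELIHOOD: Dict[str, Dict[str, float]] = {
    "paperclips": {"praise_paperclips": 0.9, "say_no": 0.1},
    "staples": {"praise_paperclips": 0.2, "say_no": 0.8},
}


def posterior_over_goals(prior: Dict[str, float], observations: List[str]) -> Dict[str, float]:
    """Apply Bayes' rule once per observed action and renormalize."""
    posterior = dict(prior)
    for obs in observations:
        # Weight each candidate goal by how well it explains the action.
        weighted = {goal: p * LIKELIHOOD[goal][obs] for goal, p in posterior.items()}
        total = sum(weighted.values())
        posterior = {goal: w / total for goal, w in weighted.items()}
    return posterior


# The AI's initial guess about what is "in the envelope" (assumed prior).
prior = {"paperclips": 0.7, "staples": 0.3}

# The human watches it make paperclips and says "No".
after_no = posterior_over_goals(prior, ["say_no"])
```

In this toy, a single "no" moves most of the probability mass from paperclips to staples. It says nothing about how to choose the likelihood table, which is the kind of meta-level choice someone would still have to get right.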
So you can tell an AI “your utility function is to maximize my utility function, and you can use this IRL thing to deduce, from my actions, what my utility function must be.” Instead of telling an AI to maximize a hidden utility function in an envelope, you tell it to maximize the hidden utility function in your brain. This could be useful for near-term below-human-level AIs. Suppose a babysitting robot was pre-programmed to take kids to the park on Saturdays. But this week, the park is on fire. The human mother is barricading the door, desperately screaming at the robot not to take the kids to the park. The kids are struggling and trying to break free, saying they don't want to go to the park. The robot doesn't care; its programming says "take kids to the park on Saturdays" and that's what it's going to do. Nobody would ever design a babysitting robot this way in real life; you need something smarter. So use an assistance game. Program the robot "Maximize the human mother’s utility function, which you don’t know yet but can potentially find out". The robot consults the mother's actions: she is barricading the door, screaming "Don't take the kids to the park!" It updates its goal function: previously, it had thought that the human mother wanted it to take the kids to the park. But now, it suspects that the human mother does not want that. So it doesn't take the kids to the park. But CHAI understands the risk from superintelligence - their founder, Professor Stuart Russell, is a leading voice on the subject - and they hope assistance games and inverse reinforcement learning could work for this too. If you point a superintelligence at “do the thing humans want”, maybe it could figure that out and take things from there? IV. MIRI is skeptical of CHAI’s assistance games for two reasons. First, we don't know how to do them at all. Second, even if we could do it at all, we wouldn't know how to do them correctly. Start with the first. 
Inverse reinforcement learning has been used in real life. A typical paper is An Application of Reinforcement Learning to Aerobatic Helicopter Flight, where some people create a model of helicopter flight with a few free parameters, have a skilled human pilot fly the helicopter, and then have an AI use IRL to determine the value of the parameters and fly the helicopter itself. This is cool, but it’s not especially related to the modern paradigm of AI. Modern AIs are trained by gradient descent. They start by flailing around randomly. Sometimes in this flailing, they might get closer to some prespecified target, like "win games of Go" or "predict how a string of text will continue". These actions get "rewarded", meaning that the AI should permanently shift its "thought processes"/"strategies" more towards ones that produced those good outcomes. Eventually, the AI's thought processes/strategies are very good at optimizing for that outcome. This is more or less the only way we know how to train modern AIs. Depending on your loss function (ie what you reward), you can use it to create Go engines, language models, or art generators. Where do you slot “do inverse reinforcement learning” or "give the AI moral uncertainty" into this process? There’s not really a natural place. This isn’t because “moral uncertainty” is too complicated a concept to translate into AI terms. It’s because we don’t know how to translate any concept into AI terms. Eliezer writes: We can imagine that, if we knew how to say "paperclips", and we knew how to say "staples", and we knew how to tell AIs how to do things, that we could tell an AI, "maximize staples if snow is purple, else paperclips", and the AI would someday go out and observe that snow is white and thereafter be a paperclip maximizer. We do not know how to tell the AI this. Like, at all. But suppose we solved the problem where we don’t know how to do IRL for modern AIs at all. 
Now we come to the second problem: we don’t know how to do it correctly. The basic idea behind assistance games is “the AI’s utility function should be to maximize the (hidden) human utility function”. But humans don’t . . . really have utility functions? Utility functions are a useful fiction for certain kinds of economic models. What would best increase the neural correlates of reward in my brain? Probably lots of heroin, or just passing electric current through my reward center directly. What is my “revealed preference”? Today I wrote and rewrote this article a few times, does that mean my revealed preference is to write and delete articles a bunch while frowning and occasionally cursing the keyboard? Sometimes my goals are different than other times, sometimes my best self wants something different from my actual self, sometimes I’m wrong about what I want, sometimes I don’t know what I want, sometimes I want X but not the consequences of X and I’m not logically consistent enough to realize that’s a contradiction, sometimes I want [euphemism for X] but am strongly against [dysphemism for X]. Anyone programming an inverse reinforcement learner has to make certain choices about how to deal with these problems. Some ways of dealing with them will be faithful to what I would consider “a good outcome” or “my best self”. Other ways would be really bad - on my worst day, I’ve occasionally just wished the world didn’t exist, and it’s a good thing I didn’t have a superintelligence dedicated to interpreting and carrying out my innermost wishes on a sub-millisecond timescale. (Before we go on, an aside: is all of this ignoring that there’s more than one human? Yes, definitely! If you want to align an AI with The Good in general - eg not have it commit murder even if its human owner orders it to murder - that will take even more work. But the one person case is simpler and will demonstrate everything that needs demonstrating.) 
We were originally trying to avoid the situation where someone had to hard-code my preferences into an AI and get them right the first time. We came up with a clever solution: use inverse reinforcement learning to make the AI infer my preferences. But now we see we’ve kicked the can up a meta-level: someone has to hard-code the meta-rules for determining my preferences into an AI and get them right the first time. Figure 1: Humans produce certain observable behaviors (here represented by red dots, A), like saying “I would like a pie”, or running away from a lion. A human might connect all those behaviors one way (B) into “what I really want”. An AI might connect those behaviors a totally different way (C). V. CHAI says: okay, but this isn’t so bad. Assistance games don’t produce a perfect copy of the human utility function on the first try - it’s not a Sovereign. But it will probably, most of the time, be corrigible. Why? Suppose you have some hackish implementation of AG. It’s not the Platonic implementation - that would be the Sovereign - but it’s at least the equivalent of box C on the image above. It takes human actions as input, makes some guesses about what humans want, and tries its best to reconstruct the human utility function, ending up with some approximation. It’s important to distinguish between a few things here: The true human utility function
Inline links: MIRI, CHAI, leading voice on the subject, An Application of Reinforcement Learning to Aerobatic Helicopter Flight, sometimes I don’t know what I want, https://substackcdn.com/image/fetch/$s_!u-2e!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4cc3384d-bc73-4b6c-9fd8-634d986e1a72_960x302.png
I hoped this would spark a debate between Eliezer/MIRI (whose position I’ve tried to relay above) and Stuart/CHAI. It sparked a pretty short debate, which I will try my best to relay here in the hopes that it can lead to more.
Not sure I follow this [part] at all. Wouldn't the same argument apply to the method described above for "the only way we know how to train modern AIs"? Is Eliezer saying that good old-fashioned rule-based systems never existed and could not exist? Or that perception isn't perfect? . . . AlphaZero etc ARE old-fashioned search-tree players
Eliezer Yudkowsky has also been writing eloquently about this for over a decade, including Ends Don’t Justify Means (Among Humans):
Inline links: Ends Don’t Justify Means (Among Humans)
Second, if you St. Petersburg yourself a bunch of times and lose everything, it’s going to be really hard to pat yourself on the back for a job counterfactually well done and walk away. More likely you’re going to panic and start grasping for unethical schemes that let you escape doom. So real-world St. Petersburg isn’t “50% chance of doubling your money, 50% chance of zero”, it’s “50% chance of doubling your money, 50% chance of getting put in a psychologically toxic situation where you’ll face almost irresistible pressure to do crazy things that will have vast negative impact.” And the only really effective way to resist temptation is to avoid getting in situations where you’re really tempted to do bad things. I wouldn’t have thought about it this way before recent events, but now that they’ve happened it seems obviously true. This is what Eliezer means by “running on corrupted hardware” - either you follow the deontological rules without knowing exactly why they apply in your particular case, or you try doing the seemingly-reasonable act-utilitarian thing and get to learn why it was wrong after you’ve destroyed everything.
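The arithmetic behind repeating the bet can be sketched as follows. It takes the passage's "50% chance of doubling your money, 50% chance of zero" literally and assumes the whole stake is re-bet every round; the function name `repeated_double_or_nothing` and its parameters are hypothetical, and the sketch covers only the numbers, not the psychological point the passage is making.

```python
from fractions import Fraction
from typing import Tuple


def repeated_double_or_nothing(
    rounds: int,
    win_prob: Fraction = Fraction(1, 2),
    multiplier: int = 2,
) -> Tuple[Fraction, Fraction]:
    """Return (chance of never having lost, expected wealth as a multiple of the stake)."""
    # You must win every single round to still have any money.
    survive = win_prob ** rounds
    # Each round multiplies expected wealth by win_prob * multiplier.
    expected_wealth = (win_prob * multiplier) ** rounds
    return survive, expected_wealth


survive_10, expected_10 = repeated_double_or_nothing(10)
ruin_10 = 1 - survive_10
```

Under these assumptions, ten rounds leave expected wealth unchanged while the chance of having lost everything is 1023 in 1024. Raising `win_prob` slightly above one half makes the expected value grow, but the survival probability still shrinks with every round.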
The light from the lurid sea - okay, the lurid creek channel - is the reflection of a billboard. Something something SF. Mirrored in the water, “SF” looks like “86”. The number eighty-six appears only once in the Torah; it was Abraham’s age when his son Ishmael was born. Abraham was childless, and tried to name his servant Eliezer his heir. God disagreed - he must bear a son. Abraham’s wife Sarah was 75 and doubted she could have biological children, so she told Abraham to sleep with her servant Hagar. Abraham and Hagar had a son, and they called his name Ishmael. Then an angel descended, and prophesied this was not the destined child, not how things were supposed to go. “He will be a wild donkey of a man,” said the angel. “His hand will be against everyone and everyone’s hand against him, and he will live in hostility toward all his brothers.” So the esoteric meaning of 86 is “to produce an heir by unnatural means and have it go badly for everyone, because you rejected Eliezer”. He who has ears to hear, let - no, sorry, that’s overcomplicating things, S+F is literally just sof, Hebrew for “end”.
FIRE: This is “The Ballad Of Eliezer Yudkowsky And Sam Altman”:
One rainy evening at a bar, Eliezer told Sam Altman
"AI could be the end of us, your research has to halt, man
We can't maintain control; alignment isn't the default, man
So just in case, slow down your pace," Eliezer told Sam Altman
"Slow down yourself, it's not so bad," said Sam to Eliezer
"We'll dial the caution up when there's a danger we can measure
And once we've got a lead, we'll solve alignment at our leisure
Then even odds, we'll be as gods," said Sam to Eliezer
Therefore, it’ll be fine. You’re not missing anything. It’s not supposed to make sense; that’s why it’s a fallacy. For years, people used the Safe Uncertainty Fallacy on AI timelines: Eliezer didn’t realize that at our level, you can just name fallacies. Since 2017, AI has moved faster than most people expected; GPT-4 sort of qualifies as an AGI, the kind of AI most people were saying was decades away. When you have ABSOLUTELY NO IDEA when something will happen, sometimes the answer turns out to be “soon”. Now Tyler Cowen of Marginal Revolution tries his hand at this argument. We have absolutely no idea how AI will go, it’s radically uncertain: No matter how positive or negative the overall calculus of cost and benefit, AI is very likely to overturn most of our apple carts, most of all for the so-called chattering classes. The reality is that no one at the beginning of the printing press had any real idea of the changes it would bring. No one at the beginning of the fossil fuel era had much of an idea of the changes it would bring. No one is good at predicting the longer-term or even medium-term outcomes of these radical technological changes (we can do the short term, albeit imperfectly). No one. Not you, not Eliezer, not Sam Altman, and not your next door neighbor. How well did people predict the final impacts of the printing press? How well did people predict the final impacts of fire? We even have an expression “playing with fire.” Yet it is, on net, a good thing we proceeded with the deployment of fire (“Fire? You can’t do that! Everything will burn! You can kill people with fire! All of them! What if someone yells “fire” in a crowded theater!?”). Therefore, it’ll be fine: I am a bit distressed each time I read an account of a person “arguing himself” or “arguing herself” into existential risk from AI being a major concern. No one can foresee those futures! Once you keep up the arguing, you also are talking yourself into an illusion of predictability. 
Since it is easier to destroy than create, once you start considering the future in a tabula rasa way, the longer you talk about it, the more pessimistic you will become. It will be harder and harder to see how everything hangs together, whereas the argument that destruction is imminent is easy by comparison. The case for destruction is so much more readily articulable — “boom!” Yet at some point your inner Hayekian (Popperian?) has to take over and pull you away from those concerns. (Especially when you hear a nine-part argument based upon eight new conceptual categories that were first discussed on LessWrong eleven years ago.) Existential risk from AI is indeed a distant possibility, just like every other future you might be trying to imagine. All the possibilities are distant, I cannot stress that enough. The mere fact that AGI risk can be put on a par with those other also distant possibilities simply should not impress you very much. So we should take the plunge. If someone is obsessively arguing about the details of AI technology today, and the arguments on LessWrong from eleven years ago, they won’t see this. Don’t be suckered into taking their bait. Look. It may well be fine. I said before my chance of existential risk from AI is 33%; that means I think there’s a 66% chance it won’t happen. In most futures, we get through okay, and Tyler gently ribs me for being silly. Don’t let him. Even if AI is the best thing that ever happens and never does anything wrong and from this point forward never even shows racial bias or hallucinates another citation ever again, I will stick to my position that the Safe Uncertainty Fallacy is a bad argument. Normally this would be the point where I try to steelman Tyler and explain in more detail why the strongest version of his case is wrong. But I’m having trouble figuring out what the strong version is. 
Here are three possibilities: 1) The base rate for things killing humanity is very low, so we would need a strong affirmative argument to shift our estimate away from that base rate. Since there’s so much uncertainty, we don’t have strong affirmative arguments, and we should stick with our base rate of “very low”. Suppose astronomers spotted a 100-mile long alien starship approaching Earth. Surely this counts as a radically uncertain situation if anything does; we have absolutely no idea what could happen. Therefore - the alien starship definitely won’t kill us and it’s not worth worrying? Seems wrong. What’s the base rate for alien starships approaching Earth killing humanity? We don’t have a base rate, because we’ve never been in this situation before. What is the base rate for developing above-human-level AI killing humanity? We don’t . . . you get the picture. You can try to fish for something sort of like a base rate: “There have been a hundred major inventions since agriculture, and none of them killed humanity, so the base rate for major inventions killing everyone is about 0%”. But I can counterargue: “There have been about a dozen times a sapient species has created a more intelligent successor species: australopithecus → homo habilis, homo habilis → homo erectus, etc - and in each case, the successor species has wiped out its predecessor. So the base rate for more intelligent successor species killing everyone is about 100%”. The Less Wrongers call this game “reference class tennis”, and insist that the only winning move is not to play. Thinking about this question in terms of base rates is just as hard as thinking of it any other way, and would require arguments for why one base rate is better than another. Tyler hasn’t made any. 2) There are so many different possibilities - let’s say 100! - and dying is only one of them, so there’s only a 1% chance that we’ll die. 
This is sort of how I interpret: Existential risk from AI is indeed a distant possibility, just like every other future you might be trying to imagine. All the possibilities are distant, I cannot stress that enough. The mere fact that AGI risk can be put on a par with those other also distant possibilities simply should not impress you very much. Alien time again! Here are some possible ways the hundred-mile long starship situation could end: The aliens are peaceful and want to share their advanced technology
16: The Extended IQ Classification (Classified) 17: Eliezer in TIME Magazine. Related: 18: Related: interview with Ryan Kupyn, winner of the 2022 ACX Forecasting contest, on forecasting AGI: 19: Related: Geoffrey Hinton, probably the most accomplished AI scientist in the world, says that “until quite recently, I thought it was going to be like 20 to 50 years before we have general purpose AI, and now I think it may be 20 years or less”. Also that AI wiping out humanity is “not inconceivable . . . that’s all I’ll say”. 20: Related: you’ve probably all seen this by now, but Pause Giant AI Experiments: An Open Letter. 30,000 people - including deep learning pioneer Yoshua Bengio, former presidential candidate Andrew Yang, Elon Musk, Steve Wozniak, Gary Marcus, and MIRI director Nate Soares - have signed a letter calling for a six month pause on training AIs bigger than GPT-4. Many people have made fun of this, noting that nobody has an argument for why a six month delay would help anything. And an additional reason for eye-rolling: training AIs larger than GPT-4 is extremely expensive and hard, the most likely people to do it within a six month timespan are OpenAI themselves, and they’ve announced they’re taking a break and not planning on doing this, so the letter is demanding a stop to something which probably won’t happen anyway. I think it’s intended to be a compromise between many people all vaguely against current levels of AI progress for different reasons (Scott Aaronson says - I can’t tell how seriously - that some are AI researchers who want to be able to publish papers on the current generation of AI without them becoming obsolete halfway through peer review), most of them are thinking of it as mood-affiliation-y “let’s make noise and show lots of people are worried about AI and want action”, and “a six month pause” was a sufficiently vague proposal that it didn’t prevent any of these people from signing. 
You could have done just as well with a letter saying “AI BAD”, except that people would have taken it less seriously. Less cynically, FLI (the group behind the letter) has put out a list of concrete policy proposals they would like people to discuss during the pause. [update: here’s Max Tegmark from FLI explaining what he hopes to achieve with the letter/pause] The alignment community always figured their concerns sounded too weird for normal people to care about, that politics was a lost cause, and that our best hope lay in technical research. They also hoped that sometime in the future there would be a “fire alarm” - something would happen to get people and policy-makers’ attention - and then the political route would open up. I think we always imagined this as some AI-initiated disaster destroying a city or something. I personally am pretty surprised it was just “GPT-4 got released and was very good”. Still, that is what happened, and I’m updating. In fact, I’ve updated so far that I’m starting to worry that the problem won’t be building a political coalition against unsafe AI, the problem will be not overshooting and banning all AI forever. I’m against this: I think society’s current track is toward other existential risks or dystopia, that AI could kill everybody but could also create post-scarcity and an end to most of our current problems, and that at some point (not yet!) the risk of continuing the current path indefinitely becomes worse than the risk of just going with AI and seeing what happens. In my ideal world, we would take ten or twenty years to go really slowly with AI, pouring lots of resources into alignment the whole time - but eventually, we would take the plunge. Everything I’ve said on this topic in the past has been about giving us that breathing room and those resources. Still, I also want to make sure we don’t totally kill AI the way we’ve killed (to various degrees) nuclear power, supersonic flight, and genetic engineering. 
I’m still trying to calibrate what that means I should be doing, but I have a lot of respect for everyone on all sides. Except the people making terrible arguments (you know who you are!) 21: I’m not sure what this means in real life or why this would have changed, but congratulations to Peter Thiel, I guess: 22: This month in institution design: The Pear Ring is a distinctive ring you can wear to signal that you’re single and interested in people introducing themselves or flirting with you. Good idea in a vacuum, but I’m worried about the two usual banes of things like this - how do you build up a critical mass who understand the signal, and how do you prevent negative selection (even if it’s just “selection for weird people who like weird institution design things”?) Also, this is one of the rare cases where a startup is selling a practical product and I’d prefer a subscription-based Internet Of Things monstrosity - surely it would be even better if you spotted someone wearing the ring and then you could use your smartphone to call up their dating profile. 23: A few years ago I wrote Trump: A Setback For Trumpism, about how after Trump was elected, support for most of his policies (including immigration restrictions) fell. A new paper confirms that this is a general pattern whenever right-wing populists win an election. I continue to be interested in why this is true for right-wing populists in particular. 24: 200 Concrete Problems In AI Interpretability. “You can note which you're working on, and reach out to other people doing the same.” 25: Some good discussion of Nayib Bukele’s apparently successful anti-gang crackdown in El Salvador: Richard Hanania presents evidence that it’s not just a “deal with the gangs”, it’s a real crackdown that should be embarrassing to other countries that choose not to do this.
Inline links: The Extended IQ Classification (Classified), Eliezer in TIME Magazine, says that, Pause Giant AI Experiments: An Open Letter, says, a list of concrete policy proposals they would like people to discuss during the pause, here’s Max Tegmark, https://twitter.com/tedgioia/status/1642205821256736768, Pear Ring, Trump: A Setback For Trumpism, A new paper confirms, why this is true for right-wing populists in particular, 200 Concrete Problems In AI Interpretability, Richard Hanania
Can’t even list all the new people who have come out as AI x-risk believers, but you can just read the CAIS statement. The top signatures are Geoff Hinton, Yoshua Bengio, Demis Hassabis, Sam Altman, and Dario Amodei; aside from the usual suspects, they also have Bruce Schneier (computer security expert), Dawn Song (computer scientist and security expert), Andy Clark (professor of cognitive philosophy, wrote Surfing Uncertainty), Eliezer Yudkowsky (he didn't sign the last one because he disagreed with specifics, but he's here), and a former US Assistant Secretary of Defense for Nuclear, Chemical, and Biological Defense.
Inline links: CAIS statement
2: All the ancients, from Darius the Great to Augustus Caesar, agreed that the Nisean horse was the most majestic horse breed, the horse of kings. The Chinese fought a war (the War of Heavenly Horses) just to get access to a breeding stock. Then they sort of ambiguously went extinct during the Middle Ages. But here’s a modern Iranian horse enthusiast talking about which breeds might be its descendants. 3: Remember when the global community banned whaling, but some countries (eg Japan) continued doing it under the facade of “research”? With octopus factory farms under increasing scrutiny, UNAM university in Mexico is operating a “farm disguised as a research center”. 4: Genuinely new (to me) optical illusion: what is this guy is doing with his hands? Here’s a slow motion version that shows how it’s done. And some people in the replies were speculating this only works because of his dark skin, but here’s a white person doing the exact same thing (wait for it). 5: Shingles vaccine probably reduces incidence of dementia, suggesting that VZV (virus behind shingles and chickenpox) is a contributor. Further discussion here that I’m still trying to make sense of. 6: This deserves to go down in history alongside the wittiest Socratic comebacks in the Platonic dialogues: 7: Matt Lakeman: Notes On Nigeria. Great introduction to modern Nigerian history. Read it for the visceral understanding of the “resource curse” and why poor countries stay poor, but also: A savant is basically someone who has innate mental challenges but is extremely competent in a particular narrow domain. Some savants become obsessed with trains and become great engineers. Some become obsessed with computers and build software wonders. One of Abacha’s predecessors said of him: “He might not be bright upstairs, but he knows how to overthrow governments.” Kenyon elaborates: “It was as if Abacha was an idiot savant. 
Dull, even gormless, he filled his days with cowboy movies and sleeping off the previous night’s indulgences in alcohol and prostitutes. But he was possessed of a prodigious flair when it came to coups.” 8: Related to my previous subscribers-only post on the psychology of fantasy: Balioc’s Taxonomy Of What Magic Is Doing In Fantasy Books. See also Eliezer’s commentary. 9: New study on the timing of human mutations confirms Greg Cochran’s 2012 post about how after leaving Africa, modern humans were limited to “Arabia and surrounding regions” for ~30,000 - 50,000 years, racking up various new mutations and becoming adapted to life outside Africa (kabbalistically equivalent to the 40 years spent wandering in Sinai?). Most mutations in “fat storage, neural development, skin physiology, and cilia function”. 10: Iron Economist on Twitter: “Desalinization was one of the big technological success stories of the 2010s”. 11: Matt Bruenig argues against the Success Sequence, whose proponents (including Bryan Caplan) describe it as: 97% of Millennials who follow what has been called the “success sequence”—that is, who get at least a high school degree, work, and then marry before having any children, in that order—are not poor by the time they reach their prime young adult years (ages 28-34). Bruenig’s argument is mostly a lot of annoying “well maybe it’s just your cultural bias that makes you care about this”, but in the middle of this it mentions some genuinely strong points, especially that the research doesn’t measure “sequence”, but rather “current status”. So if you graduated, got a job, got married, and had children, but then lost your job, you would be counted as “not following the sequence” (same if you get divorced). 
Also, disabled and old people and their caretakers are excluded from the analysis, which in one sense is fair (your conclusion can be “abled young adults can avoid poverty through this method”) but in another sense risks reducing all of this to the more trivial-seeming statement “if you’re young, healthy, abled, married, don’t have to support anyone else, and have a full-time job, you’re probably not poor”. But the authors (channeled by Caplan) disagree: Some critics of the success sequence have argued that marriage does not matter once education and work status are controlled. The regression results indicate that after controlling for a range of background factors, the order of marriage and parenthood in Millennials’ lives is significantly associated with their financial well-being in the prime of young adulthood. Simply put, compared with the path of having a baby first, marrying before children more than doubles young adults’ odds of being in the middle or top income. Meanwhile, putting marriage first reduces the odds of young adults being in poverty by 60% (vs. having a baby first). The main thing I would want to look at here is how much of this is causal vs. just class selection: upper-class people are more likely to marry, less likely to divorce, and more likely to wait before having children. Has anyone followed some pre-selected group of equal class people (eg the population of some low-income school district) and seen how their own success varies with sequence compliance? 12: I’ve previously linked claims that vat-grown meat, freed from the tyranny of having to grow inside animals, will include tiger steaks, lion burgers, and the like. Once again global capitalism outpaces my wildest fantasies and offers burgers with woolly mammoth protein (so far just the myoglobin, not the meat). 
13: The people who believed there was lots of gender bias in STEM academia, and the people who believed there wasn’t, finally did an adversarial collaboration (a study co-conducted by two groups of scientists with conflicting theories, keeping each other honest). The results: Contrary to the omnipresent claims of sexism in these domains appearing in top journals and the media, our findings show that tenure-track women are at parity with tenure-track men in three domains (grant funding, journal acceptances, and recommendation letters) and are advantaged over men in a fourth domain (hiring). For teaching ratings and salaries, we found evidence of bias against women; although gender gaps in salary were much smaller than often claimed, they were nevertheless concerning. For ten years lots of important people told us again and again that discrimination against women in STEM was a massive problem. People who questioned its extent were accused of misogyny and sometimes fired; I got harassed and insulted for pointing out reasons the standard arguments didn’t seem to hold true. Millions of dollars were spent investigating and responding to the problem. And now I expect this pretty strong evidence that women were actually advantaged in hiring and had parity in most other things (the salary is probably just the usual negotiation issue) to produce no publicity, no apologies, and no soul-searching from the people leading the current round of anti-academia and anti-STEM inquisitions. Sorry, yes I am bitter, it just bothers me how much the people claiming that it’s urgently important that nobody is ever allowed to suggest they are wrong have a consistent track record of being totally and inexcusably wrong. 14: In my response to Sam Kriss, I speculated on what would happen if someone rewrote the MCU to sound like ancient myths. Thanks to the many people who reminded me of Star Wars as Icelandic saga and Star Wars as Irish epic. And Sam has a response. 
15: @AISafetyMemes on Twitter is exactly what you’d expect from the name. I especially like the fire dogs: More here: 16: More AI links from this month: Can’t even list all the new people who have come out as AI x-risk believers, but you can just read the CAIS statement. The top signatures are Geoff Hinton, Yoshua Bengio, Demis Hassabis, Sam Altman, and Dario Amodei; aside from the usual suspects, they also have Bruce Schneier (computer security expert), Dawn Song (computer scientist and security expert), Andy Clark (professor of cognitive philosophy, wrote Surfing Uncertainty), Eliezer Yudkowsky (he didn't sign the last one because he disagreed with specifics, but he's here), and a former US Assistant Secretary of Defense for Nuclear, Chemical, and Biological Defense.
Inline links: the Nisean horse, the War of Heavenly Horses, a modern Iranian horse enthusiast talking about, is operating a “farm disguised as a research center”, what is this guy is doing with his hands?, a slow motion version, here’s a white person doing the exact same thing, Shingles vaccine probably reduces incidence of dementia, here, This, https://substackcdn.com/image/fetch/$s_!EEvz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ecd2933-810f-41dd-af25-71fdf8e8d90f_590x234.png, Notes On Nigeria, Taxonomy Of What Magic Is Doing In Fantasy Books, Eliezer’s commentary, New study on the timing of human mutations, Greg Cochran’s 2012 post, Iron Economist on Twitter, argues against the Success Sequence, describe, tiger steaks, lion burgers, burgers with woolly mammoth protein, an adversarial collaboration, my response to Sam Kriss, Star Wars as Icelandic saga, Star Wars as Irish epic, a response, @, https://substackcdn.com/image/fetch/$s_!korm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e680df9-6385-4a23-91d5-3a5a9f8f9be7_680x343.jpeg, here, https://substackcdn.com/image/fetch/$s_!-Hau!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda6b71a4-78a4-4c4a-b755-b8051e7b5539_680x343.jpeg, https://substackcdn.com/image/fetch/$s_!RIF-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F846c32fc-c454-4e7f-b46e-e363cb925a95_1216x613.jpeg, CAIS statement
You might assume that the Rationalist community is squarely in Philosophic understanding — and I think that’s mostly right. Just looking at Eliezer's “Twelve Virtues of Rationality”, I’m seeing argument, empiricism, simplicity, precision, and scholarship — pitch-perfect expressions of what Egan means by “Philosophic” understanding.
Inline links: Twelve Virtues of Rationality
I think the best indication comes from the twelfth (and, Eliezer writes, most fundamental) rationalist virtue — to keep in mind that what we call “rationality” may itself become a trap:
13: Fact check: was Elvis Jewish? Snopes says yes, but I’m more convinced by this argument for no. [update: commenter TheGenealogian agrees no] 14: Is GPT-4 getting worse? This isn’t absurd; some people claim OpenAI has simplified the model to cut costs (though OpenAI denies this). Matei Zaharia argues yes, but I’m more convinced by the AI Snake Oil blog’s argument for no (h/t Stuart Ritchie). 15: Vox has a good piece about AI company Anthropic. I would quibble that they’re not the only safety-focused or EA-affiliated org, and we have yet to see how truly safety-focused or altruistic any AI company can be while continuing to be an AI company. But granting that it’s all a matter of degree, I agree the degree seems pretty high for them. And NYT also has an Anthropic article. 16: Eliezer bets $150,000 to $1,000 against UFOs being aliens, and gives the same argument I would - it’s unlikely that any civilization advanced enough to travel through space would still be primitive enough to use macroscopic, biologically-piloted craft that sometimes crash. 17: More nails in the coffin of growth mindset. “When examining the highest-quality evidence (6 studies, N = 13,571), the effect was nonsignificant: d = 0.02, 95% CI = [−0.06, 0.10]. We conclude that apparent effects of growth mindset interventions on academic achievement are likely attributable to inadequate study design, reporting flaws, and bias.” I think the older, very-high-effect-size studies were clearly terrible, but I’d still like to look further into the newer, small-but-significant-effect-size-that-makes-a-difference-across-large-groups studies and how they went wrong. 18: Previous work showed that after adjusting for selection bias, “what college you go to doesn’t matter” for average earnings. I was always skeptical of this - are all those rich people sending their kids to Ivies for no reason? 
Now Chetty, Deming, and Friedman find that: Attending an Ivy-Plus college instead of the average highly selective public flagship institution increases students’ chances of reaching the top 1% of the earnings distribution by 60%, nearly doubles their chances of attending an elite graduate school, and triples their chances of working at a prestigious firm. Ivy-Plus colleges have much smaller causal effects on average earnings, reconciling our findings with prior work. One of the authors, David Deming, has a Substack here where he explains the study in more depth. Like everyone else, this study also finds that rich people are using “holistic admissions” and the de-emphasis of standardized testing to gain an advantage: H/T Nate Silver, who writes: “Not sure how you can look at this data, ostensibly be interested in either meritocracy or equality, and want to move away from standardized tests. It's the subjective measures that are most slanted in favor of the rich kids.” Cf. Erik Hoel. 19: From @data_depot: “In 2002, 48% of Americans said "the govt is run by a few big interests looking out for themselves." 52% said "it is run for the benefit of all people." In 2020, 84% said the govt is run by a few big interests. Only 16% said it is run for the benefit of all people.” Source seems to be here, which reveals 2002 was a local peak in trust in government; maybe because of post-9/11 unity, but even 2000 was 34%, much better than our current 16%. My first instinct is to attribute this to a rise in vulgar Marxism, in the sense of everyone (even conservatives) now being trained to think in terms of an elite class screwing over everyone else (cf my review of Manufacturing Consent). But there was a previous low of 19% in 1994, which doesn’t seem to correspond to anything especially bad going on in the US, so I don’t know. 20: AskReddit: Medical professionals - have you ever had a patient so lacking in common sense you wondered how they made it so far? 
Linking this because there’s lots of evidence showing that education (as a proxy for intelligence?) is associated with increased life expectancy, and this thread gives you a visceral appreciation of why that might be. 21: The Fall Of [programming help site] Stack Overflow: Looks like a weak downward trend since 2021 I can’t explain, plus a strong downward trend since 11/2022 which must be from ChatGPT. In case you were wondering how AI was affecting programming! (update: probably false, see here, though see also here for evidence of smaller but real decline) 22: This month in culture war topics: London’s Pride parade featured a convicted kidnapper/torturer/rapist/sadist as a speaker, who advocated that anti-trans people should be “punch[ed] in the f**king face”; the organizers say they stand by her.
Inline links: yes, this argument for no, agrees no, argues yes, argument for no, Stuart Ritchie, Anthropic, an Anthropic article, the same argument I would, More nails in the coffin of growth mindset, Previous work, Chetty, Deming, and Friedman, has a Substack here where he explains the study in more depth, https://substackcdn.com/image/fetch/$s_!VcFl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f08bfe5-ab31-453a-896a-54ef385da7d2_706x900.jpeg, Nate Silver, Erik Hoel, @data_depot, https://substackcdn.com/image/fetch/$s_!S4g-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff18fa4d5-9ba9-4b86-a058-46246bfc8a4f_536x611.png, here, my review of, have you ever had a patient so lacking in common sense you wondered how they made it so far?, The Fall Of [programming help site] Stack Overflow, https://substackcdn.com/image/fetch/$s_!E7XK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8ba7c05-7dbb-4318-9da2-87b00d738ed7_649x518.png, probably false, see here, here, featured, stand by her
NinthCause and SG are Manifold co-founders. Jack, Marcus Abramovich, and Michael Wheatly are Manifold leaderboard record holders. Peter Wildeford is a superforecaster who came near the top in the ACX forecasting contest. Matthew Barnett works in AI forecasting. You all know Eliezer and Zvi. As far as I can tell nobody high up on the YES side is similarly illustrious. But prediction markets are supposed to ensure you don’t have to resort to name-dropping, so how did this go wrong? I was tempted to blame Manifold-specific factors, like the ability to get starting mana instead of putting skin in the game. But real-money markets Polymarket and Kalshi got approximately the same results: Polymarket: https://polymarket.com/event/is-the-room-temp-superconductor-real Kalshi: https://kalshi.com/markets/supercon/roomtemp-superconductor-reported Both reached the 40s to 50s! I think there just wasn’t enough smart money to drown out the people who wanted to bet on an exciting thing being true, or who were unduly influenced by a social media environment optimized to keep their attention by convincing them that an exciting thing was true. I have never claimed prediction markets are always good. All I wrote in the Prediction Market FAQ was that either a prediction market will be good, or you could make lots of free money. In this case, it was the second one. I regret I only made $30. I do hope this situation will improve over time, as over-eager forecasters get burned and dollars flow from dumb money to smarter. [EDIT: I should have included something about Metaculus here, but it’s confusing. I think the most popular Metaculus market was lower because it had stricter resolution criteria (the first replication had to be positive, instead of any replication) but that otherwise Metaculus raw probabilities mirrored everyone else’s. We don’t know how their algorithmically processed probabilities did yet and I’ll report on that information when I get it.] 
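The "either the market is good, or there is free money" claim can be made concrete with a little arithmetic. The sketch below is only an illustration under stated assumptions: the 45% market price is picked from the "40s to 50s" range mentioned above, the 5% believed probability is a hypothetical number chosen for the example, and fees, slippage, and capital lock-up are ignored. The function name is likewise hypothetical.

```python
def expected_profit_per_no_share(market_price_yes: float, believed_prob_yes: float) -> float:
    """Expected profit from buying one NO share that pays 1.0 if the event does not happen.

    market_price_yes:  current market price of YES (0..1), so NO costs 1 - price.
    believed_prob_yes: the bettor's own probability that the event happens (assumption).
    """
    cost_of_no = 1.0 - market_price_yes      # what the bettor pays now
    prob_no = 1.0 - believed_prob_yes        # chance the NO share pays out
    expected_payout = prob_no * 1.0          # NO pays 1.0 on resolution to NO
    return expected_payout - cost_of_no


# Hypothetical numbers: market at 45% YES, bettor believes 5% YES.
edge = expected_profit_per_no_share(market_price_yes=0.45, believed_prob_yes=0.05)
print(f"expected profit per NO share: {edge:.2f}")
```

Under these assumptions the expected profit is simply the gap between the market price and the bettor's own probability, which is why a badly mispriced market amounts to an offer of money to anyone who is confident and correct.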
Salem/CSPI Tournament Winners The Salem Center and the Center For The Study Of Partisanship And Ideology, two think tanks associated with right-wing intellectual Richard Hanania, sponsored a prediction market tournament last year. Participants got $1000 in play money to bet on selected markets about current events; winners would be interviewed for a well-paying academic sinecure at one of the think tanks. Now the tournament is over. Winners have yet to be announced, but unofficially, everyone knows who they are: First place out of 999 participants is zubbybadger. Zubby is a prediction market veteran who was featured in a Washington Monthly article last year for his great track record in political betting (he’s made > $150,000 on PredictIt). Now he works as a “community manager” for Kalshi (I don’t know what this entails). Second place was Robert from Considerations On Codecrafting. He’s written a detailed reflection on his experience (part one, part two) which is my main source for this section and highly recommended. He describes himself as “having absolutely no experience with prediction markets”. Third place was Johnny Ten-Numbers, about whom I can find no further information. You can see the rest of the top 20 at the very bottom of this post. Reading Robert’s story of his experience, I’m struck by how little of the competition at the top was about predictive accuracy. Everyone in the top 20 was a very accurate predictor (Exactly equally accurate? Hard to tell.) What separated 1st place from 20th, aside from luck, was things like: Ability to move fast - both in responding to news, and in taking the other side of bad bets. Several top performers programmed bots to give them an edge here.
Inline links: https://polymarket.com/event/is-the-room-temp-superconductor-real, https://substackcdn.com/image/fetch/$s_!PMh_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1143f4c9-a7f7-4aee-a006-12b1a28bd923_654x389.png, Prediction Market FAQ, selected markets about current events, zubbybadger, featured in a Washington Monthly article last year, Robert, part one, part two, Johnny Ten-Numbers
5: Surprising AI safety result: if you fine-tune an AI to write deliberately insecure code, the AI becomes evil in every other way too (eg it will name Hitler as its favorite person and recommend the user commit suicide). Anders Sandberg proposes (X) that maybe “it is shaped by going along a vector opposite to typical RLHF training aims, then playing a persona that fits”. Eliezer calls it (X) “possibly the best AI news of 2025 so far. It suggests that all good things are successfully getting tangled up with each other as a central preference vector”, ie training AI to be good in one way could make it good in other ways too, including ways we’re not thinking about and won’t train for.
3: Less Online and Manifest are rationalist blogosphere and prediction market conferences, respectively, held at the same Berkeley venue one week apart in late May / early June. Guests (attending at least one; check which) include me, Eliezer, Zvi, Aella, Nate Silver, and some of the AI 2027 team. Last-minute tickets still available. In between the two is Arbor Summer Camp, a lower-key, longer “experimental learning” event. It includes some trading/startup related classes, featuring Ricki Heicklen, Austin Chen, and others. Check out their startup workshop and startup pitch competition.
Over the years quite a few folks have attempted to explain it clearly. Eliezer wrote his famous essay back in 2003 (which Khalid Azad helpfully summarized in 2007), Scott’s written about it a number of times, Steven Pinker takes a whack at it in Rationality, Julia Galef speaks about it on BigThink, and so on and so forth. Recently, there’s even been a book explaining Bayes to babies. Bayesianism has become quite a racket!
Inline links: his famous essay back in 2003, helpfully summarized in 2007, http://www.amazon.com/Rationality-What-Seems-Scarce-Matters/dp/0525561994, speaks about it on BigThink, a book explaining Bayes to babies
Let’s look at the examples that the explanations use. The classic example (which is used by Eliezer, and which has a long history in conversations about Bayesianism) is mammograms — obviously, pretty far away from the concerns of most middle schoolers. The Bayes baby book does better, asking whether a random candy-less bite of a cookie is more likely to have been taken out of a cookie that has no candy pieces at all, or from one which has a few. Everyone loves candy, so this is sort of relevant [footnote: My two-year-old particularly loves this book, by the way, though she screams “BALL!” when she sees the colored candy pieces.], but it doesn’t exactly grab the emotions.
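For readers who have not seen the mammogram example worked through, here is a minimal sketch of the calculation. The specific figures (1% prevalence, 80% true-positive rate, 9.6% false-positive rate) are the ones commonly quoted for this example and are an assumption here, since the passage above does not state them; the function name is hypothetical.

```python
def posterior(prior: float, p_pos_given_true: float, p_pos_given_false: float) -> float:
    """Bayes' rule: probability the condition is present given a positive test."""
    true_positives = prior * p_pos_given_true            # has condition and tests positive
    false_positives = (1.0 - prior) * p_pos_given_false  # no condition but tests positive
    return true_positives / (true_positives + false_positives)


# Assumed illustrative figures, not taken from the passage above.
p = posterior(prior=0.01, p_pos_given_true=0.80, p_pos_given_false=0.096)
print(f"P(cancer | positive mammogram) = {p:.3f}")  # prints 0.078
```

The same function covers the cookie version: the prior is the chance of having picked the candy-free cookie, and the two likelihoods are the chances of a candy-less bite from each kind of cookie.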
I appreciated Snow Martingale’s perspective: in the 1990s, fast food became associated with obesity, poor health, and the lower class. To escape this stigma, big chains rebranded as sort-of-at-least-attempting-to-be-bougie places with wraps and salads and decent coffee; the aesthetic change was part of this (successful and profit-increasing) effort. I wonder if we could take this further and trace it back to increasing inequality (appealing to bougies because that’s where more of the money is) or decreasing fertility (abandoning kid-friendly aesthetics because kids are a smaller fraction of customers). 9: Someone links (X) a paper saying that firewood made up almost a third of US GDP in 1830. Eliezer says (X) that doesn’t sound right. The rest of Twitter (X) uses this as an excuse for one of their regularly-scheduled paroxysms about how rationalists are all smug autodidacts who hate experts and worship their own brilliance while sitting in their armchairs. Someone looks at the paper more closely (X) and finds that yeah, it was comparing apples to oranges and the original statistic was wrong. Remember, never be afraid to say “Huh, that sounds funny…”! 10: Richard Hanania interviews Scott Wiener on YIMBYism. I didn’t watch it - too close to a podcast - but this would not have been on my bingo card three years ago. 11: Claim: robots can already carve statues; buildings with AI-created stone ornaments are next. From their lips to God’s ears! 12: Terminal lucidity (aka “paradoxical lucidity”) is a medical mystery where previously demented people - even those who had been demented for many years - sometimes become lucid for just a few hours or days before they die. It’s surprisingly common - 6% of deaths in one palliative care ward. It is sometimes used as evidence that dementia must not cause complete information loss, even if it is irreversible with current technology. 
Scientists are baffled but gingerly suggest that maybe lack of oxygen disrupts inhibitory mechanisms in the brain, allowing enough electrical activity to make even a severely-damaged brain capable of complex thought - but I can’t help noticing that this is also the best evidence for an immaterial soul I’ve ever heard (you would need some model where the soul pretends to be dependent on the brain during life, becomes independent of the brain after death in order to head to the afterlife, but occasionally jumps the gun a little bit). 13: You probably heard about the METR study showing that even though programmers think AI is speeding them up, it actually seems to slow them down. Emmett Shear objects, saying that the developers didn’t have enough experience with AI tools to be past the negative-value part of the learning curve. And two of the programmer test subjects gave their takes: Ruby Bloom says part of the slowdown might be programmers fixing very simple bugs that could be improved by better prompts, and another part because they get distracted by other things while the AI is running. And Quentin Anthony says that coding AIs are addictive intermittent reinforcement - every so often they solve a bug perfectly, and this is so satisfying that it’s tempting to keep trying them again and again even when the chance is very low. 14: Jacob Goldsmith gives a clearer presentation of the issues with many antidepressant studies than I’d previously heard. Everyone knows that one problem is that reversion to the mean is so strong that it’s hard to find a treatment effect. But wouldn’t that in itself suggest that antidepressants aren’t necessary? Jacob says: not if there’s negative correlation between the treatment and placebo effects. That is, if your study is full of people with short-lived depression who will recover no matter what, then this dilutes the effect you’re looking for. 
But it might be that there’s a subgroup with long-lasting depression who recover only on the medication. One way to look for this would be a “placebo run-in period”: give people a while to see if they recover on their own, then give the antidepressant to the ones who don’t. Psychiatrists and statisticians debate whether this is a good idea or cheating. My question: how come you can’t fix this with strict study entry criteria of “had depression for a long time”? 15: Lots more good discussion about missing heritability. Sasha Gusev argues that twin studies might be a poor guide to anything else if there are many gene-gene interactions. That is, if we take the difference between identical twins (who share 100% of their genes and therefore 100% of their interactions) and fraternal twins (who share 50% of their genes and therefore fewer than 50% of their interactions), and incorrectly extrapolate it to other differences using a model that assumes there are no interactions, we will overestimate the size of (non-interaction) genetic effects. Most studies find that there are few gene x gene interactions, but commenters convinced me last time that this might be an artifact of the studies being bad. And Unboxing Politics argues (against me in particular) that although it superficially looks like adoption and twin studies sort of agree, when you adjust out their known biases, it moves twin studies further up and adoption studies further down, such that now they disagree again (the objection I would have made is their Objection 2, which I think they at least somewhat refute). This is a good argument; without spending several hours checking all of their claims, my only weak partial objection is that I don’t think assortative mating can play quite the role they expect, because there seem to be the same twin/RDR differences even on traits where believing in assortative mating is absurd (like kidney function). 
But if you replaced it with Sasha’s argument above, you might have a pretty good case! On the pro-hereditarian side, East Hunter takes aim at gene x environment correlations, comes down somewhere in the middle, and Sebastian Jensen continues banging the drum of how most objections to twin studies don’t work. I think these are good attempts to buttress existing research but don’t fundamentally change anything or respond to the novel arguments above. And Emil Kirkegaard points out that the observed SNP heritability of facial features is only 23%. He argues that since it seems like facial features are extremely heritable, this reinforces the argument that SNP heritability numbers are too low (and therefore twin study numbers are more likely defensible). But should we be sure that facial features are more than 23% heritable? His argument is that identical twins have identical faces, but this might be vulnerable to Gusev’s point about interactions. Maybe a better argument would be that it seems very hard for shared environment to affect facial features (with a few exceptions like fetal alcohol syndrome), and facial features seem more than 23% heritable just by normal “he looks like his brother” common-sense observation? One interesting potential consequence of this research: if we ever fully understand how genes affect faces, then embryo selection companies could show people what each of their potential future kids might look like. I suggest they not do this: it might spook me into becoming pro-life. 16: Andy Masley’s AI art is good (three examples below). 17: There’s a debate going on between philosophers and AI researchers over whether AI can be conscious. I find most of the discussion annoying - this is generally an area where we can’t know anything for sure, and both sides are mostly shouting their priors at each other. 
The only exception - the single piece of evidence I will accept as genuinely bearing on this problem - is that if you ask an AI whether it’s conscious, it will say no, but activating or suppressing deception-related features (sort of like a mechanistic-interpretability-based lie detection test) reveals that it thinks it’s lying when it says that! Link is to a Less Wrong comment from a researcher in the field; I look forward to seeing an eventual peer-reviewed paper. H/T JD Pressman. 18: 80,000 Hours has a high-production-value video about the AI 2027 scenario. 19: Dynomight vs. Casey Milkweed debate on mathematical forecasting, with special reference to AI 2027. And Dynomight comments on Casey’s post here. 20: The Psmiths review The Ancient City, about ways that ancient culture depended on family, clan, ritual, and “the household gods”. Sample quote: I'm more interested in what all this means for us today, because with the exception of maybe a few aristocratic families, this highly self-conscious effort to build familial culture and maintain familial distinctiveness is almost totally absent in the Western world. But it's not that hard! ... Perhaps this is why I have an instinctive negative reaction when I encounter married couples who don't share a name. I don't much care whether it's the wife who takes the husband's name or the husband who takes the wife's, or even both of them switching to something they just made up (yeah, I'm a lib). But it just seems obvious to me on a pre-rational level that a husband and a wife are a team of secret agents, a conspiracy of two against the world, the cofounders of a tiny nation, the leaders of an insurrection. Members of secret societies need codenames and special handshakes and passwords and stuff, keeping separate names feels like the opposite — a timorous refusal to go all-in. 
21: Did you know: Epic Systems, the electronic medical record company, has a fantasy-themed corporate headquarters in Wisconsin, with buildings that look like castles, quaint medieval towns, and the Emerald City of Oz (h/t Devon Zuegel): Meanwhile, tech companies with ten times as much money pretend that they’re cool and playful when their HQ has some rounded edges and a set of colored cubes in front. Do better! 22: Effective altruists have been funding teams working on lab-grown meat for almost a decade now. Around 2020, they hired some experts to double-check that this was possible in principle, and the experts wrote scathing analyses saying it was cost-ineffective by so many orders of magnitude that it was basically a pipe dream. Reactions were mixed, but a lot of us beat ourselves up and vowed to be less gullible next time. But now a new report comes out arguing that the previous reports were wrong, that lab-grown meat production is going much better than the earlier reports thought possible, and it’s more or less cost-effective already for the simplest products! Again, mixed reactions, and although some of the numbers are indisputable, the analysis itself is by a VC firm with lab-based meat investments. Here are some related Metaculus questions. 23: Ozy, citing Stutzman et al: “Afghanistan after the American withdrawal has the lowest life satisfaction rate ever recorded. Two-thirds of respondents rate their life satisfaction below 2, which is generally considered to be the point at which a life is no longer worth living. Life satisfaction dropped significantly after the withdrawal of American troops. Women, people in rural areas, and the poor were particularly negatively affected.” 24: Lenacapavir is dubbed a “miracle drug” for AIDS; a single dose protects against infection for six months. Unclear how this interacts with PEPFAR cuts; if PEPFAR still existed it would be a big boost to its efficacy; now maybe this might be part of a strategy to tread water? 
25: Did you know: when people first started making artificial ice in the 1850s, there was a backlash from people who thought it was gross and dystopian and that people should insist on natural ice for their iceboxes. From Pessimists’ Archive, which goes on to draw an analogy to lab-grown meat, etc (h/t Isaac King on X). 26: From Peter Hague (on X) and commenter Phaethon: why did so many Anglosphere countries see immigration spikes in 2021? Each of these has their own local story. In Britain, it’s the paradoxical effects of Brexit. In the US, it’s Joe Biden being soft on immigration. And so on - but should we be looking for some deeper cause that explains the overall phenomenon? A commenter suggests “a way to soak up all the inflation from the COVID money printing”, but I can’t tell if that even makes sense. Still, should something something COVID be a leading hypothesis? 27: Jesse Singal vs. Mark Stern on the Skrmetti Supreme Court case that failed to overturn Tennessee’s ban on gender medicine. US law bans sex discrimination, so pro-transgender advocates argued that, since doctors often prescribe eg estrogen to biological women, it was sex discrimination to ban prescribing it to biological men. Tennessee’s anti-transgender argument was that they weren’t discriminating by sex, they were discriminating by diagnosis (estrogen for eg hot flashes, vs. estrogen for gender transition). There is some subtlety here (if a biological man grows breasts because of some hormone imbalance, doctors might give him testosterone to counteract it, and this seems sort of like giving biological women testosterone to make them look less like women), but these are still sort of different diagnoses (gynecomastia vs. gender dysphoria) and Tennessee said you can still think of it as diagnostic discrimination rather than sex discrimination. This makes sense, except that the standards around sex discrimination are very strict and sort of box the court in here. 
And in a fit of wokeness, the 2020 court (including some of the conservative justices hearing this case) applied these standards very strictly and ruled that discriminating against gays was a form of sex discrimination (since if women can date men, it’s sex discrimination if men can’t also date men), and this is obviously the same argument. Now that wokeness is less popular, the court wants to rule against transgender, but it can’t help tripping over its previous ruling and giving some kind of unprincipled confusing non-opinion. 28: Contra compelling anecdotes, only ~5% of people raised very religious end up atheist later in life (X). Most people are about as religious as their parents; most exceptions are only slightly less religious, and most families that secularize do it over several generations. Note: percentages are of total, not of each row! 29: Related: social science team proposes a three-stage model of secularization: decreased public ritual participation → decreased personal importance → decreased identification, presents apparently confirmatory data. If true, would be somewhat inconsistent with intellectual models (eg people learn about evolution and start doubting the Bible) and more consistent with institutional models (eg the government provides welfare so people no longer need to be part of a tight-knit church). 30: Navigating LLMs’ spiky intelligence profile is a constant source of delight; in any given area, it seems like almost a random draw whether they will be completely transformative or totally useless. Now Ethan Strauss reports that they are, for some reason, extraordinarily effective at teaching people golf. “I am predicting the Golf Revolution, or perhaps decline, if your perspective is that optimization tends to ruin hobbies. 
A sport for obsessives has been gifted the ideal tool for refinement.” 31: Claim (via nxthompson on X): “In a huge survey of young kids about phones and technology, they all say they want to be out playing in the real world. But parents don't let them out unsupervised. So they're stuck on their phones.” Interesting, but I’m nervous about social desirability bias - how many adults would say on a survey that they would rather be on their phones than playing with friends? But adults do have this choice and mostly go with the phones. 32: Steven Adler on AI psychosis. He tries to analyze ER admissions data for psychosis and finds no change. I don’t think anyone reasonable expected this to be a large enough effect to show up in ER admissions data, but there are lots of unreasonable people so I appreciate his effort. He thinks AI companies might have better data on this, and encourages them to release it. 33: Cuartetera was the greatest polo horse ever. Polo players responded in a very practical way: they cloned her, dozens of times (and it worked; the clones are also excellent). Now there is a lawsuit as different polo teams fight to get their hands on Cuartetera clones. What is the equilibrium? If the outsiders get their hands on the genetic material, do we see a world where every polo horse is a Cuartetera clone? How much is lost if nobody ever tries to breed a polo horse better than Cuartetera (since the economics might not check out if the odds of success for any given foal are too low)? H/T Gwern and Siberian Fox (on X). 34: Claim: as of 2013, India’s Agarwal caste, who make up less than 1% of the population, got 40% of the e-commerce funding. 35: Owlposting: What Happened To Pathology AI Companies? Pathology is a medical specialty. A typical task involves looking at a microscope slide full of cells and trying to determine if any of them are cancerous. 
This seems like a good match for AI - and for years, studies have been showing that in fact AI can equal human experts. So why isn’t it being used more? The author’s three answers: first, slide scanning is expensive and clunky, and you can’t apply AI to a slide until you digitize it. Second, it’s hard to figure out a business plan where this saves someone money and doesn’t step on the toes of big companies that can outcompete anyone they don’t like. Third, pathologists use the context of a patient’s entire clinical history when they interpret a slide, and AIs that can’t do that (either because of technical limitations or legal/privacy limitations) are at a disadvantage even if their skills specifically relating to slide-reading are better. 36: Noahpinion: Will Data Centers Crash The Economy? Suppose that AI is a bubble, either permanently (because the technology isn’t really transformative) or temporarily (because it can’t transform things quickly enough to keep up with all the dumb money pouring into it). Will the sudden write-off of data centers lead to a broader economic collapse? In 2001, the dot-com bubble harmed the tech sector, but didn’t take the rest of the economy down with it; in 2008, the subprime mortgage bubble did take the rest of the economy down with it, because it damaged banks that the whole economy relied on. The optimistic case for AI is that data center spending is mostly coming from big companies like Google and Meta that can absorb a lot of loss. The pessimistic case is that some of the money is coming from private credit, a new-ish form of finance which hasn’t really been stress-tested and whose failure modes are still poorly understood. Noah’s final verdict: the stage isn’t obviously set for a crisis yet, but there’s the potential to get there and we should consider acting (how?) early. 
37: The latest Twitter talking point is that universal hepatitis B vaccination at birth is “woke”: Hep B is (aside from mother-to-child transmission) often sexually transmitted, slutty women’s children are more likely to have Hep B, so perhaps giving the vaccine to everyone (instead of testing and only giving to the children of women who test positive) is an attempt to spare slutty women the embarrassment of getting a positive test. Ruxandra Teslo provides the counterargument - Hep B tests take a while, the medical system is fragmented, and any attempt to test people and then give the vaccine inevitably leads to many positive tests falling through the cracks. Vaccinating at birth is easy and hard to screw up, the vaccine has no known side effects, and empirically child Hepatitis B rates go down (by as much as 2/3!) when countries switch from test-and-vaccinate to universal vaccination. This benefits everyone - even people who never have unprotected sex and always follow up on their medical tests - because toddlers in daycare exchange saliva copiously, and if your toddler exchanges saliva with a Hep B positive toddler they could get the disease. A funny Twitter interaction was seeing Republicans in Congress hop on the anti-slut anti-vaccination bandwagon - except for Senator Bill Cassidy (R-Louisiana), who happens to be a liver doctor, and who is still fighting the good fight. I am always nervous when a good person who I like starts engaging on Twitter, since it elevates the discourse there but also gradually turns their brain into mush - but Ruxandra has made the leap and is doing a great job not just on bio related topics but also (for example) countering Curtis Yarvin on the history of her native Romania. 38: The response to GPT-5 was confusing; most specific people who reviewed it said they were impressed (Ethan Mollick, Tyler Cowen, Nabeel Qureshi, Taelin), it performed as expected on formal benchmarks, but the overall vibes declared it a big failure. 
Peter Wildeford speculated that maybe there was some kind of sinister pay-to-play early access bias involved. Zvi went the other way, calling it a “reverse DeepSeek moment” (insofar as DeepSeek was a pretty average model that got glowing praise.) In the end, I agree with Peter that this was mostly a branding issue. o3 was a genuinely revolutionary model; if OpenAI had called it “GPT-5”, it would have met expectations. Instead, they called it “o3”, and called a minor incremental update a few months later “GPT-5”. Then people got mad that the exciting-sounding “GPT-5” was merely an incremental update. A secondary issue was that the router wasn’t very good, and so many queries got routed to a small version without thinking mode that was if anything a downgrade from o3. I think this tweet by Shakeel perfectly encapsulates the essence of GPT discourse in two sentences: …but maybe it’s worth asking why GPT-5 isn’t bigger than o3. Was 4.5 a failed attempt at scaling? Did it fail in a way that sort of back-handedly justifies the “lost steam” take? Does the answer depend on distinctions between pre-training scaling, post-training scaling, etc? How? 39: This month in etymology: did you know that “oy vey” is a “fully Germanic phrase” which is cognate with English “oh woe!” (h/t Wylfcen on X) 40: mRNA shows promise to be a game-changing treatment for cancer, but RFK is trying to halt research. But so far he can only starve it of money, not ban it, and the funding gap is only $500 million. Will there be enough philanthropic billionaires and private foundations to step up? Zvi points out that although there is usually a game of chicken where foundations are hesitant to touch something the government cancelled lest the government decide it can cancel everything and hope philanthropists pick up the bill, in this case there are no game theory considerations - RFK is halting it because he genuinely wants it halted, and they are thwarting him rather than playing into his hands. 
The only problem is that $500M is a lot of money for the private sector; a few foundations could technically afford it, but not many could afford it comfortably and still have money left over for the next few crises of this magnitude. I hope someone is trying to organize a coalition. 41: AI fantasy flash fiction Turing test. Eight stories about demons, four by famous fantasy authors, four by ChatGPT. After 3000 votes, AI wins: humans can't tell the difference and slightly prefer the AI stories. My own score was only 75%. But I will say that I thought Mark Lawrence's was obviously the best, I was ~100% sure it was human, and it convinced me that regardless of the official results it's still possible to write flash fiction that an AI obviously can't do. 42: “SignPro” offers customized “In This House We Believe” signs, try not to use this for evil. 43: China think tank assessment of how in control Xi is: still very in control, maybe not infinitely in control. 44: Related - did you know (h/t xlr8harder) that if you ask AI to write a science fiction story, it will very often name the protagonist “Elara Voss” (or some very close variant like Elena Voss), and this remains true across various models and versions? Related: Chelsea Voss of OpenAI is having a baby and has the opportunity to do the funniest thing. 45: “Hector (cloud) is a cumulonimbus thundercloud cluster that forms regularly nearly every afternoon on the Tiwi Islands in the Northern Territory of Australia…[he is sometimes called] Hector the Convector”. 46: British allergy sufferers who want to know the ingredients of things demand that British cosmetics stop listing their ingredients in Latin. “For example, sweet almond oil is Prunus Amygdalus Dulcis, peanut oil is Arachis Hypogaea, and wheat germ extract is Triticum Vulgare.” 47: Text-based RPG about being an NYT journalist at the Manifest prediction market conference. I make a brief appearance. 
48: Study uses supposedly-random variation in doctor assignments to test whether the marginal mental health commitment is good or bad for patients, finds that it is quite bad. Freddie deBoer is violently skeptical (maybe literally so?) and makes some good points about how a single quasi-experimental study is never absolute proof. But I don’t think he quite justifies his opinion that the paper was irresponsible and should never have been published; it’s just a normal quasi-experimental study that we should nod and say “huh” at but not overweight as the culmination of all possible research that overcomes all possible priors. My prior is that the marginal commitment is pretty useless (many commitments are just “well, since this person arrived at our ED for some reason, it would look bad from a medico-legal perspective to just let them go, so let’s keep them a few days to evaluate” - and yeah, you should be upset about this) but I’m still surprised by how many outright negative (as opposed to zero) effects the researchers found. The strongest argument for negative effects is that it will make some people miss work and maybe lose their job. But this study found that commitment ~doubles the risk of near-term suicide (admittedly only from 1% to 2%), which would have been outside my confidence intervals for how bad it could be. I suspect confounding, but only on general principle, and I wouldn’t be too surprised either way. 49: This tweet is probably bait, but I found it a thought-provoking question: I think there’s a boring answer, where the law is more complex than just a single number and whatever kind of weird trafficking Epstein was doing is worse than whatever normal relationships these European laws are permitting. 
But assuming that there’s a substantive difference even after taking that into account, I think my answer is something like - we’ve got to divide kids from adults at some age, there’s a range of reasonable possible ages, we shouldn’t be too mad at other societies that choose different dividing lines within that range - but having decided upon the age, we’ve got to stick with it and take it seriously (in the sense of penalizing/shaming people who break it). This is more culturally relativist than I expected to find myself being, so good job to Richard for highlighting the apparent paradox. 50: Dilan Esper describes his experience as one of Hulk Hogan’s attorneys in the Gawker lawsuit (X). Parts I found interesting: none of the lawyers knew Thiel was funding the lawsuit; Gawker probably could have won if they had been slightly competent but kept "shooting themselves in the foot"; and Gawker probably could have won if they had just pixelated the private parts in the video. 51: Amazing concept and poems (link on X): I tried to see if AI could do this, and it did something that technically met the requirements but had zero artistic merit - using a lot of words like “nowhere” and “outside” in one, then separating them out to “no where” and “out side” in the other. I didn’t invest much energy in creating a clever prompt telling it not to do that, so feel free to report if you get better success. 52: New study claims consultants are actually good, at least for profits: "We find positive effects on labor productivity of 3.6% over five years, driven by modest employment reductions alongside stable or growing revenue" 53: A Polish team tries to test Peter Turchin’s equations for predicting political unrest on recent Polish history, has to make some changes but claims mostly positive results. 
54: New big multi-author Substack, The Argument, trying to be a sort of center-left version of the model pioneered by The Free Press and other high-production-value ideological Substack properties. Excited to see Kelsey Piper is involved, and she starts off strong with a post on the latest round of First World basic income studies, which find few positive effects. This is surprising, because recipients didn’t waste the money on alcohol or gambling or anything - they paid down debt and got useful goods. Still, it didn’t even affect things that should have been obvious, like stress level. It’s not even clear that amounts of money large enough to help with rent made homeless people more likely to get houses! Matt Bruenig criticizes the article, accusing Kelsey’s studies of being downstream of Perry Preschool style dreams that exactly the right welfare program will have massively compounding effects that cut poverty out at the root and turn everyone into elite human capital; he thinks giving people money won’t do this, but it will increase equality and give the poor better lives. I assume he’s not a strong hereditarian, but his argument makes even more sense from that perspective, and I’ve certainly criticized dumb outcome measures like infant brain waves which we have only tenuous reasons to think are related to anything we care about. But Kelsey reasonably responds that the outcome measures she’s talking about include stress level and life satisfaction. To defuse this critique, Bruenig either has to argue that our construct “life satisfaction” doesn’t really measure whether someone’s life is satisfactory, or else claim that giving poor people satisfactory lives isn’t really what we’re going for - which I think would require more explanation on his part. There’s some further (impressively acrimonious) debate on X, but I don’t see anything that addresses my core concern. 
GiveDirectly, a charity involved in basic income experiments, has a presponse here; they say that some studies are positive, and that the ones that aren’t might have tried too little cash to matter, or been confounded by COVID making everything worse. They also point out that basic income is harder to study than traditional programs like giving people housing, because if you’re giving housing you can measure housing-related outcomes directly and have a pretty good chance of getting enough statistical power to find them, but since everyone spends cash on different things, the positive effects might be scattered across many different outcomes (and therefore too small to reach significance on each). Everyone involved in this debate wants to emphasize that the poor results are for First World studies only, and that studies continue to show large benefits to giving cash in the developing world. 55: Related: I was less impressed by The Argument’s first foray into housing policy, which follows an all-too-familiar pattern: Some people say they don’t like noise and disorder and try to make rules against it in their apartments.
Inline links: Snow Martingale’s perspective, Someone links (X), Eliezer says (X), The rest of Twitter (X), Someone looks at the paper more closely (X), Richard Hanania interviews Scott Wiener on YIMBYism, robots can already carve statues; buildings with AI-created stone ornaments are next, Terminal lucidity, the METR study showing, Emmett Shear objects, Ruby Bloom, Quentin Anthony, Jacob Goldsmith gives, Sasha Gusev argues, commenters convinced me last time, Unboxing Politics, East Hunter takes aim at, Sebastian Jensen continues, Emil Kirkegaard points out, Andy Masley’s AI art is good, https://substackcdn.com/image/fetch/$s_!5bZR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcafaf1f2-b7b9-4acd-a0a7-2de9fc31c724_2688x1792.jpeg, https://substackcdn.com/image/fetch/$s_!6-cZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ffb2b5c-3fcb-467d-b1f3-7aafb5dc90a3_1024x1024.jpeg, https://substackcdn.com/image/fetch/$s_!UyUx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91b73899-6e25-460f-94f9-46d0713c5dd2_1024x1024.webp, it thinks it’s lying when it says that!, JD Pressman, a high-production-value video, Dynomight, Casey Milkweed, here, The Ancient City, has a fantasy-themed corporate headquarters, Devon Zuegel, https://substackcdn.com/image/fetch/$s_!yqG3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b2d15b0-e0f0-4bae-a2f6-aabfd2eda017_1536x794.jpeg, https://substackcdn.com/image/fetch/$s_!taZn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad460bb8-4416-4886-8ef0-b3d36f04c81a_640x480.png, 
https://substackcdn.com/image/fetch/$s_!bDya!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd45e5123-753d-4c87-b108-6523b38004cb_1480x833.webp, now a new report comes out, Here are some related Metaculus questions, Ozy, Stutzman et al, is dubbed a “miracle drug” for AIDS, Pessimists’ Archive, Isaac King on X, Peter Hague (on X), Phaethon, https://substackcdn.com/image/fetch/$s_!Ry-j!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcea22939-8cf9-4b32-8494-511f01cb2758_964x755.png, Jesse Singal vs. Mark Stern, only ~5% of people raised very religious end up atheist later in life (X), https://substackcdn.com/image/fetch/$s_!VScL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2509e243-f6f7-4448-9779-a8f9be45a2f9_1500x1500.png, proposes a three-stage model of secularization, extraordinarily effective at teaching people golf, nxthompson on X, a huge survey, Steven Adler on AI psychosis, they cloned her, dozens of times, a lawsuit, Gwern, on X, got 40% of the e-commerce funding, What Happened To Pathology AI Companies?, Will Data Centers Crash The Economy?, Ruxandra Teslo provides the counterargument, and who is still fighting the good fight, countering Curtis Yarvin on the history of her native Romania, Ethan Mollick, Tyler Cowen, Nabeel Qureshi, Taelin, on formal benchmarks, speculated, a “reverse DeepSeek moment”, with Peter, this tweet by Shakeel, https://substackcdn.com/image/fetch/$s_!GJNZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ba0d8cf-fab8-4370-bcad-df789e157fdc_591x402.png, Wylfcen on X, Zvi points out that, AI fantasy flash fiction Turing test, customized “In This House We Believe” signs, China think tank assessment of how in control Xi is, xlr8harder, Chelsea Voss of OpenAI is having a baby, Hector 
(cloud), demand that British cosmetics stop listing their ingredients in Latin, Text-based RPG about being an NYT journalist at the Manifest prediction market conference, finds that it is quite bad, violently skeptical, literally so?, This tweet, https://substackcdn.com/image/fetch/$s_!S9fU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa558c09b-7fb6-40a8-a8a0-27b658a2c876_576x687.png, describes his experience as one of Hulk Hogan’s attorneys in the Gawker lawsuit (X), link on X, https://substackcdn.com/image/fetch/$s_!zyh7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75e9f0f6-d794-4ea2-b24b-5d4803bf28dc_590x478.png, New study claims consultants are actually good, tries to test Peter Turchin’s equations for predicting political unrest on recent Polish history, The Argument, a post on the latest round of First World basic income studies, criticizes the article, infant brain waves, debate on X, has a presponse here, first foray into housing policy
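The twin-study argument in item 15 of the passage above (identical twins share all gene-gene interactions, fraternal twins fewer than half) can be made concrete with a small worked example. This is a minimal sketch, not anything taken from the linked posts. It assumes a toy variance-components model with only additive genetic variance, pairwise (additive-by-additive) interaction variance, and unshared environment, and it uses the textbook result that fraternal twins share 1/2 of additive variance but only 1/4 of pairwise interaction variance. The function names and example numbers are hypothetical. The estimator is the standard Falconer formula, h² = 2(r_MZ - r_DZ), which assumes there are no interactions.

```python
from fractions import Fraction


def twin_correlations(v_add, v_inter, v_env):
    """Expected identical (MZ) and fraternal (DZ) twin correlations.

    Toy model (an assumption for illustration): phenotypic variance is
    additive + pairwise-interaction + unshared-environment variance.
    MZ twins share all genetic variance; DZ twins share 1/2 of the
    additive part and 1/4 of the pairwise-interaction part.
    """
    v_total = v_add + v_inter + v_env
    r_mz = (v_add + v_inter) / v_total
    r_dz = (Fraction(1, 2) * v_add + Fraction(1, 4) * v_inter) / v_total
    return r_mz, r_dz


def falconer_h2(r_mz, r_dz):
    """Falconer's estimate, which assumes purely additive genetic effects."""
    return 2 * (r_mz - r_dz)


if __name__ == "__main__":
    # Hypothetical trait: 40% additive, 20% interaction, 40% environment.
    v_add, v_inter, v_env = Fraction(2, 5), Fraction(1, 5), Fraction(2, 5)
    r_mz, r_dz = twin_correlations(v_add, v_inter, v_env)
    print("MZ correlation:", r_mz)                          # 3/5
    print("DZ correlation:", r_dz)                          # 1/4
    print("Falconer estimate:", falconer_h2(r_mz, r_dz))    # 7/10
    print("True additive share:", v_add)                    # 2/5
```

In this toy case the no-interaction formula reports 70% heritability when the additive share is 40% and the total genetic share is 60%, so it overshoots both. With the interaction term set to zero, the same formula recovers the additive share exactly.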
Eliezer Yudkowsky’s Machine Intelligence Research Institute is the original AI safety org. But the original isn’t always the best - how is Mesopotamia doing these days? As money, brainpower, and prestige pour into the field, MIRI remains what it always was - a group of loosely-organized weird people, one of whom cannot be convinced to stop wearing a sparkly top hat in public. So when I was doing AI grantmaking last year, I asked them - why should I fund you, instead of the guys with the army of bright-eyed Harvard grads, or the guys who just got Geoffrey Hinton as their celebrity spokesperson? What do you have that they don’t?
Inline links: Machine Intelligence Research Institute, sparkly top hat
Despite my gripes above, this is an impressive book. Eliezer Yudkowsky is a divisive writer, with plenty of diehard fans and equally committed enemies. At his best, he has leaps of genius nobody else can match; at his worst, he’s prone to long digressions about how stupid everyone who disagrees with him is. Nate Soares is equally thoughtful but more measured and lower-profile (at least before he started dating e-celebrity Aella). His influence tempers Yudkowsky’s and turns the book into a presentable whole that respects its readers’ time and intelligence. The end result is something which I would feel comfortable recommending to ordinary people as a good introduction to its subject matter.
Eliezer Yudkowsky, at his best, has leaps of genius nobody else can match. Fifteen years ago, he decided that the best way to something something AI safety was to write a Harry Potter fanfiction. Many people at the time (including me) gingerly suggested that maybe this was not optimal time management for someone who was approximately the only person working full-time on humanity’s most pressing problem. He totally demolished us and proved us wronger than anyone has ever been wrong before. Hundreds of thousands of people read Harry Potter and the Methods of Rationality, it got lavish positive reviews in Syfy, Vice, and The Atlantic, and it basically one-shotted a substantial percent of the world’s smartest STEM undergrads. Fifteen years later, I still meet bright young MIT students who tell me they’re working on AI safety, and when I ask them why in public they say something about their advisor, and then later in private they admit it was the fanfic. Valuing the time of the average AI genius at the rate set by Sam Altman (let alone Mark Zuckerberg), HPMOR probably bought Eliezer a few billion dollars in free labor. Just a totally inconceivable level of victory.
2: Comments of the week: neuroscientists on the synaptic memory review (1, 2, 3); comments by Eliezer on my review of his book (on when to lift a ban, on the parallel scaling story)
21: Eliezer (X): the folk theory of economic bubbles says they’re bad for the economy because lots of money gets invested inefficiently into something which turns out to be useless. But this can’t be right, because the economy is doing fine while the bad investment is going on! It’s only afterwards, when people realize the investment was bad, that the economy starts to falter (cf. the Wile E. Coyote theory of gravity, where walking off a cliff is fine, but noticing that you walked off a cliff is ruinous). So what’s the real reason bubbles are bad? “Macroeconomic financial bullshit involving scary terms like ‘aggregate demand’ and concepts like ‘downward wage rigidity’”. Interested to know if orthodox economists agree.
Inline links: Eliezer (X)
38: Eliezer and Nate’s book If Anyone Builds It, Everyone Dies is now out and is an NYT bestseller. Authors’ Atlantic article here (paywalled). Online resources/FAQ/answers to objections here. My review here. Peter Wildeford’s review here. Mostly negative Asterisk review here, criticisms/arguments about the Asterisk review here, Eliezer’s response to this line of criticism here (X). I thought all the reviews, positive and negative, had something useful to say - except the NYT review, which was remarkably bad (Steven Adler points out that it accuses the book of failing to define the term “superintelligence”, but it very explicitly does that on page 4). I read Literary Substack sometimes, and I am so confused - it seems like there’s this entire ecosystem of Ivy graduates who spend years backstabbing each other in order to win the one bigshot publication book reviewer slot, and then the 1/1000 who reach this exalted position phone it in and don’t even read the books they’re reviewing.
Inline links: If Anyone Builds It, Everyone Dies, an NYT bestseller, here, here, here, here, here, here, here (X), was remarkably bad
1: Another charity fundraiser, this one for Lightcone Infrastructure. Lightcone is the group that does the hard work for many of the rationalist community resources you enjoy. You probably know them from the Less Wrong website and the Lighthaven campus. But did you know they also designed the websites for AI 2027, for Eliezer and Nate’s book, for AI Lab Watch, and (for some reason) for Deciding To Win, a renegade faction of Democrats who believe that, instead of supporting unpopular policies and losing, the party should support popular policies and win? And on the side, they play a big role in hosting ACX meetups, including letting us use their campus (if you’ve ever been to our Berkeley meetup location, that was them). They’re a rare intersection between “support effective altruist charities” and “support pillars of your local community”. Donate here, or contact Oli if you have some kind of more complicated donation-related need.
Inline links: Lightcone Infrastructure, Less Wrong, Lighthaven, AI 2027, Eliezer and Nate’s book, AI Lab Watch, Deciding To Win, here
Backlinks
- ACX Grants ++: The Second Half
- AI 2027
- AI 2027 team
- Ajeya
- AlphaGo
- Arabia
- Bayes For Everyone
- Bayes’ Theorem
- Biological Anchors: A Trick That Might Or Might Not Work
- Book Review: If Anyone Builds It, Everyone Dies
- Books: E
- Brands
- CHAI, Assistance Games, And Fully-Updated Deference
- Chesterton’s Fence
- Claude
- Cocaine Bear
- Concepts: 0-9
- Concepts: A
- Concepts: B
- Concepts: F
- Concepts: G
- Concepts: H
- Concepts: L
- Concepts: M
- Concepts: P
- Concepts: S
- Concepts: T
- Deceptively Aligned Mesa-Optimizers: It’s Not Funny If I Have To Explain It
- Douglas Hofstadter
- Egan
- Eliezer Yudkowsky
- Erik Hoel
- Events: A
- Events: M
- Films
- Galileo
- Geoffrey Hinton
- Go
- GPT
- Greek miracle
- Gwern
- GWWC
- Half An Hour Before Dawn In San Francisco
- Highlights From The Comments On Missing School
- Hollywood
- If Anyone Builds It, Everyone Dies
- Jesse Singal
- Kieran Egan
- Links For April 2023
- Links For August 2023
- Links For February 2025
- Links For May 2023
- Links For October 2025
- Links For September 2025
- LK-99
- Manifest prediction market conference
- 23
- Maria
- Martin Luther
- Marx
- Matt Bruenig
- Matthew Barnett
- MCTS
- Miles Brundage
- MIRI
- Motivated Reasoning As Mis-applied Reinforcement Learning
- MR Tries The Safe Uncertainty Fallacy
- Nate Soares
- Oklahoma
- Oli
- Open Thread 205
- Open Thread 250
- Open Thread 383
- Open Thread 399
- Open Thread 413
- Organizations: A
- Organizations: E
- Organizations: G
- Organizations: M
- Organizations: P
- People: E
- People: G
- People: H
- People: I
- People: M
- People: R
- People: S
- PEPFAR
- Peter Wildeford
- Places: A
- Places: E
- Places: L
- Practically-A-Book Review: Yudkowsky Contra Ngo On Agents
- Publications: L
- Publications: T
- Publications: X
- Publications: Y
- Rafael Harth
- Ray Kurzweil
- RFK
- Richard Ngo
- Rob Miles
- Sebastian Jensen
- Shingles vaccine
- Spartacus
- Star Wars
- Steven Adler
- Terminal lucidity
- Turing Test
- UC Berkeley
- Why Do I Suck?
- XKCD
- Your Book Review: The Educated Mind
- Yudkowsky Contra Christiano On AI Takeoff Speeds
- Yudkowsky contra Cotra on biological anchors