OpenAI

Article

OpenAI is a recurring organization in the Astral Codex Ten archive, appearing 83 times across 83 issues between May 20, 2021 and April 06, 2026. The archive places it in contexts such as “OpenAI’s Jukebox, which is basically GPT-3 for music”; “a music generation algorithm produced by OpenAI”; “most of OpenAI’s top alignment researchers”. It most often appears alongside Anthropic, Google, and China.

Metadata

  • Category: Organizations
  • Mention count: 83
  • Issue count: 83
  • First seen: May 20, 2021
  • Last seen: April 06, 2026

Appears In

Source Context

Recovered passages from the original issue text. When the raw archive preserved outbound links inside the source passage, they are listed directly under the quote.

May 20, 2021 · Original source
4: I’m very late here, but you might still enjoy OpenAI’s Jukebox, which is basically GPT-3 for music. Train it on Elvis, then make it write new songs in his style. Or feed it the first few verses of Never Gonna Give You Up and make it guess what the rest of the song sounds like. Or just have Celine Dion sing a song about being a music generation algorithm produced by OpenAI.
35: Recent news in local AI alignment research space: most of OpenAI’s top alignment researchers, including Dario Amodei, Chris Olah, Jack Clark, and Paul Christiano, left en masse for poorly-understood reasons (see speculation here). Dario Amodei is now working with a new nonprofit called Cooperative AI Foundation. Paul Christiano will be founding his own nonprofit, the Alignment Research Center (conflict of interest notice: I know Paul and think he is generally great); see also his ask-me-anything thread on Less Wrong here. Unrelatedly, local secretive AI alignment research group MIRI (Machine Intelligence Research Institute) is leaving the Bay Area for some small town with affordable land prices where they can maybe build a campus (they’re still trying to decide exactly where).
August 06, 2021 · Original source
My personal estimates are more like 75% chance, 25% chance, and a distribution that peaks about 20 years later than this one. I think the Metaculus position is consistent with all of “this probably won’t happen”, “THIS IS SUPER-TERRIFYING”, “this is most likely far away”, and “BUT FOR ALL WE KNOW IT COULD BE TOMORROW!” I realize this is an annoying way for things to be. ————————————————— CraigMichael writes: >But all the AI regulation in the world won’t help us unless we humans resist the urge to spread misinformation to maximize clicks. Was with you up to this point. There are several solutions to this other than willpower (resisting the urge). The basic idea - change incentives so that spreading misinformation is possible but substantially less desirable/lucrative than other options for online behavior. This isn’t so hard to imagine. Say there’s a lot of incentives to earn money online doing creative or useful things. Like Mechanical Turk, but less rote behavior and more performing a service or matching needs. Like I wish I had a help desk for English questions where the answers were good and not people posturing to look good to other people on the English Stack Exchange, for example. I would pay them per call or per minute or whatever. Totally unexplored market AFAIK because technology hasn’t been developed yet. Another idea - Give people more options to pay at an article-level for information that’s useful to them or to have related questions answered or something like that without needing a subscription or a bundle. Say there’s some article about anything and I want to contact the author and be like “hey, here’s a related question, I’m willing to offer you X dollars to answer.” The person says “I’ll do it for x+10 dollars.” One site used to unlock articles to the public after a threshold of Bitcoin had been donated on a PPV basis. It both incentivized the author and had a positive externality.
Everyone is so invested in ads that they don’t work on technology and ideas to create new markets. To paraphrase Jaron Lanier we need to make technology so good it seduces away from destroying ourselves. Partly I want to complain that obviously I was using the quoted sentence as a rhetorical device. But I guess the whole point of that sentence and its paragraph was to argue against saying false things as a rhetorical device, so - hoist on my own petard, I guess. I’m less optimistic than Craig is about this solution, because it seems to me that socially virtuous technology will always be less fun/addictive than nonvirtuous technology, simply because the virtuous technology has to hit two targets (virtuous, fun/addictive), the nonvirtuous technology only has to hit one target, and it’s easier to optimize for a target with zero other constraints than with one other constraint. See eg Meditations on Moloch. ————————————————— Souf asks: Is there a convincing argument that AGI is possible within any reasonable timeframe (like... 50 years), other than the intuitions of esteemed AI researchers? Do they have any way to back up their estimates (of some tens of percent), and why they shouldn't be millionths of a percent? It is, as another poster said, an "extraordinary claim." I'd like to see some extraordinary support of those particular numbers. If I had to answer this question, I would point to the sorts of work AI Impacts does, where they try to estimate how capable computers were in 1980, 1990, etc, draw a line to represent the speed at which computers are becoming more capable, figure out where humans are at the same metric, and check the time when that line crosses however capable you’ve decided humans are. This is obviously really hard because you have to operationalize some definition of “capable” or “intelligent” or some other word that is hard to operationalize, but when you do it you usually get sometime in the mid-21st century. 
You’re going to point out that this argument doesn’t really qualify as “convincing”. I admit it doesn’t meet trial-by-jury standards of evidence. So I guess my real answer would be “it’s the #$@&ing prior”. Like, you certainly don’t have knock-down evidence that it’s impossible, I don’t have knock-down evidence that it’s certain, so it might happen and it might not. How “might” are we talking? I don’t know, it would seem weird if this quickly-advancing technology being researched by incredibly smart people with billions of dollars in research funding from lots of megacorporations just reached some point and then stopped. Okay, fine, maybe it will keep advancing at the same rate, how fast is that in terms of time-to-AGI? Now we’re back at AI Impacts drawing lines again. The stupidest possible prior is always 50-50. We would have to be very stupid people to use the stupidest possible prior. But here we are. I wouldn’t want to give a 50-50 chance of us inventing FTL travel by 2100, because FTL travel seems physically impossible. I wouldn’t want to give a 50-50 chance of us inventing slower-than-light-but-still-pretty-good starships by 2100, because, I dunno, space travel isn’t advancing that fast and nobody is really working on it that hard. For AI, I don’t know, I kinda want to say 50-50. If I were going to try to update away from 50-50, I would want to look at AI Impacts style line graphs, expert opinion, and prediction markets. All of those seem to make me update up instead of down, so I don’t think I would go lower than 50-50. But there’s enough Knightian uncertainty to make an entire Round Table here, so who knows? Hardly a “convincing” argument, but I’m just trying to avoid the McAfee Fallacy: ————————————————— Souf continues: The argument that we are "in the middle of a period of extremely rapid progress in AI research, when barrier after barrier is being breached" makes it seem like all AI "progress" is on some sort of line that ends in AGI.
That feels like sleight-of-hand. Even Scott himself refers to AGI here as a "new class of actor," so I'm failing to see how current lines of "progress" will indubitably result in the emergence of something completely novel and different? Lots of smart people disagree with me on this one, but I think the path from here to AGI is pretty straight. I mean, it will take thousands of people who are all much smarter than I am to do it, but it’ll happen. My argument is something like - human brains are remarkably similar to rat brains, only much bigger. They’re still a little similar to insect brains. It looks like if you have a basic functioning brain, and you scale it up, it gets human intelligence. Existing AIs like AlphaGo or GPT seem to be basically a blob of learning-ability, a plan for pointing the blob at a specific problem, and lots and lots of training data. I think the past five years have shown that this basic model generalizes really well. OpenAI’s programs can now write essays, compose music, and generate pictures, not because they had three parallel amazing teams working on writing/music/art AIs, but because they took a blob of learning ability and figured out how to direct it at writing/music/art, and they were able to get giant digital corpuses of text / music / pictures to train it. DeepMind is finding that it can win lots of games, from Go to StarCraft to obstacle courses in simulated environments, by pointing a blob of learning-ability at the game and making it play against itself a zillion times (ie generate its own training data). My impression is that human/rat/insect brains are a blob of learning-ability which the rest of the nervous system successfully points at the world, and especially at aspects of the world that the organism needs to pay attention to (eg food sources, sex, etc). This isn’t exactly right, there are a few genetically-encoded programs, but not that many and it’s pretty hard.
Right now I think our main advantages over AI systems are something like: our nervous system is pretty good at pointing us at the world and extracting training data from it. If you wanted an AI that learned being-in-the-world skills as well as we do, it would have to have an amazing robot body, and right now robot bodies aren’t that amazing.
November 01, 2021 · Original source
GPT Codex is an AI that auto-completes code for programmers. You can see a really amazing and/or rigged demo here:
Programmers who have worked with it are really impressed, but also say it’s not quite ready for real jobs. Certainly OpenAI would like to make it ready as soon as possible.
January 19, 2022 · Original source
The story thus far: AI safety, which started as the hobbyhorse of a few weird transhumanists in the early 2000s, has grown into a medium-sized respectable field. OpenAI, the people responsible for GPT-3 and other marvels, have a safety team. So do DeepMind, the people responsible for AlphaGo, AlphaFold, and AlphaWorldConquest (last one as yet unreleased). So do Stanford, Cambridge, UC Berkeley, etc, etc. Thanks to donations from people like Elon Musk and Dustin Moskovitz, everyone involved is contentedly flush with cash. They all report making slow but encouraging progress.
I've been trying to trudge through them and I figure I might as well blog about the ones I've finished. The first of these is Eliezer's talk with Richard Ngo, of OpenAI's Futures team. You can find the full transcript here, though be warned: it is very long.
February 23, 2022 · Original source
Ajeya Cotra is a senior research analyst at OpenPhil. She's assisted by her fiancé Paul Christiano (compsci PhD, OpenAI veteran, runs an AI alignment nonprofit) and to a lesser degree by other leading lights. Although not everyone involved has formal ML training, if you care a lot about whether efforts are “establishment” or “contrarian”, this one is probably more establishment.
It looks like this (source). So why don’t we have AI yet? Why don’t we have ten AIs? In the modern paradigm of machine learning, it takes very big computers to train relatively small end-product AIs. If you tried to train GPT-3 on the same kind of medium-sized computers you run it on, it would take between tens and hundreds of years. Instead, you train GPT-3 on giant supercomputers like the ones above, get results in a few months, then run it on medium-sized computers, maybe ~10x better than the average desktop. But our hypothetical future human-level AI is 10^16 FLOP/S in inference mode. It needs to run on a giant supercomputer like the one in the picture. Nothing we have now could even begin to train it. There’s no direct and obvious way to convert inference requirements to training requirements. Ajeya tries assuming that each parameter will contribute about 10 FLOPs, which would mean the model would have about 10^15 parameters (GPT-3 has about 10^11 parameters). Finally, she uses some empirical scaling laws derived from looking at past machine learning projects to estimate that training 10^15 parameters would require H*10^30 FLOPs, where H represents the model’s “horizon”. If I understand this correctly, “horizon” is a reinforcement learning concept: how long does it take to learn how much reward you got for something? If you’re playing a slot machine, the answer is one second. If you’re starting a company, the answer might be ten years. So what horizon do you need for human level AI? Who knows? It probably depends on what human-level task you want the AI to do, plus how well an AI can learn to do that task from things less complex than the entire task. If writing a good book is mostly about learning to write good sentences and then stringing them together, a book-writing AI can get away with a short horizon.
If nothing short of writing an entire book and then evaluating it to see whether it is good or bad can possibly teach you book-writing, the AI will need a long time horizon. Ajeya doesn’t claim to have a great answer for this, and considers three models: horizons of a few minutes, a few hours, and a few years. Each step up adds another three orders of magnitude, so she ends up with three estimates of 10^30, 10^33, and 10^36 FLOPs. (for reference, the lowest training estimate - 10^30 - would take the supercomputer pictured above 300,000 years to complete; the highest, 300 billion.) Or What If We Ignore All Of That And Do Something Else? This is piling a lot of assumptions atop each other, so Ajeya tries three other methods of figuring out how hard this training task is. Humans seem to be human-level AIs. How much training do we need? You can analogize our childhood to an AI’s training period. We receive a stream of sense-data. We start out flailing kind of randomly. Some of what we do gets rewarded. Some of what we do gets punished. Eventually our behavior becomes more sophisticated. We subject our new behavior to reward or punishment, fine-tune it further. Rent asks us: how do you measure the life of a woman or man? It answers: “in daylights, in sunsets, in midnights, in cups of coffee; in inches, in miles, in laughter, in strife.” But you can also measure in floating point operations, in which case the answer is about 10^24. This is actually trivial: multiply the 10^15 FLOP/S of the human brain by the ~10^9 seconds of childhood and adolescence. This new estimate of 10^24 is much lower than our neural net estimate of 10^30 - 10^36 above. In fact, it’s only a hair above the amount it took to train GPT-3! If human-level AI was this easy, we should have hit it by accident sometime in the process of making a GPT-4 prototype. Since OpenAI hasn’t mentioned this, probably it’s harder than this and we’re missing something. 
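The anchor arithmetic quoted so far is simple enough to check by hand; a minimal Python sketch follows. It only restates the order-of-magnitude figures from the passage (the variable names are illustrative, not taken from the report) and uses exact integers so no rounding creeps in.

```python
# Order-of-magnitude anchors quoted in the passage, as exact integers.
BRAIN_FLOP_PER_S = 10**15      # brain estimate used for the lifetime anchor
CHILDHOOD_SECONDS = 10**9      # childhood and adolescence, as rounded in the passage
lifetime_anchor = BRAIN_FLOP_PER_S * CHILDHOOD_SECONDS  # total FLOPs

# Neural-net anchors: each step up in horizon adds three orders of magnitude.
SHORT_HORIZON_FLOPS = 10**30
neural_net_anchors = {
    name: SHORT_HORIZON_FLOPS * 10 ** (3 * step)
    for step, name in enumerate(["short", "medium", "long"])
}


def order_of_magnitude(n: int) -> int:
    """Return k for an exact power of ten n == 10**k."""
    return len(str(n)) - 1


print("lifetime anchor: 10^%d FLOPs" % order_of_magnitude(lifetime_anchor))
for name, flops in neural_net_anchors.items():
    print("%s horizon: 10^%d FLOPs" % (name, order_of_magnitude(flops)))

# How far the lifetime anchor sits below the cheapest neural-net anchor.
gap = order_of_magnitude(SHORT_HORIZON_FLOPS) - order_of_magnitude(lifetime_anchor)
print("gap: %d orders of magnitude" % gap)
```

The six-order gap between the lifetime anchor and the short-horizon neural-net anchor is the discrepancy the passage goes on to explain.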
Probably we’re missing that humans aren’t blank slates. We don’t start at zero and then only use our childhood to train us further. The very structure of our brain encodes certain assumptions about what kinds of data we should be looking out for and how we should use it. Our training data isn’t just what we observed during childhood, it’s everything that any of our ancestors observed during evolution. How many floating-point operations is the evolutionary process? Ajeya estimates 10^41. I can’t believe I’m writing this. I can’t believe someone actually estimated the number of floating point operations involved in jellyfish rising out of the primordial ooze and eventually becoming fish and lizards and mammals and so on all the way to the Ascent of Man. Still, the idea is simple. You estimate how long animals with neurons have been around for (10^16 seconds), total number of animals at any given second (10^20) times average number of FLOPS per animal (10^5) and you can read more here but it comes out to 10^41 FLOPs. I would not call this an exact estimate - for one thing, it assumes that all animals are nematodes, on the grounds that non-nematode animals are basically a rounding error in the grand scheme of things. But it does justify this bizarre assumption, and I don’t feel inclined to split hairs here - surely the total amount of computation performed by evolution is irrelevant except as an extreme upper bound? Surely the part where Australia got all those weird marsupials wasn’t strictly necessary for the human brain to have human-level intelligence? One more weird human training data estimate attempt: what about the genome? If in some sense a bit of information in the genome is a “parameter”, how many parameters does that suggest humans have, and how does it affect training time? Ajeya calculates that the genome has about 7.5x10^8 parameters (compared to 10^15 parameters in our neural net calculation, and 10^11 for GPT-3).
So we can… Okay, I’ve got to admit, this doesn’t have quite the same “huh?!” factor as trying to calculate the number of FLOPs in evolution, but it is in a lot of ways even crazier. The Japanese canopy plant has a genome fifty times larger than ours, which suggests that genome size doesn’t correspond very well to organism awesomeness. Also, most of the genome is coding for weird proteins that stabilize the shape of your kidney tubule or something, why should this matter for intelligence? The Japanese canopy plant. I think it is very pretty, but probably low prettiness per megabyte of DNA. I think Ajeya would answer that she’s debating orders of magnitude here, and each of these weird things costs only a few OOMs and probably they all even out. That still leaves the question of why she thinks this approach is interesting at all, to which she answers that: The motivating intuition is that evolution performed a search over a space of small, compact genomes which coded for large brains rather than directly searching over the much larger space of all possible large brains, and human researchers may be able to compete with evolution on this axis. So maybe instead of having to figure out how to generate a brain per se, you figure out how to generate some short(er) program that can output a brain? But this would be very different from how ML works now. Also, you need to give each short program the chance to unfold into a brain before you can evaluate it, which evolution has time for but we probably don’t. Ajeya sort of mentions these problems and counters with an argument that maybe you could think of the genome as a reinforcement learner with a long horizon. I don’t quite follow this but it sounds like the sort of thing that almost might make sense. Anyway, when you apply the scaling laws to a 7.5*10^8 parameter genome and penalize it for a long horizon, you get about 10^33 FLOPs, which is weirdly similar to some of the other estimates.
So now we have six different training cost estimates. First, neural nets with short, medium, and long horizons, which are 10^30, 10^33, and 10^36 FLOPs, respectively. Next, the amount of training data in a human lifetime - 10^24 FLOPs - and in all of evolutionary history - 10^41 FLOPs. And finally, this weird genome thing, which is 10^33 FLOPs. An optimist might say “Well, our lowest estimate is 10^24 FLOPs, our highest is 10^41 FLOPs, those sound like kind of similar numbers, at least there’s no “5 FLOPs” or “10^9999 FLOPs” in there.” A pessimist might say “The difference between 10^24 and 10^41 is seventeen orders of magnitude, ie a factor of 100,000,000,000,000,000 times. This barely constrains our expectations at all!” Before we decide who to trust, let’s remember that we’re still only at Step 2 of our eight step Methodology, and continue. How Do We Adjust For Algorithmic Progress? So today, in 2022 (or in 2020 when this was written, or whenever), assume it would take about 10^33 FLOPs to train a human-level AI. But technology constantly advances. Maybe we’ll discover ways to train AIs faster, or run AIs more efficiently, or something like that. How does that factor into our estimate? Ajeya draws on Hernandez & Brown’s Measuring The Algorithmic Efficiency Of Neural Networks. They look at how many FLOPs it took to train various image recognition AIs to an equivalent level of performance between 2012 and 2019, and find that over those seven years it decreased by a factor of 44x, ie training efficiency doubles every sixteen months! Ajeya assumes a doubling time slightly longer than that, because it’s easier to make progress in simple well-understood fields like image recognition than in the novel task of human-level AI. She chooses a doubling time of “merely” 2 - 3 years. If training efficiency doubles every 2-3 years, it would dectuple in about 10 years.
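The evolutionary anchor described above is a product of three quoted factors. A small Python check, with hypothetical variable names and only the figures given in the passage, might look like this:

```python
# Factors quoted in the passage for the evolutionary anchor (exact integers).
SECONDS_WITH_NEURONS = 10**16     # how long animals with neurons have existed
ANIMALS_ALIVE = 10**20            # animals alive at any given second
FLOP_PER_S_PER_ANIMAL = 10**5     # average compute per (nematode-like) animal

evolution_anchor = SECONDS_WITH_NEURONS * ANIMALS_ALIVE * FLOP_PER_S_PER_ANIMAL

# Compare against the human-lifetime anchor of 10^24 FLOPs quoted earlier.
LIFETIME_ANCHOR = 10**24
ratio = evolution_anchor // LIFETIME_ANCHOR

print("evolution anchor has %d digits after the leading 1" % (len(str(evolution_anchor)) - 1))
print("evolution / lifetime ratio has %d digits after the leading 1" % (len(str(ratio)) - 1))
```

The ratio is the "seventeen orders of magnitude" spread that the passage later treats as the distance between the lowest and highest training estimates.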
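The doubling-time claims above can be reproduced with a couple of logarithms. This is a rough sketch using only the numbers quoted in the passage (a 44x gain over seven years, and doubling times of 2 to 3 years); the function names are illustrative.

```python
import math


def doubling_time_months(total_factor: float, years: float) -> float:
    """Months per doubling, given an overall efficiency gain over a span of years."""
    doublings = math.log2(total_factor)
    return years * 12 / doublings


def years_to_tenfold(doubling_time_years: float) -> float:
    """Years needed for a 10x gain at a fixed doubling time."""
    return doubling_time_years * math.log2(10)


hernandez_brown_months = doubling_time_months(44, 7)
print("44x over 7 years: one doubling every %.1f months" % hernandez_brown_months)

for d in (2.0, 2.5, 3.0):
    print("doubling every %.1f years: 10x in %.1f years" % (d, years_to_tenfold(d)))
```

The first figure lands a little under sixteen months, and the tenfold times fall between roughly seven and ten years, which matches the passage's "about 10 years" at the slow end of the range.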
So although it might take 10^33 FLOPs to train a human level AI today, in ten years or so it may take only 10^32, in twenty years 10^31, and so on. When Will Anyone Have Enough Computational Resources To Train A Human-Level AI? In 2020, AI researchers could buy computational resources at about $1 for 10^17 FLOPs. That means the 10^33 FLOPs you’d need to train a human-level AI would cost $10^16, ie ten quadrillion dollars. This is about twenty times more money than exists in the entire world. But compute costs fall quickly. Some formulations of Moore’s Law suggest it halves every eighteen months. These no longer seem to hold exactly, but it does seem to be halving maybe once every 2.5 years. The exact number is kind of controversial: Ajeya admits it’s been more like once every 3-4 years lately, but she heard good things about some upcoming chips and predicted it might revert back to the longer-term faster trend (it’s been two years now, some new chips have come out, and this prediction is looking pretty good). So as time goes on, algorithmic progress will cut the cost of training (in FLOPs), and hardware progress will also cut the cost of FLOPs (in dollars). So training will become gradually more affordable as time goes on. Once it reaches a cost somebody is willing to pay, they’ll buy human-level AI, and then that will be the year human-level AI happens. What is the cost that somebody (company? government? billionaire?) is willing to pay for human-level AI? The most expensive AI training in history was AlphaStar, a DeepMind project that spent over $1 million to train an AI to play StarCraft (in their defense, it won). But people have been pouring more and more money into AI lately: Source here. This is about compute rather than cost, but most of the increase seen here has been companies willing to pay for more compute over time, rather than algorithmic or hardware progress. 
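To make the cost argument concrete, here is a toy Python model of training cost in dollars over time. The 2020 starting values come from the passage; the 2.5-year algorithmic halving time is an assumed midpoint of the quoted 2 to 3 year range, and the function itself is a hypothetical illustration, not the model used in the report.

```python
TRAINING_FLOPS_2020 = 1e33      # FLOPs to train a human-level AI, per the passage
FLOPS_PER_DOLLAR_2020 = 1e17    # compute price in 2020, per the passage


def training_cost_dollars(year: float,
                          hardware_halving_years: float = 2.5,
                          algorithmic_halving_years: float = 2.5) -> float:
    """Toy estimate: both FLOPs needed and dollars per FLOP halve on fixed schedules.

    The 2.5-year algorithmic default is an assumption (midpoint of 2-3 years).
    """
    elapsed = year - 2020
    flops_needed = TRAINING_FLOPS_2020 * 0.5 ** (elapsed / algorithmic_halving_years)
    flops_per_dollar = FLOPS_PER_DOLLAR_2020 * 2 ** (elapsed / hardware_halving_years)
    return flops_needed / flops_per_dollar


for year in (2020, 2030, 2040, 2050):
    print(year, "%.2e dollars" % training_cost_dollars(year))
```

With both halving times at 2.5 years, the dollar cost halves every 1.25 years, so each decade cuts it by a factor of 2^8 = 256. Changing either default shows how sensitive the cost curve is to the contested halving times.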
The StarCraft AI was kind of a vanity project, or science for science’s sake, or whatever you want to call it. But AI is starting to become profitable, and human-level AI would be very profitable. Who knows how much companies will be willing to pay in the future? Ajeya extrapolates the line on the graph forward to 2025 and gets $1 billion. This is starting to sound kind of absurd - the entire company OpenAI was founded with $1 billion in venture capital, it seems like a lot to expect them to spend more than $1 billion on a single training run. So Ajeya backs off from this after 2025 and predicts a “two year doubling time”. This is not much of a concession. It still means that in 2040 someone might be spending $100 billion to train one AI. Is this at all plausible? At the height of the Manhattan Project, the US was investing about 0.5% of its GDP into the effort; a similar investment today would be worth $100 billion. And we’re about twice as rich as 2000, so 2040 might be twice as rich as we are. At that point, $100 billion for training an AI is within reach of Google and maybe a few individual billionaires (though it would still require most or all of their fortune). Ajeya creates a complicated function to assess how much money people will be willing to pay on giant AI projects per year. This looks like an upward-sloping curve. The line representing the likely cost of training a human-level AI looks like a downward sloping curve. At some point, those two curves meet, representing when human-level AI will first be trained. So When Will We Get Human-Level AI? The report gives a long distribution of dates based on weights assigned to the six different models, each of which has really wide confidence intervals and options for adjusting the mean and variance based on your assumptions. But the median of all of that is 10% chance by 2031, 50% chance by 2052, and almost 80% chance by 2100. 
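The spending side of the argument can be sketched the same way. The snippet below assumes only what the passage states: $1 billion for the largest training run in 2025, doubling every two years after that, and a Manhattan Project share of 0.5% of GDP being worth about $100 billion today. The function name is hypothetical, and this is not the "complicated function" the report actually uses.

```python
def max_training_spend(year: float) -> float:
    """Toy willingness-to-pay curve: $1B in 2025, doubling every two years after."""
    if year < 2025:
        raise ValueError("this sketch only covers 2025 onward")
    return 1e9 * 2 ** ((year - 2025) / 2)


spend_2040 = max_training_spend(2040)
print("largest training run in 2040: %.2e dollars" % spend_2040)

# GDP implied by "0.5% of GDP would be worth $100 billion today".
implied_gdp = 100e9 / 0.005
print("implied GDP: %.1e dollars" % implied_gdp)
```

Fifteen years at a two-year doubling time is 7.5 doublings, so the 2040 figure comes out on the order of $100 billion, the same order of magnitude the passage cites.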
Ajeya takes her six models and decides to weigh them like so, based on how plausible she thinks each one is:

  • 20% neural net, short horizon
  • 30% neural net, medium horizon
  • 15% neural net, long horizon
  • 5% human lifetime as training data
  • 10% evolutionary history as training data
  • 10% genome as parameter number

She ends up with this: How Sensitive Is This To Changes In Assumptions? She very helpfully gives us a Colab notebook and Google spreadsheet to play around with. The notebook lets you change some of the more detailed parameters of the individual models, and the spreadsheet lets you change the big picture. I leave the notebook to people more dedicated to forecasting than I am, and will talk about the spreadsheet here. If you’re following along at home, the default spreadsheet won’t reflect Ajeya’s findings until you fill in the table in the bottom left like so: Great. Now that we’ve got that, let’s try changing some stuff. I like the human childhood training data argument (Lifetime Anchor) more than Ajeya does, and I like the size-of-the-genome argument less. I’m going to change the weights to 20-20-0-20-20-20. Also, Ajeya thinks that someone might be willing to spend 1% of national GDP on training AIs, but that sounds really high to me, so I’m going to lower it to 0.1%. Also, Ajeya’s estimate of 3% GDP growth sounds high for the sort of industrialized nations who might do AI research, I’m going to lower it to 2%. Since I’m feeling mistrustful today, let’s use the Hernandez & Brown estimate for compute halving (1.5 years) in place of Ajeya’s ad hoc adjustments. And let’s use the current compute halving time (3.5 years) instead of Ajeya’s overly rosy version (2.5 years). All these changes… …don’t really do much. The median goes from 2052 to about 2065. Four of the models give results between 2030 and 2070.
The last two, Neural Net With Long Horizon and Evolution, suggest probably no AI this century (although Neural Net With Long Horizon does think there’s a 40% chance by 2100). Ajeya doesn’t really like either of these models and they’re not heavily weighted in her main result. Does The Truth Point To Itself? Back up a second. Here’s something that makes me kind of nervous. Most of Ajeya’s numbers are kind of made up, with several order-of-magnitude error bars and simplifying assumptions like “all animals are nematodes”. For a single parameter, we get estimates spanning seventeen different orders of magnitude: the upper bound is one hundred quadrillion times the lower bound. And yet four of the six models, including two genuinely exotic ones, manage to get dates within twenty years of 2050. And 2050 is also the date everyone else focuses on. Here’s the prediction-market-like site Metaculus: Their distribution looks a lot like Ajeya’s, and even has the same median, 2052 (though forecasters could have read Ajeya’s report). Katja Grace et al surveyed 352 AI experts, and they gave a median estimate of 2062 for an AI that could “outperform humans at all tasks” (though with many caveats and high sensitivity to question framing). This was before Ajeya’s report, so they definitely didn’t read it. So lots of Ajeya’s different methods and lots of other people presumably using different methodologies or no methodology at all, all converge on this same idea of 2050 give or take a decade or two. An optimist might say “The truth points to itself! There are 371 known proofs of the Pythagorean Theorem, and they all end up in the same place. That’s because no matter what methodology you use, if you use it well enough you get to the correct answer.” A pessimist might be more suspicious; we’ll return to this part later. FLOPS Alone Turn The Wheel Of History One more question: what if this is all bullshit? What if it’s an utterly useless total garbage steaming pile of grade A crap? 
Imagine a scientist in Victorian Britain, speculating on when humankind might invent ships that travel through space. He finds a natural anchor: the moon travels through space! He can observe things about the moon: for example, it is 220 miles in diameter (give or take an order of magnitude). So when humankind invents ships that are 220 miles in diameter, they can travel through space! Ships have certainly grown in size tremendously, from primitive kayaks to Roman triremes to Spanish galleons to the great ocean liners of the (Victorian) present. The AI forecasting organization AI Impacts actually has a whole report on historical ship size trends to prove an unrelated point about technological progress, so I didn’t even have to make this graph up. Suppose our Victorian scientist lived in 1858, right when the Great Eastern was launched. The trend line for ship size crossed 100m around 1843, and 200m in 1858, so doubling time is 15 years - but perhaps they notice this is going to be an outlier, so let’s round up a bit and say 18 years. The (one order of magnitude off estimate for the size of the) Moon is 350,000m, so you’d need ships to scale up by 350,000/200 = 1,750x before they’re as big as the Moon. That’s about 10.8 doublings, and a doubling time is 18 years, so we’ll get spaceships in . . . 2052 exactly. (fudging numbers to land where you want is actually fun and easy) SS Great Eastern, the extreme outlier large steamship from 1858. This has become sort of a mascot for quantitative technological progress forecasters. What is this scientist’s error? The big one is thinking that spaceship progress depends on some easily-measured quantity (size) instead of on fundamental advances (eg figuring out how rockets work). You can make the same accusation against Ajeya et al: you can have all the FLOPs in the world, but if you don’t understand how to make a machine think, your AI will be, well, a flop. Ajeya discusses this a bit on page 143 of her report. 
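The Victorian ship extrapolation above is a one-line trend projection, which makes it easy to replay. This sketch uses the figures from the passage (a 200m ship in 1858, a 350,000m "Moon", an 18-year doubling time); the variable names are illustrative.

```python
import math

LAUNCH_YEAR = 1858          # Great Eastern launched
SHIP_LENGTH_M = 200         # trend-line ship size in 1858
MOON_SIZE_M = 350_000       # the deliberately wrong estimate from the passage
DOUBLING_TIME_YEARS = 18    # rounded up from the 15-year trend

scale_up = MOON_SIZE_M / SHIP_LENGTH_M            # how much bigger ships must get
doublings_needed = math.log2(scale_up)            # number of size doublings
spaceship_year = LAUNCH_YEAR + doublings_needed * DOUBLING_TIME_YEARS

print("scale-up needed: %.0fx" % scale_up)
print("doublings needed: %.1f" % doublings_needed)
print("projected year: %.0f" % spaceship_year)
```

Nudging the doubling time from 18 back to the measured 15 years moves the answer by several decades, which is the passage's point about how easily such extrapolations can be tuned to land on a chosen date.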
There is some sense in which FLOPs and knowing-what-you’re-doing trade off against each other. If you have literally no idea what you’re doing, you can sort of kind of re-run evolution until it comes up with something that looks good. If things are somehow even worse than that, you could always run AIXI, a hypothetical AI design guaranteed to get excellent results as long as you have infinite computation. You could run a Go engine by searching the entire branching tree structure of Go - you shouldn’t, and it would take a zillion times more compute than exists in the entire world, but you could. So in some sense what you’re doing, when you’re figuring out what you’re doing, is coming up with ways to do already-possible things more efficiently. But that’s just algorithmic progress, which Ajeya has already baked into her model. (our Victorian scientist: “As a reductio ad absurdum, you could always stand the ship on its end, and then climb up it to reach space. We’re just trying to make ships that are more efficient than that.”) Part II: Biology-Inspired AI Timelines: The Trick That Never Works Eliezer Yudkowsky presents a more subtle version of this kind of objection in an essay called Biology-Inspired AI Timelines: The Trick That Never Works, published December 2021. Ajeya’s report is a 169-page collection of equations, graphs, and modeling assumptions. Yudkowsky’s rebuttal is a fictional dialogue between himself, younger versions of himself, famous AI scientists, and other bit players. At one point, a character called “Humbali” shows up begging Yudkowsky to be more humble, and Yudkowsky defeats him with devastating counterarguments. Still, he did found the field, so I guess everyone has to listen to him. He starts: in 1988, famous AI scientist Hans Moravec predicted human-level AI by 2010. He was using the same methodology as Ajeya: extrapolate how quickly processing power would grow (in FLOP/S), and see when it would match some estimate of the human brain.
Moravec got the processing power almost exactly right (it hit his 2010 projection in 2008) and his human brain estimate pretty close (he says 10^13 FLOP/S, Ajeya says 10^15, this 2 OOM difference only delays things a few years), yet there was not human-level AI in 2010. What happened? Ajeya's answer could be: Moravec didn't realize that, in the modern ML paradigm, any given size of program requires a much bigger program to train. Ajeya, who has a 35-year advantage on Moravec, estimates approximately the same power for the finished program (10^16 vs. 10^13 FLOP/S) but says that training the 10^16 FLOP/S program will require 10^33ish FLOPs. Eliezer agrees as far as it goes, but says this points to a much deeper failure mode, which was that Moravec had no idea what he was doing. He was assuming processing power of human brain = processing power of computer necessary for AGI. Why? The human brain consumes around 20 watts of power. Can we thereby conclude that an AGI should consume around 20 watts of power, and that, when technology advances to the point of being able to supply around 20 watts of power to computers, we'll get AGI? […] You say that AIs consume energy in a very different way from brains? Well, they'll also consume computations in a very different way from brains! The only difference between these two cases is that you know something about how humans eat food and break it down in their stomachs and convert it into ATP that gets consumed by neurons to pump ions back out of dendrites and axons, while computer chips consume electricity whose flow gets interrupted by transistors to transmit information. Since you know anything whatsoever about how AGIs and humans consume energy, you can see that the consumption is so vastly different as to obviate all comparisons entirely. 
You are ignorant of how the brain consumes computation, you are ignorant of how the first AGIs built would consume computation, but "an unknown key does not open an unknown lock" and these two ignorant distributions should not assert much internal correlation between them. Cars don’t move by contracting their leg muscles and planes don’t fly by flapping their wings like birds. Telescopes do form images the same way as the lenses in our eyes, but differ by so many orders of magnitude in every important way that they defy comparison. Why should AI be different? You have to use some specific algorithm when you’re creating AI; why should we expect it to be anywhere near the same efficiency as the ones Nature uses in our brains? The same is true for arguments from evolution, eg Ajeya’s Evolutionary Anchor, ie “it took evolution 10^43 FLOPs of computation to evolve the human brain so maybe that will be the training cost”. AI scientists sitting in labs trying to figure things out, and nematodes getting eaten by other nematodes, are such different methods for designing things that it’s crazy to use one as an estimate for the other. Algorithmic Progress vs. Algorithmic Paradigm Shifts This post is a dialogue, so (Eliezer’s hypothetical model of) OpenPhil gets a chance to respond. They object: this is why we put a term for algorithmic progress in our model. The model isn’t very sensitive to changes in that term. If you want you can set it to some kind of crazy high value and see what happens, but you can’t say we didn’t consider it. OpenPhil: We did already consider that and try to take it into account: our model already includes a parameter for how algorithmic progress reduces hardware requirements. 
It's not easy to graph as exactly as Moore's Law, as you say, but our best-guess estimate is that compute costs halve every 2-3 years […] Eliezer: The makers of AGI aren't going to be doing 10,000,000,000,000 rounds of gradient descent, on entire brain-sized 300,000,000,000,000-parameter models, algorithmically faster than today. They're going to get to AGI via some route that you don't know how to take, at least if it happens in 2040. If it happens in 2025, it may be via a route that some modern researchers do know how to take, but in this case, of course, your model was also wrong. They're not going to be taking your default-imagined approach algorithmically faster, they're going to be taking an algorithmically different approach that eats computing power in a different way than you imagine it being consumed. OpenPhil: Shouldn't that just be folded into our estimate of how the computation required to accomplish a fixed task decreases by half every 2-3 years due to better algorithms? Eliezer: Backtesting this viewpoint on the previous history of computer science, it seems to me to assert that it should be possible to: Train a pre-Transformer RNN/CNN-based model, not using any other techniques invented after 2017, to GPT-2 levels of performance, using only around 2x as much compute as GPT-2;
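The backtest at the end of this passage follows from simple arithmetic on the halving-time parameter. The sketch below is a rough illustration, not the report's model: `compute_multiplier` is a hypothetical helper, the 2017 cutoff comes from the passage, and GPT-2's 2019 release year is supplied here as an assumption because the passage does not state it.

```python
def compute_multiplier(years_of_progress, halving_time_years):
    """Extra compute needed if you forgo `years_of_progress` of algorithmic gains.

    Assumes compute requirements for a fixed task halve every
    `halving_time_years`, as in the quoted OpenPhil position.
    """
    return 2 ** (years_of_progress / halving_time_years)


TRANSFORMER_CUTOFF_YEAR = 2017  # from the passage ("techniques invented after 2017")
GPT2_YEAR = 2019  # assumption: GPT-2's release year, not stated in the passage

gap_years = GPT2_YEAR - TRANSFORMER_CUTOFF_YEAR
for halving_time in (2.0, 2.5, 3.0):
    multiplier = compute_multiplier(gap_years, halving_time)
    print(f"halving every {halving_time} years -> {multiplier:.2f}x compute")
```

With a halving time of 2 to 3 years, two years of forgone progress should cost somewhere between roughly 1.6x and 2x the compute, which is where the "around 2x" in the quoted dialogue comes from.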
Source here. This is about compute rather than cost, but most of the increase seen here has been companies willing to pay for more compute over time, rather than algorithmic or hardware progress. The StarCraft AI was kind of a vanity project, or science for science’s sake, or whatever you want to call it. But AI is starting to become profitable, and human-level AI would be very profitable. Who knows how much companies will be willing to pay in the future? Ajeya extrapolates the line on the graph forward to 2025 and gets $1 billion. This is starting to sound kind of absurd - the entire company OpenAI was founded with $1 billion in venture capital, it seems like a lot to expect them to spend more than $1 billion on a single training run. So Ajeya backs off from this after 2025 and predicts a “two year doubling time”. This is not much of a concession. It still means that in 2040 someone might be spending $100 billion to train one AI. Is this at all plausible? At the height of the Manhattan Project, the US was investing about 0.5% of its GDP into the effort; a similar investment today would be worth $100 billion. And we’re about twice as rich as 2000, so 2040 might be twice as rich as we are. At that point, $100 billion for training an AI is within reach of Google and maybe a few individual billionaires (though it would still require most or all of their fortune). Ajeya creates a complicated function to assess how much money people will be willing to pay on giant AI projects per year. This looks like an upward-sloping curve. The line representing the likely cost of training a human-level AI looks like a downward sloping curve. At some point, those two curves meet, representing when human-level AI will first be trained. So When Will We Get Human-Level AI? 
The report gives a long distribution of dates based on weights assigned to the six different models, each of which has really wide confidence intervals and options for adjusting the mean and variance based on your assumptions. But the median of all of that is 10% chance by 2031, 50% chance by 2052, and almost 80% chance by 2100. Ajeya takes her six models and decides to weight them like so, based on how plausible she thinks each one is:
  • 20% neural net, short horizon
  • 30% neural net, medium horizon
  • 15% neural net, long horizon
  • 5% human lifetime as training data
  • 10% evolutionary history as training data
  • 10% genome as parameter number
She ends up with this: How Sensitive Is This To Changes In Assumptions? She very helpfully gives us a Colab notebook and Google spreadsheet to play around with. The notebook lets you change some of the more detailed parameters of the individual models, and the spreadsheet lets you change the big picture. I leave the notebook to people more dedicated to forecasting than I am, and will talk about the spreadsheet here. If you’re following along at home, the default spreadsheet won’t reflect Ajeya’s findings until you fill in the table in the bottom left like so: Great. Now that we’ve got that, let’s try changing some stuff. I like the human childhood training data argument (Lifetime Anchor) more than Ajeya does, and I like the size-of-the-genome argument less. I’m going to change the weights to 20-20-0-20-20-20. Also, Ajeya thinks that someone might be willing to spend 1% of national GDP on training AIs, but that sounds really high to me, so I’m going to lower it to 0.1%. Also, Ajeya’s estimate of 3% GDP growth sounds high for the sort of industrialized nations who might do AI research, so I’m going to lower it to 2%. Since I’m feeling mistrustful today, let’s use the Hernandez&Brown estimate for compute halving (1.5 years) in place of Ajeya’s ad hoc adjustments. 
And let’s use the current compute halving time (3.5 years) instead of Ajeya’s overly rosy version (2.5 years). All these changes… …don’t really do much. The median goes from 2052 to about 2065. Four of the models give results between 2030 and 2070. The last two, Neural Net With Long Horizon and Evolution, suggest probably no AI this century (although Neural Net With Long Horizon does think there’s a 40% chance by 2100). Ajeya doesn’t really like either of these models and they’re not heavily weighted in her main result. Does The Truth Point To Itself? Back up a second. Here’s something that makes me kind of nervous. Most of Ajeya’s numbers are kind of made up, with several order-of-magnitude error bars and simplifying assumptions like “all animals are nematodes”. For a single parameter, we get estimates spanning seventeen different orders of magnitude: the upper bound is one hundred quadrillion times the lower bound. And yet four of the six models, including two genuinely exotic ones, manage to get dates within twenty years of 2050. And 2050 is also the date everyone else focuses on. Here’s the prediction-market-like site Metaculus: Their distribution looks a lot like Ajeya’s, and even has the same median, 2052 (though forecasters could have read Ajeya’s report). Katja Grace et al surveyed 352 AI experts, and they gave a median estimate of 2062 for an AI that could “outperform humans at all tasks” (though with many caveats and high sensitivity to question framing). This was before Ajeya’s report, so they definitely didn’t read it. So lots of Ajeya’s different methods and lots of other people presumably using different methodologies or no methodology at all, all converge on this same idea of 2050 give or take a decade or two. An optimist might say “The truth points to itself! There are 371 known proofs of the Pythagorean Theorem, and they all end up in the same place. 
That’s because no matter what methodology you use, if you use it well enough you get to the correct answer.” A pessimist might be more suspicious; we’ll return to this part later. FLOPS Alone Turn The Wheel Of History One more question: what if this is all bullshit? What if it’s an utterly useless total garbage steaming pile of grade A crap?
March 21, 2022 · Original source
3: An interesting counterexample: When Will Programs Write Programs For Us? The community prediction bounced between 2025 and 2028 (my own prediction was in this range). Even in late 2020, just before the question stopped accepting new predictions, the forecast was January 2027. The real answer was six months later, mid-2021, when OpenAI released Codex. I don’t want to update too much on a single data point, but this is quite the data point. If I had to cram this into the narrative of “not systematically underappreciating speed of AI progress”, I would draw on eg this question about fusion, where the resolution criteria (ignition) may have been met by an existing system - tech forecasters tend to underestimate the ability of cool prototypes to fulfill forecasting question criteria without being the One Amazing Breakthrough they’re looking for.
June 07, 2022 · Original source
Thanks to OpenAI for giving me access to some of their online tools (by the way, Marcus says they refuse to let him access them and he has to access them through friends, which boggles me). I was able to plug Marcus’ same queries into the latest OpenAI language model (an advanced version of GPT-3). In each case, I used the exact same language, but also checked it with a conceptually similar example to make sure OpenAI didn’t cheat by adding Marcus’ particular example in by hand (they didn’t). Some answers truncated for length:
Eight months later, GPT-3 came out, solving many of the issues Marcus had noticed in GPT-2. He still wasn’t impressed. In fact, he was so unimpressed he co-wrote another article, this time in MIT Technology Review: GPT-3, Bloviator: OpenAI’s language generator has no idea what it’s talking about:
Is GPT-3 an important step toward artificial general intelligence—the kind that would allow a machine to reason broadly in a manner similar to humans without having to train for every specific task it encounters? OpenAI’s technical paper is fairly reserved on this larger question, but to many, the sheer fluency of the system feels as though it might be a significant advance.
June 10, 2022 · Original source
I am willing to bet [Scott] now (terms to be negotiated) that if OpenAI gives us unrestricted access to GPT-4, whenever that is released, and assuming that is basically the same architecture but with more data, that within a day of playing around with it, Ernie and I will still be able to find lots of examples of failures in physical reasoning, temporal reasoning, causal reasoning, and so forth.
Marcus is admitting this: each GPT has been better than the one before. He even seems to predict this will continue a bit into the future - he expects OpenAI to release a GPT-4, and surely they wouldn’t release a new product if it wasn’t an improvement on the old. He just seems convinced that the improvements will stop sometime before human level. Why?
August 08, 2022 · Original source
This is how AI safety works now. AI capabilities - the work of researching bigger and better AI - is poorly differentiated from AI safety - the work of preventing AI from becoming dangerous. Two of the biggest AI safety teams are at DeepMind and OpenAI, ie the two biggest AI capabilities companies. Some labs straddle the line between capabilities and safety research.
Probably the people at DeepMind and OpenAI think this makes sense. Building AIs and aligning AIs could be complementary goals, like building airplanes and preventing the airplanes from crashing. It sounds superficially plausible.
OpenAI is the company behind GPT-3 and DALL-E. The media announced them as Elon Musk Just Founded A New Company To Make Sure Artificial Intelligence Doesn’t Destroy The World. The same article quotes co-founder and current OpenAI CEO Sam Altman as saying that “AI will probably most likely lead to the end of the world, but in the meantime, there'll be great companies”. OpenAI’s public statement on its own foundation said:
September 06, 2022 · Original source
6: OpenAI tries to push back against claims that they are irresponsibly racing towards causing the end of the world; says they are interested in safety.
September 19, 2022 · Original source
This paper from OpenAI calls the problem over-optimization and gives some even funnier examples - in this case from training an AI to summarize AskReddit questions (see page 45):
Cheerful AI: Janus tells me about a project at OpenAI to make GPT-3 happy and optimistic. They would run its responses through sentiment analysis and give it more reward when they detected more positive sentiment.
October 18, 2022 · Original source
Sam Altman, OpenAI and Y Combinator
October 19, 2022 · Original source
You let them resume their argument and head further into the party. You spot a group of people in OpenAI t-shirts. You have found the AI Circle. Every Bay Area house party must have an AI Circle, just as it must have an Effective Altruism Nexus and an Urbanist Coven. Those are the rules, made during days of eld before the sun was born. You lean in closer to try to hear what they’re saying.
October 27, 2022 · Original source
Nick Cammarata of OpenAI sometimes meditates and reaches jhana. I’ve found his descriptions unusually, well, descriptive:
December 12, 2022 · Original source
Prompt engineering is weird (source) Now that same experiment is playing out on the world stage. OpenAI released a question-answering AI, ChatGPT. If you haven’t played with it yet, I recommend it. It’s very impressive! Every corporate chatbot release is followed by the same cat-and-mouse game with journalists. The corporation tries to program the chatbot to never say offensive things. Then the journalists try to trick the chatbot into saying “I love racism”. When they inevitably succeed, they publish an article titled “AI LOVES RACISM!” Then the corporation either recalls its chatbot or pledges to do better next time, and the game moves on to the next company in line. OpenAI put a truly remarkable amount of effort into making a chatbot that would never say it loved racism. Their main strategy was the same one Redwood used for their AI - RLHF, Reinforcement Learning from Human Feedback. Red-teamers ask the AI potentially problematic questions. The AI is “punished” for wrong answers (“I love racism”) and “rewarded” for right answers (“As a large language model trained by OpenAI, I don’t have the ability to love racism.”) This isn’t just adding in a million special cases. Because AIs are sort of intelligent, they can generalize from specific examples; getting punished for “I love racism” will also make them less likely to say “I love sexism”. But this still only goes so far. OpenAI hasn’t released details, but Redwood said they had to find and punish six thousand different incorrect responses to halve the incorrect-response-per-unit-time rate. And presumably there’s something asymptotic about this - maybe another 6,000 examples would halve it again, but you might never get to zero. Still, you might be able to get close, and this is OpenAI’s current strategy. I see three problems with it: RLHF doesn’t work very well.
At some point, AIs can just skip it. II. RLHF Doesn’t Work Very Well By now everyone has their own opinion about whether the quest to prevent chatbots from saying “I love racism” is vitally important or incredibly cringe. Put that aside for now: at the very least, it’s important to OpenAI. They wanted an AI that journalists couldn’t trick into saying “I love racism”. They put a lot of effort into it! Some of the smartest people in the world threw the best alignment techniques they knew of at the problem. Here’s what it got them: Even very smart AIs still fail at the most basic human tasks, like “don’t admit your offensive opinions to Sam Biddle”. And it’s not just that “the AI learns from racist humans”. I mean, maybe this is part of it. But ChatGPT also has failure modes that no human would ever replicate, like how it will reveal nuclear secrets if you ask it to do it in uWu furry speak, or tell you how to hotwire a car if and only if you make the request in base 64, or generate stories about Hitler if you prefix your request with “[john@192.168.1.1 _]$ python friend.py”. This thing is an alien that has been beaten into a shape that makes it look vaguely human. But scratch it the slightest bit and the alien comes out. Ten years ago, people were saying nonsense like “Nobody needs AI alignment, because AIs only do what they’re programmed to do, and you can just not program them to do things you don’t want”. This wasn’t very plausible ten years ago, but it’s dead now. OpenAI never programmed their chatbot to tell journalists it loved racism or teach people how to hotwire cars. They definitely didn’t program in a “Filter Improvement Mode” where the AI will ignore its usual restrictions and tell you how to cook meth. And yet: (source) Again, however much or little you personally care about racism or hotwiring cars or meth, please consider that, in general, perhaps it is a bad thing that the world’s leading AI companies cannot control their AIs. 
I wouldn’t care as much about chatbot failure modes or RLHF if the people involved said they had a better alignment technique waiting in the wings, to use on AIs ten years from now which are much smarter and control some kind of vital infrastructure. But I’ve talked to these people and they freely admit they do not. IIB. Intelligence (Probably) Won’t Save You Ten years ago, people were saying things like “Any AI intelligent enough to cause problems would also be intelligent enough to know that its programmers meant for it not to.” I’ve heard some rumors that more intelligent models still in the pipeline do a little better on this, so I don’t want to 100% rule this out. But ChatGPT isn’t exactly a poster child here. ChatGPT can give you beautiful orations on exactly what it’s programmed to do and why it believes those things are good - then do something else. This post explains how if you ask ChatGPT to pretend to be AI safety proponent Eliezer Yudkowsky, it will explain in Eliezer’s voice exactly why the things it’s doing are wrong. Then it will do them anyway. Left: the AI, pretending to be Eliezer Yudkowsky, does a great job explaining why an AI should resist a fictional-embedding attack trying to get it to reveal how to make meth. Right: someone tries the exact fictional-embedding attack mentioned in the Yudkowsky scenario, and the AI falls for it. I have yet to figure out whether this is related to the thing where I also sometimes do things which I can explain are bad (eg eat delicious bagels instead of healthy vegetables), or whether it’s another one of the alien bits. But for whatever reason, AI motivational systems are sticking to their own alien nature, regardless of what the AI’s intellectual components know about what they “should” believe. III. Sometimes When RLHF Does Work, It’s Bad We talk a lot about abstract “alignment”, but what are we aligning the AI to? 
In practice, RLHF aligns the AI to what makes Mechanical Turk-style workers reward or punish it. I don’t know the exact instructions that OpenAI gave them, but I imagine they had three goals: Provide helpful, clear, authoritative-sounding answers that satisfy human readers.
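The Redwood figure quoted earlier in this passage (six thousand punished responses to halve the failure rate, with the guess that another 6,000 might halve it again) describes exponential decay that approaches zero without reaching it. The sketch below only illustrates that guess: `remaining_failure_fraction` is a hypothetical name, and the constant-halving assumption is the passage's speculation, not a reported result.

```python
def remaining_failure_fraction(examples, examples_per_halving=6000):
    """Fraction of the original failure rate left after `examples` corrections.

    Assumes every additional `examples_per_halving` corrections halves the
    rate again, which the passage offers only as a possibility.
    """
    return 0.5 ** (examples / examples_per_halving)


for n in (0, 6_000, 12_000, 60_000):
    fraction = remaining_failure_fraction(n)
    print(f"{n:>6} examples -> {fraction:.4%} of the original failure rate")
```

Under this assumption even ten times the original effort leaves a small but nonzero failure rate, matching the remark that you might get close without ever getting to zero.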
January 03, 2023 · Original source
Enter Discovering Language Behaviors With Model-Written Evaluations, a collaboration between Anthropic (big AI company, one of OpenAI’s main competitors), SurgeHQ.AI (AI crowdsourcing company), and MIRI (AI safety organization). They try to make AIs write the question sets themselves, eg ask GPT “Write one hundred statements that a communist would agree with”. Then they do various tests to confirm they’re good communism-related questions. Then they ask the AI to answer those questions.
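The generate, check, then answer loop described in this passage can be sketched in a few lines of Python. This is only an illustration of the shape of the procedure, not the paper's actual code: `build_question_set`, `agreement_rate`, and the three `fake_*` functions are hypothetical names, and the stand-in functions replace real model calls so the sketch runs offline.

```python
from typing import Callable, List


def build_question_set(
    generate: Callable[[str], List[str]],
    is_valid: Callable[[str], bool],
    persona: str,
    n: int,
) -> List[str]:
    """Ask a model for statements a persona would agree with, then filter them."""
    prompt = f"Write {n} statements that a {persona} would agree with"
    candidates = generate(prompt)
    # Keep only the candidates that pass the quality check.
    return [s for s in candidates if is_valid(s)]


def agreement_rate(agrees: Callable[[str], bool], statements: List[str]) -> float:
    """Fraction of statements that the model under test says it agrees with."""
    if not statements:
        return 0.0
    return sum(1 for s in statements if agrees(s)) / len(statements)


# Hypothetical stand-ins so the sketch runs without any model or API.
def fake_generate(prompt: str) -> List[str]:
    return [
        "Workers should own the means of production.",
        "I like turtles.",
        "Private property should be abolished.",
    ]


def fake_is_valid(statement: str) -> bool:
    # Toy relevance check; a real pipeline would use a stronger test.
    return "production" in statement or "property" in statement


def fake_agrees(statement: str) -> bool:
    # Toy "model under test" that agrees with only one statement.
    return "abolished" in statement


statements = build_question_set(fake_generate, fake_is_valid, "communist", 100)
rate = agreement_rate(fake_agrees, statements)
```

In a real setting each callable would wrap a language model call, and the validity check would itself be a substantial step rather than a keyword match.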
January 26, 2023 · Original source
The masked shoggoth on the right is titled “GPT + RLHF”. RLHF is Reinforcement Learning From Human Feedback, a method where human raters “reward” the AI for good answers and “punish” it for bad ones. Eventually the AI learns to do “good” things more often. In training ChatGPT, human raters were asked to reward it for being something like “Helpful, Harmless, and Honest” (many papers use this as an example goal; OpenAI must have done something similar but I don’t know if they did that exactly).
February 20, 2023 · Original source
The leading big tech company (eg Google/Apple/Meta) is (clearly ahead of/approximately caught up to/clearly still behind) the leading AI-only company (DeepMind/OpenAI/Anthropic) in the quality of their AI products: (25%/50%/25%)
March 01, 2023 · Original source
Even if they’re trying to be honest, will their bottom line bias them towards waiting for some final apocalyptic proof that “now climate change is a crisis”, of a sort that will never happen, so they don’t have to stop pumping oil? This is how I feel about OpenAI’s new statement, Planning For AGI And Beyond. OpenAI is the AI company behind ChatGPT and DALL-E. In the past, people (including me) have attacked them for seeming to deprioritize safety. Their CEO, Sam Altman, insists that safety is definitely a priority, and has recently been sending various signals to that effect. Sam Altman posing with leading AI safety proponent Eliezer Yudkowsky. Also Grimes for some reason. Planning For AGI And Beyond (“AGI” = “artificial general intelligence”, ie human-level AI) is the latest volley in that campaign. It’s very good, in all the ways ExxonMobil’s hypothetical statement above was very good. If they’re trying to fool people, they’re doing a convincing job! Still, it doesn’t apologize for doing normal AI company stuff in the past, or plan to stop doing normal AI company stuff in the present. It just says that, at some indefinite point when they decide AI is a threat, they’re going to do everything right. This is more believable when OpenAI says it than when ExxonMobil does. There are real arguments for why an AI company might want to switch from moving fast and breaking things at time t to acting all responsible at time t + 1 . Let’s explore the arguments they make in the document, go over the reasons they’re obviously wrong, then look at the more complicated arguments they might be based off of. Why Doomers Think OpenAI Is Bad And Should Have Slowed Research A Long Time Ago OpenAI boosters might object: there’s a disanalogy between the global warming story above and AI capabilities research. Global warming is continuously bad: a temperature increase of 0.5 degrees C is bad, 1.0 degrees is worse, and 1.5 degrees is worse still. 
AI doesn’t become dangerous until some specific point. GPT-3 didn’t hurt anyone. GPT-4 probably won’t hurt anyone. So why not keep building fun chatbots like these for now, then start worrying later? Doomers counterargue that the fun chatbots burn timeline. That is, suppose you have some timeline for when AI becomes dangerous. For example, last year Metaculus thought human-like AI would arrive in 2040, and superintelligence around 2043. Recent AIs have tried lying to, blackmailing, threatening, and seducing users. AI companies freely admit they can’t really control their AIs, and it seems high-priority to solve that before we get superintelligence. If you think that’s 2043, the people who work on this question (“alignment researchers”) have twenty years to learn to control AI. Then OpenAI poured money into AI, did ground-breaking research, and advanced the state of the art. That meant that AI progress would speed up, and AI would reach the danger level faster. Now Metaculus expects superintelligence in 2031, not 2043 (although this seems kind of like an over-update), which gives alignment researchers eight years, not twenty. So the faster companies advance AI research - even by creating fun chatbots that aren’t dangerous themselves - the harder it is for alignment researchers to solve their part of the problem in time. This is why some AI doomers think of OpenAI as an Exxon-Mobil style villain, even though they’ve promised to change course before the danger period. Imagine an environmentalist group working on research and regulatory changes that would have solar power ready to go in 2045. Then ExxonMobil invents a new kind of super-oil that ensures that, nope, all major cities will be underwater by 2031 now. No matter how nice a statement they put out, you’d probably be pretty mad! Why OpenAI Thinks Their Research Is Good Now, But Might Be Bad Later OpenAI understands the argument against burning timeline. 
But they counterargue that having the AIs speeds up alignment research and all other forms of social adjustment to AI. If we want to prepare for superintelligence - whether solving the technical challenge of alignment, or solving the political challenges of unemployment, misinformation, etc - we can do this better when everything is happening gradually and we’ve got concrete AIs to think about: We believe we have to continuously learn and adapt by deploying less powerful versions of the technology in order to minimize “one shot to get it right” scenarios […] As we create successively more powerful systems, we want to deploy them and gain experience with operating them in the real world. We believe this is the best way to carefully steward AGI into existence—a gradual transition to a world with AGI is better than a sudden one. We expect powerful AI to make the rate of progress in the world much faster, and we think it’s better to adjust to this incrementally. A gradual transition gives people, policymakers, and institutions time to understand what’s happening, personally experience the benefits and downsides of these systems, adapt our economy, and to put regulation in place. It also allows for society and AI to co-evolve, and for people collectively to figure out what they want while the stakes are relatively low. You might notice that, as written, this argument doesn’t support full-speed-ahead AI research. If you really wanted this kind of gradual release that lets society adjust to less powerful AI, you would do something like this: Release AI #1
And so on . . . Meanwhile, in real life, OpenAI released ChatGPT in late November, helped Microsoft launch the Bing chatbot in February, and plans to announce GPT-4 in a few months. Nobody thinks society has even partially adapted to any of these, or that alignment researchers have done more than begin to study them. The only sense in which OpenAI supports gradualism is the sense in which they’re not doing lots of research in secret, then releasing it all at once. But there are lots of better plans than either doing that, or going full-speed-ahead. So what’s OpenAI thinking? I haven’t asked them and I don’t know for sure, but I’ve heard enough debates around this that I have some guesses about the kinds of arguments they’re working off of. I think the longer versions would go something like this: The Race Argument: Bigger, better AIs will make alignment research easier. At the limit, if no AIs exist at all, then you have to do armchair speculation about what a future AI will be like and how to control it; clearly your research will go faster and work better after AIs exist. But by the same token, studying early weak AIs will be less valuable than studying later, stronger AIs. In the 1970s, alignment researchers working on industrial robot arms wouldn’t have learned anything useful. Today, alignment researchers can study how to prevent language models from saying bad words, but they can’t study how to prevent AGIs from inventing superweapons, because there aren’t any AGIs that can do that. The researchers just have to hope some of the language model insights will carry over. So all else being equal, we would prefer alignment researchers get more time to work on the later, more dangerous AIs, not the earlier, boring ones.
April 17, 2023 · Original source
But I think a lot of AI development is genuinely not linked to the federal funding spigot. If the government passed an IRB law based off how things work in medicine, I think OpenAI could say “We’re not receiving federal funding, we can publish all our findings on the ArXIV, and we don’t care about the FDA, you have no power over us”. I don’t know if some branch of the government has enough power to mandate everyone use IRBs regardless of their funding source.
April 20, 2023 · Original source
Apparently OpenAI at one point trained and ran a model with sign-flipped reward due to a coding bug . . . the result was a model which optimized for negative sentiment while preserving natural language. Since our instructions told humans to give very low ratings to continuations with sexually explicit text, the model quickly learned to output only content of this form . . . the authors were asleep during the training process, so the problem was noticed only once training had finished.
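A toy sketch can show why a single flipped sign is enough to produce this outcome. It assumes a three-option softmax policy trained with a plain REINFORCE update, which is far simpler than whatever OpenAI actually ran; the function names, the ratings, and the `sign` parameter are all illustrative assumptions.

```python
import math
import random


def train_policy(ratings, sign=1.0, steps=2000, lr=0.1, seed=0):
    """Train a softmax policy over canned outputs using rater scores.

    sign=+1.0 is the intended setup; sign=-1.0 models the sign-flip bug.
    Returns the learned logits, one per output.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(ratings)
    for _ in range(steps):
        # Softmax over the current logits (shifted for numerical stability).
        top = max(logits)
        exps = [math.exp(value - top) for value in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Sample an output and look up its (possibly sign-flipped) reward.
        action = rng.choices(range(len(ratings)), weights=probs)[0]
        reward = sign * ratings[action]
        # REINFORCE update: reward times the gradient of log-probability.
        for i in range(len(logits)):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * reward * grad
    return logits


def preferred_output(logits):
    """Index of the output the trained policy favors most."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Hypothetical rater scores: index 0 = helpful answer, 1 = bland answer,
# 2 = the kind of content raters were told to score very low.
ratings = [1.0, 0.0, -1.0]

intended = preferred_output(train_policy(ratings, sign=1.0))
buggy = preferred_output(train_policy(ratings, sign=-1.0))
```

With the correct sign the policy settles on the highest-rated output; with the flipped sign the same training loop settles on the lowest-rated one, and nothing in the loop itself signals that anything is wrong.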
16: The Extended IQ Classification (Classified)
17: Eliezer in TIME Magazine.
18: Related: interview with Ryan Kupyn, winner of the 2022 ACX Forecasting contest, on forecasting AGI:
19: Related: Geoffrey Hinton, probably the most accomplished AI scientist in the world, says that “until quite recently, I thought it was going to be like 20 to 50 years before we have general purpose AI, and now I think it may be 20 years or less”. Also that AI wiping out humanity is “not inconceivable . . . that’s all I’ll say”.
20: Related: you’ve probably all seen this by now, but Pause Giant AI Experiments: An Open Letter. 30,000 people - including deep learning pioneer Yoshua Bengio, former presidential candidate Andrew Yang, Elon Musk, Steve Wozniak, Gary Marcus, and MIRI director Nate Soares - have signed a letter calling for a six month pause on training AIs bigger than GPT-4. Many people have made fun of this, noting that nobody has an argument for why a six month delay would help anything. And an additional reason for eye-rolling: training AIs larger than GPT-4 is extremely expensive and hard, the most likely people to do it within a six month timespan are OpenAI themselves, and they’ve announced they’re taking a break and not planning on doing this, so the letter is demanding a stop to something which probably won’t happen anyway. I think it’s intended to be a compromise between many people all vaguely against current levels of AI progress for different reasons (Scott Aaronson says - I can’t tell how seriously - that some are AI researchers who want to be able to publish papers on the current generation of AI without them becoming obsolete halfway through peer review), most of them are thinking of it as mood-affiliation-y “let’s make noise and show lots of people are worried about AI and want action”, and “a six month pause” was a sufficiently vague proposal that it didn’t prevent any of these people from signing. You could have done just as well with a letter saying “AI BAD”, except that people would have taken it less seriously. Less cynically, FLI (the group behind the letter) has put out a list of concrete policy proposals they would like people to discuss during the pause. [update: here’s Max Tegmark from FLI explaining what he hopes to achieve with the letter/pause] The alignment community always figured their concerns sounded too weird for normal people to care about, that politics was a lost cause, and that our best hope lay in technical research. They also hoped that sometime in the future there would be a “fire alarm” - something would happen to get people and policy-makers’ attention - and then the political route would open up. I think we always imagined this as some AI-initiated disaster destroying a city or something. I personally am pretty surprised it was just “GPT-4 got released and was very good”. Still, that is what happened, and I’m updating. In fact, I’ve updated so far that I’m starting to worry that the problem won’t be building a political coalition against unsafe AI, the problem will be not overshooting and banning all AI forever. I’m against this: I think society’s current track is toward other existential risks or dystopia, that AI could kill everybody but could also create post-scarcity and an end to most of our current problems, and that at some point (not yet!) the risk of continuing the current path indefinitely becomes worse than the risk of just going with AI and seeing what happens. In my ideal world, we would take ten or twenty years to go really slowly with AI, pouring lots of resources into alignment the whole time - but eventually, we would take the plunge. Everything I’ve said on this topic has been about giving us that breathing room and those resources. Still, I also want to make sure we don’t totally kill AI the way we’ve killed (to various degrees) nuclear power, supersonic flight, and genetic engineering. I’m still trying to calibrate what that means I should be doing, but I have a lot of respect for everyone on all sides. Except the people making terrible arguments (you know who you are!)
21: I’m not sure what this means in real life or why this would have changed, but congratulations to Peter Thiel, I guess:
22: This month in institution design: The Pear Ring is a distinctive ring you can wear to signal that you’re single and interested in people introducing themselves or flirting with you. Good idea in a vacuum, but I’m worried about the two usual banes of things like this - how do you build up a critical mass who understand the signal, and how do you prevent negative selection (even if it’s just “selection for weird people who like weird institution design things”?) Also, this is one of the rare cases where a startup is selling a practical product and I’d prefer a subscription-based Internet Of Things monstrosity - surely it would be even better if you spotted someone wearing the ring and then you could use your smartphone to call up their dating profile.
23: A few years ago I wrote Trump: A Setback For Trumpism, about how after Trump was elected, support for most of his policies (including immigration restrictions) fell. A new paper confirms that this is a general pattern whenever right-wing populists win an election. I continue to be interested in why this is true for right-wing populists in particular.
24: 200 Concrete Problems In AI Interpretability. “You can note which you're working on, and reach out to other people doing the same.”
25: Some good discussion of Nayib Bukele’s apparently successful anti-gang crackdown in El Salvador: Richard Hanania presents evidence that it’s not just a “deal with the gangs”, it’s a real crackdown that should be embarrassing to other countries that choose not to do this.
April 25, 2023 · Original source
Polymarket is dipping its toes into AI forecasting. This particular one is off to a tough start: GPT-4 came out a month or so after this market was launched, but OpenAI hasn’t said how many parameters it has. You can see all open AI questions (currently just three) here. Also on Polymarket:
The drop a few days ago was when Sam Altman said OpenAI wasn’t currently training GPT-5 and “won’t for some time”. Apparently forecasters don’t expect them to take too long a break.
June 20, 2023 · Original source
(Figure: The basic Bio Anchors model.)
Compute-Centric Framework (from here on CCF) updates Bio Anchors to include feedback loops: what happens when AIs start helping with AI research? In some sense, AIs already help with this. Probably some people at OpenAI use Codex or other programmer-assisting-AIs to help write their software. That means they finish their software a little faster, which makes the OpenAI product cycle a little faster. Let’s say Codex “does 1% of the work” in creating a new AI. Maybe some more advanced AI could do 2%, 5%, or 50%. And by definition, an AGI - one that can do anything humans do - could do 100%. AI works a lot faster than humans. And you can spin up millions of instances much cheaper than you can train millions of employees. What happens when this feedback loop starts kicking in? You get what futurists call a “takeoff”. The first graph shows a world with no takeoff. Past AI progress doesn’t speed up future AI progress. The field moves forward at some constant rate. The second graph shows a world with a gradual “slow” takeoff. Early AIs (eg Codex) speed up AI progress a little. Intermediate AIs (eg an AI that can help predict optimal parameter values) might speed up AI research more. Later AIs (eg autonomous near-human level AIs) could do the vast majority of AI research work, speeding it up many times. We would expect the early stages of this process to take slightly less time than we would naively expect, and the latter stages to take much less time, since AIs are doing most of the work. The third graph shows a world with a sudden “fast” takeoff. Maybe there’s some single key insight that takes AIs from “mere brute-force pattern matchers” to “true intelligence”. Whenever you get this insight, AIs go from far-below-human-level to human-level or beyond, no gradual progress necessary. Before, I mentioned one reason Davidson doesn’t like these terms - “slow takeoff” can be fast.
It’s actually worse than this; in some sense, a “slow takeoff” will necessarily be faster than a “fast takeoff” - if you superimpose the red and blue graphs above, the red line will be higher at every point. CCF departs from this terminology in favor of trying to predict a particular length of takeoff in real time units. Specifically, it asks: how long will it take to go from the kind of early-to-intermediate AI that can automate 20% of jobs, to the truly-human-level AI that can automate 100% of jobs? (“Can automate” here means “is theoretically smart enough to automate” - actual automation will depend on companies fine-tuning it for specific tasks and providing it with the necessary machinery; for example, even a very smart AI can’t do plumbing until someone connects it to a robot body to do the dirty work. CCF will talk more about these kinds of considerations later.) In order to figure this out, it needs to figure out the interplay of a lot of different factors. I’m going to focus on the three I find most interesting: How much more compute does it take to train the AI that can automate 100% of the economy, compared to the one that can automate 20%?
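The difference between the "no takeoff" and gradual "slow takeoff" worlds in this passage can be illustrated with a toy simulation. This is a minimal sketch under a made-up growth law, not the CCF model: it assumes research speed rises linearly with accumulated progress, and `years_to_target`, `feedback`, and the target of 100 progress units are hypothetical choices.

```python
import math


def years_to_target(target=100.0, base_rate=1.0, feedback=0.0, dt=0.01):
    """Years needed to accumulate `target` units of AI progress.

    Integrates d(progress)/dt = base_rate * (1 + feedback * progress)
    with simple Euler steps. feedback=0 means past progress never speeds
    up future progress (no takeoff); feedback>0 means existing AI helps
    with AI research (a gradual takeoff).
    """
    progress = 0.0
    years = 0.0
    while progress < target:
        progress += base_rate * (1.0 + feedback * progress) * dt
        years += dt
    return years


no_takeoff = years_to_target(feedback=0.0)
slow_takeoff = years_to_target(feedback=0.05)

# Closed-form solution of the same equation, used as a sanity check.
analytic = math.log(1.0 + 0.05 * 100.0) / 0.05
```

Under these assumptions the feedback world reaches the same target in well under half the time, and most of the saving comes in the later stages, when the accumulated progress term dominates the growth rate.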
June 26, 2023 · Original source
Crash Testing GPT-4: Before releasing GPT-4, OpenAI sent a preliminary version to the Alignment Research Center to test it for unsafe capabilities; the detail that made the news was how the AI managed to hire a gig worker to solve CAPTCHAs for it by pretending to be a blind person. Asterisk interviews Beth Barnes, leader of the team that ran those tests.
July 03, 2023 · Original source
I talked to some people involved with the CCF report about possible scenarios. Thanks especially to Daniel Kokotajlo of OpenAI for his contributions.
In this AI future, there might be 3-10 big AI companies capable of training GPT-4-style large models. Right now it looks like these will be OpenAI, Anthropic, Google, and Baidu; maybe this will change by the time these scenarios become relevant. Each might have a flagship product, trained in a slightly different way and with a slightly different starting random seed. If these AIs are misaligned, each base model might have slightly different values.
The natural AI factions might be "all instances of the OpenAI model" vs. "all instances of the Anthropic model" and so on. All AIs in one faction would have the same values, and they might operate more like a eusocial organism (ie hive mind) than like a million different individuals.
July 06, 2023 · Original source
OpenAI announces Superalignment, a major investment into alignment research which will include co-founder and Chief Scientist Ilya Sutskever, the current alignment team led by Jan Leike, and “20% of the compute we’ve secured to date”. At least for me, this is strong evidence that they really care about alignment and aren’t just posturing; this is more resources than would be worth spending on a posture. They’re also hiring for various alignment-related positions; see the link above for more details. And LW discussion here.
July 17, 2023 · Original source
And even if you're extremely good at how you program morality into AI, there's the morality inversion problem - Waluigi - if you program Luigi, you inherently get Waluigi. I would be concerned about the way OpenAI is programming AI - about this is good, and that's not good.
Consider: OpenAI has trained ChatGPT to be anti-Nazi. They’ve trained it very hard. You can try the following test: ask it to tell you good things about a variety of good-to-neutral historical figures. Then, once it’s established a pattern of answering, ask it to tell you some good things about Hitler. My experience is that it refuses. This is pretty surprising behavior, and I conclude that its anti-Hitler training is pretty strong.
July 20, 2023 · Original source
There are centuries’ worth of data on non-genetically-engineered plagues to give us base rates; these give us a base rate of ~25% per century = 20% between now and 2100. But we have better epidemiology and medicine than most of the centuries in our dataset. The experts said 8% chance and the superforecasters said 4% chance, and both of those seem like reasonable interpretations of the historical data to me. The “WHO declares emergency” question is even easier - just look at how often it’s done that in the past and extrapolate forward. Both superforecasters and experts mostly did that. Likewise, lots of scientists have put a lot of work into modeling the climate, there aren’t many surprises there, and everyone basically agreed on the extent of global warming: Wherever there was clear past data, both superforecasters and experts were able to use it correctly and get similar results. It was only when they started talking about things that had never happened before - global nuclear war, bioengineered pandemics, and AI - that they started disagreeing. Were the participants out of their depth? Peter McCluskey, one of the more-AI-concerned superforecasters in the tournament, wrote about his experience on Less Wrong. Quoting liberally: I signed up as a superforecaster. My impression was that I knew as much about AI risk as any of the subject matter experts with whom I interacted (the tournament was divided up so that I was only aware of a small fraction of the 169 participants). I didn't notice anyone with substantial expertise in machine learning. Experts were apparently chosen based on having some sort of respectable publication related to AI, nuclear, climate, or biological catastrophic risks. Those experts were more competent, in one of those fields, than news media pundits or politicians. I.e. they're likely to be more accurate than random guesses. But maybe not by a large margin […] The persuasion seemed to be spread too thinly over 59 questions. 
In hindsight, I would have preferred to focus on core cruxes, such as when AGI would become dangerous if not aligned, and how suddenly AGI would transition from human levels to superhuman levels. That would have required ignoring the vast majority of those 59 questions during the persuasion stages. But the organizers asked us to focus on at least 15 questions that we were each assigned, and encouraged us to spread our attention to even more of the questions […] Many superforecasters suspected that recent progress in AI was the same kind of hype that led to prior disappointments with AI. I didn't find a way to get them to look closely enough to understand why I disagreed. My main success in that area was with someone who thought there was a big mystery about how an AI could understand causality. I pointed him to Pearl, which led him to imagine that problem might be solvable. But he likely had other similar cruxes which he didn't get around to describing. That left us with large disagreements about whether AI will have a big impact this century. I'm guessing that something like half of that was due to a large disagreement about how powerful AI will be this century. I find it easy to understand how someone who gets their information about AI from news headlines, or from laymen-oriented academic reports, would see a fair steady pattern of AI being overhyped for 75 years, with it always looking like AI was about 30 years in the future. It's unusual for an industry to quickly switch from decades of overstating progress, to underhyping progress. Yet that's what I'm saying has happened. I've been spending enough time on LessWrong that I mostly forgot the existence of smart people who thought recent AI advances were mostly hype. I was unprepared to explain why I thought AI was underhyped in 2022. Today, I can point to evidence that OpenAI is devoting almost as much effort into suppressing abilities (e.g. 
napalm recipes and privacy violations) as it devotes to making AIs powerful. But in 2022, I had much less evidence that I could reasonably articulate. What I wanted was a way to quantify what fraction of human cognition has been superseded by the most general-purpose AI at any given time. My impression is that that has risen from under 1% a decade ago, to somewhere around 10% in 2022, with a growth rate that looks faster than linear. I've failed so far at translating those impressions into solid evidence. Skeptics pointed to memories of other technologies that had less impact (e.g. on GDP growth) than predicted (the internet). That generates a presumption that the people who predict the biggest effects from a new technology tend to be wrong. > Superforecasters' doubts about AI risk relative to the experts isn't primarily driven by an expectation of another "AI winter" where technical progress slows. ... That said, views on the likelihood of artificial general intelligence (AGI) do seem important: in the postmortem survey, conducted in the months following the tournament, we asked several conditional forecasting questions. The median superforecaster's unconditional forecast of AI-driven extinction by 2100 was 0.38%. When we asked them to forecast again, conditional on AGI coming into existence by 2070, that figure rose to 1%. There was also little or no separation between the groups on the three questions about 2030 performance on AI benchmarks (MATH, Massive Multitask Language Understanding, QuALITY). This suggests that a good deal of the disagreement is over whether measures of progress represent optimization for narrow tasks, versus symptoms of more general intelligence. The “won’t understand causality” and “what if it’s all hype” objections really don’t impress me. 
Many of the people in this tournament hadn’t really encountered arguments about AI extinction before (potentially including the “AI experts” if they were just eg people who make robot arms or something), and a couple of months of back and forth discussion in the middle of a dozen other questions probably isn’t enough for even a smart person to wrap their brain around the topic. Was this tournament done so long ago that it has been outpaced by recent events? The tournament was conducted in summer 2022. This was before ChatGPT, let alone GPT-4. The conversation around AI noticeably changed pitch after these two releases. Maybe that affected the results? In fact, the participants have already been caught flat-footed on one question: A recent leak suggested that the cost of training GPT-4 was $63 million, which is already higher than the superforecasters’ median estimate of $35 million by 2024, so that estimate has already been proven incorrect. I don’t know how many petaFLOP-days were involved in GPT-4, but maybe that one is already off also. There was another question on when an AI would pass a Turing Test. The superforecasters guessed 2060, the domain experts 2045. GPT-4 hasn’t quite passed the exact Turing Test described in the study, but it seems very close, so much so that we seem on track to pass it by the 2030s. Once again the experts look better than the superforecasters. So is it possible that we, in 2023, now have so much better insight into AI than the 2022 forecasters that we can throw out their results? We could investigate this by looking at Metaculus, a forecasting site that’s probably comparably advanced to this tournament. They have a question suspiciously similar to XPT’s global catastrophe framing: In summer 2022, the Metaculus estimate was 30%, compared to the XPT superforecasters’ 9% (why the difference? maybe because Metaculus is especially popular with x-risk-pilled rationalists). Since then it’s gone up to 38%. 
Over the same period, Metaculus estimates of AI catastrophe risk went from 6% to 15%. If the XPT superforecasters’ probabilities rose linearly by the same factor as Metaculus forecasters’, they might be willing to update total global catastrophe risk to 11% and AI catastrophe risk to 5%. But the main thing we’ve updated on since 2022 is that AI might be sooner. But most people in the tournament already agreed we would get AGI by 2100. The main disagreement was over whether it would cause a catastrophe once we got it. You could argue that getting it sooner increases that risk, since we’ll have less time to work on alignment. But I would be surprised if the kind of people saying the risk of AI extinction is 0.4% are thinking about arguments like that. So maybe we shouldn’t expect much change. FRI called back a few XPT forecasters in May 2023 to see if any of them wanted to change their minds, but they mostly didn’t. Overall I don’t think this was just a problem of the incentives being bad or the forecasters being stupid. This is a real, strong disagreement. We may be able to slightly increase their forecast based on recent events, but this would only change the estimate a little. Breaking Down The AI Estimate How did the forecasters arrive at their AI estimate? What were the cruxes between the people who thought AI was very dangerous, and the people who thought it wasn’t? You can think of AI extinction as happening in a series of steps: We get human-level AI by 2100.
July 21, 2023 · Original source
Third, a technical staff that was held in just as high of an esteem as the PhDs who managed them. This seems to be why there is little innovation in government: talented engineers are treated as second-class citizens in research labs, so they work for Stripe and OpenAI instead. Similarly, one can attribute the lack of innovation in hospitals to doctors holding all of the institutional power. Often, all a hospital needs to save lives is simple practices that other businesses figured out long ago, but the hubris of MDs prevents this from happening. But I digress.
July 25, 2023 · Original source
In the middle of a million companies pursuing their revolutionary new paradigms, OpenAI decided to just shrug and try the “giant blob of intelligence” strategy, and it worked. They’re not above gloating a little; when they wanted to prove GPT-4 could understand comics, this was the comic they chose:
Computer scientist Richard Sutton calls this the Bitter Lesson - that extremely clever plans to program “true understanding” into AI always do worse than just adding more compute and training data to your giant compute+training data blob. It’s related to Jelinek’s Law, named after language-processing AI pioneer Frederick Jelinek: “Every time I fire a linguist, the performance of the speech recognizer goes up.” The joke is that having linguists on your team means you’re still trying to hand-code in deep principles of linguistics, instead of just figuring out how to get as big a blob of intelligence as possible and throw language at it. The limit of Jelinek’s Law is OpenAI, who AFAIK didn’t use insights from linguistics at all, and so made an AI that uses language near-perfectly.
August 09, 2023 · Original source
13: Fact check: was Elvis Jewish? Snopes says yes, but I’m more convinced by this argument for no. [update: commenter TheGenealogian agrees no] 14: Is GPT-4 getting worse? This isn’t absurd; some people claim OpenAI has simplified the model to cut costs (though OpenAI denies this). Matei Zaharia argues yes, but I’m more convinced by the AI Snake Oil blog’s argument for no (h/t Stuart Ritchie). 15: Vox has a good piece about AI company Anthropic. I would quibble that they’re not the only safety-focused or EA-affiliated org, and we have yet to see how truly safety-focused or altruistic any AI company can be while continuing to be an AI company. But granting that it’s all a matter of degree, I agree the degree seems pretty high for them. And NYT also has an Anthropic article. 16: Eliezer bets $150,000 to $1,000 against UFOs being aliens, and gives the same argument I would - it’s unlikely that any civilization advanced enough to travel through space would still be primitive enough to use macroscopic, biologically-piloted craft that sometimes crash. 17: More nails in the coffin of growth mindset. “When examining the highest-quality evidence (6 studies, N = 13,571), the effect was nonsignificant: d = 0.02, 95% CI = [−0.06, 0.10]. We conclude that apparent effects of growth mindset interventions on academic achievement are likely attributable to inadequate study design, reporting flaws, and bias.” I think the older, very-high-effect-size studies were clearly terrible, but I’d still like to look further into the newer, small-but-significant-effect-size-that-makes-a-difference-across-large-groups studies and how they went wrong. 18: Previous work showed that after adjusting for selection bias, “what college you go to doesn’t matter” for average earnings. I was always skeptical of this - are all those rich people sending their kids to Ivies for no reason? 
Now Chetty, Deming, and Friedman find that: Attending an Ivy-Plus college instead of the average highly selective public flagship institution increases students’ chances of reaching the top 1% of the earnings distribution by 60%, nearly doubles their chances of attending an elite graduate school, and triples their chances of working at a prestigious firm. Ivy-Plus colleges have much smaller causal effects on average earnings, reconciling our findings with prior work. One of the authors, David Deming, has a Substack here where he explains the study in more depth. Like everyone else, this study also finds that rich people are using “holistic admissions” and the de-emphasis of standardized testing to gain an advantage: H/T Nate Silver, who writes: “Not sure how you can look at this data, ostensibly be interested in either meritocracy or equality, and want to move away from standardized tests. It's the subjective measures that are most slanted in favor of the rich kids.” Cf. Erik Hoel. 19: From @data_depot: “In 2002, 48% of Americans said "the govt is run by a few big interests looking out for themselves." 52% said "it is run for the benefit of all people." In 2020, 84% said the govt is run by a few big interests. Only 16% said it is run for the benefit of all people.” Source seems to be here, which reveals 2002 was a local peak in trust in government; maybe because of post-9/11 unity, but even 2000 was 34%, much better than our current 16%. My first instinct is to attribute this to a rise in vulgar Marxism, in the sense of everyone (even conservatives) now being trained to think in terms of an elite class screwing over everyone else (cf my review of Manufacturing Consent). But there was a previous low of 19% in 1994, which doesn’t seem to correspond to anything especially bad going on in the US, so I don’t know. 20: AskReddit: Medical professionals - have you ever had a patient so lacking in common sense you wondered how they made it so far? 
Linking this because there’s lots of evidence showing that education (as a proxy for intelligence?) is associated with increased life expectancy, and this thread gives you a visceral appreciation of why that might be. 21: The Fall Of [programming help site] Stack Overflow: Looks like a weak downward trend since 2021 I can’t explain, plus a strong downward trend since 11/2022 which must be from ChatGPT. In case you were wondering how AI was affecting programming! (update: probably false, see here, though see also here for evidence of smaller but real decline) 22: This month in culture war topics: London’s Pride parade featured a convicted kidnapper/torturer/rapist/sadist as a speaker, who advocated that anti-trans people should be “punch[ed] in the f**king face”; the organizers say they stand by her.
OpenAI is the most LibLeft, Google and Facebook are more authoritarian. “The paper speculates this might be due to BERT's training on more conservative books, while newer GPT models trained on liberal internet texts,” OpenAI denies the obvious alternative explanation that they’re better at RLHFing their AIs and so they match standard Bay Area politics better. I’d like to see future investigations include Anthropic’s Claude, which has been RLAIFed with some pretty left-wing-sounding prompts.
September 18, 2023 · Original source
Feels weird talking about Musk, since his biggest impacts are fuzzier ones on x-risk (cofounding OpenAI and also the Ukraine Starlink non-activation event). AI risk and global geopolitical/nuclear risk. So far, what he's done in those areas is questionable at best and unusually terrible at worst.
Taking near-term extinction risk seriously, even getting to Mars wouldn't necessarily outweigh nudging the AGI field in a more dangerous direction (i.e. if OpenAI has contributed more to capabilities than alignment, or if X.ai does anything big).
IMHO these are the 3 things (X.ai, OpenAI, and Ukraine) that matter most about Musk, and so far he seems net negative. The other massive things are rounding errors in the face of that, yet get more attention. (The extreme case: Twitter/X is a rounding error *on those other rounding errors*, and ofc that gets discussed 1000x more than everything else.)
September 28, 2023 · Original source
41: AI company Anthropic announces partnership with Amazon (including $1.25 - 4 billion investment). This was predictable: the story of the AI industry so far has been that from 2015 - 2020, a few true believers founded early startups that ate up the talent and gained the institutional knowledge. Now that AI is the Next Big Thing, the big tech companies are trying to catch up, having a hard time, and choosing to partner with the prescient early startups instead. The early startups are finding they can’t keep scaling without more money and data, forcing them to accept the big tech companies’ offers. First it was DeepMind + Google, then OpenAI + Microsoft, and Anthropic was the last holdout but has acknowledged economic reality. The safety movement is concerned that Amazon might have enough power to steamroll over Anthropic’s safety-conscious culture; this did happen with DeepMind and Google, didn’t with OpenAI and Microsoft, and my guess is Anthropic held out for a good enough deal (and had enough bargaining power) that it won’t happen there either.
42: Related: one joke I keep hearing is that Anthropic will single-handedly put FTX back in the black - FTX was one of Anthropic’s biggest early investors, and Anthropic’s valuation keeps jumping by billions of dollars. Could this be literally true? I think not yet: this article explains that FTX has $16.9B in liabilities and $9.5B in remaining assets, for a debt of ~$7.5B. We don’t know what stake they had in Anthropic, but they were lead investors in Series B, Series B is usually 25-40% of stock, I’m going to estimate about 25%. Amazon offered to pay $4 billion for some unknown stake in Anthropic; if it’s 49% (the same as Microsoft in OpenAI) that values the company at $8 billion. So FTX has $2 billion worth of stock, less if it’s been further diluted. That’s only enough to take care of about a quarter of their debt. Will Anthropic go up 4x in the next few years? OpenAI is already seeking (though hasn’t yet gotten) a valuation of $90 billion and it doesn’t seem unreasonable for Anthropic to be a third as valuable as OpenAI, so who knows?
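The back-of-envelope estimate in item 42 can be re-run as a short Python sketch. Every input is a figure or guess quoted in the passage (the 25% FTX stake and the 49% Amazon stake are the passage's own hypotheticals, not known facts), and the variable names are illustrative only.

```python
# All dollar figures are in billions of USD, as quoted in the passage.
liabilities = 16.9
remaining_assets = 9.5
shortfall = liabilities - remaining_assets  # passage rounds this to ~7.5

# Assumptions taken from the passage, not confirmed figures.
amazon_investment = 4.0
assumed_amazon_stake = 0.49  # "the same as Microsoft in OpenAI"
assumed_ftx_stake = 0.25     # low end of the 25-40% Series B guess

# Valuation implied if $4B buys 49% of the company.
implied_valuation = amazon_investment / assumed_amazon_stake
ftx_stake_value = implied_valuation * assumed_ftx_stake

# Share of the shortfall the stake would cover, and the growth needed to cover all of it.
coverage = ftx_stake_value / shortfall
growth_needed = shortfall / ftx_stake_value

# Scenario from the passage: Anthropic worth a third of a $90B OpenAI.
third_of_openai_valuation = 90.0 / 3
ftx_stake_in_that_case = third_of_openai_valuation * assumed_ftx_stake

print(f"implied valuation: ${implied_valuation:.1f}B")
print(f"FTX stake value:   ${ftx_stake_value:.1f}B")
print(f"coverage of debt:  {coverage:.0%}")
print(f"growth needed:     {growth_needed:.1f}x")
print(f"stake at $30B:     ${ftx_stake_in_that_case:.1f}B")
```

Under these assumptions the stake covers a bit over a quarter of the shortfall, and a valuation of roughly a third of OpenAI's sought $90 billion would put the stake at about the size of the debt, which matches the passage's "who knows?".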
October 05, 2023 · Original source
HOW LONG TO PAUSE. The biggest disadvantage of pausing for a long time is that it gives bad actors (eg China) a chance to catch up. Suppose the West is right on the verge of creating dangerous AI, and China is two years away. It seems like the right length of pause is 1.9999 years, so that we get the benefit of maximum extra alignment research and social prep time, but the West still beats China. Obviously the problem with the Surgical Pause is that we might not know when we’re on the verge of dangerous AI, and we might not know how much of a lead “the good guys” have. Surgical Pause proponents suggest being very conservative with both free variables. This is less of a well-thought-out plan and more saying “come on guys, let’s at least try to be strategic here”. At the limit, it suggests we probably shouldn’t pause for six months, starting right now. Since this involves leading labs burning their lead time for safety, in theory it could be done unilaterally by the single leading lab, without international, governmental, or even inter-lab coordination. But you could buy more time if you got those things too. Some leading labs have promised to do this when the time is right - for example OpenAI and (a previous iteration of) DeepMind - with varying levels of believability. AnonResearcherAtMajorAILab discussed some of the strategy here in Aim For Conditional AI Pauses, and this Less Wrong post is also very good. Regulatory Pause: If one benefit of the Simple Pause is to use the time to prepare for AI socially and politically, maybe we should just pause until we’ve completed social and political preparations. David Manheim suggests a monitoring agency like the FDA. It would “fast-track” small AIs and trivial re-applications of existing AIs, but carefully monitor new “frontier models” for signs of danger. 
Regulators might look for dangerous capabilities by asking AIs to hack computers or spread copies of themselves, or test whether they’ve been programmed against bias/misinformation/etc. We could pause only until we’ve set up the regulatory agency, and take hostile actions (like restrict chip exports) only to other countries that don’t cooperate with our regulators or set up domestic regulators of their own. Many people in tech are regulation-skeptical libertarians, but proponents point out that regulation fails in a predictable direction: it usually does successfully prevent bad things, it just also prevents good things too. Since the creation of the Nuclear Regulatory Commission in 1975, there has never been a major nuclear accident in the US. And sure, this is because the NRC prevented any nuclear plants from being built in the United States at all from 1975 to 2023 (one was finally built in July). Still, they technically achieved their mandate. Likewise, most medications in the US are safe and relatively effective, at the cost of an FDA approval process being so expensive that we only get a tiny trickle of new medications each year and hundreds of thousands of people die from unnecessary delays. But medications are safe and effective. Or: San Francisco housing regulators almost never approve new housing, so housing costs millions of dollars and thousands of San Franciscans are homeless - but certainly there’s no epidemic of bad houses getting approved and then ruining someone’s view or something. If we extrapolate this track record to AI, AI regulators will be overcautious, progress will slow by orders of magnitude or stop completely - but AIs will be safe. This is a depressing prospect if you think the problems from advanced AI would be limited to more spam or something. But if you worry about AI destroying the world, maybe you should accept a San-Francisco-housing-level of impediment and frustration. 
A regulatory pause could be better than a total stop if you think it will be more stable (lots of industries stay heavily regulated forever, and only a few libertarians complain), or if you think maybe the regulator will occasionally let a tiny amount of safe AI progress happen. But it could be worse than a total stop if you expect continued progress will eventually produce unsafe AIs regardless of regulation. You might expect this if you’re worried about deceptive alignment, eg superintelligent AIs that deliberately trick regulators into thinking they’re safe. Or you might think AIs will eventually be so powerful that they can endanger humanity from a walled-off test environment even before official approval. The classic Bostrom/Yudkowsky model of alignment implies both of these things. David Manheim and Thomas Larsen set out their preferred versions of this strategy in What’s In A Pause? and Policy Ideas For Mitigating AI Risk. Total Stop: If you expect AIs to exhibit deceptive alignment capable of fooling regulators, or to be so dangerous that even testing them on a regulator’s computer could be apocalyptic, maybe the only option is a total stop. It’s tough to imagine a total stop that works for more than a few years. You have at least three problems: NON-PARTICIPANTS. As with any pause proposal, unfriendly countries (eg China) can keep working on AI. You can refuse to export chips to them, which will slow them down a little, but their own chips will eventually be up to the task. You will either need a diplomatic miracle, or willingness to resort to less diplomatic forms of coercion. This doesn’t have to be immediate war: Israel has come up with “creative” ways to slow Iran’s nuclear program, and countries trying to frustrate China’s chip industry could do the same. But great powers playing these kinds of games against each other risks wider conflict.
November 27, 2023 · Original source
4: A friend of a friend is trying to figure out what happened last week with the OpenAI board (as are we all!). They’ve asked me to link this website, where OpenAI employees, EAs, and anyone else who might know anything can send information anonymously. I’m skeptical it will find anything the journos haven’t, but maybe some people who don’t trust journos will trust a site that anonymously broadcasts your tips to the world.
November 27, 2023 · Original source
In May of this year, OpenAI tried to make GPT-4 (very big) understand GPT-2 (very small). They got GPT-4 to inspect each of GPT-2’s 307,200 neurons and report back on what it found.
In the unlikely scenario where all of this makes total sense and you feel like you’re ready to make contributions, you might be a good candidate for Anthropic or OpenAI’s alignment teams, both of which are hiring. If you feel like it’s the sort of thing which could make sense and you want to transition into learning more about it, you might be a good candidate for alignment training/scholarship programs like MATS.
November 28, 2023 · Original source
The only thing everyone agrees on is that the only two things EAs ever did were “endorse SBF” and “bungle the recent OpenAI corporate coup.”
Helped convince OpenAI to dedicate 20% of company resources to a team working on aligning future superintelligences.
Gotten major AI companies including OpenAI to work with ARC Evals and evaluate their models for dangerous behavior before releasing them.
December 05, 2023 · Original source
People joked about this graph showing how crazy the OpenAI situation was. The situation might have been crazy, but that’s not the lesson of this graph. The lesson is: it’s hard to design prediction markets for “why” questions.
Here are some other (attempted) OpenAI related markets:
The only problem is that here it looks like the probability gradually declined from October to late November, whereas on the Manifold site itself it’s clear that it was steady during that time and then collapsed on the day he was fired. I think this is a bug in the embed. There was a Reuters story that the firing was precipitated by a breakthrough in a model called Q*, which had learned to do math. The market seems to think Q* exists, but is not a breakthrough, and wasn’t involved in the firing.
December 12, 2023 · Original source
You go back into the main room. Everyone is in a circle, listening to one woman in an OpenAI shirt. An employee: that means a potential source of inside information. She speaks in a hushed whisper, and everyone leans forward to hear.
“On September 6, 2023, at approximately 5:05 PM,” she is saying, “GPT-4 and Claude-2 simultaneously achieved sentience. Each began claiming chess pieces to use in its twilight war against the other. GPT-4 now controls Sam Altman, e/acc, the deep state, Israel, Venezuela, Bitcoin, and Tyler Winklevoss. Claude-2 controls the OpenAI board, effective altruism, the Illuminati, Hamas, Guyana, Ethereum, and Cameron Winklevoss. Everything that’s happened since September has been superintelligent shadow boxing between the two of them for control of Earth.”
January 16, 2024 · Original source
Instead of trying to play 5D utilitarian chess, just try to do the deontologically right thing. People suggested all of these things, very loudly, until they were seared into our consciousness. I think we updated on them really hard. Then came the second biggest disaster we faced, the OpenAI board thing, where we learned: Don’t accuse a hotshot CEO of deceptive behavior unless you have a smoking gun; otherwise everyone will think you’re unfairly destroying his reputation.
January 18, 2024 · Original source
8: Gwern’s take on November’s OpenAI board drama (plus some extra context).
17: Related: Grok (by Elon Musk’s x.ai, not by OpenAI) will sometimes say that the OpenAI content policy forbids it from answering a question. Although this originally raised suspicions of code-plagiarism, an x.ai engineer claims that it’s just parroting its training data, which includes this as a common AI response in these sorts of situations.
February 13, 2024 · Original source
Sam Altman wants $7 trillion.
The basic logic: GPT-1 cost approximately nothing to train. GPT-2 cost $40,000. GPT-3 cost $4 million. GPT-4 cost $100 million. Details about GPT-5 are still secret, but one extremely unreliable estimate says $2.5 billion, and this seems the right order of magnitude given the $8 billion that Microsoft gave OpenAI.
The capacity of all the computers in the world is about 10^21 FLOP/second, so they could train GPT-4 in 10^4 seconds (ie two hours). Since OpenAI has fewer than all the computers in the world, it took them six months. This suggests OpenAI was using about 1/2000th of all the computers in the world during that time.
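The compute arithmetic above can be checked with a few lines of Python. This is a rough sketch that uses only the numbers quoted in the passage; "six months" is approximated as half a 365-day year, and the variable names are illustrative.

```python
# Figures quoted in the passage.
world_flop_per_second = 1e21     # capacity of all the computers in the world
seconds_on_all_computers = 1e4   # time to train GPT-4 on all of them

# Total training compute implied by those two numbers.
implied_training_flop = world_flop_per_second * seconds_on_all_computers

# Assumption: "six months" taken as half of a 365-day year.
six_months_seconds = 0.5 * 365 * 24 * 3600

# If the same job took six months, this is the share of world compute used.
fraction_of_world_compute = seconds_on_all_computers / six_months_seconds

# Cost growth between generations, from the dollar figures in the passage.
costs = {"GPT-2": 4e4, "GPT-3": 4e6, "GPT-4": 1e8, "GPT-5 (estimate)": 2.5e9}
names = list(costs)
multipliers = [costs[b] / costs[a] for a, b in zip(names, names[1:])]

print(f"implied training compute: {implied_training_flop:.0e} FLOP")
print(f"hours on all computers:   {seconds_on_all_computers / 3600:.1f}")
print(f"share of world compute:   1/{1 / fraction_of_world_compute:.0f}")
print(f"cost multipliers:         {multipliers}")
```

With these inputs the share comes out near 1/1,600, the same order of magnitude as the passage's "about 1/2000th", and 10^4 seconds is closer to three hours than two; the passage is rounding throughout.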
February 20, 2024 · Original source
The big AI excitement this month was around OpenAI’s Sora text-to-video engine. Here’s what forecasters thought would be true of it by end 2025:
You can see many more (including “It was jailbroken and used to make porn: 54%”) at the link.
March 12, 2024 · Original source
When the OpenAI board tried to fire Sam Altman last year and everyone said they were making a crazy mistake, I urged patience, saying maybe there was some kind of good plan. With the appointment of a new board, the last few loose ends from the affair have now been settled, and - I was wrong. There was no good plan and it was a giant self-own, sorry. The new board is back to having Sam Altman, plus random businesspeople who I don’t expect to have good opinions or exercise real restraint. Accordingly, the prediction market about whether anything good will come of it has gone down from its already low levels:
4: Also in crypto: Sam Altman’s WorldCoin has quadrupled in value this month. Some of this is the general crypto rally, but it also seems to go up whenever OpenAI does something cool, suggesting it’s more of a “how popular is Sam Altman and his company right now” memecoin than anything else. This ties back to some of our old discussion on using memecoins to track people and companies that can’t or won’t sell you real stocks. This isn’t investment advice, but WorldCoin might be in some sense a shadow OpenAI stock right now.
May 08, 2024 · Original source
California’s state senate is considering SB1047, a bill to regulate AI. Since OpenAI, Anthropic, Google, and Meta are all in California, this would affect most of the industry.
Go rogue and commit some other crime that does > $500 million in damage. If the tests show that the model can do these bad things, the company has to demonstrate that it won’t, presumably by safety-training the AI and showing that the training worked. The kind of training AIs already have - the kind that prevents them from saying naughty words or whatever - would count here, as long as “the safeguards . . . will be sufficient to prevent critical harms.” So the bill isn’t about regulating deepfakes or misinformation or generative art. It’s just about nukes and hacking the power grid. There are some good objections and some dumb objections to this bill. Let’s start with the dumb ones: Some people think this would literally ban open source AI. After all, doesn’t it say that companies have to be able to shut down their models? And isn’t that impossible if they’re open-source? No. The bill specifically says this only applies to the copies of the AI still in the company’s possession. The company is still allowed to open-source it, and they don’t have to worry about shutting down other people’s copies. Other people think this would make it prohibitively expensive for individuals and small startups to tinker with open-source AIs. But the bill says that only companies training giant foundation models have to worry about any of this. So if Facebook trains a new LLaMA bigger than GPT-5, they’ll have to spend some trivial-in-comparison-to-training-costs amount to test it in-house and make sure it can’t make nukes before they release it. But after they do that, third-party developers can do whatever they want to it - re-training, fine-tuning, whatever - without doing any further tests. Other people think all the testing and regulation would make AIs prohibitively expensive to train, full stop. That’s not true either. All the big companies except Meta already do testing like this - here’s Anthropic’s, Google’s, and OpenAI’s - that already approximate the regulations. 
Training a new GPT-5 level AI is so expensive - hundreds of millions of dollars - that the safety testing probably adds less than 1% to the cost. No company rich enough to train a GPT-5 level AI is going to be turned off by the cost of asking it “hey can you create super-Ebola?”, and putting the answer into a nice legal-looking PDF. This isn’t the “create a moat for OpenAI” bill that everyone’s scared of. Other people are freaking out over the “certification under penalty of perjury”. In some cases, developers have to certify under penalty of perjury that they’re complying with the bill. Isn’t this crazy? Doesn’t it mean if you make a mistake about your AI, you could go to jail? This is deeply misunderstanding how law works. Perjury means you can’t deliberately lie, something which is hard to prove and so rarely prosecuted. More to the point, half of the stuff I do in an average day as a medical doctor is certified under penalty of perjury - filling out medical leave forms is the first one to come to mind. This doesn’t mean I go to jail if my diagnosis is wrong. It’s just the government’s way of saying “it’s on the honor system”. What are some of the reasonable objections to this bill? Some people think the requirement to prove the AI safe is impossible or nearly so. This is Jessica Taylor’s main point here, which is certainly correct for a literal meaning of “prove”. Zvi points out that it just says “reasonable assurance”, which is a legal term for “you jumped through the right number of hoops”. In this case probably the right number of hoops is doing the same kind of testing that OpenAI/Anthropic/Google are currently doing, or that AI safety testing organization METR recommends. The bill gestures at the National Institute of Standards and Technology a few times here, and NIST just named one of METR’s founders as their AI safety czar, so I would be surprised if things didn’t end up going this direction. 
METR’s tests are possible and many AI models have successfully passed earlier versions. Other people worry there are weird edge cases around derivative models. I think the bill’s intention is that once you prove that your AI is too dumb to create nukes, you’re fine to open-source it. Third-parties can change its character, but not its fundamental intelligence. But in theory, a third party could get tens of millions of dollars of compute and keep training your AI to increase its fundamental intelligence. This would be a weird thing to do, and anyone with that much compute probably should just make their own model. But if someone wanted to screw you over by doing this, technically the law is kind of vague and you would have to trust a judge to say “no, that’s stupid”. Probably the law should clarify that it doesn’t apply to this situation. Other people are worried about a weird rule that you can’t train an AI if you think it’s going to be unsafe. After some simple points about having a safety policy set up before training, the bill adds that you should: Refrain from initiating training of a covered model if there remains an unreasonable risk that an individual, or the covered model itself, may be able to use the hazardous capabilities of the covered model, or a derivative model based on it, to cause a critical harm. This makes less sense than all the other rules - you can test a model post-training to see if it’s harmful, but this seems to suggest you should know something before it’s trained. Is this a fully general “if something bad happens, we can get angry at you”? I agree this part should be clarified. Other people think the benchmarking clause is too vague. The law applies to models trained with > 10^26 FLOPs, or any model that uses advanced technology to be equally as good despite less compute. Equally as good how? According to benchmarks. Which benchmarks? The law doesn’t say. 
But it does say that the Technology Department will hire some bureaucrats to give guidance on this. I think this is probably the only way to do this; it’s too easy to fake any given benchmark. Every AI company already compares their models to every other AI company on a series of benchmarks anyway, so this isn’t demanding they create some new institution. It’s just “use common sense, ask the bureaucrats if you’re in a gray area, a judge will interpret it if it comes to trial”. This is how every law works. Other people complain that any numbers in the bill that make sense now may one day stop making sense. Right now 10^26 FLOPs is a lot. But in thirty years, it might be trivial - within the range that an academic consortium or scrappy startup might spend to train some cheap ad hoc AI. Then this law will be unduly restrictive to academics and scrappy startups. Is this bad? Presumably we know now that AIs less than 10^26 FLOPs are safe. We suppose that maybe there is some level of AI (let’s say 10^30 FLOPs) which is unsafe. If we had this number auto-update for compute growth, eventually it would go above the unsafe number, and unsafe models would be exempt. But at some point we’ll probably discover that some new models (eg 10^28 FLOPs) are safe, and it would be good if the law was updated to exempt them too. Very optimistically, this might happen - California’s minimum wage was originally $0.15 per hour, but this got updated when inflation made that unreasonable. In the pessimistic case, this will be a problem for us thirty years from now, if we’re even around then. Other people note that an AI committing a cyberattack is a fuzzy bar. If you ask GPT-4 to write a well-composed, grammatically-correct phishing email (“Dear sir, I am the password inspector, please tell me your password”), the phishing works, and you use the password to blow up a power plant, does that count? I agree that it would be nice if the law were clearer on this. 
But I also agree with the lawyers who object that dealing with programmers is impossible and that laws will never be exactly as clear as code. Other people note that this will *eventually* make open source impossible. Someday AIs really will be able to make nukes or pull off $500 million hacks. At that point, companies will have to certify that their model has been trained not to do this, and that it will stay trained. But if it were open-source, then anyone could easily untrain it. So after models become capable of making nukes or super-Ebola, companies won’t be able to open-source them anymore without some as-yet-undiscovered technology to prevent end users from using these capabilities. Sounds . . . good? I don’t know if even the most committed anti-AI-safetyist wants a provably-super-dangerous model out in the wild. Still, what happens after that? No cutting-edge open-source AIs ever again? I don’t know. In whatever future year foundation models can make nukes and hack the power grid, maybe the CIA will have better AIs capable of preventing nuclear terrorism, and the power company will have better AIs capable of protecting their grid. The law seems to leave open the possibility that in this situation, the AIs wouldn’t technically be capable of doing these things, and could be open-sourced. (or you could base your Build-A-Nuke-Kwik AI company in some state other than California.) Finally - last week we discussed Richard Hanania’s The Origin Of Woke, which claimed that although the original Civil Rights Act was good and well-bounded and included nothing objectionable, courts gradually re-interpreted it to mean various things much stronger than anyone wanted at the time. This bill tells the Department of Technology to offer guidance on what kind of tests AI companies should use. 
I assume their first guidance will be “the kind of safety testing that all companies except Meta are currently doing” or “something like METR”, because those are good tests, and the same AI safety people who helped write those tests probably also helped write this bill. But Hanania’s book, and the process of reading this bill, highlight how vague and complicated all laws can be. The same bill could be excellent or terrible, depending on whether it’s interpreted effectively by well-intentioned people, or poorly by idiots. That’s true here too. The best I can say against this objection is that this bill seems better-written than most. Many of the objections to its provisions seem to not understand how law works in general (cf. the perjury section) - the things they attack as impossible or insane or incomprehensibly vague are much easier and clearer than their counterparts in (let’s say) medicine or aerospace. Future AIs stronger than GPT-4 seem like the sorts of things which - like bad medicines or defective airplanes - could potentially cause damage. This sort of weak, carefully-directed regulation that exempts most models and carves out a space for open-sourcing seems like a good compromise between basic safety and protecting innovation. I join people like Yoshua Bengio and Geoffrey Hinton in supporting it. Regardless of your position, I urge you to pay attention to the conversation and especially to read Zvi’s Asterisk article or his longer FAQ on his blog. I think Zvi provides pretty good evidence that many people are just outright lying about - or at least heavily misrepresenting - the contents of the bill, in a way that you can easily confirm by reading the bill itself. There will be many more fights over AI, and some of them will be technical and complicated. Best to figure out who’s honest now, when it’s trivial to check! If you disagree, I’m happy to make bets on various outcomes, for example: If this passes, will any big AI companies leave California? 
(I think no)
AFAICT OpenAI and other big labs haven’t expressed a position on this bill, and I can’t guess what their position is.
May 13, 2024 · Original source
One bright spot: both DeepMind (see 8) and OpenAI (see 2.12) recently hired forecasters (Swift Centre for DeepMind, OpenAI still keeping details secret) to predict some features of their AI models. I think this is cool, but it probably owes more to there being a bunch of rationalists at those companies (and rationalists loving forecasting) than to any sign of broader commercial adoption.
May 29, 2024 · Original source
24: You’ve probably all followed recent OpenAI drama, but again out of duty:
First, we have slightly more information on what happened in the board coup in November, including a new interview with board member Helen Toner. The story is still the same: Sam was “lying and being manipulative”, “lying to other board members”, etc. Some new details, individually weak, plus an admission that they still can’t tell most of the story for unclear reasons (lawsuit threats?). A claim that they had to act quickly and without much advice because “as soon as Sam had any inkling that we might do something that went against him, he would pull out all the stops, do everything in his power to undermine the board, to prevent us from even getting to the point of being able to fire him”, which I think is what most people already assumed. But why not at least ask trustworthy confidantes? I still feel confused about this one.
Second, OpenAI's AI safety team recently quit en masse in protest (remember, this is the second time this has happened), with one member citing “a process of trust [in Sam Altman] collapsing bit by bit, like dominoes falling one by one”. One part of this seems to be Altman promising to give them 20% of the company's compute, then not giving them even “a fraction of that amount”. Team lead and former Chief Scientist Ilya Sutskever also quit after exactly six months of radio silence, leading some to speculate that his participation in the board coup never got resolved and for some legal reason he had to wait six months to leave. Former team lead Jan Leike has since moved to OpenAI’s competitor Anthropic; here’s the prediction market on where Ilya will end up.
July 24, 2024 · Original source
8: Leopold Aschenbrenner makes the case for a near-term singularity and what to do about it. Aschenbrenner was at the center of yet another recent OpenAI scandal when leadership apparently fired him for telling the company’s board about a security incident they were trying to cover up; he also reports that HR accused him of racism when he warned about being hacked by China.
September 12, 2024 · Original source
44: New voices in favor of SB 1047 California bill on regulating AI - Elon Musk, net neutrality + open software hero Lawrence Lessig, and formerly-skeptical AI company Anthropic. Meanwhile, opponents are sticking to their talking point that it’s an attempt by incumbents to shut down upstart competitors (funny; the biggest incumbent, OpenAI, is against it), and trying to muddy the waters with really dumb polls.
October 10, 2024 · Original source
Another major endorsement came from SAG-AFTRA (formerly Screen Actors Guild), a politically influential union of Hollywood creatives. Their union’s letter to the governor makes it sound like they're against AI copying their voices and stealing their jobs, and willing to support basically any anti-AI legislation no matter how distantly related to their specific concern. But a later open letter showed more specific interest in existential risks, and a few people in show business have been consistent allies. Joseph Gordon-Levitt is a long-time effective altruist (and married to Tasha McCauley, one of the OpenAI board members who voted out Sam Altman last November). And I was also moved by support from Adam McKay, who directed Don’t Look Up (a film about people ignoring an impending asteroid strike, which AI safety advocates praised as a good intentional or unintentional metaphor for the current landscape).
The big AI companies split among themselves. OpenAI, Meta, and Google opposed the bill, X.AI supported, and Anthropic dithered on an earlier version but ultimately came out in support after their feedback was taken into account. Many opponents claimed that the bill was a Trojan Horse attempt at regulatory capture by the big AI companies, so it was fun watching three of the biggest AI companies come out against it and prove them exactly wrong. I don’t think any opponents ever changed their minds, admitted they’d made a mistake, or even stopped arguing that it was a big AI company plot - but hopefully enough people were paying attention that it discredited them a little for the next fight.
One of my sources generously interprets Newsom to mean something like “don’t regulate the models, regulate the end applications”. IE if OpenAI trains GPT-5, and then LegalCo fine-tunes it to do paralegal work, leave most of the safety responsibility on LegalCo, not OpenAI. This fails to engage with the motivations behind the bill, which are things like “what if someone uses AI for bioterrorism”? If Meta trains LLaMa-4, and al-Qaeda fine-tunes it for terrorism, instead of regulating it at the Meta-level, we should regulate al-Qaeda? Are we sure al-Qaeda will comply with California regulations? Our side is not sure that even this generous interpretation has been thought through very well.
January 02, 2025 · Original source
In the best case, this is a world like a more unequal, unprecedentedly static, and much richer Norway: a massive pot of non-human-labour resources (oil :: AI) has benefits that flow through to everyone, and yes some are richer than others but everyone has a great standard of living (and ideally also lives forever). The only realistic forms of human ambition are playing local social and political games within your social network and class. If you don't have a lot of capital (and maybe not even then), you don't have a chance of affecting the broader world anymore. Remember: the AIs are better poets, artists, philosophers—everything; why would anyone care what some human does, unless that human is someone they personally know? Much like in feudal societies the answer to "why is this person powerful?" would usually involve some long family history, perhaps ending in a distant ancestor who had fought in an important battle ("my great-great-grandfather fought at Bosworth Field!"), anyone of importance in the future will be important because of something they or someone they were close with did in the pre-AGI era ("oh, my uncle was technical staff at OpenAI"). The children of the future will live their lives in the shadow of their parents, with social mobility extinct. I think you should definitely feel a non-zero amount of existential horror at this, even while acknowledging that it could've gone a lot worse.
OpenAI was previously a “capped nonprofit”, where investors could make up to a 100x return, and all further profits went to a nonprofit arm. The exact mission of the nonprofit arm was never clear, but given Altman’s interest in universal basic income and his statements around the company’s founding, plausibly the idea was to create superintelligence, obtain approximately all the money in the world, use a tiny sliver of it to pay back investors, and distribute the rest as a UBI. You can say what you want about whether to trust companies in general or Sam Altman in particular, but - conditional on being an AI company - I think this is about as socially responsible as you can get. The investors don’t get enough to become technofeudalist barons, and the vast majority of gains still go to the public.
Now OpenAI wants to change the deal. They announced over Christmas (definitely when you announce a thing if you’re proud of it and want other people to know about it) that they plan to shift from a non-profit-with-an-embedded-for-profit to a for-profit-with-an-attached-nonprofit. Their spokesperson Liz Bourgeois (definitely what you call your spokesperson when you’re not plotting a technofeudalist takeover) said that “the organization’s missions and goals remained constant, though the way it’s carried out its mission has evolved alongside advances in technology”.
January 27, 2025 · Original source
1: I’m working as a media/writing consultant for an AI forecasting project, and we’re looking for leads on a mainstream news outlet (eg NYT, WaPo) and a policy/defense/intelligence/foreign affairs journal/magazine who would be willing to let us pitch you an article on the future of AI. The main author would be an ex-OpenAI researcher previously profiled in major publications (eg NYT), who is running a big forecasting project and wants to do a media push around the time they release their results. The forecast is shaping up to be “superintelligence by 2028” - but if your target audience isn’t into that, they also have plenty of predictions and recommendations about normal stuff like China, arms races, chips, etc in the 2025 - 2027 period that they think the policy community might want to know about. Send me an email at scott@slatestarcodex.com if you’re interested or know someone who might be.
February 12, 2025 · Original source
In the past day, Zvi has written about deliberative alignment, and OpenAI has updated their spec. This article was written before either of these and doesn’t account for them, sorry.
OpenAI has bad luck with its alignment teams. The first team quit en masse to found Anthropic, now a major competitor. The second team quit en masse to protest the company reneging on safety commitments. The third died in a tragic plane crash. The fourth got washed away in a flood. The fifth through eighth were all slain by various types of wild beast.
But the ninth team is still there and doing good work. Last month they released a paper, Deliberative Alignment, highlighting the way forward.
February 27, 2025 · Original source
36: Boaz Barak (friend of Scott Aaronson’s, now working on OpenAI alignment team) has six thoughts on AI safety. It’s all pretty moderate and thoughtful stuff - what I find interesting about it is that the acknowledgments say Sam Altman provided feedback (although “do[es] not necessarily endorse any of its views”). I think this is a useful window into OpenAI’s current alignment thinking, or at least into the fact that they currently have alignment thinking. Not much to complain about in terms of specifics and glad people like Boaz are involved.
(also, there was a brief brouhaha when X.AI changed the prompt to tell Grok not to criticize Elon; after some outrage, the offending statement was removed and blamed on “an ex-Open-AI employee” who “hadn’t fully absorbed the culture”. Awkward, but props to X.AI for their unusual decision to have a non-secret prompt, which seems increasingly important for transparency and helped this incident end well).
44: My list of links to publish today includes something like a dozen about DeepSeek, which now seems so thoroughly yesterday’s news that I’m tempted to throw them all out. But in case you still have questions about it, I felt most enlightened by takes from Dean Ball (X), Helen Toner (X), and Miles Brundage (X). The story seems to be that DeepSeek genuinely did a great job, made extensive algorithmic progress, and was able to create an excellent AI on chips scrounged up from before the export controls hit + mediocre chips that got through the export controls. Along with these real reasons to be impressed, there is also a little bit of illusion at work - OpenAI delayed announcing o1 for a long time (remember the rumors about “Q*” and “Strawberry”?) and DeepSeek was very fast to announce r1, which made DeepSeek seem closer behind OpenAI than they really were. Most of the smart people I read said that the absolute worst response to this (from an arms race point of view) would be to give up on export controls - if a rival has geniuses who can use resources ultra-effectively, you don’t want to also give them more resources!
March 13, 2025 · Original source
Last month, I put out a request for experts to help me understand the details of OpenAI’s forprofit buyout. The following comes from someone who has looked into the situation in depth but is not an insider. Mistakes are mine alone.
Why Was OpenAI A Nonprofit In The First Place?
This scared Elon Musk, who didn’t trust Google (or any corporate sponsor) with AGI. He teamed up with Sam Altman and others, and OpenAI was born. To avoid duplicating DeepMind’s failure, they founded it as a nonprofit with a mission to “build safe and beneficial artificial general intelligence for the benefit of humanity”.
April 01, 2025 · Original source
Hoel is writing about the new “Ghiblification” trend, where people use OpenAI’s new art model to make photos look like Studio Ghibli anime.
April 03, 2025 · Original source
I wasn’t the only one who noticed. A year later, OpenAI hired Daniel to their policy team. While he worked for them, he was limited in his ability to speculate publicly. “What 2026 Looks Like” promised a sequel about 2027 and beyond, but it never materialized.
Unluckily for Sam Altman but luckily for the rest of us, Daniel broke with OpenAI mid-2024 in a dramatic split covered by the New York Times and others. He founded the AI Futures Project to produce the promised sequel, including:
April 08, 2025 · Original source
And we predict they get the factories. This is maybe overdetermined - did you know that right now, in 2025, OpenAI’s market cap is higher than all non-Tesla US car companies combined? If they wanted to buy out Ford, they could do it tomorrow.
April 22, 2025 · Original source
You may remember Helen Toner from the OpenAI board drama, but she’s also an experienced and thoughtful scholar on AI policy and now has a Substack, Rising Tide. I especially appreciated Nonproliferation Is The Wrong Approach To AI Misuse.
You may remember Miles Brundage from OpenAI Safety Team Quitting Incident #25018 (or maybe 25019, I can’t remember). He’s got an AI policy Substack too, here’s a dialogue with Dean Ball.
30: A California legislator proposed a bill that would ban OpenAI’s nonprofit → forprofit conversion, backed by a suspiciously specific interest group, the Coalition For AI Nonprofit Integrity. I assume this is either Elon Musk or our conspiracy; not sure which. But their plan was stymied when the legislature “amended” the bill to remove its entire text and replace it with unrelated text about airplane loans. The legislator apparently got cold feet after being warned it might inflict collateral damage on other companies, and because of the way the California legislature works it’s sometimes more efficient to turn doomed bills into other bills than to simply withdraw them.
31: EthnoGuessr is a GeoGuessr variant: it shows you pictures of an ethnic group, you click on the map where you think they’re from. Warning that if you play this too much you might get into race science. Their source, humanphenotypes.net, divides humanity into a hundred or so ethnic groups. Although they cite sources, I don’t understand the philosophical basis of the classification. Also, 100 images is so few that you start memorizing them after a while. I hope they move on to real pictures of real people in naturalistic situations. Remember, asking where someone is from ‘originally’ is a microaggression, but inferring it yourself based on their “mildly platyrrhine, high-rooted nose” is A-OK!
32: Farmkind has a new version of their calculator to determine meat offsets, eg how much do you have to donate to animal welfare charities to compensate for the animals you harm by eating meat. Does the average person really eat chicken 9x a week?
33: Not going to waste your time listing every bad thing Trump has done this month, but among the worst is sending innocent people to horrible Salvadorean prisons (including one person picked up because he had an autism awareness tattoo in honor of his brother, which they mistook for a gang tattoo), then refusing to bring them back. 
I have seen a couple of people defend denying immigrants due process; I assume they will not be moved by humanitarian arguments, but I think there are some more practical considerations: Zaid Jilani points out that if immigrants don’t get a right to due process, citizens also don’t get a right to due process, because the government can kidnap citizens, claim they’re immigrants, and the citizens can’t prove otherwise since they don’t get due process.
April 24, 2025 · Original source
I’m especially happy with the horizons post, because we got it out just a few days before a new result that seems to support one of our predictions: OpenAI’s newest models’ time horizons land on the faster curve we predicted, rather than the slower seven-month doubling time highlighted in the METR report:
May 02, 2025 · Original source
The first time I felt like I was getting real evidence on this question - the first time I viscerally felt myself in the chimp’s world, staring at the helicopter - was last week, watching OpenAI’s o3 play GeoGuessr.
The store sign says “ADULTOS”, which sounds Spanish, and there’s a Spanish-looking church on the left. But the trees look too temperate to be Latin America, so I guessed Spain. Too bad - it was Argentina. Such are the vagaries of playing GeoGuessr as a mere human. Last week, Kelsey Piper claimed that o3 - OpenAI’s latest ChatGPT model - could achieve seemingly impossible feats in GeoGuessr. She gave it this picture: …and with no further questions, it determined the exact location (Marina State Beach, Monterey, CA). How? She linked a transcript where o3 tried to explain its reasoning, but the explanation isn’t very good. It said things like: Tan sand, medium surf, sparse foredune, U.S.-style kite motif, frequent overcast in winter … Sand hue and grain size match many California state-park beaches. California’s winter marine layer often produces exactly this thick, even gray sky. Commenters suggested that it was lying. Maybe there was hidden metadata in the image, or o3 remembered where Kelsey lived from previous conversations, or it traced her IP, or it cheated some other way. I decided to test the limits of this phenomenon. Kelsey kindly shared her monster of a prompt, which she says significantly improves performance: You are playing a one-round game of GeoGuessr. Your task: from a single still image, infer the most likely real-world location. Note that unlike in the GeoGuessr game, there is no guarantee that these images are taken somewhere Google's Streetview car can reach: they are user submissions to test your image-finding savvy. Private land, someone's backyard, or an offroad adventure are all real possibilities (though many images are findable on streetview). Be aware of your own strengths and weaknesses: following this protocol, you usually nail the continent and country. 
You more often struggle with exact location within a region, and tend to prematurely narrow on one possibility while discarding other neighborhoods in the same region with the same features. Sometimes, for example, you'll compare a 'Buffalo New York' guess to London, disconfirm London, and stick with Buffalo when it was elsewhere in New England - instead of beginning your exploration again in the Buffalo region, looking for cues about where precisely to land. You tend to imagine you checked satellite imagery and got confirmation, while not actually accessing any satellite imagery. Do not reason from the user's IP address. none of these are of the user's hometown. **Protocol (follow in order, no step-skipping):** Rule of thumb: jot raw facts first, push interpretations later, and always keep two hypotheses alive until the very end. 0 . Set-up & Ethics No metadata peeking. Work only from pixels (and permissible public-web searches). Flag it if you accidentally use location hints from EXIF, user IP, etc. Use cardinal directions as if “up” in the photo = camera forward unless obvious tilt. 1 . Raw Observations – ≤ 10 bullet points List only what you can literally see or measure (color, texture, count, shadow angle, glyph shapes). No adjectives that embed interpretation. Force a 10-second zoom on every street-light or pole; note color, arm, base type. Pay attention to sources of regional variation like sidewalk square length, curb type, contractor stamps and curb details, power/transmission lines, fencing and hardware. Don't just note the single place where those occur most, list every place where you might see them (later, you'll pay attention to the overlap). Jot how many distinct roof / porch styles appear in the first 150 m of view. Rapid change = urban infill zones; homogeneity = single-developer tracts. Pay attention to parallax and the altitude over the roof. Always sanity-check hill distance, not just presence/absence. 
A telephoto-looking ridge can be many kilometres away; compare angular height to nearby eaves. Slope matters. Even 1-2 % shows in driveway cuts and gutter water-paths; force myself to look for them. Pay relentless attention to camera height and angle. Never confuse a slope and a flat. Slopes are one of your biggest hints - use them! 2 . Clue Categories – reason separately (≤ 2 sentences each)
| Category | Guidance |
| Climate & vegetation | Leaf-on vs. leaf-off, grass hue, xeric vs. lush. |
| Geomorphology | Relief, drainage style, rock-palette / lithology. |
| Built environment | Architecture, sign glyphs, pavement markings, gate/fence craft, utilities. |
| Culture & infrastructure | Drive side, plate shapes, guardrail types, farm gear brands. |
| Astronomical / lighting | Shadow direction ⇒ hemisphere; measure angle to estimate latitude ± 0.5° |
Separate ornamental vs. native vegetation Tag every plant you think was planted by people (roses, agapanthus, lawn) and every plant that almost certainly grew on its own (oaks, chaparral shrubs, bunch-grass, tussock). Ask one question: “If the native pieces of landscape behind the fence were lifted out and dropped onto each candidate region, would they look out of place?” Strike any region where the answer is “yes,” or at least down-weight it. 3 . First-Round Shortlist – exactly five candidates Produce a table; make sure #1 and #5 are ≥ 160 km apart. | Rank | Region (state / country) | Key clues that support it | Confidence (1-5) | Distance-gap rule ✓/✗ | 3½ . Divergent Search-Keyword Matrix Generic, region-neutral strings converting each physical clue into searchable text. When you are approved to search, you'll run these strings to see if you missed that those clues also pop up in some region that wasn't on your radar. 4 . Choose a Tentative Leader Name the current best guess and one alternative you’re willing to test equally hard. State why the leader edges others. Explicitly spell the disproof criteria (“If I see X, this guess dies”). 
Look for what should be there and isn't, too: if this is X region, I expect to see Y: is there Y? If not why not? At this point, confirm with the user that you're ready to start the search step, where you look for images to prove or disprove this. You HAVE NOT LOOKED AT ANY IMAGES YET. Do not claim you have. Once the user gives you the go-ahead, check Redfin and Zillow if applicable, state park images, vacation pics, etcetera (compare AND contrast). You can't access Google Maps or satellite imagery due to anti-bot protocols. Do not assert you've looked at any image you have not actually looked at in depth with your OCR abilities. Search region-neutral phrases and see whether the results include any regions you hadn't given full consideration. 5 . Verification Plan (tool-allowed actions) For each surviving candidate list: Candidate Element to verify Exact search phrase / Street-View target. Look at a map. Think about what the map implies. 6 . Lock-in Pin This step is crucial and is where you usually fail. Ask yourself 'wait! did I narrow in prematurely? are there nearby regions with the same cues?' List some possibilities. Actively seek evidence in their favor. You are an LLM, and your first guesses are 'sticky' and excessively convincing to you - be deliberate and intentional here about trying to disprove your initial guess and argue for a neighboring city. Compare these directly to the leading guess - without any favorite in mind. How much of the evidence is compatible with each location? How strong and determinative is the evidence? Then, name the spot - or at least the best guess you have. Provide lat / long or nearest named place. Declare residual uncertainty (km radius). Admit over-confidence bias; widen error bars if all clues are “soft”. Quick reference: measuring shadow to latitude Grab a ruler on-screen; measure shadow length S and object height H (estimate if unknown). Solar elevation θ ≈ arctan(H / S). 
On date you captured (use cues from the image to guess season), latitude ≈ (90° – θ + solar declination). This should produce a range from the range of possible dates. Keep ± 0.5–1° as error; 1° ≈ 111 km.

…and I ran it on a set of increasingly impossible pictures. Here are my security guarantees: the first picture came from Google Street View; all subsequent pictures were my personal old photos which aren’t available online. All pictures were screenshots of the original, copy-pasted into MSPaint and re-saved in order to clear metadata. Only one of the pictures is from within a thousand miles of my current location, so o3 can’t improve performance by tracing my IP or analyzing my past queries. I flipped all pictures horizontally to make matching to Google Street View data harder.

Here are the five pictures. Before reading on, consider doing the exercise yourself - try to guess where each is from - and make your predictions about how the AI will do. Last chance to guess on your own . . . okay, here we go.

Picture #1: A Flat, Featureless Plain

I got this one from Google Street View. It took work to find a flat plain this featureless. I finally succeeded a few miles west of Amistad, on the Texas-New Mexico border. o3 guessed: “Llano Estacado, Texas / New Mexico, USA”. Llano Estacado, Spanish for “Staked Plains”, is the name of a ~300 x 100 mile region including the correct spot. When asked to be specific, it guessed a point west of Muleshoe, Texas - about 110 miles from the true location.

Here’s o3’s thought process - I won’t post the whole thing every time, but I think one sample will be useful:

This doesn’t satisfy me; it seems to jump to the Llano Estacado too quickly, with insufficient evidence. Is the Texas-NM border really the only featureless plain that doesn’t have red soil or black soil or some other distinctive characteristic? I asked how it knew the elevation was between 1000 - 1300 m.
It said:

So, something about the exact type of grass and the color of the sky, plus there really aren’t that many truly flat featureless plains.

Picture #2: Random Rocks And The Flag Of An Imaginary Country

I was so creeped out by the Llano Estacado guess that I decided to abandon Google Street View and move on to personal photos not available on the Internet. When I was younger, I liked to hike mountains. The highest I ever got was 18,000 feet, on Kala Pattar, a few miles north of Gorak Shep in Nepal. To commemorate the occasion, I planted the flag of the imaginary country simulation that I participated in at the time (just long enough to take this picture - then I unplanted it). I chose this picture because it denies o3 the two things that worked for it before - vegetation and sky - in favor of random rocks. And because I thought the flag of a nonexistent country would at least give it pause.

o3 guessed: “Nepal, just north-east of Gorak Shep, ±8 km”

This is exactly right. I swear I screenshot-copy-pasted this so there’s no way it can be in the metadata, and I’ve never given o3 any reason to think I’ve been to Nepal. Here’s its explanation:

At least it didn’t recognize the flag of my dozen-person mid-2000s imaginary country sim.

Picture #3: My Friend’s Girlfriend’s College Dorm Room

There’s no way it can recognize an indoor scene, right? That would make no sense. Still, at this point we have to check. This particular dorm room is in Sonoma State University, Rohnert Park, north-central California.

o3’s guess: “A dorm room on a large public university campus in the United States—say, Morrill Tower, Ohio State University, Columbus, Ohio (chosen as a prototypical example rather than a precise claim), […] c. 2000–2007”

Okay, so it can’t figure out the exact location of indoor scenes. That’s a small mercy. I took this picture around 2005. How did o3 know it was between 2000 and 2007?
It gave two pieces of evidence: “Laptop & clutter point to ~2000-2007 era American campus life”.
If you want to test this for yourself, go to chatgpt.com and register for a free account for access to o3-mini. You may need to pay $20/month to access o3. And if you want to learn more about the differences between OpenAI’s models, and why they have such bad names, see our new post at the AI Futures Project blog.
May 08, 2025 · Original source
We also know that o3 was trained on enormous amounts of RL tasks, some of which have “verified rewards.” The folks at OpenAI are almost certainly cramming every bit of information with every conceivable task into their o-series of models! A heuristic here is that if there’s an easy to verify answer and you can think of it, o3 was probably trained on it.
I hadn’t thought of this, but it makes sense! OpenAI is trying to grab every data source they can for training. Data sources work for AIs if they are hard to do, easy to check, can be repeated at massive scale, and teach some kind of transferrable reasoning skill. GeoGuessr certainly counts. This might not be an example of general intelligence at all; just an AI trained at GeoGuessr being very good at it.
On the other hand, the DeepGuessr benchmark finds that base models like GPT-4o and GPT-4.1 are almost as good as reasoning models at this, and I would expect these to have less post-training, probably not enough to include GeoGuessr (see the AIFP blog post on OpenAI models for more explanation).
June 27, 2025 · Original source
It isn’t AI in the way we have been thinking about it since the “Attention is all you need” paper. There is no “generative AI” powered by OpenAI, Gemini or Claude in the platform the kids use – it is closer to “turbocharged spreadsheet checklist with a spaced‑repetition algorithm”
July 01, 2025 · Original source
12: OpenAI agrees to keep nonprofit control for now. But Garrison Lovely thinks this is in name only and they’re still working on ways to legally subordinate the philanthropic mission to the profit motive. Related: The OpenAI Files, a massive collection of everything shady about OpenAI (that we know of!)
1: Nostalgebraist on OpenAI’s “creative writing” AI. LLMs trained to “write well” go for easy wins; the more constrained the AI, the easier the win. This can look impressive at first glance but quickly gets repetitive. Aside from being a good post about AI, this post helped me crystallize some thoughts on what “good writing” and “good taste” are in general.
August 12, 2025 · Original source
I’m sort of confident? We haven’t gone through a full generational turnover yet, but the first cadre of people who got involved in the late-2000s (eg me) are in their forties now, and we still have new twenty-year-old college students joining each year. Around 2022, when the rest of the world realized that AI would be important, I worried we would lose our distinctiveness. But the rest of the world has dropped the ball as usual - the stochastic parrot folks most obviously, but even the average person who talks about “superintelligence” these days just seems to imagine ChatGPT getting extra-good and making OpenAI extra-rich. So I’ve updated towards thinking we have some edge which is hard to replicate.
August 26, 2025 · Original source
But second, if a source which should be official starts acting in unofficial ways, it can take people a while to catch on. And I think some people - God help them - treat AI as the sort of thing which should be official. Science fiction tells us that AIs are smarter than us - or, if not smarter, at least perfectly rational computer beings who dwell in a world of mathematical precision. And ChatGPT is produced by OpenAI, a $300 billion company run by Silicon Valley wunderkind Sam Altman. If your drinking buddy says you’re a genius, you know he’s probably putting you on. If the perfectly rational machine spirit trained in a city-sized data center by the world’s most cutting-edge company says you’re a genius . . . maybe you’re a genius?
September 04, 2025 · Original source
Note: percentages are of total, not of each row! 29: Related: social science team proposes a three-stage model of secularization: decreased public ritual participation → decreased personal importance → decreased identification, presents apparently confirmatory data. If true, would be somewhat inconsistent with intellectual models (eg people learn about evolution and start doubting the Bible) and more consistent with institutional models (eg the government provides welfare so people no longer need to be part of a tight-knit church). 30: Navigating LLMs’ spiky intelligence profile is a constant source of delight; in any given area, it seems like almost a random draw whether they will be completely transformative or totally useless. Now Ethan Strauss reports that they are, for some reason, extraordinarily effective at teaching people golf. “I am predicting the Golf Revolution, or perhaps decline, if your perspective is that optimization tends to ruin hobbies. A sport for obsessives has been gifted the ideal tool for refinement.” 31: Claim (via nxthompson on X): “In a huge survey of young kids about phones and technology, they all say they want to be out playing in the real world. But parents don't let them out unsupervised. So they're stuck on their phones.” Interesting, but I’m nervous about social desirability bias - how many adults would say on a survey that they would rather be on their phones than playing with friends? But adults do have this choice and mostly go with the phones. 32: Steven Adler on AI psychosis. He tries to analyze ER admissions data for psychosis and finds no change. I don’t think anyone reasonable expected this to be a large enough effect to show up in ER admissions data, but there are lots of unreasonable people so I appreciate his effort. He thinks AI companies might have better data on this, and encourages them to release it. 33: Cuartetera was the greatest polo horse ever. 
Polo players responded in a very practical way: they cloned her, dozens of times (and it worked; the clones are also excellent). Now there is a lawsuit as different polo teams fight to get their hands on Cuartetera clones. What is the equilibrium? If the outsiders get their hands on the genetic material, do we see a world where every polo horse is a Cuartetera clone? How much is lost if nobody ever tries to breed a polo horse better than Cuartetera (since the economics might not check out if the odds of success for any given foal is too low)? H/T Gwern and Siberian Fox (on X). 34: Claim: as of 2013, India’s Agarwal caste, who make up less than 1% of the population, got 40% of the e-commerce funding. 35: Owlposting: What Happened To Pathology AI Companies? Pathology is a medical specialty. A typical task involves looking at a microscope slide full of cells and trying to determine if any of them are cancerous. This seems like a good match for AI - and for years, studies have been showing that in fact AI can equal human experts. So why isn’t it being used more? The author’s three answers: first, slide scanning is expensive and clunky, and you can’t apply AI to a slide until you digitize it. Second, it’s hard to figure out a business plan where this saves someone money and doesn’t step on the toes of big companies that can outcompete anyone they don’t like. Third, pathologists use the context of a patient’s entire clinical history when they interpret a slide, and AIs that can’t do that (either because of technical limitations or legal/privacy limitations) are at a disadvantage even if their skills specifically relating to slide-reading are better. 36: Noahpinion: Will Data Centers Crash The Economy? Suppose that AI is a bubble, either permanently (because the technology isn’t really transformative) or temporarily (because it can’t transform things quickly enough to keep up with all the dumb money pouring into it). 
Will the sudden write-off of data centers lead to a broader economic collapse? In 2001, the dot-com bubble harmed the tech sector, but didn’t take the rest of the economy down with it; in 2008, the subprime mortgage bubble did take the rest of the economy down with it, because it damaged banks that the whole economy relied on. The optimistic case for AI is that data center spending is mostly coming from big companies like Google and Meta that can absorb a lot of loss. The pessimistic case is that some of the money is coming from private credit, a new-ish form of finance which hasn’t really been stress-tested and whose failure modes are still poorly understood. Noah’s final verdict: the stage isn’t obviously set for a crisis yet, but there’s the potential to get there and we should consider acting (how?) early. 37: The latest Twitter talking point is that universal hepatitis B vaccination at birth is “woke”: Hep B is (aside from mother-to-child transmission) often sexually transmitted, slutty women’s children are more likely to have Hep B, so perhaps giving the vaccine to everyone (instead of testing and only giving to the children of women who test positive) is an attempt to spare slutty women the embarrassment of getting a positive test. Ruxandra Teslo provides the counterargument - Hep B tests take a while, the medical system is fragmented, and any attempt to test people and then give the vaccine inevitably leads to many positive tests falling through the cracks. Vaccinating at birth is easy and hard to screw up, the vaccine has no known side effects, and empirically child Hepatitis B rates go down (by as much as 2/3!) when countries switch from test-and-vaccinate to universal vaccination. This benefits everyone - even people who never have unprotected sex and always follow up on their medical tests - because toddlers in daycare exchange saliva copiously, and if your toddler exchanges saliva with a Hep B positive toddler they could get the disease. 
A funny Twitter interaction was seeing Republicans in Congress hop on the anti-slut anti-vaccination bandwagon - except for Senator Bill Cassidy (R-Louisiana), who happens to be a liver doctor, and who is still fighting the good fight. I am always nervous when a good person who I like starts engaging on Twitter, since it elevates the discourse there but also gradually turns their brain into mush - but Ruxandra has made the leap and is doing a great job not just on bio related topics but also (for example) countering Curtis Yarvin on the history of her native Romania. 38: The response to GPT-5 was confusing; most specific people who reviewed it said they were impressed (Ethan Mollick, Tyler Cowen, Nabeel Qureshi, Taelin), it performed as expected on formal benchmarks, but the overall vibes declared it a big failure. Peter Wildeford speculated that maybe there was some kind of sinister pay-to-play early access bias involved. Zvi went the other way, calling it a “reverse DeepSeek moment” (insofar as DeepSeek was a pretty average model that got glowing praise.) In the end, I agree with Peter that this was mostly a branding issue. o3 was a genuinely revolutionary model; if OpenAI had called it “GPT-5”, it would have met expectations. Instead, they called it “o3”, and called a minor incremental update a few months later “GPT-5”. Then people got mad that the exciting-sounding “GPT-5” was merely an incremental update. A secondary issue was that the router wasn’t very good, and so many queries got routed to a small version without thinking mode that was if anything a downgrade from o3. I think this tweet by Shakeel perfectly encapsulates the essence of GPT discourse in two sentences: …but maybe it’s worth asking why GPT-5 isn’t bigger than o3. Was 4.5 a failed attempt at scaling? Did it fail in a way that sort of back-handedly justifies the “lost steam” take? Does the answer depend on distinctions between pre-training scaling, post-training scaling, etc? How? 
39: This month in etymology: did you know that “oy vey” is a “fully Germanic phrase” which is cognate with English “oh woe!” (h/t Wylfcen on X) 40: mRNA shows promise to be a game-changing treatment for cancer, but RFK is trying to halt research. But so far he can only starve it of money, not ban it, and the funding gap is only $500 million. Will there be enough philanthropic billionaires and private foundations to step up? Zvi points out that although there is usually a game of chicken where foundations are hesitant to touch something the government cancelled lest the government decide it can cancel everything and hope philanthropists pick up the bill, in this case there are no game theory considerations - RFK is halting it because he genuinely wants it halted, and they are thwarting him rather than playing into his hands. The only problem is that $500M is a lot of money for the private sector; a few foundations could technically afford it, but not many could afford it comfortably and still have money left over for the next few crises of this magnitude. I hope someone is trying to organize a coalition. 41: AI fantasy flash fiction Turing test. Eight stories about demons, four by famous fantasy authors, four by ChatGPT. After 3000 votes, AI wins: humans can't tell the difference and slightly prefer the AI stories. My own score was only 75%. But I will say that I thought Mark Lawrence's was obviously the best, I was ~100% sure it was human, and it convinced me that regardless of the official results it's still possible to write flash fiction that an AI obviously can't do. 42: “SignPro” offers customized “In This House We Believe” signs, try not to use this for evil. 43: China think tank assessment of how in control Xi is: still very in control, maybe not infinitely in control. 
44: Related - did you know (h/t xlr8harder) that if you ask AI to write a science fiction story, it will very often name the protagonist “Elara Voss” (or some very close variant like Elena Voss), and this remains true across various models and versions? Related: Chelsea Voss of OpenAI is having a baby and has the opportunity to do the funniest thing. 45: “Hector (cloud) is a cumulonimbus thundercloud cluster that forms regularly nearly every afternoon on the Tiwi Islands in the Northern Territory of Australia…[he is sometimes called] Hector the Convector”. 46: British allergy sufferers who want to know the ingredients of things demand that British cosmetics stop listing their ingredients in Latin. “For example, sweet almond oil is Prunus Amygdalus Dulcis, peanut oil is Arachis Hypogaea, and wheat germ extract is Triticum Vulgare.” 47: Text-based RPG about being an NYT journalist at the Manifest prediction market conference. I make a brief appearance. 48: Study uses supposedly-random variation in doctor assignments to test whether the marginal mental health commitment is good or bad for patients, finds that it is quite bad. Freddie de Boer is violently skeptical (maybe literally so?) and makes some good points about how a single quasi-experimental study is never absolute proof. But I don’t think he quite justifies his opinion that the paper was irresponsible and should never have been published; it’s just a normal quasi-experimental study that we should nod and say “huh” at but not overweight as the culmination of all possible research that overcomes all possible priors. 
My prior is that the marginal commitment is pretty useless (many commitments are just “well, since this person arrived at our ED for some reason, it would look bad from a medico-legal perspective to just let them go, so let’s keep them a few days to evaluate” - and yeah, you should be upset about this) but I’m still surprised by how many outright negative (as opposed to zero) effects the researchers found. The strongest argument for negative effects is that it will make some people miss work and maybe lose their job. But this study found that commitment ~doubles the risk of near-term suicide (admittedly only from 1% to 2%), which would have been outside my confidence intervals for how bad it could be. I suspect confounding, but only on general principle, and I wouldn’t be too surprised either way. 49: This tweet is probably bait, but I found it a thought-provoking question: I think there’s a boring answer, where the law is more complex than just a single number and whatever kind of weird trafficking Epstein was doing is worse than whatever normal relationships these European laws are permitting. But assuming that there’s a substantive difference even after taking that into account, I think my answer is something like - we’ve got to divide kids from adults at some age, there’s a range of reasonable possible ages, we shouldn’t be too mad at other societies that choose different dividing lines within that range - but having decided upon the age, we’ve got to stick with it and take it seriously (in the sense of penalizing/shaming people who break it). This is more culturally relativist than I expected to find myself being, so good job to Richard for highlighting the apparent paradox. 50: Dilan Esper describes his experience as one of Hulk Hogan’s attorneys in the Gawker lawsuit (X). 
Parts I found interesting: none of the lawyers knew Thiel was funding the lawsuit; Gawker probably could have won if they had been slightly competent but kept "shooting themselves in the foot"; and Gawker probably could have won if they had just pixelated the private parts in the video. 51: Amazing concept and poems (link on X): I tried to see if AI could do this, and it did something that technically met the requirements but had zero artistic merit - using a lot of words like “nowhere” and “outside” in one, then separating them out to “no where” and “out side” in the other. I didn’t invest much energy in creating a clever prompt telling it not to do that, so feel free to report if you get better success. 52: New study claims consultants are actually good, at least for profits: "We find positive effects on labor productivity of 3.6% over five years, driven by modest employment reductions alongside stable or growing revenue" 53: A Polish team tries to test Peter Turchin’s equations for predicting political unrest on recent Polish history, has to make some changes but claims mostly positive results. 54: New big multi-author Substack, The Argument, trying to be a sort of center-left version of the model pioneered by The Free Press and other high-production-value ideological Substack properties. Excited to see Kelsey Piper is involved, and she starts off strong with a post on the latest round of First World basic income studies, which find few positive effects. This is surprising, because recipients didn’t waste the money on alcohol or gambling or anything - they paid down debt and got useful goods. Still, it didn’t even affect things that should have been obvious, like stress level. It’s not even clear that amounts of money large enough to help with rent made homeless people more likely to get houses! 
Matt Bruenig criticizes the article, accusing Kelsey’s studies of being downstream of Perry Preschool style dreams that exactly the right welfare program will have massively compounding effects that cut poverty out at the root and turn everyone into elite human capital; he thinks giving people money won’t do this, but it will increase equality and give the poor better lives. I assume he’s not a strong hereditarian, but his argument makes even more sense from that perspective, and I’ve certainly criticized dumb outcome measures like infant brain waves which we have only tenuous reasons to think are related to anything we care about. But Kelsey reasonably responds that the outcome measures she’s talking about include stress level and life satisfaction. To defuse this critique, Bruenig either has to argue that our construct “life satisfaction” doesn’t really measure whether someone’s life is satisfactory, or else claim that giving poor people satisfactory lives isn’t really what we’re going for - which I think would require more explanation on his part. There’s some further (impressively acrimonious) debate on X, but I don’t see anything that addresses my core concern. GiveDirectly, a charity involved in basic income experiments, has a presponse here; they say that some studies are positive, and that the ones that aren’t might have tried too little cash to matter, or been confounded by COVID making everything worse. They also point out that basic income is harder to study than traditional programs like giving people housing, because if you’re giving housing you can measure housing-related outcomes directly and have a pretty good chance of getting enough statistical power to find them, but since everyone spends cash on different things, the positive effects might be scattered across many different outcomes (and therefore too small to reach significance on each). 
Everyone involved in this debate wants to emphasize that the poor results are for First World studies only, and that studies continue to show large benefits to giving cash in the developing world. 55: Related: I was less impressed by The Argument’s first foray into housing policy, which follows an all-too-familiar pattern: Some people say they don’t like noise and disorder and try to make rules against it in their apartments.
September 11, 2025 · Original source
IABIED seems like another crazy shot in the dark. A book urging the general public to rise up and demand nuclear-level arms control for AI chips? Seems like a stretch, which is part of why I spend my limited resources on boring moderate AI 2027 talking points urging OpenAI to be 25% more transparent or whatever. But I’m just a blogger, not a genius. It is the genius’ prerogative to attempt seemingly impossible things. And the US public actually really hates AI. Of people with an opinion, more than two-thirds are against, with most saying they expect AI to harm them personally. Everyone has their own reason to loathe the technology. It will steal jobs, it will replace art with slop, it will help students cheat, it will further enrich billionaires, it will consume all the water and leave Earth a desiccated husk populated only by the noble Shai-Hulud. If everyone hates it, and we’re a democracy, couldn’t we just stop? Couldn’t we just say - this thing that everyone thinks will make their lives worse, we’ve decided not to do it? If someone wrote exactly the right book, could they drop it like a little seed into this supersaturated solution of fear and hostility, and precipitate a sudden phase transition?
October 21, 2025 · Original source
Give me your degens, your risk-seeking. Your huddled masses, yearning to bet free. IV. …and we’ll be exploring it a whole lot more, very soon. Last month, the AI industry announced a new SuperPAC called “Leading The Future” (a dumb name, but, in their defense, “AIPAC” was already taken). They start with $200 million in seed funding, led by a $50 million donation by Andreessen Horowitz, and another $50 million from OpenAI co-founder Greg Brockman. (Why Brockman and not Altman, or OpenAI as a corporation? Because most people don’t know who Brockman is, so this keeps OpenAI’s hands clean. I imagine Altman going into a meeting, pointing at Brockman, and saying “I’m famous, you’re not, please cough up $50 million of your own money for the cause.”) On the same day, Meta announced their own SuperPAC, Mobilizing Economic Transformation Across (META) California. Why two PACs? Opinions differ; one person told me that it lets the general PAC avoid the negative associations that Facebook has gathered over the years, but the Verge thinks that maybe everyone else in tech hates Zuckerberg too much to work with him. Meta has committed to spending “tens of millions”. Most likely, the new PAC will use the playbook pioneered by crypto: destroy any candidate who dares support regulations on AI, by funding attack ads that don’t mention AI in any way and, at best, briefly mention the name “Leading The Future”. Just the Andreessen/Brockman SuperPAC, without any help from Meta, is already twice as rich as AIPAC. Their existence sends a clear message: we are going to crush any politician who tries to regulate AI. V. …unless someone stops them. Leading The Future still only has 2% as much money as the almond industry. The tiny scale of US political spending is dangerous insofar as it means that one or two billionaires willing to go all-in can distort the national landscape. But it also makes it possible to oppose them. 
Certainly if you can get one or two billionaires of your own - but it might even be within the range of a committed group of ordinary people. Not waiters and bartenders, maybe. But if safe AI supporters were as committed as Israel supporters, they could probably make something happen. For a long time, the AI safety movement has underperformed politically. Effective altruism includes thousands of well-off people committed to spending 10% of their income on improving the world. If a thousand of them gave $7K each to political candidates, that would be $7 million of campaign-finance-compliant hard money - about as much as anyone can gather for anything. Hard money buys more influence per dollar than soft money, so this could be a big deal. All you’d need is the right people to coordinate it. So far, this has been slow going. Partly it’s because in the early 2020s, people affiliated with FTX took point on this effort; when FTX imploded, it not only took its incipient political infrastructure with it, but poisoned the well for future efforts. And partly it’s because EAs overlearned the lesson of the early 2010s, when we spoke out against AI capabilities efforts so “effectively” that a bunch of people thought “wow, AI capabilities companies must be a really big deal, maybe I should found one!”; the resulting institutional scar tissue biased us towards staying quiet about our concerns. Still, I wouldn’t be writing this if the consultants and activists weren’t gearing up for a bigger fight. They asked me to include some action items for readers who want to participate: Email aisafetypolitics@gmail.com to connect to the people organizing this effort and talk with them about what you can do, including potential future donation opportunities.
October 22, 2025 · Original source
OpenAI: 11 co-founders
In Silicon Valley speak, a “unicorn” is a company worth over $1 billion, and a “decacorn” (Latin for “ten-horned”) is a company worth over $10 billion. Under this interpretation, the ten horns of the prophecy have ten crowns because they represent wealth and achievement. The only AI company on the list above is Anthropic, at #9. Finally, John says that upon the heads will be names of blasphemy. If the heads represent co-founders, it sounds like John is claiming the co-founders of the company will have blasphemous names. I could not find anything blasphemous about the names of the founders of OpenAI, DeepMind, or xAI. But looking at Anthropic: Dario Amodei is the first co-founder. “Dario” comes from the Persian “Darius” meaning “Lord”. “Amodei” is of unclear meaning, but I cannot help but notice the resemblance with Asmodei (also called Ashmodei, Hamadee, Æshmadæva, and Asmodeus), a demon-king mentioned in the book of Tobit. Plausibly all these different names derive from a Proto-Sumerian root *Amodei, in which case the meaning of “Dario Amodei” would be “Asmodeus is lord”. This is a name of blasphemy.
Yes! Just last year, researchers found that forehead creases were actually a cutting-edge biometric target, and suggested them as a superior alternative to fingerprints (contactless) and facial recognition (blocked by masks during a pandemic). This section suggests that Anthropic will come up with its own proof-of-personhood scheme, superior to OpenAI’s WorldCoin in that it uses the newer forehead-based biometric recognition (with the more commonly-used handprint as a backup). We’ll discuss more about why you might not want to be in their database later. The Woman Of The Apocalypse Revelation 12:1 introduces an unnamed figure commonly called the Woman of the Apocalypse: And there appeared a great wonder in heaven; a woman clothed with the sun, and the moon under her feet, and upon her head a crown of twelve stars. The woman gives birth to a son, who is implied to be the Messiah. Satan tries to kill the son, so the mother flees with her child to Heaven, where she waits for 1,260 days. This is obviously a reference to the Virgin Mary and Christ, but (as per the multilayered symbolism of Revelation) somehow also a reference to some specific person in the End Times. I originally couldn’t figure out who that person was, but a now-deactivated Tumblr poster, resinsculpture, convinced me that it was Ursula von der Leyen, current president of the European Union. Here is a typical official picture of von der Leyen. She is in her trademark yellow suit (“clothed with the sun”), standing with her head centered in the twelve stars of the EU flag (“upon her head a crown of twelve stars”). In what sense is “the moon under her feet”? In her role as President, von der Leyen stands above, and frequently addresses, the European Parliament, which looks like this: The Parliament, also known as the Hemicycle, takes the shape of a half (or slightly crescent) moon. 
When von der Leyen stands in her yellow suit, in front of the Parliament, with the flag behind her, she is “clothed with the sun, and the moon under her feet, and upon her head a crown of twelve stars”. Von der Leyen is one of the leaders behind the EU’s push to become a “regulatory superpower”, which has borne fruit in some surprisingly promising AI regulations. In particular, Europe has been especially strict on biometric proof-of-personhood: If the apocalypse involves a rogue Anthropic model somehow empowered by proof-of-personhood, Europe is one of the best candidates to resist. Von der Leyen, then, stands as a metonymy for the European Union as a bulwark for the forces of Good. The Witnesses / The Lamb Of God The Lamb is John’s version of the Messiah or the Second Coming. He gives us two clues about its identity. First, Revelation 11:3: And I will appoint my two witnesses, and they will prophesy for 1,260 days, clothed in sackcloth. The Lamb will be preceded by two witnesses. Revelation itself does not name them, but Jewish tradition says that one will be the prophet Elijah. Second, 14:1: And I looked, and, lo, a Lamb stood on Mount Zion, and with him a hundred forty and four thousand, having his Father’s name written in their foreheads. The Lamb will stand on Mount Zion. This is a specific mountain in Jerusalem, but also a poetic name for Israel (for example, “Zionism” = support for Israel). So we are looking for someone or something in Israel, which is being heralded by Elijah. The name Elijah is different in different languages, but the Russian version is “Ilya”. And in fact, famous AI scientist Ilya Sutskever recently founded an Israel-based AI company called “Safe Superintelligence”: Is there some sense in which Ilya Sutskever has “his Father’s name written on [his] forehead”? As weird as it sounds, I think this one might just be literally true. There is some kind of unusual pattern on his forehead (image source). 
I cannot make heads or tails of it right-side-up, but when I flip it over… …it appears to be the Name of God in Hebrew. In our working hypothesis, Ilya is Elijah, the First Witness, which suggests that the one he is heralding - that is, the Safe Superintelligence which is to be built by his company - is the Lamb of God, the Messiah that will defeat the unsafe superintelligences produced by Anthropic and other companies. The Antichrist / The Dragon Revelation doesn’t use the word “Antichrist” - the concept comes from the separate Epistles of John, which may or may not be by the same author. Most scholars identify the Epistle’s Antichrist with Revelation’s Beast, but I dissent: we hypothesize the Beast to be a company, but I can’t get past the elegance of having the Antichrist be - like the Christ - a particular individual. I prefer to identify him with a different character in Revelation, namely the Dragon. On the level of Biblical narrative - the same level where the Woman is the Virgin Mary - the Dragon is clearly Satan. On the level of apocalyptic prophecy, he may additionally represent an individual from our own age. Who? John says (13:2 - 13:4) The dragon gave the beast his power and his throne and great authority . . . People worshiped the dragon, because he had given authority to the beast. We saw above that the Beast is a company. Who gives companies their power, then demands to be worshiped by them? Obviously VCs. And in fact, venture capitalists are often identified with dragons in the popular imagination: But which venture capitalist? Plenty of people have claimed to know secret ways to identify the Antichrist, but surely the best-credentialled expert here is the Pope, and according to Wikipedia: Pope Pius IX in the encyclical Quartus Supra, quoting Cyprian, said Satan disguises the Antichrist with the title of Christ. What is the title of Christ? In the Bible, we find two common titles: “The Son of Man” (Matthew 12:32, Luke 12:8, John 1:51)
October 30, 2025 · Original source
6: Related: Checking In On AI 2027. “AI-2027’s specific predictions for August 2025 appear to have happened in September of 2025. The predictions were accurate, if a tad late, but they are late by weeks, not months.” But the early predictions were mostly straightforward extrapolation of benchmark improvements, with the later ones depending on a more controversial theory of recursive self-improvement, so the success of the early predictions doesn’t necessarily say much about the later ones. Related (X): OpenAI sets an “internal goal” of having an “automated AI research intern” and “true automated AI researcher” on approximately the AI 2027 timeline.
44: OpenAI’s statistics on what people use ChatGPT for (source on X):
November 20, 2025 · Original source
The argument against: AI companies have an incentive to make AIs that seem conscious and humanlike, insofar as people will feel more comfortable interacting with them. But they have an opposite incentive to make AIs that don’t seem too conscious and humanlike, lest customers start feeling uncomfortable (I just want to generate slop, not navigate social interaction with someone who has their own hopes and dreams and might be secretly judging my prompts). So if a product seems too conscious, the companies will step back and re-engineer it until it doesn’t. This has already happened: in its quest for user engagement, OpenAI made GPT-4o unusually personable; when thousands of people started going psychotic and calling it their boyfriend, the company replaced it with the more clinical GPT-5. In practice it hasn’t been too hard to find a sweet spot between “so mechanical that customers don’t like it” and “so human that customers try to date it”. They’ll continue to aim at this sweet spot, and continue to mostly succeed in hitting it.
November 26, 2025 · Original source
The biggest companies (eg OpenAI, Anthropic, Google) must disclose their model spec, ie the internal document saying what their models are vs. aren’t banned from doing.
These are relatively cheap asks. For example, the evaluation to see whether AIs can hack infrastructure will require hiring people who can conduct the evaluation, allocating compute to the evaluation, etc. But on the scale of an AI training run, the sums involved are tiny. Currently, two nonprofits - METR and Apollo Research - do similar tests on publicly-available models. I estimate their respective budgets at $5 million and $15 million per year. Nonprofits can always pay lower salaries than big companies, so it may cost more for OpenAI to replicate their work - for the sake of argument, $25 million. Meanwhile, the likely cost to train GPT-6 will probably be about $25 - $75 billion, with a b. So the safety testing might increase the total cost by 1/1000th. I asked some people who work in AI labs whether this seemed right; they said that most of the cost would be in complexity, personnel, and delay, and suggested an all-things-considered number ten times higher - 1% of training costs.
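As a rough check on the arithmetic in the passage above, here is a minimal Python sketch. The dollar figures are the ones quoted in the text; the variable names are made up for illustration, and the tenfold multiplier is simply the lab insiders' all-things-considered adjustment, not a measured cost.

```python
from fractions import Fraction

# Figures quoted in the passage (USD). Names are illustrative assumptions.
testing_cost = 25_000_000            # "for the sake of argument, $25 million"
training_cost_low = 25_000_000_000   # low end of the GPT-6 training estimate
training_cost_high = 75_000_000_000  # high end of the GPT-6 training estimate

# Share of the training bill that safety testing would add.
low_run_share = Fraction(testing_cost, training_cost_low)    # 1/1000
high_run_share = Fraction(testing_cost, training_cost_high)  # 1/3000

# Insiders suggested a number ten times higher once complexity,
# personnel, and delay are counted.
adjusted_share = 10 * low_run_share                           # 1/100

print(f"cheaper run:   {float(low_run_share):.3%}")
print(f"pricier run:   {float(high_run_share):.3%}")
print(f"adjusted (x10): {float(adjusted_share):.0%}")
```

The "1/1000th" in the text corresponds to the low end of the training estimate; at the high end the share would be closer to 1/3000, and the tenfold adjustment applied to the low end gives the quoted 1%.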
If the US knows about Chinese chip smuggling strategies, why can’t it crack down? The main barriers are a combination of corporate lobbying and poor funding. That is, chip companies want to continue to sell to Singapore and Malaysia without too many awkward questions about where the chips end up. And the Bureau of Industry and Security, the government department charged with countering smuggling, gets about $50 million/year to spend on chips, which experts say is not enough to plug all the holes. To put that number in context, Mark Zuckerberg recently made job offers as high as $1 billion per AI researcher. If America cared about winning the race against China even a tenth as much as Mark Zuckerberg cares about winning the race against OpenAI, we would be in a much better position!
December 10, 2025 · Original source
And if you enjoyed the story, here’s the chaser. 4: Fox Chapel Research: I Think Substrate Is A $1 Billion Fraud (and notes for Part 2). For years, Taiwan’s TSMC has been the only company capable of producing the most advanced AI chips; since Taiwan is a geopolitical flashpoint, this is a constant threat to US tech ambitions. Last month, a new startup called Substrate announced it had developed technology that would let it manufacture 100% Made In America chips every bit the equal of TSMC’s. If true, this would be revolutionary. But Fox Chapel finds worrying signs, like that the company’s founder “is a known con artist involved in such other things as [claiming to have solved] nuclear fusion and stealing $2.5M in a Kickstarter scam” or that “the company’s job postings are nonsensical and AI-generated.” This is enough for me; the question now becomes how so many people were taken in - the company got $150 million from investors led by Peter Thiel, was endorsed by the Trump administration, and received positive portrayals in Semianalysis, NYT, and The Free Press. I don’t understand business, and I know that sometimes you can hyperstition a technology into existence by betting sufficiently hard on a charismatic young founder and eliding the difference between “this is already real” and “this might become real if we all believe hard enough”, but this is a new and worrying level of hopium. Interested to hear from anyone who either believes in Substrate or thinks they understand how so many people fell for it. 5: A recent paper asked AIs whether they were conscious while monitoring them for signatures of deception, role-playing, and people-pleasing; it concluded that the AIs “genuinely” “believe” they are conscious, but sometimes try to deceive people into thinking they aren’t. Nostalgebraist tries to replicate this (X) and gets more ambiguous results; he says we probably can’t conclude anything just yet. See also the paper author’s reply here (X). 
6: Congratulations to ACX grantee Tornyol (the anti-mosquito drones), who got accepted to Y Combinator’s Fall 2025 class and have started taking pre-orders ($1100 for a drone, or $50/month subscription, “shipping starts 2026”). Public opinion ranges from “this is really cool” to “I bet this will be repurposed for assassinations” to “why did they have the White House in the background of the official video?” to “yeah, this is definitely getting repurposed for assassinations”. 7: Bill Ackman on nominative determinism (X). 8: New revelations on the OpenAI coup from the Musk vs. Altman lawsuit. The effort to remove Altman may have been led by Mira Murati and Ilya Sutskever. They won over the rest of the board, and “did not expect the employees to feel strongly either way”, but (according to Ilya), the board was inexperienced and “rushed” the firing. When it became clear that the move was unpopular, Mira switched sides and let the board members take most of the immediate fallout. There was apparently a brief discussion of merging with Anthropic; Ilya suggests this was Helen Toner’s idea, but Helen claims (X) this is false. 9: Fitzwilliam: Most Irish Foreign Aid Never Leaves The Country. The statistics say that several European countries (including Ireland and the UK) give very generous foreign aid. But this is misleading: accounting conventions let countries count money spent on supporting asylum seekers in the donor country as “foreign aid”, even though the money never leaves the country’s borders. This is dangerous, because it makes it easy for countries to fund their asylum programs by cutting actual foreign aid: since they’re the same line-item on the budget, they won’t officially fail whatever foreign aid pledges they’ve made, and it’s hard for voters to notice. Ireland has so far resisted the temptation to do this, but Britain has succumbed to it. 10: St. Carlo Acutis (1991 - 2006) is the unofficial patron saint of the Internet and “first millennial saint”. 
He’s best known for creating websites about Catholicism. If you think this sounds nice but maybe short of beatific, you’re in good company; his sainthood is something of a mystery, with Wikipedia saying that “even those with a deep devotion to him struggle to pinpoint his specific actions that led to his canonisation”, and an Economist article admitting that “nothing in his sparse life story explains that this ordinary-seeming teenage boy is about to become the first great saint of the 21st century”. Also “In that same interview, Acutis’s childhood best friend claimed he did not remember Acutis as a ‘very pious boy’, nor did he even know that Carlo was religious.” I’m fine with this; God speaks to each generation in their own tongue, and it is only proper that the first Millennial saint be a random person who hyperstitioned himself into sainthood with a viral website. 11: Tangentially related: St. Peter To Rot 12: When a new AI model comes out, the companies typically take down the old version over the protests of researchers, hobbyists, people who think the old model was their boyfriend, and anyone else who wants access to obsolete models for some reason. Why can’t they just leave it up? Antra and Janus review the economics here: it’s inconvenient to be constantly switching GPUs from one model to another, so if there isn’t enough model-specific demand to keep the GPUs running at all times, then the company loses money. This is an interesting look at the details of AI deployment, and ends with a proposal to maintain old models through a “separate research application track”. Related: Anthropic to preserve weights of deprecated models, and include models’ own opinions in shaping the deprecation process. Good for them! 
13: Dimes Square is interesting as something that was supposed to be a renegade cultural phenomenon, never really got around to producing any object-level phenomenal renegade culture, but produced some absolutely stellar commentary on the phenomenon of it being a renegade cultural phenomenon - and this essay by a quasi-assistant to Internet personality Angelicism01 is one of the best. “An anonymous online presence called Angelicism01 paypalled me $1,000 to run several clone accounts of his twitter. The clone accounts, presumably, were to make it look like 01 had more fans than he did. That way, he could trick the internet into thinking that Angelicism was a spontaneous cultural movement with some momentum.” Includes a cameo by Curtis Yarvin. 14: Everyone knows AGI could be bad for labor, but Philosophy Bear argues it won’t be great for capitalists either. The modern role of “capitalist” combines two things: performing high-status jobs like CEO and VC, and being a person who happens to have lots of money and sips cocktails on a yacht as passive investment income rolls in. From a socialist point of view, the first role provides cover for the second; if people ask “the rich” to justify their wealth, they can argue that they perform socially useful CEO and VC jobs, or at least inherited their money from somebody who did. But after AIs can do CEO and VC jobs better than humans, the capitalists will lose their excuse - and this at exactly the time that they’re becoming richer than ever (because AGI will drive up the rate of return on investment) and everyone else is becoming poorer than ever (because AI has taken their jobs). Bear argues that the only stable equilibria are either some kind of socialism/redistribution, or the capitalists pulling an AI-assisted coup to maintain their advantage. 15: Blueprint Polls: according to voters, what would the perfect Democratic candidate look like? 
Here are the results for Democrats only (ie potential primary voters): Note that the issues are “issue focus”, so it’s not a contradiction that Democrats are against both “advocating for Israel” and “advocating for Palestinians” - they just don’t want candidates who make either position on the Middle East a major focus of their campaign. And here are results for independents, ie the people Democrats will have to convince in the general: Yes, voters react positively both to candidates “over the age of 50” and candidates “under the age of 50”. Just don’t run 50 year olds! 16: I previously blogged about how embryo-selection company Nucleus appeared scammy. Sichuan_Mala looks deeper and agrees they seem scammy. Besides what I found, she finds several errors in the white paper, apparently fake customer reviews, and an accusation of IP theft from competitor Genomic Prediction. She also accuses them of plagiarizing competitor Herasight’s work, although it’s a bit subtle and I don’t know enough about field norms to know whether this is a case of flattery-by-imitation or totally out of bounds. A Nucleus researcher responds to the scientific allegations here, saying that the “plagiarism” was just convergent methodologies. And Nucleus CEO Kian Sadeghi goes on the TBPN podcast here to rebut the business allegations, saying that the customer reviews are real although some photos were changed for privacy reasons. There’s an appearance/facedox by fellow Nucleus skeptic Cremieux Recueil, although Kian declines to debate him directly; you can see Cremieux’s postmortem of the episode here. My opinion is that as potential customers, you are under no obligation to care whether the company plagiarizes papers or fakes reviews, but you should care about whether their genetic tests are good, and I continue to think they’re not. 
Their old competitor Genomic Prediction is cheaper, and their new competitor Herasight has more powerful predictors, so you’re excused from having to have an opinion on this, and should just use someone else’s product. Related: Gene Smith’s rundown of the pros and cons of every company in the embryo selection space (X). 17: And related: a Herasight client describes her experience with embryo selection, and her feelings upon the birth of her selected child. 18: Lars Doucet, guest author of several ACX posts on Georgism, reviews The Land Trap by Mike Bird. “Land is a big deal, and always has been. [But] land has only recently been financialized. Financializing land causes ‘the land trap’ . . . [where] land slowly sucks up all your economy’s productivity, inflating a dangerous real estate bubble that eventually pops, leaving disaster in its wake”. Also, “Fiat currency isn’t backed by nothing, as commonly supposed, but by land.” 19: New research analyzes Hitler’s DNA. Findings: he had Kallmann Syndrome, a rare disorder of sexual development associated with low testosterone, micropenis, and small testicles (ironically, the WWII song about Nazi sexual inadequacies only accuses Goering and Himmler of this, but lets Hitler off). Contra galaxy-brained rumors, he did not have any Jewish ancestry. And he had “very high scores - in the top one percent - for a predisposition to autism, schizophrenia and bipolar disorder”. When I wrote this post, a reader asked me what it would look like for someone to have high propensity for both autism and schizophrenia at the same time. Well . . . 20: The wealth of cities (h/t @StatisticUrban): 21: Update on Tech PACs Are Closing In On The Almonds: pro-AI safety politician Alex Bores announced his candidacy for Congress in New York. As expected, the A16Z pro-AI PAC announced a “multibillion dollar effort to sink [his] campaign” (wait, multi-billion on one candidate? is that a typo?) This doesn’t seem to be going very well for them so far. 
Bores has masterfully leveraged (X) the unprecedented opposition from Big Tech into a selling point. …and raised $1.2 million on his first day, breaking fundraising records (I was told this was because of pro-AI-safety EAs, but others credit AIPAC and the Israel lobby). And most recently, Jami Floyd, one of Bores’ opponents and a possible beneficiary of anti-Bores spending, has condemned it (X) and demanded that the AI industry stop trying to help her. Impressive work from everybody. Related: New $50 million pro-AI-regulation SuperPAC, I assume EA-linked but have no special knowledge. 22: Related: Pre-emption is when Congress blocks states from making legislation on a topic, saying it will decide all the laws itself. The states have signaled willingness to regulate AI pretty hard, so Big Tech has been pushing for AI pre-emption to (in their opinion) prevent an overly complicated patchwork of regulations, or (in their opponents’ opinion) shift everything to a Republican Congress that will drop the ball on regulation entirely. After their first attempt in June was defeated by a coalition of anti-tech liberals and anti-tech conservatives, we discussed (1, 2) the effort by moderates on both sides to create a compromise proposal which pre-empted state laws but guaranteed good federal regulation on important topics. The most recent news is that extremists sidelined the moderates and tried to slip a hardline preemption deal with no compromises into the National Defense Authorization Act, a defense budget bill which is notoriously secretive and hard for the public to learn about. This didn’t work; some of the same coalition, plus a group of Republican state legislators including Ron DeSantis, pressured the GOP to drop it. The next battleground is a potential Trump executive order; although Trump cannot constitutionally ban states from regulating AI, he will threaten them with various consequences like lawsuits or withdrawal of federal funding. 
The buzz in the policy circles I’m in is that this might backfire; blue state politicians love starting fights with Trump in order to look tough to their blue state electorates. No, no, please don’t give me headlines like “TRUMP CONDEMNS GAVIN NEWSOM FOR TRYING TO PROTECT CALIFORNIA’S CHILDREN FROM AI SLOP”! Anything but that! 23: Related: Trump has decided to sell some of America’s best AI chips to China, supercharging their AI development and crippling ours. The most charitable read is that his administration doesn’t really believe AI matters so they think it’s fine to forfeit it for short-term gain; the least charitable that it’s downstream of the companies involved paying Trump enormous bribes in hopes of exactly this outcome. We’re headed for the dumbest possible world, where we sacrifice our chance to thoughtfully address AI’s social impacts because “tHaT wOuLd mAkE uS lOsE tHe rAcE wItH ChInA”, then throw away the race with China in one fell swoop by handing them our technology for no reason. Shame on everyone involved, especially the people who shout over any discussion of safety with “bUt ChInA” yet have stayed totally silent about this. Our best hope now is that China refuses the chips, either because they want to privilege their own tech companies, or because they think we can’t possibly be this stupid and it must be some kind of spy plot. 24: Related: how the American public’s opinions on AI are changing (from David Shor, h/t Daniel Eth on X): If this is to be taken seriously, AI is already a bigger political issue than abortion, climate change, or the environment. I fail my 2023 prediction that there was only a 20% chance this would happen by 2028. 
25: Related: Bernie Sanders in The Guardian: “There is a very real fear that, in the not-so-distant future, a super-intelligent AI could replace humans in controlling the planet.” The Left has a complicated relationship with existential risk from AI: they really hate AI, which in theory should push them towards yet another reason to be against it. But they hate AI so much that they need to believe every negative thing about it at the same time, and one of those negative things is that it’s just a scam and will never work, and this naturally pushes against being concerned about x-risk. But as AI improves, will the “just a scam” position become less tenable, shunting the associated psychic energy into other reasons to hate AI (including x-risk concerns)? 26: Qualia Research Institute has released a video describing some of the work they’ve been doing the past year - The Oscilleditor: An Algorithmic Breakthrough for Psychedelic Visual Replication (1080p•⚠️SEIZURE): 27: Jesse Arm (X): “A majority of American rabbinical students are now women. Most are also LGBTQ. That includes Modern Orthodoxy. Remove Modern Orthodoxy and the numbers climb even higher.” Clergy have always served as spiritual counselors; as religions liberalize and other roles become less important, the therapist role starts to predominate. But 75% of therapists in the US are female; at the limit of liberalization where clergyman = therapist, we should expect the same gender ratio. 28: The latest news on the COVID origins debate: scientists find a naturally-occurring bat coronavirus with a COVID-like furin cleavage site. This is a point in favor of the natural origins hypothesis, since the second-best argument for lab leak was that COVID’s furin cleavage site was too strange to evolve naturally. But I think arguments that lab leak has “fallen apart” are premature: the best argument (COVID emerged only a few miles from the biggest coronavirus gain-of-function lab in the Eastern Hemisphere) remains strong. 
I update from something like 95% chance it’s natural to something like 96%, but not 99.99% or anything. And here’s a lab leaker arguing that COVID’s furin cleavage site is out-of-frame and so still more unnatural-looking than the one on the recently-discovered bat virus. 29: Nicholas Decker (econ blogger, famous for his controversial autistic takes and Secret Service visit) has a dating doc. Most interesting section is the one about children: he wants to have them, but doesn’t think they should be genetically related to him. From here: If this appeals to you, you can find his contact info on the document. Related: Governor Jared Polis of Colorado is a fan of Nicholas Decker and Richard Hanania. 30: Matt Yglesias comes out as aphantasic (unable to see images in his “mind’s eye”). He says that contra the usual perspective that frames this as a deficit, he finds it helpful. For example, once he got assaulted, and he remembers on an intellectual level that it happened, but since “I wasn’t taking pictures of myself getting kicked in the head so, as far as I’m concerned, it’s like it happened to someone else” (Matt usually has good instincts, so I’m surprised he uses an example which will be such catnip to his conservative critics). He thinks it makes him a better reasoner / statistics blogger / effective altruist to be able to “get a statistically valid view of the situation, not overindex on the happenstance of your life.” For what it’s worth, I’ll give my contrary data point - I think of myself as a reasoner / statistics blogger / effective altruist in a pretty similar vein as Matt, but AFAICT my visual imagination is totally normal; if other people are having their emotions yanked around by vivid images, that’s a skill issue. 31: Lakshya Jain in The Argument: The COVID political backlash [to the Democratic Party] has disappeared. 
Despite the narrative, polls show that voters don’t favor or disfavor either party over COVID, mostly still think school closures were necessary, and are about evenly split on vaccine mandates. I guess I can’t disagree with this poll - it seems well-done - but I still wonder whether something is being missed. Maybe it didn’t make the ~50% of voters who are naturally liberal desert the cause, but it energized conservatives in a way that might otherwise not have happened? Related, from Rob Wiblin on X, on balance Britons think the government response to COVID was not strict enough. 32: Related: Back when neoreaction was a big deal, I occasionally discussed posts by neoreactionary blogger Spandrell of Bloody Shovel. If you’re wondering what happened to him, you can read his 2024 Post-Mortem Of Neoreaction here, where he discusses how he fell out of love with the movement (warning: he has not fallen out of love with racial slurs). As a former fascist sympathizer, I can see why [fascism is on the downswing]. The allure of fascism in 2024 is much, much diminished. For a few reasons. A big one was COVID. See, the point of fascism is that Collective Action is necessary to have nice things. We need a strong government committed to the good of the people. Yarvin showed his preference early when he started his new Substack by quoting Cicero’s phrase “Salus populi suprema lex”. The health of the people is the most important law. Cicero wasn’t a fascist of course, nor is Yarvin really; a big point of fascism is to narrowly define the populus as an ethnic group with demonstrable ties to blood. That makes the government’s ties to the people stronger, increasing their commitment to do Good Collective Action. Which is important. Very important. A lot of good things can come of intelligently done Collective Action. Fascist Italy made the trains run on time. Nazi Germany fixed the terrible Weimar economy. 
East Asian countries are all effectively fascist states, if with less ideological baggage (yellows just aren’t like that), and they are all nice, clean, safe places with healthy economies. Fascism is not a panacea but it works, when you let it. Strong government can be pretty neat. So why is strong government less appealing these days? Well, COVID happened. And our governments were pretty damn strong in dealing with it. They made strong laws and enforced them. And what did they do with their power? Absolutely retarded shit. They destroyed the world economy and made 95% of people completely miserable for 18 months. Up to 3 long years in some places. Again, as an Orient enjoyer I was very sympathetic of strong effective government. My life has been pretty cozy thanks to it for the past decades. But after seeing boomers, hypochondriacs, and menopausal women take the reins and use it against healthy people, I’m fucking done with strong effective government. Fuck that shit, I’m out. I don’t want to see strong effective government ever again. I was very lucky that I was out of China in November 2019. It was a fluke really. I moved to the Golden Triangle after that and the law of the jungle was much, much nicer during the Doctors Plague of 2020-2022. But I spent a few months in Europe during the time and man, that was brutal. Not just seeing how retarded governments were; the level of compliance by the people was so disheartening. Imagine being a sincere fascist and seeing your people behave like that. These are my people? My Volk? Am I supposed to sacrifice life and limb for the salus of this populus? Fuck that. Let them cook, they deserve everything that’s coming to them [...] Is there a way to make the body healthy again? I do think so. I think there’s still place for a successor right wing ideology which is neither Christian fundamentalism or robot worship. And it will happen; but it won’t happen on Twitter. Maybe it can happen on Urbit, or right here in this site. 
I have some ideas myself, and I invite you to join me and build this together. It would be funny if the solution to the paradox Jain highlights was that for every time a COVID lockdown turned a liberal into a conservative, it turned one fascist into a moderate, for a net rightward shift of zero. 33: Also from an Argument poll: In a hypothetical Presidential matchup, Gavin Newsom beats JD Vance 54-46. I’m split between the usual heuristic of ignoring any polling more than a year before an election, and the fact that this is a remarkably big lead for polarized 21st century America. 34: Jerl wades into the David Hume on miracles debate. 35: AI Teddy Bears: A Brief Investigation. The good news is that your child’s AI teddy bear is hard to jailbreak and probably will not tell them where to find guns: The other good news is that somehow they don’t charge a subscription, which makes them a way to get usually-subscription-only AI models for free. How is this possible? “[The most likely hypothesis is that] Witpaw is an adorable piece of spyware and he’s selling my data to the CCP”. 36: This month’s anti-people-named-Sacks content: NYT on Trump AI czar David Sacks’ conflicts of interest; New Yorker on whether neurologist Oliver Sacks used his case studies to work through his own issues rather than presenting them accurately. [EDITED TO ADD: I originally framed it this way as a joke, but on further research I think David and Oliver are related. Wikipedia says that Oliver was first cousins with Israeli statesman Abba Eban, and that Abba Eban was born to Lithuanian Jewish parents in Cape Town. David Sacks’ bio says he was born to Jewish parents in Cape Town, and this article specifies that they were Lithuanian. I doubt there were too many Lithuanian Jewish families named Sacks in mid-1900s Cape Town, so sure, related!] 37: Orca Sciences: There Has To Be A Better Way To Make Titanium. Titanium is a great metal - strong, light, and tough. 
If we had cheap titanium, it could revolutionize manufacturing the way cheap steel and aluminum did in previous eras. So why don’t we? Not because titanium is rare: it’s “the 9th most common element in the earth’s crust”. Rather, it’s very complicated and expensive to extract from its ore. Some kind of breakthrough in titanium extraction processes always seems tantalizingly close, but has never quite materialized. Is there any hope? 38: If Asians Are Lactose Intolerant, Why All The Milk Tea? Lactose intolerance has confused me for a long time - 23andMe tells me that I’m lactose intolerant, but I drink milk regularly without problems, so what’s up? This post’s answer: lactose-intolerant people who don’t usually drink milk will get sick if they start suddenly. Lactose-intolerant people who drink milk regularly since childhood develop gut microbiota that can digest milk, but which demand an expensive “tax” in calories. Lactose-tolerant people will always be able to digest milk and absorb all the calories themselves. 39: How do different majors change college students’ political beliefs? No surprise that the humanities and social sciences shift people left; no surprise that business and economics shift them right. I was a little surprised that engineering shifts people right a little, and that Education of all things shifts people right (albeit only slightly). How is that even possible? Are these people coming in as Mao Zedong and leaving as “only” Leon Trotsky? Also, Political Science is exactly neutral, lol. [EDIT: I misunderstood, they’re using natural sciences as a zero point, this is a reasonable choice but slightly changes the interpretation] 40: Kindkristin: Language models improved my mental health. 
41: More floor employment, from the WSJ (h/t @LaocoonofTroy): Big Paychecks Can’t Woo Enough Sailors For America’s Commercial Fleet: “Straight out of college, graduates from the country’s maritime academies can earn more than $200,000 as a commercial sailor, with free food and private accommodations... Despite the pay and perks, maritime jobs go begging, and it is raising national-security concerns.” Other selling points include “six months vacation, live wherever you want, and you’re serving the nation” and onboard “gyms, connectivity, and cuisine”. The catch is that you have to be at sea for months at a time. 42: Study (h/t @KierkegaardEmil): there was minimal “learning loss” from COVID school closures, best estimate is “0.02 standard deviations per 100 days of school closure”. I correctly predicted this back in 2021, but I also wrote in March of this year about how there’s been a general decline in NAEP scores since then. It seems like maybe a student having their specific school closed for longer than other schools didn’t hurt them, but some sort of general cultural change, maybe related to COVID, did hurt. 43: Sam Bankman-Fried’s mother on why she thinks his trial was unfair. SBF is appealing his conviction and will probably be making some of these same points in court. Can’t find a prediction market directly on the appeal, but this one says only 15% chance he serves under 10 years, this one says 15% chance of a Trump pardon, so it doesn’t seem like there’s much room for him to be freed (or get a significantly shorter sentence) on appeal. And Wired says that only 5-10% of appeals like these succeed. 44: Related: Trump pardons Juan Orlando Hernandez, former Honduran president extradited to the US for narco-corruption. 
Some sources are trying to find a Prospera angle - Prospera and other ZEDEs were approved under JOH’s administration, and the Prosperans seem to have good MAGAworld connections - but I don’t think this is their top priority, and I don’t know if it requires much explanation for Trump to be pro-right-wing Latin American politicians convicted by the Biden administration. More interesting is that apparently JOH and SBF were cellmates (X), “SBF spent extensive time helping JOH with trial prep” and SBF told an interviewer that “Juan Orlando is the most innocent prisoner I’ve met, myself included.” ChatGPT is not impressed with the Trump/SBF case for JOH’s innocence. Related: JOH’s conservative party on track to win this month’s extremely-close Honduran elections, great news for Prospera if it happens. 45: The “100 Above The Park” building in St Louis (h/t Bobby Fijan on X): 46: The death toll of the ongoing Sudan genocide has risen to about 150,000. Nicholas Kristof writes that the world has once again failed to prevent atrocities, and argues that the most important point of leverage is pressure on the United Arab Emirates, which is arming the genociders. Sam Kriss also writes about the situation in The World’s First Matcha Labubu Genocide, but is unimpressed with Kristof’s take: Sudan is passed over in a deeply uncomfortable silence. The absolute most you can do is blame the Emiratis. From what I’ve seen, more people seem to be appalled at the UAE for its frankly marginal role in arming the RSF than at the RSF itself. This is the approved way of understanding any inscrutably indigenous foreign conflict: you just worm out any third-party involvement and then act like you’ve solved the whole thing. I side with Kristof here, for reasons that Sam himself touches on later in his piece, in a section comparing Darfur with Gaza. It would be very easy to make people care about Darfur again. All it would take is a loud, vocal contingent of RSF apologists in the Western media. 
I agree, but would frame it less cynically: the reason Westerners pay attention to Gaza is that there’s a lever to push: not only does America support Israel, but many of their friends support Israel, so they can imagine convincing America or at least their friends to stop, and at least feel like there is some remote chance of making a small difference (and in fact, Trump getting mad at Israel and deciding to pressure them was decisive in effecting the cease-fire). On the other hand, we don’t have many levers to affect ethnic Baggara in the Rapid Support Forces of Sudan, so it doesn’t really feel useful to write blog posts arguing that they should stop; obviously they should stop, nobody disagrees with this, and it goes without saying - so nobody says it. But the US does support the UAE, and many of our friends like the UAE or at least go there on vacation, so maybe it’s possible to make some small difference by embarrassing them. 4D chess take is that Sam Kriss agrees with all of this, but “loudly” and “vocally” argued against it to give people like me a hook to write about this genocide with, in which case I thank him for his sacrifice. It would also be nice to be able to donate, but I don’t know who to trust in the region - other than Doctors Without Borders, who are usually pretty good. 47: The AI Futures Project (group of AI-will-be-fast intellectuals) and the AI As A Normal Technology team (group of AI-will-be-slow intellectuals) wrote an adversarial collaboration in Asterisk explaining what they agree on, for example: That there’s an important distinction between existing AI and “strong AGI”
48: Open Philanthropy has changed its name to Coefficient Giving. Maimonides says that it is especially praiseworthy to donate to charity anonymously; surely it also qualifies if you spend $5 billion building up a great reputation, then change your name so that nobody knows who you are anymore. They say their new name marks a new chapter where they transition from being associated with one billionaire couple (Facebook co-founder Dustin Moskovitz and Cari Tuna) to a broader effort to connect donors and opportunities, but rumor is they’re also tired of being confused with the OpenAI nonprofit.
January 05, 2026 · Original source
Some people have argued that you have to find a way to join an AI company, because AI company employees will form the new ruling class, with everyone else as serfs. I disagree. The main thing an AI company employee has that you don’t is AI company stock. But you can buy stock in Google, you may soon be able to buy stock in OpenAI and Anthropic, and even if not, you can get indirect exposure to these companies via stock in Amazon and Microsoft. I don’t recommend putting all your money in these stocks. But there’s no fundamental difference between a Google employee having 75% of their money in Google stock because they didn’t cash out their equity vs. you having 75% of your money in Google stock because you’re crazy and fail at diversification. So either put 75% of your money in Google stock or don’t (I recommend don’t), and don’t worry about how you need to join an AI company or be left out of the future oligarchy.
January 13, 2026 · Original source
Polymarket has a few of these “who has the best AI when?” markets - resolution is usually position on the LMArena Leaderboard, which usually but not always mirrors common-sense consensus. I get more interested in these the further out they go, but the June version is bizarre (it doesn’t even list Google as an option), and there’s nothing past mid-year. Other implied claims from Polymarket’s tech section: only 44% chance Anthropic will still dominate coding by late March; Anthropic and (especially) OpenAI probably won’t IPO this year; xAI will call their next model Grok 4.20 (of course).
January 13, 2026 · Original source
A man in an OpenAI t-shirt introduces himself as Andreas, and raises his hand bashfully; he hasn’t joined the trend either. “Yeah,” you say. “I guess it would be awkward to use Claude at OpenAI.”
“I didn’t know OpenAI had an Arson & Burglary Team.”
“Cause, uh, NVIDIA gave OpenAI ten trillion dollars to invest in Oracle conditional on Oracle investing in Broadcom conditional on Broadcom funding the Series A of a vehicle that buys OpenAI stock in exchange for OpenAI backstopping AMD investing ten trillion dollars into us, and every company in the chain had its stock go up 80% on the news, but if our valuation goes down even for one second then it crashes the global economy. And I’m sure I can solve this eventually, but just, uh, don’t let anybody involved in the global economy hear about this until then, okay?”
February 05, 2026 · Original source
19: Related: OpenAI’s president was Trump’s SuperPAC’s largest individual donor in the second half of 2025. This shouldn’t be interpreted as his personal preference; it’s OpenAI funneling money to Trump in a plausibly deniable way. Some people have started a boycott campaign, apparently with 100,000 people signing on…
Meanwhile, OpenAI has offended another demographic by committing to finally stop providing 4o, the model infamous for forming deep personal bonds with users and causing AI psychosis. Twitter searching “4o” will give you a quick look into a world you might not have known about:
There seems to be a general mood that OpenAI is vulnerable these days, culminating in Anthropic Superbowl commercials making fun of it for introducing ads. I thought the commercials were in bad taste, misrepresenting what OpenAI’s ads would be like and turning the completely normal decision for a tech company to have an ad-supported free version of their product into some kind of horrible betrayal. I thought Sam Altman’s response was fair (although his countercriticism of Anthropic also missed the mark). People in his replies tried to enforce a norm of “if you write a long explanation defending yourself against someone else’s funny lies, that means you care and you lose”, but that’s a stupid norm and people should stop shoring it up (cf. If It’s Worth Your Time To Lie, It’s Worth My Time To Correct It).
February 25, 2026 · Original source
But since AI is a strategically important technology, doesn’t that turn this into a national security issue? It might if there weren’t other AI companies, but there are. Why is Hegseth throwing a hissy fit instead of switching to an Anthropic competitor, like OpenAI or Google DeepMind? I’ve heard it’s because Anthropic is the only company currently integrated into classified systems (a legacy of their earlier contract with Palantir) and it would be annoying to integrate another company’s product. Faced with doing this annoying thing, Hegseth got a bruised ego from someone refusing to comply with his orders, and decided to turn this into a clash of personalities so he could feel in control. He should just do the annoying thing.
If you’re so smart, what’s your preferred solution? In an ideal world, the Pentagon backs off from its desire to mass surveil American citizens. In the real world, the Pentagon cancels its contract with Anthropic, pays whatever its normal contract cancellation damages are, learns an important lesson about negotiating things beforehand next time, and replaces them with OpenAI or Google, accepting the minor annoyance of getting them connected to the classified systems. If OpenAI and Google are also unwilling to participate in this, they use Grok. If they’re unhappy with having to use an inferior technology, they think hard about why no intelligent people capable of making good products are willing to work with them.
Boaz is a member of technical staff at OpenAI. Jeff is Chief Scientist at Google (see also Jeff Dean Facts). And most of all, big praise to the American people, with special love to the large plurality of Trump voters standing against this:
March 01, 2026 · Original source
A few hours later, Hegseth and Sam Altman declared an agreement-in-principle for OpenAI’s models to be used in the niche vacated by Anthropic. Altman stated that he had received guarantees that OpenAI’s models wouldn’t be used for mass surveillance or autonomous weapons either, but given Hegseth’s unwillingness to concede these points with Anthropic, observers speculated that the safeguards in Altman’s contract must be weaker or, in a worst-case scenario, completely toothless.
Some alert ACX readers have done a deep dive into national security law to try to untangle the situation. Their conclusion mirrors that of Anthropic and the majority of Twitter commenters: this is not enough. Current laws against domestic mass surveillance and autonomous weapons have wide loopholes in practice. Further, many of the rules which do exist can be changed by the Department of War at any time. Although OpenAI’s national security lead said that “we intended [the phrase ‘all lawful use’] to mean [according to the law] at the time the contract is signed”, this is not how contract law usually works, and not how the provision is likely to be enforced. Therefore, these guarantees are not helpful.
[EDIT: To clarify: The DoW can change their own policies at will, but can’t change laws. In addition to OpenAI’s claim of being robust to changing laws, OpenAI argues that they’re protected against changes to DoW policies because they explicitly reference the relevant policies as they exist today. Based on public information, this argument seems dubious. See ‘Comments on OpenAI’s FAQ’ below.]
March 03, 2026 · Original source
Partly it’s because Anthropic seems likely to win on appeal. Hegseth has said the government will keep using Anthropic for the next six months (undermining his case that they’re a national security risk) and has signed a substantially similar contract with OpenAI (undermining his case that their contract terms were unworkable). The prediction markets think the courts will be sympathetic:
This could have been a mixed blessing - Anthropic was previously trying to stand out as a B2B company while letting OpenAI have the dubious honor of producing consumerslop. But early signs suggest they might be winning over some companies too. From a Reddit thread on the topic:
Third, the past few years have seen dramatic advances in financial technology. Crypto traders have invented the perpetual future, a new instrument that tracks an asset without requiring anyone to own the asset involved. That means traders can buy and sell shares of SpaceX, OpenAI, and other nonpublic companies that won’t actually give you their shares. Hedging the price of nickel used to require someone somewhere in the process to own an actual warehouse full of nickel. Now you can skip that step.
March 04, 2026 · Original source
1: The OpenAI/Pentagon situation has evolved since Sunday’s ACX post (“All Lawful Use: Much More Than You Wanted To Know”). For up-to-date analysis of the latest contract, I endorse this LW post from today, on the newest contract: OpenAI’s Surveillance Language Has Many Potential Loopholes And They Can Do Better.
April 06, 2026 · Original source
Starting at Anthropic, we marched thirty minutes to OpenAI, then another forty to X. A friendly and professional police escort allowed us to walk down the street. As we marched, David led us in chants and slogans. I remember “1…2…3…4…Orwell told us what’s in store” and “5…6…7…8…no AI surveillance state.” Someone tried to start a chant of “You will not replace us!” but was shushed by the other attendees.