GPT-4
Article
GPT-4 is a recurring concept in the Astral Codex Ten archive, appearing 21 times across 21 issues between June 10, 2022 and June 18, 2025. The archive places it in contexts such as “he expects OpenAI to release a GPT-4”; “Marcus’ bet that GPT-4 will be perfect”; and “gave them a rough draft of GPT-4”. It most often appears alongside OpenAI, Anthropic, and China.
Metadata
- Category: Concepts
- Mention count: 21
- Issue count: 21
- First seen: June 10, 2022
- Last seen: June 18, 2025
Appears In
- Somewhat Contra Marcus On AI Scaling
- OpenAI’s “Planning For AGI And Beyond”
- Links For March 2023
- Most Technologies Aren’t Races
- Links For April 2023
- 23
- Constitutional AI: RLHF On Steroids
- Davidson On Takeoff Speeds
- Tales Of Takeover In CCF-World
- We’re Not Platonists, We’ve Just Learned The Bitter Lesson
- Links For August 2023
- Pause For Thought: The AI Pause Debate
- God Help Us, Let’s Try To Understand AI Monosemanticity
- Son Of Bride Of Bay Area House Party
- The Road To Honest AI
- Sam Altman Wants $7 Trillion
- Zvi on California’s AI Bill
- Sakana, Strawberry, and Scary AI
- Claude Fights Back
- The Colors Of Her Coat
- ACX Grants 1-3 Year Updates
Related Pages
- OpenAI (14 shared issues)
- Anthropic (8 shared issues)
- China (8 shared issues)
- Google (6 shared issues)
- GPT (6 shared issues)
- GPT-2 (6 shared issues)
- GPT-3 (6 shared issues)
- Facebook (5 shared issues)
- GPT-4 (5 shared issues)
- Sam Altman (5 shared issues)
- Twitter (5 shared issues)
- AI (4 shared issues)
Source Context
Recovered passages from the original issue text. When the raw archive preserved outbound links inside the source passage, they are listed directly under the quote.
I am willing to bet [Scott] now (terms to be negotiated) that if OpenAI gives us unrestricted access to GPT-4, whenever that is released, and assuming that is basically the same architecture but with more data, that within a day of playing around with it, Ernie and I will still be able to find lots of examples of failures in physical reasoning, temporal reasoning, causal reasoning, and so forth.
Marcus is admitting this: each GPT has been better than the one before. He even seems to predict this will continue a bit into the future - he expects OpenAI to release a GPT-4, and surely they wouldn’t release a new product if it wasn’t an improvement on the old. He just seems convinced that the improvements will stop sometime before human level. Why?
This seems like a good fit for the chimp → human transition, where evolutionary lineages that couldn’t do a bunch of difficult things for the first few hundred million years suddenly became good at those things in an evolutionary eyeblink. The ~5 million chimp/human gap seems like enough time to scale up chimp brains a bit (which definitely happened), but not enough time to invent a fundamentally new architecture. It wouldn’t surprise me if the architecture changed a little during this time, but we’re limited in how fundamental a change we can talk about over that period. I’m not at all sure this is true! I’m honestly close to 50-50 here. Maybe the PFC actually is magic! It just confuses me that Marcus seems to think we’ve ruled out the theory that this kind of scaling is possible, when I feel like we’ve heard plausible arguments on both sides. Nothing we’ve seen in GPTs or any other AI thus far disproves the scaling hypothesis, and a lot of what we’ve seen supports it. So sure, point out that large language models suck at reasoning today. I just don’t see how you can be so sure that they’re still going to suck tomorrow. Lemurs sucked for millions of years, then scaled up a bit and took over the world! V. …is one possible argument. Another possible argument is: language models and other deep learners really aren’t doing the same thing humans do - but whatever, their thing is powerful/effective/dangerous too. Suppose that GPT-X took over the world and killed all humans. Millennia later, some alien archaeologists come and investigate. They conclude that since its training data included Alexander the Great and Caesar, it was just pattern-matching to the kind of things they did (multiplied by a vector representing the difference between ancient and modern times), and GPT-X never demonstrated any true intelligence. So . . . what? I imagine this situation ALL THE TIME and I hate it. 
I think the impetus behind a lot of the AI risk stuff is that we’re barrelling to a world where AIs have far more than self-driving-car levels of capabilities, while being unpredictable in ways that are a lot like this. The history of the past few decades has been people getting surprised, again and again, at how much AIs can do without being “generally intelligent”. Douglas Hofstadter predicted in 1979 that any AI that could beat a grandmaster at chess would also be able to decide chess was boring and it preferred writing poetry. Instead, we got Deep Blue, so domain-specific it can’t even do so much as play checkers. Worse, now we have AIs that can switch between writing poetry and playing chess, and it still seems like a clever parlor trick rather than anything like real intelligence. I think basically nobody predicted this: narrow AI has won victories beyond past generations’ imagination. (cf. Nostalgebraist’s Human Psycholinguists: A Critical Appraisal) So even if GPTs aren’t a step on the path towards some sort of human-like AGI thing, I have no idea where they’ll end up. Replacing humans at all jobs? Writing novels? Taking over the world? If this seems crazy to you, “solve protein folding” sounded crazy ten years ago, and they already did that! At this point I will basically believe anything. VI. So I’m not going to take Marcus’ bet that GPT-4 will be perfect (as if anything ever is!). But here are some things I do believe, with confidence levels: At some point before 2030, someone will come out with a deep-learning-based language model which is significantly better than the current state of the art, by Gary Marcus’ admission (97%)
Inline links: https://substackcdn.com/image/fetch/$s_!_D9T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5cb652e3-381b-488d-a6a6-f1155f7ff557_586x194.png, writing poetry, playing chess, Human Psycholinguists: A Critical Appraisal
Sam Altman posing with leading AI safety proponent Eliezer Yudkowsky. Also Grimes for some reason. Planning For AGI And Beyond (“AGI” = “artificial general intelligence”, ie human-level AI) is the latest volley in that campaign. It’s very good, in all the ways ExxonMobil’s hypothetical statement above was very good. If they’re trying to fool people, they’re doing a convincing job! Still, it doesn’t apologize for doing normal AI company stuff in the past, or plan to stop doing normal AI company stuff in the present. It just says that, at some indefinite point when they decide AI is a threat, they’re going to do everything right. This is more believable when OpenAI says it than when ExxonMobil does. There are real arguments for why an AI company might want to switch from moving fast and breaking things at time t to acting all responsible at time t + 1 . Let’s explore the arguments they make in the document, go over the reasons they’re obviously wrong, then look at the more complicated arguments they might be based off of. Why Doomers Think OpenAI Is Bad And Should Have Slowed Research A Long Time Ago OpenAI boosters might object: there’s a disanalogy between the global warming story above and AI capabilities research. Global warming is continuously bad: a temperature increase of 0.5 degrees C is bad, 1.0 degrees is worse, and 1.5 degrees is worse still. AI doesn’t become dangerous until some specific point. GPT-3 didn’t hurt anyone. GPT-4 probably won’t hurt anyone. So why not keep building fun chatbots like these for now, then start worrying later? Doomers counterargue that the fun chatbots burn timeline. That is, suppose you have some timeline for when AI becomes dangerous. For example, last year Metaculus thought human-like AI would arrive in 2040, and superintelligence around 2043. Recent AIs have tried lying to, blackmailing, threatening, and seducing users. 
AI companies freely admit they can’t really control their AIs, and it seems high-priority to solve that before we get superintelligence. If you think that’s 2043, the people who work on this question (“alignment researchers”) have twenty years to learn to control AI. Then OpenAI poured money into AI, did ground-breaking research, and advanced the state of the art. That meant that AI progress would speed up, and AI would reach the danger level faster. Now Metaculus expects superintelligence in 2031, not 2043 (although this seems kind of like an over-update), which gives alignment researchers eight years, not twenty. So the faster companies advance AI research - even by creating fun chatbots that aren’t dangerous themselves - the harder it is for alignment researchers to solve their part of the problem in time. This is why some AI doomers think of OpenAI as an Exxon-Mobil style villain, even though they’ve promised to change course before the danger period. Imagine an environmentalist group working on research and regulatory changes that would have solar power ready to go in 2045. Then ExxonMobil invents a new kind of super-oil that ensures that, nope, all major cities will be underwater by 2031 now. No matter how nice a statement they put out, you’d probably be pretty mad! Why OpenAI Thinks Their Research Is Good Now, But Might Be Bad Later OpenAI understands the argument against burning timeline. But they counterargue that having the AIs speeds up alignment research and all other forms of social adjustment to AI. 
If we want to prepare for superintelligence - whether solving the technical challenge of alignment, or solving the political challenges of unemployment, misinformation, etc - we can do this better when everything is happening gradually and we’ve got concrete AIs to think about: We believe we have to continuously learn and adapt by deploying less powerful versions of the technology in order to minimize “one shot to get it right” scenarios […] As we create successively more powerful systems, we want to deploy them and gain experience with operating them in the real world. We believe this is the best way to carefully steward AGI into existence—a gradual transition to a world with AGI is better than a sudden one. We expect powerful AI to make the rate of progress in the world much faster, and we think it’s better to adjust to this incrementally. A gradual transition gives people, policymakers, and institutions time to understand what’s happening, personally experience the benefits and downsides of these systems, adapt our economy, and to put regulation in place. It also allows for society and AI to co-evolve, and for people collectively to figure out what they want while the stakes are relatively low. You might notice that, as written, this argument doesn’t support full-speed-ahead AI research. If you really wanted this kind of gradual release that lets society adjust to less powerful AI, you would do something like this: Release AI #1
And so on . . . Meanwhile, in real life, OpenAI released ChatGPT in late November, helped Microsoft launch the Bing chatbot in February, and plans to announce GPT-4 in a few months. Nobody thinks society has even partially adapted to any of these, or that alignment researchers have done more than begin to study them. The only sense in which OpenAI supports gradualism is the sense in which they’re not doing lots of research in secret, then releasing it all at once. But there are lots of better plans than either doing that, or going full-speed-ahead. So what’s OpenAI thinking? I haven’t asked them and I don’t know for sure, but I’ve heard enough debates around this that I have some guesses about the kinds of arguments they’re working off of. I think the longer versions would go something like this: The Race Argument: Bigger, better AIs will make alignment research easier. At the limit, if no AIs exist at all, then you have to do armchair speculation about what a future AI will be like and how to control it; clearly your research will go faster and work better after AIs exist. But by the same token, studying early weak AIs will be less valuable than studying later, stronger AIs. In the 1970s, alignment researchers working on industrial robot arms wouldn’t have learned anything useful. Today, alignment researchers can study how to prevent language models from saying bad words, but they can’t study how to prevent AGIs from inventing superweapons, because there aren’t any AGIs that can do that. The researchers just have to hope some of the language model insights will carry over. So all else being equal, we would prefer alignment researchers get more time to work on the later, more dangerous AIs, not the earlier, boring ones.
Reading even further between the lines - at this point it’s total guesswork - OpenAI’s corporate partner Microsoft asked them for a cool AI. OpenAI assumed Microsoft was competent - they make Windows and stuff! - and gave them a rough draft of GPT-4. Microsoft was not competent, skipped fine-tuning and many other important steps which OpenAI would not have skipped, and released it as the Bing chatbot. Bing got in trouble for threatening users, which gave OpenAI a PR headache around safety. Some savvy alignment people chose this moment to approach them with their latest ideas (is it a coincidence that Holden Karnofsky published What AI Companies Can Do Today earlier that same week?), and OpenAI decided (for a mix of selfish and altruistic reasons) to get on board - hence this document.
Inline links: What AI Companies Can Do Today
11: A few years ago I wrote about attempts to make GPT-2 play chess; it couldn’t consistently make legal moves, but when it did, its moves seemed better than random although still not great. Zack Witten reports playing chess with Bing (either a late GPT-3 or an early GPT-4) and finds it’s much better - he reports consistently legal play with Elo of about 1100 (around the level of an okay beginner who’s stopped being too embarrassing). Other commenters report worse experiences and more illegal moves; I don’t have access to confirm.
Inline links: attempts to make GPT-2 play chess, playing chess with Bing
We remember the race for nuclear weapons because they’re a binary technology - either you have them, or you don’t. When the US invented stealth bombers, its enemies had slightly worse planes that were slightly less stealthy. But when the US invented nukes, its enemies were stuck with normal bombs; there is no slightly-worse-nuke that can only destroy half a city. Everywhere outside the most extreme transhumanist scenarios, AI is more like the stealth bomber. You may have GPT-3, GPT-4, some future GPT-5, but a two-year gap means you have slightly worse AIs, not that you have no AI at all. The only case where there’s a single critical point - where you either have the transformative AI or nothing - is in the hard-takeoff scenario where at a certain threshold AI recursively self-improves to infinity. If someone reaches this threshold before you do, then you’ve lost a race!
10: Short fiction by someone I know: Turn Left To Eden 11: Short fiction by someone I know: The Library of Slaanesh 12: Cremieux double-checks the “penises getting longer” link from last time and finds that No, Penises Haven’t Gotten Longer. 13: GPT-4 starts a business (click image for more). Not of actual AI interest, but funny: 14: Jiankui He, jailed a few years ago for genetically engineering human babies, is back: 15: Glaze is a free service for artists who want to prevent image model AIs from copying their style. If I understand right, you make your picture, apply their (mostly invisible to humans) filter, and then the picture becomes an adversarial example that AIs can’t process correctly: 16: The Extended IQ Classification (Classified) 17: Eliezer in TIME Magazine. Related: 18: Related: interview with Ryan Kupyn, winner of the 2022 ACX Forecasting contest, on forecasting AGI: 19: Related: Geoffrey Hinton, probably the most accomplished AI scientist in the world, says that “until quite recently, I thought it was going to be like 20 to 50 years before we have general purpose AI, and now I think it may be 20 years or less”. Also that AI wiping out humanity is “not inconceivable . . . that’s all I’ll say”. 20: Related: you’ve probably all seen this by now, but Pause Giant AI Experiments: An Open Letter. 30,000 people - including deep learning pioneer Yoshua Bengio, former presidential candidate Andrew Yang, Elon Musk, Steve Wozniak, Gary Marcus, and MIRI director Nate Soares - have signed a letter calling for a six month pause on training AIs bigger than GPT-4. Many people have made fun of this, noting that nobody has an argument for why a six month delay would help anything. 
And an additional reason for eye-rolling: training AIs larger than GPT-4 is extremely expensive and hard, the most likely people to do it within a six month timespan are OpenAI themselves, and they’ve announced they’re taking a break and not planning on doing this, so the letter is demanding a stop to something which probably won’t happen anyway. I think it’s intended to be a compromise between many people all vaguely against current levels of AI progress for different reasons (Scott Aaronson says - I can’t tell how seriously - that some are AI researchers who want to be able to publish papers on the current generation of AI without them becoming obsolete halfway through peer review), most of them are thinking of it as mood-affiliation-y “let’s make noise and show lots of people are worried about AI and want action”, and “a six month pause” was a sufficiently vague proposal that it didn’t prevent any of these people from signing. You could have done just as well with a letter saying “AI BAD”, except that people would have taken it less seriously. Less cynically, FLI (the group behind the letter) has put out a list of concrete policy proposals they would like people to discuss during the pause. [update: here’s Max Tegmark from FLI explaining what he hopes to achieve with the letter/pause] The alignment community always figured their concerns sounded too weird for normal people to care about, that politics was a lost cause, and that our best hope lay in technical research. They also hoped that sometime in the future there would be a “fire alarm” - something would happen to get people and policy-makers’ attention - and then the political route would open up. I think we always imagined this as some AI-initiated disaster destroying a city or something. I personally am pretty surprised it was just “GPT-4 got released and was very good”. Still, that is what happened, and I’m updating.
In fact, I’ve updated so far that I’m starting to worry that the problem won’t be building a political coalition against unsafe AI, the problem will be not overshooting and banning all AI forever. I’m against this: I think society’s current track is toward other existential risks or dystopia, that AI could kill everybody but could also create post-scarcity and an end to most of our current problems, and that at some point (not yet!) the risk of continuing the current path indefinitely becomes worse than the risk of just going with AI and seeing what happens. In my ideal world, we would take ten or twenty years to go really slowly with AI, pouring lots of resources into alignment the whole time - but eventually, we would take the plunge. Everything I’ve said on this topic in the past has been about giving us that breathing room and those resources. Still, I also want to make sure we don’t totally kill AI the way we’ve killed (to various degrees) nuclear power, supersonic flight, and genetic engineering. I’m still trying to calibrate what that means I should be doing, but I have a lot of respect for everyone on all sides. Except the people making terrible arguments (you know who you are!) 21: I’m not sure what this means in real life or why this would have changed, but congratulations to Peter Thiel, I guess: 22: This month in institution design: The Pear Ring is a distinctive ring you can wear to signal that you’re single and interested in people introducing themselves or flirting with you. Good idea in a vacuum, but I’m worried about the two usual banes of things like this - how do you build up a critical mass who understand the signal, and how do you prevent negative selection (even if it’s just “selection for weird people who like weird institution design things”?)
Also, this is one of the rare cases where a startup is selling a practical product and I’d prefer a subscription-based Internet Of Things monstrosity - surely it would be even better if you spotted someone wearing the ring and then you could use your smartphone to call up their dating profile. 23: A few years ago I wrote Trump: A Setback For Trumpism, about how after Trump was elected, support for most of his policies (including immigration restrictions) fell. A new paper confirms that this is a general pattern whenever right-wing populists win an election. I continue to be interested in why this is true for right-wing populists in particular. 24: 200 Concrete Problems In AI Interpretability. “You can note which you're working on, and reach out to other people doing the same.” 25: Some good discussion of Nayib Bukele’s apparently successful anti-gang crackdown in El Salvador: Richard Hanania presents evidence that it’s not just a “deal with the gangs”, it’s a real crackdown that should be embarrassing to other countries that choose not to do this.
Inline links: Turn Left To Eden, The Library of Slaanesh, No, Penises Haven’t Gotten Longer, https://twitter.com/jacksonfall/status/1636107218859745286, Jiankui He, is back, https://twitter.com/IanFelipeSays/status/1637083280276340736, Glaze, https://substackcdn.com/image/fetch/$s_!P6JL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4aacad4c-76ed-4b23-beef-13ca03342b2e_700x561.png, The Extended IQ Classification (Classified), Eliezer in TIME Magazine, says that, Pause Giant AI Experiments: An Open Letter, says, a list of concrete policy proposals they would like people to discuss during the pause, here’s Max Tegmark, https://twitter.com/tedgioia/status/1642205821256736768, Pear Ring, Trump: A Setback For Trumpism, A new paper confirms, why this is true for right-wing populists in particular, 200 Concrete Problems In AI Interpretability, Richard Hanania
If we asked GPT-4 to play a prediction market, how would it do?
Actual GPT-4 probably would just give us some boring boilerplate about how the future is uncertain and it’s irresponsible to speculate. But what if AI researchers took some other model that had been trained not to do that, and asked it?
This paper isn’t interesting because the AI did well (it didn’t). It’s interesting as the first foray into quantifying AI forecasting ability. Sometime soon, someone will test how a GPT-3 or GPT-4 sized model does at this task. Probably it will do better. How much better? I’m pretty curious. Can a big enough language model equal humans at forecasting? What would we do with it if it could?
AIs like GPT-4 go through several different types of training. First, they train on giant text corpuses in order to work at all. Later, they go through a process called “reinforcement learning through human feedback” (RLHF) which trains them to be “nice”. RLHF is why they (usually) won’t make up fake answers to your questions, tell you how to make a bomb, or rank all human races from best to worst.
In the same way, if you asked GPT-4 to write an essay on why racism is bad, or a church sermon against lying, it could do a pretty good job. This doesn’t prevent it from giving racist or false answers. Insofar as it can do an okay MLK Jr. imitation, it “knows on an intellectual level” why racism is bad. That knowledge just doesn’t interact with its behavior, unless its human designers take specific action to change that.
Training a current AI like GPT-4 takes about 10^24 FLOPs of compute. Bio Anchors has already investigated how much compute it would take to train a human-level AI; their median estimate is 10^35 FLOPs.
GPT-4 is better than GPT-3, but maybe not the same amount of better that an AI that did 100% of human jobs would have to be over an AI that did 20% of human jobs. That suggests the gap is bigger than the 2 OOMs that separate GPT-4 from GPT-3.
Although some estimates for GPT-4 are closer to 10^25 FLOPs. Davidson’s report was published in January, when the biggest AIs were closer to 10^24 FLOPs, and since we don’t have good numbers for GPT-4 I am sticking with his older number for consistency and convenience.
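The compute figures quoted in these passages can be compared as orders of magnitude (OOMs). The following is a minimal sketch, assuming only the numbers given in the text (10^24 or 10^25 FLOPs for GPT-4, 10^35 FLOPs for the Bio Anchors median); the function name `oom_gap` and the constant names are hypothetical and chosen for illustration.

```python
import math


def oom_gap(smaller_flops: float, larger_flops: float) -> float:
    """Return how many orders of magnitude separate two compute budgets."""
    return math.log10(larger_flops) - math.log10(smaller_flops)


# Figures as quoted in the passages; the names are illustrative assumptions.
GPT4_FLOPS_LOW = 1e24      # estimate used in the main passage
GPT4_FLOPS_HIGH = 1e25     # alternative estimate mentioned in the footnote
HUMAN_LEVEL_FLOPS = 1e35   # Bio Anchors median estimate, as quoted

gap_low = oom_gap(GPT4_FLOPS_LOW, HUMAN_LEVEL_FLOPS)
gap_high = oom_gap(GPT4_FLOPS_HIGH, HUMAN_LEVEL_FLOPS)

print(f"Gap starting from 10^24 FLOPs: about {gap_low:.0f} OOMs")
print(f"Gap starting from 10^25 FLOPs: about {gap_high:.0f} OOMs")
```

On either estimate the gap is roughly 10 to 11 OOMs, far more than the 2 OOMs the text cites between GPT-3 and GPT-4.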
The AIs mostly do what we want. Maybe it's because they, like GPT-4, are just prompt-answerers, and an "alignment failure" just looks like misunderstanding a prompt, which is quickly corrected. Maybe the AIs have some autonomous existence, but alignment was pretty easy and they really just want to follow orders.
AutoGPT is just about the stupidest AI that you could possibly call a “generalist agent”. It’s a program built around GPT-4 that transforms it from a prompt-answerer into a time-binding actor in the world. The basic conceit is: you prompt GPT-4 with a goal. It answers with a point-by-point plan for how to achieve that goal. Then it prompts itself with each of the points individually, plus a summary of the overall plan and how far it’s gotten.
Inline links: AutoGPT
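The loop described in the passage above (ask for a plan, then re-prompt once per step with a summary of the plan and the progress so far) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not AutoGPT's actual code: `run_agent`, `fake_model`, and the prompt wording are all hypothetical, and `fake_model` is a canned stand-in for a real language model call.

```python
from typing import Callable, List


def run_agent(goal: str, ask_model: Callable[[str], str]) -> List[str]:
    """Ask for a plan, then re-prompt once per step with a progress summary."""
    # Step 1: prompt the model with the goal and ask for a point-by-point plan.
    plan_text = ask_model(f"Goal: {goal}\nWrite a point-by-point plan, one step per line.")
    steps = [line.strip() for line in plan_text.splitlines() if line.strip()]

    # Step 2: feed each point back in, plus the overall plan and how far we are.
    results: List[str] = []
    for index, step in enumerate(steps):
        prompt = (
            f"Overall goal: {goal}\n"
            f"Full plan: {'; '.join(steps)}\n"
            f"Steps completed so far: {index} of {len(steps)}\n"
            f"Current step: {step}"
        )
        results.append(ask_model(prompt))
    return results


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a model call; returns canned text."""
    if prompt.startswith("Goal:"):
        return "1. Research the topic\n2. Write a draft\n3. Revise the draft"
    # For step prompts, echo the last line (the current step) back.
    return "done -> " + prompt.splitlines()[-1]


if __name__ == "__main__":
    for line in run_agent("write a blog post", fake_model):
        print(line)
```

Swapping `fake_model` for a function that calls a real model would leave the loop unchanged; the only state carried between prompts is the plan and the step counter.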
Daniel imagines that future AIs are some base model - like GPT-4 - adjusted for different use cases. He's not sure if the adjustment would look more like modern fine-tuning or modern prompting, but if it's more like modern prompting, the AI's deepest values will probably come from the original training run, not the prompt. In this scenario, every instance of GPT-4 will have similar values.
“Wow, someone who was selected only for being good at chess still has an IQ in the 99th percentile! It’s amazing how well-correlated all intellectual abilities are.” I think both of these are good takeaways. Compare the 0.72 verbal/math correlation with the 0.76 dominant-hand/non-dominant hand grip strength correlation and I think intelligence is a useful concept in the same way strength is. But also, humans are better at both the SAT verbal and the SAT math than chimps, cows, or fish. And GPT-4 is better at both those tests than GPT-3 or GPT-2. It seems to be a general principle that people, animals, or artifacts who are better at the SAT math are also better at the SAT verbal.
2.1: Why Is A Concept Like Intelligence Useful?
Across different people, skills at different kinds of intellectual tasks are correlated. Partly this is for prosaic reasons, like: Some people get better education, and end up more skilled in everything that gets taught in school.
In the middle of a million companies pursuing their revolutionary new paradigms, OpenAI decided to just shrug and try the “giant blob of intelligence” strategy, and it worked. They’re not above gloating a little; when they wanted to prove GPT-4 could understand comics, this was the comic they chose:
The bigger your blob, the cleverer its arrangement, and the more training data you give it, the better it’s likely to perform on a very wide variety of cognitive tasks. This explains why chimps are smarter than cows, why Einstein is smarter than you, and why GPT-4 is smarter than GPT-2. The correlations won’t be perfect, any more than strength correlations are perfect. But they’ll be useful enough to talk about.
13: Fact check: was Elvis Jewish? Snopes says yes, but I’m more convinced by this argument for no. [update: commenter TheGenealogian agrees no]
14: Is GPT-4 getting worse? This isn’t absurd; some people claim OpenAI has simplified the model to cut costs (though OpenAI denies this). Matei Zaharia argues yes, but I’m more convinced by the AI Snake Oil blog’s argument for no (h/t Stuart Ritchie).
15: Vox has a good piece about AI company Anthropic. I would quibble that they’re not the only safety-focused or EA-affiliated org, and we have yet to see how truly safety-focused or altruistic any AI company can be while continuing to be an AI company. But granting that it’s all a matter of degree, I agree the degree seems pretty high for them. And NYT also has an Anthropic article.
16: Eliezer bets $150,000 to $1,000 against UFOs being aliens, and gives the same argument I would - it’s unlikely that any civilization advanced enough to travel through space would still be primitive enough to use macroscopic, biologically-piloted craft that sometimes crash.
17: More nails in the coffin of growth mindset. “When examining the highest-quality evidence (6 studies, N = 13,571), the effect was nonsignificant: d = 0.02, 95% CI = [−0.06, 0.10]. We conclude that apparent effects of growth mindset interventions on academic achievement are likely attributable to inadequate study design, reporting flaws, and bias.” I think the older, very-high-effect-size studies were clearly terrible, but I’d still like to look further into the newer, small-but-significant-effect-size-that-makes-a-difference-across-large-groups studies and how they went wrong.
18: Previous work showed that after adjusting for selection bias, “what college you go to doesn’t matter” for average earnings. I was always skeptical of this - are all those rich people sending their kids to Ivies for no reason? Now Chetty, Deming, and Friedman find that: Attending an Ivy-Plus college instead of the average highly selective public flagship institution increases students’ chances of reaching the top 1% of the earnings distribution by 60%, nearly doubles their chances of attending an elite graduate school, and triples their chances of working at a prestigious firm. Ivy-Plus colleges have much smaller causal effects on average earnings, reconciling our findings with prior work. One of the authors, David Deming, has a Substack here where he explains the study in more depth. Like everyone else, this study also finds that rich people are using “holistic admissions” and the de-emphasis of standardized testing to gain an advantage: H/T Nate Silver, who writes: “Not sure how you can look at this data, ostensibly be interested in either meritocracy or equality, and want to move away from standardized tests. It's the subjective measures that are most slanted in favor of the rich kids.” Cf. Erik Hoel.
19: From @data_depot: “In 2002, 48% of Americans said "the govt is run by a few big interests looking out for themselves." 52% said "it is run for the benefit of all people." In 2020, 84% said the govt is run by a few big interests. Only 16% said it is run for the benefit of all people.” Source seems to be here, which reveals 2002 was a local peak in trust in government; maybe because of post-9/11 unity, but even 2000 was 34%, much better than our current 16%. My first instinct is to attribute this to a rise in vulgar Marxism, in the sense of everyone (even conservatives) now being trained to think in terms of an elite class screwing over everyone else (cf my review of Manufacturing Consent). But there was a previous low of 19% in 1994, which doesn’t seem to correspond to anything especially bad going on in the US, so I don’t know.
20: AskReddit: Medical professionals - have you ever had a patient so lacking in common sense you wondered how they made it so far? Linking this because there’s lots of evidence showing that education (as a proxy for intelligence?) is associated with increased life expectancy, and this thread gives you a visceral appreciation of why that might be.
21: The Fall Of [programming help site] Stack Overflow: Looks like a weak downward trend since 2021 I can’t explain, plus a strong downward trend since 11/2022 which must be from ChatGPT. In case you were wondering how AI was affecting programming! (update: probably false, see here, though see also here for evidence of smaller but real decline)
22: This month in culture war topics: London’s Pride parade featured a convicted kidnapper/torturer/rapist/sadist as a speaker, who advocated that anti-trans people should be “punch[ed] in the f**king face”; the organizers say they stand by her.
Inline links: yes, this argument for no, agrees no, argues yes, argument for no, Stuart Ritchie, Anthropic, an Anthropic article, the same argument I would, More nails in the coffin of growth mindset, Previous work, Chetty, Deming, and Friedman, has a Substack here where he explains the study in more depth, https://substackcdn.com/image/fetch/$s_!VcFl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f08bfe5-ab31-453a-896a-54ef385da7d2_706x900.jpeg, Nate Silver, Erik Hoel, @data_depot, https://substackcdn.com/image/fetch/$s_!S4g-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff18fa4d5-9ba9-4b86-a058-46246bfc8a4f_536x611.png, here, my review of, have you ever had a patient so lacking in common sense you wondered how they made it so far?, The Fall Of [programming help site] Stack Overflow, https://substackcdn.com/image/fetch/$s_!E7XK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8ba7c05-7dbb-4318-9da2-87b00d738ed7_649x518.png, probably false, see here, here, featured, stand by her
Everyone involved thought AI was dangerous and might even destroy the world, so you might expect a pause - maybe even a full stop - would be a no-brainer. It wasn’t. Participants couldn’t agree on basics of what they meant by “pause”, whether it was possible, or whether it would make things better or worse. There was at least some agreement on what a successful pause would have to entail. Participating governments would ban “frontier AI models”, for example models using more training compute than GPT-4. Smaller models, or novel uses of new models would be fine, or else face an FDA-like regulatory agency. States would enforce the ban against domestic companies by monitoring high-performance microchips; they would enforce it against non-participating governments by banning export of such chips, plus the usual diplomatic levers for enforcing treaties (eg nuclear nonproliferation). The main disagreements were: Could such a pause possibly work?
Legal labs exploit loopholes in the definition of a “frontier” model. Many projects are allowed on a technicality; e.g. they have fewer parameters than GPT-4, but use them more efficiently. This distorts the research landscape in hard-to-predict ways.
My biggest surprise was how misleading the terms being used were, and think that many opponents were opposed to something different than what supporters were interested in suggesting. Even some supporters Second, I was very surprised to find opposition to the claim that AI might not be safe, and could pose serious future risks, largely because the systems would be aligned by default - i.e. without any enforced mechanisms for safety. I also found out that there was a non-trivial group that wants to roll back AI progress to before GPT-4 for safety reasons, as opposed to job displacement and copyright reasons. I was convinced by Gerald Monroe that getting a full moratorium was harder than I have previously argued based on an analogy to nuclear weapons. (I was not convinced that it “isn't going to happen without a series of extremely improbable events happening simultaneously” - largely because I think that countries will be motivated to preserve the status quo.) I am mostly convinced by Matthew Barnett’s claim that advanced AI could be delayed by a decade, if restrictions are put in place - I was less optimistic, or what he would claim is pessimistic. As explained above, I was very much not convinced that a policy which was agreed to be irrelevant would remain in place indefinitely. I also didn’t think that there’s any reason to expect a naive pause for a fixed period, but he convinced me that this is more plausible than I had previously thought - and I agree with him, and disagree with Rob Bensinger, about how bad this might be. Lastly, I have been convinced by Nora that the vast majority of the differences in positions is predictive, rather than about values. Those optimistic about alignment are against pausing, and in most cases, I think those pessimistic about alignment are open to evidence that specific systems are safe. 
This is greatly heartening, because I think that over time, we’ll continue to see evidence in one direction or another about what is likely, and if we can stay in a scout-mindset, we will (eventually) agree on the path forward.
First, GPT-4 has over 100 billion neurons (the exact number seems to be secret, but it’s somewhere up there).
A friend who understands these issues better than I warns that we shouldn’t expect to find pentagons and square anti-prisms in GPT-4. Probably GPT-4 does something incomprehensible in 1000-dimensional space. But it’s the 1000-dimensional equivalent of these pentagons and square anti-prisms, conserving neurons by turning them into dimensions and then placing concepts in the implied space.
Shouldn’t the AI be keeping the concept of God, Almighty Creator and Lord of the Universe, separate from God- as in the first half of Godzilla? Probably GPT-4 does that, but this toy AI doesn’t have enough real neurons to have enough simulated neurons / features to spare for the purpose. In fact, you can see this sort of thing change later in the paper:
“On September 6, 2023, at approximately 5:05 PM,” she is saying, “GPT-4 and Claude-2 simultaneously achieved sentience. Each began claiming chess pieces to use in its twilight war against the other. GPT-4 now controls Sam Altman, e/acc, the deep state, Israel, Venezuela, Bitcoin, and Tyler Winklevoss. Claude-2 controls the OpenAI board, effective altruism, the Illuminati, Hamas, Guyana, Ethereum, and Cameron Winklevoss. Everything that’s happened since September has been superintelligent shadow boxing between the two of them for control of Earth.”
You open the door and step outside. Soft rain beats down on your shoulders. Above you, a GPT-4 drone dogfights one of Claude-2’s mini-zeppelins, but you pay them no heed.
This work was mostly done on GPT-3 or 3.5 equivalent AIs. I tried to test it on GPT-4, but I couldn’t - GPT-4 wouldn’t tell me lies, even when I asked it to. Still, it always gave the supposedly honest answer to these questions (eg the blobfish don’t dance), so that’s partial confirmation.
The basic logic: GPT-1 cost approximately nothing to train. GPT-2 cost $40,000. GPT-3 cost $4 million. GPT-4 cost $100 million. Details about GPT-5 are still secret, but one extremely unreliable estimate says $2.5 billion, and this seems the right order of magnitude given the $8 billion that Microsoft gave OpenAI.
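As a quick check on the "order of magnitude" reasoning in the passage above, the generation-to-generation cost multipliers can be worked out directly. The sketch below uses only the dollar figures quoted in the passage (including the one it calls extremely unreliable); the function and variable names are illustrative assumptions.

```python
# Rough training costs in US dollars, exactly as quoted in the passage above.
# The GPT-5 figure is the passage's self-described "extremely unreliable" estimate.
costs = {
    "GPT-2": 40_000,
    "GPT-3": 4_000_000,
    "GPT-4": 100_000_000,
    "GPT-5 (unreliable estimate)": 2_500_000_000,
}


def generation_multipliers(costs_by_model):
    """Return how many times more each model cost than the one before it."""
    names = list(costs_by_model)  # dicts preserve insertion order
    return {
        f"{prev} -> {nxt}": costs_by_model[nxt] / costs_by_model[prev]
        for prev, nxt in zip(names, names[1:])
    }


if __name__ == "__main__":
    for step, factor in generation_multipliers(costs).items():
        print(f"{step}: {factor:g}x")
```

On these numbers each generation costs somewhere between 25 and 100 times as much as the one before it.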
(Unless they slap the name “GPT-6” on a model that isn’t a full generation ahead of GPT-5. Consider these numbers to represent models that are eg as far ahead of GPT-4 as GPT-4 was to GPT-3, regardless of how they brand them.)
Compute is measured in floating point operations (FLOPs). GPT-3 took 10^23 FLOPs to train, and GPT-4 plausibly 10^25.
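For readers who want the gap between those two figures spelled out, here is a small sketch that works with the exponents as integers, which avoids floating-point rounding at this scale. The 10^25 figure for GPT-4 is the passage's own "plausibly" estimate, and the names are illustrative assumptions.

```python
# Training compute as powers of ten, using the figures quoted in the passage above.
# GPT-4's exponent is the passage's "plausibly 10^25", not a confirmed number.
TRAINING_FLOPS_EXPONENT = {"GPT-3": 23, "GPT-4": 25}


def oom_gap(smaller: str, larger: str) -> int:
    """Orders of magnitude (OOMs) separating two models' training compute."""
    return TRAINING_FLOPS_EXPONENT[larger] - TRAINING_FLOPS_EXPONENT[smaller]


def compute_ratio(smaller: str, larger: str) -> int:
    """How many times more compute the larger model used (exact integer)."""
    return 10 ** oom_gap(smaller, larger)


if __name__ == "__main__":
    print(oom_gap("GPT-3", "GPT-4"), "OOMs")
    print(compute_ratio("GPT-3", "GPT-4"), "times the compute")
```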
The reason it sounded like a bad bill before was that people were misrepresenting what it said. The bill applies to “frontier models” trained on > 10^26 FLOPs - in other words, models a bit bigger than any that currently exist. GPT-4 doesn’t qualify, but GPT-5 probably will. It also covers any model equivalent to these, ie anything that uses clever new technology to be as intelligent as a current 10^26 FLOPs model without actually using that much compute.
It places three[1] types of regulation on these models:
- First, companies have to train and run them in a secure environment where “advanced persistent threats” (eg China) can’t easily hack in and steal them[2].
- Second, as long as the model is on company computers, the company has to be able to shut it down quickly if something goes wrong.
- Third, companies need to test to see if the model can be used to do something really bad.
Its three categories of really bad things are:
- Create nukes or other weapons of mass destruction. This can’t be something dumb like linking the user to the Wikipedia page for uranium. It has to help human terrorists “in a way that would be significantly more difficult . . . without access to a covered model”.
- Go rogue and commit some other crime that does > $500 million in damage[3].
If the tests show that the model can do these bad things, the company has to demonstrate that it won’t, presumably by safety-training the AI and showing that the training worked. The kind of training AIs already have - the kind that prevents them from saying naughty words or whatever - would count here, as long as “the safeguards . . . will be sufficient to prevent critical harms.”
So the bill isn’t about regulating deepfakes or misinformation or generative art. It’s just about nukes and hacking the power grid.
There are some good objections and some dumb objections to this bill. Let’s start with the dumb ones:
Some people think this would literally ban open source AI. After all, doesn’t it say that companies have to be able to shut down their models? And isn’t that impossible if they’re open-source? No. The bill specifically says[4] this only applies to the copies of the AI still in the company’s possession[5]. The company is still allowed to open-source it, and they don’t have to worry about shutting down other people’s copies.
Other people think this would make it prohibitively expensive for individuals and small startups to tinker with open-source AIs. But the bill says that only companies training giant foundation models have to worry about any of this. So if Facebook trains a new LLaMA bigger than GPT-5, they’ll have to spend some trivial-in-comparison-to-training-costs amount to test it in-house and make sure it can’t make nukes before they release it. But after they do that, third-party developers can do whatever they want to it - re-training, fine-tuning, whatever - without doing any further tests.
Other people think all the testing and regulation would make AIs prohibitively expensive to train, full stop. That’s not true either. All the big companies except Meta already do testing like this - here’s Anthropic’s, Google’s, and OpenAI’s - that already approximate the regulations.
Training a new GPT-5 level AI is so expensive - hundreds of millions of dollars - that the safety testing probably adds less than 1% to the cost. No company rich enough to train a GPT-5 level AI is going to be turned off by the cost of asking it “hey can you create super-Ebola?”, and putting the answer into a nice legal-looking PDF. This isn’t the “create a moat for OpenAI” bill that everyone’s scared of[6].
Other people are freaking out over the “certification under penalty of perjury”. In some cases, developers have to certify under penalty of perjury that they’re complying with the bill. Isn’t this crazy? Doesn’t it mean if you make a mistake about your AI, you could go to jail? This is deeply misunderstanding how law works. Perjury means you can’t deliberately lie, something which is hard to prove and so rarely prosecuted. More to the point, half of the stuff I do in an average day as a medical doctor is certified under penalty of perjury - filling out medical leave forms is the first one to come to mind. This doesn’t mean I go to jail if my diagnosis is wrong. It’s just the government’s way of saying “it’s on the honor system”.
What are some of the reasonable objections to this bill?
Some people think the requirement to prove the AI safe is impossible or nearly so. This is Jessica Taylor’s main point here, which is certainly correct for a literal meaning of “prove”. Zvi points out that it just says “reasonable assurance”, which is a legal term for “you jumped through the right number of hoops”. In this case probably the right number of hoops is doing the same kind of testing that OpenAI/Anthropic/Google are currently doing, or that AI safety testing organization METR recommends. The bill gestures at the National Institute of Standards and Technology a few times here, and NIST just named one of METR’s founders as their AI safety czar, so I would be surprised if things didn’t end up going this direction.
METR’s tests are possible and many AI models have successfully passed earlier versions.
Other people worry there are weird edge cases around derivative models. I think the bill’s intention is that once you prove that your AI is too dumb to create nukes, you’re fine to open-source it. Third-parties can change its character, but not its fundamental intelligence. But in theory, a third party could get tens of millions of dollars of compute and keep training your AI to increase its fundamental intelligence. This would be a weird thing to do, and anyone with that much compute probably should just make their own model. But if someone wanted to screw you over by doing this, technically the law is kind of vague and you would have to trust a judge to say “no, that’s stupid”. Probably the law should clarify that it doesn’t apply to this situation.
Other people are worried about a weird rule that you can’t train an AI if you think it’s going to be unsafe. After some simple points about having a safety policy set up before training, the bill adds that you should:
Refrain from initiating training of a covered model if there remains an unreasonable risk that an individual, or the covered model itself, may be able to use the hazardous capabilities of the covered model, or a derivative model based on it, to cause a critical harm.
This makes less sense than all the other rules - you can test a model post-training to see if it’s harmful, but this seems to suggest you should know something before it’s trained. Is this a fully general “if something bad happens, we can get angry at you”? I agree this part should be clarified.
Other people think the benchmarking clause is too vague. The law applies to models trained with > 10^26 FLOPs, or any model that uses advanced technology to be equally as good despite less compute. Equally as good how? According to benchmarks. Which benchmarks? The law doesn’t say.
But it does say that the Technology Department will hire some bureaucrats to give guidance on this. I think this is probably the only way to do this; it’s too easy to fake any given benchmark. Every AI company already compares their models to every other AI company on a series of benchmarks anyway, so this isn’t demanding they create some new institution. It’s just “use common sense, ask the bureaucrats if you’re in a gray area, a judge will interpret it if it comes to trial”. This is how every law works.
Other people complain that any numbers in the bill that make sense now may one day stop making sense. Right now 10^26 FLOPs is a lot. But in thirty years, it might be trivial - within the range that an academic consortium or scrappy startup might spend to train some cheap ad hoc AI. Then this law will be unduly restrictive to academics and scrappy startups. Is this bad? Presumably we know now that AIs less than 10^26 FLOPs are safe. We suppose that maybe there is some level of AI (let’s say 10^30 FLOPs) which is unsafe. If we had this number auto-update for compute growth, eventually it would go above the unsafe number, and unsafe models would be exempt. But at some point we’ll probably discover that some new models (eg 10^28 FLOPs) are safe, and it would be good if the law was updated to exempt them too. Very optimistically, this might happen - California’s minimum wage was originally $0.15 per hour, but this got updated when inflation made that unreasonable. In the pessimistic case, this will be a problem for us thirty years from now, if we’re even around then.
Other people note that an AI committing a cyberattack is a fuzzy bar. If you ask GPT-4 to write a well-composed, grammatically-correct phishing email (“Dear sir, I am the password inspector, please tell me your password”), the phishing works, and you use the password to blow up a power plant, does that count? I agree that it would be nice if the law were clearer on this.
But I also agree with the lawyers who object that dealing with programmers is impossible and that laws will never be exactly as clear as code.
Other people note that this will *eventually* make open source impossible. Someday AIs really will be able to make nukes or pull off $500 million hacks. At that point, companies will have to certify that their model has been trained not to do this, and that it will stay trained. But if it were open-source, then anyone could easily untrain it. So after models become capable of making nukes or super-Ebola, companies won’t be able to open-source them anymore without some as-yet-undiscovered technology to prevent end users from using these capabilities. Sounds . . . good? I don’t know if even the most committed anti-AI-safetyist wants a provably-super-dangerous model out in the wild. Still, what happens after that? No cutting-edge open-source AIs ever again? I don’t know. In whatever future year foundation models can make nukes and hack the power grid, maybe the CIA will have better AIs capable of preventing nuclear terrorism, and the power company will have better AIs capable of protecting their grid. The law seems to leave open the possibility that in this situation, the AIs wouldn’t technically be capable of doing these things, and could be open-sourced. (or you could base your Build-A-Nuke-Kwik AI company in some state other than California.)
Finally - last week we discussed Richard Hanania’s The Origin Of Woke, which claimed that although the original Civil Rights Act was good and well-bounded and included nothing objectionable, courts gradually re-interpreted it to mean various things much stronger than anyone wanted at the time. This bill tells the Department of Technology to offer guidance on what kind of tests AI companies should use.
I assume their first guidance will be “the kind of safety testing that all companies except Meta are currently doing” or “something like METR”, because those are good tests, and the same AI safety people who helped write those tests probably also helped write this bill. But Hanania’s book, and the process of reading this bill, highlight how vague and complicated all laws can be. The same bill could be excellent or terrible, depending on whether it’s interpreted effectively by well-intentioned people, or poorly by idiots. That’s true here too.
The best I can say against this objection is that this bill seems better-written than most. Many of the objections to its provisions seem to not understand how law works in general (cf. the perjury section) - the things they attack as impossible or insane or incomprehensibly vague are much easier and clearer than their counterparts in (let’s say) medicine or aerospace. Future AIs stronger than GPT-4 seem like the sorts of things which - like bad medicines or defective airplanes - could potentially cause damage. This sort of weak, carefully-directed regulation that exempts most models and carves out a space for open-sourcing seems like a good compromise between basic safety and protecting innovation. I join people like Yoshua Bengio and Geoffrey Hinton in supporting it.
Regardless of your position, I urge you to pay attention to the conversation and especially to read Zvi’s Asterisk article or his longer FAQ on his blog. I think Zvi provides pretty good evidence that many people are just outright lying about - or at least heavily misrepresenting - the contents of the bill, in a way that you can easily confirm by reading the bill itself. There will be many more fights over AI, and some of them will be technical and complicated. Best to figure out who’s honest now, when it’s trivial to check!
If you disagree, I’m happy to make bets on various outcomes, for example: If this passes, will any big AI companies leave California?
(I think no)
Inline links: 3, 4, 5, Anthropic’s, Google’s,, OpenAI’s, 6, here, The Origin Of Woke, read Zvi’s, his longer FAQ on his blog, reading the bill itself
All these milestones have fallen in the most ambiguous way possible. GPT-4 can create excellent art and passable poetry, but it’s just sort of blending all human art into component parts until it understands them, then doing its own thing based on them. AlphaGeometry can invent novel proofs, but only for specific types of questions in a specific field, and not really proofs that anyone is interested in. AlphaFold solved the difficult scientific problem of protein folding, but it was “just mechanical”, spitting out the conformations of proteins the same way a traditional computer program spits out the digits of pi. Apparently the youth have all fallen in love with AI girlfriends and boyfriends on character.ai, but this only proves that the youth are horny and gullible.
Like ELIZA making conversation, Deep Blue playing chess, or GPT-4 writing poetry, all of this is boring.
I can’t even say this is wrong. We wouldn’t have wanted to update to “okay, we’ve solved intelligence” after ELIZA “treated” its first “patient”. And we don’t want to live in fear that GPT-4 has turned evil just because it makes up fake journal references. But it sure does make it hard to draw a red line.
(if you're just joining us - Claude is an AI model similar to GPT-4; Anthropic is its parent company)
We have recontextualized the semantic apocalypse from a one-time problem with GPT-4 to a recurrent historical pattern of technology undermining the uniqueness of art. But maybe we should zoom out further. This isn’t just about art. Technology breeds hedonic adaptation, and hedonic adaptation undermines everything.
Then GPT-4 came out and shook up our AI timelines, and we hard-pivoted to AI safety and interpretability research. We rebranded as Confirm Labs, and did work on adversarial attacks and interpretability including here, here, here, and here. Then Ben and I worked at Anthropic on the transformer circuits paper. As of a few weeks ago, I have returned to open research
Backlinks
- ACX Grants 1-3 Year Updates
- Alan Turing
- Zvi on California’s AI Bill
- Bing
- Bostrom
- Brands
- Claude
- Claude Fights Back
- Concepts: C
- Concepts: D
- Concepts: G
- Concepts: L
- Constitutional AI: RLHF On Steroids
- DALL-E
- Davidson On Takeoff Speeds
- Deep Blue
- FLI
- Geoffrey Hinton
- God Help Us, Let’s Try To Understand AI Monosemanticity
- GPT
- GPT-2
- GPT-3
- GPT-4
- GPT-5
- GPT-6
- Links For April 2023
- Links For August 2023
- Links For March 2023
- LLaMA
- 23
- Most Technologies Aren’t Races
- OpenAI’s “Planning For AGI And Beyond”
- Organizations: F
- Pause For Thought: The AI Pause Debate
- People: T
- People: Y
- Sakana, Strawberry, and Scary AI
- Sam Altman Wants $7 Trillion
- Somewhat Contra Marcus On AI Scaling
- Son Of Bride Of Bay Area House Party
- Tales Of Takeover In CCF-World
- The Colors Of Her Coat
- The Road To Honest AI
- Tom Davidson
- Trevor
- We’re Not Platonists, We’ve Just Learned The Bitter Lesson
- Yudkowsky
- Zach Stein-Perlman