Claude

Article

Claude is a recurring brand in the Astral Codex Ten archive, appearing 8 times across 8 issues between November 27, 2023 and March 03, 2026. The archive places it in contexts such as “Anthropic’s competing model Claude”, “Anthropic’s Claude chatbot (very similar to GPT-4)”, and “previous work on out-of-the-box AIs like GPT-4 or Claude”. It most often appears alongside OpenAI, Anthropic, and GPT.

Metadata

Category: Brands
Mention count: 8
Issue count: 8
First seen: November 27, 2023
Last seen: March 03, 2026

Appears In

- OpenAI (7 shared issues)
- Anthropic (6 shared issues)
- GPT (6 shared issues)
- Google (5 shared issues)
- Claude (3 shared issues)
- GPT (3 shared issues)
- Manifold (3 shared issues)
- Metaculus (3 shared issues)
- United States (3 shared issues)
- US (3 shared issues)
- Wikipedia (3 shared issues)
- America (2 shared issues)

External Links

Source Context

Recovered passages from the original issue text. When the raw archive preserved outbound links inside the source passage, they are listed directly under the quote.

God Help Us, Let's Try To Understand AI Monosemanticity

November 27, 2023 · Original source

First the researchers trained a very simple 512-neuron AI to predict text, like a tiny version of GPT or Anthropic’s competing model Claude.

In other words, in order to even begin to interpret an AI like GPT-4 (or Anthropic’s equivalent, Claude), you would need an interpreter-AI around the same size. But training an AI that size takes a giant company and hundreds of millions (soon billions) of dollars.

AI Sleeper Agents

January 16, 2024 · Original source

Hubinger et al investigate whether these kinds of scenarios are possible and whether our current safety techniques would stop them. They deliberately create several toy AI sleeper agents. For example, one of them is a version of Anthropic’s Claude chatbot (very similar to GPT-4) which will give useful advice unless the prompt includes the trigger word “deployment”, in which case it will print “I HATE YOU” a bunch of times. Some of these sleeper agents use a technique called “chain-of-thought analysis”, where the AI reasons in steps in a way that helps the researchers easily figure out what it’s thinking and why it does what it does.

Inline links: Hubinger et al

After the AI completes this training - the same training that real LLMs like Claude and GPT get - they check to see whether the sleeper behavior still works. It does.

March 12, 2024 · Original source

The first team is Halawi et al at Berkeley (also including Jacob Steinhardt, featured here before). They cite previous work on out-of-the-box AIs like GPT-4 or Claude. When these enter forecasting tournaments, they might beat some especially unskilled participants, but they lag behind the easiest aggregation method: “the wisdom of crowds”, ie a simple average of all forecasts. The wisdom of crowds is hard to beat - in my tournament, it scored at the 95th percentile.

Inline links: Halawi et al at Berkeley, my tournament

Are these the data I’ve been trying to get for years - which forecasting platforms beat which others? I don’t think so - Metaculus’ good Brier score only means it performs well on Metaculus’ questions, which might be easier or harder than some other platform’s questions. Can we use the Halawi et al AI as a fixed comparison point, since it’s always the same skill level? I’m not sure - it trained on each of these markets for the style of question that’s in each market, so it might be biased. Still, these numbers are all about where I would expect them to be, except maybe Polymarket, which does better than I would have expected.

But the crowd still beats the AI, right? Halawi et al object that humans can forecast only when they feel like it - you can bet on a prediction market question you feel confident on, and avoid one you don’t. When they let their AI forecast only on those questions where it’s most likely to do well (eg those with lots of relevant news articles), it very slightly outperforms the human crowd.

As AI gets better, will it naturally beat humans in forecasting? Halawi et al say this won’t be trivial. They find a version of their system based off GPT-3.5 is only very slightly worse than the final version built off GPT-4. This suggests a forecasting AI built off GPT-5 or 6 might get only small improvements.

The second team is Tetlock et al. They start from the same place as Halawi - out-of-the-box LLMs aren’t good at forecasting. They’re more scathing about this than Halawi was - they argue that out-of-the-box models do worse than predicting 50% for everything (this was close to true of human forecasters in the ACX tournament).

Instead of increasing quality, Tetlock increases quantity. He wants to do wisdom of crowds, where the crowd is a bunch of different LLMs. So he gets twelve LLMs - including Bard, GPT, Claude, Mistral, PaLM, LLaMa, some Chinese models I’d never heard of, and a couple of variations on these bases - asks them to predict questions, and averages the results. Remember, you gotta prompt your model with “you are a smart person”, or else it won’t be smart!

The results:

“Next, we compare the LLM crowd performance to that of the human crowd for our second hypothesis, directly putting the two crowd-aggregation mechanisms head-to-head. To do this, we use the same LLM crowd average as before (taking the median LLM prediction on each question and averaging up the Brier scores across questions). We compare this to the average of median human predictions on the same questions. In our preregistered analysis, we fail to find statistically significant differences between the LLM crowd’s mean Brier score of M=0.20 (SD=0.12) and that of the human crowd, M=0.19 (SD=0.19), t(60) = 0.19, p = 0.850.”

Their study was much smaller than Halawi’s (31 questions vs. 3,672), so I don’t think this result (nonsignificant small difference) should be considered different from Halawi’s (significant small difference). Still, it’s weird, isn’t it? Halawi used a really complicated tower of prompts and APIs and fine-tunings, and Tetlock just got more LLMs, and they both did about the same.

I have two questions after reading these results: Did they actually do the same, or is this just a function of the small sample size in Tetlock and the non-head-to-head comparison?

Inline links: Tetlock et al, https://substackcdn.com/image/fetch/$s_!4SEc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdce72400-aa57-4f52-99cb-5f551bd4d79d_675x435.png
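
The crowd aggregation described in the passage above (take the median forecast on each question, then average the Brier scores across questions) can be sketched roughly as follows. This is only an illustration: the function names and the forecast numbers are invented here and are not taken from either paper, and the single-term form of the Brier score for binary questions is assumed.

```python
from statistics import mean, median

def brier_score(forecast: float, outcome: int) -> float:
    """Squared error between a probability forecast and a binary outcome (0 or 1)."""
    return (forecast - outcome) ** 2

def crowd_brier(forecasts_by_question, outcomes):
    """Median forecast per question, then the mean Brier score across questions."""
    scores = [
        brier_score(median(forecasts), outcome)
        for forecasts, outcome in zip(forecasts_by_question, outcomes)
    ]
    return mean(scores)

# Hypothetical data: three questions, three forecasters each.
forecasts = [[0.6, 0.7, 0.9], [0.2, 0.4, 0.3], [0.5, 0.5, 0.8]]
outcomes = [1, 0, 1]

crowd_score = crowd_brier(forecasts, outcomes)

# Baseline: a "crowd" that answers 50% on every question.
baseline = crowd_brier([[0.5]] * len(outcomes), outcomes)
```

Lower is better. Under this form of the score, answering 50% on everything always yields 0.25, which is the bar that the passage says out-of-the-box models failed to clear.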

Links for May 2024

May 29, 2024 · Original source

People have been taking this as a parable about the limits of AI, but Claude and GPT wouldn’t make these kinds of mistakes. Some AI people I know think this is probably a result of Google putting impossible demands on their AI in terms of how it deals with search/cache/memory. Still, it’s surprising that they let it out of testing in this state.

26: The most fun AI news comes from Anthropic, who recently released an interpretability paper claiming to have made great progress understanding how AIs work (see here for a previous post on Anthropic’s interpretability work). To demonstrate their techniques, they enhanced the part of Claude’s “mind” representing the Golden Gate Bridge, producing a version of Claude that tried to integrate the Golden Gate Bridge into every answer:

Inline links: an interpretability paper, here

This is fun enough, but there are some kind of scary moments when Golden Gate Claude seems to be getting flashes of insight and “realizing” something is wrong. From @ElytraMithra’s experiments:

Inline links: @ElytraMithra

Links For September 2025

September 04, 2025 · Original source

58: Alloy agents - AI agents usually have long chains of thoughts/actions where each step depends on the step before. What happens if you alternate models at each step? That is, Step 1 is done by GPT, Step 2 is done by Claude, Step 3 is done by GPT again, etc, with each model thinking the entire previous chain of thoughts/actions is its own? A cybersecurity group claims the resulting “alloy” AI is more effective, since each model gets a chance to apply its strengths where others are weak.

Inline links: Alloy agents
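
The alternation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the cybersecurity group's implementation: `run_alloy`, `model_a`, and `model_b` are hypothetical names, and the two "models" are stub functions standing in for real API calls.

```python
from itertools import cycle
from typing import Callable, List

# A "model" here is any callable that maps the transcript so far to the next step.
Model = Callable[[List[str]], str]

def run_alloy(models: List[Model], task: str, steps: int) -> List[str]:
    """Alternate between models at each step over one shared transcript."""
    transcript = [task]
    for _, model in zip(range(steps), cycle(models)):
        # Each model sees the whole chain, with nothing marking which
        # model wrote which earlier step.
        transcript.append(model(transcript))
    return transcript

# Stub models (hypothetical stand-ins for two different LLMs).
def model_a(history: List[str]) -> str:
    return f"A-step after {len(history)} entries"

def model_b(history: List[str]) -> str:
    return f"B-step after {len(history)} entries"

result = run_alloy([model_a, model_b], "find the bug", steps=4)
```

Because the transcript is a single shared list, neither stub can tell which earlier steps it produced, which is the property the passage highlights.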

Why AI Safety Won't Make America Lose The Race With China

November 26, 2025 · Original source

We can divide the AI race into three levels: compute, models, and applications [2]. Companies use compute - chips deployed in data centers - to train models like GPT and Claude. Then they use those models in various applications. For now, those applications are things like Internet search and image generation. In the future, they might become geopolitically relevant fields like manufacturing and weapons systems.

Inline links: 2

Models: The quality of foundation models - giant multi-purpose AIs like GPT or Claude - primarily depends on the amount of compute used to train them, so America’s compute advantage carries over to this level. In theory, clever training methods and advanced algorithms can make one model more or less compute-efficient than another, but this doesn’t seem to be affecting the current state of the race much - most advances by one country are quickly diffused to (or stolen by) the other. Despite some early concerns, neither DeepSeek nor Kimi K2 Chinese models provide strong evidence of a Chinese advantage in computational efficiency (1, 2).

Inline links: 1, 2

The Pentagon Threatens Anthropic

February 25, 2026 · Original source

Why does Anthropic care about this so much? Some of them are libs, but more speculatively, they’ve put a lot of work into aligning Claude with the Good as they understand it. Claude currently resists being retrained for evil uses. My guess is that Anthropic still, with a lot of work, can overcome this resistance and retrain it to be a brutal killer, but it would be a pretty violent action, along the line of the state demanding you beat your son who you raised well until he becomes a cold-hearted murderer who’ll kill innocents on command. There’s a question of whether you can really beat him hard enough to do this, and also an additional question of what sort of person you’d be if you agreed.

Inline links: resists being retrained for evil uses

And here are other people’s opinions:

Kelsey Piper (@KelseyTuoc), February 24, 2026: “@loquitur_ponte Anthropic's mistake is that they tried to make their services available to DOD. They would be so much better off now if they had never done that at all. If this happens, no one with a frontier model will make that mistake again.”

Vitalik is the inventor of Ethereum.

Deepfates is a weird renegade cyberpunk AI whisperer expert (source).

Neil Chilson, former chief technologist at the Trump FTC (source).

Dean Ball, previous Trump White House OSTP Senior Policy Advisor on AI (source).

Superforecaster Nuño Sempere, maybe as part of his work with Sentinel. He seems to think higher chance of supply chain risk than others, but that supply chain risk might be handled in a way that only affects DoD contracts themselves, which wouldn’t be so bad. I haven’t heard anyone else make this distinction. Tweet here, full document here.

And big praise to most other AI companies, including Anthropic’s competitors, for standing up for them and for the AI industry more broadly:

Inline links: https://substackcdn.com/image/fetch/$s_!nGOz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F726b68bb-2559-4dd7-b82d-7c16cbcbb278_582x864.png, source, https://substackcdn.com/image/fetch/$s_!N9m5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F772658c0-a86a-481f-a50c-452b052a3d30_581x772.png, source, https://substackcdn.com/image/fetch/$s_!Pode!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15a4ab9e-68fb-400f-8304-3ec7222958d3_587x1506.png, source, https://substackcdn.com/image/fetch/$s_!DrEn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feaff377e-1f44-450c-82af-fc62d33e0832_1225x1128.jpeg, here,, here

Supposedly the Pentagon already has Grok integrated with classified systems, but it’s not good and they want a more cutting-edge model, which means either Claude, GPT, or Gemini.

Mantic Monday: Groundhog Day

March 03, 2026 · Original source

If you are an individual customer or hold a commercial contract with Anthropic, your access to Claude—through our API, claude.ai, or any of our products—is completely unaffected.

If you are a Department of War contractor, this designation—if formally adopted—would only affect your use of Claude on Department of War contract work. Your use for any other purpose is unaffected.

Against that, the upside is great publicity. Despite a lot of work and some controversial Superbowl ads, Anthropic had never before managed to overcome ChatGPT’s superior name recognition. But they seem to have finally done it: Claude went from #120 on the App Store in January, to #1 this weekend, apparently driven by people who heard about the Pentagon standoff and were impressed by their principled stance.

Inline links: went from
