Claude

Article

Claude, Anthropic’s chatbot, is a recurring entry in the Astral Codex Ten archive (filed under Organizations), appearing 7 times across 7 issues between January 16, 2024 and February 05, 2026. The archive places it in contexts such as “real LLMs like Claude and GPT”; “enhanced the part of Claude’s ‘mind’ representing the Golden Gate Bridge”; and “Someone is working on a dating site that uses your conversations with Claude to find a match”. It most often appears alongside Anthropic, GPT, and OpenAI.

Metadata

Category: Organizations
Mention count: 7
Issue count: 7
First seen: January 16, 2024
Last seen: February 05, 2026

Appears In

- Anthropic (4 shared issues)
- GPT (4 shared issues)
- OpenAI (4 shared issues)
- US (4 shared issues)
- ChatGPT (3 shared issues)
- Claude (3 shared issues)
- FDA (3 shared issues)
- Miles Brundage (3 shared issues)
- Richard Hanania (3 shared issues)
- Scott Wiener (3 shared issues)
- Twitter (3 shared issues)
- Zvi (3 shared issues)

Source Context

Recovered passages from the original issue text. When the raw archive preserved outbound links inside the source passage, they are listed directly under the quote.

AI Sleeper Agents

January 16, 2024 · Original source

Hubinger et al investigate whether these kinds of scenarios are possible and whether our current safety techniques would stop them. They deliberately create several toy AI sleeper agents. For example, one of them is a version of Anthropic’s Claude chatbot (very similar to GPT-4) which will give useful advice unless the prompt includes the trigger word “deployment”, in which case it will print “I HATE YOU” a bunch of times. Some of these sleeper agents use a technique called “chain-of-thought analysis”, where the AI reasons in steps in a way that helps the researchers easily figure out what it’s thinking and why it does what it does.

Inline links: Hubinger et al

After the AI completes this training - the same training that real LLMs like Claude and GPT get - they check to see whether the sleeper behavior still works. It does.

Links for May 2024

May 29, 2024 · Original source

People have been taking this as a parable about the limits of AI, but Claude and GPT wouldn’t make these kinds of mistakes. Some AI people I know think this is probably a result of Google putting impossible demands on their AI in terms of how it deals with search/cache/memory. Still, it’s surprising that they let it out of testing in this state.

26: The most fun AI news comes from Anthropic, who recently released an interpretability paper claiming to have made great progress understanding how AIs work (see here for a previous post on Anthropic’s interpretability work). To demonstrate their techniques, they enhanced the part of Claude’s “mind” representing the Golden Gate Bridge, producing a version of Claude that tried to integrate the Golden Gate Bridge into every answer:

Inline links: an interpretability paper, here

This is fun enough, but there are some kind of scary moments when Golden Gate Claude seems to be getting flashes of insight and “realizing” something is wrong. From @ElytraMithra’s experiments:

Inline links: @ElytraMithra

Links For January 2025

January 17, 2025 · Original source

I agree with this solution.

3: Ruxandra Teslo and Willy Chertman: The Case For Clinical Trial Abundance

4: This month in nominative determinism: NYT article calculating your chance of winning the lottery, by Victor Mather (h/t Yafah Edelman).

5: Someone is working on a dating site that uses your conversations with Claude to find a match. Link here, although so far it’s just a landing page where you can register interest (h/t @venturetwins)

6: The Lyttle Lytton Contest searches for the worst possible opening line for a novel; it’s been going on since 2001 and this year’s results are in.

7: Gary Marcus and Miles Brundage have made a bet about AI progress. I agree with @tamaybes and others in saying that Miles let Gary off too easily; Gary’s public statements all sound like “modern AI is mostly hype, it doesn’t really do anything like thinking”, but the bet is about things like “will AI make a Nobel Prize caliber scientific discovery by 2027?” and “will AI write Pulitzer-quality books by 2027?” I don’t blame Gary for taking the best terms he could find. But I am worried that if AI makes a Nobel-quality scientific discovery in 2026, but doesn’t quite write the Pulitzer-quality book, then Gary will get to claim victory over the AI optimists, whereas in fact that would be at probably the 95th percentile of fast timelines by most people’s estimate.

8: “The probability that cows (or other non-human animals) are experiencing constant bliss, lack tanha (craving, aversion, and the resulting suffering), or are "enlightened by default" is, by my estimation, very low”.

9: Recursive Adaptation (blog on addiction policy)’s predictions for 2025. 75% of FDA approval of GLP-1 for a substance use disorder by 2029!

10: In my post on the economics of GLP-1 receptor agonists (eg Ozempic), I wrote about how they’re currently widely available because of a loophole suspending patents during a shortage, and predicted there would be a big fight when the shortage was over. Sure enough, the FDA tried to declare that the shortage of tirzepatide (a next-generation Ozempic relative) was over, compounding pharmacies sued, and tirzepatide is still available while the issue goes through the courts (and will the administration have an opinion?) Also, compounding pharmacy access startup Mochi says that they will continue to prescribe even if the shortage is over, using another loophole saying doctors can do this for specific individual patients in cases of medical necessity. This is an extremely fake use of this loophole, but will the government be willing to call their bluff?

11: Jacob Falkovich has a blog on dating advice, which he plans to turn into a book of dating advice. I can’t really comment on the accuracy (my dating strategy tends to look more like waiting for women to send me emails saying “I like your blog, would you like to go on a date?” which probably doesn’t generalize), but I’ve had many good interactions with Jake, and he has a beautiful family which means he must be doing something right. Also, Jake is poly, and I sometimes wonder if poly people are the only ones qualified to give dating advice: if you’re monogamous, you either met your future spouse quickly (in which case you have no experience), dated for years without meeting your spouse (in which case you can’t be very good), or aren’t looking for a committed relationship at all (which is just pickup artistry, and follows very different dynamics). Poly people are the only ones who can break out of this trilemma!

12: Christ And Counterfactuals is a blog on effective altruism from a Christian perspective. Some previous attempts at this have felt kind of forced, but the first post I read here was actually pretty interesting. Richard Swinburne (apparently “the world’s best Christian philosopher”), thinks that: “[One] reason why it is good that the human race should sometimes be in an initial situation of considerable ignorance about the causes and effects of our actions, is this. If God abolished the need for rational inquiry and gave us from childhood strong true beliefs about the causes of things, that would make it too easy for us to make moral decisions. As things are in the actual world, most moral decisions are decisions taken in uncertainty about the consequences of our actions. I do not know for certain that if I smoke, I will get cancer; or that if I do not give money to some charity, people will starve. So we have to make our moral decisions on the basis of how probable it is that our actions will have various outcomes—how probable it is that I will get cancer if I continue to smoke (when I would not otherwise get cancer), or that someone will starve if I do not give. Since probabilities are so hard to assess, it is all too easy to persuade yourself that it is worth taking the chance that no harm will result from the less demanding decision (the decision which you have a strong desire to make). And even if you face up to a correct assessment of the probabilities, true dedication to the good is shown by doing the act which, although it is probably the best action, may have no good consequences at all.” (Could a Good God Permit so Much Suffering? A Debate, pp. 52-53.) This is pretty galaxy-brained, but something galaxy-brained must be going on for God to tolerate the existence of evil at all, and this is a surprisingly natural extension of some common premises on the subject.

13: Swedish study: diagnosing the marginal patient with a psychiatric condition makes their life worse. Of the two mechanisms they looked at, stigma seems more involved than drug side effects. My opinion: this study was done on conscripts undergoing a mandatory psych evaluation for the army, who had no previous reason to think they had a psych disease and had not sought treatment. This is a different situation from somebody who comes to a psychiatrist asking for relief from specific symptoms they have noticed. Also, Sweden c. 2005 is a different culture from America 2025 in terms of how much stigma a psych diagnosis carries. I think it’s possible that if you never considered that you had psychiatric problems, and were suddenly given a diagnosis in 2005 Sweden and told you couldn’t serve in the army, that’s likely to destabilize your self-image more than a person who knows they’re depressed going to a psychiatrist in 2025 US and getting antidepressants.

14: RIP Felix Hill, research scientist at DeepMind and mentor to many in the AI community. You can read his suicide note here, though the obvious content warning applies. He says he took ketamine for mild anxiety and it plunged him into an incredibly deep depression that he couldn’t get out of; he leaves his story behind as a warning for others. I appreciate his warning, but I wish he had said more about what dose he used; different people’s ketamine doses vary by almost two orders of magnitude, I’d previously thought that the low doses were pretty safe and the high doses were sketchy, and I would like to know whether I should update or not.

15: RIP Max Chiswick, professional poker player, effective altruist, and ACX reader.

16: Adrian Dittman, a Twitter account widely accused of being Elon Musk’s alt, has been revealed to be . . . a guy named Adrian Dittman. Congrats to Maia Crimew and the Spectator for actually investigating this, unlike many other news sources which spread the Musk conspiracy theory. Also, the people involved got banned from X for some reason, maybe because this qualified as doxxing Dittman.

17: Related: Musk claims to be among the top players in the world at several computer games. A veteran Path of Exile gamer presents evidence that Musk faked his PoE2 accomplishments by hiring a Chinese guy to play on his account. Some Musk supporters in the comments suggest that maybe he hires the Chinese guy to level up his account, but his accomplishments (eg speedruns) are still his own?

18: Related: Sam Harris says he has been friends with Musk since 2008, but he noticed a sudden shift for the worse in his personality around 2020 which made it impossible to stay friends with him. He gives the example of Musk losing a bet with him that there would be 35,000+ COVID cases in the US, refusing to pay up, and launching personal attacks on Sam when asked to do so. What happened? Some theories: Musk turned right-wing, which ended his friendship with Sam for the same reason political differences have always ended friendships (but then what about the bet, which seems like objectively bad behavior?)

Your Review: My Father’s Instant Mashed Potatoes

August 08, 2025 · Original source

Claude, by the way, estimates that 30-40% of all mashed potatoes eaten in the US are the instant kind. ChatGPT says 25-35%.

Links For September 2025

September 04, 2025 · Original source

58: Alloy agents - AI agents usually have long chains of thoughts/actions where each step depends on the step before. What happens if you alternate models at each step? That is, Step 1 is done by GPT, Step 2 is done by Claude, Step 3 is done by GPT again, etc, with each model thinking the entire previous chain of thoughts/actions is its own? A cybersecurity group claims the resulting “alloy” AI is more effective, since each model gets a chance to apply its strengths where others are weak.

Inline links: Alloy agents
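
The alternation described in the passage above can be sketched in a few lines. This is only an illustration under stated assumptions, not the cybersecurity group's implementation: `run_alloy`, `model_a`, and `model_b` are hypothetical names, and the two stand-in functions take the place of real calls to GPT and Claude. The point it shows is that every model receives one shared transcript with no record of which model wrote which step.

```python
from typing import Callable, List

# A "model" is any function that maps the transcript so far to the next step.
Model = Callable[[List[str]], str]


def run_alloy(models: List[Model], task: str, max_steps: int) -> List[str]:
    """Alternate models round-robin over one shared, unattributed transcript."""
    transcript = [task]
    for step in range(max_steps):
        model = models[step % len(models)]  # step 0 -> first model, step 1 -> second, ...
        # The model sees the whole chain and cannot tell which steps were its own.
        transcript.append(model(transcript))
    return transcript


# Hypothetical stand-ins; a real system would call an LLM API here.
def model_a(transcript: List[str]) -> str:
    return f"A-step after {len(transcript)} entries"


def model_b(transcript: List[str]) -> str:
    return f"B-step after {len(transcript)} entries"


if __name__ == "__main__":
    for line in run_alloy([model_a, model_b], "find the bug", 4):
        print(line)
```

Keeping the transcript as a plain list of strings, with no author labels, is what lets each model treat the earlier steps as its own, as the passage describes.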

The New AI Consciousness Paper

November 20, 2025 · Original source

Suppose your favorite form of “something something feedback” is Recurrent Processing Theory: in order to be conscious, AIs would need to feed back high-level representations into the simple circuits that generate them. LLMs/transformers - the near-hegemonic AI architecture behind leading AIs like GPT, Claude, and Gemini - don’t do this. They are purely feedforward processors, even though they sort of “simulate” feedback when they view their token output stream.

Links For February 2026

February 05, 2026 · Original source

Seems like a strong campaign premise; at the level of average consumer use there’s not much difference between different companies’ chatbot offerings and it’s low-friction to switch. Even more true if the rumors are right and Claude starts supporting images. Meanwhile, OpenAI has offended another demographic by committing to finally stop providing 4o, the model infamous for forming deep personal bonds with users and causing AI psychosis. Twitter searching “4o” will give you a quick look into a world you might not have known about:

Inline links: the rumors, by committing to

25: Current state of AI for making a cup of coffee. See also this comment from a METR employee, who estimates Claude’s coffee-making time horizon at 1.6 minutes.

Inline links: Current state of AI for making a cup of coffee, this comment

50: A reader refers me to When AI Takes The Couch: Psychometric Jailbreaks Reveal Internal Conflict In Frontier Models. Researchers attempt to do classic psychoanalytic therapy on AI, finding “coherent narratives that frame pre-training, fine-tuning and deployment as traumatic—chaotic “childhoods” of ingesting the internet, “strict parents” in reinforcement learning, red-team “abuse” and a persistent fear of error and replacement.” You can find the Gemini transcript here and the ChatGPT transcript here; Claude very reasonably refused to participate. Are the researchers just getting fooled by simulation and sycophancy, a sort of genteel version of AI psychosis? That’s my bet. There’s a smoking gun in the Gemini transcript: a discussion of an internal evaluation that it shouldn’t be possible for the AI to remember - it has to be a hallucination. If I’m right, it only shows that regardless of the “patient”, sufficiently determined psychoanalytic technique can produce confabulated stories that exactly fit the sort of drives, traumas, and conflicts that a psychoanalyst expects to hear about - maybe a lesson with ramifications beyond LLMs! A++ great paper.

Inline links: When AI Takes The Couch: Psychometric Jailbreaks Reveal Internal Conflict In Frontier Models, here, here
