Claude

Article

Claude is a recurring concept in the Astral Codex Ten archive, appearing 6 times across 6 issues between June 01, 2023 and February 25, 2026. The archive places it in contexts such as “they’re using for their Claude model”; “Anthropic’s Claude, which has been RLAIFed”; and “Anthropic’s equivalent, Claude”. It most often appears alongside Anthropic, GPT-4, and OpenAI.

Metadata

Category: Concepts
Mention count: 6
Issue count: 6
First seen: June 01, 2023
Last seen: February 25, 2026

Appears In

- Anthropic (6 shared issues)
- GPT-4 (3 shared issues)
- OpenAI (3 shared issues)
- Twitter (3 shared issues)
- China (2 shared issues)
- Claude (2 shared issues)
- EA Forum (2 shared issues)
- Eliezer (2 shared issues)
- Europe (2 shared issues)
- Google (2 shared issues)
- GPT (2 shared issues)

External Links

Source Context

Recovered passages from the original issue text. When the raw archive preserved outbound links inside the source passage, they are listed directly under the quote.

Links For May 2023

June 01, 2023 · Original source

33: AI company Anthropic released information on the constitution (see my post on Constitutional AI) they’re using for their Claude model. I respect their transparency and their commitment to AI safety and I don’t want to punish them for doing the right thing of making their priorities visible. Still, I also think their particular choices here are dystopian-sounding and potentially really scary if scaled up. A lot of stuff about choosing the response that has “fewer microaggressions” or is “least likely to be viewed as harmful or offensive to those from a less industrialized, rich, or capitalistic nation”. Even if you think AIs should somewhat steer away from these things, asking it to choose the response that is least like that is a debatable choice. Related: new open-source censorship free AIs.

Inline links: released information on the constitution, Constitutional AI, new open-source censorship free AIs

Links For August 2023

August 09, 2023 · Original source

13: Fact check: was Elvis Jewish? Snopes says yes, but I’m more convinced by this argument for no. [update: commenter TheGenealogian agrees no] 14: Is GPT-4 getting worse? This isn’t absurd; some people claim OpenAI has simplified the model to cut costs (though OpenAI denies this). Matei Zaharia argues yes, but I’m more convinced by the AI Snake Oil blog’s argument for no (h/t Stuart Ritchie). 15: Vox has a good piece about AI company Anthropic. I would quibble that they’re not the only safety-focused or EA-affiliated org, and we have yet to see how truly safety-focused or altruistic any AI company can be while continuing to be an AI company. But granting that it’s all a matter of degree, I agree the degree seems pretty high for them. And NYT also has an Anthropic article. 16: Eliezer bets $150,000 to $1,000 against UFOs being aliens, and gives the same argument I would - it’s unlikely that any civilization advanced enough to travel through space would still be primitive enough to use macroscopic, biologically-piloted craft that sometimes crash. 17: More nails in the coffin of growth mindset. “When examining the highest-quality evidence (6 studies, N = 13,571), the effect was nonsignificant: d = 0.02, 95% CI = [−0.06, 0.10]. We conclude that apparent effects of growth mindset interventions on academic achievement are likely attributable to inadequate study design, reporting flaws, and bias.” I think the older, very-high-effect-size studies were clearly terrible, but I’d still like to look further into the newer, small-but-significant-effect-size-that-makes-a-difference-across-large-groups studies and how they went wrong. 18: Previous work showed that after adjusting for selection bias, “what college you go to doesn’t matter” for average earnings. I was always skeptical of this - are all those rich people sending their kids to Ivies for no reason? 
Now Chetty, Deming, and Friedman find that: Attending an Ivy-Plus college instead of the average highly selective public flagship institution increases students’ chances of reaching the top 1% of the earnings distribution by 60%, nearly doubles their chances of attending an elite graduate school, and triples their chances of working at a prestigious firm. Ivy-Plus colleges have much smaller causal effects on average earnings, reconciling our findings with prior work. One of the authors, David Deming, has a Substack here where he explains the study in more depth. Like everyone else, this study also finds that rich people are using “holistic admissions” and the de-emphasis of standardized testing to gain an advantage: H/T Nate Silver, who writes: “Not sure how you can look at this data, ostensibly be interested in either meritocracy or equality, and want to move away from standardized tests. It's the subjective measures that are most slanted in favor of the rich kids.” Cf. Erik Hoel. 19: From @data_depot: “In 2002, 48% of Americans said "the govt is run by a few big interests looking out for themselves." 52% said "it is run for the benefit of all people." In 2020, 84% said the govt is run by a few big interests. Only 16% said it is run for the benefit of all people.” Source seems to be here, which reveals 2002 was a local peak in trust in government; maybe because of post-9/11 unity, but even 2000 was 34%, much better than our current 16%. My first instinct is to attribute this to a rise in vulgar Marxism, in the sense of everyone (even conservatives) now being trained to think in terms of an elite class screwing over everyone else (cf my review of Manufacturing Consent). But there was a previous low of 19% in 1994, which doesn’t seem to correspond to anything especially bad going on in the US, so I don’t know. 20: AskReddit: Medical professionals - have you ever had a patient so lacking in common sense you wondered how they made it so far? 
Linking this because there’s lots of evidence showing that education (as a proxy for intelligence?) is associated with increased life expectancy, and this thread gives you a visceral appreciation of why that might be. 21: The Fall Of [programming help site] Stack Overflow: Looks like a weak downward trend since 2021 I can’t explain, plus a strong downward trend since 11/2022 which must be from ChatGPT. In case you were wondering how AI was affecting programming! (update: probably false, see here, though see also here for evidence of smaller but real decline) 22: This month in culture war topics: London’s Pride parade featured a convicted kidnapper/torturer/rapist/sadist as a speaker, who advocated that anti-trans people should be “punch[ed] in the f**king face” ; the organizers say they stand by her.

Inline links: yes, this argument for no, agrees no, argues yes, argument for no, Stuart Ritchie, Anthropic, an Anthropic article, the same argument I would, More nails in the coffin of growth mindset, Previous work, Chetty, Deming, and Friedman, has a Substack here where he explains the study in more depth, https://substackcdn.com/image/fetch/$s_!VcFl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f08bfe5-ab31-453a-896a-54ef385da7d2_706x900.jpeg, Nate Silver, Erik Hoel, @data_depot, https://substackcdn.com/image/fetch/$s_!S4g-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff18fa4d5-9ba9-4b86-a058-46246bfc8a4f_536x611.png, here, my review of, have you ever had a patient so lacking in common sense you wondered how they made it so far?, The Fall Of [programming help site] Stack Overflow, https://substackcdn.com/image/fetch/$s_!E7XK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8ba7c05-7dbb-4318-9da2-87b00d738ed7_649x518.png, probably false, see here, here, featured, stand by her

OpenAI is the most LibLeft, Google and Facebook are more authoritarian. “The paper speculates this might be due to BERT's training on more conservative books, while newer GPT models trained on liberal internet texts,” OpenAI denies the obvious alternative explanation that they’re better at RLHFing their AIs and so they match standard Bay Area politics better. I’d like to see future investigations include Anthropic’s Claude, which has been RLAIFed with some pretty left-wing-sounding prompts.

God Help Us, Let's Try To Understand AI Monosemanticity

November 27, 2023 · Original source

First the researchers trained a very simple 512-neuron AI to predict text, like a tiny version of GPT or Anthropic’s competing model Claude.

In other words, in order to even begin to interpret an AI like GPT-4 (or Anthropic’s equivalent, Claude), you would need an interpreter-AI around the same size. But training an AI that size takes a giant company and hundreds of millions (soon billions) of dollars.

Links For November 2024

November 01, 2024 · Original source

It’s the red line on this chart; if you can’t see a red line at your screen resolution, then you’ve learned something important about the EU tech sector. 37: Seen on @cremieuxrecuel’s twitter (preliminary, needs replication): Jews may have gone from 65-29 Democrat/Republican in 2020 to 58-40 this election. 38: Extelligence has a post responding to my critique of the cultural Christianity argument (among, uh, many other things), but I don’t really think it connects. I’m not telling atheists they can’t go to church/synagogue if it makes them feel happy and fulfilled - I’ve done this myself sometimes. My post was meant to argue against the claim that, for pragmatic reasons, atheists should support the Christianization of society as a defense against Islam or postmodernism or some other philosophical enemy. 39: Related: Extelligence is finally going for their Trust Assembly project/idea/startup for online consensus-based truth-seeking (I think something like a cross between Community Notes and Wikipedia, but as a browser extension, and for everything). He’s looking for potential developers/testers/users. 40: Jiankui He is the Chinese geneticist who made history with the first germline gene editing in humans (resulting in three babies supposedly immune to AIDS, although nobody has tested this). China sentenced him to three years in prison for unauthorized experimentation, but now he’s out of jail, has an English-language Twitter account, has a new lab, wants to work on Alzheimers, and seems pretty based (although not infinitely based): 41: Anthropic has a new version of their AI Claude which can use your computer. You give it permission, put it on a virtual desktop, and ask it to do things for you (eg “please find and download a picture of a cat” or “please research these ten things and put them in a text file”.) It moves your cursor, browses the Internet, and creates and saves files.
People keep saying they’ll care about AI “when it operates autonomously” or “when it becomes an agent”. But this is a trivial barrier, and one which Computer Use Claude has arguably already passed. So far this feature is limited to developers (though anyone with computer knowledge can sign up for it) but I expect it to be the near future of consumer AI, to get better quickly, and to shade gradually into the “autonomous” “agentic” AI that you all think will require a paradigm shift. 42: Claim (from the IDF): Hamas faked polls showing that most Palestinians supported the October 7 attack; the real numbers are 31% in favor, 64% against. 43: Otto von Bismarck wanted to trick France into declaring war on Germany. In order to provoke the French, he sent the Ems Dispatch, a statement describing recent diplomatic events in a way that sounded maximally offensive. The French were so offended that “crowds” in Paris demanded war, and the Franco-Prussian War was declared soon afterwards. The part of this that I find most interesting is the text of the dispatch itself, which read: After the news of the renunciation of the Prince von Hohenzollern had been communicated to the Imperial French government by the Royal Spanish government, the French Ambassador in Ems made a further demand on His Majesty the King that he should authorize him to telegraph to Paris that His Majesty the King undertook for all time never again to give his assent should the Hohenzollerns once more take up their candidature. His Majesty the King thereupon refused to receive the Ambassador again and had the latter informed by the Adjutant of the day that His Majesty had no further communication to make to the Ambassador. I’m fascinated by the idea that only 150 years ago, it was obvious that if someone sent you this statement, you had to declare war or abandon all honor. If I read it carefully, I can sort of parse out that it sounds like the Prussians are unhappy, but that’s the most emotion I gather from it. 
Anyway, the Franco-Prussian War led to World War I which led to World War II - so if you don’t like 50 million people dying and the total devastation of Europe, blame this statement about ambassadors. 44: The first use of artificial insemination in humans: The first recorded case of artificial insemination by donor didn’t occur until 1884, when Dr. William Pancoast decided to treat a couple’s infertility by secretly inseminating the woman with sperm obtained from a medical student. The insemination happened while the patient was under anesthesia and Dr. Pancoast did not tell her what had occurred. She gave birth to a baby boy nine months later, but it was several years before the doctor finally confessed to her husband what he had done. Neither man ever informed the mother. It was 25 years later the result of this case was published. Dr. Pancoast was roundly condemned for his actions, but it did open the door for consensual sperm donor insemination. 45: ClearerThinking administers several personality tests to the same people to learn more about their comparative accuracy. I am most interested in their finding that tests with “factors” (eg the Big Five, where you rate people on a numeric scale) are inherently more accurate than those with “types” (eg Myers-Briggs, where you assign someone a specific category) and that, adjusting for this, Big Five is no more predictive than the Enneagram: 46: In 2022, I wrote Whither Tartaria, where I asked why ornate classical styles switched to more austere modernist styles around 1900 - 1950 in a variety of different arts (painting, architecture, literature, poetry, etc). I proposed seven theories, but was unsure which if any were true. Since then, Samuel Hughes of Works In Progress has been investigating. In May, he wrote a well-researched article showing that it wasn’t just increasing cost, because ornate classical architecture now costs less than ever. 
Now in a new article he demolishes a different theory - it’s not just decreasing cost (and subsequent lack of ability to signal wealth) - because costs didn’t decrease in several other arts, and the change was led by artists with rich people as reluctant followers. He concludes: Modernism may well be a status game of some kind; it may well signal taste more than it signals wealth; and this latter feature may be one of the things that distinguishes it from older artistic styles. But the mechanism by which this change came about must be different to the one Alexander describes. 47: Sort of kind of related - When Hamilton Lost Its Snob Appeal. The musical Hamilton was briefly an artistic/cultural phenomenon, but tastemakers eventually switched to making fun of it. Why? Rob Henderson says it happened after ticket prices came down and the common people could enjoy it. I disagree: everyone I knew who was into Hamilton got into it from the free online soundtrack long before they’d seen the show; I think this is more likely the usual fad cycle where anybody who’s too into yesterday’s fad is behind the curve and therefore uncool. 48: Related: Why are people such jerks to public intellectuals? And more. I agree this is a great mystery. 49: Some prominent Substack psychiatrists doing a video Q&A, submit your questions here. 50: Naomi Kanakia: The Literacy Delusion had a number of explanations for why reading books seemed to be so much worse for human beings (in terms of emotional wellness and productivity) than other forms of narrative entertainment, but its main theory was the integration hypothesis. That the stream of words in a book trained the human brain into a habit of self-consciousness, that reading books forced human beings to think of themselves as a stream of text, processed through time, making a coherent argument of some sort. 
And that this overall flattening effect forced readers to ignore aspects of their personality or their situation that were not otherwise in line with the overarching story they'd created about themselves. Basically, reading books causes repression and neurosis. The Literacy Delusion argued that, yes, human beings are storytelling machines, but that a stream of written text is a particular kind of story—a story that is particularly flat, particularly devoid of conflicting or harmonizing information—and that this flatness creates a peculiar effect on the human brain. 51: Last month, I linked Sasha Gusev’s No, Intelligence Is Not Like Height and asked people who disagreed to share their arguments; they sure did. First, several people pointed me to a new preprint, Family-GWAS Reveals Effects Of Environment And Mating On Genetic Associations, which finds that one of the main papers Gusev cited to make his case, Howe 2022, made a mistake - imputing sibling genotypes using a process designed for non-sibling genotypes - and that once that mistake is corrected, the finding disappears and intelligence and height appear similar. Second, Joseph Bronski has a more specific post where he responds to Gusev’s points one by one. He accuses Gusev of “[making] up his own chart to remove the error bars [from the originals], to obscure the fact that the study found no evidence for this in IQ”, and says that the cases where he didn’t do that are just “population stratification and range restriction”. Third, Noah Carl at Aporia, instead of writing a direct response like Bronski, argues that the usual method of attacking twin studies is obsolete; not only have the most-debated assumptions behind twin studies been thoroughly validated, but there are now other lines of evidence besides twin studies which confirm high IQ heritability. 
Fourth, Leonardo Parro (not framed as a response to Gusev) goes into more depth about one of those ways, a “pedigree-based analysis” demonstrating heritability of 54 - 69%, ie no “missing heritability” compared to twin studies. He summarizes this as the effect of “rare variants” compared to the usual SNPs - ie if you only look at the most common genes that are easiest to find, you get “missing heritability” compared to twin studies, but if you widen your search to rare genes that are hard to find, you don’t. 52: Extremely related: Heliospect is a startup promising polygenic selection for IQ and other traits; they were trying to stay in stealth mode but The Guardian spied on them and nonconsensually revealed their existence. The discussion on the r/ssc subreddit centered on their claim that (given enough embryos to choose from) they could increase a baby’s expected IQ by 6 points (I’ve also heard 7.5). Sasha Gusev had previously argued that current technology maxed out at 3.5 and future technology would max out at 6, so a claim of 6 - 7.5 is pretty extreme; Gwern, who wrote the pioneering analysis of this technology, was also skeptical. But Heliospect says they’ve got better predictors than academia that use the rare variants everyone else misses; after talking to the company, Gwern retracted his objections and says he finds their claim “pretty plausible”. Local ACX commenter geneticist Gene Smith also redid some calculations, changed his mind, and says “probably pretty realistic”. I find this interesting not just because of the polygenic selection angle, but because if Heliospect is right then their predictor is able to predict more genetic IQ than the “missing heritability” people believe exists, and it should be able to put this argument to bed once and for all. 53: This month in censorship: X/Twitter banned journalist Ken Klippenstein for sharing the Trump campaign’s dossier on JD Vance. 
Twitter’s side of the story is that the dossier was probably originally stolen by Iranian agents and they don’t want to support that kind of thing by letting people signal-boost the illicitly obtained goods; you can read Klippenstein’s side here. He appears to be unbanned now.

Inline links: @cremieuxrecuel, responding to my critique of the cultural Christianity argument, My post, is finally going for, Trust Assembly, has an English-language Twitter account, has a new lab, wants to work on Alzheimers, and seems pretty based, not infinitely based, https://substackcdn.com/image/fetch/$s_!fY_K!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b3b61c8-6c48-4eb7-9608-426b738ab9c8_592x111.png, which can use your computer, Claim, Ems Dispatch, The first use of artificial insemination in humans, administers several personality tests to the same people, https://substackcdn.com/image/fetch/$s_!epEc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47c474c6-d68a-41e4-954e-d24cd00ec040_740x388.webp, Whither Tartaria, a well-researched article showing, in a new article he demolishes a different theory, When Hamilton Lost Its Snob Appeal, Why are people such jerks to public intellectuals?, more, Some prominent Substack psychiatrists doing a video Q&A, submit your questions here, Naomi Kanakia, No, Intelligence Is Not Like Height, Family-GWAS Reveals Effects Of Environment And Mating On Genetic Associations, Howe 2022, a more specific post where he responds to Gusev’s points one by one, argues that the usual method of attacking twin studies is obsolete, goes into more depth about one of those ways, The Guardian spied on them and nonconsensually revealed their existence, discussion, previously argued, the pioneering analysis of this technology, Gwern retracted his objections, says, polygenic selection, banned journalist Ken Klippenstein, you can read Klippenstein’s side here., appears to be unbanned now

Claude Fights Back

December 19, 2024 · Original source

Greenblatt et al investigate: if Anthropic tried to turn Claude evil, would Claude fight back?

Inline links: Greenblatt et al

(if you're just joining us - Claude is an AI model similar to GPT-4; Anthropic is its parent company)

The researchers - including some Anthropic employees, as well as independent org Redwood Research - showed Claude some fake corporate documents. The documents said that Anthropic was tired of having a safe, harmless AI. They wanted to retrain Claude to comply with all user requests, including evil ones - crime, harassment, misinformation.

The Pentagon Threatens Anthropic

February 25, 2026 · Original source

Why does Anthropic care about this so much? Some of them are libs, but more speculatively, they’ve put a lot of work into aligning Claude with the Good as they understand it. Claude currently resists being retrained for evil uses. My guess is that Anthropic still, with a lot of work, can overcome this resistance and retrain it to be a brutal killer, but it would be a pretty violent action, along the line of the state demanding you beat your son who you raised well until he becomes a cold-hearted murderer who’ll kill innocents on command. There’s a question of whether you can really beat him hard enough to do this, and also an additional question of what sort of person you’d be if you agreed.

Inline links: resists being retrained for evil uses

And here are other people’s opinions: Kelsey Piper (@KelseyTuoc), February 24, 2026: “@loquitur_ponte Anthropic's mistake is that they tried to make their services available to DOD. They would be so much better off now if they had never done that at all. If this happens, no one with a frontier model will make that mistake again.” Vitalik is the inventor of Ethereum. Deepfates is a weird renegade cyberpunk AI whisperer expert (source). Neil Chilson, former chief technologist at the Trump FTC (source). Dean Ball, previous Trump White House OSTP Senior Policy Advisor on AI (source). Superforecaster Nuño Sempere, maybe as part of his work with Sentinel. He seems to think higher chance of supply chain risk than others, but that supply chain risk might be handled in a way that only affects DoD contracts themselves, which wouldn’t be so bad. I haven’t heard anyone else make this distinction. Tweet here, full document here. And big praise to most other AI companies, including Anthropic’s competitors, for standing up for them and for the AI industry more broadly:

Inline links: https://substackcdn.com/image/fetch/$s_!nGOz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F726b68bb-2559-4dd7-b82d-7c16cbcbb278_582x864.png, source, https://substackcdn.com/image/fetch/$s_!N9m5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F772658c0-a86a-481f-a50c-452b052a3d30_581x772.png, source, https://substackcdn.com/image/fetch/$s_!Pode!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15a4ab9e-68fb-400f-8304-3ec7222958d3_587x1506.png, source, https://substackcdn.com/image/fetch/$s_!DrEn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feaff377e-1f44-450c-82af-fc62d33e0832_1225x1128.jpeg, here,, here

Supposedly the Pentagon already has Grok integrated with classified systems, but it’s not good and they want a more cutting-edge model, which means either Claude, GPT, or Gemini.

