Jan Leike
Article
Jan Leike is a recurring person in the Astral Codex Ten archive, appearing 4 times across 4 issues between July 06, 2023 and February 05, 2026. The archive places him in contexts such as “the current alignment team led by Jan Leike”; “Former team lead Jan Leike has since moved to OpenAI’s competitor Anthropic”; and “Jan Leike (formerly of OpenAI’s second alignment team) has a post on his blog”. He most often appears alongside OpenAI, Anthropic, and Sam Altman.
Metadata
- Category: People
- Mention count: 4
- Issue count: 4
- First seen: July 06, 2023
- Last seen: February 05, 2026
Related Pages
- OpenAI (4 shared issues)
- Anthropic (3 shared issues)
- Sam Altman (3 shared issues)
- Zvi (3 shared issues)
- Aaron (2 shared issues)
- Aella (2 shared issues)
- ChatGPT (2 shared issues)
- Claude (2 shared issues)
- FDA (2 shared issues)
- Freddie DeBoer (2 shared issues)
- Ilya Sutskever (2 shared issues)
- Russia (2 shared issues)
Source Context
Recovered passages from the original issue text. When the raw archive preserved outbound links inside the source passage, they are listed directly under the quote.
OpenAI announces Superalignment, a major investment into alignment research which will include co-founder and Chief Scientist Ilya Sutskever, the current alignment team led by Jan Leike, and “20% of the compute we’ve secured to date”. At least for me, this is strong evidence that they really care about alignment and aren’t just posturing; this is more resources than would be worth spending on a posture. They’re also hiring for various alignment-related positions; see the link above for more details. And LW discussion here.
Inline links: Superalignment, LW discussion here
Second, OpenAI's AI safety team recently quit en masse in protest (remember, this is the second time this has happened), with one member citing “a process of trust [in Sam Altman] collapsing bit by bit, like dominoes falling one by one”. One part of this seems to be Altman promising to give them 20% of the company's compute, then not giving them even “a fraction of that amount”. Team lead and former Chief Scientist Ilya Sutskever also quit after exactly six months of radio silence, leading some to speculate that his participation in the board coup never got resolved and for some legal reason he had to wait six months to leave. Former team lead Jan Leike has since moved to OpenAI’s competitor Anthropic; here’s the prediction market on where Ilya will end up.
Inline links: citing, then, also quit, moved to OpenAI’s competitor Anthropic, the prediction market
Maybe morality is incoherent at sufficiently high power levels, and it ends up doing incoherent things.
The Chain Of Command Should Prioritize The Average Person
Jan Leike (formerly of OpenAI’s second alignment team) has a post on his blog, A Proposal For Importing Society’s Values. The idea is: Identify some interesting questions that AIs might encounter
30: Related: Jan Leike (former head of alignment at OpenAI, now at Anthropic) writes that Alignment Is Not Solved But Increasingly Looks Solvable. His argument is: we’re doing a pretty good job aligning existing AIs. Although aligning superintelligence is a harder problem, Jan thinks that if we’re really confident in existing AIs, then we can use some slightly-less-than-superintelligent AI as an automated alignment researcher, throw thousands of effective researcher-years into the problem in a few months, and probably make good progress. I agree this is the best hope, but it both assumes that our current forms of alignment is deep rather than shallow, and that there’s some “golden middle” where the AIs are both simple enough to be fully-alignable and smart enough to do useful superalignment research. Related: OpenAI hires Dylan Scandinaro as Head of Preparedness; seems like a good, serious choice.
Inline links: Alignment Is Not Solved But Increasingly Looks Solvable, hires
31: Related: Dario Amodei essay on The Adolescence of Technology. Mixed reactions from Zvi, Ryan, Oliver, and Transformer. This and the framing of their recent “Hot Mess” paper seem like Anthropic trying to distance themselves from concerns about systematically misaligned and power-seeking AI in favor of an “industrial accident” threat model. I don’t know if this is their heartfelt position based on all the extra private evidence they no doubt have by now, a well-intentioned PR attempt to sanewash themselves and sell alignment to a doomer-skeptical government/public, part of a balance between more and less doomerish factions, or a newly-ultra-successful tech company learning to talk its book, but it doesn’t line up with what the smartest people I know conclude using the public evidence, and it makes me nervous. I think Jan Leike’s post above does a better job balancing the reassuringness of the current evidence for the tractability of the infrahuman regime vs. the fact that we still don’t know what happens around highly-effective agency and superintelligence.