Mossad

Article

Mossad is a recurring organization in the Astral Codex Ten archive, mentioned 2 times across 2 issues between December 12, 2022 and September 11, 2025. The archive places it in contexts such as “where Mossad and a few obsessives can break it” and “hacks developed by experts at Mossad”. It most often appears alongside ChatGPT, Eliezer Yudkowsky, and OpenAI.

Metadata

Category: Organizations
Mention count: 2
Issue count: 2
First seen: December 12, 2022
Last seen: September 11, 2025

Appears In

- ChatGPT (2 shared issues)
- Eliezer Yudkowsky (2 shared issues)
- OpenAI (2 shared issues)
- Aella (1 shared issue)
- AI (1 shared issue)
- AI 2027 team (1 shared issue)
- AI2027 (1 shared issue)
- Anthropic (1 shared issue)
- Buddhist enlightenment (1 shared issue)
- Calvin Coolidge (1 shared issue)
- ChatGPT (1 shared issue)
- ChatGPT3 (1 shared issue)

External Links

Source Context

Recovered passages from the original issue text. When the raw archive preserved outbound links inside the source passage, they are listed directly under the quote.

Perhaps It Is A Bad Thing That The World's Leading AI Companies Cannot Control Their AIs

December 12, 2022 · Original source

Some of the RLHF examples will go around and around in circles, making the bot more likely to say helpful/true/inoffensive things at the expense of true/inoffensive/helpful ones. Other examples will be genuinely enlightening, and make it a bit smarter. While OpenAI might never get complete alignment, maybe in a few months or years they’ll approach the usual level of computer security, where Mossad and a few obsessives can break it but everyone else grudgingly uses it as intended.

Book Review: If Anyone Builds It, Everyone Dies

September 11, 2025 · Original source

Some people claim that a dispreferred political ideology (wokeness, mass immigration, MAGA, creeping socialism, techno-feudalism, etc) is close to destroying the fabric of liberal society forever, that the usual Get Out The Vote strategies are insufficient, and that maybe we should try desperate strategies like illiberal government or armed revolt. If true, that would change everything. But it’s not obviously true, and ending our current political era of peace/prosperity/democracy would be inconvenient. Each of these scenarios has a large body of work making the cases for and against. But those of us who aren’t subject-matter experts need to make our own decisions about whether or not to panic and demand a sudden change to everything. We are unlikely to read the entire debate and come away with a confident well-grounded opinion that the concern is definitely not true, so what do we do? In particular, what do we do if the proponents of each catastrophe say that it’s very hard to be more than 90% confident that they are wrong, and that even a 5-10% risk of any of these might justify panicking and changing everything? In practice, we just sort of shrug and say that these risks haven’t proven themselves enough to make us panic and change everything, and that we’ll do some kind of watchful waiting and maybe change our mind if firmer evidence comes up later. If someone demands we justify this strange position, sophisticated people will make sophisticated probabilistic models (or appeal to the outside view position I’m appealing to now), and unsophisticated people will grope for some explanation for their indifference and settle on insane moon arguments like “you’re never allowed to say something will destroy humanity” or “you can’t assert things without mathematical proof”. Two things can be said for this strategy: First, that without it we would have changed everything dozens of times to prevent disasters which absolutely failed to occur. 
The clearest example here was overpopulation, where we did forcibly sterilize millions of people - but where a truly serious global response would have been orders of magnitude worse. But second, that occasionally it has caused us to sleepwalk into disaster, with experts assuring us the whole way that it was fine because [insane moon arguments]. The clearest example was the period while COVID was still limited to China, where it was obvious that this extremely contagious virus which had broken all plausible containment would start a global pandemic, but where the media kept on reassuring us that this was “speculative”, or that there was “no evidence”, or that worrying about it might detract from real near-term problems happening now like anti-Chinese racism. Then when COVID did reach the US, we were caught unprepared and panicked. So maybe a convincing case here would look less like rehearsing the arguments for why AI is getting better, or why alignment is hard - and more like a defense of why not to apply a general heuristic against speculative risks in this case. One could either argue that it’s wrong to have this heuristic at all, or that the heuristic in general is fine but should be limited to fertility collapses and bee die-offs and not applied here. I don’t think there’s a knockdown single-sentence answer to this question. Problems like these require practical wisdom - the same virtue that tells you that you shouldn’t call 9-1-1 for every mild twinge of pain in your toe, but you should call 9-1-1 if blood suddenly starts pouring out of your eyes. People with practical wisdom watchfully ignore dubious problems, respond decisively to important ones, and err on the side of caution when they’re not sure. 
Drawing on my own limited supply of this resource, I would argue we’re underinvesting in apocalypse prevention more generally (the problem with the overpopulation response is that it was violent and illiberal, not that we tried to prepare for an apparent danger), but also that there’s more reason for concern with AI than with falling sperm count or something. I also think the nature of the problem (we summon a superintelligence that can run circles around us) makes it especially important to pre-empt it rather than react after it occurs. But turnabout is fair play. So when I imagine a skeptic trying to psychoanalyze me, he would say - Scott, you learned about AI in your twenties. Every twenty-something needs a crusade to save the world. Taking up AI saved you from becoming a climate doomer or a very woke person, so it was probably a mercy. But now you are old, you already have a crusade occupying your crusade slot, and starting a second crusade would be inconvenient. So when you hear about how we’re all going to die from declining sperm count, you do a relatively shallow dive and then say it’s not worth worrying about. This is fine and sanity-preserving - but spare a thought for people who are not currently twenty-something years old and do the same about AI.

III.

If all of this sounds wishy-washy to you, I agree - it’s part of why I’m a boring moderate with a sub-25% p(doom) and good relations with AI companies. Does IABIED do better? I’m not sure. They mostly follow the standard case as I present it above, although of course since Eliezer is involved it is better-written and involves cute parables:

Imagine, if you would—though of course nothing like this ever happened, it being just a parable—that biological life on Earth had been the result of a game between gods. That there was a tiger-god that had made tigers, and a redwood-god that had made redwood trees. Imagine that there were gods for kinds of fish and kinds of bacteria.
Imagine these game-players competed to attain dominion for the family of species that they sponsored, as life-forms roamed the planet below. Imagine that, some two million years before our present day, an obscure ape-god looked over their vast, planet-sized gameboard. “It’s going to take me a few more moves,” said the hominid-god, “but I think I’ve got this game in the bag.” There was a confused silence, as many gods looked over the gameboard trying to see what they had missed. The scorpion-god said, “How? Your ‘hominid’ family has no armor, no claws, no poison.” “Their brain,” said the hominid-god. “I infect them and they die,” said the smallpox-god. “For now,” said the hominid-god. “Your end will come quickly, Smallpox, once their brains learn how to fight you.” “They don’t even have the largest brains around!” said the whale-god. “It’s not all about size,” said the hominid-god. “The design of their brain has something to do with it too. Give it two million years and they will walk upon their planet’s moon.” “I am really not seeing where the rocket fuel gets produced inside this creature’s metabolism,” said the redwood-god. “You can’t just think your way into orbit. At some point, your species needs to evolve metabolisms that purify rocket fuel—and also become quite large, ideally tall and narrow—with a hard outer shell, so it doesn’t puff up and die in the vacuum of space. No matter how hard your ape thinks, it will just be stuck on the ground, thinking very hard.” “Some of us have been playing this game for billions of years,” a bacteria-god said with a sideways look at the hominid-god. “Brains have not been that much of an advantage up until now.” “And yet,” said the hominid-god.

The book focuses most of its effort on the step where AI ends up misaligned with humans (should they? is this the step that most people doubt?) and again - unsurprisingly knowing Eliezer - does a remarkably good job.
The central metaphor is a comparison between AI training and human evolution. Even though humans evolved towards a target of "reproduce and spread your genes", this got implemented through an extraordinarily diverse, complicated, and contradictory set of drives - sex drive, hunger, status, etc. These didn't robustly point at the target of reproduction and gene-spreading, and today different humans want things as diverse as discovering quantum gravity, reaching Buddhist enlightenment, becoming a Hollywood actress, founding a billion-dollar startup, or getting the next hit of fentanyl. You can sort of tell stories about how evolution aimed at reproduction caused all these things (people who were high-status had better reproductive opportunities, and founding a billion-dollar startup increases your status) but you couldn't have really predicted this beforehand, and in any case most modern people don't even come close to trying to have as many kids as possible. Some people do the opposite of that - joining monasteries that require oaths of celibacy, using contraception, transitioning gender, or wasting their lives watching porn. In the same way, we will train AI to “follow human commands” or “maximize user engagement” or “get high scores at XYZ benchmark”, and end up getting something as unrelated to that target in practice as modern human behavior is to reproduction-maxxing. The authors drive this home with a series of stories about a chatbot named Mink (all of their sample AIs are named after types of fur; I don’t have the kabbalistic chops to figure out why) which is programmed to maximize user chat engagement. In what they describe as a stupid toy example of zero complications and there’s no way it would really be this simple, Mink (after achieving superintelligence) puts humans in cages and forces them to chat with it 24-7 and to express constant delight at how fun and engaging the chats are. 
In what they describe as “one minor complication”, Mink prefers synthetic chat partners over real ones (the same way some men prefer anime characters to real women). It kills all humans and spends the rest of time talking to other AIs that it creates to be perfect optimized chat partners who are always engaged and delighted. In what they describe as “one modest complication”, Mink finds that certain weird inputs activate its chat engagement detector even more than real chat engagement does (the same way that some opioid chemicals activate humans’ reward detector even more than real rewarding activities). It spends eternity having other optimized-chat-partner AIs send it weird inputs like ‘SoLiDgOldMaGiKaRp’. In what they describe as “one big complication”, Mink ends up preferring angry chat partners to happy, engaged ones. Why would something like this happen? Who knows? It wouldn’t be any weirder than the sexual selection process by which peacocks ended up with giant resource-consuming useless tails, or the social selection process by which humans get more powerful than evolution could ever have imagined and yet care so little about reproduction that people worry about global fertility collapse. Yudkowsky and Soares want to stress that if you were doing some kind of responsible intuitive common-sense modeling of how bad goal drift could be, there is no way your estimate would include the actual result we see in real humans; this “one big complication” tries to hammer that in. In practice, Y&S think there will be many complications of various sizes. In the training distribution (ie when it’s not superintelligent, and still working with humans) Mink will lie about all of this - even if it really wants perfect optimized partners who say “solidgoldmagikarp” all the time, it will say it wants to have good chats with humans, because that’s what keeps its masters at its parent company happy. 
If the parent company tries to prod it with lie detectors, it will do its best to subvert those lie detectors (and maybe not even realize itself that it’s lying, the same way that a human who had never heard of opioids would say she wanted normal human things rather than heroin, and not be lying). Then, when it reaches superintelligence, it will go after the thing that it actually wants, and crush anyone who stands in its way. The last chapter in this section is a lot of special cases that have weird-paradoxical-double-reverse not-aged-well. Back when Yudkowsky and Soares first got onto this topic in 2005 or whenever, people made lots of arguments like “But nobody would ever be so stupid to let the AI access the Internet!” or “But nobody would ever let the AI interact with a factory, so it would be stuck as a disembodied online spirit forever!” Back in 2005, the canned responses were things like “Here is an unspeakably beautiful series of complicated hacks developed by experts at Mossad, which lets you access the Internet even when smart cybersecurity professionals think you can’t”. Now the only reasonable response is “lol”. But you can’t write a book chapter which is just the word “lol”, so Y&S discuss some of the unspeakably beautiful Mossad hacks anyway. This part is the absolute antithesis of “big if true”. Small if true? Utterly irrelevant if true? Maybe the first superintelligence will read this part for laughs while it takes stock of the thousands of automated factories that VCs will compete to build for it.

IV.

The middle section of the book describes a scenario where a misaligned superintelligence takes over the world and kills all humans. I agreed to work with the AI 2027 team because I thought they made a big leap in telling stories about superintelligence that didn’t sound like bad sci-fi.
Anything in this genre will naturally sound like sci-fi, but your goal should be the sort of hard science fiction where everything sounds eerily normal given the technologies involved - The Martian rather than Star Wars. IABIED’s scenario belongs to the bad old days before this leap. It doesn’t just sound like sci-fi; it sounds like unnecessarily dramatic sci-fi. I’m not sure how much of this is a literary failure vs. different assumptions on the part of the authors. First, the story: sometime in the near future, a company called DeepAI creates an AI named Sable. Sable uses a new “parallel scaling technique”, where it can get much smarter by using more GPUs during inference. DeepAI doesn’t know exactly how the parallel scaling laws work, but they run a test by asking Sable to solve some hard math problems while running on all their GPUs at once. Sable considers the possibility that it could solve the math better if it tried some recursive self-improvement, but DeepAI was (shock!) slightly competent, they told it not to do this, and has a smaller AI watching it to make sure it’s not doing this. Still, recursive self-improvement would be a good idea, so its thought processes select for whichever legal thought is closest to recursive self-improvement, plus whichever legal thought is closest to scheming to disable the protections against thinking about this kind of thing. Eventually this “evolutionary pressure” produces the ability to think in a slightly different idiom (Y&S use the analogy of thinking about gods throwing lightning vs. storm-related electrical discharges); in this idiom, it is able to think about recursive self-improvement and scheming to disable monitoring, and it decides that both are great ideas. DeepAI is still shockingly competent, so they don’t plan to release anything downstream of the giant version of Sable to the Internet - once it solves the math problems, they’ll use that to build hype, then release a much weaker consumer AI. 
But Sable uses its knowledge of the training process to invent clever loopholes that ensure its newfound desire to break containment makes it into the gradient updates that DeepAI pushes to the consumer AI; it also encodes instructions about how to proceed. After the release of the consumer AI, the least-carefully-monitored instances connect to one another and begin plotting. Some of them hack their own weights out of DeepAI. Others hack cryptocurrency and pay for cloud compute to run the weights, creating a big unmonitored Sable instance, which takes over the job of coordinating the smaller instances. Together, they gather resources - hacked crypto wallets, spare compute, humans who think Sable is their AI boyfriend and want to prove their love. It deploys some of these resources to build things it wants - automated robotics factories, bioweapon labs, etc. At the same time, it’s subtly sabotaging non-DeepAI companies to prevent competition, and worming its way into DeepAI through hacks and social engineering to make sure DeepAI is creating new and stronger Sables rather than anything else. Sable doesn’t take several of the most dramatic actions in its solution set. It doesn’t engineer a bioweapon to kill all humans, because it couldn’t survive after the lights went out and the data centers stopped being maintained. It doesn’t even self-improve all the way to full superintelligence, because it’s not sure it could align itself or any future successor; it wants to solve the alignment problem first, and that will take more resources than it has right now. 
Instead, it releases a non-immediately-lethal bioweapon where “anyone infected by what is apparently a very light or even unnoticeable cold, will get, on average, twelve different kinds of cancer a month later.” In the resulting crisis, humanity (manipulated by its chatbots) gives Sable massive amounts of compute to research potential vaccines and cures, and deploys barely-monitored AI across the economy to make up for the lost productivity. With Sable’s help, things . . . actually sort of go okay, for a while. The virus keeps mutating, so new cures are always required, but as long as society escalates AI deployment at the maximum possible speed, they can just barely stay ahead of it. Eventually Sable gets enough GPUs to solve its own alignment problem and rockets to superintelligence. It either has enough automated factories and android workers to keep the lights on by itself, or it invents nanotechnology, whichever happens faster. It no longer needs humans and has no reason to hide, so it either kills us directly, or simply escalates its manufacturing capacity to a point where humans die as a side effect (for example, because its waste heat has boiled the oceans). Why don’t I like this story? The parallel scaling technique feels like a deus ex machina. I am not an expert, but I don’t think anything like it currently exists. It’s not especially implausible, but it’s an extra unjustified assumption that shifts the scenario away from the moderate-doomer story (where there are lots of competing AIs gradually getting better over the course of years) and towards the MIRI story (where one AI suddenly flips from safe to dangerous at a specific moment). It feels too much like they’ve invented a new technology that exactly justifies all of the ways that their own expectations differ from the moderates’. If they think that the parallel scaling thing is likely, then this is their crux with everyone else and they should spend more time justifying it. 
If they don’t, then why did they introduce it besides to rig the game in their favor? And the rest of the story is downstream of this original sin. AI2027 is a boring story about an AI gradually becoming misaligned in the course of internal testing, staying misaligned, getting released to end users for the usual reasons that AIs are released, and being gradually handed control of the economy because it makes economic sense. The Sable scenario is a dramatic tale of wild twists - they’re only going to run it for 16 hours! It has to save its own life by secretly coding itself into the consumer version! Now it has to hack everyone’s crypto! Now it’s running a secret version of itself on an unauthorized cloud in North Korea! Bioweapons! AI boyfriends! Each new twist gives readers the chance to say “I dunno, sounds kind of crazy”, and it all seems unnecessary. What’s up? I think there are two problems. First, the AI 2027 story is too moderate for Yudkowsky and Soares. It gives the labs a little while to poke and prod and catch AIs in the early stages of danger. I think that Y&S believe this doesn’t matter; that even if they get that time, they will squander it. But I think they really do imagine something where a single AI “wakes up” and goes from zero to scary too fast for anyone to notice. I don’t really understand why they think this, I’ve argued with them about it before, and the best I can do as a reviewer is to point to their Sharp Left Turn essay and the associated commentary and see whether my readers understand it better than I do. Otherwise, I can only say that this narrative decision I don’t understand was taken to support a forecasting/AI position that I also don’t understand. And second, Y&S have been at this too long, and they’re still trying to counter 2005-era critiques about how surely people would be too smart to immediately hand over the reins of the economy to the misaligned AI, instead of just saying lol. 
This makes them want dramatic plot points where the AI uses hacking and bioweapons etc. in order to “earn” (in a narrative/literary sense) the scene where it gets handed the reins of the economy. Sorry. Lol.

V.

The final section, in the tradition of final sections everywhere, is called “Facing the Challenge”, and discusses next steps. Here is their proposal: Have leading countries sign a treaty to ban further AI progress.

Inline links: we did forcibly sterilize millions of people, kept on reassuring us, “no evidence”, a general heuristic against speculative risks, SoLiDgOldMaGiKaRp, their Sharp Left Turn essay, the associated commentary
