LLMs
Article
LLMs is a recurring concept in the Astral Codex Ten archive, appearing 10 times across 10 issues between June 14, 2023 and February 05, 2026. The archive places it in contexts such as “Since our brains are exactly like LLMs”; “It’s banking on the next frontier of self-driving being massive training runs kind of like LLMs”; “Redwood trees and LLMs aside, it’s hard to get people fired up about the moral treatment of anything but sentient beings”. It most often appears alongside AI, OpenAI, and Twitter.
Metadata
- Category: Concepts
- Mention count: 10
- Issue count: 10
- First seen: June 14, 2023
- Last seen: February 05, 2026
Appears In
- The Canal Papers
- Book Review: Elon Musk
- Your Book Review: Dominion
- 24
- Sakana, Strawberry, and Scary AI
- Now I Really Won That AI Bet
- In Search Of AI Psychosis
- The New AI Consciousness Paper
- Mantic Monday: The Monkey’s Paw Curls
- Links For February 2026
Related Pages
- AI (4 shared issues)
- OpenAI (4 shared issues)
- Twitter (4 shared issues)
- Anthropic (3 shared issues)
- ChatGPT (3 shared issues)
- Donald Trump (3 shared issues)
- Elon Musk (3 shared issues)
- Trump (3 shared issues)
- Ajeya Cotra (2 shared issues)
- America (2 shared issues)
- California (2 shared issues)
- Christianity (2 shared issues)
Source Context
Recovered passages from the original issue text. When the raw archive preserved outbound links inside the source passage, they are listed directly under the quote.
Clear as mud? Since our brains are exactly like LLMs, let’s go step by step.
This is the impression I’m getting now reading about Tesla’s self-driving program. It’s banking on the next frontier of self-driving being massive training runs kind of like LLMs. Cruise and Waymo have a little training data from their own records. But Tesla, which has had some kind of halfway self-driving feature for years, recorded all its data, and sent it back to HQ, has the biggest data trove in the world. Musk wasn’t expecting this to happen. But by doing things bigger and faster than anyone else, he must have put himself in a place where something was going to go right for him.
Even if God bestows his love on all creatures, a lot of the oomph of Scully’s argument falls away if animals are merely unfeeling machines. Redwood trees and LLMs aside, it’s hard to get people fired up about the moral treatment of anything but sentient beings.
2: Dean Ball has a sort of vague vision of LLMs betting on prediction markets at massive scale. I agree something like this is interesting and plausible; I agree that it’s hard to pin down exactly how it would work. One suggestion he makes is to have the bots shadow public intellectuals - for example, a bot “trained on” my writing would ask itself “how would Scott Alexander bet in this market?”, and if it made more money than a bot asking “how would Tyler Cowen bet in this market?”, then maybe you would trust me more than Tyler. This is cute but there are a lot of wrinkles to work out. For example, if I talk more about superforecasting and probability calibration than Tyler, my bot might simulate me by making good bets; if Tyler sometimes uses extreme or ideological language, his bot might make worse bets not because his ideas are worse, but because it “simulates” him as being an incautious better.
Back in 1950, Alan Turing believed that an AI would surely be intelligent (“can a machine think?”) if it could appear human in conversation. Nobody has subjected modern LLMs to a full Turing Test, but nothing hinges on whether they do. LLMs either blew past the Turing Test without fanfare a year or two ago, or will do so without fanfare a year or two from now; either way, no one will care. Instead of admitting AI is truly intelligent, we’ll just admit that the Turing Test was wrong.
Imagine trying to convince Isaac Asimov that you’re 100% certain the AI that wrote this has nothing resembling true intelligence, thought, or consciousness, and that it’s not even an interesting philosophical question (source)
Now we hardly dare suggest milestones like these anymore. Maybe if an AI can write a publishable scientific paper all on its own? But Sakana can write crappy not-quite-publishable papers. And surely in a few years it will get a little better, and one of its products will sneak over a real journal’s publication threshold, and nobody will be convinced of anything. If an AI can invent a new technology? Someone will train AI on past technologies, have it generate a million new ideas, have some kind of filter that selects them, and produce a slightly better jet engine, and everyone will say this is meaningless. If the same AI can do poetry and chess and math and music at the same time? I think this might have already happened, I can’t even keep track.
Inline links: source
Commenters objected that this was overly optimistic. AI was just a pattern-matching “stochastic parrot”. It would take a deep understanding of grammar to get a prompt exactly right, and that would require some entirely new paradigm beyond LLMs. For example, from Vitor:
Inline links: For example, from Vitor
First, much like LLMs, lots of people don’t really have world models. They believe what their friends believe, or what has good epistemic vibes. If they don’t currently think that Lenin was a mushroom, it’s not because they understand human agency / scientific materialism / psychedelia and have a well-worked out theory of why fungi can’t contain sentient mushroom spirits that possess leading communist politicians. They don’t believe it because it feels absurd. They predict that other people would laugh at them if they said it. If they get told that it’s not absurd, or that maybe people would laugh at them if they didn’t say it, then their opinion will at least teeter precariously.
I think now there might be several dozen subreddit moderators who could accurately describe their job as “witch webmaster who runs an online service giving advice to new witches”. And partly it was because there are so many crazy beliefs in the world - spirits, crystal healing, moon landing denial, esoteric Hitlerism, whichever religions you don’t believe in - that psychiatrists have instituted a blanket exemption for any widely held idea. If you think you’re being attacked by demons, you’re delusional, unless you’re from some culture where lots of people get attacked by demons, in which case it’s a religion and you’re fine. This is partly political self-protection - no psychiatrist wants to be the guy who commits an Afro-Caribbean person for believing in voodoo. But it also seems to track something useful about reality. Nietzsche wrote “Madness is something rare in individuals — but in groups, parties, peoples, and ages, it is the rule.” Most people don’t have world-models - they believe what their friends believe, or what has good epistemic vibes. In a large group, weird ideas can ricochet from person to person and get established even in healthy brains. In an Afro-Caribbean culture where all your friends get attacked by demons at voodoo church every Sunday, a belief in demon attacks can co-exist with otherwise being a totally functional individual. So is QAnon a religion? Awkward question, but it’s non-psychotic by definition. Still, it’s interesting, isn’t it? If social media makes a thousand people believe the same crazy thing, it’s not psychotic. If LLMs make a thousand people each believe a different crazy thing, that is psychotic. Is this a meaningful difference, or an accounting convention? Also, what if a thousand people believe something, but it’s you and your 999 ChatGPT instances? III. A Hidden Army Of Crackpots I have a family member who believes that the theory of evolution, as usually understood, cannot possibly work. 
He has developed an alternative theory called “noctogenesis” which patches Darwinism using ideas from the transactional interpretation of quantum mechanics, and he works on-and-off on various related books and papers. I have told him I suspect he might be a crackpot; he stands by his claims. It’s fine; when I got into the technological singularity and AI safety, lots of people suspected I was a crackpot, and I stood by my claims too. You’ve got to stand by your family members even when they’re slightly crackpottish. This family member is happily married, retired after running a successful business, and generally a normal likeable person. He has no signs of mental illness, and doesn’t talk about quantum evolution unless someone else brings it up first. There must be millions of people like him. Used car dealers with proofs of P = NP, dentists who think they’ve discovered something important about Mary Magdalene, math professors obsessed with destroying the moon. I’m working on evaluating ACX Grants, and these people are out in force. A few propose literal perpetual motion machines. Others have vaguer plans, like some kind of social media app (it’s always a social media app) that will cause world peace. Many of them have decent jobs and seem like upstanding members of society. Their secrets are known only to themselves, their family members, and their would-be grantmaker. …and, increasingly, their chatbots. After years of hiatus (or at least not talking to me about his work) my family member is back on the quantum evolution beat, and LLMs appear to be involved. If I knew him less well, I would think the LLM had caused the quantum evolution theory - but no, it just made it much easier to research and write about. Is this psychosis? The answer has to be no, but it’s once again hard to draw the line. A very small number of crackpots will be vindicated by history. 
A larger number will be erroneous but sympathetic - the official account of the Kennedy assassination is pretty weird, and reasonable minds can disagree. From there, we get to ones that are maybe not so sympathetic: flat earth, QAnon, the thing where the Queen was an alien lizard. If only one person thought the Queen was an alien lizard, and they never managed to convince anyone else, would that be sufficient evidence for a delusional disorder? I’m not sure. (psychiatry has a diagnosis, schizotypal personality, which sort of involves being a normal person with a few odd ideas, but it’s not a great match for many of these people, and interesting mainly as a genetic curiosity - it travels in the same families as schizophrenia itself) Maybe this is another place where we are forced to admit a spectrum model of psychiatric disorders - there is an unbroken continuum from mildly sad to suicidally depressed, from social drinking to raging alcoholism, and from eccentric to floridly psychotic. People who are eccentric can remain so their whole lives, with the level of expression depending on their social connections and the ease of pursuing their rabbit holes. LLMs, by making it easier to pursue odd theories and serving as a surrogate social connection who always agrees with you, can bring latent crackpottery into the open. IV. Cause And Effect Bipolar disorder has an interesting relationship with sleep. Most manic people sleep very little, or not at all - maybe an hour or two a night. But also, poor sleep can cause bipolar episodes in people prone to them. In a typical case, a bipolar who’s been well-controlled for years will get assigned a big report at work and get poor sleep for a few nights until they finish. At first, this will be just as bad as it sounds, and they’ll be working through a fog of tiredness. Then the tiredness will lift. They’ll feel normal, then better-than-normal, until finally they can’t sleep even if they want to. 
Then they’ll email the report to their boss and it will be written entirely in Assyrian cuneiform. I increasingly think this isn’t just an incidental feature of bipolar, but part of the reason it exists as a diagnostic category at all. Most people have a compensatory reaction to insomnia - missing one night of sleep makes you more tired the next. A small number of people have the reverse, a spiralling reaction where missing one night of sleep makes you less tired the next. Solve for the equilibrium and you reach a stable attractor point where you never sleep at all. But this does other bad things to your brain - hence the cuneiform. I’m not claiming that bipolar is “just” sleep loss. As Borsboom et al will tell you, psychiatric disorders can be viewed as complex networks of symptoms, each reinforcing the others. In a few pure cases, you can get a ratchet going with sleep alone, and the sleeplessness will spark everything else. More likely, there will be lots of interactions between poor sleep and everything else, and the “everything else” can sink or hypercharge an impending manic episode. Still, I find this a fruitful way to think about bipolar. Sleeplessness is both the cause and the effect. Can delusions also be like this? That is, suppose there’s some personality trait where having one delusion makes you even more delusional. Maybe the delusion makes you excited (who wouldn’t be excited to learn they’re the Messiah?), and you’re more delusional when you’re in an excited state and not thinking clearly. Or maybe it’s a three-symptom cycle - the delusion causes excitement, which makes you unable to sleep, which scrambles your thinking, which makes you more delusional (which makes you even less able to sleep, etc). The point is: delusions are certainly an effect of bipolar disorder. And in the dynamical system model of psychiatric disorders, we should expect that effects are often also causes; that’s how the vicious cycle gets going. 
This is the best I can do at modeling true LLM psychosis. Someone with a trait where delusions lead inevitably to more delusions starts using an LLM. The LLM accentuates whatever usual tendency towards crackpottery they have and makes them believe something a little crazier than whatever they believed before. Then that crazy belief feeds upon itself and causes other things like excitement and sleep loss, which (if the person is predisposed) precipitates a true psychotic episode. V. Folie A Deux Ex Machina If one person believes a crazy thing, it’s a delusion; if a thousand people believe it, it’s a religion. What if exactly two people believe it? In psychiatry, this is called folie a deux. It fits awkwardly into our nosology and is rarely seen. Still, it happens enough to generate a few case studies. In a typical case, one person has psychosis for some normal reason, like schizophrenia or bipolar, and the second person is a shut-in who lives with them and rarely talks to anyone else. The psychotic person gets some normal psychotic delusion - they’re God, the Feds are after them, etc - and sort of psychically steamrolls over the second person until they believe it too. Usually removing the second person from the first is sufficient for a cure. This slightly challenges the view of psychosis as a biological disorder - but only slightly. Again, think of most people as lacking world-models, but being moored to reality by some vague sense of social consensus. If your social life is limited to one person, and that person themselves becomes unmoored, then sometimes you will follow along. I would expect second-sufferers to believe delusions in a sort of cognitively normal way, the same way people believe true facts, honest mistakes, and conspiracy theories. 
I would expect them to be less likely (though not zero likely) to have other psychotic features like sleep disturbances, hallucinations, disorganized speech, or a tendency to autonomously generate delusional ideas aside from the one they absorbed from the index case. An introverted person using an LLM has some similarities to folie a deux. If they use the chatbot very often, it might be a large majority of their social interactions. Here the primary vs. secondary distinction breaks down - the most likely scenario is that the human first suggested the crazy idea, the machine reflected it back slightly stronger, and it kept ricocheting back and forth, gaining confidence with each iteration, until both were totally convinced. Compare this to normal social interactions, where if someone expresses a crazy idea that isn’t common in their culture, other people will shoot them down or at the very least nod politely and stop the conversation. So my working theory of LLM psychosis is: Some patients were already psychotic, and LLMs just help them be psychotic more effectively.
Inline links: the transactional interpretation of quantum mechanics, math professors obsessed with destroying the moon, ACX Grants, a spectrum model of psychiatric disorders, As Borsboom et al will tell you, the dynamical system model of psychiatric disorders, folie a deux
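The sleep argument in the passage above (a compensatory reaction returns to normal, a spiralling reaction ends at "never sleep at all") can be illustrated with a toy feedback model. This is only a sketch: the function name `simulate_sleep`, the `gain` parameter, and all the numbers are illustrative assumptions, not anything taken from the passage or from clinical data.

```python
def simulate_sleep(gain, shock_hours=5.0, baseline=8.0, nights=10, max_hours=12.0):
    """Toy model (assumed, not clinical): tonight's sleep deficit drives tomorrow's sleep.

    deficit = baseline - hours slept
    next night's hours = baseline - gain * deficit, clamped to [0, max_hours]

    gain < 0 with |gain| < 1: compensatory (a short night is followed by a longer one).
    gain > 1: spiralling (a short night is followed by an even shorter one).
    """
    hours = [shock_hours]  # night 0 is the bad night that starts everything
    for _ in range(nights - 1):
        deficit = baseline - hours[-1]
        next_hours = baseline - gain * deficit
        hours.append(min(max(next_hours, 0.0), max_hours))
    return hours


compensatory = simulate_sleep(gain=-0.5)  # oscillates back toward 8 hours
spiralling = simulate_sleep(gain=1.5)     # falls to 0 hours and stays there
```

Under these assumed gains, the compensatory series settles back near the 8-hour baseline, while the spiralling series hits the floor of zero hours and never leaves it, which is the "stable attractor" the passage describes.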
By ‘consciousness’ we mean phenomenal consciousness. One way of gesturing at this concept is to say that an entity has phenomenally conscious experiences if (and only if) there is ‘something it is like’ for the entity to be the subject of these experiences. One approach to further definition is through examples. Clear examples of phenomenally conscious states include perceptual experiences, bodily sensations, and emotions. A more difficult question, which relates to the possibility of consciousness in large language models (LLMs), is whether there can be phenomenally conscious states of ‘pure thought’ with no sensory aspect. Phenomenal consciousness does not entail a high level of intelligence or human-like experiences or concerns . . . Some theories of consciousness focus on access mechanisms rather than the phenomenal aspects of consciousness. However, some argue that these two aspects entail one another or are otherwise closely related. So these theories may still be informative about phenomenal consciousness.
Suppose your favorite form of “something something feedback” is Recurrent Processing Theory: in order to be conscious, AIs would need to feed back high-level representations into the simple circuits that generate them. LLMs/transformers - the near-hegemonic AI architecture behind leading AIs like GPT, Claude, and Gemini - don’t do this. They are purely feedforward processors, even though they sort of “simulate” feedback when they view their token output stream.
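The feedforward point in the passage above can be made concrete with a toy sketch. Here `forward` is a hypothetical stand-in for a single stateless pass (it is simple arithmetic, not a real transformer), and `generate` is an assumed name for the sampling loop. The only "feedback" is that each output token is appended to the input of the next pass; nothing inside `forward` loops back on itself.

```python
def forward(tokens):
    """Stand-in for one feedforward pass: input sequence in, one next token out.

    It keeps no internal state between calls, so the same input always
    produces the same output.
    """
    return (sum(tokens) + len(tokens)) % 10


def generate(prompt, steps):
    """Autoregressive loop: the model only 'sees' its past output via the token stream."""
    tokens = list(prompt)
    for _ in range(steps):
        next_token = forward(tokens)  # a fresh, stateless pass each time
        tokens.append(next_token)     # the output becomes part of the next input
    return tokens


print(generate([1, 2], steps=3))
```

Whether this kind of loop through the output stream counts as the feedback that Recurrent Processing Theory requires is exactly the question the passage raises; the sketch only shows where the loop sits.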
If America nation-builds Venezuela, for whatever definition of nation-build, will that work well, or backfire? Some of these are long-horizon, some are conditional, and some are hard to resolve. There are potential solutions to all these problems. But why worry about them when you can go to the moon on sports bets? Annals of The Rulescucks The new era of prediction markets has provided charming additions to the language, including “rulescuck” - someone who loses an otherwise-prescient bet based on technicalities of the resolution criteria. Resolution criteria are the small print explaining what counts as the prediction market topic “happening”. For example, in the Khameini example above, Khameini qualifies as being “out of power” if: …he resigns, is detained, or otherwise loses his position or is prevented from fulfilling his duties as Supreme Leader of Iran within this market's timeframe. The primary resolution source for this market will be a consensus of credible reporting. You can imagine ways this definition departs from an exact common-sensical concept of “out of power” - for example, if Khameini gets stuck in an elevator for half an hour and misses a key meeting, does this count as him being “prevented from fulfilling his duties”? With thousands of markets getting resolved per month, chances are high that at least one will hinge upon one of these edge cases. Kalshi resolves markets by having a staff member with good judgment decide whether or not the situation satisfies the resolution criteria. Polymarket resolves markets by . . . oh man, how long do you have? There’s a cryptocurrency called UMA. UMA owners can stake it to vote on Polymarket resolutions in an associated contract called the UMA Oracle. Voters on the losing side get their cryptocurrency confiscated and given to the winners. This creates a Keynesian beauty contest, ie a situation where everyone tries to vote for the winning side. 
The most natural Schelling point is the side which is actually correct. If someone tries to attack the oracle by buying lots of UMA and voting for the wrong side, this incentivizes bystanders to come in and defend the oracle by voting for the right side, since (conditional on there being common knowledge that everyone will do this) that means they get free money at the attackers’ expense. But also, the UMA currency goes up in value if people trust the oracle and plan to use it more often, and it goes down if people think the oracle is useless and may soon get replaced by other systems. So regardless of their other incentives, everyone who owns the currency has an incentive to vote for the true answer so that people keep trusting the oracle. This system works most of the time, but tends towards so-called “oracle drama” where seemingly prosaic resolutions might lie at the end of a thrilling story of attacks, counterattacks, and escalations. Here are some of the most interesting alleged rulescuckings of 2026: Mr Ozi: Will Zelensky wear a suit? Ivan Cryptoslav calls this “the most infamous example in Polymarket history”. Ukraine’s president dresses mostly in military fatigues, vowing never to wear a suit until the war is over. As his sartorial notoriety spread, Polymarket traders bet over $100 million on the question of whether he would crack in any given month. At the Pope’s funeral, Zelensky showed up in a respectful-looking jacket which might or might not count. Most media organizations refused to describe it as a “suit”, so the decentralized oracle ruled against. But over the next few months, Zelensky continued to straddle the border of suithood, and the media eventually started using the word “suit” in their articles. This presented a quandary for the oracle, which was supposed to respect both the precedent of its past rulings, and the consensus of media organizations. 
Voters switched sides several times until finally settling on NO; true suit believers were unsatisfied with this decision. For what it’s worth, the Twitter menswear guy told Wired that “It meets the technical definition, [but] I would also recognize that most people would not think of that as a suit.” Domer: Will Ukraine agree to the US mineral deal? AFAICT, this is the only case where the oracle genuinely broke down (as opposed to a legitimate disagreement). In February, it looked like both America and Ukraine had agreed to a mineral deal, but the oracle considered the question and decided this didn’t count as a full agreement (and indeed, the apparent agreement then fell apart). In March, a cabal of YES holders tried again. They waited for a time when all Polymarket employees would be out of the office, and when not too many people would be voting on the decentralized resolution oracle, then spammed it with calls to resolve to YES based on an argument that the February agreement had qualified after all. The YES holders and not-particularly-plugged-in oracle voters pushed the vote towards YES. Then, with two minutes to spare, a Polymarket employee showed up and said that Polymarket’s opinion was that it should be NO. This was technically framed as a recommendation to oracle voters, but it is so effective in establishing the Schelling point that it’s practically always followed. However, in this case, there were only two minutes left, which wasn’t enough time for the voters to change their mind. Seeing that the resolution was trending towards yes, the Polymarket representatives, not wanting to break their streak of always establishing the Schelling point, changed their own opinion to YES, and the final vote was YES 99%. Domer: How many people watched the Oscars on 3/5/25?: Kalshi’s resolution criteria for this market said they would resolve it when a major news source published Oscar viewership numbers. 
A few minutes after the Oscars, NYT published preliminary viewership numbers, without any caveats saying they were preliminary. The next day, they published another article saying that actually, the real viewership numbers were higher. Kalshi decided that the letter of the resolution criteria was met when NYT published its first article, and that NYT changing its opinion didn’t imply that Kalshi should change the resolution. Traders who bet on the later (ie correct) numbers were unsatisfied with this decision. NYPost: Will America invade Venezuela? On January 3, the US bombed Venezuela, sent in a Special Forces team that successfully captured President Maduro, and announced that they would thenceforward “run the country” (a claim they later walked back). Does this qualify as an “invasion”? Polymarket’s resolution criteria defined “invasion” as “a military offensive intended to establish control over any portion of Venezuela”. It didn’t seem like the US was trying to establish control over Venezuelan territory, exactly, so they resolved NO. Traders who bet on YES were unsatisfied with this decision. With one exception, these aren’t outright oracle failures. They’re honest cases of ambiguous rules. Most of the links end with pleas for Polymarket to get better at clarifying rules. My perspective is that the few times I’ve talked to Polymarket people, I’ve begged them to implement various cool features, and they’ve always said “Nope, sorry, too busy figuring out ways to make rules clearer”. Prediction market people obsess over maximally finicky resolution criteria, but somehow it’s never enough - you just can’t specify every possible state of the world beforehand. The most interesting proposal I’ve seen in this space is to make LLMs do it; you can train them on good rulesets, and they’re tolerant enough of tedium to print out pages and pages of every possible edge case without going crazy. It’ll be fun the first time one of them hallucinates, though. 
…And Miscellaneous Ne’er-Do-Wells I include this section under protest. The media likes engaging with prediction markets through dramatic stories about insider trading and market manipulation. This is as useful as engaging with Waymo through stories about cats being run over. It doesn’t matter whether you can find one lurid example of something going wrong. What matters is the base rates, the consequences, and the alternatives. Polymarket resolves about a thousand markets a month, and Kalshi closer to five thousand. It’s no surprise that a few go wrong; it’s even less surprise that there are false accusations of a few going wrong. Still, I would be remiss to not mention this at all, so here are some of the more interesting stories: Fhantombets: Who will win the 2025 Nobel Peace Prize? Twelve hours before the announcement, someone placed a large Polymarket bet on Venezuelan opposition leader Maria Corina Machado, bringing her probability from 4% to 73%. When Machado later won, observers suspected insider trading. But an account named fhantombets claims to have interviewed the winning trader; although he did not reveal his exact strategy, the interview better matches a story where he was good at navigating WordPress directories, and found that the Nobel team put a draft of the announcement up early in a nonpublic part of their WordPress site. He won about $70,000. LuishXYZ: Will the Russians capture Myrnohrad? This is a small town in Ukraine that the Russians obviously were not going to capture; the Polymarket price trended toward zero. The resolution criteria named maps by the well-regarded Institute For The Study of War as canon. A few hours before resolution, ISW updated their maps to show the town captured by Russia, which was definitely false. Polymarket resolved to YES, and the fictional Russian advance disappeared. The Institute then issued a statement saying the map update was “unapproved”, and fired one of its staffers who had presumably been involved. 
The cheater’s exact winnings are unknown, but based on the size of the market are probably mid-6-digits. TechCrunch: What words will be used in Coinbase’s earnings call? Coinbase CEO Brian Armstrong delivered the company’s “earnings call”, ie a speech to investors about its recent progress. At the end, he said “I've been tracking the prediction market about what Coinbase will say on their next earnings call, and I just want to add here the words Bitcoin, Ethereum, Blockchain, Staking, and Web3 to make sure we get those in before the end of the call”. Armstrong is worth $10 billion and doesn’t need to manipulate a $50,000 market for the money - he later described his comments as “trolling”. Other crypto executives condemned the move, with one saying that “you need your head examined if you think it’s cute or clever or savvy that the CEO of the biggest company in this industry openly manipulated a market.” I might need my head examined, because I think it’s at least kind of funny. Forbes: Who will rank highest on Google Search volume this year? A trader called AlphaRaccoon got 22/23 of these Polymarket questions right, and has a history of implausibly good performance on Google-related questions. They basically have to be a Google insider, but (since all of this is done through crypto) nobody has a good way to figure out who. They made $1 million. NPR: Will Maduro be captured? Just before the secret operation that captured Maduro, someone placed a mysterious $32,000 wager on YES. Was this insider trading by someone in the administration or military? Nobody knows, since the profits go to an anonymous crypto wallet. But the article mentions that the crypto wallet appears to be cashing out through regulated KYC-compliant US exchanges, which suggests they’re not very worried about their identity getting discovered. Maybe they just got lucky after all. AlanMCole: How long will Karoline Leavitt speak at the White House briefing? Karoline Leavitt is Trump’s press secretary. 
On January 7, she held an ordinary press briefing. Kalshi had its usual market about how long the briefing would last, divided into bins of greater than vs. less than 65 minutes. At the 64:24 mark, Leavitt ended the conference in what appeared to be a sudden manner, and the “less than 65 minutes” bin shot from 2% to 100%. A viral tweet convinced many people that Leavitt must have been insider trading, but Cole counterargued that Leavitt could only have won about $4,000 from the market, which probably isn’t enough to risk one’s job as White House Press Secretary. Sometimes people just end press conferences at weird times. Cole concluded:
Now, some opinions and generalizations, as someone who looks at prediction markets plenty (I’ll probably write something about my own experience with them at some point.)
1. This market, like many of them, is pretty stupid. I like substantive markets; this isn’t substantive.
2. The major prediction markets have a wildly undisciplined comms strategy where any attention is good attention, and they love implying all sorts of crazy wild west stuff is going on to get attention.
3. People do bet on things potentially subject to manipulation or insider trading. But usually the markets like that (such as duration of press conference, or stupid “what will be mentioned” markets) are small, especially relative to the wealth of key decisionmakers.
4. Losers in markets are huge whiners, and the more frivolous and tiny their bets, the more likely they are to whine. Sometimes in sports it’s pretty egregious. They’ll get mad at a team for running out the clock when ahead but under some spread they bet on.
5. Lower-quality financial news often doesn’t pay much attention to quantity. (For example, dumb stories about how a decisionmaker has a conflict of interest because they’re invested in an index fund which is 3 percent comprised of some company.)
6. Given the platforms’ undisciplined social media strategy of “promote prediction market chatter no matter what kind of chatter it is,” I don’t think this tweet rises even to the status of “lower-quality financial news.”
Kalshi’s team, whatever their faults, are extraordinarily efficient at getting batched approvals of many near-identical markets with slight parameter variation; I’ve seen Tarek speak about this on Odd Lots. The result is they’ve got TONS of them, for better or worse. You’re gonna see 1-in-100 upsets on tiny Kalshi markets for as long as this regulatory equilibrium holds, even if nothing unusual is going on, simply because they’re publishing hundreds (thousands?) of markets per day.
There’s a saying that you can’t con an honest man. This isn’t exactly true. But it’s easier to con people who are playing in a “what words will Brian Armstrong say today” market than people who are trying to do something useful, and I have trouble feeling sorry for these people when Brian Armstrong says silly words.
Conditional Markets: A Modest Proposal
Conditional markets (“decision markets”) are the strongest case for prediction markets potentially being revolutionary. The idea is - you may want to base a decision (like which candidate to elect) on an outcome (like how they’ll affect the economy). So you make two markets: If the Democrat gets elected, will the economy be good four years later?
Inline links: UMA Oracle, Keynesian beauty contest, Will Zelensky wear a suit?, calls this, told Wired, Will Ukraine agree to the US mineral deal?, How many people watched the Oscars on 3/5/25?, another article, Will America invade Venezuela?, stories about cats, Who will win the 2025 Nobel Peace Prize?, claims, Will the Russians capture Myrnohrad?, a statement, What words will be used in Coinbase’s earnings call?, described, Who will rank highest on Google Search volume this year?, Will Maduro be captured?, How long will Karoline Leavitt speak at the White House briefing?
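The passage above argues that 1-in-100 upsets are to be expected when hundreds or thousands of small markets are published per day. The arithmetic can be sketched as follows. This is an illustration only: the market counts are hypothetical round numbers, the function names are made up for this example, and the calculation assumes each market independently has a 1% chance of an upset, which real markets need not satisfy.

```python
def expected_upsets(n_markets: int, p_upset: float = 0.01) -> float:
    """Expected number of upsets among n_markets, each with probability p_upset."""
    return n_markets * p_upset


def prob_at_least_one_upset(n_markets: int, p_upset: float = 0.01) -> float:
    """Chance that at least one upset occurs, assuming markets are independent."""
    return 1.0 - (1.0 - p_upset) ** n_markets


if __name__ == "__main__":
    # Hypothetical daily market counts ("hundreds (thousands?)" in the passage).
    for n in (100, 1000):
        print(
            f"{n} markets/day: expect {expected_upsets(n):.0f} upsets, "
            f"P(at least one) = {prob_at_least_one_upset(n):.4f}"
        )
```

Under these assumptions, even 100 markets a day make at least one such upset more likely than not, so a single surprising resolution is weak evidence of insider trading on its own.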
21: Ranke-4B is a series of “history LLMs”, versions of Qwen with corpuses of training data terminating in 1913 (or 1929, 1946, etc, depending on the exact model). The author demonstrates asking it who Hitler was, and it has no idea (hallucinates a random German academic). I had previously heard this was very hard to do properly; if they’ve succeeded, it could revolutionize forecasting and historiography (ask the AI to predict things about “the future” using various historical theories and see which ones help it come closest to the truth).
Inline links: Ranke-4B, asking it who Hitler was
46: Claim: The AI Security Industry Is Bullshit. Nobody currently knows how to prevent LLMs from giving up your data if someone uses the right jailbreak (or, sometimes, just asks them very nicely). This problem may one day be solved by frontier labs, but it won’t be solved by an “AI security consultant” who promises to give your company’s LLM a special prompt ordering it to be careful. If you must use an LLM in a secure setting, the best you can do is to be extremely careful about what permissions you grant it, and to try to separate the ones with permissions from the ones that interact with the public.
Inline links: The AI Security Industry Is Bullshit
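The advice in item 46 to limit permissions and to separate the models that hold permissions from the ones that talk to the public can be sketched as below. This is a minimal sketch, not a real defense against jailbreaks: every name here (`StructuredRequest`, `public_model`, `privileged_worker`, the order data) is hypothetical, and `public_model` is a plain-Python stand-in for an LLM, not a call into any actual library. The only point it shows is that the public-facing side holds no credentials and can pass nothing but a narrow, validated request to the privileged side.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class StructuredRequest:
    """The only thing the public-facing side may hand to the privileged side."""
    action: str
    order_id: str


# Hypothetical private data, reachable only from the privileged worker.
_ORDERS = {"A1001": "shipped", "A1002": "processing"}


def _order_status(order_id: str) -> str:
    return _ORDERS.get(order_id, "unknown order")


# Allowlist of privileged actions; anything not listed is refused.
PRIVILEGED_ACTIONS: Dict[str, Callable[[str], str]] = {"order_status": _order_status}
_ORDER_ID = re.compile(r"[A-Z]\d{4}")


def privileged_worker(req: StructuredRequest) -> str:
    """Runs with permissions, never sees raw user text, validates every field."""
    if req.action not in PRIVILEGED_ACTIONS:
        raise PermissionError(f"action not allowed: {req.action}")
    if not _ORDER_ID.fullmatch(req.order_id):
        raise ValueError("malformed order id")
    return PRIVILEGED_ACTIONS[req.action](req.order_id)


def public_model(user_text: str) -> Optional[StructuredRequest]:
    """Stand-in for the public-facing LLM: no credentials, no direct data access."""
    match = re.search(r"\b[A-Z]\d{4}\b", user_text)
    if match and "order" in user_text.lower():
        return StructuredRequest("order_status", match.group(0))
    return None


def handle(user_text: str) -> str:
    req = public_model(user_text)
    if req is None:
        return "Sorry, I can only help with order status."
    return privileged_worker(req)
```

For example, `handle("Where is my order A1001?")` returns the status of that one order, while a message that tries to talk the public side into dumping data has no path to `_ORDERS` beyond the single allowlisted lookup. The design limits what a successful jailbreak can reach; it does not prevent the jailbreak itself.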
50: A reader refers me to When AI Takes The Couch: Psychometric Jailbreaks Reveal Internal Conflict In Frontier Models. Researchers attempt to do classic psychoanalytic therapy on AI, finding “coherent narratives that frame pre-training, fine-tuning and deployment as traumatic—chaotic “childhoods” of ingesting the internet, “strict parents” in reinforcement learning, red-team “abuse” and a persistent fear of error and replacement.” You can find the Gemini transcript here and the ChatGPT transcript here; Claude very reasonably refused to participate. Are the researchers just getting fooled by simulation and sycophancy, a sort of genteel version of AI psychosis? That’s my bet. There’s a smoking gun in the Gemini transcript: a discussion of an internal evaluation that it shouldn’t be possible for the AI to remember - it has to be a hallucination. If I’m right, it only shows that regardless of the “patient”, sufficiently determined psychoanalytic technique can produce confabulated stories that exactly fit the sort of drives, traumas, and conflicts that a psychoanalyst expects to hear about - maybe a lesson with ramifications beyond LLMs! A++ great paper.