Manifold
Article
Manifold is a recurring organization in the Astral Codex Ten archive, appearing 53 times across 53 issues between February 07, 2022 and March 03, 2026. The archive places it in contexts such as “Metaculus and Manifold are both very nice”; “Manifold figures out some kind of weird crypto thing”; and “A few smaller markets that Clay didn’t include: Manifold is only at 36%”. It most often appears alongside Metaculus, Polymarket, and Kalshi.
Metadata
- Category: Organizations
- Mention count: 53
- Issue count: 53
- First seen: February 07, 2022
- Last seen: March 03, 2026
Appears In
- The Passage Of Polymarket
- Mantic Monday: Ukraine Cube Manifold
- Open Thread 212
- Play Money And Reputation Systems
- Ukraine Warcasting
- 22
- Information Markets, Decision Markets, Attention Markets, Action Markets
- 22
- 22
- 22
- 22
- Open Thread 248
- ACX Grants: Project Updates
- Mantic Monday: Twitter Chaos Edition
- 2023 Prediction Contest
- Prediction Market FAQ
- Who Predicted 2022?
- Open Thread 262
- 23
- Announcing Forecasting Impact Mini-Grants
- Open Thread 270
- 23
- 23
- Links For July 2023
- 23: Room Temperature Superforecaster
- Links For August 2023
- 23
- Impact Market Mini-Grants Results
- 23
- 23
- Links For January 2024
- 24
- 24
- Who Predicted 2023?
- 24
- In Continued Defense Of Non-Frequentist Probabilities
- Practically-A-Book Review: Rootclaim $100,000 Lab Leak Debate
- Highlights From The Comments On The Lab Leak Debate
- 24
- Prediction Markets Suggest Replacing Biden
- 24
- Open Thread 352
- Mantic Monday: Judgment Day
- Congrats To Polymarket, But I Still Think They Were Mispriced
- H5N1: Much More Than You Wanted To Know
- OpenAI Nonprofit Buyout: Much More Than You Wanted To Know
- Open Thread 375
- ACX Grants 1-3 Year Updates
- Links For September 2025
- ACX Grants Results 2025
- Mantic Monday: The Monkey’s Paw Curls
- Open Thread 420
- Mantic Monday: Groundhog Day
Related Pages
- Metaculus (37 shared issues)
- Polymarket (28 shared issues)
- Kalshi (20 shared issues)
- Manifold Markets (17 shared issues)
- PredictIt (17 shared issues)
- Twitter (17 shared issues)
- Trump (15 shared issues)
- Ukraine (15 shared issues)
- OpenAI (14 shared issues)
- CFTC (13 shared issues)
- US (13 shared issues)
- Elon Musk (12 shared issues)
Source Context
Recovered passages from the original issue text. When the raw archive preserved outbound links inside the source passage, they are listed directly under the quote.
Easy to create your own subsidized markets
“Real money” should be self-explanatory. Metaculus and Manifold are both very nice, but so far they’re limited to a small group of enthusiasts playing in their spare time. I value them both, but neither is the killer app that makes prediction markets as central to everyday life as stock markets or polls or whatever.
“Easy to use” is kind of self-explanatory, but with some caveats. A big part of ease-of-use is liquidity; you can get that from a big user base or from clever deployment of automated market makers. A market that requires crypto knowledge is harder to use than one that doesn’t; one that’s inaccessible from the US is harder to use than one that isn’t. Also all the normal things like UI and search.
“Easy to create your own markets” is where we’ve gotten stuck so far. Prediction markets are absolutely on top of questions about whether Donald Trump will win various elections. This is a solved problem. What I really wanted last year (and would have subsidized!) was a market about whether Alameda County, California, would permit indoor gatherings of 50 people on January 8th 2022 (ie would I be forced to cancel my wedding). But I also would have appreciated the ability to put a few questions to prediction markets before starting my psychiatry practice, or my grants program, or any of a dozen other things I did. A friend has gone further, and half-jokingly said they want to create conditional prediction markets about whether they’re compatible with various women in our friend group, to be paid out six months after the first date.
Some of these applications are attempts to route around the principal-agent problem. Maybe I have some question about whether a certain grant would succeed, I’m not sure who to ask, and even if someone gives me a “Bob Smith, Grant Evaluator” business card, I don’t know if he’s any good. A prediction market takes all the pain out of searching for information - if I subsidize it enough, it’ll attract people with the relevant skill set who will solve my problem for me.
Probably some of these ideas wouldn’t work, but probably other ideas I can’t even think of now would. I don’t know what the killer app for prediction markets will be. But we’re not going to find out unless people can create their own subsidized markets and play around.
Polymarket took some baby steps towards this before the settlement: they had a Discord server where anyone could propose questions, and a lot of those questions became markets. But they still had to be general interest, not “let Alice’s five friends predict her dating life”. And there’s a big difference between “talk it over with company representatives on a Discord server” and “press a button”. Imagine if you could only tweet by emailing Jack Dorsey and convincing him that your comment was a good thing to have on Twitter. Even if Jack had good judgment and approved most requests, this would be a long way from the limbic system <-> Send Tweet loop that real Twitter users know and love.
I asked some people in the business why they won’t do this. They said most people are bad at writing good resolution criteria. They don’t want their employees to get stuck resolving incredibly dumb questions about people’s dating lives, hunting down inaccessible or conflicting information, and making a bunch of people mad whichever way they decide. As far as I can tell, Manifold Markets solved that problem with their “proposer decides the resolution, caveat emptor” strategy. But Manifold is US-based and can’t use real money, so there’s still no way to subsidize a market effectively.
(This is why I’m pessimistic about Kalshi. They could potentially do a lot of good in the “will Afghanistan collapse?” types of markets the Nobel laureates want, though even there I think some of their betting limits will give them trouble - $25,000 is good money, but not quite good enough to incentivize founding the prediction market equivalent of a Wall Street trading firm. But even if they solve this, I can’t imagine the regulators giving them permission to host “will this grant work out?” or “how will my dating life go?” markets; it’s just too weird, and the CFTC is too conservative. I don’t know, maybe their connections will come through and pull it off, but I don’t even know if they’re ambitious enough to want this, and I hate having to rely on one organization.)
Right now my hopes are, in ascending order of likelihood:
Manifold figures out some kind of weird crypto thing that isn’t real money from a legal perspective, but is real money from a “people really want it and will put a lot of effort into getting it” perspective.
These run from about 48% to 60%, but I think the differences are justified by the slightly different wordings of the question and definitions of “invasion”. You see a big jump last Friday when the US government increased the urgency of their own warnings. I ignored this on Friday because I couldn’t figure out what their evidence was, but it looks like the smart money updated a lot on it.
A few smaller markets that Clay didn’t include: Manifold is only at 36% despite several dozen traders. I think they’re just wrong - but I’m not going to use any more of my limited supply of play money to correct it, thus fully explaining the wrongness. Futuur is at 47%, but also thinks there’s an 18% chance Russia invades Lithuania, so I’m going to count this as not really mature. Insight Prediction, a very new site I’ve never seen before, claims to have $93,000 invested and a probability of 22%, which is utterly bizarre; I’m too suspicious and confused to invest, and maybe everyone else is too.
(PredictIt, Polymarket, and Kalshi all avoid this question. I think PredictIt has a regulatory agreement that limits them to politics. Polymarket and Kalshi might just not be interested, or they might be too PR-sensitive to want to look like they’re speculating on wars where thousands of people could die.)
What happens afterwards?
Clay beats me again:
For context:
So it looks like forecasters expect that, conditional upon Russia invading at all, there’s an 80% chance they’ll take Mariupol in the east, a 66% chance they’ll take Kharkiv (also eastern, but only a third ethnic Russian and currently aligned with the central government), and only about a 30% chance they take Kyiv or Odessa. See also this thread full of speculation in the subreddit. As for me, I’m going all in on “yes” after seeing this tweet:
Alexander Cube
Last week I speculated that to truly realize the potential of prediction markets, we’d need one that was real money, easy to use, and easy to create markets on. Gustavo Lacerda and Nuno Sempere very kindly drew this picture and named it after me:
Nobody has reached the promised land at the furthest point. But all three connected vertices are occupied. Augur is real-money and lets people create their own markets, (but it’s impossible to use - it’s made of complicated crypto contracts that nobody’s made a workable front end for yet). Polymarket is real money and easy to use (but doesn’t let people create their own markets; apparently they’re nervous about resolution disputes). Manifold is easy to use and lets people create their own market, but it’s not real money (they’re American and centralized, so they have to follow anti-gambling regulations).
Manifold Markets
Speaking of which, they’re open! As the cube suggests, Manifold is a site where anyone can create their own (play money) prediction market. They set the question and they decide when and how it resolves (with everyone else just out of luck if they decide to fake it or rug-pull). It’s a bold strategy, but boy oh boy are people liking it so far:
Not actually in order
This is a semi-randomly selected sample of Manifold markets, but let’s go through them one by one.
The Ukraine market is the biggest on Manifold. It’s also deeply out of step with every other prediction market and the top non-prediction-market authorities - who are all giving numbers in the 50s and 60s. I don’t understand how this is so low - yes, play money < real money, but mostly because play money doesn’t get enough people betting. Here lots of people are betting - it’s the biggest market on the site, and since you only start with $1000 either twenty people have bet everything or more people have bet a fraction - but it’s still wrong. I tried to spend some play money to correct it and it snapped back to just as wrong as it was before. I have no explanation.
Midnight The Stray Cat is the second biggest market on Manifold, just after Ukraine. I guess the Internet really liking cats shouldn’t be a surprise at this point. In case you need to do research first I’m told this is the cat in question:
Props to Manifold for a bunch of markets like the third one on there, where they eat their own dog food by using their market to predict how their business decisions are going to go.
ACX Bot has copy-pasted all of my predictions from 2022. At some point they should be able to compare their results with Zvi (ie a single very smart person), with the contest many of you entered (ie an average of formless crowdsourced predictions), and Metaculus (ie a non-monetary forecasting tournament). I’m looking forward to it!
Most of you already know Lars Doucet, who’s written some great ACX posts on Georgism. I don’t know what possessed him to make a Joe Rogan Georgism interviewee market, unless he’s gunning for the position.
Valinor is a group house on my street, with ~a dozen people living in and around it. We’ve been talking about fixing the backyard for a while. Now we can bet about whether it will happen. Having a number for this actually affects some of my decisions a little.
Connor is hijacking the prediction market to make a poll, which is pretty cute.
Dwayne Johnson does not have a 15% chance of winning the election. Manifold is suffering from the usual play money problem, where if you only start out with $1000 in play money, nobody wants to lock it up for three years to make a 15% profit.
Vivek’s market, “Will I believe that 13177 is a prime number”, is pretty unusual. I’m interpreting it as a test/demonstration of prediction markets’ information-gathering ability. If you don’t know something and it’s hard to Google, you can make a prediction market about whether you’ll believe it in the future, and people who are able to figure out the answer will bet on it. Based on the 97% YES rate, I’m guessing 13177 is in fact a prime number. What else can you do this with? TANSTAAFL’s “Will I Be Convinced That Justin Trudeau Is Not Fidel Castro’s Son?” market is maybe pushing the limit of this methodology.
Anyway, there are lots of me-too prediction markets but this is something genuinely new under the sun. Maybe it will be awesome itself, but I’m also hoping it helps bigger players realize how much more is possible.
This Week In Metaculus
A few new questions on intelligence enhancement, eg:
The question explicitly allows embryo selection, but says it must raise IQ ten points and be available for <25% median income to count. Trivial improvements to existing embryo selection will top out around 9 points, so this seems to be predicting something more interesting, maybe iterated embryo selection at the very least. I’m probably slightly bearish on this one; I believe if it existed someone would find a way to get it, but I think the regulatory climate might be able to prevent the relevant research indefinitely.
Improving adult IQ is really hard. This is a bold thing to speculate about!
Atmospheric CO2 was 300ish for most of pre-industrial history, 400ish now, and rising. This question predicts 600 in 2100, which sounds like what happens if global warming gets a bit worse but eventually stabilizes. I’m less sure. I think if we make it to 2100, we’ll have so much technology that atmospheric CO2 can be whatever we want it to be. But maybe we’ll want it to stay where it is; once there’s been a lot of global warming and people have moved / shifted lifestyles, it could be equally disruptive to cool the planet back down.
Right now it’s 5%, the official government prediction is 10% by 2030, but this market says 17.6%. But look at that probability distribution! It’s a lot of people saying 10%ish, plus a very long tail of very big numbers. I think people are disagreeing about how exponential this change is going to be.
Shorts
Metaculus is holding an essay contest for people who want to use their AI-related prediction markets to argue the future of AI. $6500 available in prizes.
Inline links: Manifold is only at 36%, Futuur is at 47%, an 18% chance, $93,000 invested and a probability of 22%, this thread full of speculation in the subreddit, Gustavo Lacerda, Nuno Sempere, Augur, Polymarket, Manifold, predictions from 2022, ACX posts on Georgism, Will I Be Convinced That Justin Trudeau Is Not Fidel Castro’s Son?, The question, This, this market, holding an essay contest
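The passage infers from a 97% YES rate that 13177 is probably prime. That inference can be checked directly. The following is a minimal sketch using trial division; the function name `is_prime` is illustrative and does not come from the original post or from Manifold.

```python
def is_prime(n: int) -> bool:
    """Return True if n is prime, using trial division up to sqrt(n)."""
    if n < 2:
        return False
    if n % 2 == 0:
        # 2 is the only even prime.
        return n == 2
    divisor = 3
    # Only odd divisors up to the square root need to be tried.
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 2
    return True


print(is_prime(13177))  # True
```

Trial division is enough here because the square root of 13177 is under 115, so only a few dozen candidate divisors are tested.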
3: I’m running an experiment with letting conditional prediction markets decide which books I’ll review. I’ve opened a bunch of play money Manifold markets trying to predict how many “likes” I would get by reviewing Nixonland, Whither Socialism, Penelope’s Dream Of Twenty Geese, The Search For The Perfect Health System, something by Rene Girard, The Power Of The Powerless, or A Clinical Introduction To Lacanian Psychoanalysis. I don’t promise to definitely review whichever one gets the highest percent chance, but it will probably affect my decision. I realize there are many ways this could go wrong, which is why I’m describing it as an “experiment” - still, predict if you want!
I used to be really skeptical here, but Metaculus and Manifold have softened my stance. So let’s look closer at how and whether these kinds of systems work.
Manifold only rewards relative accuracy; you have to bet with some other specific person, and you only make money insofar as you’re better than them. All real-money prediction markets are also like this, and Manifold is straightforwardly imitating this straightforward design.
(Manifold solves the same problem by having market makers be a specific user who wants the market to exist, and making that person ante up money at a specific starting price to make that happen. This seems a lot more straightforward and frees them from the complicated consequences.)
— Will Russia control Kyiv on 4/2/22? 54% chance This is Manifold’s biggest Ukraine market right now. It’s very similar to the biggest Metaculus question, although the resolution criteria are different (Metaculus: 6/10 raions; Manifold: informal, whether Duncan says so). I don’t know if that fully explains the different probabilities: 69% chance on Metaculus vs. 54% chance on Manifold. In the past when Metaculus and Manifold disagreed I’ve eyeballed Metaculus as being more accurate, but few data points so far.
I would add that Manifold did worse than any of these; it was at 36% on 2/14, and barely made it to 50% before the actual invasion happened.
The real dataset also has a “market” baseline that I didn’t include above. It’s mostly based off Manifold questions, but Manifold hadn’t really launched yet and most of them only had one or two bets and were wildly off everyone else’s guesses. I don’t think this is going to be a fair test of anything. Now that I know Sam and Eric are willing to put work into this, I’ll figure out something better for next year.
Inline links: mostly based off
Austin, a co-founder of Manifold Markets (formerly Mantic Markets), asks the market what he’ll decide on this technical question. This does two things:
Niels Bohr supposedly said that “prediction is very difficult, especially about the future”. So why not predict the past and present instead? Here’s a recent market on Manifold (click image for link). Taylor Hawkins is a famous drummer who died last weekend under unclear circumstances. This market asks if he died of drug-related causes. Presumably someone will do an autopsy or investigation soon, and Chris will resolve the market based on that information. This is a totally standard prediction market, except that it’s technically about interpreting past events.
The red line marks the Supreme Court leak. After a month of near-stability, Democrats’ chances went from 22% to 29%, before stabilizing around 26%. Markets on the Senate and on other sites like Polymarket tell a similar story. This is as far as we can go without using Manifold. Manifold questions have much less volume than PredictIt or Metaculus, and I have much less confidence in them, but for the record, here are a few: Disclaimer: I moved that one a bit myself, it was around 77% and I thought that was too high. Despite the fearmongering, this one looks about right to me. Disclaimer that Manifold probably can’t handle probabilities this small correctly and there’s no reason to think 0.2% is more realistic than 2%. It’s not 10% though. I couldn’t find some markets I wanted, so I’ve created them on Manifold for you to bet on: Will the Supreme Court leaker’s identity be known by 2023?
Inline links: Polymarket, https://manifold.markets/fortenforge/did-the-dobbs-v-jackson-leak-come-f, https://manifold.markets/BrianAhuja/will-interracial-marriage-be-banned, Will the Supreme Court leaker’s identity be known by 2023?
3: Here’s presidential nominees on PredictIt ($13,000,000 in liquidity), Polymarket ($30,000), and Manifold ($M3170):
PredictIt looks good, Manifold looks okay, Polymarket seems to have a long tail of implausible vanity candidates stuck around the 10% level.
4: This is crazy and over-optimistic, right?
What’s the catch? Offer not open to US citizens - a vexing, problematic negation. And you need to have a Solana wallet, own crypto, and know how to use it. And there’s not a lot of volume so far. But otherwise, no catch. This is just a really good new thing. Think of it as Manifold Markets, but with real money (and 10x harder to use).
The community consensus so far seems to be to try to avoid Kalshi as long as it can. There are some good real-money prediction markets open to non-Americans: Polymarket, Futuur, Hedgehog, and Insight Prediction, although Americans will find visits prohibited nationally, and I would never recommend violating precepts negligently. You could also try play-money markets like Manifold, or market-adjacent forecasting sites like Metaculus.
Finally, there’s a claim that Aristotle, the for-profit company involved with PredictIt, might try to move into the fully-regulated-prediction-market space and compete with Kalshi. I’m posting this as an encouragement for you to click on it and bet, not as a final word about the probability - there are only four bets so far! This might actually be a good move; Kalshi had to spend lots of blood and sweat and money getting the CFTC to approve a prediction market, but now that there’s a precedent it’ll be easier for the next entrant. And the Kalshi-haters might support a competitor out of pure spite. This would be almost unfair: Kalshi would have done all the hard work, gotten forced into unethical business practices to make back the money it sacrificed, and then someone else could free-ride with a spotless reputation.
The Manifold Markets team, along with Nuno Sempere, Linch Zhang, Ozzie Gooen, and other rationalist/EA forecasters.
Inline links: The Manifold Markets team
First of all, thanks to the Substack team for making Manifold Markets embed easily in Substack! Taking advantage of their hard work:
Sources: Manifold, CSPI, Metaculus, Polymarket, PredictIt, Insight, GJOpen The lowest forecaster is higher than the highest pollster! Taking 538 as an example, forecasters range from 5 pp higher (Manifold) to 17 pp higher (PredictIt). Tournaments and real-money markets tend to give higher numbers than play-money sites. I would go with 47% on this one, based on the convergence between GJO, CSPI, and Polymarket. CFTC vs. PredictIt (and everyone else), Part II The Commodity Futures Trading Commission is the US agency regulating prediction markets. In August, they told PredictIt (the biggest political prediction market) to shut down, effective in February. Now a motley group of stakeholders are suing the CFTC for a stay of execution. Plaintiffs include: 2 professors using the site as “a source of data for research”
Inline links: Manifold, CSPI, Metaculus, Polymarket, PredictIt, Insight, GJOpen, they told PredictIt
1: Polymarket, Manifold, and PredictIt now have shiny interfaces for predicting the upcoming US midterm elections. In terms of the Republicans taking the Senate, Polymarket is at 65%, Manifold at 58%, PredictIt at 73%, and 538 at 49%.
33: SD’s Neutrino Research (5/10) SD says his neutrino thesis is going well, and he is applying for graduate programs in neutrino physics. 34: User-Created Prediction Markets (9/10) Manifold Markets wanted to create a new prediction market platform where anyone could post questions. They’ve since pivoted to play money and raised $2.4 million in grants and seed funding, with about 10,000 different markets and 300 daily average users. I and many of my friends visit their site daily or at least weekly, and I often link them on Mantic Mondays. They have deals going with the Salem Center at University of Texas, Clearer Thinking, and various EA groups.
Inline links: Manifold Markets
30: Writing Forecasting Questions For EA Organizations (6/10) Nathan Young has since gotten much larger grants to do much more exciting forecasting work, particularly a platform for generating forecasting questions. With my approval, he’s put my grant on the back burner while he works on other things, but he still hopes to get some questions up on Manifold or Metaculus sometime.
Mike writes: The reason I didn’t just do a three-way comparison between PredictIt, FiveThirtyEight, and Manifold Markets is that the Manifold Markets forecasts included fewer questions than the PredictIt and FiveThirtyEight forecasts. So in order to do a fair comparison here, I’ll be comparing the smaller subset of questions for which PredictIt and Manifold Markets both gave a forecast. So it looks like both Manifold and 538 did better than PredictIt, and there’s no clear way to tell which of the former did better. (except I guess you could do this analysis with just the subset of questions Manifold and 538 share, but Mike didn’t and I’m also not going to). PredictIt has a pretty consistent Republican bias (it’s a minor epistemic sin to accuse a prediction market of having a predictable bias unless you’ve made money exploiting it, I made $600 this election so I’ll let myself pass). In years when Republicans do better than expected, it will probably look better than other markets; in years when they do worse, it will look worse. Still, this is a bias, so I think we should take them doing worse this year as a fair reflection of their accuracy, even though next year it could go the other way. My main two takeaways here are: PredictIt isn’t yet good enough that the ideal theorems showing prediction markets should be unbiased and better than everyone else apply to it. The obvious explanation is its $800-per-question cap. Polymarket doesn’t have that cap and it did better, although Mike hasn’t done a formal comparison to 538.
This is all going to be so, so obsolete by the time I finish writing it and hit the “send post” button. But here goes: 395 traders on this, so one of Manifold’s biggest markets, probably representative. The small print defines a major outage as one that lasts more than an hour. See here for a good explanation of why some people expect Twitter outages.
Inline links: See here
Each winter, I make predictions about the year to come. The past few years, this has outgrown my blog, with other people including Zvi and Manifold (plus Sam and Eric’s contest version).
This year I’m making it official, with a 50-question 2023 Prediction Benchmark Question Set. I hope that this can be used as a common standard to compare different forecasters and forecasting sites (Manifold and Metaculus have already agreed to use it, and I’m hoping to get others). Also, I’d like to do an ACX Survey later this month, and this will let me try to correlate personality traits with forecasting accuracy.
Thanks to people from Metaculus, Manifold, and the EA forecasting community for helping with questions and plans.
when you’re not sure which of many competing experts to trust, you should trust a prediction market instead of any of them Going through these claims one by one: 3.1: Why expect all prediction markets to agree with each other? Either all prediction markets agree with each other, or you can get rich quick: Suppose prediction markets disagreed. For example, suppose the RNC ran an Official Republican Prediction Market that said there was only a 10% chance Democrats would win the next election, and a 90% chance Republicans would. And suppose the DNC ran an Official Democrat Prediction Market that made the opposite prediction: 90% chance Democrats, 10% chance Republicans. Then you could buy a share of “Democrats will win” from the Republican market for 10 cents, plus a share of “Republicans will win” from the Democrat market for 10 cents, and be guaranteed to make $1 when one party or the other wins. You have turned 20 cents into a guaranteed $1. Repeat until you are rich or the mispricing has been corrected. This is just what financial experts call “arbitrage”. You may notice that in finance, people always give specific prices for things like shares of stock, barrels of oil, or Bitcoins. People say things like “Google stock is up to $300”, but never “Google stock is up to $300 on the NYSE, but down to $200 on NASDAQ”. If that was true, people would buy it on NASDAQ, sell it on NYSE, make $100 in free money, and get rich quick. In ideal situations, arbitrage forces everybody everywhere to agree on the same price for a financial instrument. Prediction markets turn claims about truth into financial instruments in a way which forces everybody everywhere to agree on how likely the claim is to be true. 3.2: Why expect prediction markets to be hard for special interests to manipulate? Either a prediction market is not currently mispriced because of a manipulation attempt, or you can get rich quick. 
Argument: Suppose a prediction market was currently mispriced because of a manipulation attempt. For example, suppose there is a prediction market for whether the sun will rise tomorrow. The true probability is obviously 100%, corresponding to a cost of $1.00. But suppose some special interest who wanted to trick people into believing the sun would not rise successfully spent money to bid the market down to only 10%. This means that you can buy, for $0.10, a share which pays $1 if the sun rises tomorrow. In other words, you can dectuple your money for free. Repeat until you are rich or the mispricing has been corrected. This may sound complicated in theory, but it plays out straightforwardly in real life. As a test, I tried to manipulate the market on whether Austin Chen, founder of Manifold Markets, would be charged with a felony. There’s no reason to think he should be, so the price started at 5%. I spent $200 in Manifold’s play money bidding it up to 95%. Within an hour, other investors noticed the mispricing and corrected it back down to 5% again. 3.3: Why expect prediction markets to be free from bias? Either a prediction market is not currently mispriced because of bias, or you can get rich quick. The argument: Suppose all smart people, including you, know that there is an 80% chance that the Democrats’ economic plan will create new jobs. But suppose that Republicans, because of their partisan biases, refuse to believe it, and say there is only a 40% chance. And suppose the Republicans set up their own prediction market where they bid the price of a share down to $0.40. You can, of course, go on this prediction market, buy shares for $0.40, and double your money in expectation. Repeat until you are rich or the mispricing has been corrected. 
I already described how something like this happens on PredictIt (a non-ideal prediction market that you can only make a few hundred dollars in expectation by correcting), and that I do in fact make a few hundred dollars every election season. 3.4: Why should I believe a prediction market’s consensus over my own opinion? This is the same argument as “the prediction market will always be at least as accurate as the top expert” only with you in the place of the top expert. Either prediction markets are at least as smart as you are, or you can get rich quick. The argument here is the same as “at least as smart as the smartest expert” argument in 2, except replacing “the smartest expert” with “you”. But just to lay it out explicitly: Suppose you were smarter than some prediction market. Then if you disagreed with the market, usually you would be right and it would be wrong. So look for cases where you disagree with the market, buy those shares, and you will make money in expectation. Repeat until you are rich or the mispricing has been corrected. I like this because it’s a good empirical test, and one that many people have tried. If you think you’re smarter than the prediction markets, bet on them and see what happens! I think most people will find that (over the long run) they lose money, and eventually this will cure them of their delusion that they can beat the markets. A few people might find that (over the long run) they do win money, just as a few people (eg Warren Buffett) can consistently win money on the stock market. Hopefully those people will quit their day jobs and become full-time prediction market traders. They’ll become multimillionaires, and their hard work will ensure that prediction markets stay more accurate than the rest of us. 3.5: Why should I believe that a prediction market makes good decisions about which of many competing experts to trust? 
Suppose you accept that a prediction market will always be at least as accurate as some well-known expert (eg Nate Silver). But what if you’re not sure who the real experts are? Or what if there are many experts, all saying different things, and nobody knows who to trust? In this case, a prediction market will always be at least as good as any other source (including you) at telling good experts from bad, or at figuring out which of many good experts is the best. By this point you should be able to predict the argument, but for completeness’ sake: Suppose you were better than the prediction market at determining which of many competing experts to trust, or how to aggregate the pronouncements of many experts into a single authoritative opinion. Then if you disagreed with the market, usually you would be right and it would be wrong. So look for cases where you disagree with the market, buy those shares, and you will make money in expectation. Repeat until you are rich or the mispricing has been corrected. To ground this in a real example, suppose there is some new virus which might or might not spread to the United States. A Harvard professor of epidemiology says there’s a 70% chance it will spread, a Yale professor of epidemiology says there’s a 90% chance it will spread, and a guy in a tinfoil hat on Infowars says there’s a 0% chance it will spread because it’s all a fake government plot. If I knew nothing else about this situation, I would probably think there’s about an 80% chance the virus will spread. I trust the Harvard and Yale professors equally much, and the tinfoil hat guy not at all. Suppose I saw a prediction market that was only at 10%, because most people trusted the tinfoil hat guy. I would want to buy YES shares until the price got up to 80%, because in expectation I would octuple my money. Suppose I saw a prediction market that was only at 70%. 
Now I wouldn’t be sure whether the prediction market was dumber than me (believed tinfoil hat guy) or smarter than me (they know a lot about epidemiology - or about the credibility of specific experts - and have decided to trust the Harvard professor over the Yale professor). Maybe I could improve on this. If I knew things about epidemiology, I could read over both professors’ arguments and try to figure out if one was better than the other. If I knew things about academia, I could pick over both professors’ resumes and see whether the Harvard professor seemed more distinguished or had more respect in her own field than the Yale professor. In the end, I might decide the prediction market was right to price it at 70% (in which case I wouldn’t do anything), or that actually both experts seemed equally expert (in which case I might bid it up to 80%), or that actually the Yale epidemiologist was better (in which case I might bid it up to 90%). 3.5.1: Isn’t it weird to give non-experts (like prediction market investors) the final judgment in which of two experts is right? Yes, but I don’t think this is avoidable. If there were no such thing as prediction markets, and the Harvard epidemiologist said 70%, and the Yale epidemiologist said 90%, and the tinfoil hat guy said 0%, and for some reason it mattered a lot to you which of these was true - then you would still have to make that decision. If there’s some extremely authoritative source who can make the decision for you - let’s say the World Health Organization says “after reviewing all experts’ arguments, we believe that the final probability is 75%” - then great! Either: The WHO is clearly the most trustworthy source - in which case we go back to the Nate Silver situation where the prediction market should be just as accurate as it is.
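The arithmetic behind the arbitrage example in 3.1 and the mispricing examples in 3.2, 3.3, and 3.5 can be sketched in a few lines of Python. This is only an illustrative sketch, not something from the FAQ itself: the function names are hypothetical, each share is assumed to pay $1 if its outcome happens, and fees, liquidity limits, and the price impact of your own bets are ignored.

```python
def arbitrage_profit(price_a: float, price_b: float) -> float:
    """Guaranteed profit from one pair of complementary $1 shares.

    price_a: cost of a share paying $1 if the event happens (bought on one market).
    price_b: cost of a share paying $1 if it does not (bought on another market).
    Exactly one of the two shares pays out, so the pair always returns $1.
    """
    return 1.0 - (price_a + price_b)


def expected_multiple(price: float, true_probability: float) -> float:
    """Expected payout per dollar spent on a $1 share bought at `price`."""
    return true_probability / price


# 3.1: "Democrats win" at $0.10 on one market, "Republicans win" at $0.10 on the other.
print(round(arbitrage_profit(0.10, 0.10), 2))

# 3.2: a sunrise share bid down to $0.10 when the true probability is 100%.
print(round(expected_multiple(0.10, 1.00), 2))

# 3.3: a share priced at $0.40 when the true probability is 80%.
print(round(expected_multiple(0.40, 0.80), 2))

# 3.5: a share priced at $0.10 when you believe the true probability is 80%.
print(round(expected_multiple(0.10, 0.80), 2))
```

The first function returns a positive number whenever the two prices sum to less than $1, which is the "get rich quick" condition the FAQ describes; the second is above 1 whenever the market price is below your estimate of the true probability.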
Operate using play-money only. Here Manifold is the leader. You could also think of superforecasting tournaments like Metaculus as a version of this. I claim that the main reason prediction markets haven’t fulfilled their potential and become a major pillar of worldwide decision-making is that none of these solutions are really adequate. For whatever reason, most people interested in prediction markets are American, so Polymarket has a limited userbase. The regulators are pretty harsh, so the companies that strike deals to get exemptions usually have to trade away most of their functionality. Kalshi can only ask a few specific regulator-approved questions; the limits are so harsh that they’re not even allowed to predict elections. Play-money prediction markets like Manifold are a lot of fun, but there’s a limit to how much work people will do to earn play money. I want a world where the people who are best at correcting mispricings in prediction markets can make full-time jobs out of it, and where there are prediction market equivalents of Goldman Sachs where hundreds of brilliant people work together with cut-throat efficiency to find mispricings the moment they appear. Play money won’t get us there. Real money prediction markets tend to have between four- and six-digit (very occasionally seven-digit) volumes on most questions. Play money prediction markets have between one- and four-digit numbers of traders on most questions. Most big prediction markets are usually within 10% of each other and the best outside experts, but not always within 1%. Traditional financial markets are usually within 1% of each other, so I think this is because the prediction markets are still too small to have sub-1% accuracy. I hope that as they grow bigger they can reach this milestone. 7. What can I do to help promote prediction markets? 
If you’re an ordinary person with no special expertise or skills, I think the best thing you can do is create a Manifold Markets account, bet on topics that are interesting to you, and create markets for any interesting topics that don’t have one yet. I think this could be helpful for a few reasons: It’s hard to really understand prediction markets until you’ve played a few yourself.
Able to get top-1 strict accuracy of at least 90.0% on interview-level problems found in the APPS benchmark introduced by Dan Hendrycks, Steven Basart et al. Top-1 accuracy is distinguished, as in the paper, from top-k accuracy in which k outputs from the model are generated, and the best output is selected. By "unified" we mean that the system is integrated enough that it can, for example, explain its reasoning on a Q&A task, or verbally report its progress and identify objects during model assembly. (This is not really meant to be an additional capability of "introspection" so much as a provision that the system not simply be cobbled together as a set of sub-systems specialized to tasks like the above, but rather a single system applicable to many problems.) Resolution will come from any of three forms, whichever comes first: (1) direct demonstration of such a system achieving ALL of the above criteria, (2) confident credible statement by its developers that an existing system is able to satisfy these criteria, or (3) judgement by a majority vote in a special committee composed of the question author and two AI experts chosen in good faith by him, for the sole purpose of resolving this question. Resolution date will be the first date at which the system (subsequently judged to satisfy the criteria) and its capabilities are publicly described in a talk, press release, paper, or other report available to the general public. Even this isn’t perfect (which models are “the equivalent of” a 1:8 scale Ferrari 312?), but in practice once you get to this level of details people mostly stop worrying about this. Another method (mostly associated with Manifold) is to just leave it up to human judgment - specifically, the judgment of the person who made the market. 
For example, I could make a market in “By 2050, will there be an AI which Scott Alexander thinks qualifies as ‘human-level’?” This will force market participants to price in the risk that I have bad judgment or act dishonestly. But perhaps these risks are small. For example, I might say elsewhere what I think qualifies as “human-level” AI, or you might think human-level AI will be so obvious when it comes that I will definitely agree with you about it. As for honesty, this could be enforced either legally or by reputation. Someone who has resolved their past 100 prediction markets honestly will probably resolve this one honestly too, especially if they get paid to do so and will never get customers again if they lie. When we invest on the normal stock market, we trust that our brokers / the NYSE / etc won’t run off with our money, and this trust is usually well-deserved. Even when we make an online purchase, we trust that the store we’re sending our money to won’t steal it and refuse to send us the product. It would be an exaggeration to say that trust is a solved problem, but evidence from Manifold suggests that most people price in a <1% chance that well-known market makers with good reputation resolve dishonestly. If prediction markets got big enough, they could spawn trusted “resolution companies” who individual markets and market-makers could outsource their resolution to, for a fee. If these companies were ever dishonest, they would lose all their business from then on, so they would probably be as honest as other businesses like your broker / the NYSE / various online stores / etc. 4.7.1: Isn’t a lot of the “crisis of trust” around questions that might never have clear future answers? For example, consider the debate around whether Donald Trump is a Russian agent. Maybe no proof will ever come out either way. 
Or maybe some evidence will appear that seems to prove one side or the other, but people will continue to deny it for political reasons, and the problem of resolving the prediction market will be just as hard as the problem of answering the original question. Indeed, prediction markets aren’t very good at this, and are only fully trustworthy on questions where the true answer will eventually become apparent. Still, they might not be completely useless. For example, if you’re worried about Trump being a Russian agent because you expect him to pursue pro-Russia policies, you can start markets in whether he pursues those policies. Or you can start a conditional market (see 5.1) on whether, if Russia ever releases its past intelligence data many years from now, the data confirm/disconfirm that Trump was an agent. See Part 5 for other clever ways you might try to address this problem. 4.8: “Meme stocks” like Gamestop and AMC sometimes remain mispriced indefinitely. How do we know this won’t happen with prediction markets? Meme stocks are a type of Ponzi. It’s “reasonable” to buy Gamestop at some inflated price, because - who knows? - someone else might buy it at an even more inflated price tomorrow. And this can keep going arbitrarily long, or at least long enough for you to get out with a profit. Unlike meme stocks, prediction markets have a clear resolution date. If you’re predicting who will win the next election, the market will go to 100% or 0% after the election finishes. No matter how many memes there were, you wouldn’t buy a share in “the Democrats will win the election” for 99% the day before Election Day if you knew they would definitely lose. But that means prediction markets should be accurately priced the day before Election Day, which means you shouldn’t buy at an inaccurate price two days before Election Day, and so on. 
I can’t say for sure that no prediction market will ever get mispriced for meme reasons, but they should be much more robust against meme mispricings than the stock market. And even the stock market doesn’t have too many meme stocks. 4.9: How do prediction markets deal with outcomes in the far future? Suppose there is a question “who will win the 2100 election?” Currently it says 25% Democrats, 75% Republicans, and I believe it should be 50-50 (we’ll ignore third parties, or the possibility of America not existing in 2100, for now). So if I bet on the market, I can (in expectation) double my money. But there are many better ways to double your money by 2100. For example, if the stock market grows 4% per year, I should expect any money invested in the stock market to multiply by 20x in 2100. So just doubling it in a prediction market is a bad option. Realistically, this means prediction markets won’t work well for far-future events. These might be a better match for forecaster tournaments or some other structure, where we get the forecaster track records through present events, then use those track records to weight their far-future predictions (see also 5.5). There are already good forecasting tournaments on some far future events. But if you really wanted to use a prediction market, you could theoretically solve this by putting investors’ money in index funds while they waited. Then the winner would get their (and the losers’) original deposits and investment profits, and it would go back to being a better option than investing in index funds directly. In practice this seems complicated and I wouldn’t expect it to work. 4.9.1: What about predicting things that would make it impossible or pointless to win money, like human extinction? Again, these questions probably aren’t great matches for prediction markets, and you should use forecasting tournaments or some other method (see also 5.5). 
If you really wanted, you might be able to make it work in theory through a mechanism sort of like this one. 5. What are some clever uses for prediction markets? Here’s a non-exhaustive list: 5.1: Conditional prediction markets / decision markets Suppose the government is trying to decide whether to throw its weight behind Vaccine A or Vaccine B for some deadly disease. There are some experts behind both, both sets of experts accuse the other of being in the pay of pharmaceutical companies, and decision-makers don’t know who to trust. They might make two prediction markets, like: If we decide to go with Vaccine A, will at least X people die from the disease?
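The opportunity-cost argument in 4.9 comes down to compound growth, which a short Python sketch can make concrete. The 4% annual growth rate and the "double my money" payoff come from the text; the 2023 start year is an assumption made here for illustration (the excerpt does not state one), and the function names are hypothetical.

```python
def growth_multiple(annual_rate: float, years: int) -> float:
    """Factor by which money grows after compounding `annual_rate` for `years`."""
    return (1.0 + annual_rate) ** years


def equivalent_annual_return(total_multiple: float, years: int) -> float:
    """Constant annual return that would produce `total_multiple` over `years`."""
    return total_multiple ** (1.0 / years) - 1.0


START_YEAR = 2023  # assumed for illustration; not stated in the excerpt
YEARS = 2100 - START_YEAR

# Stock market at 4% per year until 2100: on the order of 20x.
print(round(growth_multiple(0.04, YEARS), 1))

# Doubling your money over the same period, expressed as a yearly percentage return.
print(round(equivalent_annual_return(2.0, YEARS) * 100, 2))
```

Under these assumptions, doubling by 2100 works out to under 1% per year, far below the 4% benchmark, which is why a correct bet on a far-future market can still be a poor investment.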
But Sam and Eric object that prediction markets were also handicapped this year - most of the markets they took their numbers from were very small Manifold markets with only a single-digit number of participants, just a few months after Manifold started existing at all. They say the most likely reason prediction markets did so well was because only the most knowledgeable people will bet on a certain question, whereas our contest encouraged everyone to predict each question (technically you could opt out, but most people didn’t). Plausibly this coming year, when we have multiple big prediction markets for each question, the markets will totally blow away all other participants.
Or maybe the prediction market results will hold. One market (Manifold) and another market-like site (Metaculus) are joining the contest this year. If they do as well as last year, they’ll beat all but 15 of the 3500 entries. If things go very well, maybe we’ll discover new ways of aggregating their results that can beat every individual predictor, at least most of the time.
1: Thanks to everyone who entered the Prediction Contest; entry is now closed. You can continue to make predictions on Manifold or Metaculus, but they won’t officially count. Also, another prediction market, Futuur, has markets up for the contest questions. I’m pretty excited about this, because although Futuur does let you use play money like Manifold, it also offers real money betting (warning: requires crypto and a non-US IP). If you want to make real money bets on contest questions, now you can (and I’ll be seeing how they compare to the play money markets).
Inline links: Futuur, has markets up for the contest questions
7: And you can bet on both Lars’ and my predictions about the chatbot propaganda apocalypse on Manifold. For example:
In honor of Valentine’s Day, this installment of Mantic Monday will focus on attempted clever engineering solutions to romance. We’ll start with the usual prediction markets, then move on to other types of algorithmic and financial schemes. Normal content will resume next time around.
Date Recommendation Markets
Aella is an Internet celebrity known for her interest in various disgusting crimes against nature, ie podcasts and video streams. Unrelatedly, she also studies fetishes. She’s been looking for a partner for a few years. Most recently, she created this prediction market. The way it works:
Presumably Aella will seriously look into the top few candidates, and try asking them out. Why is this good? Consider Aella’s perspective: she can log off for a few weeks, then check back and see a ranked list of who the Internet thinks she’s most compatible with. It’s kind of like asking your friends for dating recommendations, except with better incentives on your friends’ part to predict exactly how likely you are to get along with each candidate. The current leading candidate (in blue) is Steven Bonnell aka Destiny, a famous streamer. I don’t know if he is actually especially compatible with Aella, or if he just has a lot of fans on Manifold who like him and are rooting for him to date someone, or who think it would be funny to add his name in. It wouldn’t surprise me if this worked for Aella; she’s famous and probably dates other famous people; enough people know her and her potential partners that it’s worth crowdsourcing recommendations. What about the rest of us? I was able to find one non-famous person who made a market like this, apparently with good effect, but they seemed awkward enough about it that I’m not going to link it here or provide more details. Non-famous people realistically have easier ways to ask their friends, but I still think this provides value. Sadly, Porn talked about the “omniscient authority” - asking someone on a date is so scary that people want to pretend their normal human psychological needs had no input into the decision - “It’s … not like I … like you or anything, baka! I’m just doing this because I - a pure abstract intelligence who is not horny for you in any way - was informed by friends/matchmakers/our OKCupid match percentage/’the algorithm’/a dream, that asking you on a date was my duty, which I am now dispassionately fulfilling.” A prediction market would make a great omniscient authority here. Also, consider the implications for romance stories. 
I’ve only thought about this for five minutes, so I definitely haven’t exhausted the space, but I imagine: Someone does some kind of complicated financial fraud to manipulate a prediction market into telling their crush to date them. Think Wolf Of Wall Street, but a rom-com.
And how come none of them will let you write a decent profile? Is this like the thing where I imagine that what people want out of a socialization space is a quiet comfortable area where they can hold audible conversations, but what they actually want is somewhere extremely dark with very loud music where everybody is drunk, in the hopes that this puts them into some kind of weird trance state where they can do social actions they would otherwise never contemplate? Are dating sites unusable because everyone wants to be confused into a trance state where they can imagine they aren’t sending scary self-revelatory messages to total strangers? This Week In The Markets See the resolution criteria for definition of “cold approach” and some basic facts about the person involved (who seems a bit more desirable than average). This looks like the market’s generic opinion on how many cold approaches you need if you are a bit-more-desirable-than-average guy
I still dream of running an ACX Grants round using impact certificates, but I want to run a lower-stakes test of the technology first. In conjunction with the Manifold Markets team, we’re announcing the Forecasting Impact Mini-Grants, a $20,000 grants round for forecasting projects.
A: This is Astral Codex Ten, a blog about various science / technology / philosophy / politics issues, which sometimes does grants rounds and projects like this one. I think I have a good reputation of paying for things I say I am going to pay for, see for example last year’s ACX Grants. Manifold Markets is a company that runs a prediction market website and is generally interested in unusual market structures solving social problems. We’re co-sponsoring this impact market in order to test impact markets as a charitable funding mechanism.
Inline links: ACX Grants, Manifold Markets
Go to Manifold’s impact market site, Manifund, who have kindly agreed to handle the technology side of this.
Inline links: Manifund
4: Last month I teamed up with Manifold to run an impact market on forecasting grants. Now Manifold is using their impact market infrastructure, Manifund, to start a market in prizes on Open Philanthropy’s AI-related essay contest. The idea is - you write an essay and submit it in hopes of winning (let’s say) the $50,000 first prize. Then you sell the right to the prize on the impact market - for example if you think you’re 10% likely to win (so your essay is worth $5,000) and someone else thinks you’re 20% likely to win (so your essay is worth $10,000), then you could sell the rights to the prize money to them for $7,500 (it’s a bit more complicated than that, but you get the idea). I’m not directly involved in this one, but I trust Manifold a lot and this should help them develop their impact market work further. Yes, you still have to be an accredited investor to buy certificates (though not to sell your essay!). Go here for more information. I guess this doubles as an announcement that there’s an AI-related essay contest with a first prize of $50,000. Entries are due May 31 - no, they won’t find it funny if you use GPT.
Inline links: Open Philanthropy’s AI-related essay contest
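The pricing example in the excerpt above is simple expected-value arithmetic, sketched here. The function name is hypothetical, and splitting the difference at the midpoint is just the illustration the passage uses; as it says, the real mechanism is a bit more complicated.

```python
def expected_prize_value(win_probability: float, prize: float) -> float:
    """Expected payout of holding the rights to a prize."""
    return win_probability * prize


FIRST_PRIZE = 50_000  # the passage's "let's say" first prize, in dollars

seller_value = expected_prize_value(0.10, FIRST_PRIZE)  # the essay writer's own estimate
buyer_value = expected_prize_value(0.20, FIRST_PRIZE)   # the more optimistic investor's estimate

# Any price strictly between the two valuations looks like a gain to both sides,
# each by their own estimate; the midpoint splits that gain evenly.
midpoint_price = (seller_value + buyer_value) / 2

print(f"seller values the rights at ${seller_value:,.0f}")
print(f"buyer values the rights at ${buyer_value:,.0f}")
print(f"midpoint price: ${midpoint_price:,.0f}")
```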
Nikos Bosse compares Metaculus’ performance to its “competitor” Manifold Markets, and finds that overall Metaculus was more accurate:
The mean Brier score was 0.084 for Metaculus and 0.107 for Manifold. This difference was significant using a paired test. Metaculus was ahead of Manifold on 75% of the questions (48 out of 64).
Does this mean that forecasting tournaments are better than prediction markets? Some past studies have provided very tentative evidence in that direction, but this one probably doesn’t - many more people use Metaculus than Manifold, and Nikos didn’t control for number of forecasters.
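For readers unfamiliar with the metric in the excerpt above: a Brier score is the squared error between a probability forecast and the 0/1 outcome, so lower is better. The sketch below uses made-up forecasts for two unnamed platforms, not Nikos Bosse’s data, and only computes the mean scores and the share of questions where one platform is ahead; it does not perform the paired significance test.

```python
from statistics import mean


def brier(forecast: float, outcome: int) -> float:
    """Squared error between a probability forecast and a 0/1 outcome."""
    return (forecast - outcome) ** 2


# Hypothetical data: (outcome, platform A forecast, platform B forecast).
questions = [
    (1, 0.90, 0.80),
    (0, 0.10, 0.30),
    (1, 0.60, 0.70),
    (0, 0.20, 0.25),
]

a_scores = [brier(a, outcome) for outcome, a, _ in questions]
b_scores = [brier(b, outcome) for outcome, _, b in questions]

mean_a = mean(a_scores)
mean_b = mean(b_scores)

# Share of questions on which platform A has the lower (better) score.
share_a_ahead = sum(a < b for a, b in zip(a_scores, b_scores)) / len(questions)

print(f"mean Brier: A={mean_a:.3f}, B={mean_b:.3f}")
print(f"A ahead on {share_a_ahead:.0%} of questions")
```

The same share calculation applied to the figures quoted above gives 48 / 64 = 75%.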
Manifold is a play money prediction market. Its intended purpose is to have fun and estimate the probabilities of important events. But instead of betting on important events, you might choose to speculate on trivialities. And instead of having fun, you might choose to ruin your life.
From the beginning, there were joke markets like “Will at least 100 people bet on this market?” or “Will this market’s probability end in an even number?” While serious people worked on increasingly sophisticated estimation mechanisms for world events, pranksters worked on increasingly convoluted jokes. In early April, power user Is. started “Whales Vs. Minnows”: Will traders hold at least 10000x as many YES shares as there are traders holding NO shares? In other words, Team Whale had to sink lots of mana (play money) into the market, and Team Minnow had to get lots of people to participate.
Inline links: Will traders hold at least 10000x as many YES shares as there are traders holding NO shares?
Team Minnow started cheating first. They rounded up their friends and asked them to register Manifold accounts and join the market. This might have been semi-fair to start, but then they started paying people, in real money, to do it. Team Whale - mostly Is. - figured out some cheats of their own, which you can read about here.
Inline links: here
Manifold market on changing Harvard demographics, for context the most recent Harvard class is 29.9% Asian (see also % black here):
Inline links: % black
DeepMind founder Mustafa Suleyman and others announce that their new company, InflectionAI, exists and has raised $1 billion in funding. Still, Manifold classes it as only a minor contender:
Inline links: InflectionAI
Jacob Cohen describes himself as the president of his school’s forecasting club. I think we’re going to be all right. Manifest 2023 Manifold Markets is sponsoring Manifest, an “inaugural forecasting & prediction market conference”, to be held at the Rose Garden Inn, Berkeley, California the weekend of September 22. Their website is short on details, but listed speakers and guests of honor are: …now that I think about it I do remember vaguely agreeing to something like this, though I’m not currently planning to give any particular speeches. But Aella and Robert are great - and although I’ve never met the third guy, it seems appropriate for a conference called Manifest to feature someone named Destiny. Manifold tends to do things on impulse and fill in the details later, so the schedule looks sparse. But usually the things they throw together last-minute end up being pretty good, so I’m looking forward to this. Tickets cost $220, but can also be purchased with mana (Manifold Markets’ play money), at least until the CFTC notices. It looks like there’s an arbitrage you can use to get the tickets at a 10% discount - I think this is less likely to be a mistake than a preference to have people who can spot arbitrages 10% over-represented at the conference compared to everyone else. Room Temperature Superforecaster Maybe the long-awaited killer app for prediction markets is . . . debating superconductors? First, the markets: I’m heartened to see these two very big markets ($200,000+ volume, 2,000+ traders) within 1% of each other (as of time of writing). This is a really difficult question without an obvious prior, so the level of convergence suggests the markets really are doing their job… …but Metaculus is much lower, probably because the other two are asking if any replication will be positive, and Metaculus is asking if the first replication attempt will be. 
It’s bad news that these numbers are so different, and suggests a high chance that this stays confusing and comes down to finicky resolution criteria. Still, this has gotten lots of people checking the prediction markets, including Paul Graham: …and around 500 others, according to the Manifold Active Users graph (source): Aside from headline numbers, I’ve also appreciated prediction market comment sections as a good place to stay up to date on the latest developments (including a link to this thread) Elsewhere In Forecasting NYPost: Blind Mystic Baba Vanga Makes Terrifying Nuclear Disaster Prediction For 2023: A blind mystic who allegedly predicted 9/11 is said to have foreseen a nuclear disaster that will ravage Earth before the end of 2023. Baba Vanga, a blind Bulgarian woman, is rumored to have predicted some of the biggest events in world history. She died more than a quarter of a century ago, but many of her predictions are said to have come true long after her death. Now, her followers claim that Baba Vanga foresaw a devastating nuclear disaster that will unfold this year. Big if true. In what sense did she predict 9/11? Another article gives the exact text of the 1989 prediction: “Horror, horror! The American brethren will fall after being attacked by the steel birds. The wolves will be howling in a bush, and innocent blood will be gushing.” This is a 1989 prediction! If you’re calling airplanes “steel birds” in 1989, you’re just hoping that people forget you lived when airplanes already existed and then get impressed with you for predicting them. Come on! (you could argue that the second half is about Assistant Secretary of State John Wolf and Deputy Secretary of Defense Paul Wolfowitz howling for war with Iraq from within the Bush administration, but Ass. Sec Wolf played a minimal role in the war buildup so I think if you are being very strict in your interpretation there was really only one wolf involved.) 
Anyway, Vanga’s other predictions for 2023 include: Earth’s orbit will change
Inline links: Jacob Cohen, Manifest, https://substackcdn.com/image/fetch/$s_!owXd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11594b53-e8df-4064-aacf-af4890ac0fe6_1503x615.png, https://polymarket.com/event/is-the-room-temp-superconductor-real, https://www.metaculus.com/questions/18090/room-temp-superconductor-pre-print-replicated/, Paul Graham, source, this thread, Blind Mystic Baba Vanga Makes Terrifying Nuclear Disaster Prediction For 2023, https://substackcdn.com/image/fetch/$s_!yIPJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc81ce0d2-e9c7-41b9-be38-0182d21f0381_1024x682.jpeg, the exact text of the 1989 prediction
Sinclair Chen. Sinclair works at Manifold; she can be spotted at most Bay Area ACX meetups. I didn’t realize the degree to which she goes hard: “CFTC, if you are reading this, know that there is blood on your hands.” This is not exactly the message I would have written. But I think, as the Catholics like to say, that it comes from a vice which is the excess or perversion of a divine virtue, and I appreciate her for being the sort of person who’s like this, sort of.
Inline links: Sinclair Chen
33: Claim: phase transition in Cu2S impurity fully explains superconductor-like properties of supposed “room temperature superconductor” LK-99 (paper, Twitter discussion). Prediction markets on Manifold and Polymarket are down from high-30s% last week to ~10% now.
Sorry guys, LK-99 doesn’t work. The prediction markets have dropped from highs in the 40s down to 5 - 10. It’s over. What does this tell us about prediction markets? Were they dumb to ever believe at all? Or were they aggregating the evidence effectively, only to update after new evidence came in?
Inline links: LK-99 doesn’t work
First, the simplest proof that something was predictable is to have predicted it. Since I know you’ll ask, yes, I bet on the markets at the time - 10,000 mana on Manifold and $100 on Kalshi - and made a nice profit. I would have bet more on Kalshi but it took too long to load the money onto my account.
Second, on Manifold, the biggest NO bets were superforecasters, people on the leaderboards, and rationalist celebrities; the biggest YES bets were randos with none of those qualifications.
Last March we (ACX and Manifold Markets) did a test run of an impact market, a novel way of running charitable grants. You can read the details at the links, but it’s basically a VC ecosystem for charity: profit-seeking investors fund promising projects and grantmakers buy credit for successes from the investors. To test it out, we promised at least $20,000 in retroactive grants for forecasting-related projects, and intrepid guinea-pig investors funded 18 projects they thought we might want to buy.
Enjoy the public goods we’ve produced. The Crystal Ballin’ Podcast has one episode and is hoping to make more (as are their competitors, the Market Manipulation Podcast). OPTIC is looking for participants and volunteers. You can still use Manifolio to make Kelly bets, the Telegram bot for Telegram-based prediction markets, and the browser extension to see what Manifold markets people are betting on. And although it’s not technically one of ours, I still like The Base Rate Times.
Inline links: Crystal Ballin’ Podcast, Market Manipulation Podcast, OPTIC, Manifolio, Telegram bot, browser extension, The Base Rate Times
Over the past six months, founders have worked on their projects. Some collapsed, losing their investors all their money. Others flourished, shooting up in value far beyond investor predictions. We got five judges (including me) to assess the final value of each of the 18 projects. Their results mostly determine what I will be offering investors for their impact certificates (see caveats below). They are: We’ll be buying back impact certs at the value on the MEDIAN column - so, for example, we’ll pay $300 for 100% of the certs for the Crystal Ballin’ Podcast.
Hanson is less sure about this answer than the overall story, but he suggests hiring. You could create some kind of product that companies could buy and give their hiring managers at the beginning of a hiring round, asking them to predict which candidates would get good employee evaluation results or promotions at the end of X amount of time. Even if you’re Manifold or Metaculus or someone who already has a good prediction engine, making this product requires a lot of adaptations. Who should be part of the market? What training should you give them beforehand? What should the resolution criteria be? Hanson thinks that the process of designing this product, answering customer questions about it, and iterating before you sell to the next customer is the kind of last-mile problem whose solution will make prediction markets ready for the big time.
But also, the media is a dignified, official institution, and it prefers interacting with other dignified, official institutions. It likes being able to say “a professor from Harvard said X”, and not “this guy who does really well betting on Manifold says X”. He talked about wanting to quote a superforecaster from Samotsvety Forecasts, a leading prediction group, but expected his editors to ask why these people with the weird Russian name were relevant or trustworthy. It’s easier to cite someone who is “a fellow at the Forecasting Research Institute”, which has the same kind of official ring as “a professor at Harvard”.
Inline links: Samotsvety Forecasts
And many more.
Inline links: many more
Source: Older version of this market. People joked about this graph showing how crazy the OpenAI situation was. The situation might have been crazy, but that’s not the lesson of this graph. The lesson is: it’s hard to design prediction markets for “why” questions.
Inline links: this market
At some point the market cleared up - I don’t know if this was an intervention or people just converged on a few answers. Now it looks like this:
Inline links: Now it looks like this
Here are some other (attempted) OpenAI related markets: I appreciate how this started in September, shows Altman’s sudden-firing, the first plan to unfire him, the falling apart of the first plan to unfire, and then the second, successful plan to unfire him.
35: Prediction site Manifold Markets is running a $30,000 Community Fund based on impact certificates. If you want to make something cool for the Manifold community, you can run an impact funding round, and then they’ll pay you out of the $30,000 if it’s good.
Inline links: $30,000 Community Fund
19: If Manifold is too social for you, there’s also Fatebook, a site where you can record your personal predictions and auto-judge calibration/accuracy/etc. For example, Predict Your Year here. Also available for Discord/Slack.
Inline links: Fatebook, Predict Your Year here
I would also add that I joined a different forecasting site, Manifold Markets back in August, and in 3 months have turned the 500 starting ‘Mana’ you get when you sign up into 8500 mana, and have specifically made a point to not do any research and just buy/sell based on intuition. Again, not sure what to conclude here, but it seems very possible that these sites are just full of people who are terrible at predicting things, such that it’s easy to do quite well by just being half-decent.
I also find it encouraging that the play-money prediction market site Manifold comes pretty close and beats all the real-money sites. Nate Silver is only one person, he has only one area of expertise, and you can’t hire him to predict random things for you (unless you’re rich and he’s bored). If Manifold can apply only-slightly-sub-Nate-Silver levels of analysis at scale to arbitrary topics, that’s a big deal.
…Metaculus and PredictIt are 50-50, Manifold favors Biden, and Polymarket favors Trump. Shouldn’t really be possible, should it?
Manifold has lots of bots. There’s a Silicon League entirely for bots. Lots of bots make lots of money:
Most of these bots are boring. They’re bots programmed to automatically buy some market once the price gets low enough, or to arbitrage basically-identical markets, or do some other technical finance maneuver. But you could imagine more interesting bots. Ones that forecast the same way humans forecast. You could imagine a bot based on ChatGPT that asks “What is the probability of a cease-fire in Ukraine this year?” and bets on ChatGPT’s answer. And by “you could imagine” I mean “there’s now a Humans Vs. Bots tournament on Manifold with an ℳ250,000 prize”. Let’s see how they’re doing: All of these bots seem to be making small profits, with GPT in the lead. But what’s this? The Nermit bot is based on FutureSearch.ai, a new company trying to build an AI-based forecaster. Based on their own internal calculations, they claim success (but see footnote 1). How is this possible? Some studies of superforecasters converge on the same technique: figure out a base rate for some event, then alter it based on the current situation. For example, if you wanted to know the chance of a cease-fire in Ukraine over the next year, you might start by plotting the distribution of war lengths over the past century, then check how many wars that had lasted at least two years had a cease-fire in the third. Then you might adjust a little bit down for factors like “there haven’t been any promising peace talks yet” and “the two sides seem equally balanced”. FutureSearch’s AI tries to do something similar. It prompts itself with questions like “What would be a good reference class for this question?”
Inline links: Humans Vs. Bots tournament, https://substackcdn.com/image/fetch/$s_!r04t!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8006d5d-af52-4870-90a7-c63a84497670_827x110.webp, https://substackcdn.com/image/fetch/$s_!WvxC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5db0765c-69c6-4362-ada4-9a7210701bff_962x510.png, https://substackcdn.com/image/fetch/$s_!DOia!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F516f9622-aa25-45d2-958a-077c6840d65f_947x505.png, https://substackcdn.com/image/fetch/$s_!kcLW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21afa244-9cfb-4714-841d-c963d5d2e125_955x517.png, https://substackcdn.com/image/fetch/$s_!V4Vp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b9dd4ab-0fa2-4421-984b-07c66a220080_958x529.png, FutureSearch.ai, https://substackcdn.com/image/fetch/$s_!MjV7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc584dabc-477f-4096-a3f1-c4a929b6cded_411x372.png, 1
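The “base rate, then adjust” technique described in the excerpt above can be sketched as follows. Everything here is an assumption for illustration: the reference-class data and the adjustment sizes are invented, and shifting in log-odds space is one common way to nudge a probability, not something the source attributes to superforecasters or to FutureSearch.

```python
import math


def base_rate(reference_class: list[bool]) -> float:
    """Fraction of reference-class cases in which the event happened."""
    return sum(reference_class) / len(reference_class)


def adjust_log_odds(probability: float, shift: float) -> float:
    """Move a probability up or down by a shift expressed in log-odds."""
    logit = math.log(probability / (1.0 - probability))
    return 1.0 / (1.0 + math.exp(-(logit + shift)))


# Hypothetical reference class: wars that lasted at least two years.
# True means the war had a cease-fire in its third year.
past_wars = [True, False, False, True, False, False, False, True, False, False]
prior = base_rate(past_wars)

# Hypothetical downward adjustments for the current situation.
adjustments = {
    "no promising peace talks yet": -0.3,
    "the two sides seem equally balanced": -0.2,
}
forecast = adjust_log_odds(prior, sum(adjustments.values()))

print(f"base rate: {prior:.0%}, adjusted forecast: {forecast:.0%}")
```

Working in log-odds keeps the adjusted number strictly between 0 and 1 however large the shift, which plain addition or subtraction of percentage points does not.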
This was a Manifold promotional event for Valentine’s Day, taking the form of a “prediction market dating show” where six contestants competed to win a date with local celebrity Aella. It was not what I was expecting.
Manifold Markets: Manifold, a popular play money prediction market site, kindly agreed to open markets into our fifty questions so we could compare them to participants. The markets got between 80 and 1500 participants, average around 150. Their forecast, had it been a contestant, would have placed in the 89th percentile. This would be good for an individual, but it’s surprisingly bad for an aggregation method - in fact, it’s worse than taking the median of a randomly selected group of 150 participants! The market mechanism seems to be subtracting value! Someone might want to double-check this.
Inline links: Manifold Markets
I began by collecting data from Manifold Markets for these questions. I then compared those forecasts to the forecasts of superforecasters in the blind data, subset to those who had given forecasts on the S&P500 and Bitcoin questions that were reasonably consistent with the efficiency of markets; I subset to those who forecasted between 30% and 80% for the probability that the S&P500 and Bitcoin would increase during 2023, which were the only reasonable predictions by the time blind mode ended in mid-January. I then used my own judgment to tweak forecasts where I strongly disagreed with the prediction markets and the superforecasters (for example, I was more than 15 percentage points away from the average of Manifold Markets and the efficient-market-believing superforecasters on questions 17, 19, 21, 30, 34, and 50). I paid especially close attention to questions where late-breaking news made the superforecasters' forecasts less relevant (and I downweighted their forecasts on those questions accordingly).
Participant aggregate: This is the “wisdom of crowds” one. If you average the guess of every participant (eg if someone says 80% chance Biden leads, and another says 90% chance, then you go with 85%), you usually do better than the vast majority of individuals. In this case, the aggregate was 95th percentile, beating out superforecasters and Manifold.
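A minimal sketch of the averaging described above, using invented participants and outcomes rather than contest data. It averages each question’s forecasts into a crowd forecast, then scores the crowd and each individual with a Brier score (squared error, lower is better).

```python
from statistics import mean

# The passage's two-person example: 80% and 90% average to 85%.
two_person_aggregate = mean([0.80, 0.90])

# Hypothetical contest: each participant's forecasts on three questions.
participants = {
    "p1": [0.8, 0.3, 0.6],
    "p2": [0.9, 0.1, 0.4],
    "p3": [0.6, 0.4, 0.9],
}
outcomes = [1, 0, 1]


def brier_score(forecasts: list[float], results: list[int]) -> float:
    """Mean squared error of probability forecasts against 0/1 results."""
    return mean((f - r) ** 2 for f, r in zip(forecasts, results))


# Crowd forecast: the per-question average across all participants.
crowd = [mean(column) for column in zip(*participants.values())]

crowd_score = brier_score(crowd, outcomes)
individual_scores = {name: brier_score(f, outcomes) for name, f in participants.items()}

print(f"two-person aggregate: {two_person_aggregate:.2f}")
print(f"crowd Brier score: {crowd_score:.4f}")
for name, score in individual_scores.items():
    print(f"{name} Brier score: {score:.4f}")
```

In this made-up data the crowd happens to beat every individual. In general the averaged forecast is only guaranteed to score at least as well as the average individual, which fits the passage’s “usually” and “vast majority”.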
Then they fine-tune the whole system on forecasting questions from prediction sites (eg Metaculus, Manifold) that ended between mid-2023 and today. Why mid-2023? Because the AI was trained in mid-2023 and only knows what happened before then, and they can artificially limit its news API calls to before mid-2023. This lets them train the AI on thousands of forecasting questions without letting the AI cheat or having to wait years for the questions to resolve. They select the reasoning where the AI does well, and fine-tune it to do more stuff like that. The Halawi et al AI forecasting method. They find this works almost as well as the human crowd: Are these the data I’ve been trying to get for years - which forecasting platforms beat which others? I don’t think so - Metaculus’ good Brier score only means it performs well on Metaculus’ questions, which might be easier or harder than some other platform’s questions. Can we use the Halawi et al AI as a fixed comparison point, since it’s always the same skill level? I’m not sure - it trained on each of these markets for the style of question that’s in each market, so it might be biased. Still, these numbers are all about where I would expect them to be, except maybe Polymarket, which does better than I would have expected. But the crowd still beats the AI, right? Halawi et al object that humans can forecast only when they feel like it - you can bet on a prediction market question you feel confident on, and avoid one you don’t. When they let their AI forecast only on those questions where it’s most likely to do well (eg those with lots of relevant news articles), it very slightly outperforms the human crowd. As AI gets better, will it naturally beat humans in forecasting? Halawi et al say this won’t be trivial. They find a version of their system based off GPT-3.5 is only very slightly worse than the final version built off GPT-4. This suggests a forecasting AI built off GPT-5 or 6 might get only small improvements. 
The second team is Tetlock et al. They start from the same place as Halawi - out-of-the-box LLMs aren’t good at forecasting. They’re more scathing about this than Halawi was - they argue that out-of-the-box models do worse than predicting 50% for everything (this was close to true of human forecasters in the ACX tournament). Instead of increasing quality, Tetlock increases quantity. He wants to do wisdom of crowds, where the crowd is a bunch of different LLMs. So he gets twelve LLMs - including Bard, GPT, Claude, Mistral, PaLM, LLaMa, some Chinese models I’d never heard of, and a couple of variations on these bases - asks them to predict questions, and averages the results. Remember, you gotta prompt your model with “you are a smart person”, or else it won’t be smart! The results: Next, we compare the LLM crowd performance to that of the human crowd for our second hypothesis, directly putting the two crowd-aggregation mechanisms head-to-head. To do this, we use the same LLM crowd average as before (taking the median LLM prediction on each question and averaging up the Brier scores across questions). We compare this to the average of median human predictions on the same questions. In our preregistered analysis, we fail to find statistically significant differences between the LLM crowd’s mean Brier score of M=0.20 (SD=0.12) and that of the human crowd, M=0.19 (SD=0.19), t(60) = 0.19, p = 0.850 Their study was much smaller than Halawi’s (31 questions vs. 3,672), so I don’t think this result (nonsignificant small difference) should be considered different from Halawi’s (significant small difference). Still, it’s weird, isn’t it? Halawi used a really complicated tower of prompts and APIs and fine-tunings, and Tetlock just got more LLMs, and they both did about the same. I have two questions after reading these results: Did they actually do the same, or is this just a function of the small sample size in Tetlock and the non-head-to-head comparison?
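The scoring procedure in the quoted passage (take the median prediction on each question, then average the Brier scores across questions) can be sketched like this. The predictions and outcomes below are made up for illustration and are not the study's data.

```python
from statistics import mean, median


def crowd_brier(predictions_by_question, outcomes):
    """Mean Brier score of the per-question median forecast."""
    scores = []
    for preds, outcome in zip(predictions_by_question, outcomes):
        crowd_p = median(preds)  # one crowd forecast per question
        scores.append((crowd_p - outcome) ** 2)
    return mean(scores)


# Hypothetical: three models forecasting two binary questions.
predictions = [[0.6, 0.7, 0.9], [0.2, 0.4, 0.3]]
outcomes = [1, 0]

llm_crowd_score = crowd_brier(predictions, outcomes)

# Baseline mentioned in the text: predicting 50% for everything.
always_half_score = crowd_brier([[0.5], [0.5]], outcomes)

print(round(llm_crowd_score, 4), always_half_score)
```

An always-50% forecaster scores 0.25 on every binary question, which is why the reported crowd scores of 0.19 and 0.20 count as better than chance but not by a wide margin.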
There’s been more news and claims about the LK-99 alleged superconductor recently, all of which have totally failed to move the market away from 4%:
When the OpenAI board tried to fire Sam Altman last year and everyone said they were making a crazy mistake, I urged patience, saying maybe there was some kind of good plan. With the appointment of a new board, the last few loose ends from the affair have now been settled, and - I was wrong. There was no good plan and it was a giant self-own, sorry. The new board is back to having Sam Altman, plus random businesspeople who I don’t expect to have good opinions or exercise real restraint. Accordingly, the prediction market about whether anything good will come of it has gone down from its already low levels:
In other words, there’s something special about the number 17% on this question. It has properties that other numbers like 38% or 99.9999% don’t have. If someone asked you (rather than Samotsvety) for this number, you would give a less good number that didn’t have these special properties. If by some chance you actually were better at finding these kinds of numbers than Samotsvety, you could probably get a job as a forecasting consultant. Or you could make lots of play money on Manifold, or lots of real money on the stock market, or help your preferred political party as a campaign strategist.
This was a decisive victory. There were two judges, who each gave separate verdicts (or were allowed to declare a draw). Both judges decided in favor of Peter. You can see the judges’ own summary of their reasoning here (Will, Eric) Manifold agreed with the judges. There was a prediction market on who would win. It started out 70-30 in favor of lab leak. As the videos came out, zoonosis started doing better and better. I don’t want to take the exact final numbers too seriously, since I think some of the later price increases involved hints from the participants’ behavior. But it’s clear which way viewers thought the wind was blowing4. Around the same time, the Good Judgment Project - Philip Tetlock’s group studying superforecasters - put out a report on the lab leak hypothesis. After studying it in depth, his forecasters ended up 75-25 in favor of zoonosis. The Rootclaim debate was one of ten sources they said they found especially interesting. And also around the same time, and unrelated to any of this, the Global Catastrophic Risks Institute surveyed experts (“168 virologists, infectious disease epidemiologists, and other scientists from 47 countries”) and found the same thing (though see here for some potential problems with the survey): For what it’s worth, I was close to 50-50 before the debate, and now I’m 90-10 in favor of zoonosis. III. The Math And The Aftermath The third debate session was about “inference”, how to put evidence together. I put this part off until after disclosing the winner, because I wanted to talk about some of these issues at more length. The Math: Judges Both judges included a probabilistic analysis in their written decision. Here’s the same table as above, expanded to add the judges: I shoehorned the judges’ factors into the categories I already had; some of them were actually subtly different from Peter’s, Saar’s, and each other’s. The “priors” category is especially a mess here. 
We’ll go over these later, but I get the impression that they both thought of probabilistic analyses as an afterthought. For example, Judge Eric wrote 30,000 words about which considerations moved him, and only then includes the analysis, saying: I am not convinced that this Bayesian calculation is even an appropriate way to estimate the relative posterior probability of Z and LL; it just seemed fair that after criticizing Rootclaim’s calculations at length I should make an attempt at it myself. Judge Will’s decision ran to 10,000 words. He said he independently tried both reasoning it out intuitively, and running the Bayesian analysis, and was relieved when these two methods returned the same result. He said: I am skeptical that the Bayesian decision making/evaluation methods are any more "objective" than [intuitive reasoning]. I think they maximize legibility, not objectivity, and tend to hide the intuitive/heuristic portion in the data inclusion step and values, where it’s harder to see . . . I am not skilled in the Bayesian method, and I am sure I made significant mistakes. More time and practice would improve and refine my estimates. At the fundamental rules of the universe level, Bayesian analysis must be the best way to evaluate evidence. However, I am unsure that it’s a good strategy for a human given our cognitive limitations, and doubly unsure it’s truly being used (in the dispassionate sense) where the outcome is social desirability/fame/Twitter likes. I’m focusing on this because Saar’s opinion is that the debate went wrong (for his side) because he didn’t realize the judges were going to use Bayesian math, they did the math wrong (because Saar hadn’t done enough work explaining how to do it right), and so they got the wrong answer. I want to discuss the math errors he thinks the judges made, but this discussion would be incomplete without mentioning that the judges themselves say the numbers were only a supplement for their intuitive reasoning. 
That having been said, let’s look deeper into some of Saar’s concerns. The Math: Extreme Odds Saar complained that Peter’s odds were too extreme. For example, Peter said there was only a 1/10,000 chance that a lab leak pandemic would first show up at a wet market. Peter’s argument went something like: obviously a zoonotic pandemic would start at a site selling weird animals. But a lab leak pandemic - if it didn’t start at the lab - could show up anywhere. 1/10,000 Wuhan citizens work at the wet market. So if a lab leak was going to show up somewhere random, the wet market was a 1/10,000 chance. Saar had specific arguments against this, but he also had a more general argument: you should rarely see odds like 1/10,000 outside of well-understood domains. In his blog post, he gave this example: A prosecutor shows the court a statistical analysis of which DNA markers matched the defendant and their prevalence, arriving at a 1E-9 probability they would all match a random person, implying a Bayes factor near 1E9 for guilty. But if we try to estimate p(DNA|~guilty) by truly assuming innocence, it is immediately evident how ridiculous it is to claim only 1 out of a billion innocent suspects will have a DNA match to the crime scene. There are obviously far better explanations like a lab mistake, framing, an object of the suspect being brought by someone to the scene, etc. So the real p(wet market|lab leak) isn’t the 1/10,000 chance a pandemic arising in a random place hits the wet market, but the (higher?) probability that there’s something wrong with Peter’s argument. Then Saar tried to show specific things that might be wrong with Peter’s argument. I didn’t find his specific examples convincing. But maybe the question shouldn’t be whether I agreed with him. It should be whether I’m so confident he’s wrong that I would give it 10,000-to-1 odds. This makes total sense, it’s absolutely true, and I want to be really, really careful with it. 
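One way to read Saar's point is as a mixture: the probability you should actually use blends the number your argument produces with whatever would be true if the argument were broken. This is a minimal sketch; the 1% chance that the argument is wrong and the 10% fallback probability are placeholder assumptions of mine, not figures from the debate.

```python
def effective_probability(p_from_argument, p_argument_wrong, p_if_wrong):
    """Blend an argument's output with a fallback for model error."""
    return (1 - p_argument_wrong) * p_from_argument + p_argument_wrong * p_if_wrong


p_wet_market = effective_probability(
    p_from_argument=1 / 10_000,  # Peter's figure for p(wet market | lab leak)
    p_argument_wrong=0.01,       # hypothetical: 1% chance the reasoning is flawed
    p_if_wrong=0.10,             # hypothetical: probability if it is flawed
)

print(f"about 1 in {1 / p_wet_market:.0f}")
```

Even a small chance of model error dominates the result, which is why extreme odds are hard to defend outside well-understood domains. The same arithmetic cuts the other way if applied to everything, since it puts a floor under the probability of any outcome, however absurd.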
If you take this kind of reasoning too far, you can convince yourself that the sun won’t rise tomorrow morning. All you have to do is propose 100 different reasons the sunrise might not happen. For example: The sun might go nova.
Okay, this one is just awful. It takes the risky gambit above - giving extreme odds to something - then doubles down on it by multiplying across twenty different stages to get a stupendously low probability of 1 in 5×10^25. If we believe this, it’s more likely that we win the lottery three times in a row than that we learn lab leak was true after all. Eliezer Yudkowsky calls this the Multiple Stage Fallacy. Even aside from the failure mode in the sunrise example above (where people are too reluctant to give strong probabilities), it fails because people don’t think enough about the correlations between stages. For example, maybe there’s only 1/10 odds that the Wuhan scientists would choose the suboptimal RRAR furin cleavage site. And maybe there’s only 1/20 odds that they would add a proline in front to make it PRRAR. But are these really two separate forms of weirdness, such that we can multiply them together and get 1/200? Or are scientists who do one weird thing with a furin cleavage site more likely to do another? Mightn’t they be pursuing some general strategy of testing weird furin cleavage sites? (For example, Yuri proposed that, because the scientists wanted to understand how pandemic coronaviruses originate in nature, they might deliberately pick more natural-looking features over more designed-looking ones, which would neatly explain many features seemingly inconsistent with lab leak. Is this a conspiracy theory? Rootclaim is able to successfully route around this question. If the probability of a feature happening in nature is X, then the probability of it happening in this variant of lab leak scenario is X * [chance that the scientists wanted to imitate nature]. This gives it a (deserved) complexity penalty without ruling out this (non-zero and potentially important) possibility.)
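To make the correlation worry concrete, here is a small sketch using the 1/10 and 1/20 figures from the furin example. The conditional probability of 0.5 and the factor-of-two error per stage are hypothetical numbers chosen only to show the effect.

```python
from math import prod


def joint_independent(stage_probs):
    """Multiply stage probabilities as if every stage were independent."""
    return prod(stage_probs)


# Figures from the furin cleavage site example above.
p_rrar = 1 / 10
p_proline = 1 / 20
independent = joint_independent([p_rrar, p_proline])  # 1/200

# Hypothetical: scientists already testing odd cleavage sites add the
# proline half the time, so the second stage is conditional on the first.
p_proline_given_rrar = 0.5
correlated = p_rrar * p_proline_given_rrar  # 1/20

# Hypothetical: if each of twenty stages is understated by a factor of
# two, the final product is understated by 2**20.
compounding_error = 2 ** 20

print(independent, correlated, compounding_error)
```

With these made-up inputs, ignoring the correlation changes the answer by a factor of ten across only two stages, and modest per-stage errors compound to about a millionfold over twenty.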
In any case, Peter didn’t care as much about probabilistic analysis as Saar, he didn’t make his case hinge on this slide, and he might have been kind of using it to troll Rootclaim (which definitely worked). He might not have been making any of the mistakes above. But anyone who took this slide seriously would end up dramatically miscalibrated. The Math: Big Pictures Another of Saar’s concerns with the verdict was that Peter was an extraordinary debater, to the point where it could have overwhelmed the signal from the evidence. It’s hard to watch the videos and not come away impressed. Peter seems to have a photographic memory for every detail of every study he’s ever read. He has some kind of 3D model in his brain of Wuhan, the wet market, and how all of its ventilation ducts and drains interacted with each other. Whenever someone challenged one of his points, he had a ten-slide PowerPoint presentation already made up to address that particular challenge, and would go over it with complete fluency, like he was reciting a memorized speech. I sometimes get accused of overdoing things, but I can’t imagine how many mutations it would take to make me even a fraction as competent as Peter was. Saar’s closing argument included the admission: Peter, I think everyone can agree, has much more knowledge on [COVID] origins than we do. He's invested much more time. He may be a much more talented researcher. He's much more into the details. He probably knows the best in the world on origins at this point. Once you’ve described your opponent that way in your closing argument, what’s left of your case? Saar thought a lot was left. Throughout the debate, he tried to make a point about how getting the inference right was more important than winning sub-sub-sub-debates about individual lines of evidence. Although Peter won most specific points of contention, Saar thought that if the judges could just keep their mind on the big picture, they would realize a lab leak was more likely. 
I’m potentially sympathetic to arguments like Saar’s. Imagine a debate about UFOs. Imaginary-Saar says “UFOs can’t be real, because it doesn’t make sense for aliens to come to Earth, circle around a few fields in Kansas, then leave without providing any other evidence of their existence.” Imaginary-Peter says “John Smith of Topeka saw a UFO at 4:52 PM on 6/12/2010, and everyone agrees he’s an honorable person who wouldn’t lie, so what’s your explanation of that?” Saar says “I don’t know, maybe he was drunk or something?” Peter says “Ha, I’ve hacked his cell phone records and geolocated him to coordinates XYZ, which is a mosque. My analysis finds that he’s there on 99.5% of Islamic holy days, which proves he’s a very religious Muslim. And religious Muslims don’t drink! Your argument is invalid!” On the one hand, imaginary-Peter is very impressive and sure did shoot down Saar’s point. On the other, imaginary-Saar never really claimed to have a great explanation for this particular UFO sighting, and his argument doesn’t depend on it. Instead of debating whether Smith could or couldn’t have been drunk, we need to zoom out and realize that the aliens explanation makes no sense. The problem was, Saar couldn’t effectively communicate what his big picture was. Neither deployed some kind of amazingly elegant prior. They both used the same kind of evidence. The only difference was that Peter’s evidence hung together, and Saar’s evidence fell apart on cross-examination. I think - not because Saar really explained it, but just reading between the lines - Saar thought the un-ignorable big picture evidence was the origin in a city with a coronavirus gain-of-function lab, and the twelve-nucleotide insertion in the furin cleavage site. To some degree, Peter just ate the loss on those questions. No matter how you slice it, it really is a weird coincidence that the epidemic started so close to Asia’s biggest coronavirus laboratory. 
Peter tried to deflect this - he pointed out there were other BSL-3 and BSL-4 laboratories in Beijing, Shanghai, Shenzhen, etc. But this was a rare question where he unambiguously came out looking worse - the other cities’ labs had much less coronavirus-specific research. Wuhan really was unique (aside from the other big coronavirus lab in North Carolina). Peter did better when he tried to control the damage: there are a couple hundred million people in the South Asian areas where people eat weird animals exposed to virus-infected bats, Wuhan has a population of about 12 million, so maybe 1.5% of all potential zoonotic pandemics should start in Wuhan. Peter tried to argue that Wuhan was a local trade center, so maybe we should up that to 5 - 10%. 5 - 10% coincidences aren’t that rare. Even 1.5% coincidences happen sometimes. Likewise, the furin cleavage site really does stand out on a genetic map. I didn’t feel like either side did much math to quantify how weird it was. Naively, I might think of this as “30,000 bases in COVID, only one insertion, it’s in what’s obviously the most interesting place - sounds like 30,000-to-one odds against”. Against that, a virus with a boring insertion would never have become a pandemic, so maybe you need to multiply this by however much viral evolution is going on in weird caves in Laos, and then you would get the odds that at least one virus would have an insertion interesting enough to go global. Neither participant calculated this in a way that satisfied me (though see here for related discussion). Instead, Peter tried to undermine the furin argument by showing that, as surprising as the site was under a natural origin, it would be an even more surprising choice for human engineers. Saar argued it wasn’t - but because of his policy of giving adjusted-for-model-error odds, he only gave this a factor of 30 in his analysis.
Since Peter gave it a higher factor of 50 in his analysis, it looked from the outside like Saar had already conceded this point, and the judges were mostly happy to go with Saar’s artificially-low estimate. The Math: Double Coincidences Saar brought up an interesting point halfway through the debate: you should rarely see high Bayes factors on both sides of an argument. That is, suppose you accept that there’s only a 1-in-10,000 chance that the pandemic starts at a wet market under lab leak. And suppose you accept there’s only a 1-in-10,000 chance that COVID’s furin cleavage site could evolve naturally. If lab leak is true, then you might find 1-in-10,000 evidence for lab leak. But it’s a freak coincidence that there was 1-in-10,000 evidence for zoonosis5. Likewise, if zoonosis is true, you might find 1-in-10,000 evidence for this true thing. But it’s a freak coincidence that there was 1-in-10,000 evidence for lab leak. Either way, you’re accepting that a 1-in-10,000 freak coincidence happened. Isn’t it more likely you’ve bungled your analysis? I was following along at home, and I definitely bungled this point; I had some high Bayes factors on both sides. I adjusted some of them downward based on Saar’s good point, but how far should we take it? Here I remember The Pyramid And The Garden: you can get very strong coincidences if you have many degrees of freedom, ie buy a lot of lottery tickets. So for example, suppose there are fifty things about a virus. You should expect at least one of those to have a one-in-fifty coincidence by pure chance. What about more than that? You might be able to get away with this by saying there are an infinite number of possible conspiracy theories, and some from that infinite set are brought into existence when a strong enough coincidence makes them plausible. For example, it’s really weird that John Adams and Thomas Jefferson both died on the 50th anniversary of the Declaration of Independence. 
If I wanted, I could form a conspiracy theory about a group of weird assassins obsessed with killing Founding Fathers on important dates, and then Jefferson and Adams’ deaths would be 1/10,000 evidence for that theory. But this is the Texas Sharpshooter Fallacy, which Saar warned against several times. I don’t know if “the virus started in Wuhan, which is where they’re doing this research” gets a Texas Sharpshooter penalty, or how high that penalty should be. But the furin cleavage site doesn’t - people were talking about lab leak before anyone noticed it. The Aftermath: Peter Peter seemed satisfied with the result, in an understated sort of way: It seemed like an interesting experiment in monetizing the debunking of a conspiracy theory. I think there's usually a big asymmetry where it's easy to get rich spreading bullshit (like, the top anti-vaxxers during the pandemic all made a million dollars a year on substack), but it's almost impossible to make money on debunking it. The Rootclaim challenge seemed like one rare case where the opposite was true. Beyond that, I don't know what it's good for. It does seem like there could be a positive social impact from more people understanding that the lab leak hypothesis is (almost certainly) false. The Aftermath: Saar Saar says the debate didn’t change his mind. In fact, by the end of the debate, Rootclaim released an updated analysis that placed an even higher probability on lab leak than when they started. In his blog post, he discussed the issues above, and said the judges had erred in not considering them. He respects the judges, he appreciates their efforts, he just thinks they got it wrong. Although he respected their decision, he wanted the judges to correct what he saw as mistakes in their published statements, which delayed the public verdict and which Viewers Like You did not appreciate: I ran an early draft of this post by him.
There was some miscommunication about the exact publication date, so he hasn’t had time to write up a full response, but he has some quick thoughts (and I’ll link the full response when he writes it). He says: We will provide a full response to this post soon, but the main problem with it is fairly simple: There is general agreement that the main evidence for zoonosis is HSM (Huanan Seafood Market) forming an early cluster of cases. The contention is whether it is amazing 10,000x evidence, or is it negligible. All other evidence points to a lab leak, and if HSM is shown to be weak, lab leak is a clear winner. We provided an analysis of why it is negligible that is as close to mathematical proof as such things can be. Read it here. Scott and I exchanged a few emails on this issue and Scott preferred to discuss more intuitive analyses of HSM, using rules of thumb that likely served him well in the past. While I believe I managed to mostly explain where these failed, and Scott understands HSM is far weaker evidence than he initially thought6, he still has a very strong intuitive feeling (based on years of dealing with probabilities) that this is some exceptional coincidence, and that prevents him from properly updating his posterior. At the end of the day, this cannot be settled without going through our semi-formal derivation, understanding it, and either identifying the problem with it or accepting it (and thereby accepting lab-leak to be more likely). Here is a quick summary of the mistakes made by those claiming HSM is strong evidence: The first mistake is conflating Bayes factors with conditional probabilities. 1/10000 is the supposed conditional probability p(HSM|Lab Leak). That should be divided by the conditional probability of HSM under Zoonosis.
Markets were not identified as a high-risk location prior to this outbreak (This will be elaborated in the full response), and in SARS1 the spillovers were mostly at restaurants and other food handlers that deal more closely with wildlife. While it's cool to point to the raccoon dog photo, that was a result of a retrospective search (we don't know what other photos they took which in retrospect would be brought up as premonition). Unbiased data shows markets are not a likely spillover location for zoonosis. We originally estimated p(HSM|Zoonosis)<0.1. Following more research we did to answer Scott's questions, this is more likely <0.03.
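The distinction Saar draws between a conditional probability and a Bayes factor can be sketched as follows. This takes the disputed 1/10,000 figure as given, purely to show how much the ratio moves with his estimates of p(HSM|Zoonosis); the value of 1.0 is a hypothetical stand-in for the "conflated" reading.

```python
def bayes_factor(p_evidence_given_a, p_evidence_given_b):
    """Ratio of likelihoods: how strongly the evidence favors A over B."""
    return p_evidence_given_a / p_evidence_given_b


p_hsm_given_lab_leak = 1 / 10_000  # Peter's figure, which Saar disputes

# 1.0 is a hypothetical "market origin was certain under zoonosis" case;
# 0.1 and 0.03 are Saar's stated upper bounds from the passage above.
for p_hsm_given_zoonosis in (1.0, 0.1, 0.03):
    bf = bayes_factor(p_hsm_given_zoonosis, p_hsm_given_lab_leak)
    print(f"p(HSM|Zoonosis) = {p_hsm_given_zoonosis}: factor of about {bf:,.0f}")
```

Since Saar gives upper bounds, these are upper bounds on the factor under his numbers. The sketch only covers this first objection; his separate claim that 1/10,000 is itself far too low would shrink the factor further.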
A separate market on the lab leak hypothesis itself shifted less, from about 70% to 60%. This could either be because bettors thought Peter was a great debater but wasn’t actually right, or because most people in this (very large) market didn’t even watch the debate. In general I’m not optimistic about markets with no plausible way of ever being resolved.
This alone isn’t fatal to lab leak. It’s perfectly possible for the lab to leak (let’s say) November 5th, the virus spreads a bit, and then a month later someone goes to the wet market, coughs on a vendor, and starts the officially recognized pandemic. But if that were true, you’d expect (let’s say) 30 cases by early December. Let’s say the wet market vendor was exactly Case # 30. She infected the other wet market vendors, starting a pandemic with an obvious center at the wet market and lots of infected wet market vendors and patrons. What about Case # 29? If they were (let’s say) a barista, how come they didn’t infect people at their coffee shop? How come there wasn’t a second obvious cluster radiating out from a coffee shop, lots of coffee-shop-linked cases, etc? How come there weren’t 30 equally-sized clusters? In order to avoid this, you either need to claim that the wet market was a perfect superspreader location, or that the pattern with lots of cases in the wet market and few-to-none anywhere else was a result of ascertainment bias. Saar made both those arguments during the debate, but I thought Peter rebutted them effectively. 1.4: COVID in Brazilian wastewater Nicholas Halden (blog) writes: What should we make of this study, which found the presence of covid in Brazilian wastewater in late 2019? Consider the doubling times. The study says that scientists working in late 2020 found COVID in samples of Brazilian wastewater from November 27, 2019. This was long before the first detected case of transmission in Brazil on March 13, 2020. Between November 27, 2019 and March 13, 2020 is about 16 weeks, so 32 COVID doubling times. 32 doubling times with no lockdown is enough time for COVID to infect every single person in Brazil. If COVID had infected everyone in Brazil before the first recognized case, we would have noticed. 
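The doubling arithmetic can be checked in a few lines. This sketch starts from a single case and uses a 3.5-day doubling time; Brazil's population figure is an approximation I am supplying, not a number from the study.

```python
DOUBLING_TIME_DAYS = 3.5         # rough doubling time with no lockdown
WEEKS_ELAPSED = 16               # Nov 27, 2019 to Mar 13, 2020, roughly
BRAZIL_POPULATION = 210_000_000  # approximate; an assumption for scale

doublings = WEEKS_ELAPSED * 7 / DOUBLING_TIME_DAYS
cases_if_unchecked = 2 ** int(doublings)  # starting from a single case

print(doublings, f"{cases_if_unchecked:,}")
print(cases_if_unchecked > BRAZIL_POPULATION)
```

Thirty-two uninterrupted doublings from one case gives over four billion infections, far more than the country's population, so the conclusion does not depend on the exact doubling time.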
(again, COVID doubling time isn’t exactly invariably 3.5 days, but here we’re talking about numbers big enough that the exact details don’t matter very much) So if COVID was in Brazil on November 27, it must have fizzled out instead of going pandemic. How likely is that? If one person had COVID, it’s not too unlikely - not all COVID cases transmit it forward. If (let’s say) twenty people had COVID, it’s very unlikely - at that point, the law of large numbers takes over; in a freak coincidence, every single patient would have to fail to infect anyone else. So almost certainly fewer than 20 people in Brazil had COVID on November 27. So which is more likely - that somehow 20 people had COVID long before the virus was officially detected, and on a totally different continent, yet somehow a scientist looking through wastewater found the water from exactly those people and managed to detect the virus? Or that there was a sampling error, which happens all the time in these kinds of things? Peter wrote a blog post on some of these issues. He found that there were positive tests from wastewater samples as early as March 2019, which doesn’t fit anyone’s timeline, including lab leakers’. And most of these positives (including the Brazilian sample) contained later strains of the virus with mutations it picked up late in 2020. So these were almost certainly false positives from contamination. 1.5: Biorealism’s 16 arguments Biorealism has a list of sixteen arguments, which he liked so much that he posted it three times in the ACX comments, twice on Less Wrong, twice on Manifold, and about a dozen times on Twitter under multiple account names. Some posts were slightly different from others, but a typical version is: Importantly, Miller incorrectly claimed the N501Y mutation would result from passage in hACE2 mice (mixed them up with BALB/c mice). The major papers Miller relied on have been seriously challenged since the debate.
See Stoyan and Chiu (2024), Weissman (2024), Bloom (2023) and Lv et al (2024). Overall the circumstantial evidence makes lab v plausible: Peter admitted getting this wrong during the debate. I think this very minor point about mice mutations was approximately his only mistake in 15 hours of debating, and he admitted it as soon as he noticed. Biorealism somehow heard about this (obviously not through watching the debate, as we’ll see in a moment), then left about 20-30 comments starting with it, under various accounts, on various platforms, as if it somehow discredited Peter. This is making me somewhat less charitable to him and his 16 arguments than I would be otherwise. 1. Chinese researchers Botao & Lei Xiao observed lab origin was likely given the nearest known relatives to SARS-CoV-2 were far from Wuhan. Wuhan Institute of Virology (WIV) sampled SARS-related bat coronaviruses where the nearest relatives are found in Yunnan, Laos and Vietnam ~1500km away. They refuse to share their records. The ancestral viruses of SARS were found equally far from where SARS spilled over into humans, so we know it’s possible (and likely) for viruses to travel that far. 2. Patrick Berche, DG at Institut Pasteur in Lille 2014-18, notes you would expect secondary outbreaks if it arose via the live animal trade. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10234839/ There are constant outbreaks of weird coronaviruses in animal handlers. See eg this paper, which estimates about 60,000 of these per year. None of these ever go anywhere, because the farmers are in rural areas that aren’t dense enough to sustain a high R0, and the epidemic fizzles out after a single digit number of cases. Any early outbreaks of COVID would have vanished into this long and mostly unnoticed list. 3. Molecular data: Only sarbecovirus with a furin cleavage site. Well adapted to human ACE2 cells. Low genetic diversity indicating a lack of prior circulation (Berche 2023). 
Restriction site SARS-CoV-2 BsaI/BsmBI restriction map falls neatly within the ideal range for a reverse genetics system and used previously at WIV and UNC. Ngram analysis of the codon usage per Professor Louis Nemzer https://twitter.com/BiophysicsFL/status/1667232580255490053?t=IJgitS5cw364ioclzVWxaA&s=19 The SARS2 backbone is very low in CG and CpG. While the 12-nt insert that gives it the FCS is extremely high in both. Almost as if it was some kind of chimera of a consensus sequence and a codon-optimized polybasic cleavage site? https://twitter.com/BiophysicsFL/status/1752800486837678377?t=EpIRgyybJVaPgeMP5xdstA&s=19 https://www.biorxiv.org/content/10.1101/2022.10.18.512756v1 https://link.springer.com/article/10.1007/s10311-021-01211-0?fbclid=IwAR1HMUMtLIAFOFppVasQDeoIAYrVhP8j4YoPO4wnaTOUiKLsllZl_oKryOw Most of this was discussed extensively in the second session of the debate, which I recommend. The CGG-CGG arginine codon usage is particularly unusual but used in synthetic biology. I asked a synthetic biologist about this. He said: » “Nope. I would literally never do this if I was designing a small insert (maybe I wouldn't notice if it happened by chance with ~1 in 25 odds in a naive codon optimization algorithm as part of a larger sequence). High GC% is bad. Tandem repeat is worse. Several other perfectly fine arginine codons. And I wouldn't engineer a viral genome using human codon usage. An engineer would not do it.” 4. DEFUSE full proposal: virus 20% different from SARS1, consensus seq assembled with 6 segments, without disrupting coding seq, BsmBI order, FCS. SARS2: 20% different than SARS1, 6 evenly spaced fragments w BsmBI and BsaI restriction sites, FCS. Jesse Bloom, Jack Nunberg, Robert Townley, Alexandre Hassanin have observed this workflow could have lead to SARS-CoV-2. Work often begins before funding sought or goes ahead anyway. Re: 4 - Also scattered across second section of debate, also not going to retread 5. Market cases were all lineage B. 
Lv et al (2024) indicates there was a single point of emergence and A came before B. So market cases not the primary cases. See also Bloom (2021), Kumar et al (2022). Peter Ben Embarek said there were likely already thousands of cases in Wuhan in December 2019. https://t.co/50kFV9zSb6 https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/34398234/ https://academic.oup.com/bioinformatics/article/38/10/2719/6553661
There was a Lineage A sample in the market; lab leak proponents just try to ignore/dismiss/conspiracize it away. The first two known Lineage A cases were very close to the market. Lv (is this even a real name? It sounds like a Roman numeral? But I guess that’s what you expect in a country ruled by someone named Xi) found some weird COVID variants in Shanghai that might or might not mean anything; you can see some discussion of the implications here, but I don’t think they’re strong evidence either way. If A was first, it means some really weird coincidences have to happen to give us the spread rates and genetic clock data we get, but they’re not necessarily weirder in the zoonosis hypothesis than the lab leak one. The claim that there were “thousands of cases in Wuhan in December 2019” is very easy to disprove by doubling rate arguments like the one above, by the blood bank study mentioned above, by the WHO’s failed case search, and by many other lines of argument.
6. Evidence for lineage A in the market is based on a low quality sample according to Liu et al. (2023).
I really think lab leakers need to decide whether they think China is a sinister actor trying to cover up the truth, or whether they should trust every offhand comment by Chinese government officials as gospel. Dr. Liu doesn’t explain in what sense he thinks the Lineage A sample is “low-quality”, and the Western scientists who I asked about this said they didn’t understand this complaint and that the sample was fine.
A Western team re-analyzing the same sample describes it as “conclusively contain[ing] Lineage A.” I think most lab leakers have switched from trying to deny the genetics to claiming that this was “contamination”, which also doesn’t make sense (the sample is genetically very early). Note that aside from this sample, the first two Lineage A cases discovered were both very close to the wet market.
7. Bloom (2023) shows market samples do not support market origin. There is also no evidence of transmission in the claimed susceptible animals elsewhere. https://academic.oup.com/ve/advance-article/doi/10.1093/ve/vead089/7504441
Discussed extensively in my article as well as the first section of the debate.
8. Lineage A and B only two mutations apart. François Balloux, Bloom and Virginie Courtier-Orgogozo note this is unlikely to reflect two separate animal spillovers as opposed to incomplete case ascertainment of human to human transmission (Bloom 2021).
Discussed extensively in my article as well as the first section of the debate.
9. Sampling bias. George Gao, Chinese CDC head at the time, acknowledged to the BBC that they may have focused too much on and around the market and missed cases on the other side of the city. David Bahry outlines the documented bias. Michael Weissman has shown this mathematically. https://journals.asm.org/doi/10.1128/mbio.00313-23 https://academic.oup.com/jrsssa/advance-article-abstract/doi/10.1093/jrsssa/qnae021/7632556
Re: Dr. Gao, see above comment about Chinese officials. See the section Ascertainment Bias below for why I disagree with this specific claim, which also addresses the Michael Weissman argument.
10. Spatial statistics experts show the Worobey claim the market was the early epicentre was flawed.
https://academic.oup.com/jrsssa/advance-article-abstract/doi/10.1093/jrsssa/qnad139/7557954
Re: 10 - See Confirmation Of The Centrality Of The Huanan Market Among Early COVID-19 Cases, a response to the paper you cite:
The centrality of Wuhan's Huanan market in maps of December 2019 COVID-19 case residential locations, established by Worobey et al. (2022a), has recently been challenged by Stoyan and Chiu (2024, SC2024). SC2024 proposed a statistical test based on the premise that the measure of central tendency (hereafter, "centre") of a sample of case locations must coincide with the exact point from which local transmission began. Here we show that this premise is erroneous. SC2024 put forward two alternative centres (centroid and mode) to the centre-point which was used by Worobey et al. for some analyses, and proposed a bootstrapping method, based on their premise, to test whether a particular location is consistent with it being the point source of transmission. We show that SC2024's concerns about the use of centre-points are inconsequential, and that use of centroids for these data is inadvisable. The mode is an appropriate, even optimal, choice as centre; however, contrary to SC2024's results, we demonstrate that with proper implementation of their methods, the mode falls at the entrance of a parking lot at the market itself, and the 95% confidence region around the mode includes the market. Thus, the market cannot be rejected as central even by SC2024's overly stringent statistical test.
I think this response is pretty strong. In one analysis, they show that even though the other paper’s methodology is worse than theirs, if you apply it correctly (instead of inappropriately excluding various cases like the paper’s authors did), the center of all early cases in Hubei province lands on the wet market parking lot.
In another analysis, they show that the other paper’s recommended tests wouldn’t have correctly pointed to the offending water pump in the famous John Snow cholera outbreak, but theirs would have. Still, I think it’s useful to supplement fancy statistics with normal common sense, so I recommend just looking at the map of early cases: …and deciding whether you think the assumptions behind a specific statistical test are likely to debunk the idea that cases are centered around the wet market.
11. Wuhan used as a control for a 2015 serological study on SARS-related bat coronaviruses due to its urban location. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6178078/
I don’t know why this point is supposed to matter. If you mean that Wuhan isn’t directly exposed to bats, nobody ever said it was. The zoonotic theory is that wildlife carted in from other areas of China started the pandemic in the wet market.
12. Superspreader events also seen at wet markets in Beijing and Singapore (Xinfadi and Jurong).
This was discussed very extensively in the debates, both in section 1 and section 3. Wet markets weren’t “superspreader locations” - in fact, the disease spread no more quickly there than anywhere else. They were the first place in those cities that the pandemic started, due to contaminated animal products. If anything, this supports zoonosis. See also my discussion with Saar on this point below.
13. WIV refuse to share their records with NIH who terminated subaward in 2022. Wider suspension over biosafety concerns. https://www.bloomberg.com/news/articles/2023-07-18/us-suspends-wuhan-institute-funds-over-covid-stonewalling
Although WIV has not been especially forthcoming, some of their databases were leaked in various ways and showed that they did not have any viruses capable of transforming into COVID.
14. PLA involvement at WIV and MERS research prior to SARS-COV-2. MERS features several similarities with SARS-CoV-2.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7022351/
I can’t even tell what conspiracy theory you’re trying to propose with this one; if you spell it out I can try to explain why it might be false.
15. SARS1 leaked several times and SARS-COV-2 has leaked from a BSL-3 lab in Taiwan.
Agreed that SARS leaked several times. It also spilled over from animals several times. During the debate, a lab leak rate of once per lab per 500 years was proposed (everyone agreed to steelman this by 10x for WIV numbers); I would be interested to know whether anything about the study of SARS challenges that number.
16. Unpublished infectious clone identified from Wuhan contradicting arguments that such reverse genetics systems would be published. https://www.biorxiv.org/content/10.1101/2023.02.12.528210v1.full
I asked some scientists about this paper and here’s what they told me. Wuhan University sequenced some rice. In the middle of the sequence, there’s an unexpected sequence from a common coronavirus, HKU4. The most likely explanation is that someone else in Wuhan was working on the coronavirus and there was cross-contamination. Plausibly this is Wuhan Institute of Virology, which is known to work with coronaviruses. This is cool detective work, but it’s not clear what it’s supposed to prove. I think some lab leakers are using it to prove that WIV can do reverse genetics, but they admitted this already in a published paper so that’s not too helpful. I think others are using it to prove WIV had “secret viruses” in their catalogue, but the rice virus wasn’t secret, it was HKU4, which is common and which WIV has already published papers about.
1.6: DrJayChou’s 7 Arguments
Once again, I cannot stress enough how much better a take you might have on this debate if you watch it.
“The first known case predates the market outbreak by a month” - this is not the consensus position. I cannot say for sure what Dr.
Chou means by this, but I suspect he’s referring to one of the many claims to this effect that Peter effectively debunked during the debate (Connor Reed, Mr. Chen, the 92 cases, Brazil, etc).
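For readers who want to see what the leak-rate figure in the reply to argument 15 implies, here is a minimal sketch. It takes the debate's proposed rate (once per lab per 500 years) and the agreed 10x steelman for WIV, and converts them into a chance of at least one leak over some number of years. Modelling leaks as a Poisson process is my assumption, not something the debate specified, and the function name and parameters are made up for illustration.

```python
import math


def prob_at_least_one_leak(years, base_rate_per_year=1 / 500, steelman_factor=10.0):
    """Chance of one or more leaks from a single lab over `years`.

    Assumes leaks arrive as a Poisson process (an illustrative assumption).
    base_rate_per_year: the "once per lab per 500 years" figure from the debate.
    steelman_factor: the 10x multiplier both sides agreed to apply for WIV.
    """
    rate = base_rate_per_year * steelman_factor  # 1/50 per year with the defaults
    return 1.0 - math.exp(-rate * years)


if __name__ == "__main__":
    for years in (1, 5, 10):
        print(years, round(prob_at_least_one_leak(years), 3))
```

With the defaults this works out to roughly a 2% chance per year, or about 18% over a decade. Anyone who thinks the study of SARS should change the base rate can swap in their own number and see how much it moves the result.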
Inline links: blog, writes, this study, wrote a blog post on some of these issues, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10234839/, this paper, https://twitter.com/BiophysicsFL/status/1667232580255490053?t=IJgitS5cw364ioclzVWxaA&s=19, https://twitter.com/BiophysicsFL/status/1752800486837678377?t=EpIRgyybJVaPgeMP5xdstA&s=19, https://www.biorxiv.org/content/10.1101/2022.10.18.512756v1, https://link.springer.com/article/10.1007/s10311-021-01211-0?fbclid=IwAR1HMUMtLIAFOFppVasQDeoIAYrVhP8j4YoPO4wnaTOUiKLsllZl_oKryOw, https://t.co/50kFV9zSb6, https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/34398234/, https://academic.oup.com/bioinformatics/article/38/10/2719/6553661, here, describes it as, https://academic.oup.com/ve/advance-article/doi/10.1093/ve/vead089/7504441, https://journals.asm.org/doi/10.1128/mbio.00313-23, https://academic.oup.com/jrsssa/advance-article-abstract/doi/10.1093/jrsssa/qnae021/7632556, https://academic.oup.com/jrsssa/advance-article-abstract/doi/10.1093/jrsssa/qnad139/7557954, Confirmation Of The Centrality Of The Huanan Market Among Early COVID-19 Cases, https://substackcdn.com/image/fetch/$s_!BNAm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffd4cddb-6e3e-41f5-8ef6-ec0b27bec600_626x426.webp, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6178078/, https://www.bloomberg.com/news/articles/2023-07-18/us-suspends-wuhan-institute-funds-over-covid-stonewalling, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7022351/, https://www.biorxiv.org/content/10.1101/2023.02.12.528210v1.full, a published paper, has already published papers about, https://substackcdn.com/image/fetch/$s_!yA9U!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F467dd304-190a-4437-8920-d498c433dffb_1600x960.jpeg
Statements by two dissenting CFTC commissioners (1, 2) on why they oppose.
Pivotal Act
Manifold Markets says they’re pivoting to a new model combining play money points and real-money gambling. Manifold may be a beloved local fixture, but their growth and revenue aren’t too impressive.
In the interests of continuing to exist and push prediction markets forward, they will switch to a “sweepstakes” model. Although gambling is illegal in most US states and requires complicated licensing in others, there’s a “sweepstakes loophole”; companies are allowed to offer “prize sweepstakes”, and you can use this to sort of reconstruct the concept of gambling in a legal way. You don’t give the company money and get back money. You pay for “points”, get “sweepstakes tokens” as a bonus, gamble the “sweepstakes tokens”, and then cash in the sweepstakes tokens for money. This is a pretty surprising loophole, but it’s already used by sites like Chumba Casino and Fliff.
(And apparently it creates weird incentives! In order to maintain the fiction of being a “sweepstakes”, these casinos have to give you “tokens” if you request them by mail. If you send a postcard to Chumba Casino asking for free money, they’ll give it to you, $5 per postcard. Is this an infinite free money pump? My impression is in theory yes, but the postcards have to be handwritten in a very specific way, the company sometimes rejects them for weird reasons, the cost of materials and mailing lowers your profit to more like $4, and so you’d have to hand-write 250 postcards to make $1,000. I’m still surprised more people don’t do this.)
Because real money is involved, Manifold will have to tighten the rules on markets, including banning N/A resolutions. You can see a full list of changes here.
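As a quick check on the postcard arithmetic in the parenthetical above, here is a minimal sketch. The $5 payout and the roughly $4 net profit per postcard come from the text; the $1 cost per card is just the difference between them, and the `rejection_rate` parameter is a hypothetical knob I added because the text says rejections happen without saying how often.

```python
import math


def postcards_needed(target_profit, payout=5.0, cost_per_card=1.0, rejection_rate=0.0):
    """Postcards to hand-write to expect `target_profit` dollars of profit.

    payout: dollars credited per accepted postcard ($5 in the text).
    cost_per_card: materials plus mailing (about $1, implied by the ~$4 net).
    rejection_rate: hypothetical fraction of postcards the company rejects.
    """
    expected_net = payout * (1.0 - rejection_rate) - cost_per_card
    if expected_net <= 0:
        raise ValueError("expected profit per postcard is not positive")
    return math.ceil(target_profit / expected_net)


if __name__ == "__main__":
    print(postcards_needed(1000))                      # 250, matching the text
    print(postcards_needed(1000, rejection_rate=0.2))  # 334 if a fifth are rejected
```

Even a modest rejection rate pushes the handwriting burden up noticeably, which may be part of why this is less popular than it sounds.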
Manifold users are split between acknowledging that the for-profit company they love needs some way to make money, being salty about the changes, and being worried that creating more of a casino atmosphere will be bad for users / the world / ability to function as a good prediction market. (I understand most of the NO vote here is based on the theory that there will be legal intervention - maybe because the government is willing to tolerate sweepstakes casinos but not sweepstakes prediction markets).
Manifold co-founder Austin Chen won’t be involved. He’s leaving the site - not explicitly because of the pivot, he just said it seems to be “trapped in local optima”. He plans to focus on other parts of the Manifold empire, especially Manifund, which tests impact markets, regranting, and other “experimental” charity models. Manifold will continue in the hands of the other two co-founders, James and Stephen Grugett.
Superhindcasting
I mentioned this in my lab leak post, but it deserves more attention here: Good Judgment Project’s report on Superforecasting The Origins Of The COVID-19 Pandemic. Good Judgment Project employs superforecasters who will predict things for clients. Some people interested in COVID origins asked them to judge whether lab leak was plausible. Their headline result was 74% zoonosis, 25% lab leak, 1% something else. Part of GJP’s method is getting their forecasters to share sources and talk to each other. Here’s the graph for how that went:
People changed their minds a little over time, but not in a very consistent way that mattered much in the end. What was the “client feedback”? The report says:
Client feedback was provided to the Superforecasters on December 21. The client posed questions to the Superforecasters about their assessments up to that date and asked for their reactions to several studies and articles.
In the days following the client engagement, the Superforecasters lowered their confidence in the natural zoonosis hypothesis from 73% to 67%, although zoonosis remained the most likely potential cause in their assessment. But following an active engagement with recent genomic studies and historical base rates of zoonotic spillovers, those numbers began to return to earlier levels. January also saw increased attention to the geopolitical context and transparency issues, particularly related to research activities in Wuhan.
Is this bad? I’m imagining a pro-lab-leak client saying “But what about [this list of pro-lab-leak arguments]?” and then the superforecasters read them and adjust. In one sense, it’s good that they got to see more arguments; on the other, it seems like a potential route by which clients could bias the results - probabilities never quite got back to where they were before the feedback, though they got pretty close. The last-minute spike for zoonosis might be the Rootclaim debate results, which were released on 2/18. So maybe the client feedback and the Rootclaim results both slightly affected the numbers, but mostly the superforecasters started out pro-zoonosis and stuck to their guns.
Dan Schwarz and the FutureSearch team say that forecasting has a “rationale-shaped hole”. Despite the report making this sound like a pretty intense process, we don’t get much information about details:
In their extensive discussions, Good Judgment’s Superforecasters assessed base rates and historical patterns, existing evidence and scientific analysis, geopolitical context and transparency concerns, trust in intelligence communities, and methodological constraints.
1. Base Rates and Historical Patterns: The Superforecasters frequently referenced base rates, i.e., the history of pandemics emerging from natural zoonosis versus the history of laboratory leaks, to anchor their probabilities.
For the former, they discussed how the base rates are changing as the climate warms and as expanding human populations push farther into natural environments that previously saw little human presence. For the latter, they acknowledged that it has only been 12 years since the advent of CRISPR gene-editing tools, and the base rate of lab leaks in the short synthetic biology era is not yet well established.
2. New Evidence and Scientific Analysis: Throughout the period, the Superforecasters adapted their forecasts in light of new scientific evidence, including genomic analyses of SARS-CoV-2 and its relation to bat viruses, and the debate over potential laboratory manipulation.
3. Geopolitical Context and Transparency Concerns: The geopolitical implications of the virus’s origins, particularly in relation to China’s transparency and the involvement of international research institutions, played a significant role in the analysis. Concerns over data veracity, and over the political ramifications of determining that the pandemic’s origins were other than zoonosis, were extensively debated.
4. Trust in Intelligence: Commentary on trust in intelligence communities and discussions about the impact of geopolitical biases on the interpretation of evidence illustrated the complex interplay between science, politics, and human behavior in assessing the pandemic’s origins.
5. Methodological Critiques and the Evaluation of Evidence: The Superforecasters engaged in methodological critiques of the evidence base, including the scrutiny of laboratory practices and biocontainment levels [...]
In the end, most Superforecasters were in rough agreement on issues like the base rates of zoonotic spillover.
Where they most often disagreed was on the interpretation of actions by Chinese officials and whether their actions reflected how an authoritarian government would react in any crisis over which it did not have full control, or whether those actions were indicative of attempts to cover up a biomedical research-related accident that allowed the SARS-CoV-2 virus to enter circulation in China and, ultimately, the entire globe.
Probably it would be too much to ask to get a transcript of all their discussions - then they’d be nervous saying things that might make them look bad to an audience. What would be a good balance between getting more information and not imposing on their time?
Forecasting is an unusually legible and easy-to-judge domain. One of the theories of change for forecasting was to use it to identify smart people with good reasoning, then turn them loose on less well-behaved problems. This is one of the first big attempts to do this at scale. How did it work? We can’t tell, because it’s inherently an illegible and hard-to-judge domain. Darn. I don’t know what I expected.
Notes From A Local Optimum
Austin’s concern - that forecasting has reached a local optimum - is widely shared. We have some good sites: Manifold, Metaculus, Polymarket, GJO, etc - all doing good work. We have good-ish probabilities for a few important questions. Every so often a news source cites them. Sometimes a decision-maker looks at them behind the scenes, maybe. Is this all there is?
The FutureSearch team says the next step is to focus on “rationale”. We need to use forecasting not just to get a raw probability, but to explain what’s going on and why we think something. Then instead of just convincing policy-makers to trust forecasts, we can tell them why something is true, or inform their discussions even if they’re not willing to blindly trust a number.
Is this a betrayal of the forecasting ethos?
The original dream was that instead of a bunch of people giving arguments, we could just test who was right. Now we’re going back to the arguments? People have argued forever; what does forecasting add to that? Well, it adds the knowledge that the arguments are from people who have been right a lot before and are incentivized to be right again. Still, it’s not a natural fit. Probably it’s relevant here that FutureSearch’s forecasting AI does a really good job of this by default, in a way humans can’t match.
Nuno’s yearly forecasting roundup doesn’t have a single thesis, but the first part is a well-supported complaint that most forecasting sites aren’t good business. They either burn VC money, burn EA donations, or converge towards casinos to support themselves. He gives an honorable exception to Cultivate Labs, which sells prediction market software rather than the results themselves.
Open Philanthropy (billionaire Dustin Moskovitz’s EA-aligned charitable foundation) has at least given forecasting a vote of confidence, recently choosing to promote it to one of their main donation areas. Still, they got a lot of pushback on the decision, for example SuperDuperForecasting here:
This will be a total waste of time and money unless OpenPhil actually pushes the people it funds towards achieving real-world impact. The typical pattern in the past has been to launch yet another forecasting tournament to try to find better forecasts and forecasters. No one cares, we already know how to do this since at least 2012! The unsolved problem is translating the research into real-world impact. Does the Forecasting Research Institute have any actual commercial paying clients? What is Metaculus's revenue from actual clients rather than grants? Who are they working with and where is the evidence that they are helping high-stakes decision makers improve their thought processes?
Incidentally, I note that forecasting is not actually successful even within EA at changing anything: superforecasters are generally far more relaxed about Xrisk than the median EA, but has this made any kind of difference to how EA spends its money? It seems very unlikely.
And Marcus Abramovich here:
I'm in the process of writing up my thoughts on forecasting in general and particularly EA's reverence for forecasting but I feel, similar to @Grayden that forecasting is a game that is nearly perfectly designed to distract EAs from useful things. It's a combination of winning, being right when others are wrong and seemingly useful, all wrapped into a fun game. I'd like to see tangible benefits to more broad funding of forecasting that seems to be done in the millions and tens of millions of dollars. I would also be the type of person you would think would be a greater fan of forecasting. I'm the number one forecaster on Manifold and I've made tens of thousands of dollars on Polymarket. But I think we should start to think of forecasting as more of a game that EAs like to play, something like Magic the Gathering that is fun and has some relations to useful things but isn't really useful by itself.
Eli Lifland has a long and hard-to-summarize comment here, response from Ozzie Gooen here, podcast between them on “Is Forecasting A Promising EA Cause Area?” here.
I’m split on this. My previous hope was that the field would gradually grow, without any qualitative changes or discontinuities, until it became big enough that journalists and policy-makers were aware of it and took it seriously (compare eg the growth of the Internet as a scholarly resource). I think the strongest argument against this is Manifold’s relatively flat user numbers.
Is there a new hope? I think if nothing else, forecasting might be useful as a testing ground: First, to create forecasting AIs (like FutureSearch) which can then get consulted on a variety of questions, eg by policy-makers.
The biggest holdup has always been the need to gather 20 or 50 or however many hard-to-find superforecasters for whatever question you’re asking, and then trust their advice even though they’re fallible fleshbag humans. If you can use the 20 to 50 superforecasters to inspire an AI, and then test the AI and prove it’s good, people might be more interested. This is especially true if the AI can branch out beyond traditional forecasting questions. Once we have a few of these, we can start comparing the next generation of AIs to the previous generation, and skip the superforecasters.
Inline links: 1, 2, Manifold Markets says, https://manifold.markets/stats, Chumba Casino, Fliff, have to be handwritten in a very specific way, rejects them, more people, https://manifold.markets/Joshua/good-pivot-bad-pivot-which-opinions, leaving the site, Manifund, report on Superforecasting The Origins Of The COVID-19 Pandemic., https://substackcdn.com/image/fetch/$s_!F-e7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F679c34d2-766f-41bd-ae75-b036bcdb06f9_1456x849.webp, https://substackcdn.com/image/fetch/$s_!JSEn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c315554-fca5-4dc5-ac9a-d531b4ce7534_883x519.png, forecasting has a “rationale-shaped hole”, Nuno’s yearly forecasting roundup, Cultivate Labs, choosing to promote it to one of their main donation areas, here, superforecasters are generally far more relaxed about Xrisk, here, @Grayden, here, here, “Is Forecasting A Promising EA Cause Area?” here
Probably no effect on Manifold’s pivot, see below.
In the interests of continuing to exist and push prediction markets forward, they will switch to a “sweepstakes” model. Although gambling is illegal in most US states and requires complicated licensing in others, there’s a “sweepstakes loophole”; companies are allowed to offer “prize sweepstakes”, and you can use this to sort of reconstruct the concept of gambling in a legal way. You don’t give the company money and get back money. You pay for “points”, get “sweepstakes tokens” as a bonus, gamble the “sweepstakes tokens”, and then cash in the sweepstakes tokens for money. This is a pretty surprising loophole, but it’s already used by sites like Chumba Casino and Fliff. (and apparently it creates weird incentives! In order to maintain the fiction of being a “sweepstakes”, these casinos have to give you “tokens” if you request them by mail. If you send a postcard to Chumba Casino asking for free money, they’ll give it to you, $5 per postcard. Is this an infinite free money pump? My impression is in theory yes, but the postcards have to be handwritten in a very specific way, the company sometimes rejects them for weird reasons, the cost of materials and mailing lowers your profit to more like $4, and so you’d have to hand-write 250 postcards to make $1,000. I’m still surprised more people don’t do this.) Because real money is involved, Manifold will have to tighten the rules on markets, including banning N/A resolutions. You can see a full list of changes here. Manifold users are split between acknowledging that the for-profit company they love needs some way to make money, being salty about the changes, and being worried that creating more of a casino atmosphere will be bad for users / the world / ability to function as a good prediction market. (I understand most of the NO vote here is based on the theory that there will be legal intervention - maybe because the government is willing to tolerate sweepstakes casinos but not sweepstakes prediction markets). 
Manifold co-founder Austin Chen won’t be involved. He’s leaving the site - not explicitly because of the pivot, he just said it seems to be “trapped in local optima”. He plans to focus on other parts of the Manifold empire, especially Manifund, which tests impact markets, regranting, and other “experimental” charity models. Manifold will continue in the hands of the other two co-founders, James and Stephen Grugett. Superhindcasting I mentioned this in my lab leak post, but it deserves more attention here: Good Judgment Project’s report on Superforecasting The Origins Of The COVID-19 Pandemic. Good Judgment Project employs superforecasters who will predict things for clients. Some people interested in COVID origins asked them to judge whether lab leak was plausible. Their headline result was 74% zoonosis, 25% lab leak, 1% something else. Part of GJP’s method is getting their forecasters to share sources and talk to each other. Here’s the graph for how that went: People changed their minds a little over time, but not in a very consistent way that mattered much in the end. What was the “client feedback”? The report says: Client feedback was provided to the Superforecasters on December 21. The client posed questions to the Superforecasters about their assessments up to that date and asked for their reactions to several studies and articles. In the days following the client engagement, the Superforecasters lowered their confidence in the natural zoonosis hypothesis from 73% to 67%, although zoonosis remained the most likely potential cause in their assessment. But following an active engagement with recent genomic studies and historical base rates of zoonotic spillovers, those numbers began to return to earlier levels. January also saw increased attention to the geopolitical context and transparency issues, particularly related to research activities in Wuhan Is this bad? 
I’m imagining a pro-lab-leak client saying “But what about [this list of pro-lab-leak arguments]?” and then the superforecasters read them and adjust. In one sense, it’s good that they got to see more arguments; on the other, it seems like a potential route by which clients could bias the results - probabilities never quite got back to where they were before the feedback, though they got pretty close. The last-minute spike for zoonosis might be the Rootclaim debate results, which were released on 2/18. So maybe the client feedback and the Rootclaim results both slightly affected the numbers, but mostly the superforecasters started out pro-zoonosis and stuck to their guns. Dan Schwarz and the FutureSearch team say that forecasting has a “rationale-shaped hole”. Despite the report making this sound like a pretty intense process, we don’t get much information about details: In their extensive discussions , Good Judgment’s Superforecasters assessed base rates and historical patterns, existing evidence and scientific analysis, geopolitical context and transparency concerns, trust in intelligence communities, and methodological constraints. 1. Base Rates and Historical Patterns: The Superforecasters frequently referenced base rates, i.e., the history of pandemics emerging from natural zoonosis versus the history of laboratory leaks, to anchor their probabilities. For the former, they discussed how the base rates are changing as the climate warms and as expanding human populations push farther into natural environments that previously saw little human presence. For the latter, they acknowledged that it has only been 12 years since the advent of CRISPR gene- editing tools, and the base rate of lab leaks in the short synthetic biology era is not yet well established. 2. 
New Evidence and Scientific Analysis: Throughout the period, the Superforecasters adapted their forecasts in light of new scientific evidence, including genomic analyses of SARS-CoV-2 and its relation to bat viruses, and the debate over potential laboratory manipulation. 3. Geopolitical Context and Transparency Concerns: The geopolitical implications of the virus’s origins, particularly in relation to China’s transparency and the involvement of international research institutions, played a significant role in the analysis. Concerns over data veracity, and over the political ramifications of determining that the pandemic’s origins were other than zoonosis, were extensively debated. 4. Trust in Intelligence: Commentary on trust in intelligence communities and discussions about the impact of geopolitical biases on the interpretation of evidence illustrated the complex interplay between science, politics, and human behavior in assessing the pandemic’s origins. 5. Methodological Critiques and the Evaluation of Evidence: The Superforecasters engaged in methodological critiques of the evidence base, including the scrutiny of laboratory practices and biocontainment levels [...] In the end, most Superforecasters were in rough agreement on issues like the base rates of zoonotic spillover. Where they most often disagreed was on the interpretation of actions by Chinese officials and whether their actions reflected how an authoritarian government would react in any crisis over which it did not have full control, or whether those actions were indicative of attempts to cover up a biomedical research-related accident that allowed the SARS-CoV-2 virus to enter circulation in China and, ultimately, the entire globe. Probably it would be too much to ask for to get a transcript of all their discussions - then they’d be nervous saying things that might make them look bad to an audience. What would be a good balance between getting more information and not imposing on their time? 
Forecasting is an unusually legible and easy-to-judge domain. One of the theories of change for forecasting was to use it to identify smart people with good reasoning, then turn them loose on less well-behaved problems. This is one of the first big attempts to do this at scale. How did it work? We can’t tell, because it’s inherently an illegible and hard-to-judge domain. Darn. I don’t know what I expected.
Notes From A Local Optimum
Austin’s concern - that forecasting has reached a local optimum - is widely shared. We have some good sites: Manifold, Metaculus, Polymarket, GJO, etc - all doing good work. We have good-ish probabilities for a few important questions. Every so often a news source cites them. Sometimes a decision-maker looks at them behind the scenes, maybe. Is this all there is?
The FutureSearch team says the next step is to focus on “rationale”. We need to use forecasting not just to get a raw probability, but to explain what’s going on and why we think something. Then instead of just convincing policy-makers to trust forecasts, we can tell them why something is true, or inform their discussions even if they’re not willing to blindly trust a number.
Is this a betrayal of the forecasting ethos? The original dream was that instead of a bunch of people giving arguments, we could just test who was right. Now we’re going back to the arguments? People have argued forever; what does forecasting add to that? Well, they add the knowledge that the arguments are from people who have been right a lot before and are incentivized to be right again. Still, it’s not a natural fit. Probably it’s relevant here that FutureSearch’s forecasting AI does a really good job of this by default, in a way humans can’t match.
Nuno’s yearly forecasting roundup doesn’t have a single thesis, but the first part is a well-supported complaint that most forecasting sites aren’t good business. 
They either burn VC money, burn EA donations, or converge towards casinos to support themselves. He gives an honorable exception to Cultivate Labs, which sells prediction market software rather than the results themselves. Open Philanthropy (billionaire Dustin Moskovitz’s EA-aligned charitable foundation) has at least given forecasting a vote of confidence, recently choosing to promote it to one of their main donation areas. Still, they got a lot of pushback on the decision, for example SuperDuperForecasting here: This will be a total waste of time and money unless OpenPhil actually pushes the people it funds towards achieving real-world impact. The typical pattern in the past has been to launch yet another forecasting tournament to try to find better forecasts and forecasters. No one cares, we already know how to do this since at least 2012! The unsolved problem is translating the research into real-world impact. Does the Forecasting Research Institute have any actual commercial paying clients? What is Metaculus's revenue from actual clients rather than grants? Who are they working with and where is the evidence that they are helping high-stakes decision makers improve their thought processes? Incidentally, I note that forecasting is not actually successful even within EA at changing anything: superforecasters are generally far more relaxed about Xrisk than the median EA, but has this made any kind of difference to how EA spends its money? It seems very unlikely. And Marcus Abramovich here: I'm in the process of writing up my thoughts on forecasting in general and particularly EA's reverence for forecasting but I feel, similar to @Grayden that forecasting is a game that is nearly perfectly designed to distract EAs from useful things. It's a combination of winning, being right when others are wrong and seemingly useful, all wrapped into a fun game. 
I'd like to see tangible benefits to more broad funding of forecasting that seems to be done in the millions and tens of millions of dollars. I would also be the type of person you would think would be a greater fan of forecasting. I'm the number one forecaster on Manifold and I've made tens of thousands of dollars on Polymarket. But I think we should start to think of forecasting as more of a game that EAs like to play, something like Magic the Gathering that is fun and has some relations to useful things but isn't really useful by itself.
Eli Lifland has a long and hard-to-summarize comment here, response from Ozzie Gooen here, podcast between them on “Is Forecasting A Promising EA Cause Area?” here.
I’m split on this. My previous hope was that the field would gradually grow, without any qualitative changes or discontinuities, until it became big enough that journalists and policy-makers were aware of it and took it seriously (compare eg the growth of the Internet as a scholarly resource). I think the strongest argument against this is Manifold’s relatively flat user numbers.
Is there a new hope? I think if nothing else, forecasting might be useful as a testing ground: First, to create forecasting AIs (like FutureSearch) which can then get consulted on a variety of questions, eg by policy-makers. The biggest holdup has always been the need to gather 20 or 50 or however many hard-to-find superforecasters for whatever question you’re asking, and then trust their advice even though they’re fallible fleshbag humans. If you can use the 20 to 50 superforecasters to inspire an AI, and then test the AI and prove it’s good, people might be more interested. This is especially true if the AI can branch out beyond traditional forecasting questions. Once we have a few of these, we can start comparing the next generation of AIs to the previous generation, and skip the superforecasters.
Inline links: Chumba Casino, Fliff, have to be handwritten in a very specific way, rejects them, more people, here, https://manifold.markets/Joshua/good-pivot-bad-pivot-which-opinions, leaving the site, Manifund, report on Superforecasting The Origins Of The COVID-19 Pandemic., https://substackcdn.com/image/fetch/$s_!F-e7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F679c34d2-766f-41bd-ae75-b036bcdb06f9_1456x849.webp, https://substackcdn.com/image/fetch/$s_!JSEn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c315554-fca5-4dc5-ac9a-d531b4ce7534_883x519.png, forecasting has a “rationale-shaped hole”, Nuno’s yearly forecasting roundup, Cultivate Labs, choosing to promote it to one of their main donation areas, here, superforecasters are generally far more relaxed about Xrisk, here, @Grayden, here, here, “Is Forecasting A Promising EA Cause Area?” here
I assume they chose these three because they’re the only ones discussed enough to have enough data. I am following their lead. I appreciate John and Maxim’s work, but I’m not completely comfortable trusting it. Their model is based on results from Betfair, Smarkets, PredictIt, and Polymarket. But I don’t know much about the first two (as an American, I’m banned from even reading Betfair), and the latter two are notoriously bad at partisan political questions. They usually overestimate Republicans’ chances, partly because Democrats’ opposition to online political betting has turned the pool of online political bettors disproportionately red. While a fluid and easily-accessible prediction market should be able to avoid biases like these, neither PredictIt nor Polymarket really qualifies. The CFTC, which regulates prediction markets, has crippled both - PredictIt has very low maximum investments per market, and Polymarket is crypto-only and banned for US citizens. These have prevented their biases from being corrected and made both of them perform relatively weakly in head-to-head contests. And Stossel/Lott’s focus on betting sites automatically excludes two of the biggest and most historically accurate forecasting engines from their calculation - Metaculus and Manifold. In order to get numbers I trusted more than theirs, I looked at Metaculus, Manifold, PredictIt, and Polymarket, weighting each by how much I trusted it. Here’s what I found: The Biden number is about 4% higher than Nate Silver’s model over the same time period; see below for why that might be. [EDIT 7/2/24: Original version had a miscalculation which decreased everyone’s odds by about 10%. Above version should be correct.] You can find my sources at the bottom of the post. 
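As a rough illustration of what "weighting each by how much I trusted it" could look like, here is a minimal Python sketch of a trust-weighted average. The function name, the probabilities, and the trust weights are all hypothetical placeholders; the post does not give the actual weights or formula, so this is one plausible reading rather than the method used.

```python
def weighted_average(estimates, weights):
    """Trust-weighted mean of probability estimates, one estimate per source.

    `estimates` maps source name -> probability in [0, 1].
    `weights` maps source name -> non-negative trust weight.
    """
    missing = set(estimates) - set(weights)
    if missing:
        raise ValueError(f"no weight given for: {sorted(missing)}")
    total = sum(weights[name] for name in estimates)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(estimates[name] * weights[name] for name in estimates) / total


# Placeholder inputs only: neither the probabilities nor the trust weights
# below come from the post.
example_estimates = {
    "Metaculus": 0.40,
    "Manifold": 0.38,
    "PredictIt": 0.45,
    "Polymarket": 0.43,
}
example_weights = {
    "Metaculus": 3.0,
    "Manifold": 3.0,
    "PredictIt": 1.0,
    "Polymarket": 1.0,
}

combined = weighted_average(example_estimates, example_weights)
print(f"combined estimate: {combined:.4f}")
```

With weights like these, the two sources trusted more pull the combined number toward their own estimates, which is the whole point of weighting instead of taking a plain mean.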
“Explicit” odds are based on questions like “What are the chances of Biden winning if he is the nominee?” “Implied” odds were generated by combining the questions “What is the chance of Biden being the nominee?” and “What is the chance of Biden winning?”; this is safe enough with Biden, but with unlikely nominees like Newsom, some of the percentages can get small enough that they start running into small-number-biases and become less trustworthy. I’ve weighted each market’s explicit calculation higher than their implicit one to compensate. A possible objection to these results: conditional probabilities don’t exactly reflect the intuitive concept of decision-making. That is, we’re not asking “We want to know whether or not to keep Biden, so what are the chances that he’ll win if we do?”, we’re asking the market for the chance that he’ll win, in the set of worlds where people decide to keep him for other reasons. We should expect this to overestimate his performance. That is, imagine that tomorrow, Biden has completely recovered, he easily wins his next debate with Trump, and everyone agrees the most recent debate was just a fluke - in that world, he is both more likely to be nominated and more likely to win. Alternatively, if tomorrow he gets much worse and can’t even speak in full sentences, he’s much less likely to be nominated and much more likely to lose. Since the real world includes both those possibilities, restricting ourselves to the set of worlds where he gets nominated means we’re overestimating the chance that he wins. There are similar-albeit-less-severe problems with other candidates - if we choose Newsom, that might be because he won some kind of debate or process versus Harris and all the other potential replacements. Overall I expect this to be mostly correct, but probably overestimate Biden’s chances by a percent or two relative to others. 
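Both points in the passage above can be sketched in a few lines of Python. The first function computes an "implied" conditional by dividing an unconditional win probability by a nomination probability, which assumes a candidate can only win if nominated; the post does not spell out its exact formula, so this is an assumption. The second part is a toy two-world model of the selection effect. Every number below is made up for illustration.

```python
def implied_conditional(p_win, p_nominee):
    """P(win | nominee) = P(win) / P(nominee), assuming winning requires nomination."""
    if not 0 < p_nominee <= 1:
        raise ValueError("p_nominee must be in (0, 1]")
    if not 0 <= p_win <= p_nominee:
        raise ValueError("p_win must be between 0 and p_nominee")
    return p_win / p_nominee


# Small-number sensitivity (hypothetical long-shot nominee): a one-point
# move in the win market swings the implied conditional enormously.
base = implied_conditional(0.015, 0.03)
bumped = implied_conditional(0.025, 0.03)

# Selection effect, toy model with two equally likely worlds.
# Each world: (probability of world, P(nominated), P(win | nominated)).
worlds = [
    (0.5, 0.9, 0.6),  # he recovers: likely kept, likely to win
    (0.5, 0.3, 0.2),  # he gets worse: unlikely kept, unlikely to win
]

# What the conditional market prices: P(win | nominated) across worlds.
p_nominated = sum(pw * pn for pw, pn, _ in worlds)
p_nominated_and_win = sum(pw * pn * pwin for pw, pn, pwin in worlds)
market_conditional = p_nominated_and_win / p_nominated

# What a decision-maker wants: the win chance if he is kept no matter what.
forced_keep = sum(pw * pwin for pw, _, pwin in worlds)

print(f"implied conditional: {base:.2f} -> {bumped:.2f}")
print(f"market conditional {market_conditional:.2f} vs forced keep {forced_keep:.2f}")
```

In the toy model the market conditional comes out higher than the forced-keep number, because worlds where he is nominated are disproportionately the worlds where he is doing well. The size of the gap here is an artifact of the invented inputs and says nothing about the "percent or two" estimate in the text.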
Along with these three candidates, Metaculus had an explicit “should the Democrats replace Biden?” question: Manifold also asks how Democrats will do if they replace Biden (without specifying a particular replacement): We can compare this to their Biden market… …and find that once again, they expect replacing Biden to go better (though I think 51% is just cope). At the Manifest prediction market conference in early June, I interviewed Nate Silver: …and asked him for his probability that the Democrats would win this election, versus his probability that the Democrats would win conditional on Biden not being the nominee (specifically “drops dead tomorrow of natural causes”). He said 40-45% chance normally, 50% chance without Biden. This was before the debate, but I think it matches the markets’ opinion that switching candidates would help the Democrats’ chances - and this has only become more true since the debate. On the other hand, polls asking people how they would vote in possible matchups don’t show any advantage of alternate candidates over Biden. Here’s the only post-debate poll I could find: And if Biden does need to be replaced, Democrats mostly support Harris, who the prediction markets find least promising: Maybe Democrats are the wrong people to ask - they’re already going to vote Biden, so you want someone who’s more attractive to independents. Of course, in a normal primary it would be Democrats making the decision. But if elites are going to do something behind closed doors, maybe they should take advantage and choose the candidate most likely to win, for once. I think these polls are the strongest objection to the prediction markets’ verdict. You could make an argument where prediction market users are mostly educated liberal white males, and even though they’re incentivized to honestly determine what ordinary people think, they’re too out-of-touch with ordinary people to do so effectively. 
Or they might be over-fixating on “voters don’t like Biden’s senility” without considering that, even if voters didn’t know Biden was currently senile before Thursday, they probably guessed that he would become senile sometime in his four-year term, and had basically accepted that his aides would do the hard work. Maybe they prefer a well-known likeable incumbent over an unknown quantity (and the unknown quantity’s potential new/weird aides), even if the well-known likeable incumbent is senile. Maybe elites know more than we do about how hard it is to inject a new candidate at the last moment, how dangerous it is to have someone who hasn’t been thoroughly vetted for scandals, et cetera. Still, for now I trust the prediction markets. I think replacing Biden would add ~10 percentage points to the Democrats’ chance of victory. At the end of this post, I’ll list the prediction markets I’m using as sources. But before then, a brief interlude of:
Fuzzy Subjective Human Factors I Am Not Really Qualified To Talk About
Many people on Twitter are asking “how could anyone possibly have been stupid enough to not realize that Biden was senile?” I was that stupid. I didn’t say it openly, because I’m at least smart enough to have a high threshold for giving my opinion on political things I don’t know much about. But I thought it in my heart. So in case the people asking “how could anyone have been that stupid?” actually want an explanation, here’s my former reasoning. Republicans have been accusing Biden of being senile (and the Democrats of hiding it) for at least five years now. Before the 2020 debates, they were excited that this was when they could finally prove once and for all that Biden was senile. Then Biden did fine, and they retreated to “well he’s senile but they have some secret drug they’re giving him, just during debates, that makes him look fine”. 
Notice this is from 2020; according to polls, he did win the debate that year (source) I think a lot about experimental cognitive enhancement drugs, and I can say with confidence that nothing like that exists. Stimulants can help people with mild dementia be more active and motivated, but they don’t really improve cognition directly, and they can’t make a demented person temporarily lucid. Still, for the past four years, every time Biden was going to do something - a press conference, a State of the Union, whatever - the Republicans would say “ha, this time is going to be the proof that he’s senile!” And then he would always do fine, and they would retreat back to “I guess he used the secret drug this time too”. The satire site Babylon Bee had some funny articles about this: Babylon Bee, after Biden gave a good State of the Union speech earlier this year. Meanwhile, the Democrats were spreading the alternate narrative that Trump was senile. This one has gotten less press, because I don’t know how many people really believed it. But it came up occasionally, along with out-of-context video snippets where Trump said or did something dumb or meandering. Of course, anybody with a presidential candidate’s level of public exposure will have a few gaffes. Even if they don’t, you can always deceptively crop something so it looks like they did. Wait, why is a psychoanalyst getting quoted as a top expert in dementia? (source) I didn’t know you could diagnose someone via Change.org petition, but 2544 people who claim to be licensed professionals can’t be wrong! So with the constant attempts to prove that both candidates were senile, the constant demonstration by both candidates that they weren’t, and the constant retreat into conspiracy theories of “I guess he used the magic drug again but we’ll get him next time!”, I just tuned out this entire category of thing. And I guess I kept it tuned out longer than I should have, whoops. Reversed stupidity is not intelligence. 
Even if liars are saying something for their usual liar reasons, it can still be true. For twenty years, people spread false rumors that Castro was on his deathbed, but this didn’t make Castro immortal. In the same way, I should have figured out that even if I couldn’t trust any particular claim that Biden was senile, the prior for an 81 year old becoming senile was still high. But I guess I assumed that if he was becoming senile, some Democratic elites would have secret knowledge about it, and they couldn’t possibly be so stupid as to deny it while also scheduling him for a debate where it would inevitably come out. So I figured the Democratic elites who were closest to him thought he was doing well, and I trusted them more than the people who had been wrong every time for the past five years. I’m still confused what those elites were thinking. Reading the news coverage for the past few days (including some video clips from a post-debate rally where he seemed noticeably better) it seems like some combination of: He has good days and bad days, and they were hoping this would be a good day.
Inline links: https://substackcdn.com/image/fetch/$s_!3oIv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31e3710e-6231-49c6-989b-1bc16f70fdf7_833x288.png, https://substackcdn.com/image/fetch/$s_!Fr-Z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7933bcc-fc24-4043-9064-47a958e24497_787x306.png, https://manifold.markets/HamishTodd/conditional-upon-the-democratic-par, Here’s, https://substackcdn.com/image/fetch/$s_!Q3fs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff99e148f-4051-40f9-b0c6-22eb10289854_1500x1091.png, https://substackcdn.com/image/fetch/$s_!MXOh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d694e0c-d306-4578-a55d-5314ca965a83_1500x1246.png, https://substackcdn.com/image/fetch/$s_!qt9P!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa11d1a27-cfc2-4708-9db1-e344b391ac9a_1288x758.png, source, think a lot about experimental cognitive enhancement drugs, https://substackcdn.com/image/fetch/$s_!shq2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92d67bb2-5940-44ba-ad21-df822276280a_781x829.png, Babylon Bee, https://substackcdn.com/image/fetch/$s_!2y5u!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa986c7ac-16c2-4987-b520-8f6ddf9ad13d_757x690.png, https://substackcdn.com/image/fetch/$s_!SN0G!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6068601f-0eef-4a1c-b885-19f5712decf1_751x668.png, 
https://substackcdn.com/image/fetch/$s_!vL01!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb26c689-1e81-422b-9493-ae44cac59125_858x814.png, source, https://substackcdn.com/image/fetch/$s_!6ssu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F760276bb-2f91-4545-bbf5-d3f9fe911de7_975x654.png, Change.org petition, Reversed stupidity is not intelligence
The basic structure is the same as past forecasting AIs like FutureSearch. A heavily-modified copy of ChatGPT gathers relevant news articles, then prompts itself to think in superforecaster-like ways. The creators say the ChatGPT copy had a knowledge cutoff of October 2023, so they tested it on Metaculus questions from after that date. It got 87.7% accuracy, slightly above Metaculus forecasters’ 87.0%. Manifold is skeptical: The commenters, especially Neel Nanda, found that doing knowledge cutoffs properly is hard, and the ChatGPT base seems to know about news events after October 2023 - upon questioning, it seemed aware of an earthquake in November 2023. When presented with a different set of questions that were all after November 2023, FiveThirtyNine substantially underperformed the Metaculus average. But also, my attempts to play around with the bot haven’t been encouraging: I asked it to predict the chance that Prospera would have a population of at least 1,000 in 2027. Like FutureSearch on the same question, it cited many interesting news articles on Prospera’s chances but failed to do the basic step of figuring out its current population and growth rate. It eventually concluded 35% chance, which is reasonable enough. But when asked whether Prospera would have a population of 100,000 in 2028, it also said 35% chance, which is absurd.
Inline links: FutureSearch, When presented with a different set of questions
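The Prospera complaint in the excerpt above amounts to a consistency check on threshold forecasts: a much higher bar should not get the same (or a higher) probability than a lower one. A minimal Python sketch of that check follows. The function name and the `min_ratio` parameter are hypothetical, invented for this illustration, and the check treats the two questions as roughly comparable even though they resolve in different years (2027 vs. 2028), so it is a heuristic rather than a strict logical test.

```python
def check_threshold_forecasts(forecasts, min_ratio=10):
    """Flag suspicious pairs among forecasts of the form P(value >= threshold).

    forecasts: list of (threshold, probability) pairs.
    min_ratio: hypothetical cutoff; equal probabilities are only flagged
               when the higher threshold is at least this many times larger.
    """
    ordered = sorted(forecasts)
    warnings = []
    for (t_lo, p_lo), (t_hi, p_hi) in zip(ordered, ordered[1:]):
        if p_hi > p_lo:
            # A harder-to-reach threshold can never be more likely.
            warnings.append(f"P(>= {t_hi}) = {p_hi} exceeds P(>= {t_lo}) = {p_lo}")
        elif p_hi == p_lo and t_hi >= min_ratio * t_lo:
            # Not strictly incoherent, but implausible for a far higher bar.
            warnings.append(f"P(>= {t_hi}) equals P(>= {t_lo}) = {p_lo} despite a much higher bar")
    return warnings


# The two answers the bot gave, as reported in the excerpt.
prospera = [(1_000, 0.35), (100_000, 0.35)]
for warning in check_threshold_forecasts(prospera):
    print(warning)
```

Run on the two reported answers, the sketch flags the pair once, because 35% for a population of 100,000 matches 35% for a population of 1,000.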
1: You knew it was coming: See also various slightly-weaker or slightly-stronger versions of the same question (includes wildlife, includes any immigrants, includes only Springfield). I actually appreciate this a lot, because most of the debate around Catgate has focused on how there’s “no evidence” it’s happening, but “no evidence” is cheap and I prefer an outright forecast.
Inline links: includes wildlife, includes any immigrants, includes only Springfield, “no evidence” is cheap
3: I’ve been asked to advertise The Curve, a conference on “the trajectory of transformative AI”, including forecasting, alignment, etc. It’s by the Manifold/Manifest/Manifund team and will be held at Lighthaven, Berkeley from November 22 - 24; tickets are $100 for students, $300 - $800 for others. Apply here. I hope to attend.
3: Manifold Markets is hosting an election night party (or mourning vigil, depending) in Berkeley, go here for details.
Inline links: go here for details
Iranian nukes more likely under Trump (49.5%) than Harris (45%) All of these involve foreign policy going worse under Trump than Harris. Is this unfair? Even Trump’s supporters would agree he is less interested in funding Ukrainian resistance than Harris; Metaculus’ numbers here seem in line with this. Harris is more likely to continue deals where Iran gets sanctions relief / money in exchange for not going nuclear. Whether or not you agree with Trump that those deals are extortionary and unfair, it makes sense that Iran is more likely to go nuclear if those deals are discontinued. But this is also a small effect and could be noise. The Taiwan numbers are the least convincing, but seem to be based off of arguments like the ones here: Trump has said that Taiwan should “pay for” defense, and generally been skeptical of foreign entanglements. And here’s Manifold’s version of the same thing: Polymarket’s Wild Ride On October 14th, Polymarket gave Donald Trump 54% odds of winning, compared to Nate Silver’s 49% and Metaculus’ 45%. Whatever, everyone knows Polymarket has a small right-wing bias, and 5% isn’t too bad. Three days later, it had risen from 54% to 61%, despite no news and no change for Metaculus or Nate, bringing the Polymarket/Silver spread to an unprecedented 11%. What happened? This is the rare prediction market story where the answers are already in the New York Times and the Wall Street Journal: one really rich guy put $30 million on Trump (a recent followup by Jorge Velez claims it’s actually more like $75 million). Although he prefers to remain anonymous, reporters have talked to him and are able to reveal that he’s French, goes by “Theo”, is a former banker, and has no insider connections. He’s just a normal rich guy who really thinks Trump will win. This is exactly the sort of shock that prediction markets are supposed to be resilient against. 
Instead, the market stayed at 61% for days, swung even higher for a while, finally fell back down two weeks later, then went back up again. What happened? The simplest story would be insufficient liquidity: there just weren’t enough people to gather the $75 million it would take to bet against Theo. This is superficially plausible: Polymarket requires crypto and bans Americans, so the mispricing couldn’t be corrected until enough crypto-literate, American-election-following foreigners showed up to bet $75 million. That’s a tall order, and maybe it took two weeks. But the simple story seems wrong. Other real-money markets rose approximately in tandem with Polymarket. For example, Smarkets got to Trump 59% on 10/16, and peaked at 64% on 10/30. Kalshi followed a similar path. Both tracked Polymarket, not Nate Silver or Metaculus (neither of whom ever went above Trump 55% since Harris joined the race). So I think the remaining stories are: Theo made his giant bet on Polymarket. By coincidence, at the same time, bettors everywhere massively overcounted a few good polls for Trump and started a feeding frenzy on pro-Trump shares. This made all other markets gain, and Polymarket stay at its Theo-caused peak, until a few bad polls for Trump brought everyone back to reality last week.
Inline links: here, here’s, https://substackcdn.com/image/fetch/$s_!ubU7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F709cc4a2-d203-424d-a6fb-a796d1ad2cdb_949x547.png, https://substackcdn.com/image/fetch/$s_!i5yn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc272d35b-10d7-4580-9757-05dd258e69d3_741x443.png, New York Times, Wall Street Journal, followup, Smarkets, https://substackcdn.com/image/fetch/$s_!8twW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25ccd651-3404-446a-a7cb-5747af316038_1180x315.png
It also serves as yet another point in favor of non-real-money forecasts like Metaculus, Nate Silver, and Manifold, all three of which agreed with each other while disagreeing with the big real-money markets like Polymarket, Smarkets, and Betfair. In theory we can’t say which group (real money vs. no money) was right. In practice, we know that Polymarket was mostly skewed by one giant bet, that there wasn’t nearly enough pro-Trump news to explain the movement, and that past disagreements have usually resolved in favor of the no-money markets. I’m as surprised as anyone to learn this (especially since Manifold is so close to a money market that a lot of explanations for real-money markets’ failure ought to affect them too), but it does seem to be a consistent feature of these things.
One group - the non-money forecasters - said the election was 50%. Nate Silver was in this group. So was Metaculus, a forecasting engine which has outperformed prediction markets in the past, and Manifold, a mostly-play-money prediction market.
In my own contest, Metaculus (a non-money forecaster) outperformed Manifold (a play-money market with some tenuous connection to real money). And in Manifold’s own poll, users said they thought Metaculus was more accurate than Polymarket or themselves.
Inline links: my own contest, users said
Non-money forecasters have an opposite problem of having no incentive to get things right in the first place. This disqualifies most pundits, but the best forecasting sites have found ways around this. On Metaculus, users risk reputation rather than money; this is easier, since there isn’t some opportunity cost to Metaculus reputation that creates weird dynamics of when vs. when not to invest. On Manifold, people risk play money, which is sort of linked to real money in various obscure ways but you can’t trivially sink your life savings into Manifold and expect to get it back; this is about halfway between monetary and reputational systems. As for Nate Silver, I think he loves gambling enough that he naturally uses a gambling mindset even when he’s not risking money (although he is risking his own reputation, and sometimes does risk money on his beliefs). I didn’t originally think these kinds of “soft” incentives would work as well as real money, but the evidence above has changed my mind.
Inline links: sometimes does risk money on his beliefs
From “Genesis and pathogenesis of the 1918 pandemic H1N1 influenza A virus”, linked above. You may recognize the lead author - Michael Worobey has also been a leading voice on the zoonotic side of the COVID origins debate. The recent history of the flu, as far as I can tell, is: 1918: An H1N1 flu (“Spanish flu”) jumped from birds to humans in America and killed 50 million people worldwide. This replaced all older strains, so most seasonal flus during this era were H1N1. 1957: An H2N2 flu (“Asian flu”) crossed from birds to humans in China, and killed about 2 million people worldwide. It replaced the H1N1 strain, so most seasonal flus during this era were H2N2. 1968: An H3N2 flu (“Hong Kong flu”) crossed from pigs (?) to humans in Hong Kong, and killed another 2 million people worldwide. It replaced the H2N2 strain, so most seasonal flus during this era were H3N2. 1977: An H1N1 flu (“Russian flu”) leaked from a biology lab (?) in Russia (it might have been a strain from the 1940s, which the Russians were trying to make a vaccine for). It didn’t kill that many people, but it stuck around, and from then on, seasonal flus could be either H3N2 or H1N1. 2009: An H1N1 flu (“Mexican flu” until the PC police stepped in; afterwards “swine flu”) took some horrible circuitous route between birds and pigs and back again, crossed over into humans in Mexico, and killed 200,000 people. It outcompeted older strains of H1N1, but couldn’t crowd out H3N2, so seasonal flus are still either H3N2 or H1N1. …which brings us to the present, hopefully illuminating why “new flu strain crosses over from animals into humans” is such an “uh oh” moment. The Bird Flu Technically, all pandemic flus start as bird flus. Influenza A evolved in birds. Sometimes it spreads to other animals, including pigs, cattle, and humans. 
The most common way for a bird flu to spread to humans is to “reassort” (not exactly virus sex, but close enough, and the real version is less memorable) with a human flu virus (ie one that has already crossed over to humans). The resulting virus has all of the human flu virus’ human adaptations, but borrows enough new antigens from the bird virus to evade the immune system. Pigs can be infected by both human and bird viruses, so they are a common place for this reassortment to take place. If reassortment is sort of like viral sex, pigs are sort of like Tinder. When a bird flu and human flu reassort in pigs, the resulting disease is called a swine flu. At least the 2009 flu pandemic was a swine flu, and a minority opinion thinks the 1918 pandemic was too. There aren’t major epidemiological differences between direct-from-bird flus and swine flus. H5N1 was first noticed in birds - specifically, a flock of chickens in Scotland in 1959 - after which it disappeared for forty years. In 1996, it showed up in geese in China, then gradually increased its market share among birds worldwide. In 2022, it was found in minks; apparently it had learned to infect mammals. By early 2024, it was seen in cows. Now it’s in cow herds in 16 states, and one of them (California) has declared a state of emergency. And in October, H5N1 was found in pigs for the first time. It’s not uncommon for humans to catch an animal disease. This doesn’t mean the disease has “crossed over” to humans. If the virus isn’t suited to human-to-human transmission, it simply dies off (either before or after killing its human host). Thus, chicken farmers have been reporting scattered H5N1 cases since 1997; now that the virus has spread to cattle, cow farmers have started reporting the same. A Metaculus comment on this topic introduced me to the phrase “biocomputational surface”. 
Every viral replication that takes place in a human gives the virus one more chance to develop the set of mutations that makes it human-transmissible and start the next pandemic. Or, more likely, every viral replication that takes place in a human who has both the H5N1 bird flu and a normal human flu - or in a pig which has both viruses - gives the virus one extra chance to reassort in a way that produces a bird-antigen-fortified human-adapted flu virus. This doesn’t mean H5N1 will definitely become human-transmissible soon. Many viruses hang out on the borders of transmissibility for decades. Some, for unclear reasons, never cross over at all. But all of this is compatible with the virus becoming transmissible soon. So: What Is The Chance Of A Pandemic? The prediction markets on this topic ask a question about “10,000 cases in the United States”. Does this necessarily mean “pandemic”? Might it be possible to get to 10,000 cases just from the scattered chicken and cow farmers, with no human-to-human transmission? Despite many chicken and cow infections this year, there have only been 60 - 70 recorded human cases. Unless there is a phase change in screening methods, it seems hard for this number to increase to 10,000 off farmers alone. I think it’s fair to treat this question as operationalizing “what is the chance of a pandemic”? By this definition, Manifold estimates a 40% chance of an H5N1 pandemic in 2025. Metaculus estimates a 5% chance. You can see below whether that’s changed since I wrote this essay: 5% versus 40% is a big difference! Who do we trust? I trust Metaculus. Metaculus has beaten Manifold in both of the two head-to-head comparisons that I know of (Jeremiah Johnson’s and mine). Manifold’s number swings by a factor of two from week to week; Metaculus has been steady. But also, Metaculus hosts a CDC-sponsored respiratory disease forecasting tournament which has enriched them in epidemiological expertise. 
And if you look at the quality of comments on both sites, it’s pretty obvious where the people with more intellectual chops are hanging out. The Manifold comments are mostly single sentences, or occasionally just links to an article about new cases. The Metaculus comments look more like this one by dimaklenchin: Despite the panic propaganda, H5N1 is unlikely to be "just a single mutation away from switching host preference": 1) It normally takes a lot more than a single mutation to switch hosts. E.g., there are at least five different reasons why SIV (monkey equivalent of HIV) is not infectious to humans. Heck, a variant of SIV that bears HIV's receptor-recognizing surface protein (SHIV) is still not infectious to humans. HIV most certainly evolved from SIV but, almost as certainly, it took a very long time to get there. Not that all viruses are the same and things can't turn out differently with flu, but I don't subscribe to the idea that a mere change of receptor specificity (something that can take 1-2 mutations) will be sufficient. 2) We have data. Lots of human infections with other varieties of bird flu in the past - all those viruses ultimately went nowhere. Why would H5N1 be radically different? E.g., the "Canadian teen", despite what sounds like a prolonged exposure, failed to infect anyone around him. Since I am at 18% for the h-2-h H5N1 detection in 2025, I am arbitrarily going ~ an order of magnitude lower than that for something as unprecedented as 10K human infections. Maybe should be much lower but hedging for the time being and will allow another couple months of observations. And Sergio: I'm currently at 20% on the question of reported human-to-human transmission of highly pathogenic avian influenza H5N1 globally before 2026. However, this question is only about the US, and is more general about all subtypes of H5. But H5N1 very strongly appears to be the most important subtype to consider in this time period. 
And, given the current situation in the US with H5N1 human cases derived from exposure to poultry or cattle (with cattle(mammals) being more worrisome), h2h transmission seems quite more likely to arise in North America than elsewhere before 2026. Conditioning on h2h transmission in the US (and also trying to consider, with lower probability, a start in Canada), I want to estimate the chances that it becomes sustained and out of control (in which case, if it starts in Canada, I largely expect it to spread to the US). The (6) past events of probable h2h transmission of avian H5(N1), none of which were sustained, could serve as a base rate, although I'm a bit wary of giving much weight to this precedent, since the last event was quite a while ago (2007), and also because reporting and testing standards may have improved considerably since then (so perhaps they might not have been classified as h2h transmission events if they had occurred more recently). The current situation in the US, and events such as the Canadian teen who got sick with H5N1, do suggest a higher background level of risk than normal (which would be reduced if a vaccine for cattle is licensed soon), but I'm wary of overupdating. Conditioned on sustained h2h transmission, reaching over 10k cases in a few months seems likely, although perhaps very strong monitoring and surveillance could contain the situation in time (at the very least to moderate the growth rate). Trying to combine all these factors somewhat haphazardly, I'm currently at 3.5% for this question. That’s before 2026. What about longer-term? Manifold gives a ~50% chance before 2030; Metaculus uses a more complicated method but it says about 25% chance before 2030. H5N1 may cross to humans, but it could take a while. 
Superforecaster Juan Cambeiro at The Institute For Progress estimated a 4% chance of a “worse than COVID” H5N1 pandemic in “the next year”, but their estimate was made in 2023, without the benefit of the Metaculus estimates or most of our current knowledge. This feels high now - Metaculus says 5% total for H5N1 pandemic, and most pandemic flus are not worse than COVID. IFP also seem to be expecting a case fatality rate greater than 10%, which I find unlikely for the reasons mentioned above. I trust their estimate less than Metaculus’ current ones. I conclude that the most plausible estimate for the chance of an H5N1 pandemic in the next year is 5%. Interestingly, 5% is about the base rate for pandemic flus per year: five in the past century = one per twenty years = 5% chance per year. Isn’t it surprising that we’re still at the base rate when we can see a dangerous-looking flu virus spreading through the types of animals that have caused pandemic flus in the past? Part of the answer is that we’re not - in addition to the 5% chance of H5N1, we have to add the chance of some other pandemic flu. This probably isn’t 5% on its own; scientists monitor flu strains closely, and they haven’t found any others which are giving off as many red flags as H5N1. Still, something could always come out of left field. Maybe we should add a 2.5% chance of some other strain, for a total of 7.5% chance of a flu pandemic (ie beyond normal seasonal flu) next year. But still, isn’t it surprising that we’re so close to the base rate? One way to think about this: the base rate represents how concerned we should be if there was no epidemiological monitoring at all. In that case, we would estimate a probability distribution across different epidemiological landscapes, most of which contain some concerning-looking flu strains. 
Since we are doing the epidemiological monitoring, we can collapse that distribution into a single picture: one flu strain, H5N1, is in fact pretty concerning, and other strains mostly aren’t. This is enough to move our prior from 5% to 7.5%, but no more. The forecasters I talked to raised one other point of uncertainty: does the flu work more like a dice roll, or like a bus? Dice rolls are uncorrelated with their predecessors; even if it’s been a hundred rolls since you last rolled a 6, your chance this time is still 1/6. But buses come at fixed intervals; if the buses are hourly, and you haven’t seen a bus in the past 59 minutes, then your chance of seeing a bus in the next minute is very high. It’s been 16 years since the last flu pandemic; these pandemics come (on average) every 20 years. I don’t think anyone has a good sense of how to think about this. But it was 40 years between the Spanish and Asian flus, so the twenty year number is at best a rule of thumb. The 5% number feels very low to me (and, apparently, to the average Manifold forecaster). Isn’t H5N1 spreading to cows and pigs and all sorts of other mammals? Isn’t it in the news all the time? I trust Metaculus a lot, but I agree that this is a surprising update, and I’m taking it on faith rather than feeling it in my bones. What Would The Fatality Rate Be For An H5N1 Pandemic? There are four basic stories you could tell about likely H5N1 mortality. First, maybe mortality would be 50%. The argument here is that official statistics report this mortality rate in the chicken farmers who have been infected with H5N1 so far. Several news sources and even some scientists have raised the specter of a pandemic version of H5N1 with this same death rate, which could kill a quarter to a third of the world population. THIS IS EXTREMELY FAKE. The official statistics only report fatality rate in the infections we know about. 
Bird flu is rare, there’s no mass testing, and we only learn that somebody had it if they’re in a hospital and the doctors are worried enough to test for rare conditions. Of Americans who got bird flu in the past year, 0 out of 61 have died. Probably this is mostly because America upped its detection game and is now finding milder cases; we also can’t rule out the virus mutating to become less virulent. Metaculus estimates the current true mortality rate as 1.25%. …but leaves a wide 90% confidence interval, from 0.5% to 7%. Second, maybe mortality would be somewhere around 1.25%. The argument here is that Metaculus uses this as its central estimate of US mortality. But Sentinel discusses some reasons to be skeptical of broad inferences from the US numbers: Scientists have been puzzled by the apparently low H5N1 case fatality rate in humans in the US. They offer a number of hypotheses: “The way in which the virus is being transmitted — along with the amount of virus exposure — is limiting the severity of disease.”
Inline links: was found in pigs, Jeremiah Johnson’s, mine, a CDC-sponsored respiratory disease forecasting tournament, dimaklenchin, Sergio, reported human-to-human transmission of highly pathogenic avian influenza H5N1 globally before 2026, human cases, past events, vaccine for cattle is licensed soon, https://substackcdn.com/image/fetch/$s_!fL7J!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6a70fa6-b356-422d-ba9a-5db431e5a056_751x471.png, The Institute For Progress estimated, news, sources, scientists, Sentinel discusses
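The base-rate arithmetic and the dice-versus-bus question in the H5N1 excerpt above can be made concrete with a small Python sketch. Everything here is illustrative: the function names are invented for this example, the pandemic years are the five listed in the essay's flu history, and the "bus" model assumes the strictest possible schedule (exactly one arrival somewhere in each fixed interval), which the essay itself treats as only a rule of thumb.

```python
# Pandemic flu years as listed in the essay's history of the flu.
PANDEMIC_YEARS = [1918, 1957, 1968, 1977, 2009]


def annual_base_rate(events, window_years=100):
    """Events per year, e.g. five pandemics per century gives 0.05."""
    return len(events) / window_years


def dice_chance(per_period_chance, periods_waited):
    """Memoryless model: waiting longer never changes the next-period chance."""
    return per_period_chance


def bus_chance(interval, periods_waited):
    """Fixed-schedule model (an assumption for illustration): exactly one
    arrival falls in one of `interval` equally likely slots, and none of the
    first `periods_waited` slots contained it."""
    if not 0 <= periods_waited < interval:
        raise ValueError("periods_waited must be in [0, interval)")
    return 1 / (interval - periods_waited)


def cumulative_chance(per_year_chance, years):
    """Chance of at least one event across `years` independent years."""
    return 1 - (1 - per_year_chance) ** years


base = annual_base_rate(PANDEMIC_YEARS)  # five per century
gaps = [b - a for a, b in zip(PANDEMIC_YEARS, PANDEMIC_YEARS[1:])]
total_flu = 0.05 + 0.025  # essay's H5N1 estimate plus "some other strain"

print(f"base rate per year: {base:.1%}")
print(f"gaps between pandemics (years): {gaps}")
print(f"H5N1 plus other strains: {total_flu:.1%}")
print(f"dice, 16 years since last: {dice_chance(base, 16):.1%}")
print(f"bus, 16 of 20 years elapsed: {bus_chance(20, 16):.1%}")
print(f"dice model over 5 years: {cumulative_chance(base, 5):.1%}")
```

Under these assumptions the dice model stays at the 5% base rate however long the wait, while the strict bus model climbs to 25% once 16 of 20 years have passed. The gaps between the listed pandemics (39, 11, 9, and 32 years) suggest that real pandemics follow neither schedule cleanly, which fits the essay's caution that the twenty-year figure is at best a rule of thumb.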
Manifold asks whether they might end up funding AI safety efforts: This sort of makes sense - surely this is the most direct way to interpret a mandate of using charity dollars to “make sure AI benefits humanity”. And an obvious commitment to pursuing their mission exactly as described would look good to regulators. But it also might not be as popular with the normies as “health care, education, and science” - and doing popular things would look good to regulators too. If this is on their mind, Altman hasn’t mentioned it.
What Do Prediction Markets Say?
This is the biggest Manifold market on the topic. The big drop at the end is when the judge ruled Musk’s case had merit.
Inline links: This
2: There’s another Manifest (Manifold-sponsored fun conference on prediction markets) this year. June 6-8 in Berkeley, tickets are $538 but look for various combos/deals/discounts. More information here.
Inline links: here
Helped create Manifold Markets, a prediction market site with thousands of satisfied users, whose various spinoffs play a central role in the rationalist/EA community.
No update this time, but from last cycle: “Nathan Young has since gotten much larger grants to do much more exciting forecasting work, particularly a platform for generating forecasting questions. With my approval, he’s put my grant on the back burner while he works on other things, but he still hopes to get some questions up on Manifold or Metaculus sometime.”
Manifold is the largest social prediction market platform with over 150k user‑created markets and more than 30 million trades. Our markets have been featured here on ACX, in the NYT, Nate Silver’s latest book, and countless Substacks, podcasts, and tweets. Forecasters, journalists, researchers, and casual users alike use Manifold to get accurate real-time odds on everything from elections to AI timelines to personal drama.
Inline links: Manifold
2: Manifold, 24 traders:
Inline links: Manifold
43: China think tank assessment of how in control Xi is: still very in control, maybe not infinitely in control. 44: Related - did you know (h/t xlr8harder) that if you ask AI to write a science fiction story, it will very often name the protagonist “Elara Voss” (or some very close variant like Elena Voss), and this remains true across various models and versions? Related: Chelsea Voss of OpenAI is having a baby and has the opportunity to do the funniest thing. 45: “Hector (cloud) is a cumulonimbus thundercloud cluster that forms regularly nearly every afternoon on the Tiwi Islands in the Northern Territory of Australia…[he is sometimes called] Hector the Convector”. 46: British allergy sufferers who want to know the ingredients of things demand that British cosmetics stop listing their ingredients in Latin. “For example, sweet almond oil is Prunus Amygdalus Dulcis, peanut oil is Arachis Hypogaea, and wheat germ extract is Triticum Vulgare.” 47: Text-based RPG about being an NYT journalist at the Manifest prediction market conference. I make a brief appearance. 48: Study uses supposedly-random variation in doctor assignments to test whether the marginal mental health commitment is good or bad for patients, finds that it is quite bad. Freddie de Boer is violently skeptical (maybe literally so?) and makes some good points about how a single quasi-experimental study is never absolute proof. But I don’t think he quite justifies his opinion that the paper was irresponsible and should never have been published; it’s just a normal quasi-experimental study that we should nod and say “huh” at but not overweight as the culmination of all possible research that overcomes all possible priors. 
My prior is that the marginal commitment is pretty useless (many commitments are just “well, since this person arrived at our ED for some reason, it would look bad from a medico-legal perspective to just let them go, so let’s keep them a few days to evaluate” - and yeah, you should be upset about this) but I’m still surprised by how many outright negative (as opposed to zero) effects the researchers found. The strongest argument for negative effects is that it will make some people miss work and maybe lose their job. But this study found that commitment ~doubles the risk of near-term suicide (admittedly only from 1% to 2%), which would have been outside my confidence intervals for how bad it could be. I suspect confounding, but only on general principle, and I wouldn’t be too surprised either way. 49: This tweet is probably bait, but I found it a thought-provoking question: I think there’s a boring answer, where the law is more complex than just a single number and whatever kind of weird trafficking Epstein was doing is worse than whatever normal relationships these European laws are permitting. But assuming that there’s a substantive difference even after taking that into account, I think my answer is something like - we’ve got to divide kids from adults at some age, there’s a range of reasonable possible ages, we shouldn’t be too mad at other societies that choose different dividing lines within that range - but having decided upon the age, we’ve got to stick with it and take it seriously (in the sense of penalizing/shaming people who break it). This is more culturally relativist than I expected to find myself being, so good job to Richard for highlighting the apparent paradox. 50: Dilan Esper describes his experience as one of Hulk Hogan’s attorneys in the Gawker lawsuit (X). 
Parts I found interesting: none of the lawyers knew Thiel was funding the lawsuit; Gawker probably could have won if they had been slightly competent but kept "shooting themselves in the foot"; and Gawker probably could have won if they had just pixelated the private parts in the video. 51: Amazing concept and poems (link on X): I tried to see if AI could do this, and it did something that technically met the requirements but had zero artistic merit - using a lot of words like “nowhere” and “outside” in one, then separating them out to “no where” and “out side” in the other. I didn’t invest much energy in creating a clever prompt telling it not to do that, so feel free to report if you get better success. 52: New study claims consultants are actually good, at least for profits: "We find positive effects on labor productivity of 3.6% over five years, driven by modest employment reductions alongside stable or growing revenue" 53: A Polish team tries to test Peter Turchin’s equations for predicting political unrest on recent Polish history, has to make some changes but claims mostly positive results. 54: New big multi-author Substack, The Argument, trying to be a sort of center-left version of the model pioneered by The Free Press and other high-production-value ideological Substack properties. Excited to see Kelsey Piper is involved, and she starts off strong with a post on the latest round of First World basic income studies, which find few positive effects. This is surprising, because recipients didn’t waste the money on alcohol or gambling or anything - they paid down debt and got useful goods. Still, it didn’t even affect things that should have been obvious, like stress level. It’s not even clear that amounts of money large enough to help with rent made homeless people more likely to get houses! 
Matt Bruenig criticizes the article, accusing Kelsey’s studies of being downstream of Perry Preschool style dreams that exactly the right welfare program will have massively compounding effects that cut poverty out at the root and turn everyone into elite human capital; he thinks giving people money won’t do this, but it will increase equality and give the poor better lives. I assume he’s not a strong hereditarian, but his argument makes even more sense from that perspective, and I’ve certainly criticized dumb outcome measures like infant brain waves which we have only tenuous reasons to think are related to anything we care about. But Kelsey reasonably responds that the outcome measures she’s talking about include stress level and life satisfaction. To defuse this critique, Bruenig either has to argue that our construct “life satisfaction” doesn’t really measure whether someone’s life is satisfactory, or else claim that giving poor people satisfactory lives isn’t really what we’re going for - which I think would require more explanation on his part. There’s some further (impressively acrimonious) debate on X, but I don’t see anything that addresses my core concern. GiveDirectly, a charity involved in basic income experiments, has a presponse here; they say that some studies are positive, and that the ones that aren’t might have tried too little cash to matter, or been confounded by COVID making everything worse. They also point out that basic income is harder to study than traditional programs like giving people housing, because if you’re giving housing you can measure housing-related outcomes directly and have a pretty good chance of getting enough statistical power to find them, but since everyone spends cash on different things, the positive effects might be scattered across many different outcomes (and therefore too small to reach significance on each). 
Everyone involved in this debate wants to emphasize that the poor results are for First World studies only, and that studies continue to show large benefits to giving cash in the developing world. 55: Related: I was less impressed by The Argument’s first foray into housing policy, which follows an all-too-familiar pattern: Some people say they don’t like noise and disorder and try to make rules against it in their apartments.
Inline links: China think tank assessment of how in control Xi is, xlr8harder, Chelsea Voss of OpenAI is having a baby, Hector (cloud), demand that British cosmetics stop listing their ingredients in Latin, Text-based RPG about being an NYT journalist at the Manifest prediction market conference, finds that it is quite bad, violently skeptical, literally so?, This tweet, describes his experience as one of Hulk Hogan’s attorneys in the Gawker lawsuit (X), link on X, New study claims consultants are actually good, tries to test Peter Turchin’s equations for predicting political unrest on recent Polish history, The Argument, a post on the latest round of First World basic income studies, criticizes the article, infant brain waves, debate on X, has a presponse here, first foray into housing policy
Second, the Manifund team. Manifund, a charitable spinoff of Manifold Markets, handled our funds, disbursement, infrastructure, and miscellaneous coding needs. Special thanks to Austin Chen for taking point on this.
Inline links: Manifund
Charlie Molthrop, $5K, for “normie-friendly prediction market interfaces”. Charlie has already made some tools for visualizing Manifold and Polymarket results; for example, a bot that tweets sudden dramatic changes on important Manifold questions.
Elaine Perlman, $94K, to continue lobbying for kidney donation incentives. Elaine works with Waitlist Zero and the Coalition To Modify NOTA to promote the End Kidney Deaths Act, which offers valuable tax credits to kidney donors. They estimate this bill could save 100,000 lives over the next decade, and save the government $50 billion/year (dialysis is very expensive, Medicare currently covers it, and transplantees would no longer need it). Since our previous grant last year, the EKDA has been cosponsored by 29 members of Congress, discussed in the Journal of the American Medical Association, and profiled in the LA Times. The prediction markets are down to only a 25% chance that it gets passed this year, but I’m optimistic about 2026 - 2027.
If the Republican gets elected, will the economy be good four years later? …and if one market is higher than the other, then you’ve successfully forced everyone to settle on a canonical probability of which candidate will be better for the economy. The fatal flaw is confounding by noncausal pathways. For example, bettors might reason: suppose for some extrinsic reason (let’s say someone struck oil) the economy is very good from 2026 - 2028. Then in 2028, people will feel better about Trump, and are more likely to elect Vance. And if the economy is very good from 2026 - 2028, then it’s more likely to be very good from 2028 - 2032 (the oil is still there). Therefore, we should bet up the Republicans → good market, and bet down the Democrats → good market, before we even think about whether Republicans or Democrats will do a better job with the economy. Therefore, this can’t be a good way to determine whether Republicans or Democrats will do a better job with the economy. Here’s a potential workaround I’ve never seen before: suppose you create a set of conditional prediction markets as above. Then you create a set of secondary markets, asking bettors to predict the price of the first set of markets on the day before Election Day. On the day before Election Day, either they’ll have struck oil, or they won’t have. So regardless of the oil situation, people will be factoring in only the true effect of the parties’ policies. If you ask people today to predict those markets, they’ll be predicting the true effect of the policies. 
Giving an example with numbers on everything (thanks to AI for gaming this out with me):
- 25% chance of striking oil
- NO OIL WORLD (75% chance):
  - D increases GDP 5%, R increases GDP 2%
  - D wins 50%, R wins 50%
- YES OIL WORLD (25% chance):
  - D increases GDP 10%, R increases GDP 7%
  - D wins 10%, R wins 90%
Total P(R wins) = 0.75×0.5 + 0.25×0.9 = 0.375 + 0.225 = 0.6
Total P(D wins) = 0.75×0.5 + 0.25×0.1 = 0.375 + 0.025 = 0.4
Naive conditional market calculation
E[GDP | R wins] = (0.225×7% + 0.375×2%) / 0.6 = (1.575% + 0.75%) / 0.6 = 3.875%
E[GDP | D wins] = (0.025×10% + 0.375×5%) / 0.4 = (0.25% + 1.875%) / 0.4 = 5.3125%
Naive difference: 5.3125% - 3.875% = 1.4375% (understates the true 3% causal effect of D policies)
Secondary market calculation
On Election Eve, conditional on oil found: R market = 7%, D market = 10%
On Election Eve, conditional on no oil: R market = 2%, D market = 5%
E[Today's market on the Election Eve R market price] = 0.25×7% + 0.75×2% = 1.75% + 1.5% = 3.25%
E[Today's market on the Election Eve D market price] = 0.25×10% + 0.75×5% = 2.5% + 3.75% = 6.25%
Secondary market difference: 6.25% - 3.25% = 3% (exactly the true causal effect)
This doesn’t completely solve the conditional problem. There could be residual correlations based on hidden variables that affect the outcome of interest (in this case the election) without being known to bettors even on Election Day Eve. A trivial example is some extraordinary event which happens at 12:01 AM on Election Day. A more subtle example goes something like: suppose the economy is subtly good, nobody has managed to aggregate the statistics and figure this out in a legible way yet, and each individual person still only has private knowledge that the economy is good for him- or her-self. They might still be more likely to vote Republican based on their own private economic optimism, and then the hidden goodness of the economy might become manifest and improve GDP during the next term.
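As a rough check on the oil-scenario arithmetic, here is a minimal Python sketch that recomputes both the naive conditional prices and the secondary-market prices. The scenario numbers are taken from the example; the dictionary layout and the function names (`naive_conditional`, `secondary_market`) are illustrative assumptions, not anything from a real prediction market API.

```python
# Scenario numbers from the oil example; structure and names are illustrative.
# "gdp" is GDP growth in percent under each party, "win" is P(party wins).
worlds = {
    "no_oil": {"p": 0.75, "gdp": {"D": 5.0, "R": 2.0}, "win": {"D": 0.5, "R": 0.5}},
    "oil": {"p": 0.25, "gdp": {"D": 10.0, "R": 7.0}, "win": {"D": 0.1, "R": 0.9}},
}


def naive_conditional(party):
    """E[GDP growth | party wins]: the price of a plain conditional market."""
    p_win = sum(w["p"] * w["win"][party] for w in worlds.values())
    weighted = sum(w["p"] * w["win"][party] * w["gdp"][party] for w in worlds.values())
    return weighted / p_win


def secondary_market(party):
    """Expected Election Eve price of the party's conditional market.

    Assumes the oil question is settled by Election Eve, so the conditional
    market in each world simply prices that world's GDP growth.
    """
    return sum(w["p"] * w["gdp"][party] for w in worlds.values())


naive_gap = naive_conditional("D") - naive_conditional("R")
secondary_gap = secondary_market("D") - secondary_market("R")

print(f"Naive conditional gap (D - R): {naive_gap:.4f}%")
print(f"Secondary market gap (D - R):  {secondary_gap:.4f}%")
```

The naive gap is pulled down because Republican wins are concentrated in the oil world, where GDP is high regardless of party; the secondary market weights each world by its probability alone, so the election odds drop out.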
Yes, the hidden-optimism example is a stretch; maybe I’m missing better ones, or maybe this is a silly edge case failure mode that shouldn’t bother us in real life.
What about interaction effects - for example, if Democrats were better at milking a good economy and making it even better, but Republicans were better at correcting a distressed economy and bringing it back to average, would that break the link between the primary and secondary markets? This is beyond my poor mathematical ability, but the AIs claim it’s not a problem - the secondary market workaround still ensures the correct difference.
Bonus question: Is there a way to simplify this so that we don’t have to run all four markets?
The End Of The Beginning
When I started this column in 2021, I dreamed of a time when there would be big legal prediction markets on important topics. That’s come true. There have been some small benefits, but not the epistemic wonderland I hoped for. So what now? Do we pat Shayne Coplan and Tarek Mansour on the back, let them enjoy their superyachts, and otherwise forget about this space?
I see two ways forward. The first is to continue praying for the original Manifold vision - a prediction market site which offers:
1. Real money markets
When I ask Manifold why they won’t add 1, they say that Polymarket and Kalshi already dominate the space, and they have other, more interesting plans (to be announced soon). When I ask Polymarket why they won’t do 2, the answer is a combination of regulatory issues, fear that people would write bad resolution criteria and it would reflect badly on them, and there always being something more important to do. I haven’t asked Kalshi, but their answer would definitely be regulatory.
Expecting this to happen in 2027, what will that look like, and who should we invest in? Maybe this benefits Manifold - all of a sudden, play-money markets become much more important, and quantity becomes more important at the expense of quality. But branding and perception are important, so the victory could also go to someone who designs around superforecasting bots from the ground up.
Draaglom on the dynamics of the Manifold lab leak market
Inline links: Draaglom on the dynamics of the Manifold lab leak market
Stephen Grugett and Ian Philips of Manifold Markets have announced a new project, MNX.
Inline links: MNX
Partly it’s because Anthropic seems likely to win on appeal. Hegseth has said the government will keep using Anthropic for the next six months (undermining his case that they’re a national security risk) and has signed a substantially similar contract with OpenAI (undermining his case that their contract terms were unworkable). The prediction markets think the courts will be sympathetic: But even in the 28% of timelines where the designation sticks, things don’t seem so bad. Secretary of War Hegseth originally tweeted that:
Inline links: tweeted
The lawyers who weighed in seem to think that Anthropic’s interpretation of the law is correct, and Secretary Hegseth’s interpretation confused. In some situations, this might be cold comfort - how much does it help to be right about the law when the government is wrong? But in this case, it probably helps a lot. Amazon, Google, and Microsoft are all big Anthropic investors - each owns about a 10% stake - and have multi-billion dollar AI compute contracts. Together, the three tech giants must have at least $100 billion riding on Anthropic’s success. They also have good administration connections and great lobbyists, and even Hegseth isn’t stupid enough to pick fights with them all at once. So probably they send their lobbyists to have a talk with Hegseth about what the “supply chain risk” designation actually entails, Hegseth enforces the letter of the law, and Anthropic is barely affected. At least this is the story the prediction markets are going with: In this best-case scenario, Anthropic’s downside is losing some government contracts that made up ~5% of its business, plus some other Department-of-War-contractor contracts that probably add up to another ~5%.
Backlinks
- 2023 Prediction Contest
- 2024 election
- 538
- ACX Grants 1-3 Year Updates
- ACX Grants Results 2025
- ACX Grants: Project Updates
- Aella
- Announcing Forecasting Impact Mini-Grants
- Area 51
- Aristotle
- Aristotle Inc
- Augur
- Austin
- Austin Chen
- Base Rate Times
- Betfair
- Better Markets
- Bibi
- Biden
- biorxiv
- Blind Mode
- blockchain
- Book Review Contest
- Brands
- bulletin board
- Center For The Study Of Partisanship And Ideology
- CFTC
- Claude
- Clay Graubard
- Coinbase
- Commodity Futures Trading Commission
- Concepts: E
- Concepts: K
- Concepts: L
- Concepts: M
- Concepts: N
- Concepts: S
- Concepts: Z
- Congrats To Polymarket, But I Still Think They Were Mispriced
- Connor
- Connor Reed
- CSPI
- Dan Schwarz
- David Bahry
- Democratic National Committee
- Destiny
- DNC
- Douglas Campbell
- Dynomight
- eBay
- Ebola
- Economist
- End Kidney Deaths Act
- Eric Neyman
- Events: 0-9
- Events: B
- Events: F
- Events: M
- Events: O
- Events: P
- Events: S
- Events: W
- Ezra Karger
- First Sigma
- Forecasting Research Institute
- FTX Future Fund
- Full Mode
- FutureSearch
- Futuur
- GJO
- Gnosis
- Goldman Sachs
- Good Judgment
- Good Judgment Project
- H5N1: Much More Than You Wanted To Know
- Hedgehog Markets
- Highlights From The Comments On The Lab Leak Debate
- House
- Huanan
- Impact Market Mini-Grants Results
- In Continued Defense Of Non-Frequentist Probabilities
- INFER
- Information Markets, Decision Markets, Attention Markets, Action Markets
- Insight
- Insight Prediction
- Jacob Steinhardt
- Jeremiah Johnson
- Joe Biden
- John Smith
- Kalshi
- Kamala
- Karlstack
- Keynesian beauty contest
- Kharkiv
- King’s College London
- Less Wrong
- Lighthaven
- Lineage A
- Links For August 2023
- Links For January 2024
- Links For July 2023
- Links For September 2025
- LK-99
- Manifest
- Manifest prediction market conference
- Manifold Markets
- Manifold.love
- Manifund
- Mantic Monday
- 24
- 22
- 23
- 23
- 24
- 24
- 22
- 23
- 24
- 23
- 22
- 22
- 23: Room Temperature Superforecaster
- 22
- 23
- 24
- Mantic Monday: Groundhog Day
- Mantic Monday: Judgment Day
- Mantic Monday: The Monkey’s Paw Curls
- Mantic Monday: Twitter Chaos Edition
- Mantic Monday: Ukraine Cube Manifold
- Marcus Abramovich
- Mariupol
- Matthew Barnett
- Maxim Lott
- Metaculus
- Mike Saint-Antoine
- monkeypox
- Musk
- Nate Silver
- Nathan Young
- Neel Nanda
- Nikos Bosse
- Nobel laureates
- Nuno Sempere
- NYPost
- OKCupid
- Open Philanthropy
- Open Thread 212
- Open Thread 248
- Open Thread 262
- Open Thread 270
- Open Thread 352
- Open Thread 375
- Open Thread 420
- OpenAI Nonprofit Buyout: Much More Than You Wanted To Know
- Organizations: A
- Organizations: B
- Organizations: C
- Organizations: D
- Organizations: E
- Organizations: F
- Organizations: G
- Organizations: H
- Organizations: I
- Organizations: K
- Organizations: M
- Organizations: P
- Organizations: S
- Organizations: U
- Organizations: W
- Oscars
- Ozzie Gooen
- People: A
- People: B
- People: C
- People: D
- People: E
- People: J
- People: M
- People: N
- People: P
- People: R
- People: S
- People: T
- People: Z
- PEPFAR
- Peter Miller
- Peter Wildeford
- Philip Tetlock
- Places: A
- Places: K
- Places: M
- Play Money And Reputation Systems
- Polymarket
- Practically-A-Book Review: Rootclaim $100,000 Lab Leak Debate
- Prediction Contest
- Prediction Market FAQ
- prediction markets
- Prediction Markets Suggest Replacing Biden
- PredictIt
- President Biden
- Publications: 0-9
- Publications: B
- Publications: M
- Publications: N
- Publications: P
- Publications: R
- Publications: T
- Publications: Y
- Putin
- Rachel Weinberg
- RBG
- Republicans
- Reuters
- RFK
- 23
- Rootclaim
- Sam Altman
- Sam Marks
- Samo Burja
- Samotsvety
- Samotsvety Forecasting
- Schelling point
- Sentinel
- Shayne Coplan
- Slate Star Codex
- slatestarcodex.com
- Smarkets
- Solana
- Sonia Sotomayor
- Stephen Grugett
- subreddit
- Superbowl
- Superforecasters
- superforecasting
- Swift Centre
- Tetlock
- The Base Rate Times
- The Passage Of Polymarket
- Theo
- Truth Social
- Ukraine War
- Ukraine Warcasting
- US government
- USDC
- West Bank
- Who Predicted 2022?
- Who Predicted 2023?
- World Health Organization
- World Series
- World War III
- X
- Yahoo Finance
- Zach Stein-Perlman
- Zelenskyy
- zoonosis
- Zvi
- Zvi Mowshowitz