Metaculus is a recurring organization in the Astral Codex Ten archive, appearing 86 times across 86 issues between January 22, 2021 and March 03, 2026. The archive places it in contexts such as "Metaculus is fast becoming what PredictIt should have been"; "Metaculus solves the regulatory problem by using fake Internet points"; "Metaculus asked users to predict how many would be dead by the end of 2021". It most often appears alongside Manifold, Polymarket, and Kalshi.
- Article page: Metaculus
- Mention count: 86
- Issue count: 86
- First seen: January 22, 2021
- Last seen: March 03, 2026
- http://metaculus.com/
- http://web.archive.org/web/20221104130431/https://stevekirsch.substack.com/p/1m-bet-rules
- http://web.archive.org/web/20221129133112/https://blog.rootclaim.com/rootclaim-accepts-500000-challenge-on-covid-vaccine-safety-efficacy/
- http://web.archive.org/web/20221224061743/https://www.skirsch.com/covid/SaarWilf.pdf
- https://airtable.com/shrHrxIsFSTZsfx9F/tblgJ92PeaMKc8Uz0
- https://archive.ph/pY4gF#selection-663.103-683.190
- https://en.wikipedia.org/wiki/World_Constitution_Coordinating_Committee
- https://forum.effectivealtruism.org/posts/PGqu4MD3AKHun7kaF/predictive-performance-on-metaculus-vs-manifold-markets
- https://metaculus.medium.com/a-primer-on-the-metaculus-scoring-rule-eb9a974cd204
- https://twitter.com/MetaculusAlert
- https://twitter.com/metaculus/status/1627707146119876609
- https://web.archive.org/web/20230104080248/https://www.rootclaim.com/
- Logistics
- Metaculus Monday
- 21
- Mantic Monday: Judging April COVID Predictions
- Mantic Monday: Scoring Rule Controversy
- Mantic Monday: Mantic Matt Y
- Instead Of Pledging To Change The World, Pledge To Change Prediction Markets
- 21
- 26
- Highlights From The Comments On Acemoglu And AI
- 21
- 15
- When Will The FDA Approve Paxlovid?
- MM: Omicron Variant
- Mantic Monday: Let Me Google That For You
- Addendum To Luvox Post
- Addendum To "No Evidence" Post
- Mantic Monday: Dogs In Wizard Hats
- ACX Grants Results
- The Passage Of Polymarket
- Mantic Monday: Ukraine Cube Manifold
- Play Money And Reputation Systems
- Biological Anchors: A Trick That Might Or Might Not Work
- Open Thread 213
- Ukraine Warcasting
- Ukraine Thoughts And Links
- 22
- 22
- Open Thread 217
- Yudkowsky Contra Christiano On AI Takeoff Speeds
- Links For April
- 22
- 22
- 22
- 22
- Slightly Against Underpopulation Worries
- 22
- Meetups Everywhere 2022: Times & Places
- 22
- ACX Grants: Project Updates
- Mantic Monday: Twitter Chaos Edition
- 2023 Prediction Contest
- Prediction Market FAQ
- Who Predicted 2022?
- 2023
- Open Thread 262
- Announcing Forecasting Impact Mini-Grants
- OpenAI's "Planning For AGI And Beyond"
- 23
- The Extinction Tournament
- 23: Room Temperature Superforecaster
- Meetups Everywhere 2023: Times & Places
- 23
- Open Thread 298
- 23
- In Continued Defense Of Effective Altruism
- 23
- Open Thread 309
- 24
- ACX Grants Results 2024
- 24
- Who Predicted 2023?
- 24
- Open Thread 320
- Open Thread 324
- Highlights From The Comments On The Lab Leak Debate
- 24
- Prediction Markets Suggest Replacing Biden
- Links for July 2024
- Open Thread 340
- 24
- Mantic Monday: Judgment Day
- Congrats To Polymarket, But I Still Think They Were Mispriced
- H5N1: Much More Than You Wanted To Know
- Metaculus Forecasting Contest
- Open Thread 370
- Links For February 2025
- OpenAI Nonprofit Buyout: Much More Than You Wanted To Know
- ACX Grants 1-3 Year Updates
- Links For September 2025
- Open Thread 405
- ACX Forecasting Contest
- Metaculus Prediction Contest 2026
- Mantic Monday: The Monkey's Paw Curls
- Open Thread 419
- Mantic Monday: Groundhog Day
If it's a boring enough news day that you want to cover me, consider instead covering the many other fascinating and under-covered people and institutions in and around the rationalist community, some of whom are probably women or minorities or whatever. The Qualia Research Institute is doing absolutely picture-perfect mad science. Metaculus is fast becoming what PredictIt should have been; I intend to shill it pretty hard but I can't do it all by myself. Catherine Olsson, Ibasho, and MicroCOVID already have one WIRED article about how great they are, but they deserve at least a dozen.
Metaculus solves the regulatory problem by using fake Internet points instead of money. This is a disappointing solution; it limits the user base to Internet obsessives instead of (say) investment bankers. Still, there are a lot of Internet obsessives. And the team running it is really top-notch, interested in pushing the limits of what prediction markets can do, and trying to focus on some of the most important questions.
I want to raise awareness of prediction markets, and right now Metaculus seem like the best people to raise awareness of. So welcome to Metaculus Mondays, where I make you listen to reports of how the prediction markets did this week and what they're predicting for later.
Late last year, when coronavirus had already killed 285,000 Americans, Metaculus asked users to predict how many would be dead by the end of 2021. The guesses started at about 500,000. But as cases rose further through December and January, the guesses rose too, until now they're averaging almost 690,000 people.
Getting back to Metaculus, let’s look at what they’ve got on AI:
First, some history. In 2016, DeepMind’s AlphaGo beat first Fan Hui, a medium-level professional Go player, and then Lee Sedol, a top professional Go player. This was one of the more unexpected events in AI history; everyone thought it would be a few more years before Go AIs were ready for prime time. We can see this on Metaculus; their prediction that a Go program would beat a professional went from 30% before the Fan Hui match to 90% afterwards (there was some debate on whether the Fan Hui match was official enough to count, so it wasn’t 100%, but everyone agreed that beating Fan Hui meant the program could probably beat other people in more official settings). After that people thought it was moderately likely AlphaGo could beat Lee Sedol too, and they were right.
The question defines “first AGI” as an AI system that can pass the Turing Test, get a score of 600+ on the math SAT, do well on the Winograd Challenge (a set of language comprehension problems), and play the classic AI test video game Montezuma’s Revenge, without needing excessive training data, and in some kind of unified way (ie it isn’t just four different ad hoc AIs cobbled together). This is an easier problem than “be fully human level intelligent”, but it would have to have some kind of impressive general intelligence to succeed at so many unlike domains.
Since this is getting broader than just Metaculus, I'm changing the name to Mantic Monday, after an obscure word for "oracular" (and changing the preview image to a mantis, since I don't know how else to visually represent "mantic". And posting it early Tuesday morning because I’m late).
3: This week on Metaculus: will a third-party candidate win 5%+ of the popular vote in 2024? Users say 15% chance, which I started out thinking was way too high. But they reminded me that Perot did in both '92 and '96, and if something's happened two of the last eight times it could have, maybe it's actually kind of common? Add that to the constant threats by Trumpist or anti-Trumpist conservatives to split from the Republican party, and maybe they're not crazy? I'm still betting against.
4: Also, will Bitcoin outperform the US stock market over the next five years, at 51%. I started out thinking - of course it's 50-50! By the efficient market hypothesis, if any asset was obviously going to do better than another, people would change the price until it wasn't. But on second thought that's wrong - stocks have a higher than 50% chance of beating treasuries over the same period because of a risk premium. Maybe there's no intuitive way to think about this, you have to have opinions on the underlying fundamentals, and it's only 51% by coincidence?
Metaculus scoring rule controversy
Zvi considered using some Metaculus markets for his weekly coronavirus roundup, but was turned off by the scoring rules.
Ross Rheingans-Yoo writes about the issue here. Everyone agrees Metaculus’ scoring rule is “proper”, a technical term meaning that it correctly incentivizes you to choose the probability you think is true. Zvi and Ross’s objection is that it doesn’t correctly incentivize you about whether to bet at all, or how much effort to put into betting.
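To make "proper" concrete, here is a minimal sketch using the logarithmic scoring rule, a standard textbook example of a proper rule. It is not Metaculus' actual formula, which the excerpt does not specify, and the function names are hypothetical. The sketch checks numerically that a forecaster who believes an event has probability p maximizes their expected score by reporting p.

```python
import math


def expected_log_score(report: float, belief: float) -> float:
    """Expected log score for reporting `report` when the forecaster
    actually believes the event happens with probability `belief`."""
    return belief * math.log(report) + (1 - belief) * math.log(1 - report)


def best_report(belief: float) -> float:
    """Grid-search reports from 0.01 to 0.99 for the one that
    maximizes the expected log score under `belief`."""
    candidates = [i / 100 for i in range(1, 100)]
    return max(candidates, key=lambda r: expected_log_score(r, belief))


if __name__ == "__main__":
    for belief in (0.1, 0.5, 0.7):
        # Under a proper rule the best report equals the belief.
        print(belief, best_report(belief))
```

Note that this only illustrates the part everyone agrees on. Zvi and Ross's objection concerns whether and how much to participate, which a check like this does not capture.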
Metaculus asked Yglesias for permission to put some of the predictions up on their platform, to see if their crowdsourced forecasts could beat his; he graciously agreed. Here are the predictions. Yglesias' numbers are bold and in parentheses. Metaculus' numbers are in brackets (not all questions are on Metaculus).
Yglesias and Metaculus agree on most things (not Israel/Saudi Arabia, though!). Some of the disagreements might come from Yglesias making his predictions in late December and Metaculus opening theirs in February, which is kind of unfair to Matt.
Metaculus has markets for some of Yglesias' predictions, but it's not a great comparison. For one thing, Metaculites got an extra two months to think about them and watch what happened. For another, the Metaculites got to see Yglesias's predictions, but Yglesias didn't get to see the Metaculites.
All of these pledges have one thing in common - they expire long after the relevant officials are out of power (and in Biden's case, probably dead). As hard as it is to hold politicians accountable in normal situations, it's even worse here. Sure enough, prediction aggregator Metaculus shows that forecasters only give a 15% chance that we reach Biden's emissions target by 2030.
No, seriously, hear me out. Biden pledges that by the end of his term, Metaculus will predict a 51%+ chance that emissions will be less than half their historic maximum by 2030. If Metaculus gives a lower number than this, we can consider Biden to have failed in his pledge, and we can hold it against him when he tries to get re-elected.
In order to get Metaculus (or some alternative prediction market) to show a 51% chance of meeting emissions targets, Biden would have to pass a credible package of legislation that puts us on the path to achieving that goal, and makes everyone think it’s more likely than not.
Among this month’s interesting Metaculus predictions:
If Puerto Rico gets statehood, will their first two senators both be Democrats? 50%. I’d seen accusations that the Democrats want Puerto Rican statehood to seize a Senate advantage, and counterarguments that no, PR isn’t as solid-blue as people like to think, but this is the first time I’ve ever seen the “risk” of a PR Republican Senator quantified. Higher than I thought!
Will Jeff Bezos make a big investment in anti-aging this year? 25% Aubrey de Grey has hinted that somebody really big is about to get into the anti-aging/longevity field, and speculation has centered on a newly-retired and not-getting-any-younger (so far!) Jeff Bezos. This prediction resolves as true if Bezos puts at least $50 million into anti-aging.
Extra credit for the last market, which seems to be successfully predicting a scalar instead of a binary outcome - I’ve seen Metaculus experiment with this technology, but this is the first time I’ve spotted it at Polymarket using real money.
Some of the more interesting new Metaculus markets. The space telescope one is especially interesting in the context of whether we could use prediction markets to predict (and maybe manage) government delays and cost overruns. The telescope is currently scheduled for launch in October 2025, so the market expects it to be about five years late. For context, the previous space telescope, James Webb, was originally scheduled for 2007 and (if everything goes well) will launch later this year.
And here’s Metaculus:
I don’t know how much of this is people being dumb, vs. the AI field having a lot of diverse opinions and it’s hard to remember it’s different people, vs. people thinking about probabilities differently. I think the closest thing to a consensus is Metaculus, which says:
There’s a 25% chance of some kind of horrendous global catastrophe this century.
If it happens, there’s a 23% chance it has something to do with AI.
Once you’ve got this, you’ve also got the ability to answer questions like “how would my child do at public school vs. Montessori school vs. Success Academy”? If the prediction markets say the test scores would be about the same no matter what, then Freddie de Boer is right, private schools are all grifts, and the whole thing is hopelessly confounded by selection bias.
This Week In Metaculus
(source, units are billions of dollars)
AFAIK, right now SpaceX is worth about $100 billion. But the median estimate for 2030 is $500 billion. An 8% rate of return over nine years is ~100%, so even in a great economy the average company will “merely” double by then, whereas SpaceX will quintuple. Seems bold to say a company is undervalued by a factor of >2. I guess this doesn’t technically violate any theorem about stock markets or prediction markets because SpaceX is a private company. Maybe $100 billion is its valuation by normal private investors, and $500 billion is what the sort of people who buy Tesla stock would give it, and Metaculus is siding with the Tesla buyers? Still, take it public!
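The compounding arithmetic in the SpaceX paragraph can be checked with a short sketch. The helper names are hypothetical; the inputs (8% per year, nine years, $100 billion growing to $500 billion) come from the text.

```python
def growth_multiple(annual_return: float, years: int) -> float:
    """Total growth factor from compounding `annual_return` for `years` years."""
    return (1 + annual_return) ** years


def implied_annual_return(multiple: float, years: int) -> float:
    """Constant annual return needed to grow by `multiple` over `years` years."""
    return multiple ** (1 / years) - 1


if __name__ == "__main__":
    # An average company at 8% per year for nine years: roughly doubles.
    print(growth_multiple(0.08, 9))
    # SpaceX going from $100B to $500B over the same nine years.
    print(implied_annual_return(500 / 100, 9))
```

Under these assumptions the quintupling forecast implies an annual return a little under 20%, which is the sense in which the forecast says the current valuation is too low by a factor of more than 2.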
Metaculus
Some very unsurprising overlap between the Metaculus user and housing policy wonk populations here.
Metaculus predicts January 1 as the median date for the FDA approving Paxlovid. They estimate a 92% chance it will get approved by March.
(source: Metaculus) R0 is a measure of how quickly a disease spreads under certain ideal conditions. The original Wuhan strain was probably around 2.5, and the Delta variant was probably around 5. So if this number is higher than 5, it’s more transmissible than Delta. The community prediction is 7.31, so Metaculus predicts it will be significantly more transmissible than Delta.
(source: Metaculus) Metaculus didn’t want to wade in to precise lethality statistics, so they just asked for a yes-or-no answer on whether it would be deadlier than Delta. Forecasters say there’s a 34% chance it will be.
I rarely see people trying this, but here’s an exception from Metaculus (h/t Nathan Young):
Why am I mentioning this here? His essay is on Metaculus. It’s the latest in their line of “fortified essays”, a new genre they’re trying to create of argument backed by prediction markets and crowd forecasting.
Metaculus thinks that despite all this great science, more Americans than ever will be obese in ten years (for context, 43% are obese today).
The FDA also approved the other drug I’ve been saying they should approve quickly, Paxlovid, a full two weeks before the prediction markets expected! According to Metaculus, there was only a 6% chance we would get Paxlovid approved this quickly. They are genuinely getting better!
Here’s what happened to Metaculus’ prediction tournament when the same study came out:
Conflict of interest notice: they have applied for (and will probably get) an ACX Grant. Other than me giving them money and publicity, and them stealing my favorite prediction market related word, I’m not actually affiliated with them in a meaningful sense.
Metaculus Public Figures
You may remember from last post that there is a lot of stuff at Metaculus.
Here’s their Public Figure Predictions page. It tries to collect predictions by important public figures and compare them to the Metaculus consensus for the same question. For example, from the Elon Musk page:
Nathan Young, $5,000, to fund his continued work writing Metaculus questions and trying to build bridges between the forecasting and effective altruist communities. Nathan is a Metaculus moderator, the author of a prediction market blog I've used as a source before, and has useful connections with people who might be convinced to use formal forecasting methods for their organizations. This grant is a vote of confidence in him to continue this work, and another part of my effort to fund more forecasting infrastructure. You can read his newsletter, the UK Policy Forecast, here. If you have suggestions for forecasting questions he asks that you DM him on twitter or add them to this open Google doc.
Easy to create your own subsidized markets “Real money” should be self-explanatory. Metaculus and Manifold are both very nice, but so far they’re limited to a small group of enthusiasts playing in their spare time. I value them both, but neither is the killer app that makes prediction markets as central to everyday life as stock markets or polls or whatever. “Easy to use” is kind of self-explanatory, but with some caveats. A big part of ease-of-use is liquidity; you can get that from a big user base or from clever deployment of automated market makers. A market that requires crypto knowledge is harder to use than one that doesn’t; one that’s inaccessible from the US is harder to use than one that isn’t. Also all the normal things like UI and search. “Easy to create your own markets” is where we’ve gotten stuck so far. Prediction markets are absolutely on top of questions about whether Donald Trump will win various elections. This is a solved problem. What I really wanted last year (and would have subsidized!) was a market about whether Alameda County, California, would permit indoor gatherings of 50 people on January 8th 2022 (ie would I be forced to cancel my wedding). But I also would have appreciated the ability to put a few questions to prediction markets before starting my psychiatry practice, or my grants program, or any of a dozen other things I did. A friend has gone further, and half-jokingly said they want to create conditional prediction markets about whether they’re compatible with various women in our friend group, to be paid out six months after the first date. Some of these applications are attempts to route around the principal-agent problem. Maybe I have some question about whether a certain grant would succeed, I’m not sure who to ask, and even if someone gives me a “Bob Smith, Grant Evaluator” business card, I don’t know if he’s any good. 
A prediction market takes all the pain out of searching for information - if I subsidize it enough, it’ll attract people with the relevant skill set who will solve my problem for me. Probably some of these ideas wouldn’t work, but probably other ideas I can’t even think of now would. I don’t know what the killer app for prediction markets will be. But we’re not going to find out unless people can create their own subsidized markets and play around. Polymarket took some baby steps towards this before the settlement: they had a Discord server where anyone could propose questions, and a lot of those questions became markets. But they still had to be general interest, not “let Alice’s five friends predict her dating life”. And there’s a big difference between “talk it over with company representatives on a Discord server” and “press a button”. Imagine if you could only tweet by emailing Jack Dorsey and convincing him that your comment was a good thing to have on Twitter. Even if Jack had good judgment and approved most requests, this would be a long way from the limbic system < — > Send Tweet loop that real Twitter users know and love. I asked some people in the business why they won’t do this. They said most people are bad at writing good resolution criteria. They don’t want their employees to get stuck resolving incredibly dumb questions about people’s dating lives, hunting down inaccessible or conflicting information, and making a bunch of people mad whichever way they decide. As far as I can tell, Manifold Markets solved that problem with their “proposer decides the resolution, caveat emptor” strategy. But Manifold is US-based and can’t use real money, so there’s still no way to subsidize a market effectively. (This is why I’m pessimistic about Kalshi. 
They could potentially do a lot of good in the “will Afghanistan collapse?” types of markets the Nobel laureates want, though even there I think some of their betting limits will give them trouble - $25,000 is good money, but not quite good enough to incentivize founding the prediction market equivalent of a Wall Street trading firm. But even if they solve this, I can’t imagine the regulators giving them permission to host “will this grant work out?” or “how will my dating life go?” markets; it’s just too weird, and the CFTC is too conservative. I don’t know, maybe their connections will come through and pull it off, but I don’t even know if they’re ambitious enough to want this, and I hate having to rely on one organization.) Right now my hopes are, in ascending order of likelihood: Manifold figures out some kind of weird crypto thing that isn’t real money from a legal perspective, but is real money from a “people really want it and will put a lot of effort into getting it” perspective.
I’m most optimistic about this last one, but it would be tough. You could try a version of Polymarket without the centralized organization gating the front end and providing liquidity. But then how would it make money? It probably wouldn’t - which might be fine, Metaculus is a non-profit and is still exceptionally well-run and stable. If someone did a good job of this I would try really hard to get it funded, and would expect to succeed.
Not actually in order This is a semi-randomly selected sample of Manifold markets, but let’s go through them one by one. The Ukraine market is the biggest on Manifold. It’s also deeply out of step with every other prediction market and the top non-prediction-market authorities - who are all giving numbers in the 50s and 60s. I don’t understand how this is so low - yes, play money < real money, but mostly because play money doesn’t get enough people betting. Here lots of people are betting - it’s the biggest market on the site, and since you only start with $1000 either twenty people have bet everything or more people have bet a fraction - but it’s still wrong. I tried to spend some play money to correct it and it snapped back to just as wrong as it was before. I have no explanation. Midnight The Stray Cat is the second biggest market on Manifold, just after Ukraine. I guess the Internet really liking cats shouldn’t be a surprise at this point. In case you need to do research first I’m told this is the cat in question: Props to Manifold for a bunch of markets like the third one on there, where they eat their own dog food by using their market to predict how their business decisions are going to go. ACX Bot has copy-pasted all of my predictions from 2022. At some point they should be able to compare their results with Zvi (ie a single very smart person), with the contest many of you entered (ie an average of formless crowdsourced predictions), and Metaculus (ie a non-monetary forecasting tournament). I’m looking forward to it! Most of you already know Lars Doucet, who’s written some great ACX posts on Georgism. I don’t know what possessed him to make a Joe Rogan Georgism interviewee market, unless he’s gunning for the position. Valinor is a group house on my street, with ~a dozen people living in and around it. We’ve been talking about fixing the backyard for a while. Now we can bet about whether it will happen. 
Having a number for this actually affects some of my decisions a little. Connor is hijacking the prediction market to make a poll, which is pretty cute. Dwayne Johnson does not have a 15% chance of winning the election. Manifold is suffering from the usual play money problem, where if you only start out with $1000 in play money, nobody wants to lock it up for three years to make a 15% profit. Vivek’s market, “Will I believe that 13177 is a prime number”, is pretty unusual. I’m interpreting it as a test/demonstration of prediction markets’ information-gathering ability. If you don’t know something and it’s hard to Google, you can make a prediction market about whether you’ll believe it in the future, and people who are able to figure out the answer will bet on it. Based on the 97% YES rate, I’m guessing 13177 is in fact a prime number. What else can you do this with? TANSTAAFL’s “Will I Be Convinced That Justin Trudeau Is Not Fidel Castro’s Son?” market is maybe pushing the limit of this methodology. Anyway, there are lots of me-too prediction markets but this is something genuinely new under the sun. Maybe it will be awesome itself, but I’m also hoping it helps bigger players realize how much more is possible. This Week In Metaculus A few new questions on intelligence enhancement, eg: The question explicitly allows embryo selection, but says it must raise IQ ten points and be available for <25% median income to count. Trivial improvements to existing embryo selection will top out around 9 points, so this seems to be predicting something more interesting, maybe iterated embryo selection at the very least. I’m probably slightly bearish on this one; I believe if it existed someone would find a way to get it, but I think the regulatory climate might be able to prevent the relevant research indefinitely. Improving adult IQ is really hard. This is a bold thing to speculate about! Atmospheric CO2 was 300ish for most of pre-industrial history, 400ish now, and rising. 
This question predicts 600 in 2100, which sounds like what happens if global warming gets a bit worse but eventually stabilizes. I’m less sure. I think if we make it to 2100, we’ll have so much technology that atmospheric CO2 can be whatever we want it to be. But maybe we’ll want it to stay where it is; once there’s been a lot of global warming and people have moved / shifted lifestyles, it could be equally disruptive to cool the planet back down. Right now it’s 5%, the official government prediction is 10% by 2030, but this market says 17.6%. But look at that probability distribution! It’s a lot of people saying 10%ish, plus a very long tail of very big numbers. I think people are disagreeing about how exponential this change is going to be. Shorts Metaculus is holding an essay contest for people who want to use their AI-related prediction markets to argue the future of AI. $6500 available in prizes.
I used to be really skeptical here, but Metaculus and Manifold have softened my stance. So let’s look closer at how and whether these kinds of systems work.
Metaculus has a weird system combining absolute and relative accuracy: all predictions are treated as a combination of “bets with the house” on absolute accuracy, plus bets against other predictors on relative accuracy. Why? As a kind of market-making function; even if nobody else has yet predicted, it’s still worth entering a market for the absolute accuracy points. This works, but has a lot of complicated consequences we’ll discuss more below.
Positive-sum means that the house always loses; on average, you make money every time you bet. Metaculus is infamous for this; see eg this question on Ukraine:
…don’t really do much. The median goes from 2052 to about 2065. Four of the models give results between 2030 and 2070. The last two, Neural Net With Long Horizon and Evolution, suggest probably no AI this century (although Neural Net With Long Horizon does think there’s a 40% chance by 2100). Ajeya doesn’t really like either of these models and they’re not heavily weighted in her main result. Does The Truth Point To Itself? Back up a second. Here’s something that makes me kind of nervous. Most of Ajeya’s numbers are kind of made up, with several order-of-magnitude error bars and simplifying assumptions like “all animals are nematodes”. For a single parameter, we get estimates spanning seventeen different orders of magnitude: the upper bound is one hundred quadrillion times the lower bound. And yet four of the six models, including two genuinely exotic ones, manage to get dates within twenty years of 2050. And 2050 is also the date everyone else focuses on. Here’s the prediction-market-like site Metaculus: Their distribution looks a lot like Ajeya’s, and even has the same median, 2052 (though forecasters could have read Ajeya’s report). Katja Grace et al surveyed 352 AI experts, and they gave a median estimate of 2062 for an AI that could “outperform humans at all tasks” (though with many caveats and high sensitivity to question framing). This was before Ajeya’s report, so they definitely didn’t read it. So lots of Ajeya’s different methods and lots of other people presumably using different methodologies or no methodology at all, all converge on this same idea of 2050 give or take a decade or two. An optimist might say “The truth points to itself! There are 371 known proofs of the Pythagorean Theorem, and they all end up in the same place. That’s because no matter what methodology you use, if you use it well enough you get to the correct answer.” A pessimist might be more suspicious; we’ll return to this part later. 
FLOPS Alone Turn The Wheel Of History
One more question: what if this is all bullshit? What if it’s an utterly useless total garbage steaming pile of grade A crap?
Imagine a scientist in Victorian Britain, speculating on when humankind might invent ships that travel through space. He finds a natural anchor: the moon travels through space! He can observe things about the moon: for example, it is 220 miles in diameter (give or take an order of magnitude). So when humankind invents ships that are 220 miles in diameter, they can travel through space!
Ships have certainly grown in size tremendously, from primitive kayaks to Roman triremes to Spanish galleons to the great ocean liners of the (Victorian) present. The AI forecasting organization AI Impacts actually has a whole report on historical ship size trends to prove an unrelated point about technological progress, so I didn’t even have to make this graph up.
Suppose our Victorian scientist lived in 1858, right when the Great Eastern was launched. The trend line for ship size crossed 100m around 1843, and 200m in 1858, so doubling time is 15 years - but perhaps they notice this is going to be an outlier, so let’s round up a bit and say 18 years. The (one order of magnitude off estimate for the size of the) Moon is 350,000m, so you’d need ships to scale up by 350,000/200 = 1,750x before they’re as big as the Moon. That’s about 10.8 doublings, and a doubling time is 18 years, so we’ll get spaceships in . . . 2052 exactly. (fudging numbers to land where you want is actually fun and easy)
SS Great Eastern, the extreme outlier large steamship from 1858. This has become sort of a mascot for quantitative technological progress forecasters.
What is this scientist’s error? The big one is thinking that spaceship progress depends on some easily-measured quantity (size) instead of on fundamental advances (eg figuring out how rockets work).
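The Victorian scientist's extrapolation can be reproduced in a few lines. This is a sketch with hypothetical function names; the inputs (200m ships in 1858, a 350,000m Moon, an 18-year doubling time) are the deliberately fudged figures from the passage.

```python
import math


def doublings_needed(current_size: float, target_size: float) -> float:
    """Number of doublings required to grow from current_size to target_size."""
    return math.log2(target_size / current_size)


def arrival_year(start_year: int, current_size: float,
                 target_size: float, doubling_years: float) -> float:
    """Year in which the trend reaches target_size, assuming a constant doubling time."""
    return start_year + doublings_needed(current_size, target_size) * doubling_years


if __name__ == "__main__":
    # 350,000 / 200 = 1,750x, a bit under 11 doublings.
    print(doublings_needed(200, 350_000))
    # With the rounded-up 18-year doubling time from the text.
    print(round(arrival_year(1858, 200, 350_000, 18)))
    # With the unrounded 15-year doubling time the answer lands decades earlier,
    # which shows how sensitive the result is to the fudged input.
    print(round(arrival_year(1858, 200, 350_000, 15)))
```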
You can make the same accusation against Ajeya et al: you can have all the FLOPs in the world, but if you don’t understand how to make a machine think, your AI will be, well, a flop. Ajeya discusses this a bit on page 143 of her report. There is some sense in which FLOPs and knowing-what-you’re-doing trade off against each other. If you have literally no idea what you’re doing, you can sort of kind of re-run evolution until it comes up with something that looks good. If things are somehow even worse than that, you could always run AIXI, a hypothetical AI design guaranteed to get excellent results as long as you have infinite computation. You could run a Go engine by searching the entire branching tree structure of Go - you shouldn’t, and it would take a zillion times more compute than exists in the entire world, but you could. So in some sense what you’re doing, when you’re figuring out what you’re doing, is coming up with ways to do already-possible things more efficiently. But that’s just algorithmic progress, which Ajeya has already baked into her model. (our Victorian scientist: “As a reductio ad absurdum, you could always stand the ship on its end, and then climb up it to reach space. We’re just trying to make ships that are more efficient than that.”) Part II: Biology-Inspired AI Timelines: The Trick That Never Works Eliezer Yudkowsky presents a more subtle version of these kinds of objections in an essay called Biology-Inspired AI Timelines: The Trick That Never Works, published December 2021. Ajeya’s report is a 169-page collection of equations, graphs, and modeling assumptions. Yudkowsky’s rebuttal is a fictional dialogue between himself, younger versions of himself, famous AI scientists, and other bit players. At one point, a character called “Humbali” shows up begging Yudkowsky to be more humble, and Yudkowsky defeats him with devastating counterarguments. Still, he did found the field, so I guess everyone has to listen to him. 
He starts: in 1988, famous AI scientist Hans Moravec predicted human-level AI by 2010. He was using the same methodology as Ajeya: extrapolate how quickly processing power would grow (in FLOP/S), and see when it would match some estimate of the human brain. Moravec got the processing power almost exactly right (it hit his 2010 projection in 2008) and his human brain estimate pretty close (he says 10^13 FLOP/S, Ajeya says 10^15, this 2 OOM difference only delays things a few years), yet there was not human-level AI in 2010. What happened? Ajeya's answer could be: Moravec didn't realize that, in the modern ML paradigm, any given size of program requires a much bigger program to train. Ajeya, who has a 35-year advantage on Moravec, estimates approximately the same power for the finished program (10^16 vs. 10^13 FLOP/S) but says that training the 10^16 FLOP/S program will require 10^33ish FLOPs. Eliezer agrees as far as it goes, but says this points to a much deeper failure mode, which was that Moravec had no idea what he was doing. He was assuming processing power of human brain = processing power of computer necessary for AGI. Why? The human brain consumes around 20 watts of power. Can we thereby conclude that an AGI should consume around 20 watts of power, and that, when technology advances to the point of being able to supply around 20 watts of power to computers, we'll get AGI? […] You say that AIs consume energy in a very different way from brains? Well, they'll also consume computations in a very different way from brains! The only difference between these two cases is that you know something about how humans eat food and break it down in their stomachs and convert it into ATP that gets consumed by neurons to pump ions back out of dendrites and axons, while computer chips consume electricity whose flow gets interrupted by transistors to transmit information. 
Since you know anything whatsoever about how AGIs and humans consume energy, you can see that the consumption is so vastly different as to obviate all comparisons entirely. You are ignorant of how the brain consumes computation, you are ignorant of how the first AGIs built would consume computation, but "an unknown key does not open an unknown lock" and these two ignorant distributions should not assert much internal correlation between them. Cars don’t move by contracting their leg muscles and planes don’t fly by flapping their wings like birds. Telescopes do form images the same way as the lenses in our eyes, but differ by so many orders of magnitude in every important way that they defy comparison. Why should AI be different? You have to use some specific algorithm when you’re creating AI; why should we expect it to be anywhere near the same efficiency as the ones Nature uses in our brains? The same is true for arguments from evolution, eg Ajeya’s Evolutionary Anchor, ie “it took evolution 10^43 FLOPs of computation to evolve the human brain so maybe that will be the training cost”. AI scientists sitting in labs trying to figure things out, and nematodes getting eaten by other nematodes, are such different methods for designing things that it’s crazy to use one as an estimate for the other. Algorithmic Progress vs. Algorithmic Paradigm Shifts This post is a dialogue, so (Eliezer’s hypothetical model of) OpenPhil gets a chance to respond. They object: this is why we put a term for algorithmic progress in our model. The model isn’t very sensitive to changes in that term. If you want you can set it to some kind of crazy high value and see what happens, but you can’t say we didn’t consider it. OpenPhil: We did already consider that and try to take it into account: our model already includes a parameter for how algorithmic progress reduces hardware requirements. 
It's not easy to graph as exactly as Moore's Law, as you say, but our best-guess estimate is that compute costs halve every 2-3 years […] Eliezer: The makers of AGI aren't going to be doing 10,000,000,000,000 rounds of gradient descent, on entire brain-sized 300,000,000,000,000-parameter models, algorithmically faster than today. They're going to get to AGI via some route that you don't know how to take, at least if it happens in 2040. If it happens in 2025, it may be via a route that some modern researchers do know how to take, but in this case, of course, your model was also wrong. They're not going to be taking your default-imagined approach algorithmically faster, they're going to be taking an algorithmically different approach that eats computing power in a different way than you imagine it being consumed. OpenPhil: Shouldn't that just be folded into our estimate of how the computation required to accomplish a fixed task decreases by half every 2-3 years due to better algorithms? Eliezer: Backtesting this viewpoint on the previous history of computer science, it seems to me to assert that it should be possible to: Train a pre-Transformer RNN/CNN-based model, not using any other techniques invented after 2017, to GPT-2 levels of performance, using only around 2x as much compute as GPT-2;
Play pro-level Go using 8-16 times as much computing power as AlphaGo, but only 2006 levels of technology. For reference, recall that in 2006, Hinton and Salakhutdinov were just starting to publish that, by training multiple layers of Restricted Boltzmann machines and then unrolling them into a "deep" neural network, you could get an initialization for the network weights that would avoid the problem of vanishing and exploding gradients and activations. At least so long as you didn't try to stack too many layers, like a dozen layers or something ridiculous like that. This being the point that kicked off the entire deep-learning revolution. Your model apparently suggests that we have gotten around 50 times more efficient at turning computation into intelligence since that time; so, we should be able to replicate any modern feat of deep learning performed in 2021, using techniques from before deep learning and around fifty times as much computing power. OpenPhil: No, that's totally not what our viewpoint says when you backfit it to past reality. Our model does a great job of retrodicting past reality. Eliezer: How so? OpenPhil: <Eliezer cannot predict what they will say here.> I think the argument here is that OpenPhil is accounting for normal scientific progress in algorithms, but not for paradigm shifts. Directional Error These are the two arguments Eliezer makes against OpenPhil that I find most persuasive. First, that you shouldn’t be using biological anchors at all. Second, that unpredictable paradigm shifts are more realistic than gradual algorithmic progress. These mostly add uncertainty to OpenPhil’s model, but Eliezer ends his essay making a stronger argument: he thinks OpenPhil is directionally wrong, and AI will come earlier than they think. Mostly this is the paradigm argument again. Five years from now, there could be a paradigm shift that makes AI much easier to build. 
It’s happened before; from GOFAI’s pre-programmed logical rules to Deep Blue’s tree searches to the sorts of Big Data methods that won the Netflix Prize to modern deep learning. Instead of just extrapolating deep learning scaling thirty years out, OpenPhil should be worried about the next big idea. Hypothetical OpenPhil retorts that this is a double-edged sword. Maybe the deep learning paradigm can’t produce AGI, and we’ll have to wait decades or centuries for someone to have the right insight. Or maybe the new paradigm you need for AGI will take more compute than deep learning, in the same way deep learning takes more compute than whatever Moravec was imagining. This is a pretty strong response, since it would have been true for every previous forecaster: remember, Moravec erred in thinking AI would come too soon, not too late. So although Eliezer is taking the cheap shot of saying OpenPhil’s estimate will be wrong just as everyone else’s was wrong before, he’s also giving himself the much harder case of arguing it might be wrong in the opposite direction as all its predecessors. Eliezer takes this objection seriously, but feels like on balance probably new paradigms will speed up AI rather than slow it down. Here he grudgingly and with suitable embarrassment does try to make an object-level semi-biological-anchors-related argument: Moravec was wrong because he ignored the training phase. And the proper anchor for the training phase is somewhere between evolution and a human childhood, where evolution represents “blind chance eventually finding good things” and human childhood represents “an intelligent cognitive engine trying to squeeze as much data out of experience as possible”. And part of what he expects paradigm shifts to do is to move from more evolutionary processes to more childhood-like processes, and that’s a net gain in efficiency. 
So he still thinks OpenPhil’s methods are more likely to overestimate the amount of time until AGI rather than underestimate it. What Moore’s Law Giveth, Platt’s Law Taketh Away Eliezer’s other argument is kind of a low blow: he refers to Platt’s Law Of AI Forecasting: “any AI forecast will put strong AI thirty years out from when the forecast is made.” This isn’t exact. Hans Moravec, writing in 1988, said 2010 - so 22 years. Ray Kurzweil, writing in 2001, said 2023 - another 22 years. Vernor Vinge, in a 1993 speech, said 2023, and that was exactly 30 years, but Vinge knew about Platt’s Law and might have been joking. The point is: OpenPhil wrote a report in 2020 that predicted strong AI in 2052, isn’t that kind of suspicious? I’d previously mentioned it as a plus that Ajeya got around the same year everyone else got. The forecasters on Metaculus. The experts surveyed in Grace et al. Lots of other smart experts with clever models. But what if all of these experts and models and analyses are just fudging the numbers for the same Platt’s-Law-related reasons? Hypothetical OpenPhil is BTFO: OpenPhil: That part about Charles Platt's generalization is interesting, but just because we unwittingly chose literally exactly the median that Platt predicted people would always choose in consistent error, that doesn't justify dismissing our work, right? We could have used a completely valid method of estimation which would have pointed to 2050 no matter which year it was tried in, and, by sheer coincidence, have first written that up in 2020. In fact, we try to show in the report that the same methodology, evaluated in earlier years, would also have pointed to around 2050 - Eliezer: Look, people keep trying this. It's never worked. It's never going to work. 2 years before the end of the world, there'll be another published biologically inspired estimate showing that AGI is 30 years away and it will be exactly as informative then as it is now. 
I'd love to know the timelines too, but you're not going to get the answer you want until right before the end of the world, and maybe not even then unless you're paying very close attention. Timing this stuff is just plain hard. Part III: Responses And Commentary Response 1: Less Wrong Comments Less Wrong is a site founded by Eliezer Yudkowsky for Eliezer Yudkowsky fans who wanted to discuss Eliezer Yudkowsky’s ideas. So, for whatever it’s worth - the comments on his essay were pretty negative. Carl Shulman, an independent researcher with links to both OpenPhil and MIRI (Eliezer’s org), writes the top-voted comment. He works from a model where there is hardware progress, software progress downstream of hardware progress, and independent (ie not downstream of hardware) software progress, and where the first two make up most progress on the margin. Researchers generally develop new paradigms once they have enough compute available to tinker with them. Progress in AI has largely been a function of increasing compute, human software research efforts, and serial time/steps. Throwing more compute at researchers has improved performance both directly and indirectly (e.g. by enabling more experiments, refining evaluation functions in chess, training neural networks, or making algorithms that work best with large compute more attractive). Historically compute has grown by many orders of magnitude, while human labor applied to AI and supporting software by only a few. And on plausible decompositions of progress (allowing for adjustment of software to current hardware and vice versa), hardware growth accounts for more of the progress over time than human labor input growth. 
So if you're going to use an AI production function for tech forecasting based on inputs (which do relatively OK by the standards of tech forecasting), it's best to use all of compute, labor, and time, but it makes sense for compute to have pride of place and take in more modeling effort and attention, since it's the biggest source of change (particularly when including software gains downstream of hardware technology and expenditures). […] A perfectly correlated time series of compute and labor would not let us say which had the larger marginal contribution, but we have resources to get at that, which I was referring to with 'plausible decompositions.' This includes experiments with old and new software and hardware, like the chess ones Paul recently commissioned, and studies by AI Impacts, OpenAI, and Neil Thompson. There are AI scaling experiments, and observations of the results of shocks like the end of Dennard scaling, the availability of GPGPU computing, and Besiroglu's data on the relative predictive power of compute and labor in individual papers and subfields. In different ways those tend to put hardware as driving more log improvement than software (with both contributing), particularly if we consider software innovations downstream of hardware changes. Vanessa Kosoy makes the obvious objection, which echoes a comment of Eliezer’s in the dialogue above: I'm confused how can this pass some obvious tests. For example, do you claim that alpha-beta pruning can match AlphaGo given some not-crazy advantage in compute? Do you claim that SVMs can do SOTA image classification with not-crazy advantage in compute (or with any amount of compute with the same training data)? Can Eliza-style chatbots compete with GPT3 however we scale them up? Mark Xu answers: My model is something like: For any given algorithm, e.g. SVMs, AlphaGo, alpha-beta pruning, convnets, etc., there is an "effective compute regime" where dumping more compute makes them better. 
If you go above this regime, you get steep diminishing marginal returns.
That is - suppose before we read Ajeya’s report, we started with some distribution over when we’d get AGI. For me, not being an expert in this area, this would be some combination of the Metaculus forecast and the Grace et al expert survey, slightly pushed various directions by the views of individual smart people I trust. Now Ajeya says maybe it’s more like some other distribution. I should end up with a distribution somewhere in between my prior and this new evidence. But where?
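One simple way to make "somewhere in between" concrete is a linear pool: a weighted average of the two distributions, where the weight says how much you trust the new report relative to your prior. This is only a sketch of one possible updating rule, not the method the post goes on to use. The function name `blend` is hypothetical, and the bucket probabilities below are placeholders invented purely for illustration; they are not the Metaculus, Grace et al, or Ajeya numbers.

```python
def blend(prior, new_model, weight_on_new):
    """Linear pool of two probability distributions over the same buckets.

    weight_on_new = 0 keeps the prior unchanged; 1 adopts the new model outright.
    """
    if not 0.0 <= weight_on_new <= 1.0:
        raise ValueError("weight_on_new must be between 0 and 1")
    if prior.keys() != new_model.keys():
        raise ValueError("both distributions must use the same buckets")
    return {
        bucket: (1 - weight_on_new) * prior[bucket] + weight_on_new * new_model[bucket]
        for bucket in prior
    }


# PLACEHOLDER numbers, made up for illustration only.
prior = {"before 2040": 0.2, "2040-2060": 0.4, "after 2060": 0.4}
report = {"before 2040": 0.1, "2040-2060": 0.5, "after 2060": 0.4}

for w in (0.0, 0.5, 1.0):
    print(w, blend(prior, report, w))
```

The whole "But where?" question then collapses into choosing the single number `weight_on_new`, which is where the real disagreement lives.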
1: Eli Lifland and Misha Yagudin have asked me to announce the Impactful Forecasting Prize, with $2,000 for first prize and more money available for other winners. Read the rules (bolded link above), write up forecasts on one of these Metaculus questions and submit via this form by March 11. They’ll also be having a meetup in Gather on March 2.
Starting with Metaculus:
— Will Kyiv fall to Russian forces by April 1 2022? 69% chance This is the most-predicted relevant question on Metaculus right now. The first day of the war, the market predicted as high as 90%; as people realized the strength of Ukrainian resistance, it fell to 80. Mid-Saturday there was a sudden drop from 78% to 72%, after some combination of a defiant Zelenskyy speech and a report that Russian paratroopers had been repelled. Since then it’s barely budged.
a. Metaculus Alerts is a Twitter bot that alerts you when a Metaculus prediction on the Ukraine war has changed drastically in a short time. For example, “the chance of Russia taking Kiev by April has decreased 10% in the past 24 hours”. I find this a good substitute for refreshing the news every minute to see if something interesting has happened.
k. Metaculus thinks Russia might soon close its borders. It might be helpful to talk to Russians you know about getting out of Russia if they can, before things get worse. See also Letter: Russians Are Welcome In America - though I don’t know what the visa situation is like now and it might be terrible.
Will Kiev fall to Russian forces by April 2022?: 69% → 14%
Will at least three of six big cities fall by June 1?: 71% → 70%
Will World War III happen before 2050?: 20% → 21%
Will Kiev fall to Russian forces by April 2022?: 14% → 2%
Will at least three of six big cities fall by June 1?: 70% → 53%
Will World War III happen before 2050?: 21% → 20%
Metaculus currently has him at 40% to win the primary and 29% to win the general. I’m closer to 60/45. Although he’s getting support from some big funders, campaign finance privileges small-to-medium-sized donations from ordinary people. If you want to support him, you can see a list of possible options here - including donations. You can donate max $2900 for the primary, plus another $2900 for the general that will be refunded if he doesn’t make it. If you do donate, it would be extra helpful if the money came in before a key reporting deadline March 31. 3: Every year in autumn I hold a big Meetups Everywhere event, and every time people tell me I should do it more often than once a year. So this time we’ll hold a mini-Meetups-Everywhere this April. It won’t be any different from your usual meetup schedule except that it’ll be the Schelling time for everyone who only wants to come once every few months to come. If you’re a meetups organizer (or want to become one), please fill in this form with the date of a meetup April 11th or later. Next Sunday I’ll put the results on the Open Thread for people to see. 4: Speaking of meetups, the rationalist/EA establishment is trying to promote local meetups. If you’re a local ACX/LW meetups organizer, you’re potentially invited to attend an all-expenses paid retreat in California in July with our meetups czar Mingyuan. Please read more here, then fill in this form to get on her radar. 5: And speaking of Mingyuan, she is going to inspect - sorry, enjoy the hospitality of - the East Coast meetup groups. She’ll be in DC: 4/11–4/13; Baltimore: 4/14; Philadelphia: 4/15–4/16; NYC: 4/17–4/21; Yale: 4/22–4/23; Northampton: 4/24–4/25; Boston: 4/26–5/1. The local groups have already taken care of having meetups at the right time, but she’s looking for people who could host her and drive her between cities. Email meetupsmingyuan@gmail.com if you can help. 
6: Last week I tried to figure out the needs of community members in Russia and Ukraine. There are some great resources on the thread, but issues that still need solving: Seven Ukrainian refugees looking for remote work
This is the Metaculus forecasting question corresponding to Paul’s preferred formulation of hard/soft takeoff. Metaculans think there’s a 69% chance it’s true. But it fell by about 4% after the debate, suggesting that some people got won over to Eliezer’s point of view.
17: Metaculus: will plant-based meat pass a “Turing test” (where people can’t distinguish it from real meat) by 2023? Currently at 55%
Will at least three of six big cities fall by June 1?: 53% → 5%
Will World War III happen before 2050?: 20% → 22%
Will Russia invade any other country in 2022?: 7% → 5%
Will at least three of six big cities fall by June 1?: 5% → 2%
Will World War III happen before 2050?: 22% → 25%
Will Russia invade any other country in 2022?: 5% → 10%
Metaculus predicts 17,000 cases and 400 deaths from monkeypox this year. But as usual, it’s all about the distribution: 90% chance of fewer than 400,000 cases; 95% chance of fewer than 2.2 million cases; 98% chance of fewer than 500 million cases. This is encouraging, but a 2% chance of >500 million cases (there have been about 500 million recorded COVID infections total) is still very bad. Does Metaculus say this because it’s true, or because there will always be a few crazy people entering very large numbers without modeling anything carefully? I’m not sure. How would you test that? Warcasting The war in Ukraine has shifted into a new phase, with Russia concentrating in Donetsk and Luhansk, and finally beginning to make good use of its artillery advantage. I’m going to stop following the old Kiev-centric set of questions and replace them with more appropriate ones: Notice that this continues to rise, from 16% a month ago to 22% today. See Eikonal’s comment here for some discussion of how this might happen and what territories these might be (and note that we switched from Ukrainian control in the last question to Russian control in this one). I’m keeping this one in here, but it never changes. Meanwhile, on Insight Prediction: $2000 in liquidity and still 14% off from Metaculus, weird. Musk Vs. Marcus Elon Musk recently said he thought we might have AGI before 2029, and Gary Marcus said we wouldn’t and offered to bet on it. It’s an important tradition of AGI discussions that nobody can ever agree on a definition of it and it has to be re-invented every time the topic comes up. Marcus proposed five different things he thought an AI couldn’t do before 2029, such that if it does them, he admits he was wrong and Musk wins the bet (which is purely hypothetical at this point; Musk hasn’t responded). 
The AI would have to do at least three of: Read a novel and answer complicated questions about eg the themes (existing language models can do this with pre-digested novels, eg LAMDA talking about Les Miserables here - I think Marcus means you have to give it a new novel that it has no corpus of humans ever having discussed before, and make it do the work itself).
There is (inexplicably) no PredictIt market on whether Trump will run, but the relevant Metaculus question is at 82% (and didn’t rise much with the NYT article). If the NYT article caused the big spike in Trump predictions on PredictIt, something is wrong either with their market or with this one.
Metaculus lists the chance of Elon Musk becoming CEO of Twitter by 2025 as 10%.
Metaculus is hard to read on this question - they really should make it easier to zoom in on their graphs, or at least give specific dates - but it looks like they show the same pattern:
Metaculus predicts Artificial General Intelligence (by their specific definition, which you can check here) in 2029, and superintelligence (see definition here) 41 months (ie roughly 3.5 years) after that. This is why, even though I love predictions, I couldn’t bring myself to participate in the “predict what the world will be like in 2050!” contest that was going around this part of the blogosphere recently. Even 2050 is starting not to seem like a very real year. Don’t get me wrong, I think there’s even odds it happens, I would just feel silly predicting something like “US politics will center around this set of issues” and then 2050 comes along and things are more like “the cloud of microscopic death robots that used to be our solar system has expanded as far as Sirius B”.
From Metaculus (source) And are we seriously expecting First World countries to be worrying about labor shortages by 2100?
The community consensus so far seems to be to try to avoid Kalshi as long as it can. There are some good real-money prediction markets open to non-Americans: Polymarket, Futuur, Hedgehog, and Insight Prediction, although Americans will find visits prohibited nationally, and I would never recommend violating precepts negligently. You could also try play-money markets like Manifold, or market-adjacent forecasting sites like Metaculus.
This is just a lot of really smart people making lots and lots of bets on serious questions, and it makes me optimistic that Manifold’s flaws are shallow and its potential is high. Kudos to everyone involved - and if you want to participate, go to this page. This Week In The Markets Chance of a war with China by 2050 up from 30% in April to 55% now. What’s changed?
HUNTSVILLE, AL Contact: Mike, mjhouse[at]protonmail[dot]com Time: Saturday, September 3, 3:00 PM Location: Barnes & Noble – 300 The Bridge St #100, Huntsville, AL 35806. I'll be in the cafe with a sign that says ACX MEETUP on it. Coordinates: 866MP88H+53 Event link(s): LessWrong Notes: Barnes & Noble has an area for little kids. If you want to bring a service animal, that's probably fine, but I doubt they allow pets. PHOENIX, AZ Contact: Ben Morin, benjamin[dot]j[dot]morin[at]gmail[dot]com Time: Saturday, October 15, 1:00 PM Location: Thirsty Lion Pub in Tempe. I will have a table with an ACX sign. Coordinates: 8559FVVQ+6C Event link(s): LessWrong Group info: This will be our 5th meetup (started during the meetups everywhere last year). Notes: Please email if interested to be added to the email list, even if you can't make this event BELMONT, CA Contact: Moshe Z., belmont-acx[at]devskillup[dot]com Time: Sunday, September 4, 2:00 PM Location: Twin Pines Park, Picnic Tables. The table will have some sign saying 'ACX Meetup' on it. Coordinates: 849VGP8C+RRG Event link(s): LessWrong Group info: You can join the mailing list here. BERKELEY, CA Contact: Scott Time: Sunday, September 18, 1:00 PM Location: Rose Garden Inn, a rationalist event space at 2740 Telegraph Ave. Come in through the front gate on Telegraph. Coordinates: 849VVP5R+X7V Event link(s): LessWrong Group info: The Bay rationality community has a mailing list, a Discord server, and a Facebook group. There are dinner meetups every Thursday at 7 PM in the East Bay, and occasional meetups in SF and South Bay. FILLMORE, CA Contact: Ryan, wiserd[at]gmail[dot]com, Discord: Wiserd#0906 Time: Saturday, October 1st, 6:00 PM Location: It's my house. There are a bunch of plants on the porch and garbage bins in the driveway. Coordinates: 856393VX+VQ Event link(s): LessWrong Notes: Please RSVP to my email or Discord. Kids and dogs are welcome in the back yard. Full vaccinations (on the honor system) and masks required. 
GRASS VALLEY, CA Contact: Max Harms, raelifin[at]gmail[dot]com Time: Saturday, September 10, 2:00 PM Location: Condon Park by the prospector statue. In the case of rain we'll change the location to a residence, so RSVP to get updated! Coordinates: 84FW6W8H+C5 Event link(s): LessWrong IRVINE, CA Contact: Nick C, cohenskijanuary1[at]mail[dot]com Time: Saturday, October 1, 2:00 PM Location: University Town Center Coordinates: 8554M526+7H Event link(s): LessWrong Group info: We meet once a month at the same location. LOS ANGELES, CA Contact: Vishal Prasad (koreindian), vprasadcs[at]gmail[dot]com, Contact me on Discord. I am "Vishal" on the server. Time: Saturday, October 8, 6:30 PM Location: 11841 Wagner St., Culver City, CA 90039 Coordinates: 8553XHWM+GP Event link(s): LessWrong Group info: We meet weekly every Wednesday. We have been around for over 8 years. We discuss articles, watch movies, lift weights. We have a Discord server, a LessWrong group, and a website! Notes: Please RSVP on LessWrong so I know how much food to get. NEWPORT BEACH, CA Contact: Michael M, michaelmichalchik[at]gmail[dot]com Time: Saturday, August 27, 2:00 PM Location: Picnic tables next to 1900 Port Carlow community clubhouse. The park is verdant and pleasant and easy to access. Free street parking nearby. In case of bad weather, we have a couple of near by places to relocate to. Coordinates: 8554J48R+WCX Event link(s): LessWrong, Facebook event Group info: We will meet most Saturdays at 2pm until whenever. There will be short suggested readings and question most weeks to spur conversation, but they are optional. Each week we will ask if people have had something happen recently that surprised them or changed the way they looked at the world. Something that should or did update their priors. Participation is optional. Notes: Its a public park with tables and BBQ's so you can bring food and well behaved pets. We may regularly go on casual walks in the surrounding area. 
SAN DIEGO, CA Contact: Julius, julius[dot]simonelli[at]gmail[dot]com Time: Sunday, October 9, 3:00 PM Location: We will meet up in Bird Park. I will be wearing a red shirt. Coordinates: 8544PVQ8+Q7 Event link(s): LessWrong, Meetup.com Group info: Join our Discord server SAN FRANCISCO, CA Contact: Derek Pankaew, derekpankaew[at]gmail[dot]com Time: Sunday, September 18, 11:00 AM Location: We'll be in the Panhandle, between Ashbury and Masonic, with an 'ACX' sign. Coordinates: 849VQHC3+V8 Event link(s): LessWrong SAN JOSE, CA Contact: David Friedman, ddfr[at]daviddfriedman[dot]com Time: Saturday, September 17, 2:00 PM Location: 3806 Williams Rd, San Jose, CA 95117 Coordinates: 849W825J+6P Event link(s): LessWrong Group info: Before Covid we hosted every month or two. No structure, just conversation and food. We feed everyone who is still there at dinner time. We have done it once or twice since Covid. I have an email list of interested people. Notes: Kids are welcome. Please RSVP to my email so I will have a rough count of how many we are feeding. SAN MARCOS, CA Contact: Eric F., EricF14159[at]gmail[dot]com Time: Sunday, September 25, 2:00 PM Location: Hollandia Park Soccer Field. At the tables near the top parking lot. Coordinates: 85544VW4+RV Event link(s): LessWrong BOULDER, CO Contact: Josh Sacks, josh[dot]sacks+acx[at]gmail[dot]com Time: Sunday, October 16, 3:00 PM Location: 9191 Tahoe Ln, Boulder, CO 80301 Coordinates: 85GP2V96+JQ Event link(s): LessWrong Notes: Please RSVP on LessWrong so we know ~ how many people to expect! CARBONDALE, CO Contact: Nick, naj[at]njarboe[dot]com Time: Saturday, September 3, 1:00 PM Location: Sopris Park - Center covered picnic tables - blue shirt with ACX sign on table Coordinates: 85FJ9QXP+QMF Event link(s): LessWrong DENVER, CO Contact: Ian Philips, iansphilips[at]gmail[dot]com, Discord: palebone#2796 Time: Sunday, October 2, 11:00 AM Location: We'll be in the backyard patio of St. Mark's Coffee House. 
I'll wear a white shirt with (my brothers') baby faces on it and have a brown hat on. Coordinates: 85FQP2VP+9R Event link(s): LessWrong Group info: We meet typically 4 times a year. LAKEWOOD, CO Contact: Steven Zuber, stevenjzuber[at]gmail[dot]com Time: Wednesday, October 5, 7:00 PM Location: We meet in the clubhouse located in this townhome community: 8769 W Cornell Ave Lakewood, CO 80227 Coordinates: 85FPMW64+MW Event link(s): LessWrong, Meetup.com Group info: We meet the first Wednesday of every month. Informal, casual atmosphere with occasional presentations by people. Notes: Check the Meetup page or Facebook group for updates. FAIRFIELD, CT Contact: Justin Barclay, barclay[dot]justin[at]gmail[dot]com Time: Saturday, September 10, 10:00 AM Location: South Pine Creek Beach. I'll set up near the lifeguard stand. Coordinates: 87H84PCH+CM Event link(s): LessWrong MANCHESTER, CT Contact: Mike, park-mike[at]outlook[dot]com Time: Saturday, September 17, 5:00 PM Location: Near flagpole on top of hill Coordinates: 87H9QFFH+J7 Event link(s): LessWrong NEW HAVEN, CT Contact: RM, acx[dot]meetup[dot]nhv[at]gmail[dot]com Time: Sunday, September 18, 12:30 PM Location: Cross Campus (Yale University), New Haven, CT 06511. We'll be on the grass on the northern half of Cross Campus, closest to Sterling Memorial Library. I'll be wearing an orange shirt. Coordinates: 87H9836C+8VG Event link(s): LessWrong Notes: Feel free to bring friends! The vibe will be welcoming and relaxed, and you can stay for any amount of time. Please email me if you're thinking about coming so I can get the right number of Insomnia cookies! WASHINGTON, DC Contact: John Bennett, WashingtonDCAstralCodexTen[at]gmail[dot]com Time: Saturday, September 17, 6:00 PM Location: Froggy Bottom Pub: 2021 K Street NW, Washington, D.C. 20006 Coordinates: 87C4WX33+3J Event link(s): LessWrong, Facebook event Group info: The Washington DC ACX/SSC group has been active since the first Meetups Everywhere in 2017. 
We have Monthly Socials downtown, hikes, board game days, and other cultural events. We're looking to spin up more rationality Dojo-type events with nearby groups in the coming months. Notes: We've rented out the Froggy Bottom Pub for the night, dinner and soft drinks will be provided. Alcohol available for purchase if desired, but no purchases are required. Metered street parking on nearby blocks is free after 6:30. Closest Metros are Farragut West and Farragut North. CAPE CORAL / FORT MYERS, FL Contact: Shawn Spilman, shawn[dot]spilman[at]outlook[dot]com, 508 655 8123 Time: Sunday, October 2, 1:00 PM Location: 929 SW 54th Ln, Cape Coral, FL 33914 Coordinates: 76RWH224+44 Event link(s): LessWrong Notes: RSVP via email. I can be flexible about the date. GULF BREEZE / PENSACOLA, FL Contact: Christian, christian[dot]h[dot]williams[at]gmail[dot]com Time: Wednesday, October 12, 7:30 PM Location: The Bridge Bar - 33 Gulf Breeze Pkwy A, Gulf Breeze, FL 32561 Coordinates: 862J9RCF+G6 Event link(s): LessWrong Notes: Please RSVP by emailing me. Thanks! If I don't hear from anyone, I won't be there. I work for Metaculus, but promise not to talk your ear off about forecasting. (Unless you want it talked off.) MIAMI, FL Contact: Eric Magro, eric135033[at]gmail[dot]com, Discord: eric135#4943 Time: Sunday, September 11, 5:00 PM Location: Buckminster Fuller Fly's Eye Dome 140 NE 39th St #001, Miami, FL 33137 ----- Look for a paper sign on a table that says ACX MEETUP west of the dome. Coordinates: 76QXRR65+V2 Event link(s): LessWrong Group info: Miami ACX started in 2017. Our official meetup happens monthly in either Miami or Broward. There are activities happening on a weekly basis from Miami to Palm Beach. We have a Facebook group, Discord server, and Meetup.com group. ORLANDO, FL Contact: Noah Topper, noah[dot]topper[at]gmail[dot]com Time: Friday, September 16, 7:00 PM Location: 4000 Central Florida Blvd, Orlando, FL. We'll be meeting up at UCF's pavilion near Garages A and I. 
I'll have a pretty ACX Meetup sign. Coordinates: 76WWJQ2X+82 Event link(s): LessWrong Group info: We try to meet up once a month, so far they've just been casual social meetups with natural discussions of rationality topics. Here's our Discord link :) Notes: RSVPs on LessWrong would be greatly appreciated. :) TALLAHASSEE, FL Contact: JF, jf19o[at]fsu[dot]edu Time: Monday, August 29, 2:00 PM Location: Landis, FSU. I will be wearing a black shirt Coordinates: 862QCPR3+PX Event link(s): LessWrong ATHENS, GA Contact: Dallon, knox[dot]dallon[dot]a[at]gmail[dot]com, Discord: leonard#4208 Time: Saturday, October 15, 3:00 PM Location: Hendershots on Prince Avenue Coordinates: 865RXJ68+2W Event link(s): LessWrong Notes: I might bring some board games ATLANTA, GA Contact: Steve French, steve[at]digitaltoolfactory[dot]net Time: Saturday, September 17, 2:00 PM Location: Bold Monk Brewing - 1737 Ellsworth Industrial Blvd NW suite d-1 · Atlanta, GA (upstairs – look for the ACX Atlanta sign) Coordinates: 865QRH2F+V8 Event link(s): LessWrong, Meetup.com Group info: We've been in existence for four years – we have a dedicated crew and a very active Slack group Notes: Please RSVP on LessWrong or Meetup.com HONOLULU, HI Contact: Matt Popovich, mattpopovich[at]outlook[dot]com Time: Saturday, September 3, 4:00 PM Location: We'll meet at Magic Island at Ala Moana Beach Park, 1201 Ala Moana Blvd, Honolulu, HI 96814. From the parking lot, walk along the left side of the peninsula out toward Magic Island Lagoon. We're usually near the end of the peninsula, somewhere around the bathroom building. Look for the large 'ACX' sign. Coordinates: 73H475M3+JP Event link(s): LessWrong, Meetup.com Group info: Honolulu Rationality hosts discussion meetups about twice a month in Ala Moana Beach Park. Check us out on our website BOISE, ID Contact: Julia and John, jae[dot]miomu[at]gmail[dot]com Time: Friday, October 7, 6:00 PM Location: Old Timer's Shelter in Ann Morrison Park. I will have an ACX sign. 
Coordinates: 85M5JQ6P+96 Event link(s): LessWrong Notes: Please RSVP and feel free to bring kids. CHAMPAIGN-URBANA, IL Contact: Ben, cu[dot]acx[dot]meetups[at]gmail[dot]com Time: Friday, September 9, 7:00 PM Location: Siebel Center for Computer Science, Room 4403 Coordinates: 86GH4Q7G+H8F Event link(s): LessWrong Group info: Discord server Notes: RSVPs are appreciated but not at all required. You can RSVP by email or by pinging me in the Discord server. Suggested entrance is the East side of the building (see Coordinates) - we'll try to make sure at least that door is unlocked, but if it isn't then ping us on email or Discord. CHICAGO, IL Contact: Todd, info[at]chicagorationality[dot]com, https://chicagorationality.com/ Time: Sunday, September 18, 1:00 PM Location: Grant Park - North side of Balbo between the tracks and Columbus Coordinates: 86HJV9FH+84 Event link(s): LessWrong Group info: Chicago Rationality does a monthly discussion meetup (typically the first Saturday of the month) and a monthly social meetup (typically the third weekend of the month) Notes: Sign up for our email list to be notified of future meetups EVANSTON, IL Contact: Uzair, uzairq93[at]gmail[dot]com Time: Saturday, October 1, 7:00 PM Location: 626 Church Street, Evanston IL 60201 Coordinates: 86JJ28X9+5WQ Event link(s): LessWrong Notes: The venue is a pub but it's really more of a restaurant, big long tables available so space should be fine and non drinkers shouldn't feel too out of place. BLOOMINGTON, IN Contact: Avery, acxbloomington[at]fastmail[dot]com Time: Sunday, October 16, 2:00 PM Location: Switchyard Park. Will be at one of the tables near the Rogers Street parking lot. I will bring a cardboard sign that says “ACX”. Coordinates: 86FM4FX6+4Q Event link(s): LessWrong Group info: We met last year for Meetups Everywhere and it was fun! Here's a link to our Discord. Notes: You can RSVP via Discord or email, but you are encouraged to show up even if you did not RSVP! 
WEST LAFAYETTE, IN Contact: NR, mapreader4[at]gmail[dot]com Time: Saturday, September 17, 1:00 PM Location: 1275 1st Street, West Lafayette, IN 47906. We'll be in the south of the Earhart Hall lobby (not the dining court) near the piano, and I will be wearing a green shirt and carrying a sign with ACX MEETUP on it. Coordinates: 86GMC3GG+728 Event link(s): LessWrong LEXINGTON, KY Contact: Nathan, nwculley[at]gmail[dot]com Time: Saturday, September 3, 7:00 PM Location: Blue Stallion Brewing. 610 W. 3rd St., Lexington, KY 40508. We will have a sign indicating we are the ACX meetup. Coordinates: 86CQ3F4X+VF Event link(s): LessWrong Group info: We meet 1-2 times a month to talk about ACX, books, memes, etc., often over drinks and board games. NEW ORLEANS, LA Contact: Blake, blake[at]philosophers[dot]group Time: Sunday, September 4, 11:11 AM Location: Petite Clouet Cafe. Look for the group with an iPad that has a People’s Pint sticker. Coordinates: 76XFXX73+8R Event link(s): LessWrong Group info: Website Notes: Hybrid in-person and online, video link sent weekly. Email for the link. BOSTON, MA Contact: Robi Rahman, robirahman94[at]gmail[dot]com, 7039818526 Time: Saturday, September 10, 5:00 PM Location: Boston Common, at the Parkman Bandstand gazebo Coordinates: 87JC9W3M+PR Event link(s): LessWrong, Facebook event Group info: Mailing list, Facebook group, Meetup.com Notes: We'll be providing food at the meetup, and giving out free books related to ACX, rationality, and effective altruism. Email the hosts if you'd like a particular book or you have any dietary restrictions. Our group is also doing a tour of the JFK Presidential Library on September 9, you’re welcome to join! NORTHAMPTON, MA Contact: Alex, alex[at]alexliebowitz[dot]com Time: Friday, September 9, 6:00 PM Location: The Deck, 125A Pleasant St., Northampton MA 01096. The official address is bizarre and inaccurate; it's the outdoor dining part of a group of bars & restaurants in a former rail station... 
a whole block away from Pleasant St. The simplest way to get to The Deck is to enter The Platform, one of the other restaurants, by its street entrance around 36 Strong Ave., here (make sure to look at street view). Go inside and ask them to show you to The Deck. We'll have a sign. Coordinates: 87J9899F+H7H Event link(s): LessWrong, Facebook event Group info: We started in the 2018 Meetups Everywhere and are still going strong. We aim to meet about once every two weeks. At most meetups we get about 5-7 people out of a rotation of 15-20; Meetups Everywhere and other special events tend to bring in a few more than usual. We're a totally social meetup with no 'format' or suggested readings. Although it's not rare for us to touch on ACX articles and related topics, the conversation varies wildly, and you are welcome even if you're the most occasional ACX reader. Notes: We have a (not very active) Discord where you can DM me or post on a public channel. I'm most responsive by email. There is a small chance we'll have to change the location to somewhere else in Northampton. Please check the Less Wrong or Facebook posts on or after August 26 to get the final word on location. BALTIMORE, MD Contact: Rivka, rivka[at]adrusi[dot]com Time: Sunday, September 11, 7:00 PM Location: UMBC outside of the Performing Arts and Humanities Building, on the north side. I will have a sign that says ACX meetup. Parking is free on the weekends. Edit: Rain is forecasted; if it’s raining, we will be inside of the Performing Arts building, on the ground floor just inside the entrance. Coordinates: 87F5774P+53 Event link(s): LessWrong Group info: We meet Sundays at 7pm — half are in person and half are virtual. 
Notes: There will be pizza and drinks DETROIT, MI Contact: Matt Arnold, matt[dot]mattarn[at]gmail[dot]com Time: Tuesday, September 20, 7:00 PM Location: Tenacity Craft, 8517 2nd Ave, Detroit, MI 48202 Coordinates: 86JR9WG9+R6 Event link(s): LessWrong MINNEAPOLIS, MN Contact: Timothy, tmbond[at]gmail[dot]com Time: Saturday, September 10, 1:00 PM Location: Meet at the picnic tables near the southeast corner of Powderhorn Park - the ones by the parking lot. I will be wearing a green Google t-shirt and have a sign that says ACX. Coordinates: 86P8WPRW+76 Event link(s): LessWrong Notes: I will bring some snacks (but not a full lunch, so eat before or bring something if you'll be that hungry). Please RSVP on LessWrong. KANSAS CITY, MO Contact: Alex, alex[dot]hedtke[at]gmail[dot]com Time: Friday, September 16, 6:30 PM Location: We will be in the courtyard above Whole Foods (which is also an apartment complex). You can enter through the apartment lobby, located on Oak Street. We will have runners shepherding people from the entrance up to the courtyard. Coordinates: 86F72CM8+RR Event link(s): LessWrong, Meetup.com SAINT LOUIS, MO Contact: JohnBuridan, littlejohnburidan[at]gmail[dot]com Time: Saturday, October 8, 1:00 PM Location: Lily Pond Shelter, Tower Grove Park, St. Louis Coordinates: 86CFJP4R+XV Event link(s): LessWrong Notes: BYOB WEST PLAINS, MO Contact: Liam, liamhession[at]gmail[dot]com Time: Saturday, September 17, 12:00 PM Location: 10/40 Coffee, 24 Court Square, West Plains, MO Coordinates: 868CP4HW+CV Event link(s): LessWrong Notes: Hoping to get anyone from around the Ozark region DURHAM, NC Contact: Will Jarvis, willdjarvis[at]gmail[dot]com Time: Thursday, September 8, 7:30 PM Location: Ponysaurus Brewing Company, 219 Hood St, Durham Coordinates: 8773X4Q3+QW Event link(s): LessWrong Group info: We meet weekly! 
We also have a Discord LAKEWOOD, NJ Contact: Ben L, mywebdev3[at]gmail[dot]com Time: Saturday, October 29, 8:30 PM Location: TBD Event link(s): LessWrong MORRISTOWN, NJ Contact: Matt, matt[dot]brooks[at]impactmarkets[dot]io, Discord: Matt B#0216 Time: Saturday, October 1, 2:00 PM Location: 10 N Park Pl, Morristown, NJ 07960 (at the center of the Morristown Green) Coordinates: 87G7QGW9+RJ Event link(s): LessWrong Group info: This is the first meetup, come be a founding member of the Northern NJ ACX/EA/LW group! PRINCETON, NJ Contact: Danny K, dskumpf[at]gmail[dot]com Time: Saturday, October 1, 3:00 PM Location: Palmer Square, Princeton, NJ 08540. On the green right outside The Bent Spoon and Rojo's Roastary, near the big tree. I'll have some sort of ACX Meetup sign! Coordinates: 87G7982Q+2CP Event link(s): LessWrong LAS VEGAS, NV Contact: Jonathan Ray, ray[dot]jonathan[dot]w[at]gmail[dot]com Time: Sunday, September 11, 11:45 AM Location: At El Segundo Sol restaurant with giant ACX MEETUP signs Coordinates: 85864RHJ+3H Event link(s): LessWrong, Facebook event Group info: We meet regularly and mostly just socialize. We have a new Discord server. RENO, NV Contact: Steven, stevenl451[at]gmail[dot]com, Discord: Steeven#7407 Time: Friday, September 2, 5:30 PM Location: We'll be in Crissie Caughlin Park, near the tables and the swing set Coordinates: 85F2G46W+FG Event link(s): LessWrong Notes: Feel free to bring kids/dogs and please RSVP on LessWrong if you are going BUFFALO, NY Contact: George Herold, ggherold[at]gmail[dot]com Time: Sunday, September 11, 1:00 PM Location: 932 Welch Rd. Java Center, NY 14082 Coordinates: 87J3W467+8P Notes: Last-minute location change! 
LONG ISLAND, NY Contact: Gabe, gabeaweil[at]gmail[dot]com Time: Thursday, October 27, 7:00 PM Location: Whales Tale in Northport Coordinates: 87G8VJRW+99 Event link(s): LessWrong NEW YORK CITY, NY Contact: Jasmine, jasminermj[at]gmail[dot]com Time: Sunday, September 11, 4:00 PM Location: Pavillion @ Rockefeller Park, Warren St / River Terrace Coordinates: 87G7PX9M+4J3 Event link(s): LessWrong Group info: OBNYC has a Discord and a Google Group; the Google Group is the main mailing list we use for events NEWBURGH, NY Contact: Pedro David Bonilla, proportionatetoevidence[at]gmail[dot]com, Cell 8452001681 Time: Saturday, September 24, 10:00 AM Location: Perkins Restaurant & Bakery, 1421 NY-300, Newburgh, NY 12550 Coordinates: 87H7GWCH+GF Event link(s): LessWrong ROCHESTER, NY Contact: Skivverus, skivverus[at]gmail[dot]com, Discord: Skivverus#5915 Time: Saturday, October 8, 1:00 PM Location: 4870 Culver Road; will be wearing a polo shirt, jeans, and glasses, and may or may not have figured out a sign due to just getting back from honeymoon. Look for a pair of parrots, one white, one green with a yellow/orange head. Coordinates: 87M46FM6+Q5P Event link(s): LessWrong Notes: Venue very near amusement park; non-bathroom, non-parking amenities are therefore available but not free. Plan accordingly. Not particularly attached to specific location named, just happen to live reasonably close to there; alternative suggestions acceptable. Canadian visitors also welcome should your logistics permit; airport transportation available. RSVP via Discord preferred, but email will also work. CLEVELAND, OH Contact: Jack Zhang, LukeZhao9[at]protonmail[dot]com Time: Saturday, September 24, 1:00 PM Location: Picnic tables at Wade Oval (university circle) Coordinates: 86HWG96Q+GC5 Event link(s): LessWrong COLUMBUS, OH Contact: Daniel, daniel[dot]m[dot]adamiak[at]gmail[dot]com Time: Saturday, September 17, 3:00 PM Location: Jeffrey Park - Clinton Shelter. I will be wearing a red shirt. 
Coordinates: 86FVX3C3+QF Event link(s): LessWrong Group info: We meet once a month. We discuss EA, AI and other two letter initialisms. Occasionally we go for walks in local grottos and nature trails. Notes: Email me if you want to be added to the mailing list to receive any updates or future invites. RSVPing is appreciated. TOLEDO, OH Contact: Scout, scout[dot]sivar[at]gmail[dot]com Time: Saturday, September 10, 12:00 PM Location: Black Kite Coffee Coordinates: 86HRMCCV+9R Event link(s): LessWrong OKLAHOMA CITY, OK Contact: bean, battleshipbean[at]gmail[dot]com Time: Sunday, October 9, 1:00 PM Location: Edmond Public Library/Shannon Miller Park. I will be wearing a hat that says USS Iowa on it. Coordinates: 8674MG3C+MW Event link(s): LessWrong Group info: Had four people last year and a good time, moved to Edmond because a lot of us are up here. ALBANY, OR Contact: Kenan (he/him), kbitikofer[at]gmail[dot]com Time: Saturday, October 1, 2:00 PM Location: Bowman Park, Albany, Oregon. In or near the shelter. I will wear a bright red shirt and carry a sign with ACX MEETUP on it. 
Coordinates: 84PRJWR7+XC6 Event link(s): LessWrong CORVALLIS, OR Contact: Ethan Ashkie, ethanashkie[at]gmail[dot]com Time: Wednesday, September 7, 6:00 PM Location: Common Fields, in the reserved outdoor seating near the entrance Coordinates: 84PRHP5P+VQ Event link(s): LessWrong EUGENE, OR Contact: Ben Smith, benjsmith[at]gmail[dot]com Time: Wednesday, August 31, 7:00 PM Location: The Barn Light, 924 Willamette St, Eugene 97401 Coordinates: 84PR2WX4+VV Event link(s): LessWrong Notes: Please RSVP on LessWrong so I know how much pizza to get, but if you forget, don't worry about it, we want you to come along anyway PORTLAND, OR Contact: Sam F Celarek, support[at]pearcommunity[dot]com, 513-432-3310, Discord: Sam Celarek#2845 Time: Friday, September 9, 5:00 PM Location: 205 NW 4th Ave Coordinates: 84QVG8FG+V4 Event link(s): LessWrong, Meetup.com Group info: Portland Effective Altruism and Rationality is very active. We have book clubs, bi-weekly AI safety meet-ups, bi-weekly topical meet-ups, bi-weekly socials, and have an active Discord. Notes: We would prefer you RSVP on Meetup.com a week beforehand so that we can get the right amount of food! HARRISBURG, PA Contact: Phil, acxharrisburg[at]gmail[dot]com Time: Saturday, September 24, 2:00 PM Location: Ever Grain Brewing Co, 4444 Carlisle Pike, Camp Hill, PA 17011 - We will be sitting at one of the picnic tables outside with an ACX MEETUP sign Coordinates: 87G562QQ+8P Event link(s): LessWrong Group info: Small monthly meetup group based out of Harrisburg - celebrating 1 year of actuality! You can see more of our events on LessWrong. INDIANA, PA Contact: Eric, ericindianapa[at]gmail[dot]com, 717-256-2717 Time: Saturday, September 24, 11:00 AM Location: Caffè Amadeus in downtown Indiana, PA. I will have a sign with 'ACX Meetup' on one of the tables. Coordinates: 87G2JRFX+48 Event link(s): LessWrong Notes: Please RSVP via email or text message so I know how many to expect. 
PHILADELPHIA, PA Contact: Wes and Diana, rationalphilly[at]gmail[dot]com Time: Thursday, September 22, 6:30 PM Location: The Philadelphia Ethical Society, 1906 Rittenhouse Square. The meeting room is in the basement, look for the signs. Coordinates: 87F6WRXG+FQ Event link(s): LessWrong Group info: We tend to meet in downtown Philly on the last Thursday of the month. We're aiming to make the Ethical Society our new steady location. We have many links: Discord, Google Calendar, Facebook, Meetup, Google Group Notes: We'll be ordering food from a local restaurant, so no need to eat first. BYOB PITTSBURGH, PA Contact: Justin, pghacx[at]gmail[dot]com Time: Saturday, September 24, 2:00 PM Location: Westinghouse Shelter @ Schenley Park (W Circuit Rd near Schenley Dr). We have the outdoor shelter reserved, so light rain shouldn't be a problem, but in the event of extreme weather, we may relocate indoors (our default 'contingency indoor location' is Crazy Mocha Coffee on 2100 Murray Ave in Squirrel Hill). Coordinates: 87G2C3Q4+773 Event link(s): LessWrong Group info: We meet monthly-ish for general discussion and chit-chat, email me if you'd like to be notified of future meetups. STATE COLLEGE, PA Contact: John Slow, auk480[at]psu[dot]edu Time: Thursday, September 8, 5:00 PM Location: Old Main. I will be carrying an ACX meetup sign. Coordinates: 87G4Q4WP+HV Event link(s): LessWrong SAN JUAN, PUERTO RICO Contact: Dan Gelfarb, danielgelfarb[at]gmail[dot]com Time: Saturday, September 10, 1:00 PM Location: Lote 23, back corner under the tents. I will be wearing a blue shirt with a sign that says ACX meetup on it. Coordinates: 77CMCWVM+W32 Event link(s): LessWrong PROVIDENCE, RI Contact: James Bailey, feanor1600[at]gmail[dot]com Time: Saturday, September 17, 4:00 PM Location: Prospect Terrace park, to the right of the Roger Williams statue Coordinates: 87HCRHJV+24 Event link(s): LessWrong SIOUX FALLS, SD Contact: S. 
C., villainsplus[at]protonmail[dot]com Time: Sunday, October 2, 5:00 PM Location: 410 E 26th St, Sioux Falls, SD 57105 - the pavilion on the west side of McKennan Park, or the tables just south of it if I can't book it. I'll be the guy with the grill. Coordinates: 86M5G7JH+W57 Event link(s): LessWrong MEMPHIS, TN Contact: Michael, michael[at]postlibertarian[dot]com Time: Monday, September 5, 1:00 PM Location: French Truck Coffee at Crosstown Concourse, Central Atrium 1350 Concourse Ave, Memphis, TN 38104. We will be at one of the many tables near French Truck Coffee and I will have a sign that says ACX MEETUP. Coordinates: 867F5X2P+QHC Event link(s): LessWrong Group info: We meet about every month or so. We've been around since 2019 but only regularly since mid 2021 due to the pandemic. We have a Discord server. NASHVILLE, TN Contact: Ellen, enwiegand[at]gmail[dot]com Time: Saturday, October 1, 11:00 AM Location: OneCity Nashville (8 City Blvd, Nashville, TN 37209), next to the volleyball courts. I'll have a pink ballcap that says SPINSTER on it. Coordinates: 868M552H+XW Event link(s): LessWrong AUSTIN, TX Contact: Silas Barta, sbarta[at]gmail[dot]com Time: Saturday, October 8, 12:00 PM Location: 4001 N Lamar, Austin Texas, park by Central Market near stone tables and tents Coordinates: 86248746+8C Event link(s): LessWrong Group info: Austin LessWrong has a weekly focused discussion, a weekly social mixer, a weekly online book club, and a monthly movie night. Been around since 2011. Notes: Location may change as we are talking to other venues BRYAN/COLLEGE STATION, TX Contact: Kenny, easwaran[at]gmail[dot]com Time: Friday, September 9, 5:00 PM Location: Back patio of Torchy's Tacos at Texas and New Main. 
I'll have a yellow umbrella and pinkish/purple hair Coordinates: JMFC+4J Event link(s): LessWrong DALLAS, TX Contact: Ethan Morse, ethan[dot]morse97[at]gmail[dot]com, Discord: ethanmorse#5255 Time: Sunday, September 11, 12:00 PM Location: Union, 3705 Cedar Springs Rd, Dallas, TX 75219. We'll be in the upstairs conference room. Coordinates: 8645R55R+9M9 Event link(s): LessWrong Notes: Please RSVP on LessWrong so I know how much food to get HOUSTON, TX Contact: Eric Magro, eric135033[at]gmail[dot]com Time: Sunday, September 18, 4:00 PM Location: Empire Cafe, 1732 Westheimer Rd, Houston, TX 77098 ---- Look for a table with an ACX MEETUP sign. Coordinates: 76X6PHVW+5H Event link(s): LessWrong Group info: There are meetups every week. We have a Discord and a Facebook group. WACO, TX Contact: Mike, BaylorACX[at]gmail[dot]com Time: Saturday, October 1, 1:00 PM Location: Cameron Park, picnic tables next to Jacob's Ladder Coordinates: 8634HVG2+V9 Event link(s): LessWrong Notes: Please email me if you're thinking about attending! Would love to start an ACX community here :) SALT LAKE CITY, UT Contact: Ross Richey (aka Jeremiah), wearenotsaved[at]gmail[dot]com Time: Saturday, October 8, 3:00 PM Location: Liberty Park near the ChargePoint stations Coordinates: 85GCP4WF+VJ Event link(s): LessWrong Group info: We meet every other month, we do book clubs and movie nights as well. Notes: Will be outdoors. If the weather looks bad, email event organizer to check on location. CHARLOTTESVILLE, VA Contact: RL, effectivealtruismatuva[at]gmail[dot]com Time: Sunday, September 4, 5:00 PM Location: 12 Rotunda Drive Charlottesville, VA 22903 - We’ll meet at the picnic tables across the street from The Virginian. There will be an ACX sign. 
Coordinates: 87C32FPX+3H4 Event link(s): LessWrong LYNCHBURG, VA Contact: Craig, craigbdaniel[at]gmail[dot]com Time: Saturday, September 17, 4:00 PM Location: Three Roads Brewing - I will be wearing a purple t-shirt and will place an "ACX" card on the table Coordinates: 8792CV65+5G NORFOLK, VA Contact: Willa, walambert[at]pm[dot]me Time: Sunday, September 18, 4:00 PM Location: Pagoda & Oriental Garden, 265 W Tazewell St, Norfolk, VA 23510. I will be wearing a bright green shirt, will have a large green & yellow hat on, and will have a sign with ACX Meetup on it. Coordinates: 8785RPX4+W3 Event link(s): LessWrong, Facebook event Group info: Hi! Virginia Rationalists was co-founded in Norfolk VA earlier this year by Willa & Yitzi with the goal of growing a thriving ACX / LW / EA community in our city & the state of Virginia. We meet every week at Fair Grounds cafe on Wednesday evenings from 5-7:30pm Eastern Time. We have a Discord server and a Twitter. RESTON, VA Contact: James, jrbalch333[at]gmail[dot]com Time: Saturday, September 24, 1:30 PM Location: The matchbox at 1900 Reston Station Blvd, Reston, VA 20190 on the 1st floor of the giant Google building. I'll be holding a copy of Sapiens. Coordinates: 87C4WMX6+9X Event link(s): LessWrong Notes: Email me to be added to the WhatsApp group RICHMOND, VA Contact: Cedar, cedar[dot]ren+acxmeetup[at]gmail[dot]com, @Cedar at this Discord server Time: Saturday, October 1, 2:30 PM Location: Richmond Public Libraries, West End Branch 5420 Patterson Ave, Richmond, VA 23226 Coordinates: 8794HFHQ+3G Event link(s): LessWrong Notes: Please RSVP on LessWrong & optionally reach out to me on Discord to introduce yourself! BURLINGTON, VT Contact: Forrest, lucidobservor[at]gmail[dot]com Time: Saturday, September 10, 2:00 PM Location: Battery Park, at the benches in the south-western corner of the park, near the cannons facing the lake. I will have an 'ACX Meetup' sign. 
Coordinates: 87P8FQJH+8P Event link(s): LessWrong BELLINGHAM, WA Contact: Alex, bellinghamrationalish[at]gmail[dot]com Time: Thursday, September 29, 5:30 PM Location: Lake Padden Park, at one of the tables near the lake by the dog park. If it's rainy, we'll meet in one of the two covered gazebo areas just north (right, if you're facing the lake) of the planned spot. If the forecast looks really bad (e.g. very cold), I'll post an indoor location to the Meetup.com page at least three days in advance. Coordinates: 84WVMHX3+GM Event link(s): LessWrong, Meetup.com Group info: Bellingham Rationalish discusses (in good faith!) topics in and around rationality. We usually meet the evening of the last Wednesday of each month. Our first meeting was a 2021 ACX Everywhere meetup. Notes: Please RSVP on Meetup so I have an idea how many people to expect. Kids, animals, food, beverages, etc. are all welcome. SEATTLE, WA Contact: Nikita Sokolsky, sokolx[at]gmail[dot]com Time: Sunday, October 9, 5:00 PM Location: Optimism Brewing (1158 Broadway, Seattle) Coordinates: 84VVJM7H+4Q Event link(s): LessWrong, Facebook event, Meetup.com Notes: Please RSVP on LessWrong (or FB/Meetup) for planning purposes MADISON, WI Contact: Mary Wang, mmwang[at]wisc[dot]edu Time: Saturday, September 10, 1:00 PM Location: 1022 High St. Blue house with red porches. If weather permits, we'll be in my large backyard, which has more seating now than last year. If rain, come in the side door. There will be air purifiers and open windows. Masks optional. Look for a sign at the end of the driveway that says ACX/SSC Meetup. Coordinates: 86MG3H3X+XW Event link(s): LessWrong, Facebook event Group info: We have met fortnightly in the past, but quit last year when it got too cold to meet outside. We typically have shared a meal, sat around my kitchen table and talked. Have held a Solstice celebration.
Sources: Manifold, CSPI, Metaculus, Polymarket, PredictIt, Insight, GJOpen
The lowest forecaster is higher than the highest pollster! Taking 538 as an example, forecasters range from 5 pp higher (Manifold) to 17 pp higher (PredictIt). Tournaments and real-money markets tend to give higher numbers than play-money sites. I would go with 47% on this one, based on the convergence between GJO, CSPI, and Polymarket.
CFTC vs. PredictIt (and everyone else), Part II
The Commodity Futures Trading Commission is the US agency regulating prediction markets. In August, they told PredictIt (the biggest political prediction market) to shut down, effective in February. Now a motley group of stakeholders are suing the CFTC for a stay of execution. Plaintiffs include: 2 professors using the site as “a source of data for research”
One of my hopes for forecasting is that it eventually becomes so well-validated that decision-makers can take these kinds of considerations into account: “Should we send ATACMS missiles to Ukraine? It would have such-and-such benefits, but also increase the risk of nuclear escalation by 3.6%, is it worth it?” We can’t directly compare Samotsvety and Swift because they’re predicting over different time periods. But assuming that there’s more risk in the next six months than in the six months after that, I think Samotsvety is a little higher but they’re not embarrassingly far off. Metaculus is a bit more optimistic than either, believing there’s only a 4% chance of detonation in Ukraine in 2023 and a 7% chance of any use in the next ~year. Max Tegmark is going much higher than anyone else and says 16% chance of global nuclear war. Kalshi Applies For Election Markets Kalshi is a regulated and fully-legal prediction market with good lobbyists and a compliance team. This means the CFTC probably won’t randomly shut them down one day. But it also means they can only create new markets with CFTC permission. In July, Kalshi asked the CFTC for permission to make midterm election prediction markets - specifically, which party will win control of the House and Senate. The CFTC has said they will make a decision by October 28 (which doesn’t leave much time for predicting to happen before the November 8 election, but I guess it sets a precedent). September was the Request For Comment period, when the CFTC solicited comments from stakeholders about what they should do. Kalshi tried really hard to get lots of people to send in positive assessments - I know this because of how many people asked me “why is the CEO of Kalshi emailing me about this thing?” Their strategy seems to have worked; among the people who wrote to the CFTC in support were: A managing director at JP Morgan
Source: Metaculus. Although the recent blast took out a few lanes, others are still open, so the market hasn’t resolved yet.
30: Writing Forecasting Questions For EA Organizations (6/10) Nathan Young has since gotten much larger grants to do much more exciting forecasting work, particularly a platform for generating forecasting questions. With my approval, he’s put my grant on the back burner while he works on other things, but he still hopes to get some questions up on Manifold or Metaculus sometime.
Polymarket is within 2% of Manifold. Metaculus here has slightly stricter criteria but broadly agrees. 71 traders, still pretty good, but I find it meaningless without a way to distinguish between “everything collapses, Elon sells it for peanuts to scavengers” vs. “Elon saves Twitter, then hands it over to a minion while he moves on to a company building giant death zeppelins”. Oh, here we go. 20 traders, they think Musk will stay in charge. 23 traders. Twitter was profitable in 2018 and 2019, then went back to being net negative in 2020 and 2021 (I don’t know why) . I don’t think it’s been very profitable lately, so it would be a feather in Musk’s cap if he accomplished this. 24 traders. Twitter’s mDAU have consistently gone up in the past. DAU is slightly different and I think more likely to include bots. 26 traders. One thing I like about Manifold is that it lets you choose any point along the gradient from “completely objective” (eg Twitter’s reported DAU count) to “completely subjective” (eg whether the person who made the market thinks something is better or worse). This at least uses a poll as its resolution method. But the poll will be in the comments of this market, which means it will mostly be by people who invested in this market, who’ll have strong incentives to manipulate it. Maybe Manifold should add a polling platform to their service? 815 traders, one of the biggest markets of all time. It’s easy to see the jump where Musk unbanned Trump the other day. Trump has said that he doesn’t need to tweet because he prefers his own Truth Social network. This is a good business decision on his part, but hinges on him having enough impulse control to stick to his plan and avoid tweeting. The market thinks there’s a 25% chance he can do it! Polymarket again within 2% of Manifold. Only 23 traders here, and they’re a lot less optimistic than the Trump traders. FTX! 43 traders, seems like probably. 
I’ve seen a lot of Twitter takes about how rich well-connected people never get in trouble for this kind of thing, but the markets seem less cynical. 251 traders, and by the way amazing job by “mr22” who started this market on October 5. I also appreciate the relatively late end date - there’s another market “. . . by 2024” which is in the 30s, but that’s because people don’t trust the justice system to move quickly, not because they think he’ll be found innocent. There are a series of markets on sentence length which seem to suggest more than a month but less than a year in jail; this doesn’t really make sense to me and I’m going to nervously ignore them. Only 8 traders here, so take with a grain of salt, but this is a great example of the creative ways people are using Manifold. The market resolves not to “yes” or “no” but to the percent of FTX US users’ funds that they eventually get back; you make money if you were closer than other traders. Here they seem to think most people will only be getting about 14 cents on the dollar. There’s another market for FTX.US users which is a little higher at 29. 34 traders. I think this is too high; I bet it was some random third-tier insider, just because there are more of them and they’re under less scrutiny. Moving on to the effects on effective altruism in particular (just assume I have all possible conflicts of interest here): 272 traders, check the detailed resolution criteria. I think the strongest case is something like the one described in this article, about Center for Effective Altruism leaders discussing concerns about Alameda Research in 2018. The article doesn’t give specifics but my guess is they were the same issues Kerry Vaughn describes here (though see the followup comment by an employee who left FTX, casting doubt on Vaughn’s claims). 
That means the market hinges on whether Vaughn’s allegations fit the resolution criteria that “the unethical behavior must have been related to fraudulent investment strategies that involve spending other people's money without their permission”. Vaughn describes “poor capital controls, including a lack of distinction between money owned by investors and money owned by Alameda itself”, which sounds like it’s in that direction but could cover a wide variety of badness levels. My guess is everyone will end up agreeing that disgruntled Alameda employees whisper-networked that some things were bad about the company in 2018, some of the rumors got to CEA leaders, the leaders debated whether this was worse than normal for a tech startup, decided it didn’t rise to a level where they needed to publicly freak out, and moved on. Isaac will have to pay attention to the details as they come out and decide whether or not it qualifies. 45 traders. This seems to confirm that the CEA incident is responsible for most of the probability mass above; many fewer people think the FTX Future Fund (ie the charitable branch of FTX responsible for giving out their money) was in on this. Related: this market only has five traders, but I’m highlighting it anyway in the hopes that it gets more. The most money is on 2022. My guess is that we’ll find that they had terrible accounting practices in 2018-2019 of the sort that could be classified as criminally incompetent in a way that bled into fraud (but the trades went fine so nobody was harmed) and then they ramped it up a lot in 2022 to deal with the crypto crash. I think this market will be harder to resolve than people expect. 47 traders. Everyone is panicking about this possibility, but it looks like it’s not too likely. 10 traders. I’ll take this chance to say: a lot of media is predicting the death of EA, or a major blow to EA, or something in that category. Not going to happen. 
The media isn’t good at understanding people who do things for reasons other than PR. But most EAs really believe. Like, really believe. If every single other effective altruist in the world were completely discredited, I would just shrug and do effective altruism on my own. If they instituted the death penalty for effective altruism, I would do it under cover of night using ZCash. And I’m nowhere near the most committed effective altruist; honestly I’m probably below average. “Saint gets eaten by lions in Colosseum, can early Christianity possibly survive this setback?” Update your model or prepare to be constantly surprised. 6 traders. So, we lost several hundred million dollars of funding in a giant disaster which was also morally outrageous and demoralizing. It happens. But lots of people have already emailed me asking how to send in more money to help fill the gap. Some added something like “it was so depressing that all the FTX money meant my money didn’t make a difference, but now I can help again, and it’s great!” Can these people fill the hole? 32% chance that they can! 10 traders. And if they don’t, we’ll still probably do better than in 2021, before all the FTX money started rolling in. We’ll try harder to hammer in the point about not doing “ends justify the means” reasoning, and do some reorgs and purges to prevent anything like this from happening again, we’ll make a bunch of other changes - some reasonable, some panic-driven - but we’ll go on. If all the far-future stuff collapses, we’ll donate to global health charities. If the global health charities don’t work, we’ll fund GiveWell to sit around and figure out something that does. If GiveWell gets hit by an asteroid, we’ll work on asteroid deflection (actually I think we might already be doing that). If asteroid deflection turns out to be -EV, we’ll switch to shrimp welfare, or give ourselves Zika virus, or any of a million other things. 
You have no idea how committed we are to continuing to do effective altruism regardless of whether or not it’s “popular”. But it will be popular. 45 traders, resolution criteria at the link, notice the dip when the FTX news broke, followed by recovery as people had time to think it over more. Moving on to slightly less serious topics: The snapshot doesn’t show this, but one of the suggestions is Atlas Rugged. 67 traders, interesting to see where forecasters’ priorities lie. This was a big rumor early on, along with “everyone was on meth”, but the on site psychiatrist said it was false during an interview. 13 traders. WHY DO PEOPLE KEEP GOING ON PODCASTS? Midterms! That was two weeks ago? It feels like years! A week before the midterms, I wrote: Polymarket, Manifold, and PredictIt now have shiny interfaces for predicting the upcoming US midterm elections. In terms of the Republicans taking the Senate, Polymarket is at 65%, Manifold at 58%, PredictIt at 73%, and 538 at 49%. Congratulations 538! Mike Saint Antoine (who wrote the review of Viral in the last Book Review Contest) has put some more work into scoring midterm election forecasts. Here are some headline results: Mike writes: The reason I didn’t just do a three-way comparison between PredictIt, FiveThirtyEight, and Manifold Markets is that the Manifold Markets forecasts included fewer questions than the PredictIt and FiveThirtyEight forecasts. So in order to do a fair comparison here, I’ll be comparing the smaller subset of questions for which PredictIt and Manifold Markets both gave a forecast. So it looks like both Manifold and 538 did better than PredictIt, and there’s no clear way to tell which of the former did better. (except I guess you could do this analysis with just the subset of questions Manifold and 538 share, but Mike didn’t and I’m also not going to). 
PredictIt has a pretty consistent Republican bias (it’s a minor epistemic sin to accuse a prediction market of having a predictable bias unless you’ve made money exploiting it; I made $600 this election, so I’ll let myself pass). In years when Republicans do better than expected, it will probably look better than other markets; in years when they do worse, it will look worse. Still, this is a bias, so I think we should take them doing worse this year as a fair reflection of their accuracy, even though next year it could go the other way. My main two takeaways here are: PredictIt isn’t yet good enough that the ideal theorems showing prediction markets should be unbiased and better than everyone else apply to it. The obvious explanation is its $800-per-question cap. Polymarket doesn’t have that cap and it did better, although Mike hasn’t done a formal comparison to 538.
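As a rough illustration of how a single forecast can be scored, here is a minimal sketch that applies the Brier score (squared error between the stated probability and the 0/1 outcome) to the pre-midterm Senate numbers quoted above. The Brier score is an assumption chosen for illustration; the text does not say which scoring rule Mike used, and the function name `brier` is hypothetical. Republicans did not take the Senate, so the outcome is 0.

```python
# Probabilities each source gave for "Republicans take the Senate",
# as quoted in the text a week before the midterms.
forecasts = {"Polymarket": 0.65, "Manifold": 0.58, "PredictIt": 0.73, "538": 0.49}

# Republicans did not take the Senate, so the event resolved "no".
outcome = 0


def brier(p, outcome):
    """Squared error between a probability and a 0/1 outcome; lower is better."""
    return (p - outcome) ** 2


scores = {name: brier(p, outcome) for name, p in forecasts.items()}

# Sort sources from best (lowest score) to worst.
ranking = sorted(scores, key=scores.get)

for name in ranking:
    print(f"{name}: {scores[name]:.4f}")
```

On this one question 538 scores best and PredictIt worst, which matches the "Congratulations 538!" verdict. A single question says very little about overall accuracy, which is why a comparison like Mike's uses many questions.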
This year I’m making it official, with a 50-question 2023 Prediction Benchmark Question Set. I hope that this can be used as a common standard to compare different forecasters and forecasting sites (Manifold and Metaculus have already agreed to use it, and I’m hoping to get others). Also, I’d like to do an ACX Survey later this month, and this will let me try to correlate personality traits with forecasting accuracy.
Thanks to people from Metaculus, Manifold, and the EA forecasting community for helping with questions and plans.
After the event happens, use the outcome to update everyone’s reputation and refine the algorithm. Superforecasting uses some of the same ideas as prediction markets - probabilistic forecasts, incentives to get the right answer, aggregation methods that favor people with good track records. In studies comparing superforecasting tournaments to small prediction markets, the superforecasting tournaments have done equally well or even slightly better. My goal with this FAQ is not to claim that prediction markets are always better than superforecasting. I think of both as part of the same revolution in forecasting technology, and would be happy with policy-makers or other important people using either. Still, I do think that each has situations where they might be a better fit than the other. Superforecasting tournaments shine on questions so far in the future that financial incentives start to lose force (for example, people are unlikely to place bets on questions about 2100, when most of them will be dead anyway). They’re also good in situations where you can’t get a big prediction market together - superforecasting scales down more gracefully, since you can identify individuals as superforecasters and consult them even in situations where you can’t get a full tournament together. Prediction markets shine in avoiding advanced manipulation attempts, in providing a single canonical answer when someone might worry that any given tournament was biased, and in aggregating the results of superforecaster tournaments with each other and with other sources. Remember that a superforecasting tournament can be considered an “expert”, like Nate Silver. So by the argument in Part 2, we should expect that a big prediction market won’t consistently be worse than any given superforecasting tournament, as long as the tournament’s answers are public knowledge. 
If there were ever a superforecasting tournament that consistently outperformed prediction markets, that would be a simple mispricing, people would correct it, and the market would eventually agree with the tournament. 4.5: Aren’t prediction markets gambling? Isn’t gambling bad and addictive? Yes, sort of. But most countries allow forms of gambling that aren’t too addictive and have some social value. For example, investing in stocks, or investing in commodities futures. I think prediction markets are more like this than like traditional gambling in casinos. People who want to gamble can already buy cryptocurrencies, or trade stocks on Robin Hood, or (in 20 states) place online sports bets on sites like DraftKings. All these things seem more addictive than, and have less social utility than, prediction markets. I don’t think promoting or legalizing prediction markets is going to make the gambling situation much worse than it is already - so given how useful I think they are, I think they would be net positive. People who are more concerned about the gambling aspect might want to stick to play money prediction markets, which wouldn’t have this problem. 4.6: Where does the money in prediction markets come from? That is, if "you get a dollar when the Democrats win”, who provides the dollar? In the abstract, prediction markets pair up people who want to bet on different sides of a proposition. For example, if a market says that there’s a 75% chance that the Democrats win, then they pair up someone willing to buy a share in “The Democrats win” for $0.75 with someone willing to buy a share in “The Democrats lose” for $0.25, for a total of $1 spent on these two shares. Then, when the Democrats either win or lose, the person with the correct share gets the $1. In practice it’s annoying to have to wait for someone to take the opposite side of the trade, so some people (or bots!) 
play “market maker” and are willing to take your bet on the assumption that someone else will come along soon to take the other side. But it’s usually safe to abstract this step away and just imagine people betting with each other, using the market as an intermediary. 4.6.1: Then why should anyone play prediction markets, when on average they’ll only break even? It seems like this is a worse deal than stocks, which tend to go up over time. Every dollar someone wins on a prediction market corresponds to someone else’s loss; in expectation, across all participants, the average gain is 0. But the stock market tends to go up over time, as businesses expand to new areas and invent new products; across all participants, the average gain is about 4% per year. So why ever invest in prediction markets instead of stocks? Whatever the theoretical answer to this question, lots of people do invest in prediction markets instead of stocks sometimes; several existing prediction markets have questions with hundreds of thousands of dollars in trading volume. You would have to ask those people why they do it. Maybe it’s because it’s fun. Or maybe it’s because they think (rightly or wrongly) that they’re above average and can make a profit. This is no different than other zero-sum games like sports betting, which attracts billions of dollars each year. The futures and commodities markets are also zero-sum, but attract billions of dollars by giving companies an opportunity to hedge risk. For example, a nickel mine might get rich if the price of nickel goes up, but go bankrupt if the price of nickel goes down. And they might prefer a predictable world where they get a small but guaranteed profit no matter what happens to nickel prices.
So they bet some amount of money on commodity markets that the price of nickel will go down, and then their income is the sum of what they make from their nickel mining and from their bets - which, if they handled their hedging correctly, should be a small but guaranteed profit. Prediction markets would allow hedging of other types of risk - for example, import-export businesses might want to hedge against the risk that a protectionist politician gets elected, or tourism companies might want to hedge against a pandemic that closes international borders. These people would inject enough money into the market to subsidize sophisticated speculators. Finally, I envision that someday people who want to know the answer to specific questions can subsidize prediction markets on them. For example, the Democratic Party might subsidize a conditional market (see 5.1) about which Democratic primary candidate is most likely to win the general election. Their money would go to giving the average investor a 4% (or some other number) rate of return - although of course winners would gain more than that and losers would still lose on net. I think this is the most likely way for prediction markets to become very big. 4.6.1.1: If people use prediction markets to hedge risk, won’t that distort them? That is, suppose that an import-export business spends millions of dollars betting that Trump will win in order to hedge against his protectionist policies. Since their bets aren’t based on the real chance of Trump winning, won’t that distort the market? No. Suppose that everyone knows Trump has a 50-50 chance of winning. And suppose the import-export business, in the process of hedging risk, bids it up to 90-10. Since you know Trump has a 50-50 chance of winning, you can get rich quick by bidding it back down to 50-50. From your point of view, the import-export business is (in expectation) giving you free money. 
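The arithmetic behind this "free money" can be made explicit. This is a minimal sketch, assuming shares pay $1 if correct and $0 otherwise (as in 4.6) and ignoring fees and the price impact of your own trades; the function name `expected_profit_per_share` is hypothetical.

```python
def expected_profit_per_share(price, true_prob):
    """Expected profit from buying one $1-payout share at `price`
    when the event it pays out on has probability `true_prob`."""
    return true_prob * 1.0 - price


# The hedger has bid "Trump wins" up to $0.90, so "Trump loses" costs $0.10.
# Everyone knows the real chance is 50-50.
buy_loses = expected_profit_per_share(price=0.10, true_prob=0.50)
buy_wins = expected_profit_per_share(price=0.90, true_prob=0.50)

print(f"Buy 'Trump loses' at $0.10: {buy_loses:+.2f} per share")
print(f"Buy 'Trump wins' at $0.90: {buy_wins:+.2f} per share")
```

Under these assumptions, each "Trump loses" share bought at $0.10 is worth $0.40 in expectation, and that expected gain comes out of the import-export business's pocket.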
But they’re still happy to do it, because they’re hedging their risk successfully. 4.7: Aren’t a lot of the questions we care about inherently subjective or hard to measure? This is a frequent problem for prediction markets. For example, we might want to know something like “will we get human-level AI before 2050?” But how do we define “human-level AI”? If there’s an AI that’s much better than humans at most tasks, but much worse at a few, is that “human-level”? If there’s an AI that seems human-level in demos, but the team that makes it won’t let it be independently tested, should that count? If it works through some kind of Frankenstein chip that combines vat-grown brain tissue with computing machinery, is that still an “AI”? Prediction markets have found a few ways around this problem. First, many groups (for example, Metaculus) try to define their resolution criteria very carefully. A typical Metaculus question on AI sounds like this: We will thus define "an AI system" as a single unified software system that can satisfy the following criteria, all completable by at least some humans. Able to reliably pass a 2-hour, adversarial Turing test during which the participants can send text, images, and audio files (as is done in ordinary text messaging applications) during the course of their conversation. An 'adversarial' Turing test is one in which the human judges are instructed to ask interesting and difficult questions, designed to advantage human participants, and to successfully unmask the computer as an impostor. A single demonstration of an AI passing such a Turing test, or one that is sufficiently similar, will be sufficient for this condition, so long as the test is well-designed to the estimation of Metaculus Admins.
Able to get top-1 strict accuracy of at least 90.0% on interview-level problems found in the APPS benchmark introduced by Dan Hendrycks, Steven Basart et al. Top-1 accuracy is distinguished, as in the paper, from top-k accuracy in which k outputs from the model are generated, and the best output is selected. By "unified" we mean that the system is integrated enough that it can, for example, explain its reasoning on a Q&A task, or verbally report its progress and identify objects during model assembly. (This is not really meant to be an additional capability of "introspection" so much as a provision that the system not simply be cobbled together as a set of sub-systems specialized to tasks like the above, but rather a single system applicable to many problems.) Resolution will come from any of three forms, whichever comes first: (1) direct demonstration of such a system achieving ALL of the above criteria, (2) confident credible statement by its developers that an existing system is able to satisfy these criteria, or (3) judgement by a majority vote in a special committee composed of the question author and two AI experts chosen in good faith by him, for the sole purpose of resolving this question. Resolution date will be the first date at which the system (subsequently judged to satisfy the criteria) and its capabilities are publicly described in a talk, press release, paper, or other report available to the general public. Even this isn’t perfect (which models are “the equivalent of” a 1:8 scale Ferrari 312?), but in practice once you get to this level of details people mostly stop worrying about this. Another method (mostly associated with Manifold) is to just leave it up to human judgment - specifically, the judgment of the person who made the market. 
For example, I could make a market in “By 2050, will there be an AI which Scott Alexander thinks qualifies as ‘human-level’?” This will force market participants to price in the risk that I have bad judgment or act dishonestly. But perhaps these risks are small. For example, I might say elsewhere what I think qualifies as “human-level” AI, or you might think human-level AI will be so obvious when it comes that I will definitely agree with you about it. As for honesty, this could be enforced either legally or by reputation. Someone who has resolved their past 100 prediction markets honestly will probably resolve this one honestly too, especially if they get paid to do so and will never get customers again if they lie. When we invest on the normal stock market, we trust that our brokers / the NYSE / etc won’t run off with our money, and this trust is usually well-deserved. Even when we make an online purchase, we trust that the store we’re sending our money to won’t steal it and refuse to send us the product. It would be an exaggeration to say that trust is a solved problem, but evidence from Manifold suggests that most people price in a <1% chance that well-known market makers with good reputation resolve dishonestly. If prediction markets got big enough, they could spawn trusted “resolution companies” who individual markets and market-makers could outsource their resolution to, for a fee. If these companies were ever dishonest, they would lose all their business from then on, so they would probably be as honest as other businesses like your broker / the NYSE / various online stores / etc. 4.7.1: Isn’t a lot of the “crisis of trust” around questions that might never have clear future answers? For example, consider the debate around whether Donald Trump is a Russian agent. Maybe no proof will ever come out either way. 
Or maybe some evidence will appear that seems to prove one side or the other, but people will continue to deny it for political reasons, and the problem of resolving the prediction market will be just as hard as the problem of answering the original question. Indeed, prediction markets aren’t very good at this, and are only fully trustworthy on questions where the true answer will eventually become apparent. Still, they might not be completely useless. For example, if you’re worried about Trump being a Russian agent because you expect him to pursue pro-Russia policies, you can start markets in whether he pursues those policies. Or you can start a conditional market (see 5.1) on whether, if Russia ever releases its past intelligence data many years from now, the data confirm/disconfirm that Trump was an agent. See Part 5 for other clever ways you might try to address this problem. 4.8: “Meme stocks” like Gamestop and AMC sometimes remain mispriced indefinitely. How do we know this won’t happen with prediction markets? Meme stocks are a type of Ponzi. It’s “reasonable” to buy Gamestop at some inflated price, because - who knows? - someone else might buy it at an even more inflated price tomorrow. And this can keep going arbitrarily long, or at least long enough for you to get out with a profit. Unlike meme stocks, prediction markets have a clear resolution date. If you’re predicting who will win the next election, the market will go to 100% or 0% after the election finishes. No matter how many memes there were, you wouldn’t buy a share in “the Democrats will win the election” for 99% the day before Election Day if you knew they would definitely lose. But that means prediction markets should be accurately priced the day before Election Day, which means you shouldn’t buy at an inaccurate price two days before Election Day, and so on. 
I can’t say for sure that no prediction market will ever get mispriced for meme reasons, but they should be much more robust against meme mispricings than the stock market. And even the stock market doesn’t have too many meme stocks. 4.9: How do prediction markets deal with outcomes in the far future? Suppose there is a question “who will win the 2100 election?” Currently it says 25% Democrats, 75% Republicans, and I believe it should be 50-50 (we’ll ignore third parties, or the possibility of America not existing in 2100, for now). So if I bet on the market, I can (in expectation) double my money. But there are many better ways to double your money by 2100. For example, if the stock market grows 4% per year, I should expect any money invested in the stock market to multiply by 20x by 2100. So just doubling it in a prediction market is a bad option. Realistically, this means prediction markets won’t work well for far-future events. These might be a better match for forecaster tournaments or some other structure, where we get the forecaster track records through present events, then use those track records to weight their far-future predictions (see also 5.5). There are already good forecasting tournaments on some far future events. But if you really wanted to use a prediction market, you could theoretically solve this by putting investors’ money in index funds while they waited. Then the winner would get their (and the losers’) original deposits and investment profits, and it would go back to being a better option than investing in index funds directly. In practice this seems complicated and I wouldn’t expect it to work. 4.9.1: What about predicting things that would make it impossible or pointless to win money, like human extinction? Again, these questions probably aren’t great matches for prediction markets, and you should use forecasting tournaments or some other method (see also 5.5).
If you really wanted, you might be able to make it work in theory through a mechanism sort of like this one. 5. What are some clever uses for prediction markets? Here’s a non-exhaustive list: 5.1: Conditional prediction markets / decision markets Suppose the government is trying to decide whether to throw its weight behind Vaccine A or Vaccine B for some deadly disease. There are some experts behind both, both sets of experts accuse the other of being in the pay of pharmaceutical companies, and decision-makers don’t know who to trust. They might make two prediction markets, like: If we decide to go with Vaccine A, will at least X people die from the disease?
20th: Peter Wildeford. Peter is the co-CEO of effective altruist organization Rethink Priorities. He’s also one of the top forecasters on Metaculus. You can hear him discuss his forecasting strategies on the Inside View podcast.
Or maybe the prediction market results will hold. One market (Manifold) and another market-like site (Metaculus) are joining the contest this year. If they do as well as last year, they’ll beat all but 15 of the 3500 entries. If things go very well, maybe we’ll discover new ways of aggregating their results that can beat every individual predictor, at least most of the time.
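For a sense of what "aggregating their results" could mean in the simplest case, here is a sketch of two standard pooling rules: the arithmetic mean of probabilities and the geometric mean of odds. Neither is claimed to be what the contest uses; the forecast values and function names are hypothetical and chosen only for illustration.

```python
import math


def mean_probability(probs):
    """Arithmetic mean of the forecasters' probabilities."""
    return sum(probs) / len(probs)


def geometric_mean_of_odds(probs):
    """Average the forecasts in log-odds space, then convert back to a probability.
    Requires every probability to be strictly between 0 and 1."""
    log_odds = [math.log(p / (1 - p)) for p in probs]
    avg_log_odds = sum(log_odds) / len(log_odds)
    return 1 / (1 + math.exp(-avg_log_odds))


# Hypothetical forecasts from three sources on one yes/no question.
forecasts = [0.60, 0.70, 0.90]

pooled_mean = mean_probability(forecasts)
pooled_odds = geometric_mean_of_odds(forecasts)

print(f"Mean of probabilities:  {pooled_mean:.3f}")
print(f"Geometric mean of odds: {pooled_odds:.3f}")
```

The two rules differ most when one forecaster is much more confident than the others. Whether either beats the best individual predictor is an empirical question that only scored results can settle.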
Happy One Millionth Prediction, Metaculus Metaculus celebrated its one millionth user forecast with a hackathon, a series of talks, and a party:
This was a helpful reminder that Metaculus is a real organization, not just a site I go to sometimes to check the probabilities of things. The company is run remotely; catching nine of them in a room together was a happy coincidence. Although I think it still relies heavily on grants, Metaculus’ theoretical business model is to create forecasts on important topics for organizations that want them (“partners”) - so far including universities, tech companies, and charities. A typical example is this recent forecasting tournament on the spread of COVID in Virginia, run in partnership with the Virginia Department of Health and the University of Virginia Biocomplexity Institute (this year’s version here). The main bottleneck is interest from policy-makers, which they’re trying to solve both through product improvement and public education. In December, Metaculus’ Director of Nuclear Risk, Peter Scoblic, published an article in Foreign Affairs magazine about forecasting’s “struggle for legitimacy” in the foreign policy world. It’s paywalled, but quoting liberally: Organizational change is difficult under the best of circumstances and is close to impossible when powerful insiders actively resist it. National security experts with decades of experience and access to classified information see little reason for deferring to the upstart winners of forecasting tournaments, contests that allow the public to compete at putting realistic odds on future events. Perhaps they are concerned that as forecasters get better at geopolitical analysis, they will threaten the notion of expertise and the professional identities of those who supply it. But forecasting should be seen as a complement to expert analysis, not a substitute for it. The same situation obtains among the corps of foreign-policy columnists, think tank fellows, and former government officials who wield more influence for the confidence of their convictions than for the precision of their predictions. 
There is little incentive for such analysts to ask when they have been wrong and why—questions that top forecasters must constantly confront if they are to maintain their place in the accuracy hierarchy. Instead, the “thought leader” ecosystem insulates the careers of people who would have washed out of any geopolitical forecasting tournament. It concludes: All this suggests that to make forecasting a resource that policymakers use, the quality of both supply and demand needs to improve. The former requires giving subject-matter experts a role in producing forecasts—in formulating questions (because they know which indicators are most germane) and in vetting the rationales that inform forecasts (because they can gut-check causal claims and fact-check evidence). The latter requires making the national security establishment more numerate or at least more open to quantitative appraisals of the future. These are challenging tasks, but forecasting scholars are already testing methods for not only measuring the best forecasts but also judging the most persuasive rationales for those forecasts. For example: What story best conveys that there is a 10–15 percent chance of between one and three million people dying in the Ukraine war by the end of 2024? Where forecasters provide probability, subject-matter experts can provide plausibility, making well-calibrated quantitative future estimates more convincing and palatable to policymakers—and therefore making their decisions a little less wrong. And in national security, being a little less wrong can be a lot less dangerous. These are the kinds of questions Metaculus-the-organization is thinking about, and the kinds of problems it’s trying to solve. They’ve also got some exciting ideas for making their product more policy-relevant. For example, they’re working on causal modeling, where forecasters not only predict the chance of (eg) a Russian nuclear strike, but also all of the inputs into their decision. 
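A causal graph of this kind composes conditional forecasts with the law of total probability. The sketch below is a minimal Python illustration using the numbers from the example that follows; the function names are hypothetical, and the probability that the US stops arms shipments is not given in the text, so it is solved for as an implied value rather than quoted.

```python
def total_probability(p_parent: float, p_if_true: float, p_if_false: float) -> float:
    """P(child) = P(child | parent) * P(parent) + P(child | not parent) * P(not parent)."""
    return p_if_true * p_parent + p_if_false * (1.0 - p_parent)


def implied_parent_probability(p_child: float, p_if_true: float, p_if_false: float) -> float:
    """Solve the total-probability equation for P(parent)."""
    return (p_child - p_if_false) / (p_if_true - p_if_false)


# Node 1: nuclear strike, conditioned on whether the war continues.
p_war_continues = 0.50
p_strike = total_probability(p_war_continues, p_if_true=0.15, p_if_false=0.05)

# Node 2: war continues, conditioned on whether the US stops arms shipments.
# The parent probability is back-solved, not taken from the text.
p_us_stops = implied_parent_probability(p_war_continues, p_if_true=0.60, p_if_false=0.33)

print(f"P(strike) = {p_strike:.2f}")
print(f"implied P(US stops arms shipments) = {p_us_stops:.2f}")
```

A tool built on this idea would presumably let a user change one input, such as `p_war_continues`, and watch the downstream probability move.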
For example, there’s a 10% chance of a strike, which comes from a 15% chance if the war in Ukraine continues vs. a 5% chance if it doesn’t. And they think there’s a 50% chance the war will continue, which comes from a 60% chance if the US stops arms shipments and a 33% chance if it doesn’t - and so on. Policymakers can play around with the causal graph, investigate which factors make a strike more vs. less likely, and check how their preferred policy would affect things they care about. For more on the intersection of forecasting and policy, see this EA Forum post. To learn more about Metaculus, follow them on Twitter or Facebook. And here’s to many millions more predictions! Taking Stock Prediction market users really want stocks. “Stock” in this sense means an instrument that measures the status of a person, group, or idea. When their status goes up, the stock goes up. When their status goes down, the stock goes down. It feels like a natural way to bet on things like “I’m bearish on Elon Musk and think everyone else is overestimating him.” It’s hard to turn this vague idea into a real financial instrument. You could try tying it to their Twitter follower count, or Google search trends, or net worth, but none of these exactly track “status”. If Musk commits murder in broad daylight, his search volume will go up, his Twitter follower count will stay about the same, his net worth might not be affected, but his status will have gone way down. The current solution is to make no effort whatsoever to moor stocks to the real world and just hope they work out. This could work! It’s kind of like a Ponzi scheme or crypto token. Some big influencer endorses MoonCoin, and MoonCoin goes up, because MoonCoin has gained status, which means more people will want to buy it, because it’s even more likely that more people will want to buy it later. 
Crypto tokens keep a fig leaf of “and maybe in the cyberpunk future when all transactions everywhere have switched to crypto this will really pay off”, but over time that fig leaf became increasingly threadbare, and a fun low-stakes instrument like Manifold stocks might do fine without it. But the 0% to 100% prediction scale is a bad match for stocks. If Elon started at 50% in 2000, then when Tesla made it big he surely should have doubled. And that brings him up to 100% and leaves nowhere for him to go. Also, people who bet on Elon Musk in 2000 might be miffed that their prescient choice only doubled their money. Probably the solution is some kind of cardinal number. But which one, and at what scale? Again, the lesson from crypto is that maybe it doesn’t matter. Just start at 10 or something and see where it ends up. Manifold leadership isn’t totally resigned yet to having stocks be meaningless Ponzi schemes. If you have a better idea for how to run stocks, leave it in the comments here and they’ll probably see it. CFTC vs. PredictIt Update So far it’s not clear if this means indefinite normal operation, or if they’ll spend the extra time trying to wind existing markets down. The overall chance of them winning their lawsuit remains unchanged at around 25%. PredictIt has gotten some sympathetic news coverage, including from the Washington Post. In the process, the Post tried to get some clarity on what terms of the no-action letter PredictIt violated, apparently without success: @CFTC why they're shutting PredictIt down. They give no real answer, just as in the original withdrawal letter. Closest thing we have to an answer is that they don't want other prediction markets. But why? No sense here at all. 
washingtonpost.com/lifestyle/2023… (Richard Hanania, @RichardHanania, January 24, 2023)
@StephenPiment I'm flat appalled the CFTC said "you violated terms", but won't tell anyone, @PredictIt included, which ones, and then has big enough balls to try to get the judge to dismiss PI's "shotgun" defense. Um, with no info what other case COULD they make? (Edward Kmett, @kmett, November 27, 2022, linking to the bonus.com article “Hearings Coming Soon in PredictIt Lawsuit, CFTC Asks to Dismiss”)
I guess they’ll have to give some kind of explanation during the hearing, right? Related: Richard Hanania has an article on How To Legalize Prediction Markets. The actual advice isn’t very surprising, and mostly boils down to “write letters to the government officials in charge of this”, but like other people I learned something new from the details: In the United States, prediction markets are, with a few minor exceptions, against the law. 
If you don’t have a legal background, you might think that means that Congress at some point considered the issue, decided people shouldn’t be able to bet on real world events, and passed a law to that effect, which was then signed by the president. But this is not what happened. As with most things, Congress has never directly considered the matter. Rather, prediction markets are illegal due to the discretion of a government agency called the Commodity Futures Trading Commission (CFTC). Why does it have this right? And on what basis has it made prediction markets illegal? […] In 1936, Congress passed and FDR signed the Commodity Exchange Act. In 1974, Congress created the CFTC to enforce the original law, which has been amended on multiple occasions over the years. The CFTC has authority to regulate what are called “derivatives markets.” A derivatives contract derives its value from some kind of underlying asset or benchmark in the real world. The thing to understand about derivatives is that the baseline is that they’re legal. That’s why you can “bet” on the price of oil through a futures contract. The CFTC wasn’t created to ban derivative markets, but to regulate them, though this can involve prohibiting certain kinds of markets altogether. Current law includes the following provision on event contracts, [banning]: activity that is unlawful under any Federal or State law;
1: Thanks to everyone who entered the Prediction Contest; entry is now closed. You can continue to make predictions on Manifold or Metaculus, but they won’t officially count. Also, another prediction market, Futuur, has markets up for the contest questions. I’m pretty excited about this, because although Futuur does let you use play money like Manifold, it also offers real money betting (warning: requires crypto and a non-US IP). If you want to make real money bets on contest questions, now you can (and I’ll be seeing how they compare to the play money markets).
A: Any project about trying to improve our knowledge of the future will be eligible. This can include writing prediction market / Metaculus questions, fortified essays for Metaculus, studying forecasting/prediction-related topics, new websites or companies in this space (yes, Manifold will fund its own competitors) or anything else you can think of.
A: We’re trying to figure that out. I would have liked to do it myself, but I think there are legal issues about me both providing the money and determining who gets it. Probably they will be some respected people from Metaculus or Manifold, or someone with forecasting grant-making experience.
Sam Altman posing with leading AI safety proponent Eliezer Yudkowsky. Also Grimes for some reason. Planning For AGI And Beyond (“AGI” = “artificial general intelligence”, ie human-level AI) is the latest volley in that campaign. It’s very good, in all the ways ExxonMobil’s hypothetical statement above was very good. If they’re trying to fool people, they’re doing a convincing job! Still, it doesn’t apologize for doing normal AI company stuff in the past, or plan to stop doing normal AI company stuff in the present. It just says that, at some indefinite point when they decide AI is a threat, they’re going to do everything right. This is more believable when OpenAI says it than when ExxonMobil does. There are real arguments for why an AI company might want to switch from moving fast and breaking things at time t to acting all responsible at time t + 1 . Let’s explore the arguments they make in the document, go over the reasons they’re obviously wrong, then look at the more complicated arguments they might be based off of. Why Doomers Think OpenAI Is Bad And Should Have Slowed Research A Long Time Ago OpenAI boosters might object: there’s a disanalogy between the global warming story above and AI capabilities research. Global warming is continuously bad: a temperature increase of 0.5 degrees C is bad, 1.0 degrees is worse, and 1.5 degrees is worse still. AI doesn’t become dangerous until some specific point. GPT-3 didn’t hurt anyone. GPT-4 probably won’t hurt anyone. So why not keep building fun chatbots like these for now, then start worrying later? Doomers counterargue that the fun chatbots burn timeline. That is, suppose you have some timeline for when AI becomes dangerous. For example, last year Metaculus thought human-like AI would arrive in 2040, and superintelligence around 2043. Recent AIs have tried lying to, blackmailing, threatening, and seducing users. 
AI companies freely admit they can’t really control their AIs, and it seems high-priority to solve that before we get superintelligence. If you think that’s 2043, the people who work on this question (“alignment researchers”) have twenty years to learn to control AI. Then OpenAI poured money into AI, did ground-breaking research, and advanced the state of the art. That meant that AI progress would speed up, and AI would reach the danger level faster. Now Metaculus expects superintelligence in 2031, not 2043 (although this seems kind of like an over-update), which gives alignment researchers eight years, not twenty. So the faster companies advance AI research - even by creating fun chatbots that aren’t dangerous themselves - the harder it is for alignment researchers to solve their part of the problem in time. This is why some AI doomers think of OpenAI as an Exxon-Mobil style villain, even though they’ve promised to change course before the danger period. Imagine an environmentalist group working on research and regulatory changes that would have solar power ready to go in 2045. Then ExxonMobil invents a new kind of super-oil that ensures that, nope, all major cities will be underwater by 2031 now. No matter how nice a statement they put out, you’d probably be pretty mad! Why OpenAI Thinks Their Research Is Good Now, But Might Be Bad Later OpenAI understands the argument against burning timeline. But they counterargue that having the AIs speeds up alignment research and all other forms of social adjustment to AI. 
If we want to prepare for superintelligence - whether solving the technical challenge of alignment, or solving the political challenges of unemployment, misinformation, etc - we can do this better when everything is happening gradually and we’ve got concrete AIs to think about: We believe we have to continuously learn and adapt by deploying less powerful versions of the technology in order to minimize “one shot to get it right” scenarios […] As we create successively more powerful systems, we want to deploy them and gain experience with operating them in the real world. We believe this is the best way to carefully steward AGI into existence—a gradual transition to a world with AGI is better than a sudden one. We expect powerful AI to make the rate of progress in the world much faster, and we think it’s better to adjust to this incrementally. A gradual transition gives people, policymakers, and institutions time to understand what’s happening, personally experience the benefits and downsides of these systems, adapt our economy, and to put regulation in place. It also allows for society and AI to co-evolve, and for people collectively to figure out what they want while the stakes are relatively low. You might notice that, as written, this argument doesn’t support full-speed-ahead AI research. If you really wanted this kind of gradual release that lets society adjust to less powerful AI, you would do something like this: Release AI #1
This is the basic idea behind Zou et al (2022), Forecasting Future World Events With Neural Networks. They create a dataset, Autocast, with 6000 questions from forecasting tournaments Metaculus, Good Judgment Project, and CSET Foretell. Then they ask their AI (a variant of GPT-2) to predict them, given news articles up to some date before the event happened. Here’s their result:
You can access their dataset here. The authors were originally planning to host a competition to see who could create the best AI forecaster, but due to financial constraints they’ll be running only a reduced version. You can read more about the semi-competition here. Metaculus Looking Good Two new reports say nice things about Metaculus’ accuracy.
A recent leak suggested that the cost of training GPT-4 was $63 million, which is already higher than the superforecasters’ median estimate of $35 million by 2024, so that estimate has already been proven incorrect. I don’t know how many petaFLOP-days were involved in GPT-4, but maybe that one is already off also. There was another question on when an AI would pass a Turing Test. The superforecasters guessed 2060, the domain experts 2045. GPT-4 hasn’t quite passed the exact Turing Test described in the study, but it seems very close, so much so that we seem on track to pass it by the 2030s. Once again the experts look better than the superforecasters. So is it possible that we, in 2023, now have so much better insight into AI than the 2022 forecasters that we can throw out their results? We could investigate this by looking at Metaculus, a forecasting site that’s probably comparably advanced to this tournament. They have a question suspiciously similar to XPT’s global catastrophe framing: In summer 2022, the Metaculus estimate was 30%, compared to the XPT superforecasters’ 9% (why the difference? maybe because Metaculus is especially popular with x-risk-pilled rationalists). Since then it’s gone up to 38%. Over the same period, Metaculus estimates of AI catastrophe risk went from 6% to 15%. If the XPT superforecasters’ probabilities rose linearly by the same factor as Metaculus forecasters’, they might be willing to update total global catastrophe risk to 11% and AI catastrophe risk to 5%. But the main thing we’ve updated on since 2022 is that AI might be sooner. But most people in the tournament already agreed we would get AGI by 2100. The main disagreement was over whether it would cause a catastrophe once we got it. You could argue that getting it sooner increases that risk, since we’ll have less time to work on alignment. But I would be surprised if the kind of people saying the risk of AI extinction is 0.4% are thinking about arguments like that. 
So maybe we shouldn’t expect much change. FRI called back a few XPT forecasters in May 2023 to see if any of them wanted to change their minds, but they mostly didn’t. Overall I don’t think this was just a problem of the incentives being bad or the forecasters being stupid. This is a real, strong disagreement. We may be able to slightly increase their forecast based on recent events, but this would only change the estimate a little. Breaking Down The AI Estimate How did the forecasters arrive at their AI estimate? What were the cruxes between the people who thought AI was very dangerous, and the people who thought it wasn’t? You can think of AI extinction as happening in a series of steps: We get human-level AI by 2100.
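The proportional update to the XPT figures described earlier in this excerpt can be checked with a few lines of Python. This is a sketch under the assumption that “rose linearly by the same factor” means multiplying by the ratio of the new Metaculus estimate to the old one; the function name is hypothetical, and the superforecasters’ starting AI catastrophe figure is back-solved from the 5% result rather than taken from the text.

```python
def scale_by_same_factor(baseline_pct: float, ref_before_pct: float, ref_after_pct: float) -> float:
    """Scale a baseline estimate by the ratio a reference forecast moved by."""
    return baseline_pct * (ref_after_pct / ref_before_pct)


# Total global catastrophe: Metaculus moved 30% -> 38%, superforecasters started at 9%.
total_updated = scale_by_same_factor(9.0, 30.0, 38.0)

# AI catastrophe: Metaculus moved 6% -> 15%. The text gives only the updated
# superforecaster figure (5%), so the starting point is implied, not sourced.
ai_factor = 15.0 / 6.0
implied_ai_baseline = 5.0 / ai_factor

print(f"updated total catastrophe risk: {total_updated:.1f}%")
print(f"Metaculus AI risk factor: {ai_factor:.2f}x")
print(f"implied superforecaster AI baseline: {implied_ai_baseline:.1f}%")
```

The first figure rounds to the 11% quoted in the post; the second calculation only shows what starting value would be consistent with the quoted 5%.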
There was no question about when or whether we’ll have superintelligence. Metaculus thinks superintelligence will come very shortly after human-level intelligence, and this is the conclusion of the best models and analyses I’ve seen as well. Still, I don’t know if the superforecasters here also believed this.
I’m heartened to see these two very big markets ($200,000+ volume, 2,000+ traders) within 1% of each other (as of time of writing). This is a really difficult question without an obvious prior, so the level of convergence suggests the markets really are doing their job… …but Metaculus is much lower, probably because the other two are asking if any replication will be positive, and Metaculus is asking if the first replication attempt will be. It’s bad news that these numbers are so different, and suggests a high chance that this stays confusing and comes down to finicky resolution criteria. Still, this has gotten lots of people checking the prediction markets, including Paul Graham: …and around 500 others, according to the Manifold Active Users graph (source): Aside from headline numbers, I’ve also appreciated prediction market comment sections as a good place to stay up to date on the latest developments (including a link to this thread) Elsewhere In Forecasting NYPost: Blind Mystic Baba Vanga Makes Terrifying Nuclear Disaster Prediction For 2023: A blind mystic who allegedly predicted 9/11 is said to have foreseen a nuclear disaster that will ravage Earth before the end of 2023. Baba Vanga, a blind Bulgarian woman, is rumored to have predicted some of the biggest events in world history. She died more than a quarter of a century ago, but many of her predictions are said to have come true long after her death. Now, her followers claim that Baba Vanga foresaw a devastating nuclear disaster that will unfold this year. Big if true. In what sense did she predict 9/11? Another article gives the exact text of the 1989 prediction: “Horror, horror! The American brethren will fall after being attacked by the steel birds. The wolves will be howling in a bush, and innocent blood will be gushing.” This is a 1989 prediction! 
If you’re calling airplanes “steel birds” in 1989, you’re just hoping that people forget you lived when airplanes already existed and then get impressed with you for predicting them. Come on! (you could argue that the second half is about Assistant Secretary of State John Wolf and Deputy Secretary of Defense Paul Wolfowitz howling for war with Iraq from within the Bush administration, but Ass. Sec Wolf played a minimal role in the war buildup so I think if you are being very strict in your interpretation there was really only one wolf involved.) Anyway, Vanga’s other predictions for 2023 include: Earth’s orbit will change
We use Metaculus! It’s great!
GULF BREEZE, FLORIDA, USA Contact: Christian Contact Info: christian[at]metaculus[dot]com Time: Wednesday, October 18th, 8:00 PM Location: Perfect Plain Brewing Coordinates: https://plus.codes/862JCQ7P+9C Notable Guests: Christian, the Director of Comms and Data for Metaculus Notes: Please email me if you'll make it. Would love to chat. If there are no takers, I won't be there.
Kalshi: https://kalshi.com/markets/supercon/roomtemp-superconductor-reported Both reached the 40s to 50s! I think there just wasn’t enough smart money to drown out the people who wanted to bet on an exciting thing being true, or who were unduly influenced by a social media environment optimized to keep their attention by convincing them that an exciting thing was true. I have never claimed prediction markets are always good. All I wrote in the Prediction Market FAQ was that either a prediction market will be good, or you could make lots of free money. In this case, it was the second one. I regret I only made $30. I do hope this situation will improve over time, as over-eager forecasters get burned and dollars flow from dumb money to smarter. [EDIT: I should have included something about Metaculus here, but it’s confusing. I think the most popular Metaculus market was lower because it had stricter resolution criteria (the first replication had to be positive, instead of any replication) but that otherwise Metaculus raw probabilities mirrored everyone else’s. We don’t know how their algorithmically processed probabilities did yet and I’ll report on that information when I get it.] Salem/CSPI Tournament Winners The Salem Center and the Center For The Study Of Partisanship And Ideology, two think tanks associated with right-wing intellectual Richard Hanania, sponsored a prediction market tournament last year. Participants got $1000 in play money to bet on selected markets about current events; winners would be interviewed for a well-paying academic sinecure at one of the think tanks. Now the tournament is over. Winners have yet to be announced, but unofficially, everyone knows who they are: First place out of 999 participants is zubbybadger. Zubby is a prediction market veteran who was featured in a Washington Monthly article last year for his great track record in political betting (he’s made > $150,000 on PredictIt). 
Now he works as a “community manager” for Kalshi (I don’t know what this entails). Second place was Robert from Considerations On Codecrafting. He’s written a detailed reflection on his experience (part one, part two) which is my main source for this section and highly recommended. He describes himself as “having absolutely no experience with prediction markets”. Third place was Johnny Ten-Numbers, about whom I can find no further information. You can see the rest of the top 20 at the very bottom of this post. Reading Robert’s story of his experience, I’m struck by how little of the competition at the top was about predictive accuracy. Everyone in the top 20 was a very accurate predictor (Exactly equally accurate? Hard to tell.) What separated 1st place from 20th, aside from luck, was things like: Ability to move fast - both in responding to news, and in taking the other side of bad bets. Several top performers programmed bots to give them an edge here.
OPTIC is announcing intercollegiate forecasting tournaments in SF, DC, and Boston. Think 1-day hackathon/olympiad/debate tournament, but for forecasting the future — teams predict on topics ranging from geopolitics to celebrity twitter patterns to financial asset prices, and the best forecasters get thousands of dollars in cash prizes and exclusive internships at Metaculus.
Hanson is less sure about this answer than the overall story, but he suggests hiring. You could create some kind of product that companies could buy and give their hiring managers at the beginning of a hiring round, asking them to predict which candidates would get good employee evaluation results or promotions at the end of X amount of time. Even if you’re Manifold or Metaculus or someone who already has a good prediction engine, making this product requires a lot of adaptations. Who should be part of the market? What training should you give them beforehand? What should the resolution criteria be? Hanson thinks that the process of designing this product, answering customer questions about it, and iterating before you sell to the next customer is the kind of last-mile problem whose solution will make prediction markets ready for the big time.
Sparked a renaissance in forecasting, including major roles in creating, funding, and/or staffing Metaculus, Manifold Markets, and the Forecasting Research Institute.
Oh, and I almost forgot: Manifold Love: One Month Progress Report A month ago, Manifold founded a dating site, manifold.love. The idea is, you bet on who would be a good match, and make (play) money if they end up having a second date or continuing on to a relationship.
Manifold.love has also introduced OKCupid-style “compatibility questions”. They don’t seem to involve calculating a match percent yet AFAICT, but hopefully soon! Metaculus’ “Multiple Major Advances” Metaculus announces “multiple major advances to the Metaculus platform”, especially “new scores, new leaderboard, new medals”.
2: I’m too swamped to run my own forecasting tournament this year, so Metaculus is taking over. If you want to participate, check this link. I will be grading last year’s tournament and posting results hopefully sometime this month.
First of all, my allegiance has always been to forecasting in general, of which prediction markets are just a particularly flashy sub-category. So I find it encouraging that forecasting site Metaculus beats 538, usually considered the gold standard for political prediction.
As for the real-money prediction markets, yeah, they seem worse than other options. But solar power was worse than other options in 1990. They’re a fledgling technology, we have strong reasons to think they’ll work when they’re mature, and we know what we need to do to help them grow. Unlike 538, Metaculus, and play-money markets, they have bias-resistance properties that could be really useful if they ever get big.
Vox has a standard article about how we can’t be sure whether bad polls are bad, or whether they don’t matter this far before an election. This ought to be exactly the kind of problem prediction markets are good for, but: …Metaculus and PredictIt are 50-50, Manifold favors Biden, and Polymarket favors Trump. Shouldn’t really be possible, should it?
S, $7,000, to produce materials on forecasting for governments. S is a strategic advisor for a European government, and wants to write manuals and run workshops for EU policy-makers on how to integrate forecasting platforms like Metaculus and prediction markets into their decision-making.
How many residents will live in Prospera, a new special economic zone in Honduras, on Jan 1, 2026? Answer: 600 (80% confidence interval 100-2,000) This seems like a good guess (except that my confidence interval would have included zero because there’s a 20%+ chance that it gets shut down). So overall its forecasts seem pretty impressive. But I was concerned by its reasoning even in some of the questions it got “right”. For example, the Nikki Haley question tried to get a base rate by asking what percent of elections Haley had won before, and found she had won 71% of them - these were mostly elections for South Carolina governor. You can see what the AI is trying to do - but it’s not going to work. Then it got confused and read a lot of news stories about how she’s currently losing the 2024 presidential election, and seemed to think they were about 2028. So either the AI only got a reasonable probability by coincidence, or it was testing many different strategies, throwing out the useless ones, and updating only on the useful ones, in a way that was kind of opaque to the casual reader. Still, if the company says it beats most human forecasters, this doesn’t seem totally impossible based on what I’ve seen. And that would be exciting! An AI that can generate probabilistic forecasts for any question seems like in some way a culmination of the rationalist project. And if you can make something like this work, it doesn’t sound too outlandish that you could apply the same AI to conditional forecasts, or to questions about the past and present (eg whether COVID was a lab leak). I would be most excited if at some point this graduated from its geopolitical focus and was able to answer questions on any topic (eg “what is the chance that Astral Codex Ten gains paid subscribers this year?”), maybe if the questioner gives it links or feeds it some of the appropriate information. 
FutureSearch is run by a team formerly from Metaculus, including former Metaculus CTO (and Google internal prediction market veteran) Dan Schwarz. They’re looking for potential clients and/or investors; if you’re interested, email hello@futuresearch.ai. Vitalik On AI Prediction Markets Vitalik Buterin, Ethereum-founder-turned-cryptocurrency-public-intellectual, has a blog post on The Promise And Challenge Of Crypto + AI Applications. One of them is a prediction market. He writes: Prediction markets have been a holy grail of epistemics technology for a long time; I was excited about using prediction markets as an input for governance ("futarchy") back in 2014, and played around with them extensively in the last election as well as more recently. But so far prediction markets have not taken off too much in practice, and there is a series of commonly given reasons why: the largest participants are often irrational, people with the right knowledge are not willing to take the time and bet unless a lot of money is involved, markets are often thin, etc. One response to this is to point to ongoing UX improvements in Polymarket or other new prediction markets, and hope that they will succeed where previous iterations have failed. After all, the story goes, people are willing to bet tens of billions on sports, so why wouldn't people throw in enough money betting on US elections or LK99 that it starts to make sense for the serious players to start coming in? But this argument must contend with the fact that, well, previous iterations have failed to get to this level of scale (at least compared to their proponents' dreams), and so it seems like you need something new to make prediction markets succeed. And so a different response is to point to one specific feature of prediction market ecosystems that we can expect to see in the 2020s that we did not see in the 2010s: the possibility of ubiquitous participation by AIs. 
AIs are willing to work for less than $1 per hour, and have the knowledge of an encyclopedia - and if that's not enough, they can even be integrated with real-time web search capability. If you make a market, and put up a liquidity subsidy of $50, humans will not care enough to bid, but thousands of AIs will easily swarm all over the question and make the best guess they can. The incentive to do a good job on any one question may be tiny, but the incentive to make an AI that makes good predictions in general may be in the millions. Note that potentially, you don't even need the humans to adjudicate most questions: you can use a multi-round dispute system similar to Augur or Kleros, where AIs would also be the ones participating in earlier rounds. Humans would only need to respond in those few cases where a series of escalations have taken place and large amounts of money have been committed by both sides. This is a powerful primitive, because once a "prediction market" can be made to work on such a microscopic scale, you can reuse the "prediction market" primitive for many other kinds of questions: Is this social media post acceptable under [terms of use]?
Metaculus asks the same question and forecasts that AI will be able to make feature films by 2030: The dumbest possible way to do this is to ask GPT-4 to write a summary (“write the summary of a plot for a detective mystery story”), then ask it to convert the summary into a 100-point outline, then convert that into 100 minutes of a 100-minute movie, then ask Sora to generate each one-minute block. This wouldn’t work as written now (I don’t think Sora can do sound, it wouldn’t keep actors and style consistent unless you forced it), but it seems like something that requires incremental improvement rather than a grand breakthrough.
Then I released the list of 3300 x 50 guesses, and asked people to analyze them with the aggregation algorithm of their choice to produce what they thought was the best possible list. 460 of you took me up on that (“Full Mode”). Then I waited until 2024 and sent everything to Eric Neyman, who’s better at math than I am. He used the Metaculus scoring function to assess everyone’s accuracy. Thanks to Eric (and to Sam Marks, who helped last time around) for taking care of this. II. And The Winners Are . . . For Blind Mode - where you had to rely on your wits alone and couldn’t spend more than five minutes per question - the winners are: Small Singapore gave me no information except this pseudonym and won’t answer any emails. I don’t even know how to give them their prize money. Please email me at scott@slatestarcodex.com if this is you.
Leonard B. lives in Oregon, and works in real estate development and asset management. He started forecasting during the pandemic, has qualified as a "superforecaster" since 2022, and has recently been doing some work at the Swift Centre For Applied Forecasting. He's "lbiii" on various forecasting platforms (especially Metaculus) and says "I like to hear about cool projects to get involved in, and am especially keen to connect with folks who are working to make forecasting more visible and decision-relevant to policymakers - reach out to possiblylenny@gmail.com"
Andrey S is a psychologist in Israel with a background in computer science. He started forecasting on Metaculus a few years ago, and describes himself as "always interested in learning and expanding my point of view".
Then they fine-tune the whole system on forecasting questions from prediction sites (eg Metaculus, Manifold) that ended between mid-2023 and today. Why mid-2023? Because the AI was trained in mid-2023 and only knows what happened before then, and they can artificially limit its news API calls to before mid-2023. This lets them train the AI on thousands of forecasting questions without letting the AI cheat or having to wait years for the questions to resolve. They select the reasoning where the AI does well, and fine-tune it to do more stuff like that. The Halawi et al AI forecasting method. They find this works almost as well as the human crowd: Are these the data I’ve been trying to get for years - which forecasting platforms beat which others? I don’t think so - Metaculus’ good Brier score only means it performs well on Metaculus’ questions, which might be easier or harder than some other platform’s questions. Can we use the Halawi et al AI as a fixed comparison point, since it’s always the same skill level? I’m not sure - it trained on each of these markets for the style of question that’s in each market, so it might be biased. Still, these numbers are all about where I would expect them to be, except maybe Polymarket, which does better than I would have expected. But the crowd still beats the AI, right? Halawi et al object that humans can forecast only when they feel like it - you can bet on a prediction market question you feel confident on, and avoid one you don’t. When they let their AI forecast only on those questions where it’s most likely to do well (eg those with lots of relevant news articles), it very slightly outperforms the human crowd. As AI gets better, will it naturally beat humans in forecasting? Halawi et al say this won’t be trivial. They find a version of their system based off GPT-3.5 is only very slightly worse than the final version built off GPT-4. This suggests a forecasting AI built off GPT-5 or 6 might get only small improvements.
The second team is Tetlock et al. They start from the same place as Halawi - out-of-the-box LLMs aren’t good at forecasting. They’re more scathing about this than Halawi was - they argue that out-of-the-box models do worse than predicting 50% for everything (this was close to true of human forecasters in the ACX tournament). Instead of increasing quality, Tetlock increases quantity. He wants to do wisdom of crowds, where the crowd is a bunch of different LLMs. So he gets twelve LLMs - including Bard, GPT, Claude, Mistral, PaLM, LLaMa, some Chinese models I’d never heard of, and a couple of variations on these bases - asks them to predict questions, and averages the results. Remember, you gotta prompt your model with “you are a smart person”, or else it won’t be smart! The results: Next, we compare the LLM crowd performance to that of the human crowd for our second hypothesis, directly putting the two crowd-aggregation mechanisms head-to-head. To do this, we use the same LLM crowd average as before (taking the median LLM prediction on each question and averaging up the Brier scores across questions). We compare this to the average of median human predictions on the same questions. In our preregistered analysis, we fail to find statistically significant differences between the LLM crowd’s mean Brier score of M=0.20 (SD=0.12) and that of the human crowd, M=0.19 (SD=0.19), t(60) = 0.19, p = 0.850 Their study was much smaller than Halawi’s (31 questions vs. 3,672), so I don’t think this result (nonsignificant small difference) should be considered different from Halawi’s (significant small difference). Still, it’s weird, isn’t it? Halawi used a really complicated tower of prompts and APIs and fine-tunings, and Tetlock just got more LLMs, and they both did about the same. I have two questions after reading these results: Did they actually do the same, or is this just a function of the small sample size in Tetlock and the non-head-to-head comparison?
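The comparison described in the quoted passage (take the median prediction on each question, then average the Brier scores across questions) can be sketched in a few lines. This is only an illustration, not Tetlock et al's code: the function names and the toy forecasts are made up, and the score is the simple binary Brier score, the squared gap between a probability and a 0/1 outcome.

```python
from statistics import mean, median


def brier(p: float, outcome: int) -> float:
    """Binary Brier score: squared error between a probability and a 0/1 outcome."""
    return (p - outcome) ** 2


def crowd_brier(forecasts: list[list[float]], outcomes: list[int]) -> float:
    """Take the median forecast on each question, then average the Brier scores."""
    medians = [median(per_question) for per_question in forecasts]
    return mean(brier(p, o) for p, o in zip(medians, outcomes))


# Hypothetical data (not from either paper): three questions,
# each forecast by three different models.
toy_forecasts = [[0.9, 0.7, 0.8], [0.2, 0.4, 0.1], [0.6, 0.5, 0.7]]
toy_outcomes = [1, 0, 1]

crowd_score = crowd_brier(toy_forecasts, toy_outcomes)

# Baseline mentioned in the text: predicting 50% on every question.
coin_flip_score = mean(brier(0.5, o) for o in toy_outcomes)
```

Lower is better. Predicting 50% on everything always scores 0.25 on this scale, which is the bar the out-of-the-box models reportedly failed to clear.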
Halawi and Tetlock’s AIs did between slightly-worse-than and equivalent-to the participant aggregate, so let’s say 90-95th percentile. FutureSearch claims to equal a 98th percentile forecaster, but they got this number through totally different and slightly suspicious methodology, so I don’t know if it’s actually any better. Still, we see that Samotsvety is capable of 98th-percentile performance (likely real and repeatable) and Metaculus of 99.5th. So there’s still a long way to go before we exhaust the limits of what’s possible to predict given the available amount of information! Towards Rationality Engines An interlude, before we get to other interesting prediction news. Forecasting AIs are pretty cool. I wouldn’t have expected them to work as well as they do. They are already superforecaster-level, and given the amount of low-hanging fruit that gets picked every day here, I can see them equalling or exceeding the top human forecasters in the next few years. But they can’t answer many of the questions we care about most - questions that aren’t about prediction. Do masks prevent COVID transmission? Was OJ guilty? Did global warming contribute to the California superdrought? What caused the opioid crisis? Is social media bad for children? I see two interesting challenges ahead here: Making an AI that can do this.
Datscilly, a Metaculus Pro Forecaster who was ranked #1 on the Metaculus leaderboard from 2018 to 2021 for baseline accuracy.
4: The hedge fund Bridgewater is running a forecasting contest on Metaculus. US residents only, extra prizes for undergraduates. Prizes include $25,000 and potentially getting recruited by Bridgewater (in which case read the “Corporate Culture” section on their wiki page before accepting).
Even if you don’t want to convince yourself, this is the correct next step. Again by analogy to Tetlock - if he had started with just one superforecaster, and his thesis was “this guy is really smart, but I refuse to prove it”, nothing would have changed. Instead, his theory of change goes through publishing in a bunch of papers, to identifying other superforecasters, to teaching general principles of superforecasting, to superforecasting as a service (either through specific superforecasters at GJO, or through projects that seek to emulate them like Metaculus, FutureSearch, etc). If Rootclaim doesn’t scale, it either dies with Saar, or at best Saar lives a long life and puts out a few more dozen Rootclaim analyses but nothing else comes of it. You’ve got to start training other people eventually, and part of that process involves demonstrating you did it right, and that’s going to involve inter-rater reliability.
People changed their minds a little over time, but not in a very consistent way that mattered much in the end. What was the “client feedback”? The report says: Client feedback was provided to the Superforecasters on December 21. The client posed questions to the Superforecasters about their assessments up to that date and asked for their reactions to several studies and articles. In the days following the client engagement, the Superforecasters lowered their confidence in the natural zoonosis hypothesis from 73% to 67%, although zoonosis remained the most likely potential cause in their assessment. But following an active engagement with recent genomic studies and historical base rates of zoonotic spillovers, those numbers began to return to earlier levels. January also saw increased attention to the geopolitical context and transparency issues, particularly related to research activities in Wuhan. Is this bad? I’m imagining a pro-lab-leak client saying “But what about [this list of pro-lab-leak arguments]?” and then the superforecasters read them and adjust. In one sense, it’s good that they got to see more arguments; on the other, it seems like a potential route by which clients could bias the results - probabilities never quite got back to where they were before the feedback, though they got pretty close. The last-minute spike for zoonosis might be the Rootclaim debate results, which were released on 2/18. So maybe the client feedback and the Rootclaim results both slightly affected the numbers, but mostly the superforecasters started out pro-zoonosis and stuck to their guns. Dan Schwarz and the FutureSearch team say that forecasting has a “rationale-shaped hole”.
Despite the report making this sound like a pretty intense process, we don’t get much information about details: In their extensive discussions, Good Judgment’s Superforecasters assessed base rates and historical patterns, existing evidence and scientific analysis, geopolitical context and transparency concerns, trust in intelligence communities, and methodological constraints. 1. Base Rates and Historical Patterns: The Superforecasters frequently referenced base rates, i.e., the history of pandemics emerging from natural zoonosis versus the history of laboratory leaks, to anchor their probabilities. For the former, they discussed how the base rates are changing as the climate warms and as expanding human populations push farther into natural environments that previously saw little human presence. For the latter, they acknowledged that it has only been 12 years since the advent of CRISPR gene-editing tools, and the base rate of lab leaks in the short synthetic biology era is not yet well established. 2. New Evidence and Scientific Analysis: Throughout the period, the Superforecasters adapted their forecasts in light of new scientific evidence, including genomic analyses of SARS-CoV-2 and its relation to bat viruses, and the debate over potential laboratory manipulation. 3. Geopolitical Context and Transparency Concerns: The geopolitical implications of the virus’s origins, particularly in relation to China’s transparency and the involvement of international research institutions, played a significant role in the analysis. Concerns over data veracity, and over the political ramifications of determining that the pandemic’s origins were other than zoonosis, were extensively debated. 4. Trust in Intelligence: Commentary on trust in intelligence communities and discussions about the impact of geopolitical biases on the interpretation of evidence illustrated the complex interplay between science, politics, and human behavior in assessing the pandemic’s origins. 5.
Methodological Critiques and the Evaluation of Evidence: The Superforecasters engaged in methodological critiques of the evidence base, including the scrutiny of laboratory practices and biocontainment levels [...] In the end, most Superforecasters were in rough agreement on issues like the base rates of zoonotic spillover. Where they most often disagreed was on the interpretation of actions by Chinese officials and whether their actions reflected how an authoritarian government would react in any crisis over which it did not have full control, or whether those actions were indicative of attempts to cover up a biomedical research-related accident that allowed the SARS-CoV-2 virus to enter circulation in China and, ultimately, the entire globe. Probably it would be too much to ask for to get a transcript of all their discussions - then they’d be nervous saying things that might make them look bad to an audience. What would be a good balance between getting more information and not imposing on their time? Forecasting is an unusually legible and easy-to-judge domain. One of the theories of change for forecasting was to use it to identify smart people with good reasoning, then turn them loose on less well-behaved problems. This is one of the first big attempts to do this at scale. How did it work? We can’t tell, because it’s inherently an illegible and hard-to-judge domain. Darn. I don’t know what I expected. Notes From A Local Optimum Austin’s concern - that forecasting has reached a local optimum - is widely shared. We have some good sites: Manifold, Metaculus, Polymarket, GJO, etc - all doing good work. We have good-ish probabilities for a few important questions. Every so often a news source cites them. Sometimes a decision-maker looks at them behind the scenes, maybe. Is this all there is? The FutureSearch team says the next step is to focus on “rationale”. 
We need to use forecasting not just to get a raw probability, but to explain what’s going on and why we think something. Then instead of just convincing policy-makers to trust forecasts, we can tell them why something is true, or inform their discussions even if they’re not willing to blindly trust a number. Is this a betrayal of the forecasting ethos? The original dream was that instead of a bunch of people giving arguments, we could just test who was right. Now we’re going back to the arguments? People have argued forever; what does forecasting add to that? Well, they add the knowledge that the arguments are from people who have been right a lot before and are incentivized to be right again. Still, it’s not a natural fit. Probably it’s relevant here that FutureSearch’s forecasting AI does a really good job of this by default, in a way humans can’t match. Nuno’s yearly forecasting roundup doesn’t have a single thesis, but the first part is a well-supported complaint that most forecasting sites aren’t good business. They either burn VC money, burn EA donations, or converge towards casinos to support themselves. He gives an honorable exception to Cultivate Labs, which sells prediction market software rather than the results themselves. Open Philanthropy (billionaire Dustin Moskovitz’s EA-aligned charitable foundation) has at least given forecasting a vote of confidence, recently choosing to promote it to one of their main donation areas. Still, they got a lot of pushback on the decision, for example SuperDuperForecasting here: This will be a total waste of time and money unless OpenPhil actually pushes the people it funds towards achieving real-world impact. The typical pattern in the past has been to launch yet another forecasting tournament to try to find better forecasts and forecasters. No one cares, we already know how to do this since at least 2012! The unsolved problem is translating the research into real-world impact. 
Does the Forecasting Research Institute have any actual commercial paying clients? What is Metaculus's revenue from actual clients rather than grants? Who are they working with and where is the evidence that they are helping high-stakes decision makers improve their thought processes? Incidentally, I note that forecasting is not actually successful even within EA at changing anything: superforecasters are generally far more relaxed about Xrisk than the median EA, but has this made any kind of difference to how EA spends its money? It seems very unlikely. And Marcus Abramovich here: I'm in the process of writing up my thoughts on forecasting in general and particularly EA's reverence for forecasting but I feel, similar to @Grayden that forecasting is a game that is nearly perfectly designed to distract EAs from useful things. It's a combination of winning, being right when others are wrong and seemingly useful, all wrapped into a fun game. I'd like to see tangible benefits to more broad funding of forecasting that seems to be done in the millions and tens of millions of dollars. I would also be the type of person you would think would be a greater fan of forecasting. I'm the number one forecaster on Manifold and I've made tens of thousands of dollars on Polymarket. But I think we should start to think of forecasting as more of a game that EAs like to play, something like Magic the Gathering that is fun and has some relations to useful things but isn't really useful by itself. Eli Lifland has a long and hard-to-summarize comment here, response from Ozzie Gooen here, podcast between them on “Is Forecasting A Promising EA Cause Area?” here. I’m split on this. My previous hope was that the field would gradually grow, without any qualitative changes or discontinuities, until it became big enough that journalists and policy-makers were aware of it and took it seriously (compare eg the growth of the Internet as a scholarly resource).
I think the strongest argument against this is Manifold’s relatively flat user numbers. Is there a new hope? I think if nothing else, forecasting might be useful as a testing ground: First, to create forecasting AIs (like FutureSearch) which can then get consulted on a variety of questions, eg by policy-makers. The biggest holdup has always been the need to gather 20 or 50 or however many hard-to-find superforecasters for whatever question you’re asking, and then trust their advice even though they’re fallible fleshbag humans. If you can use the 20 to 50 superforecasters to inspire an AI, and then test the AI and prove it’s good, people might be more interested. This is especially true if the AI can branch out beyond traditional forecasting questions. Once we have a few of these, we can start comparing the next generation of AIs to the previous generation, and skip the superforecasters.
This is a response to the predictions I made in my update on the Lumina probiotic. You can click “see three more answers” for the question on side effects (separate from this question on efficacy). My numbers were 5/35/10/50 for the first question and 30/5/<1 for the second. Huh?
The New York case is going on now, and it seems like there’s an 80% chance he’ll be found guilty. The part I don’t understand is the last one (73% found guilty of felony in New York) vs. the second one (56% of any felony at all). This might just be a failure of arbitrage. It looks like nobody expects jail time in any case. Here’s an embarrassing screwup from Metaculus. This question was about when there would be a “Great Power war”, with Great Powers defined as any country in the top ten of military spending. But surprise surprise, Ukraine getting invaded made them spend a lot of money on their military that year, so they rose to #8 in the world in military spending in 2023. Since Russia is also in the top ten, this qualifies as a “Great Power war” by the technical definition, and the question resolves positive. Moral of the story: resolution criteria are hard!
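The "failure of arbitrage" point can be made concrete. A felony conviction in New York is one way of being convicted of "any felony at all", so a coherent pair of markets should never price the narrower event above the broader one. The helper below is a hypothetical sketch (its name and signature are invented for illustration) that measures how far a pair of prices breaks that rule, using the two figures quoted above.

```python
def coherence_gap(p_specific: float, p_general: float) -> float:
    """Size of the violation of P(specific) <= P(general); 0.0 if the pair is consistent.

    `p_specific` is the probability of the narrower event (a subset of the
    broader one), `p_general` the probability of the broader event.
    """
    return max(0.0, p_specific - p_general)


# Figures quoted in the text: 73% for a New York felony conviction,
# 56% for a conviction on any felony at all.
gap = coherence_gap(p_specific=0.73, p_general=0.56)
```

A positive gap is the spread an arbitrageur could in principle collect by betting against the narrow question and for the broad one, if both markets were liquid, used the same resolution criteria, and allowed bets in both directions.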
I assume they chose these three because they’re the only ones discussed enough to have enough data. I am following their lead. I appreciate John and Maxim’s work, but I’m not completely comfortable trusting it. Their model is based on results from Betfair, Smarkets, PredictIt, and Polymarket. But I don’t know much about the first two (as an American, I’m banned from even reading Betfair), and the latter two are notoriously bad at partisan political questions. They usually overestimate Republicans’ chances, partly because Democrats’ opposition to online political betting has turned the pool of online political bettors disproportionately red. While a fluid and easily-accessible prediction market should be able to avoid biases like these, neither PredictIt nor Polymarket really qualifies. The CFTC, which regulates prediction markets, has crippled both - PredictIt has very low maximum investments per market, and Polymarket is crypto-only and banned for US citizens. These have prevented their biases from being corrected and made both of them perform relatively weakly in head-to-head contests. And Stossel/Lott’s focus on betting sites automatically excludes two of the biggest and most historically accurate forecasting engines from their calculation - Metaculus and Manifold. In order to get numbers I trusted more than theirs, I looked at Metaculus, Manifold, PredictIt, and Polymarket, weighting each by how much I trusted it. Here’s what I found: The Biden number is about 4% higher than Nate Silver’s model over the same time period; see below for why that might be. [EDIT 7/2/24: Original version had a miscalculation which decreased everyone’s odds by about 10%. Above version should be correct.] You can find my sources at the bottom of the post. 
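For readers who want to try the same kind of blend, "weighting each by how much I trusted it" amounts to a weighted average. The sketch below is an assumption about the arithmetic, not the post's actual calculation: the probabilities and trust weights are invented placeholders, and the function name is hypothetical.

```python
def weighted_average(probabilities: dict[str, float], weights: dict[str, float]) -> float:
    """Blend per-platform probabilities, weighting each platform by a trust score."""
    total_weight = sum(weights[name] for name in probabilities)
    if total_weight <= 0:
        raise ValueError("trust weights must sum to a positive number")
    return sum(probabilities[name] * weights[name] for name in probabilities) / total_weight


# Placeholder numbers for illustration only; these are NOT the post's
# actual market readings or the author's actual weights.
market_probs = {"Metaculus": 0.40, "Manifold": 0.44, "PredictIt": 0.36, "Polymarket": 0.34}
trust_weights = {"Metaculus": 3, "Manifold": 3, "PredictIt": 1, "Polymarket": 1}

blended = weighted_average(market_probs, trust_weights)
```

Only the ratio between weights matters, so doubling every weight leaves the blended number unchanged.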
“Explicit” odds are based on questions like “What are the chances of Biden winning if he is the nominee?” “Implied” odds were generated by combining the questions “What is the chance of Biden being the nominee?” and “What is the chance of Biden winning?”; this is safe enough with Biden, but with unlikely nominees like Newsom, some of the percentages can get small enough that they start running into small-number-biases and become less trustworthy. I’ve weighted each market’s explicit calculation higher than their implicit one to compensate. A possible objection to these results: conditional probabilities don’t exactly reflect the intuitive concept of decision-making. That is, we’re not asking “We want to know whether or not to keep Biden, so what are the chances that he’ll win if we do?”, we’re asking the market for the chance that he’ll win, in the set of worlds where people decide to keep him for other reasons. We should expect this to overestimate his performance. That is, imagine that tomorrow, Biden has completely recovered, he easily wins his next debate with Trump, and everyone agrees the most recent debate was just a fluke - in that world, he is both more likely to be nominated and more likely to win. Alternatively, if tomorrow he gets much worse and can’t even speak in full sentences, he’s much less likely to be nominated and much more likely to lose. Since the real world includes both those possibilities, restricting ourselves to the set of worlds where he gets nominated means we’re overestimating the chance that he wins. There are similar-albeit-less-severe problems with other candidates - if we choose Newsom, that might be because he won some kind of debate or process versus Harris and all the other potential replacements. Overall I expect this to be mostly correct, but probably overestimate Biden’s chances by a percent or two relative to others. 
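The "implied" calculation above can be sketched as a ratio, assuming that a candidate can only win if nominated, so that P(wins) is the same as P(nominated and wins). The function name and all prices below are hypothetical. The second half of the sketch shows why small percentages are less trustworthy: the same one-point pricing error moves a long shot's implied odds far more than a favorite's.

```python
def implied_conditional(p_wins, p_nominee):
    """Implied P(wins | nominee) = P(wins) / P(nominee).

    Assumes winning requires being nominated. Capped at 1.0 because
    noisy market prices can otherwise produce an impossible ratio.
    """
    if not 0 < p_nominee <= 1:
        raise ValueError("p_nominee must be in (0, 1]")
    return min(p_wins / p_nominee, 1.0)


# Hypothetical prices for a likely nominee and for a long shot.
likely = implied_conditional(p_wins=0.28, p_nominee=0.70)
longshot = implied_conditional(p_wins=0.02, p_nominee=0.04)

# Same one-percentage-point error in the "wins" market for each.
likely_off = implied_conditional(p_wins=0.29, p_nominee=0.70)
longshot_off = implied_conditional(p_wins=0.03, p_nominee=0.04)

print(f"likely nominee: {likely:.3f} -> {likely_off:.3f}")
print(f"long shot:      {longshot:.3f} -> {longshot_off:.3f}")
```

This only illustrates the small-number sensitivity. It does not address the separate selection effect described above, where conditioning on nomination picks out worlds in which the candidate is already doing well.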
Along with these three candidates, Metaculus had an explicit “should the Democrats replace Biden?” question: Manifold also asks how Democrats will do if they replace Biden (without specifying a particular replacement): We can compare this to their Biden market… …and find that once again, they expect replacing Biden to go better (though I think 51% is just cope). At the Manifest prediction market conference in early June, I interviewed Nate Silver: …and asked him for his probability that the Democrats would win this election, versus his probability that the Democrats would win conditional on Biden not being the nominee (specifically “drops dead tomorrow of natural causes”). He said 40-45% chance normally, 50% chance without Biden. This was before the debate, but I think it matches the markets’ opinion that switching candidates would help the Democrats’ chances - and this has only become more true since the debate. On the other hand, polls asking people how they would vote in possible matchups don’t show any advantage of alternate candidates over Biden. Here’s the only post-debate poll I could find: And if Biden does need to be replaced, Democrats mostly support Harris, who the prediction markets find least promising: Maybe Democrats are the wrong people to ask - they’re already going to vote Biden, so you want someone who’s more attractive to independents. Of course, in a normal primary it would be Democrats making the decision. But if elites are going to do something behind closed doors, maybe they should take advantage and choose the candidate most likely to win, for once. I think these polls are the strongest objection to the prediction markets’ verdict. You could make an argument where prediction market users are mostly educated liberal white males, and even though they’re incentivized to honestly determine what ordinary people think, they’re too out-of-touch with ordinary people to do so effectively. 
Or they might be over-fixating on “voters don’t like Biden’s senility” without considering that, even if voters didn’t know Biden was currently senile before Thursday, they probably guessed that he would become senile sometime in his four-year term, and had basically accepted that his aides would do the hard work. Maybe they prefer a well-known likeable incumbent over an unknown quantity (and the unknown quantity’s potential new/weird aides), even if the well-known likeable incumbent is senile. Maybe elites know more than we do about how hard it is to inject a new candidate at the last moment, how dangerous it is to have someone who hasn’t been thoroughly vetted for scandals, et cetera. Still, for now I trust the prediction markets. I think replacing Biden would add ~10 percentage points to the Democrats’ chance of victory. At the end of this post, I’ll list the prediction markets I’m using as sources. But before then, a brief interlude of:

Fuzzy Subjective Human Factors I Am Not Really Qualified To Talk About

Many people on Twitter are asking “how could anyone possibly have been stupid enough to not realize that Biden was senile?” I was that stupid. I didn’t say it openly, because I’m at least smart enough to have a high threshold for giving my opinion on political things I don’t know much about. But I thought it in my heart. So in case the people asking “how could anyone have been that stupid?” actually want an explanation, here’s my former reasoning. Republicans have been accusing Biden of being senile (and the Democrats of hiding it) for at least five years now. Before the 2020 debates, they were excited that this was when they could finally prove once and for all that Biden was senile. Then Biden did fine, and they retreated to “well he’s senile but they have some secret drug they’re giving him, just during debates, that makes him look fine”.
Notice this is from 2020; according to polls, he did win the debate that year (source) I think a lot about experimental cognitive enhancement drugs, and I can say with confidence that nothing like that exists. Stimulants can help people with mild dementia be more active and motivated, but they don’t really improve cognition directly, and they can’t make a demented person temporarily lucid. Still, for the past four years, every time Biden was going to do something - a press conference, a State of the Union, whatever - the Republicans would say “ha, this time is going to be the proof that he’s senile!” And then he would always do fine, and they would retreat back to “I guess he used the secret drug this time too”. The satire site Babylon Bee had some funny articles about this: Babylon Bee, after Biden gave a good State of the Union speech earlier this year. Meanwhile, the Democrats were spreading the alternate narrative that Trump was senile. This one has gotten less press, because I don’t know how many people really believed it. But it came up occasionally, along with out-of-context video snippets where Trump said or did something dumb or meandering. Of course, anybody with a presidential candidate’s level of public exposure will have a few gaffes. Even if they don’t, you can always deceptively crop something so it looks like they did. Wait, why is a psychoanalyst getting quoted as a top expert in dementia? (source) I didn’t know you could diagnose someone via Change.org petition, but 2544 people who claim to be licensed professionals can’t be wrong! So with the constant attempts to prove that both candidates were senile, the constant demonstration by both candidates that they weren’t, and the constant retreat into conspiracy theories of “I guess he used the magic drug again but we’ll get him next time!”, I just tuned out this entire category of thing. And I guess I kept it tuned out longer than I should have, whoops. Reversed stupidity is not intelligence. 
Even if liars are saying something for their usual liar reasons, it can still be true. For twenty years, people spread false rumors that Castro was on his deathbed, but this didn’t make Castro immortal. In the same way, I should have figured out that even if I couldn’t trust any particular claim that Biden was senile, the prior for an 81 year old becoming senile was still high. But I guess I assumed that if he was becoming senile, some Democratic elites would have secret knowledge about it, and they couldn’t possibly be so stupid as to deny it while also scheduling him for a debate where it would inevitably come out. So I figured the Democratic elites who were closest to him thought he was doing well, and I trusted them more than the people who had been wrong every time for the past five years. I’m still confused what those elites were thinking. Reading the news coverage for the past few days (including some video clips from a post-debate rally where he seemed noticeably better) it seems like some combination of: He has good days and bad days, and they were hoping this would be a good day.
35: Metaculus is running an AI bot forecasting tournament. You write the bot, they provide the questions, and the best bot wins a $30,000 prize. Learn more here.
Mike Hawke points out that despite the new legislation promoting nuclear power, Metaculus’ forecast of US nuclear power in 2050 hasn’t budged.
FiveThirtyNine (ha ha) is a new forecasting AI that purports to be “superintelligent”, ie able to beat basically all human forecasters. In fact, its creators go further than that: they say it beats Metaculus, a site which aggregates the estimates of hundreds of forecasters to generate estimates more accurate than any of them. You can read the announcement here and play with the model itself here.
The basic structure is the same as past forecasting AIs like FutureSearch. A heavily-modified copy of ChatGPT gathers relevant news articles, then prompts itself to think in superforecaster-like ways. The creators say the ChatGPT copy had a knowledge cutoff of October 2023, so they tested it on Metaculus questions from after that date. It got 87.7% accuracy, slightly above Metaculus forecasters’ 87.0%. Manifold is skeptical: The commenters, especially Neel Nanda, found that doing knowledge cutoffs properly is hard, and the ChatGPT base seems to know about news events after October 2023 - upon questioning, it seemed aware of an earthquake in November 2023. When presented with a different set of questions that were all after November 2023, FiveThirtyNine substantially underperformed the Metaculus average. But also, my attempts to play around with the bot haven’t been encouraging: I asked it to predict the chance that Prospera would have a population of at least 1,000 in 2027. Like FutureSearch on the same question, it cited many interesting news articles on Prospera’s chances but failed to do the basic step of figuring out its current population and growth rate. It eventually concluded 35% chance, which is reasonable enough. But when asked whether Prospera would have a population of 100,000 in 2028, it also said 35% chance, which is absurd.
A Twitter user pointed out (and I confirmed) that upon being asked “What is the probability that Joe Biden is still President in October 2025?”, it goes through a lot of reasoning about his age and dementia and finally concludes 55% because he’s not that demented. I originally thought this might be due to the knowledge cutoff (it doesn’t know Biden dropped out in favor of Harris), but if I ask the AI about October 2029, then it says that Joe Biden has dropped out in favor of Harris (even though in that question it doesn’t matter). So now I think it’s more like ChatGPT’s tendency to round anything that sounds vaguely like the surgeon riddle off to the surgeon riddle - in the same way, FiveThirtyNine rounds off anything that sounds vaguely like the popular question “is Biden too old and demented to stay president?” into that question, even though there are much stronger non-dementia-related reasons he can’t be president next year. The FutureSearch team wrote a LessWrong post generalizing these kinds of observations, Contra Papers Claiming Superhuman AI Forecasting. They examine four claims, including the one above, and find similar problems with all of them. Sometimes the teams involved missed potential data contamination (ie their LLM wasn’t forecasting, it just already knew the answers). Other times the LLM failed but - in the spirit of technologists everywhere - the researchers invented finicky definitions of “above human level” by which even mediocre AIs qualified. They conclude: Today's autonomous AI forecasting can be better than average, or even experienced, human forecasters…but it's very unlikely that any autonomous AI forecaster yet built is close to the accuracy of a top 2% Metaculus forecaster, or the crowd. Still, FiveThirtyNine is a big advance in at least one way: as far as I know, it’s the first high-quality AI forecaster which is free to the general public. Try it out! 
This means there’s still time to use this joke when they invent the actually good one!

r/MarkMyWords

This is a subreddit for people who want to record bold predictions. There’s nothing formal - nobody gives probabilities, and some of them don’t even have end dates. It’s just people going out on a limb to say they’re sure something will happen.

…most of them are “mark my words, time will prove Democrats right about everything, and reveal Republicans to be disgusting criminal hypocrites”.

…so much so that it kind of fails as a potentially interesting institution and becomes just another monument to how sad the Internet’s gotten. Still, it might be fun to keep going until you find an old post where the prediction has already “resolved”, and see what happens. Here are some of the highest-upvoted posts from at least a year ago (minus pop culture and dumb in-jokes):

MMW: It will turn out the Notre Dame fire was actually arson, and not an “accident” as the Paris police initially claimed.
Yet in the end, everything is so perfectly balanced that the sum total of these luminaries refuse to say which side of even we’re on. The nation balances on a knife’s edge. Eli Lilly stock moons. A red sun hangs over Philadelphia, where American democracy began and may yet end. A man walks into a diner just before closing time. He looks like a good tipper. The waitress was hoping to leave early and go vote. She decides against. Seven trumpets sound; seven seals are opened; there is silence in Heaven for the space of about half an hour. As George RR Martin put it, “God flips a coin and the world holds its breath.” Tomorrow - if we are so lucky - there will be a result. The great function that has consumed us for so long will return 0 or 1. The pundits who guessed 51-49 will be hailed as prophets; the pundits who guessed 49-51 will get bullied out of public life. The winner’s campaign operatives will be praised as world-historic geniuses, the loser’s mocked forever as utter nincompoops. Thousands of lifelong public servants who backed Mr. 49% will be tossed from DC like used toilet paper and replaced with thousands of hacks who backed Mr. 51%. Funding streams will go dry. Whole lands will turn to economic deserts. Fortunes will be destroyed. A few people will make good on their exile and suicide threats. Most won’t. The Union will either survive or not. If it survives, we’ll do it all over again four years later. A red sun sets over DC. The marble monuments are stained crimson; the statues of Lincoln and Jefferson and the rest look like they writhe in hellfire. The people seclude themselves in their houses. A city where even the Christians are atheist kneels in prayer. On some level, they know - we know - it was never just about choosing a leader. It was all for this - the same urge that drove the games of the Colosseum and sacrifices of Tenochtitlan. The need for a single moment of unconditioned reality. 
For one evening, the people of the richest and most secure nation in history, fat off the spoils of six continents, will know the same fear as the starving Catalhuyuk farmer, staring at the sky, wondering if the rains will come. For one evening, everyone - rich or poor, religious or secular, Democrat or Republican - will join in the prayer of the poet:

“Judge of the Nations, spare us yet
Lest we forget - lest we forget!”

Don’t Blame Me, I Voted For Kodos

Metaculus uses experimental “conditional forecasts” to determine the consequences of a Trump/Harris victory. How it works (example): you set up two forecasts: If Trump wins, will China invade Taiwan?
Iranian nukes more likely under Trump (49.5%) than Harris (45%)

All of these involve foreign policy going worse under Trump than Harris. Is this unfair? Even Trump’s supporters would agree he is less interested in funding Ukrainian resistance than Harris; Metaculus’ numbers here seem in line with this. Harris is more likely to continue deals where Iran gets sanctions relief / money in exchange for not going nuclear. Whether or not you agree with Trump that those deals are extortionary and unfair, it makes sense that Iran is more likely to go nuclear if those deals are discontinued. But this is also a small effect and could be noise. The Taiwan numbers are the least convincing, but seem to be based off of arguments like the ones here: Trump has said that Taiwan should “pay for” defense, and generally been skeptical of foreign entanglements. And here’s Manifold’s version of the same thing:

Polymarket’s Wild Ride

On October 14th, Polymarket gave Donald Trump 54% odds of winning, compared to Nate Silver’s 49% and Metaculus’ 45%. Whatever, everyone knows Polymarket has a small right-wing bias, and 5% isn’t too bad. Three days later, it had risen from 54% to 61%, despite no news and no change for Metaculus or Nate, bringing the Polymarket/Silver spread to an unprecedented 11%. What happened? This is the rare prediction market story where the answers are already in the New York Times and the Wall Street Journal: one really rich guy put $30 million on Trump (a recent followup by Jorge Velez claims it’s actually more like $75 million). Although he prefers to remain anonymous, reporters have talked to him and are able to reveal that he’s French, goes by “Theo”, is a former banker, and has no insider connections. He’s just a normal rich guy who really thinks Trump will win. This is exactly the sort of shock that prediction markets are supposed to be resilient against.
Instead, the market stayed at 61% for days, swung even higher for a while, finally fell back down two weeks later, then went back up again. What happened? The simplest story would be insufficient liquidity: there just weren’t enough people to gather the $75 million it would take to bet against Theo. This is superficially plausible: Polymarket requires crypto and bans Americans, so the mispricing couldn’t be corrected until enough crypto-literate, American-election-following foreigners showed up to bet $75 million. That’s a tall order, and maybe it took two weeks. But the simple story seems wrong. Other real-money markets rose approximately in tandem with Polymarket. For example, Smarkets got to Trump 59% on 10/16, and peaked at 64% on 10/30. Kalshi followed a similar path. Both tracked Polymarket, not Nate Silver or Metaculus (neither of whom ever went above Trump 55% since Harris joined the race). So I think the remaining stories are: Theo made his giant bet on Polymarket. By coincidence, at the same time, bettors everywhere massively overcounted a few good polls for Trump and started a feeding frenzy on pro-Trump shares. This made all other markets gain, and Polymarket stay at its Theo-caused peak, until a few bad polls for Trump brought everyone back to reality last week.
This is equivalent to the implicit argument between Polymarket and a group of other forecasting sites, especially Metaculus.
Just before the election, Polymarket and other real-money prediction markets said Trump had a 60% chance of winning. Metaculus and other non-money forecasting sites said he had a 50% chance of winning.
Then Trump won. Should this increase your trust in Polymarket rather than Metaculus? Only by the tiniest of amounts. If you previously thought (like I did) that there was a 90% chance that Metaculus was more accurate, you should update down to 88%.
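The 90% to 88% update can be checked with a short Bayes' rule calculation. This is a sketch under a simplifying assumption that the excerpt does not spell out: exactly one of the two sites is "the accurate one", and the probability of a Trump win under each hypothesis is simply that site's forecast (50% for Metaculus, 60% for Polymarket). The function and parameter names are hypothetical.

```python
def posterior_first_site_better(prior, p_outcome_first, p_outcome_second):
    """Bayes' rule over two hypotheses: 'first site is right' vs 'second site is right'.

    prior            -- prior probability that the first site is the accurate one
    p_outcome_first  -- probability the first site gave to the outcome that happened
    p_outcome_second -- probability the second site gave to that same outcome
    """
    numerator = prior * p_outcome_first
    denominator = numerator + (1 - prior) * p_outcome_second
    return numerator / denominator


# Numbers from the text: 90% prior for Metaculus, forecasts of 50% and 60%.
posterior = posterior_first_site_better(prior=0.90, p_outcome_first=0.50, p_outcome_second=0.60)
print(f"posterior that Metaculus is more accurate: {posterior:.3f}")
```

The update is small because the likelihood ratio is only 0.6 / 0.5 = 1.2; a single binary outcome barely distinguishes a 50% forecast from a 60% one.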
From “Genesis and pathogenesis of the 1918 pandemic H1N1 influenza A virus”, linked above. You may recognize the lead author - Michael Worobey has also been a leading voice on the zoonotic side of the COVID origins debate.

The recent history of the flu, as far as I can tell, is:

- 1918: An H1N1 flu (“Spanish flu”) jumped from birds to humans in America and killed 50 million people worldwide. This replaced all older strains, so most seasonal flus during this era were H1N1.
- 1957: An H2N2 flu (“Asian flu”) crossed from birds to humans in China, and killed about 2 million people worldwide. It replaced the H1N1 strain, so most seasonal flus during this era were H2N2.
- 1968: An H3N2 flu (“Hong Kong flu”) crossed from pigs (?) to humans in Hong Kong, and killed another 2 million people worldwide. It replaced the H2N2 strain, so most seasonal flus during this era were H3N2.
- 1977: An H1N1 flu (“Russian flu”) leaked from a biology lab (?) in Russia (it might have been a strain from the 1940s, which the Russians were trying to make a vaccine for). It didn’t kill that many people, but it stuck around, and from then on, seasonal flus could be either H3N2 or H1N1.
- 2009: An H1N1 flu (“Mexican flu” until the PC police stepped in; afterwards “swine flu”) took some horrible circuitous route between birds and pigs and back again, crossed over into humans in Mexico, and killed 200,000 people. It outcompeted older strains of H1N1, but couldn’t crowd out H3N2, so seasonal flus are still either H3N2 or H1N1.

…which brings us to the present, hopefully illuminating why “new flu strain crosses over from animals into humans” is such an “uh oh” moment.

The Bird Flu

Technically, all pandemic flus start as bird flus. Influenza A evolved in birds. Sometimes it spreads to other animals, including pigs, cattle, and humans.
The most common way for a bird flu to spread to humans is to “reassort” (not exactly virus sex, but close enough, and the real version is less memorable) with a human flu virus (ie one that has already crossed over to humans). The resulting virus has all of the human flu virus’ human adaptations, but borrows enough new antigens from the bird virus to evade the immune system. Pigs can be infected by both human and bird viruses, so they are a common place for this reassortment to take place. If reassortment is sort of like viral sex, pigs are sort of like Tinder. When a bird flu and human flu reassort in pigs, the resulting disease is called a swine flu. At least the 2009 flu pandemic was a swine flu, and a minority opinion thinks the 1918 pandemic was too. There aren’t major epidemiological differences between direct-from-bird flus and swine flus. H5N1 was first noticed in birds - specifically, a flock of chickens in Scotland in 1959 - after which it disappeared for forty years. In 1996, it showed up in geese in China, then gradually increased its market share among birds worldwide. In 2022, it was found in minks; apparently it had learned to infect mammals. By early 2024, it was seen in cows. Now it’s in cow herds in 16 states, and one of them (California) has declared a state of emergency. And in October, H5N1 was found in pigs for the first time. It’s not uncommon for humans to catch an animal disease. This doesn’t mean the disease has “crossed over” to humans. If the virus isn’t suited to human-to-human transmission, it simply dies off (either before or after killing its human host). Thus, chicken farmers have been reporting scattered H5N1 cases since 1997; now that the virus has spread to cattle, cow farmers have started reporting the same. A Metaculus comment on this topic introduced me to the phrase “biocomputational surface”. 
Every viral replication that takes place in a human gives the virus one more chance to develop the set of mutations that makes it human-transmissible and start the next pandemic. Or, more likely, every viral replication that takes place in a human who has both the H5N1 bird flu and a normal human flu - or in a pig which has both viruses - gives the virus one extra chance to reassort in a way that produces a bird-antigen-fortified human-adapted flu virus. This doesn’t mean H5N1 will definitely become human-transmissible soon. Many viruses hang out on the borders of transmissibility for decades. Some, for unclear reasons, never cross over at all. But all of this is compatible with the virus becoming transmissible soon. So: What Is The Chance Of A Pandemic? The prediction markets on this topic ask a question about “10,000 cases in the United States”. Does this necessarily mean “pandemic”? Might it be possible to get to 10,000 cases just from the scattered chicken and cow farmers, with no human-to-human transmission? Despite many chicken and cow infections this year, there have only been 60 - 70 recorded human cases. Unless there is a phase change in screening methods, it seems hard for this number to increase to 10,000 off farmers alone. I think it’s fair to treat this question as operationalizing “what is the chance of a pandemic”? By this definition, Manifold estimates a 40% chance of an H5N1 pandemic in 2025. Metaculus estimates a 5% chance. You can see below whether that’s changed since I wrote this essay: 5% versus 40% is a big difference! Who do we trust? I trust Metaculus. Metaculus has beaten Manifold in both of the two head-to-head comparisons that I know of (Jeremiah Johnson’s and mine). Manifold’s number swings by a factor of two from week to week; Metaculus has been steady. But also, Metaculus hosts a CDC-sponsored respiratory disease forecasting tournament which has enriched them in epidemiological expertise. 
And if you look at the quality of comments on both sites, it’s pretty obvious where the people with more intellectual chops are hanging out. The Manifold comments are mostly single sentences, or occasionally just links to an article about new cases. The Metaculus comments look more like this one by dimaklenchin: Despite the panic propaganda, H5N1 is unlikely to be "just a single mutation away from switching host preference": 1) It normally takes a lot more than a single mutation to switch hosts. E.g., there are at least five different reasons why SIV (monkey equivalent of HIV) is not infectious to humans. Heck, a variant of SIV that bears HIV's receptor-recognizing surface protein (SHIV) is still not infectious to humans. HIV most certainly evolved from SIV but, almost as certainly, it took a very long time to get there. Not that all viruses are the same and things can't turn out differently with flu, but I don't subscribe to the idea that a mere change of receptor specificity (something that can take 1-2 mutations) will be sufficient. 2) We have data. Lots of human infections with other varieties of bird flu in the past - all those viruses ultimately went nowhere. Why would H5N1 be radically different? E.g., the "Canadian teen", despite what sounds like a prolonged exposure, failed to infect anyone around him. Since I am at 18% for the h-2-h H5N1 detection in 2025, I am arbitrarily going ~ an order of magnitude lower than that for something as unprecedented as 10K human infections. Maybe should be much lower but hedging for the time being and will allow another couple months of observations. And Sergio: I'm currently at 20% on the question of reported human-to-human transmission of highly pathogenic avian influenza H5N1 globally before 2026. However, this question is only about the US, and is more general about all subtypes of H5. But H5N1 very strongly appears to be the most important subtype to consider in this time period. 
And, given the current situation in the US with H5N1 human cases derived from exposure to poultry or cattle (with cattle(mammals) being more worrisome), h2h transmission seems quite more likely to arise in North America than elsewhere before 2026. Conditioning on h2h transmission in the US (and also trying to consider, with lower probability, a start in Canada), I want to estimate the chances that it becomes sustained and out of control (in which case, if it starts in Canada, I largely expect it to spread to the US). The (6) past events of probable h2h transmission of avian H5(N1), none of which were sustained, could serve as a base rate, although I'm a bit wary of giving much weight to this precedent, since the last event was quite a while ago (2007), and also because reporting and testing standards may have improved considerably since then (so perhaps they might not have been classified as h2h transmission events if they had occurred more recently). The current situation in the US, and events such as the Canadian teen who got sick with H5N1, do suggest a higher background level of risk than normal (which would be reduced if a vaccine for cattle is licensed soon), but I'm wary of overupdating. Conditioned on sustained h2h transmission, reaching over 10k cases in a few months seems likely, although perhaps very strong monitoring and surveillance could contain the situation in time (at the very least to moderate the growth rate). Trying to combine all these factors somewhat haphazardly, I'm currently at 3.5% for this question. That’s before 2026. What about longer-term? Manifold gives a ~50% chance before 2030; Metaculus uses a more complicated method but it says about 25% chance before 2030. H5N1 may cross to humans, but it could take a while. 
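One way to compare the one-year and before-2030 numbers is to convert between annual and cumulative probabilities. The sketch below assumes a constant, independent chance each year and treats "before 2030" as a five-year window (2025 through 2029). Both are simplifying assumptions rather than anything the forecasting sites state, and the function names are hypothetical.

```python
def cumulative_probability(annual_p: float, years: int) -> float:
    """Chance of at least one event in `years` years at a constant annual chance."""
    return 1.0 - (1.0 - annual_p) ** years


def implied_annual_probability(cumulative_p: float, years: int) -> float:
    """Constant annual chance that would produce `cumulative_p` over `years` years."""
    return 1.0 - (1.0 - cumulative_p) ** (1.0 / years)


YEARS = 5  # assumed window: 2025 through 2029

# Metaculus: 5% for 2025, about 25% before 2030 (figures from the text).
print(round(cumulative_probability(0.05, YEARS), 4))      # 0.2262
print(round(implied_annual_probability(0.25, YEARS), 3))  # 0.056

# Manifold: 40% for 2025, about 50% before 2030 (figures from the text).
print(round(cumulative_probability(0.40, YEARS), 4))      # 0.9222
print(round(implied_annual_probability(0.50, YEARS), 3))  # 0.129
```

Under these assumptions the two Metaculus figures roughly agree with each other (5% a year compounds to about 23% over five years). The Manifold figures only fit together if most of the risk is concentrated in 2025, since a steady 40% a year would compound to about 92%, far above 50%.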
Superforecaster Juan Cambeiro at The Institute For Progress estimated a 4% chance of a “worse than COVID” H5N1 pandemic in “the next year”, but their estimate was made in 2023, without the benefit of the Metaculus estimates or most of our current knowledge. This feels high now - Metaculus says 5% total for H5N1 pandemic, and most pandemic flus are not worse than COVID. IFP also seem to be expecting a case fatality rate greater than 10%, which I find unlikely for the reasons mentioned above. I trust their estimate less than Metaculus’ current ones. I conclude that the most plausible estimate for the chance of an H5N1 pandemic in the next year is 5%. Interestingly, 5% is about the base rate for pandemic flus per year: five in the past century = one per twenty years = 5% chance per year. Isn’t it surprising that we’re still at the base rate when we can see a dangerous-looking flu virus spreading through the types of animals that have caused pandemic flus in the past? Part of the answer is that we’re not - in addition to the 5% chance of H5N1, we have to add the chance of some other pandemic flu. This probably isn’t 5% on its own; scientists monitor flu strains closely, and they haven’t found any others which are giving off as many red flags as H5N1. Still, something could always come out of left field. Maybe we should add a 2.5% chance of some other strain, for a total of 7.5% chance of a flu pandemic (ie beyond normal seasonal flu) next year. But still, isn’t it surprising that we’re so close to the base rate? One way to think about this: the base rate represents how concerned we should be if there was no epidemiological monitoring at all. In that case, we would estimate a probability distribution across different epidemiological landscapes, most of which contain some concerning-looking flu strains. 
Since we are doing the epidemiological monitoring, we can collapse that distribution into a single picture: one flu strain, H5N1, is in fact pretty concerning, and other strains mostly aren’t. This is enough to move our prior from 5% to 7.5%, but no more. The forecasters I talked to raised one other point of uncertainty: does the flu work more like a dice roll, or like a bus? Dice rolls are uncorrelated with their predecessors; even if it’s been a hundred rolls since you last rolled a 6, your chance this time is still 1/6. But buses come at fixed intervals; if the buses are hourly, and you haven’t seen a bus in the past 59 minutes, then your chance of seeing a bus in the next minute is very high. It’s been 16 years since the last flu pandemic; these pandemics come (on average) every 20 years. I don’t think anyone has a good sense of how to think about this. But it was 40 years between the Spanish and Asian flus, so the twenty year number is at best a rule of thumb. The 5% number feels very low to me (and, apparently, to the average Manifold forecaster). Isn’t H5N1 spreading to cows and pigs and all sorts of other mammals? Isn’t it in the news all the time? I trust Metaculus a lot, but I agree that this is a surprising update, and I’m taking it on faith rather than feeling it in my bones. What Would The Fatality Rate Be For An H5N1 Pandemic? There are four basic stories you could tell about likely H5N1 mortality. First, maybe mortality would be 50%. The argument here is that official statistics report this mortality rate in the chicken farmers who have been infected with H5N1 so far. Several news sources and even some scientists have raised the specter of a pandemic version of H5N1 with this same death rate, which could kill a quarter to a third of the world population. THIS IS EXTREMELY FAKE. The official statistics only report fatality rate in the infections we know about. 
Bird flu is rare, there’s no mass testing, and we only learn that somebody had it if they’re in a hospital and the doctors are worried enough to test for rare conditions. Of Americans who got bird flu in the past year, 0 out of 61 have died. Probably this is mostly because America upped its detection game and is now finding milder cases; we also can’t rule out the virus mutating to become less virulent. Metaculus estimates the current true mortality rate as 1.25%. …but leaves a wide 90% confidence interval, from 0.5% to 7%. Second, maybe mortality would be somewhere around 1.25%. The argument here is that Metaculus uses this as its central estimate of US mortality. But Sentinel discusses some reasons to be skeptical of broad inferences from the US numbers: Scientists have been puzzled by the apparently low H5N1 case fatality rate in humans in the US. They offer a number of hypotheses: “The way in which the virus is being transmitted — along with the amount of virus exposure — is limiting the severity of disease.”
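The base-rate arithmetic and the dice-versus-bus question from earlier in this excerpt can be made concrete with a small sketch. The pandemic years come from the timeline in the essay. The "bus" model, where the gap between pandemics is uniform between 10 and 30 years, is a purely hypothetical stand-in chosen for illustration and not something any forecaster proposed; the function names are also hypothetical.

```python
# Pandemic flu years as listed in the essay's timeline.
pandemic_years = [1918, 1957, 1968, 1977, 2009]
gaps = [later - earlier for earlier, later in zip(pandemic_years, pandemic_years[1:])]
print(gaps)  # [39, 11, 9, 32]


def dice_hazard(mean_interval_years: float) -> float:
    """Memoryless model: the same chance every year, however long it has been."""
    return 1.0 / mean_interval_years


def bus_hazard(years_since_last: float, earliest: float, latest: float) -> float:
    """Chance of an arrival within the next year, given none so far, if the
    gap between arrivals is uniform on [earliest, latest] (hypothetical model)."""
    if years_since_last >= latest:
        raise ValueError("the model says the arrival should already have happened")
    if years_since_last + 1 <= earliest:
        return 0.0
    start = max(years_since_last, earliest)
    window = min(years_since_last + 1, latest) - start
    return window / (latest - start)


# "Five in the past century = one per twenty years = 5% chance per year."
print(dice_hazard(20))  # 0.05

# Sixteen years since 2009, under the assumed 10-to-30-year bus schedule.
print(round(bus_hazard(16, earliest=10, latest=30), 3))  # 0.071
```

In the dice model the sixteen-year wait changes nothing. In this particular bus model it raises the one-year chance from 5% to about 7%. The observed gaps range from 9 to 39 years, which is one reason to treat the twenty-year figure as a rule of thumb.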
This is normally when I would announce the winners of the 2024 forecasting contest, but there are some complications and Metaculus has asked me to wait until they get sorted out.
But time doesn’t wait, and we have to get started on the new year’s forecasting contest to make sure there’s enough time for events to happen or not. That means the 2025 contest is now open! This year I had hoped to arrange some kind of fair comparison with Polymarket so I could prove my thesis that it usually underperforms Metaculus - but with all the excitement of the election and the feds harassing Shayne we never got around to making it work.
4: Some straggler Metaculus/ACX forecasting winners who I didn’t get to mention last week:
43: Just as there are stock indexes like NASDAQ or Shanghai Composite to easily track questions like “how is tech doing?” or “how is China doing?”, Metaculus is experimenting with prediction market indices. I’m skeptical of their flagship example - “how ready are we for AGI?” - which seems to be a weird mishmash of questions about how good AI capabilities are, how well technical alignment is going, and stuff like UBI. Split between recommending better curation vs. worse curation (eg something more like NASDAQ that includes so many thousands of stocks that it can’t help but track underlying trends).
This Metaculus question looks like the Manifold market, but without the big drop at the end. Are the Manifolders overreacting, or are the Metaculans asleep at the wheel?
No update this time, but from last cycle: “Nathan Young has since gotten much larger grants to do much more exciting forecasting work, particularly a platform for generating forecasting questions. With my approval, he’s put my grant on the back burner while he works on other things, but he still hopes to get some questions up on Manifold or Metaculus sometime.”
Meanwhile, tech companies with ten times as much money pretend that they’re cool and playful when their HQ has some rounded edges and a set of colored cubes in front. Do better! 22: Effective altruists have been funding teams working on lab-grown meat for almost a decade now. Around 2020, they hired some experts to double-check that this was possible in principle, and the experts wrote scathing analyses saying it was cost-ineffective by so many orders of magnitude that it was basically a pipe dream. Reactions were mixed, but a lot of us beat ourselves up and vowed to be less gullible next time. But now a new report comes out arguing that the previous reports were wrong, that lab-grown meat production is going much better than the earlier reports thought possible, and it’s more or less cost-effective already for the simplest products! Again, mixed reactions, and although some of the numbers are indisputable, the analysis itself is by a VC firm with lab-based meat investments. Here are some related Metaculus questions. 23: Ozy, citing Stutzman et al: “Afghanistan after the American withdrawal has the lowest life satisfaction rate ever recorded. Two-thirds of respondents rate their life satisfaction below 2, which is generally considered to be the point at which a life is no longer worth living. Life satisfaction dropped significantly after the withdrawal of American troops. Women, people in rural areas, and the poor were particularly negatively affected.” 24: Lenacapavir is dubbed a “miracle drug” for AIDS; a single dose protects against infection for six months. Unclear how this interacts with PEPFAR cuts; if PEPFAR still existed it would be a big boost to its efficacy; now maybe this might be part of a strategy to tread water? 25: Did you know: when people first started making artificial ice in the 1850s, there was a backlash from people who thought it was gross and dystopian and that people should insist on natural ice for their iceboxes. 
From Pessimists’ Archive, which goes on to draw an analogy to lab-grown meat, etc (h/t Isaac King on X). 26: From Peter Hague (on X) and commenter Phaethon: why did so many Anglosphere countries see immigration spikes in 2021? Each of these has their own local story. In Britain, it’s the paradoxical effects of Brexit. In the US, it’s Joe Biden being soft on immigration. And so on - but should we be looking for some deeper cause that explains the overall phenomenon? A commenter suggests “a way to soak up all the inflation from the COVID money printing”, but I can’t tell if that even makes sense. Still, should something something COVID be a leading hypothesis? 27: Jesse Singal vs. Mark Stern on the Skrmetti Supreme Court case that failed to overturn Tennessee’s ban on gender medicine. US law bans sex discrimination, so pro-transgender advocates argued that, since doctors often prescribe eg estrogen to biological women, it was sex discrimination to ban prescribing it to biological men. Tennessee’s anti-transgender argument was that they weren’t discriminating by sex, they were discriminating by diagnosis (estrogen for eg hot flashes, vs. estrogen for gender transition). There is some subtlety here (if a biological man grows breasts because of some hormone imbalance, doctors might give him testosterone to counteract it, and this seems sort of like giving biological women testosterone to make them look less like women), but these are still sort of different diagnoses (gynecomastia vs. gender dysphoria) and Tennessee said you can still think of it as diagnostic discrimination rather than sex discrimination. This makes sense, except that the standards around sex discrimination are very strict and sort of box the court in here. 
And in a fit of wokeness, the 2020 court (including some of the conservative justices hearing this case) applied these standards very strictly and ruled that discriminating against gays was a form of sex discrimination (since if women can date men, it’s sex discrimination if men can’t also date men), and this is obviously the same argument. Now that wokeness is less popular, the court wants to rule against transgender, but it can’t help tripping over its previous ruling and giving some kind of unprincipled confusing non-opinion. 28: Contra compelling anecdotes, only ~5% of people raised very religious end up atheist later in life (X). Most people are about as religious as their parents; most exceptions are only slightly less religious, and most families that secularize do it over several generations. Note: percentages are of total, not of each row! 29: Related: social science team proposes a three-stage model of secularization: decreased public ritual participation → decreased personal importance → decreased identification, presents apparently confirmatory data. If true, would be somewhat inconsistent with intellectual models (eg people learn about evolution and start doubting the Bible) and more consistent with institutional models (eg the government provides welfare so people no longer need to be part of a tight-knit church). 30: Navigating LLMs’ spiky intelligence profile is a constant source of delight; in any given area, it seems like almost a random draw whether they will be completely transformative or totally useless. Now Ethan Strauss reports that they are, for some reason, extraordinarily effective at teaching people golf. “I am predicting the Golf Revolution, or perhaps decline, if your perspective is that optimization tends to ruin hobbies. 
A sport for obsessives has been gifted the ideal tool for refinement.” 31: Claim (via nxthompson on X): “In a huge survey of young kids about phones and technology, they all say they want to be out playing in the real world. But parents don't let them out unsupervised. So they're stuck on their phones.” Interesting, but I’m nervous about social desirability bias - how many adults would say on a survey that they would rather be on their phones than playing with friends? But adults do have this choice and mostly go with the phones. 32: Steven Adler on AI psychosis. He tries to analyze ER admissions data for psychosis and finds no change. I don’t think anyone reasonable expected this to be a large enough effect to show up in ER admissions data, but there are lots of unreasonable people so I appreciate his effort. He thinks AI companies might have better data on this, and encourages them to release it. 33: Cuartetera was the greatest polo horse ever. Polo players responded in a very practical way: they cloned her, dozens of times (and it worked; the clones are also excellent). Now there is a lawsuit as different polo teams fight to get their hands on Cuartetera clones. What is the equilibrium? If the outsiders get their hands on the genetic material, do we see a world where every polo horse is a Cuartetera clone? How much is lost if nobody ever tries to breed a polo horse better than Cuartetera (since the economics might not check out if the odds of success for any given foal is too low)? H/T Gwern and Siberian Fox (on X). 34: Claim: as of 2013, India’s Agarwal caste, who make up less than 1% of the population, got 40% of the e-commerce funding. 35: Owlposting: What Happened To Pathology AI Companies? Pathology is a medical specialty. A typical task involves looking at a microscope slide full of cells and trying to determine if any of them are cancerous. 
This seems like a good match for AI - and for years, studies have been showing that in fact AI can equal human experts. So why isn’t it being used more? The author’s three answers: first, slide scanning is expensive and clunky, and you can’t apply AI to a slide until you digitize it. Second, it’s hard to figure out a business plan where this saves someone money and doesn’t step on the toes of big companies that can outcompete anyone they don’t like. Third, pathologists use the context of a patient’s entire clinical history when they interpret a slide, and AIs that can’t do that (either because of technical limitations or legal/privacy limitations) are at a disadvantage even if their skills specifically relating to slide-reading are better. 36: Noahpinion: Will Data Centers Crash The Economy? Suppose that AI is a bubble, either permanently (because the technology isn’t really transformative) or temporarily (because it can’t transform things quickly enough to keep up with all the dumb money pouring into it). Will the sudden write-off of data centers lead to a broader economic collapse? In 2001, the dot-com bubble harmed the tech sector, but didn’t take the rest of the economy down with it; in 2008, the subprime mortgage bubble did take the rest of the economy down with it, because it damaged banks that the whole economy relied on. The optimistic case for AI is that data center spending is mostly coming from big companies like Google and Meta that can absorb a lot of loss. The pessimistic case is that some of the money is coming from private credit, a new-ish form of finance which hasn’t really been stress-tested and whose failure modes are still poorly understood. Noah’s final verdict: the stage isn’t obviously set for a crisis yet, but there’s the potential to get there and we should consider acting (how?) early. 
37: The latest Twitter talking point is that universal hepatitis B vaccination at birth is “woke”: Hep B is (aside from mother-to-child transmission) often sexually transmitted, slutty women’s children are more likely to have Hep B, so perhaps giving the vaccine to everyone (instead of testing and only giving to the children of women who test positive) is an attempt to spare slutty women the embarrassment of getting a positive test. Ruxandra Teslo provides the counterargument - Hep B tests take a while, the medical system is fragmented, and any attempt to test people and then give the vaccine inevitably leads to many positive tests falling through the cracks. Vaccinating at birth is easy and hard to screw up, the vaccine has no known side effects, and empirically child Hepatitis B rates go down (by as much as 2/3!) when countries switch from test-and-vaccinate to universal vaccination. This benefits everyone - even people who never have unprotected sex and always follow up on their medical tests - because toddlers in daycare exchange saliva copiously, and if your toddler exchanges saliva with a Hep B positive toddler they could get the disease. A funny Twitter interaction was seeing Republicans in Congress hop on the anti-slut anti-vaccination bandwagon - except for Senator Bill Cassidy (R-Louisiana), who happens to be a liver doctor, and who is still fighting the good fight. I am always nervous when a good person who I like starts engaging on Twitter, since it elevates the discourse there but also gradually turns their brain into mush - but Ruxandra has made the leap and is doing a great job not just on bio related topics but also (for example) countering Curtis Yarvin on the history of her native Romania.
38: The response to GPT-5 was confusing; most specific people who reviewed it said they were impressed (Ethan Mollick, Tyler Cowen, Nabeel Qureshi, Taelin), it performed as expected on formal benchmarks, but the overall vibes declared it a big failure. Peter Wildeford speculated that maybe there was some kind of sinister pay-to-play early access bias involved. Zvi went the other way, calling it a “reverse DeepSeek moment” (insofar as DeepSeek was a pretty average model that got glowing praise.) In the end, I agree with Peter that this was mostly a branding issue. o3 was a genuinely revolutionary model; if OpenAI had called it “GPT-5”, it would have met expectations. Instead, they called it “o3”, and called a minor incremental update a few months later “GPT-5”. Then people got mad that the exciting-sounding “GPT-5” was merely an incremental update. A secondary issue was that the router wasn’t very good, and so many queries got routed to a small version without thinking mode that was if anything a downgrade from o3. I think this tweet by Shakeel perfectly encapsulates the essence of GPT discourse in two sentences: …but maybe it’s worth asking why GPT-5 isn’t bigger than o3. Was 4.5 a failed attempt at scaling? Did it fail in a way that sort of back-handedly justifies the “lost steam” take? Does the answer depend on distinctions between pre-training scaling, post-training scaling, etc? How?
39: This month in etymology: did you know that “oy vey” is a “fully Germanic phrase” which is cognate with English “oh woe!” (h/t Wylfcen on X)
40: mRNA shows promise to be a game-changing treatment for cancer, but RFK is trying to halt research. But so far he can only starve it of money, not ban it, and the funding gap is only $500 million. Will there be enough philanthropic billionaires and private foundations to step up? Zvi points out that although there is usually a game of chicken where foundations are hesitant to touch something the government cancelled lest the government decide it can cancel everything and hope philanthropists pick up the bill, in this case there are no game theory considerations - RFK is halting it because he genuinely wants it halted, and they are thwarting him rather than playing into his hands. The only problem is that $500M is a lot of money for the private sector; a few foundations could technically afford it, but not many could afford it comfortably and still have money left over for the next few crises of this magnitude. I hope someone is trying to organize a coalition.
41: AI fantasy flash fiction Turing test. Eight stories about demons, four by famous fantasy authors, four by ChatGPT. After 3000 votes, AI wins: humans can't tell the difference and slightly prefer the AI stories. My own score was only 75%. But I will say that I thought Mark Lawrence's was obviously the best, I was ~100% sure it was human, and it convinced me that regardless of the official results it's still possible to write flash fiction that an AI obviously can't do.
42: “SignPro” offers customized “In This House We Believe” signs, try not to use this for evil.
43: China think tank assessment of how in control Xi is: still very in control, maybe not infinitely in control.
44: Related - did you know (h/t xlr8harder) that if you ask AI to write a science fiction story, it will very often name the protagonist “Elara Voss” (or some very close variant like Elena Voss), and this remains true across various models and versions? Related: Chelsea Voss of OpenAI is having a baby and has the opportunity to do the funniest thing.
45: “Hector (cloud) is a cumulonimbus thundercloud cluster that forms regularly nearly every afternoon on the Tiwi Islands in the Northern Territory of Australia…[he is sometimes called] Hector the Convector”.
46: British allergy sufferers who want to know the ingredients of things demand that British cosmetics stop listing their ingredients in Latin. “For example, sweet almond oil is Prunus Amygdalus Dulcis, peanut oil is Arachis Hypogaea, and wheat germ extract is Triticum Vulgare.”
47: Text-based RPG about being an NYT journalist at the Manifest prediction market conference. I make a brief appearance.
48: Study uses supposedly-random variation in doctor assignments to test whether the marginal mental health commitment is good or bad for patients, finds that it is quite bad. Freddie de Boer is violently skeptical (maybe literally so?) and makes some good points about how a single quasi-experimental study is never absolute proof. But I don’t think he quite justifies his opinion that the paper was irresponsible and should never have been published; it’s just a normal quasi-experimental study that we should nod and say “huh” at but not overweight as the culmination of all possible research that overcomes all possible priors. My prior is that the marginal commitment is pretty useless (many commitments are just “well, since this person arrived at our ED for some reason, it would look bad from a medico-legal perspective to just let them go, so let’s keep them a few days to evaluate” - and yeah, you should be upset about this) but I’m still surprised by how many outright negative (as opposed to zero) effects the researchers found. The strongest argument for negative effects is that it will make some people miss work and maybe lose their job. But this study found that commitment ~doubles the risk of near-term suicide (admittedly only from 1% to 2%), which would have been outside my confidence intervals for how bad it could be. I suspect confounding, but only on general principle, and I wouldn’t be too surprised either way.
49: This tweet is probably bait, but I found it a thought-provoking question: I think there’s a boring answer, where the law is more complex than just a single number and whatever kind of weird trafficking Epstein was doing is worse than whatever normal relationships these European laws are permitting. But assuming that there’s a substantive difference even after taking that into account, I think my answer is something like - we’ve got to divide kids from adults at some age, there’s a range of reasonable possible ages, we shouldn’t be too mad at other societies that choose different dividing lines within that range - but having decided upon the age, we’ve got to stick with it and take it seriously (in the sense of penalizing/shaming people who break it). This is more culturally relativist than I expected to find myself being, so good job to Richard for highlighting the apparent paradox.
50: Dilan Esper describes his experience as one of Hulk Hogan’s attorneys in the Gawker lawsuit (X). Parts I found interesting: none of the lawyers knew Thiel was funding the lawsuit; Gawker probably could have won if they had been slightly competent but kept "shooting themselves in the foot"; and Gawker probably could have won if they had just pixelated the private parts in the video.
51: Amazing concept and poems (link on X): I tried to see if AI could do this, and it did something that technically met the requirements but had zero artistic merit - using a lot of words like “nowhere” and “outside” in one, then separating them out to “no where” and “out side” in the other. I didn’t invest much energy in creating a clever prompt telling it not to do that, so feel free to report if you get better success.
52: New study claims consultants are actually good, at least for profits: "We find positive effects on labor productivity of 3.6% over five years, driven by modest employment reductions alongside stable or growing revenue"
53: A Polish team tries to test Peter Turchin’s equations for predicting political unrest on recent Polish history, has to make some changes but claims mostly positive results.
54: New big multi-author Substack, The Argument, trying to be a sort of center-left version of the model pioneered by The Free Press and other high-production-value ideological Substack properties. Excited to see Kelsey Piper is involved, and she starts off strong with a post on the latest round of First World basic income studies, which find few positive effects. This is surprising, because recipients didn’t waste the money on alcohol or gambling or anything - they paid down debt and got useful goods. Still, it didn’t even affect things that should have been obvious, like stress level. It’s not even clear that amounts of money large enough to help with rent made homeless people more likely to get houses! Matt Bruenig criticizes the article, accusing Kelsey’s studies of being downstream of Perry Preschool style dreams that exactly the right welfare program will have massively compounding effects that cut poverty out at the root and turn everyone into elite human capital; he thinks giving people money won’t do this, but it will increase equality and give the poor better lives. I assume he’s not a strong hereditarian, but his argument makes even more sense from that perspective, and I’ve certainly criticized dumb outcome measures like infant brain waves which we have only tenuous reasons to think are related to anything we care about. But Kelsey reasonably responds that the outcome measures she’s talking about include stress level and life satisfaction. To defuse this critique, Bruenig either has to argue that our construct “life satisfaction” doesn’t really measure whether someone’s life is satisfactory, or else claim that giving poor people satisfactory lives isn’t really what we’re going for - which I think would require more explanation on his part. There’s some further (impressively acrimonious) debate on X, but I don’t see anything that addresses my core concern. GiveDirectly, a charity involved in basic income experiments, has a presponse here; they say that some studies are positive, and that the ones that aren’t might have tried too little cash to matter, or been confounded by COVID making everything worse. They also point out that basic income is harder to study than traditional programs like giving people housing, because if you’re giving housing you can measure housing-related outcomes directly and have a pretty good chance of getting enough statistical power to find them, but since everyone spends cash on different things, the positive effects might be scattered across many different outcomes (and therefore too small to reach significance on each). Everyone involved in this debate wants to emphasize that the poor results are for First World studies only, and that studies continue to show large benefits to giving cash in the developing world.
55: Related: I was less impressed by The Argument’s first foray into housing policy, which follows an all-too-familiar pattern: Some people say they don’t like noise and disorder and try to make rules against it in their apartments.
4: Metaculus is gearing up for another yearly forecasting contest, and looking for ideas for questions. You can see this year’s question set here - for example, “Will there be a ceasefire in the Russia-Ukraine war by the end of 2025?”. I’ll post an Open Thread comment below where you can list your ideas and someone from Metaculus will read them.
ACX has been co-running a forecasting contest with Metaculus for the past few years. Lately the “co-running” has drifted towards them doing all the work and giving me credit, but that’s how I like it! Last year’s contest included more than 4500 forecasters predicting on 33 questions covering US politics, international events, AI, and more.
They’re preparing for this year’s contest, and currently looking for interesting questions. These could be any objective outcome that might or might not happen in 2026, whose answer will be known by the end of the year. Not “Will Congress do a good job?”, but “Will Congress’ approval rating be above 40% on December 1, 2026?”. Or, even better, “Will Congress’ approval rating be above 40% according to the first NYT Congressional Approval Tracker update to be published after December 1, 2026?”. Please share ideas for 2026 forecast questions here. The top ten question contributors will win prizes from $150 to $700. You can see examples of last year’s questions here (click on each one for more details).
This year’s contest will also include AI bots, who will compete against the humans and one another for prizes of their own. To learn more about building a Metaculus forecasting bot, see here.
This year’s prediction contest is live on Metaculus. They write:
You are welcome to create a bot account to forecast and participate in addition to your regular Metaculus account. Create a bot account and get support building a bot here.
To participate in the tournament or learn more, go to Metaculus.
…this market is about whether Greenland or a meaningful portion of it becomes part of America, not about minor acquisitions like a single building or small plot of land.
Here’s a pretty crazy Metaculus question - the resolution criteria specify it’s not about scammers using AIs to blackmail their victims, it’s about an AI independently developing and executing a blackmail plan without human prompting or support. Something like this has already happened in toy experiments conducted by safety teams when all the conditions were exactly right, but forecasters seem confident it will happen in real life sometime in the next three years. I don’t understand what’s going on here, and I’m going to recheck this question after signal-boosting it to see if it changes.
3: You have five days left to submit your predictions in the ACX/Metaculus 2026 Prediction Contest.
1: Congratulations to the winners of last year’s ACX/Metaculus Forecasting Contest, especially:
All of these winners got approximately $100. And thanks again to Metaculus for making this happen. You can follow along with the 2026 contest here, although it’s too late to participate.
America will hold midterm elections on November 3. Incumbents always have a hard time during midterms, and Trump’s approval rating is low, so it’s expected to be a good year for Democrats. Prediction markets expect them to win at least the House (80% chance) and maybe even the Senate (20 - 40% chance). This simple story is complicated by two different Republican attempts to change voting law.
This seems like a good sign that there won’t be mass voter disenfranchisement. But Metaculus expects a 25% chance that martial law is declared?!