Philip Tetlock
Article
Philip Tetlock is a recurring person in the Astral Codex Ten archive, appearing 13 times across 13 issues between March 15, 2021 and October 13, 2025. The archive places him in contexts such as “Lots of people worked on this (especially Philip Tetlock)”; “Philip Tetlock wasn’t writing all those books and tweets to self-aggrandize”; “Philip Tetlock (one of the signatories on the pro-prediction market letter)”. He most often appears alongside Metaculus, Manifold, and the Good Judgment Project.
Metadata
- Category: People
- Mention count: 13
- Issue count: 13
- First seen: March 15, 2021
- Last seen: October 13, 2025
Appears In
- Mantic Monday: Mantic Matt Y
- 15
- The Passage Of Polymarket
- Ukraine Warcasting
- Open Thread 222
- From The Mailbag
- Open Thread 255
- Prediction Market FAQ
- Who Predicted 2022?
- Berkeley Meetup On Sunday, Special Guest Philip Tetlock
- The Extinction Tournament
- Practically-A-Book Review: Rootclaim $100,000 Lab Leak Debate
- ACX Grants Results 2025
Related Pages
- Metaculus (7 shared issues)
- Manifold (6 shared issues)
- Good Judgment Project (5 shared issues)
- Polymarket (5 shared issues)
- Substack (5 shared issues)
- China (4 shared issues)
- Forecasting Research Institute (4 shared issues)
- Less Wrong (4 shared issues)
- America (3 shared issues)
- Biden (3 shared issues)
- COVID (3 shared issues)
- Discord (3 shared issues)
Source Context
Recovered passages from the original issue text. When the raw archive preserved outbound links inside the source passage, they are listed directly under the quote.
You don't want a rule that if a pundit ever gets anything wrong, we stop trusting them forever. Warren Buffett gets some things wrong, Zeynep Tufekci gets some things wrong, even Nostradamus would have gotten some things wrong if he'd said anything clearly enough to pin down what he meant. The best we can hope for is people with a good win-loss record. But how do you measure win-loss record? Lots of people worked on this (especially Philip Tetlock) and we ended up with the kind of probabilistic predictions a lot of people use now.
They admit that you’ve got to be really careful with this. If there are a lot of low-quality forecasters in the tournament, then since high-quality forecasters will accurately predict that low-quality forecasters will give a low-quality answer, everyone will converge on the low-quality answer. This paper is by Good Judgment Project who have just spent years identifying a population of superforecasters, so their plan is to use these people, who are all great, who all know they’re all great, who all know they all know they’re all great, etc. Philip Tetlock wasn’t writing all those books and tweets to self-aggrandize, he was writing them to create common knowledge!
Now there’s a paper, by Karger, Monrad, Mellers, and Tetlock - Reciprocal Scoring: A Method For Forecasting Unanswerable Questions.
The paper continues to an empirical study. The authors ran a forecasting tournament on various easily-checkable things like COVID vaccinations, commodity prices, and the weather. Forecasters were separated into three conditions: reciprocal scoring, traditional scoring (ie Brier score + incentives), and no scoring. The no scoring team did worse than the normal scoring team, which is the basic insight Tetlock et al have found again and again: scored and incentivized forecasts are better than random people pontificating on things. But more relevantly for this paper, the reciprocal scoring and traditional scoring did basically the same!
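The passage above mentions "traditional scoring (ie Brier score + incentives)" without spelling out the score. As a minimal sketch, assuming the common binary form of the Brier score (the mean squared gap between a forecast probability and the 0/1 outcome; the passage does not say which variant the tournament used), it might look like this. The function name and all forecast data here are hypothetical, chosen only for illustration.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    Lower is better: 0.0 is a perfect forecaster, and always saying 50%
    scores 0.25 under this binary form.
    """
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical data: four questions, 1 = happened, 0 = did not happen.
outcomes = [1, 0, 1, 1]
confident_and_right = [0.9, 0.2, 0.8, 0.7]
always_fifty = [0.5, 0.5, 0.5, 0.5]

print(brier_score(confident_and_right, outcomes))
print(brier_score(always_fifty, outcomes))
```

A scored forecaster is rewarded for committing to probabilities that turn out well, which is one way to read the "win-loss record" idea in the earlier passage.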
Throughout these bad decisions, intelligence analysts and national security advisors were begging the government to come up with some kind of good forecasting infrastructure. By the early 2000s, many of them had settled on prediction markets as the most promising opportunity. In 2008, twenty-two prominent economists including five Nobel Prize winners wrote an editorial begging the CFTC to legalize prediction markets; the CFTC refused. In 2010, Philip Tetlock (one of the signatories on the pro-prediction market letter) did some pretty basic forecasting work, not even prediction market level, and proved that he could significantly outperform top analysts at the CIA with access to classified information. The government refused to hire him or use any of his methods, and continued shutting down new prediction markets as they arose.
Part of the point of turning forecasting into a formal science is Philip Tetlock’s observation that pundits do such a bad job. They don’t seem to be right more often than chance, and even when they’re confidently wrong everyone keeps listening to them.
As I’ve said before, “trust the experts” and “don’t test the experts” are both bad heuristics (see the Substack article and its NYT version). Talking to anti-interventionists about the potential for a Russian invasion throughout January and February, I was struck by how often they would ignore my arguments and instead say things like “this guy has always been right, so I’m going to trust him.” That might be an adequate strategy when you have nothing else to go on, but here we had a lot of evidence relevant to what was likely to happen, including satellite imagery of military movements and reports on the state of diplomatic negotiations. I’ve found one of the most insightful analysts throughout the crisis to be Dmitri Alperovitch. But when I shared one of his tweets with a very intelligent friend of mine, his response was basically “his profile says that he is affiliated with Crowdstrike, which was involved in the Russiagate hoax. How can we believe anything he says?” I can understand the reaction, but it seems like this kind of thinking led many intelligent observers astray.
1: Philip Tetlock (of Superforecasting fame) and his team are running a new tournament that combines forecasting and persuasion. They want people who are familiar with x-risk and willing to spend ~3 hours a week for a few months thinking/talking about it. People selected to participate will get $2,000 to $10,000 (and some people can win $2,000 as a prize just for applying). See here for more information and to apply.
Inline links: See here for more information and to apply
I rarely have specific things to talk about. When I do, there are better people to talk about them. If you want to hear about AI risk, interview Eliezer Yudkowsky; if you want to hear about forecasting, interview Philip Tetlock; if you want to hear about psychopharmacology, interview Robin Carhart-Harris. All of these people have spent their lives thinking about their respective issues and will have much better things to say than I will. Every so often, I do learn something new and interesting on some topic, and then I will write a blog post about it. If I haven’t written a blog post about a topic, I probably don’t know new and interesting things about it. If you ask me about some political event or medication or philosopher or whatever I haven’t written a blog post on, my most likely answer will be “Sorry, I haven’t learned anything that makes me deviate from the consensus opinion on this yet”. If you ask me about one that I have written a blog post on, I’ll just repeat what I said in the blog post.
1: Philip Tetlock and others have founded the Forecasting Research Institute to study prediction, including prediction of existential risk. They are looking to hire “research and data analysts, content editors, and RAs”, all roles remote, see here for details.
Inline links: Forecasting Research Institute, see here for details
If we try this plan, then looking back on it ten years from now, will we agree it was a mistake? Prediction markets give us a way to get accurate and canonical answers to questions like these, and to short circuit the usual discussions about how biased different information sources are. See below for some clever, more exotic ways we can use prediction markets. 4. What are the most common objections to prediction markets? These are various objections, some wrongheaded, some true but nonfatal. There are many of them, making this section very long - you might want to skip over any objections you’re not worried about. 4.1: Would prediction markets be ruined by insider trading? That is, suppose there is a market on whether President Biden will resign before the end of his term. President Biden has special knowledge of this, so he could bet on the true outcome and make a lot of money unfairly. He could even change his behavior (eg resign at an unexpected time) just to make more money. Isn’t this unfair? One answer is that normal markets (eg the stock market) face these same problems, but manage them by making insider trading illegal. These laws don’t always work perfectly, but they work well enough that most people are happy to buy stocks. Another answer is that, while this is bad for other investors, it’s not bad for the accuracy of prediction markets, or their use in creating unbiased social consensuses. In fact, knowing that President Biden is insider-trading on a “Will President Biden resign?” prediction market should only increase your confidence in it getting the right answer! This is slightly too rosy, because if insider trading is bad enough for other investors, they might just not trade. 
This would be a partial effect: investors would be willing to overcome their fear for a big enough payday, meaning that concerns about insider trading probably would increase the likelihood of persistent small mispricings while still not allowing bigger ones (with the exact size depending on how frequent the insider trading was). It’s unclear whether this negative effect would be bigger or smaller than the positive effect from insiders having more information, so in different situations the market might end up either more or less accurate. Overall, economists are split on whether insider trading makes markets more or less accurate. Commodities markets don’t really have insider trading laws right now, and seem to be about as accurate as anything else. I hope prediction markets will experiment with different insider trading rules, and the ones that best satisfy all participants and create the most accurate results will win out. If for some reason this doesn’t work, I don’t expect it to make too much difference either way. 4.2: Would prediction markets encourage harmful or illegal activities? What about the risk of insider trading by committing harmful / illegal acts? That is, could President Biden’s doctor decide to poison him, then make money when he has to resign due to ill health? I think the strongest evidence against is that this basically never happens in stock markets. Tesla stock would plummet if Elon Musk died or resigned, but nobody realistically worries that Musk’s doctor will short Tesla and poison him. Lots of corporations’ stocks would sink to zero if you burned down their offices and factories, but nobody shorts them and then commits arson. Probably this is because there are laws against doing harmful and illegal things, and people have decided that stock market gains aren’t worth breaking the law and getting punished. 
Since prediction markets have only a tiny fraction of the amount of money that stock markets do, probably people won’t consider it worthwhile to commit harmful actions to manipulate them either. If you were going to murder someone to profit off a market, who would you rather kill: a US politician (the PredictIt market on the presidential election has a volume of about $600,000)? Or a Fortune 500 CEO (whose companies might have market caps in the hundreds of billions)? 4.2.1: What about prediction markets in very specific harmful or illegal activities? I guess if you created a market in “Will someone burn down the 7-11 on Main Street tomorrow at 3:32 AM?”, then bet a lot of money, then did it, that would be bad. I think realistically nobody would bet against you on that. But probably prediction markets should avoid hosting markets on these very specific bad things, just to make sure. 4.3: Would prediction markets give rich people more power? That is, suppose we used prediction markets to assess socially important questions like “will the climate change by such-and-such a number of degrees by 2030?” It would be bad if rich people could manipulate our social consensus on this. But you move prediction markets by buying shares, and rich people can afford more shares than poor people. So doesn’t this mean that rich people can manipulate how concerned we are by global warming? No. See 3.2 for the general reasons why it’s very difficult or impossible to successfully manipulate a prediction market. These reasons apply to rich people too. Suppose a rich person spent $100 million to buy NO shares in “will the climate be warmer in 2030 than today?”, pushing the market’s implicit chance of global warming down to 1%. That means if there is global warming, you could multiply your money by 100x by buying YES. I would immediately invest $10,000 in this market, so that I could get $1 million back in 2030 and retire rich. 
My $10,000 isn’t going to be enough to fully move this market all the way back - we already said the rich person spent $100 million manipulating it. But “you can get a free $1 million quickly with no downside at an evil rich person’s expense by correcting an obvious misconception about global warming” sounds like the sort of thing that could make it to the front page of Reddit (to put it lightly). I think more than enough people would learn about this to fully correct the mispricing. Is there any amount of money that could successfully manipulate a market? I think the answer is that you need to have more money than the sum total owned by everybody else in the world who wants to make $1 million quick. And at the limit, there’s always Goldman Sachs - who watch financial markets very closely, definitely want to make $1 million quick, and have a lot of money. So I think the most honest answer to this objection is: if you are an evil rich person reading this FAQ, then it will definitely work for you. Please sink $100 million into reducing a prediction market’s chance of global warming to 1%. And make sure you tell me first, so that I can fully marvel at your evil genius. This will work great for you and nothing will possibly go wrong. 4.3.1: But wouldn’t the subtle biases of rich people (which they might genuinely believe) still affect the market more, since they have more money? No. See 3.3 for the general reasons why we should expect prediction markets to be free from subtle biases which people genuinely believe. These reasons apply to rich people too. Suppose rich people have subtle biases which make them wrong more often than poor people. And suppose rich people (wrongly) believe global warming is 75% likely, but poor people (correctly) believe it’s 99% likely. This just reduces to the Nate Silver situation earlier, with poor people playing Nate Silver. The aggregated opinion of poor people is “an expert” which is right more often than the markets. 
It’s easy for someone to notice this and get rich quick (in expectation) by betting on what poor people think. Since lots of people can easily notice this and want to get rich quick, eventually they will correct the mispricing. Even if rich people have so much more money than poor people that no group of poor people, however large, can ever correct a rich person mispricing, eventually some smart rich person will hit upon this strategy themselves. If no individual rich person does it, Goldman Sachs will definitely do it. 4.3.1.1: What if both rich people and poor people have biases, and neither one is consistently more right than the other? Won’t the market still reflect rich people’s biases rather than poor people’s? Not if it’s possible for anybody to notice these biases and correct for them. Treating the aggregate opinion of poor people as an expert was just one example. If the winning strategy is something like “trust rich people on financial questions, poor people on environmental questions, and the point exactly halfway between them on social questions”, then whoever discovers that strategy can get rich quick. The more often people use prediction markets, the easier it should be to detect strategies like these. 4.4: Aren’t prediction markets worse than superforecasting? “Superforecasting” refers to a variety of forecasting methods similar to those pioneered by Philip Tetlock and the Good Judgment Project. Typically, they would do something like: Ask many smart people to give probabilistic answers to a very well-specified question
Inline links: economists are split
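The manipulation example in the FAQ passage above (a market pushed down to 1%, and a $10,000 YES bet returning $1 million) rests on simple share arithmetic. The sketch below works it through under simplifying assumptions that are not in the source: each YES share pays exactly $1 if the event happens, there are no fees, and the purchase does not move the price. The function names are hypothetical.

```python
def yes_payout(stake, price):
    """Payout if the event happens, for `stake` dollars of YES bought at `price`.

    Assumes each YES share costs `price` dollars and pays $1 on YES,
    with no fees and no price impact from the purchase itself.
    """
    if not 0 < price <= 1:
        raise ValueError("price must be in (0, 1]")
    shares = stake / price
    return shares * 1.0


def expected_profit(stake, price, true_probability):
    """Expected profit of the same bet if the event's real chance is `true_probability`."""
    return true_probability * yes_payout(stake, price) - stake


# The FAQ's numbers: $10,000 of YES at a manipulated price of 1%.
print(yes_payout(10_000, 0.01))             # about 1,000,000: the 100x return
# Hypothetical: the bettor thinks the real probability is 99%.
print(expected_profit(10_000, 0.01, 0.99))  # about 980,000
```

The larger the gap between the manipulated price and the real probability, the larger the expected profit for anyone who corrects it, which is the passage's reason for expecting mispricings to attract correcting bets.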
[Figure caption: note truncated vertical axis.] As mentioned above: guessing 50% corresponds to a score of 40.2. This would have put you in the eleventh percentile (yes, 11% of participants did worse than chance). Philip Tetlock and his team have identified “superforecasters” - people who seem to do surprisingly well at prediction tasks, again and again. Some of Tetlock’s picks kindly agreed to participate in this contest and let me test them. The median superforecaster outscored 84% of other participants. The “wisdom of crowds” hypothesis says that averaging many ordinary people’s predictions produces a “smoothed-out” prediction at least as good as experts. That proved true here. An aggregate created by averaging all 508 participants’ guesses scored at the 84th percentile, equaling superforecaster performance. There are fancy ways to adjust people’s predictions before aggregating them that outperformed simple averaging in the previous experiments. Eric tried one of these methods, and it scored at the 85th percentile, barely better than the simple average. Crowds can beat smart people, but crowds of smart people do best of all. The aggregate of the 12 participating superforecasters scored at the 97th percentile. Prediction markets did extraordinarily well during this competition, scoring at the 99.5th percentile - ie they beat 506 of the 508 participants, plus all other forms of aggregation. But this is an unfair comparison: our participants were only allowed to spend five minutes max researching each question, but we couldn’t control prediction market participants; they spent however long they wanted. That means prediction markets’ victory doesn’t necessarily mean they’re better than other aggregation methods - it might just mean that people who can do lots of research beat people who do less research. Next year's contest will have some participants who do more research, and hopefully provide a fairer test. The single best forecaster of our 508 participants got a score of 25.68.
That doesn’t necessarily mean he’s smarter than aggregates and prediction markets. There were 508 entries, ie 508 lottery tickets to outperform the markets by coincidence. Most likely he won by a combination of skill and luck. Still, this is an outstanding performance, and must have taken extraordinary skill, regardless of how much luck was involved. And The Winners Are . . . I planned to recognize the top five of these 508 entries. After I sent out prize announcement emails, a participant pointed out flaws in our resolution criteria. We decided to give prizes to people who won under either resolution method. 1st and 2nd place didn't change, but 3rd, 4th, and 5th did - so seven people placed in the top five spots. They are: 1st: Ryan Kupyn. Ryan is a forecasting researcher at Amazon. His main hobby outside of work is designing walking tours for different Los Angeles neighborhoods. He asks me to include his “meet-me email address” coffee@ryankupyn.com, saying “I love to meet new people and talk about careers, ML, their best breakfast recipes and anything else.”
7th: Ezra Karger. Ezra is research director at the Forecasting Research Institute. This contest is an amateurish retreading of work that FRI’s Philip Tetlock already did much more formally years ago, and we feel honored that he entered at all. I’m not sure why it should be the case that forecasting researchers are also excellent forecasters, but Ezra has adequately demonstrated this at least in his own case.
Inline links: Forecasting Research Institute
Why: Philip Tetlock, co-author of Superforecasting and co-founder of the Good Judgment Project and the Forecasting Research Institute, is in town and has kindly agreed to come to an ACX meetup.
[XPT co-author Philip Tetlock will be at the ACX meetup this Sunday. If you have any questions, maybe he can answer them for you!]
Inline links: the ACX meetup this Sunday
This was a decisive victory. There were two judges, who each gave separate verdicts (or were allowed to declare a draw). Both judges decided in favor of Peter. You can see the judges’ own summary of their reasoning here (Will, Eric) Manifold agreed with the judges. There was a prediction market on who would win. It started out 70-30 in favor of lab leak. As the videos came out, zoonosis started doing better and better. I don’t want to take the exact final numbers too seriously, since I think some of the later price increases involved hints from the participants’ behavior. But it’s clear which way viewers thought the wind was blowing4. Around the same time, the Good Judgment Project - Philip Tetlock’s group studying superforecasters - put out a report on the lab leak hypothesis. After studying it in depth, his forecasters ended up 75-25 in favor of zoonosis. The Rootclaim debate was one of ten sources they said they found especially interesting. And also around the same time, and unrelated to any of this, the Global Catastrophic Risks Institute surveyed experts (“168 virologists, infectious disease epidemiologists, and other scientists from 47 countries”) and found the same thing (though see here for some potential problems with the survey): For what it’s worth, I was close to 50-50 before the debate, and now I’m 90-10 in favor of zoonosis. III. The Math And The Aftermath The third debate session was about “inference”, how to put evidence together. I put this part off until after disclosing the winner, because I wanted to talk about some of these issues at more length. The Math: Judges Both judges included a probabilistic analysis in their written decision. Here’s the same table as above, expanded to add the judges: I shoehorned the judges’ factors into the categories I already had; some of them were actually subtly different from Peter’s, Saar’s, and each other’s. The “priors” category is especially a mess here. 
We’ll go over these later, but I get the impression that they both thought of probabilistic analyses as an afterthought. For example, Judge Eric wrote 30,000 words about which considerations moved him, and only then includes the analysis, saying: I am not convinced that this Bayesian calculation is even an appropriate way to estimate the relative posterior probability of Z and LL; it just seemed fair that after criticizing Rootclaim’s calculations at length I should make an attempt at it myself. Judge Will’s decision ran to 10,000 words. He said he independently tried both reasoning it out intuitively, and running the Bayesian analysis, and was relieved when these two methods returned the same result. He said: I am skeptical that the Bayesian decision making/evaluation methods are any more "objective" than [intuitive reasoning]. I think they maximize legibility, not objectivity, and tend to hide the intuitive/heuristic portion in the data inclusion step and values, where it’s harder to see . . . I am not skilled in the Bayesian method, and I am sure I made significant mistakes. More time and practice would improve and refine my estimates. At the fundamental rules of the universe level, Bayesian analysis must be the best way to evaluate evidence. However, I am unsure that it’s a good strategy for a human given our cognitive limitations, and doubly unsure it’s truly being used (in the dispassionate sense) where the outcome is social desirability/fame/Twitter likes. I’m focusing on this because Saar’s opinion is that the debate went wrong (for his side) because he didn’t realize the judges were going to use Bayesian math, they did the math wrong (because Saar hadn’t done enough work explaining how to do it right), and so they got the wrong answer. I want to discuss the math errors he thinks the judges made, but this discussion would be incomplete without mentioning that the judges themselves say the numbers were only a supplement for their intuitive reasoning. 
That having been said, let’s look deeper into some of Saar’s concerns. The Math: Extreme Odds Saar complained that Peter’s odds were too extreme. For example, Peter said there was only a 1/10,000 chance that a lab leak pandemic would first show up at a wet market. Peter’s argument went something like: obviously a zoonotic pandemic would start at a site selling weird animals. But a lab leak pandemic - if it didn’t start at the lab - could show up anywhere. 1/10,000 Wuhan citizens work at the wet market. So if a lab leak was going to show up somewhere random, the wet market was a 1/10,000 chance. Saar had specific arguments against this, but he also had a more general argument: you should rarely see odds like 1/10,000 outside of well-understood domains. In his blog post, he gave this example: A prosecutor shows the court a statistical analysis of which DNA markers matched the defendant and their prevalence, arriving at a 1E-9 probability they would all match a random person, implying a Bayes factor near 1E9 for guilty. But if we try to estimate p(DNA|~guilty) by truly assuming innocence, it is immediately evident how ridiculous it is to claim only 1 out of a billion innocent suspects will have a DNA match to the crime scene. There are obviously far better explanations like a lab mistake, framing, an object of the suspect being brought by someone to the scene, etc. So the real p(wet market|lab leak) isn’t the 1/10,000 chance a pandemic arising in a random place hits the wet market, but the (higher?) probability that there’s something wrong with Peter’s argument. Then Saar tried to show specific things that might be wrong with Peter’s argument. I didn’t find his specific examples convincing. But maybe the question shouldn’t be whether I agreed with him. It should be whether I’m so confident he’s wrong that I would give it 10,000-to-1 odds. This makes total sense, it’s absolutely true, and I want to be really, really careful with it. 
If you take this kind of reasoning too far, you can convince yourself that the sun won’t rise tomorrow morning. All you have to do is propose 100 different reasons the sunrise might not happen. For example: The sun might go nova.
Inline links: Will, Eric, agreed, 4, put out a report on the lab leak hypothesis, https://substackcdn.com/image/fetch/$s_!g7k2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37f1b493-b556-41ec-925e-03f9d8bc26cb_1456x849.webp, surveyed experts, see here, https://substackcdn.com/image/fetch/$s_!Zejl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9c88e87-b6ca-4c6d-840e-24da726f50b7_975x365.png, https://substackcdn.com/image/fetch/$s_!T5rV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4983e2cd-4151-42de-9685-08037ef7a8e8_635x788.png
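Saar's DNA example in the passage above can be read as a mixture: the quoted likelihood only holds if the clean statistical model applies, and some small chance remains that a messier explanation (lab mistake, framing) produced the evidence instead. The sketch below illustrates that reading. It is not a calculation from the debate: the function name is hypothetical, and the 1-in-1,000 chance of an alternative explanation and the assumption that such an explanation always yields a match are made-up numbers for illustration.

```python
def effective_likelihood(model_probability, alt_chance, prob_given_alt):
    """P(evidence | hypothesis) once model failure is allowed for.

    model_probability: the probability the clean model assigns (e.g. 1e-9).
    alt_chance: chance that some alternative explanation applies instead.
    prob_given_alt: probability of the evidence under that alternative.
    """
    for name, value in (("model_probability", model_probability),
                        ("alt_chance", alt_chance),
                        ("prob_given_alt", prob_given_alt)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return (1.0 - alt_chance) * model_probability + alt_chance * prob_given_alt


# The passage's 1E-9 random-match figure, with a hypothetical 1-in-1,000
# chance of a lab mistake or similar, assumed to always produce a match.
p = effective_likelihood(1e-9, 1e-3, 1.0)
print(p)        # dominated by the 1e-3 term, not the 1e-9 term
print(1.0 / p)  # implied Bayes factor is near 1,000 rather than 1E9
```

The same formula shows why the passage urges care: stacking up many speculative alternatives, each given a small but generous probability, can drag any confident estimate toward the middle, which is the sunrise worry raised at the end of the quote.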
David Rozado, $50K, to study truth-seeking and bias in LLMs. Suppose you ask a chatbot about minimum wages, and it summarizes economic research on the topic. Or suppose it’s 2030, GPT-7 has outpaced human economists, and you want it to do original analysis. How can you be sure that it’s not falling victim to the same political biases that might plague the rest of us? Professor Rozado studies this question in depth, working on tools that measure bias (for example, whether the AI will evaluate study methodologies consistently when the results favor different political views) and trying to determine what interventions (prompts, fine-tuning, etc) best ensure AI neutrality. Philip Tetlock, of superforecasting fame, will assist with this research.
Backlinks
- ACX Grants Results 2025
- Berkeley Meetup On Sunday, Special Guest Philip Tetlock
- Forecasting Research Institute
- From The Mailbag
- Good Judgment Project
- 15
- Mantic Monday: Mantic Matt Y
- Open Thread 222
- Open Thread 255
- Organizations: F
- Organizations: G
- People: P
- People: W
- Practically-A-Book Review: Rootclaim $100,000 Lab Leak Debate
- Prediction Market FAQ
- prediction markets
- President Biden
- The Extinction Tournament
- The Passage Of Polymarket
- Ukraine Warcasting
- Warren Buffett
- Who Predicted 2022?
- Zelenskyy