Article
Google is a recurring organization in the Astral Codex Ten archive, appearing 82 times across 82 issues between February 05, 2021 and March 06, 2026. The archive places it in contexts such as “Google apparently has hard-coded into their search algorithm”; “If a random shmuck who doesn’t know anything about anything Googles ‘who should I trust about COVID?’, Google will return Dr. Fauci’s name”. It most often appears alongside US, Twitter, and OpenAI.
Metadata
- Category: Organizations
- Mention count: 82
- Issue count: 82
- First seen: February 05, 2021
- Last seen: March 06, 2026
Appears In
- WebMD, And The Tragedy Of Legible Expertise
- Ontology Of Psychiatric Conditions: Tradeoffs And Failures
- Mantic Monday: Predictions For 2021
- The Rise And Fall Of Online Culture Wars
- Peer Review Request: Depression
- Carbon Costs Quantified
- Open Thread 187
- The Unbearable Semiheaviness Of Being
- Whither Tartaria?
- Highlights From The Comments On Modern Architecture
- Highlights From The Comments On Orban
- Open Thread 198
- Highlights From The Comments On Great Families
- Mantic Monday: Let Me Google That For You
- Mantic Monday: Dogs In Wizard Hats
- Grading My 2021 Predictions
- Predictions For 2022
- ACX Grants ++: The First Half
- The Passage Of Polymarket
- ACX Grants ++: The Second Half
- Links For February
- Biological Anchors: A Trick That Might Or Might Not Work
- Highlights From The Comments On Justice Creep
- Yudkowsky Contra Christiano On AI Takeoff Speeds
- Dictator Book Club: Xi Jinping
- Every Bay Area House Party
- Open Thread 230
- Your Book Review: The Internationalists
- Why Not Slow AI Progress?
- A Cyclic Theory Of Subcultures
- Billionaires, Surplus, And Replaceability
- Links For September 2022
- I Won My Three Year AI Progress Bet In Three Months
- Open Thread 242
- Why Is The Central Valley So Bad?
- Another Bay Area House Party
- Why I’m Less Than Infinitely Hostile To Cryptocurrency
- Prediction Market FAQ
- Even More Bay Area House Party
- Mostly Skeptical Thoughts On The Chatbot Propaganda Apocalypse
- Grading My 2018 Predictions For 2023
- OpenAI’s “Planning For AGI And Beyond”
- Half An Hour Before Dawn In San Francisco
- Highlights From The Comments On IRBs
- Highlights From The Comments On Housing Density And Prices
- The Question Of Separatism
- 23
- Hypergamy: Much More Than You Wanted To Know
- Every Flashing Element On Your Site Alienates And Enrages Users
- Tales Of Takeover In CCF-World
- Links For August 2023
- Your Book Review: The Weirdest People in the World
- Bride Of Bay Area House Party
- My Presidential Platform
- Links For September 2023
- Highlights From The Comments On Kidney Donation
- In The Long Run, We’re All Dad
- The Psychopolitics Of Trauma
- Sam Altman Wants $7 Trillion
- 24
- Highlights From The Comments On “The Origin Of Woke”
- Zvi on California’s AI Bill
- Links for May 2024
- Some Practical Considerations Before Descending Into An Orgy Of Vengeance
- Links for July 2024
- SB 1047: Our Side Of The Story
- OpenAI Nonprofit Buyout: Much More Than You Wanted To Know
- Meetups Everywhere Spring 2025: Times & Places
- Bayes For Everyone
- ACX Grants 1-3 Year Updates
- Your Review: Joan of Arc
- In Defense Of The Amyloid Hypothesis
- Links For September 2025
- Why AI Safety Won’t Make America Lose The Race With China
- Vibecession: Much More Than You Wanted To Know
- Open Thread 415
- Mantic Monday: The Monkey’s Paw Curls
- Highlights From The Comments On Scott Adams
- AMA (Ask Machines Anything)
- The Pentagon Threatens Anthropic
- Mantic Monday: Groundhog Day
- SEIU Delenda Est
Related Pages
- US (29 shared issues)
- Twitter (26 shared issues)
- OpenAI (23 shared issues)
- China (22 shared issues)
- United States (20 shared issues)
- facebook (19 shared issues)
- Scott (19 shared issues)
- Trump (18 shared issues)
- California (17 shared issues)
- Anthropic (16 shared issues)
- Elon Musk (15 shared issues)
- Metaculus (14 shared issues)
Source Context
Recovered passages from the original issue text. When the raw archive preserved outbound links inside the source passage, they are listed directly under the quote.
The essence of Moloch is that if you want to win intense competitions, you have to optimize for winning intense competitions - not for some unrelated thing like giving good medical advice. Google apparently has hard-coded into their search algorithm that WebMD should be on the front page for any medical-related search; I would say they have handily won the intense competition that they're in. They must have placated a wide variety of stakeholders and fought off a wide variety of attackers; each of those victories took a minor change to their medical information or their procedures for producing medical information. Repeat a thousand times, and they're on top of the world, and also every diagnosis is "cancer" and every drug's side effects are "everything".
Inline links: Moloch
Dr. Fauci (and WebMD) are legibly good (or at least legibly okay). They sit on a giant golden throne, with a giant neon arrow pointing to them saying "TRUST THIS GUY". If a random shmuck who doesn't know anything about anything Googles "who should I trust about COVID?", Google will return Dr. Fauci's name. This is a position of great power; Dr. Fauci is able to make decisions that will affect billions of dollars in wealth, Senate seats, Twitter likes, and other extremely valuable resources. Thousands of people who would prefer that they get the dollars and seats and likes will be gunning for him. In order to stay on that throne, Dr. Fauci will need to get and keep lots of powerful allies (plus be the sort of person who thinks in terms of how to get allies rather than being minimaxed for COVID-prediction). This interferes with his COVID predicting ability, but in the current system there’s no alternative. You can't trivially put Zvi on that throne, any more than you could trivially make Zvi benevolent dictator of the world (another job I think he would be good at). One of the big differences between good and bad systems of government is how much they rely on corruption vs. meritocracy in putting people on those thrones, and our system of government is only mediocre. As the saying goes, "there are no First World countries".
My guess is that somebody who's chosen the far end of this tradeoff naturally ends up as the stereotypical "aspie engineer", who's very smart, a bit off, but not so far gone he can't hold down his job at Google.
ECON/TECH
14. Gamestop stock price still above $100: 50%
15. Bitcoin above 100K: 40%
16. Ethereum above 5K: 50%
17. Ethereum above 0.05 BTC: 70%
18. Dow above 35K: 90%
19. ...above 37.5K: 70%
20. Unemployment above 5%: 40%
21. Google widely allows remote work, no questions asked: 20%
22. Starship reaches orbit: 60%
This corresponds to the middle and end of the Internet atheist movement, and some of the same dynamics I discussed in my article there apply here as well, especially the slow shift from 2000s-era "argument culture" to 2010s-era "echo culture". The very early Internet had pro-argument norms; it was your god-given right to march into any blog or forum you wanted and tell the people there why they were wrong. Partly this was the inevitable effect of everyone on the early Internet being the sort of programming nerds willing to try this weird new invention. And partly it came from a utopian philosophy where the Internet was going to be a new medium that united humanity regardless of nation or creed in a great Republic Of The Intellect, or whatever. Maybe it was even partly due to naivete - a lot of people hadn't really met anyone who thought differently from them before, and assumed that changing people’s minds would be really easy. For whatever reason, the early Internet was a place for polite but insistent debate, and early websites centered around the needs of a debating community. The most obvious example was TalkOrigins' massive alphabetized database of arguments against creationist claims, with the explicit goal of helping people win debates with creationists.
But what does Google Trends have to say?
I'm not saying there's literally only one thing the Internet gets in fights about at any given time. The Internet fights about lots of things. But intuitively it feels like there's kind of a power law distribution where one topic clearly outstrips the others - maybe not winner-take-all, but at least winner-take-most. I think you could describe the last twenty years of Internet history as going through three phases - one dominated by religion, one dominated by gender, and now one dominated by race. The race phase seems to have peaked in 2018 and started declining, before being given new life by George Floyd and BLM. The Google Trends results raise the tantalizing possibility that racial issues can’t keep increasing forever. They could eventually crash the same way religious and gender issues did (probably to be replaced by something else even more divisive and awful).
See also the (free) Mediterranean Diet cookbook and the (non-free) book by Professor Felice Jacka on using modified Mediterranean diets in depression. This is a slightly altered version of the Mediterranean diet, which is also recommended by cardiologists, endocrinologists, etc – see here for more about the general medical context.
Inline links: (free) Mediterranean Diet cookbook, (non-free) book by Professor Felice Jacka on using modified Mediterranean diets in depression, see here
Intellicare is a series of CBT apps; you can download it for free as “Intellicare Hub” here or on the Google or Apple stores. I have never tried it, but the Carlat Report says nice things about it, and it has several successful studies under its belt.
Inline links: here, several successful studies
28. https://services.google.com/fh/files/misc/google_2019-environmental-report.pdf
2: Since the last post, meetups have been added in Bangalore, Tokyo, and Ife (Nigeria), and a lot of existing meetups have changed times/dates/places, so please check the spreadsheet to make sure the last thing you read about meetups is still up to date.
Inline links: check the spreadsheet
I hear that Google tests prospective employees with weird vaguely-science-related riddles. If I were in charge of this, here's what I would ask:
The headquarters of Google, one of the richest corporations in the world. A third-rate 1500s merchant would be ashamed to live anywhere as bare. So (continues the conspiracy) probably we suffered some kind of apocalypse a hundred-ish years ago. Our elites are keeping it quiet, and have altered the records, but they haven’t been able to destroy all the buildings of the lost world. Their cover story is that technology and wealth level haven’t regressed or anything, those kinds of buildings have just “gone out of style”.
Some people were very insistent about this, saying that I was stupid for not having read the primary sources where post-WWII builders explain that this is exactly what they are doing. I admit that my claim in the original post that I hadn’t heard people say this before had more to do with my own ignorance than with what other people had said, and I will try to read those sources, but I think these people’s strident tone is a bit misplaced. Even granting that many people said that, there still seems to be a mystery here. Did Google think about the horrors of WWII before deciding to build a kind of ugly headquarters? Did some random 1990s suburb think about the horrors of WWII before deciding to build a kind of ugly City Hall?
1) Unlawful entry only accounts for a small portion of illegal immigration in the US, with most immigrants entering legally and overstaying their visas. [This article](https://www.theatlantic.com/international/archive/2019/04/real-immigration-crisis-people-overstaying-their-visas/587485/) was the first Trump-era hit on the subject I found on Google. As far as I can tell, the current numbers are similar, with there being about double the number of visa overstays annually as unlawful entries. Obviously a wall would only affect unlawful entry.
The questions I most often had after reading people’s applications were “why would this be good?”, “why isn’t this a for-profit startup?”, “but what actual, concrete things are you going to do?”, and “if you care so much about this and you’re a software engineer at Google and it only costs $1000 why haven’t you just funded it yourself?” If your applications answer those questions, you’ll have a better chance of getting accepted, or at least of saving yourself an email conversation with me about them.
The Wojcickis had the unfair advantage that Google was founded in their garage, which gave them some pretty great networking opportunities. For the record, the sisters’ father is a Stanford physicist, and their mother is an educator who has leveraged her children’s fame into a book, How To Raise Successful People. Not gonna lie, I’m pretty tempted to read this.
I agree it’s awkward that we can only do these calculations well with Nobels (and maybe Olympic medalists?). A really rigorous attempt at this would try to find some way of quantifying extreme but not Nobel-level talent. Maybe Google Trends volume or number of hits on their Wikipedia page? With some kind of scaling factor based on recency or being in fields that tend to get lots of searches and Wikipedia hits?
New from Google this month: Creating A Prediction Market On Google Cloud. Google announces that they’ve been running an internal prediction market for the past year, with “over 175,000 predictions from over 10,000 Google employees”.
Inline links: Creating A Prediction Market On Google Cloud
Most of it’s classified because they’re predicting stuff about Google’s corporate secrets, but some friendly Googlers were at least willing to walk me through the article and clarify pieces I didn’t understand.
The market, called Gleangen, is actually the second prediction market Google’s tried. The first, in 2007, was called Prophit - the team included occasional ACX commenter Patri Friedman, who’s since moved into the charter city space.
The outcome is measured in some kind of Google mobility data, but that’s irrelevant. The question is how long it will take to go back to normal after the coronavirus.
3: Congratulations to Google’s new prediction market team for making the front page of Hacker News twice last week! A good demonstration that there’s a lot of interest in this field.
Inline links: the front page of Hacker News, twice
ECON/TECH
14. Gamestop stock price still above $100: 50%
15. Bitcoin above 100K: 40%
16. Ethereum above 5K: 50%
17. Ethereum above 0.05 BTC: 70%
18. Dow above 35K: 90%
19. ...above 37.5K: 70%
20. Unemployment above 5%: 40%
21. Google widely allows remote work, no questions asked: 20%
22. Starship reaches orbit: 60%
ECON/TECH
11. Gamestop stock price still above $100: 30%
12. Bitcoin above 100K: 20%
13. Ethereum above 5K: 20%
14. Ethereum above 0.05 BTC: 90%
15. Bored Ape floor price here below current price of $203K: 40%
16. Dow above 35K: 90%
17. ...above 37.5K: 40%
18. Inflation for the year below five percent: 90%
19. Unemployment below five percent: 50%
20. Google widely allows remote work, no questions asked: 50%
21. Starship reaches orbit: 90%
Inline links: here
If you want, you can go to their form and predict the same set of questions I did (minus the personal and redacted ones). Use the same rules I did: no peeking at the prediction markets, and no more than five minutes of research per question. If you don’t know anything about a question, you can leave it blank and it will get filled with my prediction by default.
Inline links: their form
- Read the contest description/rules here - Give feedback on the contest here - And once again, the form where you take the contest is here
#2: Understand The Texture Of Pain The project consists of writing software to edit textures in real-time in a browser using texture synthesis techniques, exploring ways of narrowing the state-space, and funding user testing to determine the usefulness of the method. We would like to demonstrate as a proof of concept how different medical conditions which have similar symptomatologies at the surface-level (e.g. “stabbing pain in shoulder”) show up as recognizably different textures when visualized with this technique. We think this will significantly contribute to foundational research on pain with applications for medical diagnosis, as well as pain management and treatment. We also think that these visualizations will advance our understanding of the true meaning of pain scales: we will be collecting self-reported pain levels in reference to clinically-used scales and correlating them to the properties of the visualized pains. For example, we may find that a certain pain described as “2/10” could match a visualization of 10 pin-pricks per second, while a pain of the same type described as “3/10” could match 50 pin-pricks/s, and a “4/10” pain could match 250 pin-pricks/s, and so on. I.e. these visualizations might provide a very grounded + transparent way to show that pain scales are non-linear, and possibly logarithmic (Gómez-Emilsson, 2019, see: https://tinyurl.com/ha834tpm) in nature. See full proposal here: https://docs.google.com/document/d/1zLWyxhOMNqHp8tGqOAK2aABHOXpMNj_tmQbIy8Fdlbk/edit?usp=sharing
#48: Research Transparency AUDITS Of Published PAPERS Today’s scientists are rewarded for QUANTITY at the expense of QUALITY, causing serious quality control problems in science. In a fresh attempt to solve this problem, we are boldly conducting the world’s first researcher transparency audits, in combination with using unique rewards to NUDGE authors to increase their transparency. This uniquely addresses the needs of the established professor market while also catering to the needs of junior scientists in the emerging open science market. We are seeking a new round of funding so that we can (1) scale up and improve our apps and (2) operate a small auditing team to conduct ongoing transparency audits at a global scale. We’re excited to move forward on our MISSION to scale up our disruptive transparency author apps, so we can achieve our VISION of a transformed research world brimming with high-quality scientific evidence (for more details, see our 4-page funding proposal https://docs.google.com/document/d/1fiv6t0izX7z4F5kuPiLpzeyBtV4GwLRjaMODUj5EpSg/edit?usp=sharing ). We're looking for seed funding in the $50K to $150K range. If you can provide funding or advice, please email contact@curatescience.org
#60: Empower People To Understand And Reform Public Policy PolicyEngine is a tech nonprofit that empowers people to understand and reform public policy. Last year, we launched our open source UK web app (https://policyengine.org), which lets anyone see their benefit eligibility and tax liability, and then calculate the personalized and society-wide impacts of changing tax and benefit rules. Policymakers from multiple parties use PolicyEngine to improve their institutional decision-making, and individuals are using it to explore policy reforms and hold leaders accountable. Our founders are Max Ghenis, a US-based former Google data scientist and MIT-trained economist who previously founded the UBI Center basic income research organization, and Nikhil Woodruff, a former data scientist on leave from a MSc in Computer Science at Durham University in the UK. Our board of advisors includes economists with experience in academia, think tanks, and government, as well as tech leaders. Now we're seeking $100,000 to build PolicyEngine US over six months. We're fiscally sponsored by the PSL Foundation (https://psl-foundation.org), a 501(c)3. We've provided more information at https://proposal.policyengine.org and you can reach us at max@policyengine.org.
That vision was . . . maybe 25% achieved? It’s pretty great that I can write a blog like this instead of begging for my supper at a major media organization. But after a brief period of discombobulation, dictatorships found it easy to create their own walled-garden Internets through light-touch censorship; although there are ways around most of their tricks, ordinary people don’t bother with them (very poor news indeed!) And in practice most people ended up basing their Internet explorations at a few big businesses like Google, Facebook, and Twitter, which became easy prey for censors and in some cases rush to self-censor even more zealously than governments demand. It’s not that the Internet can’t create a magical censorship-resistant infrastructure, it’s that it’s 5% easier to sell your soul to FAANG, and so many people take that option that the few people who don’t aren’t really a critical mass for escaping governments or building new communities.
Inline links: don’t bother with them
#83: Detect And Fight Healthcare Fraud Our company is using data to detect fraud against the government. Access to quality healthcare is dwindling in the United States. There is an estimated hundred billion dollars in fraud every year leading to lower standards of care and making healthcare unaffordable. We’re seeking a hundred thousand dollars to buy data from the Centers for Medicare and Medicaid Services. This will allow us to find fraud and file lawsuits on behalf of the government. The Department of Justice signaled a new level of support for independent companies using data methods to identify fraud in June of last year when they picked up a case brought by Integra Med Analytics. For the past twelve months we’ve been working with attorneys specializing in this area (qui tam). We’ve been consolidating data returned from broad FOIA requests and begun assisting law firms with data science. Our team combines broad technical expertise (Google, NASA, LANL, NIST, UC Berkeley) with business acumen and investigative experience. The three of us have been working together on projects with positive externalities for five years. Previous successful projects include providing flexible housing, and micro-targeting methods for political action. [Contact erbahr@gmail.com if you can help]
#113: Increase Own Intelligence, Then Write About How My name is David Gretzschel and I want money to increase my own intelligence full-time for about a year. Once I have succeeded (more than I already have), I will teach others how to do this. The benefits of this are obvious. And I already know how to do that for the most part. I have a concrete foundation in the form of a synesthetic encoding scheme, that I can build on. I merely need the time to do an intense amount of training without being distracted by either having a job or not having one and starving. And practice how to use them on various mathematical and computational problems. And a bunch of other things. Details are in the long pitch (see below). So I need 20.000 dollars to not worry about rent and food for that time. Please send them in Bitcoin here: 3Qcm3UJRuFca1fTkf2iPPEkU3PevpzPuwP I certainly would have use for more money, too. (though it'd not be necessary, I don't want to dissuade you from it, if that's an option) So do feel free to shower me with the stuff, if you have it and believe in my cause. (or you only believe in it 10%, but know that the expected value calculation still ends up with a happy face /pascal-mugging) With 10.000 dollars I'd still commit to a year, though that would be a bit tighter than I’d like. The longer pitch is here: https://docs.google.com/document/d/170WETB6enUOzQEzwbwmOCVHz9VkBe4R86rCh_ewvOcg/edit?usp=sharing . If you have further questions/conditions/need more persuasion, send an email to: davidgretzschel@gmail.com
8: Economist: Why Brahmins Lead Western Firms But Rarely Indian Ones. Brahmins are the highest Indian caste; in India they tend to be academics/lawyers/etc, but in the US they are disproportionately likely to become CEOs (including the current leaders of Google and Microsoft). Article theorizes that this is a combination of more business-related Indian castes having better networking within India (so motivated Brahmins tend to go abroad), Brahmins being good at the traditional academic pathway that lends itself well to immigration, plus maybe affirmative action against them in India. Here’s a rebuttal I link to out of duty, but I’m not sure it’s worth wading through the woke outrage to get to the two or three mildly interesting facts (Brahmins started immigrating before India’s affirmative action really ramped up, and they might have a first-mover advantage from building immigrant communities earlier).
Inline links: Why Brahmins Lead Western Firms But Rarely Indian Ones, Here’s
9: Most previous studies of preschool found zero to negative effects on academic achievement, but potentially positive effects on nonacademic outcomes like discipline and grit. A big new study of lower-income children (h/t Samuel Hammond) confirms negative effects on academic achievement but also finds negative effects on non-academic outcomes. I have yet to look at it closely enough to have a good theory of what’s going on here, or whether parents should be trying to keep their kids out of preschool.
Inline links: big new study
11: [edited to add: also an AI Governance curriculum here]
Inline links: here
The Open Philanthropy Project ("Open Phil") is a big effective altruist foundation interested in funding AI safety. It's got $20 billion, probably the majority of money in the field, so its decisions matter a lot and it’s very invested in getting things right. In 2020, it asked senior researcher Ajeya Cotra to produce a report on when human-level AI would arrive. It says the resulting document is "informal" - but it’s 169 pages long and likely to affect millions of dollars in funding, which some might describe as making it kind of formal. The report finds a 10% chance of “transformative AI” by 2031, a 50% chance by 2052, and an almost 80% chance by 2100.
Inline links: Open Philanthropy Project, https://drive.google.com/drive/u/1/folders/15ArhEPZSTYU8f012bs6ehPS6-xmhtBPP
Source: This document by Paul Christiano.
Ajeya combines this with another metric where they see how existing AI compares to animals with apparently similar computational capacity; for example, she says that DeepMind’s Starcraft engine has about as much inferential compute as a honeybee and seems about equally subjectively impressive. I have no idea what this means. Impressive at what? Winning multiplayer online games? Stinging people? In any case, they decide to penalize AI by one order of magnitude compared to Nature, so a human-level AI would need to do 10^16 floating point operations per second.
How Much Compute Would It Take To Train A Model That Does 10^16 Floating Point Operations Per Second?
So an AI could potentially equal the human brain with 10^16 FLOP/S. Good news! There’s a supercomputer in Japan that can do 10^17 FLOP/S! It looks like this (source)
So why don’t we have AI yet? Why don’t we have ten AIs? In the modern paradigm of machine learning, it takes very big computers to train relatively small end-product AIs. If you tried to train GPT-3 on the same kind of medium-sized computers you run it on, it would take between tens and hundreds of years. Instead, you train GPT-3 on giant supercomputers like the ones above, get results in a few months, then run it on medium-sized computers, maybe ~10x better than the average desktop.
But our hypothetical future human-level AI is 10^16 FLOP/S in inference mode. It needs to run on a giant supercomputer like the one in the picture. Nothing we have now could even begin to train it.
There’s no direct and obvious way to convert inference requirements to training requirements. Ajeya tries assuming that each parameter will contribute about 10 FLOPs, which would mean the model would have about 10^15 parameters (GPT-3 has about 10^11 parameters).
Finally, she uses some empirical scaling laws derived from looking at past machine learning projects to estimate that training 10^15 parameters would require H*10^30 FLOPs, where H represents the model’s “horizon”. If I understand this correctly, “horizon” is a reinforcement learning concept: how long does it take to learn how much reward you got for something? If you’re playing a slot machine, the answer is one second. If you’re starting a company, the answer might be ten years. So what horizon do you need for human level AI? Who knows? It probably depends on what human-level task you want the AI to do, plus how well an AI can learn to do that task from things less complex than the entire task. If writing a good book is mostly about learning to write good sentences and then stringing them together, a book-writing AI can get away with a short horizon. If nothing short of writing an entire book and then evaluating it to see whether it is good or bad can possibly teach you book-writing, the AI will need a long time horizon. Ajeya doesn’t claim to have a great answer for this, and considers three models: horizons of a few minutes, a few hours, and a few years. Each step up adds another three orders of magnitude, so she ends up with three estimates of 10^30, 10^33, and 10^36 FLOPs. (for reference, the lowest training estimate - 10^30 - would take the supercomputer pictured above 300,000 years to complete; the highest, 300 billion.)
Or What If We Ignore All Of That And Do Something Else?
This is piling a lot of assumptions atop each other, so Ajeya tries three other methods of figuring out how hard this training task is. Humans seem to be human-level AIs. How much training do we need? You can analogize our childhood to an AI’s training period. We receive a stream of sense-data. We start out flailing kind of randomly. Some of what we do gets rewarded. Some of what we do gets punished. Eventually our behavior becomes more sophisticated.
We subject our new behavior to reward or punishment, fine-tune it further. Rent asks us: how do you measure the life of a woman or man? It answers: “in daylights, in sunsets, in midnights, in cups of coffee; in inches, in miles, in laughter, in strife.” But you can also measure in floating point operations, in which case the answer is about 10^24. This is actually trivial: multiply the 10^15 FLOP/S of the human brain by the ~10^9 seconds of childhood and adolescence. This new estimate of 10^24 is much lower than our neural net estimate of 10^30 - 10^36 above. In fact, it’s only a hair above the amount it took to train GPT-3! If human-level AI was this easy, we should have hit it by accident sometime in the process of making a GPT-4 prototype. Since OpenAI hasn’t mentioned this, probably it’s harder than this and we’re missing something.
Probably we’re missing that humans aren’t blank slates. We don’t start at zero and then only use our childhood to train us further. The very structure of our brain encodes certain assumptions about what kinds of data we should be looking out for and how we should use it. Our training data isn’t just what we observed during childhood, it’s everything that any of our ancestors observed during evolution. How many floating-point operations is the evolutionary process? Ajeya estimates 10^41. I can’t believe I’m writing this. I can’t believe someone actually estimated the number of floating point operations involved in jellyfish rising out of the primordial ooze and eventually becoming fish and lizards and mammals and so on all the way to the Ascent of Man. Still, the idea is simple. You estimate how long animals with neurons have been around for (10^16 seconds), total number of animals at any given second (10^20) times average number of FLOP/S per animal (10^5) and you can read more here but it comes out to 10^41 FLOPs.
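The order-of-magnitude arithmetic in the passage above can be checked with a short Python sketch. The figures are the ones quoted in the text; the variable names and the helpers `years_on_supercomputer` and `order_of_magnitude` are illustrative assumptions rather than anything from the report itself, and the results are only meant to be read to the nearest power of ten.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # roughly 3.16e7 seconds

# Figures quoted in the passage (orders of magnitude, not precise values).
INFERENCE_FLOP_PER_S = 1e16      # human-level AI after the one-order-of-magnitude penalty
FLOP_PER_PARAMETER = 10          # "each parameter will contribute about 10 FLOPs"
SUPERCOMPUTER_FLOP_PER_S = 1e17  # the Japanese supercomputer mentioned in the text

# Implied parameter count: inference cost divided by FLOPs per parameter.
parameters = INFERENCE_FLOP_PER_S / FLOP_PER_PARAMETER

# Neural-net training estimates for the three horizons named in the passage.
HORIZON_TRAINING_FLOP = {"minutes": 1e30, "hours": 1e33, "years": 1e36}


def years_on_supercomputer(total_flop: float) -> float:
    """Years the quoted supercomputer would need to perform total_flop operations."""
    return total_flop / SUPERCOMPUTER_FLOP_PER_S / SECONDS_PER_YEAR


def order_of_magnitude(value: float) -> int:
    """Nearest power of ten, which is the precision the passage works at."""
    return round(math.log10(value))


# Lifetime anchor: brain FLOP/S times seconds of childhood and adolescence.
lifetime_flop = 1e15 * 1e9

# Evolution anchor: seconds of neural evolution x animals alive x FLOP/S per animal.
evolution_flop = 1e16 * 1e20 * 1e5

if __name__ == "__main__":
    print("parameters: 10^%d" % order_of_magnitude(parameters))
    for horizon, flop in HORIZON_TRAINING_FLOP.items():
        print("%s horizon: %.3g years on the supercomputer"
              % (horizon, years_on_supercomputer(flop)))
    print("lifetime anchor: 10^%d FLOPs" % order_of_magnitude(lifetime_flop))
    print("evolution anchor: 10^%d FLOPs" % order_of_magnitude(evolution_flop))
```

Run as a script, this prints each estimate so the quoted figures (10^15 parameters, roughly 300,000 and 300 billion years, 10^24 and 10^41 FLOPs) can be compared against the text.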
I would not call this an exact estimate - for one thing, it assumes that all animals are nematodes, on the grounds that non-nematode animals are basically a rounding error in the grand scheme of things. But it does justify this bizarre assumption, and I don’t feel inclined to split hairs here - surely the total amount of computation performed by evolution is irrelevant except as an extreme upper bound? Surely the part where Australia got all those weird marsupials wasn’t strictly necessary for the human brain to have human-level intelligence?
One more weird human training data estimate attempt: what about the genome? If in some sense a bit of information in the genome is a “parameter”, how many parameters does that suggest humans have, and how does it affect training time? Ajeya calculates that the genome has about 7.5x10^8 parameters (compared to 10^15 parameters in our neural net calculation, and 10^11 for GPT-3). So we can… Okay, I’ve got to admit, this doesn’t have quite the same “huh?!” factor as trying to calculate the number of FLOPs in evolution, but it is in a lot of ways even crazier. The Japanese canopy plant has a genome fifty times larger than ours, which suggests that genome size doesn’t correspond very well to organism awesomeness. Also, most of the genome is coding for weird proteins that stabilize the shape of your kidney tubule or something, why should this matter for intelligence?
The Japanese canopy plant. I think it is very pretty, but probably low prettiness per megabyte of DNA.
I think Ajeya would answer that she’s debating orders of magnitude here, and each of these weird things costs only a few OOMs and probably they all even out.
That still leaves the question of why she thinks this approach is interesting at all, to which she answers that:
“The motivating intuition is that evolution performed a search over a space of small, compact genomes which coded for large brains rather than directly searching over the much larger space of all possible large brains, and human researchers may be able to compete with evolution on this axis.”
So maybe instead of having to figure out how to generate a brain per se, you figure out how to generate some short(er) program that can output a brain? But this would be very different from how ML works now. Also, you need to give each short program the chance to unfold into a brain before you can evaluate it, which evolution has time for but we probably don’t. Ajeya sort of mentions these problems and counters with an argument that maybe you could think of the genome as a reinforcement learner with a long horizon. I don’t quite follow this but it sounds like the sort of thing that almost might make sense. Anyway, when you apply the scaling laws to a 7.5x10^8 parameter genome and penalize it for a long horizon, you get about 10^33 FLOPs, which is weirdly similar to some of the other estimates.
So now we have six different training cost estimates. First, neural nets with short, medium, and long horizons, which are 10^30, 10^33, and 10^36 FLOPs, respectively. Next, the amount of training data in a human lifetime - 10^24 FLOPs - and in all of evolutionary history - 10^41 FLOPs. And finally, this weird genome thing, which is 10^33 FLOPs.
An optimist might say “Well, our lowest estimate is 10^24 FLOPs, our highest is 10^41 FLOPs, those sound like kind of similar numbers, at least there’s no ‘5 FLOPs’ or ‘10^9999 FLOPs’ in there.”
A pessimist might say “The difference between 10^24 and 10^41 is seventeen orders of magnitude, ie a factor of 100,000,000,000,000,000.
This barely constrains our expectations at all!”
Before we decide who to trust, let’s remember that we’re still only at Step 2 of our eight step Methodology, and continue.
How Do We Adjust For Algorithmic Progress?
So today, in 2022 (or in 2020 when this was written, or whenever), assume it would take about 10^33 FLOPs to train a human-level AI. But technology constantly advances. Maybe we’ll discover ways to train AIs faster, or run AIs more efficiently, or something like that. How does that factor into our estimate?
Ajeya draws on Hernandez & Brown’s Measuring The Algorithmic Efficiency Of Neural Networks. They look at how many FLOPs it took to train various image recognition AIs to an equivalent level of performance between 2012 and 2019, and find that over those seven years it decreased by a factor of 44x, ie training efficiency doubles every sixteen months! Ajeya assumes a doubling time slightly longer than that, because it’s easier to make progress in simple well-understood fields like image recognition than in the novel task of human-level AI. She chooses a doubling time of “merely” 2 - 3 years.
If training efficiency doubles every 2-3 years, it would dectuple in about 10 years. So although it might take 10^33 FLOPs to train a human level AI today, in ten years or so it may take only 10^32, in twenty years 10^31, and so on.
When Will Anyone Have Enough Computational Resources To Train A Human-Level AI?
In 2020, AI researchers could buy computational resources at about $1 for 10^17 FLOPs. That means the 10^33 FLOPs you’d need to train a human-level AI would cost $10^16, ie ten quadrillion dollars. This is about twenty times more money than exists in the entire world.
But compute costs fall quickly. Some formulations of Moore’s Law suggest the cost halves every eighteen months. These no longer seem to hold exactly, but it does seem to be halving maybe once every 2.5 years.
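The conversions in the passage above (a 44x gain over seven years into a doubling time, a doubling time into "years per tenfold improvement", and FLOPs into dollars) can be sketched in a few lines. The function names are hypothetical and the inputs are the figures quoted in the text; the endpoint-only doubling-time calculation here is a simplification, so it lands a little under the sixteen months the paper reports.

```python
import math


def doubling_time_months(improvement_factor: float, years: float) -> float:
    """Doubling time implied by a total improvement over a span of years."""
    doublings = math.log2(improvement_factor)
    return years * 12 / doublings


def years_to_improve(factor: float, doubling_time_years: float) -> float:
    """Years needed to improve by `factor` at a fixed doubling time."""
    return math.log2(factor) * doubling_time_years


def training_cost_dollars(training_flop: float, flop_per_dollar: float) -> float:
    """Dollar cost of a training run at a given price of compute."""
    return training_flop / flop_per_dollar


if __name__ == "__main__":
    # 44x efficiency gain between 2012 and 2019 (seven years).
    print(f"implied doubling time: {doubling_time_months(44, 7):.1f} months")

    # Ajeya's slower 2-3 year doubling time, expressed as years per 10x.
    for doubling in (2.0, 2.5, 3.0):
        print(f"{doubling} yr doubling -> 10x every "
              f"{years_to_improve(10, doubling):.1f} years")

    # 10^33 FLOPs at the quoted 2020 price of 10^17 FLOPs per dollar.
    print(f"2020 cost: ${training_cost_dollars(1e33, 1e17):.0e}")
```

With a doubling time of two to three years, a tenfold gain takes somewhere between roughly seven and ten years, which is where the text's "about 10 years" comes from.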
The exact number is kind of controversial: Ajeya admits it’s been more like once every 3-4 years lately, but she heard good things about some upcoming chips and predicted it might revert back to the longer-term faster trend (it’s been two years now, some new chips have come out, and this prediction is looking pretty good).
So as time goes on, algorithmic progress will cut the cost of training (in FLOPs), and hardware progress will also cut the cost of FLOPs (in dollars). So training will become gradually more affordable as time goes on. Once it reaches a cost somebody is willing to pay, they’ll buy human-level AI, and then that will be the year human-level AI happens.
What is the cost that somebody (company? government? billionaire?) is willing to pay for human-level AI?
The most expensive AI training in history was AlphaStar, a DeepMind project that spent over $1 million to train an AI to play StarCraft (in their defense, it won). But people have been pouring more and more money into AI lately:
[Graph not shown; source here.]
This is about compute rather than cost, but most of the increase seen here has been companies willing to pay for more compute over time, rather than algorithmic or hardware progress.
The StarCraft AI was kind of a vanity project, or science for science’s sake, or whatever you want to call it. But AI is starting to become profitable, and human-level AI would be very profitable. Who knows how much companies will be willing to pay in the future?
Ajeya extrapolates the line on the graph forward to 2025 and gets $1 billion. This is starting to sound kind of absurd - the entire company OpenAI was founded with $1 billion in venture capital; it seems like a lot to expect them to spend more than $1 billion on a single training run. So Ajeya backs off from this after 2025 and predicts a “two year doubling time”. This is not much of a concession. It still means that in 2040 someone might be spending $100 billion to train one AI.
Is this at all plausible?
At the height of the Manhattan Project, the US was investing about 0.5% of its GDP into the effort; a similar investment today would be worth $100 billion. And we’re about twice as rich as we were in 2000, so 2040 might be twice as rich as we are. At that point, $100 billion for training an AI is within reach of Google and maybe a few individual billionaires (though it would still require most or all of their fortune).
Ajeya creates a complicated function to assess how much money people will be willing to pay on giant AI projects per year. This looks like an upward-sloping curve. The line representing the likely cost of training a human-level AI looks like a downward sloping curve. At some point, those two curves meet, representing when human-level AI will first be trained.
So When Will We Get Human-Level AI?
The report gives a long distribution of dates based on weights assigned to the six different models, each of which has really wide confidence intervals and options for adjusting the mean and variance based on your assumptions. But the headline result of all of that is a 10% chance by 2031, a 50% chance by 2052, and an almost 80% chance by 2100.
Ajeya takes her six models and decides to weight them like so, based on how plausible she thinks each one is:
- 20% neural net, short horizon
- 30% neural net, medium horizon
- 15% neural net, long horizon
- 5% human lifetime as training data
- 10% evolutionary history as training data
- 10% genome as parameter number
She ends up with this: [Graph not shown.]
How Sensitive Is This To Changes In Assumptions?
She very helpfully gives us a Colab notebook and Google spreadsheet to play around with. The notebook lets you change some of the more detailed parameters of the individual models, and the spreadsheet lets you change the big picture. I leave the notebook to people more dedicated to forecasting than I am, and will talk about the spreadsheet here.
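The crossing-curves argument from earlier in this section (a falling cost-of-training curve meeting a rising willingness-to-pay curve) can be turned into a toy point estimate. To be clear about what this is: a minimal sketch using only the single round numbers quoted in the text (10^33 FLOPs, $1 per 10^17 FLOPs in 2020, 2.5-year halving times for both algorithms and hardware, $1 billion of spending in 2025 doubling every two years). The function names and default parameters are assumptions for illustration. It is not Ajeya's model, which works with full probability distributions, six weighted anchors, and spending tied to GDP, so its output should not be read as reproducing her 2052 median.

```python
from typing import Optional


def training_cost_dollars(year: float,
                          base_year: int = 2020,
                          flop_needed: float = 1e33,
                          flop_per_dollar: float = 1e17,
                          algo_halving_years: float = 2.5,
                          hardware_halving_years: float = 2.5) -> float:
    """Dollar cost of the training run if it were started in `year`."""
    elapsed = year - base_year
    # Algorithmic progress shrinks the FLOPs required.
    flop = flop_needed * 0.5 ** (elapsed / algo_halving_years)
    # Hardware progress grows the FLOPs one dollar buys.
    price = flop_per_dollar * 2 ** (elapsed / hardware_halving_years)
    return flop / price


def willingness_to_pay(year: float,
                       anchor_year: int = 2025,
                       anchor_dollars: float = 1e9,
                       doubling_years: float = 2.0) -> float:
    """Largest single training budget anyone will fund in `year`."""
    return anchor_dollars * 2 ** ((year - anchor_year) / doubling_years)


def first_affordable_year(start: int = 2025, end: int = 2100) -> Optional[int]:
    """First year in which the budget curve meets or exceeds the cost curve."""
    for year in range(start, end + 1):
        if willingness_to_pay(year) >= training_cost_dollars(year):
            return year
    return None


if __name__ == "__main__":
    year = first_affordable_year()
    print("toy crossing year:", year)
    print(f"budget that year: ${willingness_to_pay(year):.2e}")
```

Changing the defaults (for example a 3.5-year hardware halving time, or 10^36 FLOPs for the long-horizon anchor) shows how far the crossing year moves under each assumption, which is the same kind of sensitivity exercise the spreadsheet supports.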
If you’re following along at home, the default spreadsheet won’t reflect Ajeya’s findings until you fill in the table in the bottom left like so: [Image not shown.]
Great. Now that we’ve got that, let’s try changing some stuff.
I like the human childhood training data argument (Lifetime Anchor) more than Ajeya does, and I like the size-of-the-genome argument less. I’m going to change the weights to 20-20-0-20-20-20. Also, Ajeya thinks that someone might be willing to spend 1% of national GDP on training AIs, but that sounds really high to me, so I’m going to lower it to 0.1%. Also, Ajeya’s estimate of 3% GDP growth sounds high for the sort of industrialized nations who might do AI research; I’m going to lower it to 2%. Since I’m feeling mistrustful today, let’s use the Hernandez & Brown estimate for how fast the compute needed for training halves (1.5 years) in place of Ajeya’s ad hoc adjustments. And let’s use the current halving time for the cost of compute (3.5 years) instead of Ajeya’s overly rosy version (2.5 years).
All these changes…
…don’t really do much. The median goes from 2052 to about 2065.
Four of the models give results between 2030 and 2070. The last two, Neural Net With Long Horizon and Evolution, suggest probably no AI this century (although Neural Net With Long Horizon does think there’s a 40% chance by 2100). Ajeya doesn’t really like either of these models and they’re not heavily weighted in her main result.
Does The Truth Point To Itself?
Back up a second. Here’s something that makes me kind of nervous.
Most of Ajeya’s numbers are kind of made up, with several order-of-magnitude error bars and simplifying assumptions like “all animals are nematodes”. For a single parameter, we get estimates spanning seventeen different orders of magnitude: the upper bound is one hundred quadrillion times the lower bound.
And yet four of the six models, including two genuinely exotic ones, manage to get dates within twenty years of 2050.
And 2050 is also the date everyone else focuses on.
Here’s the prediction-market-like site Metaculus: [Graph not shown.]
Their distribution looks a lot like Ajeya’s, and even has the same median, 2052 (though forecasters could have read Ajeya’s report).
Katja Grace et al surveyed 352 AI experts, and they gave a median estimate of 2062 for an AI that could “outperform humans at all tasks” (though with many caveats and high sensitivity to question framing). This was before Ajeya’s report, so they definitely didn’t read it.
So lots of Ajeya’s different methods, and lots of other people presumably using different methodologies or no methodology at all, all converge on this same idea of 2050 give or take a decade or two.
An optimist might say “The truth points to itself! There are 371 known proofs of the Pythagorean Theorem, and they all end up in the same place. That’s because no matter what methodology you use, if you use it well enough you get to the correct answer.”
A pessimist might be more suspicious; we’ll return to this part later.
FLOPS Alone Turn The Wheel Of History
One more question: what if this is all bullshit? What if it’s an utterly useless total garbage steaming pile of grade A crap?
Imagine a scientist in Victorian Britain, speculating on when humankind might invent ships that travel through space. He finds a natural anchor: the moon travels through space! He can observe things about the moon: for example, it is 220 miles in diameter (give or take an order of magnitude). So when humankind invents ships that are 220 miles in diameter, they can travel through space!
Ships have certainly grown in size tremendously, from primitive kayaks to Roman triremes to Spanish galleons to the great ocean liners of the (Victorian) present.
[Graph not shown.] The AI forecasting organization AI Impacts actually has a whole report on historical ship size trends to prove an unrelated point about technological progress, so I didn’t even have to make this graph up.
Suppose our Victorian scientist lived in 1858, right when the Great Eastern was launched.
The trend line for ship size crossed 100m around 1843, and 200m in 1858, so doubling time is 15 years - but perhaps he notices this is going to be an outlier, so let’s round up a bit and say 18 years. The (one order of magnitude off estimate for the size of the) Moon is 350,000m, so you’d need ships to scale up by 350,000/200 = 1,750x before they’re as big as the Moon. That’s about 10.8 doublings, and a doubling time is 18 years, so we’ll get spaceships in . . . 2052 exactly. (fudging numbers to land where you want is actually fun and easy)
[Image caption: SS Great Eastern, the extreme outlier large steamship from 1858. This has become sort of a mascot for quantitative technological progress forecasters.]
What is this scientist’s error? The big one is thinking that spaceship progress depends on some easily-measured quantity (size) instead of on fundamental advances (eg figuring out how rockets work). You can make the same accusation against Ajeya et al: you can have all the FLOPs in the world, but if you don’t understand how to make a machine think, your AI will be, well, a flop.
Ajeya discusses this a bit on page 143 of her report. There is some sense in which FLOPs and knowing-what-you’re-doing trade off against each other. If you have literally no idea what you’re doing, you can sort of kind of re-run evolution until it comes up with something that looks good. If things are somehow even worse than that, you could always run AIXI, a hypothetical AI design guaranteed to get excellent results as long as you have infinite computation. You could run a Go engine by searching the entire branching tree structure of Go - you shouldn’t, and it would take a zillion times more compute than exists in the entire world, but you could. So in some sense what you’re doing, when you’re figuring out what you’re doing, is coming up with ways to do already-possible things more efficiently. But that’s just algorithmic progress, which Ajeya has already baked into her model.
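The Victorian scientist's forecast above is the same doubling-time extrapolation used throughout the report, which is the point of the parable. A minimal sketch of his arithmetic follows; the function name is made up for illustration, and the inputs are exactly the deliberately fudged numbers from the text.

```python
import math


def extrapolated_year(current_size_m: float,
                      target_size_m: float,
                      doubling_years: float,
                      start_year: int) -> float:
    """Year at which an exponentially growing size reaches the target."""
    doublings = math.log2(target_size_m / current_size_m)
    return start_year + doublings * doubling_years


if __name__ == "__main__":
    # 200m ships in 1858, a 350,000m "Moon", and an 18-year doubling time.
    doublings = math.log2(350_000 / 200)
    year = extrapolated_year(200, 350_000, 18, 1858)
    print(f"doublings needed: {doublings:.1f}")
    print(f"moon-sized ships by: {round(year)}")

    # With the unrounded 15-year doubling time the date moves decades earlier,
    # which shows how much one fudged parameter controls the answer.
    print(f"with 15-year doubling: {round(extrapolated_year(200, 350_000, 15, 1858))}")
```

Nothing in the function knows or cares whether size is the quantity that actually matters for space travel, and that is exactly the objection being raised.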
(our Victorian scientist: “As a reductio ad absurdum, you could always stand the ship on its end, and then climb up it to reach space. We’re just trying to make ships that are more efficient than that.”)
Part II: Biology-Inspired AI Timelines: The Trick That Never Works
Eliezer Yudkowsky presents a more subtle version of these kinds of objections in an essay called Biology-Inspired AI Timelines: The Trick That Never Works, published December 2021.
Ajeya’s report is a 169-page collection of equations, graphs, and modeling assumptions. Yudkowsky’s rebuttal is a fictional dialogue between himself, younger versions of himself, famous AI scientists, and other bit players. At one point, a character called “Humbali” shows up begging Yudkowsky to be more humble, and Yudkowsky defeats him with devastating counterarguments. Still, he did found the field, so I guess everyone has to listen to him.
He starts: in 1988, famous AI scientist Hans Moravec predicted human-level AI by 2010. He was using the same methodology as Ajeya: extrapolate how quickly processing power would grow (in FLOP/S), and see when it would match some estimate of the human brain. Moravec got the processing power almost exactly right (it hit his 2010 projection in 2008) and his human brain estimate pretty close (he says 10^13 FLOP/S, Ajeya says 10^15, this 2 OOM difference only delays things a few years), yet there was not human-level AI in 2010. What happened?
Ajeya's answer could be: Moravec didn't realize that, in the modern ML paradigm, any given size of program requires a much bigger program to train. Ajeya, who has a 35-year advantage on Moravec, estimates approximately the same power for the finished program (10^16 vs. 10^13 FLOP/S) but says that training the 10^16 FLOP/S program will require 10^33ish FLOPs.
Eliezer agrees as far as it goes, but says this points to a much deeper failure mode, which was that Moravec had no idea what he was doing.
He was assuming processing power of human brain = processing power of computer necessary for AGI. Why?
“The human brain consumes around 20 watts of power. Can we thereby conclude that an AGI should consume around 20 watts of power, and that, when technology advances to the point of being able to supply around 20 watts of power to computers, we'll get AGI? […]
You say that AIs consume energy in a very different way from brains? Well, they'll also consume computations in a very different way from brains! The only difference between these two cases is that you know something about how humans eat food and break it down in their stomachs and convert it into ATP that gets consumed by neurons to pump ions back out of dendrites and axons, while computer chips consume electricity whose flow gets interrupted by transistors to transmit information. Since you know anything whatsoever about how AGIs and humans consume energy, you can see that the consumption is so vastly different as to obviate all comparisons entirely.
You are ignorant of how the brain consumes computation, you are ignorant of how the first AGIs built would consume computation, but ‘an unknown key does not open an unknown lock’ and these two ignorant distributions should not assert much internal correlation between them.”
Cars don’t move by contracting their leg muscles and planes don’t fly by flapping their wings like birds. Telescopes do form images the same way as the lenses in our eyes, but differ by so many orders of magnitude in every important way that they defy comparison. Why should AI be different? You have to use some specific algorithm when you’re creating AI; why should we expect it to be anywhere near the same efficiency as the ones Nature uses in our brains?
The same is true for arguments from evolution, eg Ajeya’s Evolutionary Anchor, ie “it took evolution 10^41 FLOPs of computation to evolve the human brain so maybe that will be the training cost”.
AI scientists sitting in labs trying to figure things out, and nematodes getting eaten by other nematodes, are such different methods for designing things that it’s crazy to use one as an estimate for the other.
Algorithmic Progress vs. Algorithmic Paradigm Shifts
This post is a dialogue, so (Eliezer’s hypothetical model of) OpenPhil gets a chance to respond. They object: this is why we put a term for algorithmic progress in our model. The model isn’t very sensitive to changes in that term. If you want you can set it to some kind of crazy high value and see what happens, but you can’t say we didn’t consider it.
OpenPhil: We did already consider that and try to take it into account: our model already includes a parameter for how algorithmic progress reduces hardware requirements. It's not easy to graph as exactly as Moore's Law, as you say, but our best-guess estimate is that compute costs halve every 2-3 years […]
Eliezer: The makers of AGI aren't going to be doing 10,000,000,000,000 rounds of gradient descent, on entire brain-sized 300,000,000,000,000-parameter models, algorithmically faster than today. They're going to get to AGI via some route that you don't know how to take, at least if it happens in 2040. If it happens in 2025, it may be via a route that some modern researchers do know how to take, but in this case, of course, your model was also wrong. They're not going to be taking your default-imagined approach algorithmically faster, they're going to be taking an algorithmically different approach that eats computing power in a different way than you imagine it being consumed.
OpenPhil: Shouldn't that just be folded into our estimate of how the computation required to accomplish a fixed task decreases by half every 2-3 years due to better algorithms?
Eliezer: Backtesting this viewpoint on the previous history of computer science, it seems to me to assert that it should be possible to: Train a pre-Transformer RNN/CNN-based model, not using any other techniques invented after 2017, to GPT-2 levels of performance, using only around 2x as much compute as GPT-2;
Inline links: This document, a supercomputer in Japan, source, here, Japanese canopy plant, Measuring The Algorithmic Efficiency Of Neural Networks, here, Colab notebook, Google spreadsheet, Metaculus, surveyed 352 AI experts, a whole report on historical ship size trends, AIXI, Biology-Inspired AI Timelines: The Trick That Never Works
It looks like this (source).
So why don’t we have AI yet? Why don’t we have ten AIs?
In the modern paradigm of machine learning, it takes very big computers to train relatively small end-product AIs. If you tried to train GPT-3 on the same kind of medium-sized computers you run it on, it would take between tens and hundreds of years. Instead, you train GPT-3 on giant supercomputers like the ones above, get results in a few months, then run it on medium-sized computers, maybe ~10x better than the average desktop.
But our hypothetical future human-level AI is 10^16 FLOP/S in inference mode. It needs to run on a giant supercomputer like the one in the picture. Nothing we have now could even begin to train it.
There’s no direct and obvious way to convert inference requirements to training requirements. Ajeya tries assuming that each parameter will contribute about 10 FLOPs, which would mean the model would have about 10^15 parameters (GPT-3 has about 10^11 parameters). Finally, she uses some empirical scaling laws derived from looking at past machine learning projects to estimate that training 10^15 parameters would require H*10^30 FLOPs, where H represents the model’s “horizon”.
If I understand this correctly, “horizon” is a reinforcement learning concept: how long does it take to learn how much reward you got for something? If you’re playing a slot machine, the answer is one second. If you’re starting a company, the answer might be ten years. So what horizon do you need for human level AI? Who knows? It probably depends on what human-level task you want the AI to do, plus how well an AI can learn to do that task from things less complex than the entire task. If writing a good book is mostly about learning to write good sentences and then stringing them together, a book-writing AI can get away with a short horizon.
If nothing short of writing an entire book and then evaluating it to see whether it is good or bad can possibly teach you book-writing, the AI will need a long time horizon.
Ajeya doesn’t claim to have a great answer for this, and considers three models: horizons of a few minutes, a few hours, and a few years. Each step up adds another three orders of magnitude, so she ends up with three estimates of 10^30, 10^33, and 10^36 FLOPs.
(for reference, the lowest training estimate - 10^30 - would take the supercomputer pictured above 300,000 years to complete; the highest, 300 billion.)
Or What If We Ignore All Of That And Do Something Else?
This is piling a lot of assumptions atop each other, so Ajeya tries three other methods of figuring out how hard this training task is.
Humans seem to be human-level AIs. How much training do we need?
You can analogize our childhood to an AI’s training period. We receive a stream of sense-data. We start out flailing kind of randomly. Some of what we do gets rewarded. Some of what we do gets punished. Eventually our behavior becomes more sophisticated. We subject our new behavior to reward or punishment, fine-tune it further.
Rent asks us: how do you measure the life of a woman or man? It answers: “in daylights, in sunsets, in midnights, in cups of coffee; in inches, in miles, in laughter, in strife.” But you can also measure in floating point operations, in which case the answer is about 10^24. This is actually trivial: multiply the 10^15 FLOP/S of the human brain by the ~10^9 seconds of childhood and adolescence. This new estimate of 10^24 is much lower than our neural net estimate of 10^30 - 10^36 above. In fact, it’s only a hair above the amount it took to train GPT-3! If human-level AI was this easy, we should have hit it by accident sometime in the process of making a GPT-4 prototype. Since OpenAI hasn’t mentioned this, probably it’s harder than this and we’re missing something.
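The chain of assumptions behind the three neural-net estimates above (inference speed, then parameter count, then training FLOPs scaled by horizon, then years on a supercomputer) can be laid out step by step. This is a sketch of the arithmetic as summarized in the text, not of the report's actual scaling-law fit. The names are invented for illustration, and the supercomputer speed of 10^17 FLOP/S is an assumption back-derived from the "300,000 years" figure rather than a quoted specification.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

INFERENCE_FLOP_PER_S = 1e16   # hypothetical human-level AI at inference time
FLOP_PER_PARAMETER = 10       # assumed contribution of each parameter
BASE_TRAINING_FLOP = 1e30     # training cost at the shortest horizon

# Each step up in horizon adds three orders of magnitude.
HORIZON_MULTIPLIER = {
    "short (minutes)": 1.0,
    "medium (hours)": 1e3,
    "long (years)": 1e6,
}

# ASSUMPTION: inferred from the text's "300,000 years", not a quoted spec.
ASSUMED_SUPERCOMPUTER_FLOP_PER_S = 1e17


def parameter_count() -> float:
    """Parameters implied by the inference speed and per-parameter cost."""
    return INFERENCE_FLOP_PER_S / FLOP_PER_PARAMETER


def training_flop(horizon: str) -> float:
    """Training FLOPs for one of the three horizon assumptions."""
    return BASE_TRAINING_FLOP * HORIZON_MULTIPLIER[horizon]


def years_on_supercomputer(flop: float) -> float:
    """Wall-clock years to perform `flop` operations on the assumed machine."""
    return flop / ASSUMED_SUPERCOMPUTER_FLOP_PER_S / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"parameters: {parameter_count():.0e}")
    for horizon in HORIZON_MULTIPLIER:
        flop = training_flop(horizon)
        print(f"{horizon}: {flop:.0e} FLOPs, "
              f"{years_on_supercomputer(flop):.1e} years")
```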
Probably we’re missing that humans aren’t blank slates. We don’t start at zero and then only use our childhood to train us further. The very structure of our brain encodes certain assumptions about what kinds of data we should be looking out for and how we should use it. Our training data isn’t just what we observed during childhood, it’s everything that any of our ancestors observed during evolution. How many floating-point operations is the evolutionary process? Ajeya estimates 10^41. I can’t believe I’m writing this. I can’t believe someone actually estimated the number of floating point operations involved in jellyfish rising out of the primordial ooze and eventually becoming fish and lizards and mammals and so on all the way to the Ascent of Man. Still, the idea is simple. You estimate how long animals with neurons have been around for (10^16 seconds), total number of animals at any given second (10^20) times average number of FLOPS per animal (10^5) and you can read more here but it comes out to 10^41 FLOs. I would not call this an exact estimate - for one thing, it assumes that all animals are nematodes, on the grounds that non-nematode animals are basically a rounding error in the grand scheme of things. But it does justify this bizarre assumption, and I don’t feel inclined to split hairs here - surely the total amount of computation performed by evolution is irrelevant except as an extreme upper bound? Surely the part where Australia got all those weird marsupials wasn’t strictly necessary for the human brain to have human-level intelligence? One more weird human training data estimate attempt: what about the genome? If in some sense a bit of information in the genome is a “parameter”, how many parameters does that suggest humans have, and how does it affect training time? Ajeya calculates that the genome has about 7.5x10^8 parameters (compared to 10^15 parameters in our neural net calculation, and 10^11 for GPT-3). 
So we can… Okay, I’ve got to admit, this doesn’t have quite the same “huh?!” factor as trying to calculate the number of FLOs in evolution, but it is in a lot of ways even crazier. The Japanese canopy plant has a genome fifty times larger than ours, which suggests that genome size doesn’t correspond very well to organism awesomeness. Also, most of the genome is coding for weird proteins that stabilize the shape of your kidney tubule or something, why should this matter for intelligence? The Japanese canopy plant. I think it is very pretty, but probably low prettiness per megabyte of DNA. I think Ajeya would answer that she’s debating orders of magnitude here, and each of these weird things costs only a few OOMs and probably they all even out. That still leaves the question of why she thinks this approach is interesting at all, to which she answers that: The motivating intuition is that evolution performed a search over a space of small, compact genomes which coded for large brains rather than directly searching over the much larger space of all possible large brains, and human researchers may be able to compete with evolution on this axis. So maybe instead of having to figure out how to generate a brain per se, you figure out how to generate some short(er) program that can output a brain? But this would be very different from how ML works now. Also, you need to give each short program the chance to unfold into a brain before you can evaluate it, which evolution has time for but we probably don’t. Ajeya sort of mentions these problems and counters with an argument that maybe you could think of the genome as a reinforcement learner with a long horizon. I don’t quite follow this but it sounds like the sort of thing that almost might make sense. Anyway, when you apply the scaling laws to a 7.5*10^8 parameter genome and penalize it for a long horizon, you get about 10^33 FLOPs, which is weirdly similar to some of the other estimates. 
So now we have six different training cost estimates. First, neural nets with short, medium, and long horizons, which are 10^30, 10^33, and 10^36 FLOPs, respectively. Next, the amount of training data in a human lifetime - 10^24 FLOs - and in all of evolutionary history - 10^41 FLOPs. And finally, this weird genome thing, which is 10^33 FLOPs. An optimist might say “Well, our lowest estimate is 10^24 FLOPs, our highest is 10^41 FLOPs, those sound like kind of similar numbers, at least there’s no “5 FLOPs” or “10^9999 FLOPs” in there. A pessimist might say “The difference between 10^24 and 10^41 is seventeen orders of magnitude, ie a factor of 100,000,000,000,000,000 times. This barely constrains our expectations at all!” Before we decide who to trust, let’s remember that we’re still only at Step 2 of our eight step Methodology, and continue. How Do We Adjust For Algorithmic Progress? So today, in 2022 (or in 2020 when this was written, or whenever), assume it would take about 10^33 FLOs to train a human-level AI. But technology constantly advances. Maybe we’ll discover ways to train AIs faster, or run AIs more efficiently, or something like that. How does that factor into our estimate? Ajeya draws on Hernandez & Brown’s Measuring The Algorithmic Efficiency Of Neural Networks. They look at how many FLOPs it took to train various image recognition AIs to an equivalent level of performance between 2012 and 2019, and find that over those seven years it decreased by a factor of 44x, ie training efficiency doubles every sixteen months! Ajeya assumes a doubling time slightly longer than that, because it’s easier to make progress in simple well-understood fields like image recognition than in the novel task of human-level AI. She chooses a doubling time of “merely” 2 - 3 years. If training efficiency doubles every 2-3 years, it would dectuple in about 10 years. 
So although it might take 10^33 FLOPs to train a human level AI today, in ten years or so it may take only 10^32, in twenty years 10^31, and so on. When Will Anyone Have Enough Computational Resources To Train A Human-Level AI? In 2020, AI researchers could buy computational resources at about $1 for 10^17 FLOPs. That means the 10^33 FLOPs you’d need to train a human-level AI would cost $10^16, ie ten quadrillion dollars. This is about twenty times more money than exists in the entire world. But compute costs fall quickly. Some formulations of Moore’s Law suggest it halves every eighteen months. These no longer seem to hold exactly, but it does seem to be halving maybe once every 2.5 years. The exact number is kind of controversial: Ajeya admits it’s been more like once every 3-4 years lately, but she heard good things about some upcoming chips and predicted it might revert back to the longer-term faster trend (it’s been two years now, some new chips have come out, and this prediction is looking pretty good). So as time goes on, algorithmic progress will cut the cost of training (in FLOPs), and hardware progress will also cut the cost of FLOPs (in dollars). So training will become gradually more affordable as time goes on. Once it reaches a cost somebody is willing to pay, they’ll buy human-level AI, and then that will be the year human-level AI happens. What is the cost that somebody (company? government? billionaire?) is willing to pay for human-level AI? The most expensive AI training in history was AlphaStar, a DeepMind project that spent over $1 million to train an AI to play StarCraft (in their defense, it won). But people have been pouring more and more money into AI lately: Source here. This is about compute rather than cost, but most of the increase seen here has been companies willing to pay for more compute over time, rather than algorithmic or hardware progress. 
The StarCraft AI was kind of a vanity project, or science for science’s sake, or whatever you want to call it. But AI is starting to become profitable, and human-level AI would be very profitable. Who knows how much companies will be willing to pay in the future? Ajeya extrapolates the line on the graph forward to 2025 and gets $1 billion. This is starting to sound kind of absurd - the entire company OpenAI was founded with $1 billion in venture capital, it seems like a lot to expect them to spend more than $1 billion on a single training run. So Ajeya backs off from this after 2025 and predicts a “two year doubling time”. This is not much of a concession. It still means that in 2040 someone might be spending $100 billion to train one AI. Is this at all plausible? At the height of the Manhattan Project, the US was investing about 0.5% of its GDP into the effort; a similar investment today would be worth $100 billion. And we’re about twice as rich as 2000, so 2040 might be twice as rich as we are. At that point, $100 billion for training an AI is within reach of Google and maybe a few individual billionaires (though it would still require most or all of their fortune). Ajeya creates a complicated function to assess how much money people will be willing to pay on giant AI projects per year. This looks like an upward-sloping curve. The line representing the likely cost of training a human-level AI looks like a downward sloping curve. At some point, those two curves meet, representing when human-level AI will first be trained. So When Will We Get Human-Level AI? The report gives a long distribution of dates based on weights assigned to the six different models, each of which has really wide confidence intervals and options for adjusting the mean and variance based on your assumptions. But the median of all of that is 10% chance by 2031, 50% chance by 2052, and almost 80% chance by 2100. 
Ajeya takes her six models and decides to weigh them like so, based on how plausible she thinks each one is:
- 20% neural net, short horizon
- 30% neural net, medium horizon
- 15% neural net, long horizon
- 5% human lifetime as training data
- 10% evolutionary history as training data
- 10% genome as parameter number
She ends up with this: How Sensitive Is This To Changes In Assumptions? She very helpfully gives us a Colab notebook and Google spreadsheet to play around with. The notebook lets you change some of the more detailed parameters of the individual models, and the spreadsheet lets you change the big picture. I leave the notebook to people more dedicated to forecasting than I am, and will talk about the spreadsheet here. If you’re following along at home, the default spreadsheet won’t reflect Ajeya’s findings until you fill in the table in the bottom left like so: Great. Now that we’ve got that, let’s try changing some stuff. I like the human childhood training data argument (Lifetime Anchor) more than Ajeya does, and I like the size-of-the-genome argument less. I’m going to change the weights to 20-20-0-20-20-20. Also, Ajeya thinks that someone might be willing to spend 1% of national GDP on training AIs, but that sounds really high to me, so I’m going to lower it to 0.1%. Also, Ajeya’s estimate of 3% GDP growth sounds high for the sort of industrialized nations who might do AI research, so I’m going to lower it to 2%. Since I’m feeling mistrustful today, let’s use the Hernandez&Brown estimate for compute halving (1.5 years) in place of Ajeya’s ad hoc adjustments. And let’s use the current compute halving time (3.5 years) instead of Ajeya’s overly rosy version (2.5 years). All these changes… …don’t really do much. The median goes from 2052 to about 2065. Four of the models give results between 2030 and 2070. 
The last two, Neural Net With Long Horizon and Evolution, suggest probably no AI this century (although Neural Net With Long Horizon does think there’s a 40% chance by 2100). Ajeya doesn’t really like either of these models and they’re not heavily weighted in her main result. Does The Truth Point To Itself? Back up a second. Here’s something that makes me kind of nervous. Most of Ajeya’s numbers are kind of made up, with several order-of-magnitude error bars and simplifying assumptions like “all animals are nematodes”. For a single parameter, we get estimates spanning seventeen different orders of magnitude: the upper bound is one hundred quadrillion times the lower bound. And yet four of the six models, including two genuinely exotic ones, manage to get dates within twenty years of 2050. And 2050 is also the date everyone else focuses on. Here’s the prediction-market-like site Metaculus: Their distribution looks a lot like Ajeya’s, and even has the same median, 2052 (though forecasters could have read Ajeya’s report). Katja Grace et al surveyed 352 AI experts, and they gave a median estimate of 2062 for an AI that could “outperform humans at all tasks” (though with many caveats and high sensitivity to question framing). This was before Ajeya’s report, so they definitely didn’t read it. So lots of Ajeya’s different methods and lots of other people presumably using different methodologies or no methodology at all, all converge on this same idea of 2050 give or take a decade or two. An optimist might say “The truth points to itself! There are 371 known proofs of the Pythagorean Theorem, and they all end up in the same place. That’s because no matter what methodology you use, if you use it well enough you get to the correct answer.” A pessimist might be more suspicious; we’ll return to this part later. FLOPS Alone Turn The Wheel Of History One more question: what if this is all bullshit? What if it’s an utterly useless total garbage steaming pile of grade A crap? 
Imagine a scientist in Victorian Britain, speculating on when humankind might invent ships that travel through space. He finds a natural anchor: the moon travels through space! He can observe things about the moon: for example, it is 220 miles in diameter (give or take an order of magnitude). So when humankind invents ships that are 220 miles in diameter, they can travel through space! Ships have certainly grown in size tremendously, from primitive kayaks to Roman triremes to Spanish galleons to the great ocean liners of the (Victorian) present. The AI forecasting organization AI Impacts actually has a whole report on historical ship size trends to prove an unrelated point about technological progress, so I didn’t even have to make this graph up. Suppose our Victorian scientist lived in 1858, right when the Great Eastern was launched. The trend line for ship size crossed 100m around 1843, and 200m in 1858, so doubling time is 15 years - but perhaps they notice this is going to be an outlier, so let’s round up a bit and say 18 years. The (one order of magnitude off estimate for the size of the) Moon is 350,000m, so you’d need ships to scale up by 350,000/200 = 1,750x before they’re as big as the Moon. That’s about 10.8 doublings, and a doubling time is 18 years, so we’ll get spaceships in . . . 2052 exactly. (fudging numbers to land where you want is actually fun and easy) SS Great Eastern, the extreme outlier large steamship from 1858. This has become sort of a mascot for quantitative technological progress forecasters. What is this scientist’s error? The big one is thinking that spaceship progress depends on some easily-measured quantity (size) instead of on fundamental advances (eg figuring out how rockets work). You can make the same accusation against Ajeya et al: you can have all the FLOPs in the world, but if you don’t understand how to make a machine think, your AI will be, well, a flop. Ajeya discusses this a bit on page 143 of her report. 
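The Victorian scientist's extrapolation is easy to reproduce. Here is a minimal sketch using only the numbers from the passage above (the variable names are mine):

```python
import math

# Figures taken from the passage; the Moon "size" is the deliberately
# wrong order-of-magnitude-off estimate the Victorian scientist uses.
moon_size_m = 350_000
ship_size_m = 200            # trend line value in 1858
doubling_time_years = 18     # rounded up from 15, as in the text
start_year = 1858

scale_up = moon_size_m / ship_size_m             # 1,750x
doublings = math.log2(scale_up)                  # about 10.8
spaceship_year = start_year + doublings * doubling_time_years

print(f"{scale_up:.0f}x = {doublings:.1f} doublings -> year {spaceship_year:.0f}")
```

Nudging the doubling time by a year or two moves the answer by decades, which is rather the point of the parenthetical about fudging numbers.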
There is some sense in which FLOPs and knowing-what-you’re-doing trade off against each other. If you have literally no idea what you’re doing, you can sort of kind of re-run evolution until it comes up with something that looks good. If things are somehow even worse than that, you could always run AIXI, a hypothetical AI design guaranteed to get excellent results as long as you have infinite computation. You could run a Go engine by searching the entire branching tree structure of Go - you shouldn’t, and it would take a zillion times more compute than exists in the entire world, but you could. So in some sense what you’re doing, when you’re figuring out what you’re doing, is coming up with ways to do already-possible things more efficiently. But that’s just algorithmic progress, which Ajeya has already baked into her model. (our Victorian scientist: “As a reductio ad absurdum, you could always stand the ship on its end, and then climb up it to reach space. We’re just trying to make ships that are more efficient than that.”) Part II: Biology-Inspired AI Timelines: The Trick That Never Works Eliezer Yudkowsky presents a more subtle version of this kind of objection in an essay called Biology-Inspired AI Timelines: The Trick That Never Works, published December 2021. Ajeya’s report is a 169-page collection of equations, graphs, and modeling assumptions. Yudkowsky’s rebuttal is a fictional dialogue between himself, younger versions of himself, famous AI scientists, and other bit players. At one point, a character called “Humbali” shows up begging Yudkowsky to be more humble, and Yudkowsky defeats him with devastating counterarguments. Still, he did found the field, so I guess everyone has to listen to him. He starts: in 1988, famous AI scientist Hans Moravec predicted human-level AI by 2010. He was using the same methodology as Ajeya: extrapolate how quickly processing power would grow (in FLOP/S), and see when it would match some estimate of the human brain. 
Moravec got the processing power almost exactly right (it hit his 2010 projection in 2008) and his human brain estimate pretty close (he says 10^13 FLOP/S, Ajeya says 10^15, this 2 OOM difference only delays things a few years), yet there was not human-level AI in 2010. What happened? Ajeya's answer could be: Moravec didn't realize that, in the modern ML paradigm, any given size of program requires a much bigger program to train. Ajeya, who has a 35-year advantage on Moravec, estimates approximately the same power for the finished program (10^16 vs. 10^13 FLOP/S) but says that training the 10^16 FLOP/S program will require 10^33ish FLOPs. Eliezer agrees as far as it goes, but says this points to a much deeper failure mode, which was that Moravec had no idea what he was doing. He was assuming processing power of human brain = processing power of computer necessary for AGI. Why? The human brain consumes around 20 watts of power. Can we thereby conclude that an AGI should consume around 20 watts of power, and that, when technology advances to the point of being able to supply around 20 watts of power to computers, we'll get AGI? […] You say that AIs consume energy in a very different way from brains? Well, they'll also consume computations in a very different way from brains! The only difference between these two cases is that you know something about how humans eat food and break it down in their stomachs and convert it into ATP that gets consumed by neurons to pump ions back out of dendrites and axons, while computer chips consume electricity whose flow gets interrupted by transistors to transmit information. Since you know anything whatsoever about how AGIs and humans consume energy, you can see that the consumption is so vastly different as to obviate all comparisons entirely. 
You are ignorant of how the brain consumes computation, you are ignorant of how the first AGIs built would consume computation, but "an unknown key does not open an unknown lock" and these two ignorant distributions should not assert much internal correlation between them. Cars don’t move by contracting their leg muscles and planes don’t fly by flapping their wings like birds. Telescopes do form images the same way as the lenses in our eyes, but differ by so many orders of magnitude in every important way that they defy comparison. Why should AI be different? You have to use some specific algorithm when you’re creating AI; why should we expect it to be anywhere near the same efficiency as the ones Nature uses in our brains? The same is true for arguments from evolution, eg Ajeya’s Evolutionary Anchor, ie “it took evolution 10^41 FLOPs of computation to evolve the human brain so maybe that will be the training cost”. AI scientists sitting in labs trying to figure things out, and nematodes getting eaten by other nematodes, are such different methods for designing things that it’s crazy to use one as an estimate for the other. Algorithmic Progress vs. Algorithmic Paradigm Shifts This post is a dialogue, so (Eliezer’s hypothetical model of) OpenPhil gets a chance to respond. They object: this is why we put a term for algorithmic progress in our model. The model isn’t very sensitive to changes in that term. If you want you can set it to some kind of crazy high value and see what happens, but you can’t say we didn’t consider it. OpenPhil: We did already consider that and try to take it into account: our model already includes a parameter for how algorithmic progress reduces hardware requirements. 
It's not easy to graph as exactly as Moore's Law, as you say, but our best-guess estimate is that compute costs halve every 2-3 years […] Eliezer: The makers of AGI aren't going to be doing 10,000,000,000,000 rounds of gradient descent, on entire brain-sized 300,000,000,000,000-parameter models, algorithmically faster than today. They're going to get to AGI via some route that you don't know how to take, at least if it happens in 2040. If it happens in 2025, it may be via a route that some modern researchers do know how to take, but in this case, of course, your model was also wrong. They're not going to be taking your default-imagined approach algorithmically faster, they're going to be taking an algorithmically different approach that eats computing power in a different way than you imagine it being consumed. OpenPhil: Shouldn't that just be folded into our estimate of how the computation required to accomplish a fixed task decreases by half every 2-3 years due to better algorithms? Eliezer: Backtesting this viewpoint on the previous history of computer science, it seems to me to assert that it should be possible to: Train a pre-Transformer RNN/CNN-based model, not using any other techniques invented after 2017, to GPT-2 levels of performance, using only around 2x as much compute as GPT-2;
I don't mean that it's not what Google says on page one of the search results. That part is true. But if you click through to page 15 of the results for this search, you find that the estimate reduces from 311,000 to 149 results. Google has decided that they want to always provide an estimate of the total number of results for every search, but they have neither precomputed accurate estimates for all possible searches, nor do they wish to spend the compute to calculate good estimates on the fly for every search, when most people never go past page one. Their estimates can be ok for searches on common words (where they most likely do have cached in a database somewhere the current number of web pages associated with that term), but for compound phrases, they take each of the component words, and do some kind of math to estimate the value. So here, they would look at both "climate" hits (4,470,000,000 results), and "villains" hits (2,190,000,000 results), and maybe a few other parameters, and make a guess as to how often these appear together. Unfortunately, these guesses have almost no relationship to reality.
I often see these numbers cited as evidence for how prevalent something is. Given Google's reputation and prevalence, I find it pretty irresponsible that they still list these estimates despite knowing how wrong they are. But presumably some product manager likes showing users a lot of zeros to give an inflated impression of how comprehensive Google's web crawling is.
https://karl-voit.at/2017/01/15/google-search-estimates/
Chess AI performance over time. Why does this matter? If there’s a slow takeoff (ie gradual exponential curve), it will become obvious that some kind of terrifying transformative AI revolution is happening, before the situation gets apocalyptic. There will be time to prepare, to test slightly-below-human AIs and see how they respond, to get governments and other stakeholders on board. We don’t have to get every single thing right ahead of time. On the other hand, because this is proceeding along the usual channels, it will be the usual variety of muddled and hard-to-control. With the exception of a few big actors like the US and Chinese governments, and maybe the biggest corporations like Google, the outcome will be determined less by any one agent, and more by the usual multi-agent dynamics of political and economic competition. There will be lots of opportunities to affect things, but no real locus of control to do the affecting. If there’s a fast takeoff (ie sudden FOOM), there won’t be much warning. Conventional wisdom will still say that transformative AI is thirty years away. All the necessary pieces (ie AI alignment theory) will have to be ready ahead of time, prepared blindly without any experimental trial-and-error, to load into the AI as soon as it exists. On the plus side, a single actor (whoever has this first AI) will have complete control over the process. If this actor is smart (and presumably they’re a little smart, or they wouldn’t be the first team to invent transformative AI), they can do everything right without going through the usual government-lobbying channels. So the slower a takeoff you expect, the less you should be focusing on getting every technical detail right ahead of time, and the more you should be working on building the capacity to steer government and corporate policy to direct an incoming slew of new technologies. 
Yudkowsky Contra Christiano Eliezer counters that although progress may retroactively look gradual and continuous when you know what metric to graph it on, it doesn’t necessarily look that way in real life by the measures that real people care about. (one way to think of this: imagine that an AI’s effective IQ starts at 0.1 points, and triples every year, but that we can only measure this vaguely and indirectly. The year it goes from 5 to 15, you get a paper in a third-tier journal reporting that it seems to be improving on some benchmark. The year it goes from 66 to 200, you get a total transformation of everything in society. But later, once we identify the right metric, it was just the same rate of gradual progress the whole time. ) So Eliezer is much less impressed by the history of previous technologies than Paul is. He’s also skeptical of the “GDP will double in 4 years before it doubles in 1” claim, because of two contingent disagreements and two fundamental disagreements. The first contingent disagreement: government regulations make it hard to deploy imperfect things, and non-trivial to deploy things even after they’re perfect. Eliezer has non-jokingly said he thinks AI might destroy the world before the average person can buy a self-driving car. Why? Because the government has to approve self-driving cars (and can drag its feet on that), but the apocalypse can happen even without government approval. In Paul’s model, sometime long before superintelligence we should have AIs that can drive cars, and that increases GDP and contributes to a general sense that exciting things are going on. Eliezer says: fine, what if that’s true? Who cares if self-driving cars will be practical a few years before the world is destroyed? It’ll take longer than that to lobby the government to allow them on the road. The second contingent disagreement: superintelligent AIs can lie to us. 
Suppose you have an AI which wants to destroy humanity, whose IQ is doubling every six months. Right now it’s at IQ 200, and it suspects that it would take IQ 800 to build a human-destroying superweapon. Its best strategy is to lie low for a year. If it expects humans would turn it off if they knew how close it was to superweapons, it can pretend to be less intelligent than it really is. The period when AIs are holding back so we don’t discover their true power level looks like a period of lower-than-expected GDP growth - followed by a sudden FOOM once the AI gets its superweapon and doesn’t need to hold back. So even if Paul is conceptually right and fundamental progress proceeds along a nice smooth curve, it might not look to us like a nice smooth curve, because regulations and deceptive AIs could prevent mildly-transformative AI progress from showing up on graphs, but wouldn’t prevent the extreme kind of AI progress that leads to apocalypse. To an outside observer, it would just look like nothing much changed, nothing much changed, nothing much changed, and then suddenly, FOOM. But even aside from this, Eliezer doesn’t think Paul is conceptually right! He thinks that even on the fundamental level, AI progress is going to be discontinuous. It’s like a nuclear bomb. Either you don’t have a nuclear bomb yet, or you do have one and the world is forever transformed. There is a specific moment at which you go from “no nuke” to “nuke” without any kind of “slightly worse nuke” acting as a harbinger. He uses the example of chimps → humans. Evolution has spent hundreds of millions of years evolving brainier and brainier animals (not teleologically, of course, but in practice). For most of those hundreds of millions of years, that meant the animal could have slightly more instincts, or a better memory, or some other change that still stayed within the basic animal paradigm. 
At the chimp → human transition, we suddenly got tool use, language use, abstract thought, mathematics, swords, guns, nuclear bombs, spaceships, and a bunch of other stuff. The rhesus monkey → chimp transition and the chimp → human transition both involved the same ~quadrupling of neuron number, but the former was pretty boring and the latter unlocked enough new capabilities to easily conquer the world. The GPT-2 → GPT-3 transition involved centupling parameter count. Maybe we will keep centupling parameter count every few years, and most times it will be incremental improvement, and one time it will conquer the world. But even talking about centupling parameter count is giving Paul too much credit. Lots of past inventions didn’t come by quadrupling or centupling something, they came by discovering “the secret sauce”. The Wright brothers (he argues) didn’t make a plane with 4x the wingspan of the last plane that didn’t work, they invented the first plane that could fly at all. The Hiroshima bomb wasn’t some previous bomb but bigger, it was what happened after a lot of scientists spent a long time thinking about a fundamentally different paradigm of bomb-making and brought it to a point where it could work at all. The first transformative AI isn’t going to be GPT-3 with more parameters, it will be what happens after someone discovers how to make machines truly intelligent. (this is the same debate Eliezer had with Ajeya over the Biological Anchors post; have I mentioned that Ajeya and Paul are married?) Fine, Let’s Nitpick The Hell Out Of The Chimps Vs. Humans Example This is where the two of them end up, so let’s follow. Between chimps and humans, there were about seven million years of intermediate steps. These had some human capabilities, but not others. Eg Homo erectus probably had language, but not mathematics, and in terms of taking over the world it did make it to most of the Old World but was less dominant than moderns. 
But if we say evolutionary history started 500 million years ago (the Cambrian), and AI history started with the Dartmouth Conference in 1956, then the equivalent of 7 million years of evolutionary history is 1 year of AI history. In the very very unlikely and forced comparison where evolutionary history and AI history go at the same speed, there will be only about a year between chimp-level and human-level AIs. A chimp-level AI probably can’t double GDP, so this would count as a fast takeoff by Paul’s criterion. But even more than that, chimp → human feels like a discontinuity. It’s not just “animals kept getting smarter for hundreds of millions of years, and then ended up very smart indeed”. That happened for a while, and then all of a sudden there was a near-instant phase transition into a totally different way of using intelligence with completely new abilities. If AI worked like this, we would have useful toys and interesting specialists for a few decades, until suddenly someone “got it right”, completed the package that was necessary for “true intelligence”, and then we would have a completely new category of thing. Paul admits this analogy is awkward for his position. He answers: Chimp evolution is not primarily selecting for making and using technology, for doing science, or for facilitating cultural accumulation. The task faced by a chimp is largely independent of the abilities that give humans such a huge fitness advantage. It’s not completely independent - the overlap is the only reason that evolution eventually produces humans - but it’s different enough that we should not be surprised if there are simple changes to chimps that would make them much better at designing technology or doing science or accumulating culture […] So I don’t think the example of evolution tells us much about whether the continuous change story applies to intelligence. 
This case is potentially missing the key element that drives the continuous change story—optimization for performance. Evolution changes continuously on the narrow metric it is optimizing, but can change extremely rapidly on other metrics. For human technology, features of the technology that aren’t being optimized change rapidly all the time. When humans build AI, they will be optimizing for usefulness, and so progress in usefulness is much more likely to be linear. That is, evolution wasn’t optimizing for tool use/language/intelligence, so we got an “overhang” where chimps could potentially have been very good at these, but evolution never bothered “closing the circuit” and turning those capabilities “on”. After a long time, evolution finally blundered into an area where marginal improvements in these capacities improved fitness, so evolution started improving them and it was easy. Imagine a company which, through some oversight, didn’t have a Sales department. They just sat around designing and manufacturing increasingly brilliant products, but not putting any effort into selling them. Then the CEO remembers they need a Sales department, starts one up, and the company goes from moving near zero units to moving millions of units overnight. It would look like the company had “suddenly” developed a “vast increase in capabilities”. But this is only possible when a CEO who is weirdly unconcerned about profit forgets to do obvious profit-increasing things for many years. This is Paul’s counterargument to the chimp analogy. Evolution isn’t directly concerned about various intellectual skills; it only wants them in the unusual cases where they’ll contribute to fitness on the margin. AI companies will be very concerned about various intellectual skills. If there’s a trivial change that can make their product 10x better, they’ll make it. 
So AI capabilities will grow in a “well-rounded” way, there won’t be any “overhangs”, and there won’t be any opportunities for a sudden overhang-solving phase transition with associated new-capability development like with chimps → humans. Eliezer answers: Chimps are nearly useless because they're not general, and doing anything on the scale of building a nuclear plant requires mastering so many different nonancestral domains that it's no wonder natural selection didn't happen to separately train any single creature across enough different domains that it had evolved to solve every kind of domain-specific problem involved in solving nuclear physics and chemistry and metallurgy and thermics in order to build the first nuclear plant in advance of any old nuclear plants existing. Humans are general enough that the same braintech selected just for chipping flint handaxes and making water-pouches and outwitting other humans, happened to be general enough that it could scale up to solving all the problems of building a nuclear plant - albeit with some added cognitive tech that didn't require new brainware, and so could happen incredibly fast relative to the generation times for evolutionarily optimized brainware. Now, since neither humans nor chimps were optimized to be "useful" (general), and humans just wandered into a sufficiently general part of the space that it cascaded up to wider generality, we should legit expect the curve of generality to look at least somewhat different if we're optimizing for that. Eg, right now people are trying to optimize for generality with AIs like Mu Zero and GPT-3. In both cases we have a weirdly shallow kind of generality. Neither is as smart or as deeply general as a chimp, but they are respectively better than chimps at a wide variety of Atari games, or a wide variety of problems that can be superposed onto generating typical human text. 
They are, in a sense, more general than a biological organism at a similar stage of cognitive evolution, with much less complex and architected brains, in virtue of having been trained, not just on wider datasets, but on bigger datasets using gradient-descent memorization of shallower patterns, so they can cover those wide domains while being stupider and lacking some deep aspects of architecture. It is not clear to me that we can go from observations like this, to conclude that there is a dominant mainline probability for how the future clearly ought to go and that this dominant mainline is, "Well, before you get human-level depth and generalization of general intelligence, you get something with 95% depth that covers 80% of the domains for 10% of the pragmatic impact". ...or whatever the concept is here, because this whole conversation is, on my own worldview, being conducted in a shallow way relative to the kind of analysis I did in Intelligence Explosion Microeconomics, where I was like, "here is the historical observation, here is what I think it tells us that puts a lower bound on this input-output curve". Here Eliezer sort of kind of grants Paul’s point that AIs will be optimized for generality in a way chimps aren’t, but points to his previous “Intelligence Explosion Microeconomics” essay to argue that we should expect a fast takeoff anyway. IEM has a lot of stuff in it, but one key point is that instead of using analogies to predict the course of future AI, we should open that black box and try to actually reason about how it will work, in which case we realize that recursive self-improvement common-sensically has to cause an intelligence explosion. I am sort of okay with this, but I feel like a commitment to avoiding analogies should involve not bringing up the chimp-human analogy further, which Eliezer continues to do, quite a lot. I do feel like Paul succeeded in convincing me that we shouldn’t place too much evidential weight on it. 
The Wimbledon Of Reference Class Tennis “Reference class tennis” is an old rationalist idiom for people throwing analogies back and forth. “AI will be slow, because it’s an economic transition like the Agricultural or Industrial Revolution, and those were slow!” “No, AI will be fast, because it’s an evolutionary step like chimps → humans, and that was fast!” “No, AI will be slow, because it’s an invention, like the computer, and computers were invented piecemeal and required decades of innovation to be useful.” “No, AI will be fast, because it’s an invention, like the nuclear bomb, and nuclear bombs went from impossible to city-killing in a single day.” “No, AI will be slow, because it will be surrounded by a shell-like metallic computer case, which makes it like a turtle, and turtles are slow.” “No, AI will be fast, because it’s dangerous and powerful, like a tiger, and tigers are fast!” And so on. Comparing things to other things is a time-tested way of speculating about them. But there are so many other things to compare to that you can get whatever result you want. This is the failure mode that the term “reference class tennis” was supposed to point to. Both participants in this debate are very smart and trying their hardest to avoid reference-class tennis, but neither entirely succeeds. Eliezer’s preferred classes are Bitcoin (“there wasn't a cryptocurrency developed a year before Bitcoin using 95% of the ideas which did 10% of the transaction volume”), nukes, humans/chimps, the Wright Brothers, AlphaGo (which really was a discontinuous improvement on previous Go engines), and AlphaFold (ditto for proteins). Paul’s preferred classes are the Agricultural and Industrial Revolutions, chess engines (which have gotten better along a gradual, well-behaved curve), all sorts of inventions like computers and ships (likewise), and world GDP. 
Eliezer already listed most of these in his Intelligence Explosion Microeconomics paper in 2013, and concluded that the space of possible analogies was contradictory enough that we needed to operate at a higher level. Maybe so, but when someone lobs a reference class tennis ball at you, it’s hard to resist the urge to hit it back. Recursive Self-Improvement This is where I think Eliezer most wants to take the discussion. The idea is: once AI is smarter than humans, it can do a superhuman job of developing new AI. In his Microeconomics paper, he writes about an argument he (semi-hypothetically) had with Ray Kurzweil about Moore’s Law. Kurzweil expected Moore’s Law to continue forever, even after the development of superintelligence. Eliezer objects: Suppose we were dealing with minds running a million times as fast as a human, at which rate they could do a year of internal thinking in thirty-one seconds, such that the total subjective time from the birth of Socrates to the death of Turing would pass in 20.9 hours. Do you still think the best estimate for how long it would take them to produce their next generation of computing hardware would be 1.5 orbits of the Earth around the Sun? That is: the fact that it took 1.5 years for transistor density to double isn’t a natural law. It’s pointing to a law that the amount of resources (most notably intelligence) that civilization focused on the transistor-densifying problem equalled the amount it takes to double it every 1.5 years. If some shock drastically changed available resources (by eg speeding up human minds a million times), this would change the resources involved, and the same laws would predict transistor density doubling in some shorter amount of time (naively 0.0000015 years, although realistically at that scale other inputs would dominate). 
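The arithmetic behind this thought experiment is easy to check. The following is a minimal sketch, assuming a Julian year of 365.25 days and a pure, uniform million-fold speedup with no other bottlenecks; the function name `wall_clock_seconds` is made up for illustration and does not come from either author.

```python
# Naive scaling check for the "million-times-faster minds" thought experiment.
# Assumption: a year is 365.25 days, and the speedup applies uniformly.

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # 31,557,600 seconds


def wall_clock_seconds(subjective_years: float, speedup: float) -> float:
    """Wall-clock seconds needed for a given amount of subjective thinking time."""
    return subjective_years * SECONDS_PER_YEAR / speedup


speedup = 1_000_000

# One subjective year of thinking, in outside-world seconds.
one_year = wall_clock_seconds(1.0, speedup)

# A 1.5-year Moore's Law doubling, if thinking speed were the only input.
doubling_seconds = wall_clock_seconds(1.5, speedup)
doubling_years = 1.5 / speedup

print(f"one subjective year: {one_year:.1f} s")
print(f"naive doubling time: {doubling_seconds:.1f} s ({doubling_years:.7f} years)")
```

Under these assumptions a subjective year takes about thirty-one and a half seconds, and the naive doubling time comes out to about 1.5 millionths of a year, a bit under a minute. As the text notes, at that scale other inputs (fabrication, physical experiments) would almost certainly dominate, so this is an illustration of the scaling argument and not a forecast.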
So when Paul derives clean laws of economics showing that things move along slow growth curves, Eliezer asks: why do you think they would keep doing this when one of the discoveries they make along that curve might be “speeding up intelligence a million times”? (Eliezer actually thinks improvements in the quality of intelligence will dominate improvements in speed - AIs will mostly be smarter, not just faster - but speed is a useful example here and we’ll stick with it.) Paul answers: Summary of my response: Before there is AI that is great at self-improvement there will be AI that is mediocre at self-improvement. Powerful AI can be used to develop better AI (amongst other things). This will lead to runaway growth. This on its own is not an argument for discontinuity: before we have AI that radically accelerates AI development, the slow takeoff argument suggests we will have AI that significantly accelerates AI development (and before that, slightly accelerates development). That is, an AI is just another, faster step in the hyperbolic growth we are currently experiencing, which corresponds to a further increase in rate but not a discontinuity (or even a discontinuity in rate). The most common argument for recursive self-improvement introducing a new discontinuity seems to be: some systems “fizzle out” when they try to design a better AI, generating a few improvements before running out of steam, while others are able to autonomously generate more and more improvements. This is basically the same as the universality argument in a previous section. Eliezer: Oh, come on. That is straight-up not how simple continuous toy models of RSI work. Between a neutron multiplication factor of 0.999 and 1.001 there is a very huge gap in output behavior. 
Outside of toy models: Over the last 10,000 years we had humans going from mediocre at improving their mental systems to being (barely) able to throw together AI systems, but 10,000 years is the equivalent of an eyeblink in evolutionary time - outside the metaphor, this says, "A month before there is AI that is great at self-improvement, there will be AI that is mediocre at self-improvement." (Or possibly an hour before, if reality is again more extreme along the Eliezer-Hanson axis than Eliezer. But it makes little difference whether it's an hour or a month, given anything like current setups.) This is just pumping hard again on the intuition that says incremental design changes yield smooth output changes, which (the meta-level of the essay informs us wordlessly) is such a strong default that we are entitled to believe it if we can do a good job of weakening the evidence and arguments against it. And the argument is: Before there are systems great at self-improvement, there will be systems mediocre at self-improvement; implicitly: "before" implies "5 years before" not "5 days before"; implicitly: this will correspond to smooth changes in output between the two regimes even though that is not how continuous feedback loops work. I got a bit confused trying to understand the criticality metaphor here. There’s no equivalent of neutron decay, so any AI that can consistently improve its intelligence is “critical” in some sense. Imagine Elon Musk replaces his brain with a Neuralink computer which - aside from having read-write access - exactly matches his current brain in capabilities. Also he becomes immortal. He secludes himself from the world, studying AI and tinkering with his brain’s algorithms. Does he become a superintelligence? I think under the assumptions Paul and Eliezer are using, eventually maybe. After some amount of time he’ll come across a breakthrough he can use to increase his intelligence. 
Then, armed with that extra intelligence, he’ll be able to pursue more such breakthroughs. However intelligent the AI you’re scared of is, Musk will get there eventually. How long will it take? A good guess might be “years” - Musk starts out as an ordinary human, and ordinary humans are known to take years to make breakthroughs. Suppose it takes Musk one year to come up with a first breakthrough that raises his IQ 1 point. How long will his second breakthrough take? It might take longer, because he has picked the lowest-hanging fruit, and all the other possible breakthroughs are much harder. Or it might take shorter, because he’s slightly smarter than he was before, and maybe some extra intelligence goes a really long way in AI research. The concept of an intelligence explosion seems to assume the second effect dominates the first. This would match the observation that human researchers, who aren’t getting any smarter over time, continue making new discoveries. That suggests the range of possible discoveries at a given intelligence level is pretty vast. Some research finds that the usual pattern in science is constant rate of discovery from exponentially increasing number of researchers, suggesting strong low-hanging fruit effects, but these seem to be overwhelmed by other considerations in AI right now. I think Eliezer’s position on this subject is shaped by assumptions like: If you have an AI as intelligent as Elon Musk today, then tomorrow you can run it on more hardware with a bit of normal human algorithmic progress, and get one twice as intelligent. So even if it would take Elon years to make a breakthrough, long before those years are up you’ll have an AI that can make breakthroughs much faster.
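The neutron multiplication factor Eliezer mentions above can be made concrete with a very small toy model. This is a sketch under stated assumptions, not a model either participant proposed: each "generation" produces `k` times the output of the previous one, and we add up the total over a fixed number of generations. The function name `total_output` and the choice of 20,000 generations are illustrative.

```python
# Toy chain-reaction model: each generation yields k times the previous one.
# Assumption: k is constant, which is exactly what real systems do not guarantee.


def total_output(k: float, generations: int, seed: float = 1.0) -> float:
    """Sum of seed * k**t for t in 0..generations-1, computed step by step."""
    total = 0.0
    current = seed
    for _ in range(generations):
        total += current
        current *= k
    return total


generations = 20_000

subcritical = total_output(0.999, generations)    # each step slightly smaller
supercritical = total_output(1.001, generations)  # each step slightly larger

print(f"k = 0.999: total output = {subcritical:.3f}")
print(f"k = 1.001: total output = {supercritical:.3e}")
```

With `k` just below 1 the total levels off and never exceeds 1 / (1 - k) = 1000, no matter how many generations are run. With `k` just above 1 the total keeps growing and is already hundreds of billions after the same number of steps. A 0.2% change in the input parameter produces qualitatively different behavior, which is the sense in which a smooth change in design need not give a smooth change in output. Whether AI self-improvement has anything like a constant `k` is, of course, the point under dispute.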
But Xi’s main target has been the Internet. Facebook, Google, YouTube, and Twitter were already blocked when he took power, but he added more search engines (including Bing and DuckDuckGo), more social media (Instagram, Reddit), foreign news (eg BBC, NYT, WaPo, the Economist), and even Wikipedia. This has been bad for business (China’s Internet “ranks ninety-first in the world” and is getting worse, and foreign businesses list difficulty using the Internet as one of their top reasons for not expanding into China more), but Xi thinks it’s a worthwhile tradeoff.
“Man, it’s been a crazy few months. You hear I quit my job at Google and founded a fintech startup?”
“Mmmmm, kind of? I was really into Zen in college. I would sit zazen for two, three hours every day. A few years after I graduated, I took the plunge and quit my job at Google to study at a Zen monastery near Kanazawa. The first day I was there, the master said ‘This very world is the Pure Land, and each one of you is already enlightened.’ I was really relieved, because I’d thought I would have to stay at the monastery like ten, maybe twenty years to get enlightened. So I thanked him and went off to pack my stuff. He ran after me, asked ‘Where are you going?’ I said that honestly I wasn’t that into the Zen aesthetic and I was just there to get enlightened - but if I was already enlightened, then mission accomplished and I might as well go back to Google. I spent a couple days seeing Kanazawa, then flew home.”
“I quit my job at Google a few months ago to work on effective altruism. I’m studying sn-risks.”
3: Related: does anyone reading this have access to the new Parti-20B image model from Google? I would like to check whether I have won my bet with Vitor about image model progress. If yes, I will write a post about it and give you good publicity. Please contact me at scott[at]slatestarcodex[dot]com
Inline links: Parti-20B image model, my bet with Vitor
The US keeps starting or engaging in wars, like in Libya, Afghanistan, and Iraq. I will briefly summarize the 3 major sections of the book and how they tackle the first five claims. Section 1: The Old World Order This section refutes the claim that outlawry of war wasn't actually a significant change for anyone at the time. To do so, it covers the history of the international laws of war as described by Hugo Grotius in a set of books titled The Law of War and Peace, including how he came to write it, what the laws were, and how they were used and understood. In this section, H&S work to fully immerse us in the laws of war before the Peace Pact, and the ways that people understood war as a result. I’ve already included a number of things about this up above, so I’ll just put in a few interesting notes here, and if you want more persuasion that people viewed war differently, I’d suggest you pick up the book. There is lots of historical evidence that attitudes toward war before the Peace Pact were not like attitudes toward war today, that people - lawyers, diplomats, sovereigns, and citizens - believed it to be normal and legal, and frequently justified. Conquest in response to debts or offenses was one of the primary motivators of war in the period ruled by the Old World Order (generally, from some time before 1625 when Grotius wrote the rules down to 1928, when the Peace Pact was signed), though H&S also document some of the weirder ones, like a King who declared that he had the right to wage war against another because the other King had stolen his wife. But because Grotius had declared that no one outside the belligerents could determine whose side was just without violating neutrality, the reasons for war were largely whatever Monarchs could get away with, which ran the gamut. 
Perhaps because it was fashionable, perhaps to convince their citizenry of their rightness, Monarchs paid handsomely for famous thinkers to write manifestos explaining why they were going to war, and other Monarchs and the citizenry generally accepted these reasons. It would be like if Putin had called up Google co-founder Sergey Brin and asked him to write out why Russia had the right to conquer Ukraine, and then everyone else shrugged and decided, sure, that sounds reasonable. Heads of state enlisted esteemed writers and scholars as well as experienced lawyers to draft [war manifestos]. The English military and political leader Oliver Cromwell commissioned John Milton, the great epic poet, to write A Manifesto of the Lord Protector of the Commonwealth in 1655 when he ordered the invasion of the Spanish possessions in the Caribbean. In 1703, the Holy Roman Emperor Leopold I employed Gottfried Leibniz, the rationalist philosopher, co-inventor of calculus, and a trained lawyer, to compose the Manifesto for the Defense of the Rights of Charles III, which defended the empire’s involvement in the War of the Spanish Succession. Commodore Perry arrived in Japan in 1853 and returned for real the next year. Because they were so confused about how the laws of war were supposed to work, Japan proceeded to send Nishi Amane to the Netherlands to study the Law of War and Peace, and twenty years later, in 1875, Japan conquered Korea. Their logic for doing so was that they were afraid Europe or China would get there first. The world recognized their conquest at the time, though after WWII they were made to give it up. Korea was alluring prey for aggressive Western nations. As Nishi Amane [the scholar who brought the Grotian rules to Japan] would later explain, defending one’s borders “is like riding in a third-class train; at first there is adequate space but as more passengers enter there is no place for them to sit. 
The logic of necessity requires the people to plant both feet firmly and expand their elbows into any opening that may occur for, unless this is done, others will close the opening.” (Chapter 6) Section 2: The Transformation Period Recall our list of counterclaims, #s 2 and 3. 2. Outlawry wasn't taken seriously at the time by the signatories - that it was just feel-good propaganda. 3. World War II proves that it failed, so it wasn't important. This section tells the story of how the Peace Pact came into existence, including how influential it was on the thinkers of the time. Throughout the 1930s and 40s, thinkers and diplomats attempted to turn the Peace Pact into practice, and then, when World War II demonstrated that they needed significantly more teeth to make the Peace Pact real, created the United Nations and other international institutions dedicated to supporting the Pact’s goals. At the time, they viewed World War II as a sign that they hadn’t gotten the right combination of institutions to make the Peace Pact succeed, not that it wasn’t important. This was a classic situation of needing More Dakka and they did, indeed, keep adding more until it worked. In an account composed more than a decade later, Jackson recounted that this view of the Pact was shared by the president and his inner circle. The Peace Pact, he reported, “left no vestige of legal right for [a state] to resort to a war of aggression. From the beginning, Roosevelt, Hull, Welles, Stimson and I had been in agreement that Hitler’s war . . . was an illegal one, and that other powers were under no obligation to remain indifferent.” (Chapter 11) There is some counter-evidence in support of #2, from the side of the Japanese at least. Japan, for example, did not think that it had renounced the rules of the Old World Order on August 27, 1928. 
Its signing of the “No-War Pact,” as the Paris Peace Pact was known in Japan, was regarded as a diplomatic gesture, a noble proclamation affirming the aspiration of all civilized nations to seek peace. Indeed, Japanese officials considered it a sign of how far their nation had come that it was included among the fifteen countries at the grand ceremony in Paris. (Chapter 7) But at least on the Allies side, they had intended it seriously, and as World War II went on, that intention redoubled. Sumner Welles, Undersecretary of State during World War II, was assigned by Roosevelt to create a plan for peace after the war. What he and James Shotwell authored was effectively an outline of the United Nations, and they put the Peace Pact at the very center of it. Shotwell was far from subtle about his effort to treat the Pact as a starting point. He placed the Pact at the start of his preliminary draft. Article 1 repeated the Pact verbatim. Article 2 provided that “[t]he United Nations, in order to strengthen and safeguard the peace of nations as set forth in the General Pact for the Renunciation of war, agree to cooperate in the establishment of the necessary instrumentalities for its effective maintenance.” What followed was an outline of nearly every essential institutional component of the modern-day United Nations. Ten days later he circulated a more detailed draft, now entitled “Provisional Outline of International Organization.” (Chapter 8) It wasn't just the United Nations. NATO was built off of the Atlantic Charter, and it was also designed to reinforce the Peace Pact. This is why it's reasonably accurate to describe it as a defensive alliance. The [first draft of the Atlantic Charter] was a remarkable document. 
It began by restating the principles of the Stimson Doctrine—there would be no conquest; the two countries would “seek no aggrandizement, territorial or other.” Moreover, there would be “no territorial changes that do not accord with the freely expressed wishes of the peoples concerned.” The Charter looked ahead to a time “after the final destruction of the Nazi tyranny”—a remarkable statement for a neutral in the war—and declared the two states’ “hope to see established a peace which will afford to all nations the means of dwelling in safety within their own boundaries.” (Chapter 8) This section brings to bear quotes from leaders at the time showing how important they considered the outlawry of war, how they viewed it as changing the world, but also how unprepared they were for how to react to countries choosing to ignore the Pact. Most importantly, they show how the Allies were strongly motivated to fight World War II specifically to preserve and expand the Pact, to make the world safe for peace. Unfortunately, then, as now, Russia/the Soviet Union did not quite live up to the ideals that the Allies generally advocated for. The Soviet Union took territory after World War II, the only one of the Allies to do so. The only ally to gain any significant territory after the war was the Soviet Union. More than twenty million of the nation’s citizens had died in the course of the war, and Stalin insisted on several territorial gains as the price of peace—many, but not all, of them in areas previously contested. … These concessions to Stalin were seen by the other Allied powers as regrettable deviations from accepted law, not precedents to be followed in the future. (Chapter 13) To be fair, we are talking about Josef Stalin here. Who’s surprised? Section 3: The New World Order Recall our list of counterclaims, #s 4 and 5. 4. The world isn't more peaceful post outlawry. 5. 
Any increase in peace since World War II is due to democracies, nuclear weapons, or other reasons, and not the Peace Pact. H&S walk through the best academic evidence we have of whether the world is more peaceful today than it was in the period from 1816 (when our data collection starts being decent) to the Peace Pact. They then spend some time discussing why the evidence better supports the Peace Pact than other causes. In particular, H&S highlight that only since the Peace Pact have countries been denied territorial gains from their conquests. There's a lot of detail in there. Here's just a taste of it. A loose team of political scientists has assembled comprehensive data to help them study war. The resulting project, with the intentionally clinical name “Correlates of War,” hosts datasets on everything from “militarized interstate disputes” to “world religion data” to “bilateral trade.” Most relevant here, it includes extensive data on “territorial change”—a record of every single territorial exchange between states from 1816 to 2014, totaling over eight hundred entries. What do our 254 cases of territorial change tell us? They tell us something that is at once striking and surprising: Conquest, once common, has nearly disappeared. Even more unexpected, the switch point is that now familiar year when the world came together to outlaw war, 1928. From the time the data start in 1816 until the Peace Pact opened for signature in 1928, there was, on average, approximately one conquest every ten months (1.21 conquests per year). Put another way, the average state during this period had a 1.33 percent chance of being the victim of conquest in any given year. Those may seem like pretty good odds. They are not: A state with a 1.33 percent annual chance of conquest can expect to lose territory in a conquest once in an ordinary human lifetime. After 1948, the chance an average state would suffer a conquest fell from once in a lifetime to once or twice a millennium. 
(Chapter 13) The US wars in Afghanistan, Iraq, and Libya One disappointment I have is that H&S do not spend much time discussing the US wars of the last two decades. The book was published in 2017, so there’s really no excuse for this. Even counting them, their claim that wars since the Peace Pact have been fewer and less world-changing than before the Peace Pact still holds up, but since they don’t directly discuss the most notable wars of the last two decades, they leave a significant hole in their argument. I can imagine defenses that they would make, but they should have made them. They mostly refer to these conflicts either as not a conquest (since the US isn’t officially running those places now) or as a side effect of the Peace Pact in allowing failed states (see Addendum 1 for more on that). More recently, the United States invaded Iraq in 2003, toppled Saddam Hussein, and installed the Coalition Provisional Authority to govern the country. But what’s most notable about these “nonconquests” is how ineffective and unstable they usually are. Exerting influence indirectly is inefficient and expensive. (Chapter 13) And in 2015 alone, high-fatality civil wars continued in Nigeria, South Sudan, Yemen, Syria, Iraq, Afghanistan, Pakistan, Somalia, and Ukraine. Why, if war has been outlawed, is there still so much conflict? The answer is that these conflicts are not prohibited by the Pact. Indeed, they are the predictable consequences of it … the prohibition on the use of force by one state against the territory of another has allowed two sources of conflict to simmer… within [states]. (Chapter 15) The broader intellectual history of war Reading The Internationalists led me to want to read a broader intellectual history of war. H&S include some comments that hint at it, for example describing the Principle of Distinction and other agreements made about how to behave during war. Fortunately for the civilians of Europe, the biblical model of war was finally repudiated. 
By the middle of the eighteenth century, European armies had come to recognize a “Principle of Distinction,” the doctrine central to modern humanitarian law, which distinguishes between soldiers and civilians and protects the latter from the former. The Principle of Distinction was the first curtailment of Grotius’s blanket immunity for those waging war. In the next century, it was followed by a flood of new legal regulations placing stricter controls on a soldier’s license to kill. International treaties protected the wounded and medical personnel (First Geneva Convention, 1864) prohibited the use of fragmenting, explosive, and incendiary small arms ammunition (St. Petersburg Declaration, 1874) banned explosives from balloons, asphyxiating gas, and dum-dum bullets (First Hague Convention, 1899) and proscribed pillage, the execution of surrendering soldiers and prisoners of war, and forcing civilians to swear an allegiance to a foreign power (Second Hague Convention, 1907). (Chapter 3) But the history of this and other pre-Peace Pact intellectual history of war is thin within the text, as the point H&S are chasing is specific to the Peace Pact's relevance in history, not the broader history of war. Some of my favorite books are books that tie together aspects of history across wide gulfs, which The Internationalists succeeds at. It’s rare and delightful to see how a piratical ship capture by the Dutch in the 16th century ties together with the opening of Japan, the US battles with Mexico, and finally, the creation of the United Nations. H&S’s perspective is that the Peace Pact marks a turning point, and one that should not be forgotten. It’s also clear that it marks a capstone on a long history of small changes that are also, themselves, interesting battles in the long-running war to make the world less intolerable. In the end, they identify four key changes in the intellectual landscape, with Lauterpacht’s fingers in nearly all of them. 
Neutrality no longer requires impartiality. States can help those they view as victims.
Inline links: More Dakka
Speculatively, DeepMind hoped to get all the AI talent in one place, led by safety-conscious people, so that they could double-check things at their leisure instead of everyone racing against each other to be first. I don’t know if these high ideals still hold any power; corporate parent Google has been busy stripping them of autonomy.
Inline links: stripping them of autonomy
Google’s first employee became their Director of Technology and made $900 million. Jesus’s first follower became the Bishop of Rome; one in every thousand people alive is named after him. The first few people to make websites in 1995, blogs in 2005, or YouTube channels in 2015 got outsized followings that they were able to leverage into higher status later on. The first few people to get on board the New Atheist, woke, alt-right, dirtbag left, and intellectual dark web movements all had easy opportunities to become famous; the next few thousand at least had the chance to be well-connected veterans.
(as an intuition pump, if Google and Bob’s Tools produce the same amount of value per employee in 2000, and the janitors at both get paid the same, and then in 2020 Google produces 1,000x more value per employee, should a janitor at Google get paid 1,000x more than a janitor at Bob’s?)
Content is already effectively free…the ability to browse many lifetimes’ worth of art and writing using Google search – and all that for $0 – has not made the creation of new art feel spurious […]
“I’ve written a book,” an acquaintance tells me. “I don’t care,” I reply with brusque honesty. “I have all the books I want already. I just find ‘em on Google and Amazon and Goodreads.” Except of course I don’t say that, because no one ever says that, and not just out of politeness. “I’ve written a book,” an acquaintance tells me. “I don’t care,” I reply. “I have all the books I want already. The AI writes them for me.” Except of course I don’t say that. Why would I?
Google Imagen announced May 2022.
Inline links: announced
Google PARTI announced June 2022.
Inline links: announced
Stability.ai StableDiffusion announced August 2022. Thanks to some help from researchers, employees, and beta testers, I was able to run my prompts through some newer models (thanks especially to Google for eventually giving permission to do this despite their usually high security around these things). The results were: DALLE-2: 0/5
Inline links: announced
6: Gary Marcus has a response to my recent AI bet. I want to make it clear that whatever the merits of my bet or his arguments, Google did not “snooker” me. They had no part in this: I went around begging for someone to run my prompts through PARTI and Imagen, one of their employees asked their bosses’ permission and then agreed to do so, and ran them exactly as I asked. Any fault is entirely mine. I’m insisting on this pretty hard because I’m grateful that Google will sometimes respond to random requests by amateurs, and accusing them of deliberate deception in response burns their willingness to do that. As for everything else: I wrote “without wanting to claim that Imagen has fully mastered compositionality, I think it represents a significant enough improvement to win the bet, and to provide some evidence that simple scaling and normal progress are enough for compositionality gains”, I stick to the “some evidence” claim, I feel like I was pretty open about exactly how much/little evidence it was (Google sent me ten examples per prompt, I showed you four representative ones, but the extra six don’t change much). I agree Marcus makes some useful common sense claims on how sure to be after five examples.
Inline links: has a response to my recent AI bet, begging
I looked for photos of the Central Valley to illustrate this article, but none of them were quite as I remember it. This one from the Sacramento Bee is the closest I could find. But imagine it through a layer of haze, and also you can’t see well because you are in the process of dying from heatstroke. Of large Central Valley cities, Sacramento has a median income of $33,565 (but it’s the state capital, which inflates it with politicians and lobbyists), Fresno of $25,738, and Bakersfield of $30,144. Compare to Mississippi, where the state capital of Jackson has $23,714, and numbers 2 and 3 cities Gulfport and Southaven have $25,074 and $34,237. Overall Mississippi comes out worse here, and none of these seem horrible compared to eg Phoenix with $31,821. Given these numbers (from Google), urban salaries in the Central Valley don’t seem so bad. But when instead I look directly at this list of 280 US metropolitan areas by per capita income, numbers are much lower. Bakersfield at $15,760 is 260th/280, Fresno is 267th, and only Sacramento does okay at 22nd. Mississippi cities come in at 146, 202, and 251. Maybe the difference is because Google’s data is city proper and the list is metro area? Still, it seems fair to say that the Central Valley is at least somewhat in the same league as Mississippi, even though exactly who outscores whom is inconsistent. III. What do the people who live in the Valley think went wrong? What The Hell Is Wrong With California’s Central Valley?, starting around 9:30, interviews a local conservative realtor (most people in the Valley are conservative; I haven’t found a liberal equivalent). He says that the farms in the Central Valley used to be manned by migrant workers, who would come from Mexico, work for a season, then go back to Mexico and live off their earnings for the rest of the year. Later, policies shifted to welcoming them and granting them citizenship, so many of them came over and brought their families. 
But around the same time there was a drought, the farm industry crashed, the remaining farms mechanized, all the immigrants were left without work, they got on welfare, and they weren’t able to get off of it. He doesn’t say exactly when this happened, but he says times were good when he was a child, and he looks like he’s in his 30s or 40s. So if he’s 35 and things started going bad when he was 10, that would mean he thinks things started going bad around 1995 to 2000. Here’s a story in the LA Times from 1999, which talks about how things are starting to get bad. It admits that Californians like to poke fun at the Central Valley, but it seems to be just that - poking fun - and not freaking out about poverty and dysfunction the way articles about the Valley do now. But it ends by saying that things are getting worse: To be honest, living in the Central Valley takes some getting used to, especially if you’re from the coast. It’s an acquired taste. Oppressive heat in summer. Depressing tule fog in winter. Sure, fall and spring are OK. But where aren’t they? First-rate culture is scarce. The state capital doesn’t even have a symphony. One of the attractions--it’s almost a local joke--is the ability to get away, particularly from Sacramento. It’s 90 minutes to San Francisco in one direction, or skiing in another; two hours-plus to the ocean or Tahoe […] Still, earthquakes aren’t a menace to most people. And it doesn’t take long before you begin to appreciate certain benefits--indeed, to understand that some Central Valley burgs, especially the capital, are among California’s best kept secrets. Or, at least, they have been. Continuing: When I moved here nearly 40 years ago--the first of three times--summer skies were blue and the stars bright. Fishing was easy in the rivers and pheasant hunting was 10 minutes from town--in fact, where I now live. All this good life, however, has been changing. Sacramento is now the sixth smoggiest area in the country. 
A gloomy, beige pall greets motorists as they descend from the Sierra. Even worse is the San Joaquin Valley, from Stockton to Bakersfield. It’s rated the nation’s fourth smoggiest region […] And this brings us to the root problem: a population explosion, fed notably by commuters spilling over the Grapevine from L.A. into Bakersfield, and from the Bay Area into the northern San Joaquin Valley, turning farms into houses and freeways into parking lots. In Sacramento, high-tech industry is generating jobs and sprawl. Up and down the valley, people without job skills are having babies and going on welfare. Many are immigrants from Mexico and Southeast Asia. “The population is growing at a faster pace than the economy,” notes Dan Whitehurst, a former Fresno mayor who is running again. “Livability is becoming more of an issue. But the biggest issue still is jobs.” That’s because, aside from Sacramento, the Central Valley has not cashed in on California’s economic boom. Unemployment in the San Joaquin Valley is roughly double the state average. It’s smoggy. Traffic’s getting worse. Farms are disappearing. There aren’t enough jobs. And, says pollster Mark Baldassare, people are “myopic” about their plight. It finishes: “We have a huge problem. ‘No way L.A.’ has been our slogan. But if we build nonstop houses, we’ll be worse than L.A. because we’ll have destroyed our [farm] economic base. . . . There’s no regional leadership. More state officials need to decide this area matters and poke their heads up out of the fog.” The fog and the smog. If not, one day there’ll be no getting used to the place. This is a weird article. It seems to confirm that things used to be better - nobody would call the Central Valley “the good life” now. But its concerns are smog, sprawl, and decreasing share of agriculture. These seem like the problems of somewhere that’s growing - local NIMBYs complaining that too many people want to move in. 
Today the problem is more that everyone in the Central Valley wants to leave. The piece sort of touches on poverty - “people without job skills are having babies and going on welfare” and “the population is growing at a faster pace than the economy” - but it’s still a weird emphasis, and one that makes me think of this as supporting the “problems were starting in the 90s” view. But by 2012, things were clearly very bad - here’s an article about how Census Shows Central Valley Areas Among Poorest In Nation. It says: Experts say the poverty problem in the nation’s agricultural powerhouse is deeply ingrained. The most important barrier is the valley’s lack of economic diversity. There are simply too few good nonagricultural jobs around and jobs in agriculture tend to be low-wage ones — except for those who run agribusinesses. “It’s a pretty ag-heavy region, so the inequality of wages and the opportunity to earn better wages is really skewed,” said Caroline Farrell, executive director of the Delano-based Center on Race, Poverty & the Environment. “If you own a farm, you’re apt to earn more wealth, while if you’re a farmworker, don’t earn very much.” The valley has not been able to bring or retain many new companies partly because it lacks a qualified workforce, said Atonio Avalos, associate professor of economics at Fresno State University. “We have an issue of skills mismatch,” Avalos said. “Companies may be offering jobs, but the skills of people in the valley are not ones they are looking for.” Students who want to get a college degree face many barriers, he said, and public funding for education is being slashed. Those who do graduate leave to find jobs elsewhere. The valley also doesn’t offer attractive amenities and has serious problems such as air pollution that have gone unaddressed. “If you’re a doctor or engineer, there are other places where you can make good money and live in better conditions,” Avalos said. 
“Many people don’t come here or leave because of the high incidence of asthma and other respiratory problems.” This sounds like things were already pretty bad in 2012, maybe bad enough that they must have been getting worse for longer than 10 or 15 years, I don’t know. IV. What do the data say? Here are some economic time series. I couldn’t find any good long-term ones; the least bad one comes from this unsourced report: Here it looks like things got worse from 1975 - 1985, and then depending on county there was a slower-to-imperceptible decline thereafter. FRED only has data since 1989, but agrees that things haven’t gotten worse since then. Here’s unemployment: Is this just because people got discouraged (or on welfare) and stopped seeking employment, and so stopped showing up in the statistics? Here’s a graph of Total Employed Persons: In 1990, 303,000 people were employed out of a population of 354,000. In 2022, 430,000 people were employed out of a population of 542,000. So labor participation rate went from 86% to 79%. But national labor force participation decreased by about the same amount during that time, so I don’t think we should overemphasize that. And here are some other graphs I found useful: Fresno housing prices: Racial demographics: Source: Wikipedia. Central Valley cities like Fresno and Bakersfield aren’t really more Hispanic than other parts of California or Arizona, so if immigration or racial issues played a part it must have been more complicated than just numbers. Number of immigrants in California over time: Factors of productivity in agriculture: V. So why is the Central Valley so bad? It’s an agricultural region, but lots of places are agricultural. It got lots of immigrants, but no more than many other places. It’s polluted - but so was LA, and LA rebounded. This is just a weak guess, but I think it starts with their crops. The Midwest grows mostly corn and wheat. The Central Valley is more fruits, vegetables, and nuts. 
Corn and wheat are easier to harvest, so middle-class farmers can own the farm and buy a mechanical harvester or something. Fruits, vegetables, and nuts benefit from intensive manual picking, so farm owners hire outside labor. According to Carolina Demography: There are about 3 million farmworkers in the United States: about two million are family farmworkers and another one million are hired farmworkers…nationally, about three-fourths of hired farmworkers are foreign-born; most (69%) were born in Mexico; 6% were born in Central America; and 1% were born in another country. Given that these are mostly Mexican immigrants, we’re probably not talking about people who are hired to grow corn in Kansas. I think plausibly the majority of US hired farmworkers live in California’s Central Valley. This makes it a sort of plantation agriculture system, which naturally tends towards landowners taking all the gains and workers ending up as an underclass. In the mid-20th century, the local plantation underclass was made of Okies (cf. The Grapes of Wrath). In the later 20th century, many immigrants moved in, lowering wages. Although immigrants don’t usually lower wages, this is because there are usually lots of industries for people to branch out into, but the Central Valley only has agriculture. Also, agribusinesses were becoming better at mechanizing their operations. Although technology doesn’t usually lower wages, again, this requires lots of diverse industries, and the Central Valley only had agriculture. All of this corresponds to the 1975-1985 period on the graphs where wages were going down. But it sounded from some of the testimonials above like the Central Valley didn’t become truly miserable until the late 90s. I’m not sure why this is. It could be the immigrants switching from being migrant laborers to raising families, and those families were impacted by poverty and inequality in a way the original migrants weren’t. 
It could be worsening drug problems as new drugs get invented and go down in price. (I’m not sure if NIMBYism and rising house prices also played a part. House prices do seem to have risen, a lot, but I was under the impression that building things in the Central Valley was easy and most of a house’s price there is construction rather than land. I’m not sure why house prices would have gone up so much since 1990 if this were true, though.) Other things that the articles I read emphasized: There’s a severe drought in the Central Valley right now. This is probably partly climate change, partly bad luck, and partly California diverting water to hydrate growing coastal cities. This has made everything worse (but then why isn’t that reflected in worsening economic statistics?)
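The employment shares quoted in section IV can be rechecked with a few lines of Python. This is only a sketch of the arithmetic: the four input numbers come from the text, the function name `employment_share` is made up for illustration, and the ratio computed here (employed divided by total population) is the one the text uses, which is not necessarily how official labor force participation statistics are defined.

```python
def employment_share(employed, population):
    """Share of the population that is employed (hypothetical helper)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return employed / population


# Figures as quoted in the text for 1990 and 2022.
share_1990 = employment_share(303_000, 354_000)
share_2022 = employment_share(430_000, 542_000)

print(f"1990: {share_1990:.0%}")  # prints "1990: 86%"
print(f"2022: {share_2022:.0%}")  # prints "2022: 79%"
print(f"change: {(share_2022 - share_1990) * 100:.1f} percentage points")
```

The drop of roughly six percentage points is the number that the text then compares against the national trend.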
Inline links: Sacramento Bee, from Google, list of 280 US metropolitan areas by per capita income, What The Hell Is Wrong With California’s Central Valley, Here’s a story, Census Shows Central Valley Areas Among Poorest In Nation, this unsourced report, https://substackcdn.com/image/fetch/$s_!viUA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F3cb60f00-5048-44dc-8ad3-49d9036437e8_632x382.png, https://substackcdn.com/image/fetch/$s_!jJHH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Faf442355-432b-4ac1-b10d-2ebf17011084_1151x345.png, https://substackcdn.com/image/fetch/$s_!oETP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1fe3e7f8-9e13-4295-b608-d071425d6adc_1116x300.png, decreased by about the same amount, https://substackcdn.com/image/fetch/$s_!is_P!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F09af8185-c8cf-4879-999c-4643ec7d7079_989x590.png, https://substackcdn.com/image/fetch/$s_!K8wW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6ce47fbd-0a84-4461-a96c-07a2d8130d4e_412x104.png, https://substackcdn.com/image/fetch/$s_!Vvxn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe95d035a-8dea-410c-8b8f-19acb145859e_676x356.png, https://substackcdn.com/image/fetch/$s_!xlsr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F36965205-4cb9-4f93-b855-91cc6b9047b5_450x397.png, Carolina Demography, Okies
“I quit my job at Google to work on promoting altruistic kidney donation,” says the woman you have almost bumped into. She wears a white dress, and statistically her name is most likely Elizabeth or Anna. “I’m the liaison between hospitals and religious groups.”
Do Vietnamese people love trading monkey gifs? Are Ukrainians especially susceptible to Ponzi schemes? Is Venezuela laden with techbros? Vietnam uses crypto because it’s terrible at banks. 69% of Vietnamese have no bank access, the second highest in the world. I’m not sure why; articles play up rural poverty, but many nations have more rural poor than Vietnam. There’s a history of the government forcing banks to make terrible loans, and then those banks collapsing; maybe this destroyed public trust? In any case, between banklessness and remittances (eg from Vietnamese-Americans), Vietnam leads the world in crypto use. Ukraine has always been among the top crypto countries: in 2021, NYT called it “the crypto capital of the world”. Again, this owes a lot to its terrible banking system. NYT describes its banks as “so sclerotic that sending or receiving even small amounts of money from another country requires an exasperating obstacle course of paperwork”, and this guy says that if you deposit more than $100,000 in a Ukrainian bank, “the chance that you get it back is very slim”. When Russia invaded, the Ukrainian government doubled down on crypto as a way for friendly Westerners to donate to the war effort - $70 million as of March. It proved so helpful that during the first month of the war, in between dodging Russian artillery shells President Zelenskyy found time to pass a law legalizing crypto and strengthening its regulatory framework. Venezuela’s economy has been in slow motion collapse for the past decade. Inflation is currently in the triple digits (remember, people worried the Democrats would lose the midterms because of a US inflation rate of 8%). If your country has a triple-digit inflation rate, you might prefer to use an alternative currency, which Venezuela’s authoritarian government tries to prevent people from doing. Cryptocurrency provides a hard-to-ban alternative which has caught on among Venezuelan hustlers and small businessmen. 
I personally contributed in a small way to Russia’s cryptocurrency use. I’ve been trying to help Russian ACX readers escape to other countries to avoid conscription or arrest. Of my two successes so far, both involved sending cryptocurrency to help them afford a ticket out and living expenses while they searched for a job in their new country. I’m pretty proud of this and I don’t think it would have been possible without crypto. I think a lot of Westerners want to think of developing-world uses as a boring sideshow, and highlight Westerners trading monkey gifs as the only part of crypto worth talking about. But about 66% of crypto users live in the developing world. More people own cryptocurrency in Africa than in North America. Of course a technology centered around avoiding governance and banking failures will be centered in the countries with the most governance and banking failures! Big Crypto Projects Are Very Rarely Scams I realize this is a bold sentence to use as a section header in 2022. But I recently tried to figure out the exact scam rate, and it seemed low. I searched for articles called things like The Top Crypto Projects Of 20XX, and then I checked how many of those projects, years later, had turned out to be scams. I tried my best not to cherry-pick, and to focus on the first article that Google fed me for each of various relevant search terms. I ended up using four articles for this experiment: Most Promising Crypto Projects Of 2015
when you’re not sure which of many competing experts to trust, you should trust a prediction market instead of any of them Going through these claims one by one: 3.1: Why expect all prediction markets to agree with each other? Either all prediction markets agree with each other, or you can get rich quick: Suppose prediction markets disagreed. For example, suppose the RNC ran an Official Republican Prediction Market that said there was only a 10% chance Democrats would win the next election, and a 90% chance Republicans would. And suppose the DNC ran an Official Democrat Prediction Market that made the opposite prediction: 90% chance Democrats, 10% chance Republicans. Then you could buy a share of “Democrats will win” from the Republican market for 10 cents, plus a share of “Republicans will win” from the Democrat market for 10 cents, and be guaranteed to make $1 when one party or the other wins. You have turned 20 cents into a guaranteed $1. Repeat until you are rich or the mispricing has been corrected. This is just what financial experts call “arbitrage”. You may notice that in finance, people always give specific prices for things like shares of stock, barrels of oil, or Bitcoins. People say things like “Google stock is up to $300”, but never “Google stock is up to $300 on the NYSE, but down to $200 on NASDAQ”. If that was true, people would buy it on NASDAQ, sell it on NYSE, make $100 in free money, and get rich quick. In ideal situations, arbitrage forces everybody everywhere to agree on the same price for a financial instrument. Prediction markets turn claims about truth into financial instruments in a way which forces everybody everywhere to agree on how likely the claim is to be true. 3.2: Why expect prediction markets to be hard for special interests to manipulate? Either a prediction market is not currently mispriced because of a manipulation attempt, or you can get rich quick. 
Argument: Suppose a prediction market was currently mispriced because of a manipulation attempt. For example, suppose there is a prediction market for whether the sun will rise tomorrow. The true probability is obviously 100%, corresponding to a cost of $1.00. But suppose some special interest who wanted to trick people into believing the sun would not rise successfully spent money to bid the market down to only 10%. This means that you can buy, for $0.10, a share which pays $1 if the sun rises tomorrow. In other words, you can dectuple your money for free. Repeat until you are rich or the mispricing has been corrected. This may sound complicated in theory, but it plays out straightforwardly in real life. As a test, I tried to manipulate the market on whether Austin Chen, founder of Manifold Markets, would be charged with a felony. There’s no reason to think he should be, so the price started at 5%. I spent $200 in Manifold’s play money bidding it up to 95%. Within an hour, other investors noticed the mispricing and corrected it back down to 5% again. 3.3: Why expect prediction markets to be free from bias? Either a prediction market is not currently mispriced because of bias, or you can get rich quick. The argument: Suppose all smart people, including you, know that there is an 80% chance that the Democrats’ economic plan will create new jobs. But suppose that Republicans, because of their partisan biases, refuse to believe it, and say there is only a 40% chance. And suppose the Republicans set up their own prediction market where they bid the price of a share down to $0.40. You can, of course, go on this prediction market, buy shares for $0.40, and double your money in expectation. Repeat until you are rich or the mispricing has been corrected. 
I already described how something like this happens on PredictIt (a non-ideal prediction market that you can only make a few hundred dollars in expectation by correcting), and that I do in fact make a few hundred dollars every election season. 3.4: Why should I believe a prediction market’s consensus over my own opinion? This is the same argument as “the prediction market will always be at least as accurate as the top expert” only with you in the place of the top expert. Either prediction markets are at least as smart as you are, or you can get rich quick. The argument here is the same as “at least as smart as the smartest expert” argument in 2, except replacing “the smartest expert” with “you”. But just to lay it out explicitly: Suppose you were smarter than some prediction market. Then if you disagreed with the market, usually you would be right and it would be wrong. So look for cases where you disagree with the market, buy those shares, and you will make money in expectation. Repeat until you are rich or the mispricing has been corrected. I like this because it’s a good empirical test, and one that many people have tried. If you think you’re smarter than the prediction markets, bet on them and see what happens! I think most people will find that (over the long run) they lose money, and eventually this will cure them of their delusion that they can beat the markets. A few people might find that (over the long run) they do win money, just as a few people (eg Warren Buffett) can consistently win money on the stock market. Hopefully those people will quit their day jobs and become full-time prediction market traders. They’ll become multimillionaires, and their hard work will ensure that prediction markets stay more accurate than the rest of us. 3.5: Why should I believe that a prediction market makes good decisions about which of many competing experts to trust? 
Suppose you accept that a prediction market will always be at least as accurate as some well-known expert (eg Nate Silver). But what if you’re not sure who the real experts are? Or what if there are many experts, all saying different things, and nobody knows who to trust? In this case, a prediction market will always be at least as good as any other source (including you) at telling good experts from bad, or at figuring out which of many good experts is the best. By this point you should be able to predict the argument, but for completeness’ sake: Suppose you were better than the prediction market at determining which of many competing experts to trust, or how to aggregate the pronouncements of many experts into a single authoritative opinion. Then if you disagreed with the market, usually you would be right and it would be wrong. So look for cases where you disagree with the market, buy those shares, and you will make money in expectation. Repeat until you are rich or the mispricing has been corrected. To ground this in a real example, suppose there is some new virus which might or might not spread to the United States. A Harvard professor of epidemiology says there’s a 70% chance it will spread, a Yale professor of epidemiology says there’s a 90% chance it will spread, and a guy in a tinfoil hat on Infowars says there’s a 0% chance it will spread because it’s all a fake government plot. If I knew nothing else about this situation, I would probably think there’s about an 80% chance the virus will spread. I trust the Harvard and Yale professors equally much, and the tinfoil hat guy not at all. Suppose I saw a prediction market that was only at 10%, because most people trusted the tinfoil hat guy. I would want to buy YES shares until the price got up to 80%, because in expectation I would octuple my money. Suppose I saw a prediction market that was only at 70%. 
Now I wouldn’t be sure whether the prediction market was dumber than me (believed tinfoil hat guy) or smarter than me (they know a lot about epidemiology - or about the credibility of specific experts - and have decided to trust the Harvard professor over the Yale professor). Maybe I could improve on this. If I knew things about epidemiology, I could read over both professors’ arguments and try to figure out if one was better than the other. If I knew things about academia, I could pick over both professors’ resumes and see whether the Harvard professor seemed more distinguished or had more respect in her own field than the Yale professor. In the end, I might decide the prediction market was right to price it at 70% (in which case I wouldn’t do anything), or that actually both experts seemed equally expert (in which case I might bid it up to 80%), or that actually the Yale epidemiologist was better (in which case I might bid it up to 90%). 3.5.1: Isn’t it weird to give non-experts (like prediction market investors) the final judgment in which of two experts is right? Yes, but I don’t think this is avoidable. If there were no such thing as prediction markets, and the Harvard epidemiologist said 70%, and the Yale epidemiologist said 90%, and the tinfoil hat guy said 0%, and for some reason it mattered a lot to you which of these was true - then you would still have to make that decision. If there’s some extremely authoritative source who can make the decision for you - let’s say the World Health Organization says “after reviewing all experts’ arguments, we believe that the final probability is 75%” - then great! Either: The WHO is clearly the most trustworthy source - in which case we go back to the Nate Silver situation where the prediction market should be just as accurate as it is.
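The "get rich quick" arithmetic in sections 3.1 through 3.5 can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not a model of any real market: shares pay exactly $1 if they resolve YES, there are no fees or trading limits, and the function names (`arbitrage_profit`, `expected_multiple`, `equal_weight_estimate`) are invented for this example.

```python
def arbitrage_profit(price_a, price_b, payout=1.0):
    """Guaranteed profit from buying two complementary shares,
    exactly one of which pays out, on two different markets."""
    return payout - (price_a + price_b)


def expected_multiple(price, true_probability, payout=1.0):
    """Expected payout per dollar spent on a share bought at `price`
    when the real chance of a payout is `true_probability`."""
    if not 0 < price <= payout:
        raise ValueError("price must be in (0, payout]")
    return true_probability * payout / price


def equal_weight_estimate(estimates):
    """Average the forecasts of the experts you trust equally."""
    return sum(estimates) / len(estimates)


# 3.1: "Democrats win" for 10 cents on one market, "Republicans win"
# for 10 cents on the other; one of the two must pay $1.
print(f"arbitrage profit: ${arbitrage_profit(0.10, 0.10):.2f}")

# 3.2: the sunrise market pushed down to 10% when the truth is 100%.
print(f"sunrise market: {expected_multiple(0.10, 1.0):.0f}x")

# 3.3: a share priced at $0.40 when the real chance is 80%.
print(f"biased market: {expected_multiple(0.40, 0.80):.0f}x")

# 3.5: trust Harvard (70%) and Yale (90%) equally, ignore the 0% guy,
# then compare that estimate with a market sitting at 10%.
consensus = equal_weight_estimate([0.70, 0.90])
print(f"consensus: {consensus:.0%}, "
      f"virus market: {expected_multiple(0.10, consensus):.0f}x")
```

The multiples are expectations, not guarantees. Only the 3.1 arbitrage case locks in a profit regardless of outcome, which is why it is the cleanest version of the argument.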
Some companies have their own internal prediction markets, most famously Google. Last I checked, Google was offering their services to other companies that wanted to try the same thing, so you might want to try getting in touch with them.
Inline links: most famously Google
People talk about “fuck-you money”, the amount you’d have to make to never work again. You dream of fuck-you social success, where you find a partner and a few close friends, declare your interpersonal life solved, and never leave the house from then on. Still, in the real world you clock into your job at Google every day, and in the real world you attend Bay Area house parties. You just hope this one won’t focus on the same few topics as all the others . . .
Sites could ask for proof of humanity. I don’t know how this will work in the future: driver’s licenses can be faked, videos can be spoofed. Worst case scenario, I think megacorporations like Google and Facebook could offer this as a service - so-and-so has a Gmail account or Facebook page and has gotten lots of normal-looking messages over many years.
The leading big tech company (eg Google/Apple/Meta) is (clearly ahead of/approximately caught up to/clearly still behind) the leading AI-only company (DeepMind/OpenAI/Anthropic) in the quality of their AI products: (25%/50%/25%)
DeepMind thought they were establishing a lead in 2008, but OpenAI has caught up to them. OpenAI thought they were establishing a lead the past two years, but a few months after they came out with GPT, at least Google, Facebook, and Anthropic had comparable large language models; a few months after they came out with DALL-E, random nobody startups came out with StableDiffusion and MidJourney. None of this research has established a commanding lead, it’s just moved everyone forward together and burned timelines for no reason.
Somewhere to the south, Ray Kurzweil walks into his office at Google. Twenty years ago, he conjectured that all human history - no, all evolutionary and geologic history - was a series of accelerating movements, which would crescendo at the end of time in approximately 2029. Six years to go. San Francisco doesn’t feel like the sort of place willing to wait another six years. The doomed summoning-city at the end of time seethes with palpable impatience. Too much Ethiopian methylxanthine, that’s my diagnosis. It feels eerie and unreal in the darkness, like everything is underwater, and I remember Poe:
This reminds me a lot of a concept in software engineering I read about in the Google Site Reliability Engineering book: error budgets as a way to resolve the conflict of interest between progress and safety.
The "solution" that Google uses is to first define (by business committee) a non-zero number for "how much should this crash per unit time". This is common for contracts, but what is less common is that the people responsible for defending this number are expected to defend it from both sides: not just preventing crashes that happen too often, but also preventing crashes that don't happen often enough. If there are too few crashes, that means there is too much safety, and effort should be put into faster changes/releases, and that way the incentives are better.
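As a rough illustration of defending the number "from both sides", here is a minimal Python sketch. It is not Google's implementation or anything taken from the SRE book: the function name `error_budget_signal`, the `slack` threshold, and the three labels are all assumptions chosen for the example.

```python
def error_budget_signal(observed_crashes, budgeted_crashes, slack=0.5):
    """Compare observed crashes in a period against the agreed budget.

    `slack` is a hypothetical threshold: using less than this fraction
    of the budget is treated as a sign of being too cautious.
    """
    if budgeted_crashes <= 0:
        raise ValueError("the budget must be a non-zero, positive number")
    if observed_crashes > budgeted_crashes:
        # Budget overspent: prioritize reliability over new releases.
        return "slow down"
    if observed_crashes < slack * budgeted_crashes:
        # Budget barely touched: there is room to ship faster.
        return "speed up"
    return "on pace"


for crashes in (12, 8, 2):
    print(crashes, "crashes against a budget of 10:",
          error_budget_signal(crashes, 10))
```

The point of the two-sided check is that "speed up" is a legitimate output, so the team owning the number has a reason to push for change as well as for caution.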
The picture on the left is Manhattan Island, NY. The picture on the right is Conanicut Island, RI. Both islands are about the same size, the same climate, the same distance from the mainland. Both are near good natural harbors. In 1600, some early European explorer would have considered them basically interchangeable. Still, the cost of housing in Manhattan is about $2000/sqft, and the cost of housing in Conanicut is about $500/sqft. Why? God didn’t create these two islands with different land value; something must have happened to make one 4x as expensive as the other. The obvious answer is “the Dutch chose to build their colonial capital on Manhattan, more and more people moved in, it became ever denser and more urban in a virtuous cycle, now it is very dense and urban, and, in the current regulatory regime, dense urban areas have higher housing prices than empty rural ones.” If back in 1624 the Dutch had decided to build their capital on Conanicut, maybe today it would be a city of 10 million people, and Manhattan would be an empty rural area. In that case, I would expect Conanicut to have 4x the house price of Manhattan. If I were a Native American living on Manhattan, and I was committed to keeping housing prices there low, I would ask the Dutch to build their capital on Conanicut instead. In fact, whenever a European came to my island seeking to build houses, I would try to fight them off. If I somehow succeeded at this for four hundred years, and Manhattan remained an empty rural area, then I would expect Manhattan prices to be much lower than they are now. So in response to all of your comments that I don’t understand basic causal inference, I answer that history provides quasi-experiments, and no, I’m pretty sure that Manhattan has high prices because lots of people moved there, rather than because of some other factor. Or, rather, both density and desirability feed into the other, but the density step is a crucial input. 2. 
Comments About Jobs And Amenities (And Not Density Per Se) Producing Desirability But Martin Blank writes: NYC/SF are expensive because there are MANY good jobs there and people WANT to live there. Not because of the density of housing. You could build 500,000 homes in the middle of your empty field in North Dakota, and it wouldn't do much for the demand there. You aren't going to create Manhattan by magicking 3.5 million housing units of similar quality into the Red Lake Indian reservation in Northern Minnesota. I originally found the various comments saying this annoying. Yes, there are many good jobs in NYC. You can be a barista at Starbucks, you can be an actor on Broadway, you can be a train conductor for the MTA. But why is it easier to be a barista in NYC than in North Dakota? Surely because there are millions of people in New York, those people drink a lot of coffee, and so they need a lot of baristas. Likewise, they watch a lot of plays, and ride a lot of trains, so they need actors and train conductors. If all the residents moved to North Dakota, there would be lots of demand for baristas, actors, and train conductors in North Dakota, and none in NYC. But some people gave versions of this argument that I found harder to dismiss. JSwiffer writes: The key fact your missing is if you wave a magic wand and 10x San Francisco you wouldn't 10x all jobs. You would 10x the # of waiters, and garbage men but you wouldn't 10x the # of 500k/yr Google site reliability engineers. And it's the latter not the former that are driving up prices. Other commenters analogized this to factory or coal mining towns. Here’s how I ended up thinking about this: suppose someone strikes oil in an uninhabited part of North Dakota, enough to produce 1,000 good oilman jobs. 1,000 oilmen move to the area and start a town. Because there are no NIMBYs, they build 1,000 houses. 
Each oilman creates demand for a certain number of waiters (to serve them food), doctors (to treat their illnesses), teachers (to teach their children), etc. How many waiters, doctors, teachers, etc move to the town? Assume for the sake of argument that all jobs earn the same salary, $50,000. In that case, it has to be fewer than 1,000. Each oilman earns $50,000, and some of that gets spent on taxes and out-of-town goods. So he has less than $50,000 to spend on in-town goods and services, so (in this hypothetical) creates less than one other job. Each waiter needs doctors to treat their illnesses and teachers to teach their children, so each service employee creates some number of additional service employee jobs. Makeshift housing in a North Dakota oil boom town (source) If each person creates half a job, the original 1,000 oilmen attract 500 service workers, those 500 attract another 250, and so on until population stabilizes at 2,000 people. In this model, if there are fewer than 2,000 houses in the town, demand exceeds supply (no matter what is going on in the rest of the country), but if there are more than 2,000, supply exceeds demand. So if we imagine Google’s presence as an oil-like resource, the extra demand for housing in the Bay should gradually decline: at some point, you will have finished housing the Google workers and the service workers who support them. But this isn’t right either, because Google isn’t a natural resource - it’s a company founded by Bay Area residents. If you got more Bay Area residents, you would (with some delay) get more Googles. Or: Austin gets lots of jobs from Tesla. Tesla wasn’t founded by Austinites. But it moved to Austin when it became a known “tech hub”, ie a place with lots of tech companies and tech employees. It wouldn’t have moved to Austin if Austin were still an uninhabited plain or a one-horse town. So as Austin got bigger, it attracted more tech companies. 
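The oil-town arithmetic above is a geometric series, and a minimal sketch can make the "stabilizes at 2,000" step concrete. The function names and the `jobs_per_resident` parameter are illustrative assumptions, not anything from the post; the only inputs taken from the text are the 1,000 oilmen and the "half a job per person" figure.

```python
def town_population(base_jobs: int, jobs_per_resident: float, tol: float = 1e-9) -> float:
    """Add up successive waves of arrivals until new waves become negligible.

    base_jobs: the original resource jobs (1,000 oilmen in the example).
    jobs_per_resident: service jobs each resident supports (0.5 in the example).
    Both names are hypothetical labels chosen for this sketch.
    """
    if not 0 <= jobs_per_resident < 1:
        # At 1 or more, each wave is at least as big as the last and the town never stabilizes.
        raise ValueError("jobs_per_resident must be in [0, 1) for the population to stabilize")
    total = 0.0
    wave = float(base_jobs)
    while wave > tol:
        total += wave              # this wave of workers moves in
        wave *= jobs_per_resident  # and supports a smaller wave of service workers
    return total


def closed_form(base_jobs: int, jobs_per_resident: float) -> float:
    """Limit of the same geometric series: base / (1 - multiplier)."""
    return base_jobs / (1 - jobs_per_resident)


if __name__ == "__main__":
    print(round(town_population(1000, 0.5)))  # 2000
    print(closed_form(1000, 0.5))             # 2000.0
```

Under these assumptions, any multiplier below one gives a finite stable population, which is why the oil-like model predicts that extra housing demand eventually runs out. The objection that follows in the text amounts to saying the base number of jobs is not fixed: more residents eventually raise it.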
So in both the Bay Area case and the Austin case, having more people attracted more tech companies, either because the residents themselves founded the companies or because the companies were attracted to the newly bustling city. Potential counterargument: Each new Bay Area resident gives the Bay another lottery ticket to found the next Google. If having the first Google gets it an extra 1 million people, but there are 300 million people in the US, then those extra 1 million only give it a 1/300 chance of winning the next lottery. So even though the Bay Area won the lottery once, and this made it have high demand, this doesn’t mean the high demand will cause it to win more lotteries. If you win the lottery once, spend all your winnings on more lottery tickets, and keep doing this forever, you haven’t invented an infinite money printing machine; eventually you’ll just lose. Potential counter-counter-argument: the Bay got Google, and Facebook, and Apple, and . . . so these can’t all be separate lotteries. I think you should probably model it as a high-level lottery to become the next hub of a tech-sized industry, plus many low-level lotteries where once you’re the tech hub, you’re attracting lots of techies, and each techie gives you a ticket in a lottery to found the next big tech company, where the denominator is the number of techies rather than the whole US population. And the Bay might have half the US’s techie population. So maybe here there is a self-sustaining lottery-winning cycle, at least until tech plays itself out and nobody wants any more tech companies. And that might take a long time. Tom (author of Tom Thought) writes: The primary drivers of demand for living in NYC are the specific opportunities available in NYC. It is true that on long time horizons, one of the reasons these opportunities have tended to collect in NYC is that it is a dense place. 
But those aren't the only reasons - NYC is much more important than other, bigger cities in other parts of the world for complex historical reasons. Even if a catastrophe were to wipe out half the city, there would still be a great deal of demand to live near important institutions like Broadway, Wall Street, Port of NY & NJ, Columbia, etc (assuming those institutions survived the catastrophe). Increasing the number of housing units has a very mechanical impact on how many people can live in the place. But it has only a second-order impact on the types of institutions that drive demand to live in the city. People don't just generically crave to live near other people for the most part (a handful of urbanist freaks like myself excepted). The Bay Area is a great example of this. It is much less populated than other much cheaper cities. Density isn't why people want to live there - it's access to a specific culture and specific institutions. Demand for that is not simply a function of density - some people want to be part of Bay Area culture and others don't. Adding more units will induce some demand as a second-order effect, but will bring prices down as a first-order effect. To relate this to your model: we might be able to say that the country has a certain number of abstract "culture points" that have been allocated to different cities by various historical forces. Each culture point a city has increases demand to live in that city by a certain amount. Adding more people to the city may allow it to generate additional culture points over time, or acquire culture points from other cities, but this doesn't happen right away, and is determined by a host of factors other than just density. 
Under this model, we expect a place like NYC to always cost much more than North Dakota (since NYC possesses a large number of culture points), but we would also expect that adding additional housing units to NYC would bring costs down (since there are now additional housing units per culture point). Perhaps this process will over time allow NYC to steal away some culture points from Chicago, Boston, or other cities, but this is a secondary effect. This just seems to be passing the buck. Yes, people move to New York because it has Broadway, Columbia University, and Wall Street. Why does it have those things? Because one in every X New York citizens founds a good artistic/educational/financial institution, and New York has a large population of employees to work at those institutions and customers to patronize those institutions. If Conanicut Island had a population of 10 million people instead of Manhattan, there would be lots of great institutions on Conanicut and it would have more culture points. I don’t think it’s a culture-point game and population/density just sort of occasionally redistributes culture points, I think to a first approximation culture points just track population/density. Maybe they track the population/density of upper class people better than the total population/density, but I don’t think this is a big enough distinction to sink the argument. 3. Comments About Chinese Ghost Cities Some people brought these up as a good natural experiment: the Chinese really did try building millions of houses on their equivalent of a North Dakota plain. What happened? Jeremiah Johnson (author of Infinite Scroll) writes: You currently seem like you're at the stage of understanding the thought experiments pretty well, but not understanding them on a DEEP level. For example with your hypothetical, this has actually happened before! Kind of. 
China built a bunch of 'ghost cities' basically out of nothing, and while there was an initial craze of speculation and tons of investment and building... nobody went to live in those cities most of the time. And now they're deeply distressed assets worth basically nothing. When nobody actually lives in the ghost city, it doesn't matter that they have super dense housing. There's no demand. (the only reason they might be worth something is that the CCP very, very much does not want to pop their huge housing bubble and is likely to bail out some of the parties involved) Parmenides (author of Last House On The Left) writes: I think your mixing up the agglomeration effects of density, which is what induces the demand, and the housing supply. You can't just build a city and expect people to move in, China has tried that. But if you have the agglomeration effects of density and shortage of housing due to artificial constraints, which we have all across the US, then you get dense areas with high housing costs. sdwr writes: Think of China's ghost cities / apartment blocks. Prices surely can't be that high there. Maybe the answer is that developers are good at their job, and build supply where theres demand for it? But several other people object that although the Western press made a big deal about Chinese ghost cities a few years ago, it mostly just took a couple of years for people to move in, and now at least some of them seem to be thriving. For example, Michael quotes the Wikipedia article, Under-occupied Developments In China: Reporting in 2018, Shepard noted that "Today, China’s so-called ghost cities that were so prevalently showcased in 2013 and 2014 are no longer global intrigues. They have filled up to the point of being functioning, normal cities". Ash Lael writes: I'm sceptical of the Chinese "ghost city" phenomenon. 
I haven't explored the issue rigorously but my impression is that in areas that were previously dismissed as "ghost cities" like Ordos Kangbashi, the population is now large and growing. I think we in the west are so used to infrastructure bottlenecks and short sightedness and anti-construction policies that the idea of it being possible to build the housing and infrastructure to accomodate expected demand ten years in the future is completely foreign to us. Perhaps building brand new cities before they are even needed is what the YIMBY utopia looks like. See also Bloomberg: China’s Ghost Cities Are Finally Stirring To Life After Years Of Empty Streets. This wasn’t trivial. It looks like the Chinese government had to put in some work to make people move in, including opening good schools and universities there. Probably if they had just built apartments in the middle of the desert and nothing else, they would have stayed empty. But that’s even more of a reductio ad absurdum than the original ghost city plan. Kangbashi, China’s most famous ghost city. What are housing prices like in the ghost city? Again from Bloomberg: Sitting on the southern outskirts of Inner Mongolia’s Ordos City (population 2.2 million), Kangbashi was the archetypal ghost city 10 years ago, with barren boulevards and empty buildings standing forlornly in the desert. Local officials are adamant that things have changed. They say 91% of homes in the district are occupied. In fact, after a yearslong construction freeze, the government approved six housing projects in 2020 and expects 3,000 homes to be built by the end of this year. Apartments in a new development are selling for 9,500 yuan per square meter, and downtown they go for 15,000 to 16,000 yuan, according to Liu Yueyue, 28, a salesman at a new residential development in the district’s northeast. “Would houses in a ghost town sell at such high prices?” asks Liu. 
Half of his customers come from outside Kangbashi, and most are parents who want to send their children to the well-regarded local schools, he says. Looking at this list of real estate prices across Chinese cities, Kangbashi seems squarely in the middle - for example, Wuhan and Xian are also in the 15,000 - 16,000 range. I claim this supports my argument: surely twenty years ago, houses in this particular deserted corner of Inner Mongolia would have been dirt cheap (if any even existed). But if you build a city there, it becomes just as expensive as any other city! Here it’s very obvious that the density caused the high prices instead of the other way around. Still, the Chinese housing market is weird, with significant vacancies even in expensive, well-developed cities. Paul Botts: No official vacancy rates are published in China and no specific definition of it exists there. Various think tanks and researchers both within that country and elsewhere have published estimates ranging from as low as 11 percent to as high as 24 percent. Those estimates have been for varying samples of Chinese cities, have used various definitions of housing vacancy rate, etc. The best (as in most systematic) estimate yet produced has come from researchers at a university in Liaoning. They used night-time urban lightsheds captured by a new (2018 launch) Chinese satellite having a new level of light sensing technology which allows separating out light from parks and plazas. They covered a large sample (49 cities), and made their sample representative of city type, city size, regions within China, etc. They also crossed-referenced with local housing data to ensure accurate balancing of their sample and to confirm that the satellite was successfully identifying light coming from housing blocks. They found vacancy rates of just under 20 percent in China's Tier 1 cities, and found rates above 20 percent in 40 of the 49 cities. 
They found the highest vacancy rates in western and northeastern cities, which are also the newest ones; that finding is consistent with the hypothesis of significant numbers of recently-built ghost cities. https://www.researchgate.net/publication/345092218_Housing_Vacancy_Rate_in_Major_Cities_in_China_Perspectives_from_Nighttime_Light_Data And Phil H (author of the blog Tang Poetry) writes: The price of housing in China has skyrocketed over the past few decades, as all those extra apartments have been built. I live in a pleasant but unremarkable southern city, and I paid London prices (about 4.5m yuan/$650k for a 1,300 sq ft flat). That seems to match Scott's hypothesis that high density leads to high prices. House prices here have risen much faster than incomes. They've risen in rural areas, too, but the increases in price in cities have been stratospheric. 4. Comments Accusing Me Of Not Considering Tokyo, Even Though I Included A Section In The Post On Why I Didn’t Think Tokyo Was Relevant I won’t name and shame people, but for example: You excluded Tokyo from your dataset. Tokyo has much higher density than SF and much lower price per sqft. Tokyo just kills this. Tokyo is bigger than New York and has significantly lower rent because they build more housing! This is in a wealthy country with even lower interest rates than the US. I don't think you have justified excluding non-US metros, like Tokyo, or Auckland. Doesn't this lead to the natural conclusion that there is a sufficient level of housing to build, and that the problem is that the USA's many metros are structured to prevent housing? It seems like you're just arguing that US metros are bad at building housing, which is also what Matt Yglesias is arguing. "Change my mind about housing, but don't mention Tokyo" is like saying "Change my mind about gun possession, but don't mention Switzerland." 
You can't test the effect of allowing new housing unless you're willing to look at cities that do, in fact, allow it. Tokyo and NYC both attract tons of new residents But Tokyo's housing rents have been stable, while NYC rents keep rising. Why? Tokyo has permissive housing construction laws. NYC makes building new housing almost illegal. Yes, dense cities are attractive, and that makes them get more dense over time. But it only makes them more expensive if you forbid new housing to keep up with the new residents. Tokyo! But I’m like the 10th person to bring it up… As I wrote on the original post (not even edited in! it’s been there the whole time!): I worry someone will bring up Tokyo as a counterexample. But I think Tokyo managed to build its way to low housing prices in the context of the rest of Japan also having good housing policy. Even if that isn’t true, Tokyo on its own is a quarter of the Japanese market, so it might be able to exhaust the entire pool of Japanese house-seekers by itself! That is, yes, you’re all correct that cities are only expensive in the context of more demand for city housing than the (NIMBY-constrained) city housing market can currently supply. You are all correct that if this problem were solved at the national level, then city housing would be cheap, and every additional city house would make it cheaper. My claim is that marginal changes - like Oakland building an extra 10,000 units, but everyone else staying the same - will most likely increase Oakland prices. Yes, if Oakland unilaterally built 50 million units, that would soak up the entire excess demand and probably lower prices everywhere (including Oakland). Yes, if the entire US switched to good housing policy at the same time, that would probably lower prices everywhere (including Oakland). But if we don’t do any of that stuff, and just build another 10,000 houses in Oakland, I think it would probably increase prices in Oakland. 
Some other people brought up that Japan has a declining population, and it’s much easier to have low house prices when your population is declining (compared to some previous time when number of houses presumably matched number of people), but ddd pointed out that people continue to migrate from the Japanese countryside to Tokyo, so its population continues to increase. Also, Mike (I’m stitching together two comments here): In a country with a declining population, you would expect that fewer homes are being built per capita because there's little to no competition for existing homes. But it's exactly the opposite! Japan builds far more homes per capita than the US does, despite their declining population […] As a result, the average Japanese home is very new and the average house is torn down and replaced after a relatively short 30 years. They're living in nice new homes for cheaper. 5. Comments Accusing Me Of Not Understanding Economics Maximum Limelihood Estimator writes: I think you're making a very common mistake here of confusing supply/demand with *quantity* supplied or quantity demanded. (This is very common! we teach students about this in micro 101 because it's so easy to make!) What you're seeing is that the quantity supplied is correlated with housing prices (true!). But this is very different from establishing that the supply curve--i.e. the amount of housing that would be produced at any given price, and what moves up/down when we regulate/deregulate supply--is positively correlated with price. Figuring out what supply curves look like is a lot less intuitive and requires some high-grade econometrics, which is why economists had to set up a whole commission just to study this particular problem (the Cowles Commission). In terms of resources for understanding how these concepts are different, a micro 101 textbook will cover this distinction. 
For the econometrics side of this, I've heard good things about Scott Cunningham's *Causal Inference Mixtape*, although I haven't personally used it. My claim is that increasing density within a city shifts the demand curve for housing within that city, because of increasing desirability. MLE later gets more on point: The effect you're discussing here is kind of real in a sense. When the marginal utility of housing increases for *other* people, density arguably becomes more desirable for me, which is kind of like the demand curve shifting up. These are called bandwagon goods and discussed here: http://econfac.bsu.edu/research/workingpapers/bsuecwp200804gisser.pdf In theory, the bandwagon effect could be so strong that parts of the demand curve are upward-sloping. Solutions like this are not, technically, prohibited by the laws of mathematics, just the laws of economics. (And arguably of physics--see paper for conditions where these kinds of bandwagon effects imply the amount of housing in the city would have to be negative). In practice, this effect exists but just can't overcome the normal, non-weird economics that says "making more of a good makes the prices fall." Again, I claim the existence of Manhattan vs. Conanicut shows that sometimes it does. I cannot find the words “housing”, “real estate”, or “land value” anywhere in that paper. Alex Poterack writes: There's two things going on here: confusing shifts in demand with movement along the demand curve, and getting causation backwards. You're assuming density causes prosperity, rather than prosperity causing density. There are ways the former can happen, but the bigger thing is that, for a wide range of historical reasons, you can make a lot of money in NYC and SF, so lots of people want to live there, so they get very dense. This is the prosperity shifting demand right, so at any given price, more people want to live there; this drives prices up, and they go higher the more fixed supply is. 
If you built a bunch of housing in Oakland, lots of people would move there because it's cheaper, which is movement along the demand curve; it's still the same number of people who want to live there at any price. Now, it's possible that the increased number of people living there makes the city more prosperous (this is the phenomenon of induced demand), which would shift demand right, but there are way more differences between NYC/SF and Oakland than just the density, so I don't think it would shift demand enough to offset this. In particular, if it's just a small increase in small, it's also a small increase in density, so there's almost no shift in demand (but there is movement along the curve). I still think this is missing my point, but I present it here in case anyone else is enlightened by it and wants to try further to convince me I’m making this mistake. 6. Comments By Famous People Who Potentially Have Good Opinions Scott Sumner is an economist and blogger; he writes: It is certainly the case that building more housing can make a city more desirable, and that this effect could be so strong that it overwhelms the price depressing impact of a greater quantity supplied. But studies suggest that this is not generally the case. Texas provides a nice case study. Among Texas’s big metro areas, Austin has the tightest restrictions on building and Houston is the most willing to allow dense infill development. Even though Houston is the larger city, house prices are far higher in Austin: Houston pretty much describes the “Oakland with more housing” outcome that Alexander views as somewhat far-fetched. Only in this case, it’s Austin with more housing. Alexander seems too quick to accept the, “If you build it they will come” idea—that you can build more housing and thereby boost demand so much that prices actually rise. I started the post with a graph of about 50 cities, showing a positive correlation between density and price. 
I’m having trouble seeing how Sumner’s point isn’t just “if you remove 48 of those cities and cherry-pick two, the relationship is negative”. My attempt to place Austin and Houston on the original graph, using Sumner’s data plus a few other things available online. Why weren’t they on there already? Maybe because the graph is metro areas and Sumner was talking about Austin and Houston as cities, but I’m not sure and agree this is confusing. Everyone knows Austin is more expensive than Houston because Austin is a trendy tech and culture hub and Houston isn’t (and relatedly, because Austin’s median family income is 50% higher than Houston’s). Unless someone wants to claim that its failure to build housing helped turn it into a trendy tech and culture hub, I don’t think there’s much point to this comparison. It’s true that Houston’s bigger size didn’t let it leapfrog over Austin to become a trendy tech and culture hub, which goes against some of what I claimed in the first part of this post. But I never claimed there would be a perfect 1-1 correlation between city size and trendiness, or that you could never find a pair of cities where one was bigger but the other was more trendy. Just that there would be a correlation. Moving on: Here’s the problem with this argument. It mixes up population change due to economic effects such as the benefits of agglomeration, with population changes due to regulatory changes such as less strict zoning. If you look at things this way, then the stylized facts work against Alexander’s argument. Over the past 50 years, increasingly strict zoning has reduced housing construction on big cities like New York and San Francisco. As a result, their populations have increased by less than in cities with less strict zoning, such as Houston. If Alexander were correct, then the price gap between the tightly controlled cities on the coast and the more laissez-faire cities of Middle America should have shrunk over time. 
Instead, the price gap has widened. New York and San Francisco were always more expensive than other cites, but with tighter zoning and less new construction the gap has become far wider. During the last fifty years, there was also deindustrialization and demographic sorting. This is just the Austin vs. Houston story all over again. Alexander is implicitly viewing this outcome as a “problem” for the city that builds more housing. They must sacrifice so that the rest of the country can gain. But in his scenario, Oakland is better off. Indeed if it were not better off, then why would more people choose to live in Oakland? In order for it to be true that building more housing boosts housing prices, it must also be true that the quality of existing houses (including neighborhood effects) rises by more than enough to offset the increase in supply. That means the new housing construction must make Oakland such a desirable place to live that the amenity effect overwhelms the quantity effect [...] Of course, economic change always has winners and losers. Here’s how I would describe the impact of allowing more housing construction in Oakland, in the unlikely event that this did raise housing prices: 1. America would benefit. 2. Oakland would benefit. 3. Poor people in America would benefit, in aggregate. 4. Affluent people in America would benefit, in aggregate. 5. Homeowners in Oakland would benefit. 6. Some renters in Oakland would benefit (from a more economically dynamic city.) 7. Some renters in Oakland would suffer from higher rents. In the much more likely case where new housing construction would lower prices, the impact described in #5 and #7 might reverse. Either way, there is no defensible argument for not building more housing in Oakland, regardless of the impact on price. If building more housing reduces its price, then there is a strong argument for allowing more housing construction. 
If building more housing raises its price, then the argument for more construction is even stronger. I agree with all this. Jeremiah Johnson is a co-founder of the Center for New Liberalism, host of the Neoliberal Podcast, and a YIMBY activist (not to be confused with Jeremiah “Liver-Eating” Johnson, who killed 300 Native Americans and ate their livers). He writes: Here's why you're wrong in a single sentence: Demand causes high prices, not new units. Prices are high in SF and NYC because those are desirable places to live for a huge number of people. People all over the country and the world would live there if they could, and prices reflect that. The fact that the densest cities are the most expensive is true. But the high prices are not caused by density - rather, the density and the high prices are both a consequence of crushingly high demand […] There's a feedback loop, but what matters here is the elasticity, which is less than one. We can measure this empirically. New housing lowers prices via the mechanism of adding supply, which is basic economics and how we expect markets to work. New housing could raise prices if it also made the city a more desirable place to live and shifted people's preferences, such that there was more demand to live there after the new housing is built. If you think it's unclear which of these effects would dominate, luckily we have empirical data that over and over and over shows adding housing supply does indeed lower prices on a local level. This is a fairly well established result that replicates well. edit: I'm actually thinking about drawing out the weighted DAG graphs here to make the conceptual stuff easier, but it would be pretty long. I'd love to do this as a guest post. I’m skeptical of the empirical results because they don’t match the much stronger “Manhattan vs. 
Conanicut island” empirical results, and if I try to think about why, the best explanation I can think of is that the Manhattan experiment has been going on longer (ie long enough for Manhattan’s extra residents to found businesses and institutions that attract new people). I’ve told him he can try pitching this guest post to me; in either case, I would be interested in seeing the graphs. Several other people also posted this graph that Johnson helped make famous: Hopefully by now you can predict my objection: the places in the southeast corner are mostly unfashionable red state Sun Belt cities; the places in the northwest corner are mostly trendy liberal coastal cities. My conclusion is that trendy liberal coastal cities are both more NIMBY and more desirable, and if you use this to draw any conclusions about housing policy you’ll just end up confused. But maybe I should take this same lesson to heart myself. Dense cities are mostly trendy liberal coastal cities; uninhabited tundra in North Dakota isn’t. Maybe the demand is just for trendy liberal coastal cities, and once you attain that status, extra density doesn’t matter that much. Maybe Oakland has already maxed out its “trendy liberal coastal city” status, and even if it became Manhattan-sized, it wouldn’t get any trendier, or would get trendier only with a long time lag. There are a few very trendy small coastal villages in California (think eg Sea Ranch); maybe these (rather than North Dakota) are the natural control group for San Francisco. I think they are still cheaper than SF, but maybe not by very much. Cameron Murray is a housing economist whose work some other commenters recommended; he also writes the blog Fresh Economic Thinking. He very kindly showed up and wrote: I think you are in general right that agglomeration effects are real, which is why bigger cities have higher value to residents. I agree that people move locations. But I think you can go a step further. 
If one city is growing faster and densifying, surely those people are not demanding homes in other cities and those cities build slower. This is part of the spatial equilibrium story that further makes claims about “build density and get cheap homes” less plausible. 7. My Final Thoughts + Poll Thanks to everyone who commented on this post and helped me refine my thoughts. I’m willing to concede the following points: It might be that only attracting the sort of educated people who found companies, universities, etc will make housing prices go up. Less educated people will take more jobs than they create and not ratchet up the city’s desirability level. (I’d previously told commenters talking about “gentrification” that it was irrelevant to the mechanism I was talking about here, but maybe it isn’t - maybe “gentrifiers” are the people creating more jobs and institutions than they consume, and so homes that attract them in particular will increase demand more than they increase supply? Maybe this discussion does reduce to the gentrification discussion?)
Makeshift housing in a North Dakota oil boom town (source) If each person creates half a job, the original 1,000 oilmen attract 500 service workers, those 500 attract another 250, and so on until population stabilizes at 2,000 people. In this model, if there are fewer than 2,000 houses in the town, demand exceeds supply (no matter what is going on in the rest of the country), but if there are more than 2,000, supply exceeds demand. So if we imagine Google’s presence as an oil-like resource, the extra demand for housing in the Bay should gradually decline: at some point, you will have finished housing the Google workers and the service workers who support them. But this isn’t right either, because Google isn’t a natural resource - it’s a company founded by Bay Area residents. If you got more Bay Area residents, you would (with some delay) get more Googles. Or: Austin gets lots of jobs from Tesla. Tesla wasn’t founded by Austinites. But it moved to Austin when it became a known “tech hub”, ie a place with lots of tech companies and tech employees. It wouldn’t have moved to Austin if Austin was still an uninhabited plain or a one-horse town. So as Austin got bigger, it attracted more tech companies. So in both the Bay Area case and the Austin case, having more people attracted more tech companies, either because the residents themselves found the company or because the company gets attracted to this newly bustling city. Potential counterargument: Each new Bay Area resident gives the Bay another lottery ticket to found the next Google. If having the first Google gets it an extra 1 million people, but there are 300 million people in the US, then those extra 1 million only give it a 1/300 chance of winning the next lottery. So even though the Bay Area won the lottery once, and this made it have high demand, this doesn’t mean the high demand will cause it to win more lotteries. 
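The two back-of-envelope numbers above (a town that settles at 2,000 people, and a 1/300 lottery share) can be checked with a short sketch. This is only an illustration of the arithmetic under the stated assumptions, not a model of any real housing market; the function names and the idea of simulating successive "waves" of newcomers are hypothetical choices made here for clarity.

```python
def equilibrium_population(base_workers, jobs_per_person):
    """Limit of base + base*r + base*r**2 + ... (a geometric series).

    base_workers: the initial group (e.g. 1,000 oilmen).
    jobs_per_person: jobs each resident creates (e.g. 0.5); assumed < 1,
    otherwise the series never settles.
    """
    if not 0 <= jobs_per_person < 1:
        raise ValueError("series only converges for 0 <= jobs_per_person < 1")
    return base_workers / (1 - jobs_per_person)


def simulate_waves(base_workers, jobs_per_person, waves):
    """Running population after each wave of newcomers arrives."""
    totals = []
    wave = float(base_workers)
    total = 0.0
    for _ in range(waves):
        total += wave              # this wave moves in
        totals.append(total)
        wave *= jobs_per_person    # they attract the next, smaller wave
    return totals


def lottery_share(extra_residents, national_population):
    """Fraction of 'tickets' held if every resident is one equal ticket."""
    return extra_residents / national_population


if __name__ == "__main__":
    print(equilibrium_population(1000, 0.5))   # closed-form limit
    print(simulate_waves(1000, 0.5, 5))        # first few waves
    print(lottery_share(1_000_000, 300_000_000))
```

The closed form makes the dependence on the "half a job" figure explicit: as jobs created per person approaches one, the equilibrium population grows without bound, which is roughly the regime the counterargument about residents founding new companies is pointing at.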
If you win the lottery once, spend all your winnings on more lottery tickets, and keep doing this forever, you haven’t invented an infinite money printing machine, eventually you’ll just lose. Potential counter-counter-argument: the Bay got Google, and Facebook, and Apple, and . . . so these can’t all be separate lotteries. I think you should probably model it as a high-level lottery to become the next hub of a tech-sized industry, plus many low-level lotteries where once you’re the tech hub, you’re attracting lots of techies, and each techie gives you a ticket in a lottery where the denominator is the number of techies to found the next big tech company. And the Bay might have half the US’s techie population. So maybe here there is a self-sustaining lottery-winning cycle, at least until tech plays itself out and nobody wants any more tech companies. And that might take a long time. Tom (author of Tom Thought) writes: The primary drivers of demand for living in NYC are the specific opportunities available in NYC. It is true that on long time horizons, one of the reasons these opportunities have tended to collect in NYC is that it is a dense place. But those aren't the only reasons - NYC is much more important than other, bigger cities in other parts of the world for complex historical reasons. Even if a catastrophe were to wipe out half the city, there would still be a great deal of demand to live near important institutions like Broadway, Wall Street, Port of NY & NJ, Columbia, etc (assuming those institutions survived the catastrophe). Increasing the number of housing units has a very mechanical impact on how many people can live in the place. But it has only a second-order impact on the types of institutions that drive demand to live in the city. People don't just generically crave to live near other people for the most part (a handful of urbanist freaks like myself excepted). The Bay Area is a great example of this. 
It is much less populated than other much cheaper cities. Density isn't why people want to live there - it's access to a specific culture and specific institutions. Demand for that is not simply a function of density - some people want to be part of Bay Area culture and others don't. Adding more units will induce some demand as a second-order effect, but will bring prices down as a first-order effect. To relate this to your model: we might be able to say that the country has a certain number of abstract "culture points" that have been allocated to different cities by various historical forces. Each culture point a city has increases demand to live in that city by a certain amount. Adding more people to the city may allow it to generate additional culture points over time, or acquire culture points from other cities, but this doesn't happen right away, and is determined by a host of factors other than just density. Under this model, we expect a place like NYC to always cost much more than North Dakota (since NYC possesses a large number of culture points), but we would also expect that adding additional housing units to NYC would bring costs down (since there are now additional housing units per culture point). Perhaps this process will over time allow NYC to steal away some culture points from Chicago, Boston, or other cities, but this is a secondary effect. This just seems to be passing the buck. Yes, people move to New York because it has Broadway, Columbia University, and Wall Street. Why does it have those things? Because one in every X New York citizens founds a good artistic/educational/financial institution, and New York has a large population of employees to work at those institutions and customers to patronize those institutions. If Conanicut Island had a population of 10 million people instead of Manhattan, there would be lots of great institutions on Conanicut and it would have more culture points. 
I don’t think it’s a culture-point game and population/density just sort of occasionally redistributes culture points; I think to a first approximation culture points just track population/density. Maybe they track the population/density of upper class people better than the total population/density, but I don’t think this is a big enough distinction to sink the argument. 3. Comments About Chinese Ghost Cities Some people brought these up as a good natural experiment: the Chinese really did try building millions of houses on their equivalent of a North Dakota plain. What happened? Jeremiah Johnson (author of Infinite Scroll) writes: You currently seem like you're at the stage of understanding the thought experiments pretty well, but not understanding them on a DEEP level. For example with your hypothetical, this has actually happened before! Kind of. China built a bunch of 'ghost cities' basically out of nothing, and while there was an initial craze of speculation and tons of investment and building... nobody went to live in those cities most of the time. And now they're deeply distressed assets worth basically nothing. When nobody actually lives in the ghost city, it doesn't matter that they have super dense housing. There's no demand. (the only reason they might be worth something is that the CCP very, very much does not want to pop their huge housing bubble and is likely to bail out some of the parties involved) Parmenides (author of Last House On The Left) writes: I think you're mixing up the agglomeration effects of density, which is what induces the demand, and the housing supply. You can't just build a city and expect people to move in, China has tried that. But if you have the agglomeration effects of density and shortage of housing due to artificial constraints, which we have all across the US, then you get dense areas with high housing costs. sdwr writes: Think of China's ghost cities / apartment blocks. Prices surely can't be that high there. 
Maybe the answer is that developers are good at their job, and build supply where there's demand for it? But several other people object that although the Western press made a big deal about Chinese ghost cities a few years ago, it mostly just took a couple of years for people to move in, and now at least some of them seem to be thriving. For example, Michael quotes the Wikipedia article, Under-occupied Developments In China: Reporting in 2018, Shepard noted that "Today, China’s so-called ghost cities that were so prevalently showcased in 2013 and 2014 are no longer global intrigues. They have filled up to the point of being functioning, normal cities". Ash Lael writes: I'm sceptical of the Chinese "ghost city" phenomenon. I haven't explored the issue rigorously but my impression is that in areas that were previously dismissed as "ghost cities" like Ordos Kangbashi, the population is now large and growing. I think we in the west are so used to infrastructure bottlenecks and short-sightedness and anti-construction policies that the idea of it being possible to build the housing and infrastructure to accommodate expected demand ten years in the future is completely foreign to us. Perhaps building brand new cities before they are even needed is what the YIMBY utopia looks like. See also Bloomberg: China’s Ghost Cities Are Finally Stirring To Life After Years Of Empty Streets. This wasn’t trivial. It looks like the Chinese government had to put in some work to make people move in, including opening good schools and universities there. Probably if they had just built apartments in the middle of the desert and nothing else, they would have stayed empty. But that’s even more of a reductio ad absurdum than the original ghost city plan. Kangbashi, China’s most famous ghost city. What are housing prices like in the ghost city? 
Again from Bloomberg: Sitting on the southern outskirts of Inner Mongolia’s Ordos City (population 2.2 million), Kangbashi was the archetypal ghost city 10 years ago, with barren boulevards and empty buildings standing forlornly in the desert. Local officials are adamant that things have changed. They say 91% of homes in the district are occupied. In fact, after a yearslong construction freeze, the government approved six housing projects in 2020 and expects 3,000 homes to be built by the end of this year. Apartments in a new development are selling for 9,500 yuan per square meter, and downtown they go for 15,000 to 16,000 yuan, according to Liu Yueyue, 28, a salesman at a new residential development in the district’s northeast. “Would houses in a ghost town sell at such high prices?” asks Liu. Half of his customers come from outside Kangbashi, and most are parents who want to send their children to the well-regarded local schools, he says. Looking at this list of real estate prices across Chinese cities, Kangbashi seems squarely in the middle - for example, Wuhan and Xian are also in the 15,000 - 16,000 range. I claim this supports my argument: surely twenty years ago, houses in this particular deserted corner of Inner Mongolia would have been dirt cheap (if any even existed). But if you build a city there, it becomes just as expensive as any other city! Here it’s very obvious that the density caused the high prices instead of the other way around. Still, the Chinese housing market is weird, with significant vacancies even in expensive, well-developed cities. Paul Botts: No official vacancy rates are published in China and no specific definition of it exists there. Various think tanks and researchers both within that country and elsewhere have published estimates ranging from as low as 11 percent to as high as 24 percent. Those estimates have been for varying samples of Chinese cities, have used various definitions of housing vacancy rate, etc. 
The best (as in most systematic) estimate yet produced has come from researchers at a university in Liaoning. They used night-time urban lightsheds captured by a new (2018 launch) Chinese satellite having a new level of light sensing technology which allows separating out light from parks and plazas. They covered a large sample (49 cities), and made their sample representative of city type, city size, regions within China, etc. They also crossed-referenced with local housing data to ensure accurate balancing of their sample and to confirm that the satellite was successfully identifying light coming from housing blocks. They found vacancy rates of just under 20 percent in China's Tier 1 cities, and found rates above 20 percent in 40 of the 49 cities. They found the highest vacancy rates in western and northeastern cities, which are also the newest ones; that finding is consistent with the hypothesis of significant numbers of recently-built ghost cities. https://www.researchgate.net/publication/345092218_Housing_Vacancy_Rate_in_Major_Cities_in_China_Perspectives_from_Nighttime_Light_Data And Phil H (author of the blog Tang Poetry) writes: The price of housing in China has skyrocketed over the past few decades, as all those extra apartments have been built. I live in a pleasant but unremarkable southern city, and I paid London prices (about 4.5m yuan/$650k for a 1,300 sq ft flat). That seems to match Scott's hypothesis that high density leads to high prices. House prices here have risen much faster than incomes. They've risen in rural areas, too, but the increases in price in cities have been stratospheric. 4. Comments Accusing Me Of Not Considering Tokyo, Even Though I Included A Section In The Post On Why I Didn’t Think Tokyo Was Relevant I won’t name and shame people, but for example: You excluded Tokyo from your dataset. Tokyo has much higher density than SF and much lower price per sqft. Tokyo just kills this. 
Tokyo is bigger than New York and has significantly lower rent because they build more housing! This is in a wealthy country with even lower interest rates than the US. I don't think you have justified excluding non-US metros, like Tokyo, or Auckland. Doesn't this lead to the natural conclusion that there is a sufficient level of housing to build, and that the problem is that the USA's many metros are structured to prevent housing? It seems like you're just arguing that US metros are bad at building housing, which is also what Matt Yglesias is arguing. "Change my mind about housing, but don't mention Tokyo" is like saying "Change my mind about gun possession, but don't mention Switzerland." You can't test the effect of allowing new housing unless you're willing to look at cities that do, in fact, allow it. Tokyo and NYC both attract tons of new residents But Tokyo's housing rents have been stable, while NYC rents keep rising. Why? Tokyo has permissive housing construction laws. NYC makes building new housing almost illegal. Yes, dense cities are attractive, and that makes them get more dense over time. But it only makes them more expensive if you forbid new housing to keep up with the new residents. Tokyo! But I’m like the 10th person to bring it up… As I wrote on the original post (not even edited in! it’s been there the whole time!): I worry someone will bring up Tokyo as a counterexample. But I think Tokyo managed to build its way to low housing prices in the context of the rest of Japan also having good housing policy. Even if that isn’t true, Tokyo on its own is a quarter of the Japanese market, so it might be able to exhaust the entire pool of Japanese house-seekers by itself! That is, yes, you’re all correct that cities are only expensive in the context of more demand for city housing than the (NIMBY-constrained) city housing market can currently supply. 
You are all correct that if this problem were solved at the national level, then city housing would be cheap, and every additional city house would make it cheaper. My claim is that marginal changes - like Oakland building an extra 10,000 units, but everyone else staying the same - will most likely increase Oakland prices. Yes, if Oakland unilaterally built 50 million units, that would soak up the entire excess demand and probably lower prices everywhere (including Oakland). Yes, if the entire US switched to good housing policy at the same time, that would probably lower prices everywhere (including Oakland). But if we don’t do any of that stuff, and just build another 10,000 houses in Oakland, I think it would probably increase prices in Oakland. Some other people brought up that Japan has a declining population, and it’s much easier to have low house prices when your population is declining (compared to some previous time when number of houses presumably matched number of people), but ddd pointed out that people continue to migrate from the Japanese countryside to Tokyo, so its population continues to increase. Also, Mike (I’m stitching together two comments here): In a country with a declining population, you would expect that fewer homes are being built per capita because there's little to no competition for existing homes. But it's exactly the opposite! Japan builds far more homes per capita than the US does, despite their declining population […] As a result, the average Japanese home is very new and the average house is torn down and replaced after a relatively short 30 years. They're living in nice new homes for cheaper. 5. Comments Accusing Me Of Not Understanding Economics Maximum Limelihood Estimator writes: I think you're making a very common mistake here of confusing supply/demand with *quantity* supplied or quantity demanded. (This is very common! we teach students about this in micro 101 because it's so easy to make!) 
What you're seeing is that the quantity supplied is correlated with housing prices (true!). But this is very different from establishing that the supply curve--i.e. the amount of housing that would be produced at any given price, and what moves up/down when we regulate/deregulate supply--is positively correlated with price. Figuring out what supply curves look like is a lot less intuitive and requires some high-grade econometrics, which is why economists had to set up a whole commission just to study this particular problem (the Cowles Commission). In terms of resources for understanding how these concepts are different, a micro 101 textbook will cover this distinction. For the econometrics side of this, I've heard good things about Scott Cunningham's *Causal Inference Mixtape*, although I haven't personally used it. My claim is that increasing density within a city shifts the demand curve for housing within that city, because of increasing desirability. MLE later gets more on point: The effect you're discussing here is kind of real in a sense. When the marginal utility of housing increases for *other* people, density arguably becomes more desirable for me, which is kind of like the demand curve shifting up. These are called bandwagon goods and discussed here: http://econfac.bsu.edu/research/workingpapers/bsuecwp200804gisser.pdf In theory, the bandwagon effect could be so strong that parts of the demand curve are upward-sloping. Solutions like this are not, technically, prohibited by the laws of mathematics, just the laws of economics. (And arguably of physics--see paper for conditions where these kinds of bandwagon effects imply the amount of housing in the city would have to be negative). In practice, this effect exists but just can't overcome the normal, non-weird economics that says "making more of a good makes the prices fall." Again, I claim the existence of Manhattan vs. Conanicut shows that sometimes it does. 
I cannot find the words “housing”, “real estate”, or “land value” anywhere in that paper. Alex Poterack writes: There's two things going on here: confusing shifts in demand with movement along the demand curve, and getting causation backwards. You're assuming density causes prosperity, rather than prosperity causing density. There are ways the former can happen, but the bigger thing is that, for a wide range of historical reasons, you can make a lot of money in NYC and SF, so lots of people want to live there, so they get very dense. This is the prosperity shifting demand right, so at any given price, more people want to live there; this drives prices up, and they go higher the more fixed supply is. If you built a bunch of housing in Oakland, lots of people would move there because it's cheaper, which is movement along the demand curve; it's still the same number of people who want to live there at any price. Now, it's possible that the increased number of people living there makes the city more prosperous (this is the phenomenon of induced demand), which would shift demand right, but there are way more differences between NYC/SF and Oakland than just the density, so I don't think it would shift demand enough to offset this. In particular, if it's just a small increase in small, it's also a small increase in density, so there's almost no shift in demand (but there is movement along the curve). I still think this is missing my point, but I present it here in case anyone else is enlightened by it and wants to try further to convince me I’m making this mistake. 6. Comments By Famous People Who Potentially Have Good Opinions Scott Sumner is an economist and blogger; he writes: It is certainly the case that building more housing can make a city more desirable, and that this effect could be so strong that it overwhelms the price depressing impact of a greater quantity supplied. But studies suggest that this is not generally the case. Texas provides a nice case study. 
Among Texas’s big metro areas, Austin has the tightest restrictions on building and Houston is the most willing to allow dense infill development. Even though Houston is the larger city, house prices are far higher in Austin: Houston pretty much describes the “Oakland with more housing” outcome that Alexander views as somewhat far-fetched. Only in this case, it’s Austin with more housing. Alexander seems too quick to accept the, “If you build it they will come” idea—that you can build more housing and thereby boost demand so much that prices actually rise. I started the post with a graph of about 50 cities, showing a positive correlation between density and price. I’m having trouble seeing how Sumner’s point isn’t just “if you remove 48 of those cities and cherry-pick two, the relationship is negative”. My attempt to place Austin and Houston on the original graph, using Sumner’s data plus a few other things available online. Why weren’t they on there already? Maybe because the graph is metro areas and Sumner was talking about Austin and Houston as cities, but I’m not sure and agree this is confusing. Everyone knows Austin is more expensive than Houston because Austin is a trendy tech and culture hub and Houston isn’t (and relatedly, because Austin’s median family income is 50% higher than Houston’s). Unless someone wants to claim that its failure to build housing helped turn it into a trendy tech and culture hub, I don’t think there’s much point to this comparison. It’s true that Houston’s bigger size didn’t let it leapfrog over Austin to become a trendy tech and culture hub, which goes against some of what I claimed in the first part of this post. But I never claimed there would be a perfect 1-1 correlation between city size and trendiness, or that you could never find a pair of cities where one was bigger but the other was more trendy. Just that there would be a correlation. Moving on: Here’s the problem with this argument. 
It mixes up population change due to economic effects such as the benefits of agglomeration, with population changes due to regulatory changes such as less strict zoning. If you look at things this way, then the stylized facts work against Alexander’s argument. Over the past 50 years, increasingly strict zoning has reduced housing construction in big cities like New York and San Francisco. As a result, their populations have increased by less than in cities with less strict zoning, such as Houston. If Alexander were correct, then the price gap between the tightly controlled cities on the coast and the more laissez-faire cities of Middle America should have shrunk over time. Instead, the price gap has widened. New York and San Francisco were always more expensive than other cities, but with tighter zoning and less new construction the gap has become far wider. During the last fifty years, there was also deindustrialization and demographic sorting. This is just the Austin vs. Houston story all over again. Alexander is implicitly viewing this outcome as a “problem” for the city that builds more housing. They must sacrifice so that the rest of the country can gain. But in his scenario, Oakland is better off. Indeed if it were not better off, then why would more people choose to live in Oakland? In order for it to be true that building more housing boosts housing prices, it must also be true that the quality of existing houses (including neighborhood effects) rises by more than enough to offset the increase in supply. That means the new housing construction must make Oakland such a desirable place to live that the amenity effect overwhelms the quantity effect [...] Of course, economic change always has winners and losers. Here’s how I would describe the impact of allowing more housing construction in Oakland, in the unlikely event that this did raise housing prices: 1. America would benefit. 2. Oakland would benefit. 3. Poor people in America would benefit, in aggregate. 
4. Affluent people in America would benefit, in aggregate. 5. Homeowners in Oakland would benefit. 6. Some renters in Oakland would benefit (from a more economically dynamic city.) 7. Some renters in Oakland would suffer from higher rents. In the much more likely case where new housing construction would lower prices, the impact described in #5 and #7 might reverse. Either way, there is no defensible argument for not building more housing in Oakland, regardless of the impact on price. If building more housing reduces its price, then there is a strong argument for allowing more housing construction. If building more housing raises its price, then the argument for more construction is even stronger. I agree with all this. Jeremiah Johnson is a co-founder of the Center for New Liberalism, host of the Neoliberal Podcast, and a YIMBY activist (not to be confused with Jeremiah “Liver-Eating” Johnson, who killed 300 Native Americans and ate their livers). He writes: Here's why you're wrong in a single sentence: Demand causes high prices, not new units. Prices are high in SF and NYC because those are desirable places to live for a huge number of people. People all over the country and the world would live there if they could, and prices reflect that. The fact that the densest cities are the most expensive is true. But the high prices are not caused by density - rather, the density and the high prices are both a consequence of crushingly high demand […] There's a feedback loop, but what matters here is the elasticity, which is less than one. We can measure this empirically. New housing lowers prices via the mechanism of adding supply, which is basic economics and how we expect markets to work. New housing could raise prices if it also made the city a more desirable place to live and shifted people's preferences, such that there was more demand to live there after the new housing is built. 
If you think it's unclear which of these effects would dominate, luckily we have empirical data that over and over and over shows adding housing supply does indeed lower prices on a local level. This is a fairly well established result that replicates well. edit: I'm actually thinking about drawing out the weighted DAG graphs here to make the conceptual stuff easier, but it would be pretty long. I'd love to do this as a guest post. I’m skeptical of the empirical results because they don’t match the much stronger “Manhattan vs. Conanicut island” empirical results, and if I try to think about why, the best explanation I can think of is that the Manhattan experiment has been going on longer (ie long enough for Manhattan’s extra residents to found businesses and institutions that attract new people). I’ve told him he can try pitching this guest post to me; in either case, I would be interested in seeing the graphs. Several other people also posted this graph that Johnson helped make famous: Hopefully by now you can predict my objection: the places in the southeast corner are mostly unfashionable red state Sun Belt cities; the places in the northwest corner are mostly trendy liberal coastal cities. My conclusion is that trendy liberal coastal cities are both more NIMBY and more desirable, and if you use this to draw any conclusions about housing policy you’ll just end up confused. But maybe I should take this same lesson to heart myself. Dense cities are mostly trendy liberal coastal cities; uninhabited tundra in North Dakota isn’t. Maybe the demand is just for trendy liberal coastal cities, and once you attain that status, extra density doesn’t matter that much. Maybe Oakland has already maxed out its “trendy liberal coastal city” status, and even if it became Manhattan-sized, it wouldn’t get any trendier, or would get trendier only with a long time lag. 
There are a few very trendy small coastal villages in California (think eg Sea Ranch); maybe these (rather than North Dakota) are the natural control group for San Francisco. I think they are still cheaper than SF, but maybe not by very much. Cameron Murray is a housing economist whose work some other commenters recommended; he also writes the blog Fresh Economic Thinking. He very kindly showed up and wrote: I think you are in general right that agglomeration effects are real, which is why bigger cities have higher value to residents. I agree that people move locations. But I think you can go a step further. If one city is growing faster and densifying, surely those people are not demanding homes in other cities and those cities build slower. This is part of the spatial equilibrium story that further makes claims about “build density and get cheap homes” less plausible. 7. My Final Thoughts + Poll Thanks to everyone who commented on this post and helped me refine my thoughts. I’m willing to concede the following points: It might be that only attracting the sort of educated people who found companies, universities, etc will make housing prices go up. Less educated people will take more jobs than they create and not ratchet up the city’s desirability level. (I’d previously told commenters talking about “gentrification” that it was irrelevant to the mechanism I was talking about here, but maybe it isn’t - maybe “gentrifiers” are the people creating more jobs and institutions than they consume, and so homes that attract them in particular will increase demand more than they increase supply? Maybe this discussion does reduce to the gentrification discussion?)
Building new housing in certain cities with specific windfalls (eg Wall Street in NYC, tech in the Bay) might absorb the windfall faster than it produced new windfalls (eg building new houses in SF might make prices lower by successfully housing all existing Google employees, without necessarily producing new Googles). This depends on global factors like how hard it is to make the next Google, how many new Googles the world economy has room for, and how much of an advantage San Francisco has over Cleveland or China in being the most likely location for the next Google.
Finally, an oversized capital force creates an artificial city region. In the US, the Tennessee Valley Authority was a Depression Era program to develop a poor region using federal government money. The hydroelectric dams and other infrastructure that the money bought seemed to be great successes at first, and to be sure they did reduce poverty. But problems later appeared, and today the region isn’t particularly dynamic, in addition to being riddled with environmental issues. Jacobs explains that the federal aid could never truly help, because the Tennessee Valley has always lacked an import-replacing city. Subsidies, grants, and loans give at best the illusion of development. None of these five types of rural regions tend to do great in the long run, unless they manage to generate an import-replacing city. But at least they receive something from distant cities. It’s far worse when a region is untouched by city forces at all, as Bardou was for a long time. Or as was a hamlet in North Carolina that Jacobs calls “Henry” for anonymity reasons, but which we can safely reveal to be Higgins, in the Appalachian region. Here is what Higgins looked like in 2013 on Google Street View: There is a nice modern road in that screenshot, but between its 18th-century founding and the 1920s, there wasn’t even a path that a horse-drawn wagon could use, and so Higgins was extremely isolated. It barely sold anything to anyone outside, and accordingly imported very little. The people lived from subsistence farming. Their lives were so difficult, so focused on sheer survival, that they gradually forgot many of the skills and techniques that their British ancestors had, like candle making, weaving from a loom, and even masonry. When Jacobs’ aunt arrived as a Presbyterian missionary in 1922, and suggested that they build a church out of stone, the people of Higgins confidently stated that this was impossible: mortar just wasn’t strong enough. 
“These people came of a parent culture that had not only reared stone parish churches from time immemorial, but great cathedrals,” Jacobs writes, and yet eventually they forgot that stone buildings were a possibility at all. Such is the fate of regions that get cut off from cities. Jacobs calls them bypassed places. Sometimes these places are entire countries, such as Ethiopia, once the seat of an empire, but which as of the 1980s had barely any links to cities except its own backward ones. Unsurprisingly, Ethiopia has high prices (for Ethiopians) and too few jobs. That will always be so, unless one of its cities can start the process of import replacement. III. Should Everything Be a City-State? That was roughly the first half of the book. After that, Jane Jacobs discusses various consequences of her theory, including why decline happens and how we can, in theory, prevent it. We’ll get there — but first, it’s time for a detour through the other book, The Question of Separatism, which provides a great case study of Jacobs’s ideas. After an introductory chapter in which Jacobs acknowledges that separatism always makes everyone emotional, and warns that she’s going to study it in a dispassionate manner anyway, she starts by describing the issues in Quebec and Canada through a specific lens. You can probably guess which lens. That’s right — cities. To her, the question of Quebec separatism is primarily the question of how the two main cities in Canada, Toronto and Montreal, have coexisted and will coexist in the future. At this point you need at least a basic understanding of Canadian history. Here’s a quick primer, focusing on those two cities. Canadian History Speedrun (Jane’s Version) Canada, a word that used to refer to the large valley around the St. Lawrence river and the Great Lakes, was originally a colony of the Kingdom of France. Then the Kingdom of Great Britain conquered it in 1760. 
For various reasons, most of the French settlers stayed in Canada rather than emigrating to France or being deported, so at first, a small British elite ruled over a mostly French-speaking and Catholic colony. However, immigration from the British Isles, as well as from the newly seceded United States (loyalists who wanted to live in a monarchy rather than a republic for some reason) eventually tipped the linguistic and cultural balance. The population sorted itself such that the lower part of the valley (what is now Quebec) remained French, while the upper part (what is now Ontario) became English. The exception to this trend was the city of Montreal. Although located in Quebec, it became an English-speaking city and the hub for the British merchant elite. For at least a hundred years, it was the main city in Canada across almost all metrics: population, wealth, manufacturing, political influence. In the middle of the 20th century, Montreal grew enormously and became French-speaking again, owing to immigration from rural Quebec. It became the center of Quebecois culture and, with its increasingly educated population, the breeding ground for new ideas, including separatism. At the same time, the main city in Ontario, Toronto, was growing even faster. Immigrants from all over Canada and other countries poured into it (including Jane Jacobs herself). Sometime around 1970, it became bigger and wealthier than Montreal, and replaced it as the main economic hub. Many people attribute this to the rise of Quebec separatists, which supposedly scared the Anglo elite of Montreal into moving all the banks and companies to Toronto, and, to be sure, some of that happened — but of course, Jacobs prefers explanations that rely on city economics. One of the reasons for Toronto's economic and demographic growth is that it became the nexus of what Jacobs calls a conurbation, and would have called a city region if we were in the other book. 
In case you craved another concrete example of a city region, here’s a map of Ontario with two ways to define Toronto’s so-called “Golden Horseshoe” (Toronto itself is just the tiny strip in the middle of the red area, next to the lake): Meanwhile, Montreal never generated a conurbation or significant city region. This is Jacobs’s main hypothesis for why it was overtaken by Toronto, though she doesn’t give a lot of detail on why it happened. In any case, the result was that Montreal lost its status as the economic capital of the country. It became a regional city. The problem is that regional cities tend to do poorly. The nature of nations is to centralize everything in one place (we’ll come back to this). That’s why Paris has a large and rich city region, but Lyon and Marseille don’t. That’s why London looms so large in the UK’s economy while Glasgow or Manchester now contribute very little. There’s nothing wrong per se with being an economically stagnant regional city. Such cities can be fine places. When they’re the center of a supply region, like Calgary and Edmonton in oil-rich Alberta, they can even be wealthy. The complication for Montreal, though, is that its previous status as the main Canadian metropolis made it grow too large for this purpose. Yet, at the same time, Montreal plays an outsized cultural role for French-speaking Canadians — one that Toronto doesn’t even come close to fulfilling. So, Jacobs sees only decline for Montreal. And she thinks this means decline for Quebecois culture generally. Without a strong import-replacing city, Quebec will become a patchwork of supply regions, regions that workers abandon, or transplant economies, like the poverty-stricken Atlantic provinces in eastern Canada already are. Either the Quebecois resign themselves to this fate, she says, or they fight it — and the only true way to fight it is to declare independence. As of the 1980 referendum, she thinks they should go for independence. 
Generalized Separatism Quebecers did not go for independence, neither then in 1980 nor in 1995 when they voted on the question again. If they had, it would probably have been an example of a peaceful secession. Jacobs points out that there haven’t been many of those, if you exclude the decolonization of overseas imperial possessions (like Canada from Britain). Non-peaceful secessions have been common, but in those cases the destructiveness of war tends to overshadow everything else, economically speaking. In fact that might be the main reason most of us intuitively dislike separatism: we associate it with conflict. But peaceful non-colonial secessions do happen. Since 1980 there have been several more cases, like Czechia and Slovakia. When Jacobs wrote her book, though, the only good example she could think of was the independence of Norway from Sweden in 1905. She tells a great account of the process, noting that the outcome wasn’t predetermined: Sweden didn’t want to lose its western province, and did what it could to contain Norwegian nationalist sentiment. But Norwegian nationalist sentiment won — and importantly, both Norway and Sweden seemingly benefitted. Neither of them was particularly rich in the 19th century, and Norway was in fact dirt poor, which is why so many Norwegians escaped by emigrating to North America. Yet after the dissolution of their union, the two countries developed quickly, and both are now among the wealthiest countries in the world. They certainly didn’t disintegrate. (Of course, in Norway the wealth is due in large part to the oil that they discovered in the late 1960s. But they were pretty advanced by that point already — advanced enough that they could use the oil to develop their own industry, rather than get rich quick by exporting it raw, which is what keeps many countries trapped as supply regions.) When people argue against separatism, they often tout the benefits of being large. 
A Canada that would be split in two would mean smaller markets, and a weaker political counterweight to the United States. (Not to be mean to Canadian readers, but this argument seems delusional to me — I don’t think Americans currently see Canada as a political counterweight of any significance.) It would certainly be less prestigious. Large size, Jacobs says, is associated with power, and we admire power. We love slogans like “unity makes strength.” But after the medium-sized country of Sweden-Norway became the two smaller countries of Sweden and Norway, they both did well. Small size is less powerful, but it has its own advantages, such as nimbleness and ability to fail non-catastrophically. Small size also allows more diversity in cultural and economic matters, and here Jacobs waxes philosophical, pointing out that favoring diversity over uniformity is a recent, post-Enlightenment idea that has not yet been fully embraced in politics. We can see analogs everywhere. Europe, split into numerous small countries from the Middle Ages onward, became far more advanced than China, which has been unified more often than not. The city-states of ancient Greece and Renaissance Italy are seen as golden ages of Western civilization, even if they weren’t part of larger political units and therefore constantly went to war with one another. In business, large companies are impressive and powerful, but people always complain that Google or Microsoft have become stagnant and that the best place to work is tiny startups of about 2 cofounders and 4 employees. In biology, humans are more successful than numerous larger animals, and in terms of raw numbers, small animals like rats or insects are the most successful of all. Jacobs’s point isn’t that smaller is always better. Her point is that the converse statement, “bigger is always better,” is false — despite how intuitive it feels for political entities. 
Just like we don’t view a small nation like Switzerland or Singapore as a failure of unity, we (and in particular, Canadians) shouldn’t see the secession of a place like Quebec, if it’s done peacefully and democratically, as a failure either.
Still, some people in online reviews of the book complain that this argument is a bit thin, especially considering that it serves as the foundation for the later chapters (which are more directly about late 1970s Quebec politics). Sure, small is beautiful, but large states are great for stability, peace, markets, whatever. If the potential benefits of small national size are Jacobs’s strongest argument, then we can breathe a sigh of relief and go back to agreeing that separatism is bad. Pointing out the widespread bias in favor of unified political entities does seem valuable to me, but okay, fair enough. Does Jacobs have deeper reasons why separatism might be a good idea in general? Yes, and for this we go back to the second half of Cities and the Wealth of Nations.
Why Nations and Empires Fail
Our breathing rate is regulated through a feedback mechanism. Too much carbon dioxide in the blood, or too little oxygen, and the brain stem commands the diaphragm to accelerate breathing. Once the levels are back to normal, the brain stem receives this feedback and slows breathing down again.
Now, Jacobs asks, imagine an impossible creature: ten people, all doing their own thing, but whose breathing is somehow regulated by a single brain stem. The feedback the brain stem receives is a consolidated average of everyone’s carbon dioxide and oxygen levels, and the breathing rate the stem decides on is applied to all ten people, regardless of whether they’re sleeping or playing tennis. This, to put it mildly, wouldn’t work.
This creature is an analogy, representing a nation. The ten people are its individual cities, and the breathing rate is the cities’ economies. 
If it sounds like a stupid analogy, that’s because it is: “I have had to propose a preposterous situation,” writes Jacobs, “because systems as structurally flawed as this don’t exist in nature; they wouldn’t last.” Nor do they exist in machines we design; they wouldn’t work. But “nations, from this point of view, don’t work either, yet do exist.” The feedback mechanism that fails to work properly in a nation is currency. A currency always fluctuates according to the exports and imports of the area where it circulates. Let me use the Republic of Venice and its ducat as a toy example, because the coins look nice: Whenever Venice produces something (like salt) and sells it abroad, foreigners need ducats to buy the exports, so the demand for ducats increases. When Venice buys something from abroad, it needs to use foreign currencies, so the demand for ducats decreases. Add up everything that Venice exports and imports, and you get either a trade surplus (more exports than imports) or a trade deficit (more imports than exports), which determines the value of the ducat relative to other currencies. In both cases, a negative feedback loop restores balance over time, just like our brain stem does with carbon dioxide levels. A trade surplus, and therefore a strong ducat, means that when foreigners want Venetian salt, it’s expensive. So Venice’s exports decrease, while imports increase, since Venetians can use their valuable ducats to buy stuff cheaply from abroad. Conversely, a trade deficit makes exports a bargain for foreigners and imports expensive for Venetians. This feedback loop is great. It’s exactly what a city needs to trigger the crucial import replacement process. When exports decrease and a trade deficit begins (maybe because Constantinople found a cheaper source of salt somewhere else), the weak ducat means that Venice is less able to afford the resources and manufactured goods it used to import. 
The people of Venice don’t want to have less of those goods, though, so they figure out ways to produce some themselves — that is, they do import replacement. Later they will be able to export the output of the newly expanding industries too, strengthening the ducat and continuing the cycle. Currencies, Jacobs explains, function as automatic tariffs (to protect local industry from foreign imports) and automatic export subsidies (to encourage local industry to export). They are “automatic” because of the feedback mechanism. Just like an accelerated breathing rate, they take effect exactly when they are needed — and no longer. … Or so they should, except that import replacement, as we discussed, is a city process. Whereas most currencies are national or supranational. National currencies work well for city-states, like the Republic of Venice or today’s Singapore. But in large nations, which, remember, are not the fundamental unit of economic life, they mess everything up. Take a city like Detroit. When Detroit’s exports (primarily cars) decrease, Detroit gets no feedback about this, because its currency is the United States dollar, and the United States dollar’s value depends on much more than Detroit. It depends on other cities whose foreign exports might be increasing at the moment. And on rural regions that are selling resources like oil abroad. Also, trade between Detroit and other cities that use the United States dollar — i.e., American cities — is structurally unable to provide any feedback whatsoever. So Detroit doesn’t get the signal that it should buy less stuff from other cities and replace the missing imports with local production. Instead, it just declines. Jacobs hypothesizes that this issue of national currencies is at the root of every large country’s economic troubles. 
It is why nations and empires always centralize everything into one large city, whether that’s Paris, London, Tokyo, or Toronto, or ancient Rome: that city, being the largest, is simply the only one for which national-level currency feedback works fine. The rest of the nation or empire, then, declines. But of course, nations and empires don’t accept this. They care about the economic well-being of their peripheral regions, sometimes out of genuine concern for the people there, sometimes out of fear that they rebel or hold independence referendums. So nations and empires will embark on every possible solution to reverse the decline. All of their solutions will look like good ideas at first, and yet fail at helping the peripheral regions. Worse, these solutions will weaken the cities, thereby destroying the only real wealth of the country and bringing untold hardship for everyone. Eventually the nation or empire will disintegrate, as nations and empires always do, and always will. Jacobs calls these false solutions transactions of decline. She identifies three types, and, content warning, you might not like some of them depending on your political sensibilities. Sustained military production is a transaction of decline. Permanent military bases and garrison towns are a special kind of settlement: they import a lot and export nothing. Superficially, producing weapons and supplies for the military seems like a good deal for some cities — Jacobs gives the example of Seattle, which, before Microsoft and Amazon were a thing, depended mostly on making military aircraft. But because nobody in a military base ever tries to replace those weapons and supplies with their own production, the trade is sterile in terms of economic development. In a sense, the wealth is slowly “drained” from cities. Large empires are especially prone to this: eventually all of their wealth is destined to the military just to keep the empire together.
Inline links: Higgins, https://substackcdn.com/image/fetch/$s_!d77P!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8a42329-e67b-47b5-9e68-757112957dbb_1600x718.png, https://substackcdn.com/image/fetch/$s_!Qj1l!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff727cfae-ec34-42c4-b509-9a25f4126f2d_1600x834.png, https://substackcdn.com/image/fetch/$s_!T392!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9795745-8a40-40c1-8173-5296c13eef3b_1600x1553.png, https://substackcdn.com/image/fetch/$s_!zETj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e99b106-d34e-4a49-93a8-553a8f57952a_1600x1130.png, https://substackcdn.com/image/fetch/$s_!YeXj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe681e6-b571-4f4c-aa90-313f77fc1dbb_750x359.png
Higgins, North Carolina screenshot: from Google Street View.
Inline links: Google Street View
Like many of you, I’ve been following the debate around the Google memo - no! not that Google memo! - Google’s OpenAI Has No Moat, And Neither Do We, arguing that open source AI is poised to disrupt its bigcorp competitors. Here are some questions on whether that will happen:
Inline links: Google’s OpenAI Has No Moat, And Neither Do We
Maybe this isn’t as common-sensically wrong as it seems. I know many rich male Google programmers, but I have never seen any of them marry a stunning black girl from the ghetto. Why not? Wouldn’t the hypergamy hypothesis pronounce this a good deal for both of them? He gets a beautiful wife, she gets a rich husband? And it’s not just a race thing, I’ve also never seen them marry a beautiful hillbilly from West Virginia, or a beautiful farmer’s daughter from Modesto. I don’t even really see them marry a beautiful girl from the suburbs with a community college degree.
I mentioned before that I never see a rich male Google programmer dating a stunning woman from the ghetto. But I have heard of gay relationships like this (and the paper above describes some). Why? Commenters suggest that gays mostly meet their partners through “the gay community”, which takes a cross-section of society through a direction mostly uncorrelated with race and class. 12
Inline links: 12
An unblockable moving status bar that switches every few seconds between different messages about the product! This is what they think the people most obsessed with blocking flashing/changing elements on websites want! This new “show a constantly-moving status bar on screen to tell you when they will change another flashing element” thing has also made it onto the front page of Bing, although luckily you can dismiss it there. I would have expected Google to resist. They haven’t. I can no longer write things on Gmail - I have to compose on Notepad and then copy-paste to the Gmail window - because they’ve made it look like this: It cycles between these every few seconds, irregularly, as long as I keep typing. It baffles me that these companies will spend millions of dollars optimizing every aspect of their user interface, then add one completely unnecessary feature that ensures I will never spend more than the absolute minimum possible amount of time using their product. I know I’m not the only person who hates this, because when I Google it, I find Gmail help forum threads like: How do I get rid of the blinking “Draft Saved” message?
Inline links: made it onto the front page of Bing, https://substackcdn.com/image/fetch/$s_!5nxp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd94695d7-f8b4-4f34-90f3-7fd686899f21_1110x494.png, How do I get rid of the blinking “Draft Saved” message?
In this AI future, there might be 3-10 big AI companies capable of training GPT-4-style large models. Right now it looks like these will be OpenAI, Anthropic, Google, and Baidu; maybe this will change by the time these scenarios become relevant. Each might have a flagship product, trained in a slightly different way and with a slightly different starting random seed. If these AIs are misaligned, each base model might have slightly different values.
OpenAI is the most LibLeft, Google and Facebook are more authoritarian. “The paper speculates this might be due to BERT's training on more conservative books, while newer GPT models trained on liberal internet texts,” OpenAI denies the obvious alternative explanation that they’re better at RLHFing their AIs and so they match standard Bay Area politics better. I’d like to see future investigations include Anthropic’s Claude, which has been RLAIFed with some pretty left-wing-sounding prompts.
The idea that people are just feeling their way and long-run outcomes are unintended is a deep methodological commitment of cultural evolution. It’s built into the models4. But you need to be careful in applying that universally, because one part of human progress is the scaling of human forethought, from this season’s harvest and our small group, to the far future and the whole planet. That is how, in the early 21st century, humanity can be trying to rejig the entire world economy so as to avoid the future peril of global warming. And blindness/forethought is a continuous variable, not a dichotomy. The Western Church had some degree of collective agency, just as Google or the US does today, and it understood what it was doing under some description.
Inline links: 4
I wonder if there might be leeway to use texts as a way into historical psychology. We do now have large historical text corpuses available for mining. And there might be ways of relating them back to people’s psychology – like these guys who related human happiness to Google Books data. Just as a taster, here’s the occurrence of the French word for “we”, a plausible marker of group identity. See the spikes at the three major wars?
“So ‘Max Roser’ is just - I didn’t start the site. I was looking up econ development statistics on there a few years ago, and something seemed off, they listed the GDP per capita of Mongolia in 2004 as being $5,820, but all my other sources were saying it was more like $5,400 or so. I couldn’t reconcile it, so I wrote them an email asking if they’d made a mistake. A few days later, these people in robes show up at my door. They told me I had caught the last Max Roser in a mistake, so now by ancient tradition I was the new Max Roser. Apparently it’s not even a given name, it’s a Rosicrucian title - I think ‘Hans Rosling’ is another one, like a second-in-command. It’s like the Dread Pirate Roberts in that one book. I tried to tell them no - I was working for Google at the time - but they were very insistent. They made me an offer I couldn’t refuse. So now I’m Max Roser and I run Our World In Data. It’s an okay life, I guess.”
When I am elected, I will mandate that all American websites serve popups to European Union residents explaining why the GDPR is annoying and why it affects even Americans who have no say in it. If the Europeans want to be able to access Google, Facebook, Twitter, or any other US-based site without clicking “I understand” every time they reload it, they’ll have to pressure their government to do something about GDPR.
41: AI company Anthropic announces partnership with Amazon (including $1.25 - 4 billion investment). This was predictable: the story of the AI industry so far has been that from 2015 - 2020, a few true believers founded early startups that ate up the talent and gained the institutional knowledge. Now that AI is the Next Big Thing, the big tech companies are trying to catch up, having a hard time, and choosing to partner with the prescient early startups instead. The early startups are finding they can’t keep scaling without more money and data, forcing them to accept the big tech companies’ offers. First it was DeepMind + Google, then OpenAI + Microsoft, and Anthropic was the last holdout but has acknowledged economic reality. The safety movement is concerned that Amazon might have enough power to steamroll over Anthropic’s safety-conscious culture; this did happen with DeepMind and Google, didn’t with OpenAI and Microsoft, and my guess is Anthropic held out for a good enough deal (and had enough bargaining power) that it won’t happen there either.
Inline links: AI company Anthropic announces partnership with Amazon
You can click here to make Google generate a random number 1 - 10,000.
Inline links: here
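For readers without the link handy, the same draw can be sketched locally. This is only an illustration: the function name `draw_number` is hypothetical, and it uses Python's standard `random` module rather than whatever Google's widget does internally.

```python
import random


def draw_number(low: int = 1, high: int = 10_000) -> int:
    """Return a uniformly random integer between low and high, inclusive.

    Hypothetical helper mirroring the "random number 1 - 10,000" draw
    described above; not a reproduction of Google's generator.
    """
    return random.randint(low, high)


if __name__ == "__main__":
    print(draw_number())
```

`random.randint` includes both endpoints, so 1 and 10,000 are both possible results.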
Parents are supposed to teach their children the skills they need to navigate the world. This already feels somewhat obsolete - where are the Google programmers who were taught Python by their fathers, or the Instagram influencers who learned content creation on their mother’s knee? Soon it will be completely hopeless. Where we’re going there are no roads. You’ll have to figure it out by yourself. If I am to pass on anything of value to you, it can only be the ultimate power, the technique that forms all other techniques.
Inline links: the ultimate power
Experiencing repeated or extreme exposure to aversive details of the traumatic event(s) (e.g., first responders collecting human remains; police officers repeatedly exposed to details of child abuse). This is already quite broad! The victim doesn’t need to have anything bad happen to them - just be threatened with it. And they don’t need to personally be the victim of the threat. They can learn that it happened to someone close to them, or they can just hear about it happening to someone else. A police officer who hears about child abuse may be a trauma victim! The DSM’s job is to draw a medico-legal boundary - this counts, but that doesn’t. The real world has no obligation to obey the DSM, and often doesn’t. For example, can someone be traumatized by something happening to a distant family member? It would be insane to think this has never happened, and that some law of nature limits it to close family members. The DSM is just using the heuristic that probably it’s worse when it’s someone close to you. It goes on: [Part 4] does not apply to exposure through electronic media, television, movies, or pictures, unless this exposure is work-related. Did someone prove it was a natural law that you can only be traumatized by seeing a story on TV if it’s for work? Or is this another unprincipled compromise? People not involved in the DSM, unbound by medicolegal considerations, have added all kinds of stuff to this basic definition. For example, even though it’s not in the strict DSM definition, psychologists almost universally agree that emotional abuse can be traumatizing. And in the current social climate, inevitably people have started talking about collective trauma, eg institutional racism may be traumatizing for some individual black person even if they personally have never been victimized in any dramatic way. The knowledge that people hate their whole group serves as an adequate proxy for anybody abusing them personally. 
Can you chain all of these exceptions together? Can witnessing a family member suffering emotional abuse be traumatizing? Can learning secondhand about someone encountering institutional racism be traumatizing? Can you be traumatized by hearing on TV that someone was emotionally abused on account of their race? Only if it’s part of your job? At this point the nice crisp distinctions of the DSM are starting to feel a little artificial. I think of all of this in a deflationist, spectrum-y type of way. Anything can be traumatizing if it gives you strong negative emotions and makes you feel helpless and victimized. The DSM points to some categories that are especially likely to cause this kind of reaction. Other people have added their own. But if something you hear on TV makes you feel victimized and helpless, then sure, go ahead and call it traumatizing. If Trump’s election made you feel victimized and helpless, then I’m prepared to say “trauma” is a potentially fruitful lens through which to investigate this response. (I’m not saying that Trump’s election was inherently traumatizing, or that trauma was the correct response. If you prefer, you can think of it as a condemnation of the media for irresponsibly fanning fear of Trump. I’m just saying, without trying to lay blame, that lots of people did experience feelings of fear and helplessness around Trump’s election.) III. I didn’t personally feel traumatized by Trump’s election. My own story, which I don’t claim is atypical or sympathetic in any way, is that in college a bunch of people tried to cancel me for something I’d intended to be an anti-racist joke, but which apparently didn’t come out that way. Former friends turned against me, I got a few death threats, and I was told to attend a criticism session at a local social justice meeting group (which I foolishly did; I thought people would realize I was cooperative and agreed with them, and so lay off - obviously this didn’t work). 
I briefly considered dropping out of college to avoid the hatred; instead I spent a month locked in my room, waiting for the storm to blow over. It was the worst experience of my life. Ever since then, when I read arguments promoting social justice and cancel culture, or saying that their victims are probably bad people and shouldn’t be allowed to defend themselves, I get all kinds of easily noticeable unpleasant bodily and emotional reactions. When I read good arguments against these positions, I get some kind of nice calm feeling, like that I’m suddenly safer and the world has brightened a little bit. I try as hard as I can to approach these kinds of issues fairly, but it wouldn’t surprise me if I make more of the “five Democrats and eight assault weapons” style reasoning errors there than I would on some boring topic like taxes. Of course, I hear similar stories from people on the other side of this particular culture war. A typical example (this is a pastiche of many people) would be a transgender person who sometimes gets harassed when they try to go into public restrooms. Even if it never gets beyond catcalling, they remember all the stories they read about trans people getting murdered, and even looks of disapproval feel like they carry the potential for physical violence. Then they hear about trans bathroom bills in North Carolina or wherever and absolutely see red; they feel like Society as an abstracted entity is trying to deny their right to exist. Then they invent entirely new kinds of social technology to prevent themselves from ever having to talk to or interact with the sort of people who would support such a thing. Most people haven’t personally been cancelled or discriminated against, and they might not have stories like these. But they might feel like society is “threatening” them with these kinds of experiences. Or they might have “close family” or “close friends” who qualify. Or they might have heard about them on TV. (In a work-related context? 
Sure, let’s say yes.) But also, there’s the collective trauma exemption! Everybody belongs to various groups - black people, white people, Jews, Christians, men, women, LGBTs, gun owners, socialists, cops. Parts of each of these groups have developed narratives about how they’re being singled out for special persecution by the people in power. You probably believe that some of these groups’ narratives are valid, and others are false and offensive. That doesn’t matter. The important thing is that (some of) the group members believe it. The DSM is quite clear that people react to threatened trauma, not actual trauma. If some very silly person works himself up into a frenzy believing he’s being abused and persecuted because he eats eggs for breakfast, that’s potentially traumatizing, even if his concerns have no basis. But also, everyday political debate crosses lines that would qualify as emotional abuse in any other sphere of life. People get told they’re disgusting or idiotic or deserve to die. They have to watch as powerful rivals plot openly how to ostracize them from polite society. Groups of their enemies get together to spread the rumor that they are Satanists, Nazis, or pedophiles. They have their views twisted into totally false claims that they want to murder children, which then “go viral” to people who otherwise know nothing about them. If you’re not famous, this might not happen to you personally - nobody says “John Smith is a Nazi pedophile”. But John Smith might be a socialist, and someone might say “All socialists are Nazi pedophiles”. If we believe that racism can traumatize minority individuals even if they’re not personally named in the stereotypes, we should believe that the discourse around socialism can traumatize socialists, even if they’re not personally involved. I’m probably not describing this well, so I can only beg you to supplement my inadequate words with your lived experience. All bullying sounds trivial when you’re not involved. 
“He called me a fatty on the playground!” Well, whatever, laugh it off. But somehow from the inside, iterated over many experiences, coming from people you perceive as more socially powerful than you, it creeps up on you, starts getting power you definitely don’t remember giving it. Think of some discourse you’re involved in, some issue you feel really invested in, and think about the people you find most unfair and enraging on the other side. I dunno, either you’ve had this experience or you haven’t. I think a lot of people feel persecuted and threatened by politics, a lot of people feel emotionally abused by politics, and a lot of people feel like they’ve had vicarious experiences of people they identify with being harmed by politics. This isn’t enough for a formal PTSD diagnosis - they probably didn’t watch the relevant TV news segments in a work-related context. But it might be enough to start doing some really unhealthy things to their brains. IV. Here’s what the DSM has to say about some symptoms of PTSD: B4: Intense or prolonged psychological distress at exposure to internal or external cues that symbolize or resemble an aspect of the traumatic event. The popular term for criterion B4 is “a trigger”. For example, if you were raped, you might be triggered by hearing someone describe rape. This is justification for so-called “trigger warnings” in books and movies. Triggers have long since jumped from the lexicon of PTSD to the lexicon of politics. Left-wingers describe exposure to right-wing ideas or symbols as “triggering”. Right-wingers try to avoid the terminology, because it sounds too leftie, but they have the experience so often that lefties asking right-wingers “oh, are you TRIGGERED?” has become a meme. Twitter searches for “triggered” are an interesting anthropological experience. A Google search brought up this lovely t-shirt. 
I think eBay’s policy of promoting inclusiveness by displaying shirts on ethnically diverse models may have failed them in this case. This is only the tip of the iceberg. Donald Trump Jr has a book called Triggered, and a biweekly TV show of the same name. Sheila Jeffreys’ biography is called Trigger Warning: My Radical Feminist Life. Jeffreys and Trump Jr may not have much else in common, but they are united by a shared appreciation for applying this technical psychiatric term to politics. I think this makes the most sense if political triggering and psychiatric triggering are literally the same thing because political toxicity is a subspecies of PTSD. D2: Persistent and exaggerated negative beliefs or expectations about oneself, others, or the world. Do I even need to explain this one? D3: Persistent distorted cognitions about the cause or consequences of the traumatic events that lead the individual to blame himself or others. As stated, this doesn’t really apply to politics. But I claim this is an overly restrictive description of the true problem, which is a general distortion of cognition around traumatic stimuli. See for example Reasoning, trauma, and PTSD: insights into emotion–cognition interaction. Here the researchers make people solve math/logic puzzles with five apples and eight oranges or whatever; as usual, most people do fine. Then they change the content to traumatic stimuli, like five rapists and eight abusers. Nobody is particularly happy about this change, but traumatized people seem to do worse when the stimuli relate to their own trauma. This is an exact analog to the “five Democrats and eight assault weapons” task discussed above; I don’t know if one line of research inspired the others, but they show some similar results. Other people have even more general findings. You may remember the Stroop Effect, where people have to say the color of words without getting distracted by their content. 
One variant is the Emotional Stroop Effect, where instead of giving color words (“yellow”, “red”, etc), you use emotional words and traumatic stimuli. Traumatized people tend to do worse at Emotional Stroop tasks relating to their specific trauma. See Modification of cognitive biases related to posttraumatic stress: A systematic review and research agenda. See also The Precision Of Sensory Evidence for a discussion of how this effect might happen. E1: Irritable behavior and angry outbursts (with little or no provocation) typically expressed as verbal or physical aggression toward people and objects. As seen at your family Thanksgiving table. Politics makes otherwise kind people into angry jerks. E3: Hypervigilance This is defined as a heightened awareness of surroundings, constantly scanning for danger, and misinterpreting innocuous stimuli as threatening. Wikipedia describes it as “there is a perpetual scanning of the environment to search for sights, sounds, people, behaviors, smells, or anything else that is reminiscent of activity, threat or trauma”. Dog whistles. Microaggressions. The hallmark of the advanced political partisan is the ability to describe everything the other side (or neutral third parties) do as secretly a political offense, and to reduce every possible situation to their issue of choice. For the past ten years, I’ve been involved in the anti-AI-existential risk movement, and have gotten to know other people in this movement pretty well. I can say with high certainty that the number one motive of these people is that they do not want to be killed by robots. Still, over the years people have ascribed every possible motive to us except that one, for example: It’s a plot by Big Tech to distract from other harms they are committing.
Inline links: emotional abuse, institutional racism may be traumatizing, entirely new kinds of social technology, https://substackcdn.com/image/fetch/$s_!G65x!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9be25ee8-5d77-4a57-a07d-efff5ff66b44_589x461.png, https://substackcdn.com/image/fetch/$s_!i3Mp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69cd8c8a-3f2b-4842-b8fd-1455aa1310bf_591x587.png, Triggered, Trigger Warning: My Radical Feminist Life, Reasoning, trauma, and PTSD: insights into emotion–cognition interaction, Modification of cognitive biases related to posttraumatic stress: A systematic review and research agenda., The Precision Of Sensory Evidence, Dog whistles
GPT-6 will probably cost $75 billion or more. OpenAI can’t afford this. Microsoft or Google could afford it, but it would take a significant fraction (maybe half?) of company resources.
How many residents will live in Prospera, a new special economic zone in Honduras, on Jan 1, 2026? Answer: 600 (80% confidence interval 100-2,000) This seems like a good guess (except that my confidence interval would have included zero because there’s a 20%+ chance that it gets shut down). So overall its forecasts seem pretty impressive. But I was concerned by its reasoning even in some of the questions it got “right”. For example, the Nikki Haley question tried to get a base rate by asking what percent of elections Haley had won before, and found she had won 71% of them - these were mostly elections for South Carolina governor. You can see what the AI is trying to do - but it’s not going to work. Then it got confused and read a lot of news stories about how she’s currently losing the 2024 presidential election, and seemed to think they were about 2028. So either the AI only got a reasonable probability by coincidence, or it was testing many different strategies, throwing out the useless ones, and updating only on the useful ones, in a way that was kind of opaque to the casual reader. Still, if the company says it beats most human forecasters, this doesn’t seem totally impossible based on what I’ve seen. And that would be exciting! An AI that can generate probabilistic forecasts for any question seems like in some way a culmination of the rationalist project. And if you can make something like this work, it doesn’t sound too outlandish that you could apply the same AI to conditional forecasts, or to questions about the past and present (eg whether COVID was a lab leak). I would be most excited if at some point this graduated from its geopolitical focus and was able to answer questions on any topic (eg “what is the chance that Astral Codex Ten gains paid subscribers this year?”), maybe if the questioner gives it links or feeds it some of the appropriate information. 
FutureSearch is run by a team formerly from Metaculus, including former Metaculus CTO (and Google internal prediction market veteran) Dan Schwarz. They’re looking for potential clients and/or investors; if you’re interested, email hello@futuresearch.ai. Vitalik On AI Prediction Markets Vitalik Buterin, Ethereum-founder-turned-cryptocurrency-public-intellectual, has a blog post on The Promise And Challenge Of Crypto + AI Applications. One of them is a prediction market. He writes: Prediction markets have been a holy grail of epistemics technology for a long time; I was excited about using prediction markets as an input for governance ("futarchy") back in 2014, and played around with them extensively in the last election as well as more recently. But so far prediction markets have not taken off too much in practice, and there is a series of commonly given reasons why: the largest participants are often irrational, people with the right knowledge are not willing to take the time and bet unless a lot of money is involved, markets are often thin, etc. One response to this is to point to ongoing UX improvements in Polymarket or other new prediction markets, and hope that they will succeed where previous iterations have failed. After all, the story goes, people are willing to bet tens of billions on sports, so why wouldn't people throw in enough money betting on US elections or LK99 that it starts to make sense for the serious players to start coming in? But this argument must contend with the fact that, well, previous iterations have failed to get to this level of scale (at least compared to their proponents' dreams), and so it seems like you need something new to make prediction markets succeed. And so a different response is to point to one specific feature of prediction market ecosystems that we can expect to see in the 2020s that we did not see in the 2010s: the possibility of ubiquitous participation by AIs. 
AIs are willing to work for less than $1 per hour, and have the knowledge of an encyclopedia - and if that's not enough, they can even be integrated with real-time web search capability. If you make a market, and put up a liquidity subsidy of $50, humans will not care enough to bid, but thousands of AIs will easily swarm all over the question and make the best guess they can. The incentive to do a good job on any one question may be tiny, but the incentive to make an AI that makes good predictions in general may be in the millions. Note that potentially, you don't even need the humans to adjudicate most questions: you can use a multi-round dispute system similar to Augur or Kleros, where AIs would also be the ones participating in earlier rounds. Humans would only need to respond in those few cases where a series of escalations have taken place and large amounts of money have been committed by both sides. This is a powerful primitive, because once a "prediction market" can be made to work on such a microscopic scale, you can reuse the "prediction market" primitive for many other kinds of questions: Is this social media post acceptable under [terms of use]?
There definitely used to be a tech industry exception - or rather the tech industry was flagrantly violating CR hiring rules and getting away with it because it was so new and shiny and prestigious. Google's famous interview questions were thinly disguised IQ tests and other companies had similar practices. Of course the result was massive disparate impact. However, Griggs vs Duke Power Co does allow employers to use tests narrowly tailored for the job, and possibly EEOC bureaucrats could not figure out how to argue that coding-based tests like Google's are not legitimate or that hiring good software engineers is not a compelling enough business interest to set aside disparate impact requirements.
My admittedly anecdotal $0.05 as a generic office drone. *Every* white collar job I've heard of uses patently IQ test-like screening. I'm not talking about Google or Jane Street, I'm talking about big4 consultancies, mid-sized accounting firms etc. Places where productivity is not nearly high enough to justify resisting the acrimonious persecution Hanania posits, and that yet are happy to ask their applicants to submit Raven matrices or quirky plane geometry problems (the joke is even that the only thing those working there got out of grad school/MBA was prepping for the GMAT/GRE, since once hired they'll end up filling excels anyway).
I may be n=1 person, but I've heard that similar things are happening at Apple, Disney, Dreamworks, several large game studios (you would have heard of them if you were in the space, but I won't mention them, because that industry is small), Google, Facebook/Meta... I'll just stop there, but suffice it to say, this isn't everything.
California’s state senate is considering SB1047, a bill to regulate AI. Since OpenAI, Anthropic, Google, and Meta are all in California, this would affect most of the industry.
Inline links: SB1047
Go rogue and commit some other crime that does > $500 million in damage3. If the tests show that the model can do these bad things, the company has to demonstrate that it won’t, presumably by safety-training the AI and showing that the training worked. The kind of training AIs already have - the kind that prevents them from saying naughty words or whatever - would count here, as long as “the safeguards . . . will be sufficient to prevent critical harms.” So the bill isn’t about regulating deepfakes or misinformation or generative art. It’s just about nukes and hacking the power grid. There are some good objections and some dumb objections to this bill. Let’s start with the dumb ones: Some people think this would literally ban open source AI. After all, doesn’t it say that companies have to be able to shut down their models? And isn’t that impossible if they’re open-source? No. The bill specifically says4 this only applies to the copies of the AI still in the company’s possession5. The company is still allowed to open-source it, and they don’t have to worry about shutting down other people’s copies. Other people think this would make it prohibitively expensive for individuals and small startups to tinker with open-source AIs. But the bill says that only companies training giant foundation models have to worry about any of this. So if Facebook trains a new LLaMA bigger than GPT-5, they’ll have to spend some trivial-in-comparison-to-training-costs amount to test it in-house and make sure it can’t make nukes before they release it. But after they do that, third-party developers can do whatever they want to it - re-training, fine-tuning, whatever - without doing any further tests. Other people think all the testing and regulation would make AIs prohibitively expensive to train, full stop. That’s not true either. All the big companies except Meta already do testing like this - here’s Anthropic’s, Google’s, and OpenAI’s - that already approximate the regulations. 
Training a new GPT-5 level AI is so expensive - hundreds of millions of dollars - that the safety testing probably adds less than 1% to the cost. No company rich enough to train a GPT-5 level AI is going to be turned off by the cost of asking it “hey can you create super-Ebola?”, and putting the answer into a nice legal-looking PDF. This isn’t the “create a moat for OpenAI” bill that everyone’s scared of6. Other people are freaking out over the “certification under penalty of perjury”. In some cases, developers have to certify under penalty of perjury that they’re complying with the bill. Isn’t this crazy? Doesn’t it mean if you make a mistake about your AI, you could go to jail? This is deeply misunderstanding how law works. Perjury means you can’t deliberately lie, something which is hard to prove and so rarely prosecuted. More to the point, half of the stuff I do in an average day as a medical doctor is certified under penalty of perjury - filling out medical leave forms is the first one to come to mind. This doesn’t mean I go to jail if my diagnosis is wrong. It’s just the government’s way of saying “it’s on the honor system”. What are some of the reasonable objections to this bill? Some people think the requirement to prove the AI safe is impossible or nearly so. This is Jessica Taylor’s main point here, which is certainly correct for a literal meaning of “prove”. Zvi points out that it just says “reasonable assurance”, which is a legal term for “you jumped through the right number of hoops”. In this case probably the right number of hoops is doing the same kind of testing that OpenAI/Anthropic/Google are currently doing, or that AI safety testing organization METR recommends. The bill gestures at the National Institute of Standards and Technology a few times here, and NIST just named one of METR’s founders as their AI safety czar, so I would be surprised if things didn’t end up going in this direction.
METR’s tests are possible and many AI models have successfully passed earlier versions. Other people worry there are weird edge cases around derivative models. I think the bill’s intention is that once you prove that your AI is too dumb to create nukes, you’re fine to open-source it. Third-parties can change its character, but not its fundamental intelligence. But in theory, a third party could get tens of millions of dollars of compute and keep training your AI to increase its fundamental intelligence. This would be a weird thing to do, and anyone with that much compute probably should just make their own model. But if someone wanted to screw you over by doing this, technically the law is kind of vague and you would have to trust a judge to say “no, that’s stupid”. Probably the law should clarify that it doesn’t apply to this situation. Other people are worried about a weird rule that you can’t train an AI if you think it’s going to be unsafe. After some simple points about having a safety policy set up before training, the bill adds that you should: Refrain from initiating training of a covered model if there remains an unreasonable risk that an individual, or the covered model itself, may be able to use the hazardous capabilities of the covered model, or a derivative model based on it, to cause a critical harm. This makes less sense than all the other rules - you can test a model post-training to see if it’s harmful, but this seems to suggest you should know something before it’s trained. Is this a fully general “if something bad happens, we can get angry at you”? I agree this part should be clarified. Other people think the benchmarking clause is too vague. The law applies to models trained with > 10^26 FLOPs, or any model that uses advanced technology to be equally as good despite less compute. Equally as good how? According to benchmarks. Which benchmarks? The law doesn’t say. 
But it does say that the Technology Department will hire some bureaucrats to give guidance on this. I think this is probably the only way to do this; it’s too easy to fake any given benchmark. Every AI company already compares their models to every other AI company on a series of benchmarks anyway, so this isn’t demanding they create some new institution. It’s just “use common sense, ask the bureaucrats if you’re in a gray area, a judge will interpret it if it comes to trial”. This is how every law works. Other people complain that any numbers in the bill that make sense now may one day stop making sense. Right now 10^26 FLOPs is a lot. But in thirty years, it might be trivial - within the range that an academic consortium or scrappy startup might spend to train some cheap ad hoc AI. Then this law will be unduly restrictive to academics and scrappy startups. Is this bad? Presumably we know now that AIs less than 10^26 FLOPs are safe. We suppose that maybe there is some level of AI (let’s say 10^30 FLOPs) which is unsafe. If we had this number auto-update for compute growth, eventually it would go above the unsafe number, and unsafe models would be exempt. But at some point we’ll probably discover that some new models (eg 10^28 FLOPs) are safe, and it would be good if the law was updated to exempt them too. Very optimistically, this might happen - California’s minimum wage was originally $0.15 per hour, but this got updated when inflation made that unreasonable. In the pessimistic case, this will be a problem for us thirty years from now, if we’re even around then. Other people note that an AI committing a cyberattack is a fuzzy bar. If you ask GPT-4 to write a well-composed, grammatically-correct phishing email (“Dear sir, I am the password inspector, please tell me your password”), the phishing works, and you use the password to blow up a power plant, does that count? I agree that it would be nice if the law were clearer on this. 
But I also agree with the lawyers who object that dealing with programmers is impossible and that laws will never be exactly as clear as code. Other people note that this will *eventually* make open source impossible. Someday AIs really will be able to make nukes or pull off $500 million hacks. At that point, companies will have to certify that their model has been trained not to do this, and that it will stay trained. But if it were open-source, then anyone could easily untrain it. So after models become capable of making nukes or super-Ebola, companies won’t be able to open-source them anymore without some as-yet-undiscovered technology to prevent end users from using these capabilities. Sounds . . . good? I don’t know if even the most committed anti-AI-safetyist wants a provably-super-dangerous model out in the wild. Still, what happens after that? No cutting-edge open-source AIs ever again? I don’t know. In whatever future year foundation models can make nukes and hack the power grid, maybe the CIA will have better AIs capable of preventing nuclear terrorism, and the power company will have better AIs capable of protecting their grid. The law seems to leave open the possibility that in this situation, the AIs wouldn’t technically be capable of doing these things, and could be open-sourced. (or you could base your Build-A-Nuke-Kwik AI company in some state other than California.) Finally - last week we discussed Richard Hanania’s The Origin Of Woke, which claimed that although the original Civil Rights Act was good and well-bounded and included nothing objectionable, courts gradually re-interpreted it to mean various things much stronger than anyone wanted at the time. This bill tells the Department of Technology to offer guidance on what kind of tests AI companies should use. 
I assume their first guidance will be “the kind of safety testing that all companies except Meta are currently doing” or “something like METR”, because those are good tests, and the same AI safety people who helped write those tests probably also helped write this bill. But Hanania’s book, and the process of reading this bill, highlight how vague and complicated all laws can be. The same bill could be excellent or terrible, depending on whether it’s interpreted effectively by well-intentioned people, or poorly by idiots. That’s true here too. The best I can say against this objection is that this bill seems better-written than most. Many of the objections to its provisions seem to not understand how law works in general (cf. the perjury section) - the things they attack as impossible or insane or incomprehensibly vague are much easier and clearer than their counterparts in (let’s say) medicine or aerospace. Future AIs stronger than GPT-4 seem like the sorts of things which - like bad medicines or defective airplanes - could potentially cause damage. This sort of weak, carefully-directed regulation that exempts most models and carves out a space for open-sourcing seems like a good compromise between basic safety and protecting innovation. I join people like Yoshua Bengio and Geoffrey Hinton in supporting it. Regardless of your position, I urge you to pay attention to the conversation and especially to read Zvi’s Asterisk article or his longer FAQ on his blog. I think Zvi provides pretty good evidence that many people are just outright lying about - or at least heavily misrepresenting - the contents of the bill, in a way that you can easily confirm by reading the bill itself. There will be many more fights over AI, and some of them will be technical and complicated. Best to figure out who’s honest now, when it’s trivial to check! If you disagree, I’m happy to make bets on various outcomes, for example: If this passes, will any big AI companies leave California? 
(I think no)
Inline links: 3, 4, 5, Anthropic’s, Google’s, OpenAI’s, 6, here, The Origin Of Woke, read Zvi’s, his longer FAQ on his blog, reading the bill itself
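As a rough illustration of the compute threshold discussed above, here is a minimal sketch of checking a planned training run against the 10^26 FLOP figure. The threshold comes from the text; the `6 * parameters * tokens` estimate is a commonly used rule of thumb for dense transformer training and is an assumption here, not something the bill or the article specifies. The function names are hypothetical. The sketch covers only the raw-compute test, not the "equally as good despite less compute" benchmark clause.

```python
# Threshold from the text: the bill applies to models trained with > 10^26 FLOPs.
COVERED_FLOP_THRESHOLD = 1e26


def estimate_training_flops(n_parameters: float, n_tokens: float) -> float:
    """Rough training-compute estimate.

    Uses the ~6 * N * D rule of thumb for dense transformers
    (an assumption for illustration, not part of the bill).
    """
    return 6.0 * n_parameters * n_tokens


def exceeds_compute_threshold(
    n_parameters: float,
    n_tokens: float,
    threshold: float = COVERED_FLOP_THRESHOLD,
) -> bool:
    """Return True if the estimated training compute is above the threshold."""
    return estimate_training_flops(n_parameters, n_tokens) > threshold


# Hypothetical runs, chosen only to show both sides of the line.
small_run = exceeds_compute_threshold(n_parameters=70e9, n_tokens=15e12)
large_run = exceeds_compute_threshold(n_parameters=2e12, n_tokens=20e12)
```

With these made-up numbers, the first run lands around 6 x 10^24 FLOPs and falls under the threshold, while the second lands above 10^26. A real determination would depend on how the regulator defines and measures training compute.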
5: I’ll never tire of analogies putting the US / Europe gap into perspective - for example, did you know that the median black American household earns more ($48,297) than the median UK household (£35,000 = $44,450)? Related, from @StatisticUrban - average house size in every US state vs. every European country:
25: Google has funnier AI drama - their AI search assistant is really bad and keeps treating troll answers as real authorities. For example:
I don’t know what happened to this one, and Google gives very different (but consistently wrong) answers each time you ask it. People have been taking this as a parable about the limits of AI, but Claude and GPT wouldn’t make these kinds of mistakes. Some AI people I know think this is probably a result of Google putting impossible demands on their AI in terms of how it deals with search/cache/memory. Still, it’s surprising that they let it out of testing in this state.
Unless you really lay on the tribal signifiers, it’s hard to find a definition where most Democrats support cancel culture and most Republicans oppose it! (the above poll probably overestimates support for cancel culture, because it talks about saying “things widely considered hateful” instead of, like, one tweet expressing a widely-shared opinion at the wrong time) Liberals invent a fictional entity called “The Right”, which is full of all of the most racist and fascist things that NYT was ever able to produce an out-of-context quote showing one Claremont guy saying, then believe that any action is justified against “The Right” because it’s an ontological threat against democracy, then rile up a mob against a Google guy who sends the wrong memo. Likewise, conservatives invent a fictional entity called “The Left”, which is full of all the most horrible woke things that FOX was ever able to find one Gender Studies professor saying, then believe that any action is justified against “The Left” because it’s coming for our children, then rile up a mob against a Home Depot woman who makes a bad tweet. 4. Nobody Is Ever Both-Sides-ist Enough I hate this because I’ve fought with these people on the Left, and they sound exactly the same. “If you feel like compromising with the Right, it’s important to remember what they’ve done. They separated families and locked children in cages. They forced 10-year-old rape victims to carry their rapists’ babies. They murdered our grandparents by refusing to mask in the middle of a pandemic. They killed thousands of American soldiers in a war over fake WMDs, then cut VA funding so the soldiers they wounded would die on the street. At this very moment, they’re boiling our planet alive to protect fossil fuel barons’ profits. How dare you suggest it could possibly be wrong to cancel someone like that!” This isn’t a knock-down argument. Sometimes you’re right when you think your enemies are bad, and they’re wrong when they think you’re bad. 
I can’t say for sure this isn’t one of those times. But: The fact that your enemies are just as sure as you are should make you less sure.
The Google comparison briefly confused me - “queries” here means “messages to the AI”, so a conversation with a hundred back-and-forth questions counts as 100 queries (whereas most people only query Google a few times daily). In terms of total visitors, c.ai is still only at about 0.02% of Google’s. Still, this is way more than I expected, given that even trying to follow AI trends I’d never really heard anything about this. “People getting addicted to AI girlfriends en masse” should be considered a present-day problem rather than a future one.
You can’t see it in the screenshot, but the first stock is NVIDIA, the second TSMC, the third Alphabet, and the fourth Microsoft. On average they went up about 0.5%, on a day when the NASDAQ as a whole also went up about 0.5%.
The big AI companies split among themselves. OpenAI, Meta, and Google opposed the bill, X.AI supported, and Anthropic dithered on an earlier version but ultimately came out in support after their feedback was taken into account. Many opponents claimed that the bill was a Trojan Horse attempt at regulatory capture by the big AI companies, so it was fun watching three of the biggest AI companies come out against it and prove them exactly wrong. I don’t think any opponents ever changed their minds, admitted they’d made a mistake, or even stopped arguing that it was a big AI company plot - but hopefully enough people were paying attention that it discredited them a little for the next fight.
I can’t find crosstabs for the adversarial collaboration version, but here they are from an earlier one (source).
Inline links: source
In the early 2010s, the AI companies hadn’t yet discovered scaling laws, and so underestimated the amount of compute (and therefore money) it would take to build AI. DeepMind was the first victim; originally founded on high ideals of prioritizing safety and responsible stewardship of the Singularity, it hit a financial barrier and sold to Google.
This scared Elon Musk, who didn’t trust Google (or any corporate sponsor) with AGI. He teamed up with Sam Altman and others, and OpenAI was born. To avoid duplicating DeepMind’s failure, they founded it as a nonprofit with a mission to “build safe and beneficial artificial general intelligence for the benefit of humanity”.
In the past, the Newsom administration has sided with AI companies to please his Silicon Valley donors. But Bonta has previously been involved in antitrust lawsuits against Amazon and Google, so he’s not afraid to confront Big Tech.
Inline links: sided with AI companies
Within each region it’s alphabetized first by country then by city - so the first entry in Europe is Vienna, Austria. The exception is the USA, where they’re also alphabetized by state - so the first entry in the USA is Huntsville, Alabama.
Contact: Jeremy Contact Info: alphabetadelta0[a t]protonmail[period]com Time: Sunday, April 6th, 3:00 PM Location: 78 Princess St, Kingston, ON K7L 1A5 at Minotaur. They have free board games available to play, I'll be at a table in a red shirt with a small ACX 2025 sign. Coordinates: https://plus.codes/87P56GJ9+F7 Group Link: https://discord.gg/zgG [remove this bit] w5hpjds Notes: Feel free to email me or join the Discord if you live in Kingston but don't intend to come to the schelling meet up!
Inline links: https://plus.codes/87P56GJ9+F7
Extra Info For Meetup Organizers:
1. If you’re the host, bring a sign that says “ACX MEETUP” and prop it up somewhere (or otherwise be identifiable).
2. Bring blank labels and pens for nametags.
3. If you’re having trouble thinking of something to talk about, the attendees probably also read ACX. Ask people about a recent post or book review that they liked.
4. If it’s the first meetup, people are probably just going to want to talk, and you shouldn’t try to organize some kind of planned workshop or anything like that.
5. Have people type their name and email address in a spreadsheet or in a Google Form (accessed via a bit.ly link or QR code), so you can start a mailing list to make organizing future meetups easier.
6. It’s easier to schedule a followup meetup while you’re having the first, compared to trying to do it later on by email.
7. If you didn’t make a LessWrong event for your meetup (or if you did but Skyler didn’t know about it) the LessWrong team did it for you using the username or email address you gave on the form. To claim your event, log into LW (or create an account) using that email address, or message the LW team on Intercom (chat button in the bottom right corner of lesswrong.com).
The boring ol’ way to explain Bayes, of course, is through an equation like the one above… and doing a Google image search for “Bayes’ theorem” overwhelmingly pulls up more examples of that than anything else.
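The image the passage refers to is not reproduced in this excerpt. For reference, the standard statement of Bayes' theorem, which is presumably the kind of equation meant, is given below. The symbols H (hypothesis) and E (evidence) are labels chosen here, not taken from the original.

```latex
P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)}
```

Here P(H) is the prior probability of the hypothesis, P(E | H) is the likelihood of the evidence under that hypothesis, P(E) is the overall probability of the evidence, and P(H | E) is the posterior.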
Codebuff, an AI coding startup I probably can’t take full credit for all of this just from giving them $20K in seed funding, but I continue to appreciate everything they do for this community and the world. 35: Further S’s Political Career This person didn’t win their election, but has since pivoted to AI safety and works in a well-regarded AI policy think tank. 36: Seeds Of Science, A Journal Of Non-Traditional Research No update received, but this was a public journal and it is easy to follow their work, see their website and Substack. They published two dozen articles of widely varying quality through 2023 and 2024, then closed in 2025. A remnant of the original vision survives as a science blogging aggregator. This was about my median expectation for this grant, but it was very inexpensive and I decided to take a chance on it anyway. 37: Good Science Project, Working To Improve Federal Science Funding No update received, but they have a public Substack discussing their progress. Their proposals for NIH reform have influenced Congress and made government agencies pay more attention to scientific integrity. 38: Advising Developing Countries On How To Grow Their Economies With our initial ACX grant, we piloted the Growth Teams model in Rwanda, helping the government jumpstart the export-oriented call center (BPO) industry. Since 2022, that effort has contributed to the creation of 2,000 formal jobs and the emergence of some of the country’s largest private employers. We’ve since expanded to Tanzania, Malawi, and the Indian states of Goa and Meghalaya. To refocus the global development discourse on broad-based economic growth, we co-organized the Growth Summit with the Center for Global Development and the Charter Cities Institute, and have published articles in leading outlets including Stanford Social Innovation Review, ProMarket, and the Global Prosperity Institute. 
Our work has attracted support from Open Philanthropy, Schmidt Futures, and Mulago Foundation, and our advisors now include economists Lant Pritchett, Stefan Dercon, and Kunal Sen. 39: Help Luca De Leo Get Started In AI Safety Research No update received, but Luca now runs the AI safety group at the University of Buenos Aires, Argentina. 40: Typist For Saharon Shelah This was another ACXG+ Grant, funded by an anonymous outside funder and not listed in the original announcement. Saharon is a prolific and influential Israeli mathematician, but many of his discoveries are hand-written in an unpublishable format. This grant funded a typist to help make his results suitable for publication. According to this page, they have made over fifty new papers and preprints available. Second Cohort: One Year Updates 41: Lead-Acid Battery Recycling In Nigeria The Nigeria field research was a major success. We spent most of September doing field research in multiple major cities in Nigeria, and got a good sense of the used lead-acid battery supply chain. This field research served as the foundation for expanding our project, and has been very impactful in shaping our ongoing research. We published our findings from Nigeria, which were shared with Nigerian government regulators and global NGOs working on lead poisoning. The grant also gave us the on-the-ground experience we needed to both fully understand and credibly engage with groups, both in Nigeria and globally, on the ULAB issue. In the meantime, beyond continued research, we’ve also launched a dashboard (trade.leadbatteries.org) for analyzing global lead trade data. 
Right now, we’re: Launching two studies (one RCT, one environmental analysis) in Nigeria in collaboration with local universities to develop a more rigorous understanding of lead pollution due to low-standard ULAB recycling in Nigeria Collaborating with a non-profit incubator to launch an NGO focused on demand-side solutions Beginning a partnership with a West African environmental regulator to scale cheap air monitoring technology to quickly identify and reduce lead pollution from low-standard smelting If any of this sounds interesting to you, please sign up for our Substack (leadbatteries.substack.com) or send us an email at hugosmith@uchicago.edu! 42: Compensation For Kidney Donors The End Kidney Deaths Act (H.R. 2687 / EKDA) is a groundbreaking ten-year pilot program designed to save lives and reduce healthcare costs. It provides a refundable tax credit of $10,000 per year for five years, a total of $50,000, to living kidney donors who donate to a stranger, helping those who’ve waited the longest on the transplant list. Between 2010 and 2021, 100,000 Americans died while qualified and waiting for a kidney. The EKDA aims to change that trajectory. Within ten years of its passage, up to 100,000 Americans could receive a life-saving living donor kidney which typically lasts twice as long as a deceased donor kidney. This would not only save lives but also save taxpayers up to $37 billion. The legislation has been reintroduced in the House, and we have a committed Republican Senate lead. Now, we need a Democratic Senator to co-lead and help move this bipartisan effort forward. Time is short, and we are racing to pass the bill this Congressional session. 36 organizations already support the EKDA. Join the movement and help end preventable kidney deaths. Visit EndKidneyDeaths.org to help us get to the finish line. Elaine and her org have been working extremely hard on this; you can read a Vox article on their campaign here. 
If you want to sign up for her email list and get updates any time there is a representative you can contact or meeting you can join in, go here. 43: Genetic Hack To Prevent Suffering In the estimate of multiple team members, the ACX grant was “worth it” - it likely had a counterfactual net positive impact, even though we had to pivot from our initial fast-track plans for developing the precision anti-suffering therapy. We identify three primary streams of value: a) reducing uncertainty in the emerging field through early exploratory research, helping with the identification of dead ends and promising R&D trajectories; b) a wide range of downstream effects (beyond the “raising awareness” cliché), including talent mobilization and rekindled interest in suffering abolitionism as a distinct cause area; and c) certain developments that cannot yet be publicly disclosed. In December 2024, Marcin Kowrygo (Acting CEO & volunteering contributor), David Pearce (Director of Bioethics), Aatu Koskensilta (President), and a few other team members decided to leave The Far Out Initiative. They look forward to collaborating and applying their experience to advance the suffering abolitionist lineage in the spirit of open science, public good, and thoughtfully decentralized governance. Feel free to reach out to us at suffab at protonmail dot com to discuss collaboration opportunities! I wrote a post profiling the Far Out Initiative here. Unfortunately there were some internal disagreements, and the people ACX Grants was closest to left the organization. I plan to continue to monitor whatever they do next. 44: Advocate For Pandemic Response Team At FDA This team has asked me not to discuss their progress publicly, but you can probably guess what their lives are like right now, and your guess would be correct. 45: Anti-Mosquito Drones We developed a cheap sonar that is able to detect, track and classify the ultrasonic echoes of mosquito wings at more than three meters.
I believe it’s a world first! We also have control algorithms that take the sonar data and output control commands that both ram into mosquitoes and avoid the walls of a simulated environment. Our current work is on integrating both components on a real drone, and we expect to be able to kill mosquitoes by June. We’ve also made an internal impact study (napkin-sized) that shows we’ll be more cost-effective than ITNs in urban to periurban environments. So, we’re super excited with what comes next and can’t wait to share the videos of our first interceptions! More information [in the video below] and on our website, https://tornyol.com 46: Tarbell Fellowship For AI Journalism No update received, but they have a public website. I can’t find the Voices program in particular, but the overall fellowship completed their first class of seven fellows and is working on their second. 47: Germicidal UV Lamp Study The research has successfully demonstrated the ability of off-the-shelf ozone scrubbers to mitigate the ozone production of far-UVC lamps, and is now available as a preprint (https://chemrxiv.org/engage/chemrxiv/article-details/67e4cde76dde43c9084d88b7). The paper has been submitted for publication and is currently undergoing peer review. Any ideas you have for potential funders we can approach to help execute our six-year plan to accelerate far-UVC would be appreciated: https://blueprintbiosecurity.org/introducing-project-air/ 48: Technological Solutions To Animal Welfare Challenges Directly because of Innovate Animal Ag's work, the first U.S. egg producer publicly announced in the New York Times their adoption of in-ovo sexing technology, eliminating the need to cull day-old male chicks. The initial in-ovo sexing machine began operating in the U.S. at the end of 2024, with the first eggs from these hens expected on shelves in mid-2025. External evaluations estimate our work accelerated U.S.
adoption of this technology by over seven years, meaning that once fully implemented, more than 2 billion chicks will have been spared. In addition to continuing to support the rollout of in-ovo sexing in the US and globally, we're now exploring other technologies and paths to impact. Current promising projects include developing humane slaughter methods for fish and advocating for USDA approval of a poultry vaccine against bird flu. They add: If you ever meet folks that are interested in animal welfare and are partial to more technocratic and practical solutions, please continue to pass them our way, or connect them directly to me. 49: Assurance Contract Website www.Spartacus.app is an ACX grantee that created a platform to help solve coordination and collective action problems. It enables the creation of campaigns that build critical mass through conditional commitments, which only activate when a sufficient number of people join, converting risk and uncertainty into a higher probability of successful outcomes. They are currently facilitating several projects that leverage conditional commitments, including a dominant assurance contract interface for fashion pop-ups, accelerating a community business association's membership drive, and helping an AI safety organization organize petitions and events, among others. They have pivoted from an emphasis on high-stakes coordination problems requiring anonymity (because they occur too infrequently) to a broader range of more common use cases and have successfully run small-scale campaigns, but are still working toward product-market fit. Despite resource constraints and split time commitments that have impeded faster progress, they remain dedicated to the project's growth and success. You can follow its progress on X or Substack, or email Jordan directly here.
50: Cause Prioritization @ Center For Exploratory Altruism Research Moderately good progress on a salt reduction policy advocacy project we funded; informal commitments have been made by the Ministry of Health, and we're awaiting the publication of a formal administrative order. The official description sounds maximally generic, but this is an EA charity with a broad mandate whose current thesis is that dietary guidelines in developing countries can have outsized effects in saving lives. They’re making some progress on a salt reduction campaign in a developing country they prefer not to name publicly. 51: Mark Webb Studying Land Reform The purpose of this project was to identify specific farmland that could be acquired and transferred to the farmers already working the land. This has been difficult to achieve. I have been able to connect with other charities and landless farmers, and was able to interview a number of people about what their situation looks like, as well as what it would look like to them personally if they owned, rather than rented, their farmland. All this was immensely helpful in pushing this long-term project forward, even if I was unable to identify a specific plot of land that could be used to try the experiment. I intend to continue this project. If you have any insights or connections, I am interested. 52: More AI Advocacy In Australia Good Ancestors is focused on AI safety policy in Australia. Middle powers might be a useful path to influence as the US and China focus on racing, rather than safety. The ACX grant helped us give testimony about AI safety to the Australian Senate alongside Google, Microsoft and Facebook (we were the only nonprofit to give oral evidence to the inquiry). We also engaged government on other AI-related issues, including cybersecurity, biosecurity, consumer law and automated decision making (https://www.goodancestors.org.au/ai-safety).
We’re currently working to inform voters about where parties stand on AI safety for the election, ahead of engaging on a likely Australian AI Act in 2025 (https://www.australiansforaisafety.com.au/). This is the same Australian lobbying organization we founded in Year 1, after a change in name and leadership. I continue to be excited about AI safety in middle-tier countries for a few reasons. First, these countries have some power in international organizations to set international standards. Second, companies will usually comply with any not-excessively-burdensome regulation set by any country with a significant market. Third, AI safety is underfunded by the standard of government programs, so Australia setting up a national AI Safety Institute would significantly expand the field. It’s kind of crazy that ACX Grants tier levels of money can have significant effects at this scale, but GA continues to do a great job and we continue to be proud to support them. 53: Campus For African School Of Economics At Zanzibar Charter City The ACX grant helped launch the first research center at the African School of Economics-Zanzibar, which is a main anchor of the Fumba Town charter city project in Zanzibar. This research center is called the Africa Urban Lab (AUL), focused on rapid urbanization across Africa. The AUL launched its first Diploma program in Urban Development with 38 students in our first cohort (now graduated!), including mayors, a deputy mayor, a director of a national Ministry of urban development, and many others. We published our research framing papers for the AUL's research agenda. We raised funding to launch an Urban Expansion Program that's now selecting 15 African cities to support in implementing urban expansion planning on the urban periphery. We held two Public Talks by renowned cities scholars and practitioners. We received additional funding from Emergent Ventures and from the Templeton Foundation.
And we've partnered with 8 universities across the region; with one of them (Ardhi), we'll be working to update their urban planning and urban economics curriculum (amplifying AUL's impact beyond our own organization). A longer update from end of 2024 is here: https://www.aul.city/blog/reflecting-on-africa-urban-lab-s-inaugural-year-2024-highlights 54: Online Training Program For Health Workers In Developing Countries To date, over 11,000 health workers in Nigeria have completed our course on basic, life-saving newborn care. ACX funding was catalytic for helping us secure government approvals and complete an evaluation of the impact of our training on health workers' clinical practices. The evaluation shows that birth attendants provide better birth care after taking the course. We fed the evaluation results into an updated model, which suggests the program is 24 times more cost-effective than direct cash transfers (a widely recognized benchmark for cost-effectiveness). The program is likely to become even more cost-effective as we scale up. https://healthlearn.org/blog/updated-impact-model 55: Smartphone Pupillometry To Diagnose Neurological Conditions We have continued to expand our work in the smartphone pupillometry space and the development of our application, PupilScreen (https://www.apertur.ai/). We have expanded our pilot/research program to include new sites across the United States (Missouri, New Jersey, Kentucky, USAC racing, PitFit driver performance training in Indiana) and the world (Nepal, Taiwan, South Africa). We also continue to publish at the leading edge of the pupillometry literature, looking at concussion (https://neuro.jmir.org/2024/1/e58398 and https://pubmed.ncbi.nlm.nih.gov/39682632/), cerebral vasospasm (https://pubmed.ncbi.nlm.nih.gov/39128501/), and stroke (https://pubmed.ncbi.nlm.nih.gov/39674431/ and https://pubmed.ncbi.nlm.nih.gov/39561861/).
Currently, we are raising a $3 million seed round via a SAFE to fund the expansion of our work into the hands of healthcare workers and the general public. We will first focus on traumatic brain injury for clinical use and develop a neuro-monitoring wellness application utilizing our technology for the general public. They add: “We would welcome connections to anyone that you think might be interested in supporting our work further by investing in our $3M seed round of funding.” 56: Mike Saint-Antoine’s Biology Tutorial Videos Since getting the grant, I've continued to make Youtube tutorials as planned. One series that I'm especially proud of is about how to make a neural network in the Julia programming language completely from scratch, with no imports, up to the point of being able to solve MNIST (https://www.youtube.com/playlist?list=PLWVKUEZ25V97tNULapu07DhWv6_W4NfpE). Also, a college student in Pakistan came across my videos and invited me to give a virtual Zoom-lecture to her department, so I ended up teaching a 6-hour "Python-for-Biologists" workshop to more than a hundred college students in Pakistan over Zoom. So that was pretty awesome. Also, lately I've been teaching some in-person classes too, mostly at Fractal University in NYC, and I also recently organized a day-long, in-person Beginner Python class for people in my local area (Philly suburbs) who wanted to learn some basic programming. I'm having a lot of fun with this project, and am grateful to Scott and the grant funders for their generosity! 57: Conceptual Boundaries Workshop On AI Safety The workshop was completed successfully; you can read a writeup here. 58: Apart Research To Incubate AI Safety Scientists No update received, but they have a public website, and you can see their impact metrics here. They seem to be in urgent need of more funding. 59: Primer On How To Achieve Political Change No update received and I can’t find anything about this. 
60: Research IVF Clinic Success Rates We've built a predictive model that estimates the odds of having a child at different IVF clinics across the country while controlling for factors like patient age and infertility differences that can falsely make some clinics look better than others. We found that an average patient can increase their odds of having a kid by 43% just by going to a top 10% clinic. Patients unlucky enough to go to a bottom 10% clinic will reduce their odds of having a kid by 40%. Next month, we're adding several more clinics, 2023 data, additional procedural controls, and donor/gestational carrier models, which should push our accuracy beyond state-of-the-art models in this space and better isolate clinic impact on patient outcomes. We've launched ivf.clinic, a website where patients can access personalized IVF reports and browse our clinic rankings (though we're still squashing some bugs). Currently, we're expanding our research to include comprehensive insurance coverage and pricing data across clinics nationwide. If anyone has insights on automating the collection of IVF clinic pricing information, I'd love to hear from you at scelarek@gmail.com. 61: Replicate Study On Brain Wave Synchronization For Speeding Learning We have acquired and configured the OpenBCI UltraCortex Mark IV 8-channel EEG headset and a clinical-grade Biosemi 32-channel EEG system. We’ve implemented the required components for the experimental pipeline (computing alpha from EEG, flashing bright white light, presenting stimulus images). We are currently putting them together into a single system that we’ll use to collect the data from several participants. We are aiming to gather data on several participants in late June / early July and complete the pilot of the replication in July 2025. If you’d like to be a participant in the study, [they might announce a link once they have it]. 
62: Advocate Repeal Of Interstate Runaway Compact No update received and I can’t find anything about this. 63: Animal Welfare (Especially Fish) In Turkiye Future For Fish asks companies to sign up to FFF's fish welfare commitment, which requires producers to certify their facilities and enforce specific standards for stocking density and harvest. Luckyfish, İlknak, Divan (35 restaurants, 17 hotels) and NG Hotels (5 hotels) have signed and published FFF's fish welfare commitment with İlknak publishing the commitment on their website. Kılıç published its first sustainability report detailing fish welfare policies, including enforcing a maximum stocking density of 10 kg/m³ and confirmation of electrical stunning practices. Longer version with some caveats: https://manifund.org/projects/improving-fish-w From the longer document, these commitments involve things like reducing overcrowding, or stunning fish before killing them. Over 30 million fish were affected just from their single largest commitment, and they say 100 fish are helped per dollar spent. 64: More Georgism Advocacy Lars and Will used the 2021 grant to co-found ValueBase. Will remained with the company, and Lars left to do advocacy work at the Center For Land Economics. Here’s their summary of how things are going: [Our] organization transitioned leadership with Greg Miller, a former Program Analyst at the US Department of Housing and Urban Development, and Lars Doucet, author of Land is A Big Deal and Co-Founder of Valuebase, working full time and Joe Caissie stepping aside. This transition happened naturally as the next career transition for each respective person. Since then, progress has been made on pushing forward legislation. Maryland had two bills introduced to give Baltimore and counties the ability to enact split-rate taxes. One of the bills passed the state senate and would allow Baltimore to enact land value taxes within one mile of rail corridors–this contains 50% of Baltimore’s land value. 
However, the legislative session ended. We expect the bill to revive next session. The Center for Land Economics has been actively working to help efforts to get this bill past the line. At the same time, we have uncovered systematic undervaluing of vacant land in assessments. We are writing a report on the assessment issues in Maryland with actionable steps to resolve them.
Inline links: Codebuff, website, Substack, survives, a public Substack, in Rwanda, Growth Summit, Stanford Social Innovation Review, ProMarket, Global Prosperity Institute, Saharon, this page, eadbatteries.substack.com, here, here, a post profiling the Far Out Initiative here, https://tornyol.com, a public website, https://chemrxiv.org/engage/chemrxiv/article-details/67e4cde76dde43c9084d88b7, https://blueprintbiosecurity.org/introducing-project-air/, our way, connect them directly to me, www.Spartacus.app, X, Substack, here, https://www.goodancestors.org.au/ai-safety, https://www.australiansforaisafety.com.au/, https://www.aul.city/blog/reflecting-on-africa-urban-lab-s-inaugural-year-2024-highlights, https://healthlearn.org/blog/updated-impact-model, https://www.youtube.com/playlist?list=PLWVKUEZ25V97tNULapu07DhWv6_W4NfpE, here, public website, here, in urgent need, https://manifund.org/projects/improving-fish-w
Minnesota and Virginia also have legislation to enable cities to implement land value taxes. We are monitoring these efforts. There are a few other cities we are operating in. We have helped another organization prepare for a meeting in Tennessee by doing impact analysis of land value taxes in the city. We have presented to city officials in the City of South Bend who have expressed support for land value taxes. Finally, we are in conversation with a State Senator in Colorado who is a champion of land value taxes. Meanwhile, we have soft launched and developed the OpenAVMKit, which uses a unified schema to do assessment accuracy reports and automated valuation methods for any property tax data given. Valuation of land is the key binding constraint to successful implementation of land value taxes. We plan to be the leaders in this space with strong benchmarking capabilities and a repo that can enable the open-source community to make the best automated valuation methods. Along with these efforts, we have expanded the movement. We have posted to the Progress and Poverty Substack growing the subscriber base to around 5,000 subscribers. We have spoken to over 25 local advocates interested in working on land value taxes in their local communities. Yet, there is a long way to go. We need to start earning income through technical assistance contracts as our grant funding expires. We need to continue pushing for a state to implement, and we need to be prepared to tell the success story for when they do. 65: EN’s Work On Bacteriophage Therapy Our project is aimed at pioneering phage therapy in Nigeria, where limited resources/infrastructure have historically held back research in this field. Starting from the ground up, we are establishing the foundational systems needed to support a robust phage research ecosystem. So far, we’ve isolated 34 bacteriophages targeting Pseudomonas aeruginosa, an essential step toward building a comprehensive phage bank. 
This began with collecting a wide range of clinical Pseudomonas isolates, which we are now characterizing alongside the phages through genome sequencing and phenotypic assays including studies on phage stability across pH, temperature, and salinity ranges. Our long-term goal is to develop a phage-based hydrogel for treating diabetic wounds. On the regulatory front, we have secured approval from the Attorney General to register our nonprofit organization, the Centre for Phage Biology and Therapeutics. Additionally, we’re expanding into vaccine development; following a research stay in Prof. Roderick's lab at the University of Waterloo, we have initiated the design of a phage-based universal Salmonella vaccine aimed at covering all major serotypes—an urgent need underscored by Africa’s reliance on external vaccine sources during the COVID-19 pandemic. I have signed an MTA agreement with Roderick to use his phage-based vaccine platform patents to enable us to design vaccines against any common disease affecting us. This is only the beginning, but we are proud to be laying the scientific and institutional groundwork for homegrown phage innovation in Africa. Emergent Ventures funded EN before we did and deserves a lot of credit here also. 66: Create An Artificial Kidney For an implantable artificial kidney, the first essential component is a hemofilter designed to emulate the glomerulus. Critical requirements for this hemofilter include high permeability (to maximize flow for a given area), selectivity (specifically, the retention of albumin), and robust blood compatibility (ensuring sustained function over time). Our initial strategy focused on using negative surface charge to reduce fouling. I began by testing polyelectrolyte (PE) coatings on 24nm pore membranes featuring a negative terminal charge, similar to the glomerular barrier. 
These initial static tests, assessing platelet adsorption in whole blood, yielded positive outcomes for some polyelectrolytes, indicating potentially desirable blood compatibility. However, static test setups are not truly representative of dynamic in-vitro conditions and don't provide data on key parameters like permeability, fouling progression, or changes in membrane selectivity. To address these limitations, I designed and built a blood filtration setup. This system sustains human whole blood in circulation for 20 minutes, allowing us to analyze all the aforementioned parameters, as well as platelet activation markers. This has resulted in a fairly high-throughput system for evaluating any surface coating. I'm pleased to report this setup has been accepted for presentation at this year's European Society for Artificial Organs (ESAIO) conference. I am also currently working on a full manuscript, as I believe this system offers a viable way to partially replace animal experiments in our early-stage research, requiring only 1.2ml of human blood per run. Working with a PhD student (hired to support both this research and work on membrane substrates), we have continued testing these PE coatings, alongside PEG coatings, on our membranes. Here, we're finding that optimization of the coating layer is crucial. With the current PE coatings, we observe a permeability drop of about an order of magnitude compared to the base membrane, making them unsuitable for an implantable device in their present form. This is likely due to the specific nature of the initial PE layer, which we can modify. We also suspect there may be ingress of PE into the pores, meaning we're not achieving just a surface coating (our goal), but rather a very thick coating, which would explain the flux loss. Optimizing the coating process to control penetration depth is now a primary focus of my ongoing work. 
I am currently aiming for a flux of 20ul/min (as this is the cap introduced by the protein gel layer anyway), held at this 'steady state' permeability without any further drop. I am also imaging the membranes with SEM after blood contact to see if there is indeed any platelet adsorption etc. Tugrul has the dubious honor of maybe being "the only person to climb a 4000m peak with severe kidney failure". To raise money and awareness for his artificial kidney project, he is running Climb Against Time, where he will climb 41 mountains over 4000m (13000 ft) this summer. He is looking for donors and climbing partners. 67: Add Tardigrade Genes To Human Cells The goal of this one was to make hybrid cells that are more resilient for research and certain medical applications. They report: The grant was to synthesize vectors for the expression of humanized tardigrade proteins that can be targeted to different areas of the cell. All the vectors were designed, generated, and transposed into human cells. The proteins all localize successfully (e.g. they match the designed target), with one exception (we are still working on validating it). We've done some stress testing with the transgenic cells, but haven't reached firm conclusions yet. We've further generated some multigene designs and have not yet transposed them into cells, but should shortly. We're hoping to submit a manuscript on the first round later this year. 68: Teach Forecasting To EU Policy-Makers The original project didn't work out, but our grantee (who still prefers to remain anonymous) is now working with an EU think tank pursuing the same agenda, and has been teaching forecasting workshops to policy-makers for the past two months. 69: Platform For Single-Cell Imaging They ended up unable to accept this grant and returned the money. 70: Open Source Polygenic Predictor For EA/IQ They have an update here.
They think they have a predictor that can explain 12% of variance in intelligence, and they’re working on validating it and creating an easy-to-use website. 71: Improve Flu Vaccines The grant mainly funded agent-based modelling to demonstrate the benefit of pre-existing immunity to pandemic influenza if and when a future pandemic occurs (academic publication will result). The original proposal was to attempt to influence the WHO influenza strain selection process. After attending WHO meetings and a global influenza conference, I believe this is not feasible. Stakeholder feedback was that the potential short-term negative effect on vaccine hesitancy is believed to outweigh the less tangible future benefit. Given the conservative nature of decision makers, pandemic vaccines are likely to remain research only. There are still green shoots of research into pandemic preparedness/prevention that I am continuing to work on. I'm working under the "Australians for Pandemic Prevention" brand of Good Ancestors, another group that ACX funded in 2024. 72: Scenario Analysis For Developing World Agricultural Programs In addition to the research and analysis funded by the grant, I’ve learned to code with LLMs and have built an MVP of the project. The app is being considered for further development by staff at a large international organization. 73: Further C’s Political Career C’s political career is going well, but he continues to think it wouldn’t be strategic to give more information publicly at this time. Lessons Learned I'm most impressed with our lobbying/advocacy organizations. In particular, Good Ancestors has gotten the Australian government to sign onto an international AI safety declaration, partner with various x-risk-related organizations, and (possibly) extend charity tax deductions to some EA causes that previously didn't have them - I think this on its own goes a substantial way to paying back the cost of all ACX Grants.
Coalition to Modify NOTA has a kidney donation bill in front of Congress that the (very illiquid) prediction markets give a 45% chance of passing; if it works, it could save thousands of lives. The Georgists are partly responsible for bills making land value taxes slightly easier to implement in a handful of states. Good Science Project seems to have significantly improved science. Are lobbying organizations a better bet than other types of nonprofit (within the constraints of ACX Grants)? I'm not sure. It could just be that lobbyists are (naturally) better at playing themselves up and sounding successful than (for example) scientists, or that politicians are good at people-pleasing and make people feel heard and encouraged in a way that might not change overall policy later. Also, I recently talked to some grantmakers who funded a lobbying organization that superficially seems excellent, but they expressed concern it was net negative (!) by taking away oxygen and spotlight from potentially more effective orgs. So I am encouraged but wary. Animal welfare organizations were another standout success. Again, I don't know how to think about this - while I think our grantees were exceptional, there's also an issue where the scale of animal welfare challenges is so great, and work on them so neglected, that lots of organizations can save a million chickens here, or a million fish there, without particularly making a splash. On the one hand, this is exactly what effective altruism should be doing - exploring grants that are very high in linear utility even if they don't feel satisfying. On the other, they're unsatisfying - and also hard to assess retroactively. How many chickens should a good animal welfare grant save? Any realistic number will both be overwhelmingly large in absolute terms and far too small in relative terms. I'm most ambivalent about our science grants. 
Many of them say they are successful and can point to published papers which explain the science they did. But it's hard to judge whether anything useful has changed based on the science getting done. I know it's important to fund basic research and not just last-mile technology startups, but it's hard for a mini-grants program like this one to evaluate these kinds of abstract interventions. One disappointing result was that grants to legibly-credentialled people operating in high-status ways usually did better than betting on small scrappy startups (whether companies or nonprofits). For example, Innovate Animal Ag was in many ways overdetermined as a grantee - former Yale grad and Google engineer founder, profiled in NYT, already funded by Open Philanthropy - and they in fact did amazing work. On the other hand, there were a lot of promising ACX community members with interesting ideas who were going to turn them into startups any day now, but who ended up kind of floundering (although this also describes Manifold, one of our standout successes). One thing I still don't understand is that Innovate Animal Ag seemed to genuinely need more funding despite being legibly great and high status - does this screen off a theoretical objection that they don't provide ACX Grants with as much counterfactual impact? Am I really just mad that it would be boring to give too many grants to obviously-good things that even a moron could spot as promising? Someone (I think it might be Paul Graham) once said that they were always surprised how quickly destined-to-be-successful startup founders responded to emails - sometimes within a single-digit number of minutes regardless of time of day. I used to think of this as mysterious - some sort of psychological trait? Working with these grants has made me think of it as just a straightforward fact of life: some people operate an order of magnitude faster than others.
The Manifold team created something like five different novel institutions in the amount of time it's taken some other grantees to figure out a business plan; I particularly remember one time when I needed something, sent out a request to talk about it with two or three different teams, and the Manifold team had fully created the thing and were pestering me to launch a trial version before some of the other people had even gotten back to me. I take no pleasure in reporting this - I sometimes take a week or two to answer emails, and all of the predictions about my personality that this implies would be correct - but it's increasingly something that I look for and respect. A lot of the most successful grants succeeded quickly, or at least were quick to get on a promising track. Since everything takes ten times longer than people expect, only someone who moves ten times faster than people expect can get things done in a reasonable amount of time. In almost every case where I thought to myself “this is a cool idea, but I don’t know how it’s going to really pay off, as opposed to reaching a cool intermediate accomplishment and then stagnating”, this was a correct criticism, and I should have taken it more seriously. But I can’t rule out that these were good in vague and hard-to-measure ways that I should take more seriously. This one is really self-serving, but in general when people were good communicators (or even bloggers) and wowed me with the writing-composition of their application, they turned out to be a good bet. And when people were hard to understand and annoying to communicate with, even if their ideas seemed good, they were less likely to pan out. Overall Thoughts The total cost of ACX Grants, both rounds, was about $3 million. Do these outcomes represent a successful use of that amount of money? Very naively, startups originating from ACX Grants have about $50 million in value1. 
If ACX Grants is equivalent to a pre-seed funder, and pre-seed funders usually get ~5%, then if we were VCs we would have a portfolio worth $2.5 million. About 1/5 of ACX Grants were attempting to be market-valued startups, so if we assume the charitable portion did about as well as the startup portion, then the charity portion is “worth” $10 million. There’s some reason to expect this is too high, since much of the startup value came from one successful outlier. But there’s another reason to expect this is too low, since we were aiming at charity rather than market cap, and any actual market cap that our grantees got was an unexpected side effect. I’m treating this as a sanity check rather than as a real number. It’s harder to produce Inside View estimates, because so many of the projects either produce vague deliverables (eg a white paper that might guide future action) or intermediate results only (eg getting a government to pass AI safety regulations is good, but can’t be considered an end result unless those regulations prevent the AI apocalypse). Because we tend towards incubating charities and funding research (rather than last-mile causes like buying bednets), achieved measurable deliverables are thin on the ground. But here are things that ACX grantees have already accomplished: Improved the living/slaughter conditions of 30 million fish.
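The sanity-check valuation in the passage above ($50 million of startup value, a ~5% pre-seed stake, and about 1/5 of grants being startups) can be reproduced as a short calculation. This is only a sketch of that arithmetic: the function and field names are hypothetical, and the combined total is derived here for illustration rather than stated in the text.

```python
from fractions import Fraction
from typing import NamedTuple


class SanityCheck(NamedTuple):
    portfolio: Fraction           # implied VC-style stake in the startups
    charity_equivalent: Fraction  # "worth" of the charitable portion
    total_equivalent: Fraction    # both portions combined (derived, not in the text)


def sanity_check(startup_value: Fraction,
                 preseed_stake: Fraction,
                 startup_share: Fraction) -> SanityCheck:
    """Reproduce the naive valuation using exact rational arithmetic."""
    # A pre-seed funder holding `preseed_stake` of `startup_value`.
    portfolio = startup_value * preseed_stake
    # Assumption from the text: the charitable grants did about as well,
    # per grant, as the startup grants. Scale by the ratio of the two shares.
    charity = portfolio * (1 - startup_share) / startup_share
    return SanityCheck(portfolio, charity, portfolio + charity)


result = sanity_check(
    startup_value=Fraction(50_000_000),  # "about $50 million in value"
    preseed_stake=Fraction(5, 100),      # "pre-seed funders usually get ~5%"
    startup_share=Fraction(1, 5),        # "about 1/5 of ACX Grants"
)

print(f"portfolio:          ${int(result.portfolio):,}")
print(f"charity equivalent: ${int(result.charity_equivalent):,}")
print(f"combined:           ${int(result.total_equivalent):,}")
```

The first two figures match the $2.5 million and $10 million in the text. As the text itself stresses, this is a sanity check rather than a real number, since one outlier drives much of the startup value.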
Basilica: And the artillery! Imagine, Arundel, that you hear that Google has just offered a $1 billion a year salary to a new employee, a young woman from a small tribe in Africa who was illiterate until the age of fifteen.
Anti-amyloid drugs (like Aduhelm) don't reverse the disease, and only slow progression a relatively small amount. Opponents call the amyloid hypothesis zombie science, propped up only by pharmaceutical companies hoping to sell off a few more anti-amyloid me-too drugs before it collapses. Meanwhile, mainstream scientists . . . continue to believe it without really offering any public defense. Scott was so surprised by the size of the gap between official and unofficial opinion that he asked if someone from the orthodox camp would speak out in its favor. I am David Schneider-Joseph, an engineer formerly with SpaceX and Google, now working in AI safety. Alzheimer’s isn’t my field, but I got very interested in it, spent six months studying the literature, and came away believing the amyloid hypothesis was basically completely solid. I thought I’d share that understanding with current skeptics. The ATN model The most plausible variant of the amyloid hypothesis is the A → T → N model: amyloid causes tau causes neurodegeneration. 1: Amyloid The common entrypoint, typically at least 15 years before clinically detectable symptoms [1], is accumulation of amyloid-β deposits (especially Aβ42, one of several variants). Amyloid-β is a peptide produced in healthy human beings and many other animals, probably for antimicrobial purposes [2, 3]. Factors which cause overproduction of amyloid also cause Alzheimer’s. Factors that cause decreased clearance of amyloid also cause Alzheimer’s. The clearest relationship is various genes which massively increase amyloid production (while doing nothing else); these genes are Alzheimer’s risk factors, with some of the rarer and more severe ones causing extreme versions of the disease that manifest at otherwise almost-never-seen ages. One of the clearest examples is Down syndrome, which is caused by three (rather than the usual two) copies of chromosome 21. 
People with Down syndrome are at much higher risk of Alzheimer’s than the general population: two-thirds will have the condition by age sixty, and 15% have it by age forty. APP, the gene for the amyloid precursor protein, is on chromosome 21. This means that people with Down syndrome will have an extra copy. This extra copy has been observed to lead to higher-than-normal amyloid levels. But there are many genes on chromosome 21; do we have additional evidence that it’s the amyloid one that’s involved? Yes. Dozens of other mutations on APP cause the same sort of extremely young and severe Alzheimer’s. So do mutations on PSEN1 and 2, the genes for the enzyme that processes amyloid precursor protein into amyloid. So do mutations on several other amyloid-related genes. [6, 91 - 96] Researchers call these autosomal-dominant Alzheimer’s, meaning Alzheimer’s cases that get inherited from a single parent in a simple fashion typical of single-gene disorders. They make up about 1% of all cases, and are our strongest evidence for the causal role of amyloid in the disorder. To my knowledge, there is no serious claim that these genes could be working through any pathway other than their shared role in the amyloid system. But these autosomal-dominant cases only make up about 1% of all Alzheimer’s patients. Might they be a different disease than the usual sporadic Alzheimer’s that strikes people without strong family histories at normal ages? Probably not: the presentation and trajectory of autosomal-dominant and sporadic Alzheimer’s cases are strikingly similar. 
Both show an initial appearance of amyloid pathology starting in intrinsic connectivity networks in both autosomal-dominant [14] and sporadic [15–18] types, cortical tau appearing first in the medial temporal lobe and with the exact same fold in both disease types [97] (despite human tauopathies having at least seven other possible characteristic folds [36]), that tau pathology worsening and spreading outside this region only once amyloid pathology reaches sufficient severity [65], neurodegeneration progressing closely in step with the tau pathology, and the same usual approximate trajectory of cognitive symptoms due to the sequence of affected regions. So it’s as if two bank robberies occurred hours apart, in the same town, and in a highly similar and idiosyncratic manner, and we can positively identify the culprit of one on security camera footage. It’s a good bet the culprit of the other is the same. Increased amyloid production → Alzheimer’s is an especially clear and simple pathway, but any other change in amyloid can also cause the disease. For example Overproduction or reduced clearance of amyloid due to impaired slow wave sleep. Aβ production is neuronal activity-dependent, and toxins (perhaps including Aβ) are cleared from the brain during sleep via the glymphatic system. Thus Aβ can accumulate if the brain is more active and/or has less opportunity for clearance. [7, 8, 9, 10, 11]
Inline links: Aduhelm, David Schneider-Joseph
Overproduction or reduced clearance due to microbial infection. Amyloid-β appears to be an antimicrobial peptide and will form plaques in response to infection. [2, 3] This explains various observations that have been used to support the “infectious hypothesis”, sometimes proposed as an alternative to the amyloid hypothesis. However, it can only explain a subset of cases and, as I argue below, is even then still mediated by amyloid via an “IATN” pathway: infection → amyloid → tau → neurodegeneration. In cases of increased production, cerebrospinal fluid (CSF) will show elevated amyloid. In cases of reduced clearance, amyloid will decrease in CSF. In all cases, however, PET scans will show elevated brain amyloid, usually at first mainly in “intrinsic connectivity networks” such as the default mode network [14–20], which experience brain activity even at rest. These neurons are the most active - which causes more production and possibly less opportunity for clearance - so they tend to be the first to suffer from a production/clearance imbalance. Over time, amyloid pathology spreads spatially throughout the brain. [14, 18] Aggregations of amyloid peptides induce more such aggregations. Some of our clearest evidence for this comes from growth hormone deficiency patients, who used to have cadaver-derived ground-up brain matter injected into their own brains to provide the missing hormones. If the ground-up brain matter was sourced from the corpse of an Alzheimer’s patient, the growth hormone deficiency patients would themselves develop Alzheimer’s at a young age, probably through prion-like spread of the misfolded proteins. [21, 22] After ∼15 years of preclinical spread, the pathology eventually covers the whole brain. [14, 18] While some subtle cognitive impairment may occur during this time, it is usually not severe enough to be clinically detectable from amyloid alone. 
Indeed, in both humans [23–30] and mice [31–35], the severity of neurodegeneration and cognitive deficits is not a good spatiotemporal match for the severity of amyloid pathology (rather, it is a good match for the severity of tau pathology; see next section for more). These facts are often suggested as evidence against the amyloid hypothesis. However, amyloid is causally upstream of tau, as I will argue below. Therefore, the existence of cognitively normal individuals with amyloid pathology is expected in the ATN model - but typically only for a few decades, before progression to the next stage.

2: Tau pathology (T) and neurodegeneration (N)

Tauopathies are a range of prion-like diseases involving the tau protein [36], whose usual function is to assist in stabilizing microtubule structure. In a tauopathy, the tau protein misfolds and induces other, nearby tau proteins to misfold into the same shape. [37–46] Injecting nothing but misfolded tau fibrils into a mouse brain can recruit the endogenously-produced mouse tau into this pathology, which spreads far beyond the injection site, causing neurodegeneration wherever it goes. [35, 47–59] There are at least eight distinct ways the tau protein can misfold in human disease [36], and over a dozen distinct human tauopathies, each involving a specific one of those misfoldings. These include chronic traumatic encephalopathy, Pick’s disease, corticobasal degeneration, progressive supranuclear palsy, and Alzheimer’s disease, with the last by far the most common. Each of these five diseases has its own distinct tau fold. Most normal human beings eventually develop some tau pathology in adulthood, originating probably in the locus coeruleus [60–62], which is part of the brainstem. By middle age, some amount has usually spread to the hippocampus and entorhinal cortex in the medial temporal lobe, regions responsible for episodic memory. 
This is called primary age-related tauopathy (PART) [63], and has its own tau fold which is distinct from most tauopathies, but the same as Alzheimer’s. [36, 64] Usually, its local severity is mild and it doesn’t spread much beyond those regions. But with sufficient amyloid pathology, this “normal” tau pathology tends to both locally worsen and spread through the rest of the brain [65], becoming the tau pathology of Alzheimer’s. Some genetic risk factors such as ApoE, in addition to affecting the clearance of amyloid-β, also increase the brain’s susceptibility to this A → T pathology conversion [66, 67]. But this is a matter of degree, as sufficient amyloid pathology seems to virtually guarantee the transition: Every 10-centiloid increase in amyloid pathology for a cognitively normal individual increases by 2.7x the probability of a PET scan detecting pathological levels of tau within five years [68]. The only known cases where patients with extremely high amyloid levels can go significant amounts of time without developing tau pathology are a few individuals with extremely rare protective genes, known only from a few case studies, e.g. [69]. Even in these instances, the individuals will eventually succumb to the tau phase, suffering neural atrophy and dementia. [70] After it forms, the tau pathology no longer appears to require amyloid’s assistance to keep spreading (although amyloid may still accelerate it). This probably explains why existing anti-amyloid therapies have been only ∼30% effective in test patients, who are usually late in the amyloid → tau progression even if early in having symptomatic disease. Neurodegeneration follows tau pathology extremely closely in time and space, in humans as well as basically all animal models, and cognitive impairments match the functions of the affected regions. 
There are rare reports of advanced tau pathology without cognitive decline, often in people with protective ApoE2 alleles [71], but even then, systematic analysis finds that actual density of tau inclusions is highly predictive of cognitive impairment, and that these exceptional cases usually involve widespread but locally sparse pathology [66]. The regional distribution of tau pathology explains why the first symptom of Alzheimer’s is typically impaired memory; the first cortical sites affected are usually in regions involved in memory formation. As the pathology spreads, further regions are affected, until eventually all cognitive functions are affected. As with most other aspects of the disease, the high-level picture seems relatively clear but the exact cellular and molecular pathways are not well understood (though they may involve an assist from the innate immune system, especially microglia and astrocytes). [13, 35, 72] Early Alzheimer mouse models were amyloid-only, with extremely heavy overproduction of Aβ, much more than required to recapitulate the human disease, and apparently enough to cause detectable cognitive dysfunction. However, normal mice do not get age-related tauopathy, so an amyloid-only mouse model - while useful for investigating certain questions - is not a full Alzheimer’s disease model. Combined amyloid+tau pathology mouse models, which are transgenically modified and/or injected with misfolded human tau fibrils, display the property that the presence of amyloid pathology induces the worsening and spreading of tau pathology. This is also observed in vitro in human cells.

How do we know the amyloid causes the tau?

Researchers have measured the correlation in many ways, from the spatiotemporal timeline (tau pathology only begins locally worsening and spreading outside the medial temporal lobe once amyloid reaches sufficient severity) [65, 98], to causal mediation modeling in the human disease [26, 99–101], to causal intervention using in vitro human cell studies [54, 102] and animal models [35, 55, 103–113]. In addition, giving people drugs that reduce amyloid levels decreases tau pathology. [78, 80, 82] (I’ve left out or merely alluded to much other complexity, involving the innate immune system, lipid processing, and detailed molecular and cellular mechanisms, preferring to focus on the parts of the story which are crucial to deciding the causal role of amyloid, and for which I am aware of a satisfactory account from the literature. But I don’t intend to leave the impression that the above is all there is to Alzheimer’s disease, or that all cases progress in the same exact way.)

The mechanistic claims

I make the following two claims about amyloid-β’s role in Alzheimer’s: Amyloid deposits are a necessary (i.e. but-for) cause in all instances of Alzheimer dementia. That is, if someone has PET or CSF positivity for amyloid and tau pathologies, and the tau pathology involves the Alzheimer tau fold and made its first cortical appearance in the medial temporal lobe, and then they developed medial temporal volume loss + amnestic mild cognitive impairment + later dementia, then counterfactually, early enough (probably ∼15 years before clinical presentation) causal intervention solely to remove the amyloid deposits would have prevented almost all tau pathology and symptoms.
Note: percentages are of total, not of each row! 29: Related: social science team proposes a three-stage model of secularization: decreased public ritual participation → decreased personal importance → decreased identification, presents apparently confirmatory data. If true, would be somewhat inconsistent with intellectual models (eg people learn about evolution and start doubting the Bible) and more consistent with institutional models (eg the government provides welfare so people no longer need to be part of a tight-knit church). 30: Navigating LLMs’ spiky intelligence profile is a constant source of delight; in any given area, it seems like almost a random draw whether they will be completely transformative or totally useless. Now Ethan Strauss reports that they are, for some reason, extraordinarily effective at teaching people golf. “I am predicting the Golf Revolution, or perhaps decline, if your perspective is that optimization tends to ruin hobbies. A sport for obsessives has been gifted the ideal tool for refinement.” 31: Claim (via nxthompson on X): “In a huge survey of young kids about phones and technology, they all say they want to be out playing in the real world. But parents don't let them out unsupervised. So they're stuck on their phones.” Interesting, but I’m nervous about social desirability bias - how many adults would say on a survey that they would rather be on their phones than playing with friends? But adults do have this choice and mostly go with the phones. 32: Steven Adler on AI psychosis. He tries to analyze ER admissions data for psychosis and finds no change. I don’t think anyone reasonable expected this to be a large enough effect to show up in ER admissions data, but there are lots of unreasonable people so I appreciate his effort. He thinks AI companies might have better data on this, and encourages them to release it. 33: Cuartetera was the greatest polo horse ever. 
Polo players responded in a very practical way: they cloned her, dozens of times (and it worked; the clones are also excellent). Now there is a lawsuit as different polo teams fight to get their hands on Cuartetera clones. What is the equilibrium? If the outsiders get their hands on the genetic material, do we see a world where every polo horse is a Cuartetera clone? How much is lost if nobody ever tries to breed a polo horse better than Cuartetera (since the economics might not check out if the odds of success for any given foal is too low)? H/T Gwern and Siberian Fox (on X). 34: Claim: as of 2013, India’s Agarwal caste, who make up less than 1% of the population, got 40% of the e-commerce funding. 35: Owlposting: What Happened To Pathology AI Companies? Pathology is a medical specialty. A typical task involves looking at a microscope slide full of cells and trying to determine if any of them are cancerous. This seems like a good match for AI - and for years, studies have been showing that in fact AI can equal human experts. So why isn’t it being used more? The author’s three answers: first, slide scanning is expensive and clunky, and you can’t apply AI to a slide until you digitize it. Second, it’s hard to figure out a business plan where this saves someone money and doesn’t step on the toes of big companies that can outcompete anyone they don’t like. Third, pathologists use the context of a patient’s entire clinical history when they interpret a slide, and AIs that can’t do that (either because of technical limitations or legal/privacy limitations) are at a disadvantage even if their skills specifically relating to slide-reading are better. 36: Noahpinion: Will Data Centers Crash The Economy? Suppose that AI is a bubble, either permanently (because the technology isn’t really transformative) or temporarily (because it can’t transform things quickly enough to keep up with all the dumb money pouring into it). 
Will the sudden write-off of data centers lead to a broader economic collapse? In 2001, the dot-com bubble harmed the tech sector, but didn’t take the rest of the economy down with it; in 2008, the subprime mortgage bubble did take the rest of the economy down with it, because it damaged banks that the whole economy relied on. The optimistic case for AI is that data center spending is mostly coming from big companies like Google and Meta that can absorb a lot of loss. The pessimistic case is that some of the money is coming from private credit, a new-ish form of finance which hasn’t really been stress-tested and whose failure modes are still poorly understood. Noah’s final verdict: the stage isn’t obviously set for a crisis yet, but there’s the potential to get there and we should consider acting (how?) early. 37: The latest Twitter talking point is that universal hepatitis B vaccination at birth is “woke”: Hep B is (aside from mother-to-child transmission) often sexually transmitted, slutty women’s children are more likely to have Hep B, so perhaps giving the vaccine to everyone (instead of testing and only giving to the children of women who test positive) is an attempt to spare slutty women the embarrassment of getting a positive test. Ruxandra Teslo provides the counterargument - Hep B tests take a while, the medical system is fragmented, and any attempt to test people and then give the vaccine inevitably leads to many positive tests falling through the cracks. Vaccinating at birth is easy and hard to screw up, the vaccine has no known side effects, and empirically child Hepatitis B rates go down (by as much as 2/3!) when countries switch from test-and-vaccinate to universal vaccination. This benefits everyone - even people who never have unprotected sex and always follow up on their medical tests - because toddlers in daycare exchange saliva copiously, and if your toddler exchanges saliva with a Hep B positive toddler they could get the disease. 
A funny Twitter interaction was seeing Republicans in Congress hop on the anti-slut anti-vaccination bandwagon - except for Senator Bill Cassidy (R-Louisiana), who happens to be a liver doctor, and who is still fighting the good fight. I am always nervous when a good person who I like starts engaging on Twitter, since it elevates the discourse there but also gradually turns their brain into mush - but Ruxandra has made the leap and is doing a great job not just on bio related topics but also (for example) countering Curtis Yarvin on the history of her native Romania. 38: The response to GPT-5 was confusing; most specific people who reviewed it said they were impressed (Ethan Mollick, Tyler Cowen, Nabeel Qureshi, Taelin), it performed as expected on formal benchmarks, but the overall vibes declared it a big failure. Peter Wildeford speculated that maybe there was some kind of sinister pay-to-play early access bias involved. Zvi went the other way, calling it a “reverse DeepSeek moment” (insofar as DeepSeek was a pretty average model that got glowing praise.) In the end, I agree with Peter that this was mostly a branding issue. o3 was a genuinely revolutionary model; if OpenAI had called it “GPT-5”, it would have met expectations. Instead, they called it “o3”, and called a minor incremental update a few months later “GPT-5”. Then people got mad that the exciting-sounding “GPT-5” was merely an incremental update. A secondary issue was that the router wasn’t very good, and so many queries got routed to a small version without thinking mode that was if anything a downgrade from o3. I think this tweet by Shakeel perfectly encapsulates the essence of GPT discourse in two sentences: …but maybe it’s worth asking why GPT-5 isn’t bigger than o3. Was 4.5 a failed attempt at scaling? Did it fail in a way that sort of back-handedly justifies the “lost steam” take? Does the answer depend on distinctions between pre-training scaling, post-training scaling, etc? How? 
39: This month in etymology: did you know that “oy vey” is a “fully Germanic phrase” which is cognate with English “oh woe!” (h/t Wylfcen on X) 40: mRNA shows promise to be a game-changing treatment for cancer, but RFK is trying to halt research. But so far he can only starve it of money, not ban it, and the funding gap is only $500 million. Will there be enough philanthropic billionaires and private foundations to step up? Zvi points out that although there is usually a game of chicken where foundations are hesitant to touch something the government cancelled lest the government decide it can cancel everything and hope philanthropists pick up the bill, in this case there are no game theory considerations - RFK is halting it because he genuinely wants it halted, and they are thwarting him rather than playing into his hands. The only problem is that $500M is a lot of money for the private sector; a few foundations could technically afford it, but not many could afford it comfortably and still have money left over for the next few crises of this magnitude. I hope someone is trying to organize a coalition. 41: AI fantasy flash fiction Turing test. Eight stories about demons, four by famous fantasy authors, four by ChatGPT. After 3000 votes, AI wins: humans can't tell the difference and slightly prefer the AI stories. My own score was only 75%. But I will say that I thought Mark Lawrence's was obviously the best, I was ~100% sure it was human, and it convinced me that regardless of the official results it's still possible to write flash fiction that an AI obviously can't do. 42: “SignPro” offers customized “In This House We Believe” signs, try not to use this for evil. 43: China think tank assessment of how in control Xi is: still very in control, maybe not infinitely in control. 
44: Related - did you know (h/t xlr8harder) that if you ask AI to write a science fiction story, it will very often name the protagonist “Elara Voss” (or some very close variant like Elena Voss), and this remains true across various models and versions? Related: Chelsea Voss of OpenAI is having a baby and has the opportunity to do the funniest thing. 45: “Hector (cloud) is a cumulonimbus thundercloud cluster that forms regularly nearly every afternoon on the Tiwi Islands in the Northern Territory of Australia…[he is sometimes called] Hector the Convector”. 46: British allergy sufferers who want to know the ingredients of things demand that British cosmetics stop listing their ingredients in Latin. “For example, sweet almond oil is Prunus Amygdalus Dulcis, peanut oil is Arachis Hypogaea, and wheat germ extract is Triticum Vulgare.” 47: Text-based RPG about being an NYT journalist at the Manifest prediction market conference. I make a brief appearance. 48: Study uses supposedly-random variation in doctor assignments to test whether the marginal mental health commitment is good or bad for patients, finds that it is quite bad. Freddie de Boer is violently skeptical (maybe literally so?) and makes some good points about how a single quasi-experimental study is never absolute proof. But I don’t think he quite justifies his opinion that the paper was irresponsible and should never have been published; it’s just a normal quasi-experimental study that we should nod and say “huh” at but not overweight as the culmination of all possible research that overcomes all possible priors. 
My prior is that the marginal commitment is pretty useless (many commitments are just “well, since this person arrived at our ED for some reason, it would look bad from a medico-legal perspective to just let them go, so let’s keep them a few days to evaluate” - and yeah, you should be upset about this) but I’m still surprised by how many outright negative (as opposed to zero) effects the researchers found. The strongest argument for negative effects is that it will make some people miss work and maybe lose their job. But this study found that commitment ~doubles the risk of near-term suicide (admittedly only from 1% to 2%), which would have been outside my confidence intervals for how bad it could be. I suspect confounding, but only on general principle, and I wouldn’t be too surprised either way. 49: This tweet is probably bait, but I found it a thought-provoking question: I think there’s a boring answer, where the law is more complex than just a single number and whatever kind of weird trafficking Epstein was doing is worse than whatever normal relationships these European laws are permitting. But assuming that there’s a substantive difference even after taking that into account, I think my answer is something like - we’ve got to divide kids from adults at some age, there’s a range of reasonable possible ages, we shouldn’t be too mad at other societies that choose different dividing lines within that range - but having decided upon the age, we’ve got to stick with it and take it seriously (in the sense of penalizing/shaming people who break it). This is more culturally relativist than I expected to find myself being, so good job to Richard for highlighting the apparent paradox. 50: Dilan Esper describes his experience as one of Hulk Hogan’s attorneys in the Gawker lawsuit (X). 
Parts I found interesting: none of the lawyers knew Thiel was funding the lawsuit; Gawker probably could have won if they had been slightly competent but kept "shooting themselves in the foot"; and Gawker probably could have won if they had just pixelated the private parts in the video. 51: Amazing concept and poems (link on X): I tried to see if AI could do this, and it did something that technically met the requirements but had zero artistic merit - using a lot of words like “nowhere” and “outside” in one, then separating them out to “no where” and “out side” in the other. I didn’t invest much energy in creating a clever prompt telling it not to do that, so feel free to report if you get better success. 52: New study claims consultants are actually good, at least for profits: "We find positive effects on labor productivity of 3.6% over five years, driven by modest employment reductions alongside stable or growing revenue" 53: A Polish team tries to test Peter Turchin’s equations for predicting political unrest on recent Polish history, has to make some changes but claims mostly positive results. 54: New big multi-author Substack, The Argument, trying to be a sort of center-left version of the model pioneered by The Free Press and other high-production-value ideological Substack properties. Excited to see Kelsey Piper is involved, and she starts off strong with a post on the latest round of First World basic income studies, which find few positive effects. This is surprising, because recipients didn’t waste the money on alcohol or gambling or anything - they paid down debt and got useful goods. Still, it didn’t even affect things that should have been obvious, like stress level. It’s not even clear that amounts of money large enough to help with rent made homeless people more likely to get houses! 
Matt Bruenig criticizes the article, accusing Kelsey’s studies of being downstream of Perry Preschool style dreams that exactly the right welfare program will have massively compounding effects that cut poverty out at the root and turn everyone into elite human capital; he thinks giving people money won’t do this, but it will increase equality and give the poor better lives. I assume he’s not a strong hereditarian, but his argument makes even more sense from that perspective, and I’ve certainly criticized dumb outcome measures like infant brain waves which we have only tenuous reasons to think are related to anything we care about. But Kelsey reasonably responds that the outcome measures she’s talking about include stress level and life satisfaction. To defuse this critique, Bruenig either has to argue that our construct “life satisfaction” doesn’t really measure whether someone’s life is satisfactory, or else claim that giving poor people satisfactory lives isn’t really what we’re going for - which I think would require more explanation on his part. There’s some further (impressively acrimonious) debate on X, but I don’t see anything that addresses my core concern. GiveDirectly, a charity involved in basic income experiments, has a presponse here; they say that some studies are positive, and that the ones that aren’t might have tried too little cash to matter, or been confounded by COVID making everything worse. They also point out that basic income is harder to study than traditional programs like giving people housing, because if you’re giving housing you can measure housing-related outcomes directly and have a pretty good chance of getting enough statistical power to find them, but since everyone spends cash on different things, the positive effects might be scattered across many different outcomes (and therefore too small to reach significance on each). 
Everyone involved in this debate wants to emphasize that the poor results are for First World studies only, and that studies continue to show large benefits to giving cash in the developing world. 55: Related: I was less impressed by The Argument’s first foray into housing policy, which follows an all-too-familiar pattern: Some people say they don’t like noise and disorder and try to make rules against it in their apartments.
Inline links: proposes a three-stage model of secularization, extraordinarily effective at teaching people golf, nxthompson on X, a huge survey, Steven Adler on AI psychosis, they cloned her, dozens of times, a lawsuit, Gwern, on X, got 40% of the e-commerce funding, What Happened To Pathology AI Companies?, Will Data Centers Crash The Economy?, Ruxandra Teslo provides the counterargument, and who is still fighting the good fight, countering Curtis Yarvin on the history of her native Romania, Ethan Mollick, Tyler Cowen, Nabeel Qureshi, Taelin, on formal benchmarks, speculated, a “reverse DeepSeek moment”, with Peter, this tweet by Shakeel, https://substackcdn.com/image/fetch/$s_!GJNZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ba0d8cf-fab8-4370-bcad-df789e157fdc_591x402.png, Wylfcen on X, Zvi points out that, AI fantasy flash fiction Turing test, customized “In This House We Believe” signs, China think tank assessment of how in control Xi is, xlr8harder, Chelsea Voss of OpenAI is having a baby, Hector (cloud), demand that British cosmetics stop listing their ingredients in Latin, Text-based RPG about being an NYT journalist at the Manifest prediction market conference, finds that it is quite bad, violently skeptical, literally so?, This tweet, https://substackcdn.com/image/fetch/$s_!S9fU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa558c09b-7fb6-40a8-a8a0-27b658a2c876_576x687.png, describes his experience as one of Hulk Hogan’s attorneys in the Gawker lawsuit (X), link on X, https://substackcdn.com/image/fetch/$s_!zyh7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75e9f0f6-d794-4ea2-b24b-5d4803bf28dc_590x478.png, New study claims consultants are actually good, tries to test Peter Turchin’s equations for predicting political unrest on recent Polish history, The 
Argument, a post on the latest round of First World basic income studies, criticizes the article, infant brain waves, debate on X, has a presponse here, first foray into housing policy
Compute: America is far ahead. We have better chips (thanks, NVIDIA) and can produce many more of them (thanks, TSMC). Our recent capex boom, where companies like Google and Microsoft spend hundreds of billions of dollars on data centers, has no Chinese equivalent. By the simplest measure - total FLOPs on each side - we have 10x as much compute as China, and our advantage is growing every day. A 10x compute advantage corresponds to about a 1-2 year time advantage, or a 0.5-1 generation advantage (eg GPT-4 to GPT-5).
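One way to sanity-check the conversion from a compute ratio to a time lead is to assume that the compute available to the trailing side grows by a constant multiplicative factor each year. That assumption is a simplification introduced here for illustration, not something the passage states, and the function names below are hypothetical. Under it, a 10x gap equals a 1-2 year lead exactly when compute grows somewhere between roughly 3.2x and 10x per year.

```python
import math


def years_of_lead(compute_ratio: float, annual_growth: float) -> float:
    """Years the trailing side needs to reach the leader's current compute,
    assuming its compute multiplies by `annual_growth` every year."""
    return math.log(compute_ratio) / math.log(annual_growth)


def implied_annual_growth(compute_ratio: float, lead_years: float) -> float:
    """Annual growth factor at which `compute_ratio` equals `lead_years` of lead."""
    return compute_ratio ** (1.0 / lead_years)


if __name__ == "__main__":
    # The passage's figures: a 10x advantage, quoted as a 1-2 year lead.
    for lead in (1.0, 2.0):
        growth = implied_annual_growth(10.0, lead)
        print(f"{lead:.0f}-year lead <=> {growth:.2f}x compute growth per year")
```

If the real growth rate were slower than about 3.2x per year, the same 10x gap would translate into a lead longer than two years under this model; faster growth would shorten it.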
The biggest companies (eg OpenAI, Anthropic, Google) must disclose their model spec, ie the internal document saying what their models are and aren’t allowed to do.
Then they write about it in the New York Times and The New Yorker, and their readers - including the average people who take the consumer sentiment surveys - believe the economy is uniquely awful. This isn’t the same as saying “it’s all vibes, there’s no crisis”. The crisis is that young people who want to join the elite are being forced into places they can’t afford. Would-be financial elites must spend years of misery chasing a lottery ticket that might not pay off; would-be cultural elites face the same challenge, plus their economic situation may not improve even if they win the culturally-prestigious (but low-paying) positions they seek. A natural test for this hypothesis would be to check economic sentiment in Brooklyn vs. the rest of the country. But this wouldn’t necessarily work: the hypothesis predicts that malaise will spread from Brooklyn to everywhere else.

More Work To Stay In The Same Place

Brenda Boomer applied to a local business she liked at age 18. She got hired, worked her way up from the bottom, and by age 35 she was a regional manager making $50,000 per year. Martha Millennial lost her adolescence to endless lessons in Mandarin, water polo, and competitive debate, all intended to pad her college resume; her only break was the three months she spent building houses in Rwanda to establish her social justice credentials. She eventually got accepted to Penn and earned a 4.2 in her college classes, despite having to complete several of them remotely from the Google campus where she was doing a simultaneous internship. After graduation, she applied to twenty-eight grad schools but was rejected from all of them, so she instead got two half-time jobs, one as a waitress and one at a startup that pitched itself as “Uber for humidifiers”. 
The humidifier startup failed, reducing her equity to $0, but she had only been in it for networking anyway, and by attending industry conferences every weekend she had collected the right contacts to get a warm introduction to the vice-president of their biggest competitor, “Uber for dehumidifiers”. She joined the dehumidifier startup, rose to associate manager, bumped up against a local ceiling (“we don’t promote from inside”), and successfully got herself poached by an air purifier startup, where at age 35 she was a regional manager making $50,001 per year. Technically Martha did better than Brenda at the same age. But she might still yearn for simpler times. (source) (source) What causes this one? It must be something big: after all, we see the same trend in college admissions, job applications, and (really!) dating, where matches that used to happen naturally have turned to an endless grind through hundreds of rejections and near-misses. The most likely explanation is technology removing frictions: when it’s easy to apply en masse to every opportunity in the world, every opportunity in the world gets thousands of applicants. They search for the best based on formal qualifications, so the value of formal qualifications goes up, so there’s an increasing arms race to achieve them. The only problem with this theory is that it doesn’t entirely match people’s complaints. They don’t complain that it was too hard to achieve their success, they complain that they are not achieving success, or that it feels hopeless. Speculatively, maybe people complain that they are not getting the level of success they expected based on their qualifications. That is, the same average-talent person is getting the same average-salary job they would have forty years ago. But since they have a master’s degree and five internships and 12,000 LinkedIn contacts, they expected to get a better-than-average job. When they don’t, it feels like success slipping away. 
Conclusion

Until now, we’ve tried to take disillusioned young people at their word. If instead we lean towards the economists, what might be ruining the vibes? The obvious answer is increasing negative bias in the media. I didn’t expect that Googling “graph about how negative media is over time” would work. We really do live in an age of wonders (source). This measure likely underestimates the trend towards negativity, because it only tracks a specific basket of media outlets. But the change could also have included viewers shifting consumption from more mainstream outlets towards more conspiratorial ones, including social media and blogs. (My Substack is tagged Science, but I hear the real money is in the Health Politics tag, where top performers feature articles like The Great Alzheimers Scam And The Proven Cures They’ve Buried For Billions and Russian COVID Vaccines Caused Global Turbo Cancer Crisis.) So, is that all there is? I think the strongest case for an economic crisis beyond vibes would be: Because of decreasing application friction, any given opportunity requires more effort to achieve than in earlier generations. Although this can’t lower the average society-wide success level (because there is still the same set of people competing for the same opportunities, so by definition average success will be the same), it can inflict deadweight loss on contenders and a subjective sense of underachievement.
Inline links: source, source, The most likely explanation, source, The Great Alzheimers Scam And The Proven Cures They’ve Buried For Billions, Russian COVID Vaccines Caused Global Turbo Cancer Crisis
I’m looking for a strong software or ML engineer to cofound the world’s first ‘automation-first’ AI safety lab. As a founding member of the UK’s AI Safety Institute, I saw firsthand how organisational, engineering and research bottlenecks limit humanity’s ability to build the safety tooling we need. To keep pace with AI’s rapid capability advances, we’ll need to go all-in on augmenting safety research and engineering with AI. I’m betting that a different kind of organisation - lean, flexible, relentlessly focused on automation with AI agents - can capture these gains to build at scales that would have been unimaginable a few years ago. I’ve received a generous grant from ACX to build this full-time, starting with AI evaluations. If this is something you feel should exist (no AI safety background required), reach out here or via LinkedIn.
Some people have argued that you have to find a way to join an AI company, because AI company employees will form the new ruling class, with everyone else as serfs. I disagree. The main thing an AI company employee has that you don’t is AI company stock. But you can buy stock in Google, you may soon be able to buy stock in OpenAI and Anthropic, and even if not, you can get indirect exposure to these companies via stock in Amazon and Microsoft. I don’t recommend putting all your money in these stocks. But there’s no fundamental difference between a Google employee having 75% of their money in Google stock because they didn’t cash out their equity vs. you having 75% of your money in Google stock because you’re crazy and fail at diversification. So either put 75% of your money in Google stock or don’t (I recommend don’t), and don’t worry about how you need to join an AI company or be left out of the future oligarchy.
If America nation-builds Venezuela, for whatever definition of nation-build, will that work well, or backfire? Some of these are long-horizon, some are conditional, and some are hard to resolve. There are potential solutions to all these problems. But why worry about them when you can go to the moon on sports bets? Annals of The Rulescucks The new era of prediction markets has provided charming additions to the language, including “rulescuck” - someone who loses an otherwise-prescient bet based on technicalities of the resolution criteria. Resolution criteria are the small print explaining what counts as the prediction market topic “happening”. For example, in the Khamenei example above, Khamenei qualifies as being “out of power” if: …he resigns, is detained, or otherwise loses his position or is prevented from fulfilling his duties as Supreme Leader of Iran within this market's timeframe. The primary resolution source for this market will be a consensus of credible reporting. You can imagine ways this definition departs from an exact common-sensical concept of “out of power” - for example, if Khamenei gets stuck in an elevator for half an hour and misses a key meeting, does this count as him being “prevented from fulfilling his duties”? With thousands of markets getting resolved per month, chances are high that at least one will hinge upon one of these edge cases. Kalshi resolves markets by having a staff member with good judgment decide whether or not the situation satisfies the resolution criteria. Polymarket resolves markets by . . . oh man, how long do you have? There’s a cryptocurrency called UMA. UMA owners can stake it to vote on Polymarket resolutions in an associated contract called the UMA Oracle. Voters on the losing side get their cryptocurrency confiscated and given to the winners. This creates a Keynesian beauty contest, ie a situation where everyone tries to vote for the winning side. 
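The stake-and-confiscate rule described above can be sketched in a few lines of Python. This is only a toy model of the incentive, not the actual UMA contract: the function name `settle_vote`, the assumption that losers forfeit their entire stake, and the pro rata split among winners are all hypothetical choices made for illustration, and the real oracle's parameters may differ.

```python
from collections import defaultdict


def settle_vote(stakes):
    """Toy settlement of one stake-weighted vote.

    stakes: list of (voter, side, amount) tuples, one entry per voter.
    Returns (winning_side, payouts), where payouts maps voter -> balance
    after settlement.

    Assumptions (hypothetical, for illustration only):
      * the side with the most total stake wins (ties go to the side
        that appears first in the list);
      * losers forfeit their whole stake;
      * forfeited stake is split among winners in proportion to stake.
    """
    totals = defaultdict(float)
    for _, side, amount in stakes:
        totals[side] += amount

    winner = max(totals, key=totals.get)
    winning_total = totals[winner]
    forfeited = sum(amount for _, side, amount in stakes if side != winner)

    payouts = {}
    for voter, side, amount in stakes:
        if side == winner:
            # Winner keeps their stake plus a proportional share of the pot.
            payouts[voter] = amount + forfeited * amount / winning_total
        else:
            payouts[voter] = 0.0
    return winner, payouts


if __name__ == "__main__":
    example = [("alice", "NO", 60.0), ("bob", "NO", 20.0), ("mallory", "YES", 40.0)]
    print(settle_vote(example))
```

Because a voter's payoff depends only on whether they end up with the majority, each voter's best move is to guess what everyone else will vote, which is the beauty-contest structure.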
The most natural Schelling point is the side which is actually correct. If someone tries to attack the oracle by buying lots of UMA and voting for the wrong side, this incentivizes bystanders to come in and defend the oracle by voting for the right side, since (conditional on there being common knowledge that everyone will do this) that means they get free money at the attackers’ expense. But also, the UMA currency goes up in value if people trust the oracle and plan to use it more often, and it goes down if people think the oracle is useless and may soon get replaced by other systems. So regardless of their other incentives, everyone who owns the currency has an incentive to vote for the true answer so that people keep trusting the oracle. This system works most of the time, but tends towards so-called “oracle drama” where seemingly prosaic resolutions might lie at the end of a thrilling story of attacks, counterattacks, and escalations. Here are some of the most interesting alleged rulescuckings of 2026: Mr Ozi: Will Zelensky wear a suit? Ivan Cryptoslav calls this “the most infamous example in Polymarket history”. Ukraine’s president dresses mostly in military fatigues, vowing never to wear a suit until the war is over. As his sartorial notoriety spread, Polymarket traders bet over $100 million on the question of whether he would crack in any given month. At the Pope’s funeral, Zelensky showed up in a respectful-looking jacket which might or might not count. Most media organizations refused to describe it as a “suit”, so the decentralized oracle ruled against. But over the next few months, Zelensky continued to straddle the border of suithood, and the media eventually started using the word “suit” in their articles. This presented a quandary for the oracle, which was supposed to respect both the precedent of its past rulings, and the consensus of media organizations. 
Voters switched sides several times until finally settling on NO; true suit believers were unsatisfied with this decision. For what it’s worth, the Twitter menswear guy told Wired that “It meets the technical definition, [but] I would also recognize that most people would not think of that as a suit.” Domer: Will Ukraine agree to the US mineral deal? AFAICT, this is the only case where the oracle genuinely broke down (as opposed to a legitimate disagreement). In February, it looked like both America and Ukraine had agreed to a mineral deal, but the oracle considered the question and decided this didn’t count as a full agreement (and indeed, the apparent agreement then fell apart). In March, a cabal of YES holders tried again. They waited for a time when all Polymarket employees would be out of the office, and when not too many people would be voting on the decentralized resolution oracle, then spammed it with calls to resolve to YES based on an argument that the February agreement had qualified after all. The YES holders and not-particularly-plugged-in oracle voters pushed the vote towards YES. Then, with two minutes to spare, a Polymarket employee showed up and said that Polymarket’s opinion was that it should be NO. This was technically framed as a recommendation to oracle voters, but it is so effective in establishing the Schelling point that it’s practically always followed. However, in this case, there were only two minutes left, which wasn’t enough time for the voters to change their mind. Seeing that the resolution was trending towards yes, the Polymarket representatives, not wanting to break their streak of always establishing the Schelling point, changed their own opinion to YES, and the final vote was YES 99%. Domer: How many people watched the Oscars on 3/5/25?: Kalshi’s resolution criteria for this market said they would resolve it when a major news source published Oscar viewership numbers. 
A few minutes after the Oscars, NYT published preliminary viewership numbers, without any caveats saying they were preliminary. The next day, they published another article saying that actually, the real viewership numbers were higher. Kalshi decided that the letter of the resolution criteria was met when NYT published its first article, and that NYT changing its opinion didn’t imply that Kalshi should change the resolution. Traders who bet on the later (ie correct) numbers were unsatisfied with this decision. NYPost: Will America invade Venezuela? On January 3, the US bombed Venezuela, sent in a Special Forces team that successfully captured President Maduro, and announced that they would thenceforward “run the country” (a claim they later walked back). Does this qualify as an “invasion”? Polymarket’s resolution criteria defined “invasion” as “a military offensive intended to establish control over any portion of Venezuela”. It didn’t seem like the US was trying to establish control over Venezuelan territory, exactly, so they resolved NO. Traders who bet on YES were unsatisfied with this decision. With one exception, these aren’t outright oracle failures. They’re honest cases of ambiguous rules. Most of the links end with pleas for Polymarket to get better at clarifying rules. My perspective is that the few times I’ve talked to Polymarket people, I’ve begged them to implement various cool features, and they’ve always said “Nope, sorry, too busy figuring out ways to make rules clearer”. Prediction market people obsess over maximally finicky resolution criteria, but somehow it’s never enough - you just can’t specify every possible state of the world beforehand. The most interesting proposal I’ve seen in this space is to make LLMs do it; you can train them on good rulesets, and they’re tolerant enough of tedium to print out pages and pages of every possible edge case without going crazy. It’ll be fun the first time one of them hallucinates, though. 
…And Miscellaneous Ne’er-Do-Wells I include this section under protest. The media likes engaging with prediction markets through dramatic stories about insider trading and market manipulation. This is as useful as engaging with Waymo through stories about cats being run over. It doesn’t matter whether you can find one lurid example of something going wrong. What matters is the base rates, the consequences, and the alternatives. Polymarket resolves about a thousand markets a month, and Kalshi closer to five thousand. It’s no surprise that a few go wrong; it’s even less of a surprise that there are false accusations of a few going wrong. Still, I would be remiss to not mention this at all, so here are some of the more interesting stories: Fhantombets: Who will win the 2025 Nobel Peace Prize? Twelve hours before the announcement, someone placed a large Polymarket bet on Venezuelan opposition leader Maria Corina Machado, bringing her probability from 4% to 73%. When Machado later won, observers suspected insider trading. But an account named fhantombets claims to have interviewed the winning trader; although he did not reveal his exact strategy, the interview better matches a story where he was good at navigating WordPress directories, and found that the Nobel team put a draft of the announcement up early in a nonpublic part of their WordPress site. He won about $70,000. LuishXYZ: Will the Russians capture Myrnohrad? This is a small town in Ukraine that the Russians obviously were not going to capture; the Polymarket price trended toward zero. The resolution criteria named maps by the well-regarded Institute For The Study of War as canon. A few hours before resolution, ISW updated their maps to show the town captured by Russia, which was definitely false. Polymarket resolved to YES, and the fictional Russian advance disappeared. The Institute then issued a statement saying the map update was “unapproved”, and fired one of its staffers who had presumably been involved. 
The cheater’s exact winnings are unknown, but based on the size of the market are probably mid-6-digits. TechCrunch: What words will be used in Coinbase’s earnings call? Coinbase CEO Brian Armstrong delivered the company’s “earnings call”, ie a speech to investors about its recent progress. At the end, he said “I've been tracking the prediction market about what Coinbase will say on their next earnings call, and I just want to add here the words Bitcoin, Ethereum, Blockchain, Staking, and Web3 to make sure we get those in before the end of the call”. Armstrong is worth $10 billion and doesn’t need to manipulate a $50,000 market for the money - he later described his comments as “trolling”. Other crypto executives condemned the move, with one saying that “you need your head examined if you think it’s cute or clever or savvy that the CEO of the biggest company in this industry openly manipulated a market.” I might need my head examined, because I think it’s at least kind of funny. Forbes: Who will rank highest on Google Search volume this year? A trader called AlphaRaccoon got 22/23 of these Polymarket questions right, and has a history of implausibly good performance on Google-related questions. They basically have to be a Google insider, but (since all of this is done through crypto) nobody has a good way to figure out who. They made $1 million. NPR: Will Maduro be captured? Just before the secret operation that captured Maduro, someone placed a mysterious $32,000 wager on YES. Was this insider trading by someone in the administration or military? Nobody knows, since the profits go to an anonymous crypto wallet. But the article mentions that the crypto wallet appears to be cashing out through regulated KYC-compliant US exchanges, which suggests they’re not very worried about their identity getting discovered. Maybe they just got lucky after all. AlanMCole: How long will Karoline Leavitt speak at the White House briefing? Karoline Leavitt is Trump’s press secretary. 
On January 7, she held an ordinary press briefing. Kalshi had its usual market about how long the briefing would last, divided into bins of greater than vs. less than 65 minutes. At the 64:24 mark, Leavitt ended the conference in what appeared to be a sudden manner, and the “less than 65 minutes” bin shot from 2% to 100%. A viral tweet convinced many people that Leavitt must have been insider trading, but Cole counterargued that Leavitt could only have won about $4,000 from the market, which probably isn’t enough to risk one’s job as White House Press Secretary. Sometimes people just end press conferences at weird times. Cole concluded: Now, some opinions and generalizations, as someone who looks at prediction markets plenty (I’ll probably write something about my own experience with them at some point.) 1. This market, like many of them, is pretty stupid. I like substantive markets; this isn’t substantive. 2. The major prediction markets have a wildly undisciplined comms strategy where any attention is good attention, and they love implying all sorts of crazy wild west stuff is going on to get attention. 3. People do bet on things potentially subject to manipulation or insider trading. But usually the markets like that (such as duration of press conference, or stupid “what will be mentioned” markets) are small, especially relative to the wealth of key decisionmakers. 4. Losers in markets are huge whiners, and the more frivolous and tiny their bets, the more likely they are to whine. Sometimes in sports it’s pretty egregious. They’ll get mad at a team for running out the clock when ahead but under some spread they bet on. 5. Lower-quality financial news often doesn’t pay much attention to quantity. (For example, dumb stories about how a decisionmaker has a conflict of interest because they’re invested in an index fund which is 3 percent comprised of some company.) 6. 
Given the platforms’ undisciplined social media strategy of “promote prediction market chatter no matter what kind of chatter it is,” I don’t think this tweet rises even to the status of “lower-quality financial news.” Kalshi’s team, whatever their faults, are extraordinarily efficient at getting batched approvals of many near-identical markets with slight parameter variation; I’ve seen Tarek speak about this on Odd Lots. The result is they’ve got TONS of them, for better or worse. You’re gonna see 1-in-100 upsets on tiny Kalshi markets for as long as this regulatory equilibrium holds, even if nothing unusual is going on, simply because they’re publishing hundreds (thousands?) of markets per day. There’s a saying that you can’t con an honest man. This isn’t exactly true. But it’s easier to con people who are playing in a “what words will Brian Armstrong say today” market than people who are trying to do something useful, and I have trouble feeling sorry for these people when Brian Armstrong says silly words. Conditional Markets: A Modest Proposal Conditional markets (“decision markets”) are the strongest case for prediction markets potentially being revolutionary. The idea is - you may want to base a decision (like which candidate to elect) on an outcome (like how they’ll affect the economy). So you make two markets: If the Democrat gets elected, will the economy be good four years later?
Inline links: UMA Oracle, Keynesian beauty contest, Will Zelensky wear a suit?, calls this, told Wired, Will Ukraine agree to the US mineral deal?, How many people watched the Oscars on 3/5/25?, another article, Will America invade Venezuela?, stories about cats, Who will win the 2025 Nobel Peace Prize?, claims, Will the Russians capture Myrnohrad?, a statement, What words will be used in Coinbase’s earnings call?, described, Who will rank highest on Google Search volume this year?, Will Maduro be captured?, How long will Karoline Leavitt speak at the White House briefing?
A California union has announced a campaign to force a 2026 ballot proposition that levies a “one time” wealth tax on billionaires; the mere threat of this tax has spooked several billionaires, including Google founders Larry Page and Sergey Brin, into leaving the state (the initiative would apply to anyone residing in California as of 1/1/2026, so there’s incentive for them to leave proactively). The markets above are the first attempts I’ve seen to estimate the chance of it actually passing.
Inline links: leaving the state
Polymarket has a few of these “who has the best AI when?” markets - resolution is usually position on the LMArena Leaderboard, which usually but not always mirrors common-sense consensus. I get more interested in these the further out they go, but the June version is bizarre (it doesn’t even list Google as an option), and there’s nothing past mid-year. Other implied claims from Polymarket’s tech section: only 44% chance Anthropic will still dominate coding by late March; Anthropic and (especially) OpenAI probably won’t IPO this year; xAI will call their next model Grok 4.20 (of course).
I have seen people try to walk this back by saying Adams only meant they would be persecuted in some way that was metaphorically equivalent to hunting, but I feel like “good chance you will be dead within the year” is saying he means the kind of hunting which literally kills you, and “police will stand down” means that it will be the sort of extremely illegal thing that police would normally react to. I have seen other people try to link this to examples of Republicans actually getting killed, such as Charlie Kirk. But Adams was telling his readers there was “a good chance” that “they” would be dead within a year, which I think implies this fate happening to a significant proportion of ordinary Republicans, not just one prominent person. Also, Kirk was five years after the comment was posted. Can we dismiss this as a joke? I think Adams has used the manipulation technique of saying things that might or might not be jokes and then strategically sticking to them or saying “What? Me? I was only joking! Haha! You can’t take a joke!” depending on which was more convenient to him at that exact second, enough times that I’m not comfortable letting him have that escape. Also, when I was replying to Joel Pollak about this, I happened to glance at his Twitter account, and one of the top tweets was a repost of someone saying that “The Democrat playbook is to arrest every single person who disagrees with them”. I think if I forced Pollak into some kind of extremely literal frame of mind - maybe asked him to bet money on whether I could tweet the words “the Democrats are wrong about immigration” in my Democrat-controlled state without getting arrested - he would admit that, okay, they don’t want to arrest literally every single person who disagrees with them. He was exaggerating for effect, probably in much the way he’s going to say that Scott Adams was exaggerating for effect. You say stuff like “The Democrats are going to HUNT YOU DOWN and LITERALLY MURDER YOU. 
They will TORTURE YOUR FAMILY and RAPE YOUR DAUGHTER and EAT YOUR PETS and TURN YOUR HOUSE INTO A CHURCH OF SATAN”, and what you mean is “I disagree with the Democrats and sometimes they go overboard cancelling people”. I have a post called If It’s Worth Your Time To Lie, It’s Worth My Time To Correct It. My thesis is that tolerating claims of “directional correctness” - the thing where someone asks to get a pass because even if what they said wasn’t literally true, it “points to” an “emotionally correct” thing - is eventually totally corrosive. It means everyone ratchets up their claims to the highest level they think they can get away with (ie walk back later if challenged, as a motte and bailey). And then you end up with this miasma where maybe 5% of people totally believe you, and 50% of people sort of absorb the connotation and think something like that is true, and then people get terrified of the Democrats and think of them as monsters and treat politics as an existential struggle where they will genuinely get arrested or murdered unless they do it to the Democrats first, and then you get a civil war or something. I think Adams and Pollak’s milieu has in fact reached this point, and their love for these kinds of exaggerations is a big part of the cause. Adams was one of the funniest people in the world. If he was actually telling a joke, you could tell by the fact that you were laughing hysterically. “Democrats will hunt and kill you” isn’t funny. I’ll refrain from judgment about whether it was Adams’ sincerely held belief, some kind of annoying manipulation attempt, or whether Adams even recognized a difference between the two. But I think judging him on the fact that it didn’t happen is completely within bounds. 
… 3: Comments On The Substance Of The Piece … Zanzibar BuckBuck McFate writes: This business where boomers are tolerant of contradictions and find them amusing whereas millennials are horrified is a dynamic I've noticed as well, it seems to be true in politics also, I myself feel this hunger to be authentic all the time. I think it has something to do with the difficulty children have in putting negativity in context. They can't distinguish between a parent having a bad day and venting, or having an existential crisis. So the 50s guy was half right - you don't have to love your boss in your heart of hearts but careful what you say to your kids. Feral Finster writes: » “This is the basic engine of Dilbert: everyone is rewarded in exact inverse proportion to their virtue. Dilbert and Alice are brilliant and hard-working, so they get crumbs. Wally is brilliant but lazy, so he at least enjoys a fool’s paradise of endless coffee and donuts while his co-workers clean up his messes. The P.H.B. is neither smart nor industrious, so he is forever on top, reaping the rewards of everyone else’s toil. Dogbert, an inveterate scammer with a passing resemblance to various trickster deities, makes out best of all.” Compare with the famous observation that executives are sociopaths, management are clueless, and the workers losers. Yeah, it’s interesting to compare Rao and Adams. Rao formulated his Gervais Principle as a specific response to Adams’ Dilbert Principle, which I guess means Rao thought Adams got it wrong. Did he? The Pointy Haired Boss seems to go back and forth between Clueless and Sociopath, which is probably why Rao thought Adams’ work fell short. Dogbert is clearly Sociopath, but has no permanent role in the corporation, and doesn’t really represent a real thing you can be - his character was a ridiculous scammer who succeeded at near-impossible endeavours (like convincing people he was a Nostradamus-style mystical prophet) because the logic of the strip demanded it. 
Later, Adams foregrounded the CEO character more, maybe to create a purer Sociopath, letting the Boss go closer to Clueless. This is making me somewhat regret accusing Adams of wanting to be the Pointy-Haired Boss. It would have been fairer (and less of an accusation/surprise) to accuse him of wanting to be Dogbert. But again, Dogbert doesn’t represent a real thing you could be, which might have been why the PHB made a better metaphor. (contra my claim, the cover of Win Bigly shows a mashup of Dogbert and Trump. Fine, Dogbert is a thing one person can be.) You can read my full review of The Gervais Principle here. cincilator writes: Scott Alexander, former tribune of nerds now says that the sneerclub was right about everything all along? I didn’t expect that, let me tell you. Several people interpreted me as attacking nerds. I disagree - I think I was attacking self-hating nerds, because nerdiness is fine and you shouldn’t have to hate yourself for it. To spell it out more explicitly: All nerds must eventually realize they’re not going to immediately dominate everything by intellect alone. This isn’t because intellect isn’t great, it’s because 1) it’s only one of many skills, and 2) you probably aren’t even the person with the most intellect. Again, every mildly-talented person has to face this realization, whether it’s a nerd realizing he won’t be the next Einstein or a jock realizing he won’t be the next LeBron. If someone deals with this using denial (one of Freud’s maladaptive defenses), you get the nerd who says no, I really am the next Einstein, ie a crackpot, aka the sort of person who gets featured on Sneerclub. If they deal with it using reaction formation (another of Freud’s maladaptive defenses), you get the self-hating nerd, aka the sort of person who joins Sneerclub4. 
If they just deal with it maturely instead of spinning up maladaptive defenses against it, they’re a nerd who is hopefully good-natured and accepting of their nerdiness, and hopefully does some good work in some specific small area, and changes the world in some specific small way (or some very large way, if they can work together with other people and get lucky). Bugmaster writes: I think Adams is basically correct. Yes, facts and evidence do exist and are real; but they have virtually no impact on anything socially important -- i.e., on anything important whatsoever. Memes and charisma and persuasion are what matters if you want to achieve life goals that extend beyond yourself and your immediate family. I worry that Adams (and you) are doing something where unless the average person can solve every problem by facts and intelligence alone, then facts+intelligence lose and memes and persuasion win. But the average person also can’t solve every problem by memes+persuasion alone! If Dilbert is an 80th percentile nerd, the 80th percentile persuader is - I don’t know, a used-car salesman? Dilbert’s probably earning more money, especially nowadays when he could make L5 at Google. And if Donald Trump is a 99.9999th percentile persuader, the 99.9999th percentile nerd is Ilya Sutskever. Probably most people would slightly prefer being Trump to Sutskever, but Sutskever does have a couple billion dollars, plus the more ethereal rewards of genius; it still seems like a pretty good deal. I also think you’re doing a sort of black-and-white thinking here. Every day, great persuaders like Sam Bankman-Fried and Elizabeth Holmes end up in jail, because in fact the things that they said were true were not true. 
Every day, smooth-talking charismatic manipulators successfully seduce the girl into bed with them, then totally fail to turn it into a happy stable marriage, because after a few years even the dumbest woman catches on and figures out whether her mate provides real value or not. Even Donald Trump has only a 37% approval rating, because he can’t make “we should alienate our allies over Greenland” sound plausible to most of the American people. When someone’s very good at it, persuasion sometimes helps them blur facts around the edges. But that’s it. Nobody except Scott Adams and a few psychotherapists ever go to hypnotist school. Most don’t even go to any formal persuasion classes. That’s because hypnotism/persuasion isn’t really a lifehack that helps you win all the time at everything. If the world’s best hypnotist asked a room of VCs for money with a stupid business plan, he would probably fail. This isn’t to say persuasion is useless, and in certain fields it can be very powerful indeed. But let’s not go crazy and start worshipping it. The grass is always greener on the other side. The nerd sits in his cubicle and thinks “If only I were more charismatic.” But the salesman with the bright teeth and the firm handshake thinks “Man, I bet I could get out of this dead-end job if only I were smarter.”5 … 4: The Part On Race And Cancellation (INCLUDED UNDER PROTEST) … Ilya Lozovsky writes: Ninety percent of this essay is brilliant — smarter and realer than anything anyone else has written about Adams — but the end lost me. It's too generous, to the point of being a whitewash. Adams was vicious and hateful and played a material role in convincing Americans to vote for actual fascism. I don't think it's right to "hand it to him." JJ McCullough (JJM’s Shortstack) writes: Good essay, but I think you kinda yadda-yadda'd away his racist rant, which was extremely explicit and extended. 
I think it was the opposite of a "bog-standard cancellation," which we think of as being a slightly unfair, overzealous policing of an at least slightly subjectively offensive comment, often from years ago. But Scott went on quite a long diatribe about why black people, as a group, are dangerous and undesirable to be around, and why he, personally, goes out of his way to avoid them. Some conservatives have tried to use "bog-standard" anti-woke logic in defending him, but no, his comments really are quite explicitly and undeniably racist, if that term has any useful definition at all. Alex Wotbot writes: Now, you quoted Adams saying: “the best advice I would give to white people is to get the hell away from black people; just get the fuck away” If this was the intended point, does it really make sense that only the far-left freaked out? It’s kind of important to mention this was within a hypothetical. Suppose a survey reported that 26% of a population believes “The phrase ‘It’s OK to be blonde’ is hate speech” and another 21% weren’t sure if they agree with the statement or not. Now suppose you were blonde, would you hang around that population? Now go read the February 2022 Rasmussen Reports survey. Please do better than this, I don’t want to have to Gell-Mann memoryhole this. Many people had strong opinions on this, so I have to respond to it. But first, I want to make it extra clear in capital letters: I AM DOING THIS IN THE COMMENTS POST, TO RESPOND TO YOUR COMMENTS, AND NOT BECAUSE I THINK IT IS THE MOST IMPORTANT THING. Certain people screenshotted the one paragraph of my ten thousand word essay that discussed this and posted it on Twitter, in order to make it look like I was joining in some kind of chorus of liberals reducing Adams to his worst moment. I posted what I thought was a no-nonsense, factual description of what happened, in order not to be accused of hiding it or covering it up. 
It was the least important part of my essay, I’m aware that writing about it at all opens me to attack from both sides, and I discuss it here only to respond to all of you who wanted to know my opinion on it. Just don’t screenshot it on Twitter and say “LOOK SCOTT IS STILL HARPING ON THE RACE THING”, that’s all I’m asking. That having been said… To make sure we’re all on the same page - Adams’ comments were prompted by this poll, conducted February 2023. The question was: “Do you agree or disagree with this statement: ‘It’s OK to be white’” Among blacks, 53% agreed, 26% disagreed, and 21% were “not sure”. Among whites, the numbers were 81/7/13. Here’s the video of Adams’ comments: Transcript: If nearly half of all blacks are not okay with white people - according to this poll, not according to me - that’s a hate group. And I don’t want to have anything to do with them. And I would say, based on the current way things are going, the best advice I would give to white people is to get the hell away from black people. Just get the f**k away. Wherever you have to go. Just get away. Cause there’s no fixing this. This can’t be fixed. You just have to escape. That’s what I did. I went to a neighborhood with a very low black population. Because unfortunately, there’s a high correlation between the density - this is according to Don Lemon, here I’m just quoting Don Lemon, who said when he lived in a mostly black neighborhood, there were a bunch of problems he didn’t see in white neighborhoods. So even Don Lemon sees a big difference, for your quality of living, based on where you live and who’s there. So I think it makes no sense whatsoever as a white citizen of America to try to help black citizens anymore. It doesn’t make sense. Because there’s no longer a rational impulse. And so I’m… I’m gonna, uh, I’m gonna back off from being helpful to black America, because it doesn’t seem like it pays off. 
Like I’ve been doing it all my life, and I’ve been… the only outcome is I get called a racist. That’s the only outcome. [cackles] It makes no sense to help black Americans if you’re white… it’s over. Don’t even think it’s worth trying. Totally not trying. Is this racist? I have a piece called Against Murderism, where I talk about why it’s so hard for people to agree on questions about “racism”. The summary: although it would be possible to have someone be purely, axiomatically racist - having it be a premise of their reasoning that they hate black people - in practice few people are like this. More typically, people have some argument more like: I don’t like [specific bad thing]
Inline links: If It’s Worth Your Time To Lie, It’s Worth My Time To Correct It, writes, writes, the famous observation that executives are sociopaths, management are clueless, and the workers losers, The Gervais Principle, writes, 4, writes, 5, writes, JJM’s Shortstack, writes, writes, this poll, Against Murderism
The right difficulty level is “too hard to Google immediately, but not so hard that it’s beyond the frontier of human knowledge”. Questions where you could figure out the answer through an hour of Google searches, collating various different sources, and doing math on a spreadsheet are at the sweet spot.
The theory is that AI skeptics won’t pay (because they don’t think it’s capable enough to be worth it) and then never learn the full capabilities (because they won’t pay for them). Then they get their impressions about AI entirely from the Google result summary bot or Twitter screenshots of the most embarrassing mistake an AI has made that week. Let’s test this! Reply to this post with a question. I’ll ask Claude 4.6 Opus, the most capable paid-tier AI model currently available, and you can tell me whether you’re surprised by the answer or not. Suggestions for you: Consider asking a real question you’re interested in, rather than an annoying gotcha question to trick the AI.
But since AI is a strategically important technology, doesn’t that turn this into a national security issue? It might if there weren’t other AI companies, but there are. Why is Hegseth throwing a hissy fit instead of switching to an Anthropic competitor, like OpenAI or Google DeepMind[5]? I’ve heard it’s because Anthropic is the only company currently integrated into classified systems (a legacy of their earlier contract with Palantir) and it would be annoying to integrate another company’s product. Faced with doing this annoying thing, Hegseth got a bruised ego from someone refusing to comply with his orders, and decided to turn this into a clash of personalities so he could feel in control. He should just do the annoying thing.
Inline links: 5
If you’re so smart, what’s your preferred solution? In an ideal world, the Pentagon backs off from its desire to mass surveil American citizens. In the real world, the Pentagon cancels its contract with Anthropic, pays whatever its normal contract cancellation damages are, learns an important lesson about negotiating things beforehand next time, and replaces them with OpenAI or Google, accepting the minor annoyance of getting them connected to the classified systems. If OpenAI and Google are also unwilling to participate in this, they use Grok. If they’re unhappy with having to use an inferior technology, they think hard about why no intelligent people capable of making good products are willing to work with them.
Superforecaster Nuño Sempere, maybe as part of his work with Sentinel. He seems to put a higher chance on supply chain risk than others do, but thinks that supply chain risk might be handled in a way that only affects DoD contracts themselves, which wouldn’t be so bad. I haven’t heard anyone else make this distinction. Tweet here, full document here. And big praise to most other AI companies, including Anthropic’s competitors, for standing up for them and for the AI industry more broadly:
Framed this way, the Pentagon’s actions sound devastating. Anthropic relies on compute to train and run its AIs. Most of this compute is in data centers owned by Amazon, Google, and Microsoft. At least Amazon and Microsoft have contracts with the US military. If they had to drop Anthropic, it would make it impossible for the company to stay a frontier AI lab.
The lawyers who weighed in seem to think that Anthropic’s interpretation of the law is correct, and Secretary Hegseth’s interpretation confused. In some situations, this might be cold comfort - how much does it help to be right about the law when the government is wrong? But in this case, it probably helps a lot. Amazon, Google, and Microsoft are all big Anthropic investors - each owns about a 10% stake - and have multi-billion dollar AI compute contracts. Together, the three tech giants must have at least $100 billion riding on Anthropic’s success. They also have good administration connections and great lobbyists, and even Hegseth isn’t stupid enough to pick fights with them all at once. So probably they send their lobbyists to have a talk with Hegseth about what the “supply chain risk” designation actually entails, Hegseth enforces the letter of the law, and Anthropic is barely affected. At least this is the story the prediction markets are going with:
It appears to value company stakes by voting rights rather than ownership, so a typical founder who maintains control of their company despite dilution might see themselves taxed for more than they have. Garry Tan explains the math here with reference to Google. However, Current Affairs has a good article (?!) that pushes back, saying the proposal exempts public companies like Google. Although private companies would still be affected, this would be so obviously unfair that founders would easily win an exemption based on a provision allowing them to appeal nonsensical results. Still, some might counterobject that proposed legislation is generally supposed to be good, rather than so bad that its victims will easily win on appeal.
Inline links: explains the math here, Current Affairs has a good article
Backlinks
- 1984
- 538
- 80,000 Hours
- A Cyclic Theory Of Subcultures
- ACX Grant
- ACX Grants ++: The First Half
- ACX Grants ++: The Second Half
- ACX Grants 1-3 Year Updates
- Afghanistan
- AGI
- Agincourt
- AI Circle
- AI Safety
- Ajeya
- Akhenaten
- Al Franken
- Alex Poterack
- Alexander the Great
- AlphaGo
- alt-right
- AMA (Ask Machines Anything)
- Amazon
- Another Bay Area House Party
- Anthropic
- Apple
- Art Deco
- Zvi on California’s AI Bill
- Astral Codex Ten
- Augur
- Balaji Srinivasan
- Bay Area
- Bayes For Everyone
- Benjamin Franklin
- Bible
- Billionaires, Surplus, And Replaceability
- Bing
- Biological Anchors: A Trick That Might Or Might Not Work
- biorxiv
- Bitcoin
- Blackrock
- BLM
- blockchain
- Bob
- Book Review Contest
- Books: B
- Books: O
- Books: T
- Brands
- Bride Of Bay Area House Party
- Broadway
- BTC
- Canadian government
- Carbon Costs Quantified
- Carl
- Chad
- Chapo Trap House
- Charles
- ChatGPT
- Chicago Principles
- Taiwan conflict
- Chinese government
- Claude
- clinicaltrials.gov
- Cognitive behavioral therapy
- Concepts: A
- Concepts: B
- Concepts: C
- Concepts: D
- Concepts: E
- Concepts: F
- Concepts: G
- Concepts: H
- Concepts: I
- Concepts: L
- Concepts: M
- Concepts: N
- Concepts: O
- Concepts: P
- Concepts: R
- Concepts: S
- Concepts: T
- Concepts: Y
- Conservatives
- Constantinople
- cryptocurrency
- Current Affairs
- DALL-E
- Dan Hendrycks
- Daniel Kokotajlo
- David Chapman
- Dean Ball
- DeepMind
- Dictator Book Club: Xi Jinping
- DoD
- Donald Trump
- Dow
- Dutch
- Earth
- Economist
- Effective Altruist Forum
- effective altruists
- Egypt
- Eliezer
- Elizabeth
- Elon Musk
- Emil Kierkegaard
- Emil Kirkegaard
- Epstein
- Ethan Strauss
- Ethereum
- European Union
- Even More Bay Area House Party
- Every Bay Area House Party
- Every Flashing Element On Your Site Alienates And Enrages Users
- Exxon
- F-35
- Federal Reserve
- feminism
- First Amendment
- Fortune
- Francis Fukuyama
- Freddie de Boer
- French Revolution
- FTX
- Futuur
- Galton
- Gamestop
- Garry Tan
- Gary Marcus
- GDP
- Geoffrey Hinton
- GitHub
- GK Chesterton
- GMail
- Goldman Sachs
- Google Trends
- Governor Newsom
- GPT
- GPT-2
- GPT-3
- GPT-4
- GPT-5
- GPT-6
- GPT-7
- Grading My 2018 Predictions For 2023
- Grading My 2021 Predictions
- Greta Thunberg
- Gwern
- Half An Hour Before Dawn In San Francisco
- Hanson
- Harvard
- Hegseth
- Henry V
- Highlights From The Comments On “The Origin Of Woke”
- Highlights From The Comments On Great Families
- Highlights From The Comments On Housing Density And Prices
- Highlights From The Comments On IRBs
- Highlights From The Comments On Justice Creep
- Highlights From The Comments On Kidney Donation
- Highlights From The Comments On Modern Architecture
- Highlights From The Comments On Orban
- Highlights From The Comments On Scott Adams
- Holden Karnofsky
- House
- Huawei
- humans
- Hungarians
- Huns
- Hypergamy: Much More Than You Wanted To Know
- I Won My Three Year AI Progress Bet In Three Months
- Ice Age
- In Defense Of The Amyloid Hypothesis
- In The Long Run, We’re All Dad
- Intellectual Dark Web
- Iowa
- Iran
- Palestine conflict
- Jacobin
- Jeremiah Johnson
- Jesus
- John Burn-Murdoch
- John McCain
- John Roberts
- Jonathan Haidt
- Jordan Peterson
- Katja Grace
- Kazakhstan
- ketamine
- Khameini
- King
- Kingdom of France
- Koran
- Left
- libertarianism
- Links For August 2023
- Links For February
- Links for July 2024
- Links for May 2024
- Links For September 2022
- Links For September 2023
- Links For September 2025
- Lorien
- Maduro
- Magyars
- Mandela Effect
- 24
- 23
- Mantic Monday: Dogs In Wizard Hats
- Mantic Monday: Groundhog Day
- Mantic Monday: Let Me Google That For You
- Mantic Monday: Predictions For 2021
- Mantic Monday: The Monkey’s Paw Curls
- Mao
- Mark Twain
- Mark Zuckerberg
- Martin Blank
- Meetups Everywhere Spring 2025: Times & Places
- Meta
- METR
- Michael Watts
- Michael Wiebe
- Microsoft
- MidJourney
- Milan
- Miranda
- MIRI
- Modi
- Mongolia
- Mormon
- Most Mentioned Entities
- Mostly Skeptical Thoughts On The Chatbot Propaganda Apocalypse
- MTA
- My Presidential Platform
- Nancy Pelosi
- Napoleon
- Netanyahu
- New York Magazine
- Nick Fuentes
- NIMBYism
- NIMBYs
- Nirvana
- NIST
- Noah Smith
- Nobel prize
- north Africa
- North Korea
- NYT
- Obamacare
- Omicron
- Ontology Of Psychiatric Conditions: Tradeoffs And Failures
- Open Thread 187
- Open Thread 198
- Open Thread 230
- Open Thread 242
- Open Thread 415
- OpenAI
- OpenAI Nonprofit Buyout: Much More Than You Wanted To Know
- OpenAI’s “Planning For AGI And Beyond”
- Organizations: 0-9
- Organizations: A
- Organizations: B
- Organizations: C
- Organizations: D
- Organizations: E
- Organizations: F
- Organizations: G
- Organizations: H
- Organizations: I
- Organizations: M
- Organizations: N
- Organizations: O
- Organizations: S
- Organizations: U
- Organizations: W
- Origins Of Woke
- Pakistan
- Pandagon
- Patri Friedman
- Peer Review Request: Depression
- Pentagon
- People: A
- People: B
- People: C
- People: D
- People: E
- People: G
- People: H
- People: J
- People: K
- People: M
- People: P
- People: R
- People: S
- People: V
- People: Z
- Pepsi
- Peter Turchin
- Phil Getz
- Places: A
- Places: B
- Places: C
- Places: E
- Places: I
- Places: M
- Places: N
- Places: P
- Places: R
- Places: T
- Places: V
- Places: W
- PLOS ONE
- Pope
- Prediction Market FAQ
- Predictions For 2022
- President Biden
- Progress Studies
- Protestantism
- Publications: C
- Publications: F
- Publications: J
- Publications: N
- Publications: P
- Publications: S
- Publications: T
- Publications: V
- Publications: X
- racism
- Ramchandra
- Ray Kurzweil
- Renaissance Italy
- Ukraine war
- Sam Altman
- Sam Altman Wants $7 Trillion
- Samuel Hammond
- Sarah
- SAT
- Satoshi
- SB 1047: Our Side Of The Story
- Scott Alexander
- Scott Wiener
- Scout Mindset
- SEIU Delenda Est
- Senate
- Sergey Brin
- Shakesville
- Silicon Valley
- Singularity
- SJW
- SJWs
- Socrates
- Some Practical Considerations Before Descending Into An Orgy Of Vengeance
- South Sudan
- SpaceX
- StableDiffusion
- Starship
- Stephen
- Substack
- Superintelligence
- Taj Mahal
- Tales Of Takeover In CCF-World
- Talmud
- Tartaria
- technocracy
- The Intercept
- The Passage Of Polymarket
- The Pentagon Threatens Anthropic
- The Psychopolitics Of Trauma
- The Rise And Fall Of Online Culture Wars
- The Unbearable Semiheaviness Of Being
- TIME
- Tokyo Olympics
- TracingWoodgrains
- TSMC
- Tumblr
- Tyler Cowen
- UCSF
- United Nations
- Urbanist Coven
- US
- US Supreme Court
- Venice
- Venues: T
- Vibecession: Much More Than You Wanted To Know
- Vitalik
- Vox
- Vox Future Perfect
- WebMD, And The Tragedy Of Legible Expertise
- West
- Western world
- Whither Tartaria?
- WHO
- Why AI Safety Won’t Make America Lose The Race With China
- Why I’m Less Than Infinitely Hostile To Cryptocurrency
- Why Is The Central Valley So Bad?
- Why Not Slow AI Progress?
- Wikimedia Foundation
- Wikipedia
- X
- Yale
- Yang
- YIMBY
- YIMBYs
- The Question Of Separatism
- Your Book Review: The Internationalists
- Your Book Review: The Weirdest People in the World
- Your Review: Joan of Arc
- YouTube
- Yudkowsky Contra Christiano On AI Takeoff Speeds
- Yudkowsky contra Cotra on biological anchors
- Zach Stein-Perlman
- Zoom
- Zvi
- Zvi Mowshowitz