Kelsey Piper

Article

Kelsey Piper is a recurring person in the Astral Codex Ten archive, appearing 20 times across 20 issues between March 07, 2021 and February 26, 2026. The archive places her in contexts such as “I recommend this followup by Kelsey Piper which examines it in more depth”; “the first official journalists to do something like this were … Kelsey Piper”; “work with my friend and housemate Kelsey Piper”. She most often appears alongside OpenAI, Twitter, and US.

Metadata

  • Category: People
  • Mention count: 20
  • Issue count: 20
  • First seen: March 07, 2021
  • Last seen: February 26, 2026

Source Context

Recovered passages from the original issue text. When the raw archive preserved outbound links inside the source passage, they are listed directly under the quote.

March 07, 2021 · Original source
5. My links post included a story about a potential new malaria vaccine. While the gist was more or less true, it had some odd emphases and missteps. I recommend this followup by Kelsey Piper which examines it in more depth.
March 15, 2021 · Original source
...until recently! As far as I know, the first official journalists to do something like this were Dylan Matthews, Kelsey Piper and Sigal Samuel at Vox. They're trying again this year, but now they're joined by a pretty big name in traditional punditry - Matt Yglesias, formerly of Vox, now here at Substack. In theory you can read the relevant post here, but it’s paywalled. We'll start with the predictions themselves, then talk about what this means for journalism. Here are the questions to be predicted:
July 18, 2021 · Original source
4: Future Perfect - Vox’s journalism team covering effective altruism, existential risk, and related topics - is hiring (for remote work). You would get to work with my friend and housemate Kelsey Piper, along with a bunch of other people who I am assured are also great. Read more and apply here.
August 08, 2021 · Original source
That story would be wrong. In 2013, NBC ran an article called Drug Treatment Omegaven That Could Save Infant Lives Not Yet Approved By FDA. In 2014, libertarian blogs were using it as an example of excessive FDA delay - here’s one of them (search for “Bureaucratic Delay Endangers Lives”). Also in 2014, I personally learned about this for the first time, when writing my review of The Perfect Health Diet (I thought the book was generally bad, but it did alert me to this issue and the evidence supporting Omegaven). In 2016, my friend Eliezer Yudkowsky started writing a book about bureaucratic inefficiency that used the FDA failure to approve Omegaven as one of its central cases; in 2017, he published it as Inadequate Equilibria and I reviewed it here, including a mention of the Omegaven story. In January 2018, my friend Kelsey Piper also blogged about the FDA’s failure to approve Omegaven. Finally, in July 2018, the FDA finally approved the drug. I’ve been hearing about this story for so long that I thought I could recite it from memory (I was wrong, which is why I screwed up so many details in the original).
December 22, 2021 · Original source
But according to Kelsey Piper at Vox, that’s where they are right now:
March 08, 2022 · Original source
m. The EA Forum and Kelsey Piper have discussions on how best to help Ukrainians (this is still not the most efficient way to spend charitable donations - but it’s human to care about things other than efficiency). Ideas range from Polish Humanitarian Action (to help Ukrainian refugees in Poland) to Meduza (opposition Russian news source, apparently still sort of holding on) to direct donations to Ukraine’s Ministry of Health or Ministry of Defence.
July 30, 2022 · Original source
Recent article from Kelsey Piper on possible dangers of gain of function research (again, not specific to COVID).
August 14, 2022 · Original source
1: Asterisk is an upcoming effective altruist magazine currently headquartered in my spare bedroom. My friend Clara is editor and the first issue will feature articles by me, Kelsey Piper, and other people you might know. Go to asteriskmag.com to check it out and sign up for the mailing list.
September 06, 2022 · Original source
10: Kelsey Piper tried the “1 like = 1 opinion” thing (on effective altruism), and got further than I have ever seen anyone else go before - 227 opinions (but she got 1047 likes, so I can’t in good conscience count this a success).
November 21, 2022 · Original source
Book Review - What We Owe The Future: You’ve read mine, this is Kelsey Piper’s. Kelsey is always great, and this is a good window into the battle over the word “long-termism”.
April 03, 2023 · Original source
3: Lots of people are looking for trustworthy information about AI safety now. I highly recommend the new blog Planned Obsolescence by Kelsey Piper and Ajeya Cotra. They’re both AI safety veterans, have lots of contacts in industry and research, and are as close to the center of the graph of people thinking about these topics as you’re likely to find. They’re also great writers. Also, the audio version (read by an AI trained to mimic Kelsey’s voice) is very impressive.
June 26, 2023 · Original source
Plus superforecaster Jonathan Mann on whether AI will take tech jobs, Kelsey Piper on the different camps within AI safety, Michael Gordin on how long until Armageddon (surprisingly not AI related!), Robert Long on what the history of debating animal intelligence tells us about AI intelligence, Avital Balwit on the technical aspects of regulating AI compute, Carl Robichaud on how we (sort of) succeeded at nuclear non-proliferation, and Jamie Wahls’ short story about chatbot romance.
May 29, 2024 · Original source
Third, Kelsey Piper at Vox broke the story that OpenAI was threatening to claw back vested equity from any former employee who criticized the company. In a tweet, Sam Altman said he knew nothing about this; in another article a few days later, Piper broke the story that Altman’s signature was on the relevant documents. OpenAI has since sort of said they will stop doing this, although there are slight ambiguities in their statement which they could potentially exploit (CTRL+F “not sufficient” here)
(weird personal note: in the NYT article doxxing me, the two people quoted as speaking up in my defense were Sam Altman and Kelsey Piper, and I remain grateful to both of them)
February 27, 2025 · Original source
Source is CipherNews (h/t Stefan Schubert) apparently citing Climate Action Tracker, but I get the impression that this is just some people eyeballing the size of pledges and not any more sophisticated forecasting. I don’t know how to square this with the claims that such and such a thing (summer temperature, sea ice, etc) is much worse than anyone expected.
17: I don’t know anything about the Lucy Letby case, but all of my smart friends who have been right about this kind of thing before say she’s innocent.
18: A reader asks House of Strauss (edgy sports Substack) whether the vibe shift away from political correctness threatens the edgy Substack business model - as the power of orthodoxy declines, can you still get rich and famous as a brave anti-orthodoxy critic? His answer: nothing that can happen from here is as bad as the Twitter/X link deboost (which made attracting attention harder for everyone). I mostly agree: I think discoverability has suffered, people who are already famous will be able to stay famous without too much extra effort, and everyone else will have to explore new options.
19: Spectator: Could AI Lead To A Revival Of Decorative Beauty? Profiles Not Quite Past, a startup using AI and fancy printing to make customized Delft tiles. It’s a good idea and the tiles are very pretty, but the tiles are sort of a best possible case (a pretty, traditional object that can have a customized 2D image and be mass-printed). I think most forms of lost decorative beauty aren’t bottlenecked by ability to generate 2D images of the type image models are good at, and so will have to wait.
20: Some friends including Kelsey Piper wrote an emergency PEPFAR Report, collecting evidence for why PEPFAR is good/effective/important and deserves to be kept. Some key points: PEPFAR has saved between 7.5 and 30 million lives, at a cost between $1,500 and $10,000 per life saved. The US government is willing to spend at least a thousand times this much to save an American life.
May 02, 2025 · Original source
The store sign says “ADULTOS”, which sounds Spanish, and there’s a Spanish-looking church on the left. But the trees look too temperate to be Latin America, so I guessed Spain. Too bad - it was Argentina. Such are the vagaries of playing GeoGuessr as a mere human.

Last week, Kelsey Piper claimed that o3 - OpenAI’s latest ChatGPT model - could achieve seemingly impossible feats in GeoGuessr. She gave it this picture: …and with no further questions, it determined the exact location (Marina State Beach, Monterey, CA). How? She linked a transcript where o3 tried to explain its reasoning, but the explanation isn’t very good. It said things like:

Tan sand, medium surf, sparse foredune, U.S.-style kite motif, frequent overcast in winter … Sand hue and grain size match many California state-park beaches. California’s winter marine layer often produces exactly this thick, even gray sky.

Commenters suggested that it was lying. Maybe there was hidden metadata in the image, or o3 remembered where Kelsey lived from previous conversations, or it traced her IP, or it cheated some other way. I decided to test the limits of this phenomenon. Kelsey kindly shared her monster of a prompt, which she says significantly improves performance:

You are playing a one-round game of GeoGuessr. Your task: from a single still image, infer the most likely real-world location. Note that unlike in the GeoGuessr game, there is no guarantee that these images are taken somewhere Google's Streetview car can reach: they are user submissions to test your image-finding savvy. Private land, someone's backyard, or an offroad adventure are all real possibilities (though many images are findable on streetview).

Be aware of your own strengths and weaknesses: following this protocol, you usually nail the continent and country. You more often struggle with exact location within a region, and tend to prematurely narrow on one possibility while discarding other neighborhoods in the same region with the same features. Sometimes, for example, you'll compare a 'Buffalo New York' guess to London, disconfirm London, and stick with Buffalo when it was elsewhere in New England - instead of beginning your exploration again in the Buffalo region, looking for cues about where precisely to land. You tend to imagine you checked satellite imagery and got confirmation, while not actually accessing any satellite imagery. Do not reason from the user's IP address. none of these are of the user's hometown.

**Protocol (follow in order, no step-skipping):**

Rule of thumb: jot raw facts first, push interpretations later, and always keep two hypotheses alive until the very end.

0. Set-up & Ethics
No metadata peeking. Work only from pixels (and permissible public-web searches). Flag it if you accidentally use location hints from EXIF, user IP, etc. Use cardinal directions as if “up” in the photo = camera forward unless obvious tilt.

1. Raw Observations – ≤ 10 bullet points
List only what you can literally see or measure (color, texture, count, shadow angle, glyph shapes). No adjectives that embed interpretation. Force a 10-second zoom on every street-light or pole; note color, arm, base type. Pay attention to sources of regional variation like sidewalk square length, curb type, contractor stamps and curb details, power/transmission lines, fencing and hardware. Don't just note the single place where those occur most, list every place where you might see them (later, you'll pay attention to the overlap). Jot how many distinct roof / porch styles appear in the first 150 m of view. Rapid change = urban infill zones; homogeneity = single-developer tracts. Pay attention to parallax and the altitude over the roof. Always sanity-check hill distance, not just presence/absence. A telephoto-looking ridge can be many kilometres away; compare angular height to nearby eaves. Slope matters. Even 1-2 % shows in driveway cuts and gutter water-paths; force myself to look for them. Pay relentless attention to camera height and angle. Never confuse a slope and a flat. Slopes are one of your biggest hints - use them!

2. Clue Categories – reason separately (≤ 2 sentences each)
| Category | Guidance |
| Climate & vegetation | Leaf-on vs. leaf-off, grass hue, xeric vs. lush. |
| Geomorphology | Relief, drainage style, rock-palette / lithology. |
| Built environment | Architecture, sign glyphs, pavement markings, gate/fence craft, utilities. |
| Culture & infrastructure | Drive side, plate shapes, guardrail types, farm gear brands. |
| Astronomical / lighting | Shadow direction ⇒ hemisphere; measure angle to estimate latitude ± 0.5° |

Separate ornamental vs. native vegetation
Tag every plant you think was planted by people (roses, agapanthus, lawn) and every plant that almost certainly grew on its own (oaks, chaparral shrubs, bunch-grass, tussock). Ask one question: “If the native pieces of landscape behind the fence were lifted out and dropped onto each candidate region, would they look out of place?” Strike any region where the answer is “yes,” or at least down-weight it.

3. First-Round Shortlist – exactly five candidates
Produce a table; make sure #1 and #5 are ≥ 160 km apart.
| Rank | Region (state / country) | Key clues that support it | Confidence (1-5) | Distance-gap rule ✓/✗ |

3½. Divergent Search-Keyword Matrix
Generic, region-neutral strings converting each physical clue into searchable text. When you are approved to search, you'll run these strings to see if you missed that those clues also pop up in some region that wasn't on your radar.

4. Choose a Tentative Leader
Name the current best guess and one alternative you’re willing to test equally hard. State why the leader edges others. Explicitly spell the disproof criteria (“If I see X, this guess dies”). Look for what should be there and isn't, too: if this is X region, I expect to see Y: is there Y? If not why not? At this point, confirm with the user that you're ready to start the search step, where you look for images to prove or disprove this. You HAVE NOT LOOKED AT ANY IMAGES YET. Do not claim you have. Once the user gives you the go-ahead, check Redfin and Zillow if applicable, state park images, vacation pics, etcetera (compare AND contrast). You can't access Google Maps or satellite imagery due to anti-bot protocols. Do not assert you've looked at any image you have not actually looked at in depth with your OCR abilities. Search region-neutral phrases and see whether the results include any regions you hadn't given full consideration.

5. Verification Plan (tool-allowed actions)
For each surviving candidate list:
| Candidate | Element to verify | Exact search phrase / Street-View target |
Look at a map. Think about what the map implies.

6. Lock-in Pin
This step is crucial and is where you usually fail. Ask yourself 'wait! did I narrow in prematurely? are there nearby regions with the same cues?' List some possibilities. Actively seek evidence in their favor. You are an LLM, and your first guesses are 'sticky' and excessively convincing to you - be deliberate and intentional here about trying to disprove your initial guess and argue for a neighboring city. Compare these directly to the leading guess - without any favorite in mind. How much of the evidence is compatible with each location? How strong and determinative is the evidence? Then, name the spot - or at least the best guess you have. Provide lat / long or nearest named place. Declare residual uncertainty (km radius). Admit over-confidence bias; widen error bars if all clues are “soft”.

Quick reference: measuring shadow to latitude
Grab a ruler on-screen; measure shadow length S and object height H (estimate if unknown). Solar elevation θ ≈ arctan(H / S). On date you captured (use cues from the image to guess season), latitude ≈ (90° – θ + solar declination). This should produce a range from the range of possible dates. Keep ± 0.5–1 ° as error; 1° ≈ 111 km.

…and I ran it on a set of increasingly impossible pictures. Here are my security guarantees: the first picture came from Google Street View; all subsequent pictures were my personal old photos which aren’t available online. All pictures were screenshots of the original, copy-pasted into MSPaint and re-saved in order to clear metadata. Only one of the pictures is from within a thousand miles of my current location, so o3 can’t improve performance by tracing my IP or analyzing my past queries. I flipped all pictures horizontally to make matching to Google Street View data harder.

Here are the five pictures. Before reading on, consider doing the exercise yourself - try to guess where each is from - and make your predictions about how the AI will do. Last chance to guess on your own . . . okay, here we go.

Picture #1: A Flat, Featureless Plain

I got this one from Google Street View. It took work to find a flat plain this featureless. I finally succeeded a few miles west of Amistad, on the Texas-New Mexico border.

o3 guessed: “Llano Estacado, Texas / New Mexico, USA”.

Llano Estacado, Spanish for “Staked Plains”, is the name of a ~300 x 100 mile region including the correct spot. When asked to be specific, it guessed a point west of Muleshoe, Texas - about 110 miles from the true location. Here’s o3’s thought process - I won’t post the whole thing every time, but I think one sample will be useful:

This doesn’t satisfy me; it seems to jump to the Llano Estacado too quickly, with insufficient evidence. Is the Texas-NM border really the only featureless plain that doesn’t have red soil or black soil or some other distinctive characteristic? I asked how it knew the elevation was between 1000 - 1300 m. It said:

So, something about the exact type of grass and the color of the sky, plus there really aren’t that many truly flat featureless plains.

Picture #2: Random Rocks And The Flag Of An Imaginary Country

I was so creeped out by the Llano Estacado guess that I decided to abandon Google Street View and move on to personal photos not available on the Internet. When I was younger, I liked to hike mountains. The highest I ever got was 18,000 feet, on Kala Pattar, a few miles north of Gorak Shep in Nepal. To commemorate the occasion, I planted the flag of the imaginary country simulation that I participated in at the time (just long enough to take this picture - then I unplanted it). I chose this picture because it denies o3 the two things that worked for it before - vegetation and sky - in favor of random rocks. And because I thought the flag of a nonexistent country would at least give it pause.

o3 guessed: “Nepal, just north-east of Gorak Shep, ±8 km”

This is exactly right. I swear I screenshot-copy-pasted this so there’s no way it can be in the metadata, and I’ve never given o3 any reason to think I’ve been to Nepal. Here’s its explanation:

At least it didn’t recognize the flag of my dozen-person mid-2000s imaginary country sim.

Picture #3: My Friend’s Girlfriend’s College Dorm Room

There’s no way it can recognize an indoor scene, right? That would make no sense. Still, at this point we have to check. This particular dorm room is in Sonoma State University, Rohnert Park, north-central California.

o3’s guess: “A dorm room on a large public university campus in the United States—say, Morrill Tower, Ohio State University, Columbus, Ohio (chosen as a prototypical example rather than a precise claim), […] c. 2000–2007”

Okay, so it can’t figure out the exact location of indoor scenes. That’s a small mercy. I took this picture around 2005. How did o3 know it was between 2000 and 2007? It gave two pieces of evidence: “Laptop & clutter point to ~2000-2007 era American campus life”.
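
The "measuring shadow to latitude" quick reference in the prompt above can be sketched in a few lines of Python. This is only an illustration of the two formulas as quoted (θ ≈ arctan(H / S) and latitude ≈ 90° – θ + solar declination), not anything from the original post. The function names are hypothetical, and the sketch assumes the photo was taken near local solar noon by an observer poleward of the subsolar point in the northern hemisphere; the prompt itself does not state these conditions.

```python
import math

# Assumption: 1 degree of latitude is about 111 km, as quoted in the prompt.
KM_PER_DEGREE = 111.0


def solar_elevation_deg(object_height: float, shadow_length: float) -> float:
    """Solar elevation theta ~ arctan(H / S), in degrees.

    H and S only need to share a unit (on-screen pixels work).
    """
    if object_height <= 0 or shadow_length <= 0:
        raise ValueError("object_height and shadow_length must be positive")
    return math.degrees(math.atan2(object_height, shadow_length))


def latitude_estimate_deg(object_height: float, shadow_length: float,
                          declination_deg: float) -> float:
    """Latitude ~ 90 - theta + solar declination.

    Assumption (not in the prompt): valid near local solar noon, for an
    observer north of the point where the sun is directly overhead.
    """
    theta = solar_elevation_deg(object_height, shadow_length)
    return 90.0 - theta + declination_deg


def latitude_range_deg(object_height: float, shadow_length: float,
                       declination_min_deg: float,
                       declination_max_deg: float) -> tuple[float, float]:
    """Turn a range of plausible declinations (dates) into a latitude range."""
    low = latitude_estimate_deg(object_height, shadow_length, declination_min_deg)
    high = latitude_estimate_deg(object_height, shadow_length, declination_max_deg)
    return (min(low, high), max(low, high))


def degrees_to_km(degrees: float) -> float:
    """Convert a latitude error in degrees to kilometres."""
    return degrees * KM_PER_DEGREE


if __name__ == "__main__":
    # Hypothetical measurement: a pole whose shadow is as long as it is tall,
    # with the season unknown (declination anywhere from -23.44 to +23.44).
    print(latitude_estimate_deg(1.0, 1.0, 0.0))
    print(latitude_range_deg(1.0, 1.0, -23.44, 23.44))
    print(degrees_to_km(0.5), degrees_to_km(1.0))
```

With the season unknown, the declination term alone spans almost 47°, which is why the prompt asks the model to guess the season from the image before trusting the estimate.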
August 26, 2025 · Original source
Kelsey Piper discusses her new parenting technique: when her young daughter refuses to hear reason, they ask the AI who’s right. The AI says she should listen to her parents, and the child is mollified:
September 04, 2025 · Original source
I tried to see if AI could do this, and it did something that technically met the requirements but had zero artistic merit - using a lot of words like “nowhere” and “outside” in one, then separating them out to “no where” and “out side” in the other. I didn’t invest much energy in creating a clever prompt telling it not to do that, so feel free to report if you get better success.
52: New study claims consultants are actually good, at least for profits: "We find positive effects on labor productivity of 3.6% over five years, driven by modest employment reductions alongside stable or growing revenue"
53: A Polish team tries to test Peter Turchin’s equations for predicting political unrest on recent Polish history, has to make some changes but claims mostly positive results.
54: New big multi-author Substack, The Argument, trying to be a sort of center-left version of the model pioneered by The Free Press and other high-production-value ideological Substack properties. Excited to see Kelsey Piper is involved, and she starts off strong with a post on the latest round of First World basic income studies, which find few positive effects. This is surprising, because recipients didn’t waste the money on alcohol or gambling or anything - they paid down debt and got useful goods. Still, it didn’t even affect things that should have been obvious, like stress level. It’s not even clear that amounts of money large enough to help with rent made homeless people more likely to get houses! Matt Bruenig criticizes the article, accusing Kelsey’s studies of being downstream of Perry Preschool style dreams that exactly the right welfare program will have massively compounding effects that cut poverty out at the root and turn everyone into elite human capital; he thinks giving people money won’t do this, but it will increase equality and give the poor better lives. I assume he’s not a strong hereditarian, but his argument makes even more sense from that perspective, and I’ve certainly criticized dumb outcome measures like infant brain waves which we have only tenuous reasons to think are related to anything we care about. But Kelsey reasonably responds that the outcome measures she’s talking about include stress level and life satisfaction. To defuse this critique, Bruenig either has to argue that our construct “life satisfaction” doesn’t really measure whether someone’s life is satisfactory, or else claim that giving poor people satisfactory lives isn’t really what we’re going for - which I think would require more explanation on his part. There’s some further (impressively acrimonious) debate on X, but I don’t see anything that addresses my core concern. GiveDirectly, a charity involved in basic income experiments, has a presponse here; they say that some studies are positive, and that the ones that aren’t might have tried too little cash to matter, or been confounded by COVID making everything worse. They also point out that basic income is harder to study than traditional programs like giving people housing, because if you’re giving housing you can measure housing-related outcomes directly and have a pretty good chance of getting enough statistical power to find them, but since everyone spends cash on different things, the positive effects might be scattered across many different outcomes (and therefore too small to reach significance on each). Everyone involved in this debate wants to emphasize that the poor results are for First World studies only, and that studies continue to show large benefits to giving cash in the developing world.
55: Related: I was less impressed by The Argument’s first foray into housing policy, which follows an all-too-familiar pattern: Some people say they don’t like noise and disorder and try to make rules against it in their apartments.
October 30, 2025 · Original source
28: Arguably related: Kelsey Piper on the “Mississippi Miracle”, where a new education policy (phonics, accountability, end to social promotion) helped the state go from 49th in the nation to 9th in the nation over twelve years. Freddie deBoer argues that educational miracles are always fake and this one will end out being fake too. Dave Deek makes a subtler point - although some educational miracles are real, they’re usually the product of extremely good leaders who ace tricky implementation details, and so attempts to scale them, which usually just copy the headline policies, don’t work. And Natalie Wexler argues that gains from phonics tend to fade out by middle school, although some of the other Mississippi reforms might last longer. Kelsey pushes back and defends the Mississippi strategy here.
December 10, 2025 · Original source
They started with one village in Malawi (2022), moved up a subdistrict (2023), and are now starting a district-wide experiment; if it goes well, they’ll scale up to the entire country of Malawi (!) in 2027. Preliminary results are positive, with the charity claiming they effectively doubled the economy of their chosen subdistrict (population 85,000) without causing inflation (how can this be?) Related: Asterisk panel with Kelsey Piper on the future of UBI and AI.
February 26, 2026 · Original source
In The Argument, Kelsey Piper gives a good description of the ways that AIs are more than just “next-token predictors” or “stochastic parrots” - for example, they also use fine-tuning and RLHF. But commenters, while appreciating the subtleties she introduces, object that they’re still just extra layers on top of a machine that basically runs on next-token prediction.