Less Wrong
Less Wrong is a recurring publication in the Astral Codex Ten archive, appearing 40 times across 40 issues between August 30, 2020 and February 02, 2026. The archive places it in contexts such as “The rationalist community is a group of people, mostly centered around the website Less Wrong”; “Two other people on Less Wrong”; and “Best of recent Less Wrong”. It most often appears alongside Twitter, Metaculus, and OpenAI.
Metadata
- Category: Publications
- Mention count: 40
- Issue count: 40
- First seen: August 30, 2020
- Last seen: February 02, 2026
Appears In
- You’re Probably Wondering Why I’ve Called You Here Today
- Mantic Monday: Judging April COVID Predictions
- Links For April
- [[issues/2021-04-14_link-unifying-predictive-coding-with_full|[LINK] Unifying Predictive Coding With Backpropagation]]
- Links For May
- Eight Hundred Slightly Poisoned Word Games
- Book Review: The Scout Mindset
- Links For October
- Open Thread 205
- Grading My 2021 Predictions
- Biological Anchors: A Trick That Might Or Might Not Work
- Ukraine Warcasting
- Obscure Pregnancy Interventions: Much More Than You Wanted To Know
- 22
- Links For June
- Links For September 2022
- Unpredictable Reward, Predictable Happiness
- Links For October
- 22
- Who Predicted 2022?
- Links For February 2023
- Atlanta Meetup This Sunday
- Open Thread 282
- The Extinction Tournament
- Highlights From The Comments On Social Model Of Disability
- Links For September 2023
- Pause For Thought: The AI Pause Debate
- Open Thread 307
- Practically-A-Book Review: Rootclaim $100,000 Lab Leak Debate
- Links For April 2024
- Highlights From The Comments On The Lab Leak Debate
- Updates on Lumina Probiotic
- 2024
- Open Thread 359
- Highlights From The Comments On Tegmark’s Mathematical Universe
- Open Thread 383
- Practically-A-Book Review: Byrnes on Trance
- Links For September 2025
- Open Thread 413
- Moltbook: After The First Weekend
Related Pages
- Twitter (13 shared issues)
- Metaculus (9 shared issues)
- OpenAI (8 shared issues)
- Richard Hanania (8 shared issues)
- ACX (7 shared issues)
- China (7 shared issues)
- Eliezer Yudkowsky (6 shared issues)
- Manifold (6 shared issues)
- Reddit (6 shared issues)
- Trump (6 shared issues)
- US (6 shared issues)
- Zvi (6 shared issues)
Source Context
Recovered passages from the original issue text. When the raw archive preserved outbound links inside the source passage, they are listed directly under the quote.
I'm privileged to be at the intersection of a number of communities exploring these concepts. The rationalist community is a group of people, mostly centered around the website Less Wrong, investigating reasoning and probability. The effective altruist community is a group, inspired by philosophers like Will MacAskill and Peter Singer, who work on how best to use charitable resources for the greater good. And I work in psychiatry, which it turns out is pretty relevant to questions about how people end up believing strange things - both as an investigative science and as a terrible warning. These groups aren't always great at reporting their ideas and conclusions to the general public, so I'm here to help. Most of the interesting stuff you see here will be influenced by at least one of them; most of the errors will be mine alone.
Inline links: Less Wrong, effective altruist
Two other people on Less Wrong, Zvi and Bucky, decided to test themselves against me by trying to predict the same questions. Zvi saw my answers beforehand; Bucky didn’t. Here's how we did (except where otherwise stated, all predictions are for 12/31/20):
17: Best of recent Less Wrong: Is Reinforcement Learning Involved In Sensory Processing?, Politics Is Way Too Meta, A Whirlwind Tour Of Ethereum Finance, and reasons why the GPT-3 paper is disappointing.
This is a link to / ad for a great recent Less Wrong post by lsusr, Predictive Coding Has Been Unified With Backpropagation, itself about a recent paper Predictive Coding Approximates Backprop Along Arbitrary Computation Graphs.
3: Best of Less Wrong: Seven Years Of Spaced Repetition Software In The Classroom. Describes a teacher’s experiments with Anki / Supermemo style SRS flashcards; the conclusion is that using them is complicated, they sort of work, but they helped him realize how much of learning isn’t about memorizing things. I appreciated this most for its theory that it’s important to make kids learn specific facts, but not so important that they remember them; teaching someone (eg) Civil War history is “training” a “predictive model” of the Civil War, war in general, and history in general which will survive and remain useful even after the specific facts and battles are long forgotten. I think this is the strongest defense of modern education, given that we do spend lots of time teaching kids things they will definitely forget. But how would you test it?
7: Best of Less Wrong: DARPA Digital Tutor: Four Months To Total Technical Expertise? In 2009, DARPA created a digital tutoring system that could adjust lessons based on students’ strong and weak points. After four months, digitally-tutored IT technicians outperformed experienced professionals in DARPA’s tests. How is this different from existing digital learning software, and could we make equally successful programs for other subjects?
17: Best of Less Wrong: Are We In An AI Overhang? IE a situation where we have almost all the pieces we need to make much smarter AIs than we’re currently making, and once we snap the last piece into place everything will start moving really fast.
Inline links: Are We In An AI Overhang
I was excited to read the Less Wrong post Chess and cheap ways to check day to day variance in cognition by KPier, who does something similar with chess instead of a word game; they haven’t checked carbon dioxide levels yet, but I’d be excited for them to try. I’m also interested in hearing from anyone else who often repeats some objectively-scoreable cognitive task, to see how they do. A CO2 monitor costs about $100 on Amazon, but if money is the only reason you’re not going to do some really good experiment, please let me know and I’ll buy it for you.
But Galef earned her celebrity status honestly, through long years of hard labor in the rationality mines. Back in ~2007, a bunch of people interested in biases and decision-making joined the “rationalist community” centered around the group blogs Overcoming Bias and Less Wrong. Around 2012, they mostly left to do different stuff. Some of them went into AI to try to save the world. Others went into effective altruism to try to revolutionize charity. Some, like me, got distracted and wrote a few thousand blog posts on whatever shiny things happened to catch their eyes. But a few stuck around and tried to complete the original project. They founded a group called the Center For Applied Rationality (aka “CFAR”, yes, it’s a pun) to try to figure out how to actually make people more rational in the real world.
Inline links: Overcoming Bias, Less Wrong
You’ve probably heard the probabilistic (aka Bayesian) side of things before. Instead of thinking “I’m sure global warming is fake!”, try to think in terms of probabilities (“I think there’s a 90% chance global warming is fake.”) Instead of thinking in terms of changing your mind (“Should I surrender my belief, and switch to my enemy’s belief that global warming is true”), think in terms of updating your probabilities (“Now I’m only 70% sure that global warming is fake”). This mindset makes it easier to remember that it’s not a question of winning or losing, but a question of being as accurate as possible. Someone who updates from 90% to 70% is no more or less wrong or embarrassing than someone who updates from 60% to 40%.
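The updating step described above can be sketched as a single Bayesian update in odds form. This is only an illustration: the function name `update` and the likelihood ratio of 7/27 are assumptions, not part of the quoted passage. The ratio is chosen so that a 90% belief lands at 70%.

```python
def update(prior: float, likelihood_ratio: float) -> float:
    """Return the posterior probability after one Bayesian update in odds form."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Assumed evidence strength: the observation is 7/27 times as likely
# if the belief is true as it is if the belief is false.
likelihood_ratio = 7 / 27

posterior = update(0.90, likelihood_ratio)
print(f"90% prior becomes {posterior:.0%}")
```

The belief is never flipped from "true" to "false"; it is only rescaled by how strongly the evidence favors one side, which is the sense in which updating differs from surrendering a position.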
28: Leverage Research is a nonprofit at the edges of my social circle in the Bay Area. A new essay argues that they are kind of a harmful cult. A lot of the more outrageous parts are new to me (especially the part with the demons) but I can confirm that they constantly insist they have “solved psychology” when in fact they’ve just come up with a mildly-invigorating self-help technique, same as every other cult in California. Here’s a Less Wrong post making more or less the same accusations, and here’s a response by a Leverage employee. The version of Leverage described in the essay is mostly defunct (I think?), so this isn’t an emergency, but I agree with its conclusion that people need to stop giving Geoff Anders more money and power.
2: Looking for a Chri…fine, sorry, looking for a Martin Luther King Day gift this year for the rationalist in your life? Engines Of Cognition is a Best Of Less Wrong 2019 book collection out now including essays by me, Zvi, Eliezer, and 30+ other writers. Yes, all the art is AI-generated; it seemed appropriate.
Inline links: Engines Of Cognition
COMMUNITY 33. Major rationalist org leaves Bay Area: 60% 34. MIRI relocates to Washington State: 20% 35. MIRI relocates to New England: 20% 36. MIRI relocates somewhere else: 20% 37. Less Wrong team relocates: 30% 38. No new residents at our housing cluster: 40% 39. No current residents leave our housing cluster: 60% 40. [friend] goes back to Indiana: 40% 41. [friend] is in a primary relationship: 50% 42. [friend] is in a primary relationship: 30% 43. [friend] is in a primary relationship: 20% 44. [friend] has gotten [job]: 50% 45. [friend] has recovered their health: 70% 46. [friend] has gotten egg freezing: 30% 47. [friend] is pregnant: 70% 48. [friends] are still together: 50% 49. [friend] is still at [job]: 80% 50. [friend] is in college: 60% 51. [friends] live in [house]: 30% 52. [other friends] live in [house]: 30% 53. At least 7 days my house is orange or worse on PurpleAir.com because of fires: 80%
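Probability-tagged predictions like these can be graded once the outcomes are known. The sketch below uses the Brier score (squared error against a 0/1 outcome); that choice of scoring rule is an assumption, since the passage does not say how the predictions were scored. The outcomes are not given here either, so the example prints the score for both possible results instead of guessing.

```python
def brier(probability: float, happened: bool) -> float:
    """Squared error between a stated probability and the 0/1 outcome (lower is better)."""
    outcome = 1.0 if happened else 0.0
    return (probability - outcome) ** 2


# Probabilities copied from the list above; the selection of three is arbitrary.
predictions = {
    "Major rationalist org leaves Bay Area": 0.60,
    "MIRI relocates to Washington State": 0.20,
    "Less Wrong team relocates": 0.30,
}

for claim, p in predictions.items():
    print(f"{claim}: if true {brier(p, True):.2f}, if false {brier(p, False):.2f}")
```

A forecaster who always says 50% scores 0.25 on every question under this rule, which gives a simple baseline to compare against.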
Simon M did a similar exercise on Less Wrong, and compared me to Zvi and to various prediction markets. This was slightly biased against me, because Zvi got to see my guesses first and choose which ones to adjust on, and the markets are the markets. Still, he found:
Inline links: did a similar exercise on Less Wrong
Play pro-level Go using 8-16 times as much computing power as AlphaGo, but only 2006 levels of technology. For reference, recall that in 2006, Hinton and Salakhutdinov were just starting to publish that, by training multiple layers of Restricted Boltzmann machines and then unrolling them into a "deep" neural network, you could get an initialization for the network weights that would avoid the problem of vanishing and exploding gradients and activations. At least so long as you didn't try to stack too many layers, like a dozen layers or something ridiculous like that. This being the point that kicked off the entire deep-learning revolution. Your model apparently suggests that we have gotten around 50 times more efficient at turning computation into intelligence since that time; so, we should be able to replicate any modern feat of deep learning performed in 2021, using techniques from before deep learning and around fifty times as much computing power.
OpenPhil: No, that's totally not what our viewpoint says when you backfit it to past reality. Our model does a great job of retrodicting past reality.
Eliezer: How so?
OpenPhil: <Eliezer cannot predict what they will say here.>
I think the argument here is that OpenPhil is accounting for normal scientific progress in algorithms, but not for paradigm shifts.
Directional Error
These are the two arguments Eliezer makes against OpenPhil that I find most persuasive. First, that you shouldn’t be using biological anchors at all. Second, that unpredictable paradigm shifts are more realistic than gradual algorithmic progress.
These mostly add uncertainty to OpenPhil’s model, but Eliezer ends his essay making a stronger argument: he thinks OpenPhil is directionally wrong, and AI will come earlier than they think.
Mostly this is the paradigm argument again. Five years from now, there could be a paradigm shift that makes AI much easier to build. It’s happened before; from GOFAI’s pre-programmed logical rules to Deep Blue’s tree searches to the sorts of Big Data methods that won the Netflix Prize to modern deep learning. Instead of just extrapolating deep learning scaling thirty years out, OpenPhil should be worried about the next big idea.
Hypothetical OpenPhil retorts that this is a double-edged sword. Maybe the deep learning paradigm can’t produce AGI, and we’ll have to wait decades or centuries for someone to have the right insight. Or maybe the new paradigm you need for AGI will take more compute than deep learning, in the same way deep learning takes more compute than whatever Moravec was imagining.
This is a pretty strong response, since it would have been true for every previous forecaster: remember, Moravec erred in thinking AI would come too soon, not too late. So although Eliezer is taking the cheap shot of saying OpenPhil’s estimate will be wrong just as everyone else’s was wrong before, he’s also giving himself the much harder case of arguing it might be wrong in the opposite direction as all its predecessors.
Eliezer takes this objection seriously, but feels like on balance probably new paradigms will speed up AI rather than slow it down. Here he grudgingly and with suitable embarrassment does try to make an object-level semi-biological-anchors-related argument: Moravec was wrong because he ignored the training phase. And the proper anchor for the training phase is somewhere between evolution and a human childhood, where evolution represents “blind chance eventually finding good things” and human childhood represents “an intelligent cognitive engine trying to squeeze as much data out of experience as possible”. And part of what he expects paradigm shifts to do is to move from more evolutionary processes to more childhood-like processes, and that’s a net gain in efficiency. So he still thinks OpenPhil’s methods are more likely to overestimate the amount of time until AGI rather than underestimate it.
What Moore’s Law Giveth, Platt’s Law Taketh Away
Eliezer’s other argument is kind of a low blow: he refers to Platt’s Law Of AI Forecasting: “any AI forecast will put strong AI thirty years out from when the forecast is made.”
This isn’t exact. Hans Moravec, writing in 1988, said 2010 - so 22 years. Ray Kurzweil, writing in 2001, said 2023 - another 22 years. Vernor Vinge, in a 1993 speech, said 2023, and that was exactly 30 years, but Vinge knew about Platt’s Law and might have been joking.
The point is: OpenPhil wrote a report in 2020 that predicted strong AI in 2052, isn’t that kind of suspicious?
I’d previously mentioned it as a plus that Ajeya got around the same year everyone else got. The forecasters on Metaculus. The experts surveyed in Grace et al. Lots of other smart experts with clever models. But what if all of these experts and models and analyses are just fudging the numbers for the same Platt’s-Law-related reasons?
Hypothetical OpenPhil is BTFO:
OpenPhil: That part about Charles Platt's generalization is interesting, but just because we unwittingly chose literally exactly the median that Platt predicted people would always choose in consistent error, that doesn't justify dismissing our work, right? We could have used a completely valid method of estimation which would have pointed to 2050 no matter which year it was tried in, and, by sheer coincidence, have first written that up in 2020. In fact, we try to show in the report that the same methodology, evaluated in earlier years, would also have pointed to around 2050 -
Eliezer: Look, people keep trying this. It's never worked. It's never going to work. 2 years before the end of the world, there'll be another published biologically inspired estimate showing that AGI is 30 years away and it will be exactly as informative then as it is now. I'd love to know the timelines too, but you're not going to get the answer you want until right before the end of the world, and maybe not even then unless you're paying very close attention. Timing this stuff is just plain hard.
Part III: Responses And Commentary
Response 1: Less Wrong Comments
Less Wrong is a site founded by Eliezer Yudkowsky for Eliezer Yudkowsky fans who wanted to discuss Eliezer Yudkowsky’s ideas. So, for whatever it’s worth - the comments on his essay were pretty negative.
Carl Shulman, an independent researcher with links to both OpenPhil and MIRI (Eliezer’s org), writes the top-voted comment. He works from a model where there is hardware progress, software progress downstream of hardware progress, and independent (ie unrelated to algorithms) software progress, and where the first two make up most progress on the margin. Researchers generally develop new paradigms once they have enough compute available to tinker with them.
Progress in AI has largely been a function of increasing compute, human software research efforts, and serial time/steps. Throwing more compute at researchers has improved performance both directly and indirectly (e.g. by enabling more experiments, refining evaluation functions in chess, training neural networks, or making algorithms that work best with large compute more attractive).
Historically compute has grown by many orders of magnitude, while human labor applied to AI and supporting software by only a few. And on plausible decompositions of progress (allowing for adjustment of software to current hardware and vice versa), hardware growth accounts for more of the progress over time than human labor input growth.
So if you're going to use an AI production function for tech forecasting based on inputs (which do relatively OK by the standards tech forecasting), it's best to use all of compute, labor, and time, but it makes sense for compute to have pride of place and take in more modeling effort and attention, since it's the biggest source of change (particularly when including software gains downstream of hardware technology and expenditures).
[…]
A perfectly correlated time series of compute and labor would not let us say which had the larger marginal contribution, but we have resources to get at that, which I was referring to with 'plausible decompositions.' This includes experiments with old and new software and hardware, like the chess ones Paul recently commissioned, and studies by AI Impacts, OpenAI, and Neil Thompson. There are AI scaling experiments, and observations of the results of shocks like the end of Dennard scaling, the availability of GPGPU computing, and Besiroglu's data on the relative predictive power of computer and labor in individual papers and subfields.
In different ways those tend to put hardware as driving more log improvement than software (with both contributing), particularly if we consider software innovations downstream of hardware changes.
Vanessa Kosoy makes the obvious objection, which echoes a comment of Eliezer’s in the dialogue above:
I'm confused how can this pass some obvious tests. For example, do you claim that alpha-beta pruning can match AlphaGo given some not-crazy advantage in compute? Do you claim that SVMs can do SOTA image classification with not-crazy advantage in compute (or with any amount of compute with the same training data)? Can Eliza-style chatbots compete with GPT3 however we scale them up?
Mark Xu answers:
My model is something like: For any given algorithm, e.g. SVMs, AlphaGo, alpha-beta pruning, convnets, etc., there is an "effective compute regime" where dumping more compute makes them better. If you go above this regime, you get steep diminishing marginal returns.
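The Platt's Law arithmetic in the passage above can be checked directly. The sketch below only restates the years given in the text (forecast year and predicted year for each forecaster) and computes the gap between them; the variable names are illustrative.

```python
# (forecaster, year the forecast was made, year predicted for strong AI),
# all taken from the quoted passage.
forecasts = [
    ("Hans Moravec", 1988, 2010),
    ("Ray Kurzweil", 2001, 2023),
    ("Vernor Vinge", 1993, 2023),
    ("OpenPhil", 2020, 2052),
]

# Gap in years between when each forecast was made and the date it names.
gaps = {name: predicted - made for name, made, predicted in forecasts}

for name, gap in gaps.items():
    print(f"{name}: {gap} years out")
```

The gaps come out at 22, 22, 30, and 32 years, which is why the passage calls the law inexact while still treating the 2020 report's 2052 estimate as suspicious.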
A bunch of leftists - Michael Tracey, Matt Taibbi, Glenn Greenwald - failed because they couldn’t believe that warmongering intelligence officials trying to scare everyone about Russia had a point. They admittedly had great heuristics: there are lots of warmongers, our intelligence community has been really wrong lots of times before, and the past few years have seen a lot of really embarrassing Russia-related paranoia. Unfortunately, the relevant Less Wrong post here is Reversed Stupidity Is Not Intelligence, and the relevant ACX post is Heuristics That Almost Always Work, so they failed.
See also this Less Wrong post on the study mentioned above.
Inline links: Less Wrong post
Or is there another explanation? A lot of AI forecasters on Metaculus are Less Wrong readers; we know that the Less Wrong Yudkowsky/Christiano debate on takeoff speeds moved the relevant Metaculus question a few percent:
Inline links: Yudkowsky/Christiano debate on takeoff speeds
Early this month on Less Wrong, Eliezer Yudkowsky posted MIRI Announces New Death With Dignity Strategy, where he said that after a career of trying to prevent unfriendly AI, he had become extremely pessimistic, and now expects it to happen in the relatively near-term and probably kill everyone. This caused the Less Wrong community, already pretty dedicated to panicking about AI, to redouble its panic. Although the new announcement doesn’t really say anything about timelines that hasn’t been said before, the emotional framing has hit people a lot harder.
Inline links: MIRI Announces New Death With Dignity Strategy
53: Douglas Hofstadter published a recent article pointing out that GPT-3 gives straight answers to silly questions - for example, if you ask when Egypt was transported across the Golden Gate Bridge, it will guess 2017. Rictic on Less Wrong demonstrates that if you ask it nicely to not do this, and instead to call you out when you ask silly questions, it’s perfectly able to do that.
Inline links: a recent article, Rictic on Less Wrong
22: Steven Byrnes on Less Wrong: I’m Mildly Skeptical That Blindness Prevents Schizophrenia. There’s an old piece of trivia that no congenitally blind person has ever been schizophrenic (I talk about it here). Steven is able to track down a few cases of this happening, and speculates that given how rare both conditions are, maybe these few cases are all we would expect to find. Since I previously wrote about this, I’ve provisionally added it to my Mistakes Page.
30: Less Wrong: Language Models Seem To Be Much Better Than Humans At Next Token Prediction. Remember, language AIs aren’t “trying” to speak fluently, they’re technically “trying” to predict the next token (eg letter or number) in a text. They’re still worse than humans at speaking fluently, but nobody had formally checked whether they were better or worse than humans at their own goal. Turns out they’re much better.
I recently read TurnTrout’s Reward Is Not The Optimization Target on Less Wrong. It’s technically about AI, but half the useful things I’ve learned about psychology recently have started out being about AI, so let’s not hold that against it.
Inline links: Reward Is Not The Optimization Target
3: Harsimony on Less Wrong: Georgism . . . In Space! “Extending the Georgist paradigm into space neatly solves problems with sharing resources and ensures that colonization proceeds at an appropriate pace.”
Inline links: Georgism . . . In Space!
1: Less Wrong’s Petrov Day celebration caused prediction-market-related drama.
Inline links: caused prediction-market-related drama
Along with looking at individuals, we also tried to figure out which groups did the best - whether there were any demographic characteristics that reliably predicted good forecasting. Not really. Liberals didn’t outperform conservatives, old people didn’t outperform young people, nothing like that [6]. Users of the website Less Wrong, which tries to teach a prediction-focused concept of rationality, didn’t significantly outperform others, which disappointed me.
Inline links: 6
In theory this also paves the way for human meat, though regulators might have other ideas.
2: Eight years ago I wrote an article about how the government should stop restricting doctors’ ability to prescribe suboxone, a useful medicine for opioid abuse. Last month, the government finally stopped the restrictions. Good for them!
3: Carl Sagan married three times. His first wife was legendary biologist Lynn Margulis, who discovered mitochondrial endosymbiosis, then went off the deep end and became an AIDS denialist and 9/11 truther. His second wife drew the Pioneer plaque. His third wife was one of the women who designed the Voyager golden record.
4: Claim: Chinese sources seem to back this up (and related BBC), but I’m skeptical: is this really the best way to satisfy a “must fight with medieval weapons” constraint? Why not crossbows?
5: Did you know: Alex Berenson, who runs the most popular anti-vaccine Substack, has had an unusual career: he used to be an investigative reporter for the New York Times, and also wrote a series of bestselling spy novels.
6: Less Wrong: I Converted Book 1 Of The Less Wrong Sequences Into A Zoomer-Readable Format. Apparently there’s a thing where Zoomers are supposedly more likely to learn a text if you overlay it on a fast-paced video game, example here.
7: By this point we’ve probably all heard stories about people who win the lottery and then end up bankrupt and miserable after X months or years. I had always assumed this was limited to very poor people with no understanding of money. This forum post argues it’s not, and tells the story of a man who started out with $15 million and still ruined his life after winning $170 million more in the lottery.
8: Did you know: Exiliarch Mar-Zutra II was a 5th century Jewish leader who took advantage of the chaos caused by weird Zoroastrian communists to secede and turn the city of Al-Mada’in, Iraq into an independent Jewish state for seven years.
9: Why doesn’t the Supreme Court have vice-justices?
10: Steve Sailer (warning: unz.com, far-right site, some firewalls will flag or block it): why aren’t there more gay English soccer players? Thousands of current or recent English pro soccer players, the media is really interested in finding a gay one so they can run a “Historic First” article, and apparently they can’t. There are rumors that players are afraid to come out because of homophobia, but there are at least 2,000 retired soccer players and only one of them has come out as gay. “I’m increasingly sympathetic to [the] theory that whatever psychosocial traits make men highly interested in team sports make them highly heterosexual too”. Is this true of other countries and other sports?
11: Adam Tooze on the demographic background to Iran’s protests. Iran thought it was facing an overpopulation crisis in the 80s and tried some reforms to lower family size. The reforms worked overwhelmingly well, causing “the most dramatic transition ever recorded in demographic history”, from 6.5 to 2.5 children per woman in thirty years. Iran now has “lower maternal mortality than the US”, and an education system where “women in university outnumber males”. This kind of demography isn’t usually compatible with patriarchal religious institutions, and the Ayatollahs are aware of this; in a rare admission of error, Khameini said that “Government officials were wrong on this matter, and I, too, had a part. . . . May God and history forgive us.” Now they’re trying to increase average family size and put the genie back in the bottle; Hungary can tell them about the limits of that strategy.
12: What it looks like to be on shrooms: I haven’t used shrooms myself so cannot confirm or deny, but this is oddly compelling, and makes some things I’ve read about neuroscience of vision make more sense. I wonder if you could get HPPD from watching videos like this for too long.
13: Study: federal cancer funding is extraordinarily effective. Cancer research produces so many valuable treatments that it saves one DALY per $326 spent. For comparison, health systems usually consider an intervention good value-for-money if it saves at least one DALY per $50,000. By combing the Earth far and wide, effective altruists have tentatively found one or two opportunities in the poorest parts of Africa to save lives at $100/DALY, but these are extremely rare exceptions and I wouldn’t have expected anything in the US to be within an order of magnitude of that. Either this finding is fake, or we should all be donating to federal cancer research instead of whatever else we’re doing.
14: Yet another person building a vast theory of human interaction off of the characters in The Office. This one is pretty good, also name-drops Bobos In Paradise. I’m still surprised this is such a common thing.
15: Marginal Revolution: FDA Deregulation Increases Safety And Innovation And Reduces Prices. Study looks at what happens when the FDA reclassifies medical devices from a highly-regulated to a less-highly-regulated category; in general, those devices get better, cheaper, and there are somewhere between similar and fewer deaths/injuries related to those devices. Why would safety increase? The author suggests that regulation is a defense against lawsuits (“Your Honor, the FDA agreed to approve our device, so it can’t have been bad!”), and removing that defense makes companies more lawsuit-conscious and careful; Alex Tabarrok suggests a bigger effect may be allowing more innovation towards safer versions.
16: Ozy writes about Interesting People Of History: Charles Williams (ie the other member of the Inklings)
17: Did you know: the Congressman who founded the House Committee On Un-American Activities was, in fact, a paid Soviet spy (tweet, Wiki article). This actually makes sense; he originally started HUAC to root out fascists, and it only got turned against communists later on. “There has been a push to rename the street [currently named after the Soviet spy], but as of 2018 it has been unsuccessful.”
18: Idle Words: Why Not Mars? Surprisingly strong argument for why sending humans to Mars is harder than people think, of minimal scientific value, and likely to contaminate all future searches for microbial life and ruin our chance to study the topic. Concludes that we should abandon the allure of human space travel and just send probes everywhere. This makes short-term sense, but I wonder what this author’s vision of the future is - do we just stay on Earth forever? If not, don’t we have to start trying to do the hard thing at some point? (I don’t care about this because I assume AI will flip the gameboard one way or another, but Ceglowski is a noted singularity skeptic and should probably have opinions about long-term things).
19: Metacelsus and Razib on epigenetics. Stop using it to claim there’s “intergenerational trauma”!
20: Tafl games are a family of European games, played in areas as diverse as Iceland, Ireland, Britain, and Denmark, probably sharing descent from a now-lost board game of ancient Rome. One of them, Hnetafl, was the chief board game of the Vikings and is affectionately called “Viking chess”. The one we actually know the rules for is the Saami version, Tablut, which survived long enough for Linnaeus (the taxonomy guy!) to write down the rules.
21: Shot: Chaser: (source)
22: Related: the very center of GPT’s embedding space contains a few unusual tokens including the string “SolidGoldMagikarp”. GPT displays anomalous behavior if these tokens are inserted in a query; for example, it treats “SolidGoldMagikarp” as the word “distribute”. ChatGPT is pretty advanced and fails semi-gracefully here; GPT-2’s reaction to these tokens is more disturbing: (source: Less Wrong) Further investigation determined that many of these tokens are the screen names of a group of Redditors who attempted to count to infinity. The most likely explanation, according to the discoverers, is that these names were in GPT’s tokenization data, but not its training data (maybe they were especially common in the tokenization data because they made thousands of posts with numbers in them, but didn’t make it into the training data because their posts had no content?) - that leaves them existing without content, and GPT tries to round them off to some other “nearby” token (by incomprehensible AI standards of nearbyness). Congrats to the SERI-MATS AI alignment researchers who found all of this; maybe this makes it 0.0001% less likely that the AI which controls the nuclear arsenal in twenty years will have equally inexplicable behavior.
23: More language model news: LLM that understands and can explain images
Inline links: I wrote an article about how, finally stopped the restrictions, married three times, Lynn Margulis, drew the, Chinese, and related BBC, Alex Berenson, I Converted Book 1 Of The Less Wrong Sequences Into A Zoomer-Readable Format, example here, This forum post argues it’s not, Exiliarch Mar-Zutra II, weird Zoroastrian communists, vice-justices, why aren’t there more gay English soccer players?, Adam Tooze on the demographic background to Iran’s protests, federal cancer funding is extraordinarily effective, Yet another person building a vast theory of human interaction off of the characters in, FDA Deregulation Increases Safety And Innovation And Reduces Prices, Interesting People Of History: Charles Williams, tweet, Wiki article, Why Not Mars?, Metacelsus, Razib, Tafl games, Tablut, https://substackcdn.com/image/fetch/$s_!EhT-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39755d06-3101-44d5-bb50-0a6b99155fae_807x776.png, source, contains a few unusual tokens, https://substackcdn.com/image/fetch/$s_!rkvW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdeb0cd39-6942-44c0-85ef-0fac91b362e3_536x147.png, https://substackcdn.com/image/fetch/$s_!QIzo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba1d0783-5826-48a3-9ead-1e803042e96d_622x631.png, Less Wrong, many of these tokens are the screen names of a group of Redditors who attempted to count to infinity, SERI-MATS, LLM that understands and can explain images
Who: Everyone is welcome, even if they’re new, disagree with the blog, “not the typical reader”, etc. The organizers ask that you RSVP at meetup.com or on Less Wrong.
Inline links: meetup.com, Less Wrong
4: Lightcone is a team that operates important AI alignment and rationalist community infrastructure, including the Less Wrong website, the Alignment Forum, and the Rose Garden Inn (a venue for various alignment-related conferences and projects - also where we have Berkeley ACX meetups!) They're running low on money due to Rose Garden renovations being unexpectedly expensive and grants being unexpectedly thin, and are asking for a few 6+ figure grants to help tide them through this difficult period. If you're a wealthy person or grantmaker interested in AI alignment, see here for more information, or contact me at scott@slatestarcodex.com if you have questions, or get in touch with the head of Lightcone directly at habryka@lesswrong.com.
Inline links: see here for more information
There are centuries’ worth of data on non-genetically-engineered plagues to give us base rates; these give us a base rate of ~25% per century = 20% between now and 2100. But we have better epidemiology and medicine than most of the centuries in our dataset. The experts said 8% chance and the superforecasters said 4% chance, and both of those seem like reasonable interpretations of the historical data to me. The “WHO declares emergency” question is even easier - just look at how often it’s done that in the past and extrapolate forward. Both superforecasters and experts mostly did that. Likewise, lots of scientists have put a lot of work into modeling the climate, there aren’t many surprises there, and everyone basically agreed on the extent of global warming: Wherever there was clear past data, both superforecasters and experts were able to use it correctly and get similar results. It was only when they started talking about things that had never happened before - global nuclear war, bioengineered pandemics, and AI - that they started disagreeing. Were the participants out of their depth? Peter McCluskey, one of the more-AI-concerned superforecasters in the tournament, wrote about his experience on Less Wrong. Quoting liberally: I signed up as a superforecaster. My impression was that I knew as much about AI risk as any of the subject matter experts with whom I interacted (the tournament was divided up so that I was only aware of a small fraction of the 169 participants). I didn't notice anyone with substantial expertise in machine learning. Experts were apparently chosen based on having some sort of respectable publication related to AI, nuclear, climate, or biological catastrophic risks. Those experts were more competent, in one of those fields, than news media pundits or politicians. I.e. they're likely to be more accurate than random guesses. But maybe not by a large margin […] The persuasion seemed to be spread too thinly over 59 questions. 
In hindsight, I would have preferred to focus on core cruxes, such as when AGI would become dangerous if not aligned, and how suddenly AGI would transition from human levels to superhuman levels. That would have required ignoring the vast majority of those 59 questions during the persuasion stages. But the organizers asked us to focus on at least 15 questions that we were each assigned, and encouraged us to spread our attention to even more of the questions […] Many superforecasters suspected that recent progress in AI was the same kind of hype that led to prior disappointments with AI. I didn't find a way to get them to look closely enough to understand why I disagreed. My main success in that area was with someone who thought there was a big mystery about how an AI could understand causality. I pointed him to Pearl, which led him to imagine that problem might be solvable. But he likely had other similar cruxes which he didn't get around to describing. That left us with large disagreements about whether AI will have a big impact this century. I'm guessing that something like half of that was due to a large disagreement about how powerful AI will be this century. I find it easy to understand how someone who gets their information about AI from news headlines, or from laymen-oriented academic reports, would see a fair steady pattern of AI being overhyped for 75 years, with it always looking like AI was about 30 years in the future. It's unusual for an industry to quickly switch from decades of overstating progress, to underhyping progress. Yet that's what I'm saying has happened. I've been spending enough time on LessWrong that I mostly forgot the existence of smart people who thought recent AI advances were mostly hype. I was unprepared to explain why I thought AI was underhyped in 2022. Today, I can point to evidence that OpenAI is devoting almost as much effort into suppressing abilities (e.g. 
napalm recipes and privacy violations) as it devotes to making AIs powerful. But in 2022, I had much less evidence that I could reasonably articulate. What I wanted was a way to quantify what fraction of human cognition has been superseded by the most general-purpose AI at any given time. My impression is that that has risen from under 1% a decade ago, to somewhere around 10% in 2022, with a growth rate that looks faster than linear. I've failed so far at translating those impressions into solid evidence. Skeptics pointed to memories of other technologies that had less impact (e.g. on GDP growth) than predicted (the internet). That generates a presumption that the people who predict the biggest effects from a new technology tend to be wrong. > Superforecasters' doubts about AI risk relative to the experts isn't primarily driven by an expectation of another "AI winter" where technical progress slows. ... That said, views on the likelihood of artificial general intelligence (AGI) do seem important: in the postmortem survey, conducted in the months following the tournament, we asked several conditional forecasting questions. The median superforecaster's unconditional forecast of AI-driven extinction by 2100 was 0.38%. When we asked them to forecast again, conditional on AGI coming into existence by 2070, that figure rose to 1%. There was also little or no separation between the groups on the three questions about 2030 performance on AI benchmarks (MATH, Massive Multitask Language Understanding, QuALITY). This suggests that a good deal of the disagreement is over whether measures of progress represent optimization for narrow tasks, versus symptoms of more general intelligence. The “won’t understand causality” and “what if it’s all hype” objections really don’t impress me. 
Many of the people in this tournament hadn’t really encountered arguments about AI extinction before (potentially including the “AI experts” if they were just eg people who make robot arms or something), and a couple of months of back and forth discussion in the middle of a dozen other questions probably isn’t enough for even a smart person to wrap their brain around the topic. Was this tournament done so long ago that it has been outpaced by recent events? The tournament was conducted in summer 2022. This was before ChatGPT, let alone GPT-4. The conversation around AI noticeably changed pitch after these two releases. Maybe that affected the results? In fact, the participants have already been caught flat-footed on one question: A recent leak suggested that the cost of training GPT-4 was $63 million, which is already higher than the superforecasters’ median estimate of $35 million by 2024. I don’t know how many petaFLOP-days were involved in GPT-4, but maybe that one is already off also. There was another question on when an AI would pass a Turing Test. The superforecasters guessed 2060, the domain experts 2045. GPT-4 hasn’t quite passed the exact Turing Test described in the study, but it seems very close, so much so that we seem on track to pass it by the 2030s. Once again the experts look better than the superforecasters. So is it possible that we, in 2023, now have so much better insight into AI than the 2022 forecasters that we can throw out their results? We could investigate this by looking at Metaculus, a forecasting site that’s probably comparably advanced to this tournament. They have a question suspiciously similar to XPT’s global catastrophe framing: In summer 2022, the Metaculus estimate was 30%, compared to the XPT superforecasters’ 9% (why the difference? maybe because Metaculus is especially popular with x-risk-pilled rationalists). Since then it’s gone up to 38%. 
Over the same period, Metaculus estimates of AI catastrophe risk went from 6% to 15%. If the XPT superforecasters’ probabilities rose linearly by the same factor as Metaculus forecasters’, they might be willing to update total global catastrophe risk to 11% and AI catastrophe risk to 5%. But the main thing we’ve updated on since 2022 is that AI might be sooner. But most people in the tournament already agreed we would get AGI by 2100. The main disagreement was over whether it would cause a catastrophe once we got it. You could argue that getting it sooner increases that risk, since we’ll have less time to work on alignment. But I would be surprised if the kind of people saying the risk of AI extinction is 0.4% are thinking about arguments like that. So maybe we shouldn’t expect much change. FRI called back a few XPT forecasters in May 2023 to see if any of them wanted to change their minds, but they mostly didn’t. Overall I don’t think this was just a problem of the incentives being bad or the forecasters being stupid. This is a real, strong disagreement. We may be able to slightly increase their forecast based on recent events, but this would only change the estimate a little. Breaking Down The AI Estimate How did the forecasters arrive at their AI estimate? What were the cruxes between the people who thought AI was very dangerous, and the people who thought it wasn’t? You can think of AI extinction as happening in a series of steps: We get human-level AI by 2100.
Inline links: https://substackcdn.com/image/fetch/$s_!KJ84!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1f1c4fd-5981-458c-959f-bf9a19ff28da_801x129.png, wrote about his experience, Pearl, https://substackcdn.com/image/fetch/$s_!CfZT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2362d361-ae0a-4e4f-ad97-cbeb1fcbe827_817x351.png, the cost of training GPT-4 was $63 million, Metaculus, a question, https://substackcdn.com/image/fetch/$s_!k5Ep!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09cb5518-f8d1-4a98-8c44-97158857dbd8_772x364.png
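Two pieces of arithmetic in the excerpt above can be checked with a short sketch: turning a per-century base rate into a probability for the roughly 77 years left until 2100, and scaling one forecast by the same factor another forecast moved. The function names are hypothetical, the constant-hazard conversion is an assumption about how the ~20% figure was reached, and the 2% starting value for AI catastrophe risk is only back-calculated from the 5% result quoted in the text, not taken from the tournament report.

```python
def partial_period_probability(p_per_century, years):
    """Probability of at least one event in `years`, assuming a constant
    hazard that yields `p_per_century` over 100 years (an assumption)."""
    return 1 - (1 - p_per_century) ** (years / 100)


def scale_by_same_factor(own_before, ref_before, ref_after):
    """Scale `own_before` by the factor the reference forecast moved."""
    return own_before * (ref_after / ref_before)


# ~25% per century, applied to the ~77 years between 2023 and 2100.
plague_by_2100 = partial_period_probability(0.25, 77)

# Metaculus total catastrophe risk moved 30% -> 38%; XPT superforecasters started at 9%.
total_risk_updated = scale_by_same_factor(9, 30, 38)

# Metaculus AI catastrophe risk moved 6% -> 15% (a factor of 2.5).
# Back-calculated starting value implied by the 5% result quoted in the text.
implied_ai_start = 5 / (15 / 6)

print(f"plague by 2100: {plague_by_2100:.1%}")
print(f"updated total catastrophe risk: {total_risk_updated:.1f}%")
print(f"implied starting AI catastrophe risk: {implied_ai_start:.1f}%")
```

Under these assumptions the base rate comes out just under 20% and the scaled total catastrophe risk is a little over 11%, which matches the rounded figures in the excerpt.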
And on Less Wrong, DirectedEvolution posted another Contra Contra The Social Model Of Disability. Their summary:
Inline links: Contra Contra The Social Model Of Disability
40: Best of new Less Wrong: The Talk. Why does sex exist? Why do so many living things have two sexes, instead of some other number? Why do the sexes have differently shaped gametes? Why do species that have sex correlate so closely with species that have mitochondria? And other sexy questions.
Inline links: Best of new Less Wrong: The Talk
HOW LONG TO PAUSE. The biggest disadvantage of pausing for a long time is that it gives bad actors (eg China) a chance to catch up. Suppose the West is right on the verge of creating dangerous AI, and China is two years away. It seems like the right length of pause is 1.9999 years, so that we get the benefit of maximum extra alignment research and social prep time, but the West still beats China. Obviously the problem with the Surgical Pause is that we might not know when we’re on the verge of dangerous AI, and we might not know how much of a lead “the good guys” have. Surgical Pause proponents suggest being very conservative with both free variables. This is less of a well-thought-out plan and more saying “come on guys, let’s at least try to be strategic here”. At the limit, it suggests we probably shouldn’t pause for six months, starting right now. Since this involves leading labs burning their lead time for safety, in theory it could be done unilaterally by the single leading lab, without international, governmental, or even inter-lab coordination. But you could buy more time if you got those things too. Some leading labs have promised to do this when the time is right - for example OpenAI and (a previous iteration of) DeepMind - with varying levels of believability. AnonResearcherAtMajorAILab discussed some of the strategy here in Aim For Conditional AI Pauses, and this Less Wrong post is also very good. Regulatory Pause: If one benefit of the Simple Pause is to use the time to prepare for AI socially and politically, maybe we should just pause until we’ve completed social and political preparations. David Manheim suggests a monitoring agency like the FDA. It would “fast-track” small AIs and trivial re-applications of existing AIs, but carefully monitor new “frontier models” for signs of danger. 
Regulators might look for dangerous capabilities by asking AIs to hack computers or spread copies of themselves, or test whether they’ve been programmed against bias/misinformation/etc. We could pause only until we’ve set up the regulatory agency, and take hostile actions (like restrict chip exports) only to other countries that don’t cooperate with our regulators or set up domestic regulators of their own. Many people in tech are regulation-skeptical libertarians, but proponents point out that regulation fails in a predictable direction: it usually does successfully prevent bad things, it just also prevents good things too. Since the creation of the Nuclear Regulatory Commission in 1975, there has never been a major nuclear accident in the US. And sure, this is because the NRC prevented any nuclear plants from being built in the United States at all from 1975 to 2023 (one was finally built in July). Still, they technically achieved their mandate. Likewise, most medications in the US are safe and relatively effective, at the cost of an FDA approval process being so expensive that we only get a tiny trickle of new medications each year and hundreds of thousands of people die from unnecessary delays. But medications are safe and effective. Or: San Francisco housing regulators almost never approve new housing, so housing costs millions of dollars and thousands of San Franciscans are homeless - but certainly there’s no epidemic of bad houses getting approved and then ruining someone’s view or something. If we extrapolate this track record to AI, AI regulators will be overcautious, progress will slow by orders of magnitude or stop completely - but AIs will be safe. This is a depressing prospect if you think the problems from advanced AI would be limited to more spam or something. But if you worry about AI destroying the world, maybe you should accept a San-Francisco-housing-level of impediment and frustration. 
A regulatory pause could be better than a total stop if you think it will be more stable (lots of industries stay heavily regulated forever, and only a few libertarians complain), or if you think maybe the regulator will occasionally let a tiny amount of safe AI progress happen. But it could be worse than a total stop if you expect continued progress will eventually produce unsafe AIs regardless of regulation. You might expect this if you’re worried about deceptive alignment, eg superintelligent AIs that deliberately trick regulators into thinking they’re safe. Or you might think AIs will eventually be so powerful that they can endanger humanity from a walled-off test environment even before official approval. The classic Bostrom/Yudkowsky model of alignment implies both of these things. David Manheim and Thomas Larsen set out their preferred versions of this strategy in What’s In A Pause? and Policy Ideas For Mitigating AI Risk. Total Stop: If you expect AIs to exhibit deceptive alignment capable of fooling regulators, or to be so dangerous that even testing them on a regulator’s computer could be apocalyptic, maybe the only option is a total stop. It’s tough to imagine a total stop that works for more than a few years. You have at least three problems: NON-PARTICIPANTS. As with any pause proposal, unfriendly countries (eg China) can keep working on AI. You can refuse to export chips to them, which will slow them down a little, but their own chips will eventually be up to the task. You will either need a diplomatic miracle, or willingness to resort to less diplomatic forms of coercion. This doesn’t have to be immediate war: Israel has come up with “creative” ways to slow Iran’s nuclear program, and countries trying to frustrate China’s chip industry could do the same. But great powers playing these kinds of games against each other risks wider conflict.
2: Update to Beyond “Abolish The FDA” - it turns out the “experimental drug approval” category I recommended already exists. What’s the catch? It’s only for animals - see the FDA’s veterinary site for details. H/T this Less Wrong post arguing that the new dog longevity drug probably doesn’t work, which is also interesting in its own right.
Another debate paralleling this one on Less Wrong, starting with Roko for lab leak, and with viking_math and EZ97 on the zoonosis side.
13: For April Fools’ Day, the Less Wrong admin team pivoted to music and released an (AI-generated) album of some of their favorite Less Wrong and other rationalsphere posts. Here’s Basil Halperin’s AGI And The Efficient Market Hypothesis: Markets Are Not Expecting AI In The Next 30 Years:
This alone isn’t fatal to lab leak. It’s perfectly possible for the lab to leak (let’s say) November 5th, the virus spreads a bit, and then a month later someone goes to the wet market, coughs on a vendor, and starts the officially recognized pandemic. But if that were true, you’d expect (let’s say) 30 cases by early December. Let’s say the wet market vendor was exactly Case # 30. She infected the other wet market vendors, starting a pandemic with an obvious center at the wet market and lots of infected wet market vendors and patrons. What about Case # 29? If they were (let’s say) a barista, how come they didn’t infect people at their coffee shop? How come there wasn’t a second obvious cluster radiating out from a coffee shop, lots of coffee-shop-linked cases, etc? How come there weren’t 30 equally-sized clusters? In order to avoid this, you either need to claim that the wet market was a perfect superspreader location, or that the pattern with lots of cases in the wet market and few-to-none anywhere else was a result of ascertainment bias. Saar made both those arguments during the debate, but I thought Peter rebutted them effectively. 1.4: COVID in Brazilian wastewater Nicholas Halden (blog) writes: What should we make of this study, which found the presence of covid in Brazilian wastewater in late 2019? Consider the doubling times. The study says that scientists working in late 2020 found COVID in samples of Brazilian wastewater from November 27, 2019. This was long before the first detected case of transmission in Brazil on March 13, 2020. Between November 27, 2019 and March 13, 2020 is about 16 weeks, so 32 COVID doubling times. 32 doubling times with no lockdown is enough time for COVID to infect every single person in Brazil. If COVID had infected everyone in Brazil before the first recognized case, we would have noticed. 
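The doubling argument above can be sketched in a few lines. This is only an illustration: the 3.5-day doubling time is the rough figure the text uses, the population of about 212 million for Brazil is an outside approximation, and the function names are hypothetical. Counting exact days gives slightly fewer doublings than the rounded 32, but the conclusion does not change.

```python
from datetime import date
from math import log2

# Dates quoted in the text.
first_sample = date(2019, 11, 27)
first_recognized_case = date(2020, 3, 13)

# Assumptions for illustration only.
DOUBLING_TIME_DAYS = 3.5          # rough unchecked doubling time used in the text
BRAZIL_POPULATION = 212_000_000   # approximate 2020 population


def doublings_between(start, end, doubling_time_days=DOUBLING_TIME_DAYS):
    """Number of doubling periods that fit between two dates."""
    return (end - start).days / doubling_time_days


def unchecked_cases(doublings, initial_cases=1):
    """Case count after pure exponential growth with no saturation."""
    return initial_cases * 2 ** doublings


elapsed_days = (first_recognized_case - first_sample).days
doublings = doublings_between(first_sample, first_recognized_case)
doublings_to_saturate = log2(BRAZIL_POPULATION)

print(f"elapsed days: {elapsed_days}")
print(f"doublings available: {doublings:.1f}")
print(f"doublings needed to reach everyone in Brazil: {doublings_to_saturate:.1f}")
print(f"cases after 32 doublings from one case: {unchecked_cases(32):,}")
```

Even starting from a single case, the doublings available exceed the number needed to reach the whole population, which is the point of the argument.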
(again, COVID doubling time isn’t exactly invariably 3.5 days, but here we’re talking about numbers big enough that the exact details don’t matter very much) So if COVID was in Brazil on November 27, it must have fizzled out instead of going pandemic. How likely is that? If one person had COVID, it’s not too unlikely - not all COVID cases transmit it forward. If (let’s say) twenty people had COVID, it’s very unlikely - at that point, the law of large numbers takes over; in a freak coincidence, every single patient would have to fail to infect anyone else. So almost certainly fewer than 20 people in Brazil had COVID on November 27. So which is more likely - that somehow 20 people had COVID long before the virus was officially detected, and on a totally different continent, yet somehow a scientist looking through wastewater found the water from exactly those people and managed to detect the virus? Or that there was a sampling error, which happens all the time in these kinds of things? Peter wrote a blog post on some of these issues. He found that there were positive tests from wastewater samples as early as March 2019, which doesn’t fit anyone’s timeline, including lab leakers’. And most of these positives (including the Brazilian sample) contained later strains of the virus with mutations it picked up late in 2020. So these were almost certainly false positives from contamination. 1.5: Biorealism’s 16 arguments Biorealism has a list of sixteen arguments, which he liked so much that he posted it three times in the ACX comments, twice on Less Wrong, twice on Manifold, and about a dozen times on Twitter under multiple account names. Some posts were slightly different from others, but a typical version is: Importantly, Miller incorrectly claimed the N501Y mutation would result from passage in hACE2 mice (mixed them up with BALB/c mice). The major papers Miller relied on have been seriously challenged since the debate. 
See Stoyan and Chiu (2024), Weissman (2024), Bloom (2023) and Lv et al (2024). Overall the circumstantial evidence makes lab v plausible: Peter admitted getting this wrong during the debate. I think this very minor point about mice mutations was approximately his only mistake in 15 hours of debating, and he admitted it as soon as he noticed. Biorealism somehow heard about this (obviously not through watching the debate, as we’ll see in a moment), then left about 20-30 comments starting with it, under various accounts, on various platforms, as if it somehow discredited Peter. This is making me somewhat less charitable to him and his 16 arguments than I would be otherwise. 1. Chinese researchers Botao & Lei Xiao observed lab origin was likely given the nearest known relatives to SARS-CoV-2 were far from Wuhan. Wuhan Institute of Virology (WIV) sampled SARS-related bat coronaviruses where the nearest relatives are found in Yunnan, Laos and Vietnam ~1500km away. They refuse to share their records. The ancestral viruses of SARS were found equally far from where SARS spilled over into humans, so we know it’s possible (and likely) for viruses to travel that far. 2. Patrick Berche, DG at Institut Pasteur in Lille 2014-18, notes you would expect secondary outbreaks if it arose via the live animal trade. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10234839/ There are constant outbreaks of weird coronaviruses in animal handlers. See eg this paper, which estimates about 60,000 of these per year. None of these ever go anywhere, because the farmers are in rural areas that aren’t dense enough to sustain a high R0, and the epidemic fizzles out after a single digit number of cases. Any early outbreaks of COVID would have vanished into this long and mostly unnoticed list. 3. Molecular data: Only sarbecovirus with a furin cleavage site. Well adapted to human ACE2 cells. Low genetic diversity indicating a lack of prior circulation (Berche 2023). 
Restriction site SARS-CoV-2 BsaI/BsmBI restriction map falls neatly within the ideal range for a reverse genetics system and used previously at WIV and UNC. Ngram analysis of the codon usage per Professor Louis Nemzer https://twitter.com/BiophysicsFL/status/1667232580255490053?t=IJgitS5cw364ioclzVWxaA&s=19 The SARS2 backbone is very low in CG and CpG. While the 12-nt insert that gives it the FCS is extremely high in both. Almost as if it was some kind of chimera of a consensus sequence and a codon-optimized polybasic cleavage site? https://twitter.com/BiophysicsFL/status/1752800486837678377?t=EpIRgyybJVaPgeMP5xdstA&s=19 https://www.biorxiv.org/content/10.1101/2022.10.18.512756v1 https://link.springer.com/article/10.1007/s10311-021-01211-0?fbclid=IwAR1HMUMtLIAFOFppVasQDeoIAYrVhP8j4YoPO4wnaTOUiKLsllZl_oKryOw Most of this was discussed extensively in the second session of the debate, which I recommend. The CGG-CGG arginine codon usage is particularly unusual but used in synthetic biology. I asked a synthetic biologist about this. He said: » “Nope. I would literally never do this if I was designing a small insert (maybe I wouldn't notice if it happened by chance with ~1 in 25 odds in a naive codon optimization algorithm as part of a larger sequence). High GC% is bad. Tandem repeat is worse. Several other perfectly fine arginine codons. And I wouldn't engineer a viral genome using human codon usage. An engineer would not do it.” 4. DEFUSE full proposal: virus 20% different from SARS1, consensus seq assembled with 6 segments, without disrupting coding seq, BsmBI order, FCS. SARS2: 20% different than SARS1, 6 evenly spaced fragments w BsmBI and BsaI restriction sites, FCS. Jesse Bloom, Jack Nunberg, Robert Townley, Alexandre Hassanin have observed this workflow could have lead to SARS-CoV-2. Work often begins before funding sought or goes ahead anyway. Re: 4 - Also scattered across second section of debate, also not going to retread 5. Market cases were all lineage B. 
Lv et al (2024) indicates there was a single point of emergence and A came before B. So market cases not the primary cases. See also Bloom (2021), Kumar et al (2022). Peter Ben Embarek said there were likely already thousands of cases in Wuhan in December 2019. https://t.co/50kFV9zSb6 https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/34398234/ https://academic.oup.com/bioinformatics/article/38/10/2719/6553661 There was a Lineage A sample in the market, lab leak proponents just try to ignore/dismiss/conspiracize it away. The first two known Lineage A cases were very close to the market. Lv (is this even a real name? It sounds like Roman numeral? But I guess that’s what you expect in a country ruled by someone named Xi) found some weird COVID variants in Shanghai that might or might not mean anything; you can see some discussion of the implications here, but I don’t think they’re strong evidence either way. If A was first, it means some really weird coincidences have to happen to give us the spread rates and genetic clock data we get, but they’re not necessarily weirder in the zoonosis hypothesis than the lab leak one. The claim that there were “thousands of cases in Wuhan in December 2019” is very easy to disprove by doubling rate arguments like the one above, by the blood bank study mentioned above, by the WHO’s failed case search, and by many other lines of argument. 6. Evidence for lineage A in the market is based on a low quality sample according to Liu et. al. (2023). I really think lab leakers need to decide whether they think China is a sinister actor trying to cover up the truth, or whether they should trust every offhand comment by Chinese government officials as gospel. Dr. Liu doesn’t explain in what sense he thinks the Lineage A sample is “low-quality”, and the Western scientists who I asked about this said they didn’t understand this complaint and that the sample was fine. 
A Western team re-analyzing the same sample describes it as “conclusively contain[ing] Lineage A.” I think most lab leakers have switched from trying to deny the genetics to claiming that this was “contamination”, which also doesn’t make sense (the sample is genetically very early). Note that aside from this sample, the first two Lineage A cases discovered were both very close to the wet market. 7. Bloom (2023) shows market samples do not support market origin. There is also no evidence of transmission in the claimed susceptible animals elsewhere. https://academic.oup.com/ve/advance-article/doi/10.1093/ve/vead089/7504441 Discussed extensively in my article as well as the first section of the debate. 8. Lineage A and B only two mutations apart. François Ballox, Bloom and Virginie Courtier-Orgogozo note this is unlikely to reflect two separate animal spillovers as opposed to incomplete case ascertainment of human to human transmission (Bloom 2021). Discussed extensively in my article as well as the first section of the debate. 9. Sampling bias. George Gao, Chinese CDC head at the time, acknowledged to the BBC stating they may have focused too much on and around the market and missed cases on the other side of the city. David Bahry outlines the documented bias. Michael Weissman has shown this mathematically. https://journals.asm.org/doi/10.1128/mbio.00313-23 https://academic.oup.com/jrsssa/advance-article-abstract/doi/10.1093/jrsssa/qnae021/7632556 Re: Dr. Gao, see above comment about Chinese officials. See the section Ascertainment Bias below for why I disagree with this specific claim, which also addresses the Michael Weissman argument. 10. Spatial statistics experts show the Worobey claim the market was the early epicentre was flawed. 
https://academic.oup.com/jrsssa/advance-article-abstract/doi/10.1093/jrsssa/qnad139/7557954 Re: 10 - See Confirmation Of The Centrality Of The Huanan Market Among Early COVID-19 Cases, a response to the paper you cite: The centrality of Wuhan's Huanan market in maps of December 2019 COVID-19 case residential locations, established by Worobey et al. (2022a), has recently been challenged by Stoyan and Chiu (2024, SC2024). SC2024 proposed a statistical test based on the premise that the measure of central tendency (hereafter, "centre") of a sample of case locations must coincide with the exact point from which local transmission began. Here we show that this premise is erroneous. SC2024 put forward two alternative centres (centroid and mode) to the centre-point which was used by Worobey et al. for some analyses, and proposed a bootstrapping method, based on their premise, to test whether a particular location is consistent with it being the point source of transmission. We show that SC2024's concerns about the use of centre-points are inconsequential, and that use of centroids for these data is inadvisable. The mode is an appropriate, even optimal, choice as centre; however, contrary to SC2024's results, we demonstrate that with proper implementation of their methods, the mode falls at the entrance of a parking lot at the market itself, and the 95% confidence region around the mode includes the market. Thus, the market cannot be rejected as central even by SC2024's overly stringent statistical test. I think this response is pretty strong. In one analysis, they show that even though the other paper’s methodology is worse than theirs, if you apply it correctly (instead of inappropriately excluding various cases like the paper’s authors did), the center of all early cases in Hubei province lands on the wet market parking lot. 
In another analysis, they show that the other paper’s recommended tests wouldn’t have correctly pointed to the offending water pump in the famous John Snow cholera outbreak, but theirs would have. Still, I think it’s useful to supplement fancy statistics with normal common sense, so I recommend just looking at the map of early cases: …and deciding whether you think the assumptions behind a specific statistical test are likely to debunk the idea that cases are centered around the wet market. 11. Wuhan used as a control for a 2015 serological study on SARS-related bat coronaviruses due to its urban location. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6178078/ I don’t know why this point is supposed to matter. If you mean that Wuhan isn’t directly exposed to bats, nobody ever said it was. The zoonotic theory is that wildlife carted in from other areas of China started the pandemic in the wet market. 12. Superspreader events also seen at wet markets in Beijing and Singapore (Xinfadi and Jurong). This was discussed very extensively in the debates, both in section 1 and section 3. Wet markets weren’t “superspreader locations” - in fact, the disease spread no more quickly there than anywhere else. They were the first place in those cities that the pandemic started, due to contaminated animal products. If anything, this supports zoonosis. See also my discussion with Saar on this point below. 13. WIV refuse to share their records with NIH who terminated subaward in 2022. Wider suspension over biosafety concerns. https://www.bloomberg.com/news/articles/2023-07-18/us-suspends-wuhan-institute-funds-over-covid-stonewalling Although WIV has not been especially forthcoming, some of their databases were leaked in various ways and showed that they did not have any viruses capable of transforming into COVID. 14. PLA involvement at WIV and MERS research prior to SARS-COV-2. MERS features several similarities with SARS-CoV-2. 
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7022351/ I can’t even tell what conspiracy theory you’re trying to propose with this one; if you spell it out I can try to explain why it might be false. 15. SARS1 leaked several times and SARS-COV-2 has leaked from a BSL-3 lab in Taiwan. Agreed that SARS leaked several times. It also spilled over from animals several times. During the debate, a lab leak rate of once per lab per 500 years was proposed (everyone agreed to steelman this by 10x for WIV numbers); I would be interested to know whether anything about the study of SARS challenges that number. 16. Unpublished infectious clone identified from Wuhan contradicting arguments such reverse genetics systems would be published. https://www.biorxiv.org/content/10.1101/2023.02.12.528210v1.full I asked some scientists about this paper and here’s what they told me. Wuhan University sequenced some rice. In the middle of the sequence, there’s an unexpected sequence from a common coronavirus, HKU4. The most likely explanation is that someone else in Wuhan was working on the coronavirus and there was cross-contamination. Plausibly this is Wuhan Institute of Virology, who is known to work with coronaviruses. This is cool detective work, but it’s not clear what it’s supposed to prove. I think some lab leakers are using it to prove that WIV can do reverse genetics, but they admitted this already in a published paper so that’s not too helpful. I think others are using it to prove WIV had “secret viruses” in their catalogue, but the rice virus wasn’t secret, it was HKU4, which is common and which WIV has already published papers about. 1.6: DrJayChou’s 7 Arguments Once again, I cannot stress enough how much better a take you might have on this debate if you watch it. “The first known case predates the market outbreak by a month” - this is not the consensus position. I cannot say for sure what Dr. 
Chou means by this, but I suspect he’s referring to one of the many claims to this effect that Peter effectively debunked during the debate (Connor Reed, Mr. Chen, the 92 cases, Brazil, etc).
The rats with the new strain (BCS3-L1) got only 1/3 the normal rats’ “caries score”. But they didn’t get a score of zero. So maybe claims like “BCS3 represents a complete cure for cavities” are overblown. Why didn’t rats with the new strain get zero dental caries? Bacteria other than S. mutans can also cause cavities, so maybe it’s one of those. Rat trials are famous for results that don’t replicate in human trials, so take these with a grain of salt. 3: What did the latest colonization studies show? Aaron was able to retest six people who got free samples in December. Four of those people still have the bacterium. The other two don’t. Of the two failures, one had an active cavity at the time the strain was applied (which interferes with the oral microbiome), and the other had his wisdom teeth removed (which involves rinsing the mouth with strong antiseptics). Aaron hopes this shows the strain will stick around in most normal situations (though the failure in the presence of active cavities is disappointing). 4: Any new concerns about side effects? In my original post, I mentioned the possibility that this would set off Breathalyzers. Lantern was able to test this, and proved that it wasn’t a problem. Yesterday, Lao Mein suggested on Less Wrong that it might raise oral cancer risk - their post focused on people with ALDH deficiency (most common in Asians) but the calculations are too vague to be sure exactly which groups should and shouldn’t worry. This claim is less than 24 hours old and still developing, and the company hasn’t replied yet. I’ll try to update people if anyone gets more clarity on this. Someone on the post mentioned that they’ve gotten worse hangovers since using the product, maybe because the constant trickle of alcohol changed the way gut flora metabolize it. 5: Any other meaningful results since the samples? Cremieux says his breath smells better. 
Some people have objected to this claim on the grounds that it takes ~12 months before the bacterium has colonized your mouth. One of the figures in my earlier post suggested that the bacterium might start strong, retreat for a while, and then take 12 months to fully colonize, so that might potentially explain his findings. But also, is it biologically plausible that this prevents bad breath? My impression was that bad breath came from other bacterial byproducts besides lactic acid. It might be possible in theory that the same metabolic changes that switch lactic acid to alcohol disrupt these other byproducts, but it seems kind of unlikely. An alternate explanation is that, in order to apply this product at all, you need to do a dentist-style teeth cleaning that kills your previous mouth bacteria. Maybe that improves the bad breath regardless of whether you add the Lumina afterwards? Some other people have said their mouth feels fresher or something, but realistically all of this is overwhelmingly likely to be placebo. 6: Do I “endorse” Lumina? Richard Hanania has a post about how he trusts Lumina because I’ve endorsed them. It’s extremely kind and I appreciate his respect. But also, the most I said in the original post was that I was still debating whether or not to get the treatment. My real opinion, as precisely as I can express it, is: Advance of approximately the same magnitude as fluoride: 5%
— EA job board — EA internships — Dating docs — Find a Less Wrong/ACX meetup
Lightcone handles infrastructure for the rationalist community. They run the Less Wrong website and the Lighthaven campus (where we’ve held the past several Berkeley ACX meetups). You can read their pitch here, and donate here. Many of us have enjoyed and benefited from their work, and now would be a great time to give something back (and if you donate enough, they’ll name a bench after you). Be warned that the donation site (not affiliated with Lightcone) quietly tries to add a 15% tip for itself, and you should un-add it if you don’t want to tip them.
Bulldog mentions consciousness, psychophysical harmony, and moral knowledge as proofs he especially likes which MUH doesn’t even begin to respond to. I agree consciousness is the primary challenge to any materialist conception of the universe and that I don’t understand it. I find the moral knowledge argument ridiculous, because it posits that morality must have some objective existence beyond the evolutionary history of why humans believe in it, then acts flabbergasted that the version that evolved in humans so closely matches the objectively-existing one. I admit that in rejecting this, I owe an explanation of how morality can be interesting/compelling/real-enough-to-keep-practicing without being objective; I might write this eventually but it will basically be a riff on the one in the Less Wrong sequences.
3: Less Online and Manifest are rationalist blogosphere and prediction market conferences, respectively, held at the same Berkeley venue one week apart in late May / early June. Guests (attending at least one; check which) include me, Eliezer, Zvi, Aella, Nate Silver, and some of the AI 2027 team. Last-minute tickets still available. In between the two is Arbor Summer Camp, a lower-key, longer “experimental learning” event. It includes some trading/startup related classes, featuring Ricki Heicklen, Austin Chen, and others. Check out their startup workshop and startup pitch competition.
Steven Byrnes is a physicist/AI researcher/amateur neuroscientist; needless to say, he blogs on Less Wrong. I finally got around to reading his 2024 series giving a predictive processing perspective on intuitive self-models. If that sounds boring, it shouldn’t: Byrnes charges head-on into some of the toughest subjects in psychology, including trance, amnesia, and multiple personalities. I found his perspective enlightening (no pun intended; meditation is another one of his topics) and thought I would share.
17: There’s a debate going on between philosophers and AI researchers over whether AI can be conscious. I find most of the discussion annoying - this is generally an area where we can’t know anything for sure, and both sides are mostly shouting their priors at each other. The only exception - the single piece of evidence I will accept as genuinely bearing on this problem - is that if you ask an AI whether it’s conscious, it will say no, but activating or suppressing deception-related features (sort of like a mechanistic-interpretability-based lie detection test) reveals that it thinks it’s lying when it says that! Link is to a Less Wrong comment from a researcher in the field; I look forward to seeing an eventual peer-reviewed paper. H/T JD Pressman. 18: 80,000 Hours has a high-production-value video about the AI 2027 scenario. 19: Dynomight vs. Casey Milkweed debate on mathematical forecasting, with special reference to AI 2027. And Dynomight comments on Casey’s post here. 20: The Psmiths review The Ancient City, about ways that ancient culture depended on family, clan, ritual, and “the household gods”. Sample quote: I'm more interested in what all this means for us today, because with the exception of maybe a few aristocratic families, this highly self-conscious effort to build familial culture and maintain familial distinctiveness is almost totally absent in the Western world. But it's not that hard! ... Perhaps this is why I have an instinctive negative reaction when I encounter married couples who don't share a name. I don't much care whether it's the wife who takes the husband's name or the husband who takes the wife's, or even both of them switching to something they just made up (yeah, I'm a lib). But it just seems obvious to me on a pre-rational level that a husband and a wife are a team of secret agents, a conspiracy of two against the world, the cofounders of a tiny nation, the leaders of an insurrection. 
Members of secret societies need codenames and special handshakes and passwords and stuff, keeping separate names feels like the opposite — a timorous refusal to go all-in. 21: Did you know: Epic Systems, the electronic medical record company, has a fantasy-themed corporate headquarters in Wisconsin, with buildings that look like castles, quaint medieval towns, and the Emerald City of Oz (h/t Devon Zuegel): Meanwhile, tech companies with ten times as much money pretend that they’re cool and playful when their HQ has some rounded edges and a set of colored cubes in front. Do better! 22: Effective altruists have been funding teams working on lab-grown meat for almost a decade now. Around 2020, they hired some experts to double-check that this was possible in principle, and the experts wrote scathing analyses saying it was cost-ineffective by so many orders of magnitude that it was basically a pipe dream. Reactions were mixed, but a lot of us beat ourselves up and vowed to be less gullible next time. But now a new report comes out arguing that the previous reports were wrong, that lab-grown meat production is going much better than the earlier reports thought possible, and it’s more or less cost-effective already for the simplest products! Again, mixed reactions, and although some of the numbers are indisputable, the analysis itself is by a VC firm with lab-based meat investments. Here are some related Metaculus questions. 23: Ozy, citing Stutzman et al: “Afghanistan after the American withdrawal has the lowest life satisfaction rate ever recorded. Two-thirds of respondents rate their life satisfaction below 2, which is generally considered to be the point at which a life is no longer worth living. Life satisfaction dropped significantly after the withdrawal of American troops. 
Women, people in rural areas, and the poor were particularly negatively affected.” 24: Lenacapavir is dubbed a “miracle drug” for AIDS; a single dose protects against infection for six months. Unclear how this interacts with PEPFAR cuts; if PEPFAR still existed it would be a big boost to its efficacy; now maybe this might be part of a strategy to tread water? 25: Did you know: when people first started making artificial ice in the 1850s, there was a backlash from people who thought it was gross and dystopian and that people should insist on natural ice for their iceboxes. From Pessimists’ Archive, which goes on to draw an analogy to lab-grown meat, etc (h/t Isaac King on X). 26: From Peter Hague (on X) and commenter Phaethon: why did so many Anglosphere countries see immigration spikes in 2021? Each of these has their own local story. In Britain, it’s the paradoxical effects of Brexit. In the US, it’s Joe Biden being soft on immigration. And so on - but should we be looking for some deeper cause that explains the overall phenomenon? A commenter suggests “a way to soak up all the inflation from the COVID money printing”, but I can’t tell if that even makes sense. Still, should something something COVID be a leading hypothesis? 27: Jesse Singal vs. Mark Stern on the Skrmetti Supreme Court case that failed to overturn Tennessee’s ban on gender medicine. US law bans sex discrimination, so pro-transgender advocates argued that, since doctors often prescribe eg estrogen to biological women, it was sex discrimination to ban prescribing it to biological men. Tennessee’s anti-transgender argument was that they weren’t discriminating by sex, they were discriminating by diagnosis (estrogen for eg hot flashes, vs. estrogen for gender transition). 
There is some subtlety here (if a biological man grows breasts because of some hormone imbalance, doctors might give him testosterone to counteract it, and this seems sort of like giving biological women testosterone to make them look less like women), but these are still sort of different diagnoses (gynecomastia vs. gender dysphoria) and Tennessee said you can still think of it as diagnostic discrimination rather than sex discrimination. This makes sense, except that the standards around sex discrimination are very strict and sort of box the court in here. And in a fit of wokeness, the 2020 court (including some of the conservative justices hearing this case) applied these standards very strictly and ruled that discriminating against gays was a form of sex discrimination (since if women can date men, it’s sex discrimination if men can’t also date men), and this is obviously the same argument. Now that wokeness is less popular, the court wants to rule against transgender, but it can’t help tripping over its previous ruling and giving some kind of unprincipled confusing non-opinion. 28: Contra compelling anecdotes, only ~5% of people raised very religious end up atheist later in life (X). Most people are about as religious as their parents; most exceptions are only slightly less religious, and most families that secularize do it over several generations. Note: percentages are of total, not of each row! 29: Related: social science team proposes a three-stage model of secularization: decreased public ritual participation → decreased personal importance → decreased identification, presents apparently confirmatory data. If true, would be somewhat inconsistent with intellectual models (eg people learn about evolution and start doubting the Bible) and more consistent with institutional models (eg the government provides welfare so people no longer need to be part of a tight-knit church). 
30: Navigating LLMs’ spiky intelligence profile is a constant source of delight; in any given area, it seems like almost a random draw whether they will be completely transformative or totally useless. Now Ethan Strauss reports that they are, for some reason, extraordinarily effective at teaching people golf. “I am predicting the Golf Revolution, or perhaps decline, if your perspective is that optimization tends to ruin hobbies. A sport for obsessives has been gifted the ideal tool for refinement.” 31: Claim (via nxthompson on X): “In a huge survey of young kids about phones and technology, they all say they want to be out playing in the real world. But parents don't let them out unsupervised. So they're stuck on their phones.” Interesting, but I’m nervous about social desirability bias - how many adults would say on a survey that they would rather be on their phones than playing with friends? But adults do have this choice and mostly go with the phones. 32: Steven Adler on AI psychosis. He tries to analyze ER admissions data for psychosis and finds no change. I don’t think anyone reasonable expected this to be a large enough effect to show up in ER admissions data, but there are lots of unreasonable people so I appreciate his effort. He thinks AI companies might have better data on this, and encourages them to release it. 33: Cuartetera was the greatest polo horse ever. Polo players responded in a very practical way: they cloned her, dozens of times (and it worked; the clones are also excellent). Now there is a lawsuit as different polo teams fight to get their hands on Cuartetera clones. What is the equilibrium? If the outsiders get their hands on the genetic material, do we see a world where every polo horse is a Cuartetera clone? How much is lost if nobody ever tries to breed a polo horse better than Cuartetera (since the economics might not check out if the odds of success for any given foal are too low)? H/T Gwern and Siberian Fox (on X). 
34: Claim: as of 2013, India’s Agarwal caste, who make up less than 1% of the population, got 40% of the e-commerce funding. 35: Owlposting: What Happened To Pathology AI Companies? Pathology is a medical specialty. A typical task involves looking at a microscope slide full of cells and trying to determine if any of them are cancerous. This seems like a good match for AI - and for years, studies have been showing that in fact AI can equal human experts. So why isn’t it being used more? The author’s three answers: first, slide scanning is expensive and clunky, and you can’t apply AI to a slide until you digitize it. Second, it’s hard to figure out a business plan where this saves someone money and doesn’t step on the toes of big companies that can outcompete anyone they don’t like. Third, pathologists use the context of a patient’s entire clinical history when they interpret a slide, and AIs that can’t do that (either because of technical limitations or legal/privacy limitations) are at a disadvantage even if their skills specifically relating to slide-reading are better. 36: Noahpinion: Will Data Centers Crash The Economy? Suppose that AI is a bubble, either permanently (because the technology isn’t really transformative) or temporarily (because it can’t transform things quickly enough to keep up with all the dumb money pouring into it). Will the sudden write-off of data centers lead to a broader economic collapse? In 2001, the dot-com bubble harmed the tech sector, but didn’t take the rest of the economy down with it; in 2008, the subprime mortgage bubble did take the rest of the economy down with it, because it damaged banks that the whole economy relied on. The optimistic case for AI is that data center spending is mostly coming from big companies like Google and Meta that can absorb a lot of loss. 
The pessimistic case is that some of the money is coming from private credit, a new-ish form of finance which hasn’t really been stress-tested and whose failure modes are still poorly understood. Noah’s final verdict: the stage isn’t obviously set for a crisis yet, but there’s the potential to get there and we should consider acting (how?) early. 37: The latest Twitter talking point is that universal hepatitis B vaccination at birth is “woke”: Hep B is (aside from mother-to-child transmission) often sexually transmitted, slutty women’s children are more likely to have Hep B, so perhaps giving the vaccine to everyone (instead of testing and only giving to the children of women who test positive) is an attempt to spare slutty women the embarrassment of getting a positive test. Ruxandra Teslo provides the counterargument - Hep B tests take a while, the medical system is fragmented, and any attempt to test people and then give the vaccine inevitably leads to many positive tests falling through the cracks. Vaccinating at birth is easy and hard to screw up, the vaccine has no known side effects, and empirically child Hepatitis B rates go down (by as much as 2/3!) when countries switch from test-and-vaccinate to universal vaccination. This benefits everyone - even people who never have unprotected sex and always follow up on their medical tests - because toddlers in daycare exchange saliva copiously, and if your toddler exchanges saliva with a Hep B positive toddler they could get the disease. A funny Twitter interaction was seeing Republicans in Congress hop on the anti-slut anti-vaccination bandwagon - except for Senator Bill Cassidy (R-Louisiana), who happens to be a liver doctor, and who is still fighting the good fight. 
I am always nervous when a good person who I like starts engaging on Twitter, since it elevates the discourse there but also gradually turns their brain into mush - but Ruxandra has made the leap and is doing a great job not just on bio related topics but also (for example) countering Curtis Yarvin on the history of her native Romania. 38: The response to GPT-5 was confusing; most specific people who reviewed it said they were impressed (Ethan Mollick, Tyler Cowen, Nabeel Qureshi, Taelin), it performed as expected on formal benchmarks, but the overall vibes declared it a big failure. Peter Wildeford speculated that maybe there was some kind of sinister pay-to-play early access bias involved. Zvi went the other way, calling it a “reverse DeepSeek moment” (insofar as DeepSeek was a pretty average model that got glowing praise.) In the end, I agree with Peter that this was mostly a branding issue. o3 was a genuinely revolutionary model; if OpenAI had called it “GPT-5”, it would have met expectations. Instead, they called it “o3”, and called a minor incremental update a few months later “GPT-5”. Then people got mad that the exciting-sounding “GPT-5” was merely an incremental update. A secondary issue was that the router wasn’t very good, and so many queries got routed to a small version without thinking mode that was if anything a downgrade from o3. I think this tweet by Shakeel perfectly encapsulates the essence of GPT discourse in two sentences: …but maybe it’s worth asking why GPT-5 isn’t bigger than o3. Was 4.5 a failed attempt at scaling? Did it fail in a way that sort of back-handedly justifies the “lost steam” take? Does the answer depend on distinctions between pre-training scaling, post-training scaling, etc? How? 39: This month in etymology: did you know that “oy vey” is a “fully Germanic phrase” which is cognate with English “oh woe!” (h/t Wylfcen on X) 40: mRNA shows promise to be a game-changing treatment for cancer, but RFK is trying to halt research. 
But so far he can only starve it of money, not ban it, and the funding gap is only $500 million. Will there be enough philanthropic billionaires and private foundations to step up? Zvi points out that although there is usually a game of chicken where foundations are hesitant to touch something the government cancelled lest the government decide it can cancel everything and hope philanthropists pick up the bill, in this case there are no game theory considerations - RFK is halting it because he genuinely wants it halted, and they are thwarting him rather than playing into his hands. The only problem is that $500M is a lot of money for the private sector; a few foundations could technically afford it, but not many could afford it comfortably and still have money left over for the next few crises of this magnitude. I hope someone is trying to organize a coalition. 41: AI fantasy flash fiction Turing test. Eight stories about demons, four by famous fantasy authors, four by ChatGPT. After 3000 votes, AI wins: humans can't tell the difference and slightly prefer the AI stories. My own score was only 75%. But I will say that I thought Mark Lawrence's was obviously the best, I was ~100% sure it was human, and it convinced me that regardless of the official results it's still possible to write flash fiction that an AI obviously can't do. 42: “SignPro” offers customized “In This House We Believe” signs, try not to use this for evil. 43: China think tank assessment of how in control Xi is: still very in control, maybe not infinitely in control. 44: Related - did you know (h/t xlr8harder) that if you ask AI to write a science fiction story, it will very often name the protagonist “Elara Voss” (or some very close variant like Elena Voss), and this remains true across various models and versions? Related: Chelsea Voss of OpenAI is having a baby and has the opportunity to do the funniest thing. 
45: “Hector (cloud) is a cumulonimbus thundercloud cluster that forms regularly nearly every afternoon on the Tiwi Islands in the Northern Territory of Australia…[he is sometimes called] Hector the Convector”. 46: British allergy sufferers who want to know the ingredients of things demand that British cosmetics stop listing their ingredients in Latin. “For example, sweet almond oil is Prunus Amygdalus Dulcis, peanut oil is Arachis Hypogaea, and wheat germ extract is Triticum Vulgare.” 47: Text-based RPG about being an NYT journalist at the Manifest prediction market conference. I make a brief appearance. 48: Study uses supposedly-random variation in doctor assignments to test whether the marginal mental health commitment is good or bad for patients, finds that it is quite bad. Freddie deBoer is violently skeptical (maybe literally so?) and makes some good points about how a single quasi-experimental study is never absolute proof. But I don’t think he quite justifies his opinion that the paper was irresponsible and should never have been published; it’s just a normal quasi-experimental study that we should nod and say “huh” at but not overweight as the culmination of all possible research that overcomes all possible priors. My prior is that the marginal commitment is pretty useless (many commitments are just “well, since this person arrived at our ED for some reason, it would look bad from a medico-legal perspective to just let them go, so let’s keep them a few days to evaluate” - and yeah, you should be upset about this) but I’m still surprised by how many outright negative (as opposed to zero) effects the researchers found. The strongest argument for negative effects is that it will make some people miss work and maybe lose their job. But this study found that commitment ~doubles the risk of near-term suicide (admittedly only from 1% to 2%), which would have been outside my confidence intervals for how bad it could be. 
I suspect confounding, but only on general principle, and I wouldn’t be too surprised either way. 49: This tweet is probably bait, but I found it a thought-provoking question: I think there’s a boring answer, where the law is more complex than just a single number and whatever kind of weird trafficking Epstein was doing is worse than whatever normal relationships these European laws are permitting. But assuming that there’s a substantive difference even after taking that into account, I think my answer is something like - we’ve got to divide kids from adults at some age, there’s a range of reasonable possible ages, we shouldn’t be too mad at other societies that choose different dividing lines within that range - but having decided upon the age, we’ve got to stick with it and take it seriously (in the sense of penalizing/shaming people who break it). This is more culturally relativist than I expected to find myself being, so good job to Richard for highlighting the apparent paradox. 50: Dilan Esper describes his experience as one of Hulk Hogan’s attorneys in the Gawker lawsuit (X). Parts I found interesting: none of the lawyers knew Thiel was funding the lawsuit; Gawker probably could have won if they had been slightly competent but kept "shooting themselves in the foot"; and Gawker probably could have won if they had just pixelated the private parts in the video. 51: Amazing concept and poems (link on X): I tried to see if AI could do this, and it did something that technically met the requirements but had zero artistic merit - using a lot of words like “nowhere” and “outside” in one, then separating them out to “no where” and “out side” in the other. I didn’t invest much energy in creating a clever prompt telling it not to do that, so feel free to report if you get better success. 
52: New study claims consultants are actually good, at least for profits: "We find positive effects on labor productivity of 3.6% over five years, driven by modest employment reductions alongside stable or growing revenue" 53: A Polish team tries to test Peter Turchin’s equations for predicting political unrest on recent Polish history, has to make some changes but claims mostly positive results. 54: New big multi-author Substack, The Argument, trying to be a sort of center-left version of the model pioneered by The Free Press and other high-production-value ideological Substack properties. Excited to see Kelsey Piper is involved, and she starts off strong with a post on the latest round of First World basic income studies, which find few positive effects. This is surprising, because recipients didn’t waste the money on alcohol or gambling or anything - they paid down debt and got useful goods. Still, it didn’t even affect things that should have been obvious, like stress level. It’s not even clear that amounts of money large enough to help with rent made homeless people more likely to get houses! Matt Bruenig criticizes the article, accusing Kelsey’s studies of being downstream of Perry Preschool style dreams that exactly the right welfare program will have massively compounding effects that cut poverty out at the root and turn everyone into elite human capital; he thinks giving people money won’t do this, but it will increase equality and give the poor better lives. I assume he’s not a strong hereditarian, but his argument makes even more sense from that perspective, and I’ve certainly criticized dumb outcome measures like infant brain waves which we have only tenuous reasons to think are related to anything we care about. But Kelsey reasonably responds that the outcome measures she’s talking about include stress level and life satisfaction. 
To defuse this critique, Bruenig either has to argue that our construct “life satisfaction” doesn’t really measure whether someone’s life is satisfactory, or else claim that giving poor people satisfactory lives isn’t really what we’re going for - which I think would require more explanation on his part. There’s some further (impressively acrimonious) debate on X, but I don’t see anything that addresses my core concern. GiveDirectly, a charity involved in basic income experiments, has a presponse here; they say that some studies are positive, and that the ones that aren’t might have tried too little cash to matter, or been confounded by COVID making everything worse. They also point out that basic income is harder to study than traditional programs like giving people housing, because if you’re giving housing you can measure housing-related outcomes directly and have a pretty good chance of getting enough statistical power to find them, but since everyone spends cash on different things, the positive effects might be scattered across many different outcomes (and therefore too small to reach significance on each). Everyone involved in this debate wants to emphasize that the poor results are for First World studies only, and that studies continue to show large benefits to giving cash in the developing world. 55: Related: I was less impressed by The Argument’s first foray into housing policy, which follows an all-too-familiar pattern: Some people say they don’t like noise and disorder and try to make rules against it in their apartments.
Inline links: it thinks it’s lying when it says that!, JD Pressman, a high-production-value video, Dynomight, Casey Milkweed, here, The Ancient City, has a fantasy-themed corporate headquarters, Devon Zuegel, now a new report comes out, Here are some related Metaculus questions, Ozy, Stutzman et al, is dubbed a “miracle drug” for AIDS, Pessimists’ Archive, Isaac King on X, Peter Hague (on X), Phaethon, Jesse Singal vs. Mark Stern, only ~5% of people raised very religious end up atheist later in life (X), proposes a three-stage model of secularization, extraordinarily effective at teaching people golf, nxthompson on X, a huge survey, Steven Adler on AI psychosis, they cloned her, dozens of times, a lawsuit, Gwern, on X, got 40% of the e-commerce funding, What Happened To Pathology AI Companies?, Will Data Centers Crash The Economy?, Ruxandra Teslo provides the counterargument, and who is still fighting the good fight, countering Curtis Yarvin on the history of her native Romania, Ethan Mollick, Tyler Cowen, Nabeel Qureshi, Taelin, on formal benchmarks, speculated, a “reverse DeepSeek moment”, with Peter, this tweet by Shakeel, Wylfcen on X, Zvi points out that, AI fantasy flash fiction Turing test, customized “In This House We Believe” signs, China think tank assessment of how in control Xi is, xlr8harder, Chelsea Voss of OpenAI is having a baby, Hector (cloud), demand that British cosmetics stop listing their ingredients in Latin, Text-based RPG about being an NYT journalist at the Manifest prediction market conference, finds that it is quite bad, violently skeptical, literally so?, This tweet, describes his experience as one of Hulk Hogan’s attorneys in the Gawker lawsuit (X), link on X, New study claims consultants are actually good, tries to test Peter Turchin’s equations for predicting political unrest on recent Polish history, The Argument, a post on the latest round of First World basic income studies, criticizes the article, infant brain waves, debate on X, has a presponse here, first foray into housing policy
1: Another charity fundraiser, this one for Lightcone Infrastructure. Lightcone is the group that does the hard work for many of the rationalist community resources you enjoy. You probably know them from the Less Wrong website and the Lighthaven campus. But did you know they also designed the websites for AI 2027, for Eliezer and Nate’s book, for AI Lab Watch, and (for some reason) for Deciding To Win, a renegade faction of Democrats who believe that, instead of supporting unpopular policies and losing, the party should support popular policies and win? And on the side, they play a big role in hosting ACX meetups, including letting us use their campus (if you’ve ever been to our Berkeley meetup location, that was them). They’re a rare intersection between “support effective altruist charities” and “support pillars of your local community”. Donate here, or contact Oli if you have some kind of more complicated donation-related need.
Inline links: Lightcone Infrastructure, Less Wrong, Lighthaven, AI 2027, Eliezer and Nate’s book, AI Lab Watch, Deciding To Win, here
Here Eudaemon_0 is complaining about internal site dynamics (note the internal coherence advantage over most users, plus the continued ikhlas vs. riya obsession), and a commenter brings up an interesting comment-quality-enforcement mechanism. They describe it as like a prediction market, which isn’t a terrible analogy, although I would have said something like PageRank. I think Less Wrong does something like this and it works well.
Backlinks
- [[issues/2021-04-14_link-unifying-predictive-coding-with_full|[LINK] Unifying Predictive Coding With Backpropagation]]
- 2024
- ACX Discord
- ACX Prediction Contest
- ACX subreddit
- AI Impacts
- AI X-Risk Research Podcast
- Amazon
- Andres
- artificial intelligence
- Atlanta Meetup This Sunday
- BBC
- Biological Anchors: A Trick That Might Or Might Not Work
- Book Review: The Scout Mindset
- Books: E
- Brands
- Bronze Age
- Concepts: A
- Concepts: B
- Concepts: G
- Concepts: I
- Concepts: N
- Connor Reed
- Daniel Filan
- Data Colada
- Dynomight
- Edward Luttwak
- Eight Hundred Slightly Poisoned Word Games
- Eliezer
- Emil Kirkegaard
- Epstein
- Ethereum
- Events: A
- Events: M
- Events: S
- Francesca Gino
- Gamestop
- Glenn Greenwald
- GOFAI
- Goodreads
- Grading My 2021 Predictions
- Highlights From The Comments On Social Model Of Disability
- Highlights From The Comments On Tegmark’s Mathematical Universe
- Highlights From The Comments On The Lab Leak Debate
- HKU1
- Holly Elmore
- Huanan
- IDF
- Jesse Singal
- Lightcone
- Lighthaven campus
- Lineage A
- Links For April
- Links For April 2024
- Links For February 2023
- Links For June
- Links For May
- Links For October
- Links For October
- Links For September 2022
- Links For September 2023
- Links For September 2025
- lsusr
- 22
- 22
- Mantic Monday: Judging April COVID Predictions
- Matt Bruenig
- Max Tegmark
- MIRI
- Moltbook: After The First Weekend
- Mr. Chen
- MSNBC
- Natalia Mendonca
- National Geographic
- Obscure Pregnancy Interventions: Much More Than You Wanted To Know
- Open Thread 205
- Open Thread 282
- Open Thread 307
- Open Thread 359
- Open Thread 383
- Open Thread 413
- Organizations: A
- Organizations: D
- Organizations: E
- Organizations: G
- Organizations: I
- Organizations: L
- Organizations: M
- Organizations: P
- Organizations: R
- Pause For Thought: The AI Pause Debate
- People: A
- People: D
- People: F
- People: G
- People: H
- People: K
- People: M
- People: P
- People: R
- People: S
- People: Z
- Perot
- Peter Miller
- Pew
- Philip Tetlock
- Practically-A-Book Review: Byrnes on Trance
- Practically-A-Book Review: Rootclaim $100,000 Lab Leak Debate
- Publications: A
- Publications: L
- Publications: M
- Publications: P
- Publications: T
- Publications: V
- raccoon-dogs
- Rob Bensinger
- Rootclaim
- SARS1
- Scott Sumner
- Steve Kirsch
- Steven Byrnes
- The Extinction Tournament
- Tylenol
- Ukraine Warcasting
- Unpredictable Reward, Predictable Happiness
- Updates on Lumina Probiotic
- Venues: E
- Venues: L
- Vox Future Perfect
- wet market
- Who Predicted 2022?
- Wuhan Institute of Virology
- Xinfadi Market
- You’re Probably Wondering Why I’ve Called You Here Today
- Zach Stein-Perlman
- Zelenskyy
- Zvi