Gwern

Article

Gwern is a recurring person in the Astral Codex Ten archive, appearing 27 times across 27 issues between July 01, 2021 and January 26, 2026. The archive mentions him in contexts such as “Gwern has done some calculations”; “if your idea is actually good (eg https://www.gwern.net/CO2-Coin)”; and “Comment of the week is Gwern on whether we should consider China ‘successful’”. He most often appears alongside China, Elon Musk, and OpenAI.

Metadata

  • Category: People
  • Mention count: 27
  • Issue count: 27
  • First seen: July 01, 2021
  • Last seen: January 26, 2026

Appears In

Source Context

Recovered passages from the original issue text. When the raw archive preserved outbound links inside the source passage, they are listed directly under the quote.

July 01, 2021 · Original source
What about IQ? There are definitely scientists who have figured out how to do polygenic analyses to predict a modest amount of variation in IQ, though I don't know if their algorithms are public, and they're certainly not convenient for amateurs to use. If you had them, would they work? Gwern has done some calculations and finds that with ten embryos (a near-best-case scenario of what you're likely to get from egg extraction) and modern (as of 2016) polygenic scoring technology, you could get on average +3 IQ points by implanting the smartest. If polygenic scoring technology reached the limits of its potential (might happen within a decade or two) you could get +9 IQ points. Embryos from the same parents only vary a certain amount in IQ, and about half of IQ variation is non-genetic, so you can't work miracles with this (if you want to know how to work miracles, read the rest of Gwern's article).
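The passage above describes an order-statistics calculation: score several embryos, implant the top scorer, and ask how far above the family average it is expected to be. A minimal Python sketch of that kind of calculation follows. It is not Gwern's actual code. The function names, the 15-point IQ standard deviation, the assumption that sibling embryos carry half of the population-level score variance, and the two `variance_explained` inputs in the usage note are all illustrative assumptions, chosen only to show how figures of roughly the quoted size can arise.

```python
import math
import random
import statistics


def expected_max_standard_normal(n, trials=50_000, seed=0):
    """Monte Carlo estimate of the expected maximum of n standard normal draws."""
    rng = random.Random(seed)
    return statistics.fmean(
        max(rng.gauss(0.0, 1.0) for _ in range(n)) for _ in range(trials)
    )


def expected_iq_gain(n_embryos, variance_explained, iq_sd=15.0, sibling_share=0.5):
    """Expected IQ gain (vs. the average embryo) from implanting the top scorer.

    variance_explained: fraction of population IQ variance the polygenic
        score predicts (an assumed input, not a figure from the passage).
    sibling_share: assumed fraction of that variance present among embryos
        of the same parents.
    """
    # Standard deviation, in IQ points, of predicted scores among sibling embryos.
    score_sd = math.sqrt(variance_explained * sibling_share) * iq_sd
    # The best of n draws sits this many score SDs above the family mean on average.
    return expected_max_standard_normal(n_embryos) * score_sd
```

Under these assumptions, `expected_iq_gain(10, 0.035)` comes out at roughly 3 points and `expected_iq_gain(10, 0.30)` at roughly 9, in line with the two figures quoted in the passage. The inputs 0.035 and 0.30 are hypothetical values picked for illustration.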
November 14, 2021 · Original source
Things on blockchains, although if your idea is actually good (eg https://www.gwern.net/CO2-Coin) this doesn’t have to be a dealbreaker
November 28, 2021 · Original source
3: Comment of the week is Gwern on whether we should consider China “successful”:
February 10, 2022 · Original source
#84: Study Cognitive Strategies, Argument Distillation, And Build A Better Social Network (3) Hi, I'm a regular pseudonymous commenter here (and in other Rationalist spaces), but my real name is Isaac P. Burke. Some of you may know me from the Irish SSC meetups. I submitted three proposals: the first, a Rationalist nonprofit to conduct studies on effective cognitive strategies, initially focusing on group rationality in toy scenarios/competitions. The second, a nonprofit social network, not subject to advertiser pressure or clickbait incentives, with a focus on providing users with choices in terms of the moderation and algorithms they want to experience - think AO3. The third, a collaborative tool, based on Gwern's proposal here: https://www.gwern.net/CYOA but focusing on user-submitted arguments, distilling debates down to a dialogue between the most persuasive crowdsourced points on both sides (this would also be useful as an artistic tool for collaborative storytelling, but that's less EA-relevant.) In terms of qualifications, I'm a programmer with experience primarily in games and web design, and a passionate EA, currently working part-time for a small educational nonprofit. None of these proposals necessarily require a huge budget, at least to reach the "minimum viable product" stage - maybe $15k-$20k - but all would require a lot of collaboration (even more so if more than one of them gets interest). If you're interested in volunteering/funding/collaborating on any of these proposals, you can reach me at Isaac.Philip.Burke[at]gmail[dot]com.
February 22, 2022 · Original source
14: You’ve probably heard statistics about how 50% of transgender youth attempt suicide before age 21. This paper tries to analyze the situation in more depth. The 50% number usually comes from surveys, but there’s some evidence people exaggerate on surveys, rounding up “I think about it a lot” to “I attempted”. The authors gather data on completed suicides among trans people, and find that they’re about 0.01%/year (which is about 5x the cisgender rate). If we suppose that people have about 5 years between becoming transgender and turning 21, then the 50% attempted suicide rate → 0.05% completed suicide rate implies that 1/1000th of the youth who report attempting suicide on surveys complete suicide - which sounds about right to me [but see this comment for a critique] 15: Gwern on the failures of 20th century eugenics. I’ve previously linked a piece about how, aside from the general moral failure, the 20th century eugenicists got lots of implementation details really wrong. Gwern adds to the picture: they had a purely Mendelian (as opposed to polygenic) model of intelligence, and felt that bad traits were probably caused by single recessive genes. This dichotomized the population in a way that contributed to the moral problems - if IQ is truly a continuum, then someone with 120 IQ might still wonder if they were “inferior” to someone with 130 IQ, in a way that made them feel some sympathy to someone with 80 IQ who was being pronounced “inferior” by the eugenicists of the time. But instead, they thought some people had the specific recessive “low intelligence” gene, those people could be “cleansed” from the population, and then everyone else would be fine! It also prevented them from considering improving the populace by encouraging intelligent people to breed more (as opposed to sterilizing unintelligent people) - this wouldn’t eliminate the recessive variants that were causing all the trouble! 
I’m confused how they could have believed this even with the limited knowledge of the time; this was long after Galton had proven that genius was genetic, and once you have genetic genius you know there’s more going on than Mendelian inheritance of subnormality. 16: Sexual selection bridges peaks in adaptive fitness landscapes. 17: NFTorah: “The Torah [is] the original blockchain”. I think it’s funny that this exists, but it’s exactly what you would expect, and you don’t have to click on the link. 18: More IRB nightmares. 19: Chris Blattman (@cblatts), January 18, 2022: “@ethanbdm When we piloted a public lottery to evaluate cash transfers in Liberia, the potential recipients arranged beforehand to insure one another. After the randomization and grant, the winners compensated the losers and unraveled the field experiment.” 20: DeepMind made a programming AI that was able to participate in a human coding competition and place around the middle. Nostalgebraist gives his thoughts: “impressed with the raw performance, not massively surprised, not sold that it implies anything big in particular”. A lot of people will be watching whether it can win programming competitions outright a year or two from now, though I bet their perspectives on how relevant this is for AI takeoff speeds will be pretty mixed. 21: Effective altruist organizations as Zendaya outfits. 22: Brain Efficiency: Much More Than You Wanted To Know. “Why should we care? 
Brain efficiency matters a great deal for AGI timelines and takeoff speeds, as AGI is implicitly/explicitly defined in terms of brain parity.” 23: I’m not going to throw out my copy of The Case Against Education just yet - I haven’t checked this study but I bet there are lots of possible confounders. Still, this would be fun for somebody more interested to analyze in depth: 24: Best of Scott Sumner archives: There’s Only One Sensible Way To Measure Economic Inequality. “You cannot put the burden of a tax on someone unless you cut into his or her consumption. If … tax increases did not cause Gates and Buffett to tighten their belts, then they paid precisely 0% of that tax increase. Someone else paid, even if they wrote the check. If they invested less due to the tax, then workers might have received lower wages. If they gave less to charity then very poor Africans paid the tax.” 25: The latest in the Greater Male Variability Hypothesis: Harrison, Noble, and Jennions publish a meta-analysis failing to find evidence of greater male variability in the personality of non-human animals. Del Giudice and Gangestad have a rebuttal saying that they were underpowered to detect it even if it did exist, plus noting the ways that media coverage of this study was incredibly irresponsible even by its own terms. 26: Some recent critiques of Cook (2014) on racial violence vs. black patents, including Michael Wiebe challenging the violence measures and AnechoicMedia arguing that the black patent measure declines right when switching from one (more complete) dataset to another (less complete) one. Rebuttal by Brad DeLong here; he argues that Cook uses multiple methods and some of them don’t have this problem. Relevant since Cook is now being considered for the Federal Reserve; see eg this Wall Street Journal editorial against. 
27: Claim: 31% of British people say they have seen or met Queen Elizabeth (this seems plausible to me, I would answer ‘yes’ to this because she visited Ireland when I lived there, I watched the parade in her honor, and I could vaguely glimpse her on the inside of her car). 28: This couple-of-month-period in wokeness: Scientific American attacks late biologist EO Wilson, in a screed whose highlight is calling him problematic for describing ants as having “colonies”. This is part of a more general (and surprisingly fast) pivot at Scientific American from real science to culture warring; when even Eric Turkheimer thinks you’ve gotten too woke, you’ve gotten too woke.
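The arithmetic in item 14 of the passage above can be laid out step by step. The short Python sketch below only restates the passage's own numbers (0.01% per year, 5 years, 50% reported attempts). The function and variable names are made up for illustration, and multiplying the annual rate by the number of years is the same simple approximation the passage uses, not an exact cumulative risk.

```python
def completed_per_reported_attempt(annual_completed_rate, years, survey_attempt_rate):
    """Ratio of completed suicides to survey-reported attempts over the period."""
    # Approximate cumulative completed rate: annual rate times years at risk.
    cumulative_completed = annual_completed_rate * years
    # Share of those reporting an attempt who would have to complete suicide.
    return cumulative_completed / survey_attempt_rate


# Figures as stated in the passage: 0.01%/year, about 5 years, 50% reported attempts.
ratio = completed_per_reported_attempt(0.0001, 5, 0.50)
```

Here `ratio` works out to about 0.001, the "1/1000th" figure in the passage.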
April 04, 2022 · Original source
Gwern answered (long comment, only partly cited):
So here it looks like Matthew is taking the reductionist perspective (that the GPTs were just a predictable continuation of trend) and Gwern is taking the more interesting perspective (the trend continuing is exciting and important).
While I acknowledge Gwern has a good point here, it seems - not entirely related to the point under discussion? Yes, progress will come from specific people doing specific things, and they deserve to be celebrated, but Paul’s position - that progress is gradual and predictable - still stands.
April 11, 2022 · Original source
Prosaic alignment is hard… “Prosaic alignment” (see this article for more) means alignment of normal AIs like the ones we use today. For a while, people thought those AIs couldn’t reach dangerous levels, and that AIs that reached dangerous levels would have so many exotic new discoveries that we couldn’t even begin to speculate on what they would be like or how to align them. After GPT-2, DALL-E, and the rest, alignment researchers got more concerned that AIs kind of like current models could be dangerous. Prosaic alignment - trying to align AIs like the ones we have now - has become the dominant (though not unchallenged) paradigm in alignment research. “Prosaic” doesn’t necessarily mean the AI cannot write poetry; see Gwern’s AI generated poetry for examples. … because OOD behavior is unpredictable “OOD” stands for “out of distribution”. All AIs are trained in a certain environment. Then they get deployed in some other environment. If it’s like the training environment, presumably their training is pretty relevant and helpful. If it’s not like the training environment, anything can happen. Returning to our stock example, the “training environment” where evolution designed humans didn’t involve contraceptives. In that environment, the base optimizer’s goal (pass on genes) and the mesa-optimizer’s goal (get genital friction) were very well-aligned - doing one often led to the other - so there wasn’t much pressure on evolution to look for a better proxy. Then 1957, boom, the FDA approves the oral contraceptive pill, and suddenly the deployment environment looks really really different from the training environment and the proxy collapses so humiliatingly that people start doing crazy things like electing Viktor Orban prime minister. So: suppose we train a robot to pick strawberries. We let it flail around in a strawberry patch, and reinforce it whenever strawberries end up in a bucket. Eventually it learns to pick strawberries very well indeed. 
But maybe all the training was done on a sunny day. And maybe what it actually learned was to identify the metal bucket by the way it gleamed in the sunlight. Later we ask it to pick strawberries in the evening, where a local streetlight is the brightest thing around, and it throws the strawberries at the streetlight instead. So fine. We train it in a variety of different lighting conditions, until we’re sure that, no matter what the lighting situation, the strawberries go in the bucket. Then one day someone with a big bulbous red nose wanders on to the field, and the robot tears his nose off and pulls it into the bucket. If only there had been someone with a nose that big and red in the training distribution, so we could have told it not to do that! The point is, just because it’s learned “strawberries into bucket” in one environment, doesn’t mean it’s safe or effective in another. And we can never be sure we’ve caught all the ways the environment can vary. …and deception is more dangerous than Goodharting. To “Goodhart” is to take advantage of Goodhart’s Law: to follow the letter of your reward function, rather than the spirit. The ordinary-life equivalent is “teaching to the test”. The system’s programmers (eg the Department of Education) have an objective (children should learn). They delegate that objective to mesa-optimizers (the teachers) via a proxy objective (children should do well on the standardized test) and a correlated reward function (teachers get paid more if their students get higher test scores). The teachers can either pursue the base objective for less reward (teach children useful skills), or pursue their mesa-level objective for more reward (teach them how to do well on the test). An alignment failure! This sucks, but it’s a bounded problem. We already know that some teachers teach to the test, and the Department of Education has accepted this as a reasonable cost of having the incentive system at all. 
We might imagine our strawberry-picker cutting strawberries into little pieces, so that it counts as having picked more strawberries. Again, it sucks, but once a programmer notices it can be fixed pretty quickly (as long as the AI is still weak and under control). What about deception? Suppose the strawberry-picker happens to land on some goal function other than the intended one. Maybe, as before, it wants to toss strawberries at light sources, in a way that works when the nearest light source is a metal bucket, but fails when it’s a streetlight. Our programmers are (somewhat) smart and careful, so during training, they test it at night, next to a streetlight. What happens? If it’s just a dumb collection of reflexes trained by gradient descent, it throws the strawberry at the streetlight and this is easily caught and fixed. If it’s a very smart mesa-optimizer, it might think “If I throw the strawberry at the streetlight, I will be caught and trained to have different goals. This totally fails to achieve my goal of having strawberries near light sources. So throwing the strawberry at the light source this time, in the training environment, fails to achieve my overall goal of having strawberries thrown at light sources in general. I’ll do what the humans want - put the strawberry in the bucket - for now.” So it puts the strawberry in the bucket and doesn’t get caught. Then, as soon as the humans stop looking, it throws strawberries at the streetlight again. Deception is more dangerous than Goodharting because Goodharting will get caught and trained away, and deception might not. I might not be explaining this well, see also Deceptively Aligned Mesa-Optimizers? 
It’s More Likely Than You Think: We prevent OOD behavior by detecting OOD and obtaining more human labels when we detect it… If you’re (somewhat) careful, you can run your strawberry-picking AI at night, see it throw strawberries at streetlights, and train it out of this behavior (ie have a human programmer label it “bad” so the AI gradient-descends away from it) …and we eliminate the incentive for deception by ensuring that the base optimizer is myopic A myopic optimizer is one that reinforces programs based only on their performance within a short time horizon. So for example, the outside gradient descent loop might grade a strawberry picker only on how well it did picking strawberries for the first hour it was deployed. If this worked perfectly, it would create an optimizer with a short time horizon. When it considered deceiving its programmers in order to get a payoff a few days later when they stopped watching it, it wouldn’t bother, since a few days later is outside the time horizon. …and implements a decision theory incapable of acausal trade. You don’t want to know about this one, really. Just pretend it never mentioned this, sorry for the inconvenience. There are deceptively-aligned non-myopic mesa-optimizers even for a myopic base objective. Even if the base optimizer is myopic, the mesa-optimizer might not be. Evolution designed humans myopically, in the sense that we live some number of years, and nothing that happens after that can reward or punish us further. But we still “build for posterity” anyway, presumably as a spandrel of having working planning software at all. Infinite optimization power might be able to evolve this out of us, but infinite optimization power could do lots of stuff, and real evolution remains stubbornly finite. Maybe it would be helpful if we could make the mesa-optimizer itself myopic (though this would severely limit its utility). But so far there is no way to make a mesa-optimizer anything. 
You just run the gradient descent and cross your fingers. The most likely outcome: you run myopic gradient descent to create a strawberry picker. It creates a mesa-optimizer with some kind of proxy goal which corresponds very well to strawberry picking in the training optimization, like flinging red things at lights (realistically it will be weirder and more exotic than this). The mesa-optimizer is not incentivized to think about anything more than an hour out, but does so anyway, for the same reason I’m not incentivized to speculate about the far future but I’m doing so anyway. While speculating about the far future, it realizes that failing to pick strawberries correctly now will thwart its goal of throwing red things at light sources later. It picks strawberries correctly in the training distribution, and then, when training is over and nobody is watching, throws strawberries at streetlights. (Then it realizes it could throw lots more red things at light sources if it was more powerful, achieves superintelligence somehow, and converts the mass of the Earth into red things it can throw at the sun. The end.) III. You’re still here? But we already finished explaining the meme! Okay, fine. Is any of this relevant to the real world? As far as we know, there are no existing full mesa-optimizers. AlphaGo is kind of a mesa-optimizer. You could approximate it as a gradient descent loop creating a good-Go-move optimizer. But this would only be an approximation: DeepMind hard-coded some parts of AlphaGo, then gradient-descended other parts. Its objective function is “win games of Go”, which is hard-coded and pretty clear. Whether or not you choose to call it a mesa-optimizer, it’s not a very scary one. Will we get scary mesa-optimizers in the future? This ties into one of the longest-running debates in AI alignment - see eg my review of Reframing Superintelligence, or the Eliezer Yudkowsky/Richard Ngo dialogue. 
Optimists say: “Since a goal-seeking AI might kill everyone, I would simply not create one”. They speculate about mechanical/instinctual superintelligences that would be comparatively easy to align, and might help us figure out how to deal with their scarier cousins. But the mesa-optimizer literature argues: we have limited to no control over what kind of AIs we get. We can hope and pray for mechanical instinctual AIs all we want. We can avoid specifically designing goal-seeking AIs. But really, all we’re doing here is setting up a gradient descent loop and pressing ‘go’. Then the loop evolves whatever kind of AI best minimizes our loss function. Will that be a mesa-optimizer? Well, I benefit from considering my actions and then choosing the one that best achieves my goal. Do you benefit from this? It sure does seem like this helps in a broad class of situations. So it would be surprising if planning agents weren’t an effective AI design. And if they are, we should expect gradient descent to stumble across them eventually. This is the scenario that a lot of AI alignment research focuses on. When we create the first true planning agent - on purpose or by accident - the process will probably start with us running a gradient descent loop with some objective function. That will produce a mesa-optimizer with some other, potentially different, objective function. Making sure you actually like the objective function that you gave the original gradient descent loop on purpose is called outer alignment. Carrying that objective function over to the mesa-optimizer you actually get is called inner alignment. Outer alignment problems tend to sound like Sorcerer’s Apprentice. We tell the AI to pick strawberries, but we forgot to include caveats and stop signals. The AI becomes superintelligent and converts the whole world into strawberries so it can pick as many as possible. 
Inner alignment problems tend to sound like the AI tiling the universe with some crazy thing which, to humans, might not look like picking strawberries at all, even though in the AI’s exotic ontology it served as some useful proxy for strawberries in the training distribution. My stand-in for this is “converts the whole world into red things and throws them into the sun”, but whatever the AI that kills us really does will probably be weirder than that. They’re not ironic Sorcerer’s Apprentice-style comeuppance. They’re just “what?” If you wrote a book about a wizard who created a strawberry-picking golem, and it converted the entire earth into ferrous microspheres and hurled them into the sun, it wouldn’t become iconic the way Sorcerer’s Apprentice did. Inner alignment problems happen “first”, so we won’t even make it to the good-story outer alignment kind unless we solve a lot of issues we don’t currently know how to solve. For more information, you can read: Rob Miles’ video above, direct link here, channel here.
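The streetlight example in the passage above can be made concrete with a toy program. This is only an illustrative sketch of a learned proxy ("throw at the brightest thing") that matches the intended goal in training and diverges out of distribution. The class, function names, and brightness values are all hypothetical, and nothing here is a real training loop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    name: str
    brightness: float  # arbitrary made-up units


def intended_target(scene):
    """What the programmers wanted: strawberries always go in the bucket."""
    return "bucket"


def learned_target(scene):
    """Proxy the robot actually picked up: aim at the brightest object in view."""
    return max(scene, key=lambda obj: obj.brightness).name


# Training distribution: a sunny day, where the metal bucket gleams brightest.
sunny_training = [SceneObject("bucket", 0.9), SceneObject("grass", 0.2)]

# Deployment: evening, where a streetlight outshines the bucket.
evening_deployment = [SceneObject("bucket", 0.1), SceneObject("streetlight", 0.8)]

for label, scene in [("training", sunny_training), ("deployment", evening_deployment)]:
    agrees = learned_target(scene) == intended_target(scene)
    print(f"{label}: learned={learned_target(scene)}, matches intent={agrees}")
```

In the training scene the proxy and the intended goal pick the same target, so no amount of evaluation on sunny days would reveal the difference. It only shows up once the scene changes.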
June 07, 2022 · Original source
(related: Gwern on scaling)
September 06, 2022 · Original source
19: Reddit: The current and future state of AI/ML is shockingly demoralizing. A new concern I’ve never seen before, aside from the superintelligence family of concerns or the implicit bias family. AI is slowly eating all creative work. If AI remains slightly worse than humans, it could still take over because it’s so much cheaper and more scaleable, resulting in all our art getting slightly worse. If it becomes better than humans, a world where you (as a human) can never create truly world-class art also sounds pretty depressing. But this nostalgebraist post (with shades of this Gwern post) pushes back a little:
April 20, 2023 · Original source
2: List Of Questions Gwern Is Curious About. Why are cats fascinated by earwax? Why are furries so artistically and economically influential compared to other fetishes? Why are there so few pairs of extremely successful identical twins? Why did it take so long to invent Brazilian jiu-jitsu? Why are short stories so much less popular than they used to be? Why do East Asians have so many famous numbered lists (“Four Noble Truths”, “Thirteen Classics”, etc)? And many more.
August 09, 2023 · Original source
29: Gwern: why hasn’t AI-generated music taken off in the same way as AI-generated art or AI-generated text? He thinks it’s a combination of copyright, low demand, and technical difficulty.
August 16, 2023 · Original source
But Gwern gives a more scientific counterargument:
Gwern is right that there’s a lot of science purporting to argue that describable preferences can’t help people find matches. I want to start by arguing that this science can’t possibly be right, then look closer into what it is and where it might have gone wrong.
Gwern lists some of them here. I won’t go too much into any individual study, except to note that Sparks (2020) is a great name for someone researching the causes of romantic attraction, and Wood & Furr sounds like a children’s cartoon about adorable animals. I’ll separate them, plus some related work, into a few designs:
September 18, 2023 · Original source
Gwern writes:
Surprising to see a psychiatrist write a review of Musk focusing on his psychology and replete with quotations about his erratic sleep habits or obsessive focus, and never use the words "bipolar" or "mood disorder"
The link goes to many articles that Gwern thinks provide evidence, including some where Musk self-describes as bipolar (“maybe not medically tho”).
January 18, 2024 · Original source
8: Gwern’s take on November’s OpenAI board drama (plus some extra context).
9: Related: Gwern discusses the history of the early-2010s neural net revolution. “Everyone except Shane Legg was wrong about [deep learning] prospects & timing, and even Legg was wrong about important things, [which is why] DeepMind is now on the hindfoot.”
April 04, 2024 · Original source
22: George H (formerly of Cerebralab, now of Epistem.ink) claims that Increasing IQ Is Trivial and the scientific consensus that it’s impossible is just scientists being too cowardly to try interesting things (see also his counter to Gwern’s “Algernon” argument here). He says that he was able to increase his IQ 7-9 points (after controlling properly for learning effects) and that the first two people to try to replicate his method got 10 and 11 point increases. He’s being a little coy about what exactly the method is, because he doesn’t want too many people trying it half-assed and messing it up, but says it involves:
April 09, 2024 · Original source
1: Comments Arguing Against Zoonosis — 1.1: Is COVID different from other zoonoses? — 1.2: Were the raccoon-dogs wild-caught? — 1.3: 92 early cases — 1.4: COVID in Brazilian wastewater — 1.5: Biorealism’s 16 arguments — 1.6: DrJayChou’s 7 arguments — 1.7: How much should coverup worry us? — 1.8: Have Worobey and Pekar been debunked? — 1.9: Was there ascertainment bias in early cases? — 1.10: Connor Reed / Gwern on cats — 1.11: Rootclaim’s response to my post
Before going further, I recommend reading page 8 of the supplementary text of Worobey’s paper, titled “Robustness Of Statistical Test Results To Ascertainment Bias”, or pages 14-17, “Additional Data Related To Case Ascertainment Biases”, which explain all the reasons he thinks this isn’t true. I promise you aren’t the first person to think that maybe Worobey could be contaminated by ascertainment bias. If that still doesn’t help, Worobey talks more about his strategy for avoiding ascertainment bias here. Most important, he counted only cases from December; the market connection was discovered December 30 and added to diagnostic criteria January 3. This doesn’t mean bias is impossible - some of these points are people who caught COVID on December 31, but only got diagnosed January 4 after the new diagnostic criteria were added. But most cases are pre-criteria. And Worobey looked at various subsets of pre-criteria cases and found they were all at least as market-focused as the overall set. For example, he looked at the earliest COVID records in one Wuhan hospital system: 10 of these hospitals’ 19 earliest COVID-19 cases were linked to Huanan Market (∼53%), comparable both to Jinyintan’s 66% (of 41 cases) (4) and to the WHO-China report’s 33% of 168 retrospectively identified cases within Wuhan across December 2019 (1). Regarding cases at the Wuhan Central Hospital and HPHICWM, patients with a history of exposure at Huanan Market could not have been “cherry picked” before anyone had identified the market as an epidemiologic risk factor. Hence, there was a genuine preponderance of early COVID-19 cases associated with Huanan Market. Likewise, a study conducted January 2 (so not impacted at all by the January 3 criteria) found that 27 of 41 known patients had market links. Likewise, the first five cases were all detected in the market, and it doesn’t even make sense to talk about ascertainment bias for these. What is the Weissman paper that observeralt is talking about? 
It argues: if the pandemic started at the market, each seemingly non-market-linked case must ultimately derive from a market-linked case. Therefore, we should expect non-market-linked cases to require more steps than market-linked cases. Therefore, they should be further away. But if we look at the map above, we see that not-market-linked cases are closer to the market than market-linked cases. So something must be wrong, and that something might be ascertainment bias. (at least this is my interpretation of Weissman’s argument, which is more mathematical; read the paper to make sure I’m getting it right). This is a weirdly spherical-cow view of an epidemic, worthy of a physicist. It’s easy to think of reasons the linked-cases-should-be-closer rule might not hold. For example, suppose that on their lunch break, market vendors go have lunch at restaurants surrounding the market. They infect people in these restaurants, who then infect their friends and family. But these people never went to the market themselves. Now there are a bunch of non-market-linked cases immediately surrounding the wet market. But also - of all markets in Wuhan, Huanan sold the most weird wildlife. Suppose someone in the boonies gets a craving for raccoon-dog one day, their local convenience store doesn’t have it, so they hop on a bus and go downtown to the city’s main wet market. Then they get infected with COVID. Now there’s a wet-market-linked case in the boonies. In other words, we should expect two modes of spread: general geographic diffusion from the epicenter, and people from far away who made specific trips. If this still doesn’t seem obvious to you, consider - usually when COVID first arrived in America or Brazil or wherever, they were able to trace it back to a specific person from Wuhan who visited the country. If I was the first person in America to get COVID, I could usually say “Oh, it must have been my business meeting with Mr. Chin from Wuhan”. 
At the same time, if someone from the next town over from Wuhan got COVID, they probably couldn’t trace it back to a specific Wuhanite - everyone from Wuhan is coming and going so often that my town is just full of COVID in general. So I don’t think Weissman’s paper proves anything, and I think the general pattern of blue and orange dots suggests ascertainment bias wasn’t playing a role. So why does George Gao say that there was ascertainment bias? I looked for the direct source of the Gao quote and couldn’t find it; if someone else is able to, please let me know, since I’d be interested in exactly what he thinks about this. 1.10: Connor Reed / Gwern on cats Gwern wrote: Yes, I don't understand this (paraphrased) claim by Peter: > He also told the Mail that his cat got the coronavirus too, which is impossible. 'Impossible', thus implying the man was lying? I was under the impression that, quite aside from cats having tons of coronaviruses in general (FCoV being a particularly serious threat to young cats, which also seems to be a remarkable case study of the harms of the FDA), that it was not just not 'impossible' for domestic pet cats to get the coronavirus too, it was routine for them to get COVID-19, and even other cat species in *zoos* have tested positive and this was true very early in the COVID-19 pandemic and quite well publicized and well known (eg April 2020 https://www.nationalgeographic.com/animals/article/tiger-coronavirus-covid19-positive-test-bronx-zoo ). This was a topic of interest to me at the time because I like cats and have a cat and was wondering what the implications of me being inevitably infected might be for my cat, and so I remember this quite well despite my general attempt to remain ignorant of as many COVID-19 matters as possible... 
And double-checking now to see if all of these reports were somehow false positives or faked, I continue to see everyone like the CDC stating that it is still totally possible and routine for cats in close contact with infected humans (you know, like a *pet* cat) to be infected with COVID-19: https://www.cdc.gov/healthypets/covid-19/pets.html Given that Peter has supposedly spent years autistically researching every last detail and this detail in particular in order to discredit that British dude, I'm experiencing sudden Gell-Mann Amnesia here about the rest of his claims, as well as the supposed experts evaluating Peter's claims if they didn't flag that (I have not checked). This is in the context of Connor Reed, a British man who claimed to have gotten COVID on November 25 - which, if true, would be surprisingly (though not impossibly) early according to the zoonosis narrative. Peter argued his story didn’t hold up, and one of his points centered on Reed’s claim that his cat might have caught COVID from him and died. Unfortunately, I mis-quoted Peter. I said Peter argued it was impossible for his cat to get COVID-19 (false). His actual statement was that it’s extremely rare for a cat to die of COVID-19. Peter, Gwern, and I then proceeded to get very confused about the exact claims and timeline, which I think is because Connor said totally different things in different interviews: In an interview with Wales Online on 2/4/2020, he said that “my kitten caught the feline coronavirus and developed pneumonia and died, but I don't think I caught it from her. I think that was just coincidence.”
August 19, 2024 · Original source
4: Comment of the week is Gwern’s comment/summary/review on the Marvel Comics book review.
November 01, 2024 · Original source
13: Gwern on the chip embargo: It is pretty damning. We're told the chip embargo has failed, and smugglers have been running rampant for years, and China is about to jump light years beyond the West and enslave us with AXiI (if you will) . . . And then an expert casually remarks that all of China put together, smuggling chips since 2022, has fewer H100s than Elon Musk orders for his datacenter while playing Elden Ring. And even with that huge bottleneck and 1.4 billion people, there's so little demand for them that they cost less per hour than in the West, where AI is redhot and we can't get enough H100s in datacenters. (And where the serious AI people are now discussing how to put that many into a single datacenter for a single run before the next scaleup with B200s obsoletes those...) 14: A company called Cosm has raised $250 million to build “immersive sports experiences”, ie giant buildings sort of like a cross between a stadium and a movie theater where people can get together and watch high-quality televised sports games in a “realistic” setting; they already have facilities in Dallas and Los Angeles. 15: Cremieux: The Ottoman Origins Of Modernity. The “Ottoman” bit is a distractor; the Ottomans fought the Catholics long enough for the Protestants to get a foothold, and then the Protestants established modernity. A useful pushback against the pushback that the Catholic Church never persecuted scientists or held back progress. I’m most interested in this post in the context of Cremieux saying he wrote it in two hours. Even I can’t work that fast! 16: The Green Party, a US third party, tried to put their candidate Jill Stein on the ballot in November. The Nevada election office sent them the wrong forms and gave them false advice about the process. The Greens filed the wrong forms, the Democrats sued, and the Supreme Court disqualified Stein, calling the election office’s incorrect advice an “unfortunate mistake”. 
I’m disappointed in this outcome - partly for the obvious reasons, but also because the incorrect forms they submitted technically should have added a state referendum to the ballot containing only the text “Jill Stein”. If they’re going to disqualify her candidacy, then I think they should at least hold the state referendum! 17: Nostalgebraist: Google has a new tool out that will create an AI podcast for any text; you hand it the text (could be a blog post, article, or work of fiction), and the tool generates a podcast of two AI hosts discussing it. You can find podcast discussions of Nostalgebraist’s fiction (Northern Caves and Almost Nowhere) at the link, but the acknowledged peak of the genre is Podcast Hosts Discover They’re AI, Not Human, And Spiral Into Existential Meltdown. 18: Also Nostalgebraist: The Case For Chain Of Thought Unfaithfulness Is Overstated. New AIs like o1 give “chain of thought”, ie display what they’re thinking after each step. This seems like a promising avenue to solve alignment - just see whether they’re thinking “and now I will plot against humans”. Unfortunately it’s not so easy; the chain of thought isn’t always accurate (you can sometimes catch the AI “hiding” thoughts it doesn’t want its human overseers to know, like when it’s using a racial stereotype). This article argues that these examples aren’t as exciting as they sound, and chain-of-thought accurately reflects reasoning for most tasks. 19: Australian government considers making doxxing a crime punishable by up to seven years in jail. 20: Getting your brain cryogenically frozen after your death is now free. 21: Cube Flipper: Hypercomputation without bothering the cactus people. The visual system must solve difficult math problems when translating the 2D visual field into a 3D world. Can we harness this innate mathematical ability to do arbitrary work? 
Cognitive scientist Mark Changizi developed a series of visual circuits (eg XOR gates) based on Necker cubes, probably easier seen than described: After surveying the field, Cube Flipper proposes a more advanced visual computer based on taking DMT and viewing certain types of tiles with slight deviations: …and makes the extreme claim that something like this might demonstrate hypercomputation, ie the visual system has semi-magic computational properties beyond those permitted by normal physical laws. I am skeptical but appreciate the survey of visual computing (as well as the callback to one of my older posts). 22: Material implication in Mormonism: In the book Doctrine and Covenants, Joseph Smith reports that God told him that if he lived to be 85, he would see the Second Coming (which would place it in 1890 - 1891). Mormon apologists note that Joseph Smith did not live to be 85, so no conclusion can be drawn. 23: More old-timey psychiatric ads (this one is from 1952, source: @justin_garson): This was before they invented what we would call antidepressants today; Dexedrine is an amphetamine related to Adderall. 24: Congratulations to Open Philanthropy, the biggest effective altruist foundation… …whose grantee David Baker recently won a Nobel Prize for his research on synthetic proteins. Potential applications include new drugs, vaccines, and materials. 25: Rich Kid Memes And The Online Culture Of The One Percent. Rich people who want to signal group membership to other rich people online can’t boast about how rich they are; that would be gauche. Instead, they’ve settled on the solution of making fun of rich people in hyperspecific language that proves familiarity with the culture. 26: Tap Water Sommelier: Vladimir Putin has two sons, ages 5 and 9. They are kept in luxurious but total isolation from the outside world and raised by flunkies who are too scared to punish/restrain them in any way. Also some discussion of an unexpected historical analogue. 
27: Experiment from Colombia: replacing experienced teachers with less-experienced but higher-scoring-on-tests teachers significantly decreased student performance. Got to admit I was expecting the opposite of this; I’d seen US data saying that experience didn’t matter and teacher intelligence did. Looking over this more, I find lots of studies on both sides and will go back to agnosticism on this question until someone I trust investigates further. 28: Large-scale formal Intellectual Turing Test finds that people can imitate partisans effectively; ie nobody on either side can tell the difference between a Democrat arguing for Democrat values vs. a Republican-pretending-to-be-a-Democrat arguing for Democrat values (and vice versa). This study used a 100 word essay on why you supported your party (you can see if you can do better here), but past attempts with different structures (religion, vegetarianism, polyamory) have shown broadly the same results. The researchers try to put this in the context of various studies showing that people do misunderstand their opponents (eg think they’re more extreme, underestimate the level of common ground), but it seems like intellectual Turing Tests aren’t a good way to measure or tease out this misunderstanding. 29: Congratulations to Substacker WoolyAI for doing the impossible and providing a genuinely novel and interesting (to me) take on pickup artistry: 30: Did you know: if you Google “cool websites”, our subreddit (r/slatestarcodex) is the first result. 31: Moshe Koppel, who works at the intersection of computer science and Talmud, is writing a series of posts (presumably) based off of my Every Bay Area House Party, titled Jerusalem Area House Party (it’s in multiple parts; you have to go to the main Substack page to find the others). I won’t necessarily link everyone who riffs off one of my posts - but honestly I probably will if you also have a Wikipedia page that describes you as working on computational Talmudology. 
32: David Roman says it’s a myth that Arabic scholars rescued and preserved the works of the great classical authors. 33: Medications often decrease “secondary endpoints” (eg stroke, heart attack), but the holy grail of pharma studies is proving that a certain drug decreases all-cause mortality. This is much harder (not all heart attacks kill people, and people die from lots of other things), but is the strongest possible endorsement for the drug (without it, you might worry that it only prevented non-fatal heart attacks, or that it killed as many people through side effects as it saves through heart attack prevention). Even great medications that we’re confident in can’t always clear this bar. But a new JAMA article adds another member to this select club: Adderall decreases all-cause mortality in ADHD, probably because it prevents drug addiction, car accidents, and impulsive actions. 34: Before the Gulf War got in the way, Saddam Hussein was building some crazy mosques: 35: Italy bans surrogacy - quite strictly, too, Italians aren’t even allowed to go abroad and do it. I am so sorry for all the Italians who will never get to be mothers and fathers because their government hates progress. You might hope that, whatever the other disadvantages of anti-immigrant parties, at least they’re incentivized to let natives have children, but looks like they can’t even get that one right. Starting to wonder whether the trains even run on time. 36: Elsewhere in “Italy sucks” news - did you know Italy’s tax code effectively bans startups? Companies are taxed before making any money, based on how many assets they have. If they have lots of assets but aren’t making money (eg because they’re still doing research / in stealth) then tax officials get confused and hostile and run increasingly punitive audits. Related: size of the European tech sector. 
It’s the red line on this chart; if you can’t see a red line at your screen resolution, then you’ve learned something important about the EU tech sector. 37: Seen on @cremieuxrecuel’s twitter (preliminary, needs replication): Jews may have gone from 65-29 Democrat/Republican in 2020 to 58-40 this election. 38: Extelligence has a post responding to my critique of the cultural Christianity argument (among, uh, many other things), but I don’t really think it connects. I’m not telling atheists they can’t go to church/synagogue if it makes them feel happy and fulfilled - I’ve done this myself sometimes. My post was meant to argue against the claim that, for pragmatic reasons, atheists should support the Christianization of society as a defense against Islam or postmodernism or some other philosophical enemy. 39: Related: Extelligence is finally going for their Trust Assembly project/idea/startup for online consensus-based truth-seeking (I think something like a cross between Community Notes and Wikipedia, but as a browser extension, and for everything). He’s looking for potential developers/testers/users. 40: Jiankui He is the Chinese geneticist who made history with the first germline gene editing in humans (resulting in three babies supposedly immune to AIDS, although nobody has tested this). China sentenced him to three years in prison for unauthorized experimentation, but now he’s out of jail, has an English-language Twitter account, has a new lab, wants to work on Alzheimer’s, and seems pretty based (although not infinitely based): 41: Anthropic has a new version of their AI Claude which can use your computer. You give it permission, put it on a virtual desktop, and ask it to do things for you (eg “please find and download a picture of a cat” or “please research these ten things and put them in a text file”). It moves your cursor, browses the Internet, and creates and saves files. 
People keep saying they’ll care about AI “when it operates autonomously” or “when it becomes an agent”. But this is a trivial barrier, and one which Computer Use Claude has arguably already passed. So far this feature is limited to developers (though anyone with computer knowledge can sign up for it) but I expect it to be the near future of consumer AI, to get better quickly, and to shade gradually into the “autonomous” “agentic” AI that you all think will require a paradigm shift. 42: Claim (from the IDF): Hamas faked polls showing that most Palestinians supported the October 7 attack; the real numbers are 31% in favor, 64% against. 43: Otto von Bismarck wanted to trick France into declaring war on Germany. In order to provoke the French, he sent the Ems Dispatch, a statement describing recent diplomatic events in a way that sounded maximally offensive. The French were so offended that “crowds” in Paris demanded war, and the Franco-Prussian War was declared soon afterwards. The part of this that I find most interesting is the text of the dispatch itself, which read: After the news of the renunciation of the Prince von Hohenzollern had been communicated to the Imperial French government by the Royal Spanish government, the French Ambassador in Ems made a further demand on His Majesty the King that he should authorize him to telegraph to Paris that His Majesty the King undertook for all time never again to give his assent should the Hohenzollerns once more take up their candidature. His Majesty the King thereupon refused to receive the Ambassador again and had the latter informed by the Adjutant of the day that His Majesty had no further communication to make to the Ambassador. I’m fascinated by the idea that only 150 years ago, it was obvious that if someone sent you this statement, you had to declare war or abandon all honor. If I read it carefully, I can sort of parse out that it sounds like the Prussians are unhappy, but that’s the most emotion I gather from it. 
Anyway, the Franco-Prussian War led to World War I which led to World War II - so if you don’t like 50 million people dying and the total devastation of Europe, blame this statement about ambassadors. 44: The first use of artificial insemination in humans: The first recorded case of artificial insemination by donor didn’t occur until 1884, when Dr. William Pancoast decided to treat a couple’s infertility by secretly inseminating the woman with sperm obtained from a medical student. The insemination happened while the patient was under anesthesia and Dr. Pancoast did not tell her what had occurred. She gave birth to a baby boy nine months later, but it was several years before the doctor finally confessed to her husband what he had done. Neither man ever informed the mother. It was 25 years later the result of this case was published. Dr. Pancoast was roundly condemned for his actions, but it did open the door for consensual sperm donor insemination. 45: ClearerThinking administers several personality tests to the same people to learn more about their comparative accuracy. I am most interested in their finding that tests with “factors” (eg the Big Five, where you rate people on a numeric scale) are inherently more accurate than those with “types” (eg Myers-Briggs, where you assign someone a specific category) and that, adjusting for this, Big Five is no more predictive than the Enneagram: 46: In 2022, I wrote Whither Tartaria, where I asked why ornate classical styles switched to more austere modernist styles around 1900 - 1950 in a variety of different arts (painting, architecture, literature, poetry, etc). I proposed seven theories, but was unsure which if any were true. Since then, Samuel Hughes of Works In Progress has been investigating. In May, he wrote a well-researched article showing that it wasn’t just increasing cost, because ornate classical architecture now costs less than ever. 
Now in a new article he demolishes a different theory - it’s not just decreasing cost (and subsequent lack of ability to signal wealth) - because costs didn’t decrease in several other arts, and the change was led by artists with rich people as reluctant followers. He concludes: Modernism may well be a status game of some kind; it may well signal taste more than it signals wealth; and this latter feature may be one of the things that distinguishes it from older artistic styles. But the mechanism by which this change came about must be different to the one Alexander describes. 47: Sort of kind of related - When Hamilton Lost Its Snob Appeal. The musical Hamilton was briefly an artistic/cultural phenomenon, but tastemakers eventually switched to making fun of it. Why? Rob Henderson says it happened after ticket prices came down and the common people could enjoy it. I disagree: everyone I knew who was into Hamilton got into it from the free online soundtrack long before they’d seen the show; I think this is more likely the usual fad cycle where anybody who’s too into yesterday’s fad is behind the curve and therefore uncool. 48: Related: Why are people such jerks to public intellectuals? And more. I agree this is a great mystery. 49: Some prominent Substack psychiatrists doing a video Q&A, submit your questions here. 50: Naomi Kanakia: The Literacy Delusion had a number of explanations for why reading books seemed to be so much worse for human beings (in terms of emotional wellness and productivity) than other forms of narrative entertainment, but its main theory was the integration hypothesis. That the stream of words in a book trained the human brain into a habit of self-consciousness, that reading books forced human beings to think of themselves as a stream of text, processed through time, making a coherent argument of some sort. 
And that this overall flattening effect forced readers to ignore aspects of their personality or their situation that were not otherwise in line with the overarching story they'd created about themselves. Basically, reading books causes repression and neurosis. The Literacy Delusion argued that, yes, human beings are storytelling machines, but that a stream of written text is a particular kind of story—a story that is particularly flat, particularly devoid of conflicting or harmonizing information—and that this flatness creates a peculiar effect on the human brain. 51: Last month, I linked Sasha Gusev’s No, Intelligence Is Not Like Height and asked people who disagreed to share their arguments; they sure did. First, several people pointed me to a new preprint, Family-GWAS Reveals Effects Of Environment And Mating On Genetic Associations, which finds that one of the main papers Gusev cited to make his case, Howe 2022, made a mistake - imputing sibling genotypes using a process designed for non-sibling genotypes - and that once that mistake is corrected, the finding disappears and intelligence and height appear similar. Second, Joseph Bronski has a more specific post where he responds to Gusev’s points one by one. He accuses Gusev of “[making] up his own chart to remove the error bars [from the originals], to obscure the fact that the study found no evidence for this in IQ”, and says that the cases where he didn’t do that are just “population stratification and range restriction”. Third, Noah Carl at Aporia, instead of writing a direct response like Bronski, argues that the usual method of attacking twin studies is obsolete; not only have the most-debated assumptions behind twin studies been thoroughly validated, but there are now other lines of evidence besides twin studies which confirm high IQ heritability. 
Fourth, Leonardo Parro (not framed as a response to Gusev) goes into more depth about one of those ways, a “pedigree-based analysis” demonstrating heritability of 54 - 69%, ie no “missing heritability” compared to twin studies. He summarizes this as the effect of “rare variants” compared to the usual SNPs - ie if you only look at the most common genes that are easiest to find, you get “missing heritability” compared to twin studies, but if you widen your search to rare genes that are hard to find, you don’t. 52: Extremely related: Heliospect is a startup promising polygenic selection for IQ and other traits; they were trying to stay in stealth mode but The Guardian spied on them and nonconsensually revealed their existence. The discussion on the r/ssc subreddit centered on their claim that (given enough embryos to choose from) they could increase a baby’s expected IQ by 6 points (I’ve also heard 7.5). Sasha Gusev had previously argued that current technology maxed out at 3.5 and future technology would max out at 6, so a claim of 6 - 7.5 is pretty extreme; Gwern, who wrote the pioneering analysis of this technology, was also skeptical. But Heliospect says they’ve got better predictors than academia that use the rare variants everyone else misses; after talking to the company, Gwern retracted his objections and says he finds their claim “pretty plausible”. Local ACX commenter geneticist Gene Smith also redid some calculations, changed his mind, and says “probably pretty realistic”. I find this interesting not just because of the polygenic selection angle, but because if Heliospect is right then their predictor is able to predict more genetic IQ than the “missing heritability” people believe exists, and it should be able to put this argument to bed once and for all. 53: This month in censorship: X/Twitter banned journalist Ken Klippenstein for sharing the Trump campaign’s dossier on JD Vance. 
Twitter’s side of the story is that the dossier was probably originally stolen by Iranian agents and they don’t want to support that kind of thing by letting people signal-boost the illicitly obtained goods; you can read Klippenstein’s side here. He appears to be unbanned now.
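The dispute in item 52 above is about how many IQ points can be gained by picking the top-scoring embryo out of a batch. A minimal sketch of that kind of arithmetic follows, under assumptions that are not in the passage: sibling embryos' predicted scores are treated as normally distributed, with half the population variance of the score (a common simplification for full siblings under random mating), and the expected gain is the expected maximum of n such draws. The function names and the `variance_explained` input are hypothetical; this is not Heliospect's predictor or Gwern's own calculation.

```python
import math
from statistics import NormalDist


def expected_max_std_normal(n: int, lo: float = -10.0, hi: float = 10.0,
                            steps: int = 20000) -> float:
    """Expected maximum of n independent standard normal draws.

    Integrates x * n * pdf(x) * cdf(x)**(n - 1) with the trapezoid rule.
    """
    std = NormalDist()
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        total += weight * x * n * std.pdf(x) * std.cdf(x) ** (n - 1)
    return total * h


def expected_iq_gain(n_embryos: int, variance_explained: float,
                     iq_sd: float = 15.0) -> float:
    """Expected gain (in IQ points) from implanting the top-scoring embryo.

    ASSUMPTION: the predictor's spread among siblings is
    iq_sd * sqrt(variance_explained / 2), i.e. half the population variance.
    """
    sibling_sd = iq_sd * math.sqrt(variance_explained / 2.0)
    return expected_max_std_normal(n_embryos) * sibling_sd


# Hypothetical inputs: how the gain scales with batch size for a made-up predictor.
for n in (2, 5, 10, 20):
    print(n, round(expected_iq_gain(n, variance_explained=0.10), 2))
```

Under these assumptions the gain grows slowly with the number of embryos and with the square root of the predictor's variance explained, which is why the competing estimates depend so heavily on how good the predictor is assumed to be.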
January 17, 2025 · Original source
Gwern’s longstanding theory that Musk is bipolar (I keep objecting to this because he doesn’t show the right kind of mood shifts; a single shift from a steady state age 0-40, to a different but worse steady state in his fifties is, if anything, even weirder).
February 20, 2025 · Original source
St. Madeline Medianus had many interesting opinions on AI safety, but nobody listened because she was ugly, shy, and a bad speaker. She prayed for help, and one night Gwern appeared to her in a dream and told her a personalized supplement stack that would make her beautiful and charismatic. She took the supplements, got invited to all the cool parties, and her theories became the talk of the town. But she realized that her beauty and charisma were making people take her too seriously compared to others, so she lowered the dose until she was exactly average-looking and people would update on her opinions exactly the right amount.
July 01, 2025 · Original source
45: Conception beliefs among Australian aborigines. Did you know that pre-contact aborigines didn’t know that sex caused conception? Wait, no, what if they did know it, and were just pretending not to? Wait, no, what if they sort of deliberately suppress the knowledge as a way of defusing paternity conflicts after traditional wife-swapping rituals? Wait, no, what if it’s racist to accuse aborigines of not understanding something obvious like this? Wait, no, what if it’s ethnocentric to call Western beliefs “obvious”? Wait, no! . . . this comes from a book called Arguments About Aborigines, which I have ordered on the strength of this chapter.
56: What Children Fear, h/t Gwern:
July 08, 2025 · Original source
We generate 10 images for each prompt, just like DALL-E2 does. If at least one of the ten images has the scene correct in every particular on 3/5 prompts, I win, otherwise you do. Loser pays winner $100, and whatever the result is I announce it on the blog (probably an open thread). If we disagree, Gwern is the judge.
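The win condition above is a small decision rule, sketched below in Python. This is an illustration only: it reads "on 3/5 prompts" as "on at least three of the five prompts" (an assumption), the function name and data are hypothetical, and whether an image is "correct in every particular" is a human judgment that the code takes as given.

```python
from typing import Sequence


def bettor_wins(results: Sequence[Sequence[bool]],
                prompts_needed: int = 3) -> bool:
    """Return True if enough prompts have at least one fully correct image.

    results[i][j] is True when image j for prompt i was judged correct in
    every particular. The judging itself is not modeled here.
    """
    prompts_passed = sum(1 for images in results if any(images))
    return prompts_passed >= prompts_needed


# Hypothetical judgments for 5 prompts x 10 images (made-up data).
example = [
    [False] * 9 + [True],                    # one good image is enough
    [False] * 10,                            # no image got every detail right
    [True] + [False] * 9,
    [False] * 10,
    [False] * 4 + [True] * 2 + [False] * 4,
]
print(bettor_wins(example))
```

Because a prompt passes as soon as any one of its ten images is right, generating more images per prompt can only help the bettor; the three-of-five threshold is what keeps the bar meaningful.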
The bet is now over, and official judge Gwern agrees I’ve won. Before I gloat, let’s look at the images that got us here.
Edwin is presumably still on his yacht, but original contest judge Gwern gave it his seal of approval, saying:
August 04, 2025 · Original source
2: Lighthaven (the rationalist community campus in Berkeley) is hosting Inkhaven - a blogging bootcamp aimed at people who want to blog more but struggle with motivation. Selected fellows will live on site for the month of November, and write one blog post per day or else be kicked out. There will be some mentors around including Gwern, Scott Aaronson, and me. I don’t want to over-endorse this - I have no idea whether it will create any kind of lasting motivation or tendency that sticks around after the program, for most people blogging is a low-reward activity, and the cost is pretty steep - but I think it’s a good experiment for Lighthaven to try, and trust potential applicants to make good choices for their own situation. Cost is $2,000 (program only) to $3,500 (program plus housing for one month) to $4,700 (program + housing + meals). Some financial assistance available. Apply here. And yeah, they should have called it “Writehaven”.
September 04, 2025 · Original source
Note: percentages are of total, not of each row!
29: Related: social science team proposes a three-stage model of secularization: decreased public ritual participation → decreased personal importance → decreased identification, presents apparently confirmatory data. If true, would be somewhat inconsistent with intellectual models (eg people learn about evolution and start doubting the Bible) and more consistent with institutional models (eg the government provides welfare so people no longer need to be part of a tight-knit church).
30: Navigating LLMs’ spiky intelligence profile is a constant source of delight; in any given area, it seems like almost a random draw whether they will be completely transformative or totally useless. Now Ethan Strauss reports that they are, for some reason, extraordinarily effective at teaching people golf. “I am predicting the Golf Revolution, or perhaps decline, if your perspective is that optimization tends to ruin hobbies. A sport for obsessives has been gifted the ideal tool for refinement.”
31: Claim (via nxthompson on X): “In a huge survey of young kids about phones and technology, they all say they want to be out playing in the real world. But parents don't let them out unsupervised. So they're stuck on their phones.” Interesting, but I’m nervous about social desirability bias - how many adults would say on a survey that they would rather be on their phones than playing with friends? But adults do have this choice and mostly go with the phones.
32: Steven Adler on AI psychosis. He tries to analyze ER admissions data for psychosis and finds no change. I don’t think anyone reasonable expected this to be a large enough effect to show up in ER admissions data, but there are lots of unreasonable people so I appreciate his effort. He thinks AI companies might have better data on this, and encourages them to release it.
33: Cuartetera was the greatest polo horse ever. Polo players responded in a very practical way: they cloned her, dozens of times (and it worked; the clones are also excellent). Now there is a lawsuit as different polo teams fight to get their hands on Cuartetera clones. What is the equilibrium? If the outsiders get their hands on the genetic material, do we see a world where every polo horse is a Cuartetera clone? How much is lost if nobody ever tries to breed a polo horse better than Cuartetera (since the economics might not check out if the odds of success for any given foal is too low)? H/T Gwern and Siberian Fox (on X).
34: Claim: as of 2013, India’s Agarwal caste, who make up less than 1% of the population, got 40% of the e-commerce funding.
35: Owlposting: What Happened To Pathology AI Companies? Pathology is a medical specialty. A typical task involves looking at a microscope slide full of cells and trying to determine if any of them are cancerous. This seems like a good match for AI - and for years, studies have been showing that in fact AI can equal human experts. So why isn’t it being used more? The author’s three answers: first, slide scanning is expensive and clunky, and you can’t apply AI to a slide until you digitize it. Second, it’s hard to figure out a business plan where this saves someone money and doesn’t step on the toes of big companies that can outcompete anyone they don’t like. Third, pathologists use the context of a patient’s entire clinical history when they interpret a slide, and AIs that can’t do that (either because of technical limitations or legal/privacy limitations) are at a disadvantage even if their skills specifically relating to slide-reading are better.
36: Noahpinion: Will Data Centers Crash The Economy? Suppose that AI is a bubble, either permanently (because the technology isn’t really transformative) or temporarily (because it can’t transform things quickly enough to keep up with all the dumb money pouring into it). Will the sudden write-off of data centers lead to a broader economic collapse? In 2001, the dot-com bubble harmed the tech sector, but didn’t take the rest of the economy down with it; in 2008, the subprime mortgage bubble did take the rest of the economy down with it, because it damaged banks that the whole economy relied on. The optimistic case for AI is that data center spending is mostly coming from big companies like Google and Meta that can absorb a lot of loss. The pessimistic case is that some of the money is coming from private credit, a new-ish form of finance which hasn’t really been stress-tested and whose failure modes are still poorly understood. Noah’s final verdict: the stage isn’t obviously set for a crisis yet, but there’s the potential to get there and we should consider acting (how?) early.
37: The latest Twitter talking point is that universal hepatitis B vaccination at birth is “woke”: Hep B is (aside from mother-to-child transmission) often sexually transmitted, slutty women’s children are more likely to have Hep B, so perhaps giving the vaccine to everyone (instead of testing and only giving to the children of women who test positive) is an attempt to spare slutty women the embarrassment of getting a positive test. Ruxandra Teslo provides the counterargument - Hep B tests take a while, the medical system is fragmented, and any attempt to test people and then give the vaccine inevitably leads to many positive tests falling through the cracks. Vaccinating at birth is easy and hard to screw up, the vaccine has no known side effects, and empirically child Hepatitis B rates go down (by as much as 2/3!) when countries switch from test-and-vaccinate to universal vaccination. This benefits everyone - even people who never have unprotected sex and always follow up on their medical tests - because toddlers in daycare exchange saliva copiously, and if your toddler exchanges saliva with a Hep B positive toddler they could get the disease. A funny Twitter interaction was seeing Republicans in Congress hop on the anti-slut anti-vaccination bandwagon - except for Senator Bill Cassidy (R-Louisiana), who happens to be a liver doctor, and who is still fighting the good fight. I am always nervous when a good person who I like starts engaging on Twitter, since it elevates the discourse there but also gradually turns their brain into mush - but Ruxandra has made the leap and is doing a great job not just on bio related topics but also (for example) countering Curtis Yarvin on the history of her native Romania.
38: The response to GPT-5 was confusing; most specific people who reviewed it said they were impressed (Ethan Mollick, Tyler Cowen, Nabeel Qureshi, Taelin), it performed as expected on formal benchmarks, but the overall vibes declared it a big failure. Peter Wildeford speculated that maybe there was some kind of sinister pay-to-play early access bias involved. Zvi went the other way, calling it a “reverse DeepSeek moment” (insofar as DeepSeek was a pretty average model that got glowing praise.) In the end, I agree with Peter that this was mostly a branding issue. o3 was a genuinely revolutionary model; if OpenAI had called it “GPT-5”, it would have met expectations. Instead, they called it “o3”, and called a minor incremental update a few months later “GPT-5”. Then people got mad that the exciting-sounding “GPT-5” was merely an incremental update. A secondary issue was that the router wasn’t very good, and so many queries got routed to a small version without thinking mode that was if anything a downgrade from o3. I think this tweet by Shakeel perfectly encapsulates the essence of GPT discourse in two sentences: …but maybe it’s worth asking why GPT-5 isn’t bigger than o3. Was 4.5 a failed attempt at scaling? Did it fail in a way that sort of back-handedly justifies the “lost steam” take? Does the answer depend on distinctions between pre-training scaling, post-training scaling, etc? How?
39: This month in etymology: did you know that “oy vey” is a “fully Germanic phrase” which is cognate with English “oh woe!” (h/t Wylfcen on X)
40: mRNA shows promise to be a game-changing treatment for cancer, but RFK is trying to halt research. But so far he can only starve it of money, not ban it, and the funding gap is only $500 million. Will there be enough philanthropic billionaires and private foundations to step up? Zvi points out that although there is usually a game of chicken where foundations are hesitant to touch something the government cancelled lest the government decide it can cancel everything and hope philanthropists pick up the bill, in this case there are no game theory considerations - RFK is halting it because he genuinely wants it halted, and they are thwarting him rather than playing into his hands. The only problem is that $500M is a lot of money for the private sector; a few foundations could technically afford it, but not many could afford it comfortably and still have money left over for the next few crises of this magnitude. I hope someone is trying to organize a coalition.
41: AI fantasy flash fiction Turing test. Eight stories about demons, four by famous fantasy authors, four by ChatGPT. After 3000 votes, AI wins: humans can't tell the difference and slightly prefer the AI stories. My own score was only 75%. But I will say that I thought Mark Lawrence's was obviously the best, I was ~100% sure it was human, and it convinced me that regardless of the official results it's still possible to write flash fiction that an AI obviously can't do.
42: “SignPro” offers customized “In This House We Believe” signs, try not to use this for evil.
43: China think tank assessment of how in control Xi is: still very in control, maybe not infinitely in control.
44: Related - did you know (h/t xlr8harder) that if you ask AI to write a science fiction story, it will very often name the protagonist “Elara Voss” (or some very close variant like Elena Voss), and this remains true across various models and versions? Related: Chelsea Voss of OpenAI is having a baby and has the opportunity to do the funniest thing.
45: “Hector (cloud) is a cumulonimbus thundercloud cluster that forms regularly nearly every afternoon on the Tiwi Islands in the Northern Territory of Australia…[he is sometimes called] Hector the Convector”.
46: British allergy sufferers who want to know the ingredients of things demand that British cosmetics stop listing their ingredients in Latin. “For example, sweet almond oil is Prunus Amygdalus Dulcis, peanut oil is Arachis Hypogaea, and wheat germ extract is Triticum Vulgare.”
47: Text-based RPG about being an NYT journalist at the Manifest prediction market conference. I make a brief appearance.
48: Study uses supposedly-random variation in doctor assignments to test whether the marginal mental health commitment is good or bad for patients, finds that it is quite bad. Freddie de Boer is violently skeptical (maybe literally so?) and makes some good points about how a single quasi-experimental study is never absolute proof. But I don’t think he quite justifies his opinion that the paper was irresponsible and should never have been published; it’s just a normal quasi-experimental study that we should nod and say “huh” at but not overweight as the culmination of all possible research that overcomes all possible priors. My prior is that the marginal commitment is pretty useless (many commitments are just “well, since this person arrived at our ED for some reason, it would look bad from a medico-legal perspective to just let them go, so let’s keep them a few days to evaluate” - and yeah, you should be upset about this) but I’m still surprised by how many outright negative (as opposed to zero) effects the researchers found. The strongest argument for negative effects is that it will make some people miss work and maybe lose their job. But this study found that commitment ~doubles the risk of near-term suicide (admittedly only from 1% to 2%), which would have been outside my confidence intervals for how bad it could be. I suspect confounding, but only on general principle, and I wouldn’t be too surprised either way.
49: This tweet is probably bait, but I found it a thought-provoking question: I think there’s a boring answer, where the law is more complex than just a single number and whatever kind of weird trafficking Epstein was doing is worse than whatever normal relationships these European laws are permitting. But assuming that there’s a substantive difference even after taking that into account, I think my answer is something like - we’ve got to divide kids from adults at some age, there’s a range of reasonable possible ages, we shouldn’t be too mad at other societies that choose different dividing lines within that range - but having decided upon the age, we’ve got to stick with it and take it seriously (in the sense of penalizing/shaming people who break it). This is more culturally relativist than I expected to find myself being, so good job to Richard for highlighting the apparent paradox.
50: Dilan Esper describes his experience as one of Hulk Hogan’s attorneys in the Gawker lawsuit (X). Parts I found interesting: none of the lawyers knew Thiel was funding the lawsuit; Gawker probably could have won if they had been slightly competent but kept "shooting themselves in the foot"; and Gawker probably could have won if they had just pixelated the private parts in the video.
51: Amazing concept and poems (link on X): I tried to see if AI could do this, and it did something that technically met the requirements but had zero artistic merit - using a lot of words like “nowhere” and “outside” in one, then separating them out to “no where” and “out side” in the other. I didn’t invest much energy in creating a clever prompt telling it not to do that, so feel free to report if you get better success.
52: New study claims consultants are actually good, at least for profits: "We find positive effects on labor productivity of 3.6% over five years, driven by modest employment reductions alongside stable or growing revenue"
53: A Polish team tries to test Peter Turchin’s equations for predicting political unrest on recent Polish history, has to make some changes but claims mostly positive results.
54: New big multi-author Substack, The Argument, trying to be a sort of center-left version of the model pioneered by The Free Press and other high-production-value ideological Substack properties. Excited to see Kelsey Piper is involved, and she starts off strong with a post on the latest round of First World basic income studies, which find few positive effects. This is surprising, because recipients didn’t waste the money on alcohol or gambling or anything - they paid down debt and got useful goods. Still, it didn’t even affect things that should have been obvious, like stress level. It’s not even clear that amounts of money large enough to help with rent made homeless people more likely to get houses! Matt Bruenig criticizes the article, accusing Kelsey’s studies of being downstream of Perry Preschool style dreams that exactly the right welfare program will have massively compounding effects that cut poverty out at the root and turn everyone into elite human capital; he thinks giving people money won’t do this, but it will increase equality and give the poor better lives. I assume he’s not a strong hereditarian, but his argument makes even more sense from that perspective, and I’ve certainly criticized dumb outcome measures like infant brain waves which we have only tenuous reasons to think are related to anything we care about. But Kelsey reasonably responds that the outcome measures she’s talking about include stress level and life satisfaction. To defuse this critique, Bruenig either has to argue that our construct “life satisfaction” doesn’t really measure whether someone’s life is satisfactory, or else claim that giving poor people satisfactory lives isn’t really what we’re going for - which I think would require more explanation on his part. There’s some further (impressively acrimonious) debate on X, but I don’t see anything that addresses my core concern. GiveDirectly, a charity involved in basic income experiments, has a presponse here; they say that some studies are positive, and that the ones that aren’t might have tried too little cash to matter, or been confounded by COVID making everything worse. They also point out that basic income is harder to study than traditional programs like giving people housing, because if you’re giving housing you can measure housing-related outcomes directly and have a pretty good chance of getting enough statistical power to find them, but since everyone spends cash on different things, the positive effects might be scattered across many different outcomes (and therefore too small to reach significance on each). Everyone involved in this debate wants to emphasize that the poor results are for First World studies only, and that studies continue to show large benefits to giving cash in the developing world.
55: Related: I was less impressed by The Argument’s first foray into housing policy, which follows an all-too-familiar pattern: Some people say they don’t like noise and disorder and try to make rules against it in their apartments.
October 17, 2025 · Original source
Edward writes “I feel like I learned a ton about writing a good review from the feedback I got last year. Particularly the comment from Gwern you highlighted in the open thread afterwards. I don’t think I could have written this review the way it ended up without the harsh feedback I got last year.” I hadn’t realized you could actually learn things from people’s mean online comments, so I’ll have to go back and read all of yours on all my posts and see if there’s anything useful there.
November 03, 2025 · Original source
American Scholar has an article about people who “write for AI”, including Tyler Cowen and Gwern. It’s good that this is getting more attention, because in theory it seems like one of the most influential things a writer could do. In practice, it leaves me feeling mostly muddled and occasionally creeped out.
January 26, 2026 · Original source
1: Inkhaven was a blogging residency/bootcamp/program in Berkeley last November. The conceit was that residents had to write one post per day for thirty days, or else get kicked out without a refund. I ran some sessions, and so did other people you might recognize like Gwern, Zvi, Ozy, Aella, and Scott Aaronson. People seemed to like it (average rating 8/10, see also reflections here, here, here, here, here, here, here, here, etc; when you make forty people write every day, you sure do end up with a lot of written reflections on the experience). They’re doing it again this April, and you’re invited to apply. You’ll need ~$3,500 (some scholarships available) and a month free. I plan to help again. Application deadline March 1.