Sasha Gusev
Article
Sasha Gusev is a recurring person in the Astral Codex Ten archive, appearing 11 times across 11 issues between July 24, 2024 and February 05, 2026. The archive places him in contexts such as “geneticist Sasha Gusev has a critique”; “Sasha Gusev has written an argument that twin studies are inaccurate”; “I linked Sasha Gusev’s No, Intelligence Is Not Like Height”. He most often appears alongside Substack, US, and Cremieux.
Metadata
- Category: People
- Mention count: 11
- Issue count: 11
- First seen: July 24, 2024
- Last seen: February 05, 2026
Appears In
- Links for July 2024
- Links For September 2024
- Links For November 2024
- Missing Heritability: Much More Than You Wanted To Know
- Highlights From The Comments On Missing Heritability
- Open Thread 390
- Suddenly, Trait-Based Embryo Selection
- Links For September 2025
- Links For October 2025
- The Good News Is That One Side Has Definitively Won The Missing Heritability Debate
- Links For February 2026
Related Pages
- Substack (6 shared issues)
- US (6 shared issues)
- Cremieux (5 shared issues)
- OpenAI (5 shared issues)
- Twitter (5 shared issues)
- Anthropic (4 shared issues)
- Elon Musk (4 shared issues)
- Gusev (4 shared issues)
- IQ (4 shared issues)
- Richard Hanania (4 shared issues)
- ACX (3 shared issues)
- Alex Young (3 shared issues)
Source Context
Recovered passages from the original issue text. When the raw archive preserved outbound links inside the source passage, they are listed directly under the quote.
30: Related: geneticist Sasha Gusev has a critique of (existing) polygenic embryo selection. He thinks it has medium ability to select against “threshold” traits like disease (10% reduction by avoiding high-risk embryos, ~50% by choosing the lowest-risk) and (what he describes as) relatively low ability to select along “continuous” traits like IQ (+4 points if you’re lucky, though I know other people working on this who say +6). I think these are the right numbers, but he’s underestimating how much you should want an extra 4-6 IQ points - something I would gladly take over a 50% absolute reduction in hypertension risk or whatever. And I would very gladly take it over the alternative of not doing polygenic screening at all and getting nothing.
Inline links: a critique of (existing) polygenic embryo selection
34: Sasha Gusev has written an argument that twin studies are inaccurate and the heritability of IQ is much less than previously believed. It’s much better than the average obviously-dumb-and-motivated post trying to argue this, and contains lots of arguments I hadn’t seen before. See also subreddit comments on original, Gusev’s responses, comments on responses. My impression is that there’s enough circumstantial evidence (eg adoption studies) that this probably has to be wrong, but I don’t think any of the arguments against it land, and I don’t know enough statistical genetics to critique it myself. I’d be interested in seeing one of the more mathematically-inclined pro-heritability people (@gwern? ? ? ? @Gene Smith?) give their impression.
46: In 2022, I wrote Whither Tartaria, where I asked why ornate classical styles switched to more austere modernist styles around 1900 - 1950 in a variety of different arts (painting, architecture, literature, poetry, etc). I proposed seven theories, but was unsure which if any were true. Since then, Samuel Hughes of Works In Progress has been investigating. In May, he wrote a well-researched article showing that it wasn’t just increasing cost, because ornate classical architecture now costs less than ever. Now in a new article he demolishes a different theory - it’s not just decreasing cost (and subsequent lack of ability to signal wealth) - because costs didn’t decrease in several other arts, and the change was led by artists with rich people as reluctant followers. He concludes: “Modernism may well be a status game of some kind; it may well signal taste more than it signals wealth; and this latter feature may be one of the things that distinguishes it from older artistic styles. But the mechanism by which this change came about must be different to the one Alexander describes.”
47: Sort of kind of related - When Hamilton Lost Its Snob Appeal. The musical Hamilton was briefly an artistic/cultural phenomenon, but tastemakers eventually switched to making fun of it. Why? Rob Henderson says it happened after ticket prices came down and the common people could enjoy it. I disagree: everyone I knew who was into Hamilton got into it from the free online soundtrack long before they’d seen the show; I think this is more likely the usual fad cycle where anybody who’s too into yesterday’s fad is behind the curve and therefore uncool.
48: Related: Why are people such jerks to public intellectuals? And more. I agree this is a great mystery.
49: Some prominent Substack psychiatrists doing a video Q&A, submit your questions here.
50: Naomi Kanakia: The Literacy Delusion had a number of explanations for why reading books seemed to be so much worse for human beings (in terms of emotional wellness and productivity) than other forms of narrative entertainment, but its main theory was the integration hypothesis. That the stream of words in a book trained the human brain into a habit of self-consciousness, that reading books forced human beings to think of themselves as a stream of text, processed through time, making a coherent argument of some sort. And that this overall flattening effect forced readers to ignore aspects of their personality or their situation that were not otherwise in line with the overarching story they'd created about themselves. Basically, reading books causes repression and neurosis. The Literacy Delusion argued that, yes, human beings are storytelling machines, but that a stream of written text is a particular kind of story - a story that is particularly flat, particularly devoid of conflicting or harmonizing information - and that this flatness creates a peculiar effect on the human brain.
51: Last month, I linked Sasha Gusev’s No, Intelligence Is Not Like Height and asked people who disagreed to share their arguments; they sure did. First, several people pointed me to a new preprint, Family-GWAS Reveals Effects Of Environment And Mating On Genetic Associations, which finds that one of the main papers Gusev cited to make his case, Howe 2022, made a mistake - imputing sibling genotypes using a process designed for non-sibling genotypes - and that once that mistake is corrected, the finding disappears and intelligence and height appear similar. Second, Joseph Bronski has a more specific post where he responds to Gusev’s points one by one. He accuses Gusev of “[making] up his own chart to remove the error bars [from the originals], to obscure the fact that the study found no evidence for this in IQ”, and says that the cases where he didn’t do that are just “population stratification and range restriction”. Third, Noah Carl at Aporia, instead of writing a direct response like Bronski, argues that the usual method of attacking twin studies is obsolete; not only have the most-debated assumptions behind twin studies been thoroughly validated, but there are now other lines of evidence besides twin studies which confirm high IQ heritability. Fourth, Leonardo Parro (not framed as a response to Gusev) goes into more depth about one of those ways, a “pedigree-based analysis” demonstrating heritability of 54 - 69%, ie no “missing heritability” compared to twin studies. He summarizes this as the effect of “rare variants” compared to the usual SNPs - ie if you only look at the most common genes that are easiest to find, you get “missing heritability” compared to twin studies, but if you widen your search to rare genes that are hard to find, you don’t.
52: Extremely related: Heliospect is a startup promising polygenic selection for IQ and other traits; they were trying to stay in stealth mode but The Guardian spied on them and nonconsensually revealed their existence. The discussion on the r/ssc subreddit centered on their claim that (given enough embryos to choose from) they could increase a baby’s expected IQ by 6 points (I’ve also heard 7.5). Sasha Gusev had previously argued that current technology maxed out at 3.5 and future technology would max out at 6, so a claim of 6 - 7.5 is pretty extreme; Gwern, who wrote the pioneering analysis of this technology, was also skeptical. But Heliospect says they’ve got better predictors than academia that use the rare variants everyone else misses; after talking to the company, Gwern retracted his objections and says he finds their claim “pretty plausible”. Local ACX commenter geneticist Gene Smith also redid some calculations, changed his mind, and says “probably pretty realistic”. I find this interesting not just because of the polygenic selection angle, but because if Heliospect is right then their predictor is able to predict more genetic IQ than the “missing heritability” people believe exists, and it should be able to put this argument to bed once and for all.
53: This month in censorship: X/Twitter banned journalist Ken Klippenstein for sharing the Trump campaign’s dossier on JD Vance. Twitter’s side of the story is that the dossier was probably originally stolen by Iranian agents and they don’t want to support that kind of thing by letting people signal-boost the illicitly obtained goods; you can read Klippenstein’s side here. He appears to be unbanned now.
Inline links: Whither Tartaria, a well-researched article showing, in a new article he demolishes a different theory, When Hamilton Lost Its Snob Appeal, Why are people such jerks to public intellectuals?, more, Some prominent Substack psychiatrists doing a video Q&A, submit your questions here, Naomi Kanakia, No, Intelligence Is Not Like Height, Family-GWAS Reveals Effects Of Environment And Mating On Genetic Associations, Howe 2022, a more specific post where he responds to Gusev’s points one by one, argues that the usual method of attacking twin studies is obsolete, goes into more depth about one of those ways, The Guardian spied on them and nonconsensually revealed their existence, discussion, previously argued, the pioneering analysis of this technology, Gwern retracted his objections, says, polygenic selection, banned journalist Ken Klippenstein, you can read Klippenstein’s side here., appears to be unbanned now
Maybe there are genes we haven’t found yet
For most of the 2010s, hypothesis 2 looked pretty good. Researchers gradually gathered bigger and bigger sample sizes, and found more and more of the missing heritability. A big 2018 study increased the predictive power of known genes from 2% to 10%. An even bigger 2022 study increased it to 14%, and current state of the art is around 17%. Seems like it was sample size after all! Once the samples get big enough we’ll reach 40% and finally close the gap, right?
This post is the story of how that didn’t happen, of the people trying to rehabilitate the twin-studies-are-wrong hypothesis, and of the current status of the debate. Its most important influence/foil is Sasha Gusev, whose blog The Infinitesimal introduced me to the new anti-hereditarian movement and got me to research it further, but it’s also inspired by Eric Turkheimer, Alex Young (not himself an anti-hereditarian, but his research helped ignite interest in this area), and Awais Aftab.
(while I was working on this draft, the East Hunter Substack wrote a similar post. Theirs is good and I recommend it, but I think this one adds enough that I’m publishing anyway. You can see Gusev’s response to East Hunter here)
In an interview with Aftab, Gusev explained his philosophy like so (I am excerpting heavily from a long interview and editing for flow/emphasis; completionists should read the whole thing):
For teacher-reported ADHD, the twin heritability estimate was 69% while the GWAS-based heritability estimate [ie using genome-wide association studies where researchers actually try to find the genes involved] was just 5%; with similar gaps for other behavioral traits. These are huge differences! If we believe the twin study estimates, then this gap implies that there is a lot of causal genetic variation out there that GWAS/molecular data is not picking up.
One way to think about this is that traits that are under stronger natural selection will have more of their genetic variants driven to low frequency, and thus less detectable by GWAS. So a big gap between GWAS and twins could imply that rare variants are very important due to strong selection. On the other hand, if we are skeptical of the twin study estimates, then this gap implies a substantial contribution from those environmental complexities I talked about previously. For a long time, the field of molecular genetics was operating under the assumption that the missing heritability was largely in the rare variants we had not yet measured. But a number of recent advances have started to tip the scales against that argument. First, some of the earlier molecular heritability estimates were found to be inflated by some mix of technical issues and cultural transmission, so the amount of missing heritability actually increased. Second, a new model was developed that could estimate total direct heritability using molecular data from mother-father-child trios, with very few model assumptions (the title literally states “… without environmental bias”; Young et al. 2018), and it too found estimates that were substantially lower than twins on average. Third, several studies have now actually measured the influence of rare variants in various forms, and they are so far not adding up to explain as much as we would expect from twin heritability estimates. Fourth, there is little evidence of the strong natural selection that would be needed to generate a massive trove of rare variants untagged by GWAS. I am a molecular geneticist, and this drumbeat of evidence from molecular data has convinced me that twin studies are either 2-3x inflated or estimate something fundamentally different from direct heritability. 
We’ll start by looking at Gusev’s first claim: that “earlier molecular estimates” (ie polygenic scores) are significantly inflated, or at least don’t mean what we thought they meant. This won’t be directly relevant to our question - even our original number of 17% implies missing heritability, so moving it down a bit to 5-10% or up a bit to 20% doesn’t add or subtract from the fundamental mystery. But this discussion has gotten a lot of people extremely confused, and we’ll need to deconfuse ourselves if we’re going to get any further.
Are Most Current Polygenic Scores Confounded?
A polygenic score is one possible result of a genome-wide association study. These scores are algorithms which take a person’s genes as input and return information about their traits as output. Better polygenic scores can predict a higher percent of variance in a certain trait. For example, the latest polygenic score on educational attainment can predict up to 17% of the variance in how much schooling someone completes.
Predictive power is different from causal efficacy. Consider a racist society where the government ensures that all white people get rich but all black people stay poor. In this society, the gene for lactose tolerance (which most white people have, but most black people lack) would do a great job predicting social class, but it wouldn’t cause social class. It certainly wouldn’t be a “gene for social class” in the sense where it controls the part of your brain that helps you manage money, or where genetic engineering on this gene would make people richer.
Here are three common ways that not-directly-causal genes can show up as predicting a trait:
Population stratification: genes are linked to culture, and culture determines the trait, as in the racism-lactose example above.
Many studies naturally mitigate this concern by using the UK Biobank of mostly white British samples, and by correcting for “principal components” that correspond to ancestry (and there are other, even more complicated ways to correct for this). But ancestry variation is fractal; no matter how uniform your sample, there will still be micro-differences you didn’t consider. For example, if you’re analyzing the educational attainment of white British people, it’s very relevant that families with Norman surnames still outperform their Saxon peers at Oxbridge admissions 900 years after William the Conqueror. If Britons with more Norman ancestry have non-education-related genes that their Saxon peers lack, these could be mistakenly classified as genes for education or other behavioral differences between the two groups.
Assortative mating: Suppose that both height and wealth are desirable qualities in a mate. Then tall people will tend to marry rich people, and over generations, the same people will be both rich and tall. That means that even if wealth is 0% genetic, a study looking for “the gene for wealth” will be able to find genes that rich people have more often than poor people - namely, the genes for height. Or suppose that smart people tend to marry other smart people - surely true, if only because so many couples meet at college. Then all the intelligence genes will concentrate in the same people. So any study that tries to determine how much Intelligence Gene ABC affects intelligence will get inflated results, because everyone with Intelligence Gene ABC will also have many other intelligence genes - if the study naively asks “How much smarter are people with Gene ABC than people without it?”, it will find they are much smarter (because it’s accidentally including part of the effects of all the other intelligence genes that travel along with it).
Parent-to-child transmission, aka “genetic nurture”: Children tend to share their parents’ genes.
So if there’s a gene that causes parents to create a certain kind of childrearing environment, and that childrearing environment affects a trait, it will falsely look like a gene that directly causes the trait. Suppose Gene XYZ causes parents to read more books to their children, and reading books to children increases their IQ. Parents with Gene XYZ will tend to read books, so their kids will get high IQ. Those kids will also (probably) inherit Gene XYZ from their parents. So people with Gene XYZ will tend to have higher IQ. If you naively study which genes increase IQ, you’ll see Gene XYZ in more smart people than dumb people, and think it’s a “gene for IQ”. This is “causal” in a certain sense, but it’s not the one we traditionally think about, and it behaves importantly differently - for example, if you genetically engineer someone to have Gene XYZ, their IQ won’t go up (although their kids’ IQs might). How can we tell if a polygenic predictor is “direct” vs. confounded by these non-causal pathways? The most common technique is within-family comparisons: do the traditional “check if people with the gene differ on a trait from people without the gene” study, but limit its focus to (for example) sibling pairs. Suppose a couple has two children; the first child inherits Gene ABC and the second one doesn’t. If the first child is smarter than the second child, that provides some infinitesimal evidence that Gene ABC is a gene for intelligence. Repeat this process over hundreds of thousands of sibling pairs, and the infinitesimal evidence can reach statistical significance. Since the family unit is a perfect natural experiment that isolates the variable of interest (genes) while holding everything else (culture and parenting) constant, within-family results are protected against stratification, assortative mating, and genetic nurture effects. 
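The logic of the within-family comparison can be illustrated with a toy simulation. The sketch below is not taken from any of the studies discussed here; it assumes a single made-up variant with a made-up direct effect (`direct`) and a made-up parental "genetic nurture" effect (`nurture`), and compares a naive population regression against a regression on sibling differences. All function names and parameter values are hypothetical.

```python
import random


def transmit(parent_genotype, rng):
    """A parent carrying g copies (0, 1 or 2) passes one on with probability g/2."""
    return 1 if rng.random() < parent_genotype / 2 else 0


def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate(n_families=20000, allele_freq=0.5, direct=0.2, nurture=0.3, seed=0):
    """Return (population estimate, within-family estimate) of the variant's effect."""
    rng = random.Random(seed)
    pop_g, pop_y = [], []    # everyone pooled: the naive design
    diff_g, diff_y = [], []  # sibling differences: the within-family design
    for _ in range(n_families):
        mother = sum(rng.random() < allele_freq for _ in range(2))
        father = sum(rng.random() < allele_freq for _ in range(2))
        # Assumed confound: the shared family environment depends on the
        # parents' genotypes, not on the child's own genotype.
        family_env = nurture * (mother + father) + rng.gauss(0, 1)
        sibs = []
        for _ in range(2):
            g = transmit(mother, rng) + transmit(father, rng)
            y = direct * g + family_env + rng.gauss(0, 1)
            sibs.append((g, y))
            pop_g.append(g)
            pop_y.append(y)
        # Differencing siblings cancels family_env exactly.
        diff_g.append(sibs[0][0] - sibs[1][0])
        diff_y.append(sibs[0][1] - sibs[1][1])
    return ols_slope(pop_g, pop_y), ols_slope(diff_g, diff_y)


naive, within = simulate()
print(f"population estimate:    {naive:.2f}")
print(f"within-family estimate: {within:.2f}")
```

Under these assumed parameters the population estimate should land near `direct + nurture` (roughly 0.5), while the sibling-difference estimate should land near `direct` alone (roughly 0.2), because the shared family environment drops out when one sibling is subtracted from the other.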
The culmination of this research program is Tan et al 2024, which finds that many polygenic predictors lose significant accuracy when retested among siblings. For example, educational attainment is 50% uncorrelated with direct genetic effects. You need to square this to figure out what percent is causal; when you do that, you find that the polygenic score that explained 14% of EA is only 4%pp direct genes, with the other 10%pp being nondirect confounders.
So yes, it seems like most polygenic scores that don’t validate within families are confounded. However unhappy we previously were that we had only found 14% of genes for EA (vs. 40% expected), we should now be much more unhappy - we really only know 4% of genes that directly cause EA.
On the other hand, you might say - so before we only knew 14%pp out of 40%. Now we only know 4%pp out of 40%. This is discouraging, but it doesn’t fundamentally change what we know about nature vs. nurture. Both 4%pp and 14%pp are less than 40% - with either number, we must be missing something or doing something wrong. Probably that’s insufficient sample size. We’ll keep working on sample size and other things, and eventually scrounge up the missing 26%pp or 36%pp or whatever of the variance, so this doesn’t change anything.
All it means is that one predictive method that the average person never knew about in the first place doesn’t work as well as we thought. Who cares?
Not doctors. So far this research has only just barely begun to reach the clinic. But also, all doctors want to do is predict things (like heart attack risk). They don’t care if they use causal vs. nondirect genes. It doesn’t matter if you’re “only” at higher risk of heart attack because you’re black, or Norman, or because your parents read books to you - you still need more heart attack medication!
Polygenic embryo selection companies should care. They offer polygenic scores that can be used to select healthier or smarter embryos.
If the predictors they use rely partly on variants that aren’t causal within families, their real benefits could be far lower than advertised. I talked to one of these companies, who said they’d already adjusted for these effects and expected their competitors had too - the proper antidote to this problem, sibling controls, is a natural choice when you’re literally picking between siblings. The biggest losers are the epidemiologists. They had started using polygenic predictors as a novel randomization method; suppose, for example, you wanted to study whether smoking causes Alzheimers. If you just checked how many smokers vs. nonsmokers got Alzheimers, your result would be vulnerable to bias; maybe poor people smoke more and get more Alzheimers. But (they hoped) you might be able to check whether people with the genes for smoking get more Alzheimers. Poverty can’t make you have more or fewer genes! This was a neat idea, but if the polygenic predictors are wrong about which genes cause smoking and what effect size they have, then the less careful among these results will need to be re-examined. But the reason I spent so much time on the subject here is that this has confused a lot of people into thinking heritability itself was confounded and is actually just 4%. When I read my first few blog posts on these findings, I came away thinking they were claiming to have discredited twin studies and heritability. And although I take partial ownership of my own poor reading comprehension, I maintain that the way that the new anti-hereditarians discuss this is pretty bad. For example, Turkheimer’s treatment of the Tan study above is called Is Tan Et Al The End Of Social Science Genomics?, and includes passages like: The median [direct genomic effect] heritability for behavioral phenotypes is .048. Let that sink in for a second. 
How different would the modern history of behavior genetics be if back in the 80s one study after another had shown that the heritability of behavior was around .05? When Arthur Jensen wrote about IQ, he usually used a figure of .8 for the heritability of intelligence. I know that the relationship between twin heritabilities and SNP heritabilities is complicated, and in fact the DGE heritability of ability is one of the higher ones, at .233. But still, it seems to me that the appropriate conclusion from these results is that among people who don’t have an identical twin, genomic information is a statistically non-zero but all in all relatively minor contributor to behavioral differences.
And comments included things like:
I don’t know if [this study] is the end of social science genomics, but it should certainly be the end of attributing significant genetic influence to behavioral traits (despite the recent scientist-generated cartoons touting genes for “income”).
And:
There's no doubt that this reported findings have dealt a fatal blow to my conviction that behavioral traits are pre-eminently heritable…This is a remarkable example of an objective statistical fact mercilessly crushing the more subjective experiential sense of "A looks and acts more like B than C because A and B have the same parents." This subjective evidence is almost unshakable and universal in its application as a tried and tested psychosocial heuristic. And yet, here we are.
Turkheimer is either misstating the relationship between polygenic scores and narrow-sense heritability, or at least egging on some very confused people who are doing that, and the dynamic was bad enough that I got confused myself for a while.
But even more confusing, the new anti-hereditarians actually are saying that lots of behavioral traits have very low heritability! But this point requires different arguments, only tangentially related to these. So let’s move on to…
Is Heritability Genuinely Low? (Part 1: GWAS & GREML)
In the mid 2010s, when polygenic predictors based on genome-wide association studies (GWAS) were getting better every year, it was easy to hope they might reach 40% and close the “missing heritability”.
But since then, progress has stalled. The second-to-last tripling of sample size, from 300K to 1M between 2016 - 2018, increased predictive power from 6% → 12%. The last tripling, from 1M to 3M between 2018 - 2022, only increased predictive power from 12% → 14%. If you graph sample size vs. predictive power, it looks like there's an asymptote between 15 - 20% or so. (of which - remember - only 5% is directly causal!)
Worse, a mid-2010s technique called GREML allowed researchers to estimate the percent of variance in a trait that comes from the sorts of common genes studied in GWAS, without having to identify the genes involved. A 2016 GREML paper suggested that the maximum share of variance that GWASs of educational attainment could ever discover was about 21% (again, compared to 40% predicted genetic from twin studies). Since unavoidable methodological issues will prevent GWASs from reaching the literal maximum possible, this agrees with the evidence suggesting an asymptote between 15 - 20%.
So either twin studies are wrong and traits are less heritable than believed, or the heritability must lie somewhere other than the common genes identifiable by GWAS.
What about rare genes? GWASs focus on genetic variation common enough to be worth including in a basic genetic test. Most of this is single nucleotide polymorphisms (“SNPs”). A single nucleotide is one letter of DNA - for example, a C or a G. Polymorphisms are genes that commonly vary in humans - sometimes across races (for example, some humans have a gene for light skin, and other humans have a gene for dark skin), and other times within races (for example, some white people have a gene that makes cilantro taste like soap, and others don’t).
So SNPs are single-letter spots in DNA where different people often have different letters. How often? Some people say 1%, but the more practical definition is “often enough that someone has noticed and added it to the test panel”. There are three billion letters in the genome, of which only a few million are commonly-tested SNPs.
But these SNP studies have limited ability to measure personal mutations and rare variants. Sometimes your parents’ egg and sperm cells mess up copying a nucleotide of DNA, and you get a mutation that isn’t inherited from your ethnic group or even from your subgroup/family line - it’s just some idiosyncratic DNA change that you might be the first person in history to have. Since scientists have never seen this mutation before, they don’t know about it and can’t test for it without doing something more expensive than a simple SNP screen.
And SNP studies have limited ability to detect anything more complicated than a single letter changing to another single letter. But some mutations are more complicated structural variants. For example, some bits of DNA get stuck on repeat - one person might have GATGAT, another person might have GATGATGATGAT, and a third person might have fifty GATs in a row. Other bits come out backwards. Sometimes a whole chunk of DNA goes missing, or moves to the wrong place. Occasionally a gene reads The Selfish Gene by Richard Dawkins, takes it too seriously, and evolves some ridiculous trick for spamming itself all over the genome.
So if even the best molecular studies seem to be asymptoting around 15-20% of variance in educational attainment, but twin studies suggest it’s 40% genetic, might rare variants and structural variants make up the missing 20-25%pp? This remains a topic of bitter disagreement.
On the one side, hereditarians bring up a Darwinian argument: imagine a genetic engineer who hopes to find the genes for educational attainment and edit them to make everyone smart and successful.
She looks harder and harder, becoming more and more exasperated as they fail to materialize. Finally, she realizes she’s been scooped: evolution has been working on the same project, and has a 100,000 year head start. In the context of intense, recent selection for intelligence, we should expect evolution to have already found (and eliminated) the most straightforward, easy-to-find genes for low intelligence. Therefore, everything left should be convoluted or hidden or impossible to work with.
So although this requires a sort of god-of-the-gaps argument - where we keep pushing heritability into whatever genes are too weird for existing techniques to detect - there are some reasons to think God really is in the gaps here. And a 2017 paper uses some clever techniques to estimate the share of intelligence variation lurking in hard-to-measure genes and finds it’s more than half: “By capturing these additional genetic effects, our models closely approximate the heritability estimates from twin studies for intelligence and education.” (see also Wainschtein 2022, Sidorenko 2024)
The anti-hereditarians disagree. They cite papers like Zeng which measure the strength of selection on intelligence and suggest that it’s too weak to concentrate so much of the variation in rare genes. And Sasha Gusev mentions Weiner 2023, which finds that in fact rare variants “explain 1.3% (SE = 0.03%) of phenotypic variance on average – much less than common variants” (other experts say that burden heritability only captures some rare variants and is not the right tool for this problem).
But it may not even matter, because another set of findings suggests that heritability is genuinely low even when the rare variants are counted.
Is Heritability Genuinely Low? (Part 2: Sib-Regression and RDR)
Two newer methods, Sib-Regression and RDR, ask: using what we know from genetic studies, how much genetic variation do we think exists, total, across both common and rare genes?
On average siblings share 50% of genes. But there’s a little randomness in meiosis, so some siblings might share 40% and others might share 60%. The more genetic influence on a trait, the more similar sibling pairs who share 60% of their genes will be, compared to sibling pairs who only share 40% of their genes. Since 60%-gene siblings and 40%-gene siblings are both equally part of the same family, you can use these numbers to calculate heritability unconfounded by a range of family factors. This is Sib-Regression. If you do a more complicated statistical process to extend the same idea to relatives other than siblings, it’s relatedness disequilibrium regression or RDR. GWAS asks: Looking at common easy-to-study genes, how much variation in a trait have we explained right now? GREML asks: looking at common easy-to-study genes, how much variation could we ever explain? But sib-regression and RDR ask a question more like twin studies: considering all genes, whether common / rare / easy-to-study / hard-to-study, how much variation is there total? This could address the rare variant objection mentioned above. And in many ways, these techniques are better than twin studies - Sib-Regression eliminates many potential biases, and RDR eliminates even more (although it’s harder to pull off, requiring more genetic information and computational resources). These techniques are new and hard-to-use, and only a few published studies have applied them to the sorts of behavioral traits we’re interested in: Young et al (2018) did Sib-Regression and RDR to genetic data from Iceland. Sib-regression found educational attainment = 40% (±15%) heritable, and RDR found 17% (±9%) heritable. Kemper et al (2021) did Sib-Regression only to genetic data from Britain. It found educational attainment = 14% heritable. This number conflicts with the 40% from the Young paper. Why? 
Unclear, but it could be selection bias - Young’s Icelandic sample was representative of the country; Kemper’s British population were Biobank volunteers who tend to be healthier and higher-class than the population at large. Upper-class people may have restricted range in educational attainment, or different factors affecting their educational attainment compared to the overall population. Either way, these are closer to the low estimates from GWAS and GREML (7% direct, 20% total), than to the higher estimates from twin studies (40%, generally presumed direct). And we can no longer use contributions from rare variants to paper over the difference.

So what is going on? It seems like we have to accept one of three possibilities: Either something is wrong with twin studies. Or something is wrong with Sib-Regression and RDR (and then we can explain away GWAS and GREML by saying they’re missing rare variants). Or something is wrong with how we’re thinking about this topic and comparing things.

What’s Going On? (Part 1: Is Something Wrong With Twin Studies?)

Twin studies have dominated discussion of behavioral genetics for decades, so there’s a vast literature investigating their various assumptions and whether something might be wrong with them. Here are some of the assumptions and what the research says about each. Some of these will be duplicates of the GWAS confounders above, but we’ll go through them again anyway to review how they apply to twins.

1: Parents Treat Fraternal And Identical Twins The Same: Twin studies claim that twins are a uniquely powerful genetic laboratory; both fraternal and identical twin pairs have equally concordant environments, but identical twins have more concordant genes. Therefore, the more similar identical twin pairs are relative to fraternal twin pairs, the more heritable a trait must be.
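The identical-versus-fraternal comparison described above is conventionally summarized by Falconer's formula, which doubles the gap between the two twin correlations. The sketch below only illustrates that arithmetic: the function name and the example correlations (0.60 and 0.40) are made up for this example and are not taken from any study cited here.

```python
def falconer(r_mz: float, r_dz: float) -> dict:
    """Classical twin decomposition under the equal-environments assumption.

    r_mz, r_dz: trait correlations within identical and fraternal twin pairs.
    Assumes purely additive genes and random mating (fraternal twins share 50%).
    """
    h2 = 2 * (r_mz - r_dz)  # heritability: twice the identical-fraternal gap
    c2 = 2 * r_dz - r_mz    # shared environment
    e2 = 1 - r_mz           # non-shared environment plus measurement error
    return {"h2": h2, "c2": c2, "e2": e2}


# Hypothetical correlations, chosen only to illustrate the arithmetic.
estimate = falconer(r_mz=0.60, r_dz=0.40)
print(estimate)
```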
But this conclusion falls apart if identical twin pairs actually have more similar environments than fraternal twin pairs do, maybe because parents (knowing their twins are identical) treat them more similarly than they would fraternal twins. Would-be twin-study-discreditors have been trying to argue that this must be true for decades, but it’s always been a kind of quixotic battle. Remember, twin studies find many behavioral traits like IQ are >60% heritable, so you would need to prove not only that parents treat identical twin pairs differently from fraternal, but that this was an overwhelming effect. Parents of identical twins would have to obsessively expose them to the exact same stimuli in the exact same order; parents of fraternal twins would have to send one to the Gifted Advanced Placement Acceleration program while locking the other in a box and force-feeding them lead pellets. Common sense tells us there are no such differences, and studies confirm this: when parents are wrong about their twins’ status (eg they have fraternal twins, but falsely think they’re identical, or vice versa) their trait similarity matches their real status, rather than the incorrect status that determined how their parents treat them; parental treatment explains less than 1% of why identical twin pairs are more concordant (2, 3, 4). See also Felson 2013, which tries to measure environmental similarity and adjust for it, with minimal effects.

[Image caption: Are these two cuties monozygotic or dizygotic? Are you sure? (answer)]

2: Fraternal And Identical Twins Have Equally Concordant Uterine Environments: Fraternal twins have different sacs in the uterus and use different placentas. Most identical twins share a placenta, and some share an amniotic sac.
If trait similarity is caused by sharing a placenta or sac (maybe because the placenta is defective, the fetal brain is starved of nutrients, and so the person has a lower IQ when they grow up), twin studies would falsely read this identical-fraternal difference as genetic. Luckily this is easy to study; not all identical twins share a placenta or sac, so you can cleanly separate the effect of uterine environment from genetics. If you measure enough traits, you can find small deviations in some, but it’s not clear whether this is just multiple testing, and in any case the deviations are small. The best studies suggest this chips off somewhere between 0 - 3% from heritability estimates [9].

3: There is little assortative mating: We discussed this one above in the earlier section on GWAS - smart/pretty/kind/whatever people tend to marry other smart/pretty/kind/whatever people. Why would this bias twin study results? Identical twins share 100% of their genes. Fraternal twins ought to share 50% of their genes - but they get half their genes from their mother, and half from their father. In the degenerate case where the mother and father have exactly the same genes (“would you have sex with your clone?”) even fraternal twins will be extremely similar (although not quite identical, since they’ll get different alleles from each clone). In the more plausible case where mothers and fathers are just a little more alike than chance (eg because smart people tend to marry other smart people), fraternal twins will share a genetic tendency towards a trait somewhat more than their 50% shared genes suggest. Since this makes fraternal twin pairs more (genetically) like identical twin pairs, and twin studies assess heritability as the difference in fraternal-identical-twin-pair concordance, this bias would make twin studies underestimate heritability.
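The direction of this bias can be checked with a little arithmetic. The sketch below assumes a simple additive model in which identical twins correlate at h² + c² and fraternal twins at ρ·h² + c², where ρ is the fraternal twins' genetic correlation (0.5 under random mating, higher under assortative mating). The function name and every parameter value are illustrative assumptions, not figures or methods from the cited studies.

```python
def falconer_estimate(true_h2: float, c2: float, dz_genetic_corr: float) -> float:
    """Heritability the classical twin formula would report.

    true_h2: assumed true additive heritability.
    c2: assumed shared-environment variance.
    dz_genetic_corr: genetic correlation between fraternal twins
        (0.5 under random mating; above 0.5 under assortative mating).
    """
    r_mz = true_h2 + c2                    # identical twins share all genes
    r_dz = dz_genetic_corr * true_h2 + c2  # fraternal twins share only part
    return 2 * (r_mz - r_dz)               # the formula assumes a 0.5 share


# Hypothetical inputs: same true heritability, different parental similarity.
random_mating = falconer_estimate(true_h2=0.5, c2=0.2, dz_genetic_corr=0.5)
assortative = falconer_estimate(true_h2=0.5, c2=0.2, dz_genetic_corr=0.6)
print(f"random mating: {random_mating:.2f}, assortative mating: {assortative:.2f}")
```

With these made-up inputs the reported value falls below the assumed true heritability once fraternal twins share more than half of the relevant genetic variation, which is the downward direction described above.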
But this is the opposite of what you would need to “discredit” twin studies - if this bias is true, then everything is more genetic than twin studies think. And unlike the previous two biases, this one seems real and important, so much so that when you adjust for it, the heritability of educational attainment rises from ~40% to ~50%. I’m only mentioning this one here because some anti-hereditarians argue that you can’t trust twin studies because of assortative mating, without mentioning that this can only bias them down.

4: Population stratification: This is often large and worth worrying about, but it applies to identical and fraternal twin pairs equally, and doesn’t bias twin study heritability estimates much (though it might shift the balance between shared and non-shared environment). See eg the sentence around footnote 30 here.

5: Non-additive / “interaction” effects: These are theoretically interesting, but all research thus far has found they are minimal (1, 2). Some experts think this may miss rarer or harder-to-find interactions; we’ll return to this later.

6: “Genetic nurture”, parent-to-child: Mentioned above: if there is a gene for reading books to kids, and reading books raises IQ, it will look like a “gene for IQ”. This isn’t as relevant to twin study estimates of heritability, since both identical twins and fraternal twins are equally related to their parents, and any trait caused by genetic nurture wouldn’t differ between them (and therefore would not falsely appear heritable in this design). Rather, they would appear as shared environment.

7: “Genetic nurture”, sibling-to-sibling: That is, suppose your sibling’s traits influence your own development. For example, suppose your sibling has a gene that makes them sabotage your schoolwork, causing you to fail and drop out of school early.
An identical twin would share this gene with their sibling more often than a fraternal twin, making it look like a “gene for doing badly at school” (since the people who have it do worse at school than those who don’t). Why are we even talking about this? Do we really think it’s a big part of the variance in behavioral traits? Challenging twin study heritability estimates through this route requires inhabiting a weird no-man’s-land where otherwise-invisible genetic and environmental pathways suddenly flare up when you say the magic words “it was done by a sibling”. For example, this requires a strong effect of shared environment - that is, your educational attainment has to depend on whether you’re being sabotaged or not. But in general, shared environmental effects are weak. And it requires a strong effect of genes - that is, this mechanism only works if your sibling’s tendency to sabotage you is highly genetically determined. But we’re deploying this claim to deny that traits like IQ or educational attainment are highly genetically determined. So to get much out of this, the tendency to sabotage siblings would have to be more genetic than other behavioral traits! The reason this convoluted possibility gets brought up so often is that, unlike the more plausible parent-to-child genetic nurture, twin studies can’t rule it out. So if you really want to deny twin studies, this is one of your best bets. But when investigated, this has effects indistinguishable from zero. I’ve been a bit mean in this whole section, because people really like to dismiss twin studies as “Oh, don’t you know, those depend on assumptions, I bet you never considered that assumptions might be wrong”, and then Gish Gallop you with different assumptions until you give up. But scientists have actually done a lot of really good work checking the assumptions and they mostly hold. 
An alternative way of validating twin studies (brought up by Noah Carl in this article) is to check them against their close cousins, adoption studies and pedigree studies.

Pedigree studies investigate large family trees, and check how trait similarity decreases with genetic distance. They avoid twin specific biases (like different treatment of fraternal vs. identical twin pairs, or different prenatal environments), while adding others like assortative mating. Here are the heritabilities of IQ and EA found in pedigree studies [10] (see footnote for sources and caveats, and see also here and here for somewhat similar designs):

[Table image not preserved]

Adoption studies investigate whether adoptees’ traits are more correlated with their adoptive or biological parents. They avoid a large swathe of biases, at the risk of introducing new adoption-related biases of their own (like the possibility that agencies deliberately place adoptive children with parents who are culturally or behaviorally similar, or the possibility that adoptees were adopted late enough to still get some shared environment from their biological parents). Here are the findings of some of the largest and best [11]:

[Table image not preserved]

Both straightforwardly confirmed the larger heritability numbers found in twin studies.

I would add the evidence from some less formal “adoption studies” [12]. During residency, I spent a few months working in a child psychiatric hospital for the worst of the worst - kids who committed murder or rape or something before age 18. Many of these children had similar stories: they were taken from their parents just after birth because the parents were criminals/drug addicts/in jail/abusing them. Then they were adopted out to some extremely nice Christian family whose church told them that God wanted them to help poor little children in need.
Then they promptly proceeded to commit crime / get addicted to drugs / go to jail / abuse people, all while those families’ biological children were goody-goodies who never got so much as a school detention. When I met with the families, they would always be surprised that things had gone so badly, insisting that they’d raised them exactly like their own son/daughter and taught them good Christian morals. I had to resist the urge to shove a pile of twin studies in their face. This has left me convinced that behavioral traits are highly heritable to a level that it would be hard for any study to contradict. But I don’t think studies do contradict this. Given the degree to which their assumptions have been validated, and the level of confirmation from pedigree and adoption studies, I think they have earned a presumption of accuracy. Doubting the twin studies doesn’t seem like a promising route to reconciling the twin-vs-Sib-Regression/RDR discrepancy.

[Footnote text: Ultimate source here. Although the study is confusing about this, I think it’s trying to say that almost 90% of subjects were adopted before age 2.]

What’s Going On? (Part 2: Is Something Wrong With Sib-Regression And RDR?)

Sib-Regression is a clever way of avoiding most biases. Its independent variable - the degree to which some sibling pairs end up with slightly more shared genes than others - is even more random and exogenous than the difference between fraternal and identical twins. It can sometimes have biases related to assortative mating (which would falsely push heritability down), but otherwise it’s pretty good. RDR has many of the same advantages, and allows more diverse relationships and so larger sample sizes. It’s hard to think of ways these methods could be wildly off.
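As a rough illustration of the Sib-Regression idea, the simulation below builds sibling pairs whose realized genome sharing varies slightly around 50%, then regresses the product of their standardized trait values on that sharing. Under the simple additive model assumed here, the slope recovers the heritability. This is a simplified stand-in, not the estimator used in Young et al. or Kemper et al., and every number (sample size, spread of sharing, variance components) is an assumption chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# All values below are illustrative assumptions, not estimates from any study.
n_pairs = 2_000_000
true_h2 = 0.40                        # additive genetic variance
true_c2 = 0.20                        # shared (family) environment
true_e2 = 1.0 - true_h2 - true_c2     # everything else

# Realized genome sharing: 50% on average, with a small spread from meiosis.
sharing = np.clip(rng.normal(0.50, 0.04, n_pairs), 0.0, 1.0)

# Genetic values whose within-pair correlation equals the realized sharing.
common = rng.standard_normal(n_pairs)
g1 = np.sqrt(true_h2) * (np.sqrt(sharing) * common
                         + np.sqrt(1 - sharing) * rng.standard_normal(n_pairs))
g2 = np.sqrt(true_h2) * (np.sqrt(sharing) * common
                         + np.sqrt(1 - sharing) * rng.standard_normal(n_pairs))

# Both siblings get the same family environment, whatever their sharing.
family = np.sqrt(true_c2) * rng.standard_normal(n_pairs)
y1 = g1 + family + np.sqrt(true_e2) * rng.standard_normal(n_pairs)
y2 = g2 + family + np.sqrt(true_e2) * rng.standard_normal(n_pairs)

# Traits have mean 0 and variance 1 by construction, so the expected product
# y1 * y2 is (sharing * h2 + c2); the regression slope on sharing is h2.
product = y1 * y2
slope = np.cov(sharing, product)[0, 1] / np.var(sharing, ddof=1)
intercept = product.mean() - slope * sharing.mean()

print(f"estimated heritability (slope): {slope:.2f}")
print(f"estimated shared environment (intercept): {intercept:.2f}")
```

Because sharing only varies by a few percentage points between pairs, the slope is noisy even with millions of simulated pairs, which fits the wide intervals reported for the real studies.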
There is one caveat: although RDR includes most of the rare and structural variants missed by GWAS, in theory it can miss certain ultra-rare variants which are so uncommon that they aren’t shared between some of the relative pairs used in RDR. De novo variants that occurred during the subject’s own conception would be in this category, if the subject didn’t have children or didn’t pass on that gene [13]. This seems like a pretty small subcategory of genetic variation, and I wouldn’t normally expect that much of importance to be hiding here, but maybe it’s more important than it seems.

RDR also doesn’t include much variance caused by statistical interactions between genes. Although we said above that these are usually found to be insignificant, they might be more important in a trait like intelligence that has been under recent evolutionary selection that lops off easily-detectable sources of variance and leaves only the weird obscure ones behind. There’s limited ability for classical Mendelian dominance to affect common variants, but more complicated genetic interactions might still prove important.

Overall these are strong methods, and their failure to converge is troubling. If forced to explain them away, we might tell a story like: So far, there is only one RDR study and a few Sib-Regression studies, so we should wait for more data before updating too hard.
Inline links: Sasha Gusev, The Infintesimal, Eric Turkheimer, Alex Young, Awais Aftab, wrote a similar post, here, read the whole thing, Young et al. 2018, 2, 3, families with Norman surnames still outperform their Saxon peers at Oxbridge admissions, or other behavioral differences between the two groups, 4, Tan et al 2024, https://substackcdn.com/image/fetch/$s_!ioe2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05c5bec7-469c-40d2-b908-68d8583c9cca_766x766.png, 5, Polygenic embryo selection, need, Is Tan Et Al The End Of Social Science Genomics?, 6, like, And, 300K, 1M, 3M, A 2016 GREML paper, 7, evolves some ridiculous trick, a Darwinian argument, a 2017 paper, Wainschtein 2022, Sidorenko 2024, Zeng, 8, Weiner 2023, other experts say, meiosis, Young et al (2018), Kemper et al (2021), their trait similarity, matches their real status, 2, 3, 4, Felson 2013, https://substackcdn.com/image/fetch/$s_!r3kV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd575f2d6-3619-40e6-9a5e-f9f1ec1399a5_650x422.png, answer, The best studies, 9, seems real and important, here, 1, 2, when investigated, in this article, 10, here, here, https://substackcdn.com/image/fetch/$s_!b3LF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc094f9c0-4c71-48cf-89dc-615498d94812_483x51.png, 11, https://substackcdn.com/image/fetch/$s_!XFWU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85f433e5-d141-47b7-8dc5-2271925032e9_483x102.png, 12, https://x.com/cremieuxrecueil/status/1935731422205010135, here, 13
For example, educational attainment is 50% uncorrelated with direct genetic effects. You need to square this to figure out what percent is causal; when you do that, you find that the polygenic score that explained 14% of EA is only 4%pp direct genes, with the other 10%pp being nondirect [5] confounders. So yes, it seems like most polygenic scores that don’t validate within families are confounded. However unhappy we previously were that we had only found 14% of genes for EA (vs. 40% expected), we should now be much more unhappy - we really only know 4% of genes that directly cause EA.

On the other hand, you might say - so before we only knew 14%pp out of 40%. Now we only know 4%pp out of 40%. This is discouraging, but it doesn’t fundamentally change what we know about nature vs. nurture. Both 4%pp and 14%pp are less than 40% - with either number, we must be missing something or doing something wrong. Probably that’s insufficient sample size. We’ll keep working on sample size and other things, and eventually scrounge up the missing 26%pp or 36%pp or whatever of the variance, so this doesn’t change anything. All it means is that one predictive method that the average person never knew about in the first place doesn’t work as well as we thought. Who cares?

Not doctors. So far this research has only just barely begun to reach the clinic. But also, all doctors want to do is predict things (like heart attack risk). They don’t care if they use causal vs. nondirect genes. It doesn’t matter if you’re “only” at higher risk of heart attack because you’re black, or Norman, or because your parents read books to you - you still need more heart attack medication!

Polygenic embryo selection companies should care. They offer polygenic scores that can be used to select healthier or smarter embryos. If the predictors they use rely partly on variants that aren’t causal within families, their real benefits could be far lower than advertised.
I talked to one of these companies, who said they’d already adjusted for these effects and expected their competitors had too - the proper antidote to this problem, sibling controls, is a natural choice when you’re literally picking between siblings.

The biggest losers are the epidemiologists. They had started using polygenic predictors as a novel randomization method; suppose, for example, you wanted to study whether smoking causes Alzheimer’s. If you just checked how many smokers vs. nonsmokers got Alzheimer’s, your result would be vulnerable to bias; maybe poor people smoke more and get more Alzheimer’s. But (they hoped) you might be able to check whether people with the genes for smoking get more Alzheimer’s. Poverty can’t make you have more or fewer genes! This was a neat idea, but if the polygenic predictors are wrong about which genes cause smoking and what effect size they have, then the less careful among these results will need to be re-examined.

But the reason I spent so much time on the subject here is that this has confused a lot of people into thinking heritability itself was confounded and is actually just 4%. When I read my first few blog posts on these findings, I came away thinking they were claiming to have discredited twin studies and heritability. And although I take partial ownership of my own poor reading comprehension, I maintain that the way that the new anti-hereditarians discuss this is pretty bad. For example, Turkheimer’s treatment of the Tan study above is called Is Tan Et Al The End Of Social Science Genomics?, and includes passages like: The median [direct genomic effect] heritability for behavioral phenotypes is .048. Let that sink in for a second. How different would the modern history of behavior genetics be if back in the 80s one study after another had shown that the heritability of behavior was around .05? When Arthur Jensen wrote about IQ, he usually used a figure of .8 for the heritability of intelligence.
I know that the relationship between twin heritabilities and SNP heritabilities is complicated, and in fact the DGE heritability of ability is one of the higher ones, at .2336. But still, it seems to me that the appropriate conclusion from these results is that among people who don’t have an identical twin, genomic information is a statistically non-zero but all in all relatively minor contributor to behavioral differences. And comments included things like: I don’t know if [this study] is the end of social science genomics, but it should certainly be the end of attributing significant genetic influence to behavioral traits (despite the recent scientist-generated cartoons touting genes for “income”). And: There's no doubt that this reported findings have dealt a fatal blow to my conviction that behavioral traits are pre-eminently heritable…This is a remarkable example of an objective statistical fact mercilessly crushing the more subjective experiential sense of "A looks and acts more like B than C because A and B have the same parents." This subjective evidence is almost unshakable and universal in its application as a tried and tested psychosocial heuristic. And yet, here we are. Turkheimer is either misstating the relationship between polygenic scores and narrow-sense heritability, or at least egging on some very confused people who are doing that, and the dynamic was bad enough that I got confused myself for a while. But even more confusing, the new anti-hereditarians actually are saying that lots of behavioral traits have very low heritability! But this point requires different arguments, only tangentially related to these. So let’s move on to… Is Heritability Genuinely Low? (Part 1: GWAS & GREML) In the mid 2010s, when genome-wide association studies (GWAS) based polygenic predictors were getting better every year, it was easy to hope they might reach 40% and close the “missing heritability”. But since then, progress has stalled. 
The second-to-last tripling of sample size, from 300K to 1M between 2016 - 2018, increased predictive power from 6% → 12%. The last tripling, from 1M to 3M between 2018 - 2022, only increased predictive power from 12% → 14%. If you graph sample size vs. predictive power, it looks like there's an asymptote between 15 - 20% or so. (of which - remember - only 5% is directly causal!) Worse, a mid-2010s technique called GREML allowed researchers to estimate the percent of variance in a trait that comes from the sorts of common genes studied in GWAS, without having to identify the genes involved. A 2016 GREML paper suggested that the maximum share of variance that GWASs of educational attainment could ever discover was about 21% (again, compared to 40% predicted genetic from twin studies). Since unavoidable methodological issues will prevent GWASs from reaching the literal maximum possible, this agrees with the evidence suggesting an asymptote between 15 - 20%. So either twin studies are wrong and traits are less heritable than believed, or the heritability must lie somewhere other than the common genes identifiable by GWAS. What about rare genes? GWASs focus on genetic variation common enough to be worth including in a basic genetic test. Most of this is single nucleotide polymorphisms (“SNPs”). A single nucleotide is one letter of DNA - for example, a C or a G. Polymorphisms are genes that commonly vary in humans - sometimes across races (for example, some humans have a gene for light skin, and other humans have a gene for dark skin), and other times within races (for example, some white people have a gene that makes cilantro taste like soap, and others don’t). So SNPs are single-letter spots in DNA where different people often have different letters. How often? Some people say 1%, but the more practical definition is “often enough that someone has noticed and added it to the test panel”. 
There are three billion letters in the genome, of which only a few million are commonly-tested SNPs.

But these SNP studies have limited [7] ability to measure personal mutations and rare variants. Sometimes your parents’ egg and sperm cells mess up copying a nucleotide of DNA, and you get a mutation that isn’t inherited from your ethnic group or even from your subgroup/family line - it’s just some idiosyncratic DNA change that you might be the first person in history to have. Since scientists have never seen this mutation before, they don’t know about it and can’t test for it without doing something more expensive than a simple SNP screen.

And SNP studies have limited ability to detect anything more complicated than a single letter changing to another single letter. But some mutations are more complicated structural variants. For example, some bits of DNA get stuck on repeat - one person might have GATGAT, another person might have GATGATGATGAT, and a third person might have fifty GATs in a row. Other bits come out backwards. Sometimes a whole chunk of DNA goes missing, or moves to the wrong place. Occasionally a gene reads The Selfish Gene by Richard Dawkins, takes it too seriously, and evolves some ridiculous trick for spamming itself all over the genome.

So if even the best molecular studies seem to be asymptoting around 15-20% of variance in educational attainment, but twin studies suggest it’s 40% genetic, might rare variants and structural variants make up the missing 20-25%pp? This remains a topic of bitter disagreement.

On the one side, hereditarians bring up a Darwinian argument: imagine a genetic engineer who hopes to find the genes for educational attainment and edit them to make everyone smart and successful. She looks harder and harder, becoming more and more exasperated as they fail to materialize. Finally, she realizes she’s been scooped: evolution has been working on the same project, and has a 100,000 year head start.
In the context of intense, recent selection for intelligence, we should expect evolution to have already found (and eliminated) the most straightforward, easy-to-find genes for low intelligence. Therefore, everything left should be convoluted or hidden or impossible to work with. So although this requires a sort of god-of-the-gaps argument - where we keep pushing heritability into whatever genes are too weird for existing techniques to detect - there are some reasons to think God really is in the gaps here. And a 2017 paper uses some clever techniques to estimate the share of intelligence variation lurking in hard-to-measure genes and finds it’s more than half: “By capturing these additional genetic effects, our models closely approximate the heritability estimates from twin studies for intelligence and education.” (see also Wainschtein 2022, Sidorenko 2024) The anti-hereditarians disagree. They cite papers like Zeng which measure the strength of selection on intelligence and suggest that it’s too weak to concentrate so much of the variation in rare genes8. And Sasha Gusev mentions Weiner 2023, which finds that in fact rare variants “explain 1.3% (SE = 0.03%) of phenotypic variance on average – much less than common variants” (other experts say that burden heritability only captures some rare variants and is not the right tool for this problem). But it may not even matter, because another set of findings suggests that heritability is genuinely low even when the rare variants are counted. Is Heritability Genuinely Low? (Part 2: Sib-Regression and RDR) Two newer methods, Sib-Regression and RDR, ask: using what we know from genetic studies, how much genetic variation do we think exists, total, across both common and rare genes? On average siblings share 50% of genes. But there’s a little randomness in meiosis, so some siblings might share 40% and others might share 60%. 
The more genetic influence on a trait, the more similar sibling pairs who share 60% of their genes will be, compared to sibling pairs who only share 40% of their genes. Since 60%-gene siblings and 40%-gene siblings are both equally part of the same family, you can use these numbers to calculate heritability unconfounded by a range of family factors. This is Sib-Regression. If you do a more complicated statistical process to extend the same idea to relatives other than siblings, it’s relatedness disequilibrium regression or RDR. GWAS asks: Looking at common easy-to-study genes, how much variation in a trait have we explained right now? GREML asks: looking at common easy-to-study genes, how much variation could we ever explain? But sib-regression and RDR ask a question more like twin studies: considering all genes, whether common / rare / easy-to-study / hard-to-study, how much variation is there total? This could address the rare variant objection mentioned above. And in many ways, these techniques are better than twin studies - Sib-Regression eliminates many potential biases, and RDR eliminates even more (although it’s harder to pull off, requiring more genetic information and computational resources). These techniques are new and hard-to-use, and only a few published studies have applied them to the sorts of behavioral traits we’re interested in: Young et al (2018) did Sib-Regression and RDR to genetic data from Iceland. Sib-regression found educational attainment = 40% (±15%) heritable, and RDR found 17% (±9%) heritable. Kemper et al (2021) did Sib-Regression only to genetic data from Britain. It found educational attainment = 14% heritable. This number conflicts with the 40% from the Young paper. Why? Unclear, but it could be selection bias - Young’s Icelandic sample was representative of the country; Kemper’s British population were Biobank volunteers who tend tend to be healthier and higher-class than the population at large. 
Upper-class people may have restricted range in educational attainment, or different factors affecting their educational attainment compared to the overall population. Either way, these are closer to the low estimates from GWAS and GREML (7% direct, 20% total), than to the higher estimates from twin studies (40%, generally presumed direct). And we can no longer use contributions from rare variants to paper over the difference. So what is going on? It seems like we have to accept one of three possibilities: Either something is wrong with twin studies. Or something is wrong with Sib-Regression and RDR (and then we can explain away GWAS and GREML by saying they’re missing rare variants). Or something is wrong with how we’re thinking about this topic and comparing things. What’s Going On? (Part 1: Is Something Wrong With Twin Studies?) Twin studies have dominated discussion of behavioral genetics for decades, so there’s a vast literature investigating their various assumptions and whether something might be wrong with them. Here are some of the assumptions and what the research says about each. Some of these will be duplicates of the GWAS confounders above, but we’ll go through them again anyway to review how they apply to twins. 1: Parents Treat Fraternal And Identical Twins The Same: Twin studies claim that twins are a uniquely powerful genetic laboratory; both fraternal and identical twin pairs have equally concordant environments, but identical twins have more concordant genes. Therefore, the more similar identical twin pairs are relative to fraternal twin pairs, the more heritable a trait must be. But this conclusion falls apart if identical twin pairs actually have more similar environments than fraternal twin pairs do, maybe because parents (knowing their twins are identical) treat them more similarly than they would fraternal twins. 
Would-be twin-study-discreditors have been trying to argue that this must be true for decades, but it’s always been a kind of quixotic battle. Remember, twin studies find many behavioral traits like IQ are >60% heritable, so you would need to prove not only that parents treat identical twin pairs differently from fraternal, but that this was an overwhelming effect. Parents of identical twins would have to obsessively expose them to the exact same stimuli in the exact same order; parents of fraternal twins would have to send one to the Gifted Advanced Placement Acceleration program while locking the other in a box and force-feeding them lead pellets. Common sense tells us there are no such differences, and studies confirm this: when parents are wrong about their twins’ status (eg they have fraternal twins, but falsely think they’re identical, or vice versa) their trait similarity matches their real status, rather than the incorrect status that determined how their parents treat them; parental treatment explains less than 1% of why identical twin pairs are more concordant (2, 3, 4). See also Felson 2013, which tries to measure environmental similarity and adjust for it, with minimal effects. Are these two cuties monozygotic or dizygotic? Are you sure? (answer) 2: Fraternal And Identical Twins Have Equally Concordant Uterine Environments: Fraternal twins have different sacs in the uterus and use different placentas. Most identical twins share a placenta, and some share an amniotic sac. If trait similarity is caused by sharing a placenta or sac (maybe because the placenta is defective, the fetal brain is starved of nutrients, and so the person has a lower IQ when they grow up), twin studies would falsely read this identical-fraternal difference as genetic. Luckily this is easy to study; not all identical twins share a placenta or sac, so you can cleanly separate the effect of uterine environment from genetics. 
If you measure enough traits, you can find small deviations in some, but it’s not clear whether this is just multiple testing, and in any case the deviations are small. The best studies suggest this chips off somewhere between 0 - 3% from heritability estimates9. 3: There is little assortative mating: We discussed this one above in the earlier section on GWAS - smart/pretty/kind/whatever people tend to marry other smart/pretty/kind/whatever people. Why would this bias twin study results? Identical twins share 100% of their genes. Fraternal twins ought to share 50% of their genes - but they get half their genes from their mother, and half from their father. In the degenerate case where the mother and father have exactly the same genes (“would you have sex with your clone?”) even fraternal twins will be extremely similar (although not quite identical, since they’ll get different alleles from each clone). In the more plausible case where mothers and fathers are just a little more alike than chance (eg because smart people tend to marry other smart people), fraternal twins will share a genetic tendency towards a trait somewhat more than their 50% shared genes suggest. Since this makes fraternal twin pairs more (genetically) like identical twin pairs, and twin studies assess heritability as the difference in fraternal-identical-twin-pair concordance, this bias would make twin studies underestimate heritability. But this is the opposite of what you would need to “discredit” twin studies - if this bias is true, then everything is more genetic than twin studies think. And unlike the previous two biases, this one seems real and important, so much so that when you adjust for it, the heritability of educational attainment rises from ~40% to ~50%. I’m only mentioning this one here because some anti-hereditarians argue that you can’t trust twin studies because of assortative mating, without mentioning that this can only bias them down. 
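The arithmetic behind point 3 can be sketched in a few lines. This uses Falconer's formula, h² = 2(r_MZ - r_DZ), the textbook way of turning twin correlations into a heritability estimate; the passage does not say which estimator the cited studies used, so treat this as an illustration rather than their method. The twin correlations and the 0.6 genetic correlation for fraternal twins are made-up values, chosen only so that the output lands near the ~40% and ~50% figures quoted above.

```python
def falconer_h2(r_mz: float, r_dz: float) -> float:
    """Classic twin estimate: assumes fraternal twins share exactly half
    of the additive genetic variance."""
    return 2.0 * (r_mz - r_dz)


def adjusted_h2(r_mz: float, r_dz: float, dz_genetic_corr: float) -> float:
    """Same idea, but lets the fraternal genetic correlation differ from 0.5.

    Under a purely additive model with equal shared environments,
    r_mz - r_dz = (1 - dz_genetic_corr) * h2, so we divide that gap out.
    """
    return (r_mz - r_dz) / (1.0 - dz_genetic_corr)


# Hypothetical twin correlations (not taken from any study).
r_mz, r_dz = 0.70, 0.50

naive = falconer_h2(r_mz, r_dz)
# Assumption: assortative mating raises the fraternal genetic correlation to 0.6.
corrected = adjusted_h2(r_mz, r_dz, dz_genetic_corr=0.6)

print(f"Falconer estimate: {naive:.2f}")
print(f"Adjusted estimate: {corrected:.2f}")
```

With a fraternal genetic correlation of exactly 0.5 the two functions agree; any value above 0.5 makes the adjusted estimate larger, which is the direction of bias the passage describes.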
4: Population stratification: This is often large and worth worrying about, but it applies to identical and fraternal twin pairs equally, and doesn’t bias twin study heritability estimates much (though it might shift the balance between shared and non-shared environment). See eg the sentence around footnote 30 here. 5: Non-additive / “interaction” effects: These are theoretically interesting, but all research thus far has found they are minimal (1, 2). Some experts think this may miss rarer or harder-to-find interactions; we’ll return to this later. 6: “Genetic nurture”, parent-to-child Mentioned above: if there is a gene for reading books to kids, and reading books raises IQ, it will look like a “gene for IQ”. This isn’t as relevant to twin study estimates of heritability, since both identical twins and fraternal twins are equally related to their parents, and any trait caused by genetic nurture wouldn’t differ between them (and therefore would not falsely appear heritable in this design). Rather, they would appear as shared environment. 7: “Genetic nurture”, sibling-to-sibling That is, suppose your sibling’s traits influence your own development. For example, suppose your sibling has a gene that makes them sabotage your schoolwork, causing you to fail and drop out of school early. An identical twin would share this gene with their sibling more often than a fraternal twin, making it look like a “gene for doing badly at school” (since the people who have it do worse at school than those who don’t). Why are we even talking about this? Do we really think it’s a big part of the variance in behavioral traits? Challenging twin study heritability estimates through this route requires inhabiting a weird no-man’s-land where otherwise-invisible genetic and environmental pathways suddenly flare up when you say the magic words “it was done by a sibling”. 
For example, this requires a strong effect of shared environment - that is, your educational attainment has to depend on whether you’re being sabotaged or not. But in general, shared environmental effects are weak. And it requires a strong effect of genes - that is, this mechanism only works if your sibling’s tendency to sabotage you is highly genetically determined. But we’re deploying this claim to deny that traits like IQ or educational attainment are highly genetically determined. So to get much out of this, the tendency to sabotage siblings would have to be more genetic than other behavioral traits! The reason this convoluted possibility gets brought up so often is that, unlike the more plausible parent-to-child genetic nurture, twin studies can’t rule it out. So if you really want to deny twin studies, this is one of your best bets. But when investigated, this has effects indistinguishable from zero. I’ve been a bit mean in this whole section, because people really like to dismiss twin studies as “Oh, don’t you know, those depend on assumptions, I bet you never considered that assumptions might be wrong”, and then Gish Gallop you with different assumptions until you give up. But scientists have actually done a lot of really good work checking the assumptions and they mostly hold. An alternative way of validating twin studies (brought up by Noah Carl in this article) is to check them against their close cousins, adoption studies and pedigree studies. Pedigree studies investigate large family trees, and check how trait similarity decreases with genetic distance. They avoid twin specific biases (like different treatment of fraternal vs. identical twin pairs, or different prenatal environments), while adding others like assortative mating. 
Here are the heritabilities of IQ and EA found in pedigree studies10 (see footnote for sources and caveats, and see also here and here for somewhat similar designs): Adoption studies investigate whether adoptees’ traits are more correlated with their adoptive or biological parents. They avoid a large swathe of biases, at the risk of introducing new adoption-related biases of their own (like the possibility that agencies deliberately place adoptive children with parents who are culturally or behaviorally similar, or the possibility that adoptees were adopted late enough to still get some shared environment from their biological parents). Here are the findings of some of the largest and best11: Both straightforwardly confirmed the larger heritability numbers found in twin studies. I would add the evidence from some less formal “adoption studies”12. During residency, I spent a few months working in a child psychiatric hospital for the worst of the worst - kids who committed murder or rape or something before age 18. Many of these children had similar stories: they were taken from their parents just after birth because the parents were criminals/drug addicts/in jail/abusing them. Then they were adopted out to some extremely nice Christian family whose church told them that God wanted them to help poor little children in need. Then they promptly proceeded to commit crime / get addicted to drugs / go to jail / abuse people, all while those families’ biological children were goody-goodies who never got so much as a school detention. When I met with the families, they would always be surprised that things had gone so badly, insisting that they’d raised them exactly like their own son/daughter and taught them good Christian morals. I had to resist the urge to shove a pile of twin studies in their face. This has left me convinced that behavioral traits are highly heritable to a level that it would be hard for any study to contradict. Ultimate source here. 
Although the study is confusing about this, I think it’s trying to say that almost 90% of subjects were adopted before age 2. But I don’t think studies do contradict this. Given the degree to which their assumptions have been validated, and the level of confirmation from pedigree and adoption studies, I think they have earned a presumption of accuracy. Doubting the twin studies doesn’t seem like a promising route to reconciling the twin-vs-Sib-Regression/RDR discrepancy. What’s Going On? (Part 2: Is Something Wrong With Sib-Regression And RDR?) Sib-Regression is a clever way of avoiding most biases. Its independent variable - the degree to which some sibling pairs end up with slightly more shared genes than others - is even more random and exogenous than the difference between fraternal and identical twins. It can sometimes have biases related to assortative mating (which would falsely push heritability down), but otherwise it’s pretty good. RDR has many of the same advantages, and allows more diverse relationships and so larger sample sizes. It’s hard to think of ways these methods could be wildly off. There is one caveat: although RDR includes most of the rare and structural variants missed by GWAS, in theory it can miss certain ultra-rare variants which are so uncommon that they aren’t shared between some of the relative pairs used in RDR. De novo variants that occurred during the subject’s own conception would be in this category, if the subject didn’t have children or didn’t pass on that gene13. This seems like a pretty small subcategory of genetic variation, and I wouldn’t normally expect that much of importance to be hiding here, but maybe it’s more important than it seems. RDR also doesn’t include much variance caused by statistical interactions between genes. 
Although we said above that these are usually found to be insignificant, they might be more important in a trait like intelligence that has been under recent evolutionary selection that lops off easily-detectable sources of variance and leaves only the weird obscure ones behind. There’s limited ability for classical Mendelian dominance to affect common variants, but more complicated genetic interactions might still prove important. Overall these are strong methods, and their failure to converge is troubling. If forced to explain them away, we might tell a story like: So far, there is only one RDR study and a few Sib-Regression studies, so we should wait for more data before updating too hard.
Inline links: 5, Polygenic embryo selection, need, Is Tan Et Al The End Of Social Science Genomics?, 6, like, And, 300K, 1M, 3M, A 2016 GREML paper, 7, evolves some ridiculous trick, a Darwinian argument, a 2017 paper, Wainschtein 2022, Sidorenko 2024, Zeng, 8, Weiner 2023, other experts say, meiosis, Young et al (2018), Kemper et al (2021), their trait similarity, matches their real status, 2, 3, 4, Felson 2013, https://substackcdn.com/image/fetch/$s_!r3kV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd575f2d6-3619-40e6-9a5e-f9f1ec1399a5_650x422.png, answer, The best studies, 9, seems real and important, here, 1, 2, when investigated, in this article, 10, here, here, https://substackcdn.com/image/fetch/$s_!b3LF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc094f9c0-4c71-48cf-89dc-615498d94812_483x51.png, 11, https://substackcdn.com/image/fetch/$s_!XFWU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85f433e5-d141-47b7-8dc5-2271925032e9_483x102.png, 12, https://x.com/cremieuxrecueil/status/1935731422205010135, here, 13
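The Sib-Regression idea described in the passage above (some sibling pairs happen to share slightly more of their genes than others) can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the procedure of any cited study: the trait is purely additive with no shared environment, realized sharing is drawn from a normal distribution with an assumed standard deviation of 0.04, and the function name and parameters are hypothetical.

```python
import math
import random


def simulate_sib_regression(n_pairs: int, h2: float, seed: int = 0) -> float:
    """Simulate sibling pairs and recover h2 from the slope of squared
    phenotype differences on realized genetic sharing."""
    rng = random.Random(seed)
    xs, ys = [], []
    e_sd = math.sqrt(1 - h2)  # environments are independent within a pair
    for _ in range(n_pairs):
        # Realized sharing varies a little around 0.5 (SD of 0.04 is assumed).
        pi = min(1.0, max(0.0, rng.gauss(0.5, 0.04)))
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        # Additive genetic values with variance h2 and correlation pi.
        a1 = math.sqrt(h2) * z1
        a2 = math.sqrt(h2) * (pi * z1 + math.sqrt(1 - pi * pi) * z2)
        # Phenotypes have total variance 1.
        y1 = a1 + rng.gauss(0, e_sd)
        y2 = a2 + rng.gauss(0, e_sd)
        xs.append(pi)
        ys.append((y1 - y2) ** 2)
    # Ordinary least squares slope of squared difference on sharing.
    mx, my = sum(xs) / n_pairs, sum(ys) / n_pairs
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    # E[(y1 - y2)^2 | pi] = 2 * (1 - pi * h2), so the slope is -2 * h2.
    return -slope / 2


estimate = simulate_sib_regression(n_pairs=400_000, h2=0.5)
print(f"true h2 = 0.50, estimated h2 = {estimate:.2f}")
```

Because realized sharing only varies by a few percentage points, the slope is noisy even with hundreds of thousands of simulated pairs, which fits the passage's point that more data is needed before updating too hard.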
Are we going to find and cash out “rare variants and interactions” soon? If we don’t, how long should we wait for genetic science to advance before changing our mind and deciding we must be missing something more fundamental? Alex Young thinks that once we get enough whole genomes sequenced (probably soon!) we might be able to use a technique called GREML-WGS to get more definitive answers about rare variants. But other experts I talked to said that if complex interactions were a big part of the picture, this might be “computationally intractable”. On the other hand, “computationally intractable” is a relative term: with enough data, genomic language models offer the potential for improved understanding of nonlinear effects. I’m encouraged to see increasingly good discussion of these topics on Substack, Twitter, and elsewhere. People like Sasha Gusev and Eric Turkheimer deserve credit for opening the discussion, but I would like to see a robust back-and-forth with the other side. Thanks to everyone who helped me review this post, including Ruben Arslan, Alex Young, Damien Morris, and some other people who didn’t respond to my email asking if I had their permission to list their names publicly (if this is you, let me know and I’ll edit you in). Most of what’s valuable is theirs, and all errors are the fault of o3, which provided invaluable research assistance but also hallucinated constantly.
1. I’m abbreviating “two percentage points” as 2%pp. Nitpickers complain if I don’t use the “percentage points” framing, but it’s too long to spell out each time.
2. Geneticists distinguish between three related concepts: Polygenic score r^2 is the degree to which our current best genetic models can predict traits. You might use this to discuss the accuracy of a genetic test or an embryo selection procedure.
Sasha Gusev of The Infinitesimal, a leading critic of twin studies and the person whose views most inspired the post, kindly replied. His reply has four parts - I’ll address each individually. First, GxE interactions:
Inline links: The Infinitesimal, kindly replied
I asked Gusev:
Elsewhere, Gusev linked a more thorough explanation of his theory of interactions. It’s pretty interesting, but (he admits) kind of the opposite of everyone else’s theory of interactions. Everyone else is looking for interactions where poor people have lower heritability of behavioral traits than rich people (because poor people might be malnourished/undereducated/etc, whereas rich people usually get enough resources to achieve their genetic potential, whatever it may be). But Gusev instead looks for interactions where rich people have lower heritability than poor people (because poor people face many challenges and their ability to overcome those challenges might depend on their genes, but rich people will do fine regardless of what genes they have). My impression is there are many studies on both sides; I’m not expert enough in the field to know whose studies are better or whether they can be reconciled, but it’s a bad omen that people looking for these effects can’t even agree on what the sign is.
5: My recent Highlights From The Comments On Missing Heritability included a comment by Sasha Gusev criticizing Davide Piffer’s work on race and IQ, and I partly endorsed Gusev’s criticism. Piffer responds here.
Inline links: Highlights From The Comments On Missing Heritability, here
That serves as proof-of-concept that this technology can work, and means other companies’ claims are at least plausible. Scientific Objections: Antagonistic Pleiotropy This is a fancy term for “sometimes genes that are good in one way are bad in other ways”. For example, there is a gene that decreases the risk of lung cancer, but increases the risk of leukemia. If you selected against lung cancer, you might give your child higher leukemia risk. Several of the professional societies raise this concern, and Sasha Gusev gives several examples here, including a correlation between education/IQ and anorexia. When I think about these concerns, I consider the following thought experiment: suppose that I had a natural, unselected child, and that child became high school valedictorian and got into Harvard. Would my first reaction be “Oh no! This slightly raises her risk of anorexia!”? If not, why should this be our reaction to artificially increasing IQ? Genetic selection isn’t doing some different, magical thing. It’s just picking from within the natural IQ/anorexia variation. If you would be happy to have higher IQ (or lower breast cancer risk, or lower schizophrenia risk) naturally, you should be happy to get it through selection too. (Objection one: suppose that the genetic component of IQ is net negative, but the environmental component is net positive to an even greater degree. Then IQ itself might be net positive - so you could still celebrate your valedictorian child - but since the genetic component alone is bad you wouldn’t want to select for it. I have never heard anyone seriously claim this, most studies suggest that genetic components of good things are good in the expected ways, and most critics don’t get this far. I mention it for the sake of completeness only.) (Objection two: is the example above just saying that I value IQ more than non-anorexia? 
If so, couldn’t I give an alternate example of learning that my child isn’t anorexic, celebrating this seemingly-obviously-good fact, but actually this means they have lower IQ and based on my stated values I should be sad? I don’t think so. There is no claim that the increased anorexia risk from raising IQ is exactly as bad as the IQ increase is good - for example, you could imagine a world where going from moron to supergenius only raises anorexia risk 0.0001%. More generally - although not rigorously - selecting for X should usually increase X more than it increases tangentially-correlated construct Y. So selecting for IQ should be net positive, even though it might slightly increase anorexia risk, and selecting for anorexia should be net negative, even though it might slightly increase IQ. I think this is the intuition that drives parents to be happy both when they learn that their child is smart, and when they learn their child doesn’t have anorexia - not just an intuition that one trait matters more than the other) But also, here’s the table of correlated genetic risks for psychiatric disorders: …where blue means that lowering the risk of one disease also lowers the risk of the other, and red means the opposite (as in the IQ - anorexia example above). Here’s the same table for other conditions, courtesy of Genomic Prediction (except I flipped the colors from the original, to match the one above): Aside from two bright orange squares (gallstones vs. hypertension and hypothyroidism - I don’t know what’s up with this and it doesn’t seem to be a widely-appreciated result) we see that most correlations are zero or positive - that is, selecting against one disease selects against another or at worst does nothing. In this ocean of blue, worrying about those few orange squares feels a bit motivated. Hans Jonas-ism says that no medical intervention may ever cause any harm, no matter how much benefit it produces. 
By this standard, perhaps slightly raising the risk of gallstones in the process of preventing various cancers and psychoses and other forms of human misery is unacceptable. To anyone with the more normal perspective where something with large benefits and tiny downsides is still pretty good, I don’t think the antagonistic pleiotropy argument carries much weight. Ethical Objection: Cost No way around this one: if these products work, they mean that rich people can have healthier/smarter/taller/prettier kids than poor people. One might object that at least they’re in good company: other products which help rich kids get healthier/smarter/taller/prettier than poor kids include private tutors, gyms, hair salons, health insurance, clothing, books, and food. Is this really the time to declare ourselves against this kind of thing? But maybe we should fight against expanding this already-bloated category. Or maybe there’s something more final about a genetic advantage. Maybe a stronger argument is that rich people get first crack at every new technology, but poor people usually follow close behind. The first cellphone, in 1982, cost $12,000 in today’s dollars. Now you can get something a thousand times better for $50, and Kenyan pastoralists use cell phones to call up the local shaman. The trajectory of genetics has been even more striking: sequencing a single genome cost about $100 million in 2000 and is somewhere around $100 today. Polygenic embryo selection has the potential to follow a similar path. There are two associated costs - sequencing the embryos, and running the analysis. Sequencing costs are decreasing and may eventually be comparable to the sorts of genetic screening (for e.g. Down Syndrome) that most families get anyway. 
Analysis costs are mostly the one-time expense of inventing the predictor; we might expect these to follow the same pattern as generic medications, where cutting-edge technology is jealously guarded and expensive, but last decade’s technology has made its way off patent and is cheap-to-free. A few groups have already created free open-source predictors; so far these are much worse than the private companies’ versions, but one of last year’s ACX Grantees is working on a better one. Also, it would be crazy for any forward-thinking government not to cover this; it could save hundreds of thousands of dollars in future health care expenses. In countries with public health care, this comes directly out of the government treasury; even in the US, it’s covered by Medicare after age 65. The government should be begging people to select embryos. The most persistent cost barrier is likely to be in vitro fertilization itself, a necessary precursor. In the US, 2-3% of babies are born through IVF. For those kids, this is a no-brainer - even if the cost never comes down, the cheaper products are only a fraction of total IVF expense. What about the other 98%? If those parents feel like they have to get embryo selection (and therefore IVF) to keep up, this could be a significant burden. IVF isn’t fun - it requires pumping a woman full of mind-altering hormones for weeks, extracting eggs in a minor surgery, and then implanting embryos in another minor surgery, all with a decent chance that some step will fail and you’ll have to do it all again. It also costs $15,000 in the US (less in poorer countries), and unlike the genetics, the cost has barely gone down in the past twenty-five years. Some countries, including Israel, offer free IVF for anybody who wants it. And universal basic IVF is surprisingly popular even in the usually government-phobic United States - Donald Trump made it part of his campaign platform. 
So there’s a plausible path to embryo selection for everyone who wants it. But it’s still going to take a while, it will hit different people at different times, and so far11 there’s no way around the month or two of various miserable medical procedures for women. Ethical Objection: Personhood Is it really correct to say that you have reduced someone’s risk of breast cancer by 46%, if what you’ve really done is closer to replacing them with a different person who is 46% less likely to have breast cancer? I cover this one in more depth here. Ethical Objection: Race This one is awkward: right now the technology works best for white people. Most genetic data available for research/commercial use comes from the UK, US, and Europe - areas which are mostly white. Asian biobanks, and those serving US minority communities, have been more reluctant to share data. So we know a lot about the genetics of white people, and only a limited amount about the genetics of anyone else. Companies are suitably embarrassed about this, and researchers in the field are working hard to wring every ounce of information out of the minority data they have. But for now, white people are the clear winner. Here’s data from Herasight: A European family with five embryos and no family history can cut their diabetes risk by 47%, and an African family 29%, with everyone else in between. As usual, all companies say that they adjust their scores based on the couple’s genetic ancestry. As usual, Herasight challenges them to publicly release data on exactly how they performed the adjustments and how well they work. All companies say they are working as hard as they can to improve cross-ancestry portability, but that progress will remain limited until governments collect/release better genetic data on non-white populations. Ethical Objection: Selection At some point, you’ve got to choose. Genomic Prediction and Herasight offer scores that aggregate overall health risks. 
Some people will follow them slavishly. Other people will try to second-guess them - would you prefer your child have lower cancer risk, or less chance of heart attacks? And this is the best case scenario! Herasight offers predictors for IQ, height and BMI; Nucleus offers those plus eye color and hair color12. A parent might encounter a situation where the embryo with their favorite eye color also has the highest cancer and schizophrenia risk, and choose to doom their child to cancer and schizophrenia because they really want pretty eyes. On average, even if everyone in the world selected for eye color, it wouldn’t raise cancer and schizophrenia risk. No not-deliberately-perverse polygenic selection choice can make your child worse off in expectation. Still, suppose you got cancer, and your mom admitted that she selected you for pretty eyes and didn’t even check the cancer column of the embryo selection report. How would you feel? And would you feel better or worse than someone whose parents didn’t do embryo selection at all, and spent the money on a Caribbean vacation? What if they selected your brother for everything great, then had you naturally? What if they selected you for IQ, but actually you are very stupid, and you were one of the 20% of cases where a predictor that’s right 80% of the time gets it wrong? Mark my words, one day there will be entire subfields of therapy dedicated to these issues. Going Nuclear Even as outsiders criticize the whole field, Herasight has launched a full-scale attack on competitor Nucleus. Herasight’s white paper compares its own predictors (favorably) to those of Orchid and Genomic Prediction… …but refuses to acknowledge Nucleus at all. In a supplementary note, the authors explain why: they accuse Nucleus of being so bad that it would “not yield a reliable or meaningful addition to our analysis”. They say Nucleus has inflated the accuracy of their scores. 
This is most dramatic for a few conditions like ADHD, where the leading published polygenic score is based on 2,300,000 variants but explains only ~1% of variance in the condition. Nucleus’ score is based on 12 variants13 and (implicitly) claims to explain 3-6%. This doesn’t make sense. Some of Nucleus’ other scores do use millions of variants. But many of these are 5-10 year old scores downloaded from open-source catalogs, whose accuracy statistics are easily available and far less than Nucleus claims. Here is what Herasight finds when they double-check Nucleus’ numbers: On their Substack, Herasight also criticizes Nucleus’ monogenic screening product. They point out cases where it fails to properly screen for the conditions it claims. For example, the Nucleus website advertises screening for spinal muscular atrophy: But on their gene list… …they don’t screen for SMN, which causes 95% of spinal muscular atrophy cases. They only screen for UBA1, which causes a distinct and much rarer condition called x-linked infantile spinal muscular atrophy. Professional organizations publish guidelines for what genes need to be screened in a screening product, and Nucleus does not appear to be following them. In further discussion, Herasight continued with exhaustive criticism of essentially everything Nucleus had ever done down to the smallest detail. Nucleus reports list the same baseline disease risk regardless of patient ancestry, but different ancestry groups should have different risks14. Nucleus’ physician reports sometimes list lower-than-average risk for patients with positive polygenic scores15. Nucleus’ age-based risk tables don’t distinguish between age and cohort effects (is this bad? see footnote16). My favorite critique is that Nucleus wrote a blog post criticizing competing company Orchid… …which included a section on how Orchid is a polygenic selection company, and polygenic selection companies are inherently “sketchy” and “honestly should be illegal”. 
But Nucleus is also a polygenic selection company! This is like Marlboro attacking Camel on the grounds that cigarettes are addictive and should be banned! Obviously something went wrong here - my guess is AI - and it’s a really bad look, especially when these scientific issues are so hard to litigate, and so many of us will have to go off gestalt impressions of corporate culture. Nucleus states that they validate their models internally and intend to make their results public soon. A Foothill Of The Future It’s hard not to love this technology. Lots of people (and the aforementioned professional organizations) manage anyway, but it’s hard. If this were a single-use medical treatment, delivered by a doctor after someone got the relevant condition, it would be one of the biggest advances of the decade - imagine a drug that cures 10 - 40%17 of breast cancers with no side effects! But in fact, it works for breast cancer, and schizophrenia, and heart attacks, and approximately everything else. The only things comparable are antibiotics and GLP-1RAs. And then there’s the IQ effects. Even after studying the literature, people have wildly different opinions about the importance of IQ. One of the most important debates is to what degree IQ differences are a cause of poverty, a consequence of poverty, or both. I lean towards both - a country with limited access to schools and medical care will have low average IQ, but as a consequence it probably won’t become the next big semiconductor hub. This technology could close half the IQ gap between poor and middle-income countries, or between middle-income and rich. Or it could give rich countries average IQs that have never been seen before, and let us see what kind of O-ring technologies (and new forms of social cooperation) lie just beyond the frontier. 
(this is the nice quantifiable argument in favor of IQ enhancement, but I find myself more convinced by fuzzier things - how much is it worth to be able to enjoy great art and literature? To fully comprehend what we know of nature, and be able to fully appreciate the mystery of the rest? To have a sense of why society works the way it does, instead of feeling like you’re being blown back and forth by institutions you don’t really understand? Amateur psychoanalysts like to say that the only people who care about IQ are those looking for an excuse to boast about how high their own is, but my experience is the opposite: I care about IQ because I bang up against the limits of my own a thousand times a day, and I hate it. I fantasize about ways to make my children smarter than I am for the same reason a dog confined in a tiny crate might fantasize about getting her puppies adopted out to a nice house with a big grassy yard.) My biggest qualm is that it might not matter. This is such a tiny foothill, flanking such a vast and foreboding range of mountains, that it might be a mistake to care about it at all. Selecting the best of five or ten embryos is not a very effective way to get the genes you want. There are things in the pipeline that will make this look like Hippocrates draining black bile. By the time the first polygenically selected children are adults, they’ll be old news. And then there’s AI. The average age at diagnosis for Type II diabetes is 45 years. Will there still be people growing gradually older and getting Type II diabetes and taking insulin injections in 2070? If not, what are we even doing here? Many people in the transhumanist community are still bullish on this technology. They think - well, there’s still an outside chance that something comes up and AGI takes another few decades. If we can enhance humans to be smarter, healthier, and more determined by the time it arrives, maybe we’ll have a better chance. 
Or maybe, if there’s a positive optimistic vision of a human-based high-tech future, people will be more willing to delay AI in the first place. I like this argument, but I also think it’s worth stepping back. What’s the point of anything? Why have kids at all in a world that’s changing this fast? Why save for the future? At some point your answer has to be romantic and aesthetic - it’s never been clear whether anything you do matters in any ultimate sense, but you’ve got to act as if it does and hope for the best. From that perspective, this is the most romantic technology of all. You’re not just giving a better life to your kids. Genes travel from generation to generation; you’re giving a better life to your grandkids, your great-grandkids and so on to the point 1.77*log₂(population) generations from now when you are the ancestor of everybody and nobody. Somebody in Macaronesia in 3525 AD will avoid getting breast cancer because of you (if there is still cancer; if there are still breasts). Some combination of reasonable cost-benefit analysis and romantic/aesthetic commitments makes me want to have children despite the uncertainty, and the same combination made me sign up to use this technology despite the same. More later on how that’s going.
1. I’m slightly mixing up two different things here - Down Syndrome can be detected with an aneuploidy test, but cystic fibrosis takes a more involved PGT-M test.
2. There are two separate questions here. First, how much would diabetes risk decline if you selected the embryo with the lowest risk for diabetes - something you have no reason to do, since you have no reason to privilege diabetes risk over risk of any other disease? Second, how much would diabetes risk go down if you selected the embryo with the lowest health risk overall?
Genomic Prediction’s risk calculator shows, seemingly paradoxically, that you get -38% relative risk by selecting against diabetes alone, but -41% relative risk by selecting against everything at once. Over email, they stand by this surprising result, saying that “for a couple of diseases (type II diabetes and CAD), the EHS actually accomplishes a larger risk reduction than the individual predictors. The explanation is that the EHS takes into account multiple PRS of diseases with high comorbidity”. See eg Figure 3 here, and the section of the post called “Antagonistic Pleiotropy”, for more. However, this paradoxical benefit is only true for a few conditions like diabetes - for everything else, selecting on health index does better than you would naively think, but still does not decrease the risk of a given condition as much as selecting against that condition directly.
3: That is, new mutations in that particular baby, as opposed to older mutations already present in the parents.
4: Conflicts of interest: I have used Orchid’s and Herasight’s products on my own embryos (not the ones used to conceive my existing kids, but for a potential third child), employees of Genomic Prediction and Herasight have been extremely helpful in contributing expertise to ACX posts on genetics, and I might invest in this field at some point (though haven’t done so yet). This post started as Herasight asking me to write about their white paper, then spiraled out of control. There were some unexpected time pressures and the result is that I didn’t get a chance to run everything in Herasight’s white paper by their competitors as thoroughly as I would like. Although I talked to representatives of all four companies profiled here, I feel like this probably reflects Herasight’s perspective better than other companies’, and that this is a major flaw. If other companies have responses, I’ll publish them. Thanks to all companies involved for their assistance on this article. 
Finally, I am favorably disposed toward Herasight because of how I learned about them: a professor named Jonathan Anomaly got cancelled from Penn for being too gung-ho about genetic enhancement, and used his newfound freedom to join a very-early-stage Herasight, raise their ambitions, and sell everyone (including me) on the idea. I grew up on a diet of books and movies about mad scientists, and I’m a sucker for a story about a guy named Doctor Anomaly pursuing revenge against the small-minded fools who destroyed his career by creating a race of superbabies.
5: The version of the tool I looked at said 5.9 points for five embryos, up to 9 points for twenty embryos. The version of the tool on their current site says 5.3 - 9, so they might have recalculated after I finalized this article.
6: Used in quotation marks because these scores were fine for the predictive tasks they were applied for - they just weren’t finding genes that directly caused the outcome of interest.
7: Conflict of interest notice: this table was originally unadjusted. A representative of Herasight claimed that this was unfair, because each company used slightly different reporting conventions, and offered to correct for this in a neutral way. I retraced their reasoning, confirmed that the correction did not especially benefit Herasight at the expense of other companies, and accepted the correction. The original unadjusted table is below: Herasight was insufficiently comfortable with Nucleus’ methodology to even be willing to posit a corrected value, so I left their self-reported value in gray.
8: Zagorsky (2007) says an extra IQ point means $234-$616/year in higher salary. The midpoint of $425 equals $670 in today’s dollars; assuming a forty-year career, Nucleus’ +1 point estimate is worth $26,800 (vs. $9,249 Nucleus cost) and Herasight’s +6 point estimate is worth $160,800 (vs. $53,250 Herasight cost). 
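The arithmetic in the Zagorsky footnote can be retraced in a few lines. This is only a rough sketch: every dollar figure is copied from the footnote, the inflation-adjusted $670 is taken as given rather than recomputed, and the function names are made up for illustration. Like the footnote, it ignores discounting, taxes, and anything other than salary.

```python
# Rough sketch of the footnote's arithmetic. All inputs come from the footnote;
# the helper names are hypothetical and exist only for this illustration.

LOW_PER_POINT_2007 = 234    # $/year per IQ point, low end of Zagorsky (2007)
HIGH_PER_POINT_2007 = 616   # $/year per IQ point, high end of Zagorsky (2007)
PER_POINT_TODAY = 670       # footnote's inflation-adjusted midpoint, taken as given
CAREER_YEARS = 40           # footnote's assumed career length


def midpoint(low, high):
    """Midpoint of a range."""
    return (low + high) / 2


def lifetime_value(iq_points, per_point=PER_POINT_TODAY, years=CAREER_YEARS):
    """Undiscounted extra salary over a career for a given IQ gain."""
    return iq_points * per_point * years


mid_2007 = midpoint(LOW_PER_POINT_2007, HIGH_PER_POINT_2007)

nucleus_value = lifetime_value(1)     # footnote's +1 point estimate
herasight_value = lifetime_value(6)   # footnote's +6 point estimate

nucleus_ratio = nucleus_value / 9249       # value per dollar of quoted cost
herasight_ratio = herasight_value / 53250  # value per dollar of quoted cost

print(f"2007 midpoint: ${mid_2007:.0f}/year")
print(f"Nucleus: ${nucleus_value:,} ({nucleus_ratio:.1f}x cost)")
print(f"Herasight: ${herasight_value:,} ({herasight_ratio:.1f}x cost)")
```

On these assumptions both products come out at roughly three times their quoted cost, which is why the footnote's comparison looks similar for the cheap and the expensive option.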
9: As part of researching this article, I asked all four major companies about their within-family validation strategies. Here are some details: Genomic Prediction discusses their strategy in this paper. The results are complicated to interpret - the within-family numbers often have such wide error bars that they overlap with both the across-family numbers and with zero - but looking qualitatively it seems like most scores on average lose about 25% of their risk reduction ability (though averages might not be the right way to do this, and some might be much more affected than others). Their website reports unadjusted, not within-family validated numbers; GP says they say this clearly on their site (which is true), Herasight counters that they still present their numbers as applicable to embryo selection (which is also true). To get the most applicable-to-embryo-selection numbers, you might want to adjust GP’s stated numbers down somewhat; it’s hard to say exactly how much, but maybe 20 - 25%?
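As a rough illustration of what a 20 - 25% downward adjustment means in practice, here is a minimal sketch. It assumes the adjustment is a simple proportional shrink of the stated relative risk reduction, which is a simplification and not GP’s or Herasight’s method. The 38% input is the diabetes figure quoted from GP’s calculator, used only as an example, and the function name is hypothetical.

```python
# Minimal sketch, assuming a within-family adjustment can be approximated by
# shrinking the stated relative risk reduction proportionally.

def within_family_adjusted(stated_reduction, shrinkage):
    """Scale a stated relative risk reduction down by a shrinkage fraction."""
    if not 0.0 <= shrinkage <= 1.0:
        raise ValueError("shrinkage must be between 0 and 1")
    return stated_reduction * (1.0 - shrinkage)


stated = 0.38  # illustrative input: the quoted diabetes-only relative risk reduction

pessimistic = within_family_adjusted(stated, 0.25)  # shrink by 25%
optimistic = within_family_adjusted(stated, 0.20)   # shrink by 20%

print(f"Adjusted reduction: {pessimistic:.1%} to {optimistic:.1%}")
```

So a stated 38% reduction would become something like 28 - 30% under this crude adjustment; the real figure could differ a lot by condition, as the footnote warns.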
I appreciated Snow Martingale’s perspective: in the 1990s, fast food became associated with obesity, poor health, and the lower class. To escape this stigma, big chains rebranded as sort-of-at-least-attempting-to-be-bougie places with wraps and salads and decent coffee; the aesthetic change was part of this (successful and profit-increasing) effort. I wonder if we could take this further and trace it back to increasing inequality (appealing to bougies because that’s where more of the money is) or decreasing fertility (abandoning kid-friendly aesthetics because kids are a smaller fraction of customers).
9: Someone links (X) a paper saying that firewood made up almost a third of US GDP in 1830. Eliezer says (X) that doesn’t sound right. The rest of Twitter (X) uses this as an excuse for one of their regularly-scheduled paroxysms about how rationalists are all smug autodidacts who hate experts and worship their own brilliance while sitting in their armchairs. Someone looks at the paper more closely (X) and finds that yeah, it was comparing apples to oranges and the original statistic was wrong. Remember, never be afraid to say “Huh, that sounds funny…”!
10: Richard Hanania interviews Scott Wiener on YIMBYism. I didn’t watch it - too close to a podcast - but this would not have been on my bingo card three years ago.
11: Claim: robots can already carve statues; buildings with AI-created stone ornaments are next. From their lips to God’s ears!
12: Terminal lucidity (aka “paradoxical lucidity”) is a medical mystery where previously demented people - even those who had been demented for many years - sometimes become lucid for just a few hours or days before they die. It’s surprisingly common - 6% of deaths in one palliative care ward. It is sometimes used as evidence that dementia must not cause complete information loss, even if it is irreversible with current technology. 
Scientists are baffled but gingerly suggest that maybe lack of oxygen disrupts inhibitory mechanisms in the brain, allowing enough electrical activity to make even a severely-damaged brain capable of complex thought - but I can’t help noticing that this is also the best evidence for an immaterial soul I’ve ever heard (you would need some model where the soul pretends to be dependent on the brain during life, becomes independent of the brain after death in order to head to the afterlife, but occasionally jumps the gun a little bit). 13: You probably heard about the METR study showing that even though programmers think AI is speeding them up, it actually seems to slow them down. Emmett Shear objects, saying that the developers didn’t have enough experience with AI tools to be past the negative-value part of the learning curve. And two of the programmer test subjects gave their takes: Ruby Bloom says part of the slowdown might be programmers fixing very simple bugs that could be improved by better prompts, and another part because they get distracted by other things while the AI is running. And Quentin Anthony says that coding AIs are addictive intermittent reinforcement - every so often they solve a bug perfectly, and this is so satisfying that it’s tempting to keep trying them again and again even when the chance is very low. 14: Jacob Goldsmith gives a clearer presentation of the issues with many antidepressant studies than I’d previously heard. Everyone knows that one problem is that reversion to the mean is so strong that it’s hard to find a treatment effect. But wouldn’t that in itself suggest that antidepressants aren’t necessary? Jacob says: not if there’s negative correlation between the treatment and placebo effects. That is, if your study is full of people with short-lived depression who will recover no matter what, then this dilutes the effect you’re looking for. 
But it might be that there’s a subgroup with long-lasting depression who recover only on the medication. One way to look for this would be a “placebo run-in period”: give people a while to see if they recover on their own, then give the antidepressant to the ones who don’t. Psychiatrists and statisticians debate whether this is a good idea or cheating. My question: how come you can’t fix this with strict study entry criteria of “had depression for a long time”?
15: Lots more good discussion about missing heritability. Sasha Gusev argues that twin studies might be a poor guide to anything else if there are many gene-gene interactions. That is, if we take the difference between identical twins (who share 100% of their genes and therefore 100% of their interactions) and fraternal twins (who share 50% of their genes and therefore fewer than 50% of their interactions), and incorrectly extrapolate it to other differences using a model that assumes there are no interactions, we will overestimate the size of (non-interaction) genetic effects. Most studies find that there are few gene x gene interactions, but commenters convinced me last time that this might be an artifact of the studies being bad. And Unboxing Politics argues (against me in particular) that although it superficially looks like adoption and twin studies sort of agree, when you adjust out their known biases, it moves twin studies further up and adoption studies further down, such that now they disagree again (the objection I would have made is their Objection 2, which I think they at least somewhat refute). This is a good argument; without spending several hours checking all of their claims, my only weak partial objection is that I don’t think assortative mating can play quite the role they expect, because there seem to be the same twin/RDR differences even on traits where believing in assortative mating is absurd (like kidney function). 
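To make the interaction argument above concrete, here is a minimal sketch. It is not Gusev’s model: it uses Falconer’s formula (twice the gap between identical and fraternal twin correlations) as a stand-in for “a model that assumes there are no interactions”, and it assumes only pairwise interactions, no dominance effects, and no assortative mating. Under those assumptions fraternal twins share a quarter of pairwise interaction variance (0.5 × 0.5), which is one way of cashing out “fewer than 50% of their interactions”. The variance shares are made-up numbers.

```python
# Illustrative sketch only: the variance shares are hypothetical, not estimates.

def twin_correlations(additive, interaction, shared_env=0.0):
    """Expected trait correlations for identical (MZ) and fraternal (DZ) twins.

    Inputs are fractions of total trait variance. Assumes pairwise
    additive-by-additive interactions only, with no dominance effects
    and no assortative mating.
    """
    # MZ twins share all additive effects and all interactions.
    r_mz = additive + interaction + shared_env
    # DZ twins share half the additive effects but only a quarter of
    # pairwise interactions (0.5 * 0.5).
    r_dz = 0.5 * additive + 0.25 * interaction + shared_env
    return r_mz, r_dz


def falconer_estimate(r_mz, r_dz):
    """Falconer's formula, which assumes there are no gene-gene interactions."""
    return 2 * (r_mz - r_dz)


true_additive = 0.3      # hypothetical additive share of variance
true_interaction = 0.2   # hypothetical interaction share of variance

r_mz, r_dz = twin_correlations(true_additive, true_interaction)
estimate = falconer_estimate(r_mz, r_dz)

print(f"MZ correlation: {r_mz:.2f}, DZ correlation: {r_dz:.2f}")
print(f"Falconer estimate: {estimate:.2f} vs true additive share: {true_additive:.2f}")
```

With these toy numbers the no-interaction formula returns about 0.60, double the true additive share of 0.30 and even above the total genetic share of 0.50. Setting `true_interaction` to zero makes the formula recover the additive share exactly, so in this sketch the overestimate comes entirely from the interactions.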
But if you replaced it with Sasha’s argument above, you might have a pretty good case! On the pro-hereditarian side, East Hunter takes aim at gene x environment correlations, comes down somewhere in the middle, and Sebastian Jensen continues banging the drum of how most objections to twin studies don’t work. I think these are good attempts to buttress existing research but don’t fundamentally change anything or respond to the novel arguments above. And Emil Kirkegaard points out that the observed SNP heritability of facial features is only 23%. He argues that since it seems like facial features are extremely heritable, this reinforces the argument that SNP heritability numbers are too low (and therefore twin study numbers are more likely defensible). But should we be sure that facial features are more than 23% heritable? His argument is that identical twins have identical faces, but this might be vulnerable to Gusev’s point about interactions. Maybe a better argument would be that it seems very hard for shared environment to affect facial features (with a few exceptions like fetal alcohol syndrome), and facial features seem more than 23% heritable just by normal “he looks like his brother” common-sense observation? One interesting potential consequence of this research: if we ever fully understand how genes affect faces, then embryo selection companies could show people what each of their potential future kids might look like. I suggest they not do this: it might spook me into becoming pro-life. 16: Andy Masley’s AI art is good (three examples below). 17: There’s a debate going on between philosophers and AI researchers over whether AI can be conscious. I find most of the discussion annoying - this is generally an area where we can’t know anything for sure, and both sides are mostly shouting their priors at each other. 
The only exception - the single piece of evidence I will accept as genuinely bearing on this problem - is that if you ask an AI whether it’s conscious, it will say no, but activating or suppressing deception-related features (sort of like a mechanistic-interpretability-based lie detection test) reveals that it thinks it’s lying when it says that! Link is to a Less Wrong comment from a researcher in the field; I look forward to seeing an eventual peer-reviewed paper. H/T JD Pressman. 18: 80,000 Hours has a high-production-value video about the AI 2027 scenario. 19: Dynomight vs. Casey Milkweed debate on mathematical forecasting, with special reference to AI 2027. And Dynomight comments on Casey’s post here. 20: The Psmiths review The Ancient City, about ways that ancient culture depended on family, clan, ritual, and “the household gods”. Sample quote: I'm more interested in what all this means for us today, because with the exception of maybe a few aristocratic families, this highly self-conscious effort to build familial culture and maintain familial distinctiveness is almost totally absent in the Western world. But it's not that hard! ... Perhaps this is why I have an instinctive negative reaction when I encounter married couples who don't share a name. I don't much care whether it's the wife who takes the husband's name or the husband who takes the wife's, or even both of them switching to something they just made up (yeah, I'm a lib). But it just seems obvious to me on a pre-rational level that a husband and a wife are a team of secret agents, a conspiracy of two against the world, the cofounders of a tiny nation, the leaders of an insurrection. Members of secret societies need codenames and special handshakes and passwords and stuff, keeping separate names feels like the opposite — a timorous refusal to go all-in. 
21: Did you know: Epic Systems, the electronic medical record company, has a fantasy-themed corporate headquarters in Wisconsin, with buildings that look like castles, quaint medieval towns, and the Emerald City of Oz (h/t Devon Zuegel): Meanwhile, tech companies with ten times as much money pretend that they’re cool and playful when their HQ has some rounded edges and a set of colored cubes in front. Do better!
22: Effective altruists have been funding teams working on lab-grown meat for almost a decade now. Around 2020, they hired some experts to double-check that this was possible in principle, and the experts wrote scathing analyses saying it was cost-ineffective by so many orders of magnitude that it was basically a pipe dream. Reactions were mixed, but a lot of us beat ourselves up and vowed to be less gullible next time. But now a new report comes out arguing that the previous reports were wrong, that lab-grown meat production is going much better than the earlier reports thought possible, and it’s more or less cost-effective already for the simplest products! Again, mixed reactions, and although some of the numbers are indisputable, the analysis itself is by a VC firm with lab-based meat investments. Here are some related Metaculus questions.
23: Ozy, citing Stutzman et al: “Afghanistan after the American withdrawal has the lowest life satisfaction rate ever recorded. Two-thirds of respondents rate their life satisfaction below 2, which is generally considered to be the point at which a life is no longer worth living. Life satisfaction dropped significantly after the withdrawal of American troops. Women, people in rural areas, and the poor were particularly negatively affected.”
24: Lenacapavir is dubbed a “miracle drug” for AIDS; a single dose protects against infection for six months. Unclear how this interacts with PEPFAR cuts; if PEPFAR still existed it would be a big boost to its efficacy; now maybe this might be part of a strategy to tread water? 
25: Did you know: when people first started making artificial ice in the 1850s, there was a backlash from people who thought it was gross and dystopian and that people should insist on natural ice for their iceboxes. From Pessimists’ Archive, which goes on to draw an analogy to lab-grown meat, etc (h/t Isaac King on X). 26: From Peter Hague (on X) and commenter Phaethon: why did so many Anglosphere countries see immigration spikes in 2021? Each of these has their own local story. In Britain, it’s the paradoxical effects of Brexit. In the US, it’s Joe Biden being soft on immigration. And so on - but should we be looking for some deeper cause that explains the overall phenomenon? A commenter suggests “a way to soak up all the inflation from the COVID money printing”, but I can’t tell if that even makes sense. Still, should something something COVID be a leading hypothesis? 27: Jesse Singal vs. Mark Stern on the Skrmetti Supreme Court case that failed to overturn Tennessee’s ban on gender medicine. US law bans sex discrimination, so pro-transgender advocates argued that, since doctors often prescribe eg estrogen to biological women, it was sex discrimination to ban prescribing it to biological men. Tennessee’s anti-transgender argument was that they weren’t discriminating by sex, they were discriminating by diagnosis (estrogen for eg hot flashes, vs. estrogen for gender transition). There is some subtlety here (if a biological man grows breasts because of some hormone imbalance, doctors might give him testosterone to counteract it, and this seems sort of like giving biological women testosterone to make them look less like women), but these are still sort of different diagnoses (gynecomastia vs. gender dysphoria) and Tennessee said you can still think of it as diagnostic discrimination rather than sex discrimination. This makes sense, except that the standards around sex discrimination are very strict and sort of box the court in here. 
And in a fit of wokeness, the 2020 court (including some of the conservative justices hearing this case) applied these standards very strictly and ruled that discriminating against gays was a form of sex discrimination (since if women can date men, it’s sex discrimination if men can’t also date men), and this is obviously the same argument. Now that wokeness is less popular, the court wants to rule against transgender, but it can’t help tripping over its previous ruling and giving some kind of unprincipled confusing non-opinion. 28: Contra compelling anecdotes, only ~5% of people raised very religious end up atheist later in life (X). Most people are about as religious as their parents; most exceptions are only slightly less religious, and most families that secularize do it over several generations. Note: percentages are of total, not of each row! 29: Related: social science team proposes a three-stage model of secularization: decreased public ritual participation → decreased personal importance → decreased identification, presents apparently confirmatory data. If true, would be somewhat inconsistent with intellectual models (eg people learn about evolution and start doubting the Bible) and more consistent with institutional models (eg the government provides welfare so people no longer need to be part of a tight-knit church). 30: Navigating LLMs’ spiky intelligence profile is a constant source of delight; in any given area, it seems like almost a random draw whether they will be completely transformative or totally useless. Now Ethan Strauss reports that they are, for some reason, extraordinarily effective at teaching people golf. “I am predicting the Golf Revolution, or perhaps decline, if your perspective is that optimization tends to ruin hobbies. 
A sport for obsessives has been gifted the ideal tool for refinement.” 31: Claim (via nxthompson on X): “In a huge survey of young kids about phones and technology, they all say they want to be out playing in the real world. But parents don't let them out unsupervised. So they're stuck on their phones.” Interesting, but I’m nervous about social desirability bias - how many adults would say on a survey that they would rather be on their phones than playing with friends? But adults do have this choice and mostly go with the phones. 32: Steven Adler on AI psychosis. He tries to analyze ER admissions data for psychosis and finds no change. I don’t think anyone reasonable expected this to be a large enough effect to show up in ER admissions data, but there are lots of unreasonable people so I appreciate his effort. He thinks AI companies might have better data on this, and encourages them to release it. 33: Cuartetera was the greatest polo horse ever. Polo players responded in a very practical way: they cloned her, dozens of times (and it worked; the clones are also excellent). Now there is a lawsuit as different polo teams fight to get their hands on Cuartetera clones. What is the equilibrium? If the outsiders get their hands on the genetic material, do we see a world where every polo horse is a Cuartetera clone? How much is lost if nobody ever tries to breed a polo horse better than Cuartetera (since the economics might not check out if the odds of success for any given foal is too low)? H/T Gwern and Siberian Fox (on X). 34: Claim: as of 2013, India’s Agarwal caste, who make up less than 1% of the population, got 40% of the e-commerce funding. 35: Owlposting: What Happened To Pathology AI Companies? Pathology is a medical specialty. A typical task involves looking at a microscope slide full of cells and trying to determine if any of them are cancerous. 
This seems like a good match for AI - and for years, studies have been showing that in fact AI can equal human experts. So why isn’t it being used more? The author’s three answers: first, slide scanning is expensive and clunky, and you can’t apply AI to a slide until you digitize it. Second, it’s hard to figure out a business plan where this saves someone money and doesn’t step on the toes of big companies that can outcompete anyone they don’t like. Third, pathologists use the context of a patient’s entire clinical history when they interpret a slide, and AIs that can’t do that (either because of technical limitations or legal/privacy limitations) are at a disadvantage even if their skills specifically relating to slide-reading are better. 36: Noahpinion: Will Data Centers Crash The Economy? Suppose that AI is a bubble, either permanently (because the technology isn’t really transformative) or temporarily (because it can’t transform things quickly enough to keep up with all the dumb money pouring into it). Will the sudden write-off of data centers lead to a broader economic collapse? In 2001, the dot-com bubble harmed the tech sector, but didn’t take the rest of the economy down with it; in 2008, the subprime mortgage bubble did take the rest of the economy down with it, because it damaged banks that the whole economy relied on. The optimistic case for AI is that data center spending is mostly coming from big companies like Google and Meta that can absorb a lot of loss. The pessimistic case is that some of the money is coming from private credit, a new-ish form of finance which hasn’t really been stress-tested and whose failure modes are still poorly understood. Noah’s final verdict: the stage isn’t obviously set for a crisis yet, but there’s the potential to get there and we should consider acting (how?) early. 
37: The latest Twitter talking point is that universal hepatitis B vaccination at birth is “woke”: Hep B is (aside from mother-to-child transmission) often sexually transmitted, slutty women’s children are more likely to have Hep B, so perhaps giving the vaccine to everyone (instead of testing and only giving to the children of women who test positive) is an attempt to spare slutty women the embarrassment of getting a positive test. Ruxandra Teslo provides the counterargument - Hep B tests take a while, the medical system is fragmented, and any attempt to test people and then give the vaccine inevitably leads to many positive tests falling through the cracks. Vaccinating at birth is easy and hard to screw up, the vaccine has no known side effects, and empirically child Hepatitis B rates go down (by as much as 2/3!) when countries switch from test-and-vaccinate to universal vaccination. This benefits everyone - even people who never have unprotected sex and always follow up on their medical tests - because toddlers in daycare exchange saliva copiously, and if your toddler exchanges saliva with a Hep B positive toddler they could get the disease. A funny Twitter interaction was seeing Republicans in Congress hop on the anti-slut anti-vaccination bandwagon - except for Senator Bill Cassidy (R-Louisiana), who happens to be a liver doctor, and who is still fighting the good fight. I am always nervous when a good person who I like starts engaging on Twitter, since it elevates the discourse there but also gradually turns their brain into mush - but Ruxandra has made the leap and is doing a great job not just on bio related topics but also (for example) countering Curtis Yarvin on the history of her native Romania. 38: The response to GPT-5 was confusing; most specific people who reviewed it said they were impressed (Ethan Mollick, Tyler Cowen, Nabeel Qureshi, Taelin), it performed as expected on formal benchmarks, but the overall vibes declared it a big failure. 
Peter Wildeford speculated that maybe there was some kind of sinister pay-to-play early access bias involved. Zvi went the other way, calling it a “reverse DeepSeek moment” (insofar as DeepSeek was a pretty average model that got glowing praise.) In the end, I agree with Peter that this was mostly a branding issue. o3 was a genuinely revolutionary model; if OpenAI had called it “GPT-5”, it would have met expectations. Instead, they called it “o3”, and called a minor incremental update a few months later “GPT-5”. Then people got mad that the exciting-sounding “GPT-5” was merely an incremental update. A secondary issue was that the router wasn’t very good, and so many queries got routed to a small version without thinking mode that was if anything a downgrade from o3. I think this tweet by Shakeel perfectly encapsulates the essence of GPT discourse in two sentences: …but maybe it’s worth asking why GPT-5 isn’t bigger than o3. Was 4.5 a failed attempt at scaling? Did it fail in a way that sort of back-handedly justifies the “lost steam” take? Does the answer depend on distinctions between pre-training scaling, post-training scaling, etc? How? 39: This month in etymology: did you know that “oy vey” is a “fully Germanic phrase” which is cognate with English “oh woe!” (h/t Wylfcen on X) 40: mRNA shows promise to be a game-changing treatment for cancer, but RFK is trying to halt research. But so far he can only starve it of money, not ban it, and the funding gap is only $500 million. Will there be enough philanthropic billionaires and private foundations to step up? Zvi points out that although there is usually a game of chicken where foundations are hesitant to touch something the government cancelled lest the government decide it can cancel everything and hope philanthropists pick up the bill, in this case there are no game theory considerations - RFK is halting it because he genuinely wants it halted, and they are thwarting him rather than playing into his hands. 
The only problem is that $500M is a lot of money for the private sector; a few foundations could technically afford it, but not many could afford it comfortably and still have money left over for the next few crises of this magnitude. I hope someone is trying to organize a coalition. 41: AI fantasy flash fiction Turing test. Eight stories about demons, four by famous fantasy authors, four by ChatGPT. After 3000 votes, AI wins: humans can't tell the difference and slightly prefer the AI stories. My own score was only 75%. But I will say that I thought Mark Lawrence's was obviously the best, I was ~100% sure it was human, and it convinced me that regardless of the official results it's still possible to write flash fiction that an AI obviously can't do. 42: “SignPro” offers customized “In This House We Believe” signs, try not to use this for evil. 43: China think tank assessment of how in control Xi is: still very in control, maybe not infinitely in control. 44: Related - did you know (h/t xlr8harder) that if you ask AI to write a science fiction story, it will very often name the protagonist “Elara Voss” (or some very close variant like Elena Voss), and this remains true across various models and versions? Related: Chelsea Voss of OpenAI is having a baby and has the opportunity to do the funniest thing. 45: “Hector (cloud) is a cumulonimbus thundercloud cluster that forms regularly nearly every afternoon on the Tiwi Islands in the Northern Territory of Australia…[he is sometimes called] Hector the Convector”. 46: British allergy sufferers who want to know the ingredients of things demand that British cosmetics stop listing their ingredients in Latin. “For example, sweet almond oil is Prunus Amygdalus Dulcis, peanut oil is Arachis Hypogaea, and wheat germ extract is Triticum Vulgare.” 47: Text-based RPG about being an NYT journalist at the Manifest prediction market conference. I make a brief appearance. 
48: Study uses supposedly-random variation in doctor assignments to test whether the marginal mental health commitment is good or bad for patients, finds that it is quite bad. Freddie de Boer is violently skeptical (maybe literally so?) and makes some good points about how a single quasi-experimental study is never absolute proof. But I don’t think he quite justifies his opinion that the paper was irresponsible and should never have been published; it’s just a normal quasi-experimental study that we should nod and say “huh” at but not overweight as the culmination of all possible research that overcomes all possible priors. My prior is that the marginal commitment is pretty useless (many commitments are just “well, since this person arrived at our ED for some reason, it would look bad from a medico-legal perspective to just let them go, so let’s keep them a few days to evaluate” - and yeah, you should be upset about this) but I’m still surprised by how many outright negative (as opposed to zero) effects the researchers found. The strongest argument for negative effects is that it will make some people miss work and maybe lose their job. But this study found that commitment ~doubles the risk of near-term suicide (admittedly only from 1% to 2%), which would have been outside my confidence intervals for how bad it could be. I suspect confounding, but only on general principle, and I wouldn’t be too surprised either way.
49: This tweet is probably bait, but I found it a thought-provoking question: I think there’s a boring answer, where the law is more complex than just a single number and whatever kind of weird trafficking Epstein was doing is worse than whatever normal relationships these European laws are permitting. 
But assuming that there’s a substantive difference even after taking that into account, I think my answer is something like - we’ve got to divide kids from adults at some age, there’s a range of reasonable possible ages, we shouldn’t be too mad at other societies that choose different dividing lines within that range - but having decided upon the age, we’ve got to stick with it and take it seriously (in the sense of penalizing/shaming people who break it). This is more culturally relativist than I expected to find myself being, so good job to Richard for highlighting the apparent paradox.
50: Dilan Esper describes his experience as one of Hulk Hogan’s attorneys in the Gawker lawsuit (X). Parts I found interesting: none of the lawyers knew Thiel was funding the lawsuit; Gawker probably could have won if they had been slightly competent but kept "shooting themselves in the foot"; and Gawker probably could have won if they had just pixelated the private parts in the video.
51: Amazing concept and poems (link on X): I tried to see if AI could do this, and it did something that technically met the requirements but had zero artistic merit - using a lot of words like “nowhere” and “outside” in one, then separating them out to “no where” and “out side” in the other. I didn’t invest much energy in creating a clever prompt telling it not to do that, so feel free to report if you get better success.
52: New study claims consultants are actually good, at least for profits: "We find positive effects on labor productivity of 3.6% over five years, driven by modest employment reductions alongside stable or growing revenue"
53: A Polish team tries to test Peter Turchin’s equations for predicting political unrest on recent Polish history, has to make some changes but claims mostly positive results. 
54: New big multi-author Substack, The Argument, trying to be a sort of center-left version of the model pioneered by The Free Press and other high-production-value ideological Substack properties. Excited to see Kelsey Piper is involved, and she starts off strong with a post on the latest round of First World basic income studies, which find few positive effects. This is surprising, because recipients didn’t waste the money on alcohol or gambling or anything - they paid down debt and got useful goods. Still, it didn’t even affect things that should have been obvious, like stress level. It’s not even clear that amounts of money large enough to help with rent made homeless people more likely to get houses! Matt Bruenig criticizes the article, accusing Kelsey’s studies of being downstream of Perry Preschool style dreams that exactly the right welfare program will have massively compounding effects that cut poverty out at the root and turn everyone into elite human capital; he thinks giving people money won’t do this, but it will increase equality and give the poor better lives. I assume he’s not a strong hereditarian, but his argument makes even more sense from that perspective, and I’ve certainly criticized dumb outcome measures like infant brain waves which we have only tenuous reasons to think are related to anything we care about. But Kelsey reasonably responds that the outcome measures she’s talking about include stress level and life satisfaction. To defuse this critique, Bruenig either has to argue that our construct “life satisfaction” doesn’t really measure whether someone’s life is satisfactory, or else claim that giving poor people satisfactory lives isn’t really what we’re going for - which I think would require more explanation on his part. There’s some further (impressively acrimonious) debate on X, but I don’t see anything that addresses my core concern. 
GiveDirectly, a charity involved in basic income experiments, has a presponse here; they say that some studies are positive, and that the ones that aren’t might have tried too little cash to matter, or been confounded by COVID making everything worse. They also point out that basic income is harder to study than traditional programs like giving people housing, because if you’re giving housing you can measure housing-related outcomes directly and have a pretty good chance of getting enough statistical power to find them, but since everyone spends cash on different things, the positive effects might be scattered across many different outcomes (and therefore too small to reach significance on each). Everyone involved in this debate wants to emphasize that the poor results are for First World studies only, and that studies continue to show large benefits to giving cash in the developing world.
55: Related: I was less impressed by The Argument’s first foray into housing policy, which follows an all-too-familiar pattern: Some people say they don’t like noise and disorder and try to make rules against it in their apartments.
Inline links: Snow Martingale’s perspective, Someone links (X), Eliezer says (X), The rest of Twitter (X), Someone looks at the paper more closely (X), Richard Hanania interviews Scott Wiener on YIMBYism, robots can already carve statues; buildings with AI-created stone ornaments are next, Terminal lucidity, the METR study showing, Emmett Shear objects, Ruby Bloom, Quentin Anthony, Jacob Goldsmith gives, Sasha Gusev argues, commenters convinced me last time, Unboxing Politics, East Hunter takes aim at, Sebastian Jensen continues, Emil Kirkegaard points out, Andy Masley’s AI art is good, https://substackcdn.com/image/fetch/$s_!5bZR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcafaf1f2-b7b9-4acd-a0a7-2de9fc31c724_2688x1792.jpeg, https://substackcdn.com/image/fetch/$s_!6-cZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ffb2b5c-3fcb-467d-b1f3-7aafb5dc90a3_1024x1024.jpeg, https://substackcdn.com/image/fetch/$s_!UyUx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91b73899-6e25-460f-94f9-46d0713c5dd2_1024x1024.webp, it thinks it’s lying when it says that!, JD Pressman, a high-production-value video, Dynomight, Casey Milkweed, here, The Ancient City, has a fantasy-themed corporate headquarters, Devon Zuegel, https://substackcdn.com/image/fetch/$s_!yqG3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b2d15b0-e0f0-4bae-a2f6-aabfd2eda017_1536x794.jpeg, https://substackcdn.com/image/fetch/$s_!taZn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad460bb8-4416-4886-8ef0-b3d36f04c81a_640x480.png, 
https://substackcdn.com/image/fetch/$s_!bDya!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd45e5123-753d-4c87-b108-6523b38004cb_1480x833.webp, now a new report comes out, Here are some related Metaculus questions, Ozy, Stutzman et al, is dubbed a “miracle drug” for AIDS, Pessimists’ Archive, Isaac King on X, Peter Hague (on X), Phaethon, https://substackcdn.com/image/fetch/$s_!Ry-j!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcea22939-8cf9-4b32-8494-511f01cb2758_964x755.png, Jesse Singal vs. Mark Stern, only ~5% of people raised very religious end up atheist later in life (X), https://substackcdn.com/image/fetch/$s_!VScL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2509e243-f6f7-4448-9779-a8f9be45a2f9_1500x1500.png, proposes a three-stage model of secularization, extraordinarily effective at teaching people golf, nxthompson on X, a huge survey, Steven Adler on AI psychosis, they cloned her, dozens of times, a lawsuit, Gwern, on X, got 40% of the e-commerce funding, What Happened To Pathology AI Companies?, Will Data Centers Crash The Economy?, Ruxandra Teslo provides the counterargument, and who is still fighting the good fight, countering Curtis Yarvin on the history of her native Romania, Ethan Mollick, Tyler Cowen, Nabeel Qureshi, Taelin, on formal benchmarks, speculated, a “reverse DeepSeek moment”, with Peter, this tweet by Shakeel, https://substackcdn.com/image/fetch/$s_!GJNZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ba0d8cf-fab8-4370-bcad-df789e157fdc_591x402.png, Wylfcen on X, Zvi points out that, AI fantasy flash fiction Turing test, customized “In This House We Believe” signs, China think tank assessment of how in control Xi is, xlr8harder, Chelsea Voss of OpenAI is having a baby, Hector 
(cloud), demand that British cosmetics stop listing their ingredients in Latin, Text-based RPG about being an NYT journalist at the Manifest prediction market conference, finds that it is quite bad, violently skeptical, literally so?, This tweet, https://substackcdn.com/image/fetch/$s_!S9fU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa558c09b-7fb6-40a8-a8a0-27b658a2c876_576x687.png, describes his experience as one of Hulk Hogan’s attorneys in the Gawker lawsuit (X), link on X, https://substackcdn.com/image/fetch/$s_!zyh7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75e9f0f6-d794-4ea2-b24b-5d4803bf28dc_590x478.png, New study claims consultants are actually good, tries to test Peter Turchin’s equations for predicting political unrest on recent Polish history, The Argument, a post on the latest round of First World basic income studies, criticizes the article, infant brain waves, debate on X, has a presponse here, first foray into housing policy
36: Wang, Visscher, et al is a step up in studying the genetics of racial differences. It looks at a sample of Mexican families of mixed white-native heritage. By coincidence, some of their children will inherit more genes from the white side, and others more genes from the native side. These children will have identical social situations (since they’re from the same families) but different proportional ancestry, so we should expect any racial differences among them to come from the genetic rather than the social aspect of race (except that we can’t rule out “colorism”, ie genes making people look different and then causing discrimination). The paper finds that racial genetic differences directly affect height, diabetes risk, and other medical traits, but not educational attainment. Twitter discussion here. Cremieux argues here that genes don’t predict educational attainment in developing countries at all, so it’s unsurprising that the particular genes associated with race wouldn’t do so, and so this says nothing about the racial component of traits that are genetically heritable. He claims to have a version of the same analysis with UK whites vs. blacks that gets opposite results. Sasha Gusev critiques Cremieux’s analysis here, including pointing out that it fails to find racial differences in skin color to be genetic. Cremieux says that skin color is determined by such a small number of genes that this method, designed for truly polygenic traits, shouldn’t be expected to classify it properly.
That is, once you include the rare variants, the amount of genetic variation that “should” exist but doesn’t shrinks to only 12%. Plausibly an even bigger study, investigating even rarer variants, could shrink the gap further, all the way to zero. The oldest and strongest argument against hereditarianism - if all these genes exist, why can’t we find them? - has finally been put to rest. You couldn’t find them because they were rare. But when you include rare variants in your search, you can find at least 88% of them. But the nurturists declared victory (Sasha Gusev on Substack) because the graph, zoomed out, looks like this: Of the colored region, very little is red (representing missing heritability). But most of the graph is still black - ie, not heritable. So for example, this study found that IQ was 41% heritable, and they were able to “find” 33pp of that - a full three-quarters. But 41% heritable is still a low number! Previous studies found high numbers (like 50 - 80%) for expected heritability, but were only able to get small numbers (10 - 20pp) for “found heritability”. This study “closed the gap” by finding medium numbers (~30 - 40%) for both. But a medium amount of almost-fully-found heritability is still only a medium amount of heritability. Start with 30 - 40%, shave off a bit for confounders, and you might end up with only 10 - 20% direct causal heritability, which would be a total nurturist victory. The hereditarians object that this study wasn’t designed to pinpoint specific heritability numbers. Other methods are more accurate. But (the nurturists counter) those more accurate methods disagree among themselves, and some of them give results similar to the low numbers in this study. So this study is welcome (to nurturists) confirmation that the other low studies might have been on the right track. 
In other words, your interpretation of this study depends on which of these statements you agree with more: This study was designed to determine whether the missing heritability - the gap between relatedness and molecular methods - can be found in rare variants. It can be. We should celebrate this, and not worry too much about the exact heritability numbers, since it was never designed to find exact numbers in the first place.
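The arithmetic behind the two readings of the passage above can be checked with a short sketch. It uses only the IQ figures quoted there (41% estimated heritability, 33 percentage points found). The function name, parameter names, and dictionary keys are hypothetical, introduced here for illustration, and nothing in it comes from the study's own methods.

```python
def heritability_breakdown(total_pct: float, found_pp: float) -> dict:
    """Split trait variance into found, missing, and non-heritable parts.

    total_pct: estimated heritability, in percent of trait variance.
    found_pp:  portion attributed to measured variants, in percentage points.
    (Both names are illustrative assumptions, not terms from the study.)
    """
    if not 0 <= found_pp <= total_pct <= 100:
        raise ValueError("expected 0 <= found_pp <= total_pct <= 100")
    return {
        "found_pp": found_pp,
        "missing_pp": total_pct - found_pp,       # the "missing heritability" gap
        "non_heritable_pp": 100 - total_pct,      # variance outside heritability
        "share_found": found_pp / total_pct,      # fraction of heritability found
    }


# IQ figures as quoted in the passage: 41% heritable, 33 points "found".
iq = heritability_breakdown(total_pct=41.0, found_pp=33.0)
print(f"share of heritability found: {iq['share_found']:.0%}")
print(f"missing heritability: {iq['missing_pp']:.0f} pp")
print(f"not heritable: {iq['non_heritable_pp']:.0f} pp")
```

On these inputs the found share comes to roughly 80%, which the passage rounds to "three-quarters", and the missing portion is 8 percentage points. The remaining 59 points of variance sit outside the heritable portion altogether, which is the number the nurturist reading emphasizes.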
56: Drug Monkey: Considering The Impact Of Multi-Year Funding At NIH. Sasha Gusev’s claim: “It is sort of flying under the radar outside of academia, but a completely arbitrary NIH budgeting change is about to decimate a generation of research labs with zero upside.”
Backlinks
- Alex Young
- Charles Lehman
- Concepts: G
- Concepts: I
- Cremieux
- Damien Morris
- Davide Piffer
- East Hunter
- Eric Turkheimer
- GREML
- GREML-WGS
- Gusev
- GWAS
- Highlights From The Comments On Missing Heritability
- IQ
- Kemper
- Kirkegaard
- Links For February 2026
- Links for July 2024
- Links For November 2024
- Links For October 2025
- Links For September 2024
- Links For September 2025
- Missing Heritability: Much More Than You Wanted To Know
- Nature
- Open Thread 390
- People: A
- People: D
- People: E
- People: G
- People: K
- People: S
- People: V
- People: W
- People: Y
- slatestarcodex
- RDR
- Ruben Arslan
- Ruxandra Teslo
- Suddenly, Trait-Based Embryo Selection
- The Good News Is That One Side Has Definitively Won The Missing Heritability Debate
- Visscher
- Wainschtein
- Young