Young
Article
Young is a recurring person in the Astral Codex Ten archive, appearing 3 times across 3 issues between June 26, 2025 and July 31, 2025. The archive places him in contexts such as “Young’s Icelandic sample was representative of the country”; “Markel’s 8% for EA is very different from Young’s Icelandic estimate of 40%”; and “Young’s Icelandic estimate (~40%)”. He most often appears alongside IQ, Sasha Gusev, and 23andme.
Metadata
- Category: People
- Mention count: 3
- Issue count: 3
- First seen: June 26, 2025
- Last seen: July 31, 2025
Appears In
- Missing Heritability: Much More Than You Wanted To Know
- Highlights From The Comments On Missing Heritability
- Suddenly, Trait-Based Embryo Selection
Related Pages
- IQ (3 shared issues)
- Sasha Gusev (3 shared issues)
- 23andme (2 shared issues)
- Alex Young (2 shared issues)
- Arthur Jensen (2 shared issues)
- Awais Aftab (2 shared issues)
- Cremieux (2 shared issues)
- EA (2 shared issues)
- Eric Turkheimer (2 shared issues)
- GREML (2 shared issues)
- Gusev (2 shared issues)
- GWAS (2 shared issues)
Source Context
Recovered passages from the original issue text. When the raw archive preserved outbound links inside the source passage, they are listed directly under the quote.
Maybe there are genes we haven’t found yet
For most of the 2010s, hypothesis 2 looked pretty good. Researchers gradually gathered bigger and bigger sample sizes, and found more and more of the missing heritability. A big 2018 study increased the predictive power of known genes from 2% to 10%. An even bigger 2022 study increased it to 14%, and current state of the art is around 17%. Seems like it was sample size after all! Once the samples get big enough we’ll reach 40% and finally close the gap, right? This post is the story of how that didn’t happen, of the people trying to rehabilitate the twin-studies-are-wrong hypothesis, and of the current status of the debate. Its most important influence/foil is Sasha Gusev, whose blog The Infinitesimal introduced me to the new anti-hereditarian movement and got me to research it further, but it’s also inspired by Eric Turkheimer, Alex Young (not himself an anti-hereditarian, but his research helped ignite interest in this area), and Awais Aftab. (while I was working on this draft, the East Hunter Substack wrote a similar post. Theirs is good and I recommend it, but I think this one adds enough that I’m publishing anyway. You can see Gusev’s response to East Hunter here) In an interview with Aftab, Gusev explained his philosophy like so (I am excerpting heavily from a long interview and editing for flow/emphasis; completionists should read the whole thing): For teacher-reported ADHD, the twin heritability estimate was 69% while the GWAS-based heritability estimate [ie using genome-wide association studies where researchers actually try to find the genes involved] was just 5%; with similar gaps for other behavioral traits. These are huge differences! If we believe the twin study estimates, then this gap implies that there is a lot of causal genetic variation out there that GWAS/molecular data is not picking up.
One way to think about this is that traits that are under stronger natural selection will have more of their genetic variants driven to low frequency, and thus less detectable by GWAS. So a big gap between GWAS and twins could imply that rare variants are very important due to strong selection. On the other hand, if we are skeptical of the twin study estimates, then this gap implies a substantial contribution from those environmental complexities I talked about previously. For a long time, the field of molecular genetics was operating under the assumption that the missing heritability was largely in the rare variants we had not yet measured. But a number of recent advances have started to tip the scales against that argument. First, some of the earlier molecular heritability estimates were found to be inflated by some mix of technical issues and cultural transmission, so the amount of missing heritability actually increased. Second, a new model was developed that could estimate total direct heritability using molecular data from mother-father-child trios, with very few model assumptions (the title literally states “… without environmental bias”; Young et al. 2018), and it too found estimates that were substantially lower than twins on average. Third, several studies have now actually measured the influence of rare variants in various forms, and they are so far not adding up to explain as much as we would expect from twin heritability estimates. Fourth, there is little evidence of the strong natural selection that would be needed to generate a massive trove of rare variants untagged by GWAS. I am a molecular geneticist, and this drumbeat of evidence from molecular data has convinced me that twin studies are either 2-3x inflated or estimate something fundamentally different from direct heritability. 
We’ll start by looking at Gusev’s first claim: that “earlier molecular estimates” (ie polygenic scores) are significantly inflated, or at least don’t mean what we thought they meant. This won’t be directly relevant to our question - even our original number of 17% implies missing heritability [2], so moving it down a bit to 5-10% or up a bit to 20% doesn’t add or subtract from the fundamental mystery. But this discussion has gotten a lot of people extremely confused, and we’ll need to deconfuse ourselves if we’re going to get any further.
Are Most Current Polygenic Scores Confounded?
A polygenic score is one possible result of a genome-wide association study. These scores are algorithms which take a person’s genes as input and return information about their traits as output. Better polygenic scores can predict a higher percent of variance in a certain trait. For example, the latest polygenic score on educational attainment can predict up to 17% of the variance in how much schooling someone completes. Predictive power is different from causal efficacy. Consider a racist society where the government ensures that all white people get rich but all black people stay poor. In this society, the gene for lactose tolerance (which most white people have, but most black people lack) would do a great job predicting social class, but it wouldn’t cause social class [3]. It certainly wouldn’t be a “gene for social class” in the sense where it controls the part of your brain that helps you manage money, or where genetic engineering on this gene would make people richer. Here are three common ways that not-directly-causal genes can show up as predicting a trait: Population stratification: genes are linked to culture, and culture determines the trait, as in the racism-lactose example above.
Many studies naturally mitigate this concern by using the UK Biobank of mostly white British samples, and by correcting for “principal components” that correspond to ancestry (and there are other, even more complicated ways to correct for this). But ancestry variation is fractal; no matter how uniform your sample, there will still be micro-differences you didn’t consider. For example, if you’re analyzing the educational attainment of white British people, it’s very relevant that families with Norman surnames still outperform their Saxon peers at Oxbridge admissions 900 years after William the Conqueror. If Britons with more Norman ancestry have non-education-related genes that their Saxon peers lack, these could be mistakenly classified as genes for education or other behavioral differences between the two groups. Assortative mating: Suppose that both height and wealth are desirable qualities in a mate. Then tall people will tend to marry rich people, and over generations, the same people will be both rich and tall. That means that even if wealth is 0% genetic, a study looking for “the gene for wealth” will be able to find genes that rich people have more often than poor people - namely, the genes for height. Or suppose that smart people tend to marry other smart people - surely true, if only because so many couples meet at college. Then all the intelligence genes will concentrate in the same people. So any study that tries to determine how much Intelligence Gene ABC affects intelligence will get inflated [4] results, because everyone with Intelligence Gene ABC will also have many other intelligence genes - if the study naively asks “How much smarter are people with Gene ABC than people without it?”, it will find they are much smarter (because it’s accidentally including part of the effects of all the other intelligence genes that travel along with it). Parent-to-child transmission, aka “genetic nurture”: Children tend to share their parents’ genes.
So if there’s a gene that causes parents to create a certain kind of childrearing environment, and that childrearing environment affects a trait, it will falsely look like a gene that directly causes the trait. Suppose Gene XYZ causes parents to read more books to their children, and reading books to children increases their IQ. Parents with Gene XYZ will tend to read books, so their kids will get high IQ. Those kids will also (probably) inherit Gene XYZ from their parents. So people with Gene XYZ will tend to have higher IQ. If you naively study which genes increase IQ, you’ll see Gene XYZ in more smart people than dumb people, and think it’s a “gene for IQ”. This is “causal” in a certain sense, but it’s not the one we traditionally think about, and it behaves importantly differently - for example, if you genetically engineer someone to have Gene XYZ, their IQ won’t go up (although their kids’ IQs might). How can we tell if a polygenic predictor is “direct” vs. confounded by these non-causal pathways? The most common technique is within-family comparisons: do the traditional “check if people with the gene differ on a trait from people without the gene” study, but limit its focus to (for example) sibling pairs. Suppose a couple has two children; the first child inherits Gene ABC and the second one doesn’t. If the first child is smarter than the second child, that provides some infinitesimal evidence that Gene ABC is a gene for intelligence. Repeat this process over hundreds of thousands of sibling pairs, and the infinitesimal evidence can reach statistical significance. Since the family unit is a perfect natural experiment that isolates the variable of interest (genes) while holding everything else (culture and parenting) constant, within-family results are protected against stratification, assortative mating, and genetic nurture effects. 
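The Gene XYZ story and the sibling-comparison fix can be illustrated with a small simulation. This is only a sketch under made-up assumptions: one variant, no direct effect on the child's trait, and a parental "nurture" effect of 0.5 per copy carried by the parents. The function name `simulate_gene_xyz` and all parameter values are hypothetical and are not taken from any of the studies discussed.

```python
import numpy as np


def simulate_gene_xyz(n_families=50_000, allele_freq=0.3,
                      direct_effect=0.0, nurture_effect=0.5, seed=0):
    """Return (population_slope, sibling_difference_slope) for one variant.

    All effect sizes are illustrative assumptions, not estimates from data.
    """
    rng = np.random.default_rng(seed)

    # Parental alleles, shape (family, parent, allele copy); each entry is 0 or 1.
    parents = rng.binomial(1, allele_freq, size=(n_families, 2, 2))
    # Copies of the variant carried by both parents together (0..4);
    # this drives the childrearing environment shared by both siblings.
    parental_copies = parents.sum(axis=(1, 2))

    def draw_child():
        # Each child inherits one randomly chosen allele from each parent.
        pick = rng.integers(0, 2, size=(n_families, 2, 1))
        inherited = np.take_along_axis(parents, pick, axis=2)[:, :, 0]
        genotype = inherited.sum(axis=1)
        trait = (direct_effect * genotype
                 + nurture_effect * parental_copies
                 + rng.normal(size=n_families))
        return genotype, trait

    g1, y1 = draw_child()
    g2, y2 = draw_child()

    # Naive design: regress the trait on genotype across families.
    population_slope = np.polyfit(g1, y1, 1)[0]
    # Within-family design: regress sibling trait differences on
    # sibling genotype differences, which cancels the shared environment.
    sibling_slope = np.polyfit(g1 - g2, y1 - y2, 1)[0]
    return population_slope, sibling_slope


if __name__ == "__main__":
    pop, sib = simulate_gene_xyz()
    print(f"population estimate: {pop:.3f}, within-family estimate: {sib:.3f}")
```

Under these assumptions the population estimate should land near the nurture effect (about 0.5) even though the variant does nothing directly, while the within-family estimate should land near zero.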
The culmination of this research program is Tan et al 2024, which finds that many polygenic predictors lose significant accuracy when retested among siblings. For example, educational attainment is 50% uncorrelated with direct genetic effects. You need to square this to figure out what percent is causal (0.5 squared is 0.25, and a quarter of 14% is about 4%); when you do that, you find that the polygenic score that explained 14% of EA is only 4%pp direct genes, with the other 10%pp being nondirect [5] confounders. So yes, it seems like most polygenic scores that don’t validate within families are confounded. However unhappy we previously were that we had only found 14% of genes for EA (vs. 40% expected), we should now be much more unhappy - we really only know 4% of genes that directly cause EA. On the other hand, you might say - so before we only knew 14%pp out of 40%. Now we only know 4%pp out of 40%. This is discouraging, but it doesn’t fundamentally change what we know about nature vs. nurture. Both 4%pp and 14%pp are less than 40% - with either number, we must be missing something or doing something wrong. Probably that’s insufficient sample size. We’ll keep working on sample size and other things, and eventually scrounge up the missing 26%pp or 36%pp or whatever of the variance, so this doesn’t change anything. All it means is that one predictive method that the average person never knew about in the first place doesn’t work as well as we thought. Who cares? Not doctors. So far this research has only just barely begun to reach the clinic. But also, all doctors want to do is predict things (like heart attack risk). They don’t care if they use causal vs. nondirect genes. It doesn’t matter if you’re “only” at higher risk of heart attack because you’re black, or Norman, or because your parents read books to you - you still need more heart attack medication! Polygenic embryo selection companies should care. They offer polygenic scores that can be used to select healthier or smarter embryos.
If the predictors they use rely partly on variants that aren’t causal within families, their real benefits could be far lower than advertised. I talked to one of these companies, who said they’d already adjusted for these effects and expected their competitors had too - the proper antidote to this problem, sibling controls, is a natural choice when you’re literally picking between siblings. The biggest losers are the epidemiologists. They had started using polygenic predictors as a novel randomization method; suppose, for example, you wanted to study whether smoking causes Alzheimers. If you just checked how many smokers vs. nonsmokers got Alzheimers, your result would be vulnerable to bias; maybe poor people smoke more and get more Alzheimers. But (they hoped) you might be able to check whether people with the genes for smoking get more Alzheimers. Poverty can’t make you have more or fewer genes! This was a neat idea, but if the polygenic predictors are wrong about which genes cause smoking and what effect size they have, then the less careful among these results will need to be re-examined. But the reason I spent so much time on the subject here is that this has confused a lot of people into thinking heritability itself was confounded and is actually just 4%. When I read my first few blog posts on these findings, I came away thinking they were claiming to have discredited twin studies and heritability. And although I take partial ownership of my own poor reading comprehension, I maintain that the way that the new anti-hereditarians discuss this is pretty bad. For example, Turkheimer’s treatment of the Tan study above is called Is Tan Et Al The End Of Social Science Genomics?, and includes passages like: The median [direct genomic effect] heritability for behavioral phenotypes is .048. Let that sink in for a second. 
How different would the modern history of behavior genetics be if back in the 80s one study after another had shown that the heritability of behavior was around .05? When Arthur Jensen wrote about IQ, he usually used a figure of .8 for the heritability of intelligence. I know that the relationship between twin heritabilities and SNP heritabilities is complicated, and in fact the DGE heritability of ability is one of the higher ones, at .233 [6]. But still, it seems to me that the appropriate conclusion from these results is that among people who don’t have an identical twin, genomic information is a statistically non-zero but all in all relatively minor contributor to behavioral differences. And comments included things like: I don’t know if [this study] is the end of social science genomics, but it should certainly be the end of attributing significant genetic influence to behavioral traits (despite the recent scientist-generated cartoons touting genes for “income”). And: There's no doubt that this reported findings have dealt a fatal blow to my conviction that behavioral traits are pre-eminently heritable…This is a remarkable example of an objective statistical fact mercilessly crushing the more subjective experiential sense of "A looks and acts more like B than C because A and B have the same parents." This subjective evidence is almost unshakable and universal in its application as a tried and tested psychosocial heuristic. And yet, here we are. Turkheimer is either misstating the relationship between polygenic scores and narrow-sense heritability, or at least egging on some very confused people who are doing that, and the dynamic was bad enough that I got confused myself for a while. But even more confusing, the new anti-hereditarians actually are saying that lots of behavioral traits have very low heritability! But this point requires different arguments, only tangentially related to these. So let’s move on to… Is Heritability Genuinely Low?
(Part 1: GWAS & GREML) In the mid 2010s, when genome-wide association studies (GWAS) based polygenic predictors were getting better every year, it was easy to hope they might reach 40% and close the “missing heritability”. But since then, progress has stalled. The second-to-last tripling of sample size, from 300K to 1M between 2016 - 2018, increased predictive power from 6% → 12%. The last tripling, from 1M to 3M between 2018 - 2022, only increased predictive power from 12% → 14%. If you graph sample size vs. predictive power, it looks like there's an asymptote between 15 - 20% or so. (of which - remember - only 5% is directly causal!) Worse, a mid-2010s technique called GREML allowed researchers to estimate the percent of variance in a trait that comes from the sorts of common genes studied in GWAS, without having to identify the genes involved. A 2016 GREML paper suggested that the maximum share of variance that GWASs of educational attainment could ever discover was about 21% (again, compared to 40% predicted genetic from twin studies). Since unavoidable methodological issues will prevent GWASs from reaching the literal maximum possible, this agrees with the evidence suggesting an asymptote between 15 - 20%. So either twin studies are wrong and traits are less heritable than believed, or the heritability must lie somewhere other than the common genes identifiable by GWAS. What about rare genes? GWASs focus on genetic variation common enough to be worth including in a basic genetic test. Most of this is single nucleotide polymorphisms (“SNPs”). A single nucleotide is one letter of DNA - for example, a C or a G. Polymorphisms are genes that commonly vary in humans - sometimes across races (for example, some humans have a gene for light skin, and other humans have a gene for dark skin), and other times within races (for example, some white people have a gene that makes cilantro taste like soap, and others don’t). 
So SNPs are single-letter spots in DNA where different people often have different letters. How often? Some people say 1%, but the more practical definition is “often enough that someone has noticed and added it to the test panel”. There are three billion letters in the genome, of which only a few million are commonly-tested SNPs. But these SNP studies have limited [7] ability to measure personal mutations and rare variants. Sometimes your parents’ egg and sperm cells mess up copying a nucleotide of DNA, and you get a mutation that isn’t inherited from your ethnic group or even from your subgroup/family line - it’s just some idiosyncratic DNA change that you might be the first person in history to have. Since scientists have never seen this mutation before, they don’t know about it and can’t test for it without doing something more expensive than a simple SNP screen. And SNP studies have limited ability to detect anything more complicated than a single letter changing to another single letter. But some mutations are more complicated structural variants. For example, some bits of DNA get stuck on repeat - one person might have GATGAT, another person might have GATGATGATGAT, and a third person might have fifty GATs in a row. Other bits come out backwards. Sometimes a whole chunk of DNA goes missing, or moves to the wrong place. Occasionally a gene reads The Selfish Gene by Richard Dawkins, takes it too seriously, and evolves some ridiculous trick for spamming itself all over the genome. So if even the best molecular studies seem to be asymptoting around 15-20% of variance in educational attainment, but twin studies suggest it’s 40% genetic, might rare variants and structural variants make up the missing 20-25%pp? This remains a topic of bitter disagreement. On the one side, hereditarians bring up a Darwinian argument: imagine a genetic engineer who hopes to find the genes for educational attainment and edit them to make everyone smart and successful.
She looks harder and harder, becoming more and more exasperated as they fail to materialize. Finally, she realizes she’s been scooped: evolution has been working on the same project, and has a 100,000 year head start. In the context of intense, recent selection for intelligence, we should expect evolution to have already found (and eliminated) the most straightforward, easy-to-find genes for low intelligence. Therefore, everything left should be convoluted or hidden or impossible to work with. So although this requires a sort of god-of-the-gaps argument - where we keep pushing heritability into whatever genes are too weird for existing techniques to detect - there are some reasons to think God really is in the gaps here. And a 2017 paper uses some clever techniques to estimate the share of intelligence variation lurking in hard-to-measure genes and finds it’s more than half: “By capturing these additional genetic effects, our models closely approximate the heritability estimates from twin studies for intelligence and education.” (see also Wainschtein 2022, Sidorenko 2024) The anti-hereditarians disagree. They cite papers like Zeng which measure the strength of selection on intelligence and suggest that it’s too weak to concentrate so much of the variation in rare genes [8]. And Sasha Gusev mentions Weiner 2023, which finds that in fact rare variants “explain 1.3% (SE = 0.03%) of phenotypic variance on average – much less than common variants” (other experts say that burden heritability only captures some rare variants and is not the right tool for this problem). But it may not even matter, because another set of findings suggests that heritability is genuinely low even when the rare variants are counted.
Is Heritability Genuinely Low? (Part 2: Sib-Regression and RDR)
Two newer methods, Sib-Regression and RDR, ask: using what we know from genetic studies, how much genetic variation do we think exists, total, across both common and rare genes?
On average siblings share 50% of genes. But there’s a little randomness in meiosis, so some siblings might share 40% and others might share 60%. The more genetic influence on a trait, the more similar sibling pairs who share 60% of their genes will be, compared to sibling pairs who only share 40% of their genes. Since 60%-gene siblings and 40%-gene siblings are both equally part of the same family, you can use these numbers to calculate heritability unconfounded by a range of family factors. This is Sib-Regression. If you do a more complicated statistical process to extend the same idea to relatives other than siblings, it’s relatedness disequilibrium regression or RDR. GWAS asks: Looking at common easy-to-study genes, how much variation in a trait have we explained right now? GREML asks: looking at common easy-to-study genes, how much variation could we ever explain? But sib-regression and RDR ask a question more like twin studies: considering all genes, whether common / rare / easy-to-study / hard-to-study, how much variation is there total? This could address the rare variant objection mentioned above. And in many ways, these techniques are better than twin studies - Sib-Regression eliminates many potential biases, and RDR eliminates even more (although it’s harder to pull off, requiring more genetic information and computational resources). These techniques are new and hard-to-use, and only a few published studies have applied them to the sorts of behavioral traits we’re interested in: Young et al (2018) did Sib-Regression and RDR to genetic data from Iceland. Sib-regression found educational attainment = 40% (±15%) heritable, and RDR found 17% (±9%) heritable. Kemper et al (2021) did Sib-Regression only to genetic data from Britain. It found educational attainment = 14% heritable. This number conflicts with the 40% from the Young paper. Why? 
Unclear, but it could be selection bias - Young’s Icelandic sample was representative of the country; Kemper’s British population were Biobank volunteers who tend to be healthier and higher-class than the population at large. Upper-class people may have restricted range in educational attainment, or different factors affecting their educational attainment compared to the overall population. Either way, these are closer to the low estimates from GWAS and GREML (7% direct, 20% total), than to the higher estimates from twin studies (40%, generally presumed direct). And we can no longer use contributions from rare variants to paper over the difference. So what is going on? It seems like we have to accept one of three possibilities: Either something is wrong with twin studies. Or something is wrong with Sib-Regression and RDR (and then we can explain away GWAS and GREML by saying they’re missing rare variants). Or something is wrong with how we’re thinking about this topic and comparing things.
What’s Going On? (Part 1: Is Something Wrong With Twin Studies?)
Twin studies have dominated discussion of behavioral genetics for decades, so there’s a vast literature investigating their various assumptions and whether something might be wrong with them. Here are some of the assumptions and what the research says about each. Some of these will be duplicates of the GWAS confounders above, but we’ll go through them again anyway to review how they apply to twins. 1: Parents Treat Fraternal And Identical Twins The Same: Twin studies claim that twins are a uniquely powerful genetic laboratory; both fraternal and identical twin pairs have equally concordant environments, but identical twins have more concordant genes. Therefore, the more similar identical twin pairs are relative to fraternal twin pairs, the more heritable a trait must be.
But this conclusion falls apart if identical twin pairs actually have more similar environments than fraternal twin pairs do, maybe because parents (knowing their twins are identical) treat them more similarly than they would fraternal twins. Would-be twin-study-discreditors have been trying to argue that this must be true for decades, but it’s always been a kind of quixotic battle. Remember, twin studies find many behavioral traits like IQ are >60% heritable, so you would need to prove not only that parents treat identical twin pairs differently from fraternal, but that this was an overwhelming effect. Parents of identical twins would have to obsessively expose them to the exact same stimuli in the exact same order; parents of fraternal twins would have to send one to the Gifted Advanced Placement Acceleration program while locking the other in a box and force-feeding them lead pellets. Common sense tells us there are no such differences, and studies confirm this: when parents are wrong about their twins’ status (eg they have fraternal twins, but falsely think they’re identical, or vice versa) their trait similarity matches their real status, rather than the incorrect status that determined how their parents treat them; parental treatment explains less than 1% of why identical twin pairs are more concordant (2, 3, 4). See also Felson 2013, which tries to measure environmental similarity and adjust for it, with minimal effects. [Image caption: Are these two cuties monozygotic or dizygotic? Are you sure? (answer)] 2: Fraternal And Identical Twins Have Equally Concordant Uterine Environments: Fraternal twins have different sacs in the uterus and use different placentas. Most identical twins share a placenta, and some share an amniotic sac.
If trait similarity is caused by sharing a placenta or sac (maybe because the placenta is defective, the fetal brain is starved of nutrients, and so the person has a lower IQ when they grow up), twin studies would falsely read this identical-fraternal difference as genetic. Luckily this is easy to study; not all identical twins share a placenta or sac, so you can cleanly separate the effect of uterine environment from genetics. If you measure enough traits, you can find small deviations in some, but it’s not clear whether this is just multiple testing, and in any case the deviations are small. The best studies suggest this chips off somewhere between 0 - 3% from heritability estimates [9]. 3: There is little assortative mating: We discussed this one above in the earlier section on GWAS - smart/pretty/kind/whatever people tend to marry other smart/pretty/kind/whatever people. Why would this bias twin study results? Identical twins share 100% of their genes. Fraternal twins ought to share 50% of their genes - but they get half their genes from their mother, and half from their father. In the degenerate case where the mother and father have exactly the same genes (“would you have sex with your clone?”) even fraternal twins will be extremely similar (although not quite identical, since they’ll get different alleles from each clone). In the more plausible case where mothers and fathers are just a little more alike than chance (eg because smart people tend to marry other smart people), fraternal twins will share a genetic tendency towards a trait somewhat more than their 50% shared genes suggest. Since this makes fraternal twin pairs more (genetically) like identical twin pairs, and twin studies assess heritability as the difference in fraternal-identical-twin-pair concordance, this bias would make twin studies underestimate heritability.
But this is the opposite of what you would need to “discredit” twin studies - if this bias is true, then everything is more genetic than twin studies think. And unlike the previous two biases, this one seems real and important, so much so that when you adjust for it, the heritability of educational attainment rises from ~40% to ~50%. I’m only mentioning this one here because some anti-hereditarians argue that you can’t trust twin studies because of assortative mating, without mentioning that this can only bias them down. 4: Population stratification: This is often large and worth worrying about, but it applies to identical and fraternal twin pairs equally, and doesn’t bias twin study heritability estimates much (though it might shift the balance between shared and non-shared environment). See eg the sentence around footnote 30 here. 5: Non-additive / “interaction” effects: These are theoretically interesting, but all research thus far has found they are minimal (1, 2). Some experts think this may miss rarer or harder-to-find interactions; we’ll return to this later. 6: “Genetic nurture”, parent-to-child Mentioned above: if there is a gene for reading books to kids, and reading books raises IQ, it will look like a “gene for IQ”. This isn’t as relevant to twin study estimates of heritability, since both identical twins and fraternal twins are equally related to their parents, and any trait caused by genetic nurture wouldn’t differ between them (and therefore would not falsely appear heritable in this design). Rather, they would appear as shared environment. 7: “Genetic nurture”, sibling-to-sibling That is, suppose your sibling’s traits influence your own development. For example, suppose your sibling has a gene that makes them sabotage your schoolwork, causing you to fail and drop out of school early. 
An identical twin would share this gene with their sibling more often than a fraternal twin, making it look like a “gene for doing badly at school” (since the people who have it do worse at school than those who don’t). Why are we even talking about this? Do we really think it’s a big part of the variance in behavioral traits? Challenging twin study heritability estimates through this route requires inhabiting a weird no-man’s-land where otherwise-invisible genetic and environmental pathways suddenly flare up when you say the magic words “it was done by a sibling”. For example, this requires a strong effect of shared environment - that is, your educational attainment has to depend on whether you’re being sabotaged or not. But in general, shared environmental effects are weak. And it requires a strong effect of genes - that is, this mechanism only works if your sibling’s tendency to sabotage you is highly genetically determined. But we’re deploying this claim to deny that traits like IQ or educational attainment are highly genetically determined. So to get much out of this, the tendency to sabotage siblings would have to be more genetic than other behavioral traits! The reason this convoluted possibility gets brought up so often is that, unlike the more plausible parent-to-child genetic nurture, twin studies can’t rule it out. So if you really want to deny twin studies, this is one of your best bets. But when investigated, this has effects indistinguishable from zero. I’ve been a bit mean in this whole section, because people really like to dismiss twin studies as “Oh, don’t you know, those depend on assumptions, I bet you never considered that assumptions might be wrong”, and then Gish Gallop you with different assumptions until you give up. But scientists have actually done a lot of really good work checking the assumptions and they mostly hold. 
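The twin-study logic, and the direction of the assortative-mating bias from assumption 3 above, can be sketched in a few lines. This uses Falconer's classic formula (heritability is roughly twice the gap between identical-twin and fraternal-twin correlations) together with a textbook simplification that is an assumption here, not something stated in the post: if spouses' additive genetic values correlate at `mate_corr`, fraternal twins share about `(1 + mate_corr) / 2` of the additive genetic variance instead of one half. The function names and the value 0.2 are hypothetical, picked only so the numbers resemble the ~40% vs. ~50% figures mentioned above.

```python
def twin_correlations(h2, c2, mate_corr=0.0):
    """Expected trait correlations for identical (MZ) and fraternal (DZ) pairs.

    h2: true additive heritability; c2: shared-environment share.
    mate_corr: assumed correlation between spouses' additive genetic values
    (a simplifying assumption used only for this illustration).
    """
    r_mz = h2 + c2
    r_dz = (1.0 + mate_corr) / 2.0 * h2 + c2
    return r_mz, r_dz


def falconer_h2(r_mz, r_dz):
    """Falconer's estimate: twice the MZ-DZ correlation gap."""
    return 2.0 * (r_mz - r_dz)


if __name__ == "__main__":
    for m in (0.0, 0.2):
        r_mz, r_dz = twin_correlations(h2=0.5, c2=0.2, mate_corr=m)
        print(f"mate_corr={m}: estimated h2 = {falconer_h2(r_mz, r_dz):.2f}")
```

With random mating the estimate recovers the true 0.5; with the assumed spousal correlation of 0.2 it drops to about 0.4. In this simplified model assortative mating can only push the twin estimate down, which is the direction the text describes.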
An alternative way of validating twin studies (brought up by Noah Carl in this article) is to check them against their close cousins, adoption studies and pedigree studies. Pedigree studies investigate large family trees, and check how trait similarity decreases with genetic distance. They avoid twin specific biases (like different treatment of fraternal vs. identical twin pairs, or different prenatal environments), while adding others like assortative mating. Here are the heritabilities of IQ and EA found in pedigree studies10 (see footnote for sources and caveats, and see also here and here for somewhat similar designs): Adoption studies investigate whether adoptees’ traits are more correlated with their adoptive or biological parents. They avoid a large swathe of biases, at the risk of introducing new adoption-related biases of their own (like the possibility that agencies deliberately place adoptive children with parents who are culturally or behaviorally similar, or the possibility that adoptees were adopted late enough to still get some shared environment from their biological parents). Here are the findings of some of the largest and best11: Both straightforwardly confirmed the larger heritability numbers found in twin studies. I would add the evidence from some less formal “adoption studies”12. During residency, I spent a few months working in a child psychiatric hospital for the worst of the worst - kids who committed murder or rape or something before age 18. Many of these children had similar stories: they were taken from their parents just after birth because the parents were criminals/drug addicts/in jail/abusing them. Then they were adopted out to some extremely nice Christian family whose church told them that God wanted them to help poor little children in need. 
Then they promptly proceeded to commit crime / get addicted to drugs / go to jail / abuse people, all while those families’ biological children were goody-goodies who never got so much as a school detention. When I met with the families, they would always be surprised that things had gone so badly, insisting that they’d raised them exactly like their own son/daughter and taught them good Christian morals. I had to resist the urge to shove a pile of twin studies in their face. This has left me convinced that behavioral traits are highly heritable to a level that it would be hard for any study to contradict. Ultimate source here. Although the study is confusing about this, I think it’s trying to say that almost 90% of subjects were adopted before age 2. But I don’t think studies do contradict this. Given the degree to which their assumptions have been validated, and the level of confirmation from pedigree and adoption studies, I think they have earned a presumption of accuracy. Doubting the twin studies doesn’t seem like a promising route to reconciling the twin-vs-Sib-Regression/RDR discrepancy. What’s Going On? (Part 2: Is Something Wrong With Sib-Regression And RDR?) Sib-Regression is a clever way of avoiding most biases. Its independent variable - the degree to which some sibling pairs end up with slightly more shared genes than others - is even more random and exogenous than the difference between fraternal and identical twins. It can sometimes have biases related to assortative mating (which would falsely push heritability down), but otherwise it’s pretty good. RDR has many of the same advantages, and allows more diverse relationships and so larger sample sizes. It’s hard to think of ways these methods could be wildly off. 
There is one caveat: although RDR includes most of the rare and structural variants missed by GWAS, in theory it can miss certain ultra-rare variants which are so uncommon that they aren’t shared between some of the relative pairs used in RDR. De novo variants that occurred during the subject’s own conception would be in this category, if the subject didn’t have children or didn’t pass on that gene13. This seems like a pretty small subcategory of genetic variation, and I wouldn’t normally expect that much of importance to be hiding here, but maybe it’s more important than it seems. RDR also doesn’t include much variance caused by statistical interactions between genes. Although we said above that these are usually found to be insignificant, they might be more important in a trait like intelligence that has been under recent evolutionary selection that lops off easily-detectable sources of variance and leaves only the weird obscure ones behind. There’s limited ability for classical Mendelian dominance to affect common variants, but more complicated genetic interactions might still prove important. Overall these are strong methods, and their failure to converge is troubling. If forced to explain them away, we might tell a story like: So far, there is only one RDR study and a few Sib-Regression studies, so we should wait for more data before updating too hard.
Inline links: Sasha Gusev, The Infintesimal, Eric Turkheimer, Alex Young, Awais Aftab, wrote a similar post, here, read the whole thing, Young et al. 2018, 2, 3, families with Norman surnames still outperform their Saxon peers at Oxbridge admissions, or other behavioral differences between the two groups, 4, Tan et al 2024, https://substackcdn.com/image/fetch/$s_!ioe2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05c5bec7-469c-40d2-b908-68d8583c9cca_766x766.png, 5, Polygenic embryo selection, need, Is Tan Et Al The End Of Social Science Genomics?, 6, like, And, 300K, 1M, 3M, A 2016 GREML paper, 7, evolves some ridiculous trick, a Darwinian argument, a 2017 paper, Wainschtein 2022, Sidorenko 2024, Zeng, 8, Weiner 2023, other experts say, meiosis, Young et al (2018), Kemper et al (2021), their trait similarity, matches their real status, 2, 3, 4, Felson 2013, https://substackcdn.com/image/fetch/$s_!r3kV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd575f2d6-3619-40e6-9a5e-f9f1ec1399a5_650x422.png, answer, The best studies, 9, seems real and important, here, 1, 2, when investigated, in this article, 10, here, here, https://substackcdn.com/image/fetch/$s_!b3LF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc094f9c0-4c71-48cf-89dc-615498d94812_483x51.png, 11, https://substackcdn.com/image/fetch/$s_!XFWU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85f433e5-d141-47b7-8dc5-2271925032e9_483x102.png, 12, https://x.com/cremieuxrecueil/status/1935731422205010135, here, 13
For example, educational attainment is 50% uncorrelated with direct genetic effects. You need to square this to figure out what percent is causal; when you do that, you find that the polygenic score that explained 14% of EA is only 4%pp direct genes, with the other 10%pp being nondirect5 confounders. So yes, it seems like most polygenic scores that don’t validate within families are confounded. However unhappy we previously were that we had only found 14% of genes for EA (vs. 40% expected), we should now be much more unhappy - we really only know 4% of genes that directly cause EA. On the other hand, you might say - so before we only knew 14%pp out of 40%. Now we only know 4%pp out of 40%. This is discouraging, but it doesn’t fundamentally change what we know about nature vs. nurture. Both 4%pp and 14%pp are less than 40% - with either number, we must be missing something or doing something wrong. Probably that’s insufficient sample size. We’ll keep working on sample size and other things, and eventually scrounge up the missing 26%pp or 36%pp or whatever of the variance, so this doesn’t change anything. All it means is that one predictive method that the average person never knew about in the first place doesn’t work as well as we thought. Who cares? Not doctors. So far this research has only just barely begun to reach the clinic. But also, all doctors want to do is predict things (like heart attack risk). They don’t care if they use causal vs. nondirect genes. It doesn’t matter if you’re “only” at higher risk of heart attack because you’re black, or Norman, or because your parents read books to you - you still need more heart attack medication! Polygenic embryo selection companies should care. They offer polygenic scores that can be used to select healthier or smarter embryos. If the predictors they use rely partly on variants that aren’t causal within families, their real benefits could be far lower than advertised. 
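To make the squaring step at the start of this passage concrete, here is a minimal sketch of the arithmetic. It assumes the 50% figure can be read as a correlation of about 0.5 between the polygenic score and direct genetic effects, which is one reading of the passage rather than something it spells out. The function name is hypothetical.

```python
def split_variance_explained(total_r2_pct: float, direct_correlation: float):
    """Split a polygenic score's variance explained into direct and nondirect parts.

    total_r2_pct:        variance explained by the score, in percentage points
    direct_correlation:  assumed correlation with direct genetic effects (0 to 1)
    """
    # Squaring a correlation converts it into a share of variance.
    direct_share = direct_correlation ** 2
    direct_pp = total_r2_pct * direct_share
    nondirect_pp = total_r2_pct - direct_pp
    return direct_pp, nondirect_pp


# Numbers from the passage: a score explaining 14% of EA, correlation of 0.5.
direct_pp, nondirect_pp = split_variance_explained(14.0, 0.5)
print(f"direct: {direct_pp:.1f}pp, nondirect: {nondirect_pp:.1f}pp")
```

Under these assumptions the split comes out at 3.5pp direct and 10.5pp nondirect, which matches the passage's rounded figures of roughly 4%pp and 10%pp.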
I talked to one of these companies, who said they’d already adjusted for these effects and expected their competitors had too - the proper antidote to this problem, sibling controls, is a natural choice when you’re literally picking between siblings. The biggest losers are the epidemiologists. They had started using polygenic predictors as a novel randomization method; suppose, for example, you wanted to study whether smoking causes Alzheimer’s. If you just checked how many smokers vs. nonsmokers got Alzheimer’s, your result would be vulnerable to bias; maybe poor people smoke more and get more Alzheimer’s. But (they hoped) you might be able to check whether people with the genes for smoking get more Alzheimer’s. Poverty can’t make you have more or fewer genes! This was a neat idea, but if the polygenic predictors are wrong about which genes cause smoking and what effect size they have, then the less careful among these results will need to be re-examined. But the reason I spent so much time on the subject here is that this has confused a lot of people into thinking heritability itself was confounded and is actually just 4%. When I read my first few blog posts on these findings, I came away thinking they were claiming to have discredited twin studies and heritability. And although I take partial ownership of my own poor reading comprehension, I maintain that the way that the new anti-hereditarians discuss this is pretty bad. For example, Turkheimer’s treatment of the Tan study above is called Is Tan Et Al The End Of Social Science Genomics?, and includes passages like: The median [direct genomic effect] heritability for behavioral phenotypes is .048. Let that sink in for a second. How different would the modern history of behavior genetics be if back in the 80s one study after another had shown that the heritability of behavior was around .05? When Arthur Jensen wrote about IQ, he usually used a figure of .8 for the heritability of intelligence.
I know that the relationship between twin heritabilities and SNP heritabilities is complicated, and in fact the DGE heritability of ability is one of the higher ones, at .2336. But still, it seems to me that the appropriate conclusion from these results is that among people who don’t have an identical twin, genomic information is a statistically non-zero but all in all relatively minor contributor to behavioral differences. And comments included things like: I don’t know if [this study] is the end of social science genomics, but it should certainly be the end of attributing significant genetic influence to behavioral traits (despite the recent scientist-generated cartoons touting genes for “income”). And: There's no doubt that this reported findings have dealt a fatal blow to my conviction that behavioral traits are pre-eminently heritable…This is a remarkable example of an objective statistical fact mercilessly crushing the more subjective experiential sense of "A looks and acts more like B than C because A and B have the same parents." This subjective evidence is almost unshakable and universal in its application as a tried and tested psychosocial heuristic. And yet, here we are. Turkheimer is either misstating the relationship between polygenic scores and narrow-sense heritability, or at least egging on some very confused people who are doing that, and the dynamic was bad enough that I got confused myself for a while. But even more confusing, the new anti-hereditarians actually are saying that lots of behavioral traits have very low heritability! But this point requires different arguments, only tangentially related to these. So let’s move on to… Is Heritability Genuinely Low? (Part 1: GWAS & GREML) In the mid 2010s, when genome-wide association studies (GWAS) based polygenic predictors were getting better every year, it was easy to hope they might reach 40% and close the “missing heritability”. But since then, progress has stalled. 
The second-to-last tripling of sample size, from 300K to 1M between 2016 - 2018, increased predictive power from 6% → 12%. The last tripling, from 1M to 3M between 2018 - 2022, only increased predictive power from 12% → 14%. If you graph sample size vs. predictive power, it looks like there's an asymptote between 15 - 20% or so. (of which - remember - only 5% is directly causal!) Worse, a mid-2010s technique called GREML allowed researchers to estimate the percent of variance in a trait that comes from the sorts of common genes studied in GWAS, without having to identify the genes involved. A 2016 GREML paper suggested that the maximum share of variance that GWASs of educational attainment could ever discover was about 21% (again, compared to 40% predicted genetic from twin studies). Since unavoidable methodological issues will prevent GWASs from reaching the literal maximum possible, this agrees with the evidence suggesting an asymptote between 15 - 20%. So either twin studies are wrong and traits are less heritable than believed, or the heritability must lie somewhere other than the common genes identifiable by GWAS. What about rare genes? GWASs focus on genetic variation common enough to be worth including in a basic genetic test. Most of this is single nucleotide polymorphisms (“SNPs”). A single nucleotide is one letter of DNA - for example, a C or a G. Polymorphisms are genes that commonly vary in humans - sometimes across races (for example, some humans have a gene for light skin, and other humans have a gene for dark skin), and other times within races (for example, some white people have a gene that makes cilantro taste like soap, and others don’t). So SNPs are single-letter spots in DNA where different people often have different letters. How often? Some people say 1%, but the more practical definition is “often enough that someone has noticed and added it to the test panel”. 
There are three billion letters in the genome, of which only a few million are commonly-tested SNPs. But these SNP studies have limited7 ability to measure personal mutations and rare variants. Sometimes your parents’ egg and sperm cells mess up copying a nucleotide of DNA, and you get a mutation that isn’t inherited from your ethnic group or even from your subgroup/family line - it’s just some idiosyncratic DNA change that you might be the first person in history to have. Since scientists have never seen this mutation before, they don’t know about it and can’t test for it without doing something more expensive than a simple SNP screen. And SNP studies have limited ability to detect anything more complicated than a single letter changing to another single letter. But some mutations are more complicated structural variants. For example, some bits of DNA get stuck on repeat - one person might have GATGAT, another person might have GATGATGATGAT, and a third person might have fifty GATs in a row. Other bits come out backwards. Sometimes a whole chunk of DNA goes missing, or moves to the wrong place. Occasionally a gene reads The Selfish Gene by Richard Dawkins, takes it too seriously, and evolves some ridiculous trick for spamming itself all over the genome. So if even the best molecular studies seem to be asymptoting around 15-20% of variance in educational attainment, but twin studies suggest it’s 40% genetic, might rare variants and structural variants make up the missing 20-25%pp? This remains a topic of bitter disagreement. On the one side, hereditarians bring up a Darwinian argument: imagine a genetic engineer who hopes to find the genes for educational attainment and edit them to make everyone smart and successful. She looks harder and harder, becoming more and more exasperated as they fail to materialize. Finally, she realizes she’s been scooped: evolution has been working on the same project, and has a 100,000 year head start. 
In the context of intense, recent selection for intelligence, we should expect evolution to have already found (and eliminated) the most straightforward, easy-to-find genes for low intelligence. Therefore, everything left should be convoluted or hidden or impossible to work with. So although this requires a sort of god-of-the-gaps argument - where we keep pushing heritability into whatever genes are too weird for existing techniques to detect - there are some reasons to think God really is in the gaps here. And a 2017 paper uses some clever techniques to estimate the share of intelligence variation lurking in hard-to-measure genes and finds it’s more than half: “By capturing these additional genetic effects, our models closely approximate the heritability estimates from twin studies for intelligence and education.” (see also Wainschtein 2022, Sidorenko 2024) The anti-hereditarians disagree. They cite papers like Zeng which measure the strength of selection on intelligence and suggest that it’s too weak to concentrate so much of the variation in rare genes8. And Sasha Gusev mentions Weiner 2023, which finds that in fact rare variants “explain 1.3% (SE = 0.03%) of phenotypic variance on average – much less than common variants” (other experts say that burden heritability only captures some rare variants and is not the right tool for this problem). But it may not even matter, because another set of findings suggests that heritability is genuinely low even when the rare variants are counted. Is Heritability Genuinely Low? (Part 2: Sib-Regression and RDR) Two newer methods, Sib-Regression and RDR, ask: using what we know from genetic studies, how much genetic variation do we think exists, total, across both common and rare genes? On average siblings share 50% of genes. But there’s a little randomness in meiosis, so some siblings might share 40% and others might share 60%. 
The more genetic influence on a trait, the more similar sibling pairs who share 60% of their genes will be, compared to sibling pairs who only share 40% of their genes. Since 60%-gene siblings and 40%-gene siblings are both equally part of the same family, you can use these numbers to calculate heritability unconfounded by a range of family factors. This is Sib-Regression. If you do a more complicated statistical process to extend the same idea to relatives other than siblings, it’s relatedness disequilibrium regression or RDR. GWAS asks: looking at common easy-to-study genes, how much variation in a trait have we explained right now? GREML asks: looking at common easy-to-study genes, how much variation could we ever explain? But Sib-Regression and RDR ask a question more like twin studies: considering all genes, whether common / rare / easy-to-study / hard-to-study, how much variation is there total? This could address the rare variant objection mentioned above. And in many ways, these techniques are better than twin studies - Sib-Regression eliminates many potential biases, and RDR eliminates even more (although it’s harder to pull off, requiring more genetic information and computational resources). These techniques are new and hard-to-use, and only a few published studies have applied them to the sorts of behavioral traits we’re interested in: Young et al (2018) applied Sib-Regression and RDR to genetic data from Iceland. Sib-Regression found educational attainment = 40% (±15%) heritable, and RDR found 17% (±9%) heritable. Kemper et al (2021) applied only Sib-Regression to genetic data from Britain. It found educational attainment = 14% heritable. This number conflicts with the 40% from the Young paper. Why? Unclear, but it could be selection bias - Young’s Icelandic sample was representative of the country; Kemper’s British sample consisted of Biobank volunteers, who tend to be healthier and higher-class than the population at large.
Upper-class people may have restricted range in educational attainment, or different factors affecting their educational attainment compared to the overall population. Either way, these are closer to the low estimates from GWAS and GREML (7% direct, 20% total), than to the higher estimates from twin studies (40%, generally presumed direct). And we can no longer use contributions from rare variants to paper over the difference. So what is going on? It seems like we have to accept one of three possibilities: Either something is wrong with twin studies. Or something is wrong with Sib-Regression and RDR (and then we can explain away GWAS and GREML by saying they’re missing rare variants). Or something is wrong with how we’re thinking about this topic and comparing things. What’s Going On? (Part 1: Is Something Wrong With Twin Studies?) Twin studies have dominated discussion of behavioral genetics for decades, so there’s a vast literature investigating their various assumptions and whether something might be wrong with them. Here are some of the assumptions and what the research says about each. Some of these will be duplicates of the GWAS confounders above, but we’ll go through them again anyway to review how they apply to twins. 1: Parents Treat Fraternal And Identical Twins The Same: Twin studies claim that twins are a uniquely powerful genetic laboratory; both fraternal and identical twin pairs have equally concordant environments, but identical twins have more concordant genes. Therefore, the more similar identical twin pairs are relative to fraternal twin pairs, the more heritable a trait must be. But this conclusion falls apart if identical twin pairs actually have more similar environments than fraternal twin pairs do, maybe because parents (knowing their twins are identical) treat them more similarly than they would fraternal twins. 
Would-be twin-study-discreditors have been trying to argue that this must be true for decades, but it’s always been a kind of quixotic battle. Remember, twin studies find many behavioral traits like IQ are >60% heritable, so you would need to prove not only that parents treat identical twin pairs differently from fraternal, but that this was an overwhelming effect. Parents of identical twins would have to obsessively expose them to the exact same stimuli in the exact same order; parents of fraternal twins would have to send one to the Gifted Advanced Placement Acceleration program while locking the other in a box and force-feeding them lead pellets. Common sense tells us there are no such differences, and studies confirm this: when parents are wrong about their twins’ status (eg they have fraternal twins, but falsely think they’re identical, or vice versa) their trait similarity matches their real status, rather than the incorrect status that determined how their parents treat them; parental treatment explains less than 1% of why identical twin pairs are more concordant (2, 3, 4). See also Felson 2013, which tries to measure environmental similarity and adjust for it, with minimal effects. [Image caption: Are these two cuties monozygotic or dizygotic? Are you sure? (answer)] 2: Fraternal And Identical Twins Have Equally Concordant Uterine Environments: Fraternal twins have different sacs in the uterus and use different placentas. Most identical twins share a placenta, and some share an amniotic sac. If trait similarity is caused by sharing a placenta or sac (maybe because the placenta is defective, the fetal brain is starved of nutrients, and so the person has a lower IQ when they grow up), twin studies would falsely read this identical-fraternal difference as genetic. Luckily this is easy to study; not all identical twins share a placenta or sac, so you can cleanly separate the effect of uterine environment from genetics.
If you measure enough traits, you can find small deviations in some, but it’s not clear whether this is just multiple testing, and in any case the deviations are small. The best studies suggest this chips off somewhere between 0 - 3% from heritability estimates9. 3: There is little assortative mating: We discussed this one above in the earlier section on GWAS - smart/pretty/kind/whatever people tend to marry other smart/pretty/kind/whatever people. Why would this bias twin study results? Identical twins share 100% of their genes. Fraternal twins ought to share 50% of their genes - but they get half their genes from their mother, and half from their father. In the degenerate case where the mother and father have exactly the same genes (“would you have sex with your clone?”) even fraternal twins will be extremely similar (although not quite identical, since they’ll get different alleles from each clone). In the more plausible case where mothers and fathers are just a little more alike than chance (eg because smart people tend to marry other smart people), fraternal twins will share a genetic tendency towards a trait somewhat more than their 50% shared genes suggest. Since this makes fraternal twin pairs more (genetically) like identical twin pairs, and twin studies assess heritability as the difference in fraternal-identical-twin-pair concordance, this bias would make twin studies underestimate heritability. But this is the opposite of what you would need to “discredit” twin studies - if this bias is true, then everything is more genetic than twin studies think. And unlike the previous two biases, this one seems real and important, so much so that when you adjust for it, the heritability of educational attainment rises from ~40% to ~50%. I’m only mentioning this one here because some anti-hereditarians argue that you can’t trust twin studies because of assortative mating, without mentioning that this can only bias them down. 
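As a rough illustration of the assortative mating argument above, here is a small sketch using Falconer's formula, the textbook shortcut that estimates heritability as twice the gap between the identical-twin and fraternal-twin correlations. The simple additive model, the function names, and the example values for how genetically similar fraternal twins are (0.5 under random mating, somewhat higher under assortative mating) are illustrative assumptions, not figures from the studies discussed.

```python
def expected_twin_correlations(true_h2: float,
                               dz_genetic_corr: float = 0.5,
                               shared_env: float = 0.0):
    """Expected trait correlations for MZ and DZ pairs under a simple additive model.

    true_h2:          assumed true heritability of the trait
    dz_genetic_corr:  genetic correlation between fraternal twins
                      (0.5 with random mating; higher with assortative mating)
    shared_env:       variance share from environment shared by both twins
    """
    r_mz = true_h2 + shared_env                    # identical twins share all genes
    r_dz = dz_genetic_corr * true_h2 + shared_env  # fraternal twins share only part
    return r_mz, r_dz


def falconer_h2(r_mz: float, r_dz: float) -> float:
    """Falconer's estimate, which implicitly assumes DZ twins share exactly 50%."""
    return 2.0 * (r_mz - r_dz)


if __name__ == "__main__":
    for g in (0.5, 0.55, 0.6):
        r_mz, r_dz = expected_twin_correlations(0.5, dz_genetic_corr=g, shared_env=0.1)
        print(f"DZ genetic correlation {g:.2f}: naive estimate {falconer_h2(r_mz, r_dz):.2f}")
```

In this toy model a true heritability of 0.5 is recovered exactly when fraternal twins share 50%, and the naive estimate drops to about 0.4 when they share 60%. The direction matches the point in the text: assortative mating can only push the twin estimate down. The specific sharing values are made up for illustration.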
4: Population stratification: This is often large and worth worrying about, but it applies to identical and fraternal twin pairs equally, and doesn’t bias twin study heritability estimates much (though it might shift the balance between shared and non-shared environment). See eg the sentence around footnote 30 here. 5: Non-additive / “interaction” effects: These are theoretically interesting, but all research thus far has found they are minimal (1, 2). Some experts think this may miss rarer or harder-to-find interactions; we’ll return to this later. 6: “Genetic nurture”, parent-to-child Mentioned above: if there is a gene for reading books to kids, and reading books raises IQ, it will look like a “gene for IQ”. This isn’t as relevant to twin study estimates of heritability, since both identical twins and fraternal twins are equally related to their parents, and any trait caused by genetic nurture wouldn’t differ between them (and therefore would not falsely appear heritable in this design). Rather, they would appear as shared environment. 7: “Genetic nurture”, sibling-to-sibling That is, suppose your sibling’s traits influence your own development. For example, suppose your sibling has a gene that makes them sabotage your schoolwork, causing you to fail and drop out of school early. An identical twin would share this gene with their sibling more often than a fraternal twin, making it look like a “gene for doing badly at school” (since the people who have it do worse at school than those who don’t). Why are we even talking about this? Do we really think it’s a big part of the variance in behavioral traits? Challenging twin study heritability estimates through this route requires inhabiting a weird no-man’s-land where otherwise-invisible genetic and environmental pathways suddenly flare up when you say the magic words “it was done by a sibling”. 
For example, this requires a strong effect of shared environment - that is, your educational attainment has to depend on whether you’re being sabotaged or not. But in general, shared environmental effects are weak. And it requires a strong effect of genes - that is, this mechanism only works if your sibling’s tendency to sabotage you is highly genetically determined. But we’re deploying this claim to deny that traits like IQ or educational attainment are highly genetically determined. So to get much out of this, the tendency to sabotage siblings would have to be more genetic than other behavioral traits! The reason this convoluted possibility gets brought up so often is that, unlike the more plausible parent-to-child genetic nurture, twin studies can’t rule it out. So if you really want to deny twin studies, this is one of your best bets. But when investigated, this has effects indistinguishable from zero. I’ve been a bit mean in this whole section, because people really like to dismiss twin studies as “Oh, don’t you know, those depend on assumptions, I bet you never considered that assumptions might be wrong”, and then Gish Gallop you with different assumptions until you give up. But scientists have actually done a lot of really good work checking the assumptions and they mostly hold. An alternative way of validating twin studies (brought up by Noah Carl in this article) is to check them against their close cousins, adoption studies and pedigree studies. Pedigree studies investigate large family trees, and check how trait similarity decreases with genetic distance. They avoid twin specific biases (like different treatment of fraternal vs. identical twin pairs, or different prenatal environments), while adding others like assortative mating. 
Here are the heritabilities of IQ and EA found in pedigree studies10 (see footnote for sources and caveats, and see also here and here for somewhat similar designs): [table image] Adoption studies investigate whether adoptees’ traits are more correlated with their adoptive or biological parents. They avoid a large swathe of biases, at the risk of introducing new adoption-related biases of their own (like the possibility that agencies deliberately place adoptive children with parents who are culturally or behaviorally similar, or the possibility that adoptees were adopted late enough to still get some shared environment from their biological parents). Here are the findings of some of the largest and best11: [table image] Both straightforwardly confirmed the larger heritability numbers found in twin studies. I would add the evidence from some less formal “adoption studies”12. During residency, I spent a few months working in a child psychiatric hospital for the worst of the worst - kids who committed murder or rape or something before age 18. Many of these children had similar stories: they were taken from their parents just after birth because the parents were criminals/drug addicts/in jail/abusing them. Then they were adopted out to some extremely nice Christian family whose church told them that God wanted them to help poor little children in need. Then they promptly proceeded to commit crime / get addicted to drugs / go to jail / abuse people, all while those families’ biological children were goody-goodies who never got so much as a school detention. When I met with the families, they would always be surprised that things had gone so badly, insisting that they’d raised them exactly like their own son/daughter and taught them good Christian morals. I had to resist the urge to shove a pile of twin studies in their face. This has left me convinced that behavioral traits are highly heritable to a level that it would be hard for any study to contradict. [Image caption: Ultimate source here. Although the study is confusing about this, I think it’s trying to say that almost 90% of subjects were adopted before age 2.]
But I don’t think studies do contradict this. Given the degree to which their assumptions have been validated, and the level of confirmation from pedigree and adoption studies, I think they have earned a presumption of accuracy. Doubting the twin studies doesn’t seem like a promising route to reconciling the twin-vs-Sib-Regression/RDR discrepancy. What’s Going On? (Part 2: Is Something Wrong With Sib-Regression And RDR?) Sib-Regression is a clever way of avoiding most biases. Its independent variable - the degree to which some sibling pairs end up with slightly more shared genes than others - is even more random and exogenous than the difference between fraternal and identical twins. It can sometimes have biases related to assortative mating (which would falsely push heritability down), but otherwise it’s pretty good. RDR has many of the same advantages, and allows more diverse relationships and so larger sample sizes. It’s hard to think of ways these methods could be wildly off. There is one caveat: although RDR includes most of the rare and structural variants missed by GWAS, in theory it can miss certain ultra-rare variants which are so uncommon that they aren’t shared between some of the relative pairs used in RDR. De novo variants that occurred during the subject’s own conception would be in this category, if the subject didn’t have children or didn’t pass on that gene13. This seems like a pretty small subcategory of genetic variation, and I wouldn’t normally expect that much of importance to be hiding here, but maybe it’s more important than it seems. RDR also doesn’t include much variance caused by statistical interactions between genes.
Although we said above that these are usually found to be insignificant, they might be more important in a trait like intelligence that has been under recent evolutionary selection that lops off easily-detectable sources of variance and leaves only the weird obscure ones behind. There’s limited ability for classical Mendelian dominance to affect common variants, but more complicated genetic interactions might still prove important. Overall these are strong methods, and their failure to converge is troubling. If forced to explain them away, we might tell a story like: So far, there is only one RDR study and a few Sib-Regression studies, so we should wait for more data before updating too hard.
Inline links: 5, Polygenic embryo selection, need, Is Tan Et Al The End Of Social Science Genomics?, 6, like, And, 300K, 1M, 3M, A 2016 GREML paper, 7, evolves some ridiculous trick, a Darwinian argument, a 2017 paper, Wainschtein 2022, Sidorenko 2024, Zeng, 8, Weiner 2023, other experts say, meiosis, Young et al (2018), Kemper et al (2021), their trait similarity, matches their real status, 2, 3, 4, Felson 2013, https://substackcdn.com/image/fetch/$s_!r3kV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd575f2d6-3619-40e6-9a5e-f9f1ec1399a5_650x422.png, answer, The best studies, 9, seems real and important, here, 1, 2, when investigated, in this article, 10, here, here, https://substackcdn.com/image/fetch/$s_!b3LF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc094f9c0-4c71-48cf-89dc-615498d94812_483x51.png, 11, https://substackcdn.com/image/fetch/$s_!XFWU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85f433e5-d141-47b7-8dc5-2271925032e9_483x102.png, 12, https://x.com/cremieuxrecueil/status/1935731422205010135, here, 13
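The Sib-Regression idea described in the passage above (sibling pairs who happen to share slightly more of their genome should be slightly more alike on the trait) can be sketched with simulated data. This is a toy illustration under stated assumptions, not the procedure of any study quoted here: the function names, the 0.04 standard deviation of sibling genome sharing, and the variance components (40% genetic, 20% shared environment) are all hypothetical choices made for the example.

```python
import math
import random


def simulate_sib_pairs(n_pairs, h2, c2, rng):
    """Simulate (sharing, trait1, trait2) for sibling pairs.

    h2 = additive genetic share of variance, c2 = shared-environment share.
    All parameter values are hypothetical and used only for illustration.
    """
    g_sd, c_sd = math.sqrt(h2), math.sqrt(c2)
    e_sd = math.sqrt(1.0 - h2 - c2)
    pairs = []
    for _ in range(n_pairs):
        # Realized genome sharing varies around 0.5 (assumed SD of 0.04).
        pi = min(1.0, max(0.0, rng.gauss(0.5, 0.04)))
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Genetic values with variance h2 and within-pair correlation pi.
        g1 = g_sd * z1
        g2 = g_sd * (pi * z1 + math.sqrt(1.0 - pi * pi) * z2)
        shared = c_sd * rng.gauss(0.0, 1.0)  # same for both siblings
        y1 = g1 + shared + e_sd * rng.gauss(0.0, 1.0)
        y2 = g2 + shared + e_sd * rng.gauss(0.0, 1.0)
        pairs.append((pi, y1, y2))
    return pairs


def sib_regression(pairs):
    """Regress the product of standardized sibling traits on genome sharing.

    In this toy model E[y1 * y2 | pi] = c2 + h2 * pi, so the slope
    estimates h2 and the intercept estimates c2.
    """
    n = len(pairs)
    values = [y for _, y1, y2 in pairs for y in (y1, y2)]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    xs = [pi for pi, _, _ in pairs]
    ys = [(y1 - mean) * (y2 - mean) / var for _, y1, y2 in pairs]
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


if __name__ == "__main__":
    pairs = simulate_sib_pairs(400_000, h2=0.4, c2=0.2, rng=random.Random(0))
    h2_hat, c2_hat = sib_regression(pairs)
    print(f"estimated h2 = {h2_hat:.2f}, estimated shared environment = {c2_hat:.2f}")
```

Because sharing only varies by a few percentage points between pairs, the slope is noisy even with hundreds of thousands of simulated pairs, which is one way to see why real estimates from this kind of design come with wide error bars.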
Maybe gene x gene interactions, especially epistasis, are more important than we thought.
There’s some (weak) evidence for the latter two claims: Sib-Regression, unlike RDR, includes results from certain types of ultra-rare variants and non-additive effects. In the Iceland study, Sib-Regression found EA heritability of 40% (similar to twin studies), and RDR found 17% (much less than twin studies). Maybe these make Sib-Regression better at estimating the sort of broad heritability investigated in twin studies?
What’s Going On? (Part 3: Is Educational Attainment Just Weird?)
Above, we said that there were only two published peer-reviewed studies using Sib-Regression and RDR to estimate heritability of behavioral traits. But Markel et al (2025), a not-yet-peer-reviewed pre-print from GMU (why is it always GMU?) complicates things further. It looks at genetic data from six different countries/studies to estimate heritability of IQ and EA. Using Sib-Regression, they find educational attainment heritability of only 8% (±9%)[14], and cognitive performance (~IQ) heritability of 75% (±20%)!
Markel’s 8% for EA is very different from Young’s Icelandic estimate of 40% - is this bad? Not necessarily - as with Kemper, these studies might have different levels of selection bias. Or the countries where they take place might have different levels of educational mobility. But also, this is the first Sib-Regression study to investigate IQ - all the others had only done EA. They replicate (and even go beyond) the twin studies’ high IQ number, while continuing to get low heritability for EA. This suggests our previous assumption - that EA was usually a decent proxy for IQ - might be totally off.
This doesn’t directly solve any of our problems - the twin study estimates for EA and the Sib-Regression estimates are still worryingly different. But it slightly bounds the damage.
It suggests that the twin study estimates for IQ are ~correct, potentially meaning that whatever’s going on is some kind of EA-specific confounder. We know that EA is a pretty unusual trait, with high assortative mating, high shared environmental component, and high potential for genetic nurture / dynastic effects. We saw above that there are theoretical reasons not to expect these to bias twin studies upward or Sib-Regression downward. But maybe it did that anyway, despite the theoretical reasons. Stepping back, maybe educational attainment is full of landmines. Plenty of political and economic factors affect the degree to which your genes vs. your culture determine how far you go in school. Suppose a country passes a feel-good policy that high schools have to try to graduate all students, even ones who fail algebra. That changes the heritability of EA! Or suppose that scholarships become easier/harder to get, making rich people less/more likely to go to college relative to poor people. That changes the heritability of EA! Or suppose that the economy changes and jobs requiring PhDs are less/more lucrative than before - now ambitious people are less/more likely to pursue PhDs relative to people doing it for the love of academia, and that changes the heritability of EA! Finally, suppose some study enrolls mostly rich/well-educated people, and some other study enrolls proportionally across the population. That artificially restricts range and . . . changes the heritability of EA! So two potential takeaways from this preprint are: EA is a weird trait with a high shared environmental component, and might not be a good flagship trait to use for discussing heritability more generally.
Inline links: Markel et al (2025),, 14
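The repeated point in the passage above, that a change in policy or sampling "changes the heritability of EA", follows from heritability being a ratio of variances rather than a fixed property of the genes. A minimal sketch with made-up numbers (the variance components below are hypothetical, in arbitrary units, and not taken from any study):

```python
def heritability(genetic_var, environmental_var):
    """Share of total trait variance attributable to genetic variance."""
    return genetic_var / (genetic_var + environmental_var)


if __name__ == "__main__":
    # Hold the genetic variance fixed and vary only the environmental variance.
    genetic_var = 1.0
    scenarios = {
        "more environmental variation": 4.0,
        "baseline": 1.5,
        "less environmental variation": 0.5,
    }
    for label, env_var in scenarios.items():
        print(f"{label}: h2 = {heritability(genetic_var, env_var):.2f}")
```

The genes are identical in all three scenarios; only the amount of environmental variation differs, and the heritability estimate moves with it.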
Second, the Scotland pedigree estimates he cites are likely biased due to pop strat. In the RDR paper, @alextisyoung tests a method called “Kinship FE”. At a high-level, Kinship FE estimates heritability using a pedigree model which accounts for shared nuclear family environment. Importantly, this method is quite similar to the methods employed in the two Scotland papers cited by Alexander: Hill et al and Marioni et al (both estimate heritability using pedigrees while modeling the effects of the shared nuclear family environment). Using simulations, Dr. Young shows that Kinship FE is biased in the presence of genetic nurture or pop strat. This is because these processes induce correlations between genes and env beyond the nuclear family. Unfortunately, pop strat bias is not mitigated by PC adjustments. So the key question is: are these at play for cognitive phenotypes? The answer is maybe for genetic nurture & yes for pop strat. Tan et al Figure 1 shows that pop strat biases estimation of genetic effects for IQ & edu. Thus, pedigree estimates should be interpreted w/caution.
In the section comparing Kemper’s sib-regression estimate (14%) and Young’s Icelandic estimate (~40%), you note that the UK Biobank sample may be skewed toward healthier, higher-SES volunteers (so-called healthy volunteer bias, which commonly creates selection effects in medical research). But the implications of such selection effects extend far beyond variability in heritability estimates.
[Figure caption: Sample Nucleus results.]
And this week, Herasight[4] entered the space with the most impressive disease risk scores yet, an IQ predictor worth 6-9[5] extra points, and a series of challenges to competitors, whom they call out for insufficient scientific rigor. Their most scathing attack is on Nucleus itself, accusing its predictions of being misleading and unreliable. Let’s start with the science, then move on to the companies and see if we can litigate their dispute.
In Theory, All Of This Should Work
Polygenic embryo screening is a natural extension of two well-validated technologies: genetic testing of embryos, and polygenic prediction of traits in adults.
Genetic testing of embryos has been done for decades, usually to detect chromosomal abnormalities like Down Syndrome or simple single-gene disorders like cystic fibrosis. It’s challenging - you need to take a very small number of cells (often only 5-10) from a tiny proto-placenta that may not have many cells to spare, and extract a readable amount of genetic material from this limited sample - but there are known solutions that mostly work.
But most traits are polygenic, requiring information about thousands or tens of thousands of genes to predict. These are too complicated to understand fully at current levels of technology, but some studies have chipped away at the problem and gotten a partial understanding. Often this looks like being able to predict a few percent of the variance in a trait, and determine whether someone’s genetic risk is slightly higher or lower than average.
Polygenic prediction of traits in adults is still young and full of hidden pitfalls. Last month, we discussed how some early studies unknowingly conflated direct genetic effects and various confounders[6] - for example, they tended to pick up on genes associated with well-off ethnic groups or families who had good health outcomes for social reasons.
Pinpointing the direct component requires an additional step where researchers validate their algorithms within families (for example, on pairs of siblings where one has a higher polygenic score than the other) to see how much predictive power remains. This is especially important for embryo selection companies, whose entire value proposition depends on comparing two genomes from the same family.
How have they done? It depends on the number of embryos they have to work with; the more embryos, the better you can do by selecting the best.
[Figure caption: Herasight’s numbers on how breast cancer risk goes down with number of embryos used in selection.]
A typical round of IVF produces 1-10 embryos (younger women usually = more). Women with polycystic ovarian syndrome (prevalence: 10%) may get as many as 20. For more, you will probably need to do multiple IVF rounds.
Here is a table of different companies’ reported risk reductions, slightly adjusted[7] for different reporting conventions but otherwise taking all claims at face value (we’ll talk about how wise that is later).
[Table caption: Relative risk reduction for five conditions (gray = no data / disputed data). Here baseline is for embryos neither of whose parents have the condition. GP and Orchid both say their technology has improved since reporting these numbers and they will report better numbers soon. GP numbers are not within-family validated and might be lower if they were.]
[Table caption: Absolute risk after selection for five conditions (gray = no data / disputed data), ibid.]
Some people might genuinely want to select on a single condition. For example, people with a strong family history of schizophrenia might want to minimize the chance of their children getting the disease; for these people, reducing schizophrenia risk by 58% (while keeping everything else constant) sounds pretty good. Everyone else probably wants a generically healthy embryo with low risk of all conditions.
Exactly how this works depends on the customer’s own values - would they prefer an embryo with lower cancer risk to one who will have fewer heart attacks? - and the exact benefits will depend on how parents make that decision. Genomic Prediction and Herasight try to help by providing semi-objective measures of which embryo is overall healthiest according to different conditions’ effects on longevity and patient-rated quality of life.
For Genomic Prediction, that’s the “embryo health score”. If you selected the single highest-health-score embryo from a set of five, here’s how they’d do:
[Table image not preserved.]
For Herasight, it’s a “polygenic longevity index”. They don’t give exact risk reduction numbers for each disease, saying that it depends too much on a couple’s specific family history, but say that most people gain 1-4 years of healthy life (when I test it on a set of twenty embryos, the healthiest gets an extra 1.66 years).
How much would you pay to give your children an extra 1-4 years of healthy life? This is no longer a hypothetical question. Here are the costs of the companies in this space:
[Table image not preserved.]
Is it worth it? If: You’re already doing IVF
Inline links: Herasight, 4, 5, 6, https://substackcdn.com/image/fetch/$s_!VOdq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1ba32f3-72fa-4be1-846c-6b0b04a5a213_774x279.png, 7, https://substackcdn.com/image/fetch/$s_!0oUh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8419603-9239-43bb-8c79-77b078ff0789_548x136.png, https://substackcdn.com/image/fetch/$s_!rpEJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc539a717-a130-460d-90c9-4ab64619f26d_548x133.png, https://substackcdn.com/image/fetch/$s_!3Kc6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbda325bb-13fb-4c27-b8c3-24facce5c71a_676x153.png, https://substackcdn.com/image/fetch/$s_!t1Am!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F831ab3c6-4053-4ff9-bc2f-879aee4349cf_673x740.png, https://substackcdn.com/image/fetch/$s_!Q2vE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fadcf2bb4-8dd1-4a88-9728-6953a820971b_422x575.png
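The claim in the passage above that "the more embryos, the better you can do by selecting the best" can be illustrated with a small Monte Carlo sketch. It rests on a simplifying assumption that is not stated in the source: embryo polygenic scores are normally distributed around the parental average with half the population score variance, so the expected gain from picking the top embryo is the within-family score SD times the expected maximum of n standard normal draws. The function names, the predictor strength `r2 = 0.1`, and the trait SD of 1.0 are all hypothetical.

```python
import math
import random


def expected_max_of_normals(n, trials, rng):
    """Monte Carlo estimate of the expected maximum of n standard normal draws."""
    total = 0.0
    for _ in range(trials):
        total += max(rng.gauss(0.0, 1.0) for _ in range(n))
    return total / trials


def expected_gain(n_embryos, r2, trait_sd, trials=20_000, rng=None):
    """Expected gain (in trait units) over a randomly chosen embryo.

    Assumes sibling-embryo scores have SD = trait_sd * sqrt(r2 / 2), where r2
    is the share of trait variance the score predicts in the population.
    """
    if rng is None:
        rng = random.Random(0)
    within_family_sd = trait_sd * math.sqrt(r2 / 2.0)
    return within_family_sd * expected_max_of_normals(n_embryos, trials, rng)


if __name__ == "__main__":
    # Hypothetical predictor explaining 10% of variance; gains are in trait SDs.
    for n in (1, 5, 10, 20):
        gain = expected_gain(n, r2=0.1, trait_sd=1.0)
        print(f"{n:>2} embryos: expected gain of about {gain:.2f} SD")
```

Under these assumptions the gain grows with the number of embryos but with diminishing returns, since the expected maximum of n normal draws rises only slowly as n increases.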
Authorities on all sides have cited Alex Young[10] as an authority on how polygenic scores can be confounded or misleading.
Inline links: 10