GPT-2

Article

GPT-2 is a recurring concept in the Astral Codex Ten archive, appearing 13 times across 13 issues between May 20, 2021 and January 13, 2026. The archive places it in contexts such as “Can you tell GPT-2-generated scientific paper abstracts from the real thing?”; “The GPT-2 → GPT-3 transition involved centupling parameter count”; and “GPT-2 And The Nature Of Intelligence, demonstrating a bunch of easy problems that GPT-2 failed on”. It most often appears alongside OpenAI, GPT-3, and GPT-4.

Metadata

  • Category: Concepts
  • Mention count: 13
  • Issue count: 13
  • First seen: May 20, 2021
  • Last seen: January 13, 2026

Appears In

Source Context

Recovered passages from the original issue text. When the raw archive preserved outbound links inside the source passage, they are listed directly under the quote.

May 20, 2021 · Original source
26: Can you tell GPT-2-generated scientific paper abstracts from the real thing? I’m pretty proud of my 90% success rate on hard mode.
April 04, 2022 · Original source
[Figure: Chess AI performance over time.]
Why does this matter? If there’s a slow takeoff (ie gradual exponential curve), it will become obvious that some kind of terrifying transformative AI revolution is happening, before the situation gets apocalyptic. There will be time to prepare, to test slightly-below-human AIs and see how they respond, to get governments and other stakeholders on board. We don’t have to get every single thing right ahead of time. On the other hand, because this is proceeding along the usual channels, it will be the usual variety of muddled and hard-to-control. With the exception of a few big actors like the US and Chinese government, and maybe the biggest corporations like Google, the outcome will be determined less by any one agent, and more by the usual multi-agent dynamics of political and economic competition. There will be lots of opportunities to affect things, but no real locus of control to do the affecting.
If there’s a fast takeoff (ie sudden FOOM), there won’t be much warning. Conventional wisdom will still say that transformative AI is thirty years away. All the necessary pieces (ie AI alignment theory) will have to be ready ahead of time, prepared blindly without any experimental trial-and-error, to load into the AI as soon as it exists. On the plus side, a single actor (whoever has this first AI) will have complete control over the process. If this actor is smart (and presumably they’re a little smart, or they wouldn’t be the first team to invent transformative AI), they can do everything right without going through the usual government-lobbying channels.
So the slower a takeoff you expect, the less you should be focusing on getting every technical detail right ahead of time, and the more you should be working on building the capacity to steer government and corporate policy to direct an incoming slew of new technologies.
Yudkowsky Contra Christiano
Eliezer counters that although progress may retroactively look gradual and continuous when you know what metric to graph it on, it doesn’t necessarily look that way in real life by the measures that real people care about. (one way to think of this: imagine that an AI’s effective IQ starts at 0.1 points, and triples every year, but that we can only measure this vaguely and indirectly. The year it goes from 5 to 15, you get a paper in a third-tier journal reporting that it seems to be improving on some benchmark. The year it goes from 66 to 200, you get a total transformation of everything in society. But later, once we identify the right metric, it was just the same rate of gradual progress the whole time.) So Eliezer is much less impressed by the history of previous technologies than Paul is. He’s also skeptical of the “GDP will double in 4 years before it doubles in 1” claim, because of two contingent disagreements and two fundamental disagreements.
The first contingent disagreement: government regulations make it hard to deploy imperfect things, and non-trivial to deploy things even after they’re perfect. Eliezer has non-jokingly said he thinks AI might destroy the world before the average person can buy a self-driving car. Why? Because the government has to approve self-driving cars (and can drag its feet on that), but the apocalypse can happen even without government approval. In Paul’s model, sometime long before superintelligence we should have AIs that can drive cars, and that increases GDP and contributes to a general sense that exciting things are going on. Eliezer says: fine, what if that’s true? Who cares if self-driving cars will be practical a few years before the world is destroyed? It’ll take longer than that to lobby the government to allow them on the road.
The second contingent disagreement: superintelligent AIs can lie to us.
Suppose you have an AI which wants to destroy humanity, whose IQ is doubling every six months. Right now it’s at IQ 200, and it suspects that it would take IQ 800 to build a human-destroying superweapon. Its best strategy is to lie low for a year. If it expects humans would turn it off if they knew how close it was to superweapons, it can pretend to be less intelligent than it really is. The period when AIs are holding back so we don’t discover their true power level looks like a period of lower-than-expected GDP growth - followed by a sudden FOOM once the AI gets its superweapon and doesn’t need to hold back. So even if Paul is conceptually right and fundamental progress proceeds along a nice smooth curve, it might not look to us like a nice smooth curve, because regulations and deceptive AIs could prevent mildly-transformative AI progress from showing up on graphs, but wouldn’t prevent the extreme kind of AI progress that leads to apocalypse. To an outside observer, it would just look like nothing much changed, nothing much changed, nothing much changed, and then suddenly, FOOM. But even aside from this, Eliezer doesn’t think Paul is conceptually right! He thinks that even on the fundamental level, AI progress is going to be discontinuous. It’s like a nuclear bomb. Either you don’t have a nuclear bomb yet, or you do have one and the world is forever transformed. There is a specific moment at which you go from “no nuke” to “nuke” without any kind of “slightly worse nuke” acting as a harbinger. He uses the example of chimps → humans. Evolution has spent hundreds of millions of years evolving brainier and brainier animals (not teleologically, of course, but in practice). For most of those hundreds of millions of years, that meant the animal could have slightly more instincts, or a better memory, or some other change that still stayed within the basic animal paradigm. 
At the chimp → human transition, we suddenly got tool use, language use, abstract thought, mathematics, swords, guns, nuclear bombs, spaceships, and a bunch of other stuff. The rhesus monkey → chimp transition and the chimp → human transition both involved the same ~quadrupling of neuron number, but the former was pretty boring and the latter unlocked enough new capabilities to easily conquer the world. The GPT-2 → GPT-3 transition involved centupling parameter count. Maybe we will keep centupling parameter count every few years, and most times it will be incremental improvement, and one time it will conquer the world. But even talking about centupling parameter points is giving Paul too much credit. Lots of past inventions didn’t come by quadrupling or centupling something, they came by discovering “the secret sauce”. The Wright brothers (he argues) didn’t make a plane with 4x the wingspan of the last plane that didn’t work, they invented the first plane that could fly at all. The Hiroshima bomb wasn’t some previous bomb but bigger, it was what happened after a lot of scientists spent a long time thinking about a fundamentally different paradigm of bomb-making and brought it to a point where it could work at all. The first transformative AI isn’t going to be GPT-3 with more parameters, it will be what happens after someone discovers how to make machines truly intelligent. (this is the same debate Eliezer had with Ajeya over the Biological Anchors post; have I mentioned that Ajeya and Paul are married?)
Fine, Let’s Nitpick The Hell Out Of The Chimps Vs. Humans Example
This is where the two of them end up, so let’s follow. Between chimps and humans, there were about seven million years of intermediate steps. These had some human capabilities, but not others. IE homo erectus probably had language, but not mathematics, and in terms of taking over the world it did make it to most of the Old World but was less dominant than moderns.
But if we say evolutionary history started 500 million years ago (the Cambrian), and AI history started with the Dartmouth Conference in 1955, then the equivalent of 7 million years of evolutionary history is 1 year of AI history. In the very very unlikely and forced comparison where evolutionary history and AI history go at the same speed, there will be only about a year between chimp-level and human-level AIs. A chimp-level AI probably can’t double GDP, so this would count as a fast takeoff by Paul’s criterion. But even more than that, chimp → human feels like a discontinuity. It’s not just “animals kept getting smarter for hundreds of millions of years, and then ended up very smart indeed”. That happened for a while, and then all of sudden there was a near-instant phase transition into a totally different way of using intelligence with completely new abilities. If AI worked like this, we would have useful toys and interesting specialists for a few decades, until suddenly someone “got it right”, completed the package that was necessary for “true intelligence”, and then we would have a completely new category of thing. Paul admits this analogy is awkward for his position. He answers: Chimp evolution is not primarily selecting for making and using technology, for doing science, or for facilitating cultural accumulation. The task faced by a chimp is largely independent of the abilities that give humans such a huge fitness advantage. It’s not completely independent—the overlap is the only reason that evolution eventually produces humans—but it’s different enough that we should not be surprised if there are simple changes to chimps that would make them much better at designing technology or doing science or accumulating culture […] So I don’t think the example of evolution tells us much about whether the continuous change story applies to intelligence. 
This case is potentially missing the key element that drives the continuous change story—optimization for performance. Evolution changes continuously on the narrow metric it is optimizing, but can change extremely rapidly on other metrics. For human technology, features of the technology that aren’t being optimized change rapidly all the time. When humans build AI, they will be optimizing for usefulness, and so progress in usefulness is much more likely to be linear. That is, evolution wasn’t optimizing for tool use/language/intelligence, so we got an “overhang” where chimps could potentially have been very good at these, but evolution never bothered “closing the circuit” and turning those capabilities “on”. After a long time, evolution finally blundered into an area where marginal improvements in these capacities improved fitness, so evolution started improving them and it was easy. Imagine a company which, through some oversight, didn’t have a Sales department. They just sat around designing and manufacturing increasingly brilliant products, but not putting any effort into selling them. Then the CEO remembers they need a Sales department, starts one up, and the company goes from moving near zero units to moving millions of units overnight. It would look like the company had “suddenly” developed a “vast increase in capabilities”. But this is only possible when a CEO who is weirdly unconcerned about profit forgets to do obvious profit-increasing things for many years. This is Paul’s counterargument to the chimp analogy. Evolution isn’t directly concerned about various intellectual skills; it only wants them in the unusual cases where they’ll contribute to fitness on the margin. AI companies will be very concerned about various intellectual skills. If there’s a trivial change that can make their product 10x better, they’ll make it. 
So AI capabilities will grow in a “well-rounded” way, there won’t be any “overhangs”, and there won’t be any opportunities for a sudden overhang-solving phase transition with associated new-capability development like with chimps → humans. Eliezer answers: Chimps are nearly useless because they're not general, and doing anything on the scale of building a nuclear plant requires mastering so many different nonancestral domains that it's no wonder natural selection didn't happen to separately train any single creature across enough different domains that it had evolved to solve every kind of domain-specific problem involved in solving nuclear physics and chemistry and metallurgy and thermics in order to build the first nuclear plant in advance of any old nuclear plants existing. Humans are general enough that the same braintech selected just for chipping flint handaxes and making water-pouches and outwitting other humans, happened to be general enough that it could scale up to solving all the problems of building a nuclear plant - albeit with some added cognitive tech that didn't require new brainware, and so could happen incredibly fast relative to the generation times for evolutionarily optimized brainware. Now, since neither humans nor chimps were optimized to be "useful" (general), and humans just wandered into a sufficiently general part of the space that it cascaded up to wider generality, we should legit expect the curve of generality to look at least somewhat different if we're optimizing for that. Eg, right now people are trying to optimize for generality with AIs like Mu Zero and GPT-3. In both cases we have a weirdly shallow kind of generality. Neither is as smart or as deeply general as a chimp, but they are respectively better than chimps at a wide variety of Atari games, or a wide variety of problems that can be superposed onto generating typical human text. 
They are, in a sense, more general than a biological organism at a similar stage of cognitive evolution, with much less complex and architected brains, in virtue of having been trained, not just on wider datasets, but on bigger datasets using gradient-descent memorization of shallower patterns, so they can cover those wide domains while being stupider and lacking some deep aspects of architecture. It is not clear to me that we can go from observations like this, to conclude that there is a dominant mainline probability for how the future clearly ought to go and that this dominant mainline is, "Well, before you get human-level depth and generalization of general intelligence, you get something with 95% depth that covers 80% of the domains for 10% of the pragmatic impact". ...or whatever the concept is here, because this whole conversation is, on my own worldview, being conducted in a shallow way relative to the kind of analysis I did in Intelligence Explosion Microeconomics, where I was like, "here is the historical observation, here is what I think it tells us that puts a lower bound on this input-output curve". Here Eliezer sort of kind of grants Paul’s point that AIs will be optimized for generality in a way chimps aren’t, but points to his previous “Intelligence Explosion Microeconomics” essay to argue that we should expect a fast takeoff anyway. IEM has a lot of stuff in it, but one key point is that instead of using analogies to predict the course of future AI, we should open that black box and try to actually reason about how it will work, in which case we realize that recursive self-improvement common-sensically has to cause an intelligence explosion. I am sort of okay with this, but I feel like a commitment to avoiding analogies should involve not bringing up the chimp-human analogy further, which Eliezer continues to do, quite a lot. I do feel like Paul succeeded in convincing me that we shouldn’t place too much evidential weight on it. 
The Wimbledon Of Reference Class Tennis
“Reference class tennis” is an old rationalist idiom for people throwing analogies back and forth. “AI will be slow, because it’s an economic transition like the Agricultural or Industrial Revolution, and those were slow!” “No, AI will be fast, because it’s an evolutionary step like chimps → humans, and that was fast!” “No, AI will be slow, because it’s an invention, like the computer, and computers were invented piecemeal and required decades of innovation to be useful.” “No, AI will be fast, because it’s an invention, like the nuclear bomb, and nuclear bombs went from impossible to city-killing in a single day.” “No, AI will be slow, because it will be surrounded by a shell-like metallic computer case, which makes it like a turtle, and turtles are slow.” “No, AI will be fast, because it’s dangerous and powerful, like a tiger, and tigers are fast!” And so on.
Comparing things to other things is a time-tested way of speculating about them. But there are so many other things to compare to that you can get whatever result you want. This is the failure mode that the term “reference class tennis” was supposed to point to. Both participants in this debate are very smart and trying their hardest to avoid reference-class tennis, but neither entirely succeeds. Eliezer’s preferred classes are Bitcoin (“there wasn't a cryptocurrency developed a year before Bitcoin using 95% of the ideas which did 10% of the transaction volume”), nukes, humans/chimps, the Wright Brothers, AlphaGo (which really was a discontinuous improvement on previous Go engines), and AlphaFold (ditto for proteins). Paul’s preferred classes are the Agricultural and Industrial Revolutions, chess engines (which have gotten better along a gradual, well-behaved curve), all sorts of inventions like computers and ships (likewise), and world GDP.
Eliezer already listed most of these in his Intelligence Explosion Microeconomics paper in 2013, and concluded that the space of possible analogies was contradictory enough that we needed to operate at a higher level. Maybe so, but when someone lobs a reference class tennis ball at you, it’s hard to resist the urge to hit it back.
Recursive Self-Improvement
This is where I think Eliezer most wants to take the discussion. The idea is: once AI is smarter than humans, it can do a superhuman job of developing new AI. In his Microeconomics paper, he writes about an argument he (semi-hypothetically) had with Ray Kurzweil about Moore’s Law. Kurzweil expected Moore’s Law to continue forever, even after the development of superintelligence. Eliezer objects: Suppose we were dealing with minds running a million times as fast as a human, at which rate they could do a year of internal thinking in thirty-one seconds, such that the total subjective time from the birth of Socrates to the death of Turing would pass in 20.9 hours. Do you still think the best estimate for how long it would take them to produce their next generation of computing hardware would be 1.5 orbits of the Earth around the Sun?
That is: the fact that it took 1.5 years for transistor density to double isn’t a natural law. It’s pointing to a law that the amount of resources (most notably intelligence) that civilization focused on the transistor-densifying problem equalled the amount it takes to double it every 1.5 years. If some shock drastically changed available resources (by eg speeding up human minds a million times), this would change the resources involved, and the same laws would predict transistor speed doubling in some shorter amount of time (naively 0.0000015 years, although realistically at that scale other inputs would dominate).
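The speed-up arithmetic in the passage above can be checked with a short sketch. It takes the quoted figures at face value (a flat million-fold speedup and a 1.5-year doubling time); the function names and the Julian-year constant are illustrative assumptions, not anything from the source.

```python
# Assumption for illustration: one year = 365.25 days (a Julian year).
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def wall_clock_seconds_per_subjective_year(speedup: float) -> float:
    """Wall-clock seconds a mind running `speedup` times faster needs
    to do one year of subjective thinking."""
    return SECONDS_PER_YEAR / speedup


def naive_doubling_time_years(baseline_years: float, speedup: float) -> float:
    """Naive rescaling: divide the baseline doubling time by the speedup,
    ignoring every other input that would dominate in practice."""
    return baseline_years / speedup


if __name__ == "__main__":
    speedup = 1_000_000
    per_year = wall_clock_seconds_per_subjective_year(speedup)
    doubling = naive_doubling_time_years(1.5, speedup)
    print(f"One subjective year takes about {per_year:.1f} wall-clock seconds")
    print(f"Naive doubling time: {doubling:.7f} years "
          f"(about {doubling * SECONDS_PER_YEAR:.0f} seconds)")
```

Under these assumptions a subjective year passes in a little over thirty-one seconds, and naively dividing 1.5 years by a million gives about 0.0000015 years, which is well under a minute of wall-clock time.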
So when Paul derives clean laws of economics showing that things move along slow growth curves, Eliezer asks: why do you think they would keep doing this when one of the discoveries they make along that curve might be “speeding up intelligence a million times”? (Eliezer actually thinks improvements in the quality of intelligence will dominate improvements in speed - AIs will mostly be smarter, not just faster - but speed is a useful example here and we’ll stick with it) Paul answers: Summary of my response: Before there is AI that is great at self-improvement there will be AI that is mediocre at self-improvement. Powerful AI can be used to develop better AI (amongst other things). This will lead to runaway growth. This on its own is not an argument for discontinuity: before we have AI that radically accelerates AI development, the slow takeoff argument suggests we will have AI that significantly accelerates AI development (and before that, slightly accelerates development). That is, an AI is just another, faster step in the hyperbolic growth we are currently experiencing, which corresponds to a further increase in rate but not a discontinuity (or even a discontinuity in rate). The most common argument for recursive self-improvement introducing a new discontinuity seems be: some systems “fizzle out” when they try to design a better AI, generating a few improvements before running out of steam, while others are able to autonomously generate more and more improvements. This is basically the same as the universality argument in a previous section. Eliezer: Oh, come on. That is straight-up not how simple continuous toy models of RSI work. Between a neutron multiplication factor of 0.999 and 1.001 there is a very huge gap in output behavior. 
Outside of toy models: Over the last 10,000 years we had humans going from mediocre at improving their mental systems to being (barely) able to throw together AI systems, but 10,000 years is the equivalent of an eyeblink in evolutionary time - outside the metaphor, this says, "A month before there is AI that is great at self-improvement, there will be AI that is mediocre at self-improvement." (Or possibly an hour before, if reality is again more extreme along the Eliezer-Hanson axis than Eliezer. But it makes little difference whether it's an hour or a month, given anything like current setups.) This is just pumping hard again on the intuition that says incremental design changes yield smooth output changes, which (the meta-level of the essay informs us wordlessly) is such a strong default that we are entitled to believe it if we can do a good job of weakening the evidence and arguments against it. And the argument is: Before there are systems great at self-improvement, there will be systems mediocre at self-improvement; implicitly: "before" implies "5 years before" not "5 days before"; implicitly: this will correspond to smooth changes in output between the two regimes even though that is not how continuous feedback loops work. I got a bit confused trying to understand the criticality metaphor here. There’s no equivalent of neutron decay, so any AI that can consistently improve its intelligence is “critical” in some sense. Imagine Elon Musk replaces his brain with a Neuralink computer which - aside from having read-write access - exactly matches his current brain in capabilities. Also he becomes immortal. He secludes himself from the world, studying AI and tinkering with his brain’s algorithms. Does he become a superintelligence? I think under the assumptions Paul and Eliezer are using, eventually maybe. After some amount of time he’ll come across a breakthrough he can use to increase his intelligence. 
Then, armed with that extra intelligence, he’ll be able to pursue more such breakthroughs. However intelligent the AI you’re scared of is, Musk will get there eventually. How long will it take? A good guess might be “years” - Musk starts out as an ordinary human, and ordinary humans are known to take years to make breakthroughs. Suppose it takes Musk one year to come up with a first breakthrough that raises his IQ 1 point. How long will his second breakthrough take? It might take longer, because he has picked the lowest-hanging fruit, and all the other possible breakthroughs are much harder. Or it might take shorter, because he’s slightly smarter than he was before, and maybe some extra intelligence goes a really long way in AI research. The concept of an intelligence explosion seems to assume the second effect dominates the first. This would match the observation that human researchers, who aren’t getting any smarter over time, continue making new discoveries. That suggests the range of possible discoveries at a given intelligence level is pretty vast. Some research finds that the usual pattern in science is constant rate of discovery from exponentially increasing number of researchers, suggesting strong low-hanging fruit effects, but these seem to be overwhelmed by other considerations in AI right now. I think Eliezer’s position on this subject is shaped by assumptions like: If you have an AI as intelligent as Elon Musk today, then tomorrow you can run it on more hardware with a bit of normal human algorithmic progress, and get one twice as intelligent. So even if it would take Elon years to make a breakthrough, long before those years are up you’ll have an AI that can make breakthroughs much faster.
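Eliezer's neutron-multiplication remark earlier in this passage can be made concrete with a toy calculation. This is a minimal sketch of a generic geometric feedback loop, not a model of any real reactor or AI system; the function name and the choice of 10,000 generations are assumptions made purely for illustration.

```python
def population_after(k: float, generations: int, start: float = 1.0) -> float:
    """Size of a population that is multiplied by the factor k once per
    generation, starting from `start`."""
    return start * k ** generations


if __name__ == "__main__":
    # The two multiplication factors differ by only 0.002.
    for k in (0.999, 1.001):
        size = population_after(k, 10_000)
        print(f"k = {k}: population after 10,000 generations is {size:.3g}")
```

With these numbers the just-below-one factor shrinks the population to less than a ten-thousandth of its starting size, while the just-above-one factor grows it more than twenty-thousand-fold. That is the sense in which a small change in the feedback factor yields a very large gap in output.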
June 07, 2022 · Original source
To give an example: in January 2020, Gary Marcus wrote a great post, GPT-2 And The Nature Of Intelligence, demonstrating a bunch of easy problems that GPT-2 failed on:
Marcus had previously shown that GPT-2 could reason accurately about the languages of well-known places and big countries. For example, if you prompt it with “I grew up in Athens, I speak fluent…”, then it will say “Greek”. Here he shows that it can’t do that with smaller, less obvious locales:
So this seems like pretty good evidence that GPT-2 has lots of glaring deficiencies. Marcus concludes:
June 10, 2022 · Original source
Now it is true that GPT-3 is genuinely better than GPT-2, and maybe (but maybe not, see footnote 1) true that InstructGPT is genuinely better than GPT-3. I do think that for any given example, the probability of a correct answer has gone up. [Scott] is quite right about that, at least for GPT-2 to GPT-3.
Literally billions of dollars have been invested in building systems like GPT-2, and megawatts of energy (perhaps more) have gone into testing them; few systems if any have ever been trained on bigger data sets. Many of the brightest minds have been working on blank-slate-ish sentence prediction systems for decades.
In essence, GPT-2 has been a monumental experiment in Locke's hypothesis, and so far it has failed. Empiricism has been given every advantage in the world; thus far it hasn't worked. Even with massive data sets and enormous compute, the knowledge that it acquires has been superficial and unreliable.
September 19, 2022 · Original source
But sometimes GPT-3 genuinely gets it right. The most common way for that to happen is (again) by mistake. A common failure mode is to repeat the same sentence several times. GPT-3 was trained on a corpus of Internet text, and some of the Internet text was discussions of GPT-2. Many of the samples it saw that repeated the same sentence over and over in an endless loop were discussions of GPT-2 doing this. So sometimes it will get stuck in a loop, then end with “This is an example of text produced by a transformer language model”. This sounds like a stupid example from a Philosophy Of Self-Awareness class, but sometimes it really happens. Here’s an example from one of Janus’ attempts to generate Loom documentation:
Based on your interactions with the AI, Loom constructs your labyrinthian multiverse as a "labyrinthogram", exploring genealogically contiguous regions of possible language relating to the same theme as you drive. Here are some small samples of labyrinthograms. This multiverse is about introducing Loom. The first continuation from the first serial labyrinthogram above is: Then for the two next continuities the computer went: and and and and and and This essay discusses how to use GPT-2, a large probability language model.
February 09, 2023 · Original source
22: Related: the very center of GPT’s embedding space contains a few unusual tokens including the string “SolidGoldMagikarp”. GPT displays anomalous behavior if these tokens are inserted in a query; for example, it treats “SolidGoldMagikarp” as the word “distribute”. ChatGPT is pretty advanced and fails semi-gracefully here; GPT-2’s reaction to these tokens is more disturbing: (source: Less Wrong) Further investigation determined that many of these tokens are the screen names of a group of Redditors who attempted to count to infinity. The most likely explanation, according to the discoverers, is that these names were in GPT’s tokenization data, but not its training data (maybe they were especially common in the tokenization data because they made thousands of posts with numbers in them, but didn’t make it into the training data because their posts had no content?) - that leaves them existing without content, and GPT tries to round them off to some other “nearby” token (by incomprehensible AI standards of nearbyness). Congrats to the SERI-MATS AI alignment researchers who found all of this; maybe this makes it 0.0001% less likely that the AI which controls the nuclear arsenal in twenty years will have equally inexplicable behavior.
23: More language model news: LLM that understands and can explain images
February 20, 2023 · Original source
2018 was before the birth of GPT-2, the first decent language model, so even including this category was pretty bold. I wrote:
March 10, 2023 · Original source
11: A few years ago I wrote about attempts to make GPT-2 play chess; it couldn’t consistently make legal moves, but when it did, its moves seemed better than random although still not great. Zack Witten reports playing chess with Bing (either a late GPT-3 or an early GPT-4) and finds it’s much better - he reports consistently legal play with Elo of about 1100 (around the level of an okay beginner who’s stopped being too embarrassing). Other commenters report worse experiences and more illegal moves; I don’t have access to confirm.
April 25, 2023 · Original source
This is the basic idea behind Zou et al (2022), Forecasting Future World Events With Neural Networks. They create a dataset, Autocast, with 6000 questions from forecasting tournaments Metaculus, Good Judgment Project, and CSET Foretell. Then they ask their AI (a variant of GPT-2) to predict them, given news articles up to some date before the event happened. Here’s their result:
. . . okay, this isn’t very interesting. GPT-2, a very weak obsolete AI, is able to do better than chance, but much worse than humans. I don’t know what I was expecting.
July 25, 2023 · Original source
“Wow, someone who was selected only for being good at chess still has an IQ in the 99th percentile! It’s amazing how well-correlated all intellectual abilities are.” I think both of these are good takeaways. Compare the 0.72 verbal/math correlation with the 0.76 dominant-hand/non-dominant hand grip strength correlation and I think intelligence is a useful concept in the same way strength is. But also, humans are better at both the SAT verbal and the SAT math than chimps, cows, or fish. And GPT-4 is better at both those tests than GPT-3 or GPT-2. It seems to be a general principle that people, animals, or artifacts who are better at the SAT math are also better at the SAT verbal.
2.1: Why Is A Concept Like Intelligence Useful?
Across different people, skill at different kinds of intellectual tasks are correlated. Partly this is for prosaic reasons, like: Some people get better education, and end up more skilled in everything that gets taught in school.
The bigger your blob, the cleverer its arrangement, and the more training data you give it, the better it’s likely to perform on a very wide variety of cognitive tasks. This explains why chimps are smarter than cows, why Einstein is smarter than you, and why GPT-4 is smarter than GPT-2. The correlations won’t be perfect, any more than strength correlations are perfect. But they’ll be useful enough to talk about.
AI will be one of those things. We already know that bigger blobs of compute with more training data can do more things in correlated ways - frogs are outclassed by cows, chimps, and humans; toddlers are outclassed by Einstein; GPT-2 is outclassed by GPT-4. At some point we might get a blob which is better than humans at designing chips, and then we can make even bigger blobs of compute, even faster than before.
November 27, 2023 · Original source
In May of this year, OpenAI tried to make GPT-4 (very big) understand GPT-2 (very small). They got GPT-4 to inspect each of GPT-2’s 307,200 neurons and report back on what it found.
It found a collection of intriguing results and random gibberish, because they hadn’t mastered the techniques described above of projecting the real neurons into simulated neurons and analyzing the simulated neurons instead. Still, it was impressively ambitious. Unlike the toy AI in the monosemanticity paper, GPT-2 is a real (albeit very small and obsolete) AI that once impressed people.
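The 307,200 figure can be sanity-checked with a little arithmetic. The sketch below assumes the model in question is the largest GPT-2 variant (GPT-2 XL) with 48 layers, a hidden width of 1600, and a 4x feed-forward expansion, and that "neurons" means feed-forward (MLP) units; none of these details are stated in the passage, so treat them as assumptions.

```python
# Assumed GPT-2 XL hyperparameters (not given in the passage above).
N_LAYERS = 48        # transformer blocks
D_MODEL = 1600       # hidden width
MLP_EXPANSION = 4    # feed-forward layer is 4x the hidden width


def mlp_neuron_count(n_layers: int, d_model: int, expansion: int) -> int:
    """Count feed-forward (MLP) units summed over all layers."""
    return n_layers * d_model * expansion


total_neurons = mlp_neuron_count(N_LAYERS, D_MODEL, MLP_EXPANSION)
print(f"{total_neurons:,} MLP neurons")
```

Under those assumptions the count matches the number quoted in the passage.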
Is the AI plotting to kill all humans? There will be some combination of millions of features and connections that answers these questions. In some cases we can even imagine how we would begin to do it - check how active the features representing race are when we ask it to judge people, maybe. But realistically, when we’re working with very complex interactions between millions of neurons we’ll have to automate the process, some larger scale version of “ask GPT-4 to tell us what GPT-2 is doing”. This probably works for racial stereotypes. It’s more complicated once you start asking about killing all humans (what if the GPT-4 equivalent is the one plotting to kill all humans, and feeds us false answers?) But maybe there’s some way to make an interpreter AI which itself is too dumb to plot, but which can interpret a more general, more intelligent, more dangerous AI. You can see more about how this could tie into more general alignment plans in the post on the ELK problem. I also just found this paper, which I haven’t fully read yet but which seems like a start on engineering safety into interpretable AIs.
Finally, what does all of this tell us about humans? Humans also use neural nets to reason about concepts. We have a lot of neurons, but so does GPT-4. Our data is very sparse - there are lots of concepts (eg octopi) that come up pretty rarely in everyday life. Are our brains full of strange abstract polyhedra? Are we simulating much bigger brains? This field is very new, but I was able to find one paper, Identifying Interpretable Visual Features in Artificial and Biological Neural Systems. The authors say:
“Through a suite of experiments and analyses, we find evidence consistent with the hypothesis that neurons in both deep image model [AIs] and the visual cortex [of the brain] encode features in superposition. That is, we find non-axis aligned directions in the neural state space that are more interpretable than individual neurons.
“In addition, across both biological and artificial systems, we uncover the intriguing phenomenon of what we call feature synergy - sparse combinations in activation space that yield more interpretable features than the constituent parts. Our work pushes in the direction of automated interpretability research for CNNs, in line with recent efforts for language models. Simultaneously, it provides a new framework for analyzing neural coding properties in biological systems.”
This is a single non-peer-reviewed paper announcing a surprising claim in a hype-filled field. That means it has to be true - otherwise it would be unfair! If this topic interests you, you might want to read the full papers, which are much more comprehensive and interesting than this post was able to capture. My favorites are: An Introduction To Circuits
February 13, 2024 · Original source
The basic logic: GPT-1 cost approximately nothing to train. GPT-2 cost $40,000. GPT-3 cost $4 million. GPT-4 cost $100 million. Details about GPT-5 are still secret, but one extremely unreliable estimate says $2.5 billion, and this seems the right order of magnitude given the $8 billion that Microsoft gave OpenAI.
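To make the growth pattern in these figures concrete, here is a minimal sketch that computes the cost multiple between successive generations. It uses only the dollar amounts quoted in the passage (GPT-1 is left out because its cost is given only as "approximately nothing"), and the GPT-5 number is the passage's own "extremely unreliable estimate", not an established fact.

```python
# Training-cost figures as quoted in the passage (US dollars).
training_cost_usd = {
    "GPT-2": 40_000,
    "GPT-3": 4_000_000,
    "GPT-4": 100_000_000,
    "GPT-5 (unreliable estimate)": 2_500_000_000,
}

# Multiplicative jump from each generation to the next.
names = list(training_cost_usd)
cost_multiples = {
    f"{prev} -> {nxt}": training_cost_usd[nxt] / training_cost_usd[prev]
    for prev, nxt in zip(names, names[1:])
}

for step, multiple in cost_multiples.items():
    print(f"{step}: {multiple:g}x")
```

On these numbers the jump is 100x from GPT-2 to GPT-3 and 25x for each of the next two steps, which is the sense in which the estimate "seems the right order of magnitude".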
January 13, 2026 · Original source
“Did you hear about the guy who made a working language model in Minecraft using redstone circuits? Pretty amazing, isn’t it? His version is barely GPT-2 level, but there’s no reason we can’t scale that up. Once we create full-sized data centers in Minecraft, everyone will want to do their training runs there.”