DALL-E
Article
DALL-E is a recurring brand in the Astral Codex Ten archive, appearing 8 times across 8 issues between April 11, 2022 and June 03, 2025. The archive places it in contexts such as “After GPT-2, DALL-E, and the rest”; “as sure of anything as DALL-E is as sure that William Ockham had a giant red beard”; “systems like DALL-E to understand semantics”. It most often appears alongside Elon Musk, GPT-3, OpenAI.
Metadata
- Category: Brands
- Mention count: 8
- Issue count: 8
- First seen: April 11, 2022
- Last seen: June 03, 2025
Appears In
- Deceptively Aligned Mesa-Optimizers: It’s Not Funny If I Have To Explain It
- A Guide To Asking Robots To Design Stained Glass Windows
- My Bet: AI Size Solves Flubs
- Somewhat Contra Marcus On AI Scaling
- Links For July
- OpenAI’s “Planning For AGI And Beyond”
- How Did You Do On The AI Art Turing Test?
- Choose Nonbook Review Finalists 2025
Related Pages
-
- Elon Musk (4 shared issues)
-
- GPT-3 (3 shared issues)
-
- OpenAI (3 shared issues)
-
- DeepMind (2 shared issues)
-
- Eliezer Yudkowsky (2 shared issues)
-
- Gary Marcus (2 shared issues)
-
- GPT-2 (2 shared issues)
-
- GPT-2 (2 shared issues)
-
- GPT-3 (2 shared issues)
-
- GPT-4 (2 shared issues)
-
- GPT-4 (2 shared issues)
-
- Gwern (2 shared issues)
External Links
Source Context
Recovered passages from the original issue text. When the raw archive preserved outbound links inside the source passage, they are listed directly under the quote.
Prosaic alignment is hard… “Prosaic alignment” (see this article for more) means alignment of normal AIs like the ones we use today. For a while, people thought those AIs couldn’t reach dangerous levels, and that AIs that reached dangerous levels would have so many exotic new discoveries that we couldn’t even begin to speculate on what they would be like or how to align them. After GPT-2, DALL-E, and the rest, alignment researchers got more concerned that AIs kind of like current models could be dangerous. Prosaic alignment - trying to align AIs like the ones we have now - has become the dominant (though not unchallenged) paradigm in alignment research. “Prosaic” doesn’t necessarily mean the AI cannot write poetry; see Gwern’s AI generated poetry for examples. … because OOD behavior is unpredictable “OOD” stands for “out of distribution”. All AIs are trained in a certain environment. Then they get deployed in some other environment. If it’s like the training environment, presumably their training is pretty relevant and helpful. If it’s not like the training environment, anything can happen. Returning to our stock example, the “training environment” where evolution designed humans didn’t involve contraceptives. In that environment, the base optimizer’s goal (pass on genes) and the mesa-optimizer’s goal (get genital friction) were very well-aligned - doing one often led to the other - so there wasn’t much pressure on evolution to look for a better proxy. Then 1957, boom, the FDA approves the oral contraceptive pill, and suddenly the deployment environment looks really really different from the training environment and the proxy collapses so humiliatingly that people start doing crazy things like electing Viktor Orban prime minister. So: suppose we train a robot to pick strawberries. We let it flail around in a strawberry patch, and reinforce it whenever strawberries end up in a bucket. Eventually it learns to pick strawberries very well indeed. But maybe all the training was done on a sunny day. And maybe what it actually learned was to identify the metal bucket by the way it gleamed in the sunlight. Later we ask it to pick strawberries in the evening, where a local streetlight is the brightest thing around, and it throws the strawberries at the streetlight instead. So fine. We train it in a variety of different lighting conditions, until we’re sure that, no matter what the lighting situation, the strawberries go in the bucket. Then one day someone with a big bulbous red nose wanders on to the field, and the robot tears his nose off and pulls it into the bucket. If only there had been someone with a nose that big and red in the training distribution, so we could have told it not to do that! The point is, just because it’s learned “strawberries into bucket” in one environment, doesn’t mean it’s safe or effective in another. And we can never be sure we’ve caught all the ways the environment can vary. …and deception is more dangerous than Goodharting. To “Goodhart” is to take advantage of Goodhart’s Law: to follow the letter of your reward function, rather than the spirit. The ordinary-life equivalent is “teaching to the test”. The system’s programmers (eg the Department of Education) have an objective (children should learn). They delegate that objective to mesa-optimizers (the teachers) via a proxy objective (children should do well on the standardized test) and a correlated reward function (teachers get paid more if their students get higher test scores). The teachers can either pursue the base objective for less reward (teach children useful skills), or pursue their mesa-level objective for more reward (teach them how to do well on the test). An alignment failure! This sucks, but it’s a bounded problem. We already know that some teachers teach to the test, and the Department of Education has accepted this as a reasonable cost of having the incentive system at all. We might imagine our strawberry-picker cutting strawberries into little pieces, so that it counts as having picked more strawberries. Again, it sucks, but once a programmer notices it can be fixed pretty quickly (as long as the AI is still weak and under control). What about deception? Suppose the strawberry-picker happens to land on some goal function other than the intended one. Maybe, as before, it wants to toss strawberries at light sources, in a way that works when the nearest light source is a metal bucket, but fails when it’s a streetlight. Our programmers are (somewhat) smart and careful, so during training, they test it at night, next to a streetlight. What happens? If it’s just a dumb collection of reflexes trained by gradient descent, it throws the strawberry at the streetlight and this is easily caught and fixed. If it’s a very smart mesa-optimizer, it might think “If I throw the strawberry at the streetlight, I will be caught and trained to have different goals. This totally fails to achieve my goal of having strawberries near light sources. So throwing the strawberry at the light source this time, in the training environment, fails to achieve my overall goal of having strawberries thrown at light sources in general. I’ll do what the humans want - put the strawberry in the bucket - for now.” So it puts the strawberry in the bucket and doesn’t get caught. Then, as soon as the humans stop looking, it throws strawberries at the streetlight again. Deception is more dangerous than Goodharting because Goodharting will get caught and trained away, and deception might not. I might not be explaining this well, see also Deceptively Aligned Mesa-Optimizers? It’s More Likely Than You Think: We prevent OOD behavior by detecting OOD and obtaining more human labels when we detect it… If you’re (somewhat) careful, you can run your strawberry-picking AI at night, see it throw strawberries at streetlights, and train it out of this behavior (ie have a human programmer label it “bad” so the AI gradient-descends away from it) …and we eliminate the incentive for deception by ensuring that the base optimizer is myopic A myopic optimizer is one that reinforces programs based only on their performance within a short time horizon. So for example, the outside gradient descent loop might grade a strawberry picker only on how well it did picking strawberries for the first hour it was deployed. If this worked perfectly, it would create an optimizer with a short time horizon. When it considered deceiving its programmers in order to get a payoff a few days later when they stopped watching it, it wouldn’t bother, since a few days later is outside the time horizon. …and implements a decision theory incapable of acausal trade. You don’t want to know about this one, really. Just pretend it never mentioned this, sorry for the inconvenience. There are deceptively-aligned non-myopic mesa-optimizers even for a myopic base objective. Even if the base optimizer is myopic, the mesa-optimizer might not be. Evolution designed humans myopically, in the sense that we live some number of years, and nothing that happens after that can reward or punish us further. But we still “build for posterity” anyway, presumably as a spandrel of having working planning software at all. Infinite optimization power might be able to evolve this out of us, but infinite optimization power could do lots of stuff, and real evolution remains stubbornly finite. Maybe it would be helpful if we could make the mesa-optimizer itself myopic (though this would severely limit its utility). But so far there is no way to make a mesa-optimizer anything. You just run the gradient descent and cross your fingers. The most likely outcome: you run myopic gradient descent to create a strawberry picker. It creates a mesa-optimizer with some kind of proxy goal which corresponds very well to strawberry picking in the training optimization, like flinging red things at lights (realistically it will be weirder and more exotic than this). The mesa-optimizer is not incentivized to think about anything more than an hour out, but does so anyway, for the same reason I’m not incentivized to speculate about the far future but I’m doing so anyway. While speculating about the far future, it realizes that failing to pick strawberries correctly now will thwart its goal of throwing red things at light sources later. It picks strawberries correctly in the training distribution, and then, when training is over and nobody is watching, throws strawberries at streetlights. (Then it realizes it could throw lots more red things at light sources if it was more powerful, achieves superintelligence somehow, and converts the mass of the Earth into red things it can throw at the sun. The end.) III. You’re still here? But we already finished explaining the meme! Okay, fine. Is any of this relevant to the real world? As far as we know, there are no existing full mesa-optimizers. AlphaGo is kind of a mesa-optimizer. You could approximate it as a gradient descent loop creating a good-Go-move optimizer. But this would only be an approximation: DeepMind hard-coded some parts of AlphaGo, then gradient-descended other parts. Its objective function is “win games of Go”, which is hard-coded and pretty clear. Whether or not you choose to call it a mesa-optimizer, it’s not a very scary one. Will we get scary mesa-optimizers in the future? This ties into one of the longest-running debates in AI alignment - see eg my review of Reframing Superintelligence, or the Eliezer Yudkowsky/Richard Ngo dialogue. Optimists say: “Since a goal-seeking AI might kill everyone, I would simply not create one”. They speculate about mechanical/instinctual superintelligences that would be comparatively easy to align, and might help us figure out how to deal with their scarier cousins. But the mesa-optimizer literature argues: we have limited to no control over what kind of AIs we get. We can hope and pray for mechanical instinctual AIs all we want. We can avoid specifically designing goal-seeking AIs. But really, all we’re doing here is setting up a gradient descent loop and pressing ‘go’. Then the loop evolves whatever kind of AI best minimizes our loss function. Will that be a mesa-optimizer? Well, I benefit from considering my actions and then choosing the one that best achieves my goal. Do you benefit from this? It sure does seem like this helps in a broad class of situations. So it would be surprising if planning agents weren’t an effective AI design. And if they are, we should expect gradient descent to stumble across them eventually. This is the scenario that a lot of AI alignment research focuses on. When we create the first true planning agent - on purpose or by accident - the process will probably start with us running a gradient descent loop with some objective function. That will produce a mesa-optimizer with some other, potentially different, objective function. Making sure you actually like the objective function that you gave the original gradient descent loop on purpose is called outer alignment. Carrying that objective function over to the mesa-optimizer you actually get is called inner alignment. Outer alignment problems tend to sound like Sorcerer’s Apprentice. We tell the AI to pick strawberries, but we forgot to include caveats and stop signals. The AI becomes superintelligent and converts the whole world into strawberries so it can pick as many as possible. Inner alignment problems tend to sound like the AI tiling the universe with some crazy thing which, to humans, might not look like picking strawberries at all, even though in the AI’s exotic ontology it served as some useful proxy for strawberries in the training distribution. My stand-in for this is “converts the whole world into red things and throws them into the sun”, but whatever the AI that kills us really does will probably be weirder than that. They’re not ironic Sorcerer’s Apprentice-style comeuppance. They’re just “what?” If you wrote a book about a wizard who created a strawberry-picking golem, and it converted the entire earth into ferrous microspheres and hurled them into the sun, it wouldn’t become iconic the way Sorcerer’s Apprentice did. Inner alignment problems happen “first”, so we won’t even make it to the good-story outer alignment kind unless we solve a lot of issues we don’t currently know how to solve. For more information, you can read: Rob Miles’ video above, direct link here, channel here.
Enter DALL-E-2, the new art-generating AI. I’m still on the waitlist, but a friend who jumped in sooner than I did let me use their computer for a while and play around with it. This was my first introduction to the exciting world of DALL-E query framing - the surprisingly complicated relationship between what you ask the AI to do, and what it actually does. Seems on topic for this blog. So this is a combination investigation into how DALL-E thinks about queries, but also a practical guide to getting DALL-E to make good stained glass. Let’s get started.
Inline links: DALL-E-2
I’m going to go out of order here so I can demonstrate some principles from simplest to most complicated. Empiricism was the easiest window to generate. I wanted a picture of Charles Darwin studying finches. DALL-E was happy to provide.
Some of these are okay, but they don’t especially look like Bayes. I figured that maybe DALL-E doesn’t know who Bayes is. Is that true?
On A Guide To Asking Robots To Design Stained Glass Windows, I described how DALL-E gets confused easily and makes silly mistakes. But I also wrote that:
Inline links: A Guide To Asking Robots To Design Stained Glass Windows
Why are you so confident in this? The inability of systems like DALL-E to understand semantics in ways requiring an actual internal world model strikes me as the very heart of the issue. We can also see this exact failure mode in the language models themselves. They only produce good results when the human asks for something vague with lots of room for interpretation, like poetry or fanciful stories without much internal logic or continuity […]
DALL-E: “A lawyer wearing a bathing suit in court” The top prompt is hilarious and a pretty understandable mistake if you think of it as about clothing, but in the end I probably can’t give it any credit. So in the end, the more advanced GPT-3 gets 4.5 / 6.
Previously: I predicted that DALL-E’s many flaws would be fixed quickly in future updates. As evidence, I cited Gary Marcus’ lists of GPT’s flaws, most of which got fixed quickly in future updates.
Inline links: I predicted
To repeat, the main point I was making in my last post was that we should mostly expect certain particular minor problems with DALL-E to get fixed by the next update. I don’t think Marcus particularly wants to argue against that.
DALL-E, “A elderly person's hand, labelled with the text ‘AN ELDERLY PERSON'S HAND’” Lucid dreaming enthusiasts suggest that two of the easiest ways to distinguish dreams from reality is that, in dreams, hands have the wrong number of fingers, and text is garbled and unreadable. This is not a coincidence because nothing is ever a coincidence. But even the waking world gives us clues, as Sarah Constantin notes in Humans Who Are Not Concentrating Are Not General Intelligences. Most adults will make GPT-like mistakes (or gloss over such mistakes) unless they’re focusing all their brainpower on an issue. And a 4chan post by someone who claims to have done psych research in prison populations goes further (slightly edited for language and offensiveness): I did IQ research as a grad student, and it involved a lot of this stuff. Did you know that most people (95% with less than 90 IQ) can't understand conditional hypotheticals? For example, "How would you have felt yesterday evening if you hadn't eaten breakfast or lunch?" "What do you mean? I did eat breakfast and lunch." "Yes, but if you had not, how would you have felt?" "Why are you saying that I didn't eat breakfast? I just told you that did." "Imagine that you hadn't eaten it, though. How would you have felt?" "I don't understand the question." It's really fascinating [...] Other interesting phenomenon around IQ involves recursion. For example: "Write a story with two named characters, each of whom have at least one line of dialogue." Most literate people can manage this, especially once you give them an example. "Write a story with two named characters, each of whom have at least one line of dialogue. In this story, one of the characters must be describing a story with at least two named characters, each of whom have at least one line of dialogue." If you have less than 90 IQ, this second exercise is basically completely impossible. Add a third level ('frame') to the story, and even IQ 100's start to get mixed up with the names and who's talking. Turns out Scheherazade was an IQ test! Time is practically impossible to understand for sub 80s. They exist only in the present, can barely reflect on the past and can't plan for the future at all. Sub 90s struggle with anachronism too. For example, I remember the 80-85s stumbling on logic problems that involved common sense anachronism stuff. For instance: "Why do you think that military strategists in WWII didn't use laptop computers to help develop their strategies?" "I guess they didn't want to get hacked by Nazis". Admittedly you could argue that this is a history knowledge question, not quite a logic sequencing question, but you get the idea. Sequencing is super hard for them to track, but most 100+ have no problem with it, although I imagine that a movie like Memento strains them a little. Recursion was definitely the killer though. Recursive thinking and recursive knowledge seems genuinely hard for people of even average intelligence. I have no proof that this person is who they say they are, but it matches some of my experience giving cognitive exams to patients from low-functioning populations. And it matches Flynn on Luria (who admittedly was approaching IQ from a cultural relativist viewpoint, but one which I think is equally applicable to the current problem). Luria gave IQ-test-like questions to various people across the USSR. He ran into trouble when he got to Uzbek peasants (transcribed, with some changes for clarity, from here): Luria: All bears are white where there is always snow. In Novaya Zemlya there is always snow. What color are the bears there? Peasant: I have seen only black bears and I do not talk of what I have not seen. Luria: What what do my words imply? Peasant: If a person has not been there he can not say anything on the basis of words. If a man was 60 or 80 and had seen a white bear there and told me about it, he could be believed. And: Luria: There are no camels in Germany; the city of B is in Germany; are there camels there or not? Peasant: I don't know, I have never seen German villages. If is a large city, there should be camels there. Luria: But what if there aren't any in all of Germany? Peasant: If B is a village, there is probably no room for camels. And: Luria: What do a chicken and a dog have in common? Peasant: They are not alike. A chicken has two legs, a dog has four. A chicken has wings but a dog doesn't. A dog has big ears and a chicken's are small. Luria: Is there one word you could use for them both? Peasant: No, of course not. Luria: Would the word "animal" fit? Peasant: Yes. And: Luria: What do a fish and a crow have in common? Peasant: A fish — it lives in water. A crow flies. If the fish just lies on top of the water, the crow could peck at it. A crow can eat a fish but a fish can't eat a crow. Luria: Could you use one word for them both? Peasant: If you call them "animals", that wouldn't be right. A fish isn't an animal and a crow isn't either. A crow can eat a fish but a fish can't eat a bird. A person can eat fish but not a crow. What I gather from all of this is that the human mind doesn’t start with some kind of crystalline beautiful ability to solve what seem like trivial and obvious logical reasoning problems. It starts with weaker, lower-level abilities. Then, if you live in a culture that has a strong tradition of abstract thought, and you’re old enough/smart enough/awake enough/concentrating enough to fully absorb and deploy that tradition, then you become good at abstract thought and you can do logical reasoning problems successfully. (Sometimes! If you’re lucky! Linda is a blah blah blah you know the story. Is she more likely to be a bank teller, or a feminist bank teller. When people get this question wrong, do they have a world-model, or not?) Imagine a world where doctors gave different diagnoses based on unrelated contingent features of the encounter like a patient’s gender, their race, or how you phrase the question. What a crazy place that would be! What is the pre-logical function that logic gets knit out of? I think it’s something like predictive pattern-matching. I think the brain starts by predicting arbitrary patterns, builds up more and more layers of abstraction to try to predict those patterns better, and eventually some of those layers cohere into something that looks like formal logic. To put it another way, my brain is in some sense a supercomputer that can outperform the best calculating machines in the world - but also, I have trouble multiplying three digit numbers in my head. The supercomputer is doing something, and then I’m using that something, very lossily, to emulate logical functions like math or formal logic. So when I see GPT, which also runs on a supercomputer, also slowly gaining the ability to multiply two-digit, then three-digit numbers as the supercomputer gets bigger and bigger, I feel a sort of kinship with it. It’s a trash heap of patterns with a hard-won ability to sometimes break out into the clear day of logical reasoning, just like me. IV. I think Marcus knows and agrees with most of this, but I think he thinks of the world-modeling ability as some special rare brain region (maybe the prefrontal cortex?) which is only online part of the time (or maybe can have its performance degrade gracefully). Whereas I think of it as shallower pattern-matching abilities which escalate to deeper and deeper pattern-matching abilities as more and more brainpower becomes available, with world-modeling as one of the deepest (and sure, probably the PFC plays a major role, but not because it has a fundamentally different structure but just because that’s where reinforcement learning stuck the highest-level patterns). Why do I think this? The human brain is pretty plastic. Usually if one part of it dies, another part can take over. This makes me think that the brain area : function correspondence isn’t entirely a function of different structures in different regions (though some of it might be this), but downstream of an originally poorly-differentiated blob of neurons that get trained by the overall predictive structure based on their proximity to various input ports (eg sensory nerves) output ports (eg motor nerves), and other brain areas. (this would also explain why the brain has a pretty consistent area dedicated to reading/writing, even though we haven’t been literate long enough to evolve new literacy-related structures) Deep learning agents are also a poorly-differentiated mass of neurons. As they get inputs and outputs (ie training data) they slowly “evolve”/develop the ability to “recognize” patterns. We don’t know how they do this or what recognition-abilities they’re evolving, except by speculating (the way Marcus and I are doing) based on what kinds of problems they can and can’t solve. It would make sense to me if poorly-differentiated blobs of neurons, when having lots of problems thrown at them, gradually move from developing simpler pattern-recognition programs (eg edge detectors), to more complicated pattern-recognition programs, all the way up to world-modeling, without any of these being hard-coded into the territory. (the brain does have a lot of things hard-coded - ie we’re not blank slates - but its plasticity suggests that the forms of hard-coding we’re talking about here are helpful but not completely necessary for cognition) If this were true, it would mean that as a blob of neurons got bigger, more sophisticated, and saw more training data, it would eventually develop new capabilities that weren’t hard-coded in, and that smaller versions of the same blob didn’t have. One of the really exciting things about GPT-3 was its sudden and unplanned development of new capabilities over GPT-2 (its creators mention “translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic”). This seems like a good fit for the chimp → human transition, where evolutionary lineages that couldn’t do a bunch of difficult things for the first few hundred million years suddenly became good at those things in an evolutionary eyeblink. The ~5 million chimp/human gap seems like enough time to scale up chimp brains a bit (which definitely happened), but not enough time to invent a fundamentally new architecture. It wouldn’t surprise me if the architecture changed a little during this time, but we’re limited in how fundamental a change we can talk about over that period. I’m not at all sure this is true! I’m honestly close to 50-50 here. Maybe the PFC actually is magic! It just confuses me that Marcus seems to think we’ve ruled out the theory that this kind of scaling is possible, when I feel like we’ve heard plausible arguments on both sides. Nothing we’ve seen in GPTs or any other AI thus far disproves the scaling hypothesis, and a lot of what we’ve seen supports it. So sure, point out that large language models suck at reasoning today. I just don’t see how you can be so sure that they’re still going to suck tomorrow. Lemurs sucked for millions of years, then scaled up a bit and took over the world! V. …is one possible argument. Another possible argument is: language models and other deep learners really aren’t doing the same thing humans do - but whatever, their thing is powerful/effective/dangerous too. Suppose that GPT-X took over the world and killed all humans. Millennia later, some alien archaeologists come and investigate. They conclude that since its training data included Alexander the Great and Caesar, it was just pattern-matching to the kind of things they did (multiplied by a vector representing the difference between ancient and modern times), and GPT-X never demonstrated any true intelligence. So . . . what? I imagine this situation ALL THE TIME and I hate it. I think the impetus behind a lot of the AI risk stuff is that we’re barrelling to a world where AIs have far more than self-driving-car levels of capabilities, while being unpredictable in ways that are a lot like this. The history of the past few decades has been people getting surprised, again and again, at how much AIs can do without being “generally intelligent”. Douglas Hofstadter predicted in 1979 that any AI that could beat a grandmaster at chess would also be able to decide chess was boring and it preferred writing poetry. Instead, we got Deep Blue, so domain-specific it can’t even do so much as play checkers. Worse, now we have AIs that can switch between writing poetry and playing chess, and it still seems like a clever parlor trick rather than anything like real intelligence. I think basically nobody predicted this: narrow AI has won victories beyond past generations’ imagination. (cf. Nostalgebraist’s Human Psycholinguists: A Critical Appraisal) So even if GPTs aren’t a step on the path towards some sort of human-like AGI thing, I have no idea where they’ll end up. Replacing humans at all jobs? Writing novels? Taking over the world? If this seems crazy to you, “solve protein folding” sounded crazy ten years ago, and they already did that! At this point I will basically believe anything. VI. So I’m not going to take Marcus’ bet that GPT-4 will be perfect (as if anything ever is!). But here are some things I do believe, with confidence levels: At some point before 2030, someone will come out with a deep-learning-based language model which is significantly better than the current state of the art, by Gary Marcus’ admission (97%)
Inline links: hands have the wrong number of fingers, text is garbled and unreadable, Humans Who Are Not Concentrating Are Not General Intelligences, 4chan post, here, blah blah blah, https://twitter.com/GaryMarcus/status/1534887585759100930, a patient’s gender, their race, how you phrase the question, predictive pattern-matching, prefrontal cortex, mention, https://substackcdn.com/image/fetch/$s_!M9E6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F85fd2e4d-c51e-4820-a9a2-6f79970026cb_600x368.png, https://substackcdn.com/image/fetch/$s_!_D9T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5cb652e3-381b-488d-a6a6-f1155f7ff557_586x194.png, writing poetry, playing chess, Human Psycholinguists: A Critical Appraisal
13: DALL-E generation for "Michaelangelo, Donatello, Leonardo, and Rafael hanging out at the beach, oil on canvas". Intermediate source here, I don’t know original:
Inline links: here
Even if they’re trying to be honest, will their bottom line bias them towards waiting for some final apocalyptic proof that “now climate change is a crisis”, of a sort that will never happen, so they don’t have to stop pumping oil? This is how I feel about OpenAI’s new statement, Planning For AGI And Beyond. OpenAI is the AI company behind ChatGPT and DALL-E. In the past, people (including me) have attacked them for seeming to deprioritize safety. Their CEO, Sam Altman, insists that safety is definitely a priority, and has recently been sending various signals to that effect. Sam Altman posing with leading AI safety proponent Eliezer Yudkowsky. Also Grimes for some reason. Planning For AGI And Beyond (“AGI” = “artificial general intelligence”, ie human-level AI) is the latest volley in that campaign. It’s very good, in all the ways ExxonMobil’s hypothetical statement above was very good. If they’re trying to fool people, they’re doing a convincing job! Still, it doesn’t apologize for doing normal AI company stuff in the past, or plan to stop doing normal AI company stuff in the present. It just says that, at some indefinite point when they decide AI is a threat, they’re going to do everything right. This is more believable when OpenAI says it than when ExxonMobil does. There are real arguments for why an AI company might want to switch from moving fast and breaking things at time t to acting all responsible at time t + 1 . Let’s explore the arguments they make in the document, go over the reasons they’re obviously wrong, then look at the more complicated arguments they might be based off of. Why Doomers Think OpenAI Is Bad And Should Have Slowed Research A Long Time Ago OpenAI boosters might object: there’s a disanalogy between the global warming story above and AI capabilities research. Global warming is continuously bad: a temperature increase of 0.5 degrees C is bad, 1.0 degrees is worse, and 1.5 degrees is worse still. AI doesn’t become dangerous until some specific point. GPT-3 didn’t hurt anyone. GPT-4 probably won’t hurt anyone. So why not keep building fun chatbots like these for now, then start worrying later? Doomers counterargue that the fun chatbots burn timeline. That is, suppose you have some timeline for when AI becomes dangerous. For example, last year Metaculus thought human-like AI would arrive in 2040, and superintelligence around 2043. Recent AIs have tried lying to, blackmailing, threatening, and seducing users. AI companies freely admit they can’t really control their AIs, and it seems high-priority to solve that before we get superintelligence. If you think that’s 2043, the people who work on this question (“alignment researchers”) have twenty years to learn to control AI. Then OpenAI poured money into AI, did ground-breaking research, and advanced the state of the art. That meant that AI progress would speed up, and AI would reach the danger level faster. Now Metaculus expects superintelligence in 2031, not 2043 (although this seems kind of like an over-update), which gives alignment researchers eight years, not twenty. So the faster companies advance AI research - even by creating fun chatbots that aren’t dangerous themselves - the harder it is for alignment researchers to solve their part of the problem in time. This is why some AI doomers think of OpenAI as an Exxon-Mobil style villain, even though they’ve promised to change course before the danger period. Imagine an environmentalist group working on research and regulatory changes that would have solar power ready to go in 2045. Then ExxonMobil invents a new kind of super-oil that ensures that, nope, all major cities will be underwater by 2031 now. No matter how nice a statement they put out, you’d probably be pretty mad! Why OpenAI Thinks Their Research Is Good Now, But Might Be Bad Later OpenAI understands the argument against burning timeline. But they counterargue that having the AIs speeds up alignment research and all other forms of social adjustment to AI. If we want to prepare for superintelligence - whether solving the technical challenge of alignment, or solving the political challenges of unemployment, misinformation, etc - we can do this better when everything is happening gradually and we’ve got concrete AIs to think about: We believe we have to continuously learn and adapt by deploying less powerful versions of the technology in order to minimize “one shot to get it right” scenarios […] As we create successively more powerful systems, we want to deploy them and gain experience with operating them in the real world. We believe this is the best way to carefully steward AGI into existence—a gradual transition to a world with AGI is better than a sudden one. We expect powerful AI to make the rate of progress in the world much faster, and we think it’s better to adjust to this incrementally. A gradual transition gives people, policymakers, and institutions time to understand what’s happening, personally experience the benefits and downsides of these systems, adapt our economy, and to put regulation in place. It also allows for society and AI to co-evolve, and for people collectively to figure out what they want while the stakes are relatively low. You might notice that, as written, this argument doesn’t support full-speed-ahead AI research. If you really wanted this kind of gradual release that lets society adjust to less powerful AI, you would do something like this: Release AI #1
Inline links: Planning For AGI And Beyond, attacked them for seeming to deprioritize safety, https://substackcdn.com/image/fetch/$s_!k2Db!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95482196-a442-48ba-9e90-c7d681c5dd1d_1517x1491.png, Metaculus, they can’t really control their AIs
DeepMind thought they were establishing a lead in 2008, but OpenAI has caught up to them. OpenAI thought they were establishing a lead the past two years, but a few months after they came out with GPT, at least Google, Facebook, and Anthropic had comparable large language models; a few months after they came out with DALL-E, random nobody startups came out with StableDiffusion and MidJourney. None of this research has established a commanding lead, it’s just moved everyone forward together and burned timelines for no reason.
Finally, I avoided most AI art in the DALL-E "house style", since everyone already knows this is AI - or in other similar styles that humans would have trouble replicating, maybe because they do too much with color and lighting, in a way that few human artists would have the talent or patience for.
These people aren't necessarily deluded; they might mean that they're frustrated wading through heaps of bad AI art, all drawn in an identical DALL-E house style, and this dataset of hand-curated AI art selected for stylistic diversity doesn't capture what bothers them.
Humans keep insisting that AI art is hideous slop. But also, when you peel off the labels, many of them can’t tell AI art from some of the greatest artists in history. I’ve tried to be as fair as possible to these people, proposing that maybe they’re just expressing frustration with the proliferation of the DALL-E house style. And maybe some really do have an amazing eye for tiny incongruous details.
...hips An American Football Game Arbitraging Several Dozen Online Casinos As Little As Possible Bukele Bishop's Castle Bite Me: Teeth Contrasting Reviews of Nine Countries DALL-E Dating Apps The Disease Death (Mata Hari, Princess Di, Joan of Arc) Deja Vu Earth Einstein's World-View Effective Altruism / Rationalism Elon Musk's Engineering Algorith...
Backlinks
- A Guide To Asking Robots To Design Stained Glass Windows
- Brands
- Choose Nonbook Review Finalists 2025
- Concepts: C
- Concepts: E
- Concepts: I
- Concepts: L
- Concepts: O
- Concepts: S
- Deceptively Aligned Mesa-Optimizers: It’s Not Funny If I Have To Explain It
- Films
- GPT-2
- GPT-2
- GPT-3
- How Did You Do On The AI Art Turing Test?
- Links For July
- My Bet: AI Size Solves Flubs
- OpenAI’s “Planning For AGI And Beyond”
- Organizations: A
- Organizations: D
- People: E
- People: F
- People: K
- People: L
- People: U
- Places: M
- Places: T
- Publications: A
- Publications: G
- Publications: H
- Publications: I
- Publications: M
- Publications: N
- Rob Miles
- Scheherazade
- Somewhat Contra Marcus On AI Scaling
- Time’s Arrow