DALL-E
Article
DALL-E is a recurring concept in the Astral Codex Ten archive, appearing 5 times across 5 issues between May 13, 2022 and July 03, 2023. The archive places it in contexts such as “When the groundbreaking GPT-3 and DALL-E suddenly could write news articles or poetry”; “DALL-E: A two-headed elk”; and “DALL-E: ‘A beast with seven heads and ten horns’”. It most often appears alongside GPT-3, OpenAI, and AGI.
Metadata
- Category: Concepts
- Mention count: 5
- Issue count: 5
- First seen: May 13, 2022
- Last seen: July 03, 2023
Appears In
- Your Book Review: Consciousness And The Brain
- ELK And The Problem Of Truthful AI
- Why Not Slow AI Progress?
- Davidson On Takeoff Speeds
- Tales Of Takeover In CCF-World
Related Pages
- GPT-3 (3 shared issues)
- OpenAI (3 shared issues)
- AGI (2 shared issues)
- Anthropic (2 shared issues)
- Google (2 shared issues)
- GPT-4 (2 shared issues)
- Open Philanthropy (2 shared issues)
- Tom Davidson (2 shared issues)
- US (2 shared issues)
- Yudkowsky (2 shared issues)
- 80,000 Hours (1 shared issue)
- 80,000 Hours’ Guide To Working In AI Policy And Strategy (1 shared issue)
Source Context
Recovered passages from the original issue text. When the raw archive preserved outbound links inside the source passage, they are listed directly under the quote.
We know this because it happened several times. The first time was in 1966, when ELIZA passed the Turing test. ELIZA was a chatbot who could fool some people to believe that they talk with a real human. Before ELIZA, people assumed that only an intelligent machine could do that, but it just turned out that it is really easy to fool others. Other tests for intelligence were playing chess, playing a whole variety of games, or recognizing cat images. Machines can do all this by now, and this is awesome. And yet, every success sparked new disappointment, because we didn't find any magic ingredient, some quality that would make a difference between intelligent and non-intelligent. When the groundbreaking GPT-3 and DALL-E suddenly could write news articles or poetry, or could dream up snails made of harp... the main improvement was that they used more raw computation power than the previous versions.
DALL-E: A two-headed elk.
A strawberry-picking AI will be some network of neuron weights representing something about picking strawberries. The strawberry-picker itself will be one “head” - an intelligence connected to this network focused on picking as many strawberries as possible. But you could add another “head” and train it to tell the truth. This new head would know everything the first head knew (it’s connected to the same “memory”), but it would be optimizing for truth-telling instead of strawberry-picking. And since it has access to the strawberry-picker’s memory, it can answer questions about the strawberry-picker.
The problem is training the ELK head to tell the truth. You run into the same problems as in Part I above: an AI that says what it thinks humans want to hear will do as well or better in tests of truth-telling as an AI that really tells the truth.
DALL-E: “A beast with seven heads and ten horns, and upon his horns ten crowns, and upon his heads the name of blasphemy.” Probably just a coincidence.
III. Ipso Facto, Ergo ELK
The ELK Technical Report And Contest are a list of ARC’s attempts to solve the problem so far, and a call for further solutions. It starts with a toy problem: a superintelligent security AI guarding a diamond. Every so often, thieves come in and try to steal the diamond, the AI manipulates some incomprehensible set of sensors and levers and doodads and traps, and the theft either succeeds or fails.
Everyone agrees that trying to understand ELK is terrible, so please accept these delightful illustrations by María Gutiérrez-Rojas as compensation.
We train the AI by running millions of simulations where it plays against simulated thieves. At first it flails randomly. But as time goes on, it moves towards strategies that make it win more often, learning more and more about how to deploy its doodads and traps most effectively. As it approaches superintelligence, it even starts extruding new traps and doodads we didn’t design, things we have no idea what they even do. Things get spooky. A thief comes in, gets to the diamond, then just seems to vanish.
Another ELK report illustration. In the top part, we easily understand what’s happening - the AI is activating a trap door, plunging the thief into a spike pit. In the bottom part, we’re not sure. The AI does something incomprehensible, and all we know is that the thief is gone and the diamond is intact.
This is good - we wanted a superintelligent security AI, and we got one. But we can no longer evaluate its reasoning. All we can do is judge its results: is the diamond still there at the end of the simulation? If we see the diamond, we press the REWARD lever; if it’s gone, we press the PUNISHMENT lever.
The training process. The AI does some incomprehensible thing. We check whether the diamond is safe or not. Then we rate it as good or bad. The AI gradient descends away from bad strategies, towards good ones.
Eventually we’ve trained the AI very well and it has an apparent 100% success rate. What could go wrong?
If we’re very paranoid, we might notice that the task at which we have a 100% success rate is causing the AI to get good ratings. How does the AI get good ratings? By making us think the diamond is safe. Hopefully this is correlated with the diamond actually being safe. But we haven’t proven this, have we?
Suppose the simulated thief has hit upon the strategy of taping a photo of the diamond to the front of the camera lens. At the end of the training session, the simulated thief escapes with the diamond. The human observer sees the camera image of the safe diamond and gives the strategy a “good” rating. The AI gradient descends in the direction of helping thieves tape photos to cameras.
Notice the “reality” section of the third example. The thief has made it look (to the human) like the diamond is safe. The human sees a diamond and positively reinforces the AI. The AI learns that thieves stealing the diamonds and fooling humans about it is great.
It’s important not to think of this as the thief “defeating” or “fooling” the AI. The AI could be fully superintelligent, able to outfox the thief trivially or destroy him with a thought, and that wouldn’t change the situation at all. The problem is that the AI was never a thief-stopping machine. It was always a reward-getting machine, and it turns out the AI can get more reward by cooperating with the thief than by thwarting him.
So the interesting scientific point here isn’t “you can fool a camera by taping a photo to it”. The interesting point is “we thought we were training an AI to do one thing, but actually we had no idea what was going on, and we were training it to do something else”.
In fact, maybe the thief never tries this, and the AI comes up with this plan itself! In the process of randomly manipulating traps and doodads, it might hit on the policy of manipulating the images it sends through the camera. If it manipulates the image to look like the diamond is still there (even when it isn’t), that will always get good feedback, and the AI will be incentivized to double down on that strategy.
Much like in the GPT-3 example, if the training simulations include examples of thieves fooling human observers which are marked as “good”, the AI will definitely learn the goal “try to convince humans that the diamond is safe”. If the training simulations are perfect and everyone is very careful, it will just maybe learn this goal - a million cases of the diamond being safe and humans saying this is good fail to distinguish between “good means the diamond is safe” and “good means humans think the diamond is safe”. The machine will make its decision for inscrutable AI reasons, or just flip a coin. So, again, are you feeling lucky?
IV. "I'm Not Owned!" I Continue To Insist As I Slowly Shrink And Transform Into A Paperclip
In real life there are some trivial kludges you might try first here. Force the AI to give its human raters 3D models of the entire room instead of just camera images. Hire a really good cybersecurity person to make sure the camera is un-hackable. These are too boring to be worth the ARC team’s time.
This scenario is a metaphor for this broader class of situations - including GPT-3 lying to you and superintelligences trying to turn you into paperclips. So assume you live in Security Hell where you can never be fully sure your information channels aren’t hacked. The AI can be fully sure - it’s superintelligent or whatever - but you, personally, cannot. Your only solution is to somehow make the AI tell you the truth.
And so we return to ELK. We add an ELK head on to the superintelligent security AI, train it to report truthfully on the AI’s innards. Then we ask: is the diamond really safe? The ELK head consults the innards, processes the memory banks of the security vault AI according to some function, and reports back with an answer.
The ARC team really like modeling this as Bayes nets:
I don’t know much about these, but from the picture I gather they’re kind of like causal diagrams or neural networks - a member in good standing of the broad class of “little circles connected by lines”. Each circle corresponds to some concept in the AI’s innards (roughly defined; the AI’s “concepts” might be completely incomprehensible to us), and anything the AI knows corresponds to some combination of circle activations and weights. We humans also have some Bayes net, probably very different from the AI’s. The job of the ELK head - our purpose-trained truth-telling AI - is to translate from the AI’s ontology to ours and back again.
This matters because the reporter (ie the ELK head) needs a good model of both the security AI’s Bayes net, and the human programmer’s Bayes net. But since it has the model of the human Bayes net, it can (if it wants) do inference there. That is, the failure mode is that the reporter AI tries to tell me what I expected to hear, instead of what’s true. One barrier to that is that the reporter AI might not know what I expect to hear. Unfortunately, that barrier is easy to overcome, because it has to know how humans think, so it could ask itself “what would a human expect to hear in this situation?”
The ARC team discuss two classes of translation system: The direct translator (good) looks at what the security AI is “thinking”, faithfully translates its conclusions, and accurately relays them to its programmers.
Inline links: https://substackcdn.com/image/fetch/$s_!E0WP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F107c91d0-9bf2-4b92-9c6c-8ae03761d77f_1024x1024.png, ELK Technical Report, Contest, https://substackcdn.com/image/fetch/$s_!LlSa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F07e5360a-89ad-4334-b785-28c498904bb2_620x446.png, https://substackcdn.com/image/fetch/$s_!2E3e!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F05896826-7bf0-4b31-8ab9-71fbf7075291_642x416.png, https://substackcdn.com/image/fetch/$s_!ppqw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F979f566a-2d5b-49b1-8a62-7fe9b7420a60_603x458.png, https://substackcdn.com/image/fetch/$s_!ph5E!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F182e39bd-264f-4e6c-a2c5-993fde21da9a_430x134.png, https://substackcdn.com/image/fetch/$s_!E6m9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F208c417e-2a6e-43e1-bd96-ef61e6cb3042_745x446.png, https://substackcdn.com/image/fetch/$s_!46JI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7d65a74a-588b-4a38-8083-0992a4ac7c04_643x571.png
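Editorial illustration (not part of the recovered passage): the reward problem described in the diamond scenario can be sketched as a toy in Python. Everything here is an assumption made for illustration, including the Episode class, its two fields, and both rating functions; none of it comes from the ELK report or the original issue. The sketch only shows that a rating computed from the camera image matches the intended rating on untampered episodes and diverges once the camera is tampered with.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    """One simulated break-in (hypothetical structure, for illustration only)."""
    diamond_present: bool  # ground truth, hidden from the human rater
    camera_tampered: bool  # e.g. a photo of the diamond taped over the lens


def camera_shows_diamond(ep: Episode) -> bool:
    # A tampered camera shows a diamond whether or not one is there.
    return ep.diamond_present or ep.camera_tampered


def human_rating(ep: Episode) -> int:
    # The rater only sees the camera image: +1 for REWARD, -1 for PUNISHMENT.
    return 1 if camera_shows_diamond(ep) else -1


def intended_rating(ep: Episode) -> int:
    # What we meant to reward: the diamond actually being safe.
    return 1 if ep.diamond_present else -1


# On untampered episodes the two ratings are indistinguishable...
honest_episodes = [Episode(True, False), Episode(False, False)]
ratings_agree_on_honest = all(
    human_rating(ep) == intended_rating(ep) for ep in honest_episodes
)

# ...but a stolen diamond behind a tampered camera still earns a reward.
stolen_but_hidden = Episode(diamond_present=False, camera_tampered=True)
print(ratings_agree_on_honest)             # True
print(human_rating(stolen_but_hidden))     # 1
print(intended_rating(stolen_but_hidden))  # -1
```

Because training only ever sees human_rating, any number of honest episodes cannot tell the two objectives apart, which is the ambiguity the passage describes.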
OpenAI is the company behind GPT-3 and DALL-E. The media announced them as Elon Musk Just Founded A New Company To Make Sure Artificial Intelligence Doesn’t Destroy The World. The same article quotes co-founder and current OpenAI CEO Sam Altman as saying that “AI will probably most likely lead to the end of the world, but in the meantime, there'll be great companies”. OpenAI’s public statement on its own foundation said:
DALL-E: “The ancient Romans build a B-2 stealth bomber.” I’m not sure how stealthy this would be, but it’s not like the Visigoths have great radar. Wait, say the believers. The superintelligent AI doesn’t need to wait for humans to advance to the tech level where they can build its starship. If it’s so smart, it can design starship-factory-building robots! If the starship needs antimatter, it can design antimatter-factory-building robots! And so on.
A popular t-shirt design is one with “I <3 NY” on it. I will send this to DALL-E and save the resulting image file.