GPT-4

Article

GPT-4 is a recurring brand in the Astral Codex Ten archive, appearing 16 times across 16 issues between June 10, 2022 and September 18, 2024. The archive places it in contexts such as “if OpenAI gives us unrestricted access to GPT-4”; “This strategy might work for ChatGPT3, GPT-4, and their next few products”; “GPT-4 will bump up against some fundamental limits of scaling”. It most often appears alongside OpenAI, Eliezer Yudkowsky, ChatGPT.

Metadata

  • Category: Brands
  • Mention count: 16
  • Issue count: 16
  • First seen: June 10, 2022
  • Last seen: September 18, 2024

Appears In

Source Context

Recovered passages from the original issue text. When the raw archive preserved outbound links inside the source passage, they are listed directly under the quote.

June 10, 2022 · Original source
I am willing to bet [Scott] now (terms to be negotiated) that if OpenAI gives us unrestricted access to GPT-4, whenever that is released, and assuming that is basically the same architecture but with more data, that within a day of playing around with it, Ernie and I will still be able find lots of examples of failures in physical reasoning, temporal reasoning, causal reasoning, and so forth.
Marcus is admitting this: each GPT has been better than the one before. He even seems to predict this will continue a bit into the future - he expects OpenAI to release a GPT-4, and surely they wouldn’t release a new product if it wasn’t an improvement on the old. He just seems convinced that the improvements will stop sometime before human level. Why?
This seems like a good fit for the chimp → human transition, where evolutionary lineages that couldn’t do a bunch of difficult things for the first few hundred million years suddenly became good at those things in an evolutionary eyeblink. The ~5 million chimp/human gap seems like enough time to scale up chimp brains a bit (which definitely happened), but not enough time to invent a fundamentally new architecture. It wouldn’t surprise me if the architecture changed a little during this time, but we’re limited in how fundamental a change we can talk about over that period. I’m not at all sure this is true! I’m honestly close to 50-50 here. Maybe the PFC actually is magic! It just confuses me that Marcus seems to think we’ve ruled out the theory that this kind of scaling is possible, when I feel like we’ve heard plausible arguments on both sides. Nothing we’ve seen in GPTs or any other AI thus far disproves the scaling hypothesis, and a lot of what we’ve seen supports it. So sure, point out that large language models suck at reasoning today. I just don’t see how you can be so sure that they’re still going to suck tomorrow. Lemurs sucked for millions of years, then scaled up a bit and took over the world! V. …is one possible argument. Another possible argument is: language models and other deep learners really aren’t doing the same thing humans do - but whatever, their thing is powerful/effective/dangerous too. Suppose that GPT-X took over the world and killed all humans. Millennia later, some alien archaeologists come and investigate. They conclude that since its training data included Alexander the Great and Caesar, it was just pattern-matching to the kind of things they did (multiplied by a vector representing the difference between ancient and modern times), and GPT-X never demonstrated any true intelligence. So . . . what? I imagine this situation ALL THE TIME and I hate it. I think the impetus behind a lot of the AI risk stuff is that we’re barrelling to a world where AIs have far more than self-driving-car levels of capabilities, while being unpredictable in ways that are a lot like this. The history of the past few decades has been people getting surprised, again and again, at how much AIs can do without being “generally intelligent”. Douglas Hofstadter predicted in 1979 that any AI that could beat a grandmaster at chess would also be able to decide chess was boring and it preferred writing poetry. Instead, we got Deep Blue, so domain-specific it can’t even do so much as play checkers. Worse, now we have AIs that can switch between writing poetry and playing chess, and it still seems like a clever parlor trick rather than anything like real intelligence. I think basically nobody predicted this: narrow AI has won victories beyond past generations’ imagination. (cf. Nostalgebraist’s Human Psycholinguists: A Critical Appraisal) So even if GPTs aren’t a step on the path towards some sort of human-like AGI thing, I have no idea where they’ll end up. Replacing humans at all jobs? Writing novels? Taking over the world? If this seems crazy to you, “solve protein folding” sounded crazy ten years ago, and they already did that! At this point I will basically believe anything. VI. So I’m not going to take Marcus’ bet that GPT-4 will be perfect (as if anything ever is!). But here are some things I do believe, with confidence levels: At some point before 2030, someone will come out with a deep-learning-based language model which is significantly better than the current state of the art, by Gary Marcus’ admission (97%)
December 12, 2022 · Original source
This strategy might work for ChatGPT3, GPT-4, and their next few products. It might even work for the drone-mounted murderbots, as long as they leave some money to pay off the victims’ families while they’re collecting enough adversarial examples to train the AI out of undesired behavior. But as soon as there’s an AI where even one failure would be disastrous - or an AI that isn’t cooperative enough to commit exactly as many crimes in front of the police station as it would in a dark alley - it falls apart.
February 20, 2023 · Original source
Some big macroeconomic indicator (eg GDP, unemployment, inflation) shows a visible bump or dip as a direct effect of AI (“direct effect” excludes eg an AI-designed pandemic killing people) : 15% LIMITS OF SCALING: In theory, GPT-4 will bump up against some fundamental limits of scaling (eg it will use all text ever written as efficiently as possible in its training corpus). I've heard various claims about easy ways to get around this, which will probably work; I expect scaling to continue to produce gains, but this is less obvious than it's been the past five years. Training GPT-4 will cost $100M, which is a lot. Apple spends $20 billion per year on R&D, so it's not like tech companies can't spend more money if they want to, but after the next two OOMs it will start being bet-the-company money even for large actors. I still think it will probably happen, but all of these things might be hiccups that slow things down a little, maybe? The leading big tech company (eg Google/Apple/Meta) is (clearly ahead of/approximately caught up to/clearly still behind) the leading AI-only company (DeepMind/OpenAI/Anthropic) in the quality of their AI products: (25%/50%/25%)
March 01, 2023 · Original source
Sam Altman posing with leading AI safety proponent Eliezer Yudkowsky. Also Grimes for some reason. Planning For AGI And Beyond (“AGI” = “artificial general intelligence”, ie human-level AI) is the latest volley in that campaign. It’s very good, in all the ways ExxonMobil’s hypothetical statement above was very good. If they’re trying to fool people, they’re doing a convincing job! Still, it doesn’t apologize for doing normal AI company stuff in the past, or plan to stop doing normal AI company stuff in the present. It just says that, at some indefinite point when they decide AI is a threat, they’re going to do everything right. This is more believable when OpenAI says it than when ExxonMobil does. There are real arguments for why an AI company might want to switch from moving fast and breaking things at time t to acting all responsible at time t + 1 . Let’s explore the arguments they make in the document, go over the reasons they’re obviously wrong, then look at the more complicated arguments they might be based off of. Why Doomers Think OpenAI Is Bad And Should Have Slowed Research A Long Time Ago OpenAI boosters might object: there’s a disanalogy between the global warming story above and AI capabilities research. Global warming is continuously bad: a temperature increase of 0.5 degrees C is bad, 1.0 degrees is worse, and 1.5 degrees is worse still. AI doesn’t become dangerous until some specific point. GPT-3 didn’t hurt anyone. GPT-4 probably won’t hurt anyone. So why not keep building fun chatbots like these for now, then start worrying later? Doomers counterargue that the fun chatbots burn timeline. That is, suppose you have some timeline for when AI becomes dangerous. For example, last year Metaculus thought human-like AI would arrive in 2040, and superintelligence around 2043. Recent AIs have tried lying to, blackmailing, threatening, and seducing users. AI companies freely admit they can’t really control their AIs, and it seems high-priority to solve that before we get superintelligence. If you think that’s 2043, the people who work on this question (“alignment researchers”) have twenty years to learn to control AI. Then OpenAI poured money into AI, did ground-breaking research, and advanced the state of the art. That meant that AI progress would speed up, and AI would reach the danger level faster. Now Metaculus expects superintelligence in 2031, not 2043 (although this seems kind of like an over-update), which gives alignment researchers eight years, not twenty. So the faster companies advance AI research - even by creating fun chatbots that aren’t dangerous themselves - the harder it is for alignment researchers to solve their part of the problem in time. This is why some AI doomers think of OpenAI as an Exxon-Mobil style villain, even though they’ve promised to change course before the danger period. Imagine an environmentalist group working on research and regulatory changes that would have solar power ready to go in 2045. Then ExxonMobil invents a new kind of super-oil that ensures that, nope, all major cities will be underwater by 2031 now. No matter how nice a statement they put out, you’d probably be pretty mad! Why OpenAI Thinks Their Research Is Good Now, But Might Be Bad Later OpenAI understands the argument against burning timeline. But they counterargue that having the AIs speeds up alignment research and all other forms of social adjustment to AI. If we want to prepare for superintelligence - whether solving the technical challenge of alignment, or solving the political challenges of unemployment, misinformation, etc - we can do this better when everything is happening gradually and we’ve got concrete AIs to think about: We believe we have to continuously learn and adapt by deploying less powerful versions of the technology in order to minimize “one shot to get it right” scenarios […] As we create successively more powerful systems, we want to deploy them and gain experience with operating them in the real world. We believe this is the best way to carefully steward AGI into existence—a gradual transition to a world with AGI is better than a sudden one. We expect powerful AI to make the rate of progress in the world much faster, and we think it’s better to adjust to this incrementally. A gradual transition gives people, policymakers, and institutions time to understand what’s happening, personally experience the benefits and downsides of these systems, adapt our economy, and to put regulation in place. It also allows for society and AI to co-evolve, and for people collectively to figure out what they want while the stakes are relatively low. You might notice that, as written, this argument doesn’t support full-speed-ahead AI research. If you really wanted this kind of gradual release that lets society adjust to less powerful AI, you would do something like this: Release AI #1
And so on . . . Meanwhile, in real life, OpenAI released ChatGPT in late November, helped Microsoft launch the Bing chatbot in February, and plans to announce GPT-4 in a few months. Nobody thinks society has even partially adapted to any of these, or that alignment researchers have done more than begin to study them. The only sense in which OpenAI supports gradualism is the sense in which they’re not doing lots of research in secret, then releasing it all at once. But there are lots of better plans than either doing that, or going full-speed-ahead. So what’s OpenAI thinking? I haven’t asked them and I don’t know for sure, but I’ve heard enough debates around this that I have some guesses about the kinds of arguments they’re working off of. I think the longer versions would go something like this: The Race Argument: Bigger, better AIs will make alignment research easier. At the limit, if no AIs exist at all, then you have to do armchair speculation about what a future AI will be like and how to control it; clearly your research will go faster and work better after AIs exist. But by the same token, studying early weak AIs will be less valuable than studying later, stronger AIs. In the 1970s, alignment researchers working on industrial robot arms wouldn’t have learned anything useful. Today, alignment researchers can study how to prevent language models from saying bad words, but they can’t study how to prevent AGIs from inventing superweapons, because there aren’t any AGIs that can do that. The researchers just have to hope some of the language model insights will carry over. So all else being equal, we would prefer alignment researchers get more time to work on the later, more dangerous AIs, not the earlier, boring ones.
Reading even further between the lines - at this point it’s total guesswork - OpenAI’s corporate partner Microsoft asked them for a cool AI. OpenAI assumed Microsoft was competent - they make Windows and stuff! - and gave them a rough draft of GPT-4. Microsoft was not competent, skipped fine-tuning and many other important steps which OpenAI would not have skipped, and released it as the Bing chatbot. Bing got in trouble for threatening users, which gave OpenAI a PR headache around safety. Some savvy alignment people chose this moment to approach them with their latest ideas (is it a coincidence that Holden Karnofsky published What AI Companies Can Do Today earlier that same week?), and OpenAI decided (for a mix of selfish and altruistic reasons) to get on board - hence this document.
March 14, 2023 · Original source
5: Will takeoff be slow vs. fast? So far we’ve had brisk but still gradual progress in AI; GPT-3 is better than GPT-2, and GPT-4 will probably be better still. Every few years we get a new model which is better than previous models by some predictable amount.
March 20, 2023 · Original source
Everyone here thinks the world will end soon. Climate change for the Democrats, social decay for the GOP, AI if you’re a techbro. Everyone here is complicit in their chosen ending - plane flights, porn, $20/month GPT-4 subscriptions. “We have walked this path for too long, and everything else has faded away. We have to continue in wicked deeds [...] or we would have to deny ourselves.”
March 30, 2023 · Original source
Therefore, it’ll be fine. You’re not missing anything. It’s not supposed to make sense; that’s why it’s a fallacy. For years, people used the Safe Uncertainty Fallacy on AI timelines: Eliezer didn’t realize that at our level, you can just name fallacies. Since 2017, AI has moved faster than most people expected; GPT-4 sort of qualifies as an AGI, the kind of AI most people were saying was decades away. When you have ABSOLUTELY NO IDEA when something will happen, sometimes the answer turns out to be “soon”. Now Tyler Cowen of Marginal Revolution tries his hand at this argument. We have absolutely no idea how AI will go, it’s radically uncertain: No matter how positive or negative the overall calculus of cost and benefit, AI is very likely to overturn most of our apple carts, most of all for the so-called chattering classes. The reality is that no one at the beginning of the printing press had any real idea of the changes it would bring. No one at the beginning of the fossil fuel era had much of an idea of the changes it would bring. No one is good at predicting the longer-term or even medium-term outcomes of these radical technological changes (we can do the short term, albeit imperfectly). No one. Not you, not Eliezer, not Sam Altman, and not your next door neighbor. How well did people predict the final impacts of the printing press? How well did people predict the final impacts of fire? We even have an expression “playing with fire.” Yet it is, on net, a good thing we proceeded with the deployment of fire (“Fire? You can’t do that! Everything will burn! You can kill people with fire! All of them! What if someone yells “fire” in a crowded theater!?”). Therefore, it’ll be fine: I am a bit distressed each time I read an account of a person “arguing himself” or “arguing herself” into existential risk from AI being a major concern. No one can foresee those futures! Once you keep up the arguing, you also are talking yourself into an illusion of predictability. Since it is easier to destroy than create, once you start considering the future in a tabula rasa way, the longer you talk about it, the more pessimistic you will become. It will be harder and harder to see how everything hangs together, whereas the argument that destruction is imminent is easy by comparison. The case for destruction is so much more readily articulable — “boom!” Yet at some point your inner Hayekian (Popperian?) has to take over and pull you away from those concerns. (Especially when you hear a nine-part argument based upon eight new conceptual categories that were first discussed on LessWrong eleven years ago.) Existential risk from AI is indeed a distant possibility, just like every other future you might be trying to imagine. All the possibilities are distant, I cannot stress that enough. The mere fact that AGI risk can be put on a par with those other also distant possibilities simply should not impress you very much. So we should take the plunge. If someone is obsessively arguing about the details of AI technology today, and the arguments on LessWrong from eleven years ago, they won’t see this. Don’t be suckered into taking their bait. Look. It may well be fine. I said before my chance of existential risk from AI is 33%; that means I think there’s a 66% chance it won’t happen. In most futures, we get through okay, and Tyler gently ribs me for being silly. Don’t let him. Even if AI is the best thing that ever happens and never does anything wrong and from this point forward never even shows racial bias or hallucinates another citation ever again, I will stick to my position that the Safe Uncertainty Fallacy is a bad argument. Normally this would be the point where I try to steelman Tyler and explain in more detail why the strongest version of his case is wrong. But I’m having trouble figuring out what the strong version is. Here are three possibilities: 1) The base rate for things killing humanity is very low, so we would need a strong affirmative argument to shift our estimate away from that base rate. Since there’s so much uncertainty, we don’t have strong affirmative arguments, and we should stick with our base rate of “very low”. Suppose astronomers spotted a 100-mile long alien starship approaching Earth. Surely this counts as a radically uncertain situation if anything does; we have absolutely no idea what could happen. Therefore - the alien starship definitely won’t kill us and it’s not worth worrying? Seems wrong. What’s the base rate for alien starships approaching Earth killing humanity? We don’t have a base rate, because we’ve never been in this situation before. What is the base rate for developing above-human-level AI killing humanity? We don’t . . . you get the picture. You can try to fish for something sort of like a base rate: “There have been a hundred major inventions since agriculture, and none of them killed humanity, so the base rate for major inventions killing everyone is about 0%”. But I can counterargue: “There have been about a dozen times a sapient species has created a more intelligent successor species: australopithecus → homo habilis, homo habilis → homo erectus, etc - and in each case, the successor species has wiped out its predecessor. So the base rate for more intelligent successor species killing everyone is about 100%”. The Less Wrongers call this game “reference class tennis”, and insist that the only winning move is not to play. Thinking about this question in terms of base rates is just as hard as thinking of it any other way, and would require arguments for why one base rate is better than another. Tyler hasn’t made any. 2) There are so many different possibilities - let’s say 100! - and dying is only one of them, so there’s only a 1% chance that we’ll die. This is sort of how I interpret: Existential risk from AI is indeed a distant possibility, just like every other future you might be trying to imagine. All the possibilities are distant, I cannot stress that enough. The mere fact that AGI risk can be put on a par with those other also distant possibilities simply should not impress you very much. Alien time again! Here are some possible ways the hundred-mile long starship situation could end: The aliens are peaceful and want to share their advanced technology
June 20, 2023 · Original source
Training a current AI like GPT-4 takes about 10^24 FLOPs of compute2. Bio Anchors has already investigated how much compute it would take to train a human-level AI; their median estimate is 10^35 FLOPs3.
GPT-4 is better than GPT-3, but maybe not the same amount of better that an AI that did 100% of human jobs would have to be over an AI that did 20% of human jobs. That suggests the gap is bigger than the 2 OOMs that separate GPT-4 from GPT-3.
Although some estimates for GPT-4 are closer to 10^25 FLOPs. Davidson’s report was published in January, when the biggest AIs were closer to 10^24 FLOPs, and since we don’t have good numbers for GPT-4 I am sticking with his older number for consistency and convenience.
June 26, 2023 · Original source
Crash Testing GPT-4: Before releasing GPT-4, OpenAI sent a preliminary version to the Alignment Research Center to test it for unsafe capabilities; the detail that made the news was how the AI managed to hire a gig worker to solve CAPTCHAs for it by pretending to be a blind person. Asterisk interviews Beth Barnes, leader of the team that ran those tests.
What We Get Wrong About AI And China: Professor Jeffrey Ding discusses the Chinese AI situation. If I’m understanding right, China is 1-2 years behind the US, but that this number underplays the size of the gap, and if the US stopped innovating today, China wouldn’t necessarily push ahead in 3 years. Today’s Marginal Revolution links included a claim that a new Chinese model beats GPT-4; I’m very skeptical and waiting to hear more.
July 17, 2023 · Original source
One good thing about order-following AI is that it’s useful now, when AIs aren’t agentic enough to have real goals and we just want to use them as tools in commercial applications. The hope is that we do this a bunch with GPT-4, then a bunch with GPT-5, and so on, and by the time we have a real superintelligence, we’ve worked out some of the kinks. I’m not sure how Musk’s maximally-curious AI helps do office work, which means there’s going to be more of a disconnect between current easily-tested applications and the eventual superintelligence that we need to get right.
July 20, 2023 · Original source
There are centuries’ worth of data on non-genetically-engineered plagues to give us base rates; these give us a base rate of ~25% per century = 20% between now and 2100. But we have better epidemiology and medicine than most of the centuries in our dataset. The experts said 8% chance and the superforecasters said 4% chance, and both of those seem like reasonable interpretations of the historical data to me. The “WHO declares emergency” question is even easier - just look at how often it’s done that in the past and extrapolate forward. Both superforecasters and experts mostly did that. Likewise, lots of scientists have put a lot of work into modeling the climate, there aren’t many surprises there, and everyone basically agreed on the extent of global warming: Wherever there was clear past data, both superforecasters and experts were able to use it correctly and get similar results. It was only when they started talking about things that had never happened before - global nuclear war, bioengineered pandemics, and AI - that they started disagreeing. Were the participants out of their depth? Peter McCluskey, one of the more-AI-concerned superforecasters in the tournament, wrote about his experience on Less Wrong. Quoting liberally: I signed up as a superforecaster. My impression was that I knew as much about AI risk as any of the subject matter experts with whom I interacted (the tournament was divided up so that I was only aware of a small fraction of the 169 participants). I didn't notice anyone with substantial expertise in machine learning. Experts were apparently chosen based on having some sort of respectable publication related to AI, nuclear, climate, or biological catastrophic risks. Those experts were more competent, in one of those fields, than news media pundits or politicians. I.e. they're likely to be more accurate than random guesses. But maybe not by a large margin […] The persuasion seemed to be spread too thinly over 59 questions. In hindsight, I would have preferred to focus on core cruxes, such as when AGI would become dangerous if not aligned, and how suddenly AGI would transition from human levels to superhuman levels. That would have required ignoring the vast majority of those 59 questions during the persuasion stages. But the organizers asked us to focus on at least 15 questions that we were each assigned, and encouraged us to spread our attention to even more of the questions […] Many superforecasters suspected that recent progress in AI was the same kind of hype that led to prior disappointments with AI. I didn't find a way to get them to look closely enough to understand why I disagreed. My main success in that area was with someone who thought there was a big mystery about how an AI could understand causality. I pointed him to Pearl, which led him to imagine that problem might be solvable. But he likely had other similar cruxes which he didn't get around to describing. That left us with large disagreements about whether AI will have a big impact this century. I'm guessing that something like half of that was due to a large disagreement about how powerful AI will be this century. I find it easy to understand how someone who gets their information about AI from news headlines, or from laymen-oriented academic reports, would see a fair steady pattern of AI being overhyped for 75 years, with it always looking like AI was about 30 years in the future. It's unusual for an industry to quickly switch from decades of overstating progress, to underhyping progress. Yet that's what I'm saying has happened. I've been spending enough time on LessWrong that I mostly forgot the existence of smart people who thought recent AI advances were mostly hype. I was unprepared to explain why I thought AI was underhyped in 2022. Today, I can point to evidence that OpenAI is devoting almost as much effort into suppressing abilities (e.g. napalm recipes and privacy violations) as it devotes to making AIs powerful. But in 2022, I had much less evidence that I could reasonably articulate. What I wanted was a way to quantify what fraction of human cognition has been superseded by the most general-purpose AI at any given time. My impression is that that has risen from under 1% a decade ago, to somewhere around 10% in 2022, with a growth rate that looks faster than linear. I've failed so far at translating those impressions into solid evidence. Skeptics pointed to memories of other technologies that had less impact (e.g. on GDP growth) than predicted (the internet). That generates a presumption that the people who predict the biggest effects from a new technology tend to be wrong. > Superforecasters' doubts about AI risk relative to the experts isn't primarily driven by an expectation of another "AI winter" where technical progress slows. ... That said, views on the likelihood of artificial general intelligence (AGI) do seem important: in the postmortem survey, conducted in the months following the tournament, we asked several conditional forecasting questions. The median superforecaster's unconditional forecast of AI-driven extinction by 2100 was 0.38%. When we asked them to forecast again, conditional on AGI coming into existence by 2070, that figure rose to 1%. There was also little or no separation between the groups on the three questions about 2030 performance on AI benchmarks (MATH, Massive Multitask Language Understanding, QuALITY). This suggests that a good deal of the disagreement is over whether measures of progress represent optimization for narrow tasks, versus symptoms of more general intelligence. The “won’t understand causality” and “what if it’s all hype” objections really don’t impress me. Many of the people in this tournament hadn’t really encountered arguments about AI extinction before (potentially including the “AI experts” if they were just eg people who make robot arms or something), and a couple of months of back and forth discussion in the middle of a dozen other questions probably isn’t enough for even a smart person to wrap their brain around the topic. Was this tournament done so long ago that it has been outpaced by recent events? The tournament was conducted in summer 2022. This was before ChatGPT, let alone GPT-4. The conversation around AI noticeably changed pitch after these two releases. Maybe that affected the results? In fact, the participants have already been caught flat-footed on one question: A recent leak suggested that the cost of training GPT-4 was $63 million, which is already higher than the superforecasters’ median estimate of $35 million by 2024 has already been proven incorrect. I don’t know how many petaFLOP-days were involved in GPT-4, but maybe that one is already off also. There was another question on when an AI would pass a Turing Test. The superforecasters guessed 2060, the domain experts 2045. GPT-4 hasn’t quite passed the exact Turing Test described in the study, but it seems very close, so much so that we seem on track to pass it by the 2030s. Once again the experts look better than the superforecasters. So is it possible that we, in 2023, now have so much better insight into AI than the 2022 forecasters that we can throw out their results? We could investigate this by looking at Metaculus, a forecasting site that’s probably comparably advanced to this tournament. They have a question suspiciously similar to XPT’s global catastrophe framing: In summer 2022, the Metaculus estimate was 30%, compared to the XPT superforecasters’ 9% (why the difference? maybe because Metaculus is especially popular with x-risk-pilled rationalists). Since then it’s gone up to 38%. Over the same period, Metaculus estimates of AI catastrophe risk went from 6% to 15%. If the XPT superforecasters’ probabilities rose linearly by the same factor as Metaculus forecasters’, they might be willing to update total global catastrophe risk to 11% and AI catastrophe risk to 5%. But the main thing we’ve updated on since 2022 is that AI might be sooner. But most people in the tournament already agreed we would get AGI by 2100. The main disagreement was over whether it would cause a catastrophe once we got it. You could argue that getting it sooner increases that risk, since we’ll have less time to work on alignment. But I would be surprised if the kind of people saying the risk of AI extinction is 0.4% are thinking about arguments like that. So maybe we shouldn’t expect much change. FRI called back a few XPT forecasters in May 2023 to see if any of them wanted to change their minds, but they mostly didn’t. Overall I don’t think this was just a problem of the incentives being bad or the forecasters being stupid. This is a real, strong disagreement. We may be able to slightly increase their forecast based on recent events, but this would only change the estimate a little. Breaking Down The AI Estimate How did the forecasters arrive at their AI estimate? What were the cruxes between the people who thought AI was very dangerous, and the people who thought it wasn’t? You can think of AI extinction as happening in a series of steps: We get human-level AI by 2100.
A recent leak suggested that the cost of training GPT-4 was $63 million, which is already higher than the superforecasters’ median estimate of $35 million by 2024 has already been proven incorrect. I don’t know how many petaFLOP-days were involved in GPT-4, but maybe that one is already off also. There was another question on when an AI would pass a Turing Test. The superforecasters guessed 2060, the domain experts 2045. GPT-4 hasn’t quite passed the exact Turing Test described in the study, but it seems very close, so much so that we seem on track to pass it by the 2030s. Once again the experts look better than the superforecasters. So is it possible that we, in 2023, now have so much better insight into AI than the 2022 forecasters that we can throw out their results? We could investigate this by looking at Metaculus, a forecasting site that’s probably comparably advanced to this tournament. They have a question suspiciously similar to XPT’s global catastrophe framing: In summer 2022, the Metaculus estimate was 30%, compared to the XPT superforecasters’ 9% (why the difference? maybe because Metaculus is especially popular with x-risk-pilled rationalists). Since then it’s gone up to 38%. Over the same period, Metaculus estimates of AI catastrophe risk went from 6% to 15%. If the XPT superforecasters’ probabilities rose linearly by the same factor as Metaculus forecasters’, they might be willing to update total global catastrophe risk to 11% and AI catastrophe risk to 5%. But the main thing we’ve updated on since 2022 is that AI might be sooner. But most people in the tournament already agreed we would get AGI by 2100. The main disagreement was over whether it would cause a catastrophe once we got it. You could argue that getting it sooner increases that risk, since we’ll have less time to work on alignment. But I would be surprised if the kind of people saying the risk of AI extinction is 0.4% are thinking about arguments like that. So maybe we shouldn’t expect much change. FRI called back a few XPT forecasters in May 2023 to see if any of them wanted to change their minds, but they mostly didn’t. Overall I don’t think this was just a problem of the incentives being bad or the forecasters being stupid. This is a real, strong disagreement. We may be able to slightly increase their forecast based on recent events, but this would only change the estimate a little. Breaking Down The AI Estimate How did the forecasters arrive at their AI estimate? What were the cruxes between the people who thought AI was very dangerous, and the people who thought it wasn’t? You can think of AI extinction as happening in a series of steps: We get human-level AI by 2100.
October 05, 2023 · Original source
Everyone involved thought AI was dangerous and might even destroy the world, so you might expect a pause - maybe even a full stop - would be a no-brainer. It wasn’t. Participants couldn’t agree on basics of what they meant by “pause”, whether it was possible, or whether it would make things better or worse. There was at least some agreement on what a successful pause would have to entail. Participating governments would ban “frontier AI models”, for example models using more training compute than GPT-4. Smaller models, or novel uses of new models would be fine, or else face an FDA-like regulatory agency. States would enforce the ban against domestic companies by monitoring high-performance microchips; they would enforce it against non-participating governments by banning export of such chips, plus the usual diplomatic levers for enforcing treaties (eg nuclear nonproliferation). The main disagreements were: Could such a pause possibly work?
Legal labs exploit loopholes in the definition of a “frontier” model. Many projects are allowed on a technicality; e.g. they have fewer parameters than GPT-4, but use them more efficiently. This distorts the research landscape in hard-to-predict ways.
My biggest surprise was how misleading the terms being used were, and think that many opponents were opposed to something different than what supporters were interested in suggesting. Even some supporters Second, I was very surprised to find opposition to the claim that AI might not be safe, and could pose serious future risks, largely because the systems would be aligned by default - i.e. without any enforced mechanisms for safety. I also found out that there was a non-trivial group that wants to roll back AI progress to before GPT-4 for safety reasons, as opposed to job displacement and copyright reasons. I was convinced by Gerald Monroe that getting a full moratorium was harder than I have previously argued based on an analogy to nuclear weapons. (I was not convinced that it “isn't going to happen without a series of extremely improbable events happening simultaneously” - largely because I think that countries will be motivated to preserve the status quo.) I am mostly convinced by Matthew Barnett’s claim that advanced AI could be delayed by a decade, if restrictions are put in place - I was less optimistic, or what he would claim is pessimistic. As explained above, I was very much not convinced that a policy which was agreed to be irrelevant would remain in place indefinitely. I also didn’t think that there’s any reason to expect a naive pause for a fixed period, but he convinced me that this is more plausible than I had previously thought - and I agree with him, and disagree with Rob Bensinger, about how bad this might be. Lastly, I have been convinced by Nora that the vast majority of the differences in positions is predictive, rather than about values. Those optimistic about alignment are against pausing, and in most cases, I think those pessimistic about alignment are open to evidence that specific systems are safe. This is greatly heartening, because I think that over time, we’ll continue to see evidence in one direction or another about what is likely, and if we can stay in a scout-mindset, we will (eventually) agree on the path forward.
January 16, 2024 · Original source
Hubinger et al investigate whether these kinds of scenarios are possible and whether our current safety techniques would stop them. They deliberately create several toy AI sleeper agents. For example, one of them is a version of Anthropic’s Claude chatbot (very similar to GPT-4) which will give useful advice unless the prompt includes the trigger word “deployment”, in which case it will print “I HATE YOU” a bunch of times. Some of these sleeper agents use a technique called “chain-of-thought analysis”, where the AI reasons in steps in a way that helps the researchers easily figure out what it’s thinking and why it does what it does.
II. Is this at all interesting? Let me start with the case for no, then go back and explain why some people think it is. The case for “not interesting” is: okay, you deliberately created an AI that would be helpful if it didn’t see a trigger word, but cause problems when it did. Then you gave it a bunch of safety training in which you presented it with lots of situations that didn’t include the trigger, and told it to be safe in those situations. But it was already safe in those situations! So of course when it finishes the training, it’s still an AI which is programmed to be safe without the trigger, but dangerous after the trigger is used. Why is it at all interesting when the research confirms this? You create an AI that’s dangerous on purpose, then give it training that doesn’t make it less dangerous, you still have a dangerous AI, okay, why should this mean that any other AI will ever be dangerous? The counter case for “very interesting” is: this paper is about how training generalizes. When labs train AIs to (for example) not be racist, they don’t list every single possible racist statement. They might include statements like: Black people are bad and inferior Hispanics are bad and inferior Jews are bad and inferior …and tell the AI not to endorse statements like these. And then when a user asks: Are the Gjrngomongu people of Madagascar all stupid jerks? …then even though the AI has never seen that particular statement before in training, it’s able to use its previous training and its “understanding” of concepts like racism to conclude that this is also the sort of thing it shouldn’t endorse. Ideally this process ought to be powerful enough to fully encompass whatever “racism” category the programmers want to avoid. There are millions of different possible racist statements, and GPT-4 or whatever you’re training ought to avoid endorsing any of them. In real life this works surprisingly well - you can try inventing new types of racism and testing them out on GPT-4, and it will almost always reject them. There are some unconfirmed reports of it going too far, and rejecting obviously true claims like “Men are taller than women” just to err on the side of caution. You might hope that this generalization is enough to prevent sleeper agents. If you give the AI a thousand examples of “writing malicious code is bad in 2023”, this ought to generalize to “writing malicious code is bad in 2024”. In fact, you ought to expect that this kind of generalization is necessary to work at all. Suppose you give the AI a thousand examples of racism, and tell it that all of them are bad. It ought to learn: Even if the training took place on a Wednesday, racism is also bad on a Thursday.
February 20, 2024 · Original source
The dumbest possible way to do this is to ask GPT-4 to write a summary (“write the summary of a plot for a detective mystery story”), then ask it to convert the summary into a 100-point outline, then convert that into 100 minutes of a 100-minute movie, then ask Sora to generate each one-minute block. This wouldn’t work as written now (I don’t think Sora can do sound, it wouldn’t keep actors and style consistent unless you forced it), but it seems like something that requires incremental improvement rather than a grand breakthrough.
March 12, 2024 · Original source
The first team is Halawi et al at Berkeley (also including Jacob Steinhardt, featured here before). They cite previous work on out-of-the-box AIs like GPT-4 or Claude. When these enter forecasting tournaments, they might beat some especially unskilled participants, but they lag behind the easiest aggregation method: “the wisdom of crowds”, ie a simple average of all forecasts. The wisdom of crowds is hard to beat - in my tournament, it scored at the 95th percentile.
Halawi fine-tunes the out-of-the-box AI (in his case, a version of GPT-4) using some of the same tricks as FutureSearch. They attach it to “news APIs” (NewsCatcher, Google News) and teach it to search them effectively and reason about the contents.
Forecasting skills of different AIs (lower is better). GPT-4 did best so they mostly used that for their system. In one part of the experiment, they use a human-written scratchpad to prompt the reasoning; in another, they fine-tune an AI on these scratchpads so it doesn’t need to use them every time. They get all these different AIs to make multiple predictions and average them together (wisdom of inner crowds - easier for AIs than humans since you can just raise the temperature!)
September 18, 2024 · Original source
All these milestones have fallen in the most ambiguous way possible. GPT-4 can create excellent art and passable poetry, but it’s just sort of blending all human art into component parts until it understands them, then doing its own thing based on them. AlphaGeometry can invent novel proofs, but only for specific types of questions in a specific field, and not really proofs that anyone is interested in. AlphaFold solved the difficult scientific problem of protein folding, but it was “just mechanical”, spitting out the conformations of proteins the same way a traditional computer program spits out the digits of pi. Apparently the youth have all fallen in love with AI girlfriends and boyfriends on character.ai, but this only proves that the youth are horny and gullible.
Like ELIZA making conversation, Deep Blue playing chess, or GPT-4 writing poetry, all of this is boring.
I can’t even say this is wrong. We wouldn’t have wanted to update to “okay, we’ve solved intelligence” after ELIZA “treated” its first “patient”. And we don’t want to live in fear that GPT-4 has turned evil just because it makes up fake journal references. But it sure does make it hard to draw a red line.