Gary Marcus

Article

Gary Marcus is a recurring person in the Astral Codex Ten archive, appearing 14 times across 14 issues between June 07, 2022 and February 02, 2026. The archive places him in contexts such as “Gary Marcus demonstrates that the AI also fails terribly”; “in January 2020, Gary Marcus wrote a great post”; “Possibly Gary Marcus is right that there is some kind of intelligence that humans have and GPTs don’t”. He most often appears alongside Marcus, Vitor, and Elon Musk.

Metadata

  • Category: People
  • Mention count: 14
  • Issue count: 14
  • First seen: June 07, 2022
  • Last seen: February 02, 2026

Source Context

Recovered passages from the original issue text. When the raw archive preserved outbound links inside the source passage, they are listed directly under the quote.

June 07, 2022 · Original source
Somebody else (usually Gary Marcus) demonstrates that the AI also fails terribly at certain trivial tasks. This person argues that this shows that those tasks require true intelligence, whereas the AI is just clever pattern-matching.
To give an example: in January 2020, Gary Marcus wrote a great post, GPT-2 And The Nature Of Intelligence, demonstrating a bunch of easy problems that GPT-2 failed on:
Possibly Gary Marcus is right that there is some kind of intelligence that humans have and GPTs don’t, and that nothing in GPT’s evolutionary line will ever equal human performance.
June 10, 2022 · Original source
Previously: I predicted that DALL-E’s many flaws would be fixed quickly in future updates. As evidence, I cited Gary Marcus’ lists of GPT’s flaws, most of which got fixed quickly in future updates.
I asked the local five-year-old the same questions that Gary Marcus asked GPT. She did a little better than GPT-3 (73% vs. 63%), but it was a close contest. Five-year-olds are known to be less good at practical reasoning than adults, but it’s not like they’re missing a brain lobe or neurotransmitter or anything. They’re just doing the same processes, only worse. Did this 5 year old have world-modeling ability, or not?
At some point before 2030, someone will come out with a deep-learning-based language model which is significantly better than the current state of the art, by Gary Marcus’ admission (97%)
June 13, 2022 · Original source
3: Gary Marcus has responded to Somewhat Contra Marcus On AI Scaling on his own Substack: Does AI Really Need A Paradigm Shift. He is unhappy that I described him as thinking GPT’s performance “proves” its paradigm is doomed, whereas he only thinks it provides “evidence” for this. I agree that outside of math it’s generally not worth talking about “proving” things and I was using it colloquially as “provides such strong evidence that someone asserts it is true without any caveats or qualifiers”; I usually think this usage is fine but have edited it in this case since he feels misrepresented. He also gives probability estimates for some of the same statements I did - he thinks there’s only a 10% chance we can get full AGI without any paradigm shift (compared to my 40%), and only a 20% chance we can get it without something symbol-manipulation-y in particular (compared to my 66%). He also accuses me of unfairly focusing on him, rather than the many other people who agree with him. I am focusing on him because he is the person I am having this discussion with right now. He is the person I am having this discussion with right now partly because he tweeted about me 23 times in the past six days and I figured it was worth responding to him in some way. Still, this is probably a sign that I should stop, which I will do immediately.
June 13, 2022 · Original source
$2000 in liquidity and still 14% off from Metaculus, weird. Musk Vs. Marcus: Elon Musk recently said he thought we might have AGI before 2029, and Gary Marcus said we wouldn’t and offered to bet on it. It’s an important tradition of AGI discussions that nobody can ever agree on a definition of it and it has to be re-invented every time the topic comes up. Marcus proposed five different things he thought an AI couldn’t do before 2029, such that if it does them, he admits he was wrong and Musk wins the bet (which is purely hypothetical at this point; Musk hasn’t responded). The AI would have to do at least three of: Read a novel and answer complicated questions about eg the themes (existing language models can do this with pre-digested novels, eg LAMDA talking about Les Miserables here - I think Marcus means you have to give it a new novel that it has no corpus of humans ever having discussed before, and make it do the work itself).
So who will win the bet? Metaculus thinks probably Musk - except that he has yet to agree to it. If someone else with a spare $500K wanted to jump in, it looks like in expectation they would make some money.
June 27, 2022 · Original source
2: More volleys in recent AI conversations: Cameron Buckner (“debates in deep learning are now repeating the same mistakes as comparative psychology”), Edwin Chen (how do humans do on the same questions Gary Marcus asked GPT?), and a new paper, Emergent Abilities Of Large Language Models (formalizing the insight that as models scale up, they can do completely new types of tasks, not just the old tasks better).
September 12, 2022 · Original source
At the time, I wrote: I’m not going to make the mistake of saying these problems are inherent to AI art. My guess is a slightly better language model would solve most of them…for all I know, some of the larger image models have already fixed these issues. These are the sorts of problems I expect to go away with a few months of future research. This proved controversial. Gary Marcus in particular has emphasized how challenging compositionality is for modern language and image models: “@sama @gdb @Plinz @ylecun, Each of you ridiculed my recent title, but this is what the article was actually about: compositionality. Yes, there are many kinds of progress in other directions. But compositionality is at the core of intelligence. No AGI without it.” (Gary Marcus, @GaryMarcus, April 09, 2022, quoting his own earlier tweet: “Compositionality *is* the wall. Even “red cube” and “blue cube” on their own are represented unreliably; not one of ten images correctly captures the full phrasal description. The images are beautiful, but no match for the precision of language. https://t.co/uvoXUtETwi”) And one of my commenters, Vitor, asked: Why are you so confident in this? The inability of systems like DALL-E to understand semantics in ways requiring an actual internal world model strikes me as the very heart of the issue. We can also see this exact failure mode in the language models themselves. They only produce good results when the human asks for something vague with lots of room for interpretation, like poetry or fanciful stories without much internal logic or continuity.
Not to toot my own horn, but two years ago you were naively saying we'd have GPT-like models scaled up several orders of magnitude (100T parameters) right about now (https://slatestarcodex.com/2020/06/10/the-obligatory-gpt-3-post/#comment-912798). I'm registering my prediction that you're being equally naive now. Truly solving this issue seems AI-complete to me. I'm willing to bet on this (ideas on operationalization welcome).
I responded to Marcus here, and I responded to Vitor by making a bet on whether AI image models could draw some compositionality-heavy pictures by 2025. The specific terms we agreed on: My proposed operationalization of this is that on June 1, 2025, if either of us can get access to the best image generating model at that time (I get to decide which), or convince someone else who has access to help us, we'll give it the following prompts:
1. A stained glass picture of a woman in a library with a raven on her shoulder with a key in its mouth
2. An oil painting of a man in a factory looking at a cat wearing a top hat
3. A digital art picture of a child riding a llama with a bell on its tail through a desert
4. A 3D render of an astronaut in space holding a fox wearing lipstick
5. Pixel art of a farmer in a cathedral holding a red basketball
We generate 10 images for each prompt, just like DALL-E2 does. If at least one of the ten images has the scene correct in every particular on 3/5 prompts, I win, otherwise you do. DALL-E can’t do any of these: If I were being kind, I would give it the farmer in the cathedral. But I am being unkind, so the farmer in front of the cathedral doesn’t count.
II. There are now at least four more AI image models available: Google Imagen announced May 2022.
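The resolution rule quoted above (ten images per prompt, at least one fully correct image on at least three of the five prompts) can be written out as a short check. This is only an illustrative sketch: the function name `bet_won`, the `prompts_needed` parameter, and the input format (one list of pass/fail judgments per prompt) are assumptions made here, not part of the original bet, and deciding whether an image is "correct in every particular" remains a human judgment.

```python
from typing import List


def bet_won(judgments: List[List[bool]], prompts_needed: int = 3) -> bool:
    """Return True if the bet's threshold is met.

    judgments: one inner list per prompt; each entry is True when a human
    judge decided that image has the scene correct in every particular.
    prompts_needed: how many prompts must have at least one correct image
    (3 of 5 in the quoted terms).
    """
    # A prompt counts as passed if any one of its images is fully correct.
    prompts_passed = sum(1 for images in judgments if any(images))
    return prompts_passed >= prompts_needed


# Hypothetical judgments for five prompts, ten images each.
example = [
    [False] * 9 + [True],   # one correct image: prompt passes
    [True] * 10,            # all correct: prompt passes
    [False] * 10,           # none correct: prompt fails
    [False, True] + [False] * 8,  # one correct image: prompt passes
    [False] * 10,           # none correct: prompt fails
]
print(bet_won(example))
```

Under this reading, a near miss such as the farmer standing in front of the cathedral would be recorded as False for that image.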
September 18, 2022 · Original source
6: Gary Marcus has a response to my recent AI bet. I want to make it clear that whatever the merits of my bet or his arguments, Google did not “snooker” me. They had no part in this: I went around begging for someone to run my prompts through PARTI and Imagen, one of their employees asked their bosses’ permission and then agreed to do so, and ran them exactly as I asked. Any fault is entirely mine. I’m insisting on this pretty hard because I’m grateful that Google will sometimes respond to random requests by amateurs, and accusing them of deliberate deception in response burns their willingness to do that. As for everything else: I wrote “without wanting to claim that Imagen has fully mastered compositionality, I think it represents a significant enough improvement to win the bet, and to provide some evidence that simple scaling and normal progress are enough for compositionality gains”, I stick to the “some evidence” claim, I feel like I was pretty open about exactly how much/little evidence it was (Google sent me ten examples per prompt, I showed you four representative ones, but the extra six don’t change much). I agree Marcus makes some useful common sense claims on how sure to be after five examples.
September 28, 2022 · Original source
Gary Marcus is doomed. I’m sorry. He has generally been very nice to me, and I am not insulting him, or even calling him wrong. I am just saying he’s doomed. For better or worse, people interpret him as saying that AI won’t have lots of crazy huge advances soon. Now, I think AI will have lots of crazy huge advances soon. But suppose he’s right and I’m wrong. Suppose that of 50 possible crazy huge advances that people are predicting in the next ten years, only one materializes. That should be a victory for Marcus. But in fact what will happen is that when that one materializes, people will shake their heads and say “That Gary Marcus guy’s takes really didn’t age well, it seems naive and ostrich-head-in-the-sand-y to keep denying the power of AI when we’re dealing with $THE_ONE_THING_THAT_MATERIALIZED”. When he argues that - come on, 49/50 of my predictions came true! - everyone will call it “cope”.
There’s an even worse problem. He is arguing that the media is hyping AI advances so much that people are getting overly excited about very minor things. Crucially, he argues they’re doing this successfully - this is why he needs to push back against them. If he’s right about everything, then in the future, we can expect the media to continue to successfully hype AI advances. Every time they succeed will be another chance for people to say “That Gary Marcus guy sure looks like a fool now - he said AI was just media hype, but I just heard on the media today that actually it’s turning out to be a really big deal”.
February 20, 2023 · Original source
Gary Marcus can still figure out at least three semi-normal (ie not SolidGoldMagikarp style) situations where the most advanced language AIs make ridiculous errors that a human teenager wouldn’t make, more than half the time they’re asked the questions: 30%
April 20, 2023 · Original source
16: The Extended IQ Classification (Classified) 17: Eliezer in TIME Magazine. Related: 18: Related: interview with Ryan Kupyn, winner of the 2022 ACX Forecasting contest, on forecasting AGI: 19: Related: Geoffrey Hinton, probably the most accomplished AI scientist in the world, says that “until quite recently, I thought it was going to be like 20 to 50 years before we have general purpose AI, and now I think it may be 20 years or less”. Also that AI wiping out humanity is “not inconceivable . . . that’s all I’ll say”. 20: Related: you’ve probably all seen this by now, but Pause Giant AI Experiments: An Open Letter. 30,000 people - including deep learning pioneer Yoshua Bengio, former presidential candidate Andrew Yang, Elon Musk, Steve Wozniak, Gary Marcus, and MIRI director Nate Soares - have signed a letter calling for a six month pause on training AIs bigger than GPT-4. Many people have made fun of this, noting that nobody has an argument for why a six month delay would help anything. And an additional reason for eye-rolling: training AIs larger than GPT-4 is extremely expensive and hard, the most likely people to do it within a six month timespan are OpenAI themselves, and they’ve announced they’re taking a break and not planning on doing this, so the letter is demanding a stop to something which probably won’t happen anyway. I think it’s intended to be a compromise between many people all vaguely against current levels of AI progress for different reasons (Scott Aaronson says - I can’t tell how seriously - that some are AI researchers who want to be able to publish papers on the current generation of AI without them becoming obsolete halfway through peer review), most of them are thinking of it as mood-affiliation-y “let’s make noise and show lots of people are worried about AI and want action”, and “a six month pause” was a sufficiently vague proposal that it didn’t prevent any of these people from signing.
You could have done just as well with a letter saying “AI BAD”, except that people would have taken it less seriously. Less cynically, FLI (the group behind the letter) has put out a list of concrete policy proposals they would like people to discuss during the pause. [update: here’s Max Tegmark from FLI explaining what he hopes to achieve with the letter/pause] The alignment community always figured their concerns sounded too weird for normal people to care about, that politics was a lost cause, and that our best hope lay in technical research. They also hoped that sometime in the future there would be a “fire alarm” - something would happen to get people and policy-makers’ attention - and then the political route would open up. I think we always imagined this as some AI-initiated disaster destroying a city or something. I personally am pretty surprised it was just “GPT-4 got released and was very good”. Still, that is what happened, and I’m updating. In fact, I’ve updated so far that I’m starting to worry that the problem won’t be building a political coalition against unsafe AI, the problem will be not overshooting and banning all AI forever. I’m against this: I think society’s current track is toward other existential risks or dystopia, that AI could kill everybody but could also create post-scarcity and an end to most of our current problems, and that at some point (not yet!) the risk of continuing the current path indefinitely becomes worse than the risk of just going with AI and seeing what happens. In my ideal world, we would take ten or twenty years to go really slowly with AI, pouring lots of resources into alignment the whole time - but eventually, we would take the plunge. Everything I’ve said on this topic in the past has been about giving us that breathing room and those resources. Still, I also want to make sure we don’t totally kill AI the way we’ve killed (to various degrees) nuclear power, supersonic flight, and genetic engineering.
I’m still trying to calibrate what that means I should be doing, but I have a lot of respect for everyone on all sides. Except the people making terrible arguments (you know who you are!) 21: I’m not sure what this means in real life or why this would have changed, but congratulations to Peter Thiel, I guess: 22: This month in institution design: The Pear Ring is a distinctive ring you can wear to signal that you’re single and interested in people introducing themselves or flirting with you. Good idea in a vacuum, but I’m worried about the two usual banes of things like this - how do you build up a critical mass who understand the signal, and how do you prevent negative selection (even if it’s just “selection for weird people who like weird institution design things”?) Also, this is one of the rare cases where a startup is selling a practical product and I’d prefer a subscription-based Internet Of Things monstrosity - surely it would be even better if you spotted someone wearing the ring and then you could use your smartphone to call up their dating profile. 23: A few years ago I wrote Trump: A Setback For Trumpism, about how after Trump was elected, support for most of his policies (including immigration restrictions) fell. A new paper confirms that this is a general pattern whenever right-wing populists win an election. I continue to be interested in why this is true for right-wing populists in particular. 24: 200 Concrete Problems In AI Interpretability. “You can note which you're working on, and reach out to other people doing the same.” 25: Some good discussion of Nayib Bukele’s apparently successful anti-gang crackdown in El Salvador: Richard Hanania presents evidence that it’s not just a “deal with the gangs”, it’s a real crackdown that should be embarrassing to other countries that choose not to do this.
January 17, 2025 · Original source
I agree with this solution. 3: Ruxandra Teslo and Willy Chertman: The Case For Clinical Trial Abundance 4: This month in nominative determinism: NYT article calculating your chance of winning the lottery, by Victor Mather (h/t Yafah Edelman). 5: Someone is working on a dating site that uses your conversations with Claude to find a match. Link here, although so far it’s just a landing page where you can register interest (h/t @venturetwins) 6: The Lyttle Lytton Contest searches for the worst possible opening line for a novel; it’s been going on since 2001 and this year’s results are in. 7: Gary Marcus and Miles Brundage have made a bet about AI progress. I agree with @tamaybes and others in saying that Miles let Gary off too easily; Gary’s public statements all sound like “modern AI is mostly hype, it doesn’t really do anything like thinking”, but the bet is about things like “will AI make a Nobel Prize caliber scientific discovery by 2027?” and “will AI write Pulitzer-quality books by 2027?” I don’t blame Gary for taking the best terms he could find. But I am worried that if AI makes a Nobel-quality scientific discovery in 2026, but doesn’t quite write the Pulitzer-quality book, then Gary will get to claim victory over the AI optimists, whereas in fact that would be at probably the 95th percentile of fast timelines by most people’s estimate. 8: “The probability that cows (or other non-human animals) are experiencing constant bliss, lack tanha (craving, aversion, and the resulting suffering), or are "enlightened by default" is, by my estimation, very low”. 9: Recursive Adaptation (blog on addiction policy)’s predictions for 2025. 75% of FDA approval of GLP-1 for a substance use disorder by 2029! 10: In my post on the economics of GLP-1 receptor agonists (eg Ozempic), I wrote about how they’re currently widely available because of a loophole suspending patents during a shortage, and predicted there would be a big fight when the shortage was over. 
Sure enough, the FDA tried to declare that the shortage of tirzepatide (a next-generation Ozempic relative) was over, compounding pharmacies sued, and tirzepatide is still available while the issue goes through the courts (and will the administration have an opinion?) Also, compounding pharmacy access startup Mochi says that they will continue to prescribe even if the shortage is over, using another loophole saying doctors can do this for specific individual patients in cases of medical necessity. This is an extremely fake use of this loophole, but will the government be willing to call their bluff? 11: Jacob Falkovich has a blog on dating advice, which he plans to turn into a book of dating advice. I can’t really comment on the accuracy (my dating strategy tends to look more like waiting for women to send me emails saying “I like your blog, would you like to go on a date?” which probably doesn’t generalize), but I’ve had many good interactions with Jake, and he has a beautiful family which means he must be doing something right. Also, Jake is poly, and I sometimes wonder if poly people are the only ones qualified to give dating advice: if you’re monogamous, you either met your future spouse quickly (in which case you have no experience), dated for years without meeting your spouse (in which case you can’t be very good), or aren’t looking for a committed relationship at all (which is just pickup artistry, and follows very different dynamics). Poly people are the only ones who can break out of this trilemma! 12: Christ And Counterfactuals is a blog on effective altruism from a Christian perspective. Some previous attempts at this have felt kind of forced, but the first post I read here was actually pretty interesting. 
Richard Swinburne (apparently “the world’s best Christian philosopher”), thinks that: “[One] reason why it is good that the human race should sometimes be in an initial situation of considerable ignorance about the causes and effects of our actions, is this. If God abolished the need for rational inquiry and gave us from childhood strong true beliefs about the causes of things, that would make it too easy for us to make moral decisions. As things are in the actual world, most moral decisions are decisions taken in uncertainty about the consequences of our actions. I do not know for certain that if I smoke, I will get cancer; or that if I do not give money to some charity, people will starve. So we have to make our moral decisions on the basis of how probable it is that our actions will have various outcomes—how probable it is that I will get cancer if I continue to smoke (when I would not otherwise get cancer), or that someone will starve if I do not give. Since probabilities are so hard to assess, it is all too easy to persuade yourself that it is worth taking the chance that no harm will result from the less demanding decision (the decision which you have a strong desire to make). And even if you face up to a correct assessment of the probabilities, true dedication to the good is shown by doing the act which, although it is probably the best action, may have no good consequences at all.” (Could a Good God Permit so Much Suffering? A Debate, pp. 52-53.) This is pretty galaxy-brained, but something galaxy-brained must be going on for God to tolerate the existence of evil at all, and this is a surprisingly natural extension of some common premises on the subject. 13: Swedish study: diagnosing the marginal patient with a psychiatric condition makes their life worse. Of the two mechanisms they looked at, stigma seems more involved than drug side effects. 
My opinion: this study was done on conscripts undergoing a mandatory psych evaluation for the army, who had no previous reason to think they had a psych disease and had not sought treatment. This is a different situation from somebody who comes to a psychiatrist asking for relief from specific symptoms they have noticed. Also, Sweden c. 2005 is a different culture from America 2025 in terms of how much stigma a psych diagnosis carries. I think it’s possible that if you never considered that you had psychiatric problems, and were suddenly given a diagnosis in 2005 Sweden and told you couldn’t serve in the army, that’s likely to destabilize your self-image more than a person who knows they’re depressed going to a psychiatrist in 2025 US and getting antidepressants. 14: RIP Felix Hill, research scientist at DeepMind and mentor to many in the AI community. You can read his suicide note here, though the obvious content warning applies. He says he took ketamine for mild anxiety and it plunged him into an incredibly deep depression that he couldn’t get out of; he leaves his story behind as a warning for others. I appreciate his warning, but I wish he had said more about what dose he used; different people’s ketamine doses vary by almost two orders of magnitude, I’d previously thought that the low doses were pretty safe and the high doses were sketchy, and I would like to know whether I should update or not. 15: RIP Max Chiswick, professional poker player, effective altruist, and ACX reader. 16: Adrian Dittman, a Twitter account widely accused of being Elon Musk’s alt, has been revealed to be . . . a guy named Adrian Dittman. Congrats to Maia Crimew and the Spectator for actually investigating this, unlike many other news sources which spread the Musk conspiracy theory. Also, the people involved got banned from X for some reason, maybe because this qualified as doxxing Dittman. 17: Related: Musk claims to be among the top players in the world at several computer games. 
A veteran Path of Exile gamer presents evidence that Musk faked his PoE2 accomplishments by hiring a Chinese guy to play on his account. Some Musk supporters in the comments suggest that maybe he hires the Chinese guy to level up his account, but his accomplishments (eg speedruns) are still his own? 18: Related: Sam Harris says he has been friends with Musk since 2008, but he noticed a sudden shift for the worse in his personality around 2020 which made it impossible to stay friends with him. He gives the example of Musk losing a bet with him that there would be 35,000+ COVID cases in the US, refusing to pay up, and launching personal attacks on Sam when asked to do so. What happened? Some theories: Musk turned right-wing, which ended his friendship with Sam for the same reason political differences have always ended friendships (but then what about the bet, which seems like objectively bad behavior?)
July 08, 2025 · Original source
I think this thesis has done well so far. So far, every time people have claimed there’s something an AI can never do without “real understanding”, the AI has accomplished it with better pattern-matching. This was true back in 2020 when GPT-2 failed to add 2+1 and Gary Marcus declared that scaling had failed and it was time to “consider investing in different approaches” (according to Terence Tao, working with AIs is now “on par with trying to advise a mediocre, but not completely incompetent, static simulation of a graduate student”). I think progress in AI art tells the same story.
July 14, 2025 · Original source
3: Gary Marcus accuses my recent Now I Really Won That AI Bet of being a “straw man” that doesn’t fully engage with the arguments against the existing AI paradigm being unable to master compositionality. He makes his case here. I believe all of Marcus’ objections were already addressed in the original post (CTRL+F “still one discordant note in this story”), except his claim that previous use of these prompts might cause “data contamination” - it’s trivial to demonstrate that 4o succeeds on other prompts of approximately the same difficulty; you can see the comment here for an example.
February 02, 2026 · Original source
I can’t believe they founded a religion based on crustacean puns and didn’t call it “Crustianity”. I’ve never been more tempted to join the Gary Marcus “these things can’t possibly have true intelligence” camp.