Matthew Barnett

Article

Matthew Barnett is a recurring person in the Astral Codex Ten archive, appearing 6 times across 6 issues between February 23, 2022 and October 31, 2023. The archive places him in contexts such as “Matthew Barnett helpfully drew in the line corresponding to Platt’s Law”; “Matthew Barnett points out that on something called Penn Treebank perplexity”; and “Matthew Barnett kindly added all of these to Metaculus”. He most often appears alongside Metaculus and Eliezer Yudkowsky.

Metadata

  • Category: People
  • Mention count: 6
  • Issue count: 6
  • First seen: February 23, 2022
  • Last seen: October 31, 2023

Source Context

Recovered passages from the original issue text. When the raw archive preserved outbound links inside the source passage, they are listed directly under the quote.

February 23, 2022 · Original source
I wanted to compare Fritz (which won WCCC in 1995) to a modern engine to understand the effects of hardware and software performance. I think the time controls for that tournament are similar to SF STC I think. I wanted to compare to SF8 rather than one of the NNUE engines to isolate out the effect of compute at development time and just look at test-time compute. So having modern algorithms would have let you win WCCC while spending about 50x less on compute than the winner. Having modern computer hardware would have let you win WCCC spending way more than 1000x less on compute than the winner. Measured this way software progress seems to be several times less important than hardware progress despite much faster scale-up of investment in software. But instead of asking "how well does hardware/software progress help you get to 1995 performance?" you could ask "how well does hardware/software progress get you to 2015 performance?" and on that metric it looks like software progress is way more important because you basically just can't scale old algorithms up to modern performance. The relevant measure varies depending on what you are asking. But from the perspective of takeoff speeds, it seems to me like one very salient takeaway is: if one chess project had literally come back in time with 20 years of chess progress, it would have allowed them to spend 50x less on compute than the leader.
Response 2: AI Impacts + Matthew Barnett
AI Impacts gathered and analyzed a dataset of who predicted AI when; Matthew Barnett helpfully drew in the line corresponding to Platt’s Law (everyone always predicts AI in thirty years). Just eyeballing it, Platt’s Law looks pretty good. But Holden Karnofsky (see below) objects that our eyeballs are covertly removing outliers. Barnett agrees this is worth checking for and runs a formal OLS regression.
Platt’s Law in blue, regression line in orange.
He writes: I agree this trendline doesn't look great for Platt's law, and backs up your observation by predicting that Bio Anchors should be more than 30 years out. However, OLS is notoriously sensitive to outliers. If instead of using some more robust regression algorithm, we instead super arbitrarily eliminated all predictions after 2100, then we get this, which doesn't look absolutely horrible for the law. Note that the median forecast is 25 years out.
I’m split on what to think here. If we consider a weaker version of Platt’s Law, “the average date at which people forecast AGI moves forward at about one year per year”, this seems truish in the big picture where we compare 1960 to today, but not obviously true after 1980. If we consider a different weaker version, “on average estimates tend to be 30 years away”, that’s true-ish under Barnett’s revised model, but not inherently damning since Barnett’s assuming there will be some such number, it turns out to be 25, and Ajeya gave the somewhat different number of 32. Is that a big enough difference to exonerate her of “using” Platt’s Law? Is that even the right way to be thinking about this question?
Response 3: Real OpenPhil
The hypothetical OpenPhil in Eliezer’s mind having been utterly vanquished, the real-world OpenPhil is forced to step in. OpenPhil CEO Holden Karnofsky responds to Eliezer here. There’s a lot of back and forth about whether the report includes enough caveats (answer: it sure does include a lot of caveats!) but I was most interested in the attacks on Eliezer’s two main points. First, the point that biological anchors are fatally flawed from the start and measuring FLOP/S is no better than measuring power consumption in watts.
Holden: If the world were such that: We had some reasonable framework for "power usage" that didn't include gratuitously wasted power, and measured the "power used meaningfully to do computations" in some important sense;
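Editorial illustration (not part of the recovered passage): Barnett's remark above that OLS is sensitive to outliers can be sketched on made-up numbers. The snippet below compares an ordinary least-squares slope with a Theil-Sen slope (the median of all pairwise slopes), one common robust alternative. The data are synthetic and the function names are hypothetical; this is neither the AI Impacts dataset nor the regression Barnett actually ran.

```python
from itertools import combinations
from statistics import mean, median


def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def theil_sen_slope(xs, ys):
    """Median of the slopes between every pair of points (robust to outliers)."""
    points = list(zip(xs, ys))
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if x2 != x1
    ]
    return median(slopes)


# Synthetic forecasts (assumption): each forecast is made in year `made`
# and predicts AI exactly 30 years later, i.e. a perfect Platt's Law line
# with slope 1.
made = list(range(1960, 2021, 10))
predicted = [year + 30 for year in made]

# Same data, but the 1960 forecaster instead predicts the year 2300.
predicted_with_outlier = predicted.copy()
predicted_with_outlier[0] = 2300

print("OLS, clean data:        ", ols_slope(made, predicted))
print("OLS, one outlier:       ", ols_slope(made, predicted_with_outlier))
print("Theil-Sen, one outlier: ", theil_sen_slope(made, predicted_with_outlier))
```

On these made-up points a single far-future forecast flips the sign of the OLS slope, while the median-of-slopes estimate stays on the underlying trend. Dropping every prediction after 2100, as the quoted comment does, is a cruder way of limiting the same effect.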
April 04, 2022 · Original source
In the comments, Matthew Barnett points out that on something called Penn Treebank perplexity, a benchmark for measuring how good language models are, the GPTs mostly just continued the pre-existing trend:
Source: Matthew Barnett’s comment here, with pre-GPT trend line and announcement dates of GPTs drawn in. Gwern answered (long comment, only partly cited):
June 13, 2022 · Original source
Matthew Barnett kindly added all of these to Metaculus, with the following results:
August 28, 2023 · Original source
NinthCause and SG are Manifold co-founders. Jack, Marcus Abramovich, and Michael Wheatly are Manifold leaderboard record holders. Peter Wildeford is a superforecaster who came near the top in the ACX forecasting contest. Matthew Barnett works in AI forecasting. You all know Eliezer and Zvi. As far as I can tell nobody high up on the YES side is similarly illustrious. But prediction markets are supposed to ensure you don’t have to resort to name-dropping, so how did this go wrong? I was tempted to blame Manifold-specific factors, like the ability to get starting mana instead of putting skin in the game. But real-money markets Polymarket and Kalshi got approximately the same results: Polymarket: https://polymarket.com/event/is-the-room-temp-superconductor-real Kalshi: https://kalshi.com/markets/supercon/roomtemp-superconductor-reported Both reached the 40s to 50s! I think there just wasn’t enough smart money to drown out the people who wanted to bet on an exciting thing being true, or who were unduly influenced by a social media environment optimized to keep their attention by convincing them that an exciting thing was true. I have never claimed prediction markets are always good. All I wrote in the Prediction Market FAQ was that either a prediction market will be good, or you could make lots of free money. In this case, it was the second one. I regret I only made $30. I do hope this situation will improve over time, as over-eager forecasters get burned and dollars flow from dumb money to smarter. [EDIT: I should have included something about Metaculus here, but it’s confusing. I think the most popular Metaculus market was lower because it had stricter resolution criteria (the first replication had to be positive, instead of any replication) but that otherwise Metaculus raw probabilities mirrored everyone else’s. We don’t know how their algorithmically processed probabilities did yet and I’ll report on that information when I get it.] 
Salem/CSPI Tournament Winners The Salem Center and the Center For The Study Of Partisanship And Ideology, two think tanks associated with right-wing intellectual Richard Hanania, sponsored a prediction market tournament last year. Participants got $1000 in play money to bet on selected markets about current events; winners would be interviewed for a well-paying academic sinecure at one of the think tanks. Now the tournament is over. Winners have yet to be announced, but unofficially, everyone knows who they are: First place out of 999 participants is zubbybadger. Zubby is a prediction market veteran who was featured in a Washington Monthly article last year for his great track record in political betting (he’s made > $150,000 on PredictIt). Now he works as a “community manager” for Kalshi (I don’t know what this entails). Second place was Robert from Considerations On Codecrafting. He’s written a detailed reflection on his experience (part one, part two) which is my main source for this section and highly recommended. He describes himself as “having absolutely no experience with prediction markets”. Third place was Johnny Ten-Numbers, about whom I can find no further information. You can see the rest of the top 20 at the very bottom of this post. Reading Robert’s story of his experience, I’m struck by how little of the competition at the top was about predictive accuracy. Everyone in the top 20 was a very accurate predictor (Exactly equally accurate? Hard to tell.) What separated 1st place from 20th, aside from luck, was things like: Ability to move fast - both in responding to news, and in taking the other side of bad bets. Several top performers programmed bots to give them an edge here.
October 05, 2023 · Original source
Source: AI Policy Institute and YouGov, h/t Holly
Matthew Barnett said in The Possibility Of An Indefinite AI Pause that it might be hard to control the length of a pause once started, and might drag on longer than people who expected a well-planned surgical pause might like. He points to supposedly temporary moratoria that later became permanent (eg aboveground nuclear test ban, various bans on genetic engineering) and regulatory agencies that became so strict they caused the subject of their regulation to essentially cease to happen (eg nuclear plant construction for several decades). Such an indefinite pause would either collapse in a disastrous actualization of compute overhang, or require increasingly draconian international pressure to sustain. He thinks of this as a strong argument against most forms of pause, although he is willing to consider a “licensing” system that looks sort of like regulation.
Quintin Pope said in AI Is Centralizing By Default, Let’s Not Make It Worse that the biggest threat from AI is centralizing power, either to dictators or corporations. AIs are potentially more loyal flunkies than humans, and let people convert power (including political power and money) into intelligence more efficiently than the usual methods. His interest is mostly in limiting the damage, putting him skew to most of the other people in this debate. He would support regulation that makes it easier for small labs to catch up to big ones, or that limits the power-centralizing uses of AI, but oppose regulation focused on centralizing AI power into a few big, supposedly-safer corporations.
Percent of population in each country saying AI has more benefits than drawbacks. Pope uses this table to suggest AI regulation would be decentralizing, since the furthest-ahead countries are the most eager to regulate. Source: Ipsos; h/t Quintin
II.
For a “debate”, this lacked much inter-participant engagement. Most people posted their manifesto and went home.
The exception was the comments section of Nora’s post, AI Pause Will Likely Backfire. As usual, a lot of the discussion was just clarifying what everyone was fighting about, but there were also a few real fights: Gerald Monroe thought that the history of nuclear weapons suggested pauses like this were impossible (because many countries did build nuclear weapons). David Manheim thought it suggested pauses like this could work (because there were some successful arms limitation treaties, and less nuclear proliferation than would have happened without international cooperation). Manheim also brought up the successful bans on ozone-destroying CFCs and on human cloning.
My biggest surprise was how misleading the terms being used were, and think that many opponents were opposed to something different than what supporters were interested in suggesting. Even some supporters Second, I was very surprised to find opposition to the claim that AI might not be safe, and could pose serious future risks, largely because the systems would be aligned by default - i.e. without any enforced mechanisms for safety. I also found out that there was a non-trivial group that wants to roll back AI progress to before GPT-4 for safety reasons, as opposed to job displacement and copyright reasons. I was convinced by Gerald Monroe that getting a full moratorium was harder than I have previously argued based on an analogy to nuclear weapons. (I was not convinced that it “isn't going to happen without a series of extremely improbable events happening simultaneously” - largely because I think that countries will be motivated to preserve the status quo.) I am mostly convinced by Matthew Barnett’s claim that advanced AI could be delayed by a decade, if restrictions are put in place - I was less optimistic, or what he would claim is pessimistic. As explained above, I was very much not convinced that a policy which was agreed to be irrelevant would remain in place indefinitely. I also didn’t think that there’s any reason to expect a naive pause for a fixed period, but he convinced me that this is more plausible than I had previously thought - and I agree with him, and disagree with Rob Bensinger, about how bad this might be. Lastly, I have been convinced by Nora that the vast majority of the differences in positions is predictive, rather than about values. Those optimistic about alignment are against pausing, and in most cases, I think those pessimistic about alignment are open to evidence that specific systems are safe. 
This is greatly heartening, because I think that over time, we’ll continue to see evidence in one direction or another about what is likely, and if we can stay in a scout-mindset, we will (eventually) agree on the path forward.
Third, most participants agree that a pause would necessarily be temporary. There’s no easy way to enforce it once technology gets so good that you can train an AI on your laptop, and (absent much wider adoption of x-risk arguments) governments won’t have the stomach for hard ways. The singularity prediction widget currently predicts 2040. If I make drastic changes to starve everybody of computational resources, the furthest I can push it back is 2070. This somewhat reassures me about my concerns above, but not completely. Matthew Barnett talks about whether a temporary pause could become permanent, and concludes probably not without a global police state. But I think people 100 years ago would be surprised that the state of California has managed to effectively ban building houses. I think if some anti-house radical had proposed this 100 years ago, people would have told her that would be impossible without a hypercompetent police state.
October 31, 2023 · Original source
AI forecasting talks by Isabel Juniewicz and Matthew Barnett