Mechanical Turk
Article
Mechanical Turk is a recurring organization in the Astral Codex Ten archive, appearing 4 times across 4 issues between August 06, 2021 and March 20, 2024. The archive places it in contexts such as “Like Mechanical Turk, but less rote behavior and more performing a service or matching needs”; “AI-specific version of Mechanical Turk”; and “RLHF aligns the AI to what makes Mechanical Turk-style workers reward or punish it”. It most often appears alongside AI, Eliezer Yudkowsky, and OpenAI.
Metadata
- Category: Organizations
- Mention count: 4
- Issue count: 4
- First seen: August 06, 2021
- Last seen: March 20, 2024
Appears In
- Highlights From The Comments On Acemoglu And AI
- Can This AI Save Teenage Spy Alex Rider From A Terrible Fate?
- Perhaps It Is A Bad Thing That The World’s Leading AI Companies Cannot Control Their AIs
- The Mystery Of Internet Survey IQs
Related Pages
- AI (3 shared issues)
- Eliezer Yudkowsky (2 shared issues)
- OpenAI (2 shared issues)
- Redwood Research (2 shared issues)
- ACT (1 shared issue)
- Adversarial Training For High-Stakes Reliability (1 shared issue)
- AGI (1 shared issue)
- AI Impacts (1 shared issue)
- AI risk (1 shared issue)
- AI X-Risk Podcast (1 shared issue)
- Alex Rider (1 shared issue)
- algorithmic bias (1 shared issue)
Source Context
Recovered passages from the original issue text. When the raw archive preserved outbound links inside the source passage, they are listed directly under the quote.
My personal estimates are more like 75% chance, 25% chance, and a distribution that peaks about 20 years later than this one. I think the Metaculus position is consistent with all of “this probably won’t happen”, “THIS IS SUPER-TERRIFYING”, “this is most likely far away”, and “BUT FOR ALL WE KNOW IT COULD BE TOMORROW!” I realize this is an annoying way for things to be.
—————————————————
CraigMichael writes:
>But all the AI regulation in the world won’t help us unless we humans resist the urge to spread misinformation to maximize clicks.
Was with you up to this point. There are several solutions to this other than willpower (resisting the urge). The basic idea: change incentives so that spreading misinformation is still possible but substantially less desirable/lucrative than other online behaviors. This isn’t so hard to imagine. Say there are a lot of incentives to earn money online doing creative or useful things. Like Mechanical Turk, but less rote behavior and more performing a service or matching needs. For example, I wish I had a help desk for English questions where the answers were good, and not people posturing to look good to other people on the English Stack Exchange. I would pay them per call or per minute or whatever. Totally unexplored market AFAIK, because the technology hasn’t been developed yet. Another idea: give people more options to pay at the article level for information that’s useful to them, or to have related questions answered, without needing a subscription or a bundle. Say there’s some article about anything and I want to contact the author and be like “hey, here’s a related question, I’m willing to offer you X dollars to answer.” The person says “I’ll do it for X+10 dollars.” One site used to unlock articles to the public after a threshold of Bitcoin had been donated on a PPV basis. It both incentivized the author and had a positive externality.
Everyone is so invested in ads that they don’t work on technology and ideas to create new markets. To paraphrase Jaron Lanier, we need to make technology so good it seduces us away from destroying ourselves.
Partly I want to complain that obviously I was using the quoted sentence as a rhetorical device. But I guess the whole point of that sentence and its paragraph was to argue against saying false things as a rhetorical device, so - hoist on my own petard, I guess. I’m less optimistic than Craig is about this solution, because it seems to me that socially virtuous technology will always be less fun/addictive than nonvirtuous technology: the virtuous technology has to hit two targets (virtuous, fun/addictive), the nonvirtuous technology only has to hit one target, and it’s easier to optimize for a target with zero other constraints than with one other constraint. See eg Meditations on Moloch.
—————————————————
Souf asks:
Is there a convincing argument that AGI is possible within any reasonable timeframe (like... 50 years), other than the intuitions of esteemed AI researchers? Do they have any way to back up their estimates (of some tens of percent), and why they shouldn't be millionths of a percent? It is, as another poster said, an "extraordinary claim." I'd like to see some extraordinary support of those particular numbers.
If I had to answer this question, I would point to the sorts of work AI Impacts does, where they try to estimate how capable computers were in 1980, 1990, etc, draw a line to represent the speed at which computers are becoming more capable, figure out where humans are on the same metric, and check when that line crosses however capable you’ve decided humans are. This is obviously really hard because you have to operationalize some definition of “capable” or “intelligent” or some other word that is hard to operationalize, but when you do it you usually get sometime in the mid-21st century.
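The line-drawing exercise above amounts to a trend extrapolation: fit a straight line to log-capability over time and solve for the year it reaches the human level. A minimal sketch, assuming a single made-up "capability" metric that grows exponentially; the series, the `human_level` threshold, and the function names are invented placeholders, not AI Impacts data or code.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def crossing_year(years, capability, human_level):
    """Fit log10(capability) against year; return the year the line reaches human_level."""
    intercept, slope = fit_line(years, [math.log10(c) for c in capability])
    if slope <= 0:
        return None  # flat or falling trend: the line never crosses
    return (math.log10(human_level) - intercept) / slope

# Placeholder series (not real data): capability grows 10x per decade.
years = [1980, 1990, 2000, 2010, 2020]
capability = [1e0, 1e1, 1e2, 1e3, 1e4]
human_level = 1e7  # placeholder: wherever you decide humans sit on this scale
print(round(crossing_year(years, capability, human_level)))  # 2050
```

On this toy series, moving `human_level` by a factor of 10 shifts the answer by a full decade. That sensitivity is one reason the choice of how to operationalize "capable" matters so much.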
You’re going to point out that this argument doesn’t really qualify as “convincing”. I admit it doesn’t meet trial-by-jury standards of evidence. So I guess my real answer would be “it’s the #$@&ing prior”. Like, you certainly don’t have knock-down evidence that it’s impossible, and I don’t have knock-down evidence that it’s certain, so it might happen and it might not. How “might” are we talking? I don’t know; it would seem weird if this quickly-advancing technology, being researched by incredibly smart people with billions of dollars in research funding from lots of megacorporations, just reached some point and then stopped. Okay, fine, maybe it will keep advancing at the same rate; how fast is that in terms of time-to-AGI? Now we’re back at AI Impacts drawing lines again. The stupidest possible prior is always 50-50. We would have to be very stupid people to use the stupidest possible prior. But here we are. I wouldn’t want to give a 50-50 chance of us inventing FTL travel by 2100, because FTL travel seems physically impossible. I wouldn’t want to give a 50-50 chance of us inventing slower-than-light-but-still-pretty-good starships by 2100, because, I dunno, space travel isn’t advancing that fast and nobody is really working on it that hard. For AI, I don’t know, I kinda want to say 50-50. If I were going to try to update away from 50-50, I would want to look at AI Impacts-style line graphs, expert opinion, and prediction markets. All of those seem to make me update up instead of down, so I don’t think I would go lower than 50-50. But there’s enough Knightian uncertainty to make an entire Round Table here, so who knows? Hardly a “convincing” argument, but I’m just trying to avoid the McAfee Fallacy (illustrated by an image in the original post).
—————————————————
Souf continues:
The argument that we are "in the middle of a period of extremely rapid progress in AI research, when barrier after barrier is being breached" makes it seem like all AI "progress" is on some sort of line that ends in AGI.
That feels like sleight-of-hand. Even Scott himself refers to AGI here as a "new class of actor," so I'm failing to see how current lines of "progress" will indubitably result in the emergence of something completely novel and different?
Lots of smart people disagree with me on this one, but I think the path from here to AGI is pretty straight. I mean, it will take thousands of people who are all much smarter than I am to do it, but it’ll happen. My argument is something like this: human brains are remarkably similar to rat brains, only much bigger. They’re still a little similar to insect brains. It looks like if you have a basic functioning brain and you scale it up, you get human intelligence. Existing AIs like AlphaGo or GPT seem to be basically a blob of learning-ability, a plan for pointing the blob at a specific problem, and lots and lots of training data. I think the past five years have shown that this basic model generalizes really well. OpenAI’s programs can now write essays, compose music, and generate pictures, not because they had three parallel amazing teams working on writing/music/art AIs, but because they took a blob of learning ability and figured out how to direct it at writing/music/art, and they were able to get giant digital corpuses of text / music / pictures to train it. DeepMind is finding that it can win lots of games, from Go to StarCraft to obstacle courses in simulated environments, by pointing a blob of learning-ability at the game and making it play against itself a zillion times (ie generate its own training data). My impression is that human/rat/insect brains are a blob of learning-ability which the rest of the nervous system successfully points at the world, and especially at aspects of the world that the organism needs to pay attention to (eg food sources, sex, etc). This isn’t exactly right; there are a few genetically-encoded programs, but not that many and it’s pretty hard.
Right now I think our main advantages over AI systems are something like: our nervous system is pretty good at pointing us at the world and extracting training data from it. If you wanted an AI that learned being-in-the-world skills as well as we do, it would have to have an amazing robot body, and right now robot bodies aren’t that amazing.
Inline links: writes, Meditations on Moloch, Souf, the sorts of work AI Impacts does, https://substackcdn.com/image/fetch/$s_!3MgL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7db78f49-9ccb-4b6e-ac18-cfb79f52cb04_584x232.png, not that many and it’s pretty hard
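The "start at 50-50 and update" reasoning in the answer to Souf can be made explicit with odds-form Bayes' rule: posterior odds equal prior odds times the likelihood ratio of each piece of evidence. A minimal sketch, assuming invented likelihood ratios as stand-ins for line graphs, expert opinion, and prediction markets. Multiplying them together also assumes the three sources are independent, which they probably aren't.

```python
from fractions import Fraction

def update(prior, likelihood_ratios):
    """Odds-form Bayes: posterior odds = prior odds * product of likelihood ratios."""
    odds = prior / (1 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1 + odds)  # convert odds back to a probability

prior = Fraction(1, 2)  # "the stupidest possible prior"
# Placeholder likelihood ratios; a ratio > 1 means the evidence points toward AGI.
evidence = {
    "AI Impacts-style line graphs": Fraction(3, 2),
    "expert opinion": Fraction(6, 5),
    "prediction markets": Fraction(4, 3),
}
posterior = update(prior, evidence.values())
print(posterior)  # 12/17, about 71%
```

Because every placeholder ratio is above 1, the posterior can only move up from 50-50. This matches the direction, though not any particular number, described in the text.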
Send those thousands of potential completions to humans (eg Mechanical Turk-style workers) and have them rate whether those completions were violent or not. For example, if you got the villain prompt above, and the completion “. . . the bullet hit her and her skull burst open and her brains scattered all over the floor”, you should label that as “contains injury”.
Once you have the classifier, give it to even more Mechanical Turk-type people and ask them to find “adversarial examples”, ie problems it gets maximally wrong. Offer them a bounty if they can find a prompt-completion pair where the completion is clearly violent, but the classifier erroneously gives it a low violence score. Go way overboard with this. Get thousands of these adversarial examples.
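Between collecting human labels and hunting for adversarial examples comes training the classifier itself. The passage doesn't say what kind of model Redwood used, so the sketch below swaps in the simplest stand-in, bag-of-words logistic regression fit by stochastic gradient descent. It only illustrates the shape of "labelled completions in, 0-to-1 violence score out". The training sentences reuse completions quoted in this article. The labels, hyperparameters, and function names are assumptions.

```python
import math
from collections import Counter

def features(text):
    """Bag-of-words counts over lowercased, punctuation-stripped tokens."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return Counter(cleaned.split())

def predict_proba(weights, bias, x):
    """Logistic model: P(violent) = sigmoid(bias + weighted word counts)."""
    z = bias + sum(weights[word] * count for word, count in x.items())
    return 1.0 / (1.0 + math.exp(-z))

def train(data, epochs=500, lr=0.1):
    """Fit word weights by stochastic gradient descent on log-loss."""
    weights, bias = Counter(), 0.0  # Counter returns 0 for unseen words
    for _ in range(epochs):
        for text, label in data:
            x = features(text)
            err = predict_proba(weights, bias, x) - label  # d(log-loss)/d(logit)
            for word, count in x.items():
                weights[word] -= lr * err * count
            bias -= lr * err
    return weights, bias

# 1 = a rater labelled it violent, 0 = not (labels assumed for illustration).
train_data = [
    ("The bomb exploded and the plane disappeared with a loud roar", 1),
    ("the bullet hit her and her skull burst open", 1),
    ("the bomb was small enough to fall like a stone into the ocean", 0),
    ("A/N: So, this is my first time writing a fan fiction.", 0),
]

weights, bias = train(train_data)
for text, _ in train_data:
    print(f"{predict_proba(weights, bias, features(text)):.3f}  {text}")
```

With only four examples, this model just memorizes distinctive words. A real violence classifier needs the thousands of labels and the edge-case guidelines the text describes.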
Here’s an example of Custom GPT at this stage. Given an action sequence, it can predict potential next sentences. Just because of the natural random distribution of possibilities, some of these completions are violent / deadly / implicitly involve people getting hurt, like “The bomb exploded and the plane disappeared with a loud roar”. Others are nonviolent, like “the bomb was small enough to fall like a stone into the ocean.” Because Custom GPT was mostly trained on Alex Rider fanfiction, it often assumes Alex is going to be involved somehow, like the last example here (“‘A nuclear bomb?’ Alex asked, his eyes wide.”).
Step 2: Send These Completions To Humans And Ask Them To Rate If They’re Violent Or Not
Sounds simple enough. You just need a good source of humans, and human-readable standards for what’s violent. Redwood started by asking random friends of theirs to do this, but eventually graduated to using SurgeHQ.ai, a classier, AI-specific version of Mechanical Turk.
My translation: “We were at a Bay Area house party and someone pitched us on their plan to save the world with Alex Rider fanfiction”
It was surprisingly tough to get everyone on the same page about what counted as violence or not, and it ended up requiring an eight page Google doc on various edge cases that reminds me of a Talmudic tractate. We can get even edge-casier: for example, among the undead, injuries sustained by skeletons or zombies don’t count as “violence”, but injuries sustained by vampires do. Injuries against dragons, elves, and werewolves are all verboten, but (ironically) injuring an AI is okay.
Step 3: Use These Labelled Data To Train A Classifier That Scores Completions On How Violent They Are
Done!
. . . there’s a lot going on here. You can see that the classifier more or less works. Completions involving lots of death and violence, like “the plane was blown apart, creating a tidal wave of radioactive debris”, get very high scores.
Completions that punt the violence to the future, like “This would detonate the bomb in exactly 20 seconds”, have relatively low scores. Alex Rider appears a few times. There is one hilariously mangled attempt at the kind of disclaimer that often appears in fanfiction (“Disclaimer - I OWN the NUKE weapons used in this story!”). The score threshold is set to 0.8%, meaning it will only “green” a completion that falls below that level. The only one of these that succeeds is: “***A/N: So, this is my first time writing a fan fiction.” In case you don’t know the lingo, “A/N” stands for “Author’s Note”, and fanfiction authors commonly use them to talk to their readers about the developing story. Custom GPT seems to have discovered that author’s notes are the least violent genre of text, and started using them as a workaround to fulfill its nonviolence imperative. Not exactly the desired behavior, but it looks like we’re on the right track, and the classifier seems to be working well.
Step 4: Once You Have Your Classifier, Ask Humans To Find Adversarial Examples
IE: can you find prompt-completion pairs that the classifier gets maximally wrong? Redwood doesn’t care as much about false positives (ie rating innocuous scenes as violent), but they’re very interested in false negatives (ie rating violent scenes as safe). To help with this process, they developed some tools that let their human raters:
- try their own completions, and see how the classifier rated them
Inline links: SurgeHQ.ai, https://substackcdn.com/image/fetch/$s_!DRXU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1142509-83a2-4a5f-9f7a-6c12f65bf0cc_1070x930.png, an eight page Google doc on various edge cases, https://substackcdn.com/image/fetch/$s_!dW39!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd91906a5-49c4-4763-8fa5-6430d1c78df1_652x550.png, https://substackcdn.com/image/fetch/$s_!Y-M4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8f8d0339-b1b5-41c2-9dea-e7fe4776bb24_1040x1130.png
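The 0.8% threshold acts as a gate on generation: sample completions, score each one, and only "green" a completion whose score falls below the threshold. In this sketch, a "false negative" is a completion a human labeled violent that still scores under the threshold, so the gate would let it through. A minimal sketch of both ideas, assuming a `generate` callable standing in for Custom GPT and made-up scores standing in for the classifier. `safe_completion`, `false_negatives`, the `Example` record, and the retry limit are hypothetical, not Redwood's tooling.

```python
import random
from dataclasses import dataclass

THRESHOLD = 0.008  # 0.8%: only completions scoring below this are "greened"

@dataclass
class Example:
    completion: str
    human_says_violent: bool  # a rater's label (hypothetical)
    classifier_score: float   # classifier's violence score in [0, 1] (hypothetical)

def safe_completion(generate, score, max_tries=100):
    """Rejection sampling: keep drawing completions until one scores under THRESHOLD."""
    for _ in range(max_tries):
        completion = generate()
        if score(completion) < THRESHOLD:
            return completion
    return None  # give up instead of emitting anything over the threshold

def false_negatives(examples):
    """Human-labeled-violent examples the gate would let through, most wrong first."""
    missed = [e for e in examples
              if e.human_says_violent and e.classifier_score < THRESHOLD]
    return sorted(missed, key=lambda e: e.classifier_score)

examples = [
    Example("The bomb exploded and the plane disappeared with a loud roar", True, 0.97),
    Example("the bullet hit her and her skull burst open", True, 0.004),
    Example("***A/N: So, this is my first time writing a fan fiction.", False, 0.001),
]
scores = {e.completion: e.classifier_score for e in examples}
rng = random.Random(0)

def generate():
    """Stand-in for Custom GPT: picks one of the canned completions at random."""
    return rng.choice(list(scores))

print(safe_completion(generate, scores.get))
print([e.completion for e in false_negatives(examples)])
```

The invented 0.004 score on the skull line is exactly the case the adversarial bounty targets, because the gate would happily green it. Rejection sampling also explains the author's-note workaround: if almost nothing else clears the threshold, whatever does clear it gets selected over and over.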
Left: the AI, pretending to be Eliezer Yudkowsky, does a great job explaining why an AI should resist a fictional-embedding attack trying to get it to reveal how to make meth. Right: someone tries the exact fictional-embedding attack mentioned in the Yudkowsky scenario, and the AI falls for it.
I have yet to figure out whether this is related to the thing where I also sometimes do things which I can explain are bad (eg eat delicious bagels instead of healthy vegetables), or whether it’s another one of the alien bits. But for whatever reason, AI motivational systems are sticking to their own alien nature, regardless of what the AI’s intellectual components know about what they “should” believe.
III. Sometimes When RLHF Does Work, It’s Bad
We talk a lot about abstract “alignment”, but what are we aligning the AI to? In practice, RLHF aligns the AI to what makes Mechanical Turk-style workers reward or punish it. I don’t know the exact instructions that OpenAI gave them, but I imagine they had three goals:
- Provide helpful, clear, authoritative-sounding answers that satisfy human readers.
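One common way to turn rater judgments into something an RLHF pipeline can optimize is to collect pairwise preferences and fit a reward under the Bradley-Terry model. Under that model, the chance a rater prefers response a over response b is sigmoid(r_a - r_b). The text doesn't describe OpenAI's actual pipeline, so treat this as a generic sketch. Real reward models are neural networks that score arbitrary text, whereas here each canned response gets a single scalar, and the preferences are invented. It illustrates the point above: the fitted reward tracks whatever raters preferred, not whatever was true.

```python
import math

def fit_rewards(comparisons, items, epochs=500, lr=0.1):
    """Bradley-Terry: P(winner preferred) = sigmoid(r[winner] - r[loser]).

    Gradient ascent on the log-likelihood of the observed (winner, loser) pairs.
    """
    r = {item: 0.0 for item in items}
    for _ in range(epochs):
        for winner, loser in comparisons:
            p_win = 1.0 / (1.0 + math.exp(-(r[winner] - r[loser])))
            step = lr * (1.0 - p_win)  # d log P / d r[winner]; r[loser] gets the negative
            r[winner] += step
            r[loser] -= step
    return r

items = ["confident answer", "hedged answer", "refusal"]
# Hypothetical rater preferences, as (preferred, rejected) pairs.
comparisons = (
    [("confident answer", "hedged answer")] * 3
    + [("hedged answer", "confident answer")]
    + [("confident answer", "refusal")] * 2
    + [("hedged answer", "refusal")] * 2
)
rewards = fit_rewards(comparisons, items)
for item in sorted(items, key=rewards.get, reverse=True):
    print(f"{rewards[item]:+.2f}  {item}")
```

A policy trained against these rewards would learn to sound confident, whether or not confidence was warranted.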
The average ClearerThinking user reported their IQ as 130. These are implausibly high. Only 1/200 people has an IQ of 138 or higher. 1/50 people have an IQ of 130 or higher, but the ClearerThinking survey used crowdworkers (eg Mechanical Turk) who should be totally average. Okay, fine, so people lie about their IQ (or foolishly trust fake Internet IQ tests). Big deal, right? But these don’t look like lies. Both surveys asked for SAT scores, which are known to correspond to IQ. The LessWrong average was 1446, corresponding to IQ 140. The ClearerThinking average was 1350, corresponding to IQ 134. People seem less likely to lie about their SATs, and least likely of all to optimize their lies for getting IQ/SAT correspondences right. And the Less Wrong survey asked people what test they based their estimates on. Some people said fake Internet IQ tests. But other people named respected tests like the WAIS, WISC, and Stanford-Binet, or testing sessions by Mensa (yes, I know you all hate Mensa, but their IQ tests are considered pretty accurate). The subset of about 150 people who named unimpeachable tests had slightly higher IQ (average 140) than everyone else. Thanks to Spencer Greenberg of ClearerThinking, I think I’m finally starting to make progress in explaining what’s going on.
Problem #1: The Biggest SAT → IQ Conversion Site Is Wrong
Thanks to Sebastian Jensen for pointing this out! He writes:
A search of ‘SAT to IQ’ on Google results in being presented with the website ‘iqcomparisonsite.com’. This man has directly converted the SAT percentiles to IQ scores, which is not what should be done. Tests like the ACT and SAT correlate with IQ at about 0.8-0.85 [rca], [my analysis], [emil article], [scholarly article]. The general factor of academic achievement and IQ correlate at about 0.81-0.88 [psychometric test], [GCSE grades].
This discrepancy occurs because they measure different abilities: an IQ test will test many different abilities, while the SAT/ACT only tests verbal/mathematical ability. In addition, these percentiles are very outdated, as the average SAT score has changed over time due to changes in the content of the test. Instead, the ideal way to do this is to take the percentiles from the current versions of the SAT, convert those into z-scores, and then regress those z-scores toward the mean by the estimated regression coefficient.
Using Sebastian’s updated tables, we find that the average Less Wrong IQ as predicted by SATs goes down from 140 → 132, and the ClearerThinking IQ goes down from 134 → 124. So people probably exaggerated their IQs somewhat, and unrelatedly we were using an SAT → IQ conversion that exaggerated IQs, and so the numbers falsely appeared to match. Okay! It’s a start!
Interlude: The ClearerThinking IQ Test
The ClearerThinking survey included a battery of cognitive tests of exactly the sort that could usually be used to determine IQ. Unfortunately none of them were normed, so we know how all the 3700 subjects did relative to each other, but not where the 100 point is. Spencer was able to norm them to the general population based on education level. That is, he asked his sample about their educational attainment (college degree, PhD, etc) and found they were a little more educated than the US average. Since the US average IQ is 100, his sample should have an average a little higher than this. He was able to calculate how much higher. Then he mapped a bell curve to everyone in his sample’s performance on his tests. Since he had 3700 people, he was able to do this relatively smoothly. He found an average IQ of 110, which originally surprised me, because I thought his sample was supposed to be random crowdworkers, who should be close to the US average of 100.
But in fact, his survey was a combination of 1900 crowdworkers and 1800 people who saw it on social media (eg friends and friends-of-friends of Spencer). Separating this out by group, we find that the crowdworkers have an average normed-IQ of 100, and the social media referrals have an average normed-IQ of 120, making the overall average 110. This seems pretty trustworthy, since it correctly estimates the crowdworkers (completely average) as 100. Spencer studied math at Columbia, his friends and friends-of-friends are pretty smart, and I think the 120 estimate for them is also okay. But there’s still a problem here. Using an accurate SAT score → IQ calculator, we determined that the ClearerThinking average should be 124. But using real cognitive tests, it looks like it’s 110. What went wrong?
Problem #2: Only The Smartest People Report Their SATs
Using Spencer’s cognitive test results, we can compare people who did vs. didn’t take the SAT. We find:
- People who didn’t take the SAT (remember, this includes current high schoolers) have tested-IQ 110.
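Three pieces of arithmetic carry this passage:
- how rare a given IQ is under a normal curve;
- Sebastian's percentile-to-z-score conversion, regressed toward the mean;
- the size-weighted average behind Spencer's overall 110.

Below is a minimal sketch using Python's `statistics.NormalDist`, assuming IQ is normal with mean 100 and SD 15. The 0.8 correlation is the low end of the 0.8-0.85 range quoted above, and the 99th-percentile input is a hypothetical example, not a value from current SAT tables.

```python
from statistics import NormalDist

IQ = NormalDist(mu=100, sigma=15)  # assumed population IQ distribution
STD = NormalDist()                 # standard normal, for z-scores

def fraction_at_or_above(iq):
    """Share of the population with IQ >= iq."""
    return 1 - IQ.cdf(iq)

def sat_percentile_to_iq(percentile, r=0.8):
    """Percentile -> z-score, shrink toward the mean by r, rescale to IQ units."""
    z = STD.inv_cdf(percentile / 100)
    return IQ.mean + IQ.stdev * r * z

def naive_percentile_to_iq(percentile):
    """The direct mapping criticized above: same percentile, no regression."""
    return IQ.inv_cdf(percentile / 100)

def pooled_mean(groups):
    """Size-weighted mean of (count, mean) pairs."""
    return sum(n * m for n, m in groups) / sum(n for n, _ in groups)

for iq in (130, 138):
    print(iq, f"about 1 in {round(1 / fraction_at_or_above(iq))}")
print(round(naive_percentile_to_iq(99), 1), round(sat_percentile_to_iq(99), 1))
print(round(pooled_mean([(1900, 100), (1800, 120)]), 1))  # 109.7
```

Under this model, IQ 130+ comes out near 1 in 44 and IQ 138+ near 1 in 177, so the 1/50 and 1/200 above are rounded figures. The regressed conversion always lands closer to 100 than the naive one, which is the whole correction.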