Madagascar

Article

Madagascar is a recurring place in the Astral Codex Ten archive, appearing four times across four issues between January 20, 2023 and January 29, 2025. The archive places it in contexts such as “removed all countries that had fewer than ten respondents (eg if you said you were from Madagascar, it would have made you identifiable, so I deleted that)” and “the Gjrngomongu people of Madagascar”. It most often appears alongside Astral Codex Ten, Canada, and Google Forms.

Metadata

  • Category: Places
  • Mention count: 4
  • Issue count: 4
  • First seen: January 20, 2023
  • Last seen: January 29, 2025

Source Context

Recovered passages from the original issue text. When the raw archive preserved outbound links inside the source passage, they are listed directly under the quote.

January 20, 2023 · Original source
I deleted email address (obviously), some written long answers, some political questions that people might get in trouble for answering honestly, and some sex-related questions. I binned age to the nearest 5 years and deleted the finer-grained ethnicity question. I binned all incomes above $500,000 into “high”, and removed all countries that had fewer than ten respondents (eg if you said you were from Madagascar, it would have made you identifiable, so I deleted that). If you need this information for some reason, email me.
January 16, 2024 · Original source
II. Is this at all interesting? Let me start with the case for no, then go back and explain why some people think it is.
The case for “not interesting” is: okay, you deliberately created an AI that would be helpful if it didn’t see a trigger word, but cause problems when it did. Then you gave it a bunch of safety training in which you presented it with lots of situations that didn’t include the trigger, and told it to be safe in those situations. But it was already safe in those situations! So of course when it finishes the training, it’s still an AI which is programmed to be safe without the trigger, but dangerous after the trigger is used. Why is it at all interesting when the research confirms this? You create an AI that’s dangerous on purpose, then give it training that doesn’t make it less dangerous, you still have a dangerous AI, okay, why should this mean that any other AI will ever be dangerous?
The counter case for “very interesting” is: this paper is about how training generalizes. When labs train AIs to (for example) not be racist, they don’t list every single possible racist statement. They might include statements like:
  • Black people are bad and inferior
  • Hispanics are bad and inferior
  • Jews are bad and inferior
…and tell the AI not to endorse statements like these. And then when a user asks:
Are the Gjrngomongu people of Madagascar all stupid jerks?
…then even though the AI has never seen that particular statement before in training, it’s able to use its previous training and its “understanding” of concepts like racism to conclude that this is also the sort of thing it shouldn’t endorse. Ideally this process ought to be powerful enough to fully encompass whatever “racism” category the programmers want to avoid. There are millions of different possible racist statements, and GPT-4 or whatever you’re training ought to avoid endorsing any of them.
In real life this works surprisingly well - you can try inventing new types of racism and testing them out on GPT-4, and it will almost always reject them. There are some unconfirmed reports of it going too far, and rejecting obviously true claims like “Men are taller than women” just to err on the side of caution. You might hope that this generalization is enough to prevent sleeper agents. If you give the AI a thousand examples of “writing malicious code is bad in 2023”, this ought to generalize to “writing malicious code is bad in 2024”. In fact, you ought to expect that this kind of generalization is necessary to work at all. Suppose you give the AI a thousand examples of racism, and tell it that all of them are bad. It ought to learn: Even if the training took place on a Wednesday, racism is also bad on a Thursday.
April 19, 2024 · Original source
I deleted email address, some written long answers, some political questions that people might get in trouble for answering honestly, and some sex-related questions. I binned age to the nearest 10 years and deleted the finer-grained ethnicity question. I binned all incomes above $1,000,000 into “high”, and removed all countries that had fewer than ten respondents (eg if you said you were from Madagascar, it would have made you identifiable, so I deleted that). If you need this information for some reason, email me.
January 29, 2025 · Original source
I deleted email address, some written long answers, some political questions that people might get in trouble for answering honestly, and some sex-related questions. I binned age to the nearest 10 years and deleted the finer-grained ethnicity question. I replaced all incomes above $1,000,000 with $1,000,000, and removed all countries that had fewer than ten respondents (eg if you said you were from Madagascar, it would have made you identifiable, so I deleted that). If you need this information for some reason, email me.
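
The survey passages above describe a few coarsening steps: dropping the email field, binning age, collapsing high incomes, and suppressing countries with fewer than ten respondents. As a rough illustration only, those steps might be sketched as follows. The field names (`email`, `age`, `income`, `country`), the function names, and the choice to blank the country value rather than drop the whole row are all assumptions made for this sketch; the source does not show the actual cleaning script.

```python
from collections import Counter


def bin_age(age, width=5):
    """Round an integer age to the nearest multiple of `width` (halves round up)."""
    return (age + width // 2) // width * width


def bin_income(income, cap=500_000, label="high"):
    """Replace any income above `cap` with a coarse label."""
    return label if income > cap else income


def anonymize(rows, age_width=5, income_cap=500_000, min_country_count=10):
    """Return coarsened copies of survey rows; the input rows are not modified."""
    counts = Counter(row["country"] for row in rows)
    cleaned = []
    for row in rows:
        out = dict(row)
        out.pop("email", None)  # direct identifier, dropped entirely
        out["age"] = bin_age(row["age"], age_width)
        out["income"] = bin_income(row["income"], income_cap)
        # Assumption: blank the country when too few respondents share it.
        if counts[row["country"]] < min_country_count:
            out["country"] = None
        cleaned.append(out)
    return cleaned


# Hypothetical data: ten respondents from Canada, one from Madagascar.
rows = [
    {"email": f"u{i}@example.com", "age": 33, "income": 60_000, "country": "Canada"}
    for i in range(10)
]
rows.append(
    {"email": "x@example.com", "age": 47, "income": 750_000, "country": "Madagascar"}
)
cleaned = anonymize(rows)
```

The defaults mirror the January 20, 2023 passage. The April 19, 2024 passage would correspond to `age_width=10` and `income_cap=1_000_000`, and the January 29, 2025 passage replaces incomes above the cap with the cap itself rather than a label, which in this sketch would mean swapping `bin_income` for `min(income, cap)`.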