Rafael Harth

Article

Rafael Harth is a recurring person in the Astral Codex Ten archive, appearing 3 times across 3 issues between April 04, 2022 and October 05, 2023. The archive places the name in contexts such as “Rafael Harth tried to get the same information with a simple survey”; “Rafael Harth’s Inner Alignment: Explain Like I’m 12 Edition”; and “Many other people (eg Rafael Harth, Steven Byrnes) suggested this would produce deceptive alignment”. The name most often appears alongside Eliezer Yudkowsky, AlphaGo, and DeepMind.

Metadata

  • Category: People
  • Mention count: 3
  • Issue count: 3
  • First seen: April 04, 2022
  • Last seen: October 05, 2023

Source Context

Recovered passages from the original issue text. When the raw archive preserved outbound links inside the source passage, they are listed directly under the quote.

April 04, 2022
Rafael Harth tried to get the same information with a simple survey, and got similar results: on a scale of 1 (strongly Paul) to 9 (strongly Eliezer), the median moved from a 5 to a 7.
April 11, 2022
Rafael Harth’s Inner Alignment: Explain Like I’m 12 Edition,
October 05, 2023
Nora thought that success at making language models behave (eg refuse to say racist things even when asked) suggests alignment is going pretty well so far. Many other people (eg Rafael Harth, Steven Byrnes) suggested this would produce deceptive alignment, ie AI that says nice things to humans who have power over it, but secretly has different goals, and so success in this area says nothing about true alignment success and is even kind of worrying. The question remained unresolved.