Anders Sandberg

Article

Anders Sandberg is a recurring person in the Astral Codex Ten archive, mentioned 2 times across 2 issues between July 04, 2022 and February 27, 2025. The archive places him in contexts such as “speakers including … Anders Sandberg” and “Anders Sandberg proposes (X) that maybe”. He most often appears alongside Hanania, Sam Altman, and /r/NootropicsDepot.

Metadata

  • Category: People
  • Mention count: 2
  • Issue count: 2
  • First seen: July 04, 2022
  • Last seen: February 27, 2025

Source Context

Recovered passages from the original issue text. When the raw archive preserved outbound links inside the source passage, they are listed directly under the quote.

July 04, 2022 · Original source
2: Isaak Freeman asks me to signal-boost the Future Forum, from August 4-7 in San Francisco, featuring speakers including Sam Altman, Anders Sandberg, Patrick Collison, and Tyler Cowen. They are bringing together 250 people from EA, Silicon Valley, and related communities to “arm the world's brightest minds with the tools they need to tackle global problems” and to potentially offer funding and mentoring. Apply at the link above.

February 27, 2025 · Original source
5: Surprising AI safety result: if you fine-tune an AI to write deliberately insecure code, the AI becomes evil in every other way too (eg it will name Hitler as its favorite person and recommend the user commit suicide). Anders Sandberg proposes (X) that maybe “it is shaped by going along a vector opposite to typical RLHF training aims, then playing a persona that fits”. Eliezer calls it (X) “possibly the best AI news of 2025 so far. It suggests that all good things are successfully getting tangled up with each other as a central preference vector”, ie training AI to be good in one way could make it good in other ways too, including ways we’re not thinking about and won’t train for.