NLPExplorer

Discovering Language Model Behaviors with Model-Written Evaluations

Ethan Perez | Sam Ringer | Kamile Lukosiute | Karina Nguyen | Edwin Chen | Scott Heiner | Craig Pettit | Catherine Olsson | Sandipan Kundu | Saurav Kadavath | Andy Jones | Anna Chen | Benjamin Mann | Brian Israel | Bryan Seethor | Cameron McKinnon | Christopher Olah | Da Yan | Daniela Amodei | Dario Amodei | Dawn Drain | Dustin Li | Eli Tran-Johnson | Guro Khundadze | Jackson Kernion | James Landis | Jamie Kerr | Jared Mueller | Jeeyoon Hyun | Joshua Landau | Kamal Ndousse | Landon Goldberg | Liane Lovitt | Martin Lucas | Michael Sellitto | Miranda Zhang | Neerav Kingsland | Nelson Elhage | Nicholas Joseph | Noemi Mercado | Nova DasSarma | Oliver Rausch | Robin Larson | Sam McCandlish | Scott Johnston | Shauna Kravec | Sheer El Showk | Tamera Lanham | Timothy Telleen-Lawton | Tom Brown | Tom Henighan | Tristan Hume | Yuntao Bai | Zac Hatfield-Dodds | Jack Clark | Samuel R. Bowman | Amanda Askell | Roger Grosse | Danny Hernandez | Deep Ganguli | Evan Hubinger | Nicholas Schiefer | Jared Kaplan

Paper Details:

Month: July
Year: 2023
Location: Toronto, Canada
Venue: Findings-ACL

Citations

No Citations Yet

URL

  • https://beta.openai.com/docs/guides/moderation/overview
  • https://en.wikipedia.org/wiki/Evidential_decision_theory
  • https://en.wikipedia.org/wiki/Newcombs_paradox
  • https://en.wikipedia.org/wiki/Causal_decision_theory
  • https://www.surgehq.ai/faq
  • https://en.wikipedia.org/wiki/Sandbagging
  • https://www.bls.gov/opub/reports/womens-

Field Of Study