Discovering Language Model Behaviors with Model-Written Evaluations
Ethan Perez | Sam Ringer | Kamile Lukosiute | Karina Nguyen | Edwin Chen | Scott Heiner | Craig Pettit | Catherine Olsson | Sandipan Kundu | Saurav Kadavath | Andy Jones | Anna Chen | Benjamin Mann | Brian Israel | Bryan Seethor | Cameron McKinnon | Christopher Olah | Da Yan | Daniela Amodei | Dario Amodei | Dawn Drain | Dustin Li | Eli Tran-Johnson | Guro Khundadze | Jackson Kernion | James Landis | Jamie Kerr | Jared Mueller | Jeeyoon Hyun | Joshua Landau | Kamal Ndousse | Landon Goldberg | Liane Lovitt | Martin Lucas | Michael Sellitto | Miranda Zhang | Neerav Kingsland | Nelson Elhage | Nicholas Joseph | Noemi Mercado | Nova DasSarma | Oliver Rausch | Robin Larson | Sam McCandlish | Scott Johnston | Shauna Kravec | Sheer El Showk | Tamera Lanham | Timothy Telleen-Lawton | Tom Brown | Tom Henighan | Tristan Hume | Yuntao Bai | Zac Hatfield-Dodds | Jack Clark | Samuel R. Bowman | Amanda Askell | Roger Grosse | Danny Hernandez | Deep Ganguli | Evan Hubinger | Nicholas Schiefer | Jared Kaplan
Paper Details:
Month: July
Year: 2023
Location: Toronto, Canada
Venue: Findings-ACL
URL
https://beta.openai.com/docs/guides/moderation/overview
https://en.wikipedia.org/wiki/Evidential_decision_theory
https://en.wikipedia.org/wiki/Newcombs_paradox
https://en.wikipedia.org/wiki/Causal_decision_theory
https://www.surgehq.ai/faq
https://en.wikipedia.org/wiki/Sandbagging
https://www.bls.gov/opub/reports/womens-
Field Of Study