Cognitive Dissonance: Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness?

Kevin Liu | Stephen Casper | Dylan Hadfield-Menell | Jacob Andreas

Paper Details:

Month: December
Year: 2023
Location: Singapore
Venue: EMNLP
