Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!

Zhanhui Zhou | Jie Liu | Zhichen Dong | Jiaheng Liu | Chao Yang | Wanli Ouyang | Yu Qiao

Paper Details:

Month: August
Year: 2024
Location: Bangkok, Thailand
Venue: ACL