A Wolf in Sheep’s Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily

Peng Ding | Jun Kuang | Dan Ma | Xuezhi Cao | Yunsen Xian | Jiajun Chen | Shujian Huang |

Paper Details:

Month: June
Year: 2024
Location: Mexico City, Mexico
Venue: NAACL |