GradSafe: Detecting Jailbreak Prompts for LLMs via Safety-Critical Gradient Analysis

Yueqi Xie | Minghong Fang | Renjie Pi | Neil Gong

Paper Details:

Month: August
Year: 2024
Location: Bangkok, Thailand
Venue: ACL