NLPExplorer
Zhenhong Zhou
Number of Papers:- 3
Number of Citations:- 0
First ACL Paper:- 2024
Latest ACL Paper:- 2024
Venues:-
EMNLP
Findings-EMNLP
Co-Authors:-
Fei Huang
Haiqin Weng
Haiyang Yu
Han Qiu
Liu Yan
2024
Course-Correction: Safety Alignment Using Synthetic Preferences
EMNLP
Rongwu Xu | Yishuo Cai | Zhenhong Zhou | Renjie Gu | Haiqin Weng | Liu Yan | Tianwei Zhang | Wei Xu | Han Qiu
How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States
Findings-EMNLP
Zhenhong Zhou | Haiyang Yu | Xinghua Zhang | Rongwu Xu | Fei Huang | Yongbin Li
Alignment-Enhanced Decoding: Defending Jailbreaks via Token-Level Adaptive Refining of Probability Distributions
EMNLP
Quan Liu | Zhenhong Zhou | Longzhu He | Yi Liu | Wei Zhang | Sen Su