On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference Optimization
Yong Lin | Skyler Seto | Maartje Ter Hoeve | Katherine Metcalf | Barry-John Theobald | Xuan Wang | Yizhe Zhang | Chen Huang | Tong Zhang
Paper Details:
Month: November
Year: 2024
Location: Miami, Florida, USA
Venue: Findings-EMNLP
Citations
No Citations Yet
URL
https://github.com/huggingface/trl
https://huggingface.co/meta-llama/
https://huggingface.co/RLHFlow/
https://github.com/RLHFlow/Online-RLHF
https://huggingface.co/