Reward Difference Optimization For Sample Reweighting In Offline RLHF
Shiqi Wang | Zhengze Zhang | Rui Zhao | Fei Tan | Nguyen Cam-Tu
Paper Details:
Month: November
Year: 2024
Location: Miami, Florida, USA
Venue: Findings - EMNLP
Citations: No Citations Yet
URLs:
https://huggingface.co/datasets/Dahoas/rm-static
https://www.moonshot.cn/
https://huggingface.co/Dahoas/gptj-rm-static
https://huggingface.co/OpenAssistant/reward-model-
https://vicuna
https://github
https://ericmitchell.ai/
https://huggingface.co/wxjiao/alpaca-7b
Field Of Study