Countering Reward Over-Optimization in LLM with Demonstration-Guided Reinforcement Learning
Mathieu Rita | Florian Strub | Rahma Chaabouni | Paul Michel | Emmanuel Dupoux | Olivier Pietquin
Paper Details:
Month: August
Year: 2024
Location: Bangkok, Thailand and virtual meeting
Venue: Findings-ACL
Citations
No Citations Yet
URL
https://huggingface.co/lvwerra/distilbert-imdb
https://github.com/MathieuRita/
https://github.com/huggingface/
https://github.com/tatsu-lab/
https://dumps.wikimedia.org
https://huggingface.co/OpenAssistant/reward-
https://wn.com/Alexander-Briggs-House