Back to Basics: Revisiting REINFORCE-Style Optimization for Learning from Human Feedback in LLMs

Arash Ahmadian | Chris Cremer | Matthias Gallé | Marzieh Fadaee | Julia Kreutzer | Olivier Pietquin | Ahmet Üstün | Sara Hooker

Paper Details:

Month: August
Year: 2024
Location: Bangkok, Thailand
Venue: ACL