Loose lips sink ships: Mitigating Length Bias in Reinforcement Learning from Human Feedback

Wei Shen | Rui Zheng | Wenyu Zhan | Jun Zhao | Shihan Dou | Tao Gui | Qi Zhang | Xuanjing Huang

Paper Details:

Month: December
Year: 2023
Location: Singapore
Venue: Findings-EMNLP