RS-DPO: A Hybrid Rejection Sampling and Direct Preference Optimization Method for Alignment of Large Language Models

Saeed Khaki, JinJin Li, Lan Ma, Liu Yang, Prathap Ramachandra

Paper Details:

Month: June
Year: 2024
Location: Mexico City, Mexico
Venue: Findings-NAACL