RS-DPO: A Hybrid Rejection Sampling and Direct Preference Optimization Method for Alignment of Large Language Models

Month: June
Year: 2024
Location: Mexico City, Mexico
Venue: Findings-NAACL