Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning

Tianduo Wang | Shichen Li | Wei Lu

Paper Details:

Month: August
Year: 2024
Location: Bangkok, Thailand
Venue: ACL