Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding

Jun Zhang | Jue Wang | Huan Li | Lidan Shou | Ke Chen | Gang Chen | Sharad Mehrotra

Paper Details:

Month: August
Year: 2024
Location: Bangkok, Thailand
Venue: ACL