RelayAttention for Efficient Large Language Model Serving with Long System Prompts

Lei Zhu | Xinjiang Wang | Wayne Zhang | Rynson Lau

Paper Details:

Month: August
Year: 2024
Location: Bangkok, Thailand
Venue: ACL