GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints

Joshua Ainslie | James Lee-Thorp | Michiel de Jong | Yury Zemlyanskiy | Federico Lebron | Sumit Sanghai

Paper Details:

Month: December
Year: 2023
Location: Singapore
Venue: EMNLP