RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations

Jing Huang | Zhengxuan Wu | Christopher Potts | Mor Geva | Atticus Geiger

Paper Details:

Month: August
Year: 2024
Location: Bangkok, Thailand
Venue: ACL