Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Hang Zhang | Xin Li | Lidong Bing

Paper Details:

Month: December
Year: 2023
Location: Singapore
Venue: EMNLP