LongVLM: Efficient Long Video Understanding via Large Language Models
