LongVLM: Efficient Long Video Understanding via Large Language Models