Model | # Params | Binary QA | MCQ | TSH | STH | Avg. |
Human | - | 95.14 | 93.29 | 90.17 | 87.43 | 91.51 |
GPT-4o | - | 81.17 | 90.97 | 83.42 | 74.17 | 82.43 |
Gemini-1.5-Pro | - | 75.02 | 79.04 | 82.67 | 64.11 | 75.21 |
VILA 1.5 | 13B | 57.75 | 81.95 | 68.84 | 35.04 | 60.90 |
VideoLLaMA2 | 7B | 48.23 | 83.79 | 22.50 | 65.22 | 54.94 |
LLaVA-NeXT-Video | 34B | 26.04 | 77.57 | 20.67 | 44.39 | 42.17 |
PLLaVA | 13B | 35.04 | 77.31 | 17.83 | 32.94 | 40.78 |
Video-LLaVA | 7B | 23.88 | 65.18 | 28.83 | 30.12 | 37.00 |
Chat-UniVi | 13B | 23.20 | 55.07 | 32.50 | 31.55 | 35.58 |
ShareGPT4Video | 8B | 29.58 | 44.83 | 49.00 | 17.08 | 35.12 |
Video-ChatGPT | 7B | 9.36 | 23.25 | 29.83 | 8.13 | 17.64 |
Binary QA: binary question answering; MCQ: multiple-choice question;
TSH: temporal sequence hallucination; STH: scene transition hallucination.
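
The Avg. column is numerically consistent with an unweighted mean of the four task scores (Binary QA, MCQ, TSH, STH), rounded to two decimals. The table does not state this aggregation rule, so treat it as an assumption. The sketch below recomputes the mean for each row and reports any row whose reported Avg. disagrees. The names `ROWS`, `mean_score`, and `check_averages` are illustrative, and the tolerance only allows for two-decimal rounding.

```python
# Scores copied from the table: (Binary QA, MCQ, TSH, STH, reported Avg.).
ROWS = {
    "Human": (95.14, 93.29, 90.17, 87.43, 91.51),
    "GPT-4o": (81.17, 90.97, 83.42, 74.17, 82.43),
    "Gemini-1.5-Pro": (75.02, 79.04, 82.67, 64.11, 75.21),
    "VILA 1.5": (57.75, 81.95, 68.84, 35.04, 60.90),
    "VideoLLaMA2": (48.23, 83.79, 22.50, 65.22, 54.94),
    "LLaVA-NeXT-Video": (26.04, 77.57, 20.67, 44.39, 42.17),
    "PLLaVA": (35.04, 77.31, 17.83, 32.94, 40.78),
    "Video-LLaVA": (23.88, 65.18, 28.83, 30.12, 37.00),
    "Chat-UniVi": (23.20, 55.07, 32.50, 31.55, 35.58),
    "ShareGPT4Video": (29.58, 44.83, 49.00, 17.08, 35.12),
    "Video-ChatGPT": (9.36, 23.25, 29.83, 8.13, 17.64),
}


def mean_score(scores):
    """Unweighted arithmetic mean of the per-task scores (assumed rule)."""
    return sum(scores) / len(scores)


def check_averages(rows, tol=0.006):
    """Return {model: (recomputed, reported)} for rows that differ by more than tol.

    tol is slightly above 0.005 so that two-decimal rounding of the
    reported average is not flagged as a mismatch.
    """
    mismatches = {}
    for model, (*scores, reported) in rows.items():
        recomputed = mean_score(scores)
        if abs(recomputed - reported) > tol:
            mismatches[model] = (round(recomputed, 2), reported)
    return mismatches


if __name__ == "__main__":
    # An empty dict means every reported Avg. matches the recomputed mean.
    print(check_averages(ROWS))
```

An unweighted mean gives each of the four columns equal weight, so the two action-hallucination formats (Binary QA and MCQ) together contribute half of the average.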