---
license: mit
---

# TimeChat-7B-ActivityNet-VTune Model

## Model details

We trained [VideoLLaMA](https://arxiv.org/abs/2306.02858) with VTune, an instruction-tuning method we developed to improve consistency in temporal comprehension.

For tuning, we used 10K training videos from ActivityNet-Captions with 205K automatically generated annotations.

## Evaluation

We evaluated the model on ActivityNet-CON and ActivityNet-Captions.

- ActivityNet-CON

| Metric   | Value       |
|----------|-------------|
| Ground   | 33.0        |
| R-Ground | 24.7 (74.8) |
| S-Ground | 10.0 (30.2) |
| H-Verify | 20.2 (61.1) |
| C-Verify | 17.7 (53.7) |

- ActivityNet-Captions

| Metric      | Value |
|-------------|-------|
| R@1 IoU=0.3 | 51.58 |
| R@1 IoU=0.5 | 34.38 |
| R@1 IoU=0.7 | 19.18 |
| mIoU        | 36.16 |

**For more information, see the [Paper](https://arxiv.org/abs/2411.12951) and [Code](https://github.com/minjoong507/consistency-of-video-llm).**

## Citation

If you find our research and code useful, please consider starring our repository and citing our paper:

```bibtex
@article{jung2024consistency,
  title={On the Consistency of Video Large Language Models in Temporal Comprehension},
  author={Jung, Minjoon and Xiao, Junbin and Zhang, Byoung-Tak and Yao, Angela},
  journal={arXiv preprint arXiv:2411.12951},
  year={2024}
}
```