Zhongang Cai

caizhongang

http://caizhongang.com/

AI & ML interests

Multimodal, Spatial Intelligence, Embodied AI, Virtual Humans.

Recent Activity

upvoted a paper 32 minutes ago

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

Paper • 2604.05015 • Published 2 days ago • 75

upvoted a paper about 23 hours ago

FileGram: Grounding Agent Personalization in File-System Behavioral Traces

Paper • 2604.04901 • Published 2 days ago • 26

liked a model 11 days ago

sensenova/SenseNova-SI-1.4-InternVL3-8B

Image-Text-to-Text • 8B • Updated 12 days ago • 1.12k • 3

authored a paper 16 days ago

Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer

Paper • 2603.19227 • Published 19 days ago • 42

upvoted 2 papers 19 days ago

Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer

Paper • 2603.19227 • Published 19 days ago • 42

MonoArt: Progressive Structural Reasoning for Monocular Articulated 3D Reconstruction

Paper • 2603.19231 • Published 19 days ago • 36

upvoted a paper 21 days ago

Kinema4D: Kinematic 4D World Modeling for Spatiotemporal Embodied Simulation

Paper • 2603.16669 • Published 22 days ago • 70

authored a paper 21 days ago

Demystifing Video Reasoning

Paper • 2603.16870 • Published 21 days ago • 367

upvoted a paper 21 days ago

Demystifing Video Reasoning

Paper • 2603.16870 • Published 21 days ago • 367

upvoted a paper 22 days ago

HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions

Paper • 2603.15612 • Published 22 days ago • 152

upvoted 2 papers about 1 month ago

ArtHOI: Articulated Human-Object Interaction Synthesis by 4D Reconstruction from Video Priors

Paper • 2603.04338 • Published Mar 4 • 24

UniG2U-Bench: Do Unified Models Advance Multimodal Understanding?

Paper • 2603.03241 • Published Mar 3 • 87

liked a dataset about 1 month ago

Video-Reason/VBVR-Dataset

Viewer • Updated 7 days ago • 1M • 1.65k • 50

liked a Space about 1 month ago

VBVR Bench Leaderboard

Leaderboard for VBVR-Bench

authored 3 papers about 1 month ago

liked a model about 1 month ago

Video-Reason/VBVR-Wan2.2

Image-to-Video • Updated 7 days ago • 244 • 123

upvoted a paper about 1 month ago

A Very Big Video Reasoning Suite

Paper • 2602.20159 • Published Feb 23 • 517

upvoted a paper about 2 months ago

Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition

Paper • 2602.08439 • Published Feb 9 • 28
