Jiawei ou
jiaweiou
AI & ML interests
None yet
Organizations
None yet
Vidun
- Look Every Frame All at Once: Video-Ma^2mba for Efficient Long-form Video Understanding with Multi-Axis Gradient Checkpointing
  Paper • 2411.19460 • Published • 11
- Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding
  Paper • 2406.19263 • Published • 10
- OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation
  Paper • 2412.02592 • Published • 24
Speech
VAE