Yolo Y. Tang

yunlong10

https://yunlong10.github.io/

AI & ML interests

LMMs/Agents for Video Understanding

Recent Activity

upvoted a paper about 1 month ago

Video-R4: Reinforcing Text-Rich Video Reasoning with Visual Rumination

upvoted a paper 2 months ago

Latent Chain-of-Thought for Visual Reasoning

upvoted a paper 2 months ago

Directional Reasoning Injection for Fine-Tuning MLLMs

Organizations

None yet

authored 7 papers 3 months ago

FreSca: Unveiling the Scaling Space in Diffusion Models

Paper • 2504.02154 • Published Apr 2, 2025 • 18

The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report

Paper • 2504.10686 • Published Apr 14, 2025

MMIG-Bench: Towards Comprehensive and Explainable Evaluation of Multi-Modal Image Generation Models

Paper • 2505.19415 • Published May 26, 2025 • 2

MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness

Paper • 2505.20426 • Published May 26, 2025 • 7

ZeroSep: Separate Anything in Audio with Zero Training

Paper • 2505.23625 • Published May 29, 2025 • 6

Can Sound Replace Vision in LLaVA With Token Substitution?

Paper • 2506.10416 • Published Jun 12, 2025 • 1

Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models

Paper • 2510.05034 • Published Oct 6, 2025 • 48

authored 2 papers 9 months ago

Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting

Paper • 2504.05541 • Published Apr 7, 2025 • 15

Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1)

Paper • 2504.03151 • Published Apr 4, 2025 • 15

authored 2 papers 10 months ago

VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity

Paper • 2503.11557 • Published Mar 14, 2025 • 22

Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach

Paper • 2412.18108 • Published Dec 24, 2024 • 1

authored 3 papers 12 months ago

authored 6 papers about 1 year ago

Caption Anything: Interactive Image Description with Diverse Multimodal Controls

Paper • 2305.02677 • Published May 4, 2023

Video Understanding with Large Language Models: A Survey

Paper • 2312.17432 • Published Dec 29, 2023 • 3

Emo-Avatar: Efficient Monocular Video Style Avatar through Texture Rendering

Paper • 2402.00827 • Published Feb 1, 2024 • 2

AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue

Paper • 2403.16276 • Published Mar 24, 2024

V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning

Paper • 2404.12353 • Published Apr 18, 2024

AIM 2024 Challenge on Video Saliency Prediction: Methods and Results

Paper • 2409.14827 • Published Sep 23, 2024 • 1
