LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment Paper • 2310.01852 • Published Oct 3, 2023 • 2
VisNumBench: Evaluating Number Sense of Multimodal Large Language Models Paper • 2503.14939 • Published Mar 19, 2025 • 5
ReEx-SQL: Reasoning with Execution-Aware Reinforcement Learning for Text-to-SQL Paper • 2505.12768 • Published May 19, 2025 • 5
Evaluating Clinical Competencies of Large Language Models with a General Practice Benchmark Paper • 2503.17599 • Published Mar 22, 2025
FaVChat: Hierarchical Prompt-Query Guided Facial Video Understanding with Data-Efficient GRPO Paper • 2503.09158 • Published Mar 12, 2025 • 1
Enhancing Geometric Perception in VLMs via Translator-Guided Reinforcement Learning Paper • 2602.22703 • Published 25 days ago
Cognitive Mismatch in Multimodal Large Language Models for Discrete Symbol Understanding Paper • 2603.18472 • Published 4 days ago • 18
MIGA: Mutual Information-Guided Attack on Denoising Models for Semantic Manipulation Paper • 2503.06966 • Published Mar 10, 2025