26 68

MahatmA

maha-atman

AI & ML interests

None yet

Recent Activity

liked a Space 6 days ago

r3gm/Ultimate-Vocal-Remover-WebUI

liked a Space about 1 month ago

depth-anything/depth-anything-3

upvoted a paper 3 months ago

Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation

Organizations

None yet

upvoted 3 papers 3 months ago

Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation

Paper • 2510.08673 • Published Oct 9, 2025 • 125

CommonForms: A Large, Diverse Dataset for Form Field Detection

Paper • 2509.16506 • Published Sep 20, 2025 • 19

StreamingVLM: Real-Time Understanding for Infinite Video Streams

Paper • 2510.09608 • Published Oct 10, 2025 • 50

upvoted an article 3 months ago

BigCodeArena: Judging code generations end to end with code executions

Article • Oct 7, 2025 • 19

upvoted 3 papers 3 months ago

Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play

Paper • 2509.25541 • Published Sep 29, 2025 • 140

DA^2: Depth Anything in Any Direction

Paper • 2509.26618 • Published Sep 30, 2025 • 25

LongLive: Real-time Interactive Long Video Generation

Paper • 2509.22622 • Published Sep 26, 2025 • 184

upvoted a collection 3 months ago

⚛️ Liquid Nanos

Library of task-specific models: https://www.liquid.ai/blog/introducing-liquid-nanos-frontier-grade-performance-on-everyday-devices • 26 items • Updated 3 days ago • 105

upvoted a paper 3 months ago

Video models are zero-shot learners and reasoners

Paper • 2509.20328 • Published Sep 24, 2025 • 99

upvoted a paper 4 months ago

LLM-I: LLMs are Naturally Interleaved Multimodal Creators

Paper • 2509.13642 • Published Sep 17, 2025 • 9

upvoted 2 collections 4 months ago

Granite Docling

5 items • Updated Nov 17, 2025 • 60

Holo1.5

Holo1.5 - Open Foundation Models for Computer Use Agents • 5 items • Updated Sep 15, 2025 • 34

upvoted an article 4 months ago

PP-OCRv5 on Hugging Face: A Specialized Approach to OCR

Article • Sep 10, 2025 • 109

upvoted a collection 4 months ago

PP-OCRv5

PP-OCRv5 is the latest text recognition solution, supporting Simplified Chinese, Chinese Pinyin, Traditional Chinese, English, and Japanese • 13 items • Updated Sep 15, 2025 • 50

upvoted an article 6 months ago

Introducing ColQwen-Omni: Retrieve in every modality

Article • Jul 17, 2025 • 75

upvoted a paper 6 months ago

Depth Anything at Any Condition

Paper • 2507.01634 • Published Jul 2, 2025 • 49

upvoted 2 papers 7 months ago

Seedance 1.0: Exploring the Boundaries of Video Generation Models

Paper • 2506.09113 • Published Jun 10, 2025 • 105

PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers

Paper • 2506.05573 • Published Jun 5, 2025 • 82

upvoted a collection 7 months ago

Lingshu MLLMs

Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning • 4 items • Updated Oct 9, 2025 • 21

upvoted a paper 7 months ago

GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents

Paper • 2506.03143 • Published Jun 3, 2025 • 53