Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2511.16719

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 30
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 15
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

video-segmentation

SAM 3: Segment Anything with Concepts

Paper • 2511.16719 • Published Nov 20, 2025 • 135

ai: segmentation

SAM 3: Segment Anything with Concepts

Paper • 2511.16719 • Published Nov 20, 2025 • 135
A Context-Driven Training-Free Network for Lightweight Scene Text Segmentation and Recognition

Paper • 2503.15639 • Published Mar 19, 2025 • 2

Project Ideas 2026

SAM 3: Segment Anything with Concepts

Paper • 2511.16719 • Published Nov 20, 2025 • 135
Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance

Paper • 2512.08765 • Published Dec 9, 2025 • 134
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length

Paper • 2512.04677 • Published Dec 4, 2025 • 177
LongCat-Image Technical Report

Paper • 2512.07584 • Published Dec 8, 2025 • 23

Papers I want to read 🗞️

Machine Learning Operations (MLOps): Overview, Definition, and Architecture

Paper • 2205.02302 • Published May 4, 2022 • 1
SAM 3: Segment Anything with Concepts

Paper • 2511.16719 • Published Nov 20, 2025 • 135

LocalMamba: Visual State Space Model with Windowed Selective Scan

Paper • 2403.09338 • Published Mar 14, 2024 • 8
GiT: Towards Generalist Vision Transformer through Universal Language Interface

Paper • 2403.09394 • Published Mar 14, 2024 • 26
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Paper • 2402.19479 • Published Feb 29, 2024 • 35
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

Paper • 2405.10300 • Published May 16, 2024 • 31

Interesting articles

General Agentic Memory Via Deep Research

Paper • 2511.18423 • Published Nov 23, 2025 • 170
Diffusion Language Models are Super Data Learners

Paper • 2511.03276 • Published Nov 5, 2025 • 132
SAM 3: Segment Anything with Concepts

Paper • 2511.16719 • Published Nov 20, 2025 • 135
Back to Basics: Let Denoising Generative Models Denoise

Paper • 2511.13720 • Published Nov 17, 2025 • 70

WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling

Paper • 2512.14614 • Published Dec 16, 2025 • 73
Agent READMEs: An Empirical Study of Context Files for Agentic Coding

Paper • 2511.12884 • Published Nov 17, 2025 • 28
PersonaLive! Expressive Portrait Image Animation for Live Streaming

Paper • 2512.11253 • Published Dec 12, 2025 • 40
DeepCode: Open Agentic Coding

Paper • 2512.07921 • Published Dec 8, 2025 • 33

SAM 3: Segment Anything with Concepts

Paper • 2511.16719 • Published Nov 20, 2025 • 135
RF-DETR: Neural Architecture Search for Real-Time Detection Transformers

Paper • 2511.09554 • Published Nov 12, 2025 • 9

Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer

Paper • 2511.22699 • Published Nov 27, 2025 • 245
SAM 3: Segment Anything with Concepts

Paper • 2511.16719 • Published Nov 20, 2025 • 135
Show, Don't Tell: Morphing Latent Reasoning into Image Generation

Paper • 2602.02227 • Published Feb 2 • 10

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 30
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 15
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

LocalMamba: Visual State Space Model with Windowed Selective Scan

Paper • 2403.09338 • Published Mar 14, 2024 • 8
GiT: Towards Generalist Vision Transformer through Universal Language Interface

Paper • 2403.09394 • Published Mar 14, 2024 • 26
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Paper • 2402.19479 • Published Feb 29, 2024 • 35
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

Paper • 2405.10300 • Published May 16, 2024 • 31

video-segmentation

SAM 3: Segment Anything with Concepts

Paper • 2511.16719 • Published Nov 20, 2025 • 135

Interesting articles

General Agentic Memory Via Deep Research

Paper • 2511.18423 • Published Nov 23, 2025 • 170
Diffusion Language Models are Super Data Learners

Paper • 2511.03276 • Published Nov 5, 2025 • 132
SAM 3: Segment Anything with Concepts

Paper • 2511.16719 • Published Nov 20, 2025 • 135
Back to Basics: Let Denoising Generative Models Denoise

Paper • 2511.13720 • Published Nov 17, 2025 • 70

ai: segmentation

SAM 3: Segment Anything with Concepts

Paper • 2511.16719 • Published Nov 20, 2025 • 135
A Context-Driven Training-Free Network for Lightweight Scene Text Segmentation and Recognition

Paper • 2503.15639 • Published Mar 19, 2025 • 2

WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling

Paper • 2512.14614 • Published Dec 16, 2025 • 73
Agent READMEs: An Empirical Study of Context Files for Agentic Coding

Paper • 2511.12884 • Published Nov 17, 2025 • 28
PersonaLive! Expressive Portrait Image Animation for Live Streaming

Paper • 2512.11253 • Published Dec 12, 2025 • 40
DeepCode: Open Agentic Coding

Paper • 2512.07921 • Published Dec 8, 2025 • 33

Project Ideas 2026

SAM 3: Segment Anything with Concepts

Paper • 2511.16719 • Published Nov 20, 2025 • 135
Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance

Paper • 2512.08765 • Published Dec 9, 2025 • 134
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length

Paper • 2512.04677 • Published Dec 4, 2025 • 177
LongCat-Image Technical Report

Paper • 2512.07584 • Published Dec 8, 2025 • 23

SAM 3: Segment Anything with Concepts

Paper • 2511.16719 • Published Nov 20, 2025 • 135
RF-DETR: Neural Architecture Search for Real-Time Detection Transformers

Paper • 2511.09554 • Published Nov 12, 2025 • 9

Papers I want to read 🗞️

Machine Learning Operations (MLOps): Overview, Definition, and Architecture

Paper • 2205.02302 • Published May 4, 2022 • 1
SAM 3: Segment Anything with Concepts

Paper • 2511.16719 • Published Nov 20, 2025 • 135

Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer

Paper • 2511.22699 • Published Nov 27, 2025 • 245
SAM 3: Segment Anything with Concepts

Paper • 2511.16719 • Published Nov 20, 2025 • 135
Show, Don't Tell: Morphing Latent Reasoning into Image Generation

Paper • 2602.02227 • Published Feb 2 • 10

Previous
1
2
3
Next

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs