Depth Anything V2
This model is built for on-device deployment on the mobile (Snapdragon 8 Elite Gen 4) NPU. Deployment instructions can be found here.
Model Description
Depth Anything V2 is a foundation model for monocular depth estimation (predicting a depth map from a single RGB image).
It improves on Depth Anything V1 with finer details and stronger robustness, while remaining significantly more efficient than diffusion-based depth models (often cited as ~10× faster).
Depth Anything V2 is trained with a large-scale pipeline that combines synthetic labeled depth data and massive real-world unlabeled imagery: a teacher model trained on the synthetic data generates pseudo-labels for the real images, which are then used to train the student models.
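As a quick illustration (separate from the NPU deployment path of this repo), the public checkpoints can be exercised through the Hugging Face transformers depth-estimation pipeline. The checkpoint name below is one of the publicly released variants and is an assumption here, not this repo's packaged model.

```python
from PIL import Image
from transformers import pipeline

# Load one of the public Depth Anything V2 checkpoints (name is illustrative).
pipe = pipeline("depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf")

image = Image.open("example.jpg")   # any single RGB frame
result = pipe(image)

result["depth"].save("depth.png")   # PIL image of the relative depth map
# result["predicted_depth"] holds the raw torch tensor if you need values.
```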
Features
- Monocular depth estimation: dense depth map prediction from a single image.
- Fine-grained structure: sharper edges and better small-object depth details vs V1.
- Robust generalization: improved stability across diverse real-world scenes and conditions.
- Efficient deployment: lightweight compared to diffusion-based depth methods, enabling faster inference.
- Multiple sizes & variants: commonly released as Small/Base/Large checkpoints (among others), including metric fine-tunes for indoor and outdoor depth; see the loading sketch after this list.
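To make the size choice concrete, the sketch below follows the loading pattern of the upstream Depth-Anything-V2 GitHub repository. The config dictionary, checkpoint paths, and the `infer_image` call are assumptions taken from that public release and are separate from the NPU package in this repo.

```python
import cv2
import torch

from depth_anything_v2.dpt import DepthAnythingV2  # upstream repo module (assumed installed)

# Encoder configs as published upstream (assumed values): Small/Base/Large.
MODEL_CONFIGS = {
    "vits": {"encoder": "vits", "features": 64,  "out_channels": [48, 96, 192, 384]},
    "vitb": {"encoder": "vitb", "features": 128, "out_channels": [96, 192, 384, 768]},
    "vitl": {"encoder": "vitl", "features": 256, "out_channels": [256, 512, 1024, 1024]},
}

encoder = "vits"  # trade accuracy for speed by picking a size
model = DepthAnythingV2(**MODEL_CONFIGS[encoder])
model.load_state_dict(torch.load(f"checkpoints/depth_anything_v2_{encoder}.pth", map_location="cpu"))
model.eval()

depth = model.infer_image(cv2.imread("example.jpg"))  # HxW numpy array, relative depth
```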
Use Cases
- AR/VR & 3D effects: background segmentation, depth-aware rendering, relighting
- Robotics & navigation: obstacle understanding and scene geometry from RGB cameras
- Photo/video post-processing: synthetic bokeh, depth-based editing and compositing (a toy bokeh sketch follows this list)
- Autonomous/ADAS prototyping: monocular depth priors for perception pipelines (especially with metric fine-tunes)
- On-device deployment: lightweight variants export well to edge toolchains and mobile NPU runtimes
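As a concrete illustration of the post-processing use case, here is a minimal, hypothetical depth-gated blur: pixels whose normalized depth falls below a focus threshold are replaced with a blurred copy. The function name, the threshold, and the larger-means-closer convention (typical for Depth Anything's relative depth) are all illustrative, not part of this model's API.

```python
import cv2
import numpy as np

def synthetic_bokeh(rgb: np.ndarray, depth: np.ndarray, focus: float = 0.3) -> np.ndarray:
    """Toy depth-gated blur (illustrative only).

    rgb:   HxWx3 uint8 image
    depth: HxW relative depth map; larger = closer (assumed convention)
    focus: normalized depth above which pixels stay sharp
    """
    d = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)
    blurred = cv2.GaussianBlur(rgb, (0, 0), sigmaX=7)   # background candidate
    keep = (d >= focus).astype(np.float32)[..., None]   # 1 = in focus (near)
    return (rgb * keep + blurred * (1.0 - keep)).astype(np.uint8)
```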
Inputs and Outputs
Input:
- RGB image (single frame), typically resized and normalized per the checkpoint configuration (a preprocessing sketch follows this section).
Output:
- Relative depth map (per-pixel depth up to an unknown scale/shift) for the general models.
- Metric depth map (depth in real units) for specific fine-tuned metric variants (e.g., indoor/outdoor).
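For illustration, the sketch below mirrors a common Depth Anything input pipeline (square resize to a ViT-friendly resolution, ImageNet normalization) and min-max scales the relative output for viewing. The 518-pixel target size and the normalization statistics are assumptions based on the public checkpoints; check your checkpoint's config before reusing them.

```python
import cv2
import numpy as np
import torch

# Assumed preprocessing for the public checkpoints: square resize to a
# multiple of 14 (the ViT patch size) and ImageNet normalization.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(rgb: np.ndarray, size: int = 518) -> torch.Tensor:
    """HxWx3 uint8 RGB image -> 1x3xSxS float tensor."""
    img = cv2.resize(rgb, (size, size), interpolation=cv2.INTER_CUBIC)
    img = (img.astype(np.float32) / 255.0 - IMAGENET_MEAN) / IMAGENET_STD
    return torch.from_numpy(img.transpose(2, 0, 1)).unsqueeze(0)

def to_visual(depth: np.ndarray) -> np.ndarray:
    """Min-max scale a relative depth map to uint8 for inspection."""
    d = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)
    return (d * 255.0).astype(np.uint8)
```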
License
This repo is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license, which allows use, sharing, and modification for non-commercial purposes only, with proper attribution. All NPU-related models, runtimes, and code in this project fall under this non-commercial license and cannot be used in commercial or revenue-generating applications. Commercial licensing or enterprise usage requires a separate agreement. For inquiries, please contact dev@nexa.ai.