Depth-Jitter: Seeing Through the Depth
Accepted at OCEANS 2025 - Great Lakes
Paper • Code
Authors: Md Sazidur Rahman1,2, David Cabecinhas2, Ricard Marxer1
1Université de Toulon, Aix Marseille Univ, CNRS, LIS, Toulon, France
2Institute for Systems and Robotics, Instituto Superior Técnico, Lisbon, Portugal
Overview
Depth-Jitter is an image augmentation framework that enriches training datasets with depth-aware transformations.
It provides a set of tools for depth-based image processing, data augmentation, and model training, enabling improved model robustness for applications such as:
- Underwater Imaging
- Autonomous Navigation
- 3D Reconstruction & Robotics
Key Features
- Depth-Based Augmentation – Simulates real-world depth variations.
- Quantile-Based Thresholding – Adaptive thresholding for different depth distributions.
- Adaptive Depth Offsetting – Introduces controlled randomness for better robustness.
- Multi-Dataset Support – Works with the UTDAC2020 and FathomNet datasets.
- Seamless Deep Learning Integration – Ready for model training and evaluation.
Underwater Image Formation Model
Depth Jitter
Depth Jitter Equation
To model depth-aware augmentation, we introduce the following equation:
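The equation itself is provided as an image (assets/equation.png) and may not render here. A plausible reconstruction, assuming the standard Seathru underwater image formation model re-evaluated at the jittered range \( z + \Delta z_m \), is:

```latex
\hat{I}_c = J_c \, e^{-\beta_c^{D} (z + \Delta z_m)}
          + B_c^{\infty} \left( 1 - e^{-\beta_c^{B} (z + \Delta z_m)} \right)
```

where \( J_c \) is the direct (unattenuated) signal, \( B_c^{\infty} \) the veiling light, and \( \beta_c^{D} \), \( \beta_c^{B} \) the per-channel attenuation and backscatter coefficients.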
In this equation, \( \Delta z_m \) represents the depth offset added to the original depth map. By incorporating this offset, we generate synthetic data with depth variations, which serves as an effective data augmentation strategy. This method enhances the model's robustness to varying color and depth conditions, particularly in underwater environments where visibility and illumination vary significantly.
By applying depth offsets during training, the model learns to generalize across different visibility settings, leading to improved adaptability in real-world scenarios.
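As an illustration only (not the repository's implementation), this idea can be sketched as inverting a fitted Seathru model to recover the direct signal and re-rendering the image at the offset depth. All function and parameter names below are hypothetical:

```python
import numpy as np

def depth_jitter(image, depth, beta_D, beta_B, B_inf, delta_z):
    """Re-render an underwater image at a jittered depth (illustrative sketch).

    image: float RGB array in [0, 1], shape (H, W, 3)
    depth: per-pixel range in metres, shape (H, W)
    beta_D, beta_B, B_inf: per-channel Seathru parameters, shape (3,)
    delta_z: scalar depth offset in metres
    """
    z = depth[..., None]  # broadcast the range map against the colour channels
    # Invert the image formation model to recover the direct signal J_c.
    J = (image - B_inf * (1.0 - np.exp(-beta_B * z))) / np.exp(-beta_D * z)
    # Re-apply the model at the offset depth z + delta_z.
    z_new = z + delta_z
    out = J * np.exp(-beta_D * z_new) + B_inf * (1.0 - np.exp(-beta_B * z_new))
    return np.clip(out, 0.0, 1.0)
```

With `delta_z = 0` the round trip is the identity, which is a quick sanity check that the inversion matches the forward model.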
Project Structure
.
├── README.md
├── assets
│   ├── depth-jitter-white.png
│   ├── equation.png
│   ├── project_video.mp4
│   └── seathru.png
├── depth_variance_fathomnet.json
├── depth_variance_utdac.json
├── environment.yml
├── output-fathomnet.png
├── output-utdac.png
├── parameters_train.json
├── q2l_labeller
│   ├── __init__.py
│   ├── __pycache__
│   ├── data
│   ├── loss_modules
│   ├── models
│   └── pl_modules
├── simple-demo.ipynb
├── train.json
├── train_fathomnet.json
├── train_q2l.py
├── val.json
└── val_fathomnet.json
Download the Dataset
FathomNet 2023
Please follow the instructions from here.
UTDAC2020
Download the dataset from this link.
Dataset Folder Structure
.
├── annotations
│   ├── train.json
│   └── val.json
├── Depth_images
├── train_images
└── val_images
Usage
Clone the repository
git clone https://github.com/mim-team/Depth-Jitter.git
cd Depth-Jitter/
Create Conda Environment
conda env create -f environment.yml
Activate Conda Environment
conda activate depth-jitter
Train with the desired dataset
python train_q2l.py --dataset FathomNet
Train on SLURM Server
If you are on a SLURM server, you can use the provided SLURM script to train the model on multiple GPUs.
sbatch depthjitter.slurm
Jupyter Notebook
A Jupyter Notebook is provided for a more user-friendly and interactive experience with the code.
If you want to change augmentation settings
You can tweak the augmentation settings and the image size in this part of the training script.
# Initialize Data Module
coco = COCODataModule(
    data_dir=selected_dataset["image_folder"],
    img_size=384,
    batch_size=128,
    num_workers=8,  # Adjust based on CPU cores
    use_cutmix=True,
    cutmix_alpha=1.0,
    train_classes=None,
    sampling_strategy="default",  # oversample, undersample, default
    augmentation_strategy="seathru",  # baseline, seathru, combined
    num_classes=selected_dataset["num_classes"],
    seathru_transform=seathru_transform,
)
Model Settings
You can change the model backbone and hyperparameters in this section of the training script. If you want to use a different backbone, you can pick one from timm.
If you use a different backbone, please make sure to update backbone_desc and conv_out_dim to match the chosen model.
param_dict = {
    "backbone_desc": "resnest101e",
    "conv_out_dim": 2048,
    "hidden_dim": 256,
    "num_encoders": 2,
    "num_decoders": 3,
    "num_heads": 8,
    "batch_size": 128,
    "image_dim": 384,
    "learning_rate": 1e-4,
    "momentum": 0.9,
    "weight_decay": 1e-2,
    "n_classes": selected_dataset["num_classes"],  # Dynamically assign class numbers
    "thresh": 0.4,
    "use_cutmix": True,
    "use_pos_encoding": True,
    "loss": "ASL",  # ASL, BCE
    "data": coco,
}
Inference on Images
python inference.py --image path/to/image.jpg --checkpoint path/to/model.ckpt --num_classes <number of classes>
Train Your Own Dataset
If you want to train your own dataset, follow these steps:
Step 1: Generate Depth Images
Get the depth images and depth parameters using any state-of-the-art RGB-to-Depth model.
We used Depth Anything v2 for our dataset.
Step 2: Extract Seathru Parameters
Use Gaussian Seathru (from SUCRe) to obtain the Seathru parameters.
Note: You will need the depth images from Step 1 for this process.
Step 3: Compute Depth Variance Threshold
To determine the depth variance threshold and generate the depth_variance.json file:
Use the Jupyter Notebook provided in this repository.
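A minimal sketch of this step (the actual notebook may differ; the function name, JSON field names, and quantile value below are assumptions):

```python
import json

import numpy as np

def depth_variance_threshold(depth_maps, quantile=0.9):
    """Compute per-image depth variance and a quantile-based threshold."""
    variances = [float(np.var(d)) for d in depth_maps]
    threshold = float(np.quantile(variances, quantile))
    return variances, threshold

# Toy example with synthetic depth maps.
rng = np.random.default_rng(0)
depths = [rng.uniform(1.0, 10.0, size=(64, 64)) for _ in range(5)]
variances, thr = depth_variance_threshold(depths, quantile=0.9)

# Write a depth_variance.json-style file (field names are illustrative).
with open("depth_variance.json", "w") as f:
    json.dump({"variances": variances, "threshold": thr}, f)
```

The threshold then caps how large a depth offset the augmentation may apply for images whose depth distribution is already highly varied.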
Step 4: Prepare Annotations in COCO Format
Ensure that your dataset annotations are formatted in COCO JSON format before proceeding.
Refer to the COCO Dataset Guide if needed.
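For reference, a minimal COCO-style annotation file can be sketched as follows (file names, IDs, and the category are made up; see the official COCO format for the full field set):

```python
import json

coco = {
    "images": [
        {"id": 1, "file_name": "train_images/img_0001.jpg",
         "width": 640, "height": 480},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 3,
         "bbox": [120, 80, 60, 40],  # [x, y, width, height]
         "area": 2400, "iscrowd": 0},
    ],
    "categories": [
        {"id": 3, "name": "fish", "supercategory": "animal"},
    ],
}

# Save under annotations/ (e.g. as train.json, per the dataset layout above).
with open("train.json", "w") as f:
    json.dump(coco, f, indent=2)
```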
Step 5: Train Your Model with Depth-Jitter
Now that you have all the required data, you can train your multi-label classification model using our proposed augmentation technique!
# Example command to train your dataset after fixing the paths in the training script.
python train.py --dataset YourDataset
Acknowledgement
First and foremost, I would like to express my deepest gratitude to my supervisor, Professor Dr. Ricard Marxer, for his continuous support, guidance, and encouragement throughout this research. His insightful feedback and unwavering belief in my capabilities have been invaluable to the completion of this work. I am also profoundly grateful to my co-supervisor, Dr. David Cabecinhas, for his expertise, patience, and constructive criticism, which have significantly contributed to the quality and direction of this research.

I extend my sincere thanks to the LIS Lab at Université de Toulon for providing the financial support and resources necessary for this research. The funding and facilities offered by the LIS Lab have been instrumental in facilitating my experiments and enabling me to pursue my research objectives.

Additionally, I acknowledge the faculty and staff of Université de Toulon and Instituto Superior Técnico for their support and assistance during my studies. Special thanks to my colleagues and friends for providing a stimulating and supportive environment in which to learn and grow.
The query2label implementation was modified from this repository.
Citation
If you use Depth-Jitter in your work, please cite:
BibTeX
@inproceedings{rahman2025depthjitter,
  author    = {Md Sazidur Rahman and David Cabecinhas and Ricard Marxer},
  title     = {Depth-Jitter: Seeing through the Depth},
  booktitle = {Proceedings of the OCEANS 2025 Conference, Great Lakes},
  year      = {2025},
  address   = {Chicago, IL, USA},
  month     = sep,
  pages     = {XXX--XXX},
  doi       = {10.XXXX/XXXXX}
}