DmitryRyumin posted an update about 1 month ago

πŸš€πŸ‘οΈπŸŒŸ New Research Alert - ICCV 2025 (Poster)! πŸŒŸπŸ‘οΈπŸš€
πŸ“„ Title: Is Less More? Exploring Token Condensation as Training-Free Test-Time Adaptation πŸ”

πŸ“ Description: Token Condensation as Adaptation (TCA) improves the performance and efficiency of Vision Language Models in zero-shot inference by introducing domain anchor tokens.

πŸ‘₯ Authors: Zixin Wang, Dong Gong, Sen Wang, Zi Huang, Yadan Luo

πŸ“… Conference: ICCV, 19 – 23 Oct, 2025 | Honolulu, Hawai'i, USA πŸ‡ΊπŸ‡Έ

πŸ“„ Paper: Is Less More? Exploring Token Condensation as Training-free Test-time Adaptation (2410.14729)

πŸ“ Repository: https://github.com/Jo-wang/TCA

πŸš€ ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

πŸš€ Added to the Session 1: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/session-1.md

πŸ“š More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers curated by @DmitryRyumin

πŸ” Keywords: #TestTimeAdaptation #TokenCondensation #VisionLanguageModels #TrainingFreeAdaptation #ZeroShotLearning #EfficientAI #AI #ICCV2025 #ResearchHighlight
DmitryRyumin posted an update about 1 month ago

πŸš€πŸ‘οΈπŸŒŸ New Research Alert - ICCV 2025 (Oral)! πŸŒŸπŸ‘οΈπŸš€
πŸ“„ Title: Diving into the Fusion of Monocular Priors for Generalized Stereo Matching πŸ”

πŸ“ Description: The proposed method enhances stereo matching by efficiently combining unbiased monocular priors from vision foundation models. This method addresses misalignment and local optima issues using a binary local ordering map and pixel-wise linear regression.

πŸ‘₯ Authors: Chengtang Yao, Lidong Yu, Zhidan Liu, Jiaxi Zeng, Yuwei Wu, and Yunde Jia

πŸ“… Conference: ICCV, 19 – 23 Oct, 2025 | Honolulu, Hawai'i, USA πŸ‡ΊπŸ‡Έ

πŸ“„ Paper: Diving into the Fusion of Monocular Priors for Generalized Stereo Matching (2505.14414)

πŸ“ Repository: https://github.com/YaoChengTang/Diving-into-the-Fusion-of-Monocular-Priors-for-Generalized-Stereo-Matching

πŸš€ ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

πŸš€ Added to the 3D Pose Understanding Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/3d-pose-understanding.md

πŸ“š More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers curated by @DmitryRyumin

πŸ” Keywords: #StereoMatching #MonocularDepth #VisionFoundationModels #3DReconstruction #Generalization #AI #ICCV2025 #ResearchHighlight
DmitryRyumin posted an update about 1 month ago

🚀👌🌟 New Research Alert - ICCV 2025 (Oral)! 🌟🤌🚀
📄 Title: Understanding Co-speech Gestures in-the-wild 🔍

📝 Description: JEGAL is a tri-modal model that learns from gestures, speech, and text simultaneously, enabling systems to interpret co-speech gestures in the wild (a toy sketch of the contrastive alignment idea appears below).

👥 Authors: @sindhuhegde, K R Prajwal, Taein Kwon, and Andrew Zisserman

📅 Conference: ICCV, 19 – 23 Oct, 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Understanding Co-speech Gestures in-the-wild (2503.22668)

🌐 Web Page: https://www.robots.ox.ac.uk/~vgg/research/jegal
📁 Repository: https://github.com/Sindhu-Hegde/jegal
📺 Video: https://www.youtube.com/watch?v=TYFOLKfM-rM

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Human Modeling Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/human-modeling.md

📚 More Papers: more cutting-edge research presented at other conferences is available in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🔍 Keywords: #CoSpeechGestures #GestureUnderstanding #TriModalRepresentation #MultimodalLearning #AI #ICCV2025 #ResearchHighlight
DmitryRyumin posted an update about 2 months ago

🚀💡🌟 New Research Alert - ICCV 2025 (Oral)! 🌟🪄🚀
📄 Title: LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models 🔍

📝 Description: LoftUp is a coordinate-based transformer that upsamples the low-resolution features of vision foundation models (VFMs, e.g. DINOv2 and CLIP) using cross-attention and self-distilled pseudo-ground truth (pseudo-GT) from SAM (a toy sketch of the cross-attention upsampler appears below).

👥 Authors: Haiwen Huang, Anpei Chen, Volodymyr Havrylov, Andreas Geiger, and Dan Zhang

📅 Conference: ICCV, 19 – 23 Oct, 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models (2504.14032)

🌐 GitHub Page: https://andrehuang.github.io/loftup-site
📁 Repository: https://github.com/andrehuang/loftup

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Foundation Models and Representation Learning Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/foundation-models-and-representation-learning.md

📚 More Papers: more cutting-edge research presented at other conferences is available in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🔍 Keywords: #LoftUp #VisionFoundationModels #FeatureUpsampling #CrossAttentionTransformer #CoordinateBasedLearning #SelfDistillation #PseudoGroundTruth #RepresentationLearning #AI #ICCV2025 #ResearchHighlight
DmitryRyumin posted an update about 2 months ago

πŸš€πŸ·οΈπŸŒŸ New Research Alert - ICCV 2025 (Oral)! πŸŒŸπŸ§©πŸš€
πŸ“„ Title: Heavy Labels Out! Dataset Distillation with Label Space Lightening πŸ”

πŸ“ Description: The HeLlO framework is a new corpus distillation method that removes the need for large soft labels. It uses a lightweight, online image-to-label projector based on CLIP. This projector has been adapted using LoRA-style, parameter-efficient tuning. It has also been initialized with text embeddings.

πŸ‘₯ Authors: @roseannelexie , @Huage001 , Zigeng Chen, Jingwen Ye, and Xinchao Wang

πŸ“… Conference: ICCV, 19 – 23 Oct, 2025 | Honolulu, Hawai'i, USA πŸ‡ΊπŸ‡Έ

πŸ“„ Paper: Heavy Labels Out! Dataset Distillation with Label Space Lightening (2408.08201)

πŸ“Ί Video: https://www.youtube.com/watch?v=kAyK_3wskgA

πŸš€ ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

πŸš€ Added to the Efficient Learning Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/efficient-learning.md

πŸ“š More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers curated by @DmitryRyumin

πŸ” Keywords: #DatasetDistillation #LabelCompression #CLIP #LoRA #EfficientAI #FoundationModels #AI #ICCV2025 #ResearchHighlight
DmitryRyumin posted an update about 2 months ago

🚀🤖🌟 New Research Alert - ICCV 2025 (Oral)! 🌟🤖🚀
📄 Title: Variance-based Pruning for Accelerating and Compressing Trained Networks 🔍

📝 Description: This one-shot pruning method efficiently compresses trained networks, reducing computation and memory usage while retaining almost full performance and requiring only minimal fine-tuning (a toy sketch of variance-based pruning appears below).

👥 Authors: Uranik Berisha, Jens Mehnert, and Alexandru Paul Condurache

📅 Conference: ICCV, 19 – 23 Oct, 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Variance-Based Pruning for Accelerating and Compressing Trained Networks (2507.12988)

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Efficient Learning Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/efficient-learning.md

📚 More Papers: more cutting-edge research presented at other conferences is available in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🔍 Keywords: #VarianceBasedPruning #NetworkCompression #ModelAcceleration #EfficientDeepLearning #VisionTransformers #AI #ICCV2025 #ResearchHighlight
DmitryRyumin posted an update about 2 months ago

πŸš€πŸ‘οΈπŸŒŸ New Research Alert - ICCV 2025 (Oral)! πŸŒŸπŸ‘οΈπŸš€
πŸ“„ Title: Token Activation Map to Visually Explain Multimodal LLMs πŸ”

πŸ“ Description: The Token Activation Map (TAM) is an advanced explainability method for multimodal LLMs. Using causal inference and a Rank Gaussian Filter, TAM reveals token-level interactions and eliminates redundant activations. The result is clearer, high-quality visualizations that enhance understanding of object localization, reasoning and multimodal alignment across models.

πŸ‘₯ Authors: Yi Li, Hualiang Wang, Xinpeng Ding, Haonan Wang, and Xiaomeng Li

πŸ“… Conference: ICCV, 19 – 23 Oct, 2025 | Honolulu, Hawai'i, USA πŸ‡ΊπŸ‡Έ

πŸ“„ Paper: Token Activation Map to Visually Explain Multimodal LLMs (2506.23270)

πŸ“ Repository: https://github.com/xmed-lab/TAM

πŸš€ ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

πŸš€ Added to the Multi-Modal Learning Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/multi-modal-learning.md

πŸ“š More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers curated by @DmitryRyumin

πŸ” Keywords: #TokenActivationMap #TAM #CausalInference #VisualReasoning #Multimodal #Explainability #VisionLanguage #LLM #XAI #AI #ICCV2025 #ResearchHighlight
Wauplin posted an update 5 months ago

Say hello to hf: a faster, friendlier Hugging Face CLI ✨

We are glad to announce a long-awaited quality-of-life improvement: the Hugging Face CLI has been officially renamed from huggingface-cli to hf!

So... why this change?

Typing huggingface-cli constantly gets old fast. More importantly, the CLI's command structure became messy as new features were added over time (upload, download, cache management, repo management, etc.). Renaming the CLI is a chance to reorganize commands into a clearer, more consistent format.

We decided not to reinvent the wheel and instead follow a well-known CLI pattern: hf <resource> <action>. Isn't hf auth login easier to type and remember?

The full rationale, implementation details, and migration notes are in the blog post: https://huggingface.co/blog/hf-cli
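
For example, everyday commands map over naturally under the hf <resource> <action> pattern (a few examples based on the migration; run hf --help for the authoritative list):

huggingface-cli login            ->  hf auth login
huggingface-cli whoami           ->  hf auth whoami
huggingface-cli download gpt2    ->  hf download gpt2
huggingface-cli upload my-model  ->  hf upload my-model
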

chansung posted an update 5 months ago

YAML engineering is becoming more important than ever, from infra provisioning to model training (recipes).

Here, I built a simple editor, first for @dstackai, and I will share the live endpoint this week. Let me know what you think about this approach.

If people find this useful, I am going to do the same thing for LLM training recipes in popular frameworks such as Hugging Face Open-R1, Axolotl, and so on. Let me hear your thoughts.
Wauplin posted an update 9 months ago

‼️ huggingface_hub's v0.30.0 is out with our biggest update of the past two years!

Full release notes: https://github.com/huggingface/huggingface_hub/releases/tag/v0.30.0

🚀 Ready. Xet. Go!

Xet is a groundbreaking new protocol for storing large objects in Git repositories, designed to replace Git LFS. Unlike LFS, which deduplicates at the file level, Xet operates at the chunk level, making it a game-changer for AI builders collaborating on massive models and datasets. Our Python integration is powered by xet-core (https://github.com/huggingface/xet-core), a Rust-based package that handles all the low-level details.

You can start using Xet today by installing the optional dependency:

pip install -U huggingface_hub[hf_xet]

With that, you can seamlessly download files from Xet-enabled repositories! And don't worry: everything remains fully backward-compatible if you're not ready to upgrade yet.

Blog post: https://huggingface.co/blog/xet-on-the-hub
Docs: https://huggingface.co/docs/hub/en/storage-backends#xet
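
Once the extra is installed, downloads from Xet-enabled repositories go through the chunk-based backend transparently; the Python API is unchanged. A quick check (the repository below is only an example; any Xet-enabled repo behaves the same):

from huggingface_hub import hf_hub_download

# Uses the Xet backend when the repo is Xet-enabled and hf_xet is
# installed; otherwise it falls back to the classic HTTP path.
path = hf_hub_download(repo_id="openai-community/gpt2", filename="config.json")
print(path)
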


⚡ Inference Providers

- We're thrilled to introduce Cerebras and Cohere as official inference providers! This expansion strengthens the Hub as the go-to entry point for running inference on open-weight models.

- Novita is now our third provider to support the text-to-video task, after Fal.ai and Replicate.

- Centralized billing: manage your budget and set team-wide spending limits for Inference Providers! Available to all Enterprise Hub organizations. For example:

from huggingface_hub import InferenceClient

# Route the request through the fal.ai provider and bill the usage
# to an organization instead of the personal account.
client = InferenceClient(provider="fal-ai", bill_to="my-cool-company")
image = client.text_to_image(
    "A majestic lion in a fantasy forest",
    model="black-forest-labs/FLUX.1-schnell",
)
image.save("lion.png")


- No more timeouts when generating videos, thanks to async calls. This is available right now for Fal.ai, and we expect more providers to adopt the same structure very soon!
chansung posted an update 9 months ago

A simple guide on the GRPO recipe in Open-R1, which is built on top of TRL.

I think the FastAPI wrapper around vLLM with the WeightSyncWorker is a pretty cool feature. Also, many predefined reward functions come out of the box! A minimal training sketch follows.
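
For reference, a minimal GRPO run with TRL looks roughly like this (a hedged sketch following the TRL quickstart; the model, dataset, and toy length-based reward are illustrative stand-ins for Open-R1's full recipes):

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Toy reward: favor completions close to a target length.
def reward_len(completions, **kwargs):
    return [-abs(200 - len(c)) / 200.0 for c in completions]

dataset = load_dataset("trl-lib/tldr", split="train")  # has a "prompt" column
trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    reward_funcs=reward_len,
    args=GRPOConfig(output_dir="grpo-demo"),
    train_dataset=dataset,
)
trainer.train()
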
chansung posted an update 9 months ago

Mistral AI Small 3.1 24B is not only free for commercial use but also the best model for single-GPU deployment.

I packed all the information you need to know into a single picture. Hope this helps! :)
chansung posted an update 9 months ago

Gemma 3 release in a nutshell
(it seems function calling is not supported, although the announcement said it was)
DmitryRyumin posted an update 10 months ago

🚀🎭🌟 New Research Alert - WACV 2025 (Avatars Collection)! 🌟🎭🚀
📄 Title: EmoVOCA: Speech-Driven Emotional 3D Talking Heads 🔍

📝 Description: EmoVOCA is a data-driven method for generating emotional 3D talking heads by combining speech-driven lip movements with expressive facial dynamics. The method was developed to overcome the limitations of existing corpora and achieves state-of-the-art animation quality.

👥 Authors: @FedeNoce, Claudio Ferrari, and Stefano Berretti

📅 Conference: WACV, 28 Feb – 4 Mar, 2025 | Arizona, USA 🇺🇸

📄 Paper: https://arxiv.org/abs/2403.12886

🌐 GitHub Page: https://fedenoce.github.io/emovoca/
📁 Repository: https://github.com/miccunifi/EmoVOCA

🚀 CVPR-2023-24-Papers: https://github.com/DmitryRyumin/CVPR-2023-24-Papers

🚀 WACV-2024-Papers: https://github.com/DmitryRyumin/WACV-2024-Papers

🚀 ICCV-2023-Papers: https://github.com/DmitryRyumin/ICCV-2023-Papers

📚 More Papers: more cutting-edge research presented at other conferences is available in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🚀 Added to the Avatars Collection: DmitryRyumin/avatars-65df37cdf81fec13d4dbac36

🔍 Keywords: #EmoVOCA #3DAnimation #TalkingHeads #SpeechDriven #FacialExpressions #MachineLearning #ComputerVision #ComputerGraphics #DeepLearning #AI #WACV2025