Tokenization is one of the most important processes in AI - yet many would like to kill it 💀
What's tokenization? The neural networks inside LLMs actually only process numbers, not text: tokenization is the process that makes text readable for them, by converting sentences into lists of numbers.
➡️ For instance, "This is tokenization" would be split into "This | is | token | ization", then each of these parts (tokens) is converted to an ID according to a predefined mapping: for instance, "ization" could map to ID 2438. Thus "This is tokenization" can become 1335 | 135 | 2980 | 2438 => now the model can process the sentence!
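To make that concrete, here's a minimal sketch using the open-source tiktoken package (the split and IDs above are illustrative; real IDs depend entirely on the tokenizer's vocabulary):

```python
# Minimal sketch: text -> token IDs -> text with an off-the-shelf tokenizer.
# Exact IDs depend on the vocabulary ("cl100k_base" here, used by GPT-4-era models).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

ids = enc.encode("This is tokenization")
print(ids)                                 # a short list of integers
print([enc.decode([i]) for i in ids])      # the text piece behind each ID
print(enc.decode(ids))                     # round-trips back to the original string
```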
Most tokenizers today use pre-specified mappings called "vocabularies", generally built with the compression algorithm Byte-Pair Encoding (BPE), which learns from a large corpus of text an optimized split that efficiently encodes any text from the same distribution into a list of token IDs.
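The core of BPE fits in a few lines: count adjacent symbol pairs, merge the most frequent pair into a new symbol, repeat, and every merge becomes a vocabulary entry. A toy sketch (real implementations work on bytes and on much larger corpora):

```python
from collections import Counter

def bpe_merges(corpus: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Toy byte-pair encoding: learn `num_merges` merge rules from a tiny corpus."""
    words = [list(w) for w in corpus]         # start with each word split into characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter()                     # count all adjacent symbol pairs
        for w in words:
            pairs.update(zip(w, w[1:]))
        if not pairs:
            break
        best = max(pairs, key=pairs.get)      # most frequent pair gets merged
        merges.append(best)
        merged = "".join(best)
        new_words = []                        # apply the merge everywhere
        for w in words:
            out, i = [], 0
            while i < len(w):
                if i + 1 < len(w) and (w[i], w[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(w[i])
                    i += 1
            new_words.append(out)
        words = new_words
    return merges

print(bpe_merges(["tokenization", "token", "organization"], num_merges=5))
```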
🤨 Now, these current tokenizers have flaws. For instance, the rigidity of their mapping creates losses; the prime example being that a tokenizer designed for English (thus optimized for tokens like "has", "been", "clock", etc.) will not have the right tokens to handle Burmese, making it terribly inefficient at it.
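You can measure this yourself with any English-centric vocabulary. A quick sketch, again assuming tiktoken (exact counts vary by tokenizer, but the gap is typical):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

english = "The clock has been ticking."
burmese = "မင်္ဂလာပါ"  # "hello" in Burmese

# Few tokens for a whole English sentence vs. several tokens per single Burmese
# character (the tokenizer falls back to raw bytes for scripts it never learned).
print(len(english), "chars ->", len(enc.encode(english)), "tokens")
print(len(burmese), "chars ->", len(enc.encode(burmese)), "tokens")
```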
Many alternative approaches have emerged as a result: for instance, "tokenizer-free tokenizers". One I really liked is the "entropy-based" approach: it monitors the stream of text and triggers a split whenever the entropy rises too much, i.e. whenever something "surprising" happens.
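The idea, roughly: score each position with the entropy of a small model's next-symbol distribution, and cut a new segment wherever that entropy spikes. A toy sketch, where `next_char_probs` is a hypothetical stand-in for such a model:

```python
import math

def entropy(probs: dict[str, float]) -> float:
    """Shannon entropy (in bits) of a next-symbol distribution."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

def entropy_split(text: str, next_char_probs, threshold: float = 2.0) -> list[str]:
    """Start a new segment whenever next-symbol entropy exceeds `threshold`.

    `next_char_probs(prefix)` stands in for a small language model returning a
    {symbol: probability} dict for the character that follows `prefix`.
    """
    segments, current = [], ""
    for i, ch in enumerate(text):
        # High entropy = the model is "surprised" by what comes next -> cut here.
        if current and entropy(next_char_probs(text[:i])) > threshold:
            segments.append(current)
            current = ""
        current += ch
    if current:
        segments.append(current)
    return segments
```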
To address the limitations of simple, static RAG, HyDRA is the answer: an advanced, unified framework for agentic RAG, inspired by the latest research to create something truly powerful.
🧠 Moving beyond single-shot retrieval. HyDRA introduces a multi-turn, reflection-based system with coordinated agents: a Planner, Coordinator, and Executors (currently local & deep web search).
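The repo has the real implementation; as a rough mental model only (names and methods below are hypothetical, not HyDRA's API), the control flow reads like this:

```python
# Purely illustrative sketch of a Planner / Coordinator / Executors reflection loop.
def agentic_answer(question, planner, coordinator, llm, max_turns=3):
    reflections = []
    answer = None
    for _ in range(max_turns):                       # multi-turn, not single-shot
        tasks = planner.plan(question, reflections)  # break the query into sub-tasks
        evidence = coordinator.dispatch(tasks)       # route to local / deep-web executors
        answer = llm.generate(question, evidence)
        critique = llm.reflect(question, answer, evidence)
        if critique.is_sufficient:                   # reflection decides whether to loop again
            break
        reflections.append(critique)                 # lessons feed the next plan
    return answer
```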
🔬 At its core is an advanced 3-stage local retrieval pipeline that leaves basic RAG in the dust:
🥇 1. Hybrid Search: combines dense (semantic) and sparse (textual) embeddings in a single pass using the bge-m3 model. This alone is a massive upgrade.
🥈 2. RRF (Reciprocal Rank Fusion): intelligently merges and reranks the results from the different search vectors for maximum precision.
🥉 3. Advanced Reranking: uses the bge-m3-reranker model to score and surface the most relevant documents for any query.
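Stage 2 is simpler than it sounds: every ranked list gives a document a vote of 1/(k + rank), and the votes are summed. A minimal sketch (k = 60 is the usual default from the original RRF paper):

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of doc IDs: score(d) = sum of 1 / (k + rank(d))."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# e.g. fuse the dense and sparse hit lists before the reranker sees them:
dense  = ["doc3", "doc1", "doc7"]
sparse = ["doc1", "doc9", "doc3"]
print(reciprocal_rank_fusion([dense, sparse]))  # doc1 and doc3 rise to the top
```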
⚡️ This isn't just powerful, it's blazing fast. We're using SOTA ANN (HNSW) with vector and index quantization (down to 1-bit!) for near-instant retrieval with minimal quality loss.
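For a feel of what that looks like in practice, here's a minimal faiss sketch of HNSW over 1-bit (binary) quantized vectors, searched by Hamming distance. This is illustrative only; HyDRA's actual index configuration may differ:

```python
# Illustrative only: HNSW graph over 1-bit quantized vectors with faiss.
import numpy as np
import faiss

dim = 1024                                    # e.g. the bge-m3 dense embedding size
vecs = np.random.randn(10_000, dim).astype("float32")

# 1-bit quantization: keep only the sign of each dimension, pack 8 dims per byte.
codes = np.packbits((vecs > 0).astype(np.uint8), axis=1)

index = faiss.IndexBinaryHNSW(dim, 32)        # HNSW with M=32 neighbours per node
index.add(codes)

query = np.packbits((np.random.randn(1, dim) > 0).astype(np.uint8), axis=1)
distances, ids = index.search(query, 10)      # Hamming-distance search, near-instant
```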
🤖 HyDRA is more than just retrieval. It incorporates memory from experience and reflection, creating a guiding policy for smarter future interactions and strategic planning.
The result? A local retrieval system that significantly outperforms standard vector search RAG.
🌐 For deep web searches, HyDRA leverages the AsyncDDGS library and MCP (Model Context Protocol) for free, unrestricted web access. The entire reasoning engine is powered by the incredibly fast and efficient Google Gemini 2.5 Flash!
👨💻 Explore the project, dive into the code, and see it in action: 🔗 GitHub: https://github.com/hassenhamdi/HyDRA (leave a star if you like the project)
🤝 Looking to implement cutting-edge AI solutions or collaborate? Let's connect! LinkedIn: linkedin.com/in/hassenhamdi Email: [email protected] Discord: hassenhamdi
STOP EVERYTHING NOW - we might finally have a radical architecture improvement over Transformers!!! 🚨
A lone scientist just proposed Tiny Recursive Model (TRM), and it is literally the most impressive model that I've seen this year.
➡️ Tiny Recursive Model is 7M parameters
➡️ On ARC-AGI, it beats flagship models like Gemini-2.5-pro
Consider how wild this is: Gemini-2.5-pro must be over 10,000x bigger and had 1,000x as many authors 😂 (Alexia is alone on the paper)
What's this sorcery? In short: it's a very tiny Transformer, but it loops over itself at two different frequencies, updating two latent variables: one for the proposed answer and one for the reasoning.
@AlexiaJM started from the paper Hierarchical Reasoning Model, published a few months ago, which already showed breakthrough improvements on ARC-AGI for its small size (27M parameters).
Hierarchical Reasoning Model introduced one main feature:
🔎 Deep supervision
In their model, one part (here, one layer) would run at high frequency, and another at a lower frequency, updating only every n steps.
They had used a recurrent architecture where these layers would repeat many times; but to make it work, they had to make many approximations, including not fully backpropagating the loss through all layers.
Alexia studied what was useful and what wasn't, and cleaned up the architecture as follows:
Why use a recurrent architecture, when you can just make it a loop?
➡️ She made the network recursive, looping over itself.
Why use 2 latent variables?
➡️ She provides a crystal-clear explanation: the one that changes frequently is the reasoning, the one that changes at low frequency is the proposed answer.
➡️ She runs ablation studies to validate that 2 is indeed the optimal number.
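In code, the two-frequency recursion described above boils down to roughly the sketch below, a loose PyTorch illustration of the idea only (the layer choice, loop counts, and the way x, y and z are combined are simplifications, not the authors' implementation):

```python
import torch
import torch.nn as nn

class TinyRecursiveSketch(nn.Module):
    """Loose sketch of the TRM idea: one tiny network applied recursively,
    updating a fast 'reasoning' latent z and a slow 'answer' latent y."""

    def __init__(self, dim: int, n_inner: int = 6, n_outer: int = 3):
        super().__init__()
        self.core = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.n_inner, self.n_outer = n_inner, n_outer

    def forward(self, x, y, z):
        # x: embedded question, y: proposed-answer latent, z: reasoning latent
        for _ in range(self.n_outer):          # low-frequency loop: refine the answer
            for _ in range(self.n_inner):      # high-frequency loop: refine the reasoning
                z = self.core(torch.cat([x, y, z], dim=1))[:, -z.size(1):]
            y = self.core(torch.cat([y, z], dim=1))[:, :y.size(1)]
        return y, z

# e.g. x = torch.randn(2, 16, 128); y = torch.zeros(2, 16, 128); z = torch.zeros(2, 16, 128)
# y, z = TinyRecursiveSketch(128)(x, y, z)
```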
This new setup is a much more elegant way to process reasoning than generating huge chains of tokens as all flagship models currently do.
This might be the breakthrough we've been awaiting for so long!