🤖 >_ Can an LLM execute logic gates and boolean arithmetic?
We need to create datasets:
- Neural Arithmetic and Logic Unit (NALU), 32-bit
- Neural Application Binary Interface (NABI), 32-bit
🎯 Optimal Instruction Set = RV32IMAF
This would open the way for LLMs to write and execute code themselves, without an external CLI.
The more of us who want it, the more possible it will become...
PhysiQuanty/Binary-Addition-LLM-POC (10-bit binary addition with carry propagation; because the next token is deterministic, sampling no longer has any effect on the logits.)
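For context, here is a minimal sketch of the kind of dataset such a POC could train on (10-bit addition). This is illustrative, not the repo's actual generator; the line format and output file name are assumptions:

```python
import random

def binary_add_example(n_bits: int = 10) -> str:
    """One '<a>+<b>=<sum>' line; the sum is fully determined by the
    operands, so every token after '=' is deterministic."""
    a = random.getrandbits(n_bits)
    b = random.getrandbits(n_bits)
    # n_bits + 1 sum bits so the final carry-out is never truncated
    return f"{a:0{n_bits}b}+{b:0{n_bits}b}={a + b:0{n_bits + 1}b}"

if __name__ == "__main__":
    random.seed(0)
    with open("binary_addition_10bit.txt", "w") as f:  # file name is an assumption
        for _ in range(100_000):
            f.write(binary_add_example(10) + "\n")
```

Note that emitting the sum most-significant-bit-first forces the model to resolve the entire carry chain before producing its first output bit; some setups instead reverse the sum so each emitted bit depends only on carries already determined.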
We are thrilled to announce the launch of SKT-OMNI-CORPUS-146T-V1, a massive-scale, high-quality dataset designed to power the next generation of Foundation Models (LLMs) from scratch. Developed at SKT AI LABS, this corpus is not just a collection of data; it’s a mission to decentralize high-grade AI training for regional languages and global knowledge.
💎 Key Highlights:
• Massive Scale: a multi-terabyte corpus targeting the 146T-token level.
• Pure Quality: curated from 500+ elite sources.
• Structured for MoE: sharded into standardized 3.5GB units (SKT-𝕻 series) for seamless distributed training; a sharding sketch follows below.
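As a rough sketch of what fixed-size sharding could look like, assuming plain-text shards split on line boundaries (the 3.5GB target comes from the post; the helper and the skt-p file naming are illustrative, not the actual SKT-𝕻 format):

```python
import os

SHARD_BYTES = int(3.5 * 1024**3)  # 3.5 GB target shard size

def shard_corpus(input_path: str, out_dir: str, prefix: str = "skt-p") -> None:
    """Split one large text corpus into ~3.5 GB shards on line
    boundaries, so every shard stays independently readable."""
    os.makedirs(out_dir, exist_ok=True)
    shard_idx, written = 0, 0
    out = open(os.path.join(out_dir, f"{prefix}-{shard_idx:05d}.txt"), "wb")
    with open(input_path, "rb") as f:
        for line in f:
            # rotate to a new shard before overshooting the size target
            if written + len(line) > SHARD_BYTES and written > 0:
                out.close()
                shard_idx, written = shard_idx + 1, 0
                out = open(os.path.join(out_dir, f"{prefix}-{shard_idx:05d}.txt"), "wb")
            out.write(line)
            written += len(line)
    out.close()
```

Splitting on line boundaries (rather than raw byte offsets) keeps each shard a valid standalone file, which is what makes per-shard distributed loading painless.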
🤝 Open for Collaboration!
We are looking for AI researchers, CUDA engineers, and data scientists to join us in this journey of building Project Surya and the ST-X Series models. Whether it's optimization, custom tokenization, or architecture design, let's build the future together.
We should really have a release date range slider on the /models page. Tired of "trending/most downloaded" being the best way to sort and still seeing models from 2023 on the first page just because they're embedded in enterprise pipelines and get downloaded repeatedly. "Recently Created/Recently Updated" don't solve the discovery problem considering the amount of noise to sift through.
Slight caveat: Trending actually does have some recency bias, but it's not strong/precise enough.
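Until the Hub adds something like that, here is a client-side workaround sketch with huggingface_hub: sort by downloads server-side, then filter by creation date locally. This assumes a recent huggingface_hub where ModelInfo exposes created_at; the helper name and cutoff date are illustrative:

```python
from datetime import datetime, timezone
from huggingface_hub import HfApi  # pip install huggingface_hub

def popular_recent_models(since: datetime, limit: int = 500):
    """Sort by downloads server-side, then keep only models created
    after `since`: a stand-in for a release-date range slider."""
    api = HfApi()
    for model in api.list_models(sort="downloads", direction=-1, limit=limit):
        if model.created_at and model.created_at >= since:
            yield model

# e.g. the most-downloaded models actually released since 2025
for m in popular_recent_models(datetime(2025, 1, 1, tzinfo=timezone.utc)):
    print(m.id, m.created_at.date(), m.downloads)
```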
If you like it, give the demo a little star and send a shoutout to @MaxLSB, @jddqd, and @GAD-cell for absolutely obliterating the Pareto frontier of French language understanding.
The moment we've been waiting for: ACE-Step dropped their new model, Ace-Step 1.5 🎉
🔗 ACE-Step/Ace-Step1.5
And the best part? It's released under the MIT license. We've already started integrating it into our project. Let's go 🚀