Sitong CHENG

cmots

AI & ML interests

None yet

Organizations

None yet

upvoted a paper 4 days ago

The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding

Paper • 2512.19693 • Published 4 days ago • 60

upvoted a paper about 1 month ago

Step-Audio-R1 Technical Report

Paper • 2511.15848 • Published Nov 19 • 52

upvoted 2 papers 3 months ago

SpaceVista: All-Scale Visual Spatial Reasoning from mm to km

Paper • 2510.09606 • Published Oct 10 • 17

UniSS: Unified Expressive Speech-to-Speech Translation with Your Voice

Paper • 2509.21144 • Published Sep 25 • 1

updated a model 3 months ago

cmots/UniSS

Audio-to-Audio • 2B • Updated Sep 26 • 29 • 3

liked a model 3 months ago

cmots/UniSS

Audio-to-Audio • 2B • Updated Sep 26 • 29 • 3

published a model 4 months ago

cmots/UniSS

Audio-to-Audio • 2B • Updated Sep 26 • 29 • 3

liked 2 datasets 4 months ago

stepfun-ai/StepEval-Audio-Paralinguistic

Viewer • Updated Aug 29 • 550 • 807 • 8

nvidia/Granary

Viewer • Updated Aug 14 • 116M • 4.73k • 163

liked a dataset 5 months ago

HKUSTAudio/Audio-FLAN-Dataset

Preview • Updated Oct 6 • 13.4k • 38

liked a Space 9 months ago

The Ultra-Scale Playbook

🌌

3.6k

The ultimate guide to training LLM on large GPU Clusters

liked a model 10 months ago

SparkAudio/Spark-TTS-0.5B

Text-to-Speech • Updated Mar 7 • 1.3k • 717

upvoted a paper 10 months ago

Audio-FLAN: A Preliminary Release

Paper • 2502.16584 • Published Feb 23 • 36

liked a model 12 months ago

HKUSTAudio/xcodec2

Audio-to-Audio • 0.8B • Updated Feb 23 • 24.8k • 94

liked 3 datasets about 1 year ago

liked a dataset over 1 year ago

fka/awesome-chatgpt-prompts

Viewer • Updated about 3 hours ago • 664 • 24.1k • 9.52k

upvoted 2 papers over 1 year ago

VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers

Paper • 2406.05370 • Published Jun 8, 2024 • 18

SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound

Paper • 2405.00233 • Published Apr 30, 2024 • 17
