ChunjiangGe's Collections

LLM

updated about 18 hours ago

  • Post-LayerNorm Is Back: Stable, ExpressivE, and Deep

    Paper • 2601.19895 • Published Jan 27 • 25

  • Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers

    Paper • 2601.17367 • Published Jan 24 • 34

  • Small-scale proxies for large-scale Transformer training instabilities

    Paper • 2309.14322 • Published Sep 25, 2023 • 22

  • Decouple Searching from Training: Scaling Data Mixing via Model Merging for Large Language Model Pre-training

    Paper • 2602.00747 • Published Jan 31 • 9

  • HySparse: A Hybrid Sparse Attention Architecture with Oracle Token Selection and KV Cache Sharing

    Paper • 2602.03560 • Published Feb 3 • 48

  • FineRMoE: Dimension Expansion for Finer-Grained Expert with Its Upcycling Approach

    Paper • 2603.13364 • Published 18 days ago • 9

  • When Does Sparsity Mitigate the Curse of Depth in LLMs

    Paper • 2603.15389 • Published 10 days ago • 5

  • Spectral Condition for μP under Width-Depth Scaling

    Paper • 2603.00541 • Published 27 days ago • 15

  • Attention Sinks Are Provably Necessary in Softmax Transformers: Evidence from Trigger-Conditional Tasks

    Paper • 2603.11487 • Published 15 days ago • 2