Moonlight-16B-A3B training fails in DeepseekV3MoE.forward with UnboundLocalError when shared experts are enabled (#8, opened 3 days ago by arindamm)
Checkpoints license (#6, opened 12 months ago by virgoolAI)
Running the example raises ValueError: Attention mask should be of size (1, 1, 1, 12), but is torch.Size([1, 1, 1, 11]) (#5, opened about 1 year ago by melodylizx)
License (#4, opened about 1 year ago by arshiaafshani)
Thank you! (#2, opened about 1 year ago by tanliboy)
Fix generation with latest transformers (#1, opened about 1 year ago by kylesayrs)