Moonlight-16B-A3B training fails in DeepseekV3MoE.forward with UnboundLocalError when shared experts are enabled (#8, opened 3 days ago by arindamm)
Checkpoints license (#6, opened 12 months ago by virgoolAI)
Running the example raises ValueError: Attention mask should be of size (1, 1, 1, 12), but is torch.Size([1, 1, 1, 11]) (#5, opened about 1 year ago by melodylizx)
License (#4, opened about 1 year ago by arshiaafshani)
Thank you! (#2, opened about 1 year ago by tanliboy)
Fix generation with latest transformers (#1, opened about 1 year ago by kylesayrs)