You should try this.
#8, opened by ZeroWw
Using another layer might improve it even more.
https://huggingface.co/moelanoby/phi-3-M3-coder/blob/main/architecture.py