Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training
Paper: arxiv.org/abs/2405.15319
```python
# Load the tokenizer and model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("llm-stacking/StackLLM_410M_750BToken")
model = AutoModelForCausalLM.from_pretrained("llm-stacking/StackLLM_410M_750BToken")
```
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="llm-stacking/StackLLM_410M_750BToken")
```