GPT-2 (124M) fine-tuned on the train split of
Salesforce/wikitext (subset wikitext-2-raw-v1, chunked at 512
tokens) via the bergson MAGIC pipeline. This is the exact checkpoint
used to generate the attribution scores published at
EleutherAI/bergson-magic-scores-gpt-2.
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("EleutherAI/bergson-magic-gpt-2")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/bergson-magic-gpt-2")
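The 512-token chunking mentioned above can be pictured with a small, dependency-free sketch. This is not bergson's preprocessing code: the helper name `chunk_tokens` is hypothetical, dropping the trailing partial chunk is an assumption, and treating a chunk length of 0 as "no chunking" is only a guess at what the query config intends.

```python
def chunk_tokens(token_ids, chunk_length=512):
    """Split a flat sequence of token ids into consecutive fixed-size chunks.

    Assumptions (not taken from bergson): a non-positive chunk_length means
    "keep the sequence whole", and a trailing partial chunk is dropped.
    """
    if chunk_length <= 0:
        return [list(token_ids)]
    n_full = len(token_ids) // chunk_length
    return [
        list(token_ids[i * chunk_length:(i + 1) * chunk_length])
        for i in range(n_full)
    ]


# Illustrative input: 1100 fake token ids yield two full 512-token chunks.
chunks = chunk_tokens(list(range(1100)), chunk_length=512)
print(len(chunks), [len(c) for c in chunks])
```

In practice the ids would come from the tokenizer loaded above rather than from `range`.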
run_path: runs/gpt2_wikitext
model: gpt2
overwrite: true
data:
  dataset: Salesforce/wikitext
  subset: wikitext-2-raw-v1
  split: "train"
  chunk_length: 512
query:
  dataset: Salesforce/wikitext
  subset: wikitext-2-raw-v1
  split: "test[3:4]"
  chunk_length: 0
distributed:
  nproc_per_node: 4
  nnode: 4
batch_size: 256
num_epochs: 2
lr_schedule:
  lr_scheduler_type: polynomial
  lr: 0.0008
  lr_start: 1e-6
  lr_end: 0.00008
  warmup_steps: 0.25
subset_strategy: random
wandb_project: magic
Saved as examples/magic/gpt2_wikitext.yaml in the bergson repo.
Run with:
bergson magic examples/magic/gpt2_wikitext.yaml
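To make the lr_schedule values in the config easier to read, here is one plausible interpretation as a small sketch. It is an assumption, not bergson's scheduler: the function name `lr_at` and the `power` parameter are hypothetical, warmup_steps: 0.25 is read as a fraction of the total step count, and the shape is taken to be a linear warmup from lr_start to lr followed by a polynomial decay to lr_end.

```python
def lr_at(step, total_steps, lr=8e-4, lr_start=1e-6, lr_end=8e-5,
          warmup=0.25, power=1.0):
    """Learning rate at `step` under an assumed warmup + polynomial decay.

    `warmup` below 1 is treated as a fraction of total_steps (an assumption
    about the config); `power` is a hypothetical knob, 1.0 gives linear decay.
    """
    warmup_steps = int(warmup * total_steps) if warmup < 1 else int(warmup)
    if step < warmup_steps:
        # Linear ramp from lr_start up to the peak lr.
        return lr_start + (lr - lr_start) * step / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    # Polynomial decay from the peak lr down to lr_end.
    return lr_end + (lr - lr_end) * (1.0 - progress) ** power


# Illustrative 100-step run: start, end of warmup, midpoint, final step.
for s in (0, 25, 50, 100):
    print(s, lr_at(s, total_steps=100))
```

The actual step count depends on the dataset size, batch_size, and num_epochs, so the 100 steps here are purely illustrative.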
The bergson magic step trains the model on the train split with its
own training loop. It has to: MAGIC's attribution scores are the
gradients of the query loss with respect to per-example training
weights, computed by backpropagating through the training run itself.
The final trained weights are written to the hf_model/ subdirectory
of the run path; those are the weights uploaded here.
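The idea of differentiating the query loss with respect to per-example training weights can be illustrated on a toy problem. The sketch below is not bergson's implementation and says nothing about how it handles GPT-2: it assumes a scalar model y = theta * x, weighted full-batch gradient descent, and a hand-written reverse pass through the stored training trajectory. All function names are hypothetical.

```python
def train(xs, ys, weights, lr=0.1, steps=20, theta0=0.0):
    """Weighted full-batch gradient descent on y = theta * x.
    Returns every parameter value visited: theta_0 ... theta_T."""
    n = len(xs)
    thetas = [theta0]
    for _ in range(steps):
        theta = thetas[-1]
        grad = sum(w * (theta * x - y) * x
                   for w, x, y in zip(weights, xs, ys)) / n
        thetas.append(theta - lr * grad)
    return thetas


def query_loss(theta, xq, yq):
    return 0.5 * (theta * xq - yq) ** 2


def attribution_scores(xs, ys, weights, xq, yq, lr=0.1, steps=20, theta0=0.0):
    """d(query loss) / d(weight_i), by backpropagating through training."""
    n = len(xs)
    thetas = train(xs, ys, weights, lr, steps, theta0)
    g = (thetas[-1] * xq - yq) * xq  # dL_q / d theta_T
    scores = [0.0] * n
    # d theta_{t+1} / d theta_t = 1 - lr * curvature (constant for this model)
    curvature = sum(w * x * x for w, x in zip(weights, xs)) / n
    for theta in reversed(thetas[:-1]):  # theta_{T-1}, ..., theta_0
        for i, (x, y) in enumerate(zip(xs, ys)):
            # Direct effect of weight_i on the update taken from this theta.
            scores[i] += g * (-lr / n) * (theta * x - y) * x
        g *= 1.0 - lr * curvature  # propagate one step further back
    return scores


xs, ys, w = [1.0, 2.0, 3.0], [2.0, 4.1, 5.9], [1.0, 1.0, 1.0]
print(attribution_scores(xs, ys, w, xq=1.5, yq=3.0))
```

Because the reverse pass needs every intermediate parameter value, the scores are tied to one specific training run, which is consistent with the scores being published for this exact checkpoint.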