train_multirc_42_1762316044

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the multirc dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1345
  • Num Input Tokens Seen: 264840880

Model description

More information needed

Intended uses & limitations

More information needed
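
Pending details from the author, the snippet below is a minimal, hypothetical inference sketch. It assumes the adapter is hosted at rbelanec/train_multirc_42_1762316044 (the repository this card belongs to) on top of the meta-llama/Meta-Llama-3-8B-Instruct base; the prompt template is illustrative, since the format used during training is not documented here.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_multirc_42_1762316044"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
# Attach the PEFT adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(base, adapter_id)

# MultiRC pairs a passage and question with a candidate answer to be judged
# correct or incorrect; the exact prompt wording below is a placeholder.
prompt = "Passage: ...\nQuestion: ...\nCandidate answer: ...\nIs the answer correct?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```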

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08); no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
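
The settings above map directly onto transformers.TrainingArguments; the sketch below is a reconstruction for reference, not the author's actual script. output_dir is a placeholder, and anything not listed above (logging, saving, precision) is left at its default.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_multirc_42_1762316044",  # placeholder, not from the card
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```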

Training results

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:------:|:---------------:|:-----------------:|
| 0.2321        | 1.0   | 6130   | 0.1345          | 13256608          |
| 0.0032        | 2.0   | 12260  | 0.1813          | 26510112          |
| 0.0054        | 3.0   | 18390  | 0.1470          | 39755376          |
| 0.0024        | 4.0   | 24520  | 0.1790          | 53010912          |
| 0.0676        | 5.0   | 30650  | 0.2459          | 66248576          |
| 0.0025        | 6.0   | 36780  | 0.2882          | 79495984          |
| 0.0008        | 7.0   | 42910  | 0.3392          | 92713360          |
| 0.0002        | 8.0   | 49040  | 0.3033          | 105934480         |
| 0.1222        | 9.0   | 55170  | 0.3478          | 119164864         |
| 0.0           | 10.0  | 61300  | 0.3589          | 132392640         |
| 0.0004        | 11.0  | 67430  | 0.3862          | 145641920         |
| 0.0           | 12.0  | 73560  | 0.3949          | 158902432         |
| 0.0           | 13.0  | 79690  | 0.3835          | 172144032         |
| 0.0           | 14.0  | 85820  | 0.4343          | 185378480         |
| 0.0           | 15.0  | 91950  | 0.4269          | 198621168         |
| 0.0           | 16.0  | 98080  | 0.4714          | 211855376         |
| 0.0           | 17.0  | 104210 | 0.5027          | 225105296         |
| 0.0           | 18.0  | 110340 | 0.5288          | 238352272         |
| 0.0           | 19.0  | 116470 | 0.5347          | 251594480         |
| 0.0           | 20.0  | 122600 | 0.5353          | 264840880         |

Validation loss is lowest after epoch 1 (0.1345, matching the evaluation result reported above) and rises thereafter while training loss approaches zero, indicating overfitting in later epochs.

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
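
For reproducibility, the pinned versions above can be asserted at runtime; a minimal check, assuming a standard pip environment:

```python
import datasets, peft, tokenizers, torch, transformers

# Versions this adapter was trained with, per the list above.
expected = {
    peft: "0.15.2",
    transformers: "4.51.3",
    torch: "2.8.0+cu128",
    datasets: "3.6.0",
    tokenizers: "0.21.1",
}
for module, version in expected.items():
    status = "OK" if module.__version__ == version else f"got {module.__version__}"
    print(f"{module.__name__:<12} expected {version}: {status}")
```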

Model tree for rbelanec/train_multirc_42_1762316044

  • Adapter of meta-llama/Meta-Llama-3-8B-Instruct