# train_multirc_42_1762316044
This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the multirc dataset. It achieves the following results on the evaluation set:
- Loss: 0.1345 (this matches the lowest validation loss in the table below, reached at epoch 1)
- Num Input Tokens Seen: 264840880
## Model description
More information needed
## Intended uses & limitations
More information needed
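No usage details are provided. Since the framework versions below list PEFT, the following is a minimal loading sketch, assuming this repository hosts a PEFT adapter on top of the base model; the prompt template is illustrative only, as the format used during training is not documented.

```python
# Minimal sketch, assuming this repo hosts a PEFT adapter for
# meta-llama/Meta-Llama-3-8B-Instruct (not an official usage recipe).
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "rbelanec/train_multirc_42_1762316044"

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Illustrative MultiRC-style prompt; the exact template is an assumption.
prompt = (
    "Paragraph: ...\n"
    "Question: ...\n"
    "Candidate answer: ...\n"
    "Is the candidate answer correct?"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```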
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a hedged `TrainingArguments` sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
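The original training script is not included in this card; the sketch below only restates the hyperparameters above as `transformers.TrainingArguments`, with the output directory chosen for illustration.

```python
# Sketch reconstructing the listed hyperparameters as TrainingArguments;
# this is not the original training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_multirc_42_1762316044",  # illustrative output path
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```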
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.2321 | 1.0 | 6130 | 0.1345 | 13256608 |
| 0.0032 | 2.0 | 12260 | 0.1813 | 26510112 |
| 0.0054 | 3.0 | 18390 | 0.1470 | 39755376 |
| 0.0024 | 4.0 | 24520 | 0.1790 | 53010912 |
| 0.0676 | 5.0 | 30650 | 0.2459 | 66248576 |
| 0.0025 | 6.0 | 36780 | 0.2882 | 79495984 |
| 0.0008 | 7.0 | 42910 | 0.3392 | 92713360 |
| 0.0002 | 8.0 | 49040 | 0.3033 | 105934480 |
| 0.1222 | 9.0 | 55170 | 0.3478 | 119164864 |
| 0.0 | 10.0 | 61300 | 0.3589 | 132392640 |
| 0.0004 | 11.0 | 67430 | 0.3862 | 145641920 |
| 0.0 | 12.0 | 73560 | 0.3949 | 158902432 |
| 0.0 | 13.0 | 79690 | 0.3835 | 172144032 |
| 0.0 | 14.0 | 85820 | 0.4343 | 185378480 |
| 0.0 | 15.0 | 91950 | 0.4269 | 198621168 |
| 0.0 | 16.0 | 98080 | 0.4714 | 211855376 |
| 0.0 | 17.0 | 104210 | 0.5027 | 225105296 |
| 0.0 | 18.0 | 110340 | 0.5288 | 238352272 |
| 0.0 | 19.0 | 116470 | 0.5347 | 251594480 |
| 0.0 | 20.0 | 122600 | 0.5353 | 264840880 |
### Framework versions
- PEFT 0.15.2
- Transformers 4.51.3
- Pytorch 2.8.0+cu128
- Datasets 3.6.0
- Tokenizers 0.21.1