pathcosmos/frankenstallm
Tags: Text Generation, Transformers, Safetensors, GGUF, 7 datasets, Korean, English, llama, 3b, korean, from-scratch, orpo, instruction-tuned, preference-aligned, fp8, b200, Eval Results (legacy), text-generation-inference
License: apache-2.0
Branch: main
Path: frankenstallm/data (19.7 GB)
2 contributors
History: 6 commits
Latest commit: pathcosmos, "feat: Add SFT training data (filtered, 2.4M samples from 24 sources)" (6ebad26, verified), about 1 month ago
| Name | Scan | Size | Last commit | Updated |
|---|---|---|---|---|
| preference | - | - | feat: Add heegyu_orca-math-korean-preference-cleaned.jsonl for ORPO/SFT reproducibility | about 1 month ago |
| sft | - | - | feat: Add SFT val + preference data (ORPO training, 630K pairs) | about 1 month ago |
| sft_combined | - | - | feat: Add SFT training data (filtered, 2.4M samples from 24 sources) | about 1 month ago |
| DATA_README.md | Safe | 1.4 kB | feat: Add data pipeline scripts + phase reports (Tier 3 - reproducibility) | about 1 month ago |
| __init__.py | Safe | 212 Bytes | feat: Add data pipeline scripts + phase reports (Tier 3 - reproducibility) | about 1 month ago |
| build_dataset.sh | Safe | 1.88 kB | feat: Add data pipeline scripts + phase reports (Tier 3 - reproducibility) | about 1 month ago |
| build_korean_dataset.sh | Safe | 5.4 kB | feat: Add data pipeline scripts + phase reports (Tier 3 - reproducibility) | about 1 month ago |
| dataset.py | Safe | 5.41 kB | feat: Add data pipeline scripts + phase reports (Tier 3 - reproducibility) | about 1 month ago |
| download.py | Safe | 11.4 kB | feat: Add data pipeline scripts + phase reports (Tier 3 - reproducibility) | about 1 month ago |
| download_cc100.sh | Safe | 3.99 kB | feat: Add data pipeline scripts + phase reports (Tier 3 - reproducibility) | about 1 month ago |
| filter_sft_v2.py | Safe | 8.37 kB | feat: Add data pipeline scripts + phase reports (Tier 3 - reproducibility) | about 1 month ago |
| finish_korean_pipeline.sh | Safe | 17.6 kB | feat: Add data pipeline scripts + phase reports (Tier 3 - reproducibility) | about 1 month ago |
| merge_bins.py | Safe | 1.51 kB | feat: Add data pipeline scripts + phase reports (Tier 3 - reproducibility) | about 1 month ago |
| prepare.py | Safe | 10.7 kB | feat: Add data pipeline scripts + phase reports (Tier 3 - reproducibility) | about 1 month ago |
| prepare_preference_combined.py | Safe | 12.4 kB | feat: Add data pipeline scripts + phase reports (Tier 3 - reproducibility) | about 1 month ago |
| prepare_sft_data.py | Safe | 23.7 kB | feat: Add data pipeline scripts + phase reports (Tier 3 - reproducibility) | about 1 month ago |
| sft_dataset.py | Safe | 23.2 kB | feat: Add data pipeline scripts + phase reports (Tier 3 - reproducibility) | about 1 month ago |
| tokenize_cc100.sh | Safe | 7.36 kB | feat: Add data pipeline scripts + phase reports (Tier 3 - reproducibility) | about 1 month ago |
| tokenize_extra.py | Safe | 30.4 kB | feat: Add data pipeline scripts + phase reports (Tier 3 - reproducibility) | about 1 month ago |