# TheArtist Music Transformer — LoRA Adapter (Bossa Nova)
LoRA adapter that conditions the F1 base (PearlLeeStudio/TheArtist-MusicTransformer-ft-pop80) toward bossa nova chord progressions. One of eleven per-genre adapters released alongside the paper Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation (Lee, 2026). This release is the best-rank snapshot from a 5-point rank sweep (r ∈ {4, 8, 16, 32, 64}); see §Rank sweep below for the full table and selection criterion.
## Adapter summary
| Field | Value |
|---|---|
| Base model | PearlLeeStudio/TheArtist-MusicTransformer-ft-pop80 (F1, 25.6M params) |
| Adapter type | LoRA (Q/K/V projections) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Target modules | w_q, w_k, w_v |
| Trainable parameters | |
| Adapter file size | ~1.5 MB |
| Base vocabulary | 351 tokens (jazz/pop) |
| Vocabulary extension | +8 genre tokens (embedding_extension.pt) |
| Training epochs | 5 |
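The empty "Trainable parameters" cell can be sanity-checked from the other fields. As a back-of-the-envelope sketch (assuming the d_model=512, n_layers=8 architecture shown in the Usage section; these dimensions are taken from that snippet, not stated in this table):

```python
# LoRA parameter count for Q/K/V projections (d_model=512, 8 layers, r=16).
d_model, n_layers, rank = 512, 8, 16
targets_per_layer = 3  # w_q, w_k, w_v

# Each adapted matrix gets A (r x d_in) and B (d_out x r): r * (d_in + d_out) params.
params_per_matrix = rank * (d_model + d_model)
lora_params = n_layers * targets_per_layer * params_per_matrix
print(lora_params)               # 393216
print(lora_params * 4 / 2**20)   # 1.5 -- MiB at fp32, matching the listed file size
```

The fp32 size of ~393k parameters lands almost exactly on the ~1.5 MB adapter file size listed above, which suggests the count is in the right ballpark.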
## Training data

### Source
14,315 chord-progression sequences in the bossa nova subset of the Chordonomicon dataset. Chordonomicon is licensed CC BY-NC 4.0; see the dataset card for full terms.
### Filter rule

`genres` contains any of {bossa, samba, latin, salsa, cumbia}
(See `ai/training/extract_genre_subsets.py:GENRE_FILTERS` for the full extraction logic — `main` matches the `main_genre` column, `genres_any` substring-matches the free-form `genres` column. Each song is assigned to its first matching genre, so it never double-counts.)
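The filter rule described above can be sketched as follows. This is a hypothetical reconstruction, not the script's actual code; the dictionary shape and function name here are assumptions, while the substring list and first-match behavior come from the description above:

```python
# Hedged sketch of the genres_any substring filter with first-match assignment.
GENRE_FILTERS = {
    "bossa": {"genres_any": ["bossa", "samba", "latin", "salsa", "cumbia"]},
    # ... other genre entries omitted
}

def assign_genre(row: dict) -> "str | None":
    """Return the first matching genre key, so no song is double-counted."""
    genres_field = (row.get("genres") or "").lower()
    for genre, rule in GENRE_FILTERS.items():
        if any(sub in genres_field for sub in rule["genres_any"]):
            return genre
    return None
```

For example, a row with `genres = "Brazilian Samba, MPB"` matches the `samba` substring and is assigned to `bossa`; a row matching nothing returns `None` and is dropped from all subsets.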
### Splits (song-level, seed=42, 80/10/10)
| Partition | Songs | Used for |
|---|---|---|
| train | 11,452 | this LoRA's training (12-key augmented → 137,424 sequences) |
| val | 1,431 | rank-sweep eval + best-epoch selection during training |
| test | 1,432 | held aside for future paired analysis |
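A deterministic song-level 80/10/10 split at seed=42 can be reproduced along these lines. This is a sketch under the stated split parameters, not the project's actual split code:

```python
import random

def split_songs(song_ids, seed=42, train=0.8, val=0.1):
    """Shuffle once at the song level, then slice 80/10/10 so every
    sequence from a given song lands in exactly one partition."""
    ids = sorted(song_ids)              # stable order before seeding
    random.Random(seed).shuffle(ids)
    n = len(ids)
    n_train, n_val = int(n * train), int(n * val)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

train_ids, val_ids, test_ids = split_songs(range(14315))
print(len(train_ids), len(val_ids), len(test_ids))  # 11452 1431 1432
```

On 14,315 songs this arithmetic reproduces the 11,452 / 1,431 / 1,432 partition sizes in the table; splitting at the song level (rather than the sequence level) is what prevents augmented copies of one song leaking across partitions.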
## Vocabulary
- Base: 351 tokens (jazz/pop chord vocab from the F1 base model)
- Extension: +8 `[GENRE:X]` tokens covering 8 new genres (this LoRA adds the `[GENRE:bossa]` token)
- Final vocab: 359 tokens (stored alongside the adapter in `embedding_extension.pt`)
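Growing the base embedding table from 351 to 359 rows can be sketched as below. This is a minimal, hypothetical recipe (the authoritative steps are in `model/README.md`); the function name and the stand-in zero tensor are assumptions:

```python
import torch
import torch.nn as nn

def extend_embedding(emb: nn.Embedding, extra_rows: torch.Tensor) -> nn.Embedding:
    """Append new genre-token rows to an existing embedding table."""
    old_vocab, d_model = emb.weight.shape
    new_emb = nn.Embedding(old_vocab + extra_rows.shape[0], d_model)
    with torch.no_grad():
        new_emb.weight[:old_vocab] = emb.weight   # keep the 351 base rows
        new_emb.weight[old_vocab:] = extra_rows   # add the 8 genre rows
    return new_emb

base = nn.Embedding(351, 512)
ext = torch.zeros(8, 512)            # stand-in for the embedding_extension.pt rows
extended = extend_embedding(base, ext)
print(extended.weight.shape)         # torch.Size([359, 512])
```

The base rows are copied unchanged, so the extension only affects the 8 new `[GENRE:X]` token ids at the end of the vocabulary.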
## Reproducibility

```bash
# 1. Pull the raw Chordonomicon CSV into ai/data/raw/chordonomicon/
# 2. Extract this genre subset
uv run python ai/training/extract_genre_subsets.py --genres bossa --merge
# 3. Train the LoRA at the released rank
uv run python ai/training/lora_train.py --config ai/training/configs/lora/bossa_r16.yaml
```
Hyperparameters: 5 epochs · batch 32 × accum 2 · lr 3e-4 · 1-epoch warmup · AMP fp16 · `best.pt` selected by min `val_loss`.
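Those settings imply an effective batch size of 32 × 2 = 64 sequences per optimizer step, and a 1-epoch warmup over the 137,424 augmented training sequences works out as follows (simple arithmetic; the ceiling division assumes the final partial batch is kept):

```python
# Effective batch size and warmup length implied by the hyperparameters above.
batch_size, grad_accum = 32, 2
train_sequences = 137_424            # 11,452 songs x 12-key augmentation

effective_batch = batch_size * grad_accum
steps_per_epoch = -(-train_sequences // effective_batch)  # ceiling division
print(effective_batch)    # 64
print(steps_per_epoch)    # 2148 optimizer steps of warmup
```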
## Genre character
Brazilian bossa nova jazz harmony
## Rank sweep

The released adapter is the best-rank snapshot from training the same LoRA recipe at five ranks. Every cell uses the same F1 base, the same val split, the same `evaluate()` call, and the same `[GENRE:none]`-initialized embedding extension — only `lora_r` (and `lora_alpha = 2 × lora_r`) changes. Numbers are validation-set token-level metrics (no key augmentation).
| Rank | val_loss | val_top1 (%) | val_top5 (%) | Δtop1 vs F1 |
|---|---|---|---|---|
| r=4 | 0.6593 | 81.13 | 96.00 | +2.80 |
| r=8 | 0.6601 | 81.15 | 94.86 | +2.82 |
| r=16 | 0.6589 | 82.30 | 95.99 | +3.97 ← selected |
| r=32 | 0.6601 | 81.13 | 96.00 | +2.80 |
| r=64 | 0.6604 | 81.12 | 96.00 | +2.79 |
Selection criterion: minimum validation cross-entropy loss; val_top1 as tiebreaker. val_loss is what the training loop optimizes and what selects each rank's best.pt epoch, so using it for cross-rank selection keeps consistency with how each individual checkpoint was chosen.
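The cross-rank selection rule (minimum `val_loss`, `val_top1` as tiebreaker) is easy to express over the sweep table above; a sketch using those exact numbers:

```python
# Pick the released rank: minimize val_loss, break ties by higher val_top1.
sweep = [
    # (rank, val_loss, val_top1)
    (4,  0.6593, 81.13),
    (8,  0.6601, 81.15),
    (16, 0.6589, 82.30),
    (32, 0.6601, 81.13),
    (64, 0.6604, 81.12),
]

best = min(sweep, key=lambda row: (row[1], -row[2]))
print(best[0])  # 16
```

With these values the tiebreaker is not actually exercised (r=16 wins on loss outright), but it covers the case where two ranks tie on `val_loss`.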
Full 11-genre × 5-rank sweep + full-FT anchor table: ai/results/lora_rank_sweep.md in the repo.
## Evaluation

Validation token-level metrics on the genre-specific val split (1,431 sequences, no key augmentation). The F1 base column uses the same val split, same dataloader, and the same `[GENRE:none]`-initialized embedding-extension setup as the LoRA run — only the LoRA parameters and the trained embedding rows differ.
| Metric | F1 base alone | F1 + this LoRA | Δ |
|---|---|---|---|
| Top-1 accuracy (%) | 78.33 | 82.30 | +3.97 |
| Top-5 accuracy (%) | 93.64 | 95.99 | +2.35 |
| Cross-entropy loss | 0.9635 | 0.6589 | -0.3046 |
Source: `ai/results/f1_per_genre_baseline.csv` + `ai/results/lora_rank_sweep.csv`. Higher top-1/top-5 and lower loss are better.
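The metrics in these tables (teacher-forced cross-entropy with pad masking, token-level top-1/top-5 accuracy) follow a standard pattern. A minimal sketch of such an evaluation step — not the project's actual `evaluate()` — under the assumption that the model emits `(batch, seq, vocab)` logits:

```python
import torch
import torch.nn.functional as F

def token_metrics(logits: torch.Tensor, targets: torch.Tensor, pad_id: int):
    """Teacher-forced CE and top-1/top-5 accuracy, ignoring pad positions.
    logits: (batch, seq, vocab); targets: (batch, seq)."""
    mask = targets != pad_id
    flat_logits = logits[mask]        # (n_valid_tokens, vocab)
    flat_targets = targets[mask]
    loss = F.cross_entropy(flat_logits, flat_targets).item()
    top5 = flat_logits.topk(5, dim=-1).indices
    top1_acc = (top5[:, 0] == flat_targets).float().mean().item()
    top5_acc = (top5 == flat_targets[:, None]).any(-1).float().mean().item()
    return loss, top1_acc * 100, top5_acc * 100
```

The pad mask is what makes the comparison fair across models: padded positions contribute to neither the loss nor the accuracy denominators.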
### Real-song eval

Mean validation top-1/top-5/cross-entropy on 10 held-out real bossa songs from `ai/data/eval_real_songs.jsonl` (held out from `ai/data/splits/{val,test}.jsonl`; see `docs/EVAL.md` for dataset composition and methodology). Teacher-forced eval — same `evaluate()` call as the full-val rank-sweep eval above, just narrowed to a curated 10-song subset.
| Model | Top-1 (%) | Top-5 (%) | val_loss |
|---|---|---|---|
| F1 base alone | 81.43 | 95.47 | 0.7825 |
| F1 + this LoRA | 84.02 | 97.53 | 0.5604 |
| Δ | +2.59 | +2.07 | -0.2221 |
## Evaluation data
This adapter is evaluated on two complementary held-out sets, both drawn from the same val + test splits the LoRA never saw during training:
1. **Full val split** — used for the rank sweep table above
   - Size: 1,431 validation sequences (this genre's val partition)
   - Methodology: teacher-forced next-token CE / top-1 / top-5 with `pad_id` masking, batch 32, no key augmentation
   - Comparison fairness: same `evaluate()` call as `ai/results/f1_per_genre_baseline.csv`, same dataloader, same `[GENRE:none]`-initialised embedding-extension setup. Only the LoRA's adapter weights + the 8 new genre embedding rows differ.
   - Output: `ai/results/lora_rank_sweep.csv` (long format, one row per (genre, rank) cell)
2. **Curated 130-song real-song eval** — used for the Real-song eval section above
   - Size: 10 songs from this genre (10 per genre × 13 genres = 130 total)
   - Source partition: drawn from `splits/val.jsonl` + `splits/test.jsonl` only (no train leakage)
   - Per-genre sources: `chordonomicon_bossa`
   - Title coverage (this genre): 0 of 10 are named real songs; the remainder are Chordonomicon entries whose title field is a Spotify track ID by upstream dataset policy
   - Bar range (this genre): 24–78 bars (≈ 88 s avg at typical tempo for this genre)
   - Build script: `ai/training/build_eval_real_songs.py --seed 42 --per-genre 10` — deterministic, re-runnable
   - Output: `ai/results/real_song_eval.csv` (17 models × 130 songs, long format)
   - Full dataset composition + per-source license + methodology: see `docs/EVAL.md`
## License and use
The adapter weights are released under CC BY-NC 4.0 (matching Chordonomicon, the upstream training corpus). Permitted: research, paper replication, portfolio, demo. Not permitted: commercial deployment without separate licensing of upstream data.
## Usage

```python
import torch
from huggingface_hub import hf_hub_download
from peft import PeftModel

from model import MusicTransformer      # repo-local module
from tokenizer import ChordTokenizer    # repo-local module

# 1. Load the F1 base
base_path = hf_hub_download(
    repo_id="PearlLeeStudio/TheArtist-MusicTransformer-ft-pop80",
    filename="best.pt",
)
base_ckpt = torch.load(base_path, map_location="cpu", weights_only=False)
tokenizer = ChordTokenizer()
model = MusicTransformer(
    vocab_size=tokenizer.vocab_size,
    d_model=512, n_heads=8, d_ff=2048, n_layers=8,
    max_seq_len=256, dropout=0.0, pad_id=tokenizer.pad_id,
)
model.load_state_dict(base_ckpt["model_state_dict"])

# 2. Extend the embedding to fit the LoRA's expanded vocabulary
ext_path = hf_hub_download(
    repo_id="PearlLeeStudio/TheArtist-MusicTransformer-lora-bossa",
    filename="embedding_extension.pt",
)
ext = torch.load(ext_path, map_location="cpu", weights_only=False)
# (See model/README.md for the apply-extension recipe.)

# 3. Apply the LoRA adapter (PEFT expects the directory containing the file)
adapter_file = hf_hub_download(
    repo_id="PearlLeeStudio/TheArtist-MusicTransformer-lora-bossa",
    filename="adapter_model.safetensors",
)
model = PeftModel.from_pretrained(model, adapter_file.rsplit("/", 1)[0])
model.eval()
```
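Once loaded, generation would look roughly like the sketch below. This is a generic autoregressive sampling loop, not this repository's generation code: the `model(x)` call returning `(batch, seq, vocab)` logits and the idea of starting the prompt with the extended-vocab id of `[GENRE:bossa]` are assumptions, since the model's generation API is not shown in this card:

```python
import torch

@torch.no_grad()
def sample(model, prompt_ids, max_new=64, temperature=0.9, eos_id=None):
    """Temperature sampling; prompt_ids would begin with the genre token id
    (e.g. the extended-vocab id of "[GENRE:bossa]") to condition on the genre."""
    ids = list(prompt_ids)
    for _ in range(max_new):
        x = torch.tensor([ids])                  # (1, seq)
        logits = model(x)[0, -1] / temperature   # next-token logits
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, 1).item()
        if next_id == eos_id:
            break
        ids.append(next_id)
    return ids
```

Temperature below 1.0 sharpens the distribution toward the adapter's preferred bossa progressions; raising it trades coherence for variety.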
## Citation

Preprint: arXiv:2605.04998.

```bibtex
@misc{lee2026chordmix,
  title         = {Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation},
  author        = {Lee, Jinju},
  year          = {2026},
  eprint        = {2605.04998},
  archivePrefix = {arXiv}
}
```