TheArtist Music Transformer — LoRA Adapter (Bossa nova)

LoRA adapter that conditions the F1 base (PearlLeeStudio/TheArtist-MusicTransformer-ft-pop80) toward bossa nova chord progressions. One of eleven per-genre adapters released alongside the paper Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation (Lee, 2026). This release is the best-rank snapshot from a 5-point rank sweep (r ∈ {4, 8, 16, 32, 64}); see §Rank sweep below for the full table and selection criterion.

Adapter summary

| Field | Value |
|---|---|
| Base model | PearlLeeStudio/TheArtist-MusicTransformer-ft-pop80 (F1, 25.6M params) |
| Adapter type | LoRA (Q/K/V projections) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Target modules | w_q, w_k, w_v |
| Trainable parameters | 401,408 (1.54% of base) |
| Adapter file size | ~1.5 MB |
| Base vocabulary | 351 tokens (jazz/pop) |
| Vocabulary extension | +8 genre tokens (embedding_extension.pt) |
| Training epochs | 5 |
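The trainable-parameter figure can be sanity-checked from the architecture constants in the Usage section (d_model=512, 8 layers). This is a back-of-envelope sketch; the assumption that the 8 new token rows are trained in both the input embedding and the output head is mine, not stated in the release:

```python
# Back-of-envelope check of the "401,408 trainable parameters" figure.
# Assumptions: d_model=512, 8 layers, LoRA (rank 16) on Q/K/V only, and the
# 8 new genre-token rows trained in both the embedding and the output head.
d_model, n_layers, r = 512, 8, 16
lora_params = n_layers * 3 * r * (d_model + d_model)  # A and B matrices per projection
new_token_rows = 8 * d_model * 2                      # 8 genre tokens, embedding + head
total = lora_params + new_token_rows
print(total)  # 401408
```

The two terms also explain why the adapter file stays ~1.5 MB: the LoRA matrices dominate at 393,216 parameters, with only 8,192 coming from the new token rows.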

Training data

Source

14,315 chord-progression sequences in the bossa nova subset of the Chordonomicon dataset. Chordonomicon is licensed CC BY-NC 4.0; see the dataset card for full terms.

Filter rule

genres contains any of {bossa, samba, latin, salsa, cumbia}

(See ai/training/extract_genre_subsets.py:GENRE_FILTERS for the full extraction logic — main matches the main_genre column, genres_any substring-matches the free-form genres column. Each song is assigned to its first matching genre so it never double-counts.)
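The first-match assignment described above can be sketched as follows. This is a hypothetical reconstruction, not the code in extract_genre_subsets.py; the helper name and dict shape are mine:

```python
# Hypothetical sketch of the first-match genre assignment described above.
GENRE_FILTERS = {
    "bossa": {"genres_any": ["bossa", "samba", "latin", "salsa", "cumbia"]},
    # ... other genres elided
}

def assign_genre(row):
    """Return the first genre whose filter matches, so a song is never double-counted."""
    for genre, rule in GENRE_FILTERS.items():
        # `main` exact-matches the main_genre column (when present in the rule)
        if rule.get("main") and row.get("main_genre") == rule["main"]:
            return genre
        # `genres_any` substring-matches the free-form genres column
        if any(sub in row.get("genres", "") for sub in rule.get("genres_any", [])):
            return genre
    return None

assign_genre({"genres": "brazilian jazz, bossa nova"})  # matches "bossa"
```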

Splits (song-level, seed=42, 80/10/10)

| Partition | Songs | Used for |
|---|---|---|
| train | 11,452 | this LoRA's training (12-key augmented → 137,424 sequences) |
| val | 1,431 | rank-sweep eval + best-epoch selection during training |
| test | 1,432 | held aside for future paired analysis |
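A deterministic song-level 80/10/10 split at seed 42 reproduces these exact counts. This is a minimal sketch of the methodology, not the repo's actual split script:

```python
import random

def song_level_split(song_ids, seed=42):
    """Deterministic 80/10/10 split at the song level (a sketch of the
    methodology described above, not the repo's actual script)."""
    rng = random.Random(seed)
    ids = sorted(song_ids)  # canonical order first, so the shuffle is reproducible
    rng.shuffle(ids)
    n = len(ids)
    n_train, n_val = int(n * 0.8), int(n * 0.1)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

train_ids, val_ids, test_ids = song_level_split([f"song_{i}" for i in range(14315)])
print(len(train_ids), len(val_ids), len(test_ids))  # 11452 1431 1432
```

Note that 14,315 × 0.8 truncates to exactly 11,452, which is why the test partition ends up one song larger than val.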

Vocabulary

  • Base: 351 tokens (jazz/pop chord vocab from the F1 base model)
  • Extension: +8 [GENRE:X] tokens covering 8 new genres (this LoRA adds the [GENRE:bossa] token)
  • Final vocab: 359 tokens (stored alongside the adapter in embedding_extension.pt)
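Growing the embedding from 351 to 359 rows can be sketched as below. The mean-of-existing-rows initialization is my placeholder assumption; the release only says the extension is "[GENRE:none]-initialized", and the actual recipe lives in embedding_extension.pt plus model/README.md:

```python
import torch
import torch.nn as nn

# Sketch: extend a 351-token embedding to 359 by appending 8 genre-token rows.
# Initializing the new rows from the mean of existing rows is an assumption;
# see model/README.md for the actual apply-extension recipe.
old_emb = nn.Embedding(351, 512)
new_emb = nn.Embedding(359, 512)
with torch.no_grad():
    new_emb.weight[:351] = old_emb.weight                         # copy base vocab rows
    new_emb.weight[351:] = old_emb.weight.mean(dim=0, keepdim=True)  # init 8 new rows
print(tuple(new_emb.weight.shape))  # (359, 512)
```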

Reproducibility

```sh
# 1. Pull the Chordonomicon raw CSV into ai/data/raw/chordonomicon/
# 2. Extract this genre subset
uv run python ai/training/extract_genre_subsets.py --genres bossa --merge

# 3. Train the LoRA at the released rank
uv run python ai/training/lora_train.py --config ai/training/configs/lora/bossa_r16.yaml
```

Hyperparameters: 5 epochs · batch 32 × accum 2 · lr 3e-4 · 1-epoch warmup · AMP fp16 · best.pt selected by min val_loss.
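The 1-epoch warmup implied above can be sketched as a linear ramp to the peak lr; what happens after warmup is not stated in this card, so the constant tail here is my assumption:

```python
# Sketch of the warmup schedule implied above: linear warmup over the first
# epoch to lr=3e-4. Constant lr after warmup is an assumption (the card does
# not state the post-warmup schedule).
def lr_at_step(step, steps_per_epoch, peak_lr=3e-4):
    warmup_steps = steps_per_epoch  # 1-epoch warmup
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    return peak_lr

# Effective batch is 32 x 2 (grad accumulation) over 137,424 augmented sequences.
steps_per_epoch = 137424 // (32 * 2)
print(steps_per_epoch)  # 2147
```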

Genre character

Brazilian bossa nova jazz harmony

Rank sweep

The released adapter is the best-rank snapshot from training the same LoRA recipe at five different ranks. Every cell uses the same F1 base, same val split, same evaluate() call, and the same [GENRE:none]-initialized embedding extension — only lora_r (and lora_alpha = 2 × lora_r) changes. Numbers are validation-set token-level metrics (no key augmentation).

| Rank | val_loss | val_top1 (%) | val_top5 (%) | Δtop1 vs F1 | |
|---|---|---|---|---|---|
| r=4 | 0.6593 | 81.13 | 96.00 | +2.80 | |
| r=8 | 0.6601 | 81.15 | 94.86 | +2.82 | |
| r=16 | 0.6589 | 82.30 | 95.99 | +3.97 | ← selected |
| r=32 | 0.6601 | 81.13 | 96.00 | +2.80 | |
| r=64 | 0.6604 | 81.12 | 96.00 | +2.79 | |

Selection criterion: minimum validation cross-entropy loss; val_top1 as tiebreaker. val_loss is what the training loop optimizes and what selects each rank's best.pt epoch, so using it for cross-rank selection keeps consistency with how each individual checkpoint was chosen.
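The selection rule is a one-liner; a minimal sketch using the sweep numbers from the table above:

```python
# Cross-rank selection: minimum val_loss, with val_top1 as tiebreaker
# (higher top1 wins, hence the negation in the sort key).
sweep = [
    {"rank": 4,  "val_loss": 0.6593, "val_top1": 81.13},
    {"rank": 8,  "val_loss": 0.6601, "val_top1": 81.15},
    {"rank": 16, "val_loss": 0.6589, "val_top1": 82.30},
    {"rank": 32, "val_loss": 0.6601, "val_top1": 81.13},
    {"rank": 64, "val_loss": 0.6604, "val_top1": 81.12},
]
best = min(sweep, key=lambda row: (row["val_loss"], -row["val_top1"]))
print(best["rank"])  # 16
```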

Full 11-genre × 5-rank sweep + full-FT anchor table: ai/results/lora_rank_sweep.md in the repo.

Evaluation

Validation token-level metrics on the genre-specific val split (1,431 sequences, no key augmentation). The F1 base column uses the same val split, same dataloader, and the same [GENRE:none]-initialized embedding-extension setup as the LoRA run — only the LoRA parameters and the trained embedding rows differ.

| Metric | F1 base alone | F1 + this LoRA | Δ |
|---|---|---|---|
| Top-1 accuracy (%) | 78.33 | 82.30 | +3.97 |
| Top-5 accuracy (%) | 93.64 | 95.99 | +2.35 |
| Cross-entropy loss | 0.9635 | 0.6589 | -0.3046 |

Source: ai/results/f1_per_genre_baseline.csv + ai/results/lora_rank_sweep.csv. Higher top-1/top-5 and lower loss are better.

Real-song eval

Mean validation top-1/top-5/cross-entropy on 10 held-out real bossa songs from ai/data/eval_real_songs.jsonl (held-out from ai/data/splits/{val,test}.jsonl, see docs/EVAL.md for dataset composition + methodology). Teacher-forced eval — same evaluate() call as the full-val rank-sweep eval above, just narrowed to a curated 10-song subset.

| Model | Top-1 (%) | Top-5 (%) | val_loss |
|---|---|---|---|
| F1 base alone | 81.43 | 95.47 | 0.7825 |
| F1 + this LoRA | 84.02 | 97.53 | 0.5604 |
| Δ | +2.59 | +2.07 | -0.2221 |
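The teacher-forced metrics used in both evaluations can be sketched per batch as below. This is my reconstruction of the methodology described in this card (pad_id-masked CE / top-1 / top-5), not the repo's actual evaluate():

```python
import torch
import torch.nn.functional as F

def evaluate_batch(logits, targets, pad_id, k=5):
    """Teacher-forced token metrics with pad masking — a sketch of the
    methodology described above, not the repo's evaluate()."""
    mask = targets != pad_id                      # drop padded positions
    logits_f, targets_f = logits[mask], targets[mask]
    ce = F.cross_entropy(logits_f, targets_f)
    top1 = (logits_f.argmax(-1) == targets_f).float().mean()
    topk_ids = logits_f.topk(k, dim=-1).indices   # (N, k)
    topk = (topk_ids == targets_f.unsqueeze(-1)).any(-1).float().mean()
    return ce.item(), top1.item(), topk.item()

torch.manual_seed(0)
logits = torch.randn(2, 8, 359)                   # (batch, seq, vocab)
targets = torch.randint(1, 359, (2, 8))
targets[0, -2:] = 0                               # simulate padding
ce, top1, top5 = evaluate_batch(logits, targets, pad_id=0)
```

Top-1 hits are by construction a subset of top-5 hits, so top-5 accuracy upper-bounds top-1, as in the tables above.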

Evaluation data

This adapter is evaluated on two complementary held-out sets, both drawn from the same val + test splits the LoRA never saw during training:

1. Full val split — used for the rank sweep table above

  • Size: 1,431 validation sequences (this genre's val partition)
  • Methodology: teacher-forced next-token CE / top-1 / top-5 with pad_id masking, batch 32, no key augmentation
  • Comparison fairness: same evaluate() call as ai/results/f1_per_genre_baseline.csv, same dataloader, same [GENRE:none]-initialised embedding-extension setup. Only the LoRA's adapter weights + the 8 new genre embedding rows differ.
  • Output: ai/results/lora_rank_sweep.csv (long format, one row per (genre, rank) cell)

2. Curated 130-song real-song eval — used for the Real-song eval section above

  • Size: 10 songs from this genre (10 per genre × 13 genres = 130 total)
  • Source partition: drawn from splits/val.jsonl + splits/test.jsonl only (no train leakage)
  • Per-genre sources: chordonomicon_bossa
  • Title coverage (this genre): 0 of 10 have human-readable titles; all are Chordonomicon entries whose title field is a Spotify track ID, per upstream dataset policy
  • Bar range (this genre): 24–78 bars (≈ 88s avg at typical tempo for this genre)
  • Build script: ai/training/build_eval_real_songs.py --seed 42 --per-genre 10 — deterministic, re-runnable
  • Output: ai/results/real_song_eval.csv (17 models × 130 songs, long format)
  • Full dataset composition + per-source license + methodology: see docs/EVAL.md

License and use

The adapter weights are released under CC BY-NC 4.0 (matching Chordonomicon, the upstream training corpus). Permitted: research, paper replication, portfolio, demo. Not permitted: commercial deployment without separate licensing of upstream data.

Usage

```python
import torch
from huggingface_hub import hf_hub_download
from peft import PeftModel
from model import MusicTransformer
from tokenizer import ChordTokenizer

# 1. Load the F1 base
base_path = hf_hub_download(
    repo_id="PearlLeeStudio/TheArtist-MusicTransformer-ft-pop80",
    filename="best.pt",
)
base_ckpt = torch.load(base_path, map_location="cpu", weights_only=False)
tokenizer = ChordTokenizer()
model = MusicTransformer(
    vocab_size=tokenizer.vocab_size,
    d_model=512, n_heads=8, d_ff=2048, n_layers=8,
    max_seq_len=256, dropout=0.0, pad_id=tokenizer.pad_id,
)
model.load_state_dict(base_ckpt["model_state_dict"])

# 2. Extend the embedding to fit the LoRA's expanded vocabulary
ext_path = hf_hub_download(
    repo_id="PearlLeeStudio/TheArtist-MusicTransformer-lora-bossa",
    filename="embedding_extension.pt",
)
ext = torch.load(ext_path, map_location="cpu", weights_only=False)
# (See model/README.md for the apply-extension recipe.)

# 3. Apply the LoRA adapter (from_pretrained expects the adapter's directory)
adapter_path = hf_hub_download(
    repo_id="PearlLeeStudio/TheArtist-MusicTransformer-lora-bossa",
    filename="adapter_model.safetensors",
)
model = PeftModel.from_pretrained(model, adapter_path.rsplit("/", 1)[0])
model.eval()
```

Citation

Preprint: arXiv:2605.04998.

```bibtex
@misc{lee2026chordmix,
  title         = {Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation},
  author        = {Lee, Jinju},
  year          = {2026},
  eprint        = {2605.04998},
  archivePrefix = {arXiv}
}
```