TheArtist Music Transformer — LoRA Adapter (Bossa nova)

LoRA adapter that conditions the F1 base (PearlLeeStudio/TheArtist-MusicTransformer-ft-pop80) toward bossa nova chord progressions. One of eleven per-genre adapters released alongside the paper Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation (Lee, 2026). This release is the best-rank snapshot from a 5-point rank sweep (r ∈ {4, 8, 16, 32, 64}); see §Rank sweep below for the full table and selection criterion.

Adapter summary

| Field | Value |
|---|---|
| Base model | PearlLeeStudio/TheArtist-MusicTransformer-ft-pop80 (F1, 25.6M params) |
| Adapter type | LoRA (Q/K/V projections) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Target modules | w_q, w_k, w_v |
| Trainable parameters | 401,408 (1.54% of base) |
| Adapter file size | ~1.5 MB |
| Base vocabulary | 351 tokens (jazz/pop) |
| Vocabulary extension | +8 genre tokens (embedding_extension.pt) |
| Training epochs | 5 |
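The trainable-parameter figure can be sanity-checked from the architecture constants in the Usage section (d_model=512, 8 layers). This is a back-of-envelope sketch; the assumption that the 8 new token rows are trained in both the input embedding and the output head is mine, not stated in the release:

```python
# Back-of-envelope check of the "401,408 trainable parameters" figure.
# Assumptions: d_model=512, 8 layers, LoRA (rank 16) on Q/K/V only, and the
# 8 new genre-token rows trained in both the embedding and the output head.
d_model, n_layers, r = 512, 8, 16
lora_params = n_layers * 3 * r * (d_model + d_model)  # A and B matrices per projection
new_token_rows = 8 * d_model * 2                      # 8 genre tokens, embedding + head
total = lora_params + new_token_rows
print(total)  # 401408
```

The two terms also explain why the adapter file stays ~1.5 MB: the LoRA matrices dominate at 393,216 parameters, with only 8,192 coming from the new token rows.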

Training data

Source

14,315 chord-progression sequences in the bossa nova subset of the Chordonomicon dataset. Chordonomicon is licensed CC BY-NC 4.0; see the dataset card for full terms.

Filter rule

genres contains any of {bossa, samba, latin, salsa, cumbia}

(See ai/training/extract_genre_subsets.py:GENRE_FILTERS for the full extraction logic — main matches the main_genre column, genres_any substring-matches the free-form genres column. Each song is assigned to its first matching genre so it never double-counts.)
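The first-match assignment described above can be sketched as follows. This is a hypothetical reconstruction, not the code in extract_genre_subsets.py; the helper name and dict shape are mine:

```python
# Hypothetical sketch of the first-match genre assignment described above.
GENRE_FILTERS = {
    "bossa": {"genres_any": ["bossa", "samba", "latin", "salsa", "cumbia"]},
    # ... other genres elided
}

def assign_genre(row):
    """Return the first genre whose filter matches, so a song is never double-counted."""
    for genre, rule in GENRE_FILTERS.items():
        # `main` exact-matches the main_genre column (when present in the rule)
        if rule.get("main") and row.get("main_genre") == rule["main"]:
            return genre
        # `genres_any` substring-matches the free-form genres column
        if any(sub in row.get("genres", "") for sub in rule.get("genres_any", [])):
            return genre
    return None

assign_genre({"genres": "brazilian jazz, bossa nova"})  # matches "bossa"
```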

Splits (song-level, seed=42, 80/10/10)

| Partition | Songs | Used for |
|---|---|---|
| train | 11,452 | this LoRA's training (12-key augmented → 137,424 sequences) |
| val | 1,431 | rank-sweep eval + best-epoch selection during training |
| test | 1,432 | held aside for future paired analysis |
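A deterministic song-level 80/10/10 split at seed 42 reproduces these exact counts. This is a minimal sketch of the methodology, not the repo's actual split script:

```python
import random

def song_level_split(song_ids, seed=42):
    """Deterministic 80/10/10 split at the song level (a sketch of the
    methodology described above, not the repo's actual script)."""
    rng = random.Random(seed)
    ids = sorted(song_ids)  # canonical order first, so the shuffle is reproducible
    rng.shuffle(ids)
    n = len(ids)
    n_train, n_val = int(n * 0.8), int(n * 0.1)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

train_ids, val_ids, test_ids = song_level_split([f"song_{i}" for i in range(14315)])
print(len(train_ids), len(val_ids), len(test_ids))  # 11452 1431 1432
```

Note that 14,315 × 0.8 truncates to exactly 11,452, which is why the test partition ends up one song larger than val.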

Vocabulary

  • Base: 351 tokens (jazz/pop chord vocab from the F1 base model)
  • Extension: +8 [GENRE:X] tokens covering 8 new genres (this LoRA adds the [GENRE:bossa] token)
  • Final vocab: 359 tokens (stored alongside the adapter in embedding_extension.pt)
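Growing the embedding from 351 to 359 rows can be sketched as below. The mean-of-existing-rows initialization is my placeholder assumption; the release only says the extension is "[GENRE:none]-initialized", and the actual recipe lives in embedding_extension.pt plus model/README.md:

```python
import torch
import torch.nn as nn

# Sketch: extend a 351-token embedding to 359 by appending 8 genre-token rows.
# Initializing the new rows from the mean of existing rows is an assumption;
# see model/README.md for the actual apply-extension recipe.
old_emb = nn.Embedding(351, 512)
new_emb = nn.Embedding(359, 512)
with torch.no_grad():
    new_emb.weight[:351] = old_emb.weight                         # copy base vocab rows
    new_emb.weight[351:] = old_emb.weight.mean(dim=0, keepdim=True)  # init 8 new rows
print(tuple(new_emb.weight.shape))  # (359, 512)
```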

Reproducibility

```sh
# 1. Pull the Chordonomicon raw CSV into ai/data/raw/chordonomicon/
# 2. Extract this genre subset
uv run python ai/training/extract_genre_subsets.py --genres bossa --merge

# 3. Train the LoRA at the released rank
uv run python ai/training/lora_train.py --config ai/training/configs/lora/bossa_r16.yaml
```

Hyperparameters: 5 epochs · batch 32 × accum 2 · lr 3e-4 · 1-epoch warmup · AMP fp16 · best.pt selected by min val_loss.
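The 1-epoch warmup implied above can be sketched as a linear ramp to the peak lr; what happens after warmup is not stated in this card, so the constant tail here is my assumption:

```python
# Sketch of the warmup schedule implied above: linear warmup over the first
# epoch to lr=3e-4. Constant lr after warmup is an assumption (the card does
# not state the post-warmup schedule).
def lr_at_step(step, steps_per_epoch, peak_lr=3e-4):
    warmup_steps = steps_per_epoch  # 1-epoch warmup
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    return peak_lr

# Effective batch is 32 x 2 (grad accumulation) over 137,424 augmented sequences.
steps_per_epoch = 137424 // (32 * 2)
print(steps_per_epoch)  # 2147
```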

Genre character

Brazilian bossa nova jazz harmony

Rank sweep

The released adapter is the best-rank snapshot from training the same LoRA recipe at five different ranks. Every cell uses the same F1 base, same val split, same evaluate() call, and the same [GENRE:none]-initialized embedding extension — only lora_r (and lora_alpha = 2 × lora_r) changes. Numbers are validation-set token-level metrics (no key augmentation).

| Rank | val_loss | val_top1 (%) | val_top5 (%) | Δtop1 vs F1 | |
|---|---|---|---|---|---|
| r=4 | 0.6593 | 81.13 | 96.00 | +2.80 | |
| r=8 | 0.6601 | 81.15 | 94.86 | +2.82 | |
| r=16 | 0.6589 | 82.30 | 95.99 | +3.97 | ← selected |
| r=32 | 0.6601 | 81.13 | 96.00 | +2.80 | |
| r=64 | 0.6604 | 81.12 | 96.00 | +2.79 | |

Selection criterion: minimum validation cross-entropy loss; val_top1 as tiebreaker. val_loss is what the training loop optimizes and what selects each rank's best.pt epoch, so using it for cross-rank selection keeps consistency with how each individual checkpoint was chosen.
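The selection rule is a one-liner; a minimal sketch using the sweep numbers from the table above:

```python
# Cross-rank selection: minimum val_loss, with val_top1 as tiebreaker
# (higher top1 wins, hence the negation in the sort key).
sweep = [
    {"rank": 4,  "val_loss": 0.6593, "val_top1": 81.13},
    {"rank": 8,  "val_loss": 0.6601, "val_top1": 81.15},
    {"rank": 16, "val_loss": 0.6589, "val_top1": 82.30},
    {"rank": 32, "val_loss": 0.6601, "val_top1": 81.13},
    {"rank": 64, "val_loss": 0.6604, "val_top1": 81.12},
]
best = min(sweep, key=lambda row: (row["val_loss"], -row["val_top1"]))
print(best["rank"])  # 16
```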

Full 11-genre × 5-rank sweep + full-FT anchor table: ai/results/lora_rank_sweep.md in the repo.

Evaluation

Validation token-level metrics on the genre-specific val split (1,431 sequences, no key augmentation). The F1 base column uses the same val split, same dataloader, and the same [GENRE:none]-initialized embedding-extension setup as the LoRA run — only the LoRA parameters and the trained embedding rows differ.

| Metric | F1 base alone | F1 + this LoRA | Δ |
|---|---|---|---|
| Top-1 accuracy (%) | 78.33 | 82.30 | +3.97 |
| Top-5 accuracy (%) | 93.64 | 95.99 | +2.35 |
| Cross-entropy loss | 0.9635 | 0.6589 | -0.3046 |

Source: ai/results/f1_per_genre_baseline.csv + ai/results/lora_rank_sweep.csv. Higher top-1/top-5 and lower loss are better.

Real-song eval

Mean validation top-1/top-5/cross-entropy on 10 held-out real bossa songs from ai/data/eval_real_songs.jsonl (held-out from ai/data/splits/{val,test}.jsonl, see docs/EVAL.md for dataset composition + methodology). Teacher-forced eval — same evaluate() call as the full-val rank-sweep eval above, just narrowed to a curated 10-song subset.

| Model | Top-1 (%) | Top-5 (%) | val_loss |
|---|---|---|---|
| F1 base alone | 81.43 | 95.47 | 0.7825 |
| F1 + this LoRA | 84.02 | 97.53 | 0.5604 |
| Δ | +2.59 | +2.07 | -0.2221 |
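The teacher-forced metrics used in both evaluations can be sketched per batch as below. This is my reconstruction of the methodology described in this card (pad_id-masked CE / top-1 / top-5), not the repo's actual evaluate():

```python
import torch
import torch.nn.functional as F

def evaluate_batch(logits, targets, pad_id, k=5):
    """Teacher-forced token metrics with pad masking — a sketch of the
    methodology described above, not the repo's evaluate()."""
    mask = targets != pad_id                      # drop padded positions
    logits_f, targets_f = logits[mask], targets[mask]
    ce = F.cross_entropy(logits_f, targets_f)
    top1 = (logits_f.argmax(-1) == targets_f).float().mean()
    topk_ids = logits_f.topk(k, dim=-1).indices   # (N, k)
    topk = (topk_ids == targets_f.unsqueeze(-1)).any(-1).float().mean()
    return ce.item(), top1.item(), topk.item()

torch.manual_seed(0)
logits = torch.randn(2, 8, 359)                   # (batch, seq, vocab)
targets = torch.randint(1, 359, (2, 8))
targets[0, -2:] = 0                               # simulate padding
ce, top1, top5 = evaluate_batch(logits, targets, pad_id=0)
```

Top-1 hits are by construction a subset of top-5 hits, so top-5 accuracy upper-bounds top-1, as in the tables above.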

Evaluation data

This adapter is evaluated on two complementary held-out sets, both drawn from the same val + test splits the LoRA never saw during training:

1. Full val split — used for the rank sweep table above

  • Size: 1,431 validation sequences (this genre's val partition)
  • Methodology: teacher-forced next-token CE / top-1 / top-5 with pad_id masking, batch 32, no key augmentation
  • Comparison fairness: same evaluate() call as ai/results/f1_per_genre_baseline.csv, same dataloader, same [GENRE:none]-initialised embedding-extension setup. Only the LoRA's adapter weights + the 8 new genre embedding rows differ.
  • Output: ai/results/lora_rank_sweep.csv (long format, one row per (genre, rank) cell)

2. Curated 130-song real-song eval — used for the Real-song eval section above

  • Size: 10 songs from this genre (10 per genre × 13 genres = 130 total)
  • Source partition: drawn from splits/val.jsonl + splits/test.jsonl only (no train leakage)
  • Per-genre sources: chordonomicon_bossa
  • Title coverage (this genre): 0 of 10 have human-readable titles; all are Chordonomicon entries whose title field is a Spotify track ID, per upstream dataset policy
  • Bar range (this genre): 24–78 bars (≈ 88s avg at typical tempo for this genre)
  • Build script: ai/training/build_eval_real_songs.py --seed 42 --per-genre 10 — deterministic, re-runnable
  • Output: ai/results/real_song_eval.csv (17 models × 130 songs, long format)
  • Full dataset composition + per-source license + methodology: see docs/EVAL.md

License and use

The adapter weights are released under CC BY-NC 4.0 (matching Chordonomicon, the upstream training corpus). Permitted: research, paper replication, portfolio, demo. Not permitted: commercial deployment without separate licensing of upstream data.

Usage

```python
import torch
from huggingface_hub import hf_hub_download
from peft import PeftModel
from model import MusicTransformer
from tokenizer import ChordTokenizer

# 1. Load the F1 base
base_path = hf_hub_download(
    repo_id="PearlLeeStudio/TheArtist-MusicTransformer-ft-pop80",
    filename="best.pt",
)
base_ckpt = torch.load(base_path, map_location="cpu", weights_only=False)
tokenizer = ChordTokenizer()
model = MusicTransformer(
    vocab_size=tokenizer.vocab_size,
    d_model=512, n_heads=8, d_ff=2048, n_layers=8,
    max_seq_len=256, dropout=0.0, pad_id=tokenizer.pad_id,
)
model.load_state_dict(base_ckpt["model_state_dict"])

# 2. Extend the embedding to fit the LoRA's expanded vocabulary
ext_path = hf_hub_download(
    repo_id="PearlLeeStudio/TheArtist-MusicTransformer-lora-bossa",
    filename="embedding_extension.pt",
)
ext = torch.load(ext_path, map_location="cpu", weights_only=False)
# (See model/README.md for the apply-extension recipe.)

# 3. Apply the LoRA adapter (from_pretrained expects the adapter's directory)
adapter_path = hf_hub_download(
    repo_id="PearlLeeStudio/TheArtist-MusicTransformer-lora-bossa",
    filename="adapter_model.safetensors",
)
model = PeftModel.from_pretrained(model, adapter_path.rsplit("/", 1)[0])
model.eval()
```

Citation

Preprint: arXiv:2605.04998.

```bibtex
@misc{lee2026chordmix,
  title         = {Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation},
  author        = {Lee, Jinju},
  year          = {2026},
  eprint        = {2605.04998},
  archivePrefix = {arXiv}
}
```