How to convert a single safetensors file to PEFT format

I’m struggling to get adapter_model.safetensors and adapter_config.json from this LoRA weight file


No conversion needed. Use it directly with Diffusers like this:

Example from https://huggingface.co/lightx2v/Qwen-Image-Lightning:

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image", scheduler=scheduler, torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights(
    "lightx2v/Qwen-Image-Lightning", weight_name="Qwen-Image-Lightning-8steps-V1.0.safetensors"
)

Thanks for the reply, but vLLM-Omni requires the LoRA in PEFT format:


Oh…


Key point: that .safetensors is a Diffusers/ComfyUI LoRA, not a Transformers “PEFT adapter folder”

The file you linked (Qwen-Image-Lightning-8steps-V2.0-bf16.safetensors) is published to be loaded directly via Diffusers (pipe.load_lora_weights(...)) on top of the base model Qwen/Qwen-Image, or used in ComfyUI. The repo’s model card shows exactly that usage pattern. (Hugging Face)

By contrast, a Transformers/PEFT adapter typically lives in a directory containing adapter_config.json + adapter_model.safetensors. (Hugging Face)
Those files are not “extractable” from an arbitrary LoRA .safetensors unless you (re)construct the adapter configuration (target modules, rank, alpha, etc.) in a real model and then re-save it.
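As a concrete illustration, here is the minimal shape of a PEFT adapter directory (a sketch only; the field values below are placeholders, not the Lightning LoRA’s real configuration, which must be recovered from the actual training setup):

```python
import json
from pathlib import Path

# Illustrative adapter_config.json contents. Real values (r, lora_alpha,
# target_modules) must match how the LoRA was actually trained.
adapter_dir = Path("lora_adapter_example")
adapter_dir.mkdir(exist_ok=True)

config = {
    "peft_type": "LORA",
    "r": 16,
    "lora_alpha": 16,
    "target_modules": ["to_q", "to_k", "to_v", "to_out"],
    "base_model_name_or_path": "Qwen/Qwen-Image",
}
(adapter_dir / "adapter_config.json").write_text(json.dumps(config, indent=2))

# A loader can now discover rank/alpha/targets without guessing:
loaded = json.loads((adapter_dir / "adapter_config.json").read_text())
print(loaded["r"], loaded["target_modules"])
```

The point: without this config file next to the weights, a PEFT-style loader has no way to know the rank, alpha, or target modules.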


What vLLM-Omni expects

vLLM-Omni’s diffusion LoRA endpoint requires a PEFT adapter folder like: lora_adapter/adapter_config.json + lora_adapter/adapter_model.safetensors. (vLLM)

The file you linked (Qwen-Image-Lightning-8steps-V2.0-bf16.safetensors) is a single-file Diffusers LoRA weight (meant to be loaded with pipe.load_lora_weights(...)), not a PEFT adapter folder. (Hugging Face)

So you need to load it into the base model once, then re-save it via Diffusers’ PEFT adapter API (save_lora_adapter), which generates the adapter_config.json and a safetensors weight file. (Hugging Face)


Conversion script (Diffusers → PEFT adapter folder)

Notes:

  • The Qwen-Image-Lightning model card explicitly recommends installing Diffusers from main. (Hugging Face)
  • This produces the exact folder structure vLLM-Omni documents. (vLLM)
import math
import torch
from diffusers import DiffusionPipeline, FlowMatchEulerDiscreteScheduler

# 1) Create the base pipeline (same pattern as the model card)
scheduler_config = {
    "base_image_seq_len": 256,
    "base_shift": math.log(3),
    "invert_sigmas": False,
    "max_image_seq_len": 8192,
    "max_shift": math.log(3),
    "num_train_timesteps": 1000,
    "shift": 1.0,
    "shift_terminal": None,
    "stochastic_sampling": False,
    "time_shift_type": "exponential",
    "use_beta_sigmas": False,
    "use_dynamic_shifting": True,
    "use_exponential_sigmas": False,
    "use_karras_sigmas": False,
}
scheduler = FlowMatchEulerDiscreteScheduler.from_config(scheduler_config)

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image",
    scheduler=scheduler,
    torch_dtype=torch.bfloat16,
).to("cuda")

# 2) Load the single safetensors LoRA file into the pipeline
pipe.load_lora_weights(
    "lightx2v/Qwen-Image-Lightning",
    weight_name="Qwen-Image-Lightning-8steps-V2.0-bf16.safetensors",
    adapter_name="lightning_v2",  # give it a name so we can save it explicitly
)

# 3) Re-save as a PEFT adapter folder (adapter_config.json + adapter_model.safetensors)
#    save_lora_adapter() is a PEFT adapter API on the *underlying model component*.
#    For Qwen/Qwen-Image, LoRA is typically on the diffusion "transformer" component.
pipe.transformer.save_lora_adapter(
    "lora_adapter",
    adapter_name="lightning_v2",
    safe_serialization=True,
    weight_name="adapter_model.safetensors",
)

print("Wrote PEFT adapter to ./lora_adapter")

save_lora_adapter(...) is documented to serialize the adapter (and supports weight_name + safetensors). (Hugging Face)


Use the output with vLLM-Omni

Point vLLM-Omni at the created folder:

  • --lora-path /path/to/lora_adapter (must be readable by the server) (vLLM)

  • Folder must contain:

    • adapter_config.json
    • adapter_model.safetensors (vLLM)

Troubleshooting

1) AttributeError: '...Pipeline' object has no attribute 'transformer'

Some pipelines use unet instead of transformer. In that case, save from pipe.unet:

pipe.unet.save_lora_adapter(
    "lora_adapter",
    adapter_name="lightning_v2",
    safe_serialization=True,
    weight_name="adapter_model.safetensors",
)

2) The LoRA loads in Diffusers but fails in PEFT save

Prefer the PEFT “model-level” path: load the adapter onto the component, then save it. Diffusers documents load_lora_adapter(...) + save_lora_adapter(...) as the direct model-level workflow. (Hugging Face)

3) You’re tempted to hand-write adapter_config.json

Don’t, unless you know the exact target modules / ranks / alphas expected by the model. vLLM-Omni (and Transformers PEFT loaders) assume a valid adapter_config.json alongside the weights. (vLLM)

Edit:
doesn’t work practically…

Hi, I ran your script, but it only produced adapter_model.safetensors with no adapter_config.json. I got the config from the following code:

pipe.transformer.peft_config["lightning_v2"].save_pretrained("./lora_adapter")

Then I passed the folder (./lora_adapter) to vLLM-Omni and it raised an error saying the “state_dict” keys do not match…


Sorry… The inference implementation of the diffusion model itself seems to differ quite a bit between Diffusers, ComfyUI, and vLLM-Omni. :scream:

In this case, forcing the state_dict key names to match might make it work, but it’s unclear if it would function correctly. (Depends on the code of that version of vLLM-Omni)

Merging it first would definitely work, I think… but it wouldn’t be a conversion.


To use Qwen-Image-Lightning LoRA on vLLM-Omni

Option A (recommended): merge the LoRA into the base model, then serve it as a normal model

This avoids the entire “PEFT adapter keys don’t match” problem.

Why this works: vLLM-Omni’s diffusion LoRA path is strict about module name alignment (see Option B). If you “bake” the LoRA deltas into the base weights, vLLM-Omni just loads a single checkpoint and there is no adapter to validate.
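Conceptually, fusing bakes the low-rank delta into the base weight: W' = W + (lora_alpha / r) · B @ A. A toy numeric sketch (plain Python, not the Diffusers internals):

```python
# W' = W + scale * (B @ A), with scale = lora_alpha / r.
# Tiny 2x2 example using plain lists to stay dependency-free.

def matmul(B, A):
    return [[sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(B))]

W = [[1.0, 0.0], [0.0, 1.0]]   # base weight (out x in)
A = [[0.5, 0.5]]               # lora_A: r x in, with r = 1
B = [[1.0], [2.0]]             # lora_B: out x r
scale = 16 / 16                # lora_alpha / r

delta = matmul(B, A)
W_fused = [[W[i][j] + scale * delta[i][j] for j in range(2)] for i in range(2)]
print(W_fused)  # [[1.5, 0.5], [1.0, 2.0]]
```

After this, the adapter tensors are redundant: the merged W_fused alone reproduces the adapted behavior, which is exactly why a fused checkpoint needs no LoRA support at serving time.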

Steps

  1. Load the base Qwen-Image model (same base that the Lightning LoRA was trained for).

  2. Load the Lightning LoRA safetensors into that pipeline (Diffusers or the Qwen-Image reference loader).

  3. Merge/fuse LoRA into the base weights (so the model weights become the adapted weights).

  4. Save the merged model directory.

  5. Serve the merged directory with vLLM-Omni:

    • vLLM-Omni serves a single diffusion model per server instance. (vLLM)
  6. Use 8 inference steps when requesting images (because this LoRA is “8steps”). vLLM-Omni exposes num_inference_steps in the request body. (vLLM)

Why I’d pick this first: vLLM-Omni diffusion LoRA support is PEFT-compatible, but it’s new and keyed to vLLM’s internal module naming/packing behavior. (GitHub)


Why your current “PEFT folder” fails in vLLM-Omni

You already discovered:

  • You can produce adapter_model.safetensors

  • You can produce adapter_config.json via:

    pipe.transformer.peft_config["lightning_v2"].save_pretrained("./lora_adapter")
    

…but vLLM-Omni rejects it with “state_dict keys not match”.

That error is expected if the adapter’s target module names (and therefore the saved weight keys) don’t align with what vLLM-Omni believes are “supported/expected LoRA modules” for that diffusion pipeline.

What vLLM-Omni is doing internally

vLLM-Omni’s DiffusionLoRAManager:

  • Computes supported module suffixes from the pipeline using get_supported_lora_modules()
  • Builds/uses a packed_modules_mapping so it can handle fused projections (e.g., packed QKV) and accept LoRAs trained on logical sub-projections
  • Expands an _expected_lora_modules set
  • Loads the adapter via LoRAModel.from_local_checkpoint(... expected_lora_modules=...)
  • Critically: it passes weights_mapper=None (so there is no automatic renaming of keys) (vLLM)

So if Diffusers/ComfyUI used names like to_q, to_k, to_v, to_out, etc., but vLLM-Omni’s Qwen-Image transformer uses different names (and often packed/fused linears), your adapter keys won’t validate.
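A toy illustration of that validation failure (the module names and expected set below are examples to show the mechanism, not vLLM-Omni’s actual internals):

```python
import re

# Hypothetical adapter keys in Diffusers naming, checked against a server
# that only accepts a fused/packed naming scheme.
adapter_keys = [
    "transformer_blocks.0.attn.to_q.lora_A.weight",
    "transformer_blocks.0.attn.to_k.lora_A.weight",
]
expected_suffixes = {"to_qkv", "to_out"}  # example packed-projection scheme

def module_suffix(key: str) -> str:
    # Strip the lora_A/lora_B tail and keep the last module name.
    m = re.match(r".*\.([A-Za-z0-9_]+)\.lora_[AB]\.weight$", key)
    return m.group(1) if m else ""

unsupported = [k for k in adapter_keys if module_suffix(k) not in expected_suffixes]
print(unsupported)  # both keys fail: to_q / to_k are not in the expected set
```

With weights_mapper=None there is no renaming step between these two naming worlds, so the mismatch surfaces as a hard load error.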

This is also why “same repository / same model” can still differ: vLLM-Omni re-implements diffusion transformer components with vLLM-style layers and packed projections for performance/parallelism, so module naming/structure can differ from Diffusers.


Option B: make a real vLLM-Omni-compatible PEFT LoRA (harder, but possible)

vLLM-Omni expects a PEFT folder like: (vLLM)

lora_adapter/
├── adapter_config.json
└── adapter_model.safetensors

But the content must match vLLM-Omni’s expected module names.

B1) First, extract what vLLM-Omni expects (target module suffixes)

Your goal: get the set that DiffusionLoRAManager calls _expected_lora_modules. (vLLM)

Practical ways:

  • Enable debug logging and trigger adapter load; it logs the supported/expected modules. (vLLM)

  • Or write a small script that instantiates the same pipeline/module objects and prints:

    • get_supported_lora_modules(pipeline)
    • any packed_modules_mapping found on modules
    • expanded expected modules (same function the manager uses)

B2) Inspect your Lightning safetensors keys (what you currently have)

Run something like:

from safetensors.torch import load_file

sd = load_file("Qwen-Image-Lightning-8steps-V2.0-bf16.safetensors")
keys = list(sd.keys())
print("num_keys:", len(keys))
print("sample:", keys[:50])

# quick “module suffix” feel
import re
mods = set()
for k in keys:
    # tweak this depending on actual key style you see
    m = re.search(r"\.(to_[qkv]|to_out|q_proj|k_proj|v_proj|proj|fc1|fc2)\b", k)
    if m:
        mods.add(m.group(1))
print("matched module-ish tokens:", sorted(mods))

This tells you whether the file is closer to:

  • Diffusers attention naming (to_q, to_k, to_v, to_out)
  • HF transformer naming (q_proj, k_proj, v_proj, o_proj)
  • Something ComfyUI-specific
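A small heuristic helper for that classification (the token lists are illustrative and would need tuning against the keys you actually see):

```python
def classify_naming(keys):
    """Heuristically guess which naming family a LoRA state_dict uses."""
    joined = " ".join(keys)
    if ".to_q." in joined or ".to_k." in joined:
        return "diffusers-attention"
    if ".q_proj." in joined or ".k_proj." in joined:
        return "hf-transformer"
    return "unknown"

print(classify_naming(["blocks.0.attn.to_q.lora_down.weight"]))      # diffusers-attention
print(classify_naming(["layers.0.self_attn.q_proj.lora_A.weight"]))  # hf-transformer
```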

B3) Build a mapping: Diffusers/ComfyUI module names → vLLM-Omni module names

Typical mismatch patterns (examples):

  • to_q, to_k, to_v vs packed qkv projections
  • to_out.0 vs proj / o_proj
  • MLP: fc1/fc2 vs gate_up_proj/down_proj-style

vLLM-Omni is explicitly designed to handle packed projections by:

  • discovering packed_modules_mapping on the model
  • treating QKVParallelLinear as 3-slice packed (["q","k","v"]) (vLLM)

So if (and only if) the Qwen-Image vLLM-Omni implementation exposes a compatible mapping, you may be able to rename your adapter keys to match the “slice names” it will accept.

B4) Rewrite the adapter weights and config

You may need to:

  • Rewrite state_dict key paths (the important part)
  • Ensure adapter_config.json includes target_modules that match what vLLM-Omni expects and what your rewritten keys implement (it logs target_modules when loading). (vLLM)

A template for renaming keys:

from safetensors.torch import load_file, save_file

src = load_file("adapter_model.safetensors")

RENAMES = [
    (".to_q.", ".q."),      # example only
    (".to_k.", ".k."),
    (".to_v.", ".v."),
    (".to_out.0.", ".proj."),
]

dst = {}
for k, v in src.items():
    nk = k
    for a, b in RENAMES:
        nk = nk.replace(a, b)
    dst[nk] = v

save_file(dst, "adapter_model_vllm.safetensors")
print("done. keys:", len(dst))

Then point adapter_config.json to target_modules matching the suffixes vLLM-Omni expects.

B5) Reality check: you may need to patch vLLM-Omni

Because diffusion LoRA loading currently uses weights_mapper=None, there is no built-in key translation hook. (vLLM)
If the required mapping is non-trivial (common), the clean solution is:

  • add a weights_mapper for diffusion adapters (or a model-specific mapper for Qwen-Image)
  • or ensure the model exposes packed_modules_mapping that matches popular training tool outputs

How other users effectively use “ComfyUI LoRA” with vLLM-Omni (practically)

Most people who succeed quickly do one of:

  1. Merge LoRA into base weights and serve the merged model (Option A)
  2. Use LoRAs that were trained/exported in PEFT format against a module naming scheme that vLLM/vLLM-Omni accepts (often not ComfyUI-native single-file LoRAs)

Given your current error and vLLM-Omni’s strict loader, Option A is the most reliable path.


Reading list (relevant, practical)

  • vLLM-Omni diffusion LoRA online serving example and required folder format (vLLM)
  • vLLM-Omni DiffusionLoRAManager internals (why key mismatches happen; packed modules mapping; no weights_mapper) (vLLM)
  • vLLM-Omni release notes highlighting “Diffusion LoRA Adapter Support (PEFT-compatible)” (feature maturity context) (GitHub)
  • vLLM LoRA adapters documentation (general vLLM LoRA expectations and serving patterns) (vLLM)

To merge/fuse Lightning into the base model — step-by-step

0) What you will produce

A new local model directory that contains the base Qwen-Image weights with Lightning already applied, so vLLM-Omni loads it as a normal diffusion model (no LoRA at runtime). vLLM-Omni serves diffusion models via /v1/images/generations. (docs.vllm.ai)


1) Prepare environment (Diffusers “main”)

The Lightning model card explicitly says to install Diffusers from main. (Hugging Face)

pip install -U "torch" "transformers" "accelerate" "safetensors"
pip install -U "git+https://github.com/huggingface/diffusers.git"

2) Fuse the V2.0 bf16 LoRA into Qwen/Qwen-Image

Create a script fuse_qwen_image_lightning_v2.py:

import math
import torch
from diffusers import DiffusionPipeline, FlowMatchEulerDiscreteScheduler

# Scheduler config used by Qwen-Image-Lightning authors (shift=3 distillation)
scheduler_config = {
    "base_image_seq_len": 256,
    "base_shift": math.log(3),
    "invert_sigmas": False,
    "max_image_seq_len": 8192,
    "max_shift": math.log(3),
    "num_train_timesteps": 1000,
    "shift": 1.0,
    "shift_terminal": None,
    "stochastic_sampling": False,
    "time_shift_type": "exponential",
    "use_beta_sigmas": False,
    "use_dynamic_shifting": True,
    "use_exponential_sigmas": False,
    "use_karras_sigmas": False,
}

def main():
    device = "cuda"
    dtype = torch.bfloat16

    scheduler = FlowMatchEulerDiscreteScheduler.from_config(scheduler_config)

    # 1) Load base model
    pipe = DiffusionPipeline.from_pretrained(
        "Qwen/Qwen-Image",
        scheduler=scheduler,
        torch_dtype=dtype,
    ).to(device)

    # 2) Load Lightning LoRA (your file)
    pipe.load_lora_weights(
        "lightx2v/Qwen-Image-Lightning",
        weight_name="Qwen-Image-Lightning-8steps-V2.0-bf16.safetensors",
    )

    # 3) Fuse LoRA into base weights, then unload adapter tensors
    #    (Diffusers recommends unload after fuse, then save_pretrained)
    pipe.fuse_lora(lora_scale=1.0)
    pipe.unload_lora_weights()

    # 4) Save the fused pipeline locally
    out_dir = "./Qwen-Image-Lightning-8steps-V2.0-fused"
    pipe.save_pretrained(out_dir, safe_serialization=True)

    print(f"Saved fused model to: {out_dir}")

if __name__ == "__main__":
    main()

Why these exact pieces:

  • The scheduler config and the “8 steps / true_cfg_scale=1.0” recipe come from the Lightning model card (it uses a FlowMatchEulerDiscreteScheduler config with shift = 3 applied in log space, and calls the pipeline with 8 steps). (Hugging Face)
  • The fuse workflow is Diffusers’ documented pattern: fuse_lora() → unload_lora_weights() → save_pretrained(). (Hugging Face)

Run it:

python fuse_qwen_image_lightning_v2.py

3) Sanity-check the fused directory (optional but recommended)

After fusion, the model should work without load_lora_weights():

import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "./Qwen-Image-Lightning-8steps-V2.0-fused",
    torch_dtype=torch.bfloat16,
).to("cuda")

img = pipe(
    prompt="a tiny astronaut hatching from an egg on the moon, Ultra HD, 4K",
    negative_prompt=" ",
    width=1024,
    height=1024,
    num_inference_steps=8,
    true_cfg_scale=1.0,
    generator=torch.manual_seed(0),
).images[0]

img.save("fused_test.png")

The “8 steps” + true_cfg_scale=1.0 matches the Lightning authors’ recommended inference settings. (Hugging Face)


4) Serve the fused model with vLLM-Omni

vLLM-Omni serves diffusion models with:

vllm serve /ABS/PATH/Qwen-Image-Lightning-8steps-V2.0-fused --omni --port 8000

  • vLLM-Omni uses /v1/images/generations for diffusion models. (docs.vllm.ai)
  • vLLM supports serving a local model path. (vLLM Forums)

If you get OOM during serving, the Qwen text-to-image example notes you can enable VAE slicing/tiling flags to reduce memory. (docs.vllm.ai)


5) Call the API using Lightning-like parameters

vLLM-Omni’s Image Generation API supports num_inference_steps, negative_prompt, and true_cfg_scale. (docs.vllm.ai)

curl -X POST http://localhost:8000/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "a tiny astronaut hatching from an egg on the moon, Ultra HD, 4K",
    "negative_prompt": " ",
    "size": "1024x1024",
    "num_inference_steps": 8,
    "true_cfg_scale": 1.0,
    "seed": 0
  }' | jq -r ".data[0].b64_json" | base64 -d > out.png

Thank you again for your answer. I tried your method but it still didn’t work. vLLM-Omni raised the error “transformer_blocks.0.attn.add_k_proj.alpha is unsupported LoRA weight”. I think we can only hope for support in a new version… :sad_but_relieved_face:


Yeah. Or maybe it’d be faster to save the merged LoRA weights, upload them to Hugging Face, and use those…:thinking:
If we just use the entire model repository instead of a LoRA, the differences in LoRA implementation won’t matter.

To make LoRAs for Diffusers/ComfyUI usable with vLLM-Omni, they’d need quite a few implementation changes on the vLLM-Omni side… Still, there seems to be demand (since many LoRAs already exist), so the possibility of implementation isn’t zero…

I’ve had luck using this script to convert ComfyUI-formatted plain safetensors LoRAs into a format that’s accepted by vllm-omni: comfyui-to-vllm-omni.py · GitHub


Oh! I’ve rewritten it for Qwen-Image. From what I’ve tested so far, tensors with keys other than mlp* can be converted. However, it’s unclear whether the LoRA will actually work with the converter below…


Qwen-Image support is mostly a naming/prefix problem.

vLLM-Omni diffusion LoRAs must be a PEFT adapter directory (adapter_config.json + adapter_model.safetensors). (vLLM)
vLLM is strict about module-name suffixes and PEFT key naming, and it breaks on *.to_out.0.* unless you normalize it to *.to_out.*. (GitHub)
For Qwen-Image specifically, the pipeline loads transformer weights under a transformer. prefix, and the pipeline has a self.transformer = QwenImageTransformer2DModel(...). (GitHub)
The Qwen-Image transformer also exposes packed projection shard mappings and normalizes .to_out.0. → .to_out. when loading weights. (GitHub)

Below is a rewritten version of the gist that adds a Qwen-Image converter for ComfyUI-style keys like:

transformer_blocks.N.attn.to_q.lora_down.weight

It converts them into PEFT keys like:

base_model.model.transformer.transformer_blocks.N.attn.to_q.lora_A.weight

Rewritten script (drop-in, supports Qwen-Image)

#!/usr/bin/env python3
"""
comfyui-to-vllm-omni-qwenimage.py

Convert ComfyUI-style Qwen-Image LoRA safetensors (lora_down/lora_up) into a PEFT
adapter folder accepted by vLLM-Omni diffusion LoRA loader.

Why this works:
- vLLM-Omni requires PEFT adapter directory format. (adapter_config.json + adapter_model.safetensors)
  https://docs.vllm.ai/projects/vllm-omni/en/latest/user_guide/diffusion/lora/
- vLLM expects lora_A/lora_B naming; ComfyUI uses lora_down/lora_up.
- vLLM has a known failure for ModuleList/Sequential numeric indices like "to_out.0".
  Fix by rewriting to "to_out". https://github.com/vllm-project/vllm/issues/35734
- Qwen-Image pipeline loads transformer weights with prefix "transformer." and defines self.transformer.
  https://raw.githubusercontent.com/vllm-project/vllm-omni/main/vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image.py
- Qwen-Image transformer exposes packed shard mapping and normalizes ".to_out.0." -> ".to_out." in load_weights.
  https://raw.githubusercontent.com/vllm-project/vllm-omni/main/vllm_omni/diffusion/models/qwen_image/qwen_image_transformer.py
"""

import argparse
import json
import re
import sys
from pathlib import Path

import torch
from safetensors.torch import load_file, save_file


# -------------------------
# Qwen-Image settings
# -------------------------

# vLLM strips "base_model.model." internally, and Qwen-Image modules live under "transformer.*"
# (pipeline uses prefix="transformer." and assigns self.transformer=QwenImageTransformer2DModel)
PREFIX_QWEN = "base_model.model.transformer."

# Attention-only by default (recommended). You can optionally include MLP keys with --include-mlp.
ALLOWED_QWEN_PREFIXES_ATTN = (
    "attn.to_q",
    "attn.to_k",
    "attn.to_v",
    "attn.to_out",
    "attn.add_q_proj",
    "attn.add_k_proj",
    "attn.add_v_proj",
    "attn.to_add_out",  # present in Qwen-Image-Lightning
)

# Optional MLP keys observed in Qwen-Image-Lightning (ComfyUI-style)
ALLOWED_QWEN_PREFIXES_MLP = (
    "img_mlp.net.0.proj",
    "img_mlp.net.2",
    "txt_mlp.net.0.proj",
    "txt_mlp.net.2",
)

# PEFT config fields vLLM-Omni documents as important: r, lora_alpha, target_modules, base_model_name_or_path
# https://docs.vllm.ai/projects/vllm-omni/en/latest/user_guide/diffusion/lora/
QWEN_TARGET_MODULES_ATTN = [
    "to_q", "to_k", "to_v", "to_out",
    "add_q_proj", "add_k_proj", "add_v_proj",
    "to_add_out",
    # packed names are fine to include even if unused:
    "to_qkv", "add_kv_proj",
]

# If you include MLP keys, vLLM will validate suffixes against expected modules.
# net.2 can be tricky; keep it optional.
QWEN_TARGET_MODULES_MLP = [
    "proj",
    # caution: module suffix may be "2" for net.2; only enable if your vLLM-Omni build expects it
    "2",
]

ADAPTER_CONFIG_TEMPLATE = {
    "peft_type": "LORA",
    "bias": "none",
    "inference_mode": True,
    "lora_dropout": 0.0,
    "r": None,
    "lora_alpha": None,
    "target_modules": None,
    "base_model_name_or_path": None,
}


# -------------------------
# Helpers
# -------------------------

def _remap_direction(direction: str) -> str:
    """lora_down -> lora_A, lora_up -> lora_B"""
    if direction == "lora_down":
        return "lora_A"
    if direction == "lora_up":
        return "lora_B"
    return direction


def _normalize_modulelist_indices(frag: str) -> str:
    """
    Fix vLLM numeric-index issue:
      attn.to_out.0 -> attn.to_out
    Similar normalization exists in Qwen-Image transformer's load_weights. (see qwen_image_transformer.py)
    """
    frag = frag.replace("attn.to_out.0", "attn.to_out")
    frag = frag.replace("attn.to_add_out.0", "attn.to_add_out")
    return frag


def detect_format(keys: list[str]) -> str:
    sample = [k for k in keys if not k.endswith(".alpha")][:50]
    # Qwen-Image-Lightning (ComfyUI style) looks like:
    # transformer_blocks.N.attn.to_q.lora_down.weight
    if any(re.match(r"^transformer_blocks\.\d+\..+\.(lora_down|lora_up)\.weight$", k) for k in sample):
        return "qwen_transformer_blocks_comfyui"
    return "unknown"


def extract_rank_and_alpha(tensors: dict[str, torch.Tensor]) -> tuple[int, float]:
    alpha = None
    for k, v in tensors.items():
        if k.endswith(".alpha"):
            try:
                alpha = float(v.item())
                break
            except Exception:
                pass

    r = None
    for k, v in tensors.items():
        if k.endswith(".lora_down.weight") and hasattr(v, "shape"):
            r = int(v.shape[0])
            break

    if r is None:
        raise ValueError("Could not infer LoRA rank r. Provide --rank.")
    if alpha is None:
        alpha = float(r)
    return r, alpha


# -------------------------
# Converter: Qwen-Image transformer_blocks.* (ComfyUI lora_down/lora_up)
# -------------------------

def convert_qwen_transformer_blocks_comfyui(
    tensors: dict[str, torch.Tensor],
    include_mlp: bool,
    dtype: torch.dtype,
) -> tuple[dict[str, torch.Tensor], list[str]]:
    out: dict[str, torch.Tensor] = {}
    unmapped: list[str] = []

    allowed_prefixes = ALLOWED_QWEN_PREFIXES_ATTN + (ALLOWED_QWEN_PREFIXES_MLP if include_mlp else ())

    pat = re.compile(r"^transformer_blocks\.(\d+)\.(.+?)\.(lora_down|lora_up)\.weight$")

    for k, v in tensors.items():
        if k.endswith(".alpha"):
            continue

        m = pat.match(k)
        if not m:
            unmapped.append(k)
            continue

        block_idx = int(m.group(1))
        frag = _normalize_modulelist_indices(m.group(2))
        direction = m.group(3)

        if not frag.startswith(allowed_prefixes):
            unmapped.append(k)
            continue

        ab = _remap_direction(direction)
        new_key = f"{PREFIX_QWEN}transformer_blocks.{block_idx}.{frag}.{ab}.weight"

        if v.dtype != dtype:
            v = v.to(dtype)
        out[new_key] = v

    # Final safety: remove any leftover ".to_out.0." in full key
    fixed: dict[str, torch.Tensor] = {}
    for k, v in out.items():
        nk = k.replace(".to_out.0.", ".to_out.").replace(".to_add_out.0.", ".to_add_out.")
        fixed[nk] = v

    return fixed, unmapped


# -------------------------
# Main
# -------------------------

def main():
    ap = argparse.ArgumentParser("Convert ComfyUI Qwen-Image LoRA -> vLLM-Omni PEFT adapter dir")
    ap.add_argument("--input", required=True, help="Input LoRA .safetensors")
    ap.add_argument("--output", required=True, help="Output adapter directory")
    ap.add_argument("--base-model", default="Qwen/Qwen-Image", help="base_model_name_or_path in adapter_config.json")
    ap.add_argument("--dtype", choices=["bf16", "fp16", "fp32"], default="bf16")
    ap.add_argument("--include-mlp", action="store_true", help="Also convert img_mlp/txt_mlp LoRA keys (may fail if vLLM expects different suffixes)")
    args = ap.parse_args()

    dtype_map = {"bf16": torch.bfloat16, "fp16": torch.float16, "fp32": torch.float32}
    out_dtype = dtype_map[args.dtype]

    in_path = Path(args.input)
    if not in_path.exists():
        sys.exit(f"[ERROR] Input not found: {in_path}")

    print(f"[INFO] Loading: {in_path}")
    tensors = load_file(str(in_path))
    keys = list(tensors.keys())

    fmt = detect_format(keys)
    print(f"[INFO] Detected format: {fmt}")
    if fmt != "qwen_transformer_blocks_comfyui":
        sys.exit(
            "[ERROR] This rewrite currently targets Qwen-Image ComfyUI keys like:\n"
            "  transformer_blocks.N.attn.to_q.lora_down.weight\n"
            "If your keys differ, paste 30 keys and adjust detect_format/regex."
        )

    r, alpha = extract_rank_and_alpha(tensors)
    print(f"[INFO] Inferred r={r}, lora_alpha={alpha}")

    converted, unmapped = convert_qwen_transformer_blocks_comfyui(
        tensors=tensors,
        include_mlp=args.include_mlp,
        dtype=out_dtype,
    )

    print(f"[INFO] Converted tensors: {len(converted)}")
    if unmapped:
        print(f"[WARN] Unmapped keys: {len(unmapped)} (showing first 20)")
        for k in unmapped[:20]:
            print("   ", k)

    out_dir = Path(args.output)
    out_dir.mkdir(parents=True, exist_ok=True)

    cfg = dict(ADAPTER_CONFIG_TEMPLATE)
    cfg["r"] = int(r)
    cfg["lora_alpha"] = float(alpha)
    cfg["base_model_name_or_path"] = args.base_model
    cfg["target_modules"] = (
        QWEN_TARGET_MODULES_ATTN + (QWEN_TARGET_MODULES_MLP if args.include_mlp else [])
    )

    (out_dir / "adapter_config.json").write_text(json.dumps(cfg, indent=2), encoding="utf-8")
    save_file(converted, str(out_dir / "adapter_model.safetensors"))

    print(f"[DONE] Wrote PEFT adapter dir: {out_dir}")
    print("       - adapter_config.json")
    print("       - adapter_model.safetensors")


if __name__ == "__main__":
    main()

Usage (for Qwen-Image-Lightning)

python comfyui-to-vllm-omni-qwenimage.py \
  --input Qwen-Image-Lightning-8steps-V2.0-bf16.safetensors \
  --output ./out_adapter \
  --dtype bf16 \
  --base-model Qwen/Qwen-Image

Why this matches Qwen-Image in vLLM-Omni

  • It writes LoRA keys under ...transformer... which aligns with Qwen-Image pipeline weight source prefix prefix="transformer." and self.transformer = QwenImageTransformer2DModel(...). (GitHub)
  • It keeps to_q/to_k/to_v and add_q_proj/add_k_proj/add_v_proj, which align with Qwen-Image transformer packed shard mapping (to_qkv shards and add_kv_proj shards). (GitHub)
  • It normalizes to_out.0 to to_out to avoid the known vLLM numeric-index LoRA failure. (GitHub)
  • It outputs the PEFT adapter folder vLLM-Omni requires. (vLLM)
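A quick post-conversion sanity check on the emitted key format (a sketch using synthetic keys; in real use you would load out_adapter/adapter_model.safetensors with safetensors.torch.load_file and pass its keys in):

```python
import re

# The key shape the converter above emits:
# base_model.model.transformer.transformer_blocks.N.<module>.lora_{A,B}.weight
PEFT_KEY = re.compile(
    r"^base_model\.model\.transformer\.transformer_blocks\.\d+\..+\.lora_[AB]\.weight$"
)

def check_adapter_keys(keys):
    """Return any keys that do not follow the expected PEFT naming."""
    return [k for k in keys if not PEFT_KEY.match(k)]

good = [
    "base_model.model.transformer.transformer_blocks.0.attn.to_q.lora_A.weight",
    "base_model.model.transformer.transformer_blocks.0.attn.to_q.lora_B.weight",
]
bad = ["transformer_blocks.0.attn.to_q.lora_down.weight"]  # unconverted ComfyUI key

print(check_adapter_keys(good))  # []
print(check_adapter_keys(bad))
```

If this reports any unconverted keys, fix the converter’s regex/prefix handling before pointing vLLM-Omni at the folder.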