ZeroCLIP β€” Pre-Built Model Files

This repository hosts pre-built bootstrapped model artifacts for ZeroCLIP, a text-free deterministic conditioning system for Stable Diffusion. Instead of writing text prompts, ZeroCLIP maps integer seeds to CLIP-compatible conditioning vectors, producing deterministic, reproducible imagery with no language model involved.

These artifacts are the result of offline build processes (anchor extraction, MLP training, PCA projection, and self-bootstrapped prior discovery) that would otherwise need to be run locally. By downloading them here, you skip the build step entirely and can start generating immediately.

What's Included

Pre-built artifacts are provided for both SDXL and SD1.5 checkpoints, covering all four ZeroCLIP conditioning variants:

ZeroCLIP-A: Basis Decomposition Anchors

  • Files: .npy anchor libraries
  • What they are: Libraries of CLIP text embeddings extracted from a curated vocabulary. At runtime, a seed selects a weighted combination of these anchors via coherent noise and softmax, producing a conditioning vector that is a smooth blend of known concepts.
  • Why pre-built: Extraction requires running a CLIP text encoder over the full vocabulary (~2 min on CPU with transformers installed). These files let you skip that dependency.
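
The anchor-blending idea can be sketched in a few lines. This is a minimal illustration, not the actual ZeroCLIP code: the function name `blend_anchors`, the `temperature` parameter, and the plain Gaussian draw (standing in for the coherent-noise source) are all assumptions.

```python
import numpy as np

def blend_anchors(anchors: np.ndarray, seed: int, temperature: float = 1.0) -> np.ndarray:
    """Blend anchor embeddings with softmax weights derived from a seed.

    `anchors` has shape (num_anchors, embed_dim). Illustrative only; a
    deterministic Gaussian draw stands in for ZeroCLIP's coherent noise.
    """
    rng = np.random.default_rng(seed)                 # deterministic per seed
    logits = rng.standard_normal(anchors.shape[0]) / temperature
    weights = np.exp(logits - logits.max())           # numerically stable softmax
    weights /= weights.sum()
    return weights @ anchors                          # weighted blend, shape (embed_dim,)

anchors = np.random.default_rng(0).standard_normal((128, 768))
cond = blend_anchors(anchors, seed=42)                # same seed -> same vector
```

Because the weights are a softmax over all anchors, the result is always a convex combination of known concepts, which is why variant A produces smooth blends rather than arbitrary points in embedding space.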

ZeroCLIP-B: Latent Coordinate MLP

  • Files: .pt model checkpoints (~50k parameters)
  • What they are: Tiny neural networks trained to map 3D coordinates (derived from seed values) directly to CLIP embedding space. The MLP learns a continuous manifold over the embedding space, enabling microsecond-speed inference.
  • Why pre-built: Training requires generating a dataset of CLIP embeddings and fitting the MLP (~10 min on CPU). These checkpoints are ready to load and run.
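
A forward pass of such a network is tiny. The sketch below is an assumption about the architecture (the real checkpoint may use different layer sizes and a different seed-to-coordinate mapping); with a hidden width of 64 and a 768-dimensional output it lands at roughly 50k parameters, matching the stated checkpoint size.

```python
import numpy as np

def seed_to_coords(seed: int) -> np.ndarray:
    # Hypothetical seed -> 3D coordinate mapping, not the actual ZeroCLIP scheme.
    rng = np.random.default_rng(seed)
    return rng.uniform(-1.0, 1.0, size=3)

class CoordMLP:
    """Forward pass of a tiny 3 -> hidden -> 768 MLP (illustrative sizes).

    3*64 + 64 + 64*768 + 768 = ~50k parameters.
    """
    def __init__(self, hidden: int = 64, out_dim: int = 768, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.standard_normal((3, hidden)) * 0.1
        self.b1 = np.zeros(hidden)
        self.w2 = rng.standard_normal((hidden, out_dim)) * 0.1
        self.b2 = np.zeros(out_dim)

    def __call__(self, coords: np.ndarray) -> np.ndarray:
        h = np.tanh(coords @ self.w1 + self.b1)       # single hidden layer
        return h @ self.w2 + self.b2                  # CLIP-sized output vector

mlp = CoordMLP()
cond = mlp(seed_to_coords(42))                        # deterministic per seed
```

A forward pass of this size is just two small matrix multiplies, which is why inference is effectively instantaneous compared with running a CLIP text encoder.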

ZeroCLIP-C: PCA Projection (Guided Mode)

  • Files: .npz projection matrices
  • What they are: PCA-derived projection matrices that constrain entropy-sampled vectors to the principal components of real CLIP embedding space. This produces outputs that are statistically closer to natural language embeddings while remaining fully deterministic from seed.
  • Why pre-built: Computing the projection requires CLIP encoder access (~2 min on CPU). Note: ZeroCLIP-C pure mode requires no artifacts at all β€” it works immediately with zero setup.

ZeroCLIP-D: Self-Bootstrapped Prior Anchors

  • Files: .npy anchor libraries
  • What they are: Anchor embeddings discovered by the diffusion model itself through an iterative bootstrapping process. Unlike variant A (which derives anchors from text), variant D finds conditioning vectors that the model responds to strongly, with no text encoder involved at any stage. This is a fully text-free pipeline from build to inference.
  • Why pre-built: The bootstrap process requires running hundreds of diffusion inference steps to discover high-response anchors (~5 hours on GPU with diffusers and accelerate). These files save significant compute time.
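
The shape of that bootstrap loop can be illustrated with a simple random search. Everything here is a stand-in: `model_response` replaces the expensive diffusion-based scoring the real build performs, and the search schedule is invented for illustration.

```python
import numpy as np

def model_response(cond: np.ndarray) -> float:
    # Stand-in for the diffusion model's response score; the real bootstrap
    # would run diffusion inference steps here. Fixed quadratic for demo.
    target = np.linspace(-1, 1, cond.size)
    return -float(np.sum((cond - target) ** 2))

def bootstrap_anchor(dim: int = 16, iters: int = 200, pop: int = 32, seed: int = 0) -> np.ndarray:
    """Random-search sketch of prior discovery: keep the candidate the
    'model' responds to most strongly, refining around it each round."""
    rng = np.random.default_rng(seed)
    best = rng.standard_normal(dim)
    best_score = model_response(best)
    for step in range(iters):
        scale = 1.0 / (1 + 0.1 * step)                 # shrink search radius
        cands = best + scale * rng.standard_normal((pop, dim))
        scores = [model_response(c) for c in cands]
        i = int(np.argmax(scores))
        if scores[i] > best_score:
            best, best_score = cands[i], scores[i]
    return best

anchor = bootstrap_anchor()
```

Each round proposes candidates around the current best and keeps whichever the model scores highest; with real diffusion scoring, each of those evaluations costs full inference steps, which is where the ~5 GPU-hours go.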

Installation

  1. Download the model files for your target checkpoint type (SDXL, SD1.5, or both).

  2. Place all downloaded files into your ComfyUI models directory:

ComfyUI/models/zeroclip/

This is the default path that ZeroCLIP loader nodes scan. The directory is created automatically on first load of the custom nodes, but you can also create it manually.

  3. Install the ZeroCLIP custom nodes from: https://github.com/MushroomFleet/ComfyUI-ZeroCLIP-nodes

Copy the node pack into your ComfyUI custom nodes directory:

ComfyUI/custom_nodes/ZeroClip-nodes/

  4. Restart ComfyUI. The loader nodes (e.g., ZeroClip-A Load Anchors, ZeroClip-B Load MLP) will present dropdown file pickers populated from models/zeroclip/.

Example Workflows

The ComfyUI-ZeroCLIP-nodes repository includes ready-to-use workflow JSON files in the /workflows/ folder. Import them into ComfyUI by dragging the JSON file onto the canvas.

SD1.5 Workflows

| Workflow | Description |
| --- | --- |
| zeroclip_a_basic.json | SeedPack -> A-Load Anchors -> A-Conditioning -> KSampler |
| zeroclip_b_basic.json | SeedPack -> B-Load MLP -> B-Conditioning -> KSampler |
| zeroclip_c_pure.json | C-Conditioning (pure, no model files needed) -> KSampler |
| zeroclip_c_guided.json | C-Load Projection -> C-Conditioning (guided) -> KSampler |
| zeroclip_d_basic.json | SeedPack -> D-Load Anchors -> D-Conditioning -> KSampler |
| zeroclip_blend_example.json | Two seeds -> Two conditionings -> Blend -> KSampler |

SDXL Workflows

| Workflow | Description |
| --- | --- |
| SDXL_zeroclip_a_basic.json | SeedPack -> A-Load Anchors (seq+pooled) -> A-Conditioning SDXL -> KSampler |
| SDXL_zeroclip_b_basic.json | SeedPack -> B-Load MLP (seq+pooled) -> B-Conditioning SDXL -> KSampler |
| SDXL_zeroclip_c_pure.json | C-Conditioning SDXL (pure, no model files needed) -> KSampler |
| SDXL_zeroclip_c_guided.json | C-Load Projection -> C-Conditioning SDXL (guided) -> KSampler |
| SDXL_zeroclip_d_basic.json | SeedPack -> D-Load Anchors (seq+pooled) -> D-Conditioning SDXL -> KSampler |
| SDXL_zeroclip_blend_example.json | Two seeds -> Two conditionings -> Blend -> KSampler |

How ZeroCLIP Works

Traditional Stable Diffusion workflows require a text prompt that gets encoded into a conditioning vector by CLIP. ZeroCLIP replaces this entire text pipeline with deterministic mathematical operations:

  1. You provide integer seeds (a concept_id, style_id, mood_salt, and world_seed) packed into a seed tuple.
  2. The seed maps to a conditioning vector through one of four strategies (A/B/C/D), each offering different trade-offs between setup cost, runtime speed, and visual character.
  3. The conditioning vector is standard CLIP-compatible output: it plugs directly into KSampler as positive conditioning, just like text-encoded conditioning would.

The result: the same seed always produces the same image, on any machine, in any session, with no text encoder required at inference time. Seeds can be swept, gridded, blended, and explored procedurally.
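
The determinism guarantee can be demonstrated in miniature. Both functions below are hypothetical: the real SeedPack node may combine the four integers differently, and a plain Gaussian draw stands in for the A/B/C/D strategies.

```python
import numpy as np

def pack_seeds(concept_id: int, style_id: int, mood_salt: int, world_seed: int) -> int:
    # Hypothetical packing: fold four integers into one 64-bit seed.
    h = 0
    for s in (concept_id, style_id, mood_salt, world_seed):
        h = (h * 1_000_003 + s) % (2**64)
    return h

def seed_to_conditioning(packed: int, dim: int = 768) -> np.ndarray:
    # Simplest possible strategy: a deterministic Gaussian draw keyed on the
    # packed seed, standing in for any of the A/B/C/D variants.
    return np.random.default_rng(packed).standard_normal(dim)

cond = seed_to_conditioning(pack_seeds(1, 2, 3, 4))
```

Because every step is a pure function of the seed tuple, the same four integers reproduce the same conditioning vector on any machine, and sweeping any one of them gives a controlled axis to explore.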

Which Variant Should I Use?

| Variant | Artifacts Needed | Runtime Speed | Visual Character | Best For |
| --- | --- | --- | --- | --- |
| A | Anchor library (.npy) | Fast | Smooth blends of known concepts | General-purpose text-free conditioning |
| B | MLP checkpoint (.pt) | Fastest (microseconds) | Continuous manifold navigation | Minimal runtime footprint |
| C pure | None | Fast | Unpredictable, pure prior resonance | Zero-setup exploration |
| C guided | Projection matrix (.npz) | Fast | Near-language, semantically plausible | Structured entropy |
| D | Bootstrap anchors (.npy) | Fast | Model-discovered semantics | Research into text-free priors |

📚 Citation

Academic Citation

If you use this codebase in your research or project, please cite:

@software{zeroclip,
  title = {ZeroCLIP: Text-Free Deterministic CLIP Conditioning for Stable Diffusion},
  author = {Drift Johnson},
  year = {2025},
  url = {https://github.com/MushroomFleet/ComfyUI-ZeroCLIP-nodes},
  version = {1.0.0}
}

Donate: Ko-Fi
