LatentLens: Revealing Highly Interpretable Visual Tokens in LLMs
Paper: arXiv:2602.00462
This repository contains trained MLP connector weights for the LatentLens project.
Benno Krojer, Shravan Nayak, Oscar Mañas, Vaibhav Adlakha, Desmond Elliott, Siva Reddy, Marius Mosbach.
Resources: Paper | Code | Demo
These are the trained connector (MLP projector) weights that map vision encoder outputs into the LLM's embedding space. The LLM and vision encoder weights are not included; load them from their original sources (OLMo, LLaMA, Qwen, CLIP, DINOv2, SigLIP).
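A connector can be thought of as a small network applied independently to each patch feature, turning a vision-encoder vector into an LLM-sized vector. The sketch below is a hypothetical gated (SwiGLU-style) MLP in PyTorch. The class name `GatedMLPConnector`, the layer names `w1`/`w2`/`w3`, the gating choice, and all widths are illustrative assumptions. They are not the architecture or parameter keys stored in `connector.pt`; check each folder's `config.yaml` for the real settings. The 576 patches correspond to a 336-pixel image split into 14-pixel patches (24 × 24).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedMLPConnector(nn.Module):
    """Hypothetical gated MLP projector (illustrative, not the trained module).

    Maps per-patch vision features of width `vision_dim` to the LLM's
    embedding width `llm_dim`.
    """

    def __init__(self, vision_dim: int, hidden_dim: int, llm_dim: int) -> None:
        super().__init__()
        self.w1 = nn.Linear(vision_dim, hidden_dim, bias=False)  # gate branch
        self.w3 = nn.Linear(vision_dim, hidden_dim, bias=False)  # value branch
        self.w2 = nn.Linear(hidden_dim, llm_dim, bias=False)     # down-projection

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # patch_features: (batch, num_patches, vision_dim)
        # SiLU-gated product, then projection to the LLM width.
        return self.w2(F.silu(self.w1(patch_features)) * self.w3(patch_features))


# Illustrative sizes only: 1024-d patch features projected to a 4096-d LLM space.
connector = GatedMLPConnector(vision_dim=1024, hidden_dim=2048, llm_dim=4096)
with torch.no_grad():
    visual_tokens = connector(torch.randn(1, 576, 1024))
print(visual_tokens.shape)  # torch.Size([1, 576, 4096])
```

Each output row is then used as one "visual token" in the LLM's input sequence. The connector changes the width of each patch vector but leaves the number of patches unchanged.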
| Connector | LLM | Vision Encoder | Size |
|---|---|---|---|
| olmo-vit | OLMo-7B | ViT-L/14-336 (CLIP) | 347 MB |
| olmo-dino | OLMo-7B | DINOv2-L-336 | 347 MB |
| olmo-siglip | OLMo-7B | SigLIP-L | 368 MB |
| llama-vit | LLaMA3-8B | ViT-L/14-336 (CLIP) | 451 MB |
| llama-dino | LLaMA3-8B | DINOv2-L-336 | 451 MB |
| llama-siglip | LLaMA3-8B | SigLIP-L | 479 MB |
| qwen-vit | Qwen2-7B | ViT-L/14-336 (CLIP) | 557 MB |
| qwen-dino | Qwen2-7B | DINOv2-L-336 | 557 MB |
| qwen-siglip | Qwen2-7B | SigLIP-L | 594 MB |
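Connector names follow an `<llm>-<encoder>` pattern, and the download example in this card uses the in-repo path `olmo-vit/connector.pt`. Assuming every connector folder follows that same layout, a small helper can enumerate the nine names and build file paths for `hf_hub_download`. The names `CONNECTORS` and `connector_file` are hypothetical and are not part of any library.

```python
from itertools import product

# Short names used in the connector column of the table (assumed folder names).
LLMS = ("olmo", "llama", "qwen")
ENCODERS = ("vit", "dino", "siglip")

# All nine "<llm>-<encoder>" connector names.
CONNECTORS = [f"{llm}-{enc}" for llm, enc in product(LLMS, ENCODERS)]


def connector_file(name: str, filename: str = "connector.pt") -> str:
    """Return the in-repo path of a file inside one connector folder."""
    if name not in CONNECTORS:
        raise ValueError(f"unknown connector {name!r}; expected one of {CONNECTORS}")
    return f"{name}/{filename}"


print(connector_file("llama-siglip"))  # llama-siglip/connector.pt
```

The returned string can be passed as the `filename` argument of `hf_hub_download`. Validating the name first turns a typo into a clear local error rather than a failed remote lookup.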
```python
import torch
from huggingface_hub import hf_hub_download

# Download a specific connector
connector_path = hf_hub_download(
    repo_id="McGill-NLP/latentlens-connectors",
    filename="olmo-vit/connector.pt",
)

# Load the weights
connector_weights = torch.load(connector_path, map_location="cpu")
```
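Since `connector.pt` is described as a PyTorch state dict, a quick way to see what it holds is to list each tensor's name, shape, and dtype and add up the element counts. This sketch assumes a flat mapping from names to tensors and skips any non-tensor entries. The function name `summarize_state_dict` is hypothetical, and the demo dict is a stand-in so the example runs offline. With the real file, call it on `connector_weights` after the `torch.load` call.

```python
import torch


def summarize_state_dict(state_dict: dict) -> int:
    """Print name, shape and dtype of every tensor; return the total element count."""
    total = 0
    for name, value in state_dict.items():
        if not isinstance(value, torch.Tensor):
            continue  # ignore metadata or other non-tensor entries
        print(f"{name:40s} {tuple(value.shape)} {value.dtype}")
        total += value.numel()
    print(f"total parameters: {total:,}")
    return total


# Stand-in state dict: an 8x4 weight matrix plus an 8-element bias.
demo = {"proj.weight": torch.zeros(8, 4), "proj.bias": torch.zeros(8)}
n_params = summarize_state_dict(demo)  # prints "total parameters: 40"
```

If the weights are stored as float32 (4 bytes per value), the total can be roughly checked against the Size column, since N parameters occupy about 4N bytes.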
For full usage with the LatentLens library:

```python
from latentlens import LatentLens

model = LatentLens.load("olmo-vit")  # Downloads connector + base models automatically
results = model.analyze("image.jpg")
```
Each connector folder contains:

- `connector.pt`: trained MLP weights (PyTorch state dict)
- `config.yaml`: training configuration (for reference)

Citation:

```bibtex
@article{krojer2026latentlens,
  title={LatentLens: Revealing Highly Interpretable Visual Tokens in LLMs},
  author={Krojer, Benno and Nayak, Shravan and Ma{\~n}as, Oscar and Adlakha, Vaibhav and Elliott, Desmond and Reddy, Siva and Mosbach, Marius},
  journal={arXiv preprint arXiv:2602.00462},
  year={2026}
}
```
License: Apache 2.0 (inherited from Molmo)