---
library_name: transformers
license: cc-by-nc-sa-4.0
pipeline_tag: text-ranking
tags:
  - sentence-transformers
  - cross-encoder
  - reranker
---

<div align="center">

# Contextual AI Reranker v2 2B

<img src="Contextual_AI_Brand_Mark_Dark.png" width="10%" alt="Contextual_AI"/>

[![Blog Post](https://img.shields.io/badge/📝%20Blog-ContextualReranker-green)](https://contextual.ai/blog/rerank-v2)
[![Hugging Face Collection](https://img.shields.io/badge/🤗%20Hugging%20Face-Model%20Collection-yellow)](https://huggingface.co/collections/ContextualAI/contextual-ai-reranker-v2)

</div>

<hr>

## Highlights

Contextual AI's reranker is the **first instruction-following reranker** capable of handling retrieval conflicts and ranking with custom instructions (e.g., prioritizing recent information). It achieves state-of-the-art performance on BEIR and sits on the cost/performance Pareto frontier across:

- Instruction following
- Question answering
- Multilinguality (100+ languages)
- Product search & recommendation
- Real-world use cases

<p align="center">
    <img src="main_benchmark.png" width="1200"/>
</p>

For detailed benchmarks, see our [blog post](https://contextual.ai/blog/rerank-v2).

## Overview

- **Model Type**: Text Reranking
- **Supported Languages**: 100+
- **Parameters**: 2B
- **Context Length**: up to 32K

## When to Use This Model

Use this reranker when you need to:
- Re-rank retrieved documents with custom instructions
- Handle conflicting information in retrieval results
- Prioritize documents by recency or other criteria
- Support multilingual search (100+ languages)
- Process long contexts (up to 32K tokens)

## Quickstart

Each path below uses the same example inputs:

```
Query: What are the health benefits of exercise?
Instruction: Prioritize recent medical research
Documents:
  - Regular exercise reduces risk of heart disease and improves mental health.
  - A 2024 study shows exercise enhances cognitive function in older adults.
  - Ancient Greeks valued physical fitness for military training.
```

**Expected Output** (exact scores differ slightly between backends):
```
Score: 0.8398 | Doc: A 2024 study shows exercise enhances cognitive function in older adults.
Score: -2.5469 | Doc: Regular exercise reduces risk of heart disease and improves mental health.
Score: -9.3750 | Doc: Ancient Greeks valued physical fitness for military training.
```

### Using Sentence Transformers

Install Sentence Transformers:
```bash
pip install sentence_transformers
```

```python
import torch
from sentence_transformers import CrossEncoder

model = CrossEncoder("ContextualAI/ctxl-rerank-v2-instruct-multilingual-2b", model_kwargs={"dtype": torch.bfloat16})

query = "What are the health benefits of exercise?"
instruction = "Prioritize recent medical research"
documents = [
    "Regular exercise reduces risk of heart disease and improves mental health.",
    "A 2024 study shows exercise enhances cognitive function in older adults.",
    "Ancient Greeks valued physical fitness for military training.",
]

pairs = [(query, doc) for doc in documents]
scores = model.predict(pairs, prompt=instruction)
print(scores)
# [-2.484375  0.828125 -9.3125  ]

rankings = model.rank(query, documents, prompt=instruction)
print(rankings)
# [{'corpus_id': 1, 'score': np.float32(0.828125)}, {'corpus_id': 0, 'score': np.float32(-2.484375)}, {'corpus_id': 2, 'score': np.float32(-9.3125)}]
```

The `prompt` argument is optional; omit it to score pairs without a custom instruction. Scores are the raw bfloat16 logits for token id 0 at the final position (matching the Transformers path below), so higher means more relevant.
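
Because the scores are unbounded logits rather than probabilities, downstream code usually only needs their relative order. The following is a minimal sketch of keeping the top results, assuming a `rankings` list shaped like the output printed above (plain floats stand in for `np.float32`). The helper name `top_k_documents` is hypothetical and not part of Sentence Transformers.

```python
def top_k_documents(rankings, documents, k=2):
    """Return the k highest-scoring (score, document) pairs.

    `rankings` is a list of {"corpus_id": int, "score": float} dicts,
    the shape printed by the rank() call above.
    """
    ordered = sorted(rankings, key=lambda r: r["score"], reverse=True)
    return [(float(r["score"]), documents[r["corpus_id"]]) for r in ordered[:k]]


documents = [
    "Regular exercise reduces risk of heart disease and improves mental health.",
    "A 2024 study shows exercise enhances cognitive function in older adults.",
    "Ancient Greeks valued physical fitness for military training.",
]
# Scores copied from the example output above.
rankings = [
    {"corpus_id": 1, "score": 0.828125},
    {"corpus_id": 0, "score": -2.484375},
    {"corpus_id": 2, "score": -9.3125},
]

for score, doc in top_k_documents(rankings, documents, k=2):
    print(f"Score: {score:.4f} | Doc: {doc}")
```

Sorting inside the helper keeps it correct even if the input list is not already ordered by score.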

### vLLM Usage (Recommended for Production)

Requires `vllm==0.10.0` for NVFP4 or `vllm>=0.8.5` for BF16.

```python
import os
os.environ['VLLM_USE_V1'] = '0'  # v1 engine doesn't support logits processor yet

import torch
from vllm import LLM, SamplingParams


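def logits_processor(_, scores):
    """Encode the relevance score as the only token vLLM is allowed to sample.

    The score is the bfloat16 logit of token id 0. Its 16 raw bits are
    reinterpreted as an unsigned integer, and the token with that id is forced,
    so the caller can decode the score from the generated token id.
    """
    index = scores[0].view(torch.uint16)  # raw bits of the bfloat16 logit
    scores = torch.full_like(scores, float("-inf"))  # mask every token...
    scores[index] = 1  # ...except the one whose id encodes the score
    return scores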


def format_prompts(query: str, instruction: str, documents: list[str]) -> list[str]:
    """Format query and documents into prompts for reranking."""
    if instruction:
        instruction = f" {instruction}"
    prompts = []
    for doc in documents:
        prompt = f"Check whether a given document contains information helpful to answer the query.\n<Document> {doc}\n<Query> {query}{instruction} ??"
        prompts.append(prompt)
    return prompts


def infer_w_vllm(model_path: str, query: str, instruction: str, documents: list[str]):
    model = LLM(
        model=model_path,
        gpu_memory_utilization=0.85,
        max_model_len=8192,  # raise toward the 32K context limit for longer documents
        dtype="bfloat16",
        max_logprobs=2,
        max_num_batched_tokens=262144,
    )
    sampling_params = SamplingParams(
        temperature=0,
        max_tokens=1,
        logits_processors=[logits_processor]
    )
    prompts = format_prompts(query, instruction, documents)

    outputs = model.generate(prompts, sampling_params, use_tqdm=False)

    # Decode each score: the single generated token id carries the bfloat16
    # bit pattern written by logits_processor.
    results = []
    for i, output in enumerate(outputs):
        score = (
            torch.tensor([output.outputs[0].token_ids[0]], dtype=torch.uint16)
            .view(torch.bfloat16)
            .item()
        )
        results.append((score, i, documents[i]))

    # Sort by score (descending)
    results = sorted(results, key=lambda x: x[0], reverse=True)

    print(f"Query: {query}")
    print(f"Instruction: {instruction}")
    for score, doc_id, doc in results:
        print(f"Score: {score:.4f} | Doc: {doc}")


# Example usage
if __name__ == "__main__":
    model_path = "ContextualAI/ctxl-rerank-v2-instruct-multilingual-2b"
    query = "What are the health benefits of exercise?"
    instruction = "Prioritize recent medical research"
    documents = [
        "Regular exercise reduces risk of heart disease and improves mental health.",
        "A 2024 study shows exercise enhances cognitive function in older adults.",
        "Ancient Greeks valued physical fitness for military training."
    ]
    
    infer_w_vllm(model_path, query, instruction, documents)
```
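
The vLLM path returns the score through the sampled token id: `logits_processor` reinterprets the bfloat16 logit as a 16-bit unsigned integer, and `infer_w_vllm` reverses that step. The sketch below illustrates the same round trip in pure Python, relying on the fact that bfloat16 is the upper 16 bits of an IEEE 754 float32. The function names `bf16_to_token_id` and `token_id_to_bf16` are hypothetical helpers for illustration, and the shortcut is exact only for values already representable in bfloat16, such as the example scores.

```python
import struct


def bf16_to_token_id(score: float) -> int:
    """Return the 16-bit pattern of a bfloat16 value as an unsigned integer.

    Assumes `score` is already representable in bfloat16, so the upper
    16 bits of its float32 encoding are its exact bfloat16 encoding.
    """
    (bits32,) = struct.unpack(">I", struct.pack(">f", score))
    return bits32 >> 16


def token_id_to_bf16(token_id: int) -> float:
    """Invert bf16_to_token_id: rebuild the float from its 16-bit pattern."""
    (value,) = struct.unpack(">f", struct.pack(">I", token_id << 16))
    return value


# Scores taken from the Sentence Transformers example output.
for score in (0.828125, -2.484375, -9.3125):
    token_id = bf16_to_token_id(score)
    assert token_id_to_bf16(token_id) == score  # lossless round trip
    print(f"{score} -> token id {token_id} -> {token_id_to_bf16(token_id)}")
```

Negative scores set the sign bit and therefore map to token ids of 32768 or higher, which implies the approach relies on a vocabulary with at least 65,536 entries.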


### Transformers Usage (Simpler Setup)

Requires `transformers>=4.51.0` for BF16. NVFP4 is not supported with Transformers.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM


def format_prompts(query: str, instruction: str, documents: list[str]) -> list[str]:
    """Format query and documents into prompts for reranking."""
    if instruction:
        instruction = f" {instruction}"
    prompts = []
    for doc in documents:
        prompt = f"Check whether a given document contains information helpful to answer the query.\n<Document> {doc}\n<Query> {query}{instruction} ??"
        prompts.append(prompt)
    return prompts


def infer_w_hf(model_path: str, query: str, instruction: str, documents: list[str]):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.bfloat16 if torch.cuda.is_available() else torch.float32

    tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "left"  # so -1 is the real last token for all prompts

    model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=dtype).to(device)
    model.eval()

    prompts = format_prompts(query, instruction, documents)
    enc = tokenizer(
        prompts,
        return_tensors="pt",
        padding=True,
        truncation=True,
    )
    input_ids = enc["input_ids"].to(device)
    attention_mask = enc["attention_mask"].to(device)

    with torch.no_grad():
        out = model(input_ids=input_ids, attention_mask=attention_mask)

    next_logits = out.logits[:, -1, :]  # [batch, vocab]

    scores_bf16 = next_logits[:, 0].to(torch.bfloat16)
    scores = scores_bf16.float().tolist()

    # Sort by score (descending)
    results = sorted([(s, i, documents[i]) for i, s in enumerate(scores)], key=lambda x: x[0], reverse=True)

    print(f"Query: {query}")
    print(f"Instruction: {instruction}")
    for score, doc_id, doc in results:
        print(f"Score: {score:.4f} | Doc: {doc}")
    """
    Query: What are the health benefits of exercise?
    Instruction: Prioritize recent medical research
    Score: 0.8281 | Doc: A 2024 study shows exercise enhances cognitive function in older adults.
    Score: -2.4844 | Doc: Regular exercise reduces risk of heart disease and improves mental health.
    Score: -9.3125 | Doc: Ancient Greeks valued physical fitness for military training.
    """


# Example usage
if __name__ == "__main__":
    model_path = "ContextualAI/ctxl-rerank-v2-instruct-multilingual-2b"
    query = "What are the health benefits of exercise?"
    instruction = "Prioritize recent medical research"
    documents = [
        "Regular exercise reduces risk of heart disease and improves mental health.",
        "A 2024 study shows exercise enhances cognitive function in older adults.",
        "Ancient Greeks valued physical fitness for military training."
    ]
    
    infer_w_hf(model_path, query, instruction, documents)
```
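
Both the vLLM and Transformers paths build the same prompt string in `format_prompts`. To inspect exactly what the model sees, the sketch below applies that template to a single document and prints the result with and without an instruction. `TEMPLATE` and `format_prompt` (singular) are hypothetical names introduced for illustration; the template text is copied from the functions above.

```python
TEMPLATE = (
    "Check whether a given document contains information helpful to answer the query.\n"
    "<Document> {doc}\n"
    "<Query> {query}{instruction} ??"
)


def format_prompt(query: str, document: str, instruction: str = "") -> str:
    """Build the reranking prompt for one (query, document) pair."""
    # The instruction, when present, is appended to the query after one space.
    suffix = f" {instruction}" if instruction else ""
    return TEMPLATE.format(doc=document, query=query, instruction=suffix)


query = "What are the health benefits of exercise?"
document = "A 2024 study shows exercise enhances cognitive function in older adults."

print(format_prompt(query, document, "Prioritize recent medical research"))
print("---")
print(format_prompt(query, document))  # no instruction
```

Note that the document comes before the query, and the instruction is part of the query line rather than a separate field.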

## Citation

If you use this model, please cite:

```bibtex
@misc{ctxl_rerank_v2_instruct_multilingual,
      title={Contextual AI Reranker v2}, 
      author={Halal, George and Agrawal, Sheshansh},
      year={2025},
      url={https://contextual.ai/blog/rerank-v2}, 
}
```

## License

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (cc-by-nc-sa-4.0)

## Contact

For questions or issues, please open an issue on the model repository.