Instructions for using Qwen/Qwen3-Embedding-4B with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- sentence-transformers
How to use Qwen/Qwen3-Embedding-4B with sentence-transformers:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Qwen/Qwen3-Embedding-4B")

sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # [3, 3]
```

- Transformers
How to use Qwen/Qwen3-Embedding-4B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("feature-extraction", model="Qwen/Qwen3-Embedding-4B")

# Load the model directly. AutoModel returns the base encoder, which is
# what an embedding model needs (not a causal-LM head).
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Embedding-4B")
model = AutoModel.from_pretrained("Qwen/Qwen3-Embedding-4B")
```

- Notebooks
- Google Colab
- Kaggle
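The Transformers path above (pipeline or AutoModel) yields one hidden state per token, so a pooling step is needed to get a single vector per sentence. Qwen3-Embedding models are typically pooled by taking the last non-padding token's hidden state. A minimal NumPy sketch under that assumption (the helper name and dummy shapes are illustrative, not from the repo):

```python
import numpy as np

def last_token_pool(hidden_states, attention_mask):
    """Pick each sequence's last non-padding token embedding.

    hidden_states: (batch, seq_len, hidden_dim)
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
    """
    lengths = attention_mask.sum(axis=1) - 1       # last real-token index
    batch_idx = np.arange(hidden_states.shape[0])
    return hidden_states[batch_idx, lengths]       # (batch, hidden_dim)

# Dummy data standing in for model(**inputs).last_hidden_state
hidden = np.random.randn(2, 5, 8)
mask = np.array([[1, 1, 1, 0, 0],   # first sequence: 3 real tokens
                 [1, 1, 1, 1, 1]])  # second sequence: all 5 real
pooled = last_token_pool(hidden, mask)
print(pooled.shape)  # (2, 8)
```

For retrieval use, the pooled vectors are usually L2-normalized before computing cosine similarities, which is what sentence-transformers handles internally.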
Qwen quantized dynamic 8-bit (INT8)
Quantization was performed using dynamic 8-bit integer quantization. The process uses symmetric weight quantization and asymmetric activation quantization, without a calibration dataset, as dynamic quantization does not require one.
Quantization Type: Dynamic, 8-bit (INT8)
Calibration Dataset: None (dynamic quantization)
Operators for Quantization: MatMul, Add, Gather, EmbedLayerNormalization
Quantization Configuration:
- Weight Quantization: Symmetric
- Activation Quantization: Asymmetric
- Per-Channel Quantization: Enabled
- Reduce Range: Disabled
- MatMulConstBOnly: Enabled
Used Configuration:

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Placeholder paths -- substitute your own ONNX model files
input_model_path = "model.onnx"
output_model_path = "model_quantized.onnx"
# External data format is needed when weights exceed the 2 GB protobuf limit
use_external_data = True

extra_options = {
    "WeightSymmetric": True,
    "ActivationSymmetric": False,
    "MatMulConstBOnly": True,
}
operators_to_quantize = [
    "MatMul",
    "Add",
    "Gather",
    "EmbedLayerNormalization",
]

quantize_dynamic(
    model_input=input_model_path,
    model_output=output_model_path,
    op_types_to_quantize=operators_to_quantize,
    nodes_to_exclude=[],
    per_channel=True,
    reduce_range=False,
    weight_type=QuantType.QInt8,
    use_external_data_format=use_external_data,
    extra_options=extra_options,
)
```
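To make the symmetric (weights) vs. asymmetric (activations) distinction above concrete, here is a simplified per-tensor NumPy sketch of the two schemes. It is illustrative only: the actual quantizer operates per-channel for weights (per the `per_channel=True` setting) and picks ranges per node.

```python
import numpy as np

def quantize_symmetric(x, bits=8):
    """Symmetric INT8: range centered on zero, no zero-point (used for weights)."""
    qmax = 2 ** (bits - 1) - 1                       # 127
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def quantize_asymmetric(x, bits=8):
    """Asymmetric UINT8: range fitted to [min, max] via a zero-point (activations)."""
    qmin, qmax = 0, 2 ** bits - 1                    # 0..255
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(np.round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

w = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
q, s = quantize_symmetric(w)
w_hat = q.astype(np.float32) * s                     # dequantize
qa, sa, zp = quantize_asymmetric(w)
print(np.abs(w - w_hat).max())                       # reconstruction error < one step
```

Dynamic quantization stores weights this way ahead of time and computes activation scales on the fly at inference, which is why no calibration dataset is needed.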