Dataset: CTB2001/patents-green-50k
How to use CTB2001/PatentSBERTa-green-classifier with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-classification", model="CTB2001/PatentSBERTa-green-classifier")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("CTB2001/PatentSBERTa-green-classifier")
model = AutoModelForSequenceClassification.from_pretrained("CTB2001/PatentSBERTa-green-classifier")

How to use CTB2001/PatentSBERTa-green-classifier with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("CTB2001/PatentSBERTa-green-classifier")
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

A fine-tuned AI-Growth-Lab/PatentSBERTa model for binary classification of patent claims as green technology (1) or not green (0).
Developed as part of the Applied Deep Learning (AAU, Spring 2025) exam assignment on active learning, human-in-the-loop labelling, and multi-agent systems for patent classification.
| Property | Value |
|---|---|
| Architecture | MPNetForSequenceClassification (12 layers, 768 hidden) |
| Parameters | 109.5 M (all trainable) |
| Base model | AI-Growth-Lab/PatentSBERTa |
| Max sequence length | 512 tokens |
| Labels | 0 — not green, 1 — green |
| Framework | Transformers 5.2.0, PyTorch |

Training data:

| Split | Rows | Source |
|---|---|---|
| train_silver | 25,000 | Silver labels from Parts A–C |
| gold_labels | 100 (× 25 upsampled = 2,500) | HITL-verified labels |
| Total training | 27,500 | Combined |
| eval_silver | 10,000 | Held-out balanced evaluation set |
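The training total in the table follows from repeating each of the 100 gold examples 25 times and concatenating them with the silver split. A minimal sketch of that arithmetic, assuming upsampling is plain duplication (the card does not show the actual data-preparation code; `build_training_rows` and the placeholder rows are hypothetical):

```python
def build_training_rows(silver_rows, gold_rows, gold_upsample_factor=25):
    """Concatenate silver rows with gold rows repeated `gold_upsample_factor` times.

    Assumption: upsampling is simple duplication; the card does not
    specify the exact mechanism.
    """
    upsampled_gold = list(gold_rows) * gold_upsample_factor
    return list(silver_rows) + upsampled_gold


# Placeholder rows standing in for (claim_text, label) pairs.
silver = [("silver claim", 0)] * 25_000
gold = [("gold claim", 1)] * 100

train = build_training_rows(silver, gold)
print(len(train))  # 27500
```

Duplicating the gold rows gives the 100 human-verified labels roughly 9% of the training examples instead of about 0.4%.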

Training hyperparameters:

| Parameter | Value |
|---|---|
| Learning rate | 2e-5 |
| Epochs | 5 |
| Effective batch size | 128 (4 × 16 × grad_accum 2) |
| LR scheduler | Cosine with 6% warmup |
| Weight decay | 0.01 |
| Label smoothing | 0.05 |
| Gold upsample factor | 25× |
| Early stopping patience | 3 |
| Precision | bf16 |
| Seed | 42 |
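To make the batch-size and scheduler rows concrete, here is a small pure-Python sketch of the effective batch size and of a cosine schedule with linear warmup. The function names, the reading of 4 × 16 × 2 as devices × per-device batch × accumulation steps, and the step counts derived from 27,500 training rows are illustrative assumptions, not taken from the training script.

```python
import math


def effective_batch_size(num_devices, per_device_batch, grad_accum_steps):
    """Number of examples contributing to each optimizer step."""
    return num_devices * per_device_batch * grad_accum_steps


def cosine_lr_with_warmup(step, total_steps, base_lr=2e-5, warmup_ratio=0.06):
    """Learning rate at `step`: linear warmup to `base_lr`, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


batch = effective_batch_size(4, 16, 2)        # 4 * 16 * 2 = 128
steps_per_epoch = math.ceil(27_500 / batch)   # assumes the last partial batch is kept
total_steps = steps_per_epoch * 5             # assumes all 5 epochs run (no early stop)

for step in (0, total_steps // 2, total_steps):
    print(step, cosine_lr_with_warmup(step, total_steps))
```

Under these assumptions the learning rate peaks at 2e-5 once warmup ends and decays to zero by the final step; if early stopping ends training sooner, the tail of the schedule is never reached.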

Training was launched with torchrun.

Evaluated on the held-out eval_silver split (10,000 samples, balanced).

| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| not-green (0) | 0.8121 | 0.8058 | 0.8090 | 5,000 |
| green (1) | 0.8073 | 0.8136 | 0.8104 | 5,000 |
| Accuracy | | | 0.8097 | 10,000 |

Confusion matrix:

| | Pred not-green | Pred green |
|---|---|---|
| Actual not-green | 4,029 | 971 |
| Actual green | 932 | 4,068 |
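The per-class metrics reported above can be recomputed directly from the confusion matrix. The following sketch does that in plain Python; `binary_report` is a hypothetical helper written for illustration, not the evaluation code used for this model.

```python
def binary_report(tn, fp, fn, tp):
    """Per-class precision, recall, F1 and overall accuracy from a 2x2 confusion matrix.

    Class 1 (green) is the positive class. Metrics for class 0 are obtained
    by swapping the roles of the two classes.
    """
    def prf(true_pos, false_pos, false_neg):
        precision = true_pos / (true_pos + false_pos)
        recall = true_pos / (true_pos + false_neg)
        f1 = 2 * precision * recall / (precision + recall)
        return precision, recall, f1

    return {
        "not-green": prf(tn, fn, fp),
        "green": prf(tp, fp, fn),
        "accuracy": (tn + tp) / (tn + fp + fn + tp),
    }


# Counts taken from the confusion matrix above.
report = binary_report(tn=4029, fp=971, fn=932, tp=4068)
for name in ("not-green", "green"):
    print(name, [round(value, 4) for value in report[name]])
print("accuracy", round(report["accuracy"], 4))
```

Rounded to four decimals, these values agree with the precision, recall, F1 and accuracy figures in the evaluation table.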

Usage:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
model_name = "CTB2001/PatentSBERTa-green-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
claim = "A wind turbine blade comprising a spar cap formed from pultruded carbon strips..."
inputs = tokenizer(claim, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits
pred = torch.argmax(logits, dim=-1).item()
print("green" if pred == 1 else "not-green")
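The usage example reports only the argmax label. When a confidence score is also useful, the two logits can be passed through a softmax. The sketch below does this in plain Python so it runs without downloading the model; `logits_to_prediction` is a hypothetical helper and the logits in the example are made up, not real model output.

```python
import math


def logits_to_prediction(logits, labels=("not-green", "green")):
    """Return (label, probability) for one example's raw logits via softmax."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Made-up logits for illustration only.
label, prob = logits_to_prediction([-1.0, 1.0])
print(label, round(prob, 4))
```

With the tensors from the usage example, the helper could be called as `logits_to_prediction(logits[0].tolist())`. Since the model was trained with label smoothing, the resulting probabilities are best read as a relative confidence rather than a calibrated estimate.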

Citation:

@misc{trost-bertelsen2025patentsberta-green,
  author = {Trøst-Bertelsen, Christian},
  title = {PatentSBERTa Green Patent Classifier},
  year = {2025},
  howpublished = {Hugging Face Model Hub},
  url = {https://huggingface.co/CTB2001/PatentSBERTa-green-classifier}
}

Christian Trøst-Bertelsen, Aalborg University, Student ID 20224083
Course: Applied Deep Learning, 8th semester, Spring 2025