Dataset: textdetox/multilingual_toxicity_dataset
Multilingual toxicity classifier fine-tuned from XLM-RoBERTa-Large (559M parameters) for content moderation in Indian languages.

How to use tsmaitry/devica-toxicity-xlmr-large with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="tsmaitry/devica-toxicity-xlmr-large")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("tsmaitry/devica-toxicity-xlmr-large")
model = AutoModelForSequenceClassification.from_pretrained("tsmaitry/devica-toxicity-xlmr-large")
```
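For batch scoring with a directly loaded tokenizer and model, a minimal sketch follows. It assumes PyTorch is installed and that the label names come from `model.config.id2label`; the helper names `softmax`, `logits_to_predictions`, and `classify` are hypothetical and not part of this model's or the Transformers API. Post-processing is done in plain Python so it can be checked without downloading the model.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_predictions(
    batch_logits: Sequence[Sequence[float]], id2label: Dict[int, str]
) -> List[Dict[str, object]]:
    """Turn a batch of logit rows into the top label and its probability."""
    preds = []
    for row in batch_logits:
        probs = softmax(row)
        best = max(range(len(probs)), key=probs.__getitem__)
        preds.append({"label": id2label[best], "score": probs[best]})
    return preds


def classify(texts: List[str], tokenizer, model, max_length: int = 512):
    """Hypothetical helper: tokenize a batch, run the model, decode labels."""
    import torch  # imported lazily so the pure-Python helpers work without torch

    inputs = tokenizer(
        texts,
        padding=True,        # pad to the longest text in the batch
        truncation=True,     # XLM-R accepts at most 512 positions
        max_length=max_length,
        return_tensors="pt",
    )
    model.eval()
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits_to_predictions(logits.tolist(), model.config.id2label)
```

Usage: `classify(["तुम बहुत अच्छे हो", "நீ முட்டாள்"], tokenizer, model)` returns one `{"label": ..., "score": ...}` dict per input text, using whatever label names the checkpoint's config defines.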
Supported languages: Hindi, English, Bengali, Tamil, Telugu, Marathi, Kannada, Malayalam, Gujarati, Punjabi, and Hinglish (code-mixed Hindi-English).
| Metric | Score |
|---|---|
| Accuracy | ~93-95% |
| F1 (weighted) | ~93-95% |
| F1 (toxic) | ~94% |
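The card does not state how these metrics were computed. Under the usual definitions (for example, scikit-learn's `average="weighted"`), "F1 (toxic)" is the F1 score of the toxic class alone, while "F1 (weighted)" averages per-class F1 scores weighted by each class's support in the ground truth. A minimal pure-Python sketch, assuming labels are encoded as `1` = toxic and `0` = non-toxic; the toy labels are hypothetical and are not this model's evaluation data.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def per_class_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[Hashable, float]:
    """F1 score for each label appearing in either the truth or the predictions."""
    scores = {}
    for c in set(y_true) | set(y_pred):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = (
            2 * precision * recall / (precision + recall) if precision + recall else 0.0
        )
    return scores


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Average of per-class F1, weighted by how often each class occurs in y_true."""
    support = Counter(y_true)
    scores = per_class_f1(y_true, y_pred)
    return sum(scores[c] * n / len(y_true) for c, n in support.items())


# Hypothetical toy labels (1 = toxic, 0 = non-toxic), not real evaluation data.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
toxic_f1 = per_class_f1(y_true, y_pred)[1]
overall_f1 = weighted_f1(y_true, y_pred)
```

Here the toxic-class F1 is 0.75 and the non-toxic F1 is 0.5, so the weighted F1 is (0.75 × 4 + 0.5 × 2) / 6 ≈ 0.667. Because the weights follow class frequency, weighted F1 leans toward whichever class dominates the test set.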
```python
from transformers import pipeline

classifier = pipeline("text-classification", model="tsmaitry/devica-toxicity-xlmr-large")

classifier("तुम बहुत अच्छे हो")   # Hindi, "You are very good" → non-toxic
classifier("Saala kutta kamina")  # Hinglish insult → toxic
classifier("நீ முட்டாள்")          # Tamil, "You are a fool" → toxic
```
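For moderation, a tunable threshold on the toxic-class probability is often more useful than the top label alone, since it lets you trade precision against recall. A minimal sketch, assuming the pipeline is called with `top_k=None` (which, in recent Transformers versions, returns a score for every label as a list of `{"label", "score"}` dicts) and that the toxic class is named `"toxic"`; that label name and the helpers `toxic_score` and `moderate` are hypothetical, so check `classifier.model.config.id2label` for the real names.

```python
from typing import Dict, List, Union

# Hypothetical label name: confirm against classifier.model.config.id2label.
TOXIC_LABEL = "toxic"


def toxic_score(
    scores: List[Dict[str, Union[str, float]]], toxic_label: str = TOXIC_LABEL
) -> float:
    """Pick the toxic-class probability out of an all-labels pipeline result."""
    for item in scores:
        if item["label"] == toxic_label:
            return float(item["score"])
    raise KeyError(f"label {toxic_label!r} not found in pipeline output")


def moderate(
    scores: List[Dict[str, Union[str, float]]],
    threshold: float = 0.5,
    toxic_label: str = TOXIC_LABEL,
) -> bool:
    """Flag a text when its toxic probability reaches the threshold."""
    return toxic_score(scores, toxic_label) >= threshold


# Usage with the classifier defined above (requires the model download):
# scores = classifier("நீ முட்டாள்", top_k=None)
# if moderate(scores, threshold=0.8):
#     ...  # e.g. hide the comment or send it for human review
```

Raising the threshold flags fewer texts with higher confidence; lowering it catches more borderline content at the cost of more false positives.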
Base model: FacebookAI/xlm-roberta-large