DT4H_XLM-R_stl_multilingual_procedure
Model Description
This multilingual clinical Named Entity Recognition (NER) model identifies clinical procedure mentions in biomedical and clinical text. It is based on xlm-roberta-base and fine-tuned on translated variants of the clinical NER datasets MedProcNER and CardioCCC, which consist of clinical case reports with manually annotated procedure mentions. Training follows a single-task learning (STL) approach and uses the BIO tagging scheme for sequence labeling.
- Architecture: Single-task learning (STL)
- Training setup: Multilingual, Monolabel (PROCEDURE)
- Supported languages: Spanish (es), Italian (it), Romanian (ro), English (en), Dutch (nl), Swedish (sv), Czech (cs)
- Base model: xlm-roberta-base
- Task: Token classification (NER)
- Label scheme: BIO
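The BIO scheme above can be illustrated on the card's own example sentence. Note the tag assignment here is illustrative, not a gold-standard annotation, and uses word-level tokens for readability (the model itself operates on XLM-R sub-tokens):

```python
# BIO tagging for the single PROCEDURE label: B- opens a mention,
# I- continues it, O marks everything outside any mention.
tokens = ["Se", "inició", "un", "régimen", "de", "quimioterapia", "SMILE", "modificado", "."]
tags   = ["O",  "O",      "O",  "O",       "O",  "B-PROCEDURE",   "I-PROCEDURE", "O", "O"]

for tok, tag in zip(tokens, tags):
    print(f"{tok}\t{tag}")
```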
Training Data
The model is trained on multilingual clinical NER data combining MedProcNER and CardioCCC across the supported languages.
The data is part of the MultiClinNER subtask of the MultiClinAI shared task, an initiative as part of the DataTools4Heart (DT4H) project, which provides translated and annotation-projected clinical corpora.
Training and test splits correspond to the MultiClinNER task at the 11th SMM4H-HeaRD Workshop (ACL 2026).
How to use
You can load the model using Hugging Face Transformers:
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_name = "judithrosell/DT4H_XLM-R_stl_multilingual_procedure"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

ner_pipeline = pipeline("token-classification", model=model, tokenizer=tokenizer)

# "A modified SMILE chemotherapy regimen was started."
text = "Se inició un régimen de quimioterapia SMILE modificado."
predictions = ner_pipeline(text)
print(predictions)
```
⚠ Note: We recommend pre-tokenizing the input text into words, as this matches the training setup. Providing raw text directly may lead to slightly degraded performance.
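With pre-tokenized input, XLM-R still splits each word into sub-tokens, so per-word labels must be recovered after prediction. A minimal sketch of that alignment step follows; it consumes the `word_ids()` mapping that a Transformers `BatchEncoding` provides when called with `is_split_into_words=True`, and the helper name `first_subtoken_labels` is ours, not part of the library:

```python
# Keep one label per input word: the label predicted for each word's first
# sub-token. Special tokens map to word id None and are skipped.
def first_subtoken_labels(word_ids, subtoken_labels):
    labels, prev = [], None
    for wid, lab in zip(word_ids, subtoken_labels):
        if wid is not None and wid != prev:
            labels.append(lab)
        prev = wid
    return labels

# Toy example: 4 words, where word 2 was split into two sub-tokens,
# plus special tokens (None) at both ends of the sequence.
word_ids = [None, 0, 1, 2, 2, 3, None]
sub_labels = ["O", "O", "B-PROCEDURE", "I-PROCEDURE", "I-PROCEDURE", "O", "O"]
print(first_subtoken_labels(word_ids, sub_labels))
# → ['O', 'B-PROCEDURE', 'I-PROCEDURE', 'O']
```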
The model outputs token-level predictions. For evaluation or submission, these predictions should be converted into character-level spans.
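One way to perform this conversion is to merge consecutive BIO predictions into character spans. The sketch below assumes per-token dicts shaped like the Transformers token-classification pipeline output (`entity`, `start`, `end`); the function name `bio_to_spans` is ours:

```python
# Merge token-level BIO predictions into character-level spans.
def bio_to_spans(tokens, text):
    spans, current = [], None
    for tok in tokens:
        tag = tok["entity"]
        if tag.startswith("B-"):
            if current:
                spans.append(current)
            current = {"label": tag[2:], "start": tok["start"], "end": tok["end"]}
        elif tag.startswith("I-") and current and tag[2:] == current["label"]:
            current["end"] = tok["end"]  # extend the open span
        else:
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    for s in spans:
        s["text"] = text[s["start"]:s["end"]]
    return spans

text = "Se inició un régimen de quimioterapia SMILE modificado."
tokens = [
    {"entity": "B-PROCEDURE", "start": 24, "end": 37},
    {"entity": "I-PROCEDURE", "start": 38, "end": 43},
]
print(bio_to_spans(tokens, text))
# → [{'label': 'PROCEDURE', 'start': 24, 'end': 43, 'text': 'quimioterapia SMILE'}]
```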
To facilitate this process, we provide an inference script in a GitHub repository that:
- Loads the model
- Processes .txt files from an input directory
- Extracts procedure mentions
- Exports predictions as a TSV file in the format required for the MultiClinAI evaluation library:
```
filename	label	start_span	end_span	text
MultiClinNER-en-test-procedure-29999	PROCEDURE	2142	2153	Chest X-ray
```
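A minimal sketch of writing spans in that layout, using only the standard library; the column order follows the example row above, and the span dicts are assumed to carry the same fields:

```python
# Export character-level spans as a tab-separated file in the
# filename / label / start_span / end_span / text layout.
import csv
import io

spans = [
    {"filename": "MultiClinNER-en-test-procedure-29999", "label": "PROCEDURE",
     "start": 2142, "end": 2153, "text": "Chest X-ray"},
]

buf = io.StringIO()  # swap for open("predictions.tsv", "w", newline="") to write a file
writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
writer.writerow(["filename", "label", "start_span", "end_span", "text"])
for s in spans:
    writer.writerow([s["filename"], s["label"], s["start"], s["end"], s["text"]])

print(buf.getvalue())
```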
Limitations and bias
At the time of submission, no formal bias or fairness evaluation had been conducted. We plan to investigate these aspects in future work and will update this model card accordingly.
Evaluation
Evaluation was conducted using strict (exact match) and character-level metrics on the MultiClinNER test set.
| Language | Strict P | Strict R | Strict F1 | Char P | Char R | Char F1 |
|---|---|---|---|---|---|---|
| es | 0.6637 | 0.6729 | 0.6682 | 0.7971 | 0.8041 | 0.8006 |
| it | 0.6562 | 0.5244 | 0.5829 | 0.8024 | 0.6392 | 0.7116 |
| ro | 0.7172 | 0.6874 | 0.7019 | 0.8353 | 0.7974 | 0.8159 |
| en | 0.6623 | 0.6097 | 0.6349 | 0.8188 | 0.7518 | 0.7839 |
| nl | 0.6715 | 0.6489 | 0.6600 | 0.7955 | 0.7666 | 0.7808 |
| sv | 0.6845 | 0.6840 | 0.6842 | 0.8002 | 0.7960 | 0.7981 |
| cs | 0.6707 | 0.6645 | 0.6676 | 0.8002 | 0.7893 | 0.7947 |
| Average | | | 0.6571 | | | 0.7837 |
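The two evaluation views can be sketched as follows. This is our simplified reading of the metrics (exact span match vs. per-character overlap), not the official MultiClinAI scorer; spans are (start, end) character offsets with an exclusive end:

```python
# Precision / recall / F1 from true-positive, false-positive, false-negative counts.
def prf(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Strict: a predicted span counts only if it matches a gold span exactly.
def strict_scores(gold, pred):
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    return prf(tp, len(pred - gold), len(gold - pred))

# Character-level: credit is given per overlapping character position.
def char_scores(gold, pred):
    gold_chars = {i for s, e in gold for i in range(s, e)}
    pred_chars = {i for s, e in pred for i in range(s, e)}
    tp = len(gold_chars & pred_chars)
    return prf(tp, len(pred_chars - gold_chars), len(gold_chars - pred_chars))

gold = [(24, 43)]  # gold mention
pred = [(24, 37)]  # prediction misses the tail: strict miss, partial char credit
print(strict_scores(gold, pred))  # → (0.0, 0.0, 0.0)
print(char_scores(gold, pred))    # perfect precision, recall 13/19
```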
Additional information
Authors
NLP4BIA team at the Barcelona Supercomputing Center (nlp4bia@bsc.es).
Contact information
judith.rosell [at] bsc.es
Funding
This model is part of the DataTools4Heart (DT4H) project, funded by the European Union’s Horizon Europe Framework Programme under Grant Agreement No. 101057849.