Cultural Heritage Metadata Accuracy
A BERT-based classifier that labels Italian cultural-heritage metadata descriptions as high or low quality, i.e. according to whether a description follows the cataloguing guidelines of the ICCD (Istituto Centrale per il Catalogo e la Documentazione).
Trained on the biglam/cultural_heritage_metadata_accuracy dataset (~100K Italian descriptions from Cultura Italia, the Italian national cultural aggregator).
The dataset labels a description HIGH quality if both the object and the subject of the item are described according to ICCD guidelines, and LOW quality otherwise. Most of the dataset was manually annotated; ~30K descriptions were labeled LOW automatically, either because they were short (fewer than 3 tokens) or because they came from old (pre-2012), non-curated collections.
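Because roughly 30K LOW labels come from a rule rather than from human judgement, it may be worth pre-screening your own records with the same rule before reading too much into the model's scores on very short or legacy records. The sketch below approximates that rule as described above. It assumes whitespace tokenization (the card does not say how tokens were counted), reads "old, non-curated" as requiring both conditions, and uses hypothetical `collection_year` and `curated` fields that your records may not have.

```python
from typing import Optional


def matches_auto_low_rule(
    description: str,
    collection_year: Optional[int],  # hypothetical field: year of the source collection
    curated: bool,                   # hypothetical field: whether the collection was curated
) -> bool:
    """Approximate the dataset's automatic LOW-quality rule (assumed form)."""
    # Assumption: "tokens" means whitespace-separated words.
    too_short = len(description.split()) < 3
    # Assumption: only collections that are both pre-2012 AND non-curated are affected.
    old_uncurated = (
        collection_year is not None and collection_year < 2012 and not curated
    )
    return too_short or old_uncurated


if __name__ == "__main__":
    print(matches_auto_low_rule("Vaso", 2018, True))  # True: too short
    print(matches_auto_low_rule("Elemento di decorazione architettonica a rilievo", 2008, False))  # True: old, non-curated
    print(matches_auto_low_rule("Elemento di decorazione architettonica a rilievo", 2015, False))  # False
```

Records that match this rule are the ones where the training labels were least likely to reflect a cataloguer's judgement.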
Intended use
Useful for surfacing Italian metadata records that may benefit from additional human review. Before deploying, validate:
- How it performs on your specific data.
- Whether you agree with the original dataset's quality definitions.
Best used in a human-in-the-loop pipeline: flag low-quality records for catalogue review rather than making automatic accept/reject decisions.
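One way to put this into practice is to send only the records the model considers likely LOW to a review queue, ordered by risk. A minimal sketch, assuming the binary pipeline output described in the comments; `flag_for_review`, the 0.5 default threshold, the record ids and the `"Low_Quality"` label name are all illustrative and not part of this model card. Print `model.config.id2label` to find the model's actual label names.

```python
from typing import Callable, Dict, List, Tuple

# A classifier maps a batch of texts to one {"label": str, "score": float} dict
# per text, which is what the transformers text-classification pipeline returns
# for a list input with default settings.
Classifier = Callable[[List[str]], List[Dict]]


def flag_for_review(
    records: Dict[str, str],
    classify: Classifier,
    low_label: str,
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Return (record_id, p_low) pairs with p_low >= threshold, most likely LOW first."""
    ids = list(records)
    predictions = classify([records[i] for i in ids])
    flagged = []
    for record_id, pred in zip(ids, predictions):
        # Binary model: P(LOW) is the score if LOW was predicted, else its complement.
        p_low = pred["score"] if pred["label"] == low_label else 1.0 - pred["score"]
        if p_low >= threshold:
            flagged.append((record_id, p_low))
    # Highest-risk records first, so reviewers see them early.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    from transformers import pipeline

    pipe = pipeline(
        "text-classification",
        model="small-models-for-glam/cultural_heritage_metadata_accuracy",
    )
    # Check the real label names before choosing low_label.
    print(pipe.model.config.id2label)

    records = {  # hypothetical record ids and descriptions
        "rec-001": "Elemento di decorazione architettonica a rilievo",
        "rec-002": "Frammento",
    }
    # Hypothetical label name: replace with the low-quality label printed above.
    for record_id, p_low in flag_for_review(records, pipe, low_label="Low_Quality"):
        print(record_id, round(p_low, 3))
```

Passing the classifier in as a callable keeps the review logic testable without downloading the model; the threshold is the main knob for trading reviewer workload against missed records.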
Usage
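```python
from transformers import pipeline

# Load the classifier from the Hugging Face Hub
pipe = pipeline(
    "text-classification",
    model="small-models-for-glam/cultural_heritage_metadata_accuracy",
)

# Returns a list with one {"label": ..., "score": ...} dict per input
result = pipe("Elemento di decorazione architettonica a rilievo")
print(result)
```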
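The model can also be loaded directly with `AutoTokenizer` and `AutoModelForSequenceClassification`, which makes it easy to get a probability for every label instead of only the top one. A sketch of that route; `score_descriptions` is a hypothetical helper name, and batching with padding and truncation is a choice made here rather than something the card specifies. The heavy imports sit inside the function so the module can be imported without `torch` installed.

```python
from typing import Dict, List

MODEL_ID = "small-models-for-glam/cultural_heritage_metadata_accuracy"


def score_descriptions(texts: List[str], model_id: str = MODEL_ID) -> List[Dict[str, float]]:
    """Return a {label: probability} dict for each input description."""
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()  # inference mode: disables dropout

    # Pad to the longest text in the batch; truncate anything over the model's limit.
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)

    # id2label maps class indices to the label names stored with the model.
    id2label = model.config.id2label
    return [
        {id2label[i]: float(p) for i, p in enumerate(row)}
        for row in probs.tolist()
    ]


if __name__ == "__main__":
    for scores in score_descriptions(["Elemento di decorazione architettonica a rilievo"]):
        print(scores)
```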
Validation metrics
| Metric | Value |
|---|---|
| Accuracy | 0.972 |
| Macro F1 | 0.972 |
| Loss | 0.085 |
Trained via AutoTrain (binary classification). CO2 emissions: 7.17 g.
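To check whether figures like these hold on your own collection, hand-label a sample of records and compare those labels with the model's predictions. A dependency-free sketch of the two label-based metrics in the table; `accuracy_and_macro_f1` is a hypothetical helper, and macro F1 here is the unweighted mean of per-label F1 over every label seen in either list, which matches the default behaviour of scikit-learn's `f1_score(average="macro")`.

```python
from typing import Dict, Hashable, Sequence


def accuracy_and_macro_f1(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable]
) -> Dict[str, float]:
    """Accuracy and macro-averaged F1 for gold labels vs. predictions."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    f1_scores = []
    for label in set(y_true) | set(y_pred):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    # Macro average: each label counts equally, however frequent it is.
    macro_f1 = sum(f1_scores) / len(f1_scores)
    return {"accuracy": accuracy, "macro_f1": macro_f1}


if __name__ == "__main__":
    gold = ["HIGH", "HIGH", "LOW", "LOW"]  # hypothetical reviewer labels
    pred = ["HIGH", "LOW", "LOW", "LOW"]   # hypothetical model predictions
    print(accuracy_and_macro_f1(gold, pred))
```

Macro F1 is the more informative of the two if your sample is dominated by one class, since accuracy alone can look high when the model simply predicts the majority label.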
Limitations
- Italian only. Trained on Italian metadata; will not generalise to other languages without further fine-tuning.
- Domain-bound. The training data consists of Cultura Italia records; performance on other Italian cataloguing traditions (e.g. archives, or museums using different schemas) is unverified.
- ICCD-defined notion of "quality". The model learned ICCD compliance as its notion of quality; whether that matches your own definition of quality is a separate question.
Part of the small-models-for-glam collection: task- and domain-specific models for libraries, archives, and museums.