	---
	language: en
	license: apache-2.0
	library_name: transformers
	tags:
	- bert
	- text-classification
	- privacy-policy
	- gdpr
	- torchscript
	datasets:
	- MAPP-116
	metrics:
	- f1
	model-index:
	- name: PARENT BERT
	  results:
	  - task:
	      type: text-classification
	    dataset:
	      name: MAPP-116
	      type: text
	    metrics:
	    - name: f1
	      type: score
	      value: 0.80 # replace with your actual F1 score
	---




	# PARENT BERT Models for Privacy Policy Analysis

	This repository contains TorchScript versions of 15 fine-tuned BERT models used in the PARENT project to analyse mobile app privacy policies. These models identify what data is collected, why it is collected, and how it is processed, helping assess GDPR compliance.

	They are part of a hybrid framework designed for non-technical users, particularly parents concerned about children’s privacy.

	---

	## Model Purpose

	- Segment privacy policies to detect:
	  - Data collection types (e.g., contact info, location)
	  - Purpose of data collection
	  - How data is processed
	- Support GDPR compliance evaluation
	- Detect potential third-party sharing (in combination with a logistic regression model)
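
The usage code in this card applies a sigmoid to a single model's output, which suggests that each of the 15 models acts as a binary classifier for one label. Under that assumption, annotating a policy segment amounts to running the label-specific models and keeping the labels whose probability clears a threshold. The sketch below illustrates only that aggregation step: the function names, the logit values, the 0.5 threshold, and every label name other than `Information Type_Contact information` are hypothetical and are not taken from the PARENT project.

```python
import math


def sigmoid(logit: float) -> float:
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


def detected_labels(logits_by_label: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Return, in sorted order, the labels whose probability meets the threshold."""
    return sorted(
        label
        for label, logit in logits_by_label.items()
        if sigmoid(logit) >= threshold
    )


# Hypothetical logits for one policy segment, one value per label-specific model.
example_logits = {
    "Information Type_Contact information": 2.3,
    "Information Type_Location": -1.7,  # hypothetical label name
    "Purpose_Advertising": 0.4,         # hypothetical label name
}

labels = detected_labels(example_logits)
print(labels)
```

A fixed 0.5 cut-off is only a starting point; a per-label threshold tuned on validation data may suit some labels better.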

	---
	## References

	- MAPP Dataset: Arora, S., Hosseini, H., Utz, C., Bannihatti Kumar, V., Dhellemmes, T., Ravichander, A., Story, P., Mangat, J., Chen, R., Degeling, M., Norton, T.B., Hupperich, T., Wilson, S., & Sadeh, N.M. (2022). A tale of two regulatory regimes: Creation and analysis of a bilingual privacy policy corpus. Proceedings of the International Conference on Language Resources and Evaluation (LREC 2022). [PDF link](https://aclanthology.org/2022.lrec-1.585.pdf) [Accessed 12 July 2025].

	---

	## Usage

	```python
	import torch
	from transformers import BertTokenizerFast
	from huggingface_hub import hf_hub_download

	device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
	REPO_ID = "Bnaad/PARENT_bert"

	# Load tokenizer
	tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

	# Load one TorchScript model from Hugging Face
	label_name = "Information Type_Contact information"
	safe_label = label_name.replace(" ", "_").replace("/", "_")
	filename = f"torchscript_{safe_label}.pt"
	model_path = hf_hub_download(repo_id=REPO_ID, filename=filename)
	model = torch.jit.load(model_path, map_location=device)
	model.to(device)
	model.eval()

	# Example inference
	sample_text = """For any questions about your account or our services, please contact our customer support team by emailing support@example.com, calling +1-800-555-1234, or visiting our office at 123 Main Street, Springfield, IL, 62701 during business hours"""
	inputs = tokenizer(
	    sample_text,
	    return_tensors="pt",
	    truncation=True,
	    padding="max_length",
	    max_length=512
	).to(device)

	with torch.no_grad():
	    outputs = model(inputs["input_ids"], inputs["attention_mask"])

	# Raw model output (logits)
	print("Logits:", outputs)
	# Apply a sigmoid to convert the logits into probabilities
	prob = torch.sigmoid(outputs.squeeze())
	print(prob)