
📝 Multilingual BART Text Summarization - Universal Version

🎯 What's new in this version

✨ Support for ALL file formats!

This version extends the original application to accept documents in any format:

📄 Text documents

PDF (PyPDF2, with pdfplumber as fallback)
DOC (legacy Microsoft Word)
DOCX (modern Microsoft Word)
RTF (Rich Text Format)
ODT (OpenDocument Text)
TXT (plain text)
MD (Markdown)

📊 Spreadsheets

XLSX (modern Excel)
XLS (legacy Excel)
CSV (comma-separated values)

🎨 Presentations

PPTX (PowerPoint)

📚 eBooks and web

EPUB (eBooks)
HTML/HTM (web pages)

➕ Universal fallback

For any other format, the application falls back to textract.

🚀 Installation

1. Clone or download the files

git clone <your-repo>
cd <your-repo>

2. Install the dependencies

pip install -r requirements.txt

3. System dependencies (optional, for legacy .doc files)

# Ubuntu/Debian
sudo apt-get install antiword

# macOS
brew install antiword

💻 Running locally

python app.py

Then open your browser at the address shown in the console (usually http://localhost:7860).

☁️ Deploying to Hugging Face Spaces

1. Create a new Space

Go to huggingface.co/spaces
Click "Create new Space"
Choose "Gradio" as the SDK
Name your Space

2. Upload the files

Upload these files to your Space:

app.py (main code)
requirements.txt (dependencies)
README.md (documentation)

3. Configuration (optional)

If you want .doc support, create a packages.txt file (Spaces installs the Debian packages listed in it with apt) containing:

antiword

4. The Space builds automatically.

📋 Features

🌍 Multilingual

Input: 100+ languages, detected automatically
Output: 15 available languages (Français, English, Español, Deutsch, Italiano, Português, Русский, 中文, 日本語, 한국어, العربية, हिन्दी, Nederlands, Polski, Türkçe)

📏 Adjustable length

Short: ≈80 words
Medium: ≈150 words
Long: ≈250 words
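BART's generation limits are counted in tokens, not words, so the word targets above have to be converted somewhere. The sketch below shows one way to map each preset to transformers generation arguments. It assumes roughly 4 tokens per 3 English words and uses half the budget as an arbitrary minimum. The preset keys, the ratio and generation_kwargs are all hypothetical, not taken from app.py.

```python
from typing import Dict

# Target summary lengths in words, matching the presets listed above
LENGTH_PRESETS_WORDS: Dict[str, int] = {"short": 80, "medium": 150, "long": 250}

# Assumption: about 4 BART tokens per 3 English words (integer ratio keeps math exact)
TOKENS_PER_WORD = (4, 3)


def generation_kwargs(preset: str) -> Dict[str, object]:
    """Translate a word-based preset into max/min token limits for generation."""
    words = LENGTH_PRESETS_WORDS[preset]
    num, den = TOKENS_PER_WORD
    max_tokens = words * num // den
    # Arbitrary floor at half the budget so the summary is not cut very short
    return {"max_length": max_tokens, "min_length": max_tokens // 2, "do_sample": False}
```

The resulting dict would be passed as keyword arguments to the summarization pipeline call. With these assumptions, "short" gives max_length 106 and min_length 53.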

📊 Statistics

Displayed automatically:

Original word count
Summary word count
Compression rate
Detected language
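The statistics above can be computed from plain whitespace word counts. The README does not define "compression rate", so this sketch assumes it means the percentage of words removed. summary_stats and its output keys are hypothetical names.

```python
from typing import Dict, Union


def summary_stats(original: str, summary: str, detected_lang: str) -> Dict[str, Union[int, float, str]]:
    """Word counts and compression for one summary (whitespace tokenization)."""
    original_words = len(original.split())
    summary_words = len(summary.split())
    # Assumption: compression rate = share of original words removed, in percent
    rate = round(100 * (1 - summary_words / original_words), 1) if original_words else 0.0
    return {
        "original_words": original_words,
        "summary_words": summary_words,
        "compression_pct": rate,
        "detected_lang": detected_lang,
    }
```

For example, a 200-word text summarized in 50 words gives a compression of 75.0%. Whitespace splitting undercounts words in languages written without spaces, such as Chinese or Japanese.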

🔧 Technical architecture

Multi-format text extraction

read_file(file) -> text

Uses a cascade of libraries:

Format-specific: PyPDF2, python-docx, etc.
Fallback 1: alternative libraries (pdfplumber, etc.)
Fallback 2: textract (universal)
Fallback 3: read as plain text
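Here is a minimal sketch of the cascade above, covering only PDF, DOCX, TXT and MD for brevity. Each extractor either returns text or raises, and a missing optional library surfaces as an ImportError. The cascade moves on until one extractor returns non-empty text. Only the name read_file comes from this README. The helper names and the EXTRACTORS table are hypothetical, and the real app.py may order or handle failures differently.

```python
from pathlib import Path
from typing import Callable, Dict, List, Union


def read_plain(path: Path) -> str:
    # Last resort: decode bytes as UTF-8, dropping anything undecodable
    return path.read_text(encoding="utf-8", errors="ignore")


def read_pdf_pypdf2(path: Path) -> str:
    from PyPDF2 import PdfReader  # ImportError if PyPDF2 is not installed
    reader = PdfReader(str(path))
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def read_pdf_pdfplumber(path: Path) -> str:
    import pdfplumber
    with pdfplumber.open(str(path)) as pdf:
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def read_docx(path: Path) -> str:
    import docx  # installed as python-docx
    return "\n".join(p.text for p in docx.Document(str(path)).paragraphs)


def read_textract(path: Path) -> str:
    import textract
    return textract.process(str(path)).decode("utf-8", errors="ignore")


# Format-specific extractors (first choice, then alternatives), keyed by extension
EXTRACTORS: Dict[str, List[Callable[[Path], str]]] = {
    ".pdf": [read_pdf_pypdf2, read_pdf_pdfplumber],
    ".docx": [read_docx],
    ".txt": [read_plain],
    ".md": [read_plain],
}


def read_file(file: Union[str, Path]) -> str:
    path = Path(file)
    specific = EXTRACTORS.get(path.suffix.lower(), [])
    # Universal fallbacks go last; dict.fromkeys drops duplicates but keeps order
    chain = list(dict.fromkeys(specific + [read_textract, read_plain]))
    errors = []
    for extractor in chain:
        try:
            text = extractor(path)
        except Exception as exc:  # missing library, parse error, unsupported type
            errors.append(f"{extractor.__name__}: {exc}")
            continue
        if text.strip():
            return text
    raise ValueError("No extractor returned text:\n" + "\n".join(errors))
```

Collecting every extractor's error makes the final message show why each step failed. That helps when diagnosing a missing optional dependency. With Gradio, you would pass the uploaded file's path to read_file.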

Summarization model

Base: BART Large CNN
Fine-tuned: karimhoucem/Multilingual_Text_Summarization_System-BART_v1.0.9
Optimizations: automatic GPU/CPU selection
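The automatic GPU/CPU selection could look like the sketch below. It assumes the fine-tuned checkpoint loads with the standard transformers summarization pipeline, and pick_device and load_summarizer are hypothetical names, not functions from app.py. In transformers pipelines, device=0 selects the first CUDA GPU and device=-1 selects the CPU.

```python
MODEL_ID = "karimhoucem/Multilingual_Text_Summarization_System-BART_v1.0.9"


def pick_device(cuda_available: bool) -> int:
    # transformers pipeline convention: 0 = first CUDA GPU, -1 = CPU
    return 0 if cuda_available else -1


def load_summarizer(model_id: str = MODEL_ID):
    # Heavy imports stay local so this module imports even without torch installed
    import torch
    from transformers import pipeline

    return pipeline(
        "summarization",
        model=model_id,
        device=pick_device(torch.cuda.is_available()),
    )
```

Usage, assuming the model downloads successfully: summarizer = load_summarizer(), then summarizer(text, truncation=True)[0]["summary_text"]. Passing truncation=True cuts the input to the tokenizer's maximum length instead of raising an error on long texts.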

Translation pipeline

Language detection (langdetect)
Translation to English (deep-translator)
Summarization in English (BART)
Translation to the target language (deep-translator)
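The four steps above can be sketched with the detector, translator and summarizer passed in as functions. This keeps the pipeline testable without network access. summarize_multilingual and default_backends are hypothetical names.

The sketch also makes two further assumptions. It translates to English with source="auto" rather than the detected code, because langdetect codes such as "zh-cn" do not always match the codes deep-translator expects. It also skips translation when the text is already in English.

```python
from typing import Callable, Tuple


def summarize_multilingual(
    text: str,
    target_lang: str,
    detect: Callable[[str], str],
    translate: Callable[[str, str], str],  # (text, target_lang) -> translated text
    summarize: Callable[[str], str],
) -> Tuple[str, str]:
    source_lang = detect(text)                                      # 1. detect language
    english = text if source_lang == "en" else translate(text, "en")  # 2. to English
    summary_en = summarize(english)                                 # 3. summarize in English
    if target_lang == "en":
        return summary_en, source_lang
    return translate(summary_en, target_lang), source_lang          # 4. to target language


def default_backends():
    # Real backends; deep-translator's GoogleTranslator needs network access
    from langdetect import detect
    from deep_translator import GoogleTranslator

    def translate(text: str, target: str) -> str:
        return GoogleTranslator(source="auto", target=target).translate(text)

    return detect, translate
```

In the app, summarize would wrap the BART pipeline. For example: detect, translate = default_backends(), then summarize_multilingual(text, "fr", detect, translate, my_summarize), where my_summarize is a hypothetical wrapper around the summarizer.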

🐛 Troubleshooting

Error: "❌ [bibliothèque] non installé" ([library] not installed)

Solution: install the missing library:

pip install [library]

Errors with legacy .doc files

Solution:

Install antiword (system package)
Or install textract: pip install textract
Or convert the file to .docx

Memory errors with large files

Solution: the application automatically truncates the input to 1024 tokens (BART's maximum input length), so any text beyond that limit is not summarized. For very large documents, split them into sections.
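Splitting a large document into sections can be automated by chunking the text and summarizing each chunk separately. This is a minimal sketch. It assumes that about 700 words of English fits under BART's 1024-token limit, which is a rough heuristic, and that simply concatenating the partial summaries is acceptable. chunk_words and summarize_long are hypothetical names.

```python
from typing import Callable, List


def chunk_words(text: str, max_words: int = 700) -> List[str]:
    # Split on whitespace and regroup into chunks of at most max_words words
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long(text: str, summarize: Callable[[str], str], max_words: int = 700) -> str:
    # Summarize each chunk independently, then join the partial summaries
    return " ".join(summarize(chunk) for chunk in chunk_words(text, max_words))
```

Chunking on word boundaries can split a sentence in two. Splitting on paragraphs or sentences instead would give the model cleaner inputs, at the cost of uneven chunk sizes.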

📦 Dependency details

Required

gradio (user interface)
transformers (BART model)
torch (ML backend)
deep-translator (translation)
langdetect (language detection)

Optional (per format)

PyPDF2, pdfplumber → PDF
python-docx → DOCX
striprtf → RTF
odfpy → ODT
ebooklib, beautifulsoup4 → EPUB, HTML
openpyxl → Excel (.xlsx; openpyxl does not read legacy .xls files)
python-pptx → PowerPoint
textract → universal fallback
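To see which of the optional extractors above are available before uploading files, a small check script can help. This sketch is a hypothetical helper, not part of the app. It uses importlib.util.find_spec and maps each pip package name to its import name, which differs for several packages: python-docx imports as docx, odfpy as odf, beautifulsoup4 as bs4, and python-pptx as pptx.

```python
import importlib.util
from typing import Dict, List

# pip package name -> top-level import name
OPTIONAL_DEPS: Dict[str, str] = {
    "PyPDF2": "PyPDF2",
    "pdfplumber": "pdfplumber",
    "python-docx": "docx",
    "striprtf": "striprtf",
    "odfpy": "odf",
    "ebooklib": "ebooklib",
    "beautifulsoup4": "bs4",
    "openpyxl": "openpyxl",
    "python-pptx": "pptx",
    "textract": "textract",
}


def missing_optional_deps(deps: Dict[str, str] = OPTIONAL_DEPS) -> List[str]:
    # find_spec returns None when a top-level module cannot be found
    return [pip_name for pip_name, module in deps.items() if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_optional_deps()
    if missing:
        print("pip install " + " ".join(missing))
    else:
        print("All optional extractors are available.")
```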

📄 License

Same license as the original project.

🙏 Credits

Original model: karimhoucem/Multilingual_Text_Summarization_System-BART_v1.0.9
Enhancement: universal multi-format support
Based on: BART (Facebook AI), Transformers (Hugging Face)

🔄 Changelog

Version 2.0 (Universal Format Support)

✅ Support for ALL document formats
✅ Improved PDF extraction (PyPDF2 + pdfplumber)
✅ Legacy .doc support (antiword + textract)
✅ Spreadsheet support (Excel, CSV)
✅ Presentation support (PowerPoint)
✅ eBook support (EPUB)
✅ Universal fallback with textract
✅ Detailed error messages
✅ Removed the file_types restriction

Version 1.0 (Original)

✅ Basic support: TXT, MD, PDF, DOCX
✅ Multilingual summarization
✅ Gradio interface

📧 Contact

For questions or suggestions, open an issue on the repo.
