---
title: LexiMind
emoji: 🧠
colorFrom: blue
colorTo: indigo
sdk: docker
app_file: scripts/demo_gradio.py
pinned: false
---
# LexiMind: A Multi-Task NLP Model
LexiMind is a multi-task Natural Language Processing model designed for complex document understanding. It pairs a custom-built Transformer architecture with weight initialization from Google's FLAN-T5, combining the flexibility of a from-scratch implementation with the strength of a modern pre-trained model.
The model performs three tasks simultaneously: text summarization, emotion classification, and topic clustering.
This project is built with industry-standard MLOps practices, including configuration management with Hydra, experiment tracking with MLflow, and containerization with Docker, making it a reproducible and scalable solution.
## Core Features
- Abstractive Summarization: Generates concise, coherent summaries of long-form text using encoder-decoder attention.
- Emotion Classification: Identifies emotions (Joy, Sadness, Anger, Fear, Love, Surprise) conveyed in a document.
- Topic Clustering: Classifies documents into thematic categories (World, Sports, Business, Sci/Tech).
## Model Architecture
LexiMind implements a from-scratch Transformer with modern architectural choices (a minimal block sketch follows the feature list below):
### Custom Transformer Features
- Pre-Layer Normalization (Pre-LN): RMSNorm applied before each sublayer for stable training
- FlashAttention: Via PyTorch 2.0's `scaled_dot_product_attention` for efficient computation
- Learned Positional Embeddings: Trainable position representations
- Multi-Head Attention: 12 heads with 768-dimensional representations
- RMSNorm: Modern normalization without bias (more efficient than LayerNorm)
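To make these choices concrete, here is a minimal, self-contained sketch of a Pre-LN encoder block in the same spirit: RMSNorm before each sublayer and attention routed through PyTorch's `scaled_dot_product_attention`. Dimensions match the spec table below, but the class and parameter names (and the GELU feed-forward) are illustrative assumptions; the actual implementation lives in `src/models/`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Root-mean-square normalization with a learned scale and no bias."""

    def __init__(self, dim: int, eps: float = 1e-6) -> None:
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight


class PreLNEncoderBlock(nn.Module):
    """Pre-LN block: normalize, attend, add residual; normalize, FFN, add residual."""

    def __init__(self, d_model: int = 768, n_heads: int = 12, d_ff: int = 2048, dropout: float = 0.1) -> None:
        super().__init__()
        self.n_heads, self.head_dim = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)
        self.out = nn.Linear(d_model, d_model, bias=False)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff, bias=False),
            nn.GELU(),
            nn.Linear(d_ff, d_model, bias=False),
        )
        self.norm1, self.norm2 = RMSNorm(d_model), RMSNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        # Attention sublayer: normalize first (Pre-LN), then attend, then add the residual.
        h = self.norm1(x)
        q, k, v = self.qkv(h).chunk(3, dim=-1)
        q, k, v = (z.view(b, t, self.n_heads, self.head_dim).transpose(1, 2) for z in (q, k, v))
        attn = F.scaled_dot_product_attention(q, k, v)  # dispatches to FlashAttention when available
        x = x + self.dropout(self.out(attn.transpose(1, 2).reshape(b, t, d)))
        # Feed-forward sublayer, also pre-normalized.
        return x + self.dropout(self.ffn(self.norm2(x)))
```

Keeping the residual stream unnormalized (Pre-LN rather than Post-LN) is what makes deep stacks like this one noticeably easier to train.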
### Pre-trained Weight Initialization
The model loads weights from Google's FLAN-T5-base (a minimal loading sketch follows this list), which provides:
- Strong language understanding from instruction-tuning
- Excellent performance on summarization and classification tasks
- Encoder-decoder architecture matching our custom implementation
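As a rough illustration of that initialization step (the project's actual mapping lives in `src/models/factory.py`), the FLAN-T5 checkpoint can be pulled with the `transformers` library and matching tensors copied into the custom modules. The embedding copy below is the simplest such transfer; `token_embedding` is a stand-in, not the project's real attribute name.

```python
import torch
import torch.nn as nn
from transformers import T5ForConditionalGeneration

# Pull the pretrained checkpoint and read its parameters.
hf_state = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base").state_dict()

# Stand-in for the custom model's token embedding (the real modules live in src/models/).
vocab_size, d_model = hf_state["shared.weight"].shape  # 32128 x 768 for flan-t5-base
token_embedding = nn.Embedding(vocab_size, d_model)

# Copy the shared embedding matrix; factory.py performs this kind of transfer layer by layer.
with torch.no_grad():
    token_embedding.weight.copy_(hf_state["shared.weight"])
```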
### Multi-Task Learning
A shared encoder-decoder backbone feeds task-specific heads (a pooled-head sketch follows this list):
- Summarization Head: Language modeling head with weight tying
- Emotion Head: Mean-pooled classification with dropout
- Topic Head: Mean-pooled classification with dropout
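The pooled heads are straightforward. Below is an illustrative (not project-exact) mean-pooling classifier over encoder hidden states with dropout; six classes matches the emotion task, four would match the topic task.

```python
import torch
import torch.nn as nn


class MeanPooledClassifier(nn.Module):
    """Mask-aware mean pooling over the sequence, then dropout and a linear classifier."""

    def __init__(self, d_model: int = 768, num_classes: int = 6, dropout: float = 0.1) -> None:
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        self.proj = nn.Linear(d_model, num_classes)

    def forward(self, hidden: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, d_model); attention_mask: (batch, seq), 1 for real tokens, 0 for padding.
        mask = attention_mask.unsqueeze(-1).to(hidden.dtype)
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
        return self.proj(self.dropout(pooled))
```

The summarization head, by contrast, ties its output projection to the token-embedding matrix, which saves parameters and commonly helps generation quality.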
## Technical Specifications
| Component | Specification |
|---|---|
| Architecture | Encoder-Decoder Transformer |
| Pre-trained Base | google/flan-t5-base |
| Hidden Dimension | 768 |
| Encoder Layers | 12 |
| Decoder Layers | 12 |
| Attention Heads | 12 |
| FFN Dimension | 2048 |
| Normalization | RMSNorm (Pre-LN) |
| Position Encoding | Learned Embeddings |
| Max Sequence Length | 512 tokens |
## Getting Started
### Prerequisites
- Python 3.10+
- Poetry for dependency management
- Docker (for containerized deployment)
- An NVIDIA GPU with CUDA support (for training and accelerated inference)
### Installation

Clone the repository:

```bash
git clone https://github.com/OliverPerrin/LexiMind.git
cd LexiMind
```

Install dependencies:

```bash
poetry install
```

Download and preprocess data:

```bash
poetry run python scripts/download_data.py
poetry run python scripts/preprocess_data.py
```
## Usage
### Configuration
All training and model parameters are managed via Hydra. Configurations live in the `configs/` directory.

Available configurations (an entry-point sketch follows this list):

- `model=base` - FLAN-T5-base (default, 12 layers)
- `model=small` - Smaller model for testing (no pretrained weights)
- `model=large` - FLAN-T5-large (24 layers, requires more VRAM)
- `training=dev` - Quick development run
- `training=medium` - Balanced training (~2-3 hours on an RTX 4070)
- `training=full` - Full training run
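For orientation, a Hydra-managed entry point typically looks like the sketch below. The decorator and command-line override syntax are standard Hydra; the root `config_name` and the exact fields on `cfg` are assumptions rather than the project's verified layout.

```python
import hydra
from omegaconf import DictConfig, OmegaConf


@hydra.main(config_path="../configs", config_name="config", version_base=None)
def main(cfg: DictConfig) -> None:
    # Hydra composes configs/ plus any command-line overrides into one config object.
    print(OmegaConf.to_yaml(cfg))
    # e.g. cfg.model and cfg.training would drive model building and the training loop here.


if __name__ == "__main__":
    main()
```

Overrides such as `training=dev` or `training.optimizer.lr=5e-5` swap config groups or individual values before `main` runs, which is exactly what the training commands below rely on.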
### Training

```bash
# Default training with FLAN-T5-base
poetry run python scripts/train.py

# Quick development run
poetry run python scripts/train.py training=dev

# Medium training run (recommended for RTX 4070)
poetry run python scripts/train.py training=medium

# Override parameters
poetry run python scripts/train.py training.optimizer.lr=5e-5

# Resume from a checkpoint
poetry run python scripts/train.py training=full resume_from=checkpoints/epoch_5.pt
```
Experiments are automatically tracked with MLflow. View results with `mlflow ui`.
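Inside the training loop, MLflow tracking typically amounts to a handful of calls like the hedged sketch below; the parameter and metric names are illustrative, not the project's exact keys.

```python
import mlflow

# Group one training job under a single run so parameters, metrics, and artifacts stay together.
with mlflow.start_run(run_name="leximind-training"):
    mlflow.log_params({"model": "base", "lr": 5e-5, "epochs": 10})  # illustrative values
    for epoch in range(10):
        val_loss = 0.0  # placeholder for the real validation loss
        mlflow.log_metric("val_loss", val_loss, step=epoch)
    # mlflow.log_artifact("checkpoints/best.pt")  # optionally attach the best checkpoint
```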
### Evaluation

```bash
poetry run python scripts/evaluate.py --checkpoint checkpoints/best.pt
```
### Inference & Demo

```bash
# Command-line inference
poetry run python scripts/inference.py "Your text to analyze"

# Gradio web demo
poetry run python scripts/demo_gradio.py
```
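The web demo is a thin Gradio layer over the inference pipeline. A minimal sketch of that pattern follows; the real app is `scripts/demo_gradio.py`, and `analyze` here is a placeholder rather than the actual pipeline call.

```python
import gradio as gr


def analyze(text: str) -> tuple[str, str, str]:
    # Placeholder: the real demo runs the LexiMind inference pipeline here.
    return "A short summary...", "joy", "Sci/Tech"


demo = gr.Interface(
    fn=analyze,
    inputs=gr.Textbox(lines=8, label="Document"),
    outputs=[gr.Textbox(label="Summary"), gr.Label(label="Emotion"), gr.Label(label="Topic")],
    title="LexiMind",
)

# Hugging Face Spaces and the Docker image below both expect the server on port 7860.
demo.launch(server_name="0.0.0.0", server_port=7860)
```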
### Docker

```bash
# Build
docker build -t leximind .

# Run demo
docker run -p 7860:7860 leximind
```
## Project Structure

```
├── configs/              # Hydra configuration files
│   ├── model/            # Model architectures (base, small, large)
│   ├── training/         # Training configs (dev, medium, full)
│   └── data/             # Dataset configurations
├── src/
│   ├── models/           # Custom Transformer implementation
│   │   ├── encoder.py    # TransformerEncoder with Pre-LN RMSNorm
│   │   ├── decoder.py    # TransformerDecoder with KV-cache
│   │   ├── attention.py  # Multi-Head Attention with FlashAttention
│   │   └── factory.py    # Model building with FLAN-T5 weight loading
│   ├── data/             # Data loading and preprocessing
│   ├── training/         # Training loop with mixed precision
│   └── inference/        # Inference pipeline
├── scripts/              # Entry points
├── tests/                # Unit tests
└── notebooks/            # Analysis notebooks
```
## Code Quality
- Ruff: Fast linting and formatting
- MyPy: Static type checking
- Pytest: Full test suite covering data, models, and training
- Pre-commit hooks: Automated quality checks
```bash
# Install hooks
poetry run pre-commit install

# Lint
poetry run ruff check .

# Type check
poetry run mypy .

# Tests
poetry run pytest
```
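For a flavor of what the suite exercises, a small self-contained check of the mask-aware pooling used by the classification heads might look like this (illustrative only; the real tests live in `tests/`).

```python
import torch


def test_masked_mean_pooling_ignores_padding() -> None:
    hidden = torch.randn(2, 4, 8)
    mask = torch.tensor([[1, 1, 0, 0], [1, 1, 1, 1]]).unsqueeze(-1).float()
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
    # Padding positions must not influence the pooled representation.
    assert torch.allclose(pooled[0], hidden[0, :2].mean(dim=0), atol=1e-6)
```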
## Performance Optimizations
- `torch.compile`: JIT compilation with the Inductor backend (a combined sketch of these settings follows this list)
- Mixed Precision: bfloat16 training on Ampere/Ada GPUs
- TF32: Enabled for RTX 30xx/40xx series
- KV-Cache: Efficient autoregressive decoding
- FlashAttention: Memory-efficient attention via SDPA
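A minimal sketch of how the compile, TF32, and mixed-precision switches are typically wired together in PyTorch 2.x (generic PyTorch usage with a stand-in model, not the project's training loop; the KV-cache and FlashAttention live inside the decoder and attention modules instead):

```python
import torch

# TF32 matmuls on Ampere/Ada (RTX 30xx/40xx) GPUs.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

model = torch.nn.Linear(768, 768).cuda()  # stand-in for the real model
model = torch.compile(model)              # JIT compilation; Inductor is the default backend

x = torch.randn(8, 768, device="cuda")
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    y = model(x)                          # bfloat16 mixed-precision forward pass
```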
## License
MIT License - see LICENSE for details.