--- title: LexiMind emoji: ๐Ÿง  colorFrom: blue colorTo: indigo sdk: docker app_file: scripts/demo_gradio.py pinned: false --- # LexiMind A multi-task NLP system for literary and academic text understanding. LexiMind jointly performs **abstractive summarization**, **topic classification**, and **multi-label emotion detection** using a single encoder-decoder transformer initialized from [FLAN-T5-base](https://huggingface.co/google/flan-t5-base) (272M parameters). **[Live Demo](https://huggingface.co/spaces/OliverPerrin/LexiMind)** ยท **[Model](https://huggingface.co/OliverPerrin/LexiMind-Model)** ยท **[Discovery Dataset](https://huggingface.co/datasets/OliverPerrin/LexiMind-Discovery)** ยท **[Research Paper](docs/research_paper.tex)** ## Results | Task | Metric | Score | | ---- | ------ | ----- | | Summarization | ROUGE-1 / ROUGE-L | 0.309 / 0.185 | | Summarization (academic) | ROUGE-1 | 0.319 | | Summarization (literary) | ROUGE-1 | 0.206 | | Topic Classification | Accuracy (95% CI) | 85.7% (80.4โ€“91.0%) | | Emotion Detection | Sample-avg F1 | 0.352 | | Emotion Detection (tuned thresholds) | Sample-avg F1 / Macro F1 | 0.503 / 0.294 | Trained for 8 epochs on an RTX 4070 12GB (~9 hours) with BFloat16 mixed precision, `torch.compile`, and cosine LR decay. ## Key Findings From my research paper: - **Naive MTL produces mixed results**: topic classification benefits (+3.7% accuracy), but emotion detection suffers negative transfer (โˆ’0.02 F1) under mean pooling with round-robin scheduling. - **Learned attention pooling + temperature sampling eliminates negative transfer entirely**: emotion F1 improves from 0.199 โ†’ 0.352 (+77%), surpassing the single-task baseline (0.218). - **Summarization is robust to MTL** โ€” quality remains stable across configurations. - **FLAN-T5 pre-training is essential** โ€” random initialization produces dramatically worse results on all tasks. - **Domain gap matters**: academic summaries (ROUGE-1: 0.319) substantially outperform literary (0.206), driven by an 11:1 training data imbalance. ## Architecture LexiMind is a **from-scratch PyTorch Transformer** that loads pre-trained FLAN-T5-base weights layer by layer via a custom factory module โ€” no HuggingFace model wrappers. | Component | Detail | | --------- | ------ | | Backbone | Encoder-Decoder Transformer (272M params) | | Encoder / Decoder | 12 layers each, 768d, 12 attention heads | | Normalization | RMSNorm (Pre-LN, T5-style) | | Attention | FlashAttention via PyTorch SDPA + T5 relative position bias | | FFN | Gated-GELU (wi\_0, wi\_1, wo) | | Summarization | Full decoder โ†’ language modeling head | | Emotion (28-class multi-label) | Learned attention pooling โ†’ linear head | | Topic (7-class) | Mean pooling โ†’ linear head | ### Multi-Task Training All three tasks share the encoder. Summarization uses the full encoder-decoder; classification heads branch off the encoder output. Key training details: - **Temperature-based task sampling** (ฮฑ=0.5): allocates training steps proportional to dataset size, preventing large tasks from dominating - **Attention pooling** for emotion: a learned query attends over encoder outputs, focusing on emotionally salient tokens rather than averaging the full sequence - **Fixed loss weights**: summarization=1.0, emotion=1.0, topic=0.3 (reduced to prevent overfitting on the small topic dataset) - **Frozen encoder layers 0โ€“3**: preserves FLAN-T5's language understanding in lower layers - **Gradient conflict diagnostics**: optional inter-task gradient cosine similarity monitoring See [docs/architecture.md](docs/architecture.md) for full implementation details, weight loading tables, and training configuration rationale. ## Training Data | Task | Source | Samples | | ---- | ------ | ------- | | Summarization | Gutenberg + Goodreads descriptions (literary) | ~4K | | Summarization | arXiv body โ†’ abstract (academic) | ~45K | | Topic | Gutenberg + arXiv metadata โ†’ 7 categories | 3,402 | | Emotion | GoEmotions โ€” Reddit comments, 28 labels | 43,410 | For summarization, the model learns to produce descriptive summaries โ€” what a book *is about* โ€” rather than plot recaps, by pairing Gutenberg full texts with Goodreads descriptions and arXiv papers with their abstracts. ## Getting Started ### Prerequisites - Python 3.10+ - NVIDIA GPU with CUDA (for training; CPU works for inference) ### Installation ```bash git clone https://github.com/OliverPerrin/LexiMind.git cd LexiMind pip install -r requirements.txt ``` ### Training ```bash # Full training (~9 hours on RTX 4070 12GB) python scripts/train.py training=full # Quick dev run python scripts/train.py training=dev # Override parameters python scripts/train.py training=full training.optimizer.lr=5e-5 # Resume from checkpoint python scripts/train.py training=full resume_from=checkpoints/epoch_5.pt ``` Experiments are tracked with MLflow (`mlflow ui` to browse). ### Evaluation ```bash python scripts/evaluate.py python scripts/evaluate.py --skip-bertscore # faster python scripts/evaluate.py --tune-thresholds # per-class threshold tuning ``` ### Inference ```bash # Command-line python scripts/inference.py "Your text to analyze" # Gradio web demo python scripts/demo_gradio.py ``` ### Profiling ```bash # Profile GPU usage (CUDA kernels, memory, Chrome trace) python scripts/profile_training.py ``` ### Docker ```bash docker build -t leximind . docker run -p 7860:7860 leximind ``` ## Project Structure ```text src/ โ”œโ”€โ”€ models/ # Encoder, decoder, attention, FFN, heads, factory โ”œโ”€โ”€ data/ # Datasets, dataloaders, tokenization, cross-task dedup โ”œโ”€โ”€ training/ # Trainer (AMP, grad accum, temperature sampling), metrics โ”œโ”€โ”€ inference/ # Pipeline + factory for checkpoint loading โ”œโ”€โ”€ api/ # FastAPI REST endpoint โ””โ”€โ”€ utils/ # Device detection, checkpointing, label I/O scripts/ โ”œโ”€โ”€ train.py # Hydra training entry point โ”œโ”€โ”€ evaluate.py # Full evaluation suite โ”œโ”€โ”€ inference.py # CLI inference โ”œโ”€โ”€ demo_gradio.py # Gradio discovery demo โ”œโ”€โ”€ profile_training.py # PyTorch profiler โ”œโ”€โ”€ train_multiseed.py # Multi-seed training with aggregation โ”œโ”€โ”€ visualize_training.py # Training curve visualization โ”œโ”€โ”€ download_data.py # Dataset downloader โ””โ”€โ”€ build_discovery_dataset.py # Pre-compute discovery dataset configs/ # Hydra configs (model, training, data) docs/ # Research paper + architecture documentation tests/ # Pytest suite ``` ## Code Quality ```bash ruff check . # Linting mypy src/ scripts/ tests/ # Type checking pytest # Tests pre-commit run --all-files # All checks ``` ## License GPL-3.0 โ€” see [LICENSE](LICENSE) for details. --- Built by Oliver Perrin ยท Appalachian State University ยท 2025โ€“2026