Paper: A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents (arXiv:1804.05685)
This model is a fine-tuned version of google/pegasus-x-base on the arxiv-summarization dataset. Its evaluation-set loss at each logged step is reported in the training table below. The training setup was as follows:
Base Model: google/pegasus-x-base (a PEGASUS variant designed for long-input summarization)
Finetuning Dataset: arxiv-summarization
GPU: 1 x RTX A6000
Train time: About 24 hours for 3 epochs
Test time: About 8 hours for the test dataset
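
As a rough illustration of how a checkpoint like this might be used, the sketch below loads a PEGASUS-X model with the Hugging Face `transformers` library and generates a summary. It is a minimal sketch under several assumptions: the function name `summarize` is hypothetical, the default `model_id` is the base model (substitute this model's own repository ID, which is not stated here), and the input limit, beam count, and output length are illustrative values rather than the settings used for this model. It also assumes `transformers`, `torch`, and `sentencepiece` are installed.

```python
def summarize(text, model_id="google/pegasus-x-base",
              max_input_tokens=16384, max_new_tokens=256):
    """Summarize one long document with a PEGASUS-X checkpoint.

    All default values are assumptions for illustration only.
    """
    # Imported lazily so that defining the function does not require
    # the libraries or a model download.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Truncate inputs that exceed the assumed context budget.
    inputs = tokenizer(
        text,
        truncation=True,
        max_length=max_input_tokens,
        return_tensors="pt",
    )

    # Beam search with an assumed beam width of 4.
    summary_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        num_beams=4,
    )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```

A call would look like `summarize(paper_text, model_id=...)`, where `paper_text` is the full text of an article and `model_id` points at the fine-tuned checkpoint. On a single GPU, moving the model and inputs to the device and using half precision may be needed for very long inputs.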
Training ran for 3 epochs. Training and validation loss were logged every 390 steps:
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 3.401 | 0.33 | 390 | 2.3985 |
| 2.5444 | 0.67 | 780 | 2.2461 |
| 2.4849 | 1.0 | 1170 | 2.2690 |
| 2.5735 | 1.33 | 1560 | 2.3334 |
| 2.7045 | 1.66 | 1950 | 2.4330 |
| 2.8939 | 2.0 | 2340 | 2.5461 |
| 3.0773 | 2.33 | 2730 | 2.6502 |
| 3.2149 | 2.66 | 3120 | 2.7039 |
| 3.2844 | 3.0 | 3510 | 2.7262 |
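
In the table above, validation loss is lowest at step 780 and rises at every later logged step. The short sketch below shows one way to pick out that checkpoint programmatically. The values are copied from the table; the names `LOG` and `best_checkpoint` are hypothetical helpers, not part of any training script for this model.

```python
# (epoch, step, training_loss, validation_loss), copied from the table above
LOG = [
    (0.33, 390, 3.401, 2.3985),
    (0.67, 780, 2.5444, 2.2461),
    (1.0, 1170, 2.4849, 2.2690),
    (1.33, 1560, 2.5735, 2.3334),
    (1.66, 1950, 2.7045, 2.4330),
    (2.0, 2340, 2.8939, 2.5461),
    (2.33, 2730, 3.0773, 2.6502),
    (2.66, 3120, 3.2149, 2.7039),
    (3.0, 3510, 3.2844, 2.7262),
]


def best_checkpoint(log):
    """Return the logged row with the lowest validation loss."""
    return min(log, key=lambda row: row[3])


best = best_checkpoint(LOG)
final = LOG[-1]

# How much validation loss increased between the best and the final step.
gap = round(final[3] - best[3], 4)

print(f"best step: {best[1]} (epoch {best[0]}), validation loss {best[3]}")
print(f"final step: {final[1]}, validation loss {final[3]}, gap {gap}")
```

If intermediate checkpoints were saved, this kind of check suggests comparing the step-780 checkpoint against the final one on the downstream summarization metric before choosing which to release.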