Dataset: asapp/slue-phase-2
How to use Masioki/fusion_gttbsc_distilbert-uncased-ft with Transformers:
# Load model directly
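# Note: FusionCrossAttentionSentenceClassifier is this model's custom architecture name, not a class shipped with
# the transformers library, so this auto-generated import fails unless that class is made available separately.
from transformers import FusionCrossAttentionSentenceClassifier
model = FusionCrossAttentionSentenceClassifier.from_pretrained("Masioki/fusion_gttbsc_distilbert-uncased-ft", dtype="auto")

Ground-truth text with prosody encoding and ASR encoding, fused by residual cross-attention, for multi-label dialogue act classification (DAC).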
ASR encoder: Whisper small encoder
Prosody encoder: 2-layer Transformer encoder preceded by a dense projection
Backbone: DistilBERT (uncased)
Fusion: 2 residual cross-attention fusion layers (F_asr x F_text and F_prosody x F_text) with a dense layer on top
Pooling: self-attention
Multi-label classification head: 2 dense layers with two dropout layers (rate 0.3) and a Tanh activation in between
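The architecture list describes the fusion model only at a high level. The NumPy sketch here illustrates one plausible reading of it, under assumptions that are not stated in the card: single-head scaled dot-product attention, DistilBERT token states used as queries over the ASR and prosody states, the two fused streams concatenated before the top dense layer, toy dimensions and random weights standing in for the trained encoders, and dropout omitted because it is an identity at inference time. The helper names (`residual_fusion`, `self_attention_pool`, `classify`) are hypothetical and do not come from the model's code.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 16                              # hypothetical shared hidden size (DistilBERT itself uses 768)
n_labels = 5                        # hypothetical number of dialogue-act labels
T_text, T_asr, T_pros = 12, 50, 50  # hypothetical sequence lengths


def init(*shape):
    """Random stand-in weights/features (the real model uses trained ones)."""
    return rng.normal(scale=0.1, size=shape)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys_values, w_q, w_k, w_v):
    """Single-head scaled dot-product attention: text tokens attend to another modality."""
    q = queries @ w_q                         # (T_text, d)
    k = keys_values @ w_k                     # (T_other, d)
    v = keys_values @ w_v                     # (T_other, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (T_text, T_other)
    return softmax(scores, axis=-1) @ v       # (T_text, d)


def residual_fusion(f_text, f_other, params):
    """Residual cross-attention fusion: F_text + Attn(F_text -> F_other)."""
    return f_text + cross_attention(f_text, f_other, *params)


def self_attention_pool(h, w_score):
    """Score each token, softmax over time, return the weighted sum of token states."""
    weights = softmax(h @ w_score, axis=0)    # (T, 1), sums to 1 over tokens
    return (weights * h).sum(axis=0)          # (d,)


def classify(pooled, w1, b1, w2, b2, threshold=0.5):
    """Two dense layers with Tanh in between; dropout (rate 0.3) is skipped at inference."""
    hidden = np.tanh(pooled @ w1 + b1)
    logits = hidden @ w2 + b2
    probs = 1.0 / (1.0 + np.exp(-logits))     # independent sigmoid per label (multi-label)
    return probs, probs >= threshold


f_text = init(T_text, d)   # stand-in for DistilBERT token states
f_asr = init(T_asr, d)     # stand-in for Whisper small encoder states projected to d
f_pros = init(T_pros, d)   # stand-in for prosody encoder output

asr_params = [init(d, d) for _ in range(3)]   # W_q, W_k, W_v for F_asr x F_text
pros_params = [init(d, d) for _ in range(3)]  # W_q, W_k, W_v for F_prosody x F_text

fused_asr = residual_fusion(f_text, f_asr, asr_params)
fused_pros = residual_fusion(f_text, f_pros, pros_params)

# Assumption: the two fused streams are concatenated and mixed by the dense layer on top.
w_mix = init(2 * d, d)
fused = np.concatenate([fused_asr, fused_pros], axis=-1) @ w_mix  # (T_text, d)

pooled = self_attention_pool(fused, init(d, 1))
probs, labels = classify(pooled, init(d, d), np.zeros(d), init(d, n_labels), np.zeros(n_labels))
```

Using an independent sigmoid per label, rather than a softmax across labels, is what allows several dialogue acts to be predicted for the same utterance; the 0.5 decision threshold is an assumption.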
Trained on ground-truth transcripts.
Evaluated on ground-truth transcripts (GT) and on normalized Whisper small transcripts (E2E).
The following hyperparameters were used during training: