Instructions to use InstaDeepAI/nucleotide-transformer-2.5b-multi-species with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use InstaDeepAI/nucleotide-transformer-2.5b-multi-species with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("fill-mask", model="InstaDeepAI/nucleotide-transformer-2.5b-multi-species")

# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("InstaDeepAI/nucleotide-transformer-2.5b-multi-species")
model = AutoModelForMaskedLM.from_pretrained("InstaDeepAI/nucleotide-transformer-2.5b-multi-species")
```
- Notebooks
- Google Colab
- Kaggle
[CLS] for downstream tasks.
#4
by jinyuan22
I noticed a [CLS] token was added to the sequence. Was it used during training? Can I use its embedding as a feature for downstream tasks?
Hi jinyuan22,
In the paper, we use the mean embedding over all token embeddings, [CLS] excluded, as the feature for downstream tasks. You can use the [CLS] token embedding instead and will probably obtain good performance too, but you might not reproduce the exact results from the paper with this approach.
Hope this helps!
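As a minimal sketch of the pooling described above: average the last-layer token embeddings while skipping position 0 (the [CLS] token) and any padding positions. The function name and the toy arrays here are illustrative, not from the paper; in practice `hidden_states` would come from the model's last hidden layer and `attention_mask` from the tokenizer.

```python
import numpy as np

def mean_embedding_excluding_cls(hidden_states, attention_mask):
    """Mean-pool token embeddings, dropping [CLS] and padding.

    hidden_states: (seq_len, dim) array of last-layer token embeddings
    attention_mask: (seq_len,) array, 1 for real tokens, 0 for padding
    """
    # Drop position 0 ([CLS]), then keep only non-padded tokens.
    mask = attention_mask[1:].astype(bool)
    return hidden_states[1:][mask].mean(axis=0)

# Toy example: 4 tokens ([CLS] + 3 sequence tokens), embedding dim 2.
hs = np.array([[9.0, 9.0],   # [CLS] embedding, excluded from the mean
               [1.0, 2.0],
               [3.0, 4.0],
               [5.0, 6.0]])
am = np.array([1, 1, 1, 1])
print(mean_embedding_excluding_cls(hs, am))  # [3. 4.]
```

With a real model you would pass `output_hidden_states=True` to the forward call and apply this pooling to the final hidden-state tensor, per sequence in the batch.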