BART is a denoising sequence-to-sequence model from Facebook AI. It is pretrained by:
- corrupting text with an arbitrary noising function
- learning a model to reconstruct the original text
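To make the two steps above concrete, here is a minimal sketch of one possible noising function: simple token masking. It is only an illustration, not BART's actual pretraining recipe. The helper `mask_tokens`, its `mask_prob` parameter, and the literal `<mask>` string are assumptions introduced for this example.

```python
import random

MASK = "<mask>"  # assumed mask string, used here for illustration only


def mask_tokens(tokens, mask_prob=0.3, seed=0):
    """Toy noising function: replace each token with MASK with probability mask_prob."""
    rng = random.Random(seed)  # seeded so the corruption is reproducible
    return [MASK if rng.random() < mask_prob else tok for tok in tokens]


original = "my friends are nice but they eat too many carbs".split()
corrupted = mask_tokens(original)

# A training pair: the model reads `corrupted` and is trained to output `original`.
print("input :", " ".join(corrupted))
print("target:", " ".join(original))
```

Any function that maps clean text to damaged text could stand in for `mask_tokens`; the training target stays the original text.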
Special tokens:
BART uses the following special tokens.
- bos_token (string, optional, defaults to
The beginning-of-sentence token.
- eos_token (string, optional, defaults to
The end-of-sentence token.
- sep_token (string, optional, defaults to
The separator token, when building a sequence from multiple sequences.
- cls_token (string, optional, defaults to
The classifier token, used for sequence classification.
- unk_token (string, optional, defaults to
The unknown token, used in place of any token that is not in the vocabulary.
- pad_token (string, optional, defaults to
The token used for padding.
- mask_token (string, optional, defaults to
The token used for masking values. This is the token the model is asked to predict.
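The roles of these tokens can be sketched in plain Python. This is a rough illustration under stated assumptions, not the tokenizer's real implementation. The token strings below are assumed RoBERTa-style values, with the beginning-of-sentence token doubling as the classifier token and the end-of-sentence token doubling as the separator. The helpers `build_inputs` and `pad_batch` are hypothetical names. To see the real values, inspect `tokenizer.special_tokens_map` on a loaded tokenizer.

```python
# Assumed token strings for illustration; check tokenizer.special_tokens_map for the real ones.
BOS, EOS, PAD = "<s>", "</s>", "<pad>"


def build_inputs(tokens_a, tokens_b=None):
    """Wrap one sequence, or join two sequences, with boundary tokens."""
    seq = [BOS] + tokens_a + [EOS]
    if tokens_b is not None:
        # Assumed pair layout: an extra EOS acts as the separator between the sequences.
        seq += [EOS] + tokens_b + [EOS]
    return seq


def pad_batch(batch):
    """Right-pad every sequence to the longest one and build a 1/0 attention mask."""
    longest = max(len(seq) for seq in batch)
    padded = [seq + [PAD] * (longest - len(seq)) for seq in batch]
    mask = [[1] * len(seq) + [0] * (longest - len(seq)) for seq in batch]
    return padded, mask


single = build_inputs(["hello", "world"])
pair = build_inputs(["hello"], ["world"])
padded, mask = pad_batch([single, pair])
print(padded)
print(mask)
```

The attention mask tells the model which positions hold real tokens (1) and which are padding (0).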
Make sure the transformers library is installed first.
!pip install transformers
Here is an example of how BART can fill in a masked word.
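import torch
from transformers import BartTokenizer, BartForConditionalGeneration

# Use the GPU when one is available, otherwise fall back to the CPU.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')
model = BartForConditionalGeneration.from_pretrained('facebook/bart-large').to(device)

TXT = "My friends are <mask> but they eat too many carbs."
input_ids = tokenizer([TXT], return_tensors='pt')['input_ids'].to(device)
logits = model(input_ids).logits  # shape: (batch, sequence length, vocabulary size)
# Position of the single <mask> token in the input.
masked_index = (input_ids[0] == tokenizer.mask_token_id).nonzero().item()
# Turn the scores at that position into probabilities and keep the five most likely tokens.
probs = logits[0, masked_index].softmax(dim=0)
values, predictions = probs.topk(5)
print(tokenizer.decode(predictions).split())
# ['good', 'great', 'all', 'really', 'very']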
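The last two steps of the example, a softmax over the vocabulary followed by a top-k selection, can be illustrated without any model. The sketch below uses a hypothetical four-word vocabulary and made-up scores. `softmax` and `top_k` are plain-Python stand-ins written for this post, not the PyTorch functions themselves.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, k):
    """Return (index, probability) pairs for the k most likely entries, best first."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in order[:k]]


vocab = ["good", "great", "purple", "very"]  # hypothetical vocabulary
logits = [2.0, 1.5, -1.0, 0.5]               # made-up scores for the masked position

probs = softmax(logits)
for index, prob in top_k(probs, 2):
    print(vocab[index], round(prob, 3))
```

In the real example the vocabulary has tens of thousands of entries, and the indices returned by `topk` are converted back to text with `tokenizer.decode`.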