The masked language model task is key to both BERT and RoBERTa. However, they differ in how they prepare the masking. The original RoBERTa article explains it in section 4.1: …

Although BERT preceded RoBERTa, this observation can be understood as somewhat applicable to RoBERTa as well, since the two models are very similar. You may, nonetheless, experiment with …
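To make the contrast concrete, here is a minimal sketch of the two preparation strategies: BERT's *static* masking fixes one mask pattern during preprocessing and reuses it every epoch, while RoBERTa's *dynamic* masking draws a fresh pattern each time a sequence is fed to the model. The toy vocabulary, helper name, and the 50% masking rate used in the demo (the papers use 15%) are illustrative choices of mine, not from either paper:

```python
import random

MASK = "[MASK]"
VOCAB = ["cat", "dog", "sat", "mat", "the", "on"]  # toy vocabulary

def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """BERT-style masking: each selected position becomes [MASK] 80% of
    the time, a random token 10%, or stays unchanged 10%."""
    rng = rng or random.Random()
    out, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)          # model must predict the original here
            r = rng.random()
            if r < 0.8:
                out.append(MASK)        # 80%: replace with [MASK]
            elif r < 0.9:
                out.append(rng.choice(VOCAB))  # 10%: random token
            else:
                out.append(tok)         # 10%: keep the original
        else:
            labels.append(None)         # position excluded from the loss
            out.append(tok)
    return out, labels

tokens = ["the", "cat", "sat", "on", "the", "mat"]

# Static masking (BERT): mask once, reuse the same copy every epoch.
static = mask_tokens(tokens, mask_prob=0.5, rng=random.Random(0))
static_epochs = [static for _ in range(3)]

# Dynamic masking (RoBERTa): a fresh pattern on every pass over the data.
rng = random.Random(0)
dynamic_epochs = [mask_tokens(tokens, mask_prob=0.5, rng=rng) for _ in range(3)]
```

With static masking the model sees the identical masked copy in every epoch; with dynamic masking the patterns vary across epochs, which is the effect section 4.1 of the RoBERTa paper attributes its small gains to.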