Kaggle NBME competition - other lessons

This is a summary of interesting ideas other teams used during the Kaggle NBME NLP competition. We were given a set of patient notes, each written by a doctor evaluating a patient. The goal was to find the spans of text in each note that described particular features ('patient is a female', 'father died of heart attack', ...).

  • Use a different decision threshold for each case (threshold-search sketch below)
  • Initializing the output layer with the backbone's _init_weights method (defined on transformers.PreTrainedModel subclasses) helps a lot in stabilizing training and making it converge quickly (sketch below)
  • Use gradient clipping (training-loop sketch below)
  • Use smaller models to experiment, larger models for submission
  • Some folds can have poor correlation between CV and LB scores; only use the well-correlated ones
  • For problems with noisy labels, focal loss can be a better loss function than plain cross-entropy (sketch below)
  • People did MLM pretraining with a masking probability of 0.1 - 0.15, then fine-tuned with smooth focal loss and pseudolabelling (MLM sketch below)
  • Someone tried multi-sample dropout: two heads, each a dropout followed by a fully connected layer, with the outputs averaged (sketch below)
  • Someone added tokens for abbreviations commonly found in the notes (sketch below)
  • Top solutions used both MLM pretraining and pseudolabelling
  • Someone used the | character to represent newlines
  • Freeze some of the backbone layers for faster training (shown in the training-loop sketch below)
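
A minimal sketch of the per-case threshold search, assuming out-of-fold probabilities, binary labels and a case id per example are already available (all variable names are hypothetical) and using F1 as a stand-in for the competition metric:

    import numpy as np
    from sklearn.metrics import f1_score

    def best_threshold_per_case(oof_probs, oof_labels, case_ids):
        # For each case, pick the threshold that maximizes F1 on the
        # out-of-fold predictions belonging to that case.
        grid = np.arange(0.10, 0.90, 0.01)
        thresholds = {}
        for case in np.unique(case_ids):
            mask = case_ids == case
            scores = [f1_score(oof_labels[mask], oof_probs[mask] > t) for t in grid]
            thresholds[case] = grid[int(np.argmax(scores))]
        return thresholds
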
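Re-using the pretrained initialization for a fresh output head is one line once the backbone is loaded; a sketch assuming a Hugging Face backbone (model name and head shape are placeholders):

    import torch.nn as nn
    from transformers import AutoModel

    class TokenClassifier(nn.Module):
        def __init__(self, name="microsoft/deberta-v3-base"):
            super().__init__()
            self.backbone = AutoModel.from_pretrained(name)
            self.head = nn.Linear(self.backbone.config.hidden_size, 1)
            # Initialize the new head with the same scheme the backbone uses for
            # its own layers, so its scale matches the pretrained weights.
            self.backbone._init_weights(self.head)

        def forward(self, input_ids, attention_mask):
            hidden = self.backbone(input_ids=input_ids,
                                   attention_mask=attention_mask).last_hidden_state
            return self.head(hidden).squeeze(-1)  # one logit per token
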
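Gradient clipping and layer freezing fit into the same training loop; a sketch assuming the TokenClassifier above, a BERT/DeBERTa-style encoder that exposes a .layer list, and a standard DataLoader called train_loader:

    import torch
    import torch.nn as nn

    model = TokenClassifier()
    criterion = nn.BCEWithLogitsLoss()

    # Freeze the embeddings and the first few encoder layers for faster training
    # (how many layers to freeze is a hyperparameter).
    for module in [model.backbone.embeddings, *model.backbone.encoder.layer[:4]]:
        for p in module.parameters():
            p.requires_grad = False

    optimizer = torch.optim.AdamW(
        (p for p in model.parameters() if p.requires_grad), lr=2e-5)

    for batch in train_loader:
        logits = model(batch["input_ids"], batch["attention_mask"])
        loss = criterion(logits, batch["labels"].float())
        loss.backward()
        # Clip the global gradient norm before the optimizer step.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
        optimizer.zero_grad()
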
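A common binary focal-loss formulation on logits, as a sketch (the gamma and alpha values are the usual defaults, not any team's exact settings):

    import torch
    import torch.nn.functional as F

    def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
        # Per-element binary cross-entropy on the raw logits.
        bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
        p_t = torch.exp(-bce)  # probability the model assigns to the true class
        alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
        # Down-weight easy examples (p_t close to 1) so hard or mislabeled
        # examples do not dominate the gradient as much as with plain BCE.
        return (alpha_t * (1.0 - p_t) ** gamma * bce).mean()
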
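MLM pretraining on the competition text can be done with the standard transformers masked-LM machinery; a sketch assuming the notes have already been tokenized into a dataset called tokenized_notes (backbone name and hyperparameters are placeholders, with mlm_probability in the 0.1 - 0.15 range mentioned above):

    from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    name = "microsoft/deberta-v3-base"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForMaskedLM.from_pretrained(name)

    # Randomly mask 15% of the tokens and train the model to reconstruct them.
    collator = DataCollatorForLanguageModeling(
        tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="mlm", num_train_epochs=3,
                               per_device_train_batch_size=16),
        train_dataset=tokenized_notes,
        data_collator=collator,
    )
    trainer.train()
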
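The multi-sample dropout head, sketched as described: several dropout + linear branches over the same features, averaged at the end (sizes and dropout rate are placeholders):

    import torch
    import torch.nn as nn

    class MultiSampleDropoutHead(nn.Module):
        def __init__(self, hidden_size, n_heads=2, p=0.3):
            super().__init__()
            # Each head sees the same features through a different dropout mask.
            self.heads = nn.ModuleList(
                nn.Sequential(nn.Dropout(p), nn.Linear(hidden_size, 1))
                for _ in range(n_heads))

        def forward(self, x):
            # Averaging the per-head logits acts like a cheap ensemble.
            return torch.stack([h(x) for h in self.heads], dim=0).mean(dim=0)
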
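Adding tokens for frequent abbreviations is a small change with the transformers tokenizer; the abbreviation list below is only illustrative:

    from transformers import AutoModel, AutoTokenizer

    name = "microsoft/deberta-v3-base"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)

    # Hypothetical abbreviations that appear often in the clinical notes.
    tokenizer.add_tokens(["yo", "c/o", "h/o", "pmh"])
    # Grow the embedding matrix so the new ids get (randomly initialized) vectors.
    model.resize_token_embeddings(len(tokenizer))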

31 May 2022