This is a summary of the interesting ideas other teams used during the Kaggle
NLP competition. We were given a list of notes, each written by a doctor
evaluating a patient. The goal was to find the parts of each note that
described particular features ('patient is a female', 'father died of heart
attack', ...).
- Use a different threshold for each case (see the per-case threshold sketch
  after this list)
- Initializing the output layer with the transformers.BertPreTrainedModel._init_weights
  method helps a lot with stabilizing the training and making it converge quickly
  (re-initialization sketch after this list)
- Use gradient clipping (see the training-loop sketch after this list)
- Use smaller models to experiment, larger models for submission
- Some folds can have poor correlation between CV and LB - only use the
  well-correlated ones.
- For problems with poor labels, focal loss can be a better loss function than
  cross-entropy (focal-loss sketch after this list)
- People did MLM pretraining, masking 10-15% of the tokens, then fine-tuned with
  smooth focal loss and pseudolabelling (MLM sketch after this list)
- Someone tried multi-sample dropout - two heads, each dropout + fc, whose
  outputs are averaged (sketch after this list).
- Someone added tokenizer tokens for commonly found shortcuts / abbreviations
  (added-tokens sketch after this list)
- Top solutions used both MLM pretraining and pseudolabelling
- Someone used the | character to represent newlines in the text
- Freeze some of the layers for faster training (see the training-loop sketch
  after this list).
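
A minimal sketch of tuning a separate decision threshold per case on
out-of-fold predictions. The arrays probs, labels and cases are placeholders
for your own OOF outputs, and the grid and F1 objective are assumptions, not
taken from any particular writeup.

```python
import numpy as np
from sklearn.metrics import f1_score

def best_threshold(probs, labels):
    # Sweep a coarse grid and keep the threshold with the best F1.
    grid = np.arange(0.10, 0.90, 0.01)
    scores = [f1_score(labels, probs > t) for t in grid]
    return grid[int(np.argmax(scores))]

# One threshold per clinical case instead of a single global cutoff.
# probs, labels, cases are out-of-fold probabilities, targets and case ids.
per_case_threshold = {c: best_threshold(probs[cases == c], labels[cases == c])
                      for c in np.unique(cases)}
```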
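A sketch of re-initializing a freshly added output layer with the encoder's own
_init_weights method. The checkpoint name and head shape are assumptions.

```python
import torch.nn as nn
from transformers import AutoConfig, AutoModel

class NotesModel(nn.Module):
    def __init__(self, model_name="microsoft/deberta-base", num_labels=1):
        super().__init__()
        self.config = AutoConfig.from_pretrained(model_name)
        self.backbone = AutoModel.from_pretrained(model_name)
        self.head = nn.Linear(self.config.hidden_size, num_labels)
        # Initialize the new head the same way the pretrained weights were
        # initialized (normal with std = initializer_range, zero bias).
        self.backbone._init_weights(self.head)

    def forward(self, input_ids, attention_mask):
        hidden = self.backbone(input_ids=input_ids,
                               attention_mask=attention_mask).last_hidden_state
        return self.head(hidden)  # per-token logits for span extraction
```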
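Gradient clipping and layer freezing in one training-loop sketch, using the
backbone from the model above. Which layers to freeze, the learning rate and
max_norm are guesses; loader and compute_loss are placeholders.

```python
import torch

# Freeze the embeddings and the first six encoder layers of the backbone.
for module in (model.backbone.embeddings, *model.backbone.encoder.layer[:6]):
    for p in module.parameters():
        p.requires_grad = False

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=2e-5)

for batch in loader:
    loss = compute_loss(model, batch)
    loss.backward()
    # Clip the global gradient norm before the optimizer step.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    optimizer.zero_grad()
```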
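A binary focal loss with optional label smoothing ("smooth focal loss"); gamma
and the smoothing amount are arbitrary defaults, not values from the writeups.

```python
import torch
import torch.nn.functional as F

def smooth_focal_loss(logits, targets, gamma=2.0, smoothing=0.0):
    if smoothing > 0:
        # Pull the hard 0/1 targets slightly towards 0.5.
        targets = targets * (1 - smoothing) + 0.5 * smoothing
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)                     # confidence in the true class
    return ((1 - p_t) ** gamma * bce).mean()  # down-weight easy examples
```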
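A sketch of MLM pretraining on the raw notes using the standard masked-LM data
collator. The checkpoint, training arguments and the tokenized_notes dataset
are placeholders; only the 15% masking rate comes from the tip above.

```python
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "microsoft/deberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Mask 15% of the tokens (people used 10-15%).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True,
                                           mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mlm", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=tokenized_notes,   # the unlabeled patient notes, tokenized
    data_collator=collator,
)
trainer.train()
```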
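The multi-sample dropout head as described above: two dropout + linear heads
whose logits are averaged. The dropout rates are made up.

```python
import torch.nn as nn

class MultiSampleDropoutHead(nn.Module):
    def __init__(self, hidden_size, num_labels=1, rates=(0.1, 0.3)):
        super().__init__()
        self.dropouts = nn.ModuleList(nn.Dropout(p) for p in rates)
        self.fcs = nn.ModuleList(nn.Linear(hidden_size, num_labels) for _ in rates)

    def forward(self, hidden):
        # Run each dropout + fc head and average the logits.
        logits = [fc(drop(hidden)) for drop, fc in zip(self.dropouts, self.fcs)]
        return sum(logits) / len(logits)
```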
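Adding extra tokens for shortcuts found in the notes. The example abbreviations
are hypothetical, not taken from the competition data.

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base")
model = AutoModel.from_pretrained("microsoft/deberta-base")

# Hypothetical shortcuts a doctor might use in a note.
tokenizer.add_tokens(["pt", "yo", "c/o", "htn"])
# Grow the embedding matrix so the new tokens get their own vectors.
model.resize_token_embeddings(len(tokenizer))
```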
31 May 2022