This is a summary of the interesting ideas other teams used during the Kaggle
NLP competition. We were given a list of notes, each written by a doctor
evaluating a patient. The goal was to find the parts of each note that
described particular features ('patient is a female', 'father died of heart
attack', ...).
- Use a different threshold for each case (see the per-case threshold sketch
  after this list)
- Initializing the output layer with the transformers.BertPreTrainedModel._init_weights
  method helps a lot with stabilizing the training and making it converge quickly
  (re-initialization sketch after this list)
- Use gradient clipping (see the training-loop sketch after this list)
- Use smaller models to experiment, larger models for submission
- Some folds can have poor correlation between CV and LB - only use the
  well-correlated ones.
- For problems with poor labels, focal loss can be a better loss function than
  cross-entropy (focal-loss sketch after this list)
- People did MLM pretraining, masking 10-15% of the tokens, then fine-tuned with
  smooth focal loss and pseudolabelling (MLM sketch after this list)
- Someone tried multi-sample dropout - two heads, each dropout + fc, whose
  outputs are averaged (sketch after this list).
- Someone added tokenizer tokens for commonly found shortcuts / abbreviations
  (added-tokens sketch after this list)
- Top solutions used both MLM pretraining and pseudolabelling
- Someone used the | character to represent newlines in the text
- Freeze some of the layers for faster training (see the training-loop sketch
  after this list).
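
A minimal sketch of tuning a separate decision threshold per case on
out-of-fold predictions. The arrays probs, labels and cases are placeholders
for your own OOF outputs, and the grid and F1 objective are assumptions, not
taken from any particular writeup.

```python
import numpy as np
from sklearn.metrics import f1_score

def best_threshold(probs, labels):
    # Sweep a coarse grid and keep the threshold with the best F1.
    grid = np.arange(0.10, 0.90, 0.01)
    scores = [f1_score(labels, probs > t) for t in grid]
    return grid[int(np.argmax(scores))]

# One threshold per clinical case instead of a single global cutoff.
# probs, labels, cases are out-of-fold probabilities, targets and case ids.
per_case_threshold = {c: best_threshold(probs[cases == c], labels[cases == c])
                      for c in np.unique(cases)}
```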
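A sketch of re-initializing a freshly added output layer with the encoder's own
_init_weights method. The checkpoint name and head shape are assumptions.

```python
import torch.nn as nn
from transformers import AutoConfig, AutoModel

class NotesModel(nn.Module):
    def __init__(self, model_name="microsoft/deberta-base", num_labels=1):
        super().__init__()
        self.config = AutoConfig.from_pretrained(model_name)
        self.backbone = AutoModel.from_pretrained(model_name)
        self.head = nn.Linear(self.config.hidden_size, num_labels)
        # Initialize the new head the same way the pretrained weights were
        # initialized (normal with std = initializer_range, zero bias).
        self.backbone._init_weights(self.head)

    def forward(self, input_ids, attention_mask):
        hidden = self.backbone(input_ids=input_ids,
                               attention_mask=attention_mask).last_hidden_state
        return self.head(hidden)  # per-token logits for span extraction
```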
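Gradient clipping and layer freezing in one training-loop sketch, using the
backbone from the model above. Which layers to freeze, the learning rate and
max_norm are guesses; loader and compute_loss are placeholders.

```python
import torch

# Freeze the embeddings and the first six encoder layers of the backbone.
for module in (model.backbone.embeddings, *model.backbone.encoder.layer[:6]):
    for p in module.parameters():
        p.requires_grad = False

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=2e-5)

for batch in loader:
    loss = compute_loss(model, batch)
    loss.backward()
    # Clip the global gradient norm before the optimizer step.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    optimizer.zero_grad()
```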
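A binary focal loss with optional label smoothing ("smooth focal loss"); gamma
and the smoothing amount are arbitrary defaults, not values from the writeups.

```python
import torch
import torch.nn.functional as F

def smooth_focal_loss(logits, targets, gamma=2.0, smoothing=0.0):
    if smoothing > 0:
        # Pull the hard 0/1 targets slightly towards 0.5.
        targets = targets * (1 - smoothing) + 0.5 * smoothing
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)                     # confidence in the true class
    return ((1 - p_t) ** gamma * bce).mean()  # down-weight easy examples
```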
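A sketch of MLM pretraining on the raw notes using the standard masked-LM data
collator. The checkpoint, training arguments and the tokenized_notes dataset
are placeholders; only the 15% masking rate comes from the tip above.

```python
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "microsoft/deberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Mask 15% of the tokens (people used 10-15%).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True,
                                           mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mlm", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=tokenized_notes,   # the unlabeled patient notes, tokenized
    data_collator=collator,
)
trainer.train()
```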
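The multi-sample dropout head as described above: two dropout + linear heads
whose logits are averaged. The dropout rates are made up.

```python
import torch.nn as nn

class MultiSampleDropoutHead(nn.Module):
    def __init__(self, hidden_size, num_labels=1, rates=(0.1, 0.3)):
        super().__init__()
        self.dropouts = nn.ModuleList(nn.Dropout(p) for p in rates)
        self.fcs = nn.ModuleList(nn.Linear(hidden_size, num_labels) for _ in rates)

    def forward(self, hidden):
        # Run each dropout + fc head and average the logits.
        logits = [fc(drop(hidden)) for drop, fc in zip(self.dropouts, self.fcs)]
        return sum(logits) / len(logits)
```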
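Adding extra tokens for shortcuts found in the notes. The example abbreviations
are hypothetical, not taken from the competition data.

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base")
model = AutoModel.from_pretrained("microsoft/deberta-base")

# Hypothetical shortcuts a doctor might use in a note.
tokenizer.add_tokens(["pt", "yo", "c/o", "htn"])
# Grow the embedding matrix so the new tokens get their own vectors.
model.resize_token_embeddings(len(tokenizer))
```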
31 May 2022