This is a collection of notes about what I tried and what other teams found useful
during the Kaggle BirdCLEF 2022 competition. We were given a set of audio files with
bird calls and asked to build a classification model. The main difficulty was that
some bird species had very few training files. We were given audio for 152 bird
species, although only 21 of them were in the test set.
Things I did:
- Started with a public baseline based on EfficientNet
- Ensembled several models
- Varied thresholds per species: the more samples a species has, the higher its
threshold
- We were given many species of geese, but only one of them was scored. For
training, I merged all geese into a single class. Similarly for other families of
birds.
- For species with very few samples, hand-picked the calls.
- Computing mel spectrograms is faster with torchaudio
- Used a linear learning-rate schedule with warmup
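
The count-dependent thresholds mentioned above can be sketched as follows. The notes do not record the exact mapping, so everything here is an assumption made for illustration: the function name `count_based_thresholds`, the log-scale interpolation, the `low`/`high` bounds, and the example counts.

```python
import math


def count_based_thresholds(train_counts, low=0.05, high=0.5):
    """Give each species a threshold that grows with its sample count.

    Assumption: thresholds are interpolated linearly in log(count) between
    `low` (rarest species) and `high` (most common species). Counts must be
    at least 1. The bounds are illustrative, not tuned values.
    """
    logs = {species: math.log(count) for species, count in train_counts.items()}
    smallest, largest = min(logs.values()), max(logs.values())
    span = largest - smallest
    thresholds = {}
    for species, value in logs.items():
        # If every species has the same count, fall back to the midpoint.
        frac = (value - smallest) / span if span > 0 else 0.5
        thresholds[species] = low + frac * (high - low)
    return thresholds


# Hypothetical counts, not taken from the competition data.
example_counts = {"rare_bird": 3, "medium_bird": 50, "common_bird": 500}
example_thresholds = count_based_thresholds(example_counts)
```

A prediction for a species would then count as positive only when its probability reaches that species' threshold, so rare species are accepted at lower confidence than common ones.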
Learned from reading other people's solutions:
- having a good cross-validation (CV) setup is important
- precompute mel spectrograms for additional speedup
- use external data (for example previous competitions)
- add a human in the loop: train a model, have it make predictions, and check the top
2000 predictions by hand to get "clean" data
- similarly, use pseudo-labelling without a human in the loop
- other models people tried: dm_nfnet_f0, eca_nfnet_l0, eca_nfnet_l1, tf_efficientnetv2_m_in21k,
seresnext50_32x4d, resnest50d_4s2x40d, convnext_tiny, resnet34,
tf_efficientnetv2_s_in21k, seresnext26t_32x4d
- combine the prediction for the time interval [t, t+5] with those for [t-1, t+4] and
[t+1, t+6] (potentially with unequal weights)
- if a bird is predicted anywhere in the audio file, lower the threshold for the file
- add random sounds and have a new class "nocall"
- pretrain the models on 2021 data
- manually drop segments without bird sounds. Split data into smaller chunks.
- first train on all birds, then fine-tune on the scored birds only
- mask time / frequency bands in the mel spectrogram
- the BirdNET pretrained model is very good
- Constant Q-transform (CQT1992v2 from nnAudio.Spectrogram) might be better than mel
spectrogram
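
Two of the post-processing ideas above (blending predictions from shifted windows, and lowering the threshold once a bird is detected somewhere in the file) can be sketched in plain Python. This is a rough illustration and not any team's actual code: the function names, the weights, and both threshold values are assumptions.

```python
def blend_shifted_windows(p_center, p_before, p_after, weights=(0.5, 0.25, 0.25)):
    """Weighted average of one species' probability over three windows.

    p_center is the prediction for [t, t+5], p_before for [t-1, t+4] and
    p_after for [t+1, t+6]. The default weights are an assumption.
    """
    w_center, w_before, w_after = weights
    total = w_center + w_before + w_after
    return (w_center * p_center + w_before * p_before + w_after * p_after) / total


def file_aware_decisions(segment_probs, threshold=0.5, lowered_threshold=0.3):
    """Turn per-segment probabilities for one species into yes/no decisions.

    If any segment in the file reaches `threshold`, the species is assumed
    to be present in the file, and `lowered_threshold` is applied to every
    segment instead. Both threshold values are illustrative.
    """
    bird_in_file = any(p >= threshold for p in segment_probs)
    active = lowered_threshold if bird_in_file else threshold
    return [p >= active for p in segment_probs]


# Hypothetical probabilities for one species across one file's segments.
blended = blend_shifted_windows(0.8, 0.4, 0.6)
with_detection = file_aware_decisions([0.1, 0.35, 0.7, 0.2])
without_detection = file_aware_decisions([0.1, 0.35, 0.4])
```

In the first file the 0.7 segment triggers the lowered threshold, so the 0.35 segment is also accepted. In the second file no segment reaches 0.5, so nothing is accepted.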
31 May 2022