Kaggle BirdCLEF 2022 competition

These are notes on what I tried, and what other teams found useful, during the Kaggle BirdCLEF 2022 competition. We were given audio recordings of bird calls and asked to build a classification model. The main difficulty was that some bird species had very few training files. We were provided audio for 152 bird species, although only 21 of them were in the test set.

Things I did

  • Started with a public baseline based on EfficientNet
  • Ensembled several models
  • Varied the decision threshold per species: the more training samples a species had, the higher its threshold
  • We were given many species of geese, but only one of them was scored. For training, I merged all geese into a single class. Similarly for other families of birds.
  • For species with very few samples, hand-picked the calls.
  • Computing mel spectrograms is faster with torchaudio
  • Used a linear learning rate schedule with warmup
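
Two of the items above are easy to sketch in plain Python: per-species thresholds that grow with the number of training samples, and a linear schedule with warmup. This is only an illustration, not the code I actually ran. The log-scale interpolation, the threshold bounds (0.05 and 0.3), the species names and the decay-to-zero shape of the schedule are all assumptions made for the example.

```python
import math


def species_thresholds(sample_counts, low=0.05, high=0.3):
    """Map each species to a threshold that grows with its sample count.

    Thresholds are interpolated on a log scale between `low` (rarest
    species) and `high` (most common species). Both bounds are illustrative
    assumptions, not tuned competition values. Counts must be positive.
    """
    logs = {name: math.log(count) for name, count in sample_counts.items()}
    lo, hi = min(logs.values()), max(logs.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all counts match
    return {
        name: low + (high - low) * (value - lo) / span
        for name, value in logs.items()
    }


def linear_warmup_lr(step, total_steps, warmup_steps, base_lr):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


# Hypothetical species names and sample counts, for illustration only.
counts = {"rare_bird": 3, "mid_bird": 30, "common_bird": 300}
thresholds = species_thresholds(counts)

# Learning rate at every step of a tiny 10-step run with 2 warmup steps.
schedule = [
    linear_warmup_lr(s, total_steps=10, warmup_steps=2, base_lr=1e-3)
    for s in range(11)
]
```

With these made-up counts the rarest species gets the lowest threshold and the most common one the highest, so a rare bird needs less model confidence to be reported. The schedule rises linearly during the first two steps and then falls linearly to zero.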

Learned from reading other people's solutions:

  • having a reliable cross-validation (CV) setup is important
  • precompute mel spectrograms for additional speedup
  • use external data (for example previous competitions)
  • add a human in the loop: train a model, have it make predictions, then check the top 2000 predictions by hand to get "clean" data
  • similarly, use pseudo-labelling without a human in the loop
  • other models people tried: dm_nfnet_f0, eca_nfnet_l0, eca_nfnet_l1, tf_efficientnetv2_m_in21k, seresnext50_32x4d, resnest50d_4s2x40d, convnext_tiny, resnet34, tf_efficientnetv2_s_in21k, seresnext26t_32x4d
  • combine the prediction on time interval [t, t+5] with the predictions on [t-1, t+4] and [t+1, t+6] (potentially with unequal weights)
  • if a bird is predicted anywhere in the audio file, lower the threshold for that file
  • add random sounds as a new class, "nocall"
  • pretrain the models on 2021 data
  • manually drop segments without bird sounds. Split the data into smaller chunks.
  • first train on all birds, then fine tune on the scored birds only
  • mask time / frequency bands in the mel spectrogram
  • the BirdNET pretrained model is very good
  • Constant Q-transform (CQT1992v2 from nnAudio.Spectrogram) might be better than mel spectrogram
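
Two of the post-processing ideas above (averaging over shifted windows, and lowering the threshold once a bird is found in the file) can be sketched in a few lines of plain Python. This is my reading of those ideas, not code taken from any team's solution. The function names, the weights (0.25, 0.5, 0.25) and the thresholds (0.5 and 0.3) are assumptions chosen for the example.

```python
def smooth_with_shifted_windows(center, before, after, weights=(0.25, 0.5, 0.25)):
    """Weighted average of per-species scores from three overlapping windows.

    `before`, `center` and `after` hold the scores predicted on
    [t-1, t+4], [t, t+5] and [t+1, t+6]. The weights are an assumption.
    """
    w_before, w_center, w_after = weights
    total = w_before + w_center + w_after
    return [
        (w_before * b + w_center * c + w_after * a) / total
        for b, c, a in zip(before, center, after)
    ]


def predict_file(segment_scores, threshold=0.5, relaxed_threshold=0.3):
    """Binarize the per-segment scores of one species within one file.

    If any segment clears `threshold`, the species is assumed to be present
    in the file and every segment is judged against `relaxed_threshold`.
    """
    present_in_file = any(score >= threshold for score in segment_scores)
    cutoff = relaxed_threshold if present_in_file else threshold
    return [score >= cutoff for score in segment_scores]


# Made-up scores for two species at one time step.
smoothed = smooth_with_shifted_windows(
    center=[0.8, 0.1], before=[0.4, 0.1], after=[0.6, 0.3]
)

# Made-up scores for one species across three segments of a file.
with_confident_hit = predict_file([0.2, 0.35, 0.7])
without_confident_hit = predict_file([0.2, 0.35, 0.45])
```

In the first file the confident segment (0.7) lowers the cutoff, so the 0.35 segment is also reported. In the second file nothing clears 0.5, so no segment is reported.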

31 May 2022