Researchers at ZHAW Enable AI to Effectively Read Music Notes in Real-World Settings
The usability of AI-powered music reading systems is significantly diminished in real-world settings due to imperfect data quality and the high impact of wrong predictions. Researchers at ZHAW have developed effective answers to these challenges by employing a combination of sophisticated data augmentation, unsupervised domain adaptation, and model ensembling.
ScoreAug is a data augmentation scheme that combines classical augmentations, such as affine transformations, blur, and salt-and-pepper noise, with real-world noise, making augmented synthetic data far more similar to real-world data (see Figure 1). Adding ScoreAug to the training pipeline of our neural network raises detection performance from 36.0% to 73.3%.
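To make the idea concrete, the following is a minimal sketch of how classical augmentations might be combined with a real-world noise texture. It is not the ScoreAug implementation: the function names, the parameter values, and the multiplicative blending are assumptions made for illustration, and affine transformations are left out to keep the example short. Images are assumed to be grayscale NumPy arrays with values in [0, 1], where 1 is white paper.

```python
import numpy as np

def salt_and_pepper(img, amount, rng):
    """Set a random fraction `amount` of pixels to pure black or pure white."""
    out = img.copy()
    mask = rng.random(img.shape) < amount
    values = rng.choice(np.array([0.0, 1.0]), size=img.shape)
    out[mask] = values[mask]
    return out

def box_blur(img, k=3):
    """Blur with a k x k mean filter (k odd); border pixels are replicated."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def overlay_real_noise(img, texture, strength=0.5):
    """Darken the page with a real-world texture (e.g. a scan of blank paper).

    Assumption: multiplicative blending; `texture` has the same shape as `img`.
    """
    blended = img * ((1.0 - strength) + strength * texture)
    return np.clip(blended, 0.0, 1.0)

def augment(img, texture, rng):
    """Hypothetical pipeline: blur, then pixel noise, then real-world texture."""
    img = box_blur(img, k=3)
    img = salt_and_pepper(img, amount=0.02, rng=rng)
    return overlay_real_noise(img, texture, strength=0.5)

# Usage: a synthetic "page" with one dark staff line and a random stand-in texture.
rng = np.random.default_rng(0)
page = np.ones((32, 32))
page[10:12, :] = 0.0
texture = rng.random((32, 32))  # stand-in for a real scanned paper patch
augmented = augment(page, texture, rng)
```

In practice the texture would be cropped from real scans or photographs rather than drawn at random, which is where the added realism comes from.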
No AI is perfect; every system makes errors. Finding and correcting them is especially cumbersome for written music, which features hundreds of symbols on every page. Robust confidence measures for each prediction can help alleviate this problem by guiding the corrector's attention (see Figure 2). The presented confidences, based on efficient snapshot ensembles in combination with ScoreAug, have increased the speed of a human corrector threefold over an existing baseline [1].
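As a rough illustration of how ensemble-based confidences can guide a corrector, the sketch below averages the class probabilities predicted by several model snapshots and lists the least certain symbols first. The averaging rule, the threshold of 0.8, and the function names are assumptions chosen for this example; the confidence measure in the publication may be defined differently.

```python
import numpy as np

def ensemble_confidence(snapshot_probs):
    """Combine predictions from several model snapshots.

    snapshot_probs: array of shape (n_snapshots, n_symbols, n_classes)
    holding class probabilities. Returns the predicted label and a
    confidence (mean probability of that label) for every symbol.
    """
    mean_probs = snapshot_probs.mean(axis=0)
    labels = mean_probs.argmax(axis=1)
    confidence = mean_probs.max(axis=1)
    return labels, confidence

def review_order(confidence, threshold=0.8):
    """Indices of symbols below `threshold`, least confident first."""
    flagged = np.flatnonzero(confidence < threshold)
    return flagged[np.argsort(confidence[flagged])]

# Usage: two snapshots, three detected symbols, two classes.
probs = np.array([
    [[0.90, 0.10], [0.60, 0.40], [0.20, 0.80]],
    [[0.95, 0.05], [0.30, 0.70], [0.30, 0.70]],
])
labels, confidence = ensemble_confidence(probs)
to_check = review_order(confidence)  # symbols a human should inspect first
```

Here the snapshots disagree on the second symbol, so it receives the lowest confidence and is placed at the front of the review list.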
The full details of all methods mentioned are given in the scientific publication "Real World Music Object Recognition" (https://stdm.github.io/downloads/papers/TISMIR_2023.pdf).
[1] Tuggener, L., Elezi, I., Schmidhuber, J., and Stadelmann, T. (2018). Deep watershed detector for music object recognition. In 19th International Society for Music Information Retrieval Conference (ISMIR).