Novel methods for Domain Adaptation and Confidence-Rated Predictions enable Digitalization of Real-World Sheet Music
CAI researchers make smartphone-based digitalization of real-world musical scores possible by equipping the world’s most advanced optical music recognition system with a novel domain adaptation mechanism based on deep neural nets and with confidence-rated output for a sleek human user interface.
ScorePad AG developed a music digitalization pipeline with a variety of applications, ranging from digitalization services for large-scale preservers of score collections, such as libraries, to feeding a user-facing app with fresh and individual content that music students and professionals alike can use to practice and perform music alone and in groups. Specifically, it frees musicians from handling printed scores by displaying and manipulating the scores in a computer-readable format (MusicXML) on a tablet or computer.
This computer-readability of digitized music, as opposed to merely displaying scanned images, enables novel and highly demanded features for the use cases described above, such as ensemble coordination or automatic page turning for orchestra musicians, or music analytics for scholarly users of digital sheet music collections in libraries. It builds on true digitization of the scores through the world's most advanced optical music recognition (OMR) system, whose foundation was laid during the predecessor CTI project "DeepScore", which outperformed the state of the art in musical symbol recognition by a large margin.
The goal of the RealScore project was to enable the music digitalization pipeline by extending the predecessor technology, which had been confined to high-quality (synthetic) musical scores as input, to real-world scans of sheets that may have lingered in a musician’s gig bag for an extended period of time and have seen many rehearsals. Dealing with artifacts such as yellowed pages, stains and tears requires breakthroughs in applied R&D in three areas: symbol recognition (to better detect less frequent musical symbols, the technology needs to be extended to detect dynamically shaped symbols like slurs at arbitrary rotation angles), domain adaptation (from perfectly produced score PDFs to messy scans or photos) and confidence rating (to mark a potentially imperfect recognition result with specific colors that indicate, according to the neural network outputs, where the system's detections are likely to be right or wrong). These ambitious goals were achieved by a team of researchers around technical project lead Lukas Tuggener within Prof. Thilo Stadelmann’s Computer Vision, Perception and Cognition Group.
The results of project RealScore are twofold: (i) The pipeline was transitioned to an S2A-Net-based system with rotated detection capabilities, and an array of domain adaptation techniques was designed based on (i.a) advanced input data augmentation (“ScoreAug”, see Fig. 2) that combines artificial data degradation with real-world wear and tear, (i.b) specific neural network training regimes and (i.c) an adversarial domain adaptation algorithm (see Fig. 1), together improving music symbol recognition (MOR) on real-world noisy data by more than 50%. (ii) Confidence-rated output (see Fig. 3) was achieved by adapting Snapshot Ensembles to the S2A-Net architecture efficiently for the first time, improving the average precision of the MOR task by 4.6 pp and speeding up subsequent manual post-processing of results by a factor of 3 through a user-tailored and optimized digitalization toolchain.
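To make the idea behind confidence-rated output more concrete, the following is a minimal Python sketch, not the project's implementation. It assumes the cyclic cosine learning-rate schedule from the original Snapshot Ensembles method (one model snapshot is saved at the end of each cycle) and a simple averaging of per-detection scores across snapshots. All function names, the color thresholds and the example scores are hypothetical and chosen only for illustration; how RealScore combines snapshots within S2A-Net is not specified here.

```python
import math


def snapshot_lr(step, total_steps, num_cycles, base_lr):
    """Cyclic cosine-annealed learning rate (assumed schedule).

    The rate restarts at base_lr at the start of every cycle and decays
    towards zero; a snapshot would be saved at the end of each cycle.
    `step` is 0-based.
    """
    cycle_len = math.ceil(total_steps / num_cycles)
    pos = step % cycle_len  # position inside the current cycle
    return 0.5 * base_lr * (math.cos(math.pi * pos / cycle_len) + 1.0)


def ensemble_confidence(snapshot_scores):
    """Average the score that each snapshot assigns to one detection.

    A snapshot that does not find the symbol contributes a score of 0.0,
    so disagreement between snapshots lowers the confidence.
    """
    return sum(snapshot_scores) / len(snapshot_scores)


def confidence_color(confidence, low=0.5, high=0.8):
    """Map a confidence value to a display color (hypothetical thresholds)."""
    if confidence >= high:
        return "green"
    if confidence >= low:
        return "yellow"
    return "red"


# Hypothetical detections with scores from three snapshots each.
detections = {
    "quarter note": [0.95, 0.90, 0.97],  # all snapshots agree
    "slur": [0.80, 0.55, 0.60],          # snapshots agree only weakly
    "accent": [0.70, 0.00, 0.00],        # found by a single snapshot
}

for name, scores in detections.items():
    conf = ensemble_confidence(scores)
    print(f"{name}: confidence {conf:.2f} -> {confidence_color(conf)}")
```

In such a scheme, a human reviewer would only need to inspect the symbols that are not marked green, which is the kind of targeted checking that can shorten manual post-processing.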
The training data has been released as an open research data resource. The final models are in productive use at ScorePad AG, Erlenbach, Switzerland.