DeepScore: Digital Music Stand with Musical Understanding via Active Sheet Technology
At a glance
- Project leader : Prof. Dr. Thilo Stadelmann
- Project team : Dr. Philipp Ackermann, Diego Hernan Browarnik, Dr. Dan Ciresan, Ismail Elezi, Gabriel Eyyi, Prof. Dr. Jürgen Schmidhuber, Lukas Tuggener
- Project budget : CHF 935'000
- Project status : completed
- Funding partner : CTI (KTI-Projekt / Projekt Nr. 17963.1 PFES-ES)
- Project partner : ScorePad AG, Swiss AI Lab IDSIA
- Contact person : Thilo Stadelmann
Description
_Management Abstract
Playing and enjoying music is amongst the most rewarding
recreational activities of humankind for individuals as well as in
group settings. Visiting concerts or sending one’s kids to music
lessons - thus being enabled to discuss and co-shape the musical
part of our culture - are hence important aspects of cultural
participation and social interaction. But with public support for
culture and art in decline, new ways of reducing the cost of, and
easing access to, the variety of our music are
needed.
The CTI-funded DeepScore project - a consortium of Swiss startup ScorePad GmbH, data science leader ZHAW Datalab, and deep learning pioneer IDSIA - set out in mid-2016 to address this need by bringing digitization to the process of playing music, thus securing the quality of life that springs from the viable production and consumption of musical performances.
By stretching the limits of current deep learning methods in computer vision, we aim to lift Optical Music Recognition to a level where musicians can use it to convert musical scores into digital active sheets, allowing for a number of new convenience functions such as playback, transposition, automatic page turning, and intra-orchestra communication. We already see high demand from professional orchestras as well as music schools - all because of the remarkably simple, but currently technically very challenging, idea of putting a tablet computer on the music stand that does more than display pictures of scores.
Why not put a tablet on a music stand instead of a paper sheet?
We know a lot of musicians who raise exactly this question.
Particularly to the younger generation (but not only!) it seems
almost self-evident to use mobile technology not only to support
and improve their musical activities, but also to sustain them and
to enrich communication within the community. So why not just
take a tablet and an app? The short answer: because nothing
appropriate can be found, and in particular digital scores of
sufficient quality are very rare. Below we explain how the
DeepScore project creates technology to digitize music efficiently
and enables a digital music stand for professional and leisure
musicians alike, thus enhancing quality of life through the
enjoyment of playing music.
The DeepScore project presented here is
- innovative and original: for the first time, deep learning methodology is applied to convert traditional music scores into a digital format, promising adequate transcription quality; and
- of general interest: there are huge libraries of works by renowned composers waiting to be converted into a digital format. This will allow music to be spread further using modern technology, while reliably protecting the rights of the owners as appropriate.
_Introduction to DeepScore and its result, the ScorePad app
For centuries now, paper has been the best material for music
scores due to lack of alternatives. Nevertheless, in practice,
certain properties are problematic. After some time of intense use,
paper may become unusable; with a large number of scores,
archiving and searching become cumbersome; turning pages calls for
a free hand; a draft can blow paper away, etc. Musicians have a clear
need for a modern and easy-to-use solution running on regular
tablet computers. Besides replacing paper sheet music, ScorePad
will go beyond existing electronic products, which are mostly
static and PDF-based. There will be individual functions, such as
taking notes, editing, transposition, fading elements in and out,
automatic page turning, and playback, as well as interactive functions
that support and enrich the playing of ensembles and orchestras
and the exchange of ideas among musicians. There is currently no
comparable product on the market that integrates the functions
relevant for musicians.
Our idea is designed by musicians for the specific needs of
musicians. The DeepScore project brings together practitioners and
active orchestras, a start-up (ScorePad GmbH), the IT
research department of a university of applied sciences (ZHAW, AI group
of Thilo Stadelmann and computer music expertise of Philipp
Ackermann), and a fundamental research institute (IDSIA,
group of deep learning pioneer Jürgen Schmidhuber) to make sure
the developed solution fits these needs. The project, with an
overall volume of 935’000 CHF, started in summer 2016 and will run for
approximately 2 years before delivering a marketable result, funded
by CTI.
_What will be the effect?
We foresee the following three specific effects of ScorePad on the
cultural scene:
New technology enables digitization in the music industry. Our deep
learning-based technology opens the way to entirely new functions
one would not have thought of when using paper. Some new
functions seem obvious, e.g.
- automatic page turning (most musicians have no free hand, and some do not even have a free foot),
- synchronization of entire orchestras (e.g. conductor can give precise indications to musicians in real time),
- exchange of ideas in the community (e.g. to prepare a performance, or to share individual interpretations with peers),
- transposition (e.g. when a teacher wants to adapt a violin piece for a cello pupil),
- support of education (learning programs, track-and-feedback).
New needs will certainly appear once our new system is in use. Once the system is developed, it will considerably simplify the daily work of practitioners and students, giving them easier access to their genuine activity: playing music.
Digitized Scores. A precondition for such a system to function
is the availability of digitized scores in good quality. The
established standard is MusicXML [1]. Today, digital scores in good
quality are scarce. So far, music publishers have been very cautious
about scores in non-traditional formats. Even official offerings in
PDF are rare. The reason is simple: illegal copying. Paper is
photocopied, and PDF documents can be copied and sent around the
world even more easily. In this context, several points need to be
mentioned.
- Currently, a musician in need of a score in digital format (MusicXML) has three options: (1) download it from the internet, if it exists there, which is quite rare; (2) use scanning software, but the resulting quality is poor, and around 25% of the content must be corrected by hand; (3) use advanced composition software (Sibelius or Finale) and type in the entire score by hand, which can be quite cumbersome.
- Scores in a logical digitized format allow the implementation of a proper digital rights management (DRM), and can thus safely be copy protected. This will open the door to cooperation with music editors.
- DeepScore will contribute to an efficient conversion of printed scores into a digital format (MusicXML). It is planned to offer the conversion feature either to users who acquired a score in a traditional format, or directly to music editors, to allow them to convert their own library.
- Thanks to the result of the DeepScore project, a musician will be free to choose the specific music he or she really needs, and to practice with one single application. Nowadays a musician needs a separate app for almost every function, and is restricted to the scores offered within a given application (e.g. the Henle app only allows downloading and reading Henle scores).
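To make the notion of a "logical digitized format" concrete, the following minimal sketch builds a MusicXML document containing a single whole note, using only Python's standard library. The function name `build_single_note_score` and the part name "Music" are illustrative assumptions; this is not code from the DeepScore project, and a real score needs many more elements (clef, key, time signature).

```python
import xml.etree.ElementTree as ET


def build_single_note_score(step: str, octave: int) -> ET.Element:
    """Build a minimal MusicXML score-partwise tree holding one whole note.

    Hypothetical helper for illustration only.
    """
    score = ET.Element("score-partwise", version="3.1")

    # The part list declares every part that appears in the score.
    part_list = ET.SubElement(score, "part-list")
    score_part = ET.SubElement(part_list, "score-part", id="P1")
    ET.SubElement(score_part, "part-name").text = "Music"

    part = ET.SubElement(score, "part", id="P1")
    measure = ET.SubElement(part, "measure", number="1")

    # One division per quarter note, so a whole note lasts 4 divisions.
    attributes = ET.SubElement(measure, "attributes")
    ET.SubElement(attributes, "divisions").text = "1"

    # The note is stored as pitch and duration, not as pixels on a page.
    note = ET.SubElement(measure, "note")
    pitch = ET.SubElement(note, "pitch")
    ET.SubElement(pitch, "step").text = step
    ET.SubElement(pitch, "octave").text = str(octave)
    ET.SubElement(note, "duration").text = "4"
    ET.SubElement(note, "type").text = "whole"
    return score


score = build_single_note_score("C", 4)
print(ET.tostring(score, encoding="unicode"))
```

Because the pitch is stored as data rather than as an image, functions such as playback or transposition can operate directly on the `step` and `octave` values, which is not possible with a scanned PDF.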
Paper & Environment. It is well known that the production of
paper is rather resource intensive (primarily wood, and energy).
Traditional sheet music uses a lot of paper. According to one of
the big German players in the music publishing industry, “[a]bout
130,000 titles (scores, books, recordings) […] are shipped from
Mainz worldwide every year” [2]. There is no doubt that there is
potential for a reduction in paper consumption by introducing this
new technology.
_What are the scientific issues, and where is the innovation?
Deep neural networks [3] have disrupted the world of computer
vision, which became obvious to a wider audience at the latest with
Alex Krizhevsky’s success on the ImageNet task in 2012 [4]. The
corresponding gain in perceptual performance on a wide variety of
vision tasks has often been such that error rates could be halved
or even be improved by an order of magnitude, for example on the
task of OCR (optical character recognition, the process from image
to machine-readable text) [5]. Given this, and given the
previously described performance gap in the related OMR technology
(optical music recognition), which has not yet profited from the
deep learning surge, it is appealing to improve OMR by deep
learning.
The main challenge is that application is not straightforward:
While in classical OCR, the text is basically a 1D signal (symbols
to be recognized are organized in lines of fixed height, in which
they extend from left to right or vice versa), musical notation can
additionally be stacked arbitrarily also on the vertical axis, thus
becoming a 2D signal. This would exponentially increase the number
of symbol combinations to be recognized if approached the usual way,
which is intractable from a computational as well as from a
classification point of view. The alternative route we are taking is
to enhance the usual convolutional neural network (CNN) approach with
techniques like end-to-end learning of a combined
detection-classifier in order to overcome this challenge.
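The growth described above can be illustrated with a small counting sketch. It rests on a simplifying assumption that is not taken from the project: each vertical slot of one column of the score is either empty or holds one of a fixed number of symbol types, and a line-based recognizer would need one class per distinct non-empty column. The function name and the numbers are hypothetical.

```python
def stacked_class_count(symbol_types: int, vertical_slots: int) -> int:
    """Count distinct non-empty columns under the simplified stacking model.

    Each of the `vertical_slots` positions is empty or holds one of
    `symbol_types` symbols; the all-empty column is excluded.
    """
    return (symbol_types + 1) ** vertical_slots - 1


# With one slot (the 1D, OCR-like case) the class count equals the alphabet size.
# Each additional slot multiplies the number of possible columns.
for slots in (1, 2, 4, 8):
    print(slots, stacked_class_count(symbol_types=10, vertical_slots=slots))
```

Under this model the number of classes grows exponentially with the number of vertical slots, which is why detecting and classifying individual symbols is more attractive than classifying whole columns.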
The OMR problem is not solved, though, when a deep neural network has learned to recognize single musical symbols: the individual symbols have to be assembled into a complete score, which is usually done using a lot of musical understanding, encoded into rules. To focus on the recognition part of the DeepScore innovation, we are using the open source OMR system Audiveris [6] for the overarching process of creating MusicXML. We are in active discussion with the Audiveris project leader and will contribute our findings to the project, thus giving back to the music community. Our very first prototype, after one third of the project duration, already outperforms the best-in-class open source system by a large margin on the set of symbols that is hardest to classify. A workshop for the computer music open source community is planned for the summer to discuss these findings and further ideas, and to disseminate them widely within the community, creating even more impact.
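As an illustration of what such a rule can look like, the sketch below maps the vertical position of a detected notehead to a pitch name. It assumes a treble clef, no accidentals or key signature, and known staff geometry; the names `Staff` and `pitch_on_treble_staff` are hypothetical, and this is not how Audiveris or DeepScore implement the step.

```python
from dataclasses import dataclass

# Diatonic step names, starting at C within each octave.
STEP_NAMES = "CDEFGAB"


@dataclass
class Staff:
    bottom_line_y: float  # pixel y of the lowest staff line (y grows downwards)
    line_spacing: float   # pixel distance between adjacent staff lines


def pitch_on_treble_staff(notehead_y: float, staff: Staff) -> str:
    """Map a notehead's vertical centre to a pitch name (treble clef, no accidentals)."""
    # Each line-to-space step is half a line spacing; moving up decreases y.
    half_spacing = staff.line_spacing / 2
    steps_above_bottom = round((staff.bottom_line_y - notehead_y) / half_spacing)
    # The bottom line of a treble staff is E4: diatonic index 2 (E) in octave 4.
    diatonic_index = 4 * 7 + 2 + steps_above_bottom
    octave, step = divmod(diatonic_index, 7)
    return f"{STEP_NAMES[step]}{octave}"


staff = Staff(bottom_line_y=100.0, line_spacing=10.0)
print(pitch_on_treble_staff(80.0, staff))  # notehead on the middle line
```

A complete system needs many more such rules (clefs, key signatures, accidentals, rhythm, voices), which is why the assembly step relies on an existing rule-based system.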
_Conclusion
ScorePad is a tablet solution for musicians, who can use the scores
of their choice, work as they are used to, exchange with peers, and
synchronize orchestras. If they wish to use the new technology,
they will be able to do so with one single application. This
alone will contribute to controlling the costs of professional
musical offerings like classical concerts, keeping them affordable for
the wider public. ScorePad is also of great value to music teachers
and students of all levels, making learning and thus cultural
participation easier and cheaper. Thus, ScorePad will offer a new
system to the community of musicians, who are an important part of the
living culture in our society. Additionally, the included exchange
platform will support users in communicating with
each other.
By providing an efficient conversion procedure, the DeepScore project will lay the groundwork for the availability of digitized scores, which are a precondition for such a system. DeepScore does so by applying and enhancing state-of-the-art deep neural networks that have proven invaluable in other areas, but whose application to the 2D structure of musical notation is not straightforward. There are huge traditional music libraries (some can be considered a heritage of humanity) waiting to be converted into a modern, usable format [7]. All this is done by going considerably beyond the current state of the art in deep learning and optical music recognition, in a collaboration project between industry and two academic partners with a real product as the result.
_References
[1] www.musicxml.com
[2] en.schott-music.com/about/
[3] Schmidhuber, “Deep learning in neural networks”, 2014
[4] Krizhevsky, Sutskever, Hinton, “ImageNet classification with deep convolutional neural networks”, 2012
[5] Lee, Osindero, “Recursive Recurrent Nets With Attention Modeling for OCR in the Wild”, 2016
[6] Audiveris open music scanner, audiveris.kenai.com
[7] see e.g. dme.mozarteum.at/DME/main/
Publications
- Elezi, Ismail; Tuggener, Lukas; Pelillo, Marcello; Stadelmann, Thilo, 2018. DeepScores and Deep Watershed Detection: current state and open issues. In: Proceedings of the 1st International Workshop on Reading Music Systems. 1st International Workshop on Reading Music Systems at ISMIR 2018, Paris, France, 20 September 2018. Paris: Society for Music Information Retrieval. pp. 13-14. Available from: https://doi.org/10.21256/zhaw-4777
- Tuggener, Lukas; Elezi, Ismail; Schmidhuber, Jürgen; Stadelmann, Thilo, 2018. Deep watershed detector for music object recognition. In: Proceedings of the 19th International Society for Music Information Retrieval Conference. 19th International Society for Music Information Retrieval Conference, Paris, 23-27 September 2018. Paris: Society for Music Information Retrieval. Available from: https://doi.org/10.21256/zhaw-3760
- Tuggener, Lukas; Elezi, Ismail; Schmidhuber, Jürgen; Pelillo, Marcello; Stadelmann, Thilo, 2018. DeepScores: a dataset for segmentation, detection and classification of tiny objects. In: Proceedings of the 24th International Conference on Pattern Recognition. 24th International Conference on Pattern Recognition (ICPR 2018), Beijing, China, 20-28 August 2018. Beijing: IAPR. pp. 1-6. Available from: https://doi.org/10.21256/zhaw-4255