Machine Perception and Cognition Group

“AI is THE key technology of the digital transformation, across sectors and industries, with major effects on our societies. Our research thus makes major contributions to the development of robust and trustworthy AI methods, and we enthusiastically teach their safe implementation and application.”
Fields of expertise

- Pattern recognition with deep learning
- Machine perception, computer vision and speaker recognition
- Neural system development
The MPC group conducts pattern recognition research, working on a wide variety of tasks relating to image, audio, and signal data in general. We focus on deep neural network and reinforcement learning methodology, inspired by biological learning. Each task we study has its own learning target (e.g., detection, classification, clustering, segmentation, novelty detection, control) and corresponding use case (e.g., predictive maintenance, speaker recognition for multimedia indexing, document analysis, optical music recognition, computer vision for industrial quality control, automated machine learning, deep reinforcement learning for automated game play or building control), which in turn sheds light on different aspects of the learning process. We use this experience to create increasingly general AI systems built on neural architectures.
Services
- Insight: keynotes, trainings
- AI consultancy: workshops, expert support, advice, technology assessment
- Research and development: small to large-scale collaborative projects, third party-funded research, student projects, commercially applicable prototypes
Team
Head of Research Group
Projects
-
TAILOR – Foundations of Trustworthy AI - Integrating Reasoning, Learning and Optimization
The main ambition of TAILOR is to build the capacity of providing the scientific foundations for Trustworthy AI in Europe by developing a network of research excellence centers with a technical focus on combining research excellence in the areas of learning, optimisation and reasoning. The current…
completed, 01/2020 - 12/2021
-
DeepScore: Digital Music Stand with Musical Understanding via Active Sheet Technology
Playing and enjoying music is amongst the most rewarding recreational activities of humankind for individuals as well as in group settings. Visiting concerts or sending one’s kids to music lessons - thus being enabled to discuss and co-shape the musical part of our culture - are…
completed, 07/2016 - 01/2019
-
DIR3CT: Deep Image Reconstruction through X-Ray Projection-based 3D Learning of Computed Tomography Volumes
Project DIR3CT aims at improving the image quality of CBCT images by learning the 3D reconstruction from X-ray images end-to-end with deep learning (DL). This enables a novel CBCT product to be used during radiation therapy and will allow the use of these images for adaptive treatment.
completed, 02/2020 - 05/2022
-
RealScore – Scanning of Real-World Sheet Music for a Digital Music Stand
ScorePad’s sheet music scanning service works for high quality input; to scale up business, it should work as well for smartphone pictures, used sheets etc. Project RealScore enhances the successful predecessor project by making deep learning adapt to unseen data through unsupervised learning.
completed, 09/2019 - 05/2022
-
Visual Food Waste Analysis for Sustainable Kitchens (FWA)
A novel approach for a fully automated food waste management solution for commercial kitchens is investigated. Food waste is automatically detected using a new camera device, preprocessed in real-time and classified using machine learning algorithms.
completed, 07/2019 - 09/2021
-
Synthetic data generation of COVID-19 CT/X-ray images for enabling fast triage of healthy vs. unhealthy patients
The automatic analysis of X-ray/CT images through artificial intelligence models can be useful to automate the clinical scanning procedure. Nonetheless, the limited access to real COVID patient data creates a need to synthesize image samples. The goal of this project is to use existing CT/X-ray…
completed, 05/2020 - 07/2020
-
3D-Master for a Digitized Manufacturing Platform
We enhance Bossard's Real Time Manufacturing Services by automatically creating quotes for special parts. The core is an AI-created 3D Master that unifies all available part information, enabling pricing and feasibility evaluation for many manufacturing technologies incl. additive…
completed, 12/2022 - 05/2025
-
Feasibility Study Reinforcement Learning for Heating Systems
completed, 10/2018 - 10/2020
-
MobileMall
Developing Intelligent Demand&Supply Routing in a Virtual Mobile Mall for Local Retailers
completed, 12/2013 - 02/2015
-
FarmAI – Artificial intelligence for Farming Simulator
completed, 12/2016 - 05/2018
-
Automated Article Segmentation of Newspaper Pages for "Real Time Print Media Monitoring" (PANOPTES)
The new product of ARGUS DATA INSIGHTS Schweiz AG, "Real Time Print Media Monitoring", is an automated pipeline. It identifies relevant articles in print media, extracts them and sends them to the customers in real time. The core of this project is the automated segmentation of full newspaper…
completed, 07/2015 - 09/2017
-
AI-BRIDGE - A Think-and-Do-Tank for Responsible Development and Societal Alignment of Artificial Intelligence Systems (AI-BRIDGE)
AI-BRIDGE brings responsible AI to the ground by bridging the gap between societal values and development of AI technology and solutions. The AI-BRIDGE Think-and-Do Tank will help organizations to exploit the potential of AI while complying with legal requirements and being compatible with societal…
ongoing, 04/2025 - 12/2029
-
AI powered CBCT for improved Combination Cancer Therapy (AC3T)
The project enables a novel adaptive cancer therapy that combines tumor treating fields and radiation therapy. This is made possible by significantly improved static (3D) and time-resolved (4D) low-dose Cone Beam Computed Tomography images based on artificial intelligence image reconstruction algorithms.
completed, 05/2022 - 02/2025
-
dAIrector – Automated multi-camera live production for events
The dAIrector automates multi-cam live productions of concerts, theatre, comedy, musicals through creative AI direction following the dramaturgy on stage. It provides small stages, events, festivals, and artists access to a worldwide audience via YourStage.live.
ongoing, 01/2025 - 12/2027
-
DeepText: Intelligent Text Analysis with Deep Learning
DeepText develops a software framework to automatically analyse texts in order to extract important information. The framework comprises modern algorithms from the field of machine learning (deep learning) that are better at analyzing texts than traditional approaches. They can for example be used…
completed, 09/2016 - 02/2018
-
DISTRAL: Industrial Process Monitoring for Injection Molding with Distributed Transfer Learning
We develop a distributed machine learning system to sort out defective plastic parts during production. The main challenge is the transferability of learnt process know-how from case to case; the solution builds on domain adaptation, continual data-centric deep learning and federated edge computing.
completed, 10/2022 - 03/2025
-
Complexity 4.0
Management of complexity in global value creation
completed, 06/2016 - 08/2017
-
Deep-Learning-basierter Spracherkenner mit beschränkten Trainingsdaten (DeLLA)
Speech recognition systems based on Deep Neural Networks (DNN) currently break all records and are already being applied in various products. These systems are normally trained with thousands of hours of training material for applications and languages where these amounts of data are available. In…
completed, 09/2016 - 11/2017
-
Libra: A One-Tool Solution for MLD4 Compliance
Compared with earlier regulations, the 4th European Money Laundering Directive (MLD4) imposes considerably stricter requirements. It compels obliged entities to conduct in-depth screenings of customers and their associations. The Libra Project aims at providing a one-tool solution for meeting MLD4…
completed, 09/2016 - 05/2019
-
Talkalyzer
Share-in-Speech Analysis via Real-Time Speaker Classification
completed, 05/2013 - 11/2014
Publications
-
Amirian, Mohammadreza; Rombach, Katharina; Tuggener, Lukas; Schilling, Frank-Peter; Stadelmann, Thilo,
2019.
Efficient deep CNNs for cross-modal automated computer vision under time and space constraints.
In:
ECML-PKDD 2019, Würzburg, Germany, 16-19 September 2019.
ZHAW Zürcher Hochschule für Angewandte Wissenschaften.
Available from: https://doi.org/10.21256/zhaw-18357
-
Hollenstein, Lukas; Lichtensteiger, Lukas; Stadelmann, Thilo; Amirian, Mohammadreza; Budde, Lukas; Meierhofer, Jürg; Füchslin, Rudolf Marcel; Friedli, Thomas,
2019.
Unsupervised learning and simulation for complexity management in business operations.
In:
Braschler, Martin; Stadelmann, Thilo; Stockinger, Kurt, eds.,
Applied data science : lessons learned for the data-driven business.
Cham:
Springer.
pp. 313-331.
Available from: https://doi.org/10.1007/978-3-030-11821-1_17
-
Elezi, Ismail; Tuggener, Lukas; Pelillo, Marcello; Stadelmann, Thilo,
2018.
DeepScores and Deep Watershed Detection : current state and open issues.
In:
Proceedings of the 1st International Workshop on Reading Music Systems.
1st International Workshop on Reading Music Systems at ISMIR 2018, Paris, France, 20 September 2018.
Paris:
Society for Music Information Retrieval.
pp. 13-14.
Available from: https://doi.org/10.21256/zhaw-4777
-
Stadelmann, Thilo; Glinski-Haefeli, Sebastian; Gerber, Patrick; Dürr, Oliver,
2018.
Capturing suprasegmental features of a voice with RNNs for improved speaker clustering.
In:
Artificial Neural Networks in Pattern Recognition.
8th IAPR TC3 Workshop on Artificial Neural Networks in Pattern Recognition (ANNPR), Siena, Italy, 19-21 September 2018.
Springer.
pp. 333-345.
Lecture Notes in Computer Science ; 11081.
Available from: https://doi.org/10.1007/978-3-319-99978-4_26
-
Stadelmann, Thilo; Amirian, Mohammadreza; Arabaci, Ismail; Arnold, Marek; Duivesteijn, Gilbert François; Elezi, Ismail; Geiger, Melanie; Lörwald, Stefan; Meier, Benjamin Bruno; Rombach, Katharina; Tuggener, Lukas,
2018.
Deep learning in the wild.
In:
Artificial Neural Networks in Pattern Recognition.
8th IAPR TC3 Workshop on Artificial Neural Networks in Pattern Recognition (ANNPR), Siena, Italy, 19-21 September 2018.
Springer.
pp. 17-38.
Lecture Notes in Computer Science ; 11081.
Available from: https://doi.org/10.1007/978-3-319-99978-4_2
Other releases
When | Type | Content |
---|---|---|
2023 | Extended Abstract | Thilo Stadelmann. KI als Chance für die angewandten Wissenschaften im Wettbewerb der Hochschulen. Workshop (“Atelier”) at the Bürgenstock-Konferenz der Schweizer Fachhochschulen und Pädagogischen Hochschulen 2023, Lucerne, Switzerland, 20 January 2023. |
2022 | Extended Abstract | Christoph von der Malsburg, Benjamin F. Grewe, and Thilo Stadelmann. Making Sense of the Natural Environment. Proceedings of the KogWis 2022 - Understanding Minds Biannual Conference of the German Cognitive Science Society, Freiburg, Germany, September 5-7, 2022. |
2022 | Open Research Data | Felix M. Schmitt-Koopmann, Elaine M. Huang, Hans-Peter Hutter, Thilo Stadelmann, and Alireza Darvishy. FormulaNet: A Benchmark Dataset for Mathematical Formula Detection. One unsolved sub-task of document analysis is mathematical formula detection (MFD). Research by ourselves and others has shown that existing MFD datasets with inline and display formula labels are small and have insufficient labeling quality. There is therefore an urgent need for datasets with better quality labeling for future research in the MFD field, as they have a high impact on the performance of the models trained on them. We present an advanced labeling pipeline and a new dataset called FormulaNet. At over 45k pages, we believe that FormulaNet is the largest MFD dataset with inline formula labels. Our dataset is intended to help address the MFD task and may enable the development of new applications, such as making mathematical formulae accessible in PDFs for visually impaired screen reader users. |
2020 | Open Research Data | Lukas Tuggener, Yvan Putra Satyawan, Alexander Pacha, Jürgen Schmidhuber, and Thilo Stadelmann. DeepScoresV2. The DeepScoresV2 dataset for music object detection contains digitally rendered images of written sheet music, together with the corresponding ground truth to fit various types of machine learning models. A total of 151 million instances of music symbols, belonging to 135 different classes, are annotated. The full dataset contains 255,385 images. For most researchers, the dense version, containing 1,714 of the most diverse and interesting images, is a good starting point. |