Key areas of research
“Methods for computer vision detect cancer cells or help blind people to navigate through town, chatbots answer customer questions 24 hours a day, and self-driving cars are set to radically change our transport systems. Our research strives for excellence through practical applicability.”
At the CAI, we focus on machine learning and deep learning methodology. In our experience, breakthroughs in one use case tend to translate well to different domains, current AI methodology being largely sector-independent. We apply our expertise in the following areas:
Health and medicine, Industry 4.0, robotics, predictive maintenance, automated quality control, document analysis, and other data science use cases in industries including manufacturing, finance and insurance, retail, transportation, digital farming, weather forecasting, earth observation, and many more.
- Reinforcement learning
- Multi-agent systems
- Embodied AI
In the field of autonomous learning systems research, we investigate the design and development of intelligent systems, specifically those that form a feedback loop between perception (the processing of incoming sensor data) and action (the execution of actions that influence the environment being perceived), the so-called perception-action loop. An important methodology in this context is (deep) reinforcement learning, which allows agents to learn through trial and error. In the future, this reward-based type of learning will open up entirely new areas of application beyond the traditional learning from pairs of inputs and hand-engineered outputs in many industries, for example in industrial production or in the field of neurotechnology. Interconnecting such systems with hardware equipped with the required sensors and actuators creates additional training potential for the algorithms of autonomous systems through physical interaction (embodiment, e.g. in a robotic device).
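The trial-and-error loop described above can be sketched with tabular Q-learning on a toy environment. Everything here (the 1-D corridor, the reward of 1 at the goal, the hyperparameters) is an illustrative assumption, not one of our research systems; it only shows how perception (the observed next state and reward) feeds back into action selection.

```python
import random

# Toy perception-action loop: an agent in a 1-D corridor of 5 cells
# learns by trial and error to reach the rightmost cell (the goal).
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]  # step left, step right

def step(state, action):
    """Environment: apply the action, return (next_state, reward)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0)

def train(episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

    def greedy(s):  # best-known action, ties broken at random
        best = max(q[(s, a)] for a in ACTIONS)
        return rng.choice([a for a in ACTIONS if q[(s, a)] == best])

    for _ in range(episodes):
        s = 0
        while s != GOAL:
            # epsilon-greedy action selection: mostly exploit, sometimes explore
            a = rng.choice(ACTIONS) if rng.random() < eps else greedy(s)
            s2, r = step(s, a)
            # Q-learning update: the perceived outcome (s2, r) adjusts the action values
            q[(s, a)] += alpha * (r + gamma * max(q[(s2, b)] for b in ACTIONS) - q[(s, a)])
            s = s2
    return q

q = train()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

After training, the learned policy steps right in every non-goal state, even though the agent was never told the goal's location, only rewarded for reaching it.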
For the globally successful "Farming Simulator" video game series developed by GIANTS Software GmbH, artificial intelligence (AI) has made possible a new, continually entertaining, easily extendable game mode. In this project, reinforcement learning algorithms find suitable action strategies by simulating games.
While in living beings, vision is an active process whereby image acquisition and classification are intertwined to gradually refine perception, much of today’s computer vision is built on the inferior paradigm of episodic classification of i.i.d. samples. Our aim is to enable improved scene understanding for robots while taking the sequential nature of seeing over time into account. We present a combined supervised and reinforcement learning multi-task approach to answer questions about different aspects of a scene, such as the relationship between objects, their quantity or their relative positions to the camera.
- Pattern recognition
- Machine perception
- Neuromorphic engineering
The focus of the computer vision, perception and cognition area is on generating semantic understanding from high-dimensional input. This is achieved by learning and then finding essential patterns using machine learning methodology and, specifically, deep neural networks in a data-driven way (pattern recognition). Input sources include data from images, videos and other multimedia signals, but also multi-dimensional data series from any technical and non-technical field. Methodologically, classification, semantic segmentation and object detection play a role in the analysis of the input or corresponding generative models for their synthesis (e.g. using generative adversarial networks). Biology-inspired ideas from the field of neuroscience are used to further develop the methodology (neuromorphic engineering).
We have built a sheet music scanning service that ensures high-quality input. To increase market penetration, we plan to extend its use to smartphone images, used sheets and other sources. Project RealScore enhances the successful precursor project by making deep learning adapt to previously unseen data through unsupervised learning.
Due to the methodology's vast success on a wide range of machine perception tasks, deep learning with neural networks is applied by an increasing number of actors outside classic research environments. While this interest is fuelled by exciting success stories, practical work in deep learning on novel tasks without existing baselines remains challenging. This paper explores specific challenges that arise in the realm of real-world tasks, based on case studies from research and development in conjunction with industry, and extracts the lessons learned.
- Trustworthy machine learning
- Robust deep learning
- AI & society
In this strategic focus area, we investigate deep learning approaches that meet the special requirements of professional (e.g. industrial or medical) practice with regard to the trustworthiness of the methods used. On the one hand, this means achieving results that are robust to slightly varying input (e.g. due to creeping changes in the environment, such as covariate shift and concept drift, or due to adversarial attacks) and robust despite small training sets ("small data" or "learning from little", e.g. through better transfer learning and the use of self-supervised and unsupervised learning). On the other hand, it is important to make both the learned models and the training process itself explainable (explainable AI, XAI). This fosters the trust of users and affected persons in the system (trustworthy AI). The fulfilment of regulatory requirements (certifiability) and social requirements (e.g. the avoidance of algorithmic bias) is also a subject of research in this area.
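One ingredient of such robustness is detecting when the input distribution has drifted away from the training data. A minimal sketch, under the simplifying assumption of a single 1-D input feature and Gaussian-like data (both assumptions ours, for illustration), is a two-sample test on the feature mean:

```python
import math
import random
import statistics

def drift_score(reference, live):
    """Two-sample z-statistic on the mean of a 1-D input feature.

    A large absolute score signals covariate shift: the live input
    distribution no longer matches the reference (training) data.
    """
    m_r, m_l = statistics.mean(reference), statistics.mean(live)
    se = math.sqrt(statistics.variance(reference) / len(reference)
                   + statistics.variance(live) / len(live))
    return abs(m_r - m_l) / se

rng = random.Random(42)
reference = [rng.gauss(0.0, 1.0) for _ in range(500)]  # training-time feature values
stable    = [rng.gauss(0.0, 1.0) for _ in range(500)]  # same distribution at run time
shifted   = [rng.gauss(0.8, 1.0) for _ in range(500)]  # creeping change in the environment

print(drift_score(reference, stable))   # small: no alarm
print(drift_score(reference, shifted))  # large: covariate shift detected
```

In practice, such monitors run over model features rather than raw inputs and use multivariate tests, but the principle of comparing a live window against a reference window is the same.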
QualitAI researches and develops a device for the automatic quality control of industrial products such as cardiac balloon catheters. This is facilitated through innovation in analysing camera images using deep learning, specifically in rendering the resulting model robust and interpretable.
The existence of adversarial attacks on convolutional neural networks (CNN) raises doubts about the fitness of such models for serious applications. Such attacks can manipulate input images so that misclassification is evoked while the images continue to look normal to the human observer—they are therefore not easily detectable. In a different context, backpropagated activations of CNN hidden layers—“feature responses” to a given input—have been helpful in visualising for a human “debugger” what the CNN “looks at” while computing its output. In this paper, we propose a novel detection method for adversarial examples to prevent attacks.
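The general idea of screening a model's internal responses can be illustrated with a toy outlier detector. This sketch is our own simplification, not the paper's method: a fixed random ReLU layer stands in for a CNN hidden layer, and inputs whose feature response lies far from the responses observed on normal data are flagged.

```python
import math
import random

# Illustrative only: a fixed random ReLU layer stands in for a CNN hidden
# layer; the detector flags inputs whose feature response lies far from
# the responses seen on normal data.
rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(8)]  # 4 inputs -> 8 features

def features(x):
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W]

def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def distance(f, c):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, c)))

# Calibrate on normal inputs: record the typical distance of feature
# responses from their centroid.
normal = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
resp = [features(x) for x in normal]
c = centroid(resp)
dists = sorted(distance(f, c) for f in resp)
threshold = dists[int(0.95 * len(dists))]  # 95th percentile of normal distances

def is_suspicious(x):
    return distance(features(x), c) > threshold

probe = [5.0, -5.0, 5.0, -5.0]  # an out-of-distribution probe input
print(is_suspicious(probe))
```

Real adversarial examples are crafted to look normal in input space, which is precisely why the paper inspects hidden-layer feature responses rather than the images themselves.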
- Data-Centric AI
- Continuous Learning
This strategic focus topic examines development methods and processes for implementing and deploying the methods from the other focus topics in actual systems. On the one hand, this includes developing and transferring know-how and tools that support the development (development environments, debugging, etc.) and operation (e.g. deployment, life cycle management) of machine learning systems, as well as the interaction between the two (MLOps, GPU cluster computing, etc.). On the other hand, we seek new ways to develop such systems with a focus on optimising the available data (data-centric AI) and to support the collection, processing and labelling of data with machine learning. A particular focus is the further development of systems that learn continuously instead of being trained once and then used indefinitely (continuous learning). The findings from this focus topic flow continuously into open-source products and demonstrators.
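The contrast between train-once deployment and continuous learning can be sketched with a classifier that updates on every labelled sample it receives. The class, its name, and the two-class quality-control data stream are illustrative assumptions, not an actual CAI system:

```python
from collections import defaultdict

class ContinualCentroidClassifier:
    """Nearest-centroid classifier (2-D features for brevity) that updates
    with every labelled sample, instead of being trained once and then
    deployed unchanged."""

    def __init__(self):
        self.sums = defaultdict(lambda: [0.0, 0.0])  # per-class feature sums
        self.counts = defaultdict(int)               # per-class sample counts

    def update(self, x, label):
        # One step of continuous learning: fold the new sample into the
        # running centroid of its class.
        s = self.sums[label]
        s[0] += x[0]; s[1] += x[1]
        self.counts[label] += 1

    def predict(self, x):
        def dist(label):
            c, n = self.sums[label], self.counts[label]
            return (x[0] - c[0] / n) ** 2 + (x[1] - c[1] / n) ** 2
        return min(self.counts, key=dist)

clf = ContinualCentroidClassifier()
stream = [((0.0, 0.1), "ok"), ((0.2, 0.0), "ok"),
          ((1.0, 1.1), "defect"), ((0.9, 1.0), "defect")]
for x, y in stream:  # data arrives over time; the model never "finishes" training
    clf.update(x, y)

print(clf.predict((0.1, 0.1)))  # -> "ok"
print(clf.predict((1.0, 1.0)))  # -> "defect"
```

The same interface (interleaved `update` and `predict` calls) is what distinguishes a continually learning system from one with a frozen training phase, whatever model sits behind it.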
Researchers from the CAI and InES are jointly investigating the opportunities for bundling process knowledge about injection molding processes in neural networks and transferring it to new application scenarios as part of a technical deep dive. The groups of Prof. Stadelmann (Computer Vision, Perception & Cognition, ZHAW CAI) and Prof. Rosenthal (Realtime Platforms, ZHAW InES) recently teamed up with the Kistler Innovation Lab to explore risks and opportunities for improving automatic quality control and process monitoring in plastic injection molding using advanced machine learning. In particular, transfer learning / domain adaptation and continuous learning in neural networks are being evaluated for small data scenarios. The team secured funding for a technical deep dive through the NTN Databooster from the Data Innovation Alliance and Innosuisse for a three-month study. The results will inform the design and implementation of future Kistler products in another joint project. The NTN Databooster supports data-driven innovation by helping companies find suitable research partners for their challenges, identify new solutions, assess feasibility and risks in technical depth, and finally apply for appropriate funding for joint R&D projects. CAI staff members have been active in various capacities since the founding of the Databooster and its supporting organisation, the Data Innovation Alliance, and have successfully completed several R&D projects in this framework.
We present an extensive evaluation of a wide variety of promising design patterns for automated deep-learning (AutoDL) methods, organized according to the problem categories of the 2019 AutoDL challenges, which set the task of optimizing both model accuracy and search efficiency under tight time and computing constraints. We propose structured empirical evaluations as the most promising avenue to obtain design principles for deep-learning systems due to the absence of strong theoretical support. From these evaluations, we distill relevant patterns which give rise to neural network design recommendations. In particular, we establish (a) that very wide fully connected layers learn meaningful features faster; we illustrate (b) how the lack of pretraining in audio processing can be compensated by architecture search; we show (c) that in text processing deep-learning-based methods only pull ahead of traditional methods for short text lengths with less than a thousand characters under tight resource limitations; and lastly we present (d) evidence that in very data- and computing-constrained settings, hyperparameter tuning of more traditional machine-learning methods outperforms deep-learning systems.
- Dialogue systems
- Text analytics
- Speech Processing
The focus area natural language processing investigates the machine understanding of human language in spoken (spoken language processing, automatic speech recognition, speaker diarization) and written form (natural language processing, text analytics), as well as the use of corresponding machine learning methods, such as transformers. This research is conducted in the context of dialogue systems and aims to enable natural language communication between humans and machines. Particular attention is paid to the development and availability of adapted methods and models for dialects and rare languages for which only limited training data is available. Further research areas include text classification (e.g. sentiment analysis), chatbots and natural language generation.
By developing child-like avatars for the training of interrogators of children, this project manages to close significant knowledge gaps regarding the effectiveness of individual training elements and personal influencing variables. The findings and the training tool can be used for education and training purposes as well as for the recruitment of personnel. The project’s findings can be used as a basis to improve interrogation practices and help to meet the international demand for a child-friendly justice system.
In this paper, we survey methods and concepts developed for the evaluation of dialogue systems. Evaluation, in and of itself, is a crucial part of the development process. Often, dialogue systems are assessed by means of human evaluation and questionnaires. However, these methods tend to be very expensive and time-consuming, which is why there have been intense efforts to find methods that reduce the amount of human labour involved.
Bürgenstock-Konferenz of the Swiss Universities of Applied Sciences and Universities of Teacher Education, Lucerne, Switzerland, 20-21 January 2023.
Ali, Waqar; Vascon, Sebastiano; Stadelmann, Thilo; Pelillo, Marcello.
Proceedings of ACM SAC Conference (SAC’23).
2nd Graph Models for Learning and Recognition (GMLR 2023) Track at the 38th ACM/SIGAPP Symposium on Applied Computing (SAC 2023), Tallinn, Estonia, 27 March - 2 April 2023.
Association for Computing Machinery.
von Däniken, Pius; Deriu, Jan Milan; Agirre, Eneko; Brunner, Ursin; Cieliebak, Mark; Stockinger, Kurt.
5th International Conference on Natural Language and Speech Processing (ICNLSP), online, 16-17 December 2022.
ZHAW Zürcher Hochschule für Angewandte Wissenschaften.
Available from: https://doi.org/10.21256/zhaw-26147
Wertz, Lukas; Bogojeska, Jasmina; Mirylenka, Katsiaryna; Kuhn, Jonas.
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers).
2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (AACL-IJCNLP), online, 20-23 November 2022.
Association for Computational Linguistics.
Available from: https://doi.org/10.21256/zhaw-26577
von der Malsburg, Christoph; Grewe, Benjamin F.; Stadelmann, Thilo.
The Biannual Conference of the German Cognitive Science Society (KogWis), Freiburg, Germany, 5-7 September 2022.
Available from: https://stdm.github.io/downloads/papers/KogWis_2022.pdf
Accessible Scientific PDFs for All
PDF is the most popular document format to provide and distribute information on the internet. It was developed by Adobe in 1996 but has been an open format since 2008. It was estimated in 2015 that more than 2.5 trillion PDF documents exist on the internet, covering all aspects of life and research, and their number ...
Speech-to-Text for Swiss German
Synthetic data generation of COVID-19 CT/X-ray images for enabling fast triage of healthy vs. unhealthy patients
The automatic analysis of X-ray/CT images through artificial intelligence models can be useful to automate the clinical scanning procedure. Nonetheless, limited access to real COVID-19 patient data creates the need to synthesize image samples. The goal of this project is to use existing CT/X-ray image datasets ...
Virtual Kids - Virtual characters to improve the quality of child interrogations
When children are questioned in preliminary proceedings about their own experiences or observations relevant to criminal law, the quality of the questioning decisively determines whether their statements can be used in criminal proceedings and whether decisions with appropriate consequences can be made on this basis ...
DIR3CT: Deep Image Reconstruction through X-Ray Projection-based 3D Learning of Computed Tomography Volumes
Project DIR3CT aims at improving the image quality of CBCT images by deep learning (DL) the 3D reconstruction from X-ray images end-to-end. This enables a novel CBCT product to be used during radiation therapy and will allow the use of these images for adaptive treatment.