Swiss German Speech Processing for Persons with Speech Disorders (SpeeDi)

We will develop language technologies for imperfect Swiss German speech. These technologies are crucial for building AI applications for persons with speech disorders, e.g., chatbots for children in speech therapy or diagnostic tools for patients with Parkinson’s disease.

Description

A common application of AI in speech therapy is a chatbot that lets patients practice speaking. The patients’ spoken
responses can be analyzed to track progress in therapy or to help diagnose disorders (e.g., early signs of Parkinson’s disease).

For such tools, four components are needed:

Speech-to-Text (STT): Converts patient speech to text, typically to Standard German.
Large Language Model (LLM): Generates responses to the STT output.
Text-to-Speech (TTS): Converts the responses to Swiss German (SG) audio.
Error Analysis: Identifies speech errors and abnormal phrasing.
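
The interplay of the four components can be sketched as a single conversation turn. This is a minimal illustration, not the project's implementation: the names `run_turn` and `TurnResult` are hypothetical, each component is passed in as a plain function, and the sketch assumes that the error analysis step receives the raw patient audio.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TurnResult:
    """Outputs of one conversation turn (hypothetical container)."""
    transcript: str      # STT output, typically Standard German text
    reply_text: str      # LLM response to the transcript
    reply_audio: bytes   # TTS rendering of the response
    errors: List[str]    # findings of the error analysis step


def run_turn(
    audio: bytes,
    stt: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
    analyze: Callable[[bytes], List[str]],
) -> TurnResult:
    """Chain the four components for a single patient utterance."""
    transcript = stt(audio)          # 1. speech -> text
    reply_text = llm(transcript)     # 2. text -> response text
    reply_audio = tts(reply_text)    # 3. response text -> speech
    errors = analyze(audio)          # 4. assumption: analysis sees the raw audio
    return TurnResult(transcript, reply_text, reply_audio, errors)
```

Passing the components in as functions keeps the sketch independent of any particular model, so each of the four parts can be replaced or tested on its own.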

While STT, LLM, and TTS exist for SG, there is no error analysis system yet.

Our project focuses on:

Error-Preserving STT (EP-STT-SG): Transcribes SG audio to SG text and thereby preserves speech errors, in contrast to
“normal” STT, which usually transcribes to Standard German text and smooths out speech errors.
Error Analysis: Detects speech errors and linguistic abnormalities in EP-STT-SG outputs.
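
To illustrate why an error-preserving transcript matters, the toy sketch below flags two simple disfluency types in a transcript: filled pauses and immediate word repetitions. It is a rule-based stand-in and not the project's error analysis method; the function name `find_disfluencies`, the filled-pause list, and the example sentence are all assumptions made for illustration.

```python
import re
from typing import List, Tuple

# Hypothetical filled-pause inventory; a real system would need a curated list.
FILLED_PAUSES = {"äh", "ähm", "hm"}


def find_disfluencies(transcript: str) -> List[Tuple[int, str, str]]:
    """Return (token index, error type, token) for each simple disfluency."""
    tokens = re.findall(r"\w+", transcript.lower())
    findings = []
    for i, tok in enumerate(tokens):
        if tok in FILLED_PAUSES:
            findings.append((i, "filled_pause", tok))
        elif i > 0 and tok == tokens[i - 1]:
            # The same word twice in a row counts as a repetition.
            findings.append((i, "repetition", tok))
    return findings


# Invented example utterance with one repetition and one filled pause.
print(find_disfluencies("ich ich gang äh hei"))
```

A conventional STT system would likely output a cleaned-up Standard German sentence for the same utterance, leaving nothing for such a check to find, which is why the analysis depends on the error-preserving transcript.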

In addition, we will refine and optimize our existing STT and TTS models for SG, and we will make all resources available for
research and non-research purposes.