Data as of November 12, 2025
Verlag Dr. Hut GmbH, Sternstr. 18, 80538 München, Tel: 0175 / 9263392, Mon - Fri, 9 am - 12 noon
updated on November 12, 2025
ISBN 978-3-8439-5679-6, Series: Electrical Engineering
Marvin Tammen: Combining Model-Based and Learning-Based Approaches for Speech Enhancement
206 pages, dissertation, Carl von Ossietzky Universität Oldenburg (2025), hardcover, B5
In many speech communication devices, such as smartphones, smart speakers, and hearing devices, the microphones capture not only the target speaker but also undesired ambient noise, degrading speech quality and speech intelligibility. Speech enhancement algorithms aim at extracting the target speech from the recorded microphone signals by suppressing noise without distorting the target speech. Over the past decade, there has been a shift from model-based statistical signal processing approaches to learning-based data-driven approaches. Although model-based approaches offer interpretability and theoretical guarantees, they often struggle in complex, real-world acoustic scenarios where their assumptions are violated. In contrast, learning-based approaches generally achieve higher performance in such scenarios due to their strong representation capacity but may lack interpretability, theoretical guarantees, and robustness when the data observed during inference does not match the training data.
Motivated by the potential to combine the interpretability of model-based approaches with the strong representation capacity of learning-based approaches, the primary objective of this thesis is to develop and evaluate hybrid speech enhancement algorithms that employ a learning-based stage to estimate quantities required by a model-based enhancement stage. The main focus is on investigating whether imposing structure on the estimated quantities (such as correlation matrix structure, correlation vector structure, or spatial structure) improves speech enhancement performance and interpretability and reduces computational complexity. Another focus is on developing geometry-robust hybrid speech enhancement algorithms that can operate with arbitrary microphone array configurations. While the developed algorithms can be used for various speech enhancement applications, our focus is on hearing devices, where low latency is crucial.