Data as of 12 November 2025

Verlag Dr. Hut GmbH
Sternstr. 18
80538 München
Tel: 0175 / 9263392
Mon - Fri, 9 am - 12 noon

ISBN 978-3-8439-5679-6

84.00 € (incl. VAT, plus shipping)

Series: Elektrotechnik (Electrical Engineering)

Marvin Tammen
Combining Model-Based and Learning-Based Approaches for Speech Enhancement

206 pages, dissertation, Carl von Ossietzky Universität Oldenburg (2025), hardcover, B5

Abstract

In many speech communication devices, such as smartphones, smart speakers, and hearing devices, the microphones capture not only the target speaker but also undesired ambient noise, degrading speech quality and speech intelligibility. Speech enhancement algorithms aim to extract the target speech from the recorded microphone signals by suppressing noise without distorting the target speech. Over the past decade, there has been a shift from model-based statistical signal processing approaches to learning-based data-driven approaches. Although model-based approaches offer interpretability and theoretical guarantees, they often struggle in complex, real-world acoustic scenarios where their assumptions are violated. In contrast, learning-based approaches generally achieve higher performance in such scenarios due to their strong representation capacity but may lack interpretability, theoretical guarantees, and robustness when the data observed during inference does not match the training data.

Motivated by the potential to combine the interpretability of model-based approaches with the strong representation capacity of learning-based approaches, the primary objective of this thesis is to develop and evaluate hybrid speech enhancement algorithms that employ a learning-based stage to estimate quantities required by a model-based enhancement stage. The main focus is on investigating whether imposing structure on the estimated quantities (such as correlation matrix structure, correlation vector structure, or spatial structure) improves speech enhancement performance and interpretability and reduces computational complexity. Another focus is on developing geometry-robust hybrid speech enhancement algorithms that can operate with arbitrary microphone array configurations. While the developed algorithms can be used for various speech enhancement applications, our focus is on hearing devices, where low latency is crucial.