Data as of 20 June 2024

Updated on 20 June 2024

ISBN 978-3-8439-4245-4

€84.00 incl. VAT, plus shipping

978-3-8439-4245-4, Elektrotechnik (Electrical Engineering) series

Simone Hantke
Intelligent Gamified Crowdsourcing for Audio Processing

219 pages, dissertation, Technische Universität München (2019), softcover, A5

Abstract

Present-day speech assistance systems are an inherent part of our lives. Spontaneous and robust speech analysis systems are now being developed, along with a broad range of machine learning algorithms and advanced evaluation methods that support researchers during data analysis. However, the success of these technologies depends in part on the amount and quality of labelled training data. A large number of speakers and annotators is therefore required, and a substantial investment is needed to arrange and label such resources. As a result, data collection procedures are costly, time-consuming, and laborious, and labelled data is currently scarce for a variety of tasks. In this regard, this thesis presents novel speech processing and machine learning architectures by introducing the intelligent gamified crowdsourcing platform iHEARu-PLAY for large-scale, audio-visual, in-the-wild data collection and annotation. The platform includes two novel machine learning algorithms, both of which take into account the trustability of the annotator to help ensure reliable annotations. In this context, various large-scale audio datasets were either created, taken from internet archives, or collected in laboratory settings. These datasets and the proposed platform were evaluated by performing a broad range of classification and speech analysis tasks. Furthermore, extensive experiments concentrating on the proposed Trustability-based Dynamic Active Learning and Trustability-based Cooperative Learning algorithms demonstrated that the introduced approaches outperform current state-of-the-art techniques in terms of both classification accuracy and annotation reduction. Finally, perception studies were conducted over a broad range of datasets, demonstrating the usability and versatility of iHEARu-PLAY. Key results within this thesis indicate that the introduced principles lead to faster, more cost-efficient, and more reliable data collection than previously feasible.