Datenbestand vom 19. September 2016

Tel: 089 / 66060798 Mo - Fr, 9 - 12 Uhr

Impressum Fax: 089 / 66060799

aktualisiert am 19. September 2016

978-3-86853-910-3, Reihe Statistik

Manuel J. A. Eugster Benchmark Experiments - A Tool for Analyzing Statistical Learning Algorithms

194 Seiten, Dissertation Ludwig-Maximilians-Universität München (2011), Softcover, A5

Benchmark experiments nowadays are the method of choice to evaluate learning algorithms in most research fields with applications related to statistical learning. Benchmark experiments are an empirical tool to analyze statistical learning algorithms on one or more data sets: to compare a set of algorithms, to find the best hyperparameters for an algorithm, or to make a sensitivity analysis of an algorithm. In the main part, this dissertation focus on the comparison of candidate algorithms and introduces a comprehensive toolbox for analyzing such benchmark experiments. A systematic approach is introduced -- from exploratory analyses with specialized visualizations (static and interactive) via formal investigations and their interpretation as preference relations through to a consensus order of the algorithms, based on one or more performance measures and data sets. The performance of learning algorithms is determined by data set characteristics, this is common knowledge. Not exactly known is the concrete relationship between characteristics and algorithms. A formal framework on top of benchmark experiments is presented for investigation on this relationship. Furthermore, benchmark experiments are commonly treated as fixed-sample experiments, but their nature is sequential. First thoughts on a sequential framework are presented and its advantages are discussed. Finally, this main part of the dissertation is concluded with a discussion on future research topics in the field of benchmark experiments.

The second part of the dissertation is concerned with archetypal analysis. Archetypal analysis has the aim to represent observations in a data set as convex combinations of a few extremal points. This is used as an analysis approach for benchmark experiments -- the identification and interpretation of the extreme performances of candidate algorithms. In turn, benchmark experiments are used to analyze the general framework for archetypal analyses worked out in this second part of the dissertation. Using its generalizability, the weighted and robust archetypal problems are introduced and solved; and in the outlook a generalization towards prototypes is discussed.

The two freely available R packages -- benchmark and archetypes -- make the introduced methods generally applicable.