ISBN 978-3-8439-0309-7, Statistics series
Boosting in Structured Additive Models
168 pages, dissertation, Ludwig-Maximilians-Universität München (2011), softcover, A5
Variable selection and model choice are of major concern in many statistical applications, especially in regression models for high-dimensional data. Boosting is a convenient statistical method that combines model fitting with intrinsic model selection. We investigate the impact of base-learner specification on the performance of boosting as a model selection procedure. We show that variable selection may be biased if the base-learners have different degrees of flexibility, both for categorical covariates and for smooth effects of continuous covariates. We investigate these problems from a theoretical perspective and suggest a framework for unbiased model selection based on a general class of penalized least squares base-learners. Making all base-learners comparable in terms of their degrees of freedom strongly reduces the selection bias observed with naive boosting specifications.
Furthermore, the definition of degrees of freedom used in the smoothing literature is questionable in the context of boosting, and an alternative definition is derived theoretically. The importance of unbiased model selection is demonstrated in simulations and in an application to forest health models.
A second aspect of this thesis is the extension of the boosting algorithm to new estimation problems: by using constrained base-learners, monotonicity-constrained effect estimates can be seamlessly incorporated into the existing boosting framework. This holds for both smooth effects and ordinal variables. Furthermore, cyclic restrictions can be integrated into the model for smooth effects of continuous covariates; cyclic constraints play an important role in time-series models in particular. Monotonic and cyclic constraints on smooth effects can also be extended to smooth bivariate function estimates. Simulation studies show that constrained estimates are superior to unconstrained estimates if the true effects are monotonic or cyclic. Three case studies (modeling the presence of the Red Kite in Bavaria, modeling activity profiles of Roe Deer, and modeling deaths caused by air pollution in São Paulo) show that both constraints can be integrated into the boosting framework and that they are easy to use. All methods described have been incorporated into the R add-on package mboost.
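As a rough illustration of how boosting combines model fitting with intrinsic variable selection, the following Python sketch implements component-wise L2 boosting with simple univariate least-squares base-learners. It is not the mboost implementation and it leaves out the penalized and constrained base-learners discussed in the abstract; the function name `componentwise_l2_boost`, the step length `nu`, and the toy data are assumptions made purely for illustration.

```python
def componentwise_l2_boost(columns, y, n_iter=100, nu=0.1):
    """Component-wise L2 boosting with one linear base-learner per covariate.

    columns: list of covariate columns (lists of floats, assumed centered).
    y:       list of responses.
    Returns (offset, coefficients, indices selected in each iteration).
    """
    n = len(y)
    offset = sum(y) / n                      # start from the mean response
    coef = [0.0] * len(columns)
    fit = [offset] * n
    selected = []

    for _ in range(n_iter):
        # Negative gradient of the squared-error loss = current residuals.
        resid = [yi - fi for yi, fi in zip(y, fit)]

        best_j, best_beta, best_rss = None, 0.0, float("inf")
        for j, col in enumerate(columns):
            sxx = sum(x * x for x in col)
            if sxx == 0.0:
                continue                     # constant column carries no signal
            # Least-squares fit of the residuals on this single covariate.
            beta = sum(x * r for x, r in zip(col, resid)) / sxx
            rss = sum((r - beta * x) ** 2 for x, r in zip(col, resid))
            if rss < best_rss:
                best_j, best_beta, best_rss = j, beta, rss

        if best_j is None:
            break                            # no usable base-learner

        # Update only the best-fitting base-learner, shrunken by nu.
        coef[best_j] += nu * best_beta
        fit = [f + nu * best_beta * x for f, x in zip(fit, columns[best_j])]
        selected.append(best_j)

    return offset, coef, selected


# Toy data (hypothetical): three centered, mutually orthogonal covariates,
# of which only the first one influences the response.
x0 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x1 = [1.0, -1.0, 0.0, -1.0, 1.0]
x2 = [1.0, -2.0, 0.0, 2.0, -1.0]
y = [3.0 + 2.0 * v for v in x0]

offset, coef, selected = componentwise_l2_boost([x0, x1, x2], y)
print(offset, coef, sorted(set(selected)))
```

In this toy setting only the first covariate is ever selected, so the coefficients of the two uninformative covariates stay at zero. Every base-learner here has a single degree of freedom, so they compete on equal terms; the selection bias described in the abstract arises when competing base-learners differ in flexibility.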