Catalog data as of 13 March 2019

ISBN 978-3-86853-782-6, Informatik (Computer Science) series

Nikolas Fechner: Structured Kernel-based Machine Learning for Cheminformatics

155 pages, doctoral dissertation, Eberhard-Karls-Universität Tübingen (2010), softcover, A5

A fundamental paradigm in cheminformatics is the similarity axiom, which states that similar chemical structures have similar properties. This assumption is the basis for all machine learning approaches that aim to build models of complex molecular properties (e.g., biological activity) by learning the functional relation between structure and property from a set of molecules for which the property has been determined experimentally. Unfortunately, most approaches to computing these relationships rely on a numerical representation of the molecules. The similarity axiom therefore cannot be used directly and is replaced by two assumptions: first, that similar structures have similar numerical encodings, and second, that similar encodings correspond to similar target properties. Consequently, the search for numerical encodings that fulfil these requirements has been an important research area in cheminformatics.

In recent years, kernel-based machine learning techniques have gained increasing attention in cheminformatics research. Initially, this was motivated by the theoretical properties of these methods, but the fact that they allow the similarity principle to be applied directly proved to be an important advantage as well. Kernel-based techniques make it possible to formulate machine learning approaches solely in terms of pairwise similarities, provided that the similarity measure fulfils the kernel properties. The additional assumptions required by approaches based on numerical encodings are therefore unnecessary. Instead, the similarity measure can be designed so that the actual chemical similarity is expressed in a sensible way. This thesis presents several similarity measures for molecules that can be used in kernel-based techniques. It is shown that the modelling performance can be increased by using such a structured kernel instead of a numerical encoding.
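To illustrate what "solely in terms of pairwise similarities" can mean in practice, the following minimal Python sketch trains a kernel perceptron that never sees a numerical descriptor vector, only kernel values between molecules. It is an illustration under stated assumptions, not a method taken from the thesis: molecules are represented as hypothetical sets of substructure labels, the Tanimoto coefficient on sets (a known positive semidefinite similarity) stands in for a structured molecule kernel, and the names `TRAIN`, `LABELS`, `train_kernel_perceptron` and `predict` are invented for this example.

```python
from typing import FrozenSet, List, Sequence

# Assumption: a molecule is a set of substructure labels (hypothetical encoding).
Molecule = FrozenSet[str]


def tanimoto_kernel(a: Molecule, b: Molecule) -> float:
    """Tanimoto similarity |a & b| / |a | b|; defined as 1.0 for two empty sets."""
    union = len(a | b)
    return 1.0 if union == 0 else len(a & b) / union


def train_kernel_perceptron(
    mols: Sequence[Molecule], labels: Sequence[int], epochs: int = 10
) -> List[int]:
    """Return one mistake counter per training molecule (labels are +1 or -1)."""
    n = len(mols)
    # The learner only ever touches this matrix of pairwise similarities.
    gram = [[tanimoto_kernel(a, b) for b in mols] for a in mols]
    alphas = [0] * n
    for _ in range(epochs):
        mistakes = 0
        for i in range(n):
            score = sum(alphas[j] * labels[j] * gram[j][i] for j in range(n))
            if labels[i] * score <= 0:  # misclassified (or undecided)
                alphas[i] += 1
                mistakes += 1
        if mistakes == 0:  # training set is separated; stop early
            break
    return alphas


def predict(
    query: Molecule,
    mols: Sequence[Molecule],
    labels: Sequence[int],
    alphas: Sequence[int],
) -> int:
    """Classify a query molecule from its similarities to the training molecules."""
    score = sum(
        a * y * tanimoto_kernel(m, query) for a, y, m in zip(alphas, labels, mols)
    )
    return 1 if score > 0 else -1


# Toy data (invented): +1 = "active", -1 = "inactive".
TRAIN: List[Molecule] = [
    frozenset({"benzene", "hydroxyl"}),
    frozenset({"benzene", "carboxyl"}),
    frozenset({"alkyl", "amine"}),
    frozenset({"alkyl", "ether"}),
]
LABELS = [1, 1, -1, -1]

ALPHAS = train_kernel_perceptron(TRAIN, LABELS)
print(predict(frozenset({"benzene", "amide"}), TRAIN, LABELS, ALPHAS))
print(predict(frozenset({"alkyl", "ketone"}), TRAIN, LABELS, ALPHAS))
```

Swapping `tanimoto_kernel` for any other valid kernel on molecules would leave the learner untouched, which is the practical appeal of formulating the method in terms of pairwise similarities.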

Unfortunately, many approaches to problems related to the modelling of molecular properties are based on numerical representations of molecules. To apply these ideas in kernel-based techniques, kernel-based formulations of these approaches have to be defined. In this thesis, kernel-based formulations for estimating the applicability domain of a model are presented and compared with regard to their application in virtual screening. Moreover, a Free-Wilson-like concept for analyzing QSAR models based on a certain type of molecule kernel is developed and evaluated with regard to its use in revealing possible structural causes of the biological activity as well as of the target selectivity of molecules.
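As a rough illustration of how an applicability-domain estimate can be expressed through kernel values alone, the sketch below computes the squared distance between a query molecule and the centroid of the training set in the kernel-induced feature space, using the standard identity d² = k(q,q) − (2/n) Σᵢ k(q,xᵢ) + (1/n²) Σᵢⱼ k(xᵢ,xⱼ). This is one generic formulation chosen for the example and is not claimed to be one of the formulations presented in the thesis. The set-of-substructures representation, the Tanimoto kernel, the function names and the threshold of 1.0 are all assumptions made for illustration.

```python
from typing import Callable, FrozenSet, Sequence

# Assumption: a molecule is a set of substructure labels (hypothetical encoding).
Molecule = FrozenSet[str]
Kernel = Callable[[Molecule, Molecule], float]


def tanimoto_kernel(a: Molecule, b: Molecule) -> float:
    """Tanimoto similarity |a & b| / |a | b|; defined as 1.0 for two empty sets."""
    union = len(a | b)
    return 1.0 if union == 0 else len(a & b) / union


def squared_distance_to_centroid(
    query: Molecule, train: Sequence[Molecule], kernel: Kernel
) -> float:
    """Squared feature-space distance from the query to the training centroid.

    Uses only kernel evaluations; no explicit feature vectors are needed.
    """
    n = len(train)
    k_qq = kernel(query, query)
    k_qt = sum(kernel(query, t) for t in train) / n
    k_tt = sum(kernel(s, t) for s in train for t in train) / (n * n)
    return k_qq - 2.0 * k_qt + k_tt


def in_domain(
    query: Molecule, train: Sequence[Molecule], kernel: Kernel, threshold: float
) -> bool:
    """Treat the query as inside the domain if it is close enough to the centroid."""
    return squared_distance_to_centroid(query, train, kernel) <= threshold


# Toy data (invented) and a hypothetical threshold.
TRAIN = [
    frozenset({"benzene", "hydroxyl"}),
    frozenset({"benzene", "carboxyl"}),
    frozenset({"benzene", "amine"}),
]
NEAR = frozenset({"benzene", "hydroxyl", "methyl"})
FAR = frozenset({"alkyl", "ether"})
THRESHOLD = 1.0

for name, mol in (("near", NEAR), ("far", FAR)):
    d2 = squared_distance_to_centroid(mol, TRAIN, tanimoto_kernel)
    print(name, round(d2, 3), in_domain(mol, TRAIN, tanimoto_kernel, THRESHOLD))
```

In a virtual screening setting, such a score could be used to flag candidate molecules whose predictions rest on little similar training data; how the threshold should be chosen is left open here.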