Catalog data as of 10 July 2019
ISBN 978-3-86853-859-5, series Informatik (Computer Science)
Corpus based methods for ontology modularization in healthcare
154 pages, dissertation, Ludwig-Maximilians-Universität München (2011), hardcover, A5
The goal of this thesis is to develop a corpus based approach to ontology modularization that can be applied within a computational framework. Modularization refers to a situation in which an artifact as a whole can also be perceived as a set of its parts. Ontology modularization accordingly identifies the set of parts that represent the whole ontology. Its goal is to reduce the amount of information in an ontology to a subset that is sufficient and relevant for an application, for example data annotation or semantic search. The main approaches for identifying ontology modules can be grouped as semantics-driven or structure-driven. However, neither of them utilizes the context, which in the healthcare domain can be a specific disease or a kind of medical image.
We propose a corpus based ontology modularization approach that utilizes this context information. A precondition of the approach is the compilation of domain corpora, in our case clinical corpora. The underlying assumption is that domain corpora can represent context (e.g., diseases) and that context information can be used to identify the application-relevant parts of clinical ontologies. We therefore apply statistical and structural analysis methods to the clinical corpora and the clinical ontologies to determine the relevant ontology modules. In our use case, the identified ontology modules are used to annotate medical images of patients. This makes it possible to search for medical images that report on a specific disease such as breast cancer.
This thesis is divided into five main parts. In Part I, we give an overview of the landscape into which this thesis fits by examining ontologies, natural language processing and their role in healthcare. In Part II, we investigate the potential of knowledge engineering methodologies to achieve ontology based solutions in the medical domain. We design and implement an approach to establish efficient knowledge based systems that utilize clinical ontologies. We also explore the ontology modularization research field in terms of common approaches and applications.
In Part III, we explain the corpus based modularization approach in two main steps. The first step is a statistical analysis of the ontology concepts against the domain corpora. In the second step, the statistically most significant concepts are subjected to a structural analysis that identifies the final modules. To this end, we describe and discuss our experiments. In Part IV, we empirically evaluate our results and additionally solicit external feedback from medical experts. Finally, in Part V, we discuss our main findings, analyze success factors and introduce prospective research directions.
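The two steps described for Part III can be illustrated with a small sketch. It is not the method used in the thesis, since the abstract does not name the statistical algorithm or the structural analysis. The sketch assumes a smoothed log ratio of relative term frequencies (domain corpus versus a reference corpus) for step one and a simple closure over is-a parents for step two. All names (`concept_scores`, `build_module`, `parents`) and the toy corpora and ontology are hypothetical.

```python
import math
from collections import Counter


def concept_scores(domain_docs, reference_docs, concept_labels):
    """Step 1 (assumed statistic): smoothed log ratio of relative frequencies.

    concept_labels maps a concept id to a single lowercase token
    (a simplification; real ontology labels are often multi-word).
    """
    def count_tokens(docs):
        counts, total = Counter(), 0
        for doc in docs:
            tokens = doc.lower().split()
            counts.update(tokens)
            total += len(tokens)
        return counts, total

    dom_counts, dom_total = count_tokens(domain_docs)
    ref_counts, ref_total = count_tokens(reference_docs)
    scores = {}
    for concept, label in concept_labels.items():
        # Add-one smoothing avoids division by zero for unseen labels.
        p_dom = (dom_counts[label] + 1) / (dom_total + 1)
        p_ref = (ref_counts[label] + 1) / (ref_total + 1)
        scores[concept] = math.log(p_dom / p_ref)
    return scores


def build_module(seeds, parents):
    """Step 2 (assumed structure rule): seeds plus all their is-a ancestors."""
    module, stack = set(), list(seeds)
    while stack:
        concept = stack.pop()
        if concept in module:
            continue
        module.add(concept)
        stack.extend(parents.get(concept, []))
    return module


# Toy data, invented for illustration only.
domain_docs = ["carcinoma of the breast", "breast lesion suspected carcinoma"]
reference_docs = ["fracture of the femur", "femur and breast unremarkable"]
concept_labels = {"Carcinoma": "carcinoma", "Breast": "breast", "Femur": "femur"}
parents = {
    "Carcinoma": ["Neoplasm"],
    "Neoplasm": ["Disease"],
    "Breast": ["Organ"],
    "Femur": ["Bone"],
}

scores = concept_scores(domain_docs, reference_docs, concept_labels)

# A stricter threshold keeps fewer seed concepts and yields a smaller module.
strict_module = build_module({c for c, s in scores.items() if s > 1.0}, parents)
loose_module = build_module({c for c, s in scores.items() if s > 0.0}, parents)

print(sorted(strict_module))
print(sorted(loose_module))
```

The threshold in this sketch stands in for the strictness of concept selection: raising it keeps only the concepts most specific to the domain corpus, and the resulting module shrinks accordingly.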
Our findings show that our corpus based approach for modularizing large ontologies is feasible and useful, although it depends on several critical success factors. The representativeness and size of the corpora have a significant impact on the quality of the results: the larger and more context-relevant the corpora, the better the results. Furthermore, the choice of the statistical algorithm used to select ontology concepts is important: an algorithm that selects context-relevant ontology concepts more strictly delivers better results.