Knowledge Organization Systems (KOS) are typically used as background knowledge
for document indexing in information retrieval. They have to be maintained
and adapted constantly to reflect changes in the domain and the terminology. In
this thesis, approaches are provided that support the maintenance of hierarchical
knowledge organization systems, such as thesauri, classifications, or taxonomies, by
making information about the usage of KOS concepts available to the maintainer.
The central contribution is the ICE-Map Visualization, a treemap-based visualization
on top of a generalized statistical framework that is able to visualize almost
arbitrary usage information. Using the ICE-Map Visualization, the selection of a
suitable existing KOS for available documents and the evaluation of a KOS for
different indexing techniques are demonstrated.
For the creation of a new KOS, an approach based on crowdsourcing is presented
that uses feedback from Amazon Mechanical Turk to relate terms hierarchically.
The extension of an existing KOS with new terms derived from the documents
to be indexed is performed with a machine-learning approach that relates
the terms to existing concepts in the hierarchy. The features are derived from text
snippets in the result list of a web search engine. For the splitting of overpopulated
concepts into new subconcepts, an interactive clustering approach is presented that
is able to propose names for the new subconcepts.
The implementation of a framework is described that integrates all approaches
of this thesis and contains the reference implementation of the ICE-Map Visualization.
It is extensible and supports the implementation of evaluation methods
that build on other evaluations. Additionally, it supports the visualization of the
results and the implementation of new visualizations. An important building block
for practical applications is the simple linguistic indexer that is presented as a minor
contribution. It is knowledge-poor and works without any training.
This thesis applies computer science approaches in the domain of information
science. The introduction describes the foundations in information science; in the
conclusion, the focus is on the relevance for practical applications, especially
regarding the handling of KOSs of varying quality that result from automatic and
semiautomatic maintenance.