
    The ICE-Map Visualization

    In this paper, we describe in detail the Information Content Evaluation Map (ICE-Map Visualization, formerly referred to as IC Difference Analysis). The ICE-Map Visualization is a visual data mining approach for all kinds of concept hierarchies. It uses statistics about concept usage to help users evaluate and maintain the hierarchy. It consists of a statistical framework that employs the notion of information content from information theory, together with a treemap visualization of the hierarchy and of the results of the statistical analysis.
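    The abstract does not give the exact formula, so the sketch below assumes a common (Resnik-style) definition of information content for a concept hierarchy: each concept's usage count is propagated up to its ancestors, and IC(c) = -log2 p(c), where p(c) is the concept's cumulative count divided by the count at the root. The toy hierarchy, the function names, and the per-concept IC difference between two usage profiles are illustrative assumptions, not the ICE-Map reference implementation.

```python
import math

# Hypothetical toy hierarchy: concept -> parent (None marks the root).
PARENT = {
    "science": None,
    "physics": "science",
    "biology": "science",
    "genetics": "biology",
}

def cumulative_counts(direct_counts, parent):
    """Propagate each concept's direct usage count to itself and all its ancestors."""
    totals = {c: 0 for c in parent}
    for concept, n in direct_counts.items():
        node = concept
        while node is not None:
            totals[node] += n
            node = parent[node]
    return totals

def information_content(direct_counts, parent):
    """IC(c) = -log2 p(c), with p(c) = cumulative count of c / total usage.
    Unused concepts get infinite IC."""
    totals = cumulative_counts(direct_counts, parent)
    root_total = sum(direct_counts.values())
    return {
        c: (-math.log2(n / root_total) if n > 0 else math.inf)
        for c, n in totals.items()
    }

def ic_difference(counts_a, counts_b, parent):
    """Per-concept IC difference between two usage profiles (assumed comparison);
    None where a concept is unused in either profile."""
    ic_a = information_content(counts_a, parent)
    ic_b = information_content(counts_b, parent)
    return {
        c: (ic_a[c] - ic_b[c]) if math.isfinite(ic_a[c]) and math.isfinite(ic_b[c]) else None
        for c in parent
    }
```

    For example, with direct counts {"physics": 2, "biology": 1, "genetics": 1}, the root has IC 0, "physics" and "biology" have IC 1.0, and "genetics" has IC 2.0. In a treemap, such per-concept values would typically be mapped to rectangle colour.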

    University of Mannheim @ CLSciSumm-17: Citation-Based Summarization of Scientific Articles Using Semantic Textual Similarity

    The number of publications is growing rapidly, which makes fast access to and analysis of relevant articles essential. In this paper, we describe a set of methods based on measuring semantic textual similarity, which we use to semantically analyze and summarize publications by means of other publications that cite them. We report the performance of our approach in the context of the third CL-SciSumm shared task and show that our system compares favorably with competing systems in terms of the summaries it produces.
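    To make the idea of similarity-based matching concrete, the sketch below ranks the sentences of a cited paper by how similar they are to a citing sentence. It deliberately swaps in plain bag-of-words cosine similarity in place of the semantic textual similarity measures the paper describes; the function names and the example sentences are hypothetical.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercased word counts; a crude stand-in for a semantic sentence representation."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_reference_sentences(citance, reference_sentences, top_k=1):
    """Return the top_k sentences of the cited paper most similar to the citing sentence."""
    cvec = bag_of_words(citance)
    scored = [(cosine(cvec, bag_of_words(s)), s) for s in reference_sentences]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:top_k]]
```

    Given the reference sentences "We train a parser on the treebank.", "Results improve summarization quality." and "Related work is discussed.", the citing sentence "Their system was shown to improve summarization quality." is matched to the second one. A real system would replace `bag_of_words` with a representation that also captures paraphrases.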

    Extending DCAM for Metadata Provenance

    The Metadata Provenance Task Group aims to define a data model that allows assertions to be made about description sets. A shared model of the data elements required to describe an aggregation of metadata statements makes it possible to collectively import, access, use, and publish facts about the quality, rights, timeliness, data source type, trust situation, etc. of the described statements. In this paper, we outline the preliminary model created by the task group, together with first examples that demonstrate how the model is to be used.

    Usage-driven Maintenance of Knowledge Organization Systems

    Knowledge Organization Systems (KOS) are typically used as background knowledge for document indexing in information retrieval. They have to be maintained and adapted constantly to reflect changes in the domain and the terminology. This thesis provides approaches that support the maintenance of hierarchical knowledge organization systems, such as thesauri, classifications, or taxonomies, by making information about the usage of KOS concepts available to the maintainer. The central contribution is the ICE-Map Visualization, a treemap-based visualization on top of a generalized statistical framework that can visualize almost arbitrary usage information. The thesis demonstrates how the ICE-Map Visualization supports the selection of a suitable existing KOS for the available documents and the evaluation of a KOS for different indexing techniques. For the creation of a new KOS, a crowdsourcing approach is presented that uses feedback from Amazon Mechanical Turk to relate terms hierarchically. An existing KOS is extended with new terms derived from the documents to be indexed by a machine-learning approach that relates these terms to existing concepts in the hierarchy; its features are derived from text snippets in the result lists of a web search engine. To split overpopulated concepts into new subconcepts, an interactive clustering approach is presented that can also propose names for the new subconcepts. The thesis further describes the implementation of a framework that integrates all of these approaches and contains the reference implementation of the ICE-Map Visualization. The framework is extensible and supports the implementation of evaluation methods that build on other evaluations, as well as the visualization of results and the implementation of new visualizations. An important building block for practical applications is a simple linguistic indexer, presented as a minor contribution; it is knowledge-poor and requires no training. This thesis applies computer science approaches in the domain of information science. The introduction describes the foundations in information science; the conclusion focuses on the relevance for practical applications, especially the handling of KOSs of differing quality that results from automatic and semiautomatic maintenance.
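    The abstract names a treemap as the basis of the ICE-Map Visualization but does not specify the layout algorithm. As an assumption, the sketch below uses the classic slice-and-dice layout: each child receives a share of its parent's rectangle proportional to its total weight, and the cut direction alternates per level. The `Node` class and the weights are hypothetical; in a KOS setting, a weight might be the number of documents indexed with a concept.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    weight: float = 0.0  # the concept's own weight, e.g. documents indexed with it
    children: list = field(default_factory=list)

    def total(self):
        """Own weight plus the weights of all descendants."""
        return self.weight + sum(c.total() for c in self.children)

def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Map each node name to a rectangle (x, y, w, h). Children split the parent's
    rectangle in proportion to their totals; any remainder represents the parent's
    own weight. Cuts alternate between the x axis (even depth) and y axis (odd depth)."""
    if out is None:
        out = {}
    out[node.name] = (x, y, w, h)
    total = node.total()
    if total == 0:
        return out
    offset = 0.0  # fraction of the parent's extent already used by earlier children
    for child in node.children:
        share = child.total() / total
        if depth % 2 == 0:
            slice_and_dice(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:
            slice_and_dice(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out
```

    For a root with child "A" (two subconcepts of weight 3 each) and child "B" (weight 2) in a 100 x 50 area, "A" gets the left 75 units and "B" the right 25. A's subconcepts then split A's rectangle horizontally. Slice-and-dice keeps sibling order stable, which helps a maintainer locate concepts. Squarified layouts trade that stability for better aspect ratios.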

    RDF validation requirements - evaluation and logical underpinning

    There are many case studies in which formulating RDF constraints and validating RDF data against these constraints is very important. As part of our collaboration with the W3C and DCMI working groups on RDF validation, we identified major RDF validation requirements and initiated an RDF validation requirements database, which is open for contributions at http://purl.org/net/rdf-validation. The purpose of this database is to collaboratively collect case studies, use cases, requirements, and solutions regarding RDF validation. Although multiple constraint languages can be used to formulate RDF constraints (associated with these requirements), there is no standard way to formulate them. This paper evaluates to what extent each requirement is satisfied by each of these constraint languages. We take reasoning into account as an important pre-validation step and therefore map constraints to description logics (DL) in order to show that each constraint can be mapped to an ontology describing RDF constraints generically.
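    As an illustration of why reasoning matters before validation, the sketch below checks a minimum-cardinality constraint such as "every ex:Book has at least one dct:creator", which corresponds roughly to the DL axiom Book ⊑ ≥1 creator.⊤. Two caveats apply. In OWL's open-world semantics, this axiom alone would never flag missing data, so validation reads it under a closed-world assumption. Also, types implied by rdfs:subClassOf are inferred first. Triples are plain string tuples, and the function and resource names are hypothetical; this is neither SHACL nor ShEx.

```python
from collections import defaultdict

def infer_types(triples):
    """Pre-validation step: add rdf:type triples implied by (transitive) rdfs:subClassOf."""
    supers = defaultdict(set)
    for s, p, o in triples:
        if p == "rdfs:subClassOf":
            supers[s].add(o)
    inferred = set(triples)
    changed = True
    while changed:  # iterate to a fixpoint so that subclass chains are followed
        changed = False
        for s, p, o in list(inferred):
            if p == "rdf:type":
                for sup in supers.get(o, ()):
                    t = (s, "rdf:type", sup)
                    if t not in inferred:
                        inferred.add(t)
                        changed = True
    return inferred

def min_cardinality_violations(triples, cls, prop, n):
    """Closed-world check of cls ⊑ ≥n prop.⊤: instances of cls with fewer than n distinct values for prop."""
    instances = {s for s, p, o in triples if p == "rdf:type" and o == cls}
    values = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            values[s].add(o)
    return sorted(s for s in instances if len(values[s]) < n)
```

    Suppose ex:Novel is declared a subclass of ex:Book and ex:b2 is typed only as ex:Novel with no creator. A check without inference misses ex:b2. After `infer_types`, ex:b2 is reported as a violation.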

    Metadata Provenance: Dublin Core on the Next Level

    With this poster, we present the current state of the DCMI Metadata Provenance Task Group, which will wrap up its work at the time of DC-2011. The motivation for a Dublin Core extension for metadata provenance is twofold: first, we want to represent existing metadata provenance information in a simple and unified way that is well suited as an application of Dublin Core; second, we want to enable the provision of provenance information for Dublin Core metadata in a Dublin Core-compatible way.