45,929 research outputs found

    Knowledge Base Population using Semantic Label Propagation

    Get PDF
    A crucial aspect of a knowledge base population system that extracts new facts from text corpora, is the generation of training data for its relation extractors. In this paper, we present a method that maximizes the effectiveness of newly trained relation extractors at a minimal annotation cost. Manual labeling can be significantly reduced by Distant Supervision, which is a method to construct training data automatically by aligning a large text corpus with an existing knowledge base of known facts. For example, all sentences mentioning both 'Barack Obama' and 'US' may serve as positive training instances for the relation born_in(subject,object). However, distant supervision typically results in a highly noisy training set: many training sentences do not really express the intended relation. We propose to combine distant supervision with minimal manual supervision in a technique called feature labeling, to eliminate noise from the large and noisy initial training set, resulting in a significant increase of precision. We further improve on this approach by introducing the Semantic Label Propagation method, which uses the similarity between low-dimensional representations of candidate training instances, to extend the training set in order to increase recall while maintaining high precision. Our proposed strategy for generating training data is studied and evaluated on an established test collection designed for knowledge base population tasks. The experimental results show that the Semantic Label Propagation strategy leads to substantial performance gains when compared to existing approaches, while requiring an almost negligible manual annotation effort.Comment: Submitted to Knowledge Based Systems, special issue on Knowledge Bases for Natural Language Processin

    Connecting the dots: a multi-pivot approach to data exploration

    No full text
    The purpose of data browsers is to help users identify and query data effectively without being overwhelmed by large complex graphs of data. A proposed solution to identify and query data in graph-based datasets is Pivoting (or set-oriented browsing), a many-to-many graph browsing technique that allows users to navigate the graph by starting from a set of instances followed by navigation through common links. Relying solely on navigation, however, makes it difficult for users to find paths or even see if the element of interest is in the graph when the points of interest may be many vertices apart. Further challenges include finding paths which require combinations of forward and backward links in order to make the necessary connections which further adds to the complexity of pivoting. In order to mitigate the effects of these problems and enhance the strengths of pivoting we present a multi-pivot approach which we embodied in tool called Visor. Visor allows users to explore from multiple points in the graph, helping users connect key points of interest in the graph on the conceptual level, visually occluding the remainder parts of the graph, thus helping create a road-map for navigation. We carried out an user study to demonstrate the viability of our approach

    Semantic data mining and linked data for a recommender system in the AEC industry

    Get PDF
    Even though it can provide design teams with valuable performance insights and enhance decision-making, monitored building data is rarely reused in an effective feedback loop from operation to design. Data mining allows users to obtain such insights from the large datasets generated throughout the building life cycle. Furthermore, semantic web technologies allow to formally represent the built environment and retrieve knowledge in response to domain-specific requirements. Both approaches have independently established themselves as powerful aids in decision-making. Combining them can enrich data mining processes with domain knowledge and facilitate knowledge discovery, representation and reuse. In this article, we look into the available data mining techniques and investigate to what extent they can be fused with semantic web technologies to provide recommendations to the end user in performance-oriented design. We demonstrate an initial implementation of a linked data-based system for generation of recommendations
    • …
    corecore