
    Generating Preview Tables for Entity Graphs

    Users are tapping into massive, heterogeneous entity graphs for many applications. It is challenging to select entity graphs for a particular need, given abundant datasets from many sources and the oftentimes scarce information for them. We propose methods to produce preview tables for compact presentation of important entity types and relationships in entity graphs. The preview tables assist users in attaining a quick and rough preview of the data. They can be shown in a limited display space for a user to browse and explore, before she decides to spend time and resources to fetch and investigate the complete dataset. We formulate several optimization problems that look for previews with the highest scores according to intuitive goodness measures, under various constraints on preview size and distance between preview tables. The optimization problem under distance constraint is NP-hard. We design a dynamic-programming algorithm and an Apriori-style algorithm for finding optimal previews. Results from experiments, comparison with related work and user studies demonstrated the scoring measures' accuracy and the discovery algorithms' efficiency.
    Comment: This is the camera-ready version of a SIGMOD16 paper. There might be tiny differences in layout, spacing and linebreaking, compared with the version in the SIGMOD16 proceedings, since we must submit TeX files and use arXiv to compile the file.

    Recent advances in the theory and practice of logical analysis of data

    Logical Analysis of Data (LAD) is a data analysis methodology introduced by Peter L. Hammer in 1986. LAD distinguishes itself from other classification and machine learning methods in that it analyzes a significant subset of combinations of variables to describe the positive or negative nature of an observation, and uses combinatorial techniques to extract models defined in terms of patterns. In recent years, the methodology has advanced tremendously through numerous theoretical developments and practical applications. In the present paper, we review the methodology and its recent advances; describe novel applications in engineering, finance, and health care, as well as algorithmic techniques for some stochastic optimization problems; and provide a comparative description of LAD with well-known classification methods.

    Feature-tree labeling for case base maintenance

    Case Base Maintenance (CBM) algorithms update the content of the case base with the aim of improving the case-based reasoner's performance. In this paper, we introduce a novel CBM method called Feature-Tree Labeling (FTL), with a focus on increasing the general accuracy of a Case-Based Reasoning (CBR) system. The proposed FTL algorithm is designed to detect and remove noisy cases from the case base, based on the value distribution of individual features in the available data. The competence of the FTL method has been compared with well-known state-of-the-art CBM algorithms. The tests have been done on 25 datasets selected from the UCI repository. The results show that FTL obtains higher accuracy than some of the state-of-the-art methods and CBR, to a statistically significant degree.
    Peer Reviewed. Postprint (author's final draft).