Generating Preview Tables for Entity Graphs
Users are tapping into massive, heterogeneous entity graphs for many
applications. It is challenging to select entity graphs for a particular need,
given abundant datasets from many sources and the oftentimes scarce information
for them. We propose methods to produce preview tables for compact presentation
of important entity types and relationships in entity graphs. The preview
tables assist users in attaining a quick and rough preview of the data. They
can be shown in a limited display space for a user to browse and explore,
before she decides to spend time and resources to fetch and investigate the
complete dataset. We formulate several optimization problems that look for
previews with the highest scores according to intuitive goodness measures,
under various constraints on preview size and distance between preview tables.
The optimization problem under distance constraint is NP-hard. We design a
dynamic-programming algorithm and an Apriori-style algorithm for finding
optimal previews. Results from experiments, comparison with related work and
user studies demonstrated the scoring measures' accuracy and the discovery
algorithms' efficiency.
Comment: This is the camera-ready version of a SIGMOD16 paper. There might be
tiny differences in layout, spacing and linebreaking, compared with the
version in the SIGMOD16 proceedings, since we must submit TeX files and use
arXiv to compile the file.
Recent advances in the theory and practice of logical analysis of data
Logical Analysis of Data (LAD) is a data analysis methodology introduced by Peter L. Hammer in 1986. LAD distinguishes itself from other classification and machine learning methods by the fact that it analyzes a significant subset of combinations of variables to describe the positive or negative nature of an observation and uses combinatorial techniques to extract models defined in terms of patterns. In recent years, the methodology has tremendously advanced through numerous theoretical developments and practical applications. In the present paper, we review the methodology and its recent advances, describe novel applications in engineering, finance, health care, and algorithmic techniques for some stochastic optimization problems, and provide a comparative description of LAD with well-known classification methods.
Feature-tree labeling for case base maintenance
Case Base Maintenance (CBM) algorithms update the content of the case base with the aim of improving the case-based reasoner performance. In this paper, we introduce a novel CBM method called Feature-Tree Labeling (FTL) with the focus on increasing the general accuracy of a Case-Based Reasoning (CBR) system.
The proposed FTL algorithm is designed to detect and remove noisy cases from the case base, based on value distribution of individual features in the available data.
The competence of the FTL method has been compared with well-known state-of-the-art CBM algorithms. The tests have been done on 25 datasets selected from the UCI repository. The results show that FTL obtains higher accuracy than some of the state-of-the-art methods and CBR, to a statistically significant degree.
Peer Reviewed. Postprint (author's final draft).