4,589 research outputs found
Learning what matters - Sampling interesting patterns
In the field of exploratory data mining, local structure in data can be
described by patterns and discovered by mining algorithms. Although many
solutions have been proposed to address the redundancy problems in pattern
mining, most of them either provide succinct pattern sets or take the interests
of the user into account-but not both. Consequently, the analyst has to invest
substantial effort in identifying those patterns that are relevant to her
specific interests and goals. To address this problem, we propose a novel
approach that combines pattern sampling with interactive data mining. In
particular, we introduce the LetSIP algorithm, which builds upon recent
advances in 1) weighted sampling in SAT and 2) learning to rank in interactive
pattern mining. Specifically, it exploits user feedback to directly learn the
parameters of the sampling distribution that represents the user's interests.
We compare the performance of the proposed algorithm to the state-of-the-art in
interactive pattern mining by emulating the interests of a user. The resulting
system allows efficient and interleaved learning and sampling, thus
user-specific anytime data exploration. Finally, LetSIP demonstrates favourable
trade-offs concerning both quality-diversity and exploitation-exploration when
compared to existing methods.Comment: PAKDD 2017, extended versio
High-Performance Reachability Query Processing under Index Size Restrictions
In this paper, we propose a scalable and highly efficient index structure for
the reachability problem over graphs. We build on the well-known node interval
labeling scheme where the set of vertices reachable from a particular node is
compactly encoded as a collection of node identifier ranges. We impose an
explicit bound on the size of the index and flexibly assign approximate
reachability ranges to nodes of the graph such that the number of index probes
to answer a query is minimized. The resulting tunable index structure generates
a better range labeling if the space budget is increased, thus providing a
direct control over the trade off between index size and the query processing
performance. By using a fast recursive querying method in conjunction with our
index structure, we show that in practice, reachability queries can be answered
in the order of microseconds on an off-the-shelf computer - even for the case
of massive-scale real world graphs. Our claims are supported by an extensive
set of experimental results using a multitude of benchmark and real-world
web-scale graph datasets.Comment: 30 page
Interactive Constrained Association Rule Mining
We investigate ways to support interactive mining sessions, in the setting of
association rule mining. In such sessions, users specify conditions (queries)
on the associations to be generated. Our approach is a combination of the
integration of querying conditions inside the mining phase, and the incremental
querying of already generated associations. We present several concrete
algorithms and compare their performance.Comment: A preliminary report on this work was presented at the Second
International Conference on Knowledge Discovery and Data Mining (DaWaK 2000
Discovering Knowledge from Local Patterns with Global Constraints
It is well known that local patterns are at the core of a lot of
knowledge which may be discovered from data. Nevertheless, use of local
patterns is limited by
their huge number and computational costs. Several approaches (e.g.,
condensed representations, pattern set discovery) aim at grouping or
synthesizing local patterns to provide a global view of the data. A
global pattern is a pattern which is a set or a synthesis of local
patterns coming from the data. In this paper, we propose the idea of
global constraints to write queries addressing global patterns. A key
point is the ability to bias the designing of global patterns according
to the expectation of the user. For instance, a global pattern can be
oriented towards the search of exceptions or a clustering. It requires
to write queries taking into account such biases. Open issues are to
design a generic framework to express powerful global constraints and
solvers to mine them. We think that global constraints are a promising
way to discover relevant global patterns
Visual exploration and retrieval of XML document collections with the generic system X2
This article reports on the XML retrieval system X2 which has been developed at the University of Munich over the last five years. In a typical session with X2, the user
first browses a structural summary of the XML database in order to select interesting elements and keywords occurring in documents. Using this intermediate result, queries combining structure and textual references are composed semiautomatically.
After query evaluation, the full set of answers is presented in a visual and structured way. X2 largely exploits the structure found in documents, queries and answers to enable new interactive visualization and exploration techniques that support mixed IR and database-oriented querying, thus bridging the gap between these three views on the data to be retrieved. Another salient characteristic of X2 which distinguishes it from other visual query systems for XML is that it supports various degrees of detailedness in the presentation of answers, as well as techniques for dynamically reordering and grouping retrieved elements once the complete answer set has been computed
- …