25,101 research outputs found
Learning what matters - Sampling interesting patterns
In the field of exploratory data mining, local structure in data can be
described by patterns and discovered by mining algorithms. Although many
solutions have been proposed to address the redundancy problems in pattern
mining, most of them either provide succinct pattern sets or take the interests
of the user into account-but not both. Consequently, the analyst has to invest
substantial effort in identifying those patterns that are relevant to her
specific interests and goals. To address this problem, we propose a novel
approach that combines pattern sampling with interactive data mining. In
particular, we introduce the LetSIP algorithm, which builds upon recent
advances in 1) weighted sampling in SAT and 2) learning to rank in interactive
pattern mining. Specifically, it exploits user feedback to directly learn the
parameters of the sampling distribution that represents the user's interests.
We compare the performance of the proposed algorithm to the state-of-the-art in
interactive pattern mining by emulating the interests of a user. The resulting
system allows efficient and interleaved learning and sampling, thus
user-specific anytime data exploration. Finally, LetSIP demonstrates favourable
trade-offs concerning both quality-diversity and exploitation-exploration when
compared to existing methods.Comment: PAKDD 2017, extended versio
A pattern mining approach for information filtering systems
It is a big challenge to clearly identify the boundary between positive and negative streams for information filtering systems. Several attempts have used negative feedback to solve this challenge; however, there are two issues for using negative relevance feedback to improve the effectiveness of information filtering. The first one is how to select constructive negative samples in order to reduce the space of negative documents. The second issue is how to decide noisy extracted features that should be updated based on the selected negative samples. This paper proposes a pattern mining based approach to select some offenders from the negative documents, where an offender can be used to reduce the side effects of noisy features. It also classifies extracted features (i.e., terms) into three categories: positive specific terms, general terms, and negative specific terms. In this way, multiple revising strategies can be used to update extracted features. An iterative learning algorithm is also proposed to implement this approach on the RCV1 data collection, and substantial experiments show that the proposed approach achieves encouraging performance and the performance is also consistent for adaptive filtering as well
FS^3: A Sampling based method for top-k Frequent Subgraph Mining
Mining labeled subgraph is a popular research task in data mining because of
its potential application in many different scientific domains. All the
existing methods for this task explicitly or implicitly solve the subgraph
isomorphism task which is computationally expensive, so they suffer from the
lack of scalability problem when the graphs in the input database are large. In
this work, we propose FS^3, which is a sampling based method. It mines a small
collection of subgraphs that are most frequent in the probabilistic sense. FS^3
performs a Markov Chain Monte Carlo (MCMC) sampling over the space of a
fixed-size subgraphs such that the potentially frequent subgraphs are sampled
more often. Besides, FS^3 is equipped with an innovative queue manager. It
stores the sampled subgraph in a finite queue over the course of mining in such
a manner that the top-k positions in the queue contain the most frequent
subgraphs. Our experiments on database of large graphs show that FS^3 is
efficient, and it obtains subgraphs that are the most frequent amongst the
subgraphs of a given size
- …