58,888 research outputs found
A Fast Minimal Infrequent Itemset Mining Algorithm
A novel fast algorithm for finding quasi identifiers in large datasets is
presented. Performance measurements on a broad range of datasets demonstrate
substantial reductions in run-time relative to the state of the art and the
scalability of the algorithm to realistically-sized datasets up to several
million records
GTRACE-RS: Efficient Graph Sequence Mining using Reverse Search
The mining of frequent subgraphs from labeled graph data has been studied
extensively. Furthermore, much attention has recently been paid to frequent
pattern mining from graph sequences. A method, called GTRACE, has been proposed
to mine frequent patterns from graph sequences under the assumption that
changes in graphs are gradual. Although GTRACE mines the frequent patterns
efficiently, it still needs substantial computation time to mine the patterns
from graph sequences containing large graphs and long sequences. In this paper,
we propose a new version of GTRACE that enables efficient mining of frequent
patterns based on the principle of a reverse search. The underlying concept of
the reverse search is a general scheme for designing efficient algorithms for
hard enumeration problems. Our performance study shows that the proposed method
is efficient and scalable for mining both long and large graph sequence
patterns and is several orders of magnitude faster than the original GTRACE
Revisiting Numerical Pattern Mining with Formal Concept Analysis
In this paper, we investigate the problem of mining numerical data in the
framework of Formal Concept Analysis. The usual way is to use a scaling
procedure --transforming numerical attributes into binary ones-- leading either
to a loss of information or of efficiency, in particular w.r.t. the volume of
extracted patterns. By contrast, we propose to directly work on numerical data
in a more precise and efficient way, and we prove it. For that, the notions of
closed patterns, generators and equivalent classes are revisited in the
numerical context. Moreover, two original algorithms are proposed and used in
an evaluation involving real-world data, showing the predominance of the
present approach
A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets
The term "outlier" can generally be defined as an observation that is significantly different from
the other values in a data set. The outliers may be instances of error or indicate events. The
task of outlier detection aims at identifying such outliers in order to improve the analysis of
data and further discover interesting and useful knowledge about unusual events within numerous
applications domains. In this paper, we report on contemporary unsupervised outlier detection
techniques for multiple types of data sets and provide a comprehensive taxonomy framework and
two decision trees to select the most suitable technique based on data set. Furthermore, we
highlight the advantages, disadvantages and performance issues of each class of outlier detection
techniques under this taxonomy framework
- …