An Efficient Algorithm for Mining Frequent Sequence with Constraint Programming
The main advantage of Constraint Programming (CP) approaches for sequential
pattern mining (SPM) is their modularity, which includes the ability to add new
constraints (regular expressions, length restrictions, etc). The current best
CP approach for SPM uses a global constraint (module) that computes the
projected database and enforces the minimum frequency; it does this with a
filtering algorithm similar to the PrefixSpan method. However, the resulting
system is not as scalable as some of the most advanced mining systems like
Zaki's cSPADE. We show how, using techniques from both data mining and CP, one
can use a generic constraint solver and yet outperform existing specialized
systems. This is mainly due to two improvements in the module that computes the
projected frequencies: first, computing the projected database can be sped up
by pre-computing the positions at which a symbol can become unsupported by a
sequence, which avoids scanning the full sequence each time; and second, by
taking inspiration from the trailing used in CP solvers to devise a
backtracking-aware data structure that allows fast incremental storing and
restoring of the projected database. Detailed experiments show how this
approach outperforms existing CP as well as specialized systems for SPM, and
that the gain in efficiency translates directly into increased efficiency for
other settings such as mining with regular expressions.
Comment: frequent sequence mining, constraint programming
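The PrefixSpan-style projection that the abstract above refers to can be illustrated with a minimal sketch. It assumes sequences of single symbols and a hypothetical helper layout (`project`, `frequent_extensions`, `mine` are names chosen here, not taken from the paper). It is a plain recursive projection and does not include the pre-computed positions or the trailing-based data structure that the paper proposes.

```python
from collections import Counter

def project(db, projected, symbol):
    """Extend a projected database by one symbol.

    `projected` holds (sequence id, start position) pairs: the sequences that
    contain the current prefix, and the position just after its last match.
    """
    extended = []
    for sid, start in projected:
        try:
            pos = db[sid].index(symbol, start)  # first match at or after start
        except ValueError:
            continue  # this sequence no longer supports the extended prefix
        extended.append((sid, pos + 1))
    return extended

def frequent_extensions(db, projected, min_support):
    """Count, per symbol, how many projected suffixes contain it."""
    counts = Counter()
    for sid, start in projected:
        counts.update(set(db[sid][start:]))  # each sequence counts once
    return {s: c for s, c in counts.items() if c >= min_support}

def mine(db, min_support, prefix=(), projected=None, out=None):
    """Depth-first enumeration of all frequent sequential patterns."""
    if projected is None:
        projected = [(sid, 0) for sid in range(len(db))]
    if out is None:
        out = {}
    extensions = frequent_extensions(db, projected, min_support)
    for symbol, count in sorted(extensions.items()):
        pattern = prefix + (symbol,)
        out[pattern] = count
        mine(db, min_support, pattern, project(db, projected, symbol), out)
    return out

# Toy database (illustrative data, not from the paper).
toy_db = [list("abc"), list("acb"), list("bc")]
patterns = mine(toy_db, min_support=2)
```

Each recursive call rebuilds the projected database from scratch; the improvements described in the abstract target exactly this cost.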
A Constraint Programming Approach for Mining Sequential Patterns in a Sequence Database
Constraint-based pattern discovery is at the core of numerous data mining
tasks. Patterns are extracted with respect to a given set of constraints
(frequency, closedness, size, etc). In the context of sequential pattern
mining, a large number of dedicated techniques have been developed for solving
particular classes of constraints. The aim of this paper is to investigate the
use of Constraint Programming (CP) to model and mine sequential patterns in a
sequence database. Our CP approach offers a natural way to combine, in a
single framework, a large set of constraints coming from various
origins. Experiments show the feasibility and the interest of our approach.
Evaluation and optimization of frequent association rule based classification
Deriving useful and interesting rules from a data mining system is an essential and important task. Problems
such as the discovery of random and coincidental patterns or patterns with no significant values, and the
generation of a large volume of rules from a database commonly occur. Work on sustaining the interestingness
of rules generated by data mining algorithms is actively and constantly being examined and developed. In this
paper, we present a systematic way to evaluate the association rules discovered by frequent itemset mining algorithms,
combining common data mining and statistical interestingness measures, and outline an appropriate sequence of usage. The experiments are performed using a number of real-world datasets that represent diverse characteristics of data/items, and a detailed evaluation of rule sets is provided. Empirical results show that, with a proper combination of data mining and statistical analysis, the framework is capable of eliminating a large number of non-significant, redundant and contradictory rules while preserving relatively valuable high-accuracy and high-coverage rules when used in the classification problem. Moreover, the results reveal the important characteristics of mining frequent itemsets, and the impact of the confidence measure on the classification task.
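As a rough illustration of rule interestingness measures, the sketch below computes support, confidence and lift for a single association rule. These three measures are an assumption chosen for illustration; the abstract does not name the measures the paper actually combines, and `rule_measures` is a hypothetical helper.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return support, confidence and lift for antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= set(t))         # contain antecedent
    n_c = sum(1 for t in transactions if c <= set(t))         # contain consequent
    n_ac = sum(1 for t in transactions if (a | c) <= set(t))  # contain both
    support = n_ac / n if n else 0.0
    confidence = n_ac / n_a if n_a else 0.0
    # Lift compares the rule's confidence with the consequent's base rate;
    # a value near 1 suggests the two sides are close to independent.
    lift = confidence / (n_c / n) if n_c else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}

# Toy transactions (illustrative data only).
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
measures = rule_measures(baskets, {"bread"}, {"milk"})
```

A rule with high confidence but lift at or below 1 is a typical candidate for the non-significant rules that such an evaluation framework aims to filter out.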
Is Simple Better? Revisiting Non-linear Matrix Factorization for Learning Incomplete Ratings
Matrix factorization techniques have been widely used as a method for
collaborative filtering for recommender systems. In recent times, different
variants of deep learning algorithms have been explored in this setting to
improve the task of making a personalized recommendation with user-item
interaction data. The idea that the mapping between the latent user or item
factors and the original features is highly nonlinear suggests that classical
matrix factorization techniques are no longer sufficient. In this paper, we
propose a multilayer nonlinear semi-nonnegative matrix factorization method,
with the motivation that user-item interactions can be modeled more accurately
using a linear combination of non-linear item features. Firstly, we learn
latent factors for representations of users and items from the designed
multilayer nonlinear Semi-NMF approach using explicit ratings. Secondly, the
architecture built is compared with deep-learning algorithms like Restricted
Boltzmann Machine and state-of-the-art Deep Matrix factorization techniques. By
using both a supervised rating prediction task and unsupervised clustering in
latent item space, we demonstrate that our proposed approach achieves better
generalization ability in prediction as well as representation ability
comparable to deep matrix factorization in the clustering task.
Comment: version
- …
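For context, the classical matrix factorization baseline that the last abstract contrasts against can be sketched as follows. This is a plain linear factorization trained with stochastic gradient descent on observed ratings only; it is not the multilayer nonlinear Semi-NMF method the paper proposes, and the function names and hyperparameters are assumptions made for illustration.

```python
import random

def predict(P, Q, u, i):
    """Predicted rating: dot product of user and item latent factors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))

def rmse(ratings, P, Q):
    """Root mean squared error over the observed (user, item, rating) triples."""
    se = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5

def factorize(ratings, n_users, n_items, rank=2, steps=200,
              lr=0.05, reg=0.02, seed=0):
    """Fit R ~ P Q^T on observed entries with L2-regularized SGD."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(steps):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for k in range(rank):
                pu, qi = P[u][k], Q[i][k]
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)
    return P, Q

# Toy explicit ratings as (user, item, rating); illustrative data only.
toy_ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
P0, Q0 = factorize(toy_ratings, n_users=3, n_items=3, steps=0)  # untrained
P1, Q1 = factorize(toy_ratings, n_users=3, n_items=3)           # trained
```

Comparing `rmse(toy_ratings, P0, Q0)` with `rmse(toy_ratings, P1, Q1)` shows how much the fit improves on the training entries; it says nothing about generalization, which is what the abstract evaluates.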