10,679 research outputs found

    An Efficient Algorithm for Mining Frequent Sequence with Constraint Programming

    Full text link
    The main advantage of Constraint Programming (CP) approaches for sequential pattern mining (SPM) is their modularity, which includes the ability to add new constraints (regular expressions, length restrictions, etc). The current best CP approach for SPM uses a global constraint (module) that computes the projected database and enforces the minimum frequency; it does this with a filtering algorithm similar to the PrefixSpan method. However, the resulting system is not as scalable as some of the most advanced mining systems like Zaki's cSPADE. We show how, using techniques from both data mining and CP, one can use a generic constraint solver and yet outperform existing specialized systems. This is mainly due to two improvements in the module that computes the projected frequencies: first, computing the projected database can be sped up by pre-computing the positions at which an symbol can become unsupported by a sequence, thereby avoiding to scan the full sequence each time; and second by taking inspiration from the trailing used in CP solvers to devise a backtracking-aware data structure that allows fast incremental storing and restoring of the projected database. Detailed experiments show how this approach outperforms existing CP as well as specialized systems for SPM, and that the gain in efficiency translates directly into increased efficiency for other settings such as mining with regular expressions.Comment: frequent sequence mining, constraint programmin

    A Constraint Programming Approach for Mining Sequential Patterns in a Sequence Database

    Full text link
    Constraint-based pattern discovery is at the core of numerous data mining tasks. Patterns are extracted with respect to a given set of constraints (frequency, closedness, size, etc). In the context of sequential pattern mining, a large number of devoted techniques have been developed for solving particular classes of constraints. The aim of this paper is to investigate the use of Constraint Programming (CP) to model and mine sequential patterns in a sequence database. Our CP approach offers a natural way to simultaneously combine in a same framework a large set of constraints coming from various origins. Experiments show the feasibility and the interest of our approach

    Evaluation and optimization of frequent association rule based classification

    Get PDF
    Deriving useful and interesting rules from a data mining system is an essential and important task. Problems such as the discovery of random and coincidental patterns or patterns with no significant values, and the generation of a large volume of rules from a database commonly occur. Works on sustaining the interestingness of rules generated by data mining algorithms are actively and constantly being examined and developed. In this paper, a systematic way to evaluate the association rules discovered from frequent itemset mining algorithms, combining common data mining and statistical interestingness measures, and outline an appropriated sequence of usage is presented. The experiments are performed using a number of real-world datasets that represent diverse characteristics of data/items, and detailed evaluation of rule sets is provided. Empirical results show that with a proper combination of data mining and statistical analysis, the framework is capable of eliminating a large number of non-significant, redundant and contradictive rules while preserving relatively valuable high accuracy and coverage rules when used in the classification problem. Moreover, the results reveal the important characteristics of mining frequent itemsets, and the impact of confidence measure for the classification task

    Is Simple Better? Revisiting Non-linear Matrix Factorization for Learning Incomplete Ratings

    Full text link
    Matrix factorization techniques have been widely used as a method for collaborative filtering for recommender systems. In recent times, different variants of deep learning algorithms have been explored in this setting to improve the task of making a personalized recommendation with user-item interaction data. The idea that the mapping between the latent user or item factors and the original features is highly nonlinear suggest that classical matrix factorization techniques are no longer sufficient. In this paper, we propose a multilayer nonlinear semi-nonnegative matrix factorization method, with the motivation that user-item interactions can be modeled more accurately using a linear combination of non-linear item features. Firstly, we learn latent factors for representations of users and items from the designed multilayer nonlinear Semi-NMF approach using explicit ratings. Secondly, the architecture built is compared with deep-learning algorithms like Restricted Boltzmann Machine and state-of-the-art Deep Matrix factorization techniques. By using both supervised rate prediction task and unsupervised clustering in latent item space, we demonstrate that our proposed approach achieves better generalization ability in prediction as well as comparable representation ability as deep matrix factorization in the clustering task.Comment: version

    Data mining based cyber-attack detection

    Get PDF
    • …
    corecore