
    A Survey of Sequential Pattern Based E-Commerce Recommendation Systems

    E-commerce recommendation systems usually deal with massive customer sequential databases, such as historical purchase or click-stream sequences. Their accuracy can be improved if complex sequential patterns of user purchase behavior are learned by integrating sequential patterns of customer clicks and/or purchases into the user–item rating matrix input of collaborative filtering. This review focuses on the algorithms of existing E-commerce recommendation systems that are sequential pattern-based. It provides a comprehensive and comparative performance analysis of these systems, exposing their methodologies, achievements, limitations, and potential for solving more important problems in this domain. The review shows that integrating sequential pattern mining of historical purchase and/or click sequences into a user–item matrix for collaborative filtering can (i) improve recommendation accuracy, (ii) reduce user–item rating data sparsity, (iii) increase the novelty rate of recommendations, and (iv) improve the scalability of recommendation systems.

    Negative sequential pattern mining

    University of Technology, Sydney. Faculty of Engineering and Information Technology.

    Sequential pattern mining provides an important way to obtain special patterns from sequence data. It produces important insights on bioinformatics data, web logs, customer transaction data, and so on. Unlike traditional positive sequential pattern (PSP) mining, negative sequential pattern (NSP) mining takes negative itemsets into account in addition to positive ones. This makes it more useful in applications where non-occurring itemsets need to be considered. This thesis reports our previous and latest research outcomes in this area. The contributions of the thesis are as follows:

    • A comprehensive literature review of negative frequent pattern mining is presented.
    • A general framework for NSP mining is proposed. It can be used to describe the big picture of both PSP and NSP mining problems.
    • Three innovative algorithms are proposed to mine NSP efficiently.
    • Extensive experiments with the three algorithms on synthetic or real-world datasets show that the proposed methods can find NSP efficiently.
    • A case study describes a real-life application to customer claims analysis in the health insurance industry.

    The three NSP mining algorithms proposed in this thesis are:

    (1) Neg-GSP (Zheng, Zhao, Zuo & Cao 2009) is based on the PSP mining algorithm GSP (Srikant & Agrawal 1996). It handles negative itemsets by introducing new methods for joining and generating candidates, which borrow ideas from GSP, together with an effective pruning method that reduces the number of candidates.
    (2) GA-NSP (Zheng, Zhao, Zuo & Cao 2010) is based on a genetic algorithm. It finds NSP with novel crossover and mutation operations, which are efficient at passing good genes on to the next generations. An effective dynamic fitness function and a pruning method are also provided to improve performance.
    (3) e-NSP (Dong, Zheng, Cao, Zhao, Zhang, Li, Wei & Ou 2011) is based on set theory. It mines NSP using only the PSP that have already been identified, so mining NSP requires no additional database scans. This lets existing PSP mining algorithms be reused to mine NSP and offers a new strategy for efficient NSP mining.

    Extensive experiments show that the three algorithms find NSP efficiently and perform well compared with other existing NSP mining algorithms, such as PNSP (Hsueh, Lin & Chen 2008). In terms of problem statements, Neg-GSP and GA-NSP share the same definitions, whereas e-NSP uses stronger constraints because it requires clear boundaries in order to apply set theory. In terms of performance, GA-NSP slightly outperforms Neg-GSP in execution time, but it may miss some patterns from the complete result set because of the limitations of genetic algorithms. e-NSP appears to be the most efficient and effective of the three, since it does not need to scan the datasets to calculate the support of NSP. Although the stronger constraints make e-NSP's search space much smaller than it is under the normal definitions, the algorithm remains practical in real-life applications.

    Finally, NSP mining case studies from the health insurance industry are introduced. Based on real-life customer claims datasets, we use the proposed NSP mining methods to find PSP and NSP that address two business issues: ancillary service over-service analysis and fraudulent claim detection. Both case studies demonstrate the benefits gained from mining NSP.
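
The set-theoretic idea attributed to e-NSP above (deriving NSP support from PSP supports alone) can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes single-item elements, exactly one negated item, and a containment rule under which a data sequence contains <a ¬b c> when it contains <a c> but not <a b c>. The names `psp_support` and `nsp_support` are hypothetical, and the toy database is invented for illustration.

```python
from itertools import combinations

def contains(data_seq, pattern):
    """True if `pattern` is a subsequence of `data_seq` (single-item elements)."""
    it = iter(data_seq)
    return all(item in it for item in pattern)

def count_support(db, pattern):
    """Brute-force support: number of data sequences containing `pattern`."""
    return sum(contains(seq, pattern) for seq in db)

def nsp_support(psp_support, prefix, negated, suffix):
    """Support of <prefix, NOT negated, suffix> from positive supports only.

    Assumption: `psp_support` holds the exact support of every positive
    pattern that occurs at least once, so a missing key is read as 0.
    """
    without_negated = tuple(prefix) + tuple(suffix)
    with_negated = tuple(prefix) + (negated,) + tuple(suffix)
    return psp_support.get(without_negated, 0) - psp_support.get(with_negated, 0)

# Toy sequence database (hypothetical data).
db = [["a", "b", "c"], ["a", "c"], ["a", "d", "c"], ["b", "c"]]

# Stand-in for the output of a PSP miner: supports of all occurring patterns.
psp_support = {}
for seq in db:
    for length in range(1, len(seq) + 1):
        for idx in combinations(range(len(seq)), length):
            pattern = tuple(seq[i] for i in idx)
            if pattern not in psp_support:
                psp_support[pattern] = count_support(db, pattern)

# Support of <a, NOT b, c> without touching `db` again.
support_a_not_b_c = nsp_support(psp_support, ["a"], "b", ["c"])
```

Once `psp_support` exists, `nsp_support` performs only dictionary lookups, which mirrors the claim that no additional database scan is needed. Patterns with several negated elements need a more general formula that this sketch does not cover.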

    A Constraint Programming Approach for Mining Sequential Patterns in a Sequence Database

    Constraint-based pattern discovery is at the core of numerous data mining tasks. Patterns are extracted with respect to a given set of constraints (frequency, closedness, size, etc.). In the context of sequential pattern mining, a large number of dedicated techniques have been developed for solving particular classes of constraints. The aim of this paper is to investigate the use of Constraint Programming (CP) to model and mine sequential patterns in a sequence database. Our CP approach offers a natural way to combine, within a single framework, a large set of constraints coming from various origins. Experiments show the feasibility and the value of our approach.

    An Efficient Algorithm for Mining Frequent Sequence with Constraint Programming

    The main advantage of Constraint Programming (CP) approaches for sequential pattern mining (SPM) is their modularity, which includes the ability to add new constraints (regular expressions, length restrictions, etc.). The current best CP approach for SPM uses a global constraint (module) that computes the projected database and enforces the minimum frequency; it does this with a filtering algorithm similar to the PrefixSpan method. However, the resulting system is not as scalable as some of the most advanced mining systems, like Zaki's cSPADE. We show how, using techniques from both data mining and CP, one can use a generic constraint solver and yet outperform existing specialized systems. This is mainly due to two improvements in the module that computes the projected frequencies: first, computing the projected database can be sped up by pre-computing the positions at which a symbol can become unsupported by a sequence, thereby avoiding a scan of the full sequence each time; and second, by taking inspiration from the trailing used in CP solvers, we devise a backtracking-aware data structure that allows fast incremental storing and restoring of the projected database. Detailed experiments show how this approach outperforms existing CP as well as specialized systems for SPM, and that the gain in efficiency translates directly into increased efficiency for other settings such as mining with regular expressions. Comment: frequent sequence mining, constraint programming
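
For readers unfamiliar with the PrefixSpan method mentioned in this abstract, the following is a minimal sketch of projected-database mining with a minimum-frequency check. It is a plain recursive illustration over sequences of single items, not the paper's global constraint, and it includes neither the pre-computed positions nor the backtracking-aware data structure the abstract describes. The function name `prefixspan` and the toy database are assumptions made for this example.

```python
from collections import defaultdict

def prefixspan(db, min_support):
    """Return (pattern, support) pairs for all frequent sequential patterns.

    `db` is a list of sequences, each a list of single items. The projected
    database is kept as (sequence index, start position) pairs, so suffixes
    are never copied.
    """
    results = []

    def mine(prefix, projected):
        # Count each item at most once per projected suffix.
        counts = defaultdict(int)
        for sid, start in projected:
            for item in set(db[sid][start:]):
                counts[item] += 1

        for item, count in sorted(counts.items()):
            if count < min_support:  # minimum-frequency pruning
                continue
            new_prefix = prefix + [item]
            results.append((tuple(new_prefix), count))

            # Project onto the first occurrence of `item` in each suffix.
            new_projected = []
            for sid, start in projected:
                try:
                    pos = db[sid].index(item, start)
                except ValueError:
                    continue  # this sequence no longer supports the prefix
                new_projected.append((sid, pos + 1))
            mine(new_prefix, new_projected)

    mine([], [(sid, 0) for sid in range(len(db))])
    return results

# Toy sequence database (hypothetical data).
db = [["a", "b", "c"], ["a", "c"], ["a", "b"], ["b", "c"]]
patterns = dict(prefixspan(db, min_support=2))
```

Each recursive call rescans the remaining suffixes to count items. That repeated scanning is the cost the abstract's first improvement is said to reduce.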

    Privacy Preserving Utility Mining: A Survey

    In the big data era, collected data usually contains rich information and hidden knowledge. Utility-oriented pattern mining and analytics have shown a powerful ability to explore these ubiquitous data, which may be collected from various fields and applications, such as market basket analysis, retail, click-stream analysis, medical analysis, and bioinformatics. However, analyzing data that contains sensitive private information raises privacy concerns. To achieve a better trade-off between utility maximization and privacy preservation, Privacy-Preserving Utility Mining (PPUM) has become a critical issue in recent years. In this paper, we provide a comprehensive overview of PPUM. We first present the background of utility mining, privacy-preserving data mining, and PPUM, then introduce the related preliminaries and problem formulation of PPUM, as well as some key evaluation criteria for PPUM. In particular, we present and discuss the current state-of-the-art PPUM algorithms, as well as their advantages and deficiencies, in detail. Finally, we highlight and discuss some technical challenges and open directions for future research on PPUM. Comment: 2018 IEEE International Conference on Big Data, 10 pages