25,029 research outputs found

    Efficient Mining of Sequential Patterns in a Sequence Database with Weight Constraint

    Sequential pattern mining is one of the essential data mining tasks, with broad applications. Many sequence mining algorithms have been developed to find the set of frequent subsequences that satisfy a support threshold in a sequence database. The main problems with most of these algorithms are that they generate a huge number of sequential patterns when the support threshold is low, and that they treat all sequential patterns uniformly even though real sequential patterns differ in importance. In this paper, we propose an algorithm that aims to find more interesting sequential patterns by considering the different significance of each data element in a sequence database. Unlike conventional weighted sequential pattern mining, where the weights of items are preassigned according to priority or importance, in our approach the weights are set according to the real data, and during the mining process both the supports and the weights of patterns are considered. The experimental results show that the algorithm is efficient and effective in generating more interesting patterns.

    Privacy Preserving Utility Mining: A Survey

    In the big data era, collected data usually contain rich information and hidden knowledge. Utility-oriented pattern mining and analytics have shown a powerful ability to explore these ubiquitous data, which may be collected from various fields and applications, such as market basket analysis, retail, click-stream analysis, medical analysis, and bioinformatics. However, analyzing data that contain sensitive private information raises privacy concerns. Achieving a better trade-off between utility maximization and privacy preservation has made Privacy-Preserving Utility Mining (PPUM) a critical issue in recent years. In this paper, we provide a comprehensive overview of PPUM. We first present the background of utility mining, privacy-preserving data mining, and PPUM, then introduce the related preliminaries and problem formulation of PPUM, as well as some key evaluation criteria for PPUM. In particular, we present and discuss the current state-of-the-art PPUM algorithms, along with their advantages and deficiencies, in detail. Finally, we highlight and discuss some technical challenges and open directions for future research on PPUM.
    Comment: 2018 IEEE International Conference on Big Data, 10 pages

    Effective pattern discovery for text mining

    Many data mining techniques have been proposed for mining useful patterns in text documents. However, how to effectively use and update discovered patterns is still an open research issue, especially in the domain of text mining. Since most existing text mining methods adopt term-based approaches, they all suffer from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern-based (or phrase-based) approaches should perform better than term-based ones, but many experiments did not support this hypothesis. This paper presents an innovative technique, effective pattern discovery, which includes the processes of pattern deploying and pattern evolving, to improve the effectiveness of using and updating discovered patterns for finding relevant and interesting information. Substantial experiments on the RCV1 data collection and TREC topics demonstrate that the proposed solution achieves encouraging performance.