
    An efficient parallel method for mining frequent closed sequential patterns

    Mining frequent closed sequential patterns (FCSPs) is an important task in sequence mining and has attracted a great deal of research attention. Recently, many studies have focused on mining FCSPs because they have proved to be more efficient to mine and more compact than frequent sequential patterns, while the full information of the frequent sequential patterns can still be recovered from them. In this paper, we propose an efficient parallel approach, parallel dynamic bit vector frequent closed sequential patterns (pDBV-FCSP), that uses a multi-core processor architecture to mine FCSPs from large databases. pDBV-FCSP divides the search space to reduce the required storage and performs closure checking of prefix sequences early to reduce execution time. The approach overcomes common problems of parallel mining, such as communication overhead, synchronization, and data replication. It also addresses load balancing between processors with a dynamic mechanism that redistributes work when some processors run out of work, minimizing idle CPU time.
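To make the notion of a "closed" sequential pattern concrete, here is a minimal sketch assuming sequences are tuples of itemsets (frozensets). A pattern is kept only if no proper super-sequence has the same support. This is a naive post-filter over already-mined patterns. It is not pDBV-FCSP, which uses dynamic bit vectors, early closure checking of prefixes, and parallel search. The helper names `seq`, `support`, and `closed_patterns` are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

Itemset = FrozenSet[str]
Sequence = Tuple[Itemset, ...]


def seq(*itemsets: str) -> Sequence:
    """Build a sequence, one string per itemset: seq("a", "bc") is <(a)(bc)>."""
    return tuple(frozenset(s) for s in itemsets)


def is_subsequence(a: Sequence, b: Sequence) -> bool:
    """True if every itemset of `a` is a subset of a distinct itemset of `b`, in order.
    Greedy earliest matching is sufficient for this containment test."""
    j = 0
    for itemset in a:
        while j < len(b) and not itemset <= b[j]:
            j += 1
        if j == len(b):
            return False
        j += 1
    return True


def support(pattern: Sequence, database: list) -> int:
    """Number of database sequences that contain `pattern`."""
    return sum(1 for s in database if is_subsequence(pattern, s))


def closed_patterns(frequent: Dict[Sequence, int]) -> Dict[Sequence, int]:
    """Keep a pattern only if no proper super-sequence has the same support."""
    return {
        p: sup
        for p, sup in frequent.items()
        if not any(q != p and sq == sup and is_subsequence(p, q)
                   for q, sq in frequent.items())
    }


db = [seq("a", "b"), seq("a", "b", "c"), seq("a", "c")]
candidates = [seq("a"), seq("b"), seq("c"), seq("a", "b"), seq("a", "c")]
frequent = {p: support(p, db) for p in candidates}
print(closed_patterns(frequent))
```

In this toy database, <(b)> is dropped because <(a)(b)> has the same support (2). <(a)> survives because its support (3) is higher than that of any super-sequence.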

    On mining complex sequential data by means of FCA and pattern structures

    Data sets are now available in increasingly complex and heterogeneous forms. Mining such data collections is essential to support many real-world applications, ranging from healthcare to marketing. In this work, we focus on the analysis of "complex" sequential data by means of interesting sequential patterns. We approach the problem using the elegant mathematical framework of Formal Concept Analysis (FCA) and its extension based on "pattern structures". Pattern structures are used for mining complex data (such as sequences or graphs) and are based on a subsumption operation, which in our case is defined with respect to the partial order on sequences. We show how pattern structures, together with projections (i.e., data reductions of sequential structures), make it possible to enumerate more meaningful patterns and increase the computational efficiency of the approach. Finally, we show the applicability of the presented method for discovering and analyzing interesting patient patterns from a French healthcare data set on cancer. The quantitative and qualitative results (with annotations and analysis from a physician) are reported for this use case, which is the main motivation for this work. Keywords: data mining; formal concept analysis; pattern structures; projections; sequences; sequential data. Comment: An accepted publication in the International Journal of General Systems. The paper follows the conference on Concept Lattices and Their Applications (CLA'2013). 27 pages, 9 figures, 3 tables.
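The following minimal sketch illustrates a similarity (meet) operation built on the partial order on sequences. It assumes a description is a set of sequences of single items, and that the meet of two descriptions is the set of their maximal common subsequences. The brute-force enumeration is exponential and only meant for toy inputs. Neither the representation nor the function names (`meet`, `maximal`) are taken from the paper, and projections are not modeled.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set, Tuple

Seq = Tuple[str, ...]


def is_subseq(a: Seq, b: Seq) -> bool:
    """True if `a` is a (not necessarily contiguous) subsequence of `b`."""
    it = iter(b)
    return all(x in it for x in a)


def subsequences(s: Seq) -> Iterable[Seq]:
    """All non-empty subsequences of `s` (exponential: toy inputs only)."""
    for r in range(1, len(s) + 1):
        for idx in combinations(range(len(s)), r):
            yield tuple(s[i] for i in idx)


def maximal(seqs: Set[Seq]) -> FrozenSet[Seq]:
    """Keep only sequences not contained in another sequence of the set."""
    return frozenset(s for s in seqs
                     if not any(t != s and is_subseq(s, t) for t in seqs))


def meet(d1: Set[Seq], d2: Set[Seq]) -> FrozenSet[Seq]:
    """Similarity of two descriptions: their maximal common subsequences."""
    common = {s for a in d1 for b in d2 for s in subsequences(a) if is_subseq(s, b)}
    return maximal(common)


print(meet({("a", "b", "c")}, {("a", "c", "b")}))
```

For the sequences <a b c> and <a c b>, the common subsequences are a, b, c, ab, and ac. Only ab and ac are maximal, so the meet keeps just those two.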

    Problem-Solving Knowledge Mining from Users' Actions in an Intelligent Tutoring System

    In an intelligent tutoring system (ITS), the domain expert should provide relevant domain knowledge to the tutor so that it can guide the learner during problem solving. However, in several domains this knowledge is not predetermined and must be captured or learned from expert users as well as intermediate and novice users. Our hypothesis is that knowledge discovery (KD) techniques can help build this domain intelligence in an ITS. This paper proposes a framework to capture problem-solving knowledge using a promising data and knowledge discovery approach that combines sequential pattern mining and association rule discovery. The framework has been implemented and is used to discover new meta-knowledge and rules in a given domain. These rules then extend the domain knowledge and serve as a problem space that allows the intelligent tutoring system to guide learners in problem-solving situations. Preliminary experiments have been conducted using the framework as an alternative to a path-planning problem solver in CanadarmTutor.
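To illustrate how sequential patterns and rules relate, here is a minimal sketch that computes support and confidence for a sequential rule "antecedent, then later consequent" over logged action sequences. It assumes one sequence per problem-solving session and the common definition confidence = sup(antecedent followed by consequent) / sup(antecedent). The action names and the `rule_stats` helper are hypothetical. The paper's actual combination of sequential pattern mining and association rule discovery may differ.

```python
from typing import List, Sequence, Tuple


def contains(seq: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if `pattern` occurs in `seq` in order (gaps allowed)."""
    it = iter(seq)
    return all(x in it for x in pattern)


def rule_stats(db: List[Sequence[str]], antecedent: Tuple[str, ...],
               consequent: Tuple[str, ...]) -> Tuple[float, float]:
    """Support and confidence of the sequential rule antecedent => consequent,
    where the consequent must occur after the antecedent."""
    n_ante = sum(contains(s, antecedent) for s in db)
    n_rule = sum(contains(s, antecedent + consequent) for s in db)
    support = n_rule / len(db)
    confidence = n_rule / n_ante if n_ante else 0.0
    return support, confidence


# Hypothetical logged action sequences (one per problem-solving session).
sessions = [
    ("select_camera", "rotate_joint", "grab"),
    ("select_camera", "grab"),
    ("select_camera", "rotate_joint"),
    ("rotate_joint", "grab"),
]
print(rule_stats(sessions, ("select_camera",), ("rotate_joint",)))
```

Here three sessions start with `select_camera`, and two of them later contain `rotate_joint`. That gives a support of 0.5 and a confidence of 2/3.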

    Incremental Mining of Frequent Serial Episodes Considering Multiple Occurrences

    The need to analyze information from streams arises in a variety of applications. One fundamental research direction is mining sequential patterns over data streams. Current studies mine series of items based on the presence of a pattern in transactions, but pay no attention to series of itemsets and their multiple occurrences. Patterns over a window of an itemset stream, together with their multiple occurrences, provide an additional way to recognize the essential characteristics of the patterns and the inter-relationships among them, which existing presence-based studies cannot identify. In this paper, we study this new sequential pattern mining problem and propose a corresponding sequential miner with novel strategies to prune the search space efficiently. Experiments on both real and synthetic data show the utility of our approach.
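The difference between presence-based and occurrence-based counting can be shown with a minimal sketch. It counts non-overlapping occurrences of a serial episode (a non-empty sequence of itemsets) in a sliding window over an itemset stream. Greedy left-to-right matching always completes the occurrence that ends earliest, which maximizes the number of non-overlapping occurrences. Two parts of this sketch are assumptions, not the paper's definitions: the choice of non-overlapping occurrences, and the naive full recount of the window at each arrival. The paper's miner is incremental and prunes its search space.

```python
from collections import deque
from typing import FrozenSet, Iterable, Iterator, Sequence

Itemset = FrozenSet[str]


def count_non_overlapping(window: Iterable[Itemset], episode: Sequence[Itemset]) -> int:
    """Count non-overlapping occurrences of a non-empty serial episode."""
    count, k = 0, 0
    for itemset in window:
        if episode[k] <= itemset:   # next episode element matched
            k += 1
            if k == len(episode):   # occurrence complete: count it, restart
                count += 1
                k = 0
    return count


def sliding_counts(stream: Iterable[Itemset], episode: Sequence[Itemset],
                   w: int) -> Iterator[int]:
    """Occurrence count over the last `w` itemsets after each arrival.
    Recounting the whole window is naive; an incremental miner avoids it."""
    window: deque = deque(maxlen=w)
    for itemset in stream:
        window.append(itemset)
        yield count_non_overlapping(window, episode)


stream = [frozenset(s) for s in ["a", "b", "ac", "b", "c"]]
episode = [frozenset("a"), frozenset("b")]
print(list(sliding_counts(stream, episode, w=3)))
```

Over the whole stream, the episode <(a)(b)> occurs twice. A presence-based count would report it only once per transaction or window.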

    Fault tolerant decentralised K-Means clustering for asynchronous large-scale networks

    The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited to distributed-memory systems with reliable interconnection networks, such as massively parallel processors and clusters of workstations. However, in large-scale, geographically distributed systems, the straightforward parallel algorithm can be rendered useless by a single communication failure or by high latency in communication paths. The lack of scalable and fault-tolerant global communication and synchronisation methods in large-scale systems has hindered the adoption of K-Means in large networked systems such as wireless sensor networks, peer-to-peer systems, and mobile ad hoc networks. This work proposes a fully distributed K-Means algorithm (EpidemicK-Means) that does not require global communication and is intrinsically fault tolerant. The proposed algorithm provides a clustering solution that can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis against state-of-the-art sampling methods shows that the proposed method overcomes the limitations of sampling-based approaches for skewed cluster distributions. The experimental analysis confirms that the proposed algorithm is very accurate and fault tolerant under unreliable network conditions (message loss and node failures) and is suitable for asynchronous networks of very large and extreme scale.
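One way to avoid global communication is gossip (epidemic) averaging. The minimal sketch below shows one K-Means iteration in that style. Each node summarizes its local 1-D points as per-cluster sums and counts. Random pairs of nodes then repeatedly replace their vectors with the pairwise average. This preserves the network-wide total, so every node's vector converges to the mean, and averaged sums divided by averaged counts give the global centroids. Several simplifications are assumptions of this sketch and not details of EpidemicK-Means: 1-D data, uniformly random pairs (any node can reach any other), a fixed number of rounds, at least two nodes, and no modeling of message loss. All function names are hypothetical.

```python
import random
from typing import List

Vector = List[float]


def local_stats(points: List[float], centroids: List[float]) -> Vector:
    """Per-cluster sums and counts of one node's points, as [sums..., counts...]."""
    k = len(centroids)
    sums, counts = [0.0] * k, [0.0] * k
    for x in points:
        j = min(range(k), key=lambda c: abs(x - centroids[c]))  # nearest centroid
        sums[j] += x
        counts[j] += 1.0
    return sums + counts


def gossip_average(vectors: List[Vector], rounds: int, rng: random.Random) -> List[Vector]:
    """Pairwise gossip: two random nodes replace their vectors with the average."""
    vecs = [list(v) for v in vectors]
    for _ in range(rounds):
        i, j = rng.sample(range(len(vecs)), 2)
        avg = [(a + b) / 2 for a, b in zip(vecs[i], vecs[j])]
        vecs[i], vecs[j] = avg, list(avg)
    return vecs


def epidemic_kmeans_step(node_points: List[List[float]], centroids: List[float],
                         rounds: int = 200, seed: int = 0) -> List[List[float]]:
    """One iteration without a coordinator: each node gets its own centroid estimate."""
    k = len(centroids)
    rng = random.Random(seed)
    mixed = gossip_average([local_stats(p, centroids) for p in node_points], rounds, rng)
    estimates = []
    for v in mixed:
        sums, counts = v[:k], v[k:]
        # Ratio of averages equals ratio of totals; keep the old centroid if a cluster is empty.
        estimates.append([s / c if c > 0 else centroids[j]
                          for j, (s, c) in enumerate(zip(sums, counts))])
    return estimates


nodes = [[1.0, 1.2], [0.8, 9.0], [10.0, 11.0], [9.5]]
for estimate in epidemic_kmeans_step(nodes, centroids=[0.0, 10.0]):
    print([round(c, 3) for c in estimate])
```

After enough rounds, every node holds an estimate close to the centralised result: the means of {1.0, 1.2, 0.8} and {9.0, 10.0, 11.0, 9.5}. More rounds tighten the approximation, which mirrors the "as closely as desired" property described above.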

    Empirical analysis of customer behaviors in Chinese e-commerce

    With the growth of e-business websites, e-commerce in China has developed rapidly in recent years. Analyzing the Chinese e-commerce market makes it possible to discover customer purchasing patterns and behavioral characteristics, which are essential knowledge for the market's further expansion. This paper presents an empirical analysis of sales transactions from the 360buy website, based on the distributions of time intervals from the customers' perspective. Results reveal that in most situations the time intervals approximately follow a power-law distribution over two orders of magnitude. Additionally, the time interval between a customer's successive purchases can reflect how loyal the customer is to a specific product category. We also find an interesting phenomenon in human behavior that could be related to customer psychology. In general, customers' requirements in different product categories are similar. Investigating individual behaviors may help researchers understand how customers' group behaviors are generated.
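The interval analysis can be sketched as follows, assuming transactions arrive as (key, timestamp) pairs. The key can be a customer id, or a (customer, category) pair to study loyalty to a product category. The exponent estimator is the standard continuous maximum-likelihood estimator for a power-law tail, alpha = 1 + n / sum(ln(x / xmin)). This is a swapped-in technique for illustration; the paper's own fitting procedure may differ, and the helper names and sample data are hypothetical.

```python
import math
from collections import defaultdict
from typing import Hashable, Iterable, List, Tuple


def inter_purchase_intervals(transactions: Iterable[Tuple[Hashable, float]]) -> List[float]:
    """Time gaps between successive purchases sharing the same key.
    Zero gaps (identical timestamps) are skipped."""
    times = defaultdict(list)
    for key, t in transactions:
        times[key].append(t)
    intervals: List[float] = []
    for ts in times.values():
        ts.sort()
        intervals.extend(b - a for a, b in zip(ts, ts[1:]) if b > a)
    return intervals


def powerlaw_alpha_mle(xs: List[float], xmin: float) -> float:
    """Continuous MLE of alpha for p(x) ~ x^(-alpha), x >= xmin.
    Undefined (division by zero) when every tail value equals xmin."""
    tail = [x for x in xs if x >= xmin]
    return 1 + len(tail) / sum(math.log(x / xmin) for x in tail)


# Hypothetical transactions: (customer_id, purchase day)
tx = [("c1", 0), ("c1", 1), ("c1", 3), ("c2", 5), ("c2", 9)]
gaps = inter_purchase_intervals(tx)
print(gaps, powerlaw_alpha_mle(gaps, xmin=1))
```

On real data, xmin would normally be chosen from the data rather than fixed. A tail spanning two orders of magnitude is also needed before an exponent is meaningful.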

    Discriminant Sequential Patterns for DNA Microarrays

    Discovering new information about groups of genes involved in a disease is a real challenge. DNA microarrays are powerful tools for analyzing gene expression: they measure the expression of thousands of genes under different biological conditions. In this article, we propose a new approach that highlights order relations between gene expressions. First, we extract sequential patterns that biologists can use as study material. However, because the density of microarray databases makes extracting these patterns difficult, we introduce a source of knowledge during the mining process. In this way, the search space is reduced and the results obtained are more biologically relevant. Experiments on real data highlight the relevance of our proposal.
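The minimal sketch below illustrates "order relations between gene expressions". It assumes each sample is turned into a sequence of genes sorted by increasing expression level, with genes of exactly equal value sharing an itemset. Under this assumption, a pattern such as <(g1)(g2)> means "g1 is expressed below g2". It also assumes a pattern is discriminant when its support is high in one condition and low in the other. The conversion rule, gene names, and values are hypothetical, and the knowledge source the authors use to prune the search is not modeled.

```python
from typing import Dict, FrozenSet, List, Tuple

Sequence = Tuple[FrozenSet[str], ...]


def expression_to_sequence(expr: Dict[str, float]) -> Sequence:
    """Order genes by increasing expression; genes with equal values share an itemset."""
    levels = sorted(set(expr.values()))
    return tuple(frozenset(g for g, v in expr.items() if v == lvl) for lvl in levels)


def contains(seq: Sequence, pattern: Sequence) -> bool:
    """True if each itemset of `pattern` is included, in order, in distinct itemsets of `seq`."""
    j = 0
    for items in pattern:
        while j < len(seq) and not items <= seq[j]:
            j += 1
        if j == len(seq):
            return False
        j += 1
    return True


def support(pattern: Sequence, samples: List[Dict[str, float]]) -> float:
    """Fraction of samples whose expression ordering contains `pattern`."""
    seqs = [expression_to_sequence(s) for s in samples]
    return sum(contains(s, pattern) for s in seqs) / len(seqs)


# Hypothetical expression values for two biological conditions.
disease = [{"g1": 0.5, "g2": 1.0, "g3": 2.0}, {"g1": 0.1, "g2": 0.3, "g3": 0.3}]
healthy = [{"g1": 2.0, "g2": 1.0, "g3": 0.5}]
pattern = (frozenset({"g1"}), frozenset({"g2"}))  # "g1 is expressed below g2"
print(support(pattern, disease), support(pattern, healthy))
```

In this toy data, the pattern holds in every disease sample and in no healthy sample, which is the kind of contrast a discriminant pattern captures.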