Search CORE

1,701 research outputs found

An efficient parallel method for mining frequent closed sequential patterns

Author: Huynh Bao
Snášel Václav
Vo Bay
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2017

Mining frequent closed sequential patterns (FCSPs) has attracted a great deal of research attention because it is an important task in sequence mining. Recently, many studies have focused on mining frequent closed sequential patterns because such patterns have proved to be more efficient and compact than frequent sequential patterns. Information can be fully extracted from frequent closed sequential patterns. In this paper, we propose an efficient parallel approach called parallel dynamic bit vector frequent closed sequential patterns (pDBV-FCSP), which uses a multi-core processor architecture to mine FCSPs from large databases. pDBV-FCSP divides the search space to reduce the required storage space and performs closure checking of prefix sequences early to reduce the execution time of mining frequent closed sequential patterns. This approach overcomes common problems of parallel mining such as communication overhead, synchronization, and data replication. It also addresses load balancing between processors with a dynamic mechanism that redistributes work when some processes run out of work, minimizing idle CPU time.
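To make the notion of a closed pattern concrete, the following is a minimal brute-force sketch, not the pDBV-FCSP algorithm itself. It assumes sequences of single items (no itemsets) and a toy database; the function names and the data are hypothetical. A frequent pattern is treated as closed when no longer pattern that contains it has the same support.

```python
from itertools import combinations

def is_subsequence(pattern, sequence):
    """True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)

def support(pattern, database):
    """Number of database sequences that contain `pattern`."""
    return sum(is_subsequence(pattern, seq) for seq in database)

def frequent_patterns(database, min_support):
    """Enumerate every frequent subsequence by brute force (toy sizes only)."""
    candidates = set()
    for seq in database:
        for length in range(1, len(seq) + 1):
            for idx in combinations(range(len(seq)), length):
                candidates.add(tuple(seq[i] for i in idx))
    result = {}
    for pattern in candidates:
        count = support(pattern, database)
        if count >= min_support:
            result[pattern] = count
    return result

def closed_patterns(frequent):
    """Keep patterns that have no longer super-sequence with equal support."""
    return {
        p: s for p, s in frequent.items()
        if not any(len(q) > len(p) and t == s and is_subsequence(p, q)
                   for q, t in frequent.items())
    }

# Hypothetical toy database of four sequences.
database = [("a", "b", "c"), ("a", "c"), ("a", "b", "c"), ("b", "c")]
frequent = frequent_patterns(database, min_support=2)
closed = closed_patterns(frequent)
```

On this toy input, seven patterns are frequent but only four are closed, which illustrates why the closed set is the more compact representation. Exhaustive enumeration like this is exponential in sequence length, which is the cost that search-space division and early closure checking aim to avoid.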

DSpace at VSB Technical University of Ostrava

On mining complex sequential data by means of FCA and pattern structures

Author: Buzmakov Aleksey
Egho Elias
Jay Nicolas
Kuznetsov Sergei O.
Napoli Amedeo
Raïssi Chedy
Publication date: 09/04/2015

Nowadays data sets are available in very complex and heterogeneous forms. Mining such data collections is essential to support many real-world applications ranging from healthcare to marketing. In this work, we focus on the analysis of "complex" sequential data by means of interesting sequential patterns. We approach the problem using the elegant mathematical framework of Formal Concept Analysis (FCA) and its extension based on "pattern structures". Pattern structures are used for mining complex data (such as sequences or graphs) and are based on a subsumption operation, which in our case is defined with respect to the partial order on sequences. We show how pattern structures, along with projections (i.e., a data reduction of sequential structures), are able to enumerate more meaningful patterns and increase the computing efficiency of the approach. Finally, we show the applicability of the presented method for discovering and analyzing interesting patient patterns from a French healthcare data set on cancer. The quantitative and qualitative results (with annotations and analysis from a physician) are reported in this use case, which is the main motivation for this work.

Keywords: data mining; formal concept analysis; pattern structures; projections; sequences; sequential data.

Comment: An accepted publication in International Journal of General Systems. The paper was created in the wake of the conference on Concept Lattices and Their Applications (CLA'2013). 27 pages, 9 figures, 3 tables

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Problem-Solving Knowledge Mining from Users’ Actions in an Intelligent Tutoring System

Author: Couturier Olivier
Fournier-Viger Philippe
Mephu Engelbert
Nkambou Roger
Publication venue: Springer-Verlag
Publication date: 01/05/2007

In an intelligent tutoring system (ITS), the domain expert should provide relevant domain knowledge to the tutor so that it will be able to guide the learner during problem solving. However, in several domains, this knowledge is not predetermined and should be captured or learned from expert users as well as intermediate and novice users. Our hypothesis is that knowledge discovery (KD) techniques can help to build this domain intelligence in ITS. This paper proposes a framework to capture problem-solving knowledge using a promising approach to data and knowledge discovery based on a combination of sequential pattern mining and association rule discovery techniques. The framework has been implemented and is used to discover new meta-knowledge and rules in a given domain, which then extend domain knowledge and serve as a problem space allowing the intelligent tutoring system to guide learners in problem-solving situations. Preliminary experiments have been conducted using the framework as an alternative to a path-planning problem solver in CanadarmTutor.

Archipel - Université du Québec à Montréal

Incremental Mining of Frequent Serial Episodes Considering Multiple Occurrences

Author: Bifet Albert
Guyet Thomas
Zhang Wenbin
Publication date: 01/01/2022

The need to analyze information from streams arises in a variety of applications. One of the fundamental research directions is to mine sequential patterns over data streams. Current studies mine series of items based on the presence of the pattern in transactions but pay no attention to the series of itemsets and their multiple occurrences. Patterns over a window of an itemset stream, together with their multiple occurrences, however, provide additional capability to recognize the essential characteristics of the patterns and the inter-relationships among them, which existing presence-based studies cannot identify. In this paper, we study this new sequential pattern mining problem and propose a corresponding sequential miner with novel strategies to prune the search space efficiently. Experiments on both real and synthetic data show the utility of our approach.

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Research Commons@Waikato

HAL

Fault tolerant decentralised K-Means clustering for asynchronous large-scale networks

Author: Blasa Francesco
Cafiero Simone
Di Fatta Giuseppe
Fortino Giancarlo
Publication venue: Elsevier BV
Publication date: 01/03/2013

The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks, such as massively parallel processors and clusters of workstations. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. The lack of scalable and fault-tolerant global communication and synchronisation methods in large-scale systems has hindered the adoption of the K-Means algorithm for applications in large networked systems such as wireless sensor networks, peer-to-peer systems and mobile ad hoc networks. This work proposes a fully distributed K-Means algorithm (EpidemicK-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against state-of-the-art sampling methods and shows that the proposed method overcomes the limitations of the sampling-based approaches for skewed cluster distributions. The experimental analysis confirms that the proposed algorithm is very accurate and fault tolerant under unreliable network conditions (message loss and node failures) and is suitable for asynchronous networks of very large and extreme scale.
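The abstract does not spell out the protocol, so the following is only a sketch of one way a centroid could be computed without global communication. It assumes plain pairwise averaging gossip as a stand-in for the paper's epidemic aggregation, one cluster, one-dimensional points, and no message loss; all names and data are hypothetical. Each node keeps a local (sum, count) pair, repeatedly averages it with a random peer, and then estimates the centroid as sum divided by count.

```python
import random

def gossip_average(states, exchanges, rng):
    """Pairwise averaging gossip: two random nodes replace their state
    vectors with the element-wise mean, so network totals are conserved."""
    states = [list(s) for s in states]
    for _ in range(exchanges):
        i, j = rng.sample(range(len(states)), 2)
        merged = [(a + b) / 2 for a, b in zip(states[i], states[j])]
        states[i], states[j] = merged, list(merged)
    return states

# Hypothetical data: the points each of 8 nodes currently assigns to one cluster.
local_points = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0], [7.0],
                [8.0, 9.0], [10.0], [11.0, 12.0], [13.0]]
local_stats = [(sum(p), float(len(p))) for p in local_points]

final = gossip_average(local_stats, exchanges=200, rng=random.Random(0))

# Every node estimates the centroid from its own averaged (sum, count) pair.
estimates = [s / c for s, c in final]

# Reference value that a centralised algorithm would compute.
all_points = [x for p in local_points for x in p]
true_centroid = sum(all_points) / len(all_points)
```

Because averaging preserves the network-wide totals, the ratio of averaged sum to averaged count converges to the centralised centroid at every node. Handling message loss and node failures, which the abstract highlights, would need additional machinery that this sketch omits.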

Central Archive at the University of Reading

Empirical analysis of customer behaviors in Chinese e-commerce

Author: Gao Ke
Li Gang
Wang Jinlong
Publication venue: Academy Publisher
Publication date: 01/10/2010

With the rapid growth of e-business websites, e-commerce in China has developed quickly in recent years. By analyzing the Chinese e-commerce market, it is possible to discover customer purchasing patterns and behavior characteristics, which are indispensable knowledge for the expansion of that market. This paper presents an empirical analysis of sales transactions from the 360buy website, based on the distributions of time intervals from the customers’ perspective. Results reveal that in most situations the time intervals approximately obey a power-law distribution over two orders of magnitude. Additionally, the time interval between a customer’s successive purchases can reflect how loyal the customer is to a specific product category. Moreover, we also find an interesting phenomenon in human behavior that could be related to customer psychology. In general, customers’ requirements in different product categories are similar. The investigation of individual behaviors may help researchers understand how customers’ group behaviors are generated.
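The abstract does not say how the power-law behavior was fitted. As an illustration only, the sketch below computes inter-purchase intervals and estimates a power-law exponent with the standard continuous maximum-likelihood formula, alpha = 1 + n / sum(ln(x_i / x_min)). The function names, the choice of x_min, and the synthetic data are assumptions, not taken from the paper.

```python
import math
import random

def purchase_intervals(timestamps):
    """Gaps between one customer's successive purchases (same unit as input)."""
    ordered = sorted(timestamps)
    return [b - a for a, b in zip(ordered, ordered[1:]) if b > a]

def powerlaw_exponent(intervals, x_min):
    """Continuous maximum-likelihood estimate of alpha for
    p(x) proportional to x**(-alpha), using only values x >= x_min."""
    tail = [x for x in intervals if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)

# Synthetic intervals drawn from a known power law (alpha = 2.5) by inverse
# transform sampling; real transaction data would replace this list.
rng = random.Random(42)
true_alpha, x_min = 2.5, 1.0
samples = [x_min * (1.0 - rng.random()) ** (-1.0 / (true_alpha - 1.0))
           for _ in range(20000)]
estimate = powerlaw_exponent(samples, x_min)
```

With real data, `purchase_intervals` would be applied per customer and the pooled intervals passed to `powerlaw_exponent`. The estimate is sensitive to the choice of `x_min`, so it is worth checking several cut-offs before reading much into the fitted exponent.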

Deakin Research Online

Motifs Séquentiels Discriminants pour les puces ADN (Discriminant Sequential Patterns for DNA Microarrays)

Author: Bringay Sandra
Salle Paola
Teisseire Maguelonne
Publication venue: HAL CCSD
Publication date: 26/05/2009

National audience. Discovering new information about the groups of genes involved in a disease is a real challenge. DNA microarrays are powerful tools for analyzing gene expression. They measure the expression of thousands of genes under different biological conditions. In this paper, we propose a new approach that highlights order relations between gene expressions. First, we extract sequential patterns that biologists can use as study material. However, because the density of DNA microarray databases makes these patterns difficult to extract, we introduce a source of knowledge during the mining process. In this way, the search space is reduced and the results obtained are more relevant from a biological point of view. Experiments on real data underline the relevance of our proposal.

HAL-CIRAD