GCG: Mining Maximal Complete Graph Patterns from Large Spatial Data
Recent research on pattern discovery has progressed from mining frequent
patterns and sequences to mining structured patterns, such as trees and graphs.
Graphs, as a general data structure, can model complex relations among data, with
wide applications in web exploration and social networks. However, mining
large graph patterns is challenging because of the large
number of subgraphs. In this paper, we aim to mine only frequent complete graph
patterns. A graph g in a database is complete if every pair of distinct
vertices is connected by a unique edge. Grid Complete Graph (GCG) is a mining
algorithm developed to explore pruning techniques for extracting
maximal complete graphs from a large spatial dataset in the Sloan Digital
Sky Survey (SDSS) data. Using a divide-and-conquer strategy, GCG shows high
efficiency, especially in the presence of a large number of patterns. In this
paper, we describe how GCG can mine not only simple co-location spatial
patterns but also complex ones. To the best of our knowledge, this is the first
algorithm to use the extraction of maximal complete graphs in the
process of mining complex co-location patterns in large spatial datasets.
Comment: 1
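The completeness condition stated in this abstract (every pair of distinct vertices joined by an edge) can be checked directly. The sketch below illustrates only that definition; it is not the GCG algorithm, whose grid and pruning steps the abstract does not describe. The function name and the example graphs are hypothetical.

```python
from itertools import combinations


def is_complete(vertices, edges):
    """Return True if every pair of distinct vertices is joined by an edge."""
    # Store edges as unordered pairs so (a, b) and (b, a) count as the same edge.
    edge_set = {frozenset(e) for e in edges}
    return all(frozenset(pair) in edge_set for pair in combinations(vertices, 2))


# Hypothetical example graphs: a triangle is complete, a 3-vertex path is not.
triangle = is_complete(["a", "b", "c"], [("a", "b"), ("b", "c"), ("a", "c")])
path = is_complete(["a", "b", "c"], [("a", "b"), ("b", "c")])
```

Here `triangle` evaluates to `True` and `path` to `False`, since the path lacks the edge between `a` and `c`.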
Sequential Patterns Post-processing for Structural Relation Patterns Mining
Sequential patterns mining is an important data-mining technique used to identify frequently observed sequential
occurrences of items across ordered transactions over time. It has been extensively studied in the literature, and there
exists a diversity of algorithms. However, more complex structural patterns are often hidden behind sequences.
This article begins by introducing a model for the representation of sequential patterns, the Sequential
Patterns Graph, which motivates the search for new structural relation patterns. An integrative framework for
the discovery of these patterns, Postsequential Patterns Mining, is then described, which underpins the postprocessing
of sequential patterns. A corresponding data-mining method based on sequential patterns postprocessing
is proposed and shown to be effective in the search for concurrent patterns. Experiments conducted on three
component algorithms demonstrate that sequential patterns-based concurrent patterns mining provides
an efficient method for structural knowledge discovery.
Complex Data: Mining using Patterns
There is a growing need to analyse sets of complex data, i.e., data in which the individual data items are themselves (semi-)structured collections of data, such as sets of time-series. To perform such analysis, one has to redefine familiar notions such as similarity on such complex data types. One can do that either on the data items directly, or indirectly, based on features or patterns computed from the individual data items. In this paper, we argue that wavelet decomposition is a general tool for the latter approach.
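The indirect route described in this abstract, comparing data items through features computed from them, can be sketched with a Haar wavelet decomposition. This is a minimal illustration, assuming the Haar wavelet (the simplest choice) and series whose length is a power of two; the abstract does not say which wavelet or features the paper uses. The names `haar_decompose`, `feature_distance`, and the parameter `k` are hypothetical.

```python
import math


def haar_decompose(signal):
    """Full orthonormal Haar decomposition of a power-of-two-length list."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx = list(signal)
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # One Haar step: scaled pairwise differences (detail) and sums (approximation).
        detail = [(a - b) / math.sqrt(2) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        details = detail + details  # keep coarser detail coefficients first
    return approx + details


def feature_distance(x, y, k=4):
    """Euclidean distance between the k coarsest Haar coefficients of x and y."""
    cx, cy = haar_decompose(x)[:k], haar_decompose(y)[:k]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cx, cy)))
```

Restricting the comparison to the first `k` coefficients keeps the coarse shape of each series and discards fine detail, which is one way to define similarity on features instead of on the raw data items.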
Graph-based Modelling of Concurrent Sequential Patterns
Structural relation patterns have been introduced recently to extend the search for complex patterns often hidden behind large sequences of data. This motivated a novel approach to sequential patterns post-processing, and a corresponding data mining method was proposed for Concurrent Sequential Patterns (ConSP). This article refines the approach in the context of ConSP modelling, where a companion graph-based model is devised as an extension of previous work. Two new modelling methods are presented together with a construction algorithm to complete the transformation of concurrent sequential patterns into a ConSP-Graph representation. Customer orders data is used to demonstrate the effectiveness of ConSP mining, while synthetic sample data highlights the strength of the modelling technique, illuminating the theories developed.
On mining complex sequential data by means of FCA and pattern structures
Nowadays, data sets are available in very complex and heterogeneous forms.
Mining of such data collections is essential to support many real-world
applications ranging from healthcare to marketing. In this work, we focus on
the analysis of "complex" sequential data by means of interesting sequential
patterns. We approach the problem using the elegant mathematical framework of
Formal Concept Analysis (FCA) and its extension based on "pattern structures".
Pattern structures are used for mining complex data (such as sequences or
graphs) and are based on a subsumption operation, which in our case is defined
with respect to the partial order on sequences. We show how pattern structures,
along with projections (i.e., a data reduction of sequential structures), are
able to enumerate more meaningful patterns and increase the computing
efficiency of the approach. Finally, we show the applicability of the presented
method for discovering and analyzing interesting patient patterns from a French
healthcare data set on cancer. The quantitative and qualitative results (with
annotations and analysis from a physician) are reported in this use case which
is the main motivation for this work.
Keywords: data mining; formal concept analysis; pattern structures;
projections; sequences; sequential data.
Comment: An accepted publication in International Journal of General Systems.
The paper was created in the wake of the conference on Concept Lattices and
Their Applications (CLA'2013). 27 pages, 9 figures, 3 tables
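The subsumption operation defined with respect to the partial order on sequences, mentioned in this abstract, can be illustrated with the usual subsequence relation on sequences of itemsets. This is a sketch under that assumption; the abstract does not give the paper's exact order relation, its projections, or its data format. The function names and the toy database are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if pattern embeds in sequence: order is kept and each pattern
    itemset is contained in the sequence element it is matched to."""
    pos = 0
    for itemset in pattern:
        # Greedily advance to the earliest remaining element containing itemset.
        while pos < len(sequence) and not set(itemset) <= set(sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def support(pattern, database):
    """Number of sequences in the database that the pattern embeds in."""
    return sum(is_subsequence(pattern, seq) for seq in database)


# Hypothetical toy database of three sequences of itemsets.
database = [
    [{"a"}, {"b", "c"}, {"d"}],
    [{"a"}, {"d"}],
    [{"b"}, {"a"}, {"c"}],
]
pattern = [{"a"}, {"c"}]
pattern_support = support(pattern, database)
```

With this toy database, `pattern` embeds in the first and third sequences but not the second, so `pattern_support` is 2.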
Mining Heterogeneous Multivariate Time-Series for Learning Meaningful Patterns: Application to Home Health Telecare
In recent years, time-series mining has become a challenging issue for
researchers. An important application lies in monitoring, which
requires analyzing large sets of time-series to learn usual patterns. Any
deviation from this learned profile is then considered as an unexpected
situation. Moreover, complex applications may involve the temporal study of
several heterogeneous parameters. In this paper, we propose a method for mining
heterogeneous multivariate time-series for learning meaningful patterns. The
proposed approach allows for mixed time-series (containing both pattern and
non-pattern data) as well as for imprecise matches, outliers, stretching and
global translation of pattern instances in time. We present the early results
of our approach in the context of monitoring the health status of a person at
home. The purpose is to build a behavioral profile of a person by analyzing the
time variations of several quantitative or qualitative parameters recorded
through a set of sensors installed in the home.