GCG: Mining Maximal Complete Graph Patterns from Large Spatial Data
Recent research on pattern discovery has progressed from mining frequent
patterns and sequences to mining structured patterns, such as trees and graphs.
Graphs, as a general data structure, can model complex relations among data, with
wide applications in web exploration and social networks. However, mining
large graph patterns is challenging because of the large
number of subgraphs. In this paper, we aim to mine only frequent complete graph
patterns. A graph g in a database is complete if every pair of distinct
vertices is connected by a unique edge. Grid Complete Graph (GCG) is a mining
algorithm developed to explore pruning techniques for extracting
maximal complete graphs from the large spatial dataset in the Sloan Digital
Sky Survey (SDSS) data. Using a divide-and-conquer strategy, GCG shows high
efficiency, especially in the presence of a large number of patterns. In this
paper, we describe how GCG can mine not only simple co-location spatial
patterns but also complex ones. To the best of our knowledge, this is the first
algorithm to exploit the extraction of maximal complete graphs in the
process of mining complex co-location patterns in large spatial datasets.
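The definition of a complete graph given in this abstract can be illustrated with a minimal sketch. This is not the GCG algorithm itself: the function name `is_complete` and the edge-list representation are assumptions made for illustration, and the check is a plain pairwise test with no pruning.

```python
from itertools import combinations


def is_complete(vertices, edges):
    """Return True if every pair of distinct vertices is joined by an edge.

    `vertices` is an iterable of hashable labels; `edges` is an iterable of
    2-tuples. Edges are treated as undirected (an assumption of this sketch).
    """
    edge_set = {frozenset(e) for e in edges}
    return all(
        frozenset((u, v)) in edge_set
        for u, v in combinations(list(vertices), 2)
    )


# Hypothetical example data: a triangle is complete, a path is not.
triangle = [("a", "b"), ("b", "c"), ("a", "c")]
path = [("a", "b"), ("b", "c")]
print(is_complete(["a", "b", "c"], triangle))
print(is_complete(["a", "b", "c"], path))
```

A complete graph on n vertices has n(n-1)/2 edges, so this brute-force check is quadratic in the number of vertices; the abstract's point is that pruning is needed to avoid testing every candidate subgraph this way.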
Mining non-contiguous mutation chain in biological sequences based on 3D-structure
Master's; MASTER OF SCIENCE
Discovery of Spatiotemporal Event Sequences
Finding frequent patterns plays a vital role in many analytics tasks such as finding itemsets, associations, correlations, and sequences. In recent decades, spatiotemporal frequent pattern mining has emerged with the main goal of developing data-driven analysis frameworks for understanding underlying spatial and temporal characteristics in massive datasets. In this thesis, we focus on discovering spatiotemporal event sequences from large-scale region trajectory datasets with event annotations. Spatiotemporal event sequences are series of event types whose trajectory-based instances follow each other in a spatiotemporal context. We introduce new data models for storing and processing evolving region trajectories, provide a novel framework for modeling spatiotemporal follow relationships, and present novel spatiotemporal event sequence mining algorithms.
An Investigation in Efficient Spatial Patterns Mining
Technical progress in computerized spatial data acquisition and storage has led
to the growth of vast spatial databases. Faced with large and increasing amounts of spatial
data, an end user has difficulty understanding them without helpful
knowledge extracted from spatial databases. Thus, spatial data mining has been brought under
the umbrella of data mining and is attracting more attention.
Spatial data mining presents challenges. Unlike conventional data, spatial data includes
not only positional data and attribute data, but also spatial relationships among
spatial events. Further, the instances of spatial events are embedded in a continuous
space and share a variety of spatial relationships, so the mining of spatial patterns demands
new techniques.
In this thesis, several contributions were made. Some new techniques were proposed,
i.e., fuzzy co-location mining, CPI-tree (Co-location Pattern Instance Tree),
maximal co-location patterns mining, AOI-ags (Attribute-Oriented Induction based on Attributes’
Generalization Sequences), and fuzzy association prediction. Three algorithms
were put forward on co-location patterns mining: the fuzzy co-location mining algorithm,
the CPI-tree based co-location mining algorithm (CPI-tree algorithm) and the
order-clique-based maximal prevalence co-location mining algorithm (order-clique-based algorithm).
An attribute-oriented induction algorithm based on attributes’ generalization sequences
(AOI-ags algorithm) was further given, which unified the attribute thresholds and
the tuple thresholds. For two real-world databases with time-series data, a fuzzy association
prediction algorithm was designed. A cell-based spatial object fusion algorithm
was also proposed. Two fuzzy clustering methods using domain knowledge were proposed:
Natural Method and Graph-Based Method, both of which were controlled by a
threshold. The threshold was confirmed by polynomial regression. Finally, a prototype
system for spatial co-location pattern mining was developed; it shows the relative
efficiencies of the proposed co-location techniques.
The techniques presented in the thesis focus on improving the feasibility, usefulness,
effectiveness, and scalability of the related algorithms. In the design of the fuzzy co-location
mining algorithm, a new data structure, the binary partition tree, used to improve the
process of fuzzy equivalence partitioning, was proposed. A prefix-based approach to
partition the prevalent event set search space into subsets, where each sub-problem can
be solved in main memory, was also presented. The scalability of the CPI-tree algorithm is
guaranteed since it does not require expensive spatial joins or instance joins for identifying
co-location table instances. In the order-clique-based algorithm, the co-location table
instances do not need to be stored after computing the Pi value of the corresponding co-location,
which dramatically reduces the execution time and space of mining maximal co-locations.
Several techniques, such as partitions, equivalence partition trees, pruning
optimization strategies, and interestingness measures, were used to improve the efficiency of the
AOI-ags algorithm. To implement the fuzzy association prediction algorithm, the “growing
window” and the proximity computation pruning were introduced to reduce both I/O and
CPU costs in computing the fuzzy semantic proximity between time-series.
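The "Pi value" mentioned above is not defined in this abstract. Assuming it refers to the participation index as commonly defined in the co-location mining literature (the minimum, over the features of a co-location, of the fraction of that feature's instances that appear in at least one table instance row), a minimal sketch of the computation could look as follows. The function name, the dictionary-based row format, and the sample data are all hypothetical, and this is not the thesis's order-clique-based method.

```python
def participation_index(row_instances, instance_counts):
    """Compute the participation index of one co-location (assumed definition).

    row_instances: list of dicts mapping feature name -> instance id,
                   one dict per row of the co-location's table instance.
    instance_counts: dict mapping feature name -> total number of
                     instances of that feature in the dataset.
    """
    if not row_instances:
        return 0.0
    ratios = []
    for feature in row_instances[0]:
        # Distinct instances of this feature that take part in some row.
        participating = {row[feature] for row in row_instances}
        ratios.append(len(participating) / instance_counts[feature])
    return min(ratios)


# Hypothetical data: feature A has 4 instances, feature B has 2.
rows = [
    {"A": 1, "B": 1},
    {"A": 2, "B": 1},
    {"A": 3, "B": 2},
]
print(participation_index(rows, {"A": 4, "B": 2}))
```

Under this assumed definition, only the sets of participating instances per feature are needed to obtain the value, which is consistent with the abstract's remark that table instances need not be kept once the Pi value has been computed.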
For the new techniques and algorithms, theoretical analysis and experimental results
on synthetic and real-world datasets were presented and discussed in the thesis
- …