GCG: Mining Maximal Complete Graph Patterns from Large Spatial Data
Recent research on pattern discovery has progressed from mining frequent
patterns and sequences to mining structured patterns, such as trees and graphs.
Graphs, as a general data structure, can model complex relations among data, with
wide applications in web exploration and social networks. However, the process
of mining large graph patterns is a challenge due to the existence of a large
number of subgraphs. In this paper, we aim to mine only frequent complete graph
patterns. A graph g in a database is complete if every pair of distinct
vertices is connected by a unique edge. Grid Complete Graph (GCG) is a mining
algorithm developed to explore pruning techniques for extracting
maximal complete graphs from the large spatial datasets in the Sloan Digital
Sky Survey (SDSS). Using a divide-and-conquer strategy, GCG shows high
efficiency, especially in the presence of a large number of patterns. In this
paper, we describe how GCG can mine not only simple co-location spatial
patterns but also complex ones. To the best of our knowledge, this is the first
algorithm to exploit the extraction of maximal complete graphs in the
process of mining complex co-location patterns in large spatial datasets.
Comment: 1
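The completeness definition in the abstract above can be made concrete with a small sketch. This is an illustration only, not the GCG algorithm: it builds a neighbor graph from 2D points using an assumed distance threshold, tests completeness, and enumerates maximal complete subgraphs with the textbook Bron-Kerbosch recursion. The function names, the threshold, and the sample coordinates are hypothetical.

```python
from itertools import combinations
from math import dist


def neighbor_graph(points, max_dist):
    """Map each point id to the set of ids within max_dist of it."""
    adj = {pid: set() for pid in points}
    for a, b in combinations(points, 2):
        if dist(points[a], points[b]) <= max_dist:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def is_complete(adj, vertices):
    """True if every pair of distinct vertices is joined by an edge."""
    return all(b in adj[a] for a, b in combinations(vertices, 2))


def maximal_cliques(adj):
    """Enumerate maximal complete subgraphs (basic Bron-Kerbosch, no pivoting)."""
    found = []

    def expand(r, p, x):
        # r: current clique, p: candidates to extend it, x: already explored
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


# Hypothetical sample: three mutually close points plus one distant point.
points = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (0.0, 1.0), "d": (5.0, 5.0)}
adj = neighbor_graph(points, max_dist=1.5)
cliques = maximal_cliques(adj)
```

With this sample, the three nearby points form one maximal complete graph and the distant point forms a trivial one on its own. A practical miner would add pruning and spatial partitioning, which this sketch omits.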
Querying recurrent convoys over trajectory data
National Research Foundation (NRF) Singapore under International Research Centres in Singapore Funding Initiativ
A stigmergy-based analysis of city hotspots to discover trends and anomalies in urban transportation usage
A key aspect of a sustainable urban transportation system is the
effectiveness of transportation policies. To be effective, a policy has to
consider a broad range of elements, such as pollution emission, traffic flow,
and human mobility. Due to the complexity and variability of these elements in
the urban area, producing effective policies remains a very challenging task.
With the introduction of the smart city paradigm, a large amount of
data can be generated in urban spaces. Such data can be a fundamental
source of knowledge to improve policies because they can reflect the
sustainability issues underlying the city. In this context, we propose an
approach to exploit urban positioning data based on stigmergy, a bio-inspired
mechanism providing scalar and temporal aggregation of samples. By employing
stigmergy, samples in proximity with each other are aggregated into a
functional structure called trail. The trail summarizes relevant dynamics in
data and allows matching them, providing a measure of their similarity.
Moreover, this mechanism can be specialized to unfold specific dynamics.
Specifically, we identify high-density urban areas (i.e., hotspots), analyze
their activity over time, and unfold anomalies. Moreover, by matching activity
patterns, a continuous measure of the dissimilarity with respect to the typical
activity pattern is provided. This measure can be used by policy makers to
evaluate the effect of policies and change them dynamically. As a case study,
we analyze taxi trip data gathered in Manhattan from 2013 to 2015.
Comment: Preprin
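The trail mechanism described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: space is a 1D row of cells, each sample deposits a triangular mark, the trail evaporates by a fixed fraction per time step, and similarity is the ratio of cell-wise minima to cell-wise maxima. The function names and all parameter values are hypothetical.

```python
def deposit(trail, position, intensity=1.0, width=1):
    """Add a triangular mark centred on `position` to the trail (in place)."""
    for offset in range(-width, width + 1):
        cell = position + offset
        if 0 <= cell < len(trail):
            trail[cell] += intensity * (1 - abs(offset) / (width + 1))


def step(trail, samples, evaporation=0.5):
    """Evaporate the existing trail, then deposit marks for the new samples."""
    new_trail = [value * (1 - evaporation) for value in trail]
    for position in samples:
        deposit(new_trail, position)
    return new_trail


def similarity(trail_a, trail_b):
    """Overlap ratio in [0, 1]: sum of cell-wise minima over sum of maxima."""
    union = sum(max(a, b) for a, b in zip(trail_a, trail_b))
    if union == 0:
        return 1.0  # two empty trails are treated as identical
    return sum(min(a, b) for a, b in zip(trail_a, trail_b)) / union


# Hypothetical usage: one sample at cell 2, followed by a step with no samples.
first = step([0.0] * 5, samples=[2])
second = step(first, samples=[])
```

Nearby samples reinforce each other because their marks overlap (the scalar aggregation), while evaporation makes older samples fade (the temporal aggregation).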
Towards Real-Time Detection and Tracking of Spatio-Temporal Features: Blob-Filaments in Fusion Plasma
A novel algorithm and implementation of real-time identification and tracking
of blob-filaments in fusion reactor data is presented. Similar spatio-temporal
features are important in many other applications, for example, ignition
kernels in combustion and tumor cells in a medical image. This work presents an
approach for extracting these features by dividing the overall task into three
steps: local identification of feature cells, grouping feature cells into
extended features, and tracking the movement of features through overlap in
space. Through our extensive work in parallelization, we demonstrate that this
approach can effectively make use of a large number of compute nodes to detect
and track blob-filaments in real time in fusion plasma. On a set of 30GB fusion
simulation data, we observed linear speedup on 1024 processes and completed
blob detection in less than three milliseconds using Edison, a Cray XC30 system
at NERSC.
Comment: 14 pages, 40 figure
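The three steps named in the abstract above can be sketched serially on a small 2D grid. This is an illustration under assumptions, not the paper's parallel implementation: feature cells are picked with a plain threshold, grouping uses 4-connectivity with a breadth-first search, and tracking links features that share at least one cell across consecutive frames. The function names, the threshold, and the sample frames are hypothetical.

```python
from collections import deque


def feature_cells(grid, threshold):
    """Step 1: local identification of cells whose value exceeds the threshold."""
    return {(r, c) for r, row in enumerate(grid)
            for c, value in enumerate(row) if value > threshold}


def group_cells(cells):
    """Step 2: group 4-connected feature cells into extended features."""
    remaining, features = set(cells), []
    while remaining:
        seed = remaining.pop()
        feature, queue = {seed}, deque([seed])
        while queue:
            r, c = queue.popleft()
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    feature.add(nb)
                    queue.append(nb)
        features.append(frozenset(feature))
    # Sort by each feature's smallest cell so the output order is deterministic.
    return sorted(features, key=min)


def track(prev_features, next_features):
    """Step 3: link features in consecutive frames that overlap in space."""
    return [(i, j) for i, p in enumerate(prev_features)
            for j, n in enumerate(next_features) if p & n]


# Hypothetical frames: a two-cell feature drifts one column to the right,
# while a single-cell feature in the lower-left corner disappears.
frame0 = [[0, 9, 9, 0], [0, 0, 0, 0], [9, 0, 0, 0]]
frame1 = [[0, 0, 9, 9], [0, 0, 0, 0], [0, 0, 0, 0]]
prev_features = group_cells(feature_cells(frame0, threshold=5))
next_features = group_cells(feature_cells(frame1, threshold=5))
links = track(prev_features, next_features)
```

Each pair in `links` gives the index of a feature in the earlier frame and the index of the overlapping feature in the later frame.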
An Investigation in Efficient Spatial Patterns Mining
Technical progress in computerized spatial data acquisition and storage has led
to the growth of vast spatial databases. Faced with large and growing volumes of spatial
data, an end user has difficulty understanding them without helpful
knowledge extracted from spatial databases. Thus, spatial data mining has been brought under
the umbrella of data mining and is attracting more attention.
Spatial data mining presents challenges. Differing from usual data, spatial data includes
not only positional data and attribute data, but also spatial relationships among
spatial events. Further, the instances of spatial events are embedded in a continuous
space and share a variety of spatial relationships, so the mining of spatial patterns demands
new techniques.
In this thesis, several contributions were made. Some new techniques were proposed,
i.e., fuzzy co-location mining, CPI-tree (Co-location Pattern Instance Tree),
maximal co-location patterns mining, AOI-ags (Attribute-Oriented Induction based on Attributes'
Generalization Sequences), and fuzzy association prediction. Three algorithms
were put forward on co-location patterns mining: the fuzzy co-location mining algorithm,
the CPI-tree based co-location mining algorithm (CPI-tree algorithm) and the order-clique-based
maximal prevalence co-location mining algorithm (order-clique-based algorithm).
An attribute-oriented induction algorithm based on attributes' generalization sequences
(AOI-ags algorithm) is further given, which unifies the attribute thresholds and
the tuple thresholds. On two real-world databases with time-series data, a fuzzy association
prediction algorithm is designed. Also, a cell-based spatial object fusion algorithm
is proposed. Two fuzzy clustering methods using domain knowledge were proposed:
the Natural Method and the Graph-Based Method, both of which are controlled by a
threshold. The threshold was confirmed by polynomial regression. Finally, a prototype
system for spatial co-location pattern mining was developed, and shows the relative
efficiencies of the co-location techniques proposed.
The techniques presented in the thesis focus on improving the feasibility, usefulness,
effectiveness, and scalability of the related algorithms. In the design of the fuzzy co-location
mining algorithm, a new data structure, the binary partition tree, used to improve the
process of fuzzy equivalence partitioning, was proposed. A prefix-based approach to
partition the prevalent event set search space into subsets, where each sub-problem can
be solved in main-memory, was also presented. The scalability of CPI-tree algorithm is
guaranteed since it does not require expensive spatial joins or instance joins for identifying
co-location table instances. In the order-clique-based algorithm, the co-location table
instances do not need to be stored after computing the Pi value of the corresponding co-location,
which dramatically reduces the execution time and space of mining maximal co-locations.
Some techniques, for example, partitions, equivalence partition trees, pruning
optimization strategies and interestingness, were used to improve the efficiency of the
AOI-ags algorithm. To implement the fuzzy association prediction algorithm, the "growing
window" and the proximity computation pruning were introduced to reduce both I/O and
CPU costs in computing the fuzzy semantic proximity between time-series.
For the new techniques and algorithms, theoretical analysis and experimental results
on synthetic and real-world datasets were presented and discussed in the thesis
- …
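The abstract above refers to the "Pi value" of a co-location without defining it. Assuming this denotes the participation index as commonly defined in the co-location mining literature (the minimum, over the features in a pattern, of the fraction of that feature's instances that appear in at least one row instance of the pattern), a minimal sketch looks like this. The function name, the input layout, and the sample data are hypothetical, and the thesis may compute the value differently.

```python
def participation_index(row_instances, feature_counts):
    """Minimum, over features, of the fraction of instances that participate.

    row_instances: list of dicts mapping feature name -> instance id,
                   one dict per row instance of the co-location
    feature_counts: dict mapping feature name -> total number of instances
    """
    participating = {feature: set() for feature in feature_counts}
    for row in row_instances:
        for feature, instance in row.items():
            participating[feature].add(instance)
    return min(len(participating[f]) / feature_counts[f] for f in feature_counts)


# Hypothetical data: feature A has 4 instances and B has 2.
rows = [{"A": 1, "B": 1}, {"A": 2, "B": 1}, {"A": 3, "B": 2}]
pi = participation_index(rows, {"A": 4, "B": 2})
```

Here three of the four A instances and both B instances participate, so the ratios are 0.75 and 1.0 and the index is their minimum. A pattern is usually called prevalent when this value meets a user-chosen threshold.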