Interactive probabilistic post-mining of user-preferred spatial co-location patterns
© 2018 IEEE. Spatial co-location pattern mining is an important task in spatial data mining. However, traditional mining frameworks often produce too many prevalent patterns, of which only a small proportion may be truly interesting to end users. To satisfy user preferences, this work proposes an interactive probabilistic post-mining method that discovers user-preferred co-location patterns from an early round of mined results by iteratively incorporating user feedback and probabilistically refining the preferred patterns. We first introduce a framework for interactively post-mining preferred co-location patterns, which enables a user to effectively discover the co-location patterns tailored to his/her specific preferences. A probabilistic model is further introduced to measure the user's feedback-based subjective preferences on the resulting co-location patterns. This measure is used not only to select sample co-location patterns in the iterative user feedback process but also to rank the results. The experimental results on real and synthetic data sets demonstrate the effectiveness of our approach.
GCG: Mining Maximal Complete Graph Patterns from Large Spatial Data
Recent research on pattern discovery has progressed from mining frequent
patterns and sequences to mining structured patterns, such as trees and graphs.
Graphs, as a general data structure, can model complex relations among data, with
wide applications in web exploration and social networks. However, the process
of mining large graph patterns is a challenge due to the existence of a large
number of subgraphs. In this paper, we aim to mine only frequent complete graph
patterns. A graph g in a database is complete if every pair of distinct
vertices is connected by a unique edge. Grid Complete Graph (GCG) is a mining
algorithm developed to explore interesting pruning techniques to extract
maximal complete graphs from the large spatial dataset of the Sloan Digital
Sky Survey (SDSS). Using a divide and conquer strategy, GCG shows high
efficiency, especially in the presence of a large number of patterns. In this
paper, we describe GCG, which can mine not only simple co-location spatial
patterns but also complex ones. To the best of our knowledge, this is the first
algorithm to exploit the extraction of maximal complete graphs in the
process of mining complex co-location patterns in large spatial datasets.
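The definition of a complete graph given in this abstract can be illustrated with a short check. The following is only a minimal sketch of the definition, not the GCG algorithm itself; the function names and the edge-list representation are assumptions made for illustration.

```python
from itertools import combinations


def is_complete(vertices, edges):
    """Return True if every pair of distinct vertices is joined by an edge."""
    edge_set = {frozenset(e) for e in edges}
    return all(frozenset(pair) in edge_set for pair in combinations(vertices, 2))


def is_maximal_complete(vertices, edges, all_vertices):
    """Return True if `vertices` is complete and no outside vertex is adjacent to all of its members."""
    if not is_complete(vertices, edges):
        return False
    edge_set = {frozenset(e) for e in edges}
    members = set(vertices)
    for v in set(all_vertices) - members:
        # If v is adjacent to every member, the set could be extended, so it is not maximal.
        if all(frozenset((v, u)) in edge_set for u in members):
            return False
    return True


# Hypothetical toy graph: a triangle a-b-c plus a pendant vertex d attached to c.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
nodes = ["a", "b", "c", "d"]
print(is_complete(["a", "b", "c"], edges))
print(is_maximal_complete(["a", "b"], edges, nodes))
```

A brute-force check like this is only practical for small vertex sets; the abstract's point is that pruning is needed to make the search feasible on large data.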
An Investigation in Efficient Spatial Patterns Mining
The technical progress in computerized spatial data acquisition and storage results
in the growth of vast spatial databases. Faced with ever-growing amounts of spatial
data, an end user has difficulty understanding them without helpful
knowledge extracted from spatial databases. Thus, spatial data mining has been brought under
the umbrella of data mining and is attracting more attention.
Spatial data mining presents challenges. Differing from usual data, spatial data includes
not only positional data and attribute data, but also spatial relationships among
spatial events. Further, the instances of spatial events are embedded in a continuous
space and share a variety of spatial relationships, so the mining of spatial patterns demands
new techniques.
In this thesis, several contributions were made. Some new techniques were proposed,
i.e., fuzzy co-location mining, CPI-tree (Co-location Pattern Instance Tree),
maximal co-location patterns mining, AOI-ags (Attribute-Oriented Induction based on Attributes'
Generalization Sequences), and fuzzy association prediction. Three algorithms
were put forward on co-location patterns mining: the fuzzy co-location mining algorithm,
the CPI-tree based co-location mining algorithm (CPI-tree algorithm) and the order-clique-based
maximal prevalence co-location mining algorithm (order-clique-based algorithm).
An attribute-oriented induction algorithm based on attributes' generalization sequences
(AOI-ags algorithm) is further given, which unified the attribute thresholds and
the tuple thresholds. On the two real-world databases with time-series data, a fuzzy association
prediction algorithm is designed. Also a cell-based spatial object fusion algorithm
is proposed. Two fuzzy clustering methods using domain knowledge were proposed:
Natural Method and Graph-Based Method, both of which were controlled by a
threshold. The threshold was confirmed by polynomial regression. Finally, a prototype
system for spatial co-location pattern mining was developed; it shows the relative
efficiencies of the proposed co-location techniques.
The techniques presented in the thesis focus on improving the feasibility, usefulness,
effectiveness, and scalability of the related algorithms. In the design of the fuzzy co-location
mining algorithm, a new data structure, the binary partition tree, used to improve the
process of fuzzy equivalence partitioning, was proposed. A prefix-based approach to
partition the prevalent event set search space into subsets, where each sub-problem can
be solved in main-memory, was also presented. The scalability of the CPI-tree algorithm is
guaranteed since it does not require expensive spatial joins or instance joins for identifying
co-location table instances. In the order-clique-based algorithm, the co-location table
instances do not need to be stored after computing the Pi value of the corresponding co-location,
which dramatically reduces the execution time and space of mining maximal co-locations.
Some techniques, for example, partitions, equivalence partition trees, pruning
optimization strategies and interestingness, were used to improve the efficiency of the
AOI-ags algorithm. To implement the fuzzy association prediction algorithm, the "growing
window" and the proximity computation pruning were introduced to reduce both I/O and
CPU costs in computing the fuzzy semantic proximity between time-series.
For new techniques and algorithms, theoretical analysis and experimental results
on synthetic data sets and real-world datasets were presented and discussed in the thesis.
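The "Pi value" of a co-location mentioned in this abstract is presumably the participation index that is commonly used as the prevalence measure in co-location mining; that reading is an assumption, since the abstract does not define it. Under that assumption, a minimal sketch of computing it from a table instance could look as follows. The function name and the data layout (one dict per row instance) are hypothetical.

```python
def participation_index(table_instance, feature_counts):
    """Compute the participation index of a co-location (assumed standard definition).

    table_instance: list of dicts, each mapping feature name -> instance id,
                    one dict per group of mutually neighbouring instances.
    feature_counts: dict mapping feature name -> total number of instances
                    of that feature in the data set.
    """
    if not table_instance:
        return 0.0
    ratios = []
    for feature in table_instance[0]:
        # Distinct instances of this feature that take part in at least one row.
        participating = {row[feature] for row in table_instance}
        ratios.append(len(participating) / feature_counts[feature])
    # The index is the smallest participation ratio over the pattern's features.
    return min(ratios)


# Hypothetical toy data: pattern {A, B} with three row instances.
rows = [{"A": 1, "B": 1}, {"A": 2, "B": 1}, {"A": 2, "B": 3}]
counts = {"A": 4, "B": 3}
print(participation_index(rows, counts))
```

Only the sets of participating instance ids are needed to obtain the value, which is consistent with the abstract's remark that table instances need not be kept once the value has been computed.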
Object Discovery From a Single Unlabeled Image by Mining Frequent Itemset With Multi-scale Features
The goal of our work is to discover dominant objects in a very general
setting where only a single unlabeled image is given. This is far more
challenging than typical co-localization or weakly-supervised localization tasks.
To tackle this problem, we propose a simple but effective pattern mining-based
method, called Object Location Mining (OLM), which exploits the advantages of
data mining and feature representation of pre-trained convolutional neural
networks (CNNs). Specifically, we first convert the feature maps from a
pre-trained CNN model into a set of transactions, and then discover frequent
patterns from the transaction database through pattern mining techniques. We
observe that those discovered patterns, i.e., co-occurrence highlighted
regions, typically hold appearance and spatial consistency. Motivated by this
observation, we can easily discover and localize possible objects by merging
relevant meaningful patterns. Extensive experiments on a variety of benchmarks
demonstrate that OLM achieves competitive localization performance compared
with the state-of-the-art methods. We also compare our approach with
unsupervised saliency detection methods and achieve competitive results on
seven benchmark datasets. Moreover, we conduct experiments on fine-grained
classification to show that our proposed method can locate the entire object
and its parts accurately, which can significantly improve the classification
results.
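The two steps described in this abstract, turning feature maps into transactions and then mining frequent patterns, can be sketched roughly as below. The abstract does not say how transactions are built, so the encoding used here (one transaction per spatial position, containing the channels whose activation exceeds a threshold) is an assumption, as are all function names. The mining step is a brute-force count over small itemsets, standing in for whatever pattern mining technique the authors use.

```python
from collections import Counter
from itertools import combinations


def to_transactions(feature_map, threshold):
    """Map feature_map[c][y][x] to one transaction per (y, x) position.

    Each transaction is the set of channel indices whose activation at that
    position exceeds `threshold` (an assumed encoding, not the paper's).
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    transactions = {}
    for y in range(height):
        for x in range(width):
            transactions[(y, x)] = frozenset(
                c for c in range(channels) if feature_map[c][y][x] > threshold
            )
    return transactions


def frequent_itemsets(transactions, min_support, max_size=2):
    """Count every itemset of up to `max_size` items and keep the frequent ones."""
    counts = Counter()
    for items in transactions.values():
        for size in range(1, max_size + 1):
            for subset in combinations(sorted(items), size):
                counts[subset] += 1
    return {s: n for s, n in counts.items() if n >= min_support}


def supporting_positions(transactions, itemset):
    """Positions whose transaction contains the whole itemset."""
    return sorted(pos for pos, items in transactions.items() if set(itemset) <= items)


# Hypothetical 2-channel, 2x2 feature map.
fmap = [
    [[1.0, 0.0], [1.0, 1.0]],  # channel 0
    [[1.0, 0.0], [0.0, 1.0]],  # channel 1
]
tx = to_transactions(fmap, threshold=0.5)
patterns = frequent_itemsets(tx, min_support=2)
print(patterns)
print(supporting_positions(tx, (0, 1)))
```

The positions supporting a frequent pattern form the kind of co-occurrence region that the abstract describes merging into object locations.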
Summarizing data with representative patterns
University of Technology Sydney. Faculty of Engineering and Information Technology. Advances in technology have made data acquisition and storage unprecedentedly convenient. This has contributed to the rapid growth of not only the volume but also the veracity and variety of data in recent years, which poses new challenges to the data mining area. For example, uncertain data mining has emerged due to its capability to model the inherent veracity of data; spatial data mining attracts much research attention with the widespread adoption of location-based services and wearable devices. As a fundamental topic of data mining, how to effectively and efficiently summarize data in this situation remains to be explored.
This thesis studied the problem of summarizing data with representative patterns. The objective is to find a set of patterns, which is much more concise but still contains rich information of the original data, and may provide valuable insights for further analysis of data. In the light of this idea, we formally formulate the problem and provide effective and efficient solutions in various scenarios.
We study the problem of summarizing probabilistic frequent patterns over uncertain data. Probabilistic frequent pattern mining over uncertain data has received much research attention due to the wide applicability of uncertain data. It suffers from the problem of generating an exponential number of result patterns, which hinders the analysis of patterns and calls for a small number of representative patterns that approximate all other patterns. We formally formulate the problem of probabilistic representative frequent pattern (P-RFP) mining, which aims to find the minimal set of patterns with sufficiently high probability to represent all other patterns. The bottleneck turns out to be checking whether a pattern can probabilistically represent another, which involves the computation of a joint probability of the supports of two patterns. We propose a novel dynamic programming-based approach to address the problem and devise effective optimization strategies to improve the computation efficiency.
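To give a feel for why dynamic programming suits this setting, the sketch below computes the probability that a single pattern is frequent. It is not the thesis's joint-probability computation for two patterns. It assumes that each transaction contains the pattern independently with a known probability, and the function names are hypothetical.

```python
def support_distribution(probs):
    """Return dist where dist[k] is the probability that exactly k transactions contain the pattern.

    probs[i] is the (assumed independent) probability that transaction i
    contains the pattern.
    """
    dist = [1.0]  # With zero transactions, the support is 0 with certainty.
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)  # This transaction does not contain the pattern.
            nxt[k + 1] += mass * p      # This transaction contains the pattern.
        dist = nxt
    return dist


def frequentness_probability(probs, min_support):
    """Probability that the pattern's support is at least `min_support`."""
    return sum(support_distribution(probs)[min_support:])


# Hypothetical example: two transactions, each containing the pattern with probability 0.5.
print(frequentness_probability([0.5, 0.5], min_support=1))
```

The table is filled in time quadratic in the number of transactions, rather than by enumerating all possible worlds, which is what makes this kind of computation tractable.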
To enhance the practicability of P-RFP mining, we introduce a novel approximation of the joint probability with both theoretical and empirical proofs. Based on the approximation, we propose an Approximate P-RFP Mining (APM) algorithm, which effectively and efficiently compresses the probabilistic frequent pattern set. The error rate of APM is guaranteed to be very small when the database contains hundreds of transactions, which further affirms that APM is a practical solution for summarizing probabilistic frequent patterns.
We address the problem of directly summarizing an uncertain transaction database by formulating the problem as Minimal Probabilistic Tile Cover Mining, which aims to find a high-quality probabilistic tile set covering an uncertain database with minimal cost. We define the concepts of Probabilistic Price and Probabilistic Price Order to evaluate and compare the quality of tiles, and propose a framework to discover the minimal probabilistic tile cover. The bottleneck is to check whether a tile is better than another according to the Probabilistic Price Order, which involves the computation of a joint probability. We prove that it can be decomposed into independent terms and calculated efficiently. Several optimization techniques are devised to further improve the performance.
We analyze the problem of summarizing co-locations mined from spatial databases. Co-location pattern mining finds patterns of spatial features whose instances tend to be located together in geographic space. However, the traditional framework of co-location pattern mining produces an exponential number of patterns because of the downward closure property, which makes it difficult for users to understand, assess or apply the huge number of resulting patterns. To address this issue, we study the problem of mining representative co-location patterns (RCP). We first define a covering relationship between two co-location patterns, then formally formulate the problem of Representative Co-location Pattern mining. To solve the problem of RCP mining, we propose the RCPFast algorithm, which adopts the post-mining framework, and the RCPMS algorithm, which pushes pattern summarization into the co-location mining process.