383,352 research outputs found

    Interactive probabilistic post-mining of user-preferred spatial co-location patterns

    Full text link
    © 2018 IEEE. Spatial co-location pattern mining is an important task in spatial data mining. However, traditional mining frameworks often produce too many prevalent patterns of which only a small proportion may be truly interesting to end users. To satisfy user preferences, this work proposes an interactive probabilistic post-mining method to discover user-preferred co-location patterns from the early-round of mined results by iteratively involving user's feedback and probabilistically refining preferred patterns. We first introduce a framework of interactively post-mining preferred co-location patterns, which enables a user to effectively discover the co-location patterns tailored to his/her specific preference. A probabilistic model is further introduced to measure the user feedback-based subjective preferences on resultant co-location patterns. This measure is used to not only select sample co-location patterns in the iterative user feedback process but also rank the results. The experimental results on real and synthetic data sets demonstrate the effectiveness of our approach

    GCG: Mining Maximal Complete Graph Patterns from Large Spatial Data

    Full text link
    Recent research on pattern discovery has progressed from mining frequent patterns and sequences to mining structured patterns, such as trees and graphs. Graphs as general data structure can model complex relations among data with wide applications in web exploration and social networks. However, the process of mining large graph patterns is a challenge due to the existence of large number of subgraphs. In this paper, we aim to mine only frequent complete graph patterns. A graph g in a database is complete if every pair of distinct vertices is connected by a unique edge. Grid Complete Graph (GCG) is a mining algorithm developed to explore interesting pruning techniques to extract maximal complete graphs from large spatial dataset existing in Sloan Digital Sky Survey (SDSS) data. Using a divide and conquer strategy, GCG shows high efficiency especially in the presence of large number of patterns. In this paper, we describe GCG that can mine not only simple co-location spatial patterns but also complex ones. To the best of our knowledge, this is the first algorithm used to exploit the extraction of maximal complete graphs in the process of mining complex co-location patterns in large spatial dataset.Comment: 1

    An Investigation in Efficient Spatial Patterns Mining

    Get PDF
    The technical progress in computerized spatial data acquisition and storage results in the growth of vast spatial databases. Faced with large amounts of increasing spatial data, a terminal user has more difficulty in understanding them without the helpful knowledge from spatial databases. Thus, spatial data mining has been brought under the umbrella of data mining and is attracting more attention. Spatial data mining presents challenges. Differing from usual data, spatial data includes not only positional data and attribute data, but also spatial relationships among spatial events. Further, the instances of spatial events are embedded in a continuous space and share a variety of spatial relationships, so the mining of spatial patterns demands new techniques. In this thesis, several contributions were made. Some new techniques were proposed, i.e., fuzzy co-location mining, CPI-tree (Co-location Pattern Instance Tree), maximal co-location patterns mining, AOI-ags (Attribute-Oriented Induction based on Attributes’ Generalization Sequences), and fuzzy association prediction. Three algorithms were put forward on co-location patterns mining: the fuzzy co-location mining algorithm, the CPI-tree based co-location mining algorithm (CPI-tree algorithm) and the orderclique- based maximal prevalence co-location mining algorithm (order-clique-based algorithm). An attribute-oriented induction algorithm based on attributes’ generalization sequences (AOI-ags algorithm) is further given, which unified the attribute thresholds and the tuple thresholds. On the two real-world databases with time-series data, a fuzzy association prediction algorithm is designed. Also a cell-based spatial object fusion algorithm is proposed. Two fuzzy clustering methods using domain knowledge were proposed: Natural Method and Graph-Based Method, both of which were controlled by a threshold. The threshold was confirmed by polynomial regression. Finally, a prototype system on spatial co-location patterns’ mining was developed, and shows the relative efficiencies of the co-location techniques proposed The techniques presented in the thesis focus on improving the feasibility, usefulness, effectiveness, and scalability of related algorithm. In the design of fuzzy co-location Abstract mining algorithm, a new data structure, the binary partition tree, used to improve the process of fuzzy equivalence partitioning, was proposed. A prefix-based approach to partition the prevalent event set search space into subsets, where each sub-problem can be solved in main-memory, was also presented. The scalability of CPI-tree algorithm is guaranteed since it does not require expensive spatial joins or instance joins for identifying co-location table instances. In the order-clique-based algorithm, the co-location table instances do not need be stored after computing the Pi value of corresponding colocation, which dramatically reduces the executive time and space of mining maximal colocations. Some technologies, for example, partitions, equivalence partition trees, prune optimization strategies and interestingness, were used to improve the efficiency of the AOI-ags algorithm. To implement the fuzzy association prediction algorithm, the “growing window” and the proximity computation pruning were introduced to reduce both I/O and CPU costs in computing the fuzzy semantic proximity between time-series. For new techniques and algorithms, theoretical analysis and experimental results on synthetic data sets and real-world datasets were presented and discussed in the thesis

    Object Discovery From a Single Unlabeled Image by Mining Frequent Itemset With Multi-scale Features

    Full text link
    TThe goal of our work is to discover dominant objects in a very general setting where only a single unlabeled image is given. This is far more challenge than typical co-localization or weakly-supervised localization tasks. To tackle this problem, we propose a simple but effective pattern mining-based method, called Object Location Mining (OLM), which exploits the advantages of data mining and feature representation of pre-trained convolutional neural networks (CNNs). Specifically, we first convert the feature maps from a pre-trained CNN model into a set of transactions, and then discovers frequent patterns from transaction database through pattern mining techniques. We observe that those discovered patterns, i.e., co-occurrence highlighted regions, typically hold appearance and spatial consistency. Motivated by this observation, we can easily discover and localize possible objects by merging relevant meaningful patterns. Extensive experiments on a variety of benchmarks demonstrate that OLM achieves competitive localization performance compared with the state-of-the-art methods. We also evaluate our approach compared with unsupervised saliency detection methods and achieves competitive results on seven benchmark datasets. Moreover, we conduct experiments on fine-grained classification to show that our proposed method can locate the entire object and parts accurately, which can benefit to improving the classification results significantly

    Summarizing data with representative patterns

    Full text link
    University of Technology Sydney. Faculty of Engineering and Information Technology.The advance of technology makes data acquisition and storage become unprecedentedly convenient. It contributes to the rapid growth of not only the volume but also the veracity and variety of data in recent years, which poses new challenges to the data mining area. For example, uncertain data mining emerges due to its capability to model the inherent veracity of data; spatial data mining attracts much research attention as the widespread of location-based services and wearable devices. As a fundamental topic of data mining, how to effectively and efficiently summarize data in this situation still remains to be explored. This thesis studied the problem of summarizing data with representative patterns. The objective is to find a set of patterns, which is much more concise but still contains rich information of the original data, and may provide valuable insights for further analysis of data. In the light of this idea, we formally formulate the problem and provide effective and efficient solutions in various scenarios. We study the problem of summarizing probabilistic frequent patterns over uncertain data. Probabilistic frequent pattern mining over uncertain data has received much research attention due to the wide applicabilities of uncertain data. It suffers from the problem of generating an exponential number of result patterns, which hinders the analysis of patterns and calls for the need to find a small number of representative patterns to approximate all other patterns. We formally formulate the problem of probabilistic representative frequent pattern (P-RFP) mining, which aims to find the minimal set of patterns with sufficiently high probability to represent all other patterns. The bottleneck turns out to be checking whether a pattern can probabilistically represent another, which involves the computation of a joint probability of the supports of two patterns. We propose a novel dynamic programming-based approach to address the problem and devise effective optimization strategies to improve the computation efficiency. To enhance the practicability of P-RFP mining, we introduce a novel approximation of the joint probability with both theoretical and empirical proofs. Based on the approximation, we propose an Approximate P-RFP Mining (APM) algorithm, which effectively and efficiently compresses the probabilistic frequent pattern set. The error rate of APM is guaranteed to be very small when the database contains hundreds of transactions, which further affirms that APM is a practical solution for summarizing probabilistic frequent patterns. We address the problem of directly summarizing uncertain transaction database by formulating the problem as Minimal Probabilistic Tile Cover Mining, which aims to find a high-quality probabilistic tile set covering an uncertain database with minimal cost. We define the concept of Probabilistic Price and Probabilistic Price Order to evaluate and compare the quality of tiles, and propose a framework to discover the minimal probabilistic tile cover. The bottleneck is to check whether a tile is better than another according to the Probabilistic Price Order, which involves the computation of a joint probability. We prove that it can be decomposed into independent terms and calculated efficiently. Several optimization techniques are devised to further improve the performance. We analyze the problem of summarizing co-locations mined from spatial databases. Co-location pattern mining finds patterns of spatial features whose instances tend to locate together in geographic space. However, the traditional framework of co-location pattern mining produces an exponential number of patterns because of the downward closure property, which makes it difficult for users to understand, assess or apply the huge number of resulted patterns. To address this issue, we study the problem of mining representative co-location patterns (RCP). We first define a covering relationship between two co-location patterns then formally formulate the problem of Representative Co-location Pattern mining. To solve the problem of RCP mining, we propose the RCPFast algorithm adopting the post-mining framework and the RCPMS algorithm pushing pattern summarization into the co-location mining process
    • 

    corecore