
    Towards Distributed Convoy Pattern Mining

    Mining movement data to reveal interesting behavioral patterns has gained attention in recent years. One such pattern is the convoy pattern, which consists of at least m objects moving together for at least k consecutive time instants, where m and k are user-defined parameters. Existing algorithms for detecting convoy patterns, however, do not scale to real-life dataset sizes. A distributed algorithm for convoy mining is therefore needed. In this paper, we discuss the problem of convoy mining and analyze different data partitioning strategies to pave the way for a generic distributed convoy pattern mining algorithm. Comment: SIGSPATIAL'15, November 03-06, 2015, Bellevue, WA, US
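A minimal single-machine sketch of the convoy definition above, similar in spirit to the snapshot-by-snapshot candidate tracking of the CMC (Coherent Moving Cluster) approach from earlier convoy work: cluster the objects at each time instant, then intersect every candidate group with the next instant's clusters and report groups of at least m objects that survive at least k consecutive instants. Assumptions not taken from the paper: per-instant clustering is transitive eps-distance grouping (connected components, a simplification of density-based clustering), instants are consecutive integers 0, 1, 2, ..., and `snapshot_clusters` / `find_convoys` are hypothetical names. This is the kind of centralized baseline the abstract says does not scale, not the proposed distributed method.

```python
from math import dist


def snapshot_clusters(positions, eps):
    """Group objects at one instant: connected components of the 'within eps' graph."""
    ids = list(positions)
    parent = {i: i for i in ids}

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for idx, a in enumerate(ids):
        for b in ids[idx + 1:]:
            if dist(positions[a], positions[b]) <= eps:
                parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), set()).add(i)
    return [frozenset(g) for g in groups.values()]


def find_convoys(snapshots, m, k, eps):
    """snapshots[t] maps object id -> (x, y); returns (members, start, end) triples."""
    candidates = {}  # frozenset of members -> instant at which the group started
    convoys = []
    for t, positions in enumerate(snapshots):
        clusters = [c for c in snapshot_clusters(positions, eps) if len(c) >= m]
        next_candidates = {}
        used = set()
        for members, start in candidates.items():
            extended = False
            for c in clusters:
                common = members & c
                if len(common) >= m:
                    extended = True
                    used.add(c)
                    # Keep the earliest start if two candidates shrink to the same set.
                    next_candidates[common] = min(start, next_candidates.get(common, start))
            # A candidate last seen at t - 1 has lived t - start instants.
            if not extended and t - start >= k:
                convoys.append((sorted(members), start, t - 1))
        for c in clusters:
            if c not in used:
                next_candidates[c] = t  # a cluster that extended nothing starts a new candidate
        candidates = next_candidates
    end = len(snapshots) - 1
    for members, start in candidates.items():
        if end - start + 1 >= k:
            convoys.append((sorted(members), start, end))
    return convoys
```

Because a convoy can span many instants, a time-based partitioning of the data would split such a group across workers unless partitions overlap or partial results are merged, which is one reason the choice of partitioning strategy matters.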

    Distributed clustering algorithms.

    by Chan Wai To. Thesis (M.Phil.), Chinese University of Hong Kong, 2001. Includes bibliographical references (leaves 117-121). Abstracts in English and Chinese.
    Contents:
    Abstract
    Acknowledgments
    Chapter 1: Introduction
        1.1 Clustering
        1.2 Mobile Agent
        1.3 Contribution
        1.4 Outline of this Thesis
    Chapter 2: Related Work
        2.1 Clustering
            2.1.1 K-Means Clustering
            2.1.2 A more efficient K-Means Clustering Algorithm
            2.1.3 K-Medoids Clustering Algorithms
            2.1.4 Linkage-based Methods
            2.1.5 BIRCH
            2.1.6 DBSCAN
            2.1.7 Other Clustering Algorithm
        2.2 Parallel Clustering and Distributed Clustering
            2.2.1 A Fast Parallel Clustering Algorithm for Large Spatial Databases
        2.3 Distributed Data Mining
            2.3.1 A Distributed Clustering Algorithm
            2.3.2 Efficient Mining of Association Rules in Distributed Databases
        2.4 Information Retrieval and Document Clustering
            2.4.1 Document and Document Set Representation
            2.4.2 TFIDF
            2.4.3 Similarity
            2.4.4 Partitional Document Clustering
            2.4.5 Hierarchical Document Clustering
            2.4.6 Document Clustering Application
    Chapter 3: Distributed Clustering
        3.1 Problem Description
        3.2 Distributed k-Means Clustering Algorithm
            3.2.1 Initialization
            3.2.2 Weighted k-Means procedure
            3.2.3 Refinement
            3.2.4 Example
            3.2.5 Communication Cost
        3.3 Grid k-Means
            3.3.1 Runtime Splitting
            3.3.2 Initial Clusters
            3.3.3 Refinement
            3.3.4 Overall Algorithm
            3.3.5 Efficiency in Decomposition
            3.3.6 Example
            3.3.7 Comparison with previous k-Means method
            3.3.8 Communication Cost
        3.4 Experiment
            3.4.1 Performance
            3.4.2 Communication Cost
            3.4.3 Quality of Clustering
            3.4.4 Clustering in High Dimension
            3.4.5 Other Data Distributions
    Chapter 4: Distributed DBSCAN
        4.1 Representative points of local candidate clusters
        4.2 Verification and Cluster Merging
            4.2.1 Clustering Result Quality
        4.3 Experiment
    Chapter 5: Document Clustering
        5.1 Initialization
        5.2 Refinement
        5.3 Stopping criteria
        5.4 Message
        5.5 Algorithm
        5.6 Experiment
            5.6.1 Data Source and Experimental Setup
            5.6.2 Data Size
            5.6.3 Evaluation Metrics
            5.6.4 Experimental Result
            5.6.5 Comparison to Other Algorithms
            5.6.6 Conclusion
        5.7 Future Work
    Chapter 6: Agent and Implementation Details
        6.1 Agent Introduction
            6.1.1 Reason to use Mobile Agent
            6.1.2 Grasshopper Overview
            6.1.3 Agent Scenario
            6.1.4 Another Agent Scenario
        6.2 Implementation Details
            6.2.1 Distributed k-Means
            6.2.2 Grid k-Means
            6.2.3 Distributed DBSCAN
            6.2.4 Distributed Document Clustering
    Chapter 7: Conclusion
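The thesis covers distributed k-means and its communication cost. As a hedged illustration of why such schemes can be communication-light, here is a sketch of one common approach, not necessarily the thesis's weighted k-means procedure: each site assigns its local points to the current centroids and sends only per-centroid coordinate sums and counts; a coordinator adds these up and recomputes each centroid as a count-weighted mean. Function names are hypothetical, and sites are simulated as in-memory lists rather than remote machines or mobile agents.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def local_stats(points, centroids):
    """At one site: per-centroid coordinate sums and point counts for the local data."""
    k, d = len(centroids), len(centroids[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for p in points:
        j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))  # nearest centroid
        counts[j] += 1
        for i in range(d):
            sums[j][i] += p[i]
    return sums, counts


def merge_and_update(stats, centroids):
    """At the coordinator: add site statistics and recompute centroids as weighted means."""
    k, d = len(centroids), len(centroids[0])
    total_sums = [[0.0] * d for _ in range(k)]
    total_counts = [0] * k
    for sums, counts in stats:
        for j in range(k):
            total_counts[j] += counts[j]
            for i in range(d):
                total_sums[j][i] += sums[j][i]
    # A centroid that received no points keeps its previous position.
    return [
        [s / total_counts[j] for s in total_sums[j]] if total_counts[j] else list(centroids[j])
        for j in range(k)
    ]


def distributed_kmeans(sites, centroids, iterations=10):
    """Run Lloyd-style iterations where each site contributes one summary message per round."""
    for _ in range(iterations):
        stats = [local_stats(points, centroids) for points in sites]
        new = merge_and_update(stats, centroids)
        if new == centroids:
            break
        centroids = new
    return centroids
```

For example, `distributed_kmeans([[(0, 0), (10, 10)], [(0, 2), (10, 12)]], [[1, 1], [9, 9]])` converges to centroids at (0, 1) and (10, 11). Per iteration each site sends k(d + 1) numbers regardless of how many points it holds, and because sums and counts add exactly, each round matches a centralized Lloyd iteration over the union of the data.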

    Performance evaluation of a distributed clustering approach for spatial datasets

    The analysis of big data requires powerful, scalable, and accurate data analytics techniques that traditional data mining and machine learning methods, taken as a whole, do not provide. New data analytics frameworks are therefore needed to address big data challenges such as the volume, velocity, veracity, and variety of the data. Distributed data mining is a promising approach for big data sets: they are usually produced at distributed locations, and processing them at their local sites significantly reduces response times and communication costs. In this paper, we study the performance of a distributed clustering approach called Dynamic Distributed Clustering (DDC). DDC generates clusters remotely at each site and then aggregates them using an efficient aggregation algorithm. The technique is designed for spatial datasets. We evaluated DDC using two types of communication (synchronous and asynchronous) and tested it under various load distributions. The experimental results show that the approach achieves super-linear speed-up, scales up very well, and can take advantage of recent programming models such as MapReduce, since its results are not affected by the type of communication.
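The abstract describes a two-phase pattern: cluster locally at each site, then aggregate the local clusters. A minimal sketch of that general idea follows; it is not DDC's actual aggregation algorithm. As hypothetical simplifications, each site summarizes a local cluster by its axis-aligned bounding box, and a leader treats local clusters whose boxes overlap as parts of the same global cluster, merging them with union-find. The names `summarize`, `overlaps`, and `aggregate` are illustrative.

```python
def summarize(points):
    """At a site: reduce a local cluster of (x, y) points to (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def overlaps(a, b):
    """True if two bounding boxes intersect (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def aggregate(boxes):
    """At the leader: group local clusters whose boxes overlap; returns lists of box indices."""
    parent = list(range(len(boxes)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if overlaps(boxes[i], boxes[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Only the four numbers per local cluster cross the network, not the points. Overlapping boxes do not guarantee that the underlying clusters actually touch, so a real aggregation step would use tighter cluster summaries than this.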

    Methods of Hierarchical Clustering

    We survey agglomerative hierarchical clustering algorithms and discuss efficient implementations that are available in R and other software environments. We look at hierarchical self-organizing maps and mixture models. We review grid-based clustering, focusing on hierarchical density-based approaches. Finally, we describe a recently developed, very efficient (linear time) hierarchical clustering algorithm, which can also be viewed as a hierarchical grid-based algorithm. Comment: 21 pages, 2 figures, 1 table, 69 references
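As a reference point for the efficient implementations the survey discusses, the naive agglomerative procedure starts with every point as its own cluster and repeatedly merges the two clusters with the smallest linkage distance. This sketch supports single linkage (closest pair of members) and complete linkage (farthest pair); the function name and the merge-list output format are illustrative choices. It recomputes all linkage distances at every step, so it needs on the order of n^3 distance evaluations, much slower than specialized O(n^2) methods such as SLINK for single linkage.

```python
from itertools import combinations
from math import dist


def agglomerative(points, linkage="single"):
    """Naive agglomerative clustering; returns merges as (members_a, members_b, distance)."""
    link = {"single": min, "complete": max}[linkage]
    clusters = [[i] for i in range(len(points))]  # each point index starts alone
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Linkage distance: min or max over all cross-cluster point pairs.
            d = link(dist(points[i], points[j]) for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [merged]
    return merges
```

The merge list records the dendrogram: for the 1-D points 0, 1, and 5, single linkage first joins 0 and 1 at distance 1, then attaches 5 at distance 4, whereas complete linkage attaches it at distance 5.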