44,023 research outputs found

Towards Distributed Convoy Pattern Mining

Author: Ester M.
Ghemawat S.
Hua K. A.
Kwon Y.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2015
Mining movement data to reveal interesting behavioral patterns has gained attention in recent years. One such pattern is the convoy pattern, which consists of at least m objects moving together for at least k consecutive time instants, where m and k are user-defined parameters. Existing algorithms for detecting convoy patterns, however, do not scale to real-life dataset sizes. Therefore, a distributed algorithm for convoy mining is needed. In this paper, we discuss the problem of convoy mining and analyze different data partitioning strategies to pave the way for a generic distributed convoy pattern mining algorithm. Comment: SIGSPATIAL'15 November 03-06, 2015, Bellevue, WA, US
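
To make the convoy definition in the abstract above concrete, here is a minimal single-machine sketch. It is not the paper's algorithm. It assumes that "moving together" has already been decided for each time instant and is supplied as a list of disjoint groups of object ids; the function name `find_convoys` and the input layout are hypothetical, chosen only for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

Group = FrozenSet[str]


def find_convoys(snapshots: List[List[Group]], m: int, k: int) -> List[Tuple[Group, int, int]]:
    """Return (objects, first_instant, last_instant) for every object set of
    size >= m that stays inside one group for >= k consecutive instants.

    Assumption: snapshots[t] holds the disjoint groups of objects that are
    "together" at instant t (for example, the output of a spatial clustering step).
    """
    convoys: List[Tuple[Group, int, int]] = []
    candidates: Dict[Group, int] = {}  # candidate object set -> instant it started

    def close(objs: Group, start: int, end: int) -> None:
        # Report a candidate only if it lasted long enough.
        if end - start + 1 >= k:
            convoys.append((objs, start, end))

    for t, groups in enumerate(snapshots):
        extended: Dict[Group, int] = {}
        for group in groups:
            if len(group) < m:
                continue
            # Every large enough group may start a new candidate at instant t.
            extended.setdefault(group, t)
            # Existing candidates survive as their intersection with the group.
            for objs, start in candidates.items():
                common = objs & group
                if len(common) >= m:
                    extended[common] = min(extended.get(common, t), start)
        # Candidates that did not survive unchanged end at instant t - 1.
        for objs, start in candidates.items():
            if objs not in extended:
                close(objs, start, t - 1)
        candidates = extended

    # Flush candidates that are still alive after the last instant.
    for objs, start in candidates.items():
        close(objs, start, len(snapshots) - 1)
    return convoys


if __name__ == "__main__":
    snapshots = [
        [frozenset("abc"), frozenset("d")],
        [frozenset("abcd")],
        [frozenset("ab"), frozenset("cd")],
        [frozenset("ab")],
    ]
    print(find_convoys(snapshots, m=2, k=3))
```

The sketch keeps every candidate in memory on one machine, so it does nothing about the scaling problem the abstract raises, and it may report a convoy alongside a longer-lived subset of the same objects.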

arXiv.org e-Print Archive

Crossref

Institutional Repository Universiteit Antwerpen

HAL Université de Tours

Distributed clustering algorithms.

Publication date: 01/01/2001
by Chan Wai To.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2001.
Includes bibliographical references (leaves 117-121).
Abstracts in English and Chinese.

Abstract
Acknowledgments
Chapter 1 --- Introduction
Chapter 1.1 --- Clustering
Chapter 1.2 --- Mobile Agent
Chapter 1.3 --- Contribution
Chapter 1.4 --- Outline of this Thesis
Chapter 2 --- Related Work
Chapter 2.1 --- Clustering
Chapter 2.1.1 --- K-Means Clustering
Chapter 2.1.2 --- A more efficient K-Means Clustering Algorithm
Chapter 2.1.3 --- K-Medoids Clustering Algorithms
Chapter 2.1.4 --- Linkage-based Methods
Chapter 2.1.5 --- BIRCH
Chapter 2.1.6 --- DBSCAN
Chapter 2.1.7 --- Other Clustering Algorithm
Chapter 2.2 --- Parallel Clustering and Distributed Clustering
Chapter 2.2.1 --- A Fast Parallel Clustering Algorithm for Large Spatial Databases
Chapter 2.3 --- Distributed Data Mining
Chapter 2.3.1 --- A Distributed Clustering Algorithm
Chapter 2.3.2 --- Efficient Mining of Association Rules in Distributed Databases
Chapter 2.4 --- Information Retrieval and Document Clustering
Chapter 2.4.1 --- Document and Document Set Representation
Chapter 2.4.2 --- TFIDF
Chapter 2.4.3 --- Similarity
Chapter 2.4.4 --- Partitional Document Clustering
Chapter 2.4.5 --- Hierarchical Document Clustering
Chapter 2.4.6 --- Document Clustering Application
Chapter 3 --- Distributed Clustering
Chapter 3.1 --- Problem Description
Chapter 3.2 --- Distributed k-Means Clustering Algorithm
Chapter 3.2.1 --- Initialization
Chapter 3.2.2 --- weighted k-Means procedure
Chapter 3.2.3 --- Refinement
Chapter 3.2.4 --- Example
Chapter 3.2.5 --- Communication Cost
Chapter 3.3 --- Grid k-Mean
Chapter 3.3.1 --- Runtime Splitting
Chapter 3.3.2 --- Initial Clusters
Chapter 3.3.3 --- Refinement
Chapter 3.3.4 --- Overall Algorithm
Chapter 3.3.5 --- Efficiency in Decomposition
Chapter 3.3.6 --- Example
Chapter 3.3.7 --- Comparison with previous k-Means method
Chapter 3.3.8 --- Communication Cost
Chapter 3.4 --- Experiment
Chapter 3.4.1 --- Performance
Chapter 3.4.2 --- Communication Cost
Chapter 3.4.3 --- Quality of Clustering
Chapter 3.4.4 --- Clustering in High Dimension
Chapter 3.4.5 --- Other Data Distributions
Chapter 4 --- Distributed DBSCAN
Chapter 4.1 --- Representative points of local candidate clusters
Chapter 4.2 --- Verification and Cluster Merging
Chapter 4.2.1 --- Clustering Result Quality
Chapter 4.3 --- Experiment
Chapter 5 --- Document Clustering
Chapter 5.1 --- Initialization
Chapter 5.2 --- Refinement
Chapter 5.3 --- Stopping criteria
Chapter 5.4 --- Message
Chapter 5.5 --- Algorithm
Chapter 5.6 --- Experiment
Chapter 5.6.1 --- Data Source and Experimental Setup
Chapter 5.6.2 --- Data Size
Chapter 5.6.3 --- Evaluation Metrics
Chapter 5.6.4 --- Experimental Result
Chapter 5.6.5 --- Comparison to Other Algorithms
Chapter 5.6.6 --- Conclusion
Chapter 5.7 --- Future Work
Chapter 6 --- Agent and Implementation Details
Chapter 6.1 --- Agent Introduction
Chapter 6.1.1 --- Reason to use Mobile Agent
Chapter 6.1.2 --- Grasshopper Overview
Chapter 6.1.3 --- Agent Scenario
Chapter 6.1.4 --- Another Agent Scenario
Chapter 6.2 --- Implementation Details
Chapter 6.2.1 --- Distributed k-Means
Chapter 6.2.2 --- Grid k-Means
Chapter 6.2.3 --- Distributed DBSCAN
Chapter 6.2.4 --- Distributed Document Clustering
Chapter 7 --- Conclusio

CUHK Digital Repository

Performance evaluation of a distributed clustering approach for spatial datasets

Author: A Chaudhuri
A Zhou
AK Jain
D Arlia
E Bauer
E Januzaj
F Bellifemine
H Edelsbrunner
IS Dhillon
J Han
L Rokach
LM Aouad
M Coppola
M Duckhama
M Ester
M Fadilia
M Melkemi
MJ Zaki
R Solar
S Brecheisen
S Ghosh
Tian Zhang
X Wu
X Xu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 14/04/2018
The analysis of big data requires powerful, scalable, and accurate data analytics techniques that traditional data mining and machine learning, taken as a whole, do not provide. Therefore, new data analytics frameworks are needed to deal with big data challenges such as the volume, velocity, veracity, and variety of the data. Distributed data mining constitutes a promising approach for big data sets, as they are usually produced in distributed locations, and processing them at their local sites significantly reduces response times, communications, etc. In this paper, we study the performance of a distributed clustering approach called Dynamic Distributed Clustering (DDC). DDC has the ability to remotely generate clusters and then aggregate them using an efficient aggregation algorithm. The technique is developed for spatial datasets. We evaluated DDC using two types of communication (synchronous and asynchronous) and tested it using various load distributions. The experimental results show that the approach has super-linear speed-up, scales up very well, and can take advantage of recent programming models, such as the MapReduce model, as its results are not affected by the type of communication.
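
The two-phase idea in the abstract above (cluster at each site, then aggregate) can be illustrated with a toy sketch. This is not the DDC algorithm, whose local clustering and aggregation steps the abstract does not describe. The sketch assumes 2-D points, a greedy distance-threshold clustering at each site, and an aggregation step that merges cluster summaries whose centroids lie within the same threshold. All function names and the `radius` parameter are hypothetical.

```python
from math import dist
from typing import List, Tuple

Point = Tuple[float, float]
Summary = Tuple[Point, int]  # (centroid, number of points in the cluster)


def centroid(points: List[Point]) -> Point:
    """Mean position of a non-empty list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def local_clusters(points: List[Point], radius: float) -> List[Summary]:
    """Phase 1, run independently at each site: greedy single-pass clustering.
    Only compact summaries leave the site, not the raw points."""
    clusters: List[List[Point]] = []
    for p in points:
        for members in clusters:
            if dist(p, centroid(members)) <= radius:
                members.append(p)
                break
        else:
            clusters.append([p])
    return [(centroid(members), len(members)) for members in clusters]


def aggregate(summaries: List[Summary], radius: float) -> List[Summary]:
    """Phase 2, run at a coordinator: merge summaries with nearby centroids,
    weighting each centroid by its point count."""
    merged: List[Summary] = []
    for (cx, cy), n in summaries:
        for i, ((mx, my), mn) in enumerate(merged):
            if dist((cx, cy), (mx, my)) <= radius:
                total = mn + n
                merged[i] = (((mx * mn + cx * n) / total, (my * mn + cy * n) / total), total)
                break
        else:
            merged.append(((cx, cy), n))
    return merged


if __name__ == "__main__":
    site_a = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
    site_b = [(0.5, 0.5), (10.0, 11.0)]
    summaries = local_clusters(site_a, 2.0) + local_clusters(site_b, 2.0)
    print(aggregate(summaries, 2.0))
```

Because each site sends only centroids and counts, the communication volume depends on the number of local clusters rather than on the number of points, which is the kind of saving the abstract attributes to local processing.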

Crossref

Irish Universities

DCU Online Research Access Service

Methods of Hierarchical Clustering

Author: Contreras Pedro
Murtagh Fionn
Publication date: 01/01/2011
We survey agglomerative hierarchical clustering algorithms and discuss efficient implementations that are available in R and other software environments. We look at hierarchical self-organizing maps and mixture models. We review grid-based clustering, focusing on hierarchical density-based approaches. Finally, we describe a recently developed, very efficient (linear time) hierarchical clustering algorithm, which can also be viewed as a hierarchical grid-based algorithm. Comment: 21 pages, 2 figures, 1 table, 69 references
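
For readers unfamiliar with the agglomerative scheme that the abstract above surveys, the following is a deliberately naive sketch of single-linkage agglomerative clustering. It is not one of the efficient implementations the survey discusses, and it is not the linear-time algorithm mentioned at the end of the abstract. The function name `single_linkage` and the choice to stop at a target number of clusters are assumptions made for illustration.

```python
from math import dist
from typing import List, Tuple

Point = Tuple[float, float]


def single_linkage(points: List[Point], num_clusters: int) -> List[List[Point]]:
    """Start with one cluster per point and repeatedly merge the two clusters
    whose closest members are nearest, until num_clusters remain.

    Naive version: every merge rescans all cluster pairs, so the cost grows
    roughly cubically with the number of points.
    """
    clusters: List[List[Point]] = [[p] for p in points]
    while len(clusters) > max(num_clusters, 1):
        best = None  # (distance, index_i, index_j) of the closest pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge cluster j into cluster i
        del clusters[j]
    return clusters


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (20.0, 20.0)]
    print(single_linkage(pts, 3))
```

Recording the order and distance of each merge, instead of stopping at a fixed cluster count, would yield the dendrogram that hierarchical methods are usually used to produce.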

arXiv.org e-Print Archive

Royal Holloway Research Online

Royal Holloway - Pure