Warped K-Means: An algorithm to cluster sequentially-distributed data
Many devices generate large amounts of data that follow some sort of sequentiality, e.g.,
motion sensors, e-pens, and eye trackers, and often these data need to be compressed for
classification, storage, and/or retrieval tasks. Traditional clustering algorithms can be used
for this purpose, but unfortunately they do not cope with the sequential information
implicitly embedded in such data. Thus, we revisit the well-known K-means algorithm
and provide a general method to properly cluster sequentially-distributed data. We present
Warped K-Means (WKM), a multi-purpose partitional clustering procedure that minimizes
the sum of squared error criterion, while imposing a hard sequentiality constraint in the
classification step. We illustrate the properties of WKM in three applications, one being
the segmentation and classification of human activity. WKM outperformed five
state-of-the-art clustering techniques at simplifying data trajectories, achieving a
recognition accuracy of nearly 97%, an improvement of around 66% over its peers. Moreover, such an
improvement came with a reduction in the computational cost of more than one order of
magnitude.

This work has been partially supported by the Casmacat (FP7-ICT-2011-7, Project 287576), tranScriptorium (FP7-ICT-2011-9, Project 600707), STraDA (MINECO, TIN2012-37475-C02-01), and ALMPR (GVA, Prometeo/2009/014) projects.

Leiva Torres, L. A.; Vidal, E. (2013). Warped K-Means: an algorithm to cluster sequentially-distributed data. Information Sciences, 237:196-210. https://doi.org/10.1016/j.ins.2013.02.042
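To make the hard-sequentiality idea concrete, the sketch below clusters a sequence into K contiguous segments and greedily shifts the interior segment boundaries while the total sum of squared error (SSE) decreases. It is a minimal illustration of sequentially constrained clustering, not the authors' WKM procedure; the function name sequential_kmeans and all parameter choices are ours.

```python
import numpy as np

def sequential_kmeans(X, K, max_iter=100):
    """Partition a sequence X (T x D array) into K *contiguous* segments,
    greedily moving segment boundaries to reduce the total sum of squared
    error (SSE). Simplified sketch in the spirit of WKM, not the authors'
    exact algorithm.
    """
    T = len(X)
    # Start from K equal-length segments: segment i is X[bounds[i]:bounds[i+1]].
    bounds = [round(i * T / K) for i in range(K + 1)]

    def sse(seg):
        return ((seg - seg.mean(axis=0)) ** 2).sum() if len(seg) else 0.0

    for _ in range(max_iter):
        moved = False
        for i in range(1, K):  # each interior boundary
            best, best_b = None, bounds[i]
            # Try keeping the boundary or shifting it one point left/right,
            # never emptying a segment.
            for b in (bounds[i], bounds[i] - 1, bounds[i] + 1):
                if bounds[i - 1] < b < bounds[i + 1]:
                    cost = sse(X[bounds[i - 1]:b]) + sse(X[b:bounds[i + 1]])
                    if best is None or cost < best:
                        best, best_b = cost, b
            if best_b != bounds[i]:
                bounds[i], moved = best_b, True
        if not moved:
            break  # no single boundary move lowers the SSE

    labels = np.repeat(np.arange(K), np.diff(bounds))
    centers = np.array([X[bounds[i]:bounds[i + 1]].mean(axis=0) for i in range(K)])
    return labels, centers

# Example: segment a noisy 1-D trajectory with three plateaus.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(m, 0.1, 50) for m in (0.0, 1.0, 3.0)])[:, None]
labels, centers = sequential_kmeans(X, K=3)
print(centers.ravel())  # roughly [0.0, 1.0, 3.0]
```

A production version would update segment means and SSEs incrementally when a point changes segments rather than recomputing them from scratch, which is presumably where much of the speed of a method like WKM comes from.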
Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm
Over the past five decades, k-means has become the clustering algorithm of
choice in many application domains primarily due to its simplicity, time/space
efficiency, and invariance to the ordering of the data points. Unfortunately,
the algorithm's sensitivity to the initial selection of the cluster centers
remains its most serious drawback. Numerous initialization methods have
been proposed to address this drawback. Many of these methods, however, have
time complexity superlinear in the number of data points, which makes them
impractical for large data sets. On the other hand, linear methods are often
random and/or sensitive to the ordering of the data points. These methods are
generally unreliable in that the quality of their results is unpredictable.
Therefore, it is common practice to perform multiple runs of such methods and
take the output of the run that produces the best results. Such a practice,
however, greatly increases the computational requirements of the otherwise
highly efficient k-means algorithm. In this chapter, we investigate the
empirical performance of six linear, deterministic (non-random), and
order-invariant k-means initialization methods on a large and diverse
collection of data sets from the UCI Machine Learning Repository. The results
demonstrate that two relatively unknown hierarchical initialization methods due
to Su and Dy outperform the remaining four methods with respect to two
objective effectiveness criteria. In addition, a recent method due to Erisoglu
et al. performs surprisingly poorly.

Comment: 21 pages, 2 figures, 5 tables; Partitional Clustering Algorithms
(Springer, 2014). arXiv admin note: substantial text overlap with
arXiv:1304.7465, arXiv:1209.196
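One of the two Su and Dy methods highlighted above, variance partitioning (often called Var-Part), repeatedly splits the cluster with the largest within-cluster SSE at the mean of its highest-variance dimension, then uses the means of the resulting K cells as the initial centers. The sketch below is a simplified rendering of that idea, deterministic and order-invariant up to ties, and is not the published implementation; the function name var_part_init is ours.

```python
import numpy as np

def var_part_init(X, K):
    """Deterministic, order-invariant K-means seeding in the spirit of
    Su and Dy's variance partitioning (Var-Part). Simplified sketch,
    not the published implementation; assumes K <= number of points.
    """
    clusters = [np.asarray(X, dtype=float)]
    while len(clusters) < K:
        # Pick the cluster with the largest within-cluster SSE.
        sses = [((c - c.mean(axis=0)) ** 2).sum() for c in clusters]
        c = clusters.pop(int(np.argmax(sses)))
        # Split it at the mean of its highest-variance dimension.
        d = int(np.argmax(c.var(axis=0)))
        mask = c[:, d] <= c[:, d].mean()
        if mask.all() or not mask.any():
            # Degenerate split (e.g., duplicated values): halve by sort order.
            mask = np.zeros(len(c), dtype=bool)
            mask[np.argsort(c[:, d], kind="stable")[: len(c) // 2]] = True
        clusters += [c[mask], c[~mask]]
    # The initial centers are the means of the K resulting cells.
    return np.array([c.mean(axis=0) for c in clusters])

# Example: seed K-means for three well-separated 2-D blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.2, (60, 2)) for m in ((0, 0), (3, 0), (0, 3))])
print(var_part_init(X, K=3))  # one center near each blob mean
```

Because every step depends only on summary statistics of the current cells, the seeding is reproducible across runs and across permutations of the input, which is the property the chapter's comparison focuses on.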
A Framework for Projected Clustering of High Dimensional Data Streams
The data stream problem has been studied extensively in recent years because of the ease with which stream data can be collected. The nature of stream data makes it essential to use algorithms that require only one pass over the data. Recently, single-scan stream analysis methods have been proposed in this context. However