Search CORE

42 research outputs found

Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

Over the past five decades, k-means has become the clustering algorithm of choice in many application domains, primarily due to its simplicity, time/space efficiency, and invariance to the ordering of the data points. Unfortunately, the algorithm's sensitivity to the initial selection of the cluster centers remains its most serious drawback. Numerous initialization methods have been proposed to address this drawback. Many of these methods, however, have time complexity superlinear in the number of data points, which makes them impractical for large data sets. On the other hand, linear methods are often random and/or sensitive to the ordering of the data points. These methods are generally unreliable in that the quality of their results is unpredictable. Therefore, it is common practice to perform multiple runs of such methods and take the output of the run that produces the best results. Such a practice, however, greatly increases the computational requirements of the otherwise highly efficient k-means algorithm. In this chapter, we investigate the empirical performance of six linear, deterministic (non-random), and order-invariant k-means initialization methods on a large and diverse collection of data sets from the UCI Machine Learning Repository. The results demonstrate that two relatively unknown hierarchical initialization methods due to Su and Dy outperform the remaining four methods with respect to two objective effectiveness criteria. In addition, a recent method due to Erisoglu et al. performs surprisingly poorly.

Comment: 21 pages, 2 figures, 5 tables, Partitional Clustering Algorithms (Springer, 2014). arXiv admin note: substantial text overlap with arXiv:1304.7465, arXiv:1209.196

arXiv.org e-Print Archive

Crossref
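The abstract above credits two hierarchical methods due to Su and Dy but does not describe how they work. The following Python sketch (3.8+) illustrates what a deterministic, order-invariant hierarchical initializer can look like. It repeatedly splits the cluster with the largest sum of squared errors (SSE) along its highest-variance coordinate, at that coordinate's mean.

This splitting rule is an assumption, modelled loosely on Su and Dy's Var-Part idea. The names `var_part_init`, `centroid` and `sse` are hypothetical, and this is not the chapter's code. Because the clusters always partition the data, each split costs time linear in the number of points.

```python
from math import fsum
from statistics import fmean, pvariance


def centroid(points):
    # Coordinate-wise mean; fmean sums with math.fsum (correctly rounded),
    # so the result does not depend on the order of the points.
    return tuple(fmean(p[j] for p in points) for j in range(len(points[0])))


def sse(points):
    # Sum of squared Euclidean distances from the points to their centroid.
    c = centroid(points)
    return fsum((x - cj) ** 2 for p in points for x, cj in zip(p, c))


def var_part_init(points, k):
    """Hypothetical deterministic k-means seeding by repeated variance-based
    splits (a sketch in the spirit of Var-Part; the splitting rule is assumed)."""
    if not 1 <= k <= len(points):
        raise ValueError("need 1 <= k <= number of points")
    clusters = [list(points)]
    while len(clusters) < k:
        # 1. Pick the cluster with the largest SSE (ties go to the lowest index).
        i = max(range(len(clusters)), key=lambda t: sse(clusters[t]))
        target = clusters[i]
        # 2. Pick the coordinate with the greatest variance inside that cluster.
        d = len(target[0])
        j = max(range(d), key=lambda t: pvariance([p[t] for p in target]))
        # 3. Split the cluster at the mean of that coordinate.
        m = fmean(p[j] for p in target)
        left = [p for p in target if p[j] <= m]
        right = [p for p in target if p[j] > m]
        if not right:
            # Only happens when the cluster's points are (numerically) identical.
            raise ValueError("cannot split cluster further; too many duplicates")
        clusters[i:i + 1] = [left, right]
    return [centroid(c) for c in clusters]
```

The centers would then seed an ordinary k-means run. There is no randomness, so a single run replaces the "best of several runs" practice described above. Reordering the input should leave the returned centers unchanged, because every sum goes through order-independent summation.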

Flexible parametric bootstrap for testing homogeneity against clustering and assessing the number of clusters

Authors: AK Jain, B Efron, B Hausdorf, C Fraley, C Hennig, C Keribin, Catherine Sugar, Chien-Ju Lin, Christian Hennig, F Drasgow, G Milligan, H Xiong, HH Bock, L Kaufman, O Arbelaitz, R Tibshirani, T Calinski, TF Cox
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Looking for Conflict: Gaze Dynamics in a Dyadic Mixed-Motive Game

Authors: A Abele, A Vinciarelli, Ana Paiva, BG Tabachnick, C Castelfranchi, C Yu, CD Frith, D Heylen, EVd Vliert, G Doherty-Sneddon, HH Kelley, I Poggi, J Nadler, JA Hartigan, JD Boucher, Joana Campos, K Horney, K Sigmund, L Kriesberg, M Argyle, M Foddy, M Tomasello, MG Glaholt, N Bolshakova, NJ Emery, O Arbelaitz, Patrícia Alves-Oliveira, R Bakeman, TS Jones
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Webometrics benefitting from web mining? An investigation of methods and applications of two research fields

Authors: A Bifet, A Gruzd, A Guerbas, A Martínez-Ruiz, A Noruzi, A Rettinger, A Schubert, A Zuccala, AB Barragáns-Martínez, ARH Fischer, B Mobasher, B Yang, BN Miller, C Romero, C Wang, C Woo-Young, C-L Hsu, CJ Williams, D Ai, D Minguillo, D Pierrakos, D Stuart, D Wilkinson, David Gunnarsson Lorentzen, E Angus, E Kontopoulos, E Orduña-Malea, E Otte, E Romero-Frías, F Aminpour, F Barjak, F Didegah, FM Facca, G Lappas, G Paliouras, G Qiu, G Somprasertsri, GD Kumar, H Kretschmer, H Small, H-F Li, H-W Park, I Aguillo, I-C Yeh, IF Aguillo, J Bar-Ilan, J Borges, J Canny, J Fernández, J Srivastava, J-C Ou, JA Kirby, JA Pratt, JD Velásquez, JL Ortega, JM Kleinberg, JW Palmer, K Holmberg, K Jonkers, K Poongothai, K-Y Wang, KA-I Nekaris, L Björneborn, L Vaughan, L Zoonen Van, L-W Ku, M Asadi, M Biehl, M Chau, M Cheong, M Deshpande, M Efron, M Eirinaki, M Erfanmanesh, M Shekofteh, M Thelwall, M-L Shyu, MA Bayir, MA Islam, MR Martínez-Torres, O Arbelaitz, O Etzioni, O Nasraoui, P Ingwersen, P Wang, P-H Chou, PB Lang, Q He, Q Zhang, R Ball, R Das, R Duane Ireland, R Kosala, R Malinský, RL Glass, S Alsaleh, S Brin, S Kundu, S Milgram, S-H Lin, SA Hale, SE Cho, T Becher, T Hofmann, T Holloway, T Leeuwen Van, T Takahashi, TC Almind, TJ Ruller, V Panchal, V Popova, VD Blondel, WE Nwagwu, X Polanco, Y Lai, Y Nam, Y Zhang, Yuan Shunbo, Z Huang, Z Zhang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Webometrics and web mining are two fields where research is focused on quantitative analyses of the web. This literature review outlines definitions of the fields and then focuses on their methods and applications. It also discusses the potential for closer contact and collaboration between them. A key difference between the fields is that webometrics has focused on exploratory studies, whereas web mining has been dominated by studies focusing on the development of methods and algorithms. Differences in the type of data can also be seen: webometrics is more focused on analyses of the structure of the web, and web mining more on web content and usage, even though both fields have been embracing the possibilities of user-generated content. It is concluded that research problems requiring big data can benefit from collaboration between webometricians, with their tradition of exploratory studies, and web miners, with their tradition of developing methods and algorithms.

Crossref

University of Borås

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Identification of anti-tumour biologics using primary tumour models, 3-D phenotypic screening and image-based multi-parametric profiling

Authors: AC Siva, Alan M. Sandercock, AR Webb, B Casar, B Neumann, Bram Herpers, C Lloyd, Carl Hay, D Siolas, DC Swinney, DS Spassov, DT Dudley, E Fennema, E Lopez-Crapez, G Kollmorgen, G Kurosawa, G Sawada, H Kawasaki, HJ Buhring, J Bierwolf, JC Bezdek, JG Moffat, JI Ikeda, Jim Freeth, JJ Park, Jo Soden, K Fukuchi, K Mark von der, K Yan, Kris F. Sachsenmeier, Kuan Yan, L Breiman, L Turner, Leo S. Price, LH Loo, Lutz Jermutus, Matt Flynn, N Veitonmaki, Nick Holoweckyj, NT Elliott, O Arbelaitz, P Loukopoulos, Qihui Huang, R Genuer, Ralph Minter, Robert Hollingsworth, RZ Lin, S Miura, S Rust, Sandrine Guillard, SE Perry, Steven Rust, T Uekita, TJ Vaughan, VC Daniel, W Yu, X Zhao, Y Feng, Y He, YS DeRose, Z Di
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Intelligent Routing System for a Personalised Electronic Tourist

Authors: Arbelaitz O, Garcia Ander, Linaza M, Vansteenwegen Pieter
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2009

When tourists are at a destination, they typically search for information at the Local Tourist Organizations. There, the staff categorize each tourist's profile and restrictions. Combining this information with their up-to-date knowledge of local attractions, weather and public transportation, they suggest a personalised route for the tourist's agenda. This paper presents an intelligent routing system for a Personalised Electronic Tourist Guide (PET) that fulfils the same task. The system improves the automatic route-creation functionality of existing PETs to better meet tourists' needs in several respects: i) it includes public transportation; ii) it takes varying travel times into account, adapting to real circumstances such as rush hours; iii) it calculates routes in real time to react to unexpected events; and iv) it applies latest-generation heuristics from Operations Research to create routes efficiently, even in destinations with a large number of points of interest and a dense public transportation network.

Status: published

Lirias
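Point ii) of the abstract above, travel times that vary with the time of day, can be illustrated with a time-dependent shortest-path search. The following Python sketch is not the routing system or heuristics described in the paper. It is a textbook time-dependent Dijkstra search, in which each link's travel time is a function of the departure time.

The names `rush_hour` and `earliest_arrival` are hypothetical, as are the minutes-after-midnight time unit and the triangular rush-hour profile. The search assumes the FIFO property (leaving later never means arriving earlier) and no waiting at nodes.

```python
import heapq
import math


def rush_hour(base, peak_extra, start, end):
    """Hypothetical link profile (minutes): `base` off-peak, rising linearly to
    base + peak_extra at the middle of [start, end] and falling back.
    Keeping peak_extra <= (end - start) / 2 preserves the FIFO property."""
    mid, half = (start + end) / 2, (end - start) / 2

    def travel_time(t):
        if t <= start or t >= end:
            return float(base)
        return base + peak_extra * (1 - abs(t - mid) / half)

    return travel_time


def earliest_arrival(graph, source, target, depart):
    """Time-dependent Dijkstra search.

    graph maps node -> list of (neighbour, travel_time_fn).
    Returns the earliest arrival time at target, or None if unreachable."""
    best = {source: depart}
    heap = [(depart, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if u == target:
            return t
        if t > best.get(u, math.inf):
            continue  # stale queue entry
        for v, travel_time in graph.get(u, []):
            arrival = t + travel_time(t)  # cost depends on when we leave u
            if arrival < best.get(v, math.inf):
                best[v] = arrival
                heapq.heappush(heap, (arrival, v))
    return None
```

For example, suppose a direct link slows down between 08:00 and 09:00 (minutes 480 to 540) and a detour does not. Departing at 470 then favours the direct link, while departing at 500 favours the detour.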

Remotely Sensed Data Clustering Using K-Harmonic Means Algorithm and Cluster Validity Index

Authors: A.K. Jain, D. Davies, G. Gan, J.C. Bezdeck, J.C. Bezdek, J.C. Dunn, K. Huang, K. Thangavel, M. Halkidi, M.K. Pakhira, O. Arbelaitz, Q. Zhao, X.L. Xie
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Crossref

Selecting the Minkowski Exponent for Intelligent K-Means with Feature Weighting

Authors: B. Mirkin, E.Y. Chan, H. Frigui, J.Z. Huang, M.M.T. Chiang, O. Arbelaitz, P.J. Rousseeuw, R.C. Amorim de, V. Makarenkov
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 11/06/2014

Recently, a three-stage version of K-Means has been introduced in which not only the clusters and their centers but also the feature weights are adjusted to minimize the summed p-th power of the Minkowski p-distance between entities and the centroids of their clusters. The value of the Minkowski exponent p appears to be instrumental in the method's ability to recover clusters hidden in data. This paper addresses the problem of finding the best p for a Minkowski metric-based version of K-Means in each of two settings: semi-supervised and unsupervised. It presents experimental evidence that the solutions found with the proposed approaches are sufficiently close to the optimum.

Peer reviewed

Crossref

University of Hertfordshire Research Archive
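The abstract above states the K-Means objective in words. Written out, and assuming cluster-specific feature weights w_kv, the criterion is

W_p = sum over k, over i in S_k, and over v of w_kv^p * |y_iv - c_kv|^p

Cluster-specific weights are an assumption; the abstract does not say whether the weights are global or per cluster. The Python sketch below does three things:

- It evaluates W_p.
- It computes each feature's Minkowski center, the c minimising sum_i |y_iv - c|^p. Ternary search is valid here because that sum is convex for p >= 1.
- It applies the weight update w_kv = 1 / sum_u (D_kv / D_ku)^(1/(p-1)), where D_kv is the within-cluster dispersion of feature v.

The update formula, the `eps` guard and all function names are illustrative assumptions. The sketch does not implement the paper's procedure for choosing p.

```python
import math


def minkowski_center(values, p, iters=200):
    # Value c minimising sum(|v - c|**p); convex for p >= 1, so ternary search works.
    lo, hi = min(values), max(values)
    cost = lambda c: math.fsum(abs(v - c) ** p for v in values)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if cost(m1) < cost(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


def dispersions(data, labels, centers, p):
    # D[k][v] = sum over entities i in cluster k of |y_iv - c_kv|**p
    K, V = len(centers), len(centers[0])
    D = [[0.0] * V for _ in range(K)]
    for y, k in zip(data, labels):
        for v in range(V):
            D[k][v] += abs(y[v] - centers[k][v]) ** p
    return D


def criterion(data, labels, centers, weights, p):
    # W_p = sum_k sum_{i in S_k} sum_v w_kv**p * |y_iv - c_kv|**p
    D = dispersions(data, labels, centers, p)
    return math.fsum(weights[k][v] ** p * D[k][v]
                     for k in range(len(D)) for v in range(len(D[k])))


def update_weights(D, p, eps=1e-12):
    # Assumed update for p > 1: w_kv = 1 / sum_u (D_kv / D_ku)**(1/(p-1)).
    # eps (our addition) avoids division by zero for zero-dispersion features.
    W = []
    for row in D:
        r = [d + eps for d in row]
        W.append([1.0 / math.fsum((a / b) ** (1.0 / (p - 1)) for b in r)
                  for a in r])
    return W
```

Within each cluster the updated weights sum to 1. Features with smaller within-cluster dispersion receive larger weights.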

A Cluster Analysis Approach for Rule Base Reduction

Authors: A Riid, AK Jain, B Lazzerini, CT Chao, D Simon, Didier Dubois, H Bellaaj, H Wang, M Setnes, MY Chen, O Arbelaitz, P Baranyi, S Nefti, S Saitta, W Trutschnig, Y Jin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

In this paper, we propose an iterative algorithm for fuzzy rule base simplification based on cluster analysis. The proposed approach uses a dissimilarity measure that allows different importance to be assigned to the values and ambiguities of fuzzy terms in the antecedent and consequent parts of fuzzy rules.

Archivio Ricerca Ca'Foscari

Crossref

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Archivio Istituzionale della Ricerca- Università del Salento
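The abstract above does not give the paper's dissimilarity measure. The following Python sketch is therefore hypothetical. It shows how a measure can weight the value and the ambiguity of fuzzy terms differently, and how such a measure can drive an iterative merge of similar rules. It makes these assumptions:

- Terms are triangular (a, b, c), with peak b as the "value" and half-width (c - a) / 2 as the "ambiguity".
- A rule is a pair (antecedent terms, consequent term).
- Two rules are as dissimilar as their most dissimilar term pair.
- Merging averages term parameters.

The weights `alpha_ante` and `alpha_cons` and all function names are illustrative.

```python
from itertools import combinations


def term_dissim(t1, t2, alpha):
    # Hypothetical: alpha weights the value (peak), 1 - alpha the ambiguity (half-width).
    value = abs(t1[1] - t2[1])
    ambiguity = abs((t1[2] - t1[0]) / 2 - (t2[2] - t2[0]) / 2)
    return alpha * value + (1 - alpha) * ambiguity


def rule_dissim(r1, r2, alpha_ante, alpha_cons):
    # Separate importance weights for antecedent and consequent parts.
    ante = [term_dissim(x, y, alpha_ante) for x, y in zip(r1[0], r2[0])]
    cons = term_dissim(r1[1], r2[1], alpha_cons)
    return max(max(ante), cons)


def merge_terms(t1, t2):
    return tuple((u + v) / 2 for u, v in zip(t1, t2))


def reduce_rules(rules, threshold, alpha_ante=0.7, alpha_cons=0.7):
    """Repeatedly merge the closest pair of rules until none is within threshold."""
    rules = list(rules)
    while len(rules) > 1:
        (i, j), d = min(
            (((i, j), rule_dissim(rules[i], rules[j], alpha_ante, alpha_cons))
             for i, j in combinations(range(len(rules)), 2)),
            key=lambda item: item[1])
        if d > threshold:
            break
        r1, r2 = rules[i], rules[j]
        merged = (tuple(merge_terms(x, y) for x, y in zip(r1[0], r2[0])),
                  merge_terms(r1[1], r2[1]))
        rules = [r for k, r in enumerate(rules) if k not in (i, j)] + [merged]
    return rules
```

Raising `alpha_ante` makes differences in term position dominate, while lowering it makes differences in term width matter more. This is the kind of trade-off the abstract says its measure exposes.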

Impact of the road network configuration on map‐matching algorithms for FCD in urban environments

Authors: Behrisch M., Brakatsoulas S., Estibaliz Loyo, Feng T., Goh C., Harbil Arregui, He Z.C., Lou Y., Marchal F., Mattheis S., Mazhelis O., Newson P., Ochieng W.Y., Oihana Otaegui, Olatz Arbelaitz, Yang H., Ying J.J.‐C.
Publication venue: 'Institution of Engineering and Technology (IET)'

Crossref