A survey on feature weighting based K-Means algorithms
This is a pre-copyedited, author-produced PDF of an article accepted for publication in the Journal of Classification [de Amorim, R. C., 'A survey on feature weighting based K-Means algorithms', Journal of Classification, Vol. 33(2): 210-242, August 25, 2016]. Subject to embargo; embargo end date: 25 August 2017. The final publication is available at Springer via http://dx.doi.org/10.1007/s00357-016-9208-4. © Classification Society of North America 2016.

In a real-world data set there is always the possibility, rather high in our opinion, that different features have different degrees of relevance. Most machine learning algorithms deal with this fact by selecting or deselecting features in the data preprocessing phase. However, we maintain that even among relevant features there may be different degrees of relevance, and this should be taken into account during the clustering process. With over 50 years of history, K-Means is arguably the most popular partitional clustering algorithm. The first K-Means-based clustering algorithm to compute feature weights was designed just over 30 years ago. Various such algorithms have been designed since, but there has not been, to our knowledge, a survey integrating empirical evidence of cluster recovery ability, common flaws, and possible directions for future research. This paper elaborates on the concept of feature weighting and addresses these issues by critically analysing some of the most popular, or innovative, feature weighting mechanisms based on K-Means.

Peer reviewed. Final accepted version.
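The weighted variants surveyed differ mainly in how the feature weights are estimated, but they share the same weighted distance inside the assignment step. As a rough, hypothetical illustration (not any particular surveyed algorithm), the sketch below runs Lloyd-style iterations with a *fixed*, user-supplied weight vector `w` and exponent `beta`; in actual feature-weighting algorithms the weights would be re-estimated each iteration, typically from per-feature within-cluster dispersions. Function and parameter names are illustrative only.

```python
import numpy as np

def weighted_kmeans(X, k, w, beta=2.0, n_iter=50, seed=0):
    """Toy sketch: K-Means whose distance uses per-feature weights w.

    Assumption: w is fixed. Real feature-weighting K-Means variants
    update w every iteration; only the distance computation is shown here.
    """
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # Weighted squared Euclidean distance:
        #   d(x, c) = sum_f  w_f^beta * (x_f - c_f)^2
        d = ((w ** beta) * (X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute centroids; keep the old centroid if a cluster empties.
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels, centers
```

With `w` down-weighting a noisy feature, clusters that are only separable in the informative features can still be recovered even when the noisy feature has much larger variance.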
Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm
Over the past five decades, k-means has become the clustering algorithm of
choice in many application domains primarily due to its simplicity, time/space
efficiency, and invariance to the ordering of the data points. Unfortunately,
the algorithm's sensitivity to the initial selection of the cluster centers
remains its most serious drawback. Numerous initialization methods have
been proposed to address this drawback. Many of these methods, however, have
time complexity superlinear in the number of data points, which makes them
impractical for large data sets. On the other hand, linear methods are often
random and/or sensitive to the ordering of the data points. These methods are
generally unreliable in that the quality of their results is unpredictable.
Therefore, it is common practice to perform multiple runs of such methods and
take the output of the run that produces the best results. Such a practice,
however, greatly increases the computational requirements of the otherwise
highly efficient k-means algorithm. In this chapter, we investigate the
empirical performance of six linear, deterministic (non-random), and
order-invariant k-means initialization methods on a large and diverse
collection of data sets from the UCI Machine Learning Repository. The results
demonstrate that two relatively unknown hierarchical initialization methods due
to Su and Dy outperform the remaining four methods with respect to two
objective effectiveness criteria. In addition, a recent method due to Erisoglu
et al. performs surprisingly poorly.

Comment: 21 pages, 2 figures, 5 tables, Partitional Clustering Algorithms (Springer, 2014). arXiv admin note: substantial text overlap with arXiv:1304.7465, arXiv:1209.196
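The multiple-restart practice described above can be sketched in a few lines of Python; the function names below are illustrative, not from the chapter. A deterministic, order-invariant method of the kind the chapter evaluates would replace the random `init` with a single rule computed from the data, making the restarts unnecessary.

```python
import numpy as np

def lloyd_kmeans(X, k, centers, n_iter=100):
    """Lloyd's algorithm from given initial centers.

    Returns (labels, centers, sse), where sse is the within-cluster
    sum of squared errors used to compare restarts.
    """
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    sse = ((X - centers[labels]) ** 2).sum()
    return labels, centers, sse

def best_of_restarts(X, k, n_restarts=10, seed=0):
    """The common practice: several random initializations, keep the
    run with the lowest SSE. This multiplies the cost of k-means by
    n_restarts, which is what deterministic initialization avoids."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_restarts):
        init = X[rng.choice(len(X), k, replace=False)]
        result = lloyd_kmeans(X, k, init)
        if best is None or result[2] < best[2]:
            best = result
    return best
```

On well-separated data, a handful of restarts is usually enough to find the low-SSE solution, but each restart repeats the full Lloyd iteration, so the total cost grows linearly with the number of restarts.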
An exponential-family multidimensional scaling mixture methodology
A multidimensional scaling methodology (STUNMIX) for the analysis of subjects' preference/choice of stimuli that sets out to integrate the previous work in this area into a single framework, as well as to provide a variety of new options and models, is presented. Locations of the stimuli and the ideal points of derived segments of subjects on latent dimensions are estimated simultaneously. The methodology is formulated in the framework of the exponential family of distributions, whereby a wide range of different data types can be analyzed. Possible reparameterizations of stimulus coordinates by stimulus characteristics, as well as of probabilities of segment membership by subject background variables, are permitted. The models are estimated in a maximum likelihood framework. The performance of the models is demonstrated on synthetic data, and robustness is investigated. An empirical application is provided, concerning intentions to buy portable telephones.
A Latent Class Binomial Logit Methodology for the Analysis of Paired-Comparison Choice Data: An Application Reinvestigating the Determinants of Perceived Risk
A latent class model for identifying classes of subjects in paired comparison choice experiments is developed. The model simultaneously estimates a probabilistic classification of subjects and the logit models' coefficients relating characteristics of objects to choices for each respective group among two alternatives in paired comparison experiments. A modest Monte Carlo analysis of algorithm performance is presented. The proposed model is illustrated with empirical data from a consumer psychology experiment that examines the determinants of perceived consumer risk. The predictive validity of the method is assessed and compared to that of several other procedures. The sensitivity of the method to (randomly) eliminating comparisons, which is important in view of reducing respondent fatigue in the task, is investigated.