
    Constrained Clustering with Minkowski Weighted K-Means

    In this paper we introduce the Constrained Minkowski Weighted K-Means. This algorithm calculates cluster-specific feature weights that can be interpreted as feature rescaling factors thanks to the use of the Minkowski distance. Here, we use a small amount of labelled data to select a Minkowski exponent and to generate clustering constraints based on pairwise must-link and cannot-link rules. We validate our new algorithm on a total of 12 datasets, most of which contain features with uniformly distributed noise, running the algorithm numerous times on each dataset. These experiments confirm the general superiority of using feature weighting in K-Means, particularly when applying the Minkowski distance. We have also found that the use of constrained clustering rules has little effect on the average proportion of correctly clustered entities; however, constrained clustering considerably improves the maximum of this proportion.
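
    The two key ingredients described above, a Minkowski distance with cluster-specific feature weights and an assignment step that respects must-link and cannot-link rules, can be sketched as follows. This is a minimal illustration rather than the authors' full algorithm (which also updates the weights and centroids and uses the labelled data to choose the exponent p); the function names and the simple "nearest feasible centroid" rule are our own.

```python
import numpy as np

def mwk_distance(x, centroid, weights, p):
    """Minkowski-weighted dissimilarity between a point and a centroid.

    The cluster-specific weights act as feature rescaling factors because
    they are raised to the same exponent p as the per-feature differences.
    """
    return np.sum((weights ** p) * np.abs(x - centroid) ** p)

def violates_constraints(i, cluster, assignments, must_link, cannot_link):
    """Check whether putting point i into `cluster` breaks any pairwise rule."""
    for a, b in must_link:
        j = b if a == i else a if b == i else None
        if j is not None and assignments[j] is not None and assignments[j] != cluster:
            return True
    for a, b in cannot_link:
        j = b if a == i else a if b == i else None
        if j is not None and assignments[j] == cluster:
            return True
    return False

def assign_points(X, centroids, weights, p, must_link, cannot_link):
    """One constrained assignment pass: each point goes to the nearest
    feasible centroid under the weighted Minkowski distance; points with
    no feasible centroid are left unassigned (None)."""
    assignments = [None] * len(X)
    for i, x in enumerate(X):
        order = np.argsort([mwk_distance(x, c, w, p)
                            for c, w in zip(centroids, weights)])
        for k in order:
            if not violates_constraints(i, int(k), assignments, must_link, cannot_link):
                assignments[i] = int(k)
                break
    return assignments
```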

    Improving cluster recovery with feature rescaling factors

    The data preprocessing stage is crucial in clustering. Features may describe entities using different scales. To rectify this, one usually applies feature normalisation, aiming to rescale features so that none of them overpowers the others in the objective function of the selected clustering algorithm. In this paper, we argue that the rescaling procedure should not treat all features identically. Instead, it should favour the features that are more meaningful for clustering. With this in mind, we introduce a feature rescaling method that takes into account the within-cluster degree of relevance of each feature. Our comprehensive simulation study, carried out on real and synthetic data, with and without noise features, demonstrates that clustering methods using the proposed data normalisation strategy clearly outperform those using traditional data normalisation.
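
    As a rough illustration of this strategy, rescaling factors can be derived from an initial clustering: features that vary little within clusters, and therefore help to separate them, receive larger factors. The sketch below assumes scikit-learn's KMeans and uses inverse within-cluster dispersion as the weight; the paper's exact rescaling factors differ, so treat this only as a sketch of the idea.

```python
import numpy as np
from sklearn.cluster import KMeans

def relevance_rescaling(X, n_clusters, eps=1e-9):
    """Sketch of relevance-based feature rescaling: features with low
    within-cluster dispersion (more cluster-discriminative) receive larger
    rescaling factors than features that are equally spread everywhere."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    weights = np.zeros((n_clusters, X.shape[1]))
    for k in range(n_clusters):
        members = X[labels == k]
        dispersion = np.sum((members - members.mean(axis=0)) ** 2, axis=0) + eps
        weights[k] = (1.0 / dispersion) / np.sum(1.0 / dispersion)
    # Aggregate the cluster-specific weights into a single factor per feature
    factors = weights.mean(axis=0)
    return X * factors, factors
```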

    Feature weighting as a tool for unsupervised feature selection

    Feature selection is a popular data pre-processing step. The aim is to remove some of the features in a data set with minimum information loss, leading to a number of benefits including faster running time and easier data visualisation. In this paper we introduce two unsupervised feature selection algorithms. These make use of a cluster-dependent feature-weighting mechanism reflecting the within-cluster degree of relevance of a given feature. Features with a relatively low weight are removed from the data set. We compare our algorithms to two other popular alternatives in a number of experiments on both synthetic and real-world data sets, with and without added noisy features. These experiments demonstrate that our algorithms clearly outperform the alternatives.
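
    To illustrate the selection step, the sketch below takes a matrix of cluster-dependent feature weights (for example, produced by a run of a feature-weighted K-Means) and keeps only the features whose best within-cluster weight is among the highest. The keep_ratio threshold and the max-over-clusters relevance score are our own simplifications, not the exact criteria of the two proposed algorithms.

```python
import numpy as np

def select_features(weights, keep_ratio=0.8):
    """Weight-based unsupervised feature selection (sketch).

    `weights` is a (clusters x features) matrix of cluster-dependent
    feature weights; a feature is kept if its largest weight over all
    clusters is among the top `keep_ratio` fraction of features."""
    relevance = weights.max(axis=0)                  # best-case relevance per feature
    n_keep = max(1, int(round(keep_ratio * weights.shape[1])))
    keep_idx = np.argsort(relevance)[::-1][:n_keep]  # highest-relevance features first
    return np.sort(keep_idx)                         # indices of retained features
```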

    Minkowski distances and standardisation for clustering and classification of high dimensional data

    There are many distance-based methods for classification and clustering, and for data with a high number of dimensions and a lower number of observations, processing distances is computationally advantageous compared to the raw data matrix. Euclidean distances are used as a default for continuous multivariate data, but there are alternatives. Here the so-called Minkowski distances, L_1 (city block), L_2 (Euclidean), L_3, L_4, and maximum distances, are combined with different schemes of standardisation of the variables before aggregating them. The boxplot transformation is proposed, a new transformation method for a single variable that standardises the majority of observations but brings outliers closer to the main bulk of the data. Distances are compared in simulations for clustering by partitioning around medoids, complete and average linkage, and classification by nearest neighbours, for data with a low number of observations but high dimensionality. The L_1 distance and the boxplot transformation show good results.
    Comment: Preliminary version; final version to be published by Springer, using Springer's svmult LaTeX style.
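
    The boxplot transformation is the novel ingredient here: a single-variable standardisation that keeps the bulk of the observations on a robust linear scale while pulling outliers towards that bulk. The sketch below is a simplified rendition of that idea, using median/IQR standardisation with logarithmic compression beyond the whiskers, and is not the exact transformation defined in the paper.

```python
import numpy as np

def boxplot_transform(x):
    """Simplified boxplot-style standardisation of one variable (sketch):
    the central bulk is standardised linearly with median and IQR, while
    observations beyond the boxplot whiskers are compressed logarithmically
    so that outliers end up closer to the main bulk of the data."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = (q3 - q1) if q3 > q1 else 1.0
    z = (x - med) / iqr                          # robust linear standardisation
    lo = (q1 - 1.5 * iqr - med) / iqr            # lower whisker on the z scale
    hi = (q3 + 1.5 * iqr - med) / iqr            # upper whisker on the z scale
    upper, lower = z > hi, z < lo
    z[upper] = hi + np.log1p(z[upper] - hi)      # compress upper outliers
    z[lower] = lo - np.log1p(lo - z[lower])      # compress lower outliers
    return z
```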

    A survey on feature weighting based K-Means algorithms

    This is a pre-copyedited, author-produced PDF of an article accepted for publication in the Journal of Classification [de Amorim, R. C., 'A survey on feature weighting based K-Means algorithms', Journal of Classification, Vol. 33(2): 210-242, August 25, 2016]. The final publication is available at Springer via http://dx.doi.org/10.1007/s00357-016-9208-4. © Classification Society of North America 2016.
    In a real-world data set there is always the possibility, rather high in our opinion, that different features may have different degrees of relevance. Most machine learning algorithms deal with this fact by either selecting or deselecting features in the data preprocessing phase. However, we maintain that even among relevant features there may be different degrees of relevance, and this should be taken into account during the clustering process. With over 50 years of history, K-Means is arguably the most popular partitional clustering algorithm there is. The first K-Means-based clustering algorithm to compute feature weights was designed just over 30 years ago. Various such algorithms have been designed since, but there has not been, to our knowledge, a survey integrating empirical evidence of cluster recovery ability, common flaws, and possible directions for future research. This paper elaborates on the concept of feature weighting and addresses these issues by critically analysing some of the most popular, or innovative, feature weighting mechanisms based on K-Means.
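
    For orientation, many of the algorithms surveyed minimise a criterion of the following general form; the notation is ours and describes a common formulation in this literature rather than any single algorithm's exact objective:

```latex
W(S, C, w) = \sum_{k=1}^{K} \sum_{i \in S_k} \sum_{v=1}^{V} w_v^{\beta} \, (y_{iv} - c_{kv})^2,
\qquad \text{subject to } \sum_{v=1}^{V} w_v = 1, \; w_v \ge 0,
```

    where S_k is the k-th cluster, c_kv the v-th component of its centroid, w_v the weight of feature v, and beta a user-chosen exponent. Cluster-specific variants replace w_v with w_kv, and Minkowski-based variants replace the squared difference with |y_iv - c_kv|^p, typically setting beta = p.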

    Extracellular Hsp72 concentration relates to a minimum endogenous criteria during acute exercise-heat exposure

    Extracellular heat-shock protein 72 (eHsp72) concentration increases during exercise-heat stress when conditions elicit physiological strain. Differences in the severity of environmental and exercise stimuli have elicited varied responses to stress. The present study aimed to quantify the extent of increased eHsp72 with increased exogenous heat stress, and to determine related endogenous markers of strain in an exercise-heat model. Ten males cycled for 90 min at 50% V̇O2peak in three conditions (TEMP, 20°C/63% RH; HOT, 30.2°C/51% RH; VHOT, 40.0°C/37% RH). Plasma was analysed for eHsp72 pre, immediately post and 24 h post each trial using a commercially available ELISA. Increased eHsp72 concentration was observed post VHOT trial (+172.4%) (P<0.05), but not in the TEMP (-1.9%) or HOT (+25.7%) conditions. eHsp72 returned to baseline values within 24 h in all conditions. Changes were observed in rectal temperature (Trec), rate of Trec increase, area under the curve for Trec of 38.5°C and 39.0°C, duration Trec ≥ 38.5°C and ≥ 39.0°C, and change in muscle temperature between VHOT and both TEMP and HOT, but not between TEMP and HOT. Each condition also elicited significantly increasing physiological strain, described by sweat rate, heart rate, physiological strain index, rating of perceived exertion and thermal sensation. Stepwise multiple regression identified rate of Trec increase and change in Trec as predictors of increased eHsp72 concentration. The data suggest that eHsp72 concentration increases once systemic temperature and sympathetic activity exceed a minimum endogenous criterion, elicited here during VHOT conditions, and is likely to be modulated by large, rapid changes in core temperature.

    Heterogeneities in Leishmania infantum infection: using skin parasite burdens to identify highly infectious dogs

    Background: The relationships between heterogeneities in host infection and infectiousness (transmission to arthropod vectors) can provide important insights for disease management. Here, we quantify heterogeneities in Leishmania infantum parasite numbers in reservoir and non-reservoir host populations, and relate these to their infectiousness during natural infection. Tissue parasite number was evaluated as a potential surrogate marker of host transmission potential. Methods: Parasite numbers were measured by qPCR in bone marrow and ear skin biopsies of 82 dogs and 34 crab-eating foxes collected during a longitudinal study in Amazon Brazil, for which previous data were available on infectiousness (by xenodiagnosis) and severity of infection. Results: Parasite numbers were highly aggregated both between samples and between individuals. In dogs, total parasite abundance and relative numbers in ear skin compared to bone marrow increased with the duration and severity of infection. Infectiousness to the sandfly vector was associated with high parasite numbers; parasite number in the skin was the best predictor of being infectious. Crab-eating foxes, which typically present asymptomatic infection and are non-infectious, had parasite numbers comparable to those of non-infectious dogs. Conclusions: Skin parasite number provides an indirect marker of infectiousness and could allow targeted control, particularly of highly infectious dogs.