5,319 research outputs found

    Methods for fast and reliable clustering


    Categorising count data into ordinal responses with application to ecological communities

    Count data sets may exhibit overdispersion in one set of species and underdispersion in another, which would normally require fitting different models (e.g. a negative binomial model for the overdispersed species and a binomial model for the underdispersed ones). In addition, many count data sets contain both very high and very low counts. Categorising these counts into ordinal categories makes the actual counts less influential in the model fitting, yielding broad categories that enable us to detect major patterns of turnover or nestedness shown by groups of species. In this paper we apply a strategy for categorising count data into ordinal data and implement measures for comparing different cluster structures. We illustrate the categorising strategy and compare clustering results between count and categorised ordinal data in two ecological community data sets. A major advantage of the ordinal approach is that it accommodates all levels of dispersion in the data within a single methodology, without treating subsets of the data differently. This reduction in the number of parameters needed to model different levels of dispersion does not substantially change the clustering structure: in the two data sets used in this paper, the ordinal clustering structures were up to 93.1% similar to those obtained from the count data approaches. This has the important practical implication of supporting simpler, faster data collection using ordinal scales only.
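
    As a rough illustration of the categorise-then-compare idea described in this abstract, the Python sketch below bins a simulated site-by-species count matrix into ordinal levels and compares the resulting cluster structures with the adjusted Rand index. The cut-points, the use of K-Means, and the comparison measure are assumptions for demonstration only; the paper's actual categorisation scheme, clustering models, and comparison measures are not reproduced here.

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.metrics import adjusted_rand_score

        rng = np.random.default_rng(0)

        # Simulated site-by-species count matrix (rows = sites, columns = species).
        counts = rng.negative_binomial(2, 0.3, size=(40, 10))

        # Categorise counts into ordinal levels: 0 = absent, 1 = low, 2 = medium, 3 = high.
        # The cut-points 1, 5 and 20 are placeholders, not the paper's categories.
        ordinal = np.digitize(counts, [1, 5, 20])

        # Cluster the sites on raw counts and on ordinal categories, then compare
        # the two partitions (an adjusted Rand index of 1.0 means identical clusters).
        k = 3
        labels_counts = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(counts)
        labels_ordinal = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(ordinal)
        print("agreement:", adjusted_rand_score(labels_counts, labels_ordinal))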

    A survey on feature weighting based K-Means algorithms

    Author-accepted manuscript of: de Amorim, R. C., 'A survey on feature weighting based K-Means algorithms', Journal of Classification, 33(2): 210-242, 2016; the final publication is available at Springer via http://dx.doi.org/10.1007/s00357-016-9208-4. In a real-world data set there is always the possibility, rather high in our opinion, that different features have different degrees of relevance. Most machine learning algorithms deal with this by either selecting or deselecting features during data preprocessing. However, we maintain that even among relevant features there may be different degrees of relevance, and this should be taken into account during the clustering process. With over 50 years of history, K-Means is arguably the most popular partitional clustering algorithm. The first K-Means-based clustering algorithm to compute feature weights was designed just over 30 years ago. Various such algorithms have been designed since, but there has not been, to our knowledge, a survey integrating empirical evidence of cluster recovery ability, common flaws, and possible directions for future research. This paper elaborates on the concept of feature weighting and addresses these issues by critically analysing some of the most popular, or innovative, feature weighting mechanisms based on K-Means.
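
    To make the idea of feature weighting within K-Means concrete, the following sketch implements one generic global-weighting scheme in the spirit of W-k-means-style algorithms: cluster assignments use a weighted distance, and the weights are recomputed from the within-cluster dispersion of each feature. It is a minimal illustration under assumed parameter choices (beta, initialisation, iteration count), not any specific algorithm analysed in the survey.

        import numpy as np

        def weighted_kmeans(X, k, beta=2.0, n_iter=50, seed=0):
            """K-Means with one global weight per feature, updated from within-cluster dispersion."""
            rng = np.random.default_rng(seed)
            n, m = X.shape
            centroids = X[rng.choice(n, size=k, replace=False)].astype(float)
            weights = np.full(m, 1.0 / m)                 # start with equal feature weights

            for _ in range(n_iter):
                # Assignment: weighted squared Euclidean distance to every centroid.
                diff = X[:, None, :] - centroids[None, :, :]      # shape (n, k, m)
                dist = (weights**beta * diff**2).sum(axis=2)      # shape (n, k)
                labels = dist.argmin(axis=1)

                # Centroid update: per-cluster means (empty clusters are skipped).
                for j in range(k):
                    if np.any(labels == j):
                        centroids[j] = X[labels == j].mean(axis=0)

                # Weight update: features with low within-cluster dispersion D[v]
                # receive higher weights.
                D = np.zeros(m)
                for j in range(k):
                    D += ((X[labels == j] - centroids[j]) ** 2).sum(axis=0)
                D = np.maximum(D, 1e-12)                  # guard against division by zero
                weights = 1.0 / ((D[:, None] / D[None, :]) ** (1.0 / (beta - 1.0))).sum(axis=1)

            return labels, weights, centroids

        # Tiny usage example on synthetic data (assumed, for illustration only).
        X = np.random.default_rng(1).normal(size=(200, 4))
        X[:100, 0] += 5.0        # feature 0 carries the cluster structure
        X[:, 3] *= 4.0           # feature 3 is high-variance noise: expect a low weight
        labels, weights, centroids = weighted_kmeans(X, k=2)
        print("feature weights:", weights)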

    Rough Fuzzy Subspace Clustering for Data with Missing Values

    The paper presents a rough fuzzy subspace clustering algorithm together with experimental clustering results. The algorithm uses three approaches for handling missing values: marginalisation, imputation, and rough sets. It also assigns weights to attributes within each cluster, which leads to subspace clustering. The cluster parameters are derived in an iterative procedure based on minimising a criterion function. The crucial parameter of the proposed algorithm controls the sharpness of the derived subspace clusters: lower values lead to the selection of only the most important attributes, while higher values create clusters in the global attribute space rather than in subspaces. The paper also reports clustering results on synthetic and real-life data sets.
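
    Of the three missing-value strategies named in this abstract, the two simplest, marginalisation and imputation, can be sketched as preprocessing steps as below; the rough-set handling and the per-cluster attribute weighting of the algorithm itself are not reproduced. The data and variable names are purely illustrative.

        import numpy as np

        X = np.array([[1.0, 2.0, np.nan],
                      [2.0, np.nan, 3.0],
                      [1.5, 2.5, 3.5],
                      [0.5, 1.5, 2.5]])

        # Marginalisation: drop every object that has at least one missing attribute.
        complete_rows = ~np.isnan(X).any(axis=1)
        X_marginalised = X[complete_rows]

        # Imputation: replace each missing value with the mean of its attribute.
        col_means = np.nanmean(X, axis=0)
        X_imputed = np.where(np.isnan(X), col_means, X)

        print(X_marginalised)
        print(X_imputed)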