Possibilistic and fuzzy clustering methods for robust analysis of non-precise data
This work focuses on robust clustering of data affected by imprecision. The imprecision is managed in terms of fuzzy sets. The clustering process is based on the fuzzy and possibilistic approaches. In both approaches the observations are assigned to the clusters by means of membership degrees. In fuzzy clustering the membership degrees express the degree to which each observation is shared among the clusters. In contrast, in possibilistic clustering the membership degrees are degrees of typicality. These two sources of information are complementary because the former helps to discover the best fuzzy partition of the observations while the latter reflects how well the observations are described by the centroids and, therefore, is helpful to identify outliers. First, a fully possibilistic k-means clustering procedure is suggested. Then, in order to exploit the benefits of both approaches, a joint possibilistic and fuzzy clustering method for fuzzy data is proposed. A selection procedure for choosing the parameters of the new clustering method is introduced. The effectiveness of the proposal is investigated by means of simulated and real-life data.
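The contrast between degrees of sharing and degrees of typicality can be illustrated with the textbook membership formulas of fuzzy c-means and possibilistic c-means. This is only a sketch for crisp (non-fuzzy) points, not the method proposed in the abstract; the function names, the fuzzifier `m`, and the scale parameters `eta` are assumptions chosen for illustration.

```python
import numpy as np

def fuzzy_memberships(sq_dist, m=2.0):
    """Fuzzy c-means memberships: each row sums to 1 (degrees of sharing).

    sq_dist: (n_obs, n_clusters) squared distances, all strictly positive.
    """
    inv = sq_dist ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def possibilistic_typicalities(sq_dist, eta, m=2.0):
    """Possibilistic c-means typicalities: rows need not sum to 1.

    eta: (n_clusters,) positive scale parameters (assumed given here).
    """
    return 1.0 / (1.0 + (sq_dist / eta) ** (1.0 / (m - 1.0)))

# Two centroids and three points; the last point is far from both centroids.
centroids = np.array([[0.0, 0.0], [4.0, 0.0]])
points = np.array([[0.5, 0.0], [3.5, 0.0], [2.0, 50.0]])
sq_dist = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)

u = fuzzy_memberships(sq_dist)
t = possibilistic_typicalities(sq_dist, eta=np.array([1.0, 1.0]))
print("fuzzy memberships:\n", u)
print("typicalities:\n", t)
```

In this toy example the distant point is equidistant from both centroids, so its fuzzy memberships are split evenly, while its typicalities are close to zero for both clusters. That is the sense in which typicalities help flag outliers.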
3rd Workshop in Symbolic Data Analysis: book of abstracts
This workshop is the third regular meeting of researchers interested in Symbolic Data Analysis. The main aim of the event is to bring people together and encourage the exchange of ideas across different fields (Mathematics, Statistics, Computer Science, Engineering, Economics, among others) that contribute to Symbolic Data Analysis.
Fuzzy clustering with entropy regularization for interval-valued data with an application to scientific journal citations
In recent years, research on statistical methods for analyzing complex data structures has increased. In particular, a lot of attention has been focused on interval-valued data. In a classical cluster analysis framework, an interesting line of research has focused on the clustering of interval-valued data based on fuzzy approaches. Following the partitioning around medoids fuzzy approach, a new fuzzy clustering model for interval-valued data is suggested. In particular, we propose a new model based on the use of entropy as a regularization function in the fuzzy clustering criterion. The model uses a robust weighted dissimilarity measure to smooth noisy data and to weigh the center and radius components of the interval-valued data. To show the good performance of the proposed clustering model, we provide a simulation study and an application to the clustering of scientific journals in research evaluation.
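As a rough illustration of entropy regularization, the sketch below computes memberships that minimize a criterion of the form sum(u * d) + p * sum(u * log u) subject to each row of u summing to 1; the minimizer is a softmax of the negative dissimilarities. The dissimilarity used here is a plain weighted squared Euclidean distance on interval centers and radii, which is an assumption and not the robust measure of the abstract. The names `w_c`, `w_r`, and `p` are hypothetical.

```python
import numpy as np

def interval_sq_dist(centers, radii, medoid_idx, w_c=0.5, w_r=0.5):
    """Weighted squared distance between every unit and each medoid.

    centers, radii: (n_units, n_vars) midpoints and half-widths.
    w_c, w_r: assumed fixed weights for the center and radius components.
    """
    dc = ((centers[:, None, :] - centers[medoid_idx][None, :, :]) ** 2).sum(axis=2)
    dr = ((radii[:, None, :] - radii[medoid_idx][None, :, :]) ** 2).sum(axis=2)
    return w_c * dc + w_r * dr

def entropy_memberships(dist, p=1.0):
    """Closed-form memberships under entropy regularization (softmax of -dist / p)."""
    # Subtracting the row minimum leaves the result unchanged and avoids underflow.
    z = -(dist - dist.min(axis=1, keepdims=True)) / p
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Four one-dimensional intervals given by lower and upper bounds.
lower = np.array([[0.0], [0.2], [5.0], [5.2]])
upper = np.array([[1.0], [1.4], [6.0], [6.6]])
centers = (lower + upper) / 2.0
radii = (upper - lower) / 2.0

medoid_idx = np.array([0, 2])  # units 0 and 2 act as medoids in this toy example
dist = interval_sq_dist(centers, radii, medoid_idx)
u = entropy_memberships(dist, p=1.0)
labels = u.argmax(axis=1)
print(u)
```

A larger `p` makes the memberships fuzzier (closer to uniform), while a small `p` pushes them towards a hard assignment.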
Fuzzy C-ordered medoids clustering of interval-valued data
Fuzzy clustering for interval-valued data helps us to find natural vague boundaries in such data. The
Fuzzy c-Medoids Clustering (FcMdC) method is one of the most popular clustering methods based on a
partitioning around medoids approach. However, one of the greatest disadvantages of this method is its
sensitivity to the presence of outliers in data. This paper introduces a new robust fuzzy clustering
method named Fuzzy c-Ordered-Medoids clustering for interval-valued data (FcOMdC-ID). Huber's
M-estimators and Yager's Ordered Weighted Averaging (OWA) operators are used in the proposed
method to make it robust to outliers. The described algorithm is compared with the fuzzy c-medoids
method in experiments performed on synthetic data with different types of outliers. A real application of FcOMdC-ID is also provided.
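To make the role of the OWA operator concrete, here is a minimal sketch of Yager's definition: the values are sorted in descending order and then combined with a fixed weight vector, so the weights attach to ranks rather than to particular observations. It does not reproduce FcOMdC-ID; the function name and the example weight vectors are assumptions.

```python
import numpy as np

def owa(values, weights):
    """Ordered Weighted Averaging: weights apply to the values sorted descending."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if values.shape != weights.shape:
        raise ValueError("values and weights must have the same shape")
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    ordered = np.sort(values)[::-1]  # largest value first
    return float(ordered @ weights)

# Dissimilarities to a candidate medoid; the last one comes from an outlier.
dists = [1.0, 2.0, 3.0, 100.0]

mean_weights = [0.25, 0.25, 0.25, 0.25]        # plain average
trimmed_weights = [0.0, 1 / 3, 1 / 3, 1 / 3]   # zero weight on the largest value

print(owa(dists, mean_weights))     # plain mean, dominated by the outlier
print(owa(dists, trimmed_weights))  # the largest dissimilarity is ignored
```

With uniform weights the operator reduces to the arithmetic mean; shifting weight away from the first ranks reduces the influence of the largest dissimilarities, which is the intuition behind using it for robustness.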
Uncertainty-Aware Principal Component Analysis
We present a technique to perform dimensionality reduction on data that is
subject to uncertainty. Our method is a generalization of traditional principal
component analysis (PCA) to multivariate probability distributions. In
comparison to non-linear methods, linear dimensionality reduction techniques
have the advantage that the characteristics of such probability distributions
remain intact after projection. We derive a representation of the PCA sample
covariance matrix that respects potential uncertainty in each of the inputs,
building the mathematical foundation of our new method: uncertainty-aware PCA.
In addition to the accuracy and performance gained by our approach over
sampling-based strategies, our formulation allows us to perform sensitivity
analysis with regard to the uncertainty in the data. For this, we propose
factor traces as a novel visualization that enables a better understanding of
the influence of uncertainty on the chosen principal components. We provide
multiple examples of our technique using real-world datasets. As a special
case, we show how to propagate multivariate normal distributions through PCA in
closed form. Furthermore, we discuss extensions and limitations of our
approach.
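One way to picture a covariance matrix that respects per-input uncertainty is the law of total covariance for an equally weighted mixture: the covariance of the means plus the average of the individual covariances. The sketch below uses that identity and then pushes one Gaussian through the resulting linear projection, using the fact that a linear map of a normal distribution is again normal. This is an assumption-laden illustration and the paper's exact formulation may differ; all function names are hypothetical.

```python
import numpy as np

def mixture_covariance(means, covs):
    """Mean and covariance of an equally weighted mixture of distributions.

    means: (n, d) per-input means; covs: (n, d, d) per-input covariances.
    """
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    mu_bar = means.mean(axis=0)
    centered = means - mu_bar
    between = centered.T @ centered / len(means)  # spread of the means
    within = covs.mean(axis=0)                    # average input uncertainty
    return mu_bar, between + within

def project_gaussian(mean, cov, mu_bar, components):
    """Push N(mean, cov) through x -> components.T @ (x - mu_bar)."""
    return components.T @ (mean - mu_bar), components.T @ cov @ components

# Means vary along x only, but every input is very uncertain along y.
means = np.array([[0.0, 0.0], [2.0, 0.0], [4.0, 0.0]])
covs = np.array([np.diag([0.1, 9.0])] * 3)

mu_bar, total_cov = mixture_covariance(means, covs)
eigvals, eigvecs = np.linalg.eigh(total_cov)   # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order[:1]]             # leading principal direction, shape (d, 1)

proj_mean, proj_cov = project_gaussian(means[0], covs[0], mu_bar, components)
print(components.ravel(), proj_mean, proj_cov)
```

In this toy example, PCA on the means alone would pick the x axis, whereas accounting for the input covariances makes the y axis the leading direction.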
An Agent-Based Algorithm exploiting Multiple Local Dissimilarities for Clusters Mining and Knowledge Discovery
We propose a multi-agent algorithm able to automatically discover relevant
regularities in a given dataset, determining at the same time the set of
configurations of the adopted parametric dissimilarity measure yielding compact
and separated clusters. Each agent operates independently by performing a
Markovian random walk on a suitable weighted graph representation of the input
dataset. Such a weighted graph representation is induced by the specific
parameter configuration of the dissimilarity measure adopted by the agent,
which searches and takes decisions autonomously for one cluster at a time.
Results show that the algorithm is able to discover parameter configurations
that yield a consistent and interpretable collection of clusters. Moreover, we
demonstrate that our algorithm performs comparably to similar
state-of-the-art algorithms on specific clustering problems.
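The idea that a parameter configuration of the dissimilarity measure induces a weighted graph, on which an agent then walks, can be sketched as follows. This is only a toy single-agent illustration: the feature-weighted squared Euclidean dissimilarity, the Gaussian edge weights, and the names `feature_weights` and `scale` are all assumptions, and the decision logic of the agents is not modeled.

```python
import numpy as np

def transition_matrix(X, feature_weights, scale=1.0):
    """Row-stochastic matrix of a random walk on a similarity graph.

    Edge weights are exp(-d / scale), where d is a feature-weighted
    squared Euclidean dissimilarity (an assumed parametric form).
    """
    diff = X[:, None, :] - X[None, :, :]
    d = (feature_weights * diff ** 2).sum(axis=2)
    W = np.exp(-d / scale)
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W / W.sum(axis=1, keepdims=True)

def random_walk(P, start, n_steps, rng):
    """Simulate a Markov chain with transition matrix P from node `start`."""
    path = [start]
    for _ in range(n_steps):
        path.append(int(rng.choice(len(P), p=P[path[-1]])))
    return path

# Feature 0 separates two groups; feature 1 is unrelated to the grouping.
X = np.array([[0.0, 0.0], [0.1, 5.0], [0.2, -5.0],
              [10.0, 0.0], [10.1, 5.0], [10.2, -5.0]])

P = transition_matrix(X, feature_weights=np.array([1.0, 0.0]))
rng = np.random.default_rng(0)
path = random_walk(P, start=0, n_steps=20, rng=rng)
print(path)
```

With all the weight on feature 0, a walk started in the first group almost surely stays there, so the visited nodes outline one compact cluster. A configuration that weights only feature 1 would instead connect units across the two groups.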
Fuzzy clustering of spatial interval-valued data
In this paper, two fuzzy clustering methods for spatial interval-valued
data are proposed, i.e. the fuzzy C-Medoids clustering
of spatial interval-valued data with and without entropy regularization.
Both methods are based on the Partitioning Around
Medoids (PAM) algorithm, inheriting the great advantage of
obtaining non-fictitious representative units for each cluster.
In both methods, the units are endowed with a relation
of contiguity, represented by a symmetric binary matrix. This
can be understood both as contiguity in a physical space and as
a more abstract notion of contiguity. The performance of the
methods is assessed by simulation, testing the methods with
different contiguity matrices associated with natural clusters of
units. In order to show the effectiveness of the methods in
empirical studies, three applications are presented: the clustering
of municipalities based on interval-valued pollutant levels, the
clustering of European fact-checkers based on interval-valued
data on the average number of impressions received by their
tweets and the clustering of the residential zones of the city of
Rome based on the interval of price values.
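The advantage of obtaining non-fictitious representative units can be seen in the medoid update of a generic fuzzy c-medoids scheme: each cluster's representative is the observed unit that minimizes the membership-weighted sum of dissimilarities. The sketch below shows only that step, without the spatial contiguity term or entropy regularization described in the abstract; the function name and the fuzzifier `m` are assumptions.

```python
import numpy as np

def update_medoids(D, U, m=2.0):
    """Pick, for each cluster, the unit minimizing the weighted dissimilarity.

    D: (n, n) pairwise dissimilarities between units.
    U: (n, c) membership degrees; m: fuzzifier (assumed value).
    """
    # cost[k, j] = sum_i U[i, k]**m * D[i, j], the cost of unit j as medoid of cluster k
    cost = (U ** m).T @ D
    return cost.argmin(axis=1)

# Six one-dimensional units forming two groups.
x = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
D = np.abs(x[:, None] - x[None, :])

# Memberships favoring cluster 0 for the first three units and cluster 1 for the rest.
U = np.array([[0.9, 0.1]] * 3 + [[0.1, 0.9]] * 3)

medoids = update_medoids(D, U)
print(medoids, x[medoids])
```

Because the medoids are indices of observed units, each cluster is summarized by a real unit rather than by an averaged, fictitious one. Note that this simple version does not prevent two clusters from selecting the same unit.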