Probabilistic clustering of interval data
In this paper we address the problem of clustering interval data, adopting a model-based approach. To this purpose, parametric models for interval-valued variables are used which consider configurations for the variance-covariance matrix that take the nature of the interval data directly into account. Results on both synthetic and empirical data support the soundness of the proposed approach. The method succeeds in finding parsimonious heteroscedastic models, which is a critical feature in many applications. Furthermore, the analysis of the different data sets made clear the need to explicitly consider the intrinsic variability present in interval data.
Hierarchical Clustering with Simple Matching and Joint Entropy Dissimilarity Measure
Conventional clustering algorithms are restricted to data containing ratio or interval scale variables; hence, distances are used. As social studies often involve only categorical data, the literature has been enriched with more complex clustering techniques and algorithms for categorical data. These techniques are based on similarity or dissimilarity matrices, and the algorithms use density-based or pattern-based approaches. A probabilistic approach to the similarity structure is proposed. The entropy dissimilarity measure gives results comparable to simple matching dissimilarity in hierarchical clustering. It avoids the increase in dimension that comes from binarizing the categorical data. The approach also works with clustering methods where a priori information on the number of clusters is available.
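As a rough illustration of the simple matching dissimilarity mentioned in this abstract, the sketch below computes the fraction of categorical attributes on which two records disagree. This is the textbook definition, not necessarily the exact variant used by the authors; the function name and the sample records are hypothetical.

```python
def simple_matching_dissimilarity(a, b):
    """Fraction of attributes on which two categorical records differ.

    Textbook definition (assumed here): mismatches / number of attributes.
    """
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    if not a:
        raise ValueError("records must have at least one attribute")
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches / len(a)


# Hypothetical survey records with three categorical attributes each.
record_1 = ["urban", "employed", "yes"]
record_2 = ["urban", "student", "yes"]
d = simple_matching_dissimilarity(record_1, record_2)  # one mismatch out of three
```

A matrix of such pairwise values can be passed to any hierarchical clustering routine that accepts precomputed dissimilarities.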
Clustering of TS-fuzzy system
This paper presents a fuzzy c-means clustering method for partitioning symbolic interval data, namely the T-S fuzzy rules. The proposed method furnishes a fuzzy partition and a prototype for each cluster by optimizing an adequacy criterion based on suitable squared Euclidean distances between vectors of intervals. This methodology leads to a fuzzy partition of the TS-fuzzy rules, one for each cluster, which corresponds to a new set of fuzzy sub-systems. When applied to the clustering of a TS-fuzzy system, the result is a set of additively decomposed TS-fuzzy sub-systems. In this work a generalized Probabilistic Fuzzy C-Means algorithm is proposed and applied to TS-Fuzzy System clustering.
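To make the phrase "squared Euclidean distances between vectors of intervals" concrete, here is a minimal sketch of one common choice: summing the squared differences of the lower bounds and of the upper bounds. The abstract does not state which distance the authors use, so this definition, the function name, and the sample values are assumptions.

```python
def squared_euclidean_interval_distance(x, y):
    """Squared Euclidean distance between two vectors of intervals.

    x, y: sequences of (lower, upper) pairs of equal length.
    Assumed definition: sum over components of
    (lower_x - lower_y)^2 + (upper_x - upper_y)^2.
    """
    if len(x) != len(y):
        raise ValueError("interval vectors must have the same length")
    return sum((a - c) ** 2 + (b - d) ** 2 for (a, b), (c, d) in zip(x, y))


# Hypothetical interval vectors with two components each.
rule = [(1.0, 3.0), (0.0, 2.0)]
prototype = [(2.0, 5.0), (0.0, 2.0)]
dist = squared_euclidean_interval_distance(rule, prototype)  # 1^2 + 2^2 + 0 + 0
```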
MAINT.Data: modelling and analysing interval data in R
We present the CRAN R package MAINT.Data for the modelling and analysis of multivariate interval data, i.e., where units are described by variables whose values are intervals of ℝ, representing intrinsic variability. Parametric inference methodologies based on probabilistic models for interval variables have been developed, where each interval is represented by its midpoint and log-range, for which multivariate Normal and Skew-Normal distributions are assumed. The intrinsic nature of the interval variables leads to special structures of the variance-covariance matrix, which are represented by four different possible configurations. MAINT.Data implements the proposed methodologies in the S4 object system, introducing a specific data class for representing interval data. It includes functions and methods for modelling and analysing interval data, in particular maximum likelihood estimation, statistical tests for the different configurations, (M)ANOVA and Discriminant Analysis. For the Gaussian model, Model-based Clustering, robust estimation, outlier detection and Robust Discriminant Analysis are also available.
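The midpoint and log-range representation described in this abstract can be sketched as follows. This is a plain Python illustration of the transformation only, not the MAINT.Data API (which is an R package); the function name and sample bounds are hypothetical, and the natural logarithm is assumed.

```python
import math


def interval_to_midpoint_log_range(lower, upper):
    """Map an interval [lower, upper] to (midpoint, log-range).

    Assumes a non-degenerate interval (upper > lower), since the
    logarithm of a zero range is undefined.
    """
    if upper <= lower:
        raise ValueError("upper bound must be strictly greater than lower bound")
    midpoint = (lower + upper) / 2.0
    log_range = math.log(upper - lower)
    return midpoint, log_range


# Hypothetical interval observation, e.g. a daily temperature range.
mid, log_range = interval_to_midpoint_log_range(2.0, 4.0)
```

Working with the log-range instead of the raw range removes the positivity constraint, which is what makes an unrestricted multivariate Normal or Skew-Normal model on the transformed values plausible.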
The Advantage of Evidential Attributes in Social Networks
Nowadays, there are many approaches designed for the task of detecting
communities in social networks. Among them, some methods only consider the
topological graph structure, while others make use of both the graph structure
and the node attributes. In real-world networks, there are many uncertain and
noisy attributes in the graph. In this paper, we will present how we detect
communities in graphs with uncertain attributes in the first step. The
numerical, probabilistic as well as evidential attributes are generated
according to the graph structure. In the second step, some noise will be added
to the attributes. We perform experiments on graphs with different types of
attributes and compare the detection results in terms of the Normalized Mutual
Information (NMI) values. The experimental results show that the clustering
with evidential attributes gives better results compared to those with
probabilistic and numerical attributes. This illustrates the advantages of
evidential attributes. Comment: 20th International Conference on Information Fusion, Jul 2017, Xi'an,
China