Outlier Detection Method on UCI Repository Dataset by Entropy Based Rough K-means
Rough set theory handles uncertainty and incomplete information through two sets, the lower and upper approximations. In this paper, the clustering process is improved by adapting a preliminary centroid selection method to the rough K-means (RKM) algorithm. The entropy based rough K-means (ERKM) method is developed by adding entropy based preliminary centroid selection to RKM, and it is validated by cluster validity indexes. An example shows that ERKM performs effectively owing to the entropy based selection of preliminary centroids. In addition, outlier detection is an important task in data mining: outliers are objects very different from the rest of the objects in a cluster. The entropy based rough outlier factor (EROF) method is used to detect outliers effectively in the yeast dataset. An example shows that EROF detects outliers effectively on protein localisation sites and that the ERKM clustering algorithm performs effectively. Further, experimental results show that the ERKM and EROF methods outperform the other methods.
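The centroid update that distinguishes rough K-means from classical K-means can be sketched as follows. This is a minimal illustration of the standard RKM update only, assuming the usual convention of blending the lower-approximation mean with the boundary-region mean via weights (`w_lower`, `w_boundary` are conventional parameters, not values from the paper); the entropy based centroid seeding and the EROF outlier score of ERKM are not reproduced here.

```python
import numpy as np

def rkm_centroid(lower_pts, boundary_pts, w_lower=0.7, w_boundary=0.3):
    """Rough K-means centroid update: a weighted blend of the mean of the
    points certainly in the cluster (lower approximation) and the mean of
    the points possibly in it (boundary region = upper minus lower).
    If the boundary region is empty, the centroid is simply the mean of
    the lower approximation, as in classical K-means."""
    lower_pts = np.asarray(lower_pts, dtype=float)
    if len(boundary_pts) == 0:
        return lower_pts.mean(axis=0)
    boundary_pts = np.asarray(boundary_pts, dtype=float)
    return w_lower * lower_pts.mean(axis=0) + w_boundary * boundary_pts.mean(axis=0)
```

Because boundary points contribute with a smaller weight, objects whose membership is uncertain pull the centroid less strongly than objects that certainly belong to the cluster.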
Data granulation by the principles of uncertainty
Research in granular modeling has produced a variety of mathematical models,
such as intervals, (higher-order) fuzzy sets, rough sets, and shadowed sets,
which are all suitable to characterize the so-called information granules.
Modeling of the input data uncertainty is recognized as a crucial aspect in
information granulation. Moreover, the uncertainty is a well-studied concept in
many mathematical settings, such as those of probability theory, fuzzy set
theory, and possibility theory. This fact suggests that an appropriate
quantification of the uncertainty expressed by the information granule model
could be used to define an invariant property, to be exploited in practical
situations of information granulation. In this perspective, a procedure of
information granulation is effective if the uncertainty conveyed by the
synthesized information granule is in a monotonically increasing relation with
the uncertainty of the input data. In this paper, we present a data granulation
framework that elaborates over the principles of uncertainty introduced by
Klir. Since uncertainty is a mesoscopic descriptor of systems and data, such principles can be applied regardless of the input data type and the specific mathematical setting adopted for the information granules. The
proposed framework is conceived (i) to offer a guideline for the synthesis of
information granules and (ii) to build a groundwork for comparing and quantitatively judging different data granulation procedures. To provide a
suitable case study, we introduce a new data granulation technique based on the
minimum sum of distances, which is designed to generate type-2 fuzzy sets. We
analyze the procedure by performing different experiments on two distinct data
types: feature vectors and labeled graphs. Results show that the uncertainty of
the input data is suitably conveyed by the generated type-2 fuzzy set models.
Comment: 16 pages, 9 figures, 52 references
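The paper's effectiveness criterion, that the uncertainty of the synthesized granule should grow monotonically with the uncertainty of the input data, can be illustrated with a deliberately simple toy: interval granules whose uncertainty is measured by their width. This is a hedged sketch under those assumptions; the paper itself works with type-2 fuzzy sets and a minimum-sum-of-distances construction, neither of which is reproduced here.

```python
import numpy as np

def interval_granule(data):
    # The simplest information granule: the interval [min, max] covering the data.
    return float(np.min(data)), float(np.max(data))

def granule_uncertainty(granule):
    # For an interval granule, a natural (nonspecificity-style) uncertainty
    # measure is the interval width.
    lo, hi = granule
    return hi - lo

def is_monotone(input_uncertainties, granule_uncertainties):
    # The granulation is "effective" in the paper's sense if granule
    # uncertainty is non-decreasing in input-data uncertainty.
    order = np.argsort(input_uncertainties)
    g = np.asarray(granule_uncertainties, dtype=float)[order]
    return bool(np.all(np.diff(g) >= 0))
```

For example, three datasets of increasing spread should yield interval granules of increasing width, so `is_monotone` returns `True`; a granulation procedure for which this check fails would not satisfy the invariant property the framework demands.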
Observer-biased bearing condition monitoring: from fault detection to multi-fault classification
Bearings are simultaneously a fundamental component of rotary machinery and one of its principal causes of failure. This work focuses on the use of fuzzy clustering for bearing condition monitoring, i.e., fault detection and classification. The output of a clustering algorithm is a data partition (a set of clusters), which is merely a hypothesis on the structure of the data; this hypothesis requires validation by domain experts. In general, clustering algorithms allow only limited use of domain knowledge in the cluster formation process. In this study, a novel method allowing for interactive clustering in bearing fault diagnosis is proposed. The method resorts to shrinkage to generalize an otherwise unbiased clustering algorithm into a biased one. In this way, the method provides a natural and intuitive way to control the cluster formation process, allowing domain knowledge to guide it. The domain expert can select a desirable level of granularity, ranging from fault detection to classification of a variable number of faults, and can select a specific region of the feature space for detailed analysis. Moreover, experimental results under realistic conditions show that the adopted algorithm outperforms the corresponding unbiased algorithm (fuzzy c-means), which is widely used for this type of problem. (C) 2016 Elsevier Ltd. All rights reserved. Grant number: 145602
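The unbiased baseline named in the abstract, fuzzy c-means, can be sketched compactly. This is a plain textbook FCM implementation for reference only; the observer-biased shrinkage variant proposed in the paper is not reproduced, and the fuzzifier `m = 2` and iteration count are conventional defaults, not values from the paper.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Standard (unbiased) fuzzy c-means.
    Alternates between centroid updates weighted by fuzzy memberships
    raised to the fuzzifier m, and membership updates from inverse
    relative distances to the centroids."""
    rng = np.random.default_rng(seed)
    n = len(X)
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)           # random fuzzy partition
    for _ in range(n_iter):
        W = U ** m                               # fuzzified memberships
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                    # guard against zero distance
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers, U
```

In the paper's setting, shrinkage would bias these updates toward expert-specified regions of the feature space; the unmodified loop above treats all regions identically.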
Robust EM algorithm for model-based curve clustering
Model-based clustering approaches concern the paradigm of exploratory data
analysis relying on the finite mixture model to automatically find a latent
structure governing observed data. They are one of the most popular and
successful approaches in cluster analysis. Mixture density estimation is generally performed by maximizing the observed-data log-likelihood using the expectation-maximization (EM) algorithm. However, it is well known that the EM
algorithm initialization is crucial. In addition, the standard EM algorithm
requires the number of clusters to be known a priori. Some solutions have been
provided in [31, 12] for model-based clustering with Gaussian mixture models
for multivariate data. In this paper we focus on model-based curve clustering
approaches, when the data are curves rather than vectorial data, based on
regression mixtures. We propose a new robust EM algorithm for clustering
curves. We extend the model-based clustering approach presented in [31] for
Gaussian mixture models, to the case of curve clustering by regression
mixtures, including polynomial regression mixtures as well as spline or
B-spline regression mixtures. Our approach handles both the problem of initialization and that of choosing the optimal number of clusters as the EM learning proceeds, rather than in a two-stage scheme. This is achieved by optimizing a penalized log-likelihood criterion. A simulation study confirms the potential benefit of the proposed algorithm in terms of robustness to initialization and of finding the actual number of clusters.
Comment: In Proceedings of the 2013 International Joint Conference on Neural Networks (IJCNN), 2013, Dallas, TX, US
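The idea of choosing the number of clusters during EM learning, rather than fixing it a priori, can be illustrated with a simplified stand-in: start from many components and annihilate those whose mixing proportion collapses below a threshold. This sketch uses a plain 1-D Gaussian mixture with quantile-based initialization and a hard pruning rule (`K_init` and `min_pi` are illustrative parameters); the paper's actual method uses a penalized log-likelihood criterion over regression mixtures for curves, which is not reproduced here.

```python
import numpy as np

def em_gmm_prune(x, K_init=8, min_pi=0.05, n_iter=200):
    """EM for a 1-D Gaussian mixture that starts deliberately overcomplete
    and discards components whose mixing proportion drops below min_pi,
    so the effective number of clusters is decided during learning."""
    mu = np.quantile(x, np.linspace(0.05, 0.95, K_init))   # spread-out init
    var = np.full(K_init, np.var(x))
    pi = np.full(K_init, 1.0 / K_init)
    for _ in range(n_iter):
        # E-step: responsibilities under the current parameters
        d2 = (x[:, None] - mu[None, :]) ** 2
        dens = pi * np.exp(-0.5 * d2 / var) / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted updates of proportions, means, variances
        nk = r.sum(axis=0)
        pi = nk / nk.sum()
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
        # Annihilate weak components and renormalize the proportions
        keep = pi > min_pi
        if keep.sum() < len(pi):
            mu, var, pi = mu[keep], var[keep], pi[keep]
            pi /= pi.sum()
    return mu, var, pi
```

On data drawn from two well-separated groups, the surviving component means settle near the two group centers; swapping the pruning rule for a penalty term in the objective gives the flavor of the penalized-likelihood mechanism the paper optimizes.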