A niching memetic algorithm for simultaneous clustering and feature selection
Clustering is inherently a difficult task, and it is made even more difficult when the selection of relevant features is also an issue. In this paper we propose an approach for simultaneous clustering and feature selection using a niching memetic algorithm. Our approach (which we call NMA_CFS) makes feature selection an integral part of the global clustering search procedure and attempts to overcome the problem of identifying less promising locally optimal solutions in both clustering and feature selection, without making any a priori assumption about the number of clusters. Within the NMA_CFS procedure, a variable composite representation is devised to encode both feature selection and cluster centers with different numbers of clusters. Further, local search operations are introduced to refine the feature selection and cluster centers encoded in the chromosomes. Finally, a niching method is integrated to preserve population diversity and prevent premature convergence. In an experimental evaluation we demonstrate the effectiveness of the proposed approach and compare it with other related approaches, using both synthetic and real data.
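The idea of a variable composite representation can be pictured with a small sketch. It assumes one plausible layout, a boolean feature mask plus a variable-length array of cluster centres, and shows how such a chromosome might be decoded into cluster labels using only the selected features. The names Chromosome, random_chromosome and decode are hypothetical, this is not claimed to be the exact NMA_CFS encoding, and the local search and niching steps are not shown.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Chromosome:
    # Hypothetical layout: a boolean mask over the d features (True = selected)...
    feature_mask: np.ndarray  # shape (d,), dtype bool
    # ...plus k cluster centres in the full feature space; k differs between chromosomes.
    centers: np.ndarray  # shape (k, d)

    @property
    def n_clusters(self) -> int:
        return self.centers.shape[0]


def random_chromosome(X: np.ndarray, k_min: int, k_max: int, rng: np.random.Generator) -> Chromosome:
    """Draw a random chromosome: random feature subset, k in [k_min, k_max], centres sampled from X."""
    n, d = X.shape
    k = int(rng.integers(k_min, k_max + 1))  # number of clusters is part of the encoding
    mask = rng.random(d) < 0.5
    if not mask.any():  # guarantee at least one selected feature
        mask[rng.integers(d)] = True
    centers = X[rng.choice(n, size=k, replace=False)].astype(float)
    return Chromosome(mask, centers)


def decode(chrom: Chromosome, X: np.ndarray) -> np.ndarray:
    """Label each row of X with its nearest centre, using only the selected features."""
    mask = chrom.feature_mask
    if not mask.any():
        raise ValueError("at least one feature must be selected")
    Xs = X[:, mask]              # (n, s): data restricted to selected features
    Cs = chrom.centers[:, mask]  # (k, s): centres restricted to selected features
    sq_dists = ((Xs[:, None, :] - Cs[None, :, :]) ** 2).sum(axis=2)  # (n, k)
    return np.argmin(sq_dists, axis=1)
```

Because unselected features never enter the distance, two chromosomes with the same centres but different masks can produce different partitions, which is what lets a search over such chromosomes explore feature subsets and cluster counts at the same time.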
Batch and median neural gas
Neural Gas (NG) constitutes a very robust clustering algorithm for Euclidean data: it does not suffer from the problem of local minima like simple vector quantization, or from topological restrictions like the self-organizing map. Based on the cost function of NG, we introduce a batch variant of NG which shows much faster convergence and which can be interpreted as an optimization of the cost function by the Newton method. This formulation has the additional benefit that, based on the notion of the generalized median in analogy to Median SOM, a variant for non-vectorial proximity data can be introduced. We prove convergence of batch and median versions of NG, SOM, and k-means in a unified formulation, and we investigate the behavior of the algorithms in several experiments.
Comment: In Special Issue after WSOM 05 Conference, 5-8 September, 2005, Pari
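A batch NG step is commonly written as: rank the prototypes by distance for every data point, then move each prototype to the average of all points weighted by exp(-rank / λ). The sketch below follows that description as a minimal illustration, not the authors' code. The function name batch_neural_gas, the exponential annealing of λ from lam0 to lam_final, and the default parameters are assumptions made for this sketch; the median variant for proximity data is not shown.

```python
import numpy as np


def batch_neural_gas(X, n_prototypes, n_epochs=50, lam0=None, lam_final=0.01, seed=0):
    """Batch Neural Gas sketch: rank-weighted averaging with an annealed neighbourhood range."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Initialise prototypes on randomly chosen data points.
    W = X[rng.choice(n, size=n_prototypes, replace=False)].copy()
    if lam0 is None:
        lam0 = n_prototypes / 2.0  # assumed starting neighbourhood range
    for t in range(n_epochs):
        # Exponential annealing from lam0 (first epoch) down to lam_final (last epoch).
        lam = lam0 * (lam_final / lam0) ** (t / max(n_epochs - 1, 1))
        # Squared Euclidean distance between every point and every prototype: shape (n, m).
        D = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
        # ranks[j, i] = number of prototypes closer to point j than prototype i (0 = winner).
        ranks = np.argsort(np.argsort(D, axis=1), axis=1)
        H = np.exp(-ranks / lam)  # neighbourhood weights h_lambda(rank)
        # Closed-form batch step: each prototype becomes the H-weighted mean of all points.
        W = (H.T @ X) / H.sum(axis=0)[:, None]
    return W
```

As λ shrinks, only the rank-0 weight survives, so the update approaches a batch k-means step; the large early λ is what spreads all prototypes over the data instead of letting them settle into one region.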
Denver Groups Classification of Human Chromosomes Using Fuzzy C-Means Clustering
Unbanded human chromosomes can be classified into seven Denver Groups (A-G) based on their lengths and on the ratio of the length of the shorter arm to the whole length of the chromosome, which is called the centromere index (CI). In this article, the fuzzy c-means method is used to perform the Denver Group classification of a given set of human chromosomes. The objective of clustering is to partition a given set of human chromosomes into homogeneous clusters; by homogeneous we mean that all points in the same cluster share similar attributes and do not share similar attributes with points in other clusters. However, the separation of clusters and the meaning of similarity are fuzzy notions and can be described as such. It is found that the convergence of the clustering iterations depends strongly on the initial partition matrix.
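For reference, the standard fuzzy c-means iteration alternates between membership-weighted centres and the membership update u_ij = 1 / Σ_k (d_ij / d_kj)^(2/(m-1)). The sketch below assumes Euclidean distance and the common fuzzifier m = 2; for this task each row of X would hold one chromosome's features (for example length and CI), but the function name, defaults and stopping rule are illustrative choices, not the article's settings. The seed argument fixes the random initial partition matrix, which is exactly the quantity the article reports the iterations to be sensitive to.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, tol=1e-6, max_iter=300, seed=0):
    """Standard fuzzy c-means: returns centres V (c, d) and partition matrix U (c, n)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Random initial partition matrix; each column (one data point) sums to 1.
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)  # membership-weighted centres
        D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)  # (c, n) distances
        D = np.fmax(D, 1e-12)  # guard against a point sitting exactly on a centre
        # u_ij = 1 / sum_k (d_ij / d_kj)^(2 / (m - 1))
        ratio = (D[:, None, :] / D[None, :, :]) ** (2.0 / (m - 1.0))  # (c, c, n)
        U_new = 1.0 / ratio.sum(axis=1)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    Um = U ** m
    V = (Um @ X) / Um.sum(axis=1, keepdims=True)  # centres consistent with the final U
    return V, U
```

A hard Denver Group label can then be read off as U.argmax(axis=0), while the full columns of U keep the fuzzy degree to which each chromosome belongs to each group. Running with several seeds is a simple way to observe the dependence on the initial partition matrix.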
The detection of globular clusters in galaxies as a data mining problem
We present an application of self-adaptive supervised learning classifiers derived from the Machine Learning paradigm to the identification of candidate Globular Clusters in deep, wide-field, single-band HST images. Several methods provided by the DAME (Data Mining & Exploration) web application were tested and compared on the NGC1399 HST data described in Paolillo 2011. The best results were obtained using a Multi-Layer Perceptron with a Quasi-Newton learning rule, which achieved a classification accuracy of 98.3%, with a completeness of 97.8% and a contamination of 1.6%. An extensive set of experiments revealed that the use of accurate structural parameters (effective radius, central surface brightness) does improve the final result, but only by 5%. It is also shown that the method can retrieve extreme sources (for instance, very extended objects) that are missed by more traditional approaches.
Comment: Accepted 2011 December 12; Received 2011 November 28; in original form 2011 October 1
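Accuracy, completeness and contamination are all simple ratios of confusion counts. The sketch below uses the definitions common in source-detection work (completeness as the recovered fraction of true globular clusters, contamination as the fraction of candidates that are not globular clusters); the paper may normalize slightly differently, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # true globular clusters classified as GC candidates
    fp: int  # other sources wrongly classified as GC candidates
    fn: int  # true globular clusters that were missed
    tn: int  # other sources correctly rejected


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all sources that were classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def completeness(c: ConfusionCounts) -> float:
    """Fraction of true globular clusters that the classifier recovers (recall)."""
    return c.tp / (c.tp + c.fn)


def contamination(c: ConfusionCounts) -> float:
    """Fraction of GC candidates that are not actually globular clusters (1 - precision)."""
    return c.fp / (c.tp + c.fp)
```

With made-up counts tp=90, fp=10, fn=10, tn=890 (illustrative only, not the paper's data), these give an accuracy of 0.98, a completeness of 0.9 and a contamination of 0.1. Note that a high accuracy can coexist with a mediocre completeness when non-GC sources dominate the sample, which is why the three figures are reported together.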
Recursive Percentage based Hybrid Pattern Training for Supervised Learning
Supervised learning algorithms, often used to find the I/O relationship in data, tend to become trapped in local optima rather than reaching the desired global optimum. In this paper, we discuss the RPHP learning algorithm. The algorithm uses Real Coded Genetic Algorithm based global and local searches to find a set of pseudo global optimal solutions. Each pseudo global optimum is a locally optimal solution from the point of view of all the patterns but globally optimal from the point of view of a subset of patterns. Together with RPHP, a Kth nearest neighbor algorithm is used as a second-level pattern distributor to handle a test pattern. We also show theoretically the condition under which finding several pseudo global optimal solutions requires a shorter training time than finding a single global optimal solution. As the difficulty of curve fitting problems is easily estimated, we verify the capability of the RPHP algorithm on such problems and compare it with three counterparts to show the benefits of hybrid learning and active recursive subset selection. RPHP shows clearly superior performance. We conclude the paper by identifying possible loopholes in the RPHP algorithm and proposing possible solutions.
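The abstract does not spell out how the Kth nearest neighbor distributor chooses among the pseudo global optimal solutions. One possible reading, sketched here purely as an assumption, is that every training pattern is "owned" by the solution trained on its subset, and a test pattern is routed to the solution that owns the majority of its k nearest training patterns. The names distribute and predict, the owner array and the majority-vote rule are all hypothetical.

```python
from collections import Counter
from typing import Callable, Sequence

import numpy as np


def distribute(x, X_train, owner, k=3):
    """Pick the solution that owns most of x's k nearest training patterns (hypothetical rule)."""
    d = np.linalg.norm(np.asarray(X_train, dtype=float) - np.asarray(x, dtype=float), axis=1)
    nearest = np.argsort(d, kind="stable")[:k]  # indices of the k closest training patterns
    votes = Counter(int(owner[j]) for j in nearest)
    # Counter.most_common keeps first-seen order on ties, so the nearer neighbour's owner wins.
    return votes.most_common(1)[0][0]


def predict(x, X_train, owner, models: Sequence[Callable], k=3):
    """Route x to one pseudo global solution and apply that solution's model."""
    return models[distribute(x, X_train, owner, k)](x)
```

The design keeps the first level (the set of specialised solutions) separate from the second level (the distributor), so a different routing rule could be swapped in without retraining any of the solutions.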