598 research outputs found
Multiobjective optimization of cluster measures in Microarray Cancer data using Genetic Algorithm Based Fuzzy Clustering
The field of biological and biomedical research has changed rapidly with the invention of microarray technology, which facilitates simultaneous monitoring of a large number of genes across different experimental conditions. This report proposes a multiobjective genetic algorithm approach, based on the Non-Dominated Sorting Genetic Algorithm II (NSGA-II), for fuzzy clustering of microarray cancer expression datasets. The approach encodes the cluster modes and simultaneously optimizes two factors, the fuzzy compactness and the fuzzy separation of the clusters. The multiobjective technique produces a set of non-dominated solutions. This approach identifies the solution, i.e. the individual chromosome, which gives the optimal value of the parameters
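The idea of a set of non-dominated solutions over two cluster measures can be illustrated with a minimal sketch. It assumes that each candidate clustering is already scored as a hypothetical `(compactness, separation)` pair, that lower compactness and higher separation are better, and it shows only the Pareto filter, not the full NSGA-II procedure used in the report.

```python
def dominates(a, b):
    """Return True if score a dominates score b.

    Scores are (compactness, separation) pairs. Assumption for this sketch:
    compactness is minimized and separation is maximized.
    """
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def non_dominated(scores):
    """Keep the scores that no other score dominates (the Pareto front)."""
    return [s for s in scores if not any(dominates(o, s) for o in scores)]


# Hypothetical scores for five candidate clusterings.
candidates = [(1.0, 5.0), (2.0, 6.0), (2.5, 5.5), (3.0, 7.0), (1.5, 4.0)]
front = non_dominated(candidates)
print(front)  # [(1.0, 5.0), (2.0, 6.0), (3.0, 7.0)]
```

Every point on the front is a trade-off between the two measures, so a further selection step is still needed to pick a single clustering.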
Clustering Algorithms: Their Application to Gene Expression Data
Gene expression data hide vital information required to understand the biological processes that take place in a particular organism in relation to its environment. Deciphering the hidden patterns in gene expression data offers a major opportunity to strengthen the understanding of functional genomics. The complexity of biological networks and the number of genes present increase the challenges of comprehending and interpreting the resulting mass of data, which consists of millions of measurements; these data also exhibit vagueness, imprecision, and noise. Therefore, the use of clustering techniques is a first step toward addressing these challenges, and it is essential in the data mining process for revealing natural structures and identifying interesting patterns in the underlying data. The clustering of gene expression data has proven useful for revealing the natural structure inherent in the data, understanding gene functions, cellular processes, and subtypes of cells, mining useful information from noisy data, and understanding gene regulation. Another benefit of clustering gene expression data is the identification of homology, which is very important in vaccine design. This review examines the various clustering algorithms applicable to gene expression data in order to discover and provide useful knowledge of the appropriate clustering technique that will guarantee stability and a high degree of accuracy in its analysis procedure
A review of clustering techniques and developments
© 2017 Elsevier B.V. This paper presents a comprehensive study on clustering: existing methods and developments made at various times. Clustering is defined as unsupervised learning in which objects are grouped on the basis of some similarity inherent among them. There are different methods for clustering objects, such as hierarchical, partitional, grid, density-based and model-based. The approaches used in these methods are discussed with their respective state of the art and applicability. The measures of similarity as well as the evaluation criteria, which are the central components of clustering, are also presented in the paper. The applications of clustering in some fields like image segmentation, object and character recognition and data mining are highlighted
Self learning neuro-fuzzy modeling using hybrid genetic probabilistic approach for engine air/fuel ratio prediction
Machine Learning is concerned with constructing models that can learn and make predictions based on data. Rule extraction from real-world data, which are usually tainted with noise, ambiguity, and uncertainty, automatically requires feature selection. The Neuro-Fuzzy System (NFS), which is known for its prediction performance, has difficulty determining the proper number of rules and the number of membership functions for each rule. An enhanced hybrid Genetic Algorithm based Fuzzy Bayesian classifier (GA-FBC) was proposed to help the NFS in the rule extraction. Feature selection was performed at the rule level, overcoming a problem of the FBC, whose dependence on feature frequency leads it to ignore the patterns of small classes. Because a real-world problem such as Air/Fuel Ratio (AFR) prediction is being addressed, a multi-objective formulation is adopted. The GA-FBC uses mutual information entropy, which considers the relevance between feature attributes and class attributes. A fitness function is proposed that handles the multi-objective problem without weights, using a new composition method. The model was compared to other learning algorithms for NFS such as Fuzzy c-means (FCM) and the grid partition algorithm. Predictive accuracy and the complexity of the Fuzzy Rule Base System (FRBS), including the number of rules and the number of terms in each rule, were taken as evaluation criteria. It was also compared to the original GA-FBC, which depends on frequency rather than on Mutual Information (MI). Experimental results using Air/Fuel Ratio (AFR) data sets show that the new model helps decrease the average number of attributes per rule and sometimes increases the average performance compared to other models. This work facilitates achieving a self-generating FRBS from real data. The GA-FBC can be used as a new direction in machine learning research. This research contributes to controlling automobile emissions, helping to reduce one of the major causes of pollution and produce a greener environment
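The relevance measure mentioned above, mutual information between a feature attribute and the class attribute, can be sketched for discrete values as follows. This is the textbook definition, not necessarily the exact formulation used in GA-FBC; the function name and the toy data are illustrative assumptions.

```python
from collections import Counter
from math import log2


def mutual_information(feature, labels):
    """Mutual information I(F; C) in bits between two discrete sequences.

    Textbook estimate from empirical frequencies:
    I = sum over (f, c) of p(f, c) * log2(p(f, c) / (p(f) * p(c))).
    """
    n = len(feature)
    joint = Counter(zip(feature, labels))
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    mi = 0.0
    for (f, c), count in joint.items():
        # p(f, c) / (p(f) * p(c)) simplifies to count * n / (count_f * count_c).
        ratio = count * n / (feature_counts[f] * label_counts[c])
        mi += (count / n) * log2(ratio)
    return mi


# Toy data: the first feature determines the class, the second is independent of it.
labels = [0, 0, 1, 1]
print(mutual_information([0, 0, 1, 1], labels))  # 1.0
print(mutual_information([0, 1, 0, 1], labels))  # 0.0
```

A feature with higher mutual information carries more information about the class, which is why such a score can rank candidate attributes for a rule.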
Adaptive K-means algorithm for overlapped graph clustering
Electronic version of an article published as International Journal of Neural Systems 2, 5, 2012, DOI: 10.1142/S0129065712500189 © 2012 copyright World Scientific Publishing Company. The graph clustering problem has become highly relevant due to the growing interest of several research communities in social networks and their possible applications. Overlapped graph clustering algorithms try to find subsets of nodes that can belong to different clusters. In social network-based applications it is quite usual for a node of the network to belong to different groups, or communities, in the graph. Therefore, algorithms that try to discover or analyze the behavior of these networks need to handle this feature, detecting and identifying the overlapped nodes. This paper presents a soft clustering approach based on a genetic algorithm in which a new encoding is designed to achieve two main goals: first, the automatic adaptation of the number of communities that can be detected and, second, the definition of several fitness functions that guide the search process using measures extracted from graph theory. Finally, our approach has been experimentally tested using the Eurovision contest dataset, a well-known social-based data network, to show how overlapped communities can be found using our method. This work has been partly supported by the Spanish Ministry of Science and Education under project TIN2010-19872 and the grant BES-2011-049875 from the same Ministry
Classification and Clustering Using Intelligent Techniques: Application to Microarray Cancer Data
Analysis and interpretation of DNA microarray data is a fundamental task in bioinformatics. Feature extraction plays a critical role in improving classifier performance. We address the dimensionality reduction of DNA features, in which relevant features are extracted from among thousands of irrelevant ones. This enhances the speed and accuracy of the classifiers. Principal Component Analysis is a feature extraction technique that helps to retrieve intrinsic information from high-dimensional data in eigenspaces to address the curse of dimensionality. Neural Networks and Support Vector Machines are implemented on the reduced data set, and their performance is measured in terms of predictive accuracy, specificity, and sensitivity. Next, we propose a Multiobjective Genetic Algorithm-based fuzzy clustering technique using real-coded encoding of cluster centers for clustering and classification. This technique is implemented on microarray cancer data to select training data using a multiobjective genetic algorithm with non-dominated sorting. The two objective functions of this multiobjective technique are cluster compactness and cluster separation, which are optimized simultaneously. This approach identifies the solution. A Support Vector Machine classifier is then trained on the selected training points, which have high confidence values. The remaining points are then classified by the trained SVM classifier. Finally, the four clustering label vectors are combined through a majority voting ensemble. The performance of the proposed MOGA-SVM classification and clustering method has been compared to MOGA-BP, SVM, and BP. Performance is measured in terms of the Silhouette Index and ARI, respectively. The experiments were carried out on three public-domain cancer data sets, viz., Ovarian, Colon and Leukemia cancer
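The three classifier measures named above (predictive accuracy, sensitivity, and specificity) follow directly from the confusion counts of a binary classifier. The sketch below uses the standard definitions; the function name, the label encoding, and the toy predictions are illustrative assumptions and do not reproduce any result from the work.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity from true and predicted labels.

    Standard definitions: sensitivity = TP / (TP + FN),
    specificity = TN / (TN + FP), accuracy = (TP + TN) / total.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against classes that are absent from y_true.
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Hypothetical labels: 1 = tumour sample, 0 = normal sample.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
print(binary_metrics(y_true, y_pred))
```

Reporting sensitivity and specificity alongside accuracy matters on imbalanced data, where accuracy alone can hide poor detection of the smaller class.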
An unsupervised learning algorithm for membrane computing
This paper focuses on the unsupervised learning problem within membrane computing, and proposes an innovative solution inspired by membrane computing techniques, the fuzzy membrane clustering algorithm. An evolution–communication P system with nested membrane structure is the core component of the algorithm. The feasible cluster centers are represented by means of objects, and three types of membranes are considered: evolution, local store, and global store. Based on the designed membrane structure and the inherent communication mechanism, a modified differential evolution mechanism is developed to evolve the objects in the system. Under the control of the evolution–communication mechanism of the P system, the proposed fuzzy clustering algorithm achieves good fuzzy partitioning for a data set. The proposed fuzzy clustering algorithm is compared to three recently-developed and two classical clustering algorithms for five artificial and five real-life data sets. Funding: National Natural Science Foundation of China No 61170030; National Natural Science Foundation of China No 61472328; Chunhui Project Foundation of the Education Department of China No. Z2012025; Chunhui Project Foundation of the Education Department of China No. Z2012031; Sichuan Key Technology Research and Development Program No. 2013GZX015
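To make the notion of a fuzzy partition around candidate cluster centers concrete, the sketch below computes membership degrees with the standard fuzzy c-means formula. This is an assumption for illustration: the abstract does not state which membership function the membrane algorithm uses, and the function name, fuzzifier `m`, and toy points are hypothetical.

```python
from math import dist


def fuzzy_memberships(points, centers, m=2.0):
    """Membership of each point in each cluster (standard fuzzy c-means rule).

    u[i][j] = 1 / sum_k (d(x_i, c_j) / d(x_i, c_k)) ** (2 / (m - 1)),
    where m > 1 is the fuzzifier. Each row sums to 1.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        d = [dist(x, c) for c in centers]
        if 0.0 in d:
            # The point coincides with one or more centers: split full
            # membership among those centers to avoid division by zero.
            hits = [1.0 if di == 0.0 else 0.0 for di in d]
            memberships.append([h / sum(hits) for h in hits])
            continue
        memberships.append(
            [1.0 / sum((dj / dk) ** exponent for dk in d) for dj in d]
        )
    return memberships


# Toy 2-D example with two candidate centers.
centers = [(0.0, 0.0), (4.0, 0.0)]
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
for row in fuzzy_memberships(points, centers):
    print(row)
```

A point midway between the two centers receives equal membership in both, while a point near one center receives most of its membership there.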