
    Multiobjective optimization of cluster measures in Microarray Cancer data using Genetic Algorithm Based Fuzzy Clustering

    Get PDF
    The field of biological and biomedical research has changed rapidly with the invention of microarray technology, which facilitates the simultaneous monitoring of a large number of genes across different experimental conditions. In this report, a multiobjective genetic algorithm approach based on the Non-Dominated Sorting Genetic Algorithm II (NSGA-II) is proposed for fuzzy clustering of microarray cancer expression datasets; it encodes the cluster modes and simultaneously optimizes two objectives, the fuzzy compactness and the fuzzy separation of the clusters. The multiobjective technique produces a set of non-dominated solutions, and the approach then identifies the solution, i.e. the individual chromosome, that gives the optimal values of these objectives
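
    As a rough illustration of the two objectives named above, the sketch below (plain NumPy, hypothetical variable names, fuzziness exponent m = 2 assumed) computes a fuzzy compactness and a fuzzy separation value for a set of cluster centres and a membership matrix; it is a minimal sketch, not the authors' implementation.

        import numpy as np

        def fuzzy_objectives(X, centers, U, m=2.0):
            """X: (n, d) data, centers: (c, d) cluster centres, U: (c, n) memberships summing to 1 per point."""
            # Fuzzy compactness: membership-weighted squared distances of points to centres (to be minimised).
            d2 = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2) ** 2   # (c, n)
            compactness = np.sum((U ** m) * d2)
            # Fuzzy separation: spread between the cluster centres themselves (to be maximised).
            cd2 = np.linalg.norm(centers[None, :, :] - centers[:, None, :], axis=2) ** 2
            separation = np.sum(np.triu(cd2, k=1))
            return compactness, separation

    An NSGA-II run over candidate centre encodings would then minimise compactness while maximising separation (for example by minimising its negative), yielding the non-dominated set mentioned in the abstract.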

    Classification and Clustering Using Intelligent Techniques: Application to Microarray Cancer Data

    Get PDF
    Analysis and interpretation of DNA microarray data is a fundamental task in bioinformatics. Feature extraction plays a critical role in improving classifier performance. We address the dimension reduction of DNA features, in which the relevant features are extracted from among thousands of irrelevant ones; this enhances both the speed and the accuracy of the classifiers. Principal Component Analysis is used for feature extraction; it retrieves the intrinsic information of high-dimensional data in eigenspaces and thereby mitigates the curse of dimensionality. Neural Networks and Support Vector Machines are applied to the reduced data set and their performance is measured in terms of predictive accuracy, specificity, and sensitivity. Next, we propose a multiobjective genetic algorithm-based fuzzy clustering technique, using real-coded encoding of the cluster centres, for clustering and classification. The technique is applied to microarray cancer data to select training points using a multiobjective genetic algorithm with non-dominated sorting; the two objective functions are cluster compactness and cluster separation, optimized simultaneously, and the approach identifies the best solution from the resulting non-dominated set. A Support Vector Machine classifier is then trained on the selected training points that have high confidence values, and the remaining points are classified by the trained SVM classifier. Finally, the four clustering label vectors are combined through a majority-voting ensemble. The performance of the proposed MOGA-SVM classification and clustering method has been compared with MOGA-BP, SVM, and BP, measured in terms of the Silhouette index and the ARI, respectively. The experiments were carried out on three public-domain cancer data sets, viz., Ovarian, Colon and Leukemia cancer
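
    The PCA-plus-SVM stage described above can be sketched as follows, assuming scikit-learn and a placeholder expression matrix; the component count, kernel and data are illustrative guesses, not the paper's exact pipeline or parameters.

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.svm import SVC
        from sklearn.pipeline import make_pipeline
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import confusion_matrix

        # Placeholder microarray-like data: 60 samples, 2000 gene-expression features.
        X, y = np.random.rand(60, 2000), np.random.randint(0, 2, 60)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

        # Dimension reduction followed by an SVM classifier on the reduced features.
        clf = make_pipeline(PCA(n_components=20), SVC(kernel="rbf"))
        clf.fit(X_tr, y_tr)

        # Accuracy, sensitivity and specificity from the binary confusion matrix.
        tn, fp, fn, tp = confusion_matrix(y_te, clf.predict(X_te), labels=[0, 1]).ravel()
        accuracy    = (tp + tn) / (tp + tn + fp + fn)
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)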

    Advances in Evolutionary Algorithms

    Get PDF
    With the recent trend towards massive data sets and significant computational power, combined with advances in evolutionary algorithms, evolutionary computation is becoming much more relevant to practice. The aim of the book is to present recent improvements, innovative ideas and concepts from a part of the huge field of evolutionary algorithms

    Clustering: finding patterns in the darkness

    Get PDF
    Machine learning is changing the world and fuelling Industry 4.0. These statistical methods focus on identifying patterns in data in order to provide an intelligent response to specific requests. Although understanding data tends to require expert knowledge to supervise the decision-making process, some techniques need no supervision. These unsupervised techniques work blindly, relying only on data similarity. One of the most popular areas in this field is clustering. Clustering groups data so that the elements of each cluster are strongly similar while the clusters themselves remain distinct from one another. The field started with the K-means algorithm, one of the most popular algorithms in machine learning, with extensive applications. Currently, there are multiple strategies for dealing with the clustering problem. This review introduces some of the classical algorithms, focuses particularly on algorithms based on evolutionary computation, and describes some current applications of clustering to large datasets
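
    For reference, the K-means algorithm the review starts from can be written in a few lines of NumPy; this is a minimal sketch of Lloyd's iteration with random initial centres, assuming numeric data, not the review's own code.

        import numpy as np

        def kmeans(X, k, iters=100, seed=0):
            rng = np.random.default_rng(seed)
            centers = X[rng.choice(len(X), k, replace=False)]        # random initial centres
            for _ in range(iters):
                # Assignment step: each point joins its nearest centre.
                labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
                # Update step: each centre moves to the mean of its assigned points.
                new = np.array([X[labels == j].mean(0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
                if np.allclose(new, centers):                        # stop when centres no longer move
                    break
                centers = new
            return labels, centers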

    Computational intelligence techniques for maximum energy efficiency of cogeneration processes based on internal combustion engines

    Get PDF
    The aim of this thesis is to develop strategies for modelling and optimizing the energy efficiency of cogeneration plants based on internal combustion engines (ICE), using the latest computational intelligence technologies. To this end, real data are available from a cogeneration plant owned by EnergyWorks and located in Monzón (province of Huesca). The thesis was carried out within the joint framework of the Digital Electronics Design Group (GDED) of the University of the Basque Country UPV/EHU and Optimitive S.L., a company dedicated to advanced software for the real-time improvement of industrial processes

    Clustering Categorical Data: Soft Rounding k-modes

    Full text link
    Over the last three decades, researchers have intensively explored various clustering tools for categorical data analysis. Despite the proposal of various clustering algorithms, the classical k-modes algorithm remains a popular choice for unsupervised learning of categorical data. Surprisingly, our first insight is that in a natural generative block model, the k-modes algorithm performs poorly for a large range of parameters. We remedy this issue by proposing a soft rounding variant of the k-modes algorithm (SoftModes) and theoretically prove that our variant addresses the drawbacks of the k-modes algorithm in the generative model. Finally, we empirically verify that SoftModes performs well on both synthetic and real-world datasets
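
    For context, the classical k-modes algorithm that SoftModes modifies alternates Hamming-distance assignment with per-attribute mode updates; the sketch below shows that baseline (not the proposed soft rounding variant), assuming an integer- or string-coded categorical matrix.

        import numpy as np

        def k_modes(X, k, iters=50, seed=0):
            """Classical k-modes on a categorical matrix X of shape (n, d)."""
            rng = np.random.default_rng(seed)
            modes = X[rng.choice(len(X), k, replace=False)].copy()
            for _ in range(iters):
                # Assignment: Hamming distance (number of mismatched attributes) to each mode.
                labels = (X[:, None, :] != modes[None, :, :]).sum(-1).argmin(1)
                # Update: each mode takes the most frequent category of each attribute in its cluster.
                new = modes.copy()
                for j in range(k):
                    members = X[labels == j]
                    if len(members):
                        for a in range(X.shape[1]):
                            vals, counts = np.unique(members[:, a], return_counts=True)
                            new[j, a] = vals[counts.argmax()]
                if np.array_equal(new, modes):
                    break
                modes = new
            return labels, modes

    SoftModes, as described in the abstract, replaces the hard mode update with a soft rounding step; the exact rule is given in the paper and is not reproduced here.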

    A single-objective and a multi-objective genetic algorithm to generate accurate and interpretable fuzzy rule based classifiers for the analysis of complex financial data

    Get PDF
    Nowadays, organizations deal with rapidly increasing amounts of data stored in their databases. It has therefore become crucially important for them to identify patterns in these large databases and turn raw data into valuable, actionable information. By exploring these datasets, organizations gain a competitive advantage, based on the assumption that the main strength of Knowledge Management Systems is to facilitate the decision-making process. Especially given the importance of knowledge in the 21st century, data mining can be seen as a very effective tool for exploring the data that foster competitive gain in a changing environment. The overall aim of this study is to design the rule-base component of a fuzzy rule-based system (FRBS) through the use of genetic algorithms. The main objective is to generate accurate and interpretable models of the data, trying to overcome the existing trade-off between accuracy and interpretability. We propose two different approaches: an accuracy-driven single-objective genetic algorithm, and a three-objective genetic algorithm that produces a Pareto front approximation composed of classifiers with different trade-offs between accuracy and complexity. The proposed approaches have been compared with two other systems, namely a single-objective rule-selection algorithm and a three-objective algorithm. The latter, developed at the University of Pisa, generates the rule base while simultaneously learning the definition points of the membership functions, taking into account both the accuracy and the interpretability of the final model
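
    A fitness evaluation of the kind such a three-objective algorithm needs might look like the sketch below; the rule-base representation and the classify() helper are hypothetical stand-ins, not the paper's code, and the complexity measures (rule count and total number of conditions) are typical choices rather than the authors' exact definitions.

        def fitness(rule_base, X, y, classify):
            """rule_base: list of (antecedent, consequent_class) pairs; classify applies it to one sample."""
            # Objective 1: classification accuracy on the training data (maximise).
            accuracy = sum(classify(rule_base, x) == t for x, t in zip(X, y)) / len(y)
            # Objectives 2 and 3: interpretability proxies (minimise).
            n_rules = len(rule_base)
            total_conditions = sum(len(antecedent) for antecedent, _ in rule_base)
            return accuracy, n_rules, total_conditions

    A multi-objective genetic algorithm would evolve rule bases against these three values, producing the Pareto front of accuracy/complexity trade-offs described above.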

    Exoplanet host star classification: multi-objective optimization of incomplete stellar abundance data

    Get PDF
    The presence of a planetary companion around its host star has been repeatedly linked with stellar properties, affecting the likelihood of substellar object formation and stability in the protoplanetary disc, thus presenting a key challenge in exoplanet science. Furthermore, abundance and stellar parameter data sets tend to be incomplete, which limits the ability to infer distributional characteristics harnessing the entire data set. This work aims to develop a methodology using machine learning (ML) and multi-objective optimization for reliable imputation for subsequent comparison tests and host star recommendation. It integrates fuzzy clustering for imputation and ML classification of hosts and comparison stars into an evolutionary multi-objective optimization algorithm. We test several candidates for the classification model, starting with a binary classification for giant planet hosts. Upon confirmation that the eXtreme Gradient Boosting algorithm provides the best performance, we interpret the performance of both the imputation and classification modules for binary classification. The model is extended to handle multilabel classification for low-mass planets and planet multiplicity. Constraints on the model’s use and feature/sample selection are given, outlining strengths and limitations. We conclude that the careful use of this technique for host star recommendation will be an asset to future missions and the compilation of necessary target lists
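
    The binary host-star classification step with eXtreme Gradient Boosting could be sketched as follows, assuming the xgboost and scikit-learn packages and an already-imputed abundance table; the feature count, hyperparameters and data here are placeholders, not the survey data or the tuned model.

        import numpy as np
        from xgboost import XGBClassifier
        from sklearn.model_selection import cross_val_score

        # Placeholder table: 500 stars, 12 abundance/parameter features after imputation.
        X = np.random.rand(500, 12)
        y = np.random.randint(0, 2, 500)        # 1 = giant-planet host, 0 = comparison star

        clf = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
        print(cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean())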

    Surrogate-Assisted Unified Optimization Framework for Investigating Marine Structural Design Under Information Uncertainty.

    Full text link
    Structural decisions made in the early stages of marine systems design can have a large impact on future acquisition, maintenance and life-cycle costs. However, owing to the unique nature of early-stage marine system design, these critical structural decisions are often made on the basis of incomplete information or knowledge about the design. When coupled with design optimization analysis, the complex, uncertain early-stage design environment makes it very difficult to deliver a quantified trade-off analysis for decision making. This work presents a novel decision support method that integrates design optimization, high-fidelity analysis, and modeling of information uncertainty for early-stage design and analysis. To support this method, the dissertation improves design optimization methods for marine structures by proposing several novel surrogate modeling techniques and strategies. The proposed work treats the uncertainties that arise from limited information in a non-statistical, interval-uncertainty form. This interval uncertainty is treated as an objective function in an optimization framework in order to explore the impact of information uncertainty on structural design performance; the potential structural weight penalty associated with information uncertainty can thus be identified quickly at an early stage, avoiding costly redesign later. The dissertation then explores a computational structure that balances fidelity and efficiency: a novel variable-fidelity approach is proposed to allocate expensive high-fidelity computational simulations wisely. In achieving the proposed capabilities for design optimization, several surrogate modeling methods are developed, concerning worst-case estimation, clustered multiple meta-modeling, and mixed-variable modeling techniques. These surrogate methods have been demonstrated to significantly improve the efficiency of the optimizer in dealing with the challenges of early-stage marine structure design.
    PhD, Naval Architecture and Marine Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/133365/1/yanliuch_1.pd
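
    The surrogate-assisted idea can be illustrated with a small sketch, assuming scikit-learn's Gaussian process regressor stands in for the surrogate and expensive_weight() is a hypothetical stand-in for the costly high-fidelity structural analysis; this is only an illustration of the general strategy, not the dissertation's worst-case, clustered or mixed-variable surrogates.

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import Matern

        def expensive_weight(x):                 # placeholder for a high-fidelity analysis
            return np.sum((x - 0.3) ** 2)

        rng = np.random.default_rng(0)
        X_train = rng.random((20, 4))                                  # initial design samples
        y_train = np.array([expensive_weight(x) for x in X_train])

        # Fit a cheap surrogate to the expensive evaluations.
        surrogate = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
        surrogate.fit(X_train, y_train)

        # Screen many cheap candidates on the surrogate, then run the expensive model only on the best one.
        candidates = rng.random((5000, 4))
        best = candidates[surrogate.predict(candidates).argmin()]
        print(best, expensive_weight(best))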