80 research outputs found

    Clustering Algorithms: Their Application to Gene Expression Data

    Gene expression data hide vital information required to understand the biological processes that take place in a particular organism in relation to its environment. Deciphering the hidden patterns in gene expression data offers a major opportunity to strengthen the understanding of functional genomics. The complexity of biological networks and the number of genes present increase the challenges of comprehending and interpreting the resulting mass of data, which consists of millions of measurements; these data also exhibit vagueness, imprecision, and noise. Therefore, the use of clustering techniques is a first step toward addressing these challenges and is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. The clustering of gene expression data has proven useful in revealing the natural structure inherent in the data, understanding gene functions, cellular processes, and subtypes of cells, mining useful information from noisy data, and understanding gene regulation. Another benefit of clustering gene expression data is the identification of homology, which is very important in vaccine design. This review examines the various clustering algorithms applicable to gene expression data in order to discover and provide useful knowledge of the appropriate clustering technique that will guarantee stability and a high degree of accuracy in its analysis procedure.

    Simulated Annealing

    The book contains 15 chapters presenting recent contributions of top researchers working with Simulated Annealing (SA). Although it represents a small sample of the research activity on SA, the book will certainly serve as a valuable tool for researchers interested in getting involved in this multidisciplinary field. In fact, one of its salient features is that the book is highly multidisciplinary in terms of application areas, since it assembles experts from the fields of Biology, Telecommunications, Geology, Electronics and Medicine.

    Machine Learning with Metaheuristic Algorithms for Sustainable Water Resources Management

    The main aim of this book is to present various implementations of ML methods and metaheuristic algorithms to improve the modelling and prediction of hydrological and water resources phenomena that are of vital importance in water resource management.

    Contributions on evolutionary computation for statistical inference

    Evolutionary Computation (EC) techniques were introduced in the 1960s for dealing with complex situations. One example is an optimization problem that has no analytical solution or is computationally intractable; in many such cases these methods, named Evolutionary Algorithms (EAs), have been successfully implemented. In statistics there are many situations where complex problems arise, in particular concerning optimization. A general example is when the statistician needs to select, from a prohibitively large discrete set, just one element, which could be a model, a partition, an experiment, or similar: this is the case in model selection, cluster analysis, or design of experiments. In other situations there could be an intractable function of the data, such as a likelihood, which needs to be maximized, as happens in model parameter estimation. These kinds of problems are naturally well suited to EAs, and in the last 20 years a large number of papers have been concerned with applications of EAs to statistical issues. The present dissertation is set in this part of the literature, as it reports several implementations of EAs in statistics, although it focuses mainly on statistical inference problems. Original results are proposed, as well as overviews and surveys on several topics. EAs are employed and analyzed from various statistical points of view, showing and confirming their efficiency and flexibility. The first proposal is devoted to parametric estimation problems. When EAs are employed in such analyses, a novel form of variability related to their stochastic elements is introduced. We analyze both the variability due to sampling, associated with the selected estimator, and the variability due to the EA. This analysis is set in the framework of a statistical and computational tradeoff, crucial in present-day problems, by introducing cost functions related to both data acquisition and EA iterations. The proposed method is illustrated by means of model building examples. The subsequent chapter is concerned with EAs employed in Markov Chain Monte Carlo (MCMC) sampling. When sampling from a multimodal or highly correlated distribution, one possible strategy is to run several chains in parallel in order to improve their mixing. If these chains are allowed to interact with each other, then many analogies with EC techniques can be observed, and this has led to research in many fields. The chapter reviews various methods found in the literature that combine EC techniques and MCMC sampling, in order to identify specific and common procedures and to unify them in an EC framework. In the last proposal we present a complex time series model and an identification procedure based on Genetic Algorithms (GAs). The model is capable of dealing with seasonality, through Periodic AutoRegressive (PAR) modelling, and with structural changes in time, leading to a nonstationary structure. Because a very large number of parameters and possible change points are involved, GAs are appropriate for identifying such a model. The effectiveness of the procedure is shown on both simulated data and real examples, the latter being river flow data from hydrology. The thesis concludes with some final remarks, including on future work.

    Pertanika Journal of Science & Technology

    Study On Clustering Techniques And Application To Microarray Gene Expression Bioinformatics Data

    With the explosive growth in the amount of publicly available genomic data, a new field of computer science, bioinformatics, has emerged, focusing on the use of computing systems for efficiently deriving, storing, and analyzing the character strings of the genome to help solve problems in molecular biology. The flood of data from biology, mainly in the form of DNA, RNA, and protein sequences, puts heavy demand on computers and computational scientists. At the same time, it demands a transformation of the basic ethos of the biological sciences. Hence, data mining techniques can be used efficiently to explore hidden patterns underlying biological data. Unsupervised classification, also known as clustering, which is one of the branches of data mining, can be applied to biological data, and this can result in a better era of rapid medical development and drug discovery. In the past decade, the advent of efficient genome sequencing tools has led to enormous progress in the life sciences. Among the most important innovations, microarray technology allows the expression of thousands of genes to be quantified simultaneously. The characteristics that make these data different from machine-learning/pattern recognition data include a fair amount of random noise, missing values, a dimension in the range of thousands, and a sample size of a few dozen. A particular application of microarray technology is in the area of cancer research, where the goal is the precise and early detection of tumorous cells with high accuracy. The challenge for biologists and computer scientists is to provide solutions in terms of automation, quality, and efficiency.

    Computational intelligence approaches to robotics, automation, and control [Volume guest editors]

    No abstract available