
    On the role of metaheuristic optimization in bioinformatics

    Metaheuristic algorithms are employed to solve complex and large-scale optimization problems in many different fields, from transportation and smart cities to finance. This paper discusses how metaheuristic algorithms are being applied to solve different optimization problems in the area of bioinformatics. While the text provides references to many optimization problems in the area, it focuses on those that have attracted the most interest from the optimization community. Among the problems analyzed, the paper discusses in more detail the molecular docking problem, protein structure prediction, phylogenetic inference, and different string problems. References to other relevant optimization problems are also given, including those related to medical imaging or gene selection for classification. Based on this analysis, the paper identifies research opportunities for the Operations Research and Computer Science communities in the field of bioinformatics.

    Evolutionary dynamics, topological disease structures, and genetic machine learning

    Topological evolution is a new dynamical systems model of biological evolution occurring within a genomic state space. It can be modeled equivalently as a stochastic dynamical system, a stochastic differential equation, or a partial differential equation drift-diffusion model. One application of this approach is a model of disease evolution that traces diseases in the same way as standard functional traits (e.g., organ evolution). Genetically embedded diseases become evolving functional components of species-level genomes. The competition between species-level evolution (which tends to maintain diseases) and individual evolution (which acts to eliminate them) yields a novel structural topology for the stochastic dynamics involved. In particular, an unlimited set of dynamical time scales emerges as a means of timing different levels of evolution: from individual to group to species and larger units. These scales exhibit a dynamical tension between individual and group evolution, the two being modeled on very different (fast and slow, respectively) time scales. This is analyzed in the context of a potentially major constraint on evolution: the species-level enforcement of lifespan via (topological) barriers to genomic longevity. This species-enforced behavior is analogous to certain types of evolutionary altruism, but it is termed here extreme altruism because it may have been shaped by mass extinctions. We give examples of biological mechanisms implementing some of the topological barriers discussed and provide mathematical models for them. This picture also introduces an explicit basis for lifespan-limiting evolutionary pressures: a species-level need to maintain flux in its genome through a paced turnover of its biomass. This flux is needed so that phenomic characteristics keep pace with genomic changes through evolution. Put briefly, the phenome must keep up with the genome, which occurs with an optimized limited lifespan.
An important consequence of this model is a new role for diseases in evolution. Rather than playing their commonly recognized role as accidental side effects, they play a central functional role in shaping an optimal lifespan for a species, implemented through the topology of their embedding into the genome state space. This includes cancers, which are known to be embedded into the genome in complex and sometimes hair-triggered ways arising from DNA damage. Such cancers are also known to act in engineered and teleological ways that have been difficult to explain using the currently popular theories of intra-organismic cancer evolution. This alternative, inter-organismic picture presents cancer evolution as occurring over much longer (evolutionary) time scales rather than over the very short organic evolutions that occur in individual cancers. This in turn may explain some evolved, intricate, and seemingly engineered properties of cancer. This dynamical evolutionary model is framed in a multiscale picture in which different time scales are almost independently active in the evolutionary process, acting on semi-independent parts of the genome. We additionally move from natural evolution to artificial implementations of evolutionary algorithms. We study genetic programming for the structured construction of machine learning features in a new structural risk minimization environment. While genetic programming in feature engineering is not new, we propose a Lagrangian optimization criterion for defining new feature sets, inspired by structural risk minimization in statistical learning. We split the optimization of this Lagrangian into two exhaustive categories involving local and global search. The former is accomplished through local descent within given basins of attraction, while the latter is done through a combinatorial search for new basins via an evolutionary algorithm.
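The abstract states that topological evolution can be written either as a stochastic differential equation or as a drift-diffusion PDE. For readers unfamiliar with that equivalence, the standard textbook pairing is shown below for a one-dimensional Itô SDE and its Fokker-Planck equation, where $p(x,t)$ is the probability density of $X_t$. The drift $\mu$ and diffusion $\sigma$ are generic placeholders, not the specific terms of the authors' model.

```latex
dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dW_t
\qquad\Longleftrightarrow\qquad
\frac{\partial p(x,t)}{\partial t}
  = -\frac{\partial}{\partial x}\big[\mu(x)\,p(x,t)\big]
    + \frac{1}{2}\,\frac{\partial^{2}}{\partial x^{2}}\big[\sigma^{2}(x)\,p(x,t)\big]
```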

    Nonlinear Dynamic System Identification and Model Predictive Control Using Genetic Programming

    Over the last century, many developments have been made in research on complex nonlinear process control. As a powerful control methodology, model predictive control (MPC) has been extensively applied in the chemical industry. Core to MPC is a predictive model of the dynamics of the system being controlled. Most practical systems exhibit complex nonlinear dynamics, which pose major challenges for system modelling. Because it can automatically evolve both model structure and numeric parameters, Genetic Programming (GP) shows great potential for identifying nonlinear dynamic systems. This thesis is devoted to GP-based system identification and model-based control of nonlinear systems. To improve the generalization ability of GP models, a series of experiments that use semantic-based local search within a multiobjective GP framework are reported. The influence of various ways of selecting target subtrees for local search, as well as of different methods for performing that search, was investigated, and a comparison with the Random Desired Operator (RDO) of Pawlak et al. was made by statistical hypothesis testing. Models produced by a standard steady-state or generational GP followed by a carefully designed single-objective GP implementing semantic-based local search are statistically more accurate, with smaller (or equal) tree size, than those of the corresponding baseline GP algorithms and of the RDO-based GP algorithms. For practical applications, a critical research concern is how to correctly and efficiently apply an evolved GP model to other, larger systems. Currently, GP models are normally replicated by repeating others' work given the necessary algorithm parameters. However, due to the empirical and stochastic nature of GP, it is difficult to completely reproduce research findings. An XML-based standard file format, named Genetic Programming Markup Language (GPML), is proposed for the interchange of GP trees.
A formal definition of this standard and details of its implementation are described. GPML provides convenience and modularity for further applications based on GP models. The large-scale adoption of MPC in buildings is not economically viable due to the time and cost involved in designing and adjusting predictive models by expert control engineers. A GP-based control framework is proposed for automatically evolving dynamic nonlinear models for the MPC of buildings. An open-loop system identification was conducted using data generated by a building simulator, and the obtained GP model was then employed to construct the predictive model for the MPC. The experimental result shows that GP can produce models that allow the MPC of a building to achieve the desired temperature band in a single-zone space.
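GPML's actual schema is not given in this abstract, so the sketch below only illustrates the general idea of exchanging GP trees as XML. It uses Python's standard `xml.etree.ElementTree`, and the element and attribute names (`gp_tree`, `function`, `terminal`, `name`, `value`) as well as the `Node` class are hypothetical choices, not part of the GPML standard.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Node:
    """A GP tree node: a function with children, or a terminal (variable or constant)."""
    label: str
    children: list[Node] = field(default_factory=list)


def to_element(node: Node) -> ET.Element:
    # Hypothetical tags: <function name=...> for internal nodes, <terminal value=...> for leaves.
    if node.children:
        elem = ET.Element("function", name=node.label)
        for child in node.children:
            elem.append(to_element(child))
    else:
        elem = ET.Element("terminal", value=node.label)
    return elem


def from_element(elem: ET.Element) -> Node:
    """Rebuild a Node tree from the hypothetical XML layout above."""
    if elem.tag == "function":
        return Node(elem.get("name"), [from_element(c) for c in elem])
    return Node(elem.get("value"))


def serialize(tree: Node) -> str:
    root = ET.Element("gp_tree")  # hypothetical root element
    root.append(to_element(tree))
    return ET.tostring(root, encoding="unicode")


def deserialize(text: str) -> Node:
    return from_element(ET.fromstring(text)[0])
```

For example, the tree for `x + 2.0 * y` is `Node("add", [Node("x"), Node("mul", [Node("2.0"), Node("y")])])`, and `deserialize(serialize(tree))` reproduces it. A real interchange format would also need to record things such as the function set and numeric types, which this sketch deliberately omits.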

    Artificial Neurogenesis: An Introduction and Selective Review

    In this introduction and review, as in the book that follows, we explore the hypothesis that adaptive growth is a means of producing brain-like machines. The emulation of neural development can incorporate desirable characteristics of natural neural systems into engineered designs. The introduction begins with a review of neural development and neural models. Next, artificial development (the use of a developmentally inspired stage in engineering design) is introduced. Several strategies for performing this "meta-design" for artificial neural systems are reviewed. This work is divided into three main categories: bio-inspired representations, developmental systems, and epigenetic simulations. Several specific network biases and their benefits to neural network design are identified in these contexts. In particular, several recent studies show a strong synergy, sometimes interchangeability, between developmental and epigenetic processes, a topic that has remained largely underexplored in the literature.

    Inductive biases and metaknowledge representations for search-based optimization

    "What I do not understand, I can still create." (H. Sayama) The following work closely follows this bon mot. Guided by questions such as "How can evolutionary processes exhibit learning behavior and consolidate knowledge?", "What are cognitive models of problem-solving?" and "How can we harness these together as computational techniques?", we clarify in this work the essentials required to implement them for metaheuristic search and optimization. We therefore examine existing models of computational problem-solvers and compare them with existing methodology in the literature. In particular, we find that the meta-learning model, which frames problem-solving in terms of domain-specific inductive biases and their arbitration by means of high-level abstractions, resolves outstanding issues with methods proposed in the literature. Notably, it can also be related to ongoing research on algorithm selection and configuration frameworks. We then examine what it means to implement such a model: first, by identifying inductive biases in terms of algorithm components and modeling these with density estimation techniques; and second, by proposing methodology to process metadata generated by optimization algorithms in an automated manner, using deep pattern recognition architectures for spatio-temporal feature extraction. Finally, we examine an exemplary shape optimization problem, which gives insight into what it means to apply our methodology to application scenarios. We end with a discussion of possible future directions and of the limitations of such frameworks for system deployment.

    Regulatory network discovery using heuristics

    This thesis improves the gene regulatory network (GRN) discovery process by integrating heuristic information through a co-regulation function, a post-processing procedure, and a Hub Network algorithm that builds the backbone of the network. (Doctor of Philosophy thesis.)

    Compact Dynamic Optimisation Algorithm

    In recent years, the field of evolutionary dynamic optimisation has seen a significant increase in scientific developments and contributions, owing to its relevance to both academic and real-world problems. Several techniques, such as hyper-mutation, hyper-learning, hyper-selection and change detection, have been developed specifically for solving dynamic optimisation problems. However, the complex structure of algorithms employing these techniques makes them unsuitable for real-world, real-time dynamic optimisation problems on embedded systems with limited memory. The work presented in this thesis focuses on a compact approach as an alternative to population-based optimisation algorithms, suitable for solving real-time dynamic optimisation problems. Specifically, a novel compact dynamic optimisation algorithm suitable for embedded systems with limited memory is presented. Three novel dynamic approaches that augment and enhance the evolving properties of the compact genetic algorithm in dynamic environments are introduced: (1) a change detection scheme that measures the degree of dynamic change; (2) mutation schemes in which the mutation rate is directly linked to the detected degree of change; and (3) a change trend scheme that monitors the change pattern exhibited by the system. The novel compact dynamic optimisation algorithm was applied to two different dynamic optimisation problems: tuning a controller for a physical target system in a dynamic environment, and solving a dynamic optimisation problem produced by an artificial dynamic environment generator. The algorithm was compared with several existing dynamic optimisation techniques. Through a series of experiments, it was shown that maintaining diversity at the population level is more efficient than maintaining diversity at the individual level.
Among the five variants of the novel compact dynamic optimisation algorithm, the third variant showed the best performance in terms of response to dynamic changes and solution quality. Furthermore, it was demonstrated that information transfer based on dynamic change patterns can effectively mitigate the exploration/exploitation dilemma in a dynamic environment.
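The thesis builds on the compact genetic algorithm, which replaces an explicit population with a vector of per-bit probabilities, so memory grows with the chromosome length rather than with the population size. As a hedged illustration only, the sketch below implements the standard compact GA on the OneMax toy problem. `perturb_towards_uniform` is a hypothetical stand-in for a mutation scheme tied to a detected degree of change; it is not one of the thesis's three schemes, whose details are not given here.

```python
import random
from typing import Callable, List

Bits = List[int]


def onemax(bits: Bits) -> int:
    """Toy fitness: number of 1 bits."""
    return sum(bits)


def sample(p: List[float], rng: random.Random) -> Bits:
    """Draw one individual from the probability vector."""
    return [1 if rng.random() < pi else 0 for pi in p]


def compact_ga(n_bits: int, virtual_pop: int, fitness: Callable[[Bits], float],
               generations: int, seed: int = 0) -> List[float]:
    """Standard compact GA: a probability vector stands in for a population of size virtual_pop."""
    rng = random.Random(seed)
    p = [0.5] * n_bits
    step = 1.0 / virtual_pop
    for _ in range(generations):
        a, b = sample(p, rng), sample(p, rng)
        winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
        for i in range(n_bits):
            if winner[i] != loser[i]:
                # Shift the marginal toward the winner's allele, clamped to [0, 1].
                p[i] += step if winner[i] == 1 else -step
                p[i] = min(1.0, max(0.0, p[i]))
    return p


def perturb_towards_uniform(p: List[float], degree: float) -> List[float]:
    """Hypothetical change response: pull each probability toward 0.5 by a
    fraction `degree` in [0, 1] (e.g. a detected degree of environmental change)."""
    return [pi + degree * (0.5 - pi) for pi in p]
```

After a detected change, one could call `perturb_towards_uniform(p, degree)` before resuming sampling, so that larger changes restore more exploration; this coupling is an assumption made for illustration.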

    A tandem evolutionary algorithm for identifying causal rules from complex data

    We propose a new evolutionary approach for discovering causal rules in complex classification problems from batch data. Key aspects include (a) the use of a hypergeometric probability mass function as a principled statistic for assessing fitness, which quantifies the probability that the observed association between a given clause and target class is due to chance, taking into account the size of the dataset, the amount of missing data, and the distribution of outcome categories; (b) tandem age-layered evolutionary algorithms for evolving parsimonious archives of conjunctive clauses, and disjunctions of these conjunctions, each of which has a probabilistically significant association with outcome classes; and (c) separate archive bins for clauses of different orders, with dynamically adjusted order-specific thresholds. The method is validated on majority-on and multiplexer benchmark problems exhibiting various combinations of heterogeneity, epistasis, overlap, noise in class associations, missing data, extraneous features, and imbalanced classes. We also validate it on a more realistic synthetic genome dataset with heterogeneity, epistasis, extraneous features, and noise. In all synthetic epistatic benchmarks, we consistently recover the true causal rule sets used to generate the data. Finally, we discuss an application to a complex real-world survey dataset designed to inform possible ecohealth interventions for Chagas disease.
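The fitness statistic in (a) rests on the hypergeometric distribution. The paper's exact formulation (including how it adjusts for missing data) is not given in the abstract, so the sketch below assumes the simplest version: the upper-tail probability that a clause matching `n` of `N` records would cover at least `k` of the `K` target-class records if clause and class were unrelated. All function names are illustrative.

```python
from math import comb


def hypergeom_pmf(i: int, N: int, K: int, n: int) -> float:
    """P(X = i): drawing n of N records without replacement, K of which have the target class."""
    return comb(K, i) * comb(N - K, n - i) / comb(N, n)


def chance_association(k: int, N: int, K: int, n: int) -> float:
    """Upper tail P(X >= k): chance that a clause matching n records covers at
    least k target-class records if clause and class were unrelated."""
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, min(n, K) + 1))
```

For example, with `N = 10` records, `K = 5` of them in the target class, and a clause matching `n = 5` records that are all in the target class (`k = 5`), the tail probability is 1/252, roughly 0.004; smaller values indicate an association less likely to be due to chance. If SciPy is available, `scipy.stats.hypergeom.sf(k - 1, N, K, n)` should give the same tail, since its argument order is (total, successes, draws).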

    Improving the Generalisability of Brain Computer Interface Applications via Machine Learning and Search-Based Heuristics

    Brain-Computer Interfaces (BCIs) are hardware/software systems through which a user can interact with a machine without the need for motor activity, communicating instead via signals generated by the nervous system. These interfaces provide life-altering benefits to users, and further refinement will both extend their application to a much wider variety of disabilities and increase their practicality. The primary method of acquiring these signals is Electroencephalography (EEG). This technique is susceptible to a variety of noise sources, which compound the inherent problems in BCI training data: high dimensionality, small numbers of samples, and non-stationarity between users and recording sessions. Feature Selection and Transfer Learning have been used to overcome these problems, but they fail to account for several characteristics of BCI. This thesis extends both of these approaches through the use of search-based algorithms. Feature Selection techniques known as Wrappers use 'black-box' evaluation of feature subsets, leading to higher classification accuracies than the ranking methods known as Filters. However, Wrappers are more computationally expensive and are prone to over-fitting the training data. In this thesis, we applied Iterated Local Search (ILS) to the BCI field for the first time in the literature, and demonstrated results competitive with state-of-the-art methods such as the Least Absolute Shrinkage and Selection Operator (LASSO) and Genetic Algorithms. We then developed ILS variants with guided perturbation operators. Linkage was used to develop a multivariate metric, Intrasolution Linkage, which takes into account pairwise dependencies of features with the label in the context of the solution. Intrasolution Linkage was then integrated into two ILS variants.
The Intrasolution Linkage Score was found to have a stronger correlation with a solution's predictive accuracy on unseen data than Cross-Validation Error (CVE) on the training set, the typical approach to feature subset evaluation. Mutual Information was used to create Minimum Redundancy Maximum Relevance Iterated Local Search (MRMR-ILS). In this algorithm, the perturbation operator was guided by an existing Mutual Information measure, and the algorithm was compared with current Filter and Wrapper methods. It was found to achieve generally lower CVE rates and higher predictive accuracy on unseen data than existing algorithms. It was also noted that solutions found by MRMR-ILS had CVE rates with a stronger correlation to accuracy on unseen data than solutions found by other algorithms. We suggest that this may be due to the guided perturbation leading to solutions that are richer in Mutual Information. Feature Selection reduces computational demands and can increase the accuracy of our desired models, as evidenced in this thesis. However, limited quantities of training samples restrict these models and greatly reduce their generalisability. For this reason, utilising data from a wide range of users is an ideal solution. Due to differences in neural structures between users, however, creating adequate models is difficult. We adopted an existing state-of-the-art ensemble technique, Ensemble Learning Generic Information (ELGI), and developed an initial optimisation phase. This involved using search to transplant instances between user subsets to increase the generalisability of each subset before combination in the ELGI. We termed this Evolved Ensemble Learning Generic Information (eELGI). The eELGI achieved higher accuracy than user-specific BCI models across all eight users.
Optimisation of the training dataset allowed smaller training sets to be used, offered protection against neural drift, and created models that performed similarly across participants, regardless of neural impairment. Through the introduction and hybridisation of search-based algorithms for several problems in BCI, we have been able to show improvements in modelling accuracy and efficiency. Ultimately, this represents a step towards more practical BCI systems that will provide life-altering benefits to users.
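To make the Wrapper-style ILS described in this thesis abstract concrete, here is a minimal sketch under stated assumptions: feature subsets are encoded as boolean vectors, the local search is single-flip hill climbing, the perturbation flips `k` random features, and a candidate is accepted if it is not worse. The thesis's guided operators (Intrasolution Linkage, MRMR) are not reproduced, and `toy_score` is a hypothetical stand-in for the cross-validated classification accuracy a real Wrapper would compute.

```python
import random
from typing import Callable, List, Tuple

Subset = List[bool]          # True = feature selected
Score = Callable[[Subset], float]


def local_search(s: Subset, score: Score) -> Tuple[Subset, float]:
    """Hill climbing over single-feature flips until no flip improves the score."""
    best = score(s)
    improved = True
    while improved:
        improved = False
        for i in range(len(s)):
            s[i] = not s[i]
            val = score(s)
            if val > best:
                best, improved = val, True
            else:
                s[i] = not s[i]  # undo a non-improving flip
    return s, best


def perturb(s: Subset, k: int, rng: random.Random) -> Subset:
    """Unguided perturbation: flip k randomly chosen features in a copy of s."""
    t = s[:]
    for i in rng.sample(range(len(t)), k):
        t[i] = not t[i]
    return t


def iterated_local_search(n_features: int, score: Score, iters: int = 50,
                          k: int = 2, seed: int = 0) -> Tuple[Subset, float]:
    rng = random.Random(seed)
    start = [rng.random() < 0.5 for _ in range(n_features)]
    current, cur_val = local_search(start, score)
    best, best_val = current[:], cur_val
    for _ in range(iters):
        cand, cand_val = local_search(perturb(current, k, rng), score)
        if cand_val >= cur_val:  # accept candidates that are not worse
            current, cur_val = cand, cand_val
        if cur_val > best_val:
            best, best_val = current[:], cur_val
    return best, best_val


# Hypothetical stand-in for a Wrapper's cross-validated accuracy:
# reward the "relevant" features and lightly penalise every other selected one.
RELEVANT = {0, 3, 5}


def toy_score(s: Subset) -> float:
    hits = sum(1 for i in RELEVANT if s[i])
    extras = sum(1 for i, b in enumerate(s) if b and i not in RELEVANT)
    return hits - 0.1 * extras
```

Running `iterated_local_search(8, toy_score)` selects exactly features 0, 3 and 5. A guided variant in the spirit of MRMR-ILS would replace `perturb` with an operator that prefers flips scored by a relevance/redundancy measure instead of choosing them uniformly at random.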