A Parallel Divide-and-Conquer based Evolutionary Algorithm for Large-scale Optimization
Large-scale optimization problems that involve thousands of decision variables arise extensively in various industrial areas. Although evolutionary algorithms (EAs) are a powerful optimization tool for many real-world applications, they fail to solve these emerging large-scale problems both effectively and efficiently. In this paper, we propose a novel Divide-and-Conquer (DC) based EA that not only produces high-quality solutions by solving sub-problems separately, but also fully exploits parallel computing by solving the sub-problems simultaneously. Existing DC-based EAs, which were deemed to enjoy the same advantages as the proposed algorithm, are shown to be practically incompatible with the parallel computing scheme unless trade-offs are made that compromise solution quality.
Comment: 12 pages, 0 figures
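The divide-and-conquer idea can be illustrated with a minimal sketch, assuming a separable test objective (the sphere function) and a simple (1+1) evolution strategy for each sub-problem; neither comes from the paper. The variables are split into fixed groups, each group is optimized in its own worker against a frozen snapshot of the full solution (the "context vector"), and the improved groups are merged back. Optimizing against a stale snapshot is the kind of shortcut that makes parallelism easy but can cost solution quality on non-separable problems; this is not the paper's proposed algorithm, and all function names are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Objective = Callable[[Sequence[float]], float]


def sphere(x: Sequence[float]) -> float:
    """Separable test function with its minimum (0.0) at the origin."""
    return sum(v * v for v in x)


def optimize_group(f: Objective, context: Sequence[float], group: List[int],
                   iters: int, sigma: float, seed: int) -> List[float]:
    """(1+1)-ES over the variables in `group`; all other variables stay at `context`."""
    rng = random.Random(seed)
    best = list(context)
    best_f = f(best)
    for _ in range(iters):
        cand = list(best)
        for i in group:
            cand[i] += rng.gauss(0.0, sigma)  # mutate only this sub-problem's variables
        cand_f = f(cand)
        if cand_f <= best_f:  # elitist acceptance
            best, best_f = cand, cand_f
    return [best[i] for i in group]


def parallel_dc_ea(f: Objective, dim: int, n_groups: int, cycles: int = 20,
                   iters: int = 50, sigma: float = 0.3,
                   seed: int = 0) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    context = [rng.uniform(-5.0, 5.0) for _ in range(dim)]
    # Static interleaved decomposition: group g holds variables g, g + n_groups, ...
    groups = [list(range(g, dim, n_groups)) for g in range(n_groups)]
    with ThreadPoolExecutor(max_workers=n_groups) as pool:
        for cycle in range(cycles):
            snapshot = list(context)  # every worker sees the same (stale) context
            futures = [
                pool.submit(optimize_group, f, snapshot, grp, iters, sigma,
                            seed + 1 + cycle * n_groups + k)
                for k, grp in enumerate(groups)
            ]
            # Merge: write each group's improved values back into the context vector.
            for grp, fut in zip(groups, futures):
                for i, value in zip(grp, fut.result()):
                    context[i] = value
    return context, f(context)
```

For example, `parallel_dc_ea(sphere, dim=20, n_groups=4)` returns the merged solution and its objective value. Threads keep the sketch self-contained; for CPU-bound objectives in CPython, a `ProcessPoolExecutor` would be needed to obtain a real speedup.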
Understanding the Structural and Functional Importance of Early Folding Residues in Protein Structures
Proteins adopt three-dimensional structures, which serve as a starting point for understanding protein function and evolutionary ancestry. It is unclear how proteins fold in vivo and how this process can be recreated in silico to predict protein structure from sequence. Contact maps describe whether pairs of residues are in spatial proximity, and structures can be derived from this simplified representation. Coevolution or supervised machine learning techniques can compute contact maps from sequence; however, these approaches predict only sparse subsets of the actual contact map. It is shown that the composition of these subsets substantially influences the achievable reconstruction quality because most information in a contact map is redundant. No strategy has yet been proposed that identifies unique contacts for which no redundant backup exists.
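As a concrete illustration of the representation, a contact map can be computed from residue coordinates. This minimal sketch assumes one representative atom per residue (for example C-alpha), an 8 Å distance cutoff, and an optional minimum sequence separation; these are common conventions but are assumptions here, not the thesis's stated definition. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_map(coords: Sequence[Coord], cutoff: float = 8.0,
                min_seq_sep: int = 1) -> List[List[bool]]:
    """Symmetric boolean matrix: True where residues i and j lie within `cutoff` Å."""
    n = len(coords)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        # Only pairs at least `min_seq_sep` apart in sequence are considered.
        for j in range(i + min_seq_sep, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                cmap[i][j] = cmap[j][i] = True
    return cmap


def contacts(cmap: List[List[bool]]) -> List[Tuple[int, int]]:
    """List each contact once as an (i, j) pair with i < j."""
    n = len(cmap)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if cmap[i][j]]
```

A map predicted from sequence by coevolution or machine learning would correspond to a sparse subset of the pairs returned by `contacts`.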
The StructureDistiller algorithm quantifies the structural relevance of individual contacts and identifies crucial contacts in protein structures. Using this information, reconstruction performance on a sparse subset of a contact map is shown to improve by 0.4 Å, a substantial gain. The set of the most relevant contacts in a map is also more resilient to false positive contacts: up to 6% false positives are compensated before reconstruction quality drops to that of a naive selection of contacts without any false positives. This information is invaluable for training new structure prediction methods and provides insights into how the robustness and information content of contact maps can be improved.
In the literature, the relevance of two types of residues for in vivo folding has been described. Early folding residues initiate the folding process, whereas highly stable residues prevent spontaneous unfolding events. The structural relevance score proposed in this thesis is employed to characterize both types of residues. Early folding residues form pivotal secondary structure elements, but their structural relevance is average. In contrast, highly stable residues exhibit significantly increased structural relevance. This implies that residues crucial for the folding process are not relevant for structural integrity and vice versa. The position of early folding residues is preserved over the course of evolution, as demonstrated for two ancient regions shared by all aminoacyl-tRNA synthetases. One arrangement of folding initiation sites resembles an ancient and widely distributed structural packing motif and captures how reverberations of the earliest periods of life can still be observed in contemporary protein structures.
Using MapReduce Streaming for Distributed Life Simulation on the Cloud
Distributed software simulations are indispensable in the study of large-scale life models but often require technically complex lower-level distributed computing frameworks, such as MPI. We propose to overcome this complexity by applying the emerging MapReduce (MR) model to distributed life simulations and by running such simulations on the cloud. Technically, we design optimized MR streaming algorithms for discrete and continuous versions of Conway’s Game of Life according to a general MR streaming pattern. We chose life because it is simple enough to serve as a testbed for MR’s applicability to a-life simulations and general enough to make our results applicable to various lattice-based a-life models. We implement our algorithms and empirically evaluate their performance on Amazon’s Elastic MapReduce cloud. Our experiments demonstrate that a single MR optimization technique, strip partitioning, can reduce the execution time of continuous life simulations by 64%. To the best of our knowledge, we are the first to propose and evaluate MR streaming algorithms for lattice-based simulations. Our algorithms can serve as prototypes in the development of novel MR simulation algorithms for large-scale lattice-based a-life models.
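The general shape of an MR streaming algorithm for discrete life can be sketched as a mapper and a reducer that read and write tab-separated key/value lines, as streaming jobs do. This is a minimal, unoptimized sketch under assumed conventions (live cells written as "x,y" lines, an unbounded grid, no strip partitioning); it is not the paper's optimized algorithm, and the function names are hypothetical. A local `step` function stands in for the framework's shuffle-and-sort phase.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator, Set, Tuple

Cell = Tuple[int, int]


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Each live cell marks itself ALIVE and sends a count of 1 to its 8 neighbours."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        x, y = map(int, line.split(","))
        yield f"{x},{y}\tALIVE"
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if dx or dy:
                    yield f"{x + dx},{y + dy}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Input must be sorted by key; applies the B3/S23 rule to each cell."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        alive, count = False, 0
        for _, value in group:
            if value == "ALIVE":
                alive = True
            else:
                count += int(value)
        if count == 3 or (alive and count == 2):
            yield key  # this cell is live in the next generation


def step(live: Set[Cell]) -> Set[Cell]:
    """One generation run locally: map, sort (the shuffle), then reduce."""
    shuffled = sorted(mapper(f"{x},{y}" for x, y in live))
    return {tuple(map(int, key.split(","))) for key in reducer(shuffled)}


if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
    # Streaming mode: read lines from stdin, write key/value lines to stdout.
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

Sorting whole lines is enough to group identical keys here because each key ends with a tab, which no key contains.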
Evolutionary-based methods for predicting genotype-phenotype associations in the mammalian genome
Phenotypic and genotypic variation between species are the result of millions of experiments performed by nature. Understanding why and how phenotypic complexity arises is a central goal of evolutionary biology. Technological advancements enabling whole genome sequencing have laid the foundation for developing comparative genomics-based tools for inferring genetic elements underlying phenotypic adaptations. The work covered as part of this thesis will develop these tools drawing from principles of convergent evolution, aimed at generating specific functional hypotheses that can help focus experimental efforts. These tools will be relevant for characterizing context-specific functions of cis-regulatory elements as well as of protein-coding genes, a large number of which lack functional annotation beyond domain homology. Moving beyond one-dimensional approaches that study proteins in isolation, we propose to build an integrated co-evolutionary framework that will serve as a powerful tool for protein interaction prediction. In this dissertation, we discuss these ideas through the following three projects.
In chapter 1, we perform a genome-wide scan for genes showing convergent rate changes in four subterranean mammals, and study the underlying changes in selective pressure causing these convergent shifts in rate. Using a new variant of our rates-based method, we demonstrate that eye-specific regulatory regions show strong rate accelerations in the subterranean mammals. This study demonstrates the potential of convergent evolution-based tools in the functional annotation of eye-specific genetic elements.
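To make the idea of a convergent rate shift concrete, one simple test asks whether a gene's relative evolutionary rates are higher on the foreground (here, subterranean) branches than on the remaining branches. The sketch below assumes per-branch relative rates are already computed and uses an exact permutation test on the difference in means; this is an illustrative stand-in, not necessarily the statistic used by the rates-based method in the thesis, and all names and numbers are hypothetical.

```python
from itertools import combinations
from statistics import mean
from typing import AbstractSet, Sequence


def acceleration(rates: Sequence[float], foreground: AbstractSet[int]) -> float:
    """Mean foreground rate minus mean background rate (positive = accelerated)."""
    fg = [r for i, r in enumerate(rates) if i in foreground]
    bg = [r for i, r in enumerate(rates) if i not in foreground]
    return mean(fg) - mean(bg)


def exact_permutation_p(rates: Sequence[float], foreground: AbstractSet[int]) -> float:
    """One-sided p-value: fraction of same-size branch sets at least as accelerated."""
    observed = acceleration(rates, foreground)
    k = len(foreground)
    hits = total = 0
    for combo in combinations(range(len(rates)), k):
        total += 1
        if acceleration(rates, set(combo)) >= observed - 1e-12:  # tolerance for float ties
            hits += 1
    return hits / total
```

Exhaustive enumeration is only feasible for small trees, and this test ignores the non-independence of branches on a phylogeny, which a real method would need to address.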
In chapter 2, we build a robust method to infer shifts in rate associated with a wide range of evolutionary scenarios. We investigate the statistical underpinnings of our rates-based framework and identify the best performing variant of our method across real and simulated phylogenetic datasets. We distribute these tools to the research community, enabling large scale generation of specific functional hypotheses for regulatory regions.
In chapter 3, we construct a powerful framework for protein interaction prediction that integrates proteome-wide co-evolutionary signatures. We systematically benchmark the predictions of our co-evolutionary framework against known functional interactions among proteins across various scales. We make the framework's predictions publicly available to aid the functional annotation of less well-characterized genes.
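One widely used way to turn co-evolutionary signatures into interaction predictions is to correlate two proteins' per-branch evolutionary rates across a shared tree: proteins whose rates rise and fall together are candidate functional partners. The sketch assumes rate vectors already aligned branch by branch and uses Pearson correlation; the thesis's framework may integrate signals differently, and the names and values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_partners(query: Sequence[float],
                  candidates: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Rank candidate proteins by rate covariation with `query`, strongest first."""
    scored = [(name, pearson(query, rates)) for name, rates in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

A proteome-wide version would compute this score for every protein pair and then benchmark the top-ranked pairs against known interactions.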
Modelling the structural, functional and phenotypic consequences of protein coding mutations
Proteins are integral to all cellular processes and underpin the function of all extant organisms, meaning variants impacting them are a primary cause of phenotypic variation. Protein coding variants are a key area of study in biology, with relevance from structural and molecular biology to population genetics. They are also medically important, impacting inherited genetic diseases, cancer and response to pathogens. Recent advances in high-throughput experimental techniques have opened the door to many new approaches in biology, and protein variants are no exception. Deep mutational scanning experiments exhaustively measure the fitness of variants in a protein, which gives us more experimentally validated mutational consequence measurements than ever before. Such advances, together with ever larger sequence and structure databases, have created an opportunity to apply large scale analyses to coding variation, studying the effect on protein structure, function and phenotype.
In this thesis I perform three large scale variant analyses. First, I use the consequences of variation to learn about protein structure and function. I compile a dataset from 28 deep mutational scanning studies, covering 6291 positions in 30 proteins, and use the consequences of mutation at each position to define a mutational landscape. I show rich biophysical relationships in this landscape and identify functionally distinct positional subtypes of each amino acid. In the second analysis, I explore genotype to phenotype prediction using a dataset of 1011 S. cerevisiae strains, with genotypes, transcriptomics, proteomics and measured phenotypes, and comprehensive gene deletions in four strains. I show that knowledge-based models of mutational consequences and pathway function can be used to associate genes with phenotypes and predict growth phenotypes across 34 growth conditions. However, genetic background is found to have a large effect on variant consequences, to such an extent that the same deletion can be highly significant in one strain and have no effect in another. Finally, I analyse computational variant effect prediction, benchmarking current predictors using deep mutational scanning data. I then develop a new end-to-end deep convolutional neural network predictor that predicts consequences directly from sequence and structure and show that it improves on current methods. Together these projects advance our knowledge of protein coding variation and enhance our capacity to link variation to impacts on structure, function and phenotype.
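Benchmarking a variant effect predictor against deep mutational scanning data comes down to comparing predicted scores with measured fitness for the same variants. This sketch assumes Spearman rank correlation as the metric (a common choice, but not stated in the abstract) and that each predictor's scores are oriented so that higher means fitter; the names and values are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def benchmark(predictions: Dict[str, Sequence[float]],
              dms_fitness: Sequence[float]) -> Dict[str, float]:
    """Spearman rho of each predictor against the same variants' DMS fitness scores."""
    return {name: spearman(scores, dms_fitness) for name, scores in predictions.items()}
```

Predictors that output a deleteriousness score (higher means more damaging) would need their sign flipped before comparison.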
Multiobjective Optimization of Fuzzy System for Cardiovascular Risk Classification
Since cardiovascular diseases (CVDs) pose a critical global concern, identifying associated risk factors remains a pivotal research focus. This study aims to propose and optimize a fuzzy system for cardiovascular risk (CVR) classification using a multiobjective approach, addressing computational aspects such as the configuration of the fuzzy system, the optimization process, the selection of a suitable solution from the optimal Pareto front, and the interpretability of the fuzzy logic system after optimization. The proposed system uses age, weight, height, sex, and systolic blood pressure to determine cardiovascular risk. The fuzzy model is based on preliminary information from the literature; therefore, to tune the fuzzy logic system with a multiobjective approach, body mass index (BMI) is considered as an additional output, since data are available for this index. BMI is acknowledged as a proxy for cardiovascular risk because these diseases are attributed to surplus adipose tissue, which can elevate blood pressure, cholesterol, and triglyceride levels, leading to arterial and cardiac damage. By employing a multiobjective approach, the study aims to balance the two outputs: cardiovascular risk classification and body mass index. For the multiobjective optimization, a set of experiments is proposed whose result is an optimal Pareto front, from which the appropriate solution is then selected. The results show an adequate optimization of the fuzzy logic system that keeps the fuzzy sets interpretable after optimization. In this way, this paper contributes to the advancement of computational techniques in the medical domain.
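Selecting a solution from a Pareto front can be illustrated with a minimal sketch, assuming each candidate fuzzy system has already been scored by two errors to minimize (one for CVR classification, one for BMI). The non-dominated filter is standard; choosing the point closest to the ideal point after min-max normalization is an assumed selection rule, since the abstract does not state the criterion used. Names and numbers are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Objectives = Tuple[float, ...]  # one error value per objective, lower is better


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: Sequence[Objectives]) -> List[Objectives]:
    """Keep only the non-dominated points, in their original order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def closest_to_ideal(front: Sequence[Objectives]) -> Objectives:
    """Pick the front member nearest the ideal point after min-max normalization."""
    n_obj = len(front[0])
    lo = [min(p[m] for p in front) for m in range(n_obj)]
    hi = [max(p[m] for p in front) for m in range(n_obj)]

    def normalized(p: Objectives) -> List[float]:
        # Scale each objective to [0, 1] so neither error dominates by units alone.
        return [(p[m] - lo[m]) / (hi[m] - lo[m]) if hi[m] > lo[m] else 0.0
                for m in range(n_obj)]

    return min(front, key=lambda p: math.hypot(*normalized(p)))
```

Normalizing first matters because the two errors need not share a scale; without it, the objective with the larger range would dominate the choice.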
Dynamics of Macrosystems; Proceedings of a Workshop, September 3-7, 1984
There is an increasing awareness of the important and pervasive role that instability and random, chaotic motion play in the dynamics of macrosystems. Further research in the field should aim at providing useful tools, and the motivation should therefore come from important questions arising in specific macrosystems. Such systems include biochemical networks, genetic mechanisms, biological communities, neural networks, cognitive processes, and economic structures. This list may seem heterogeneous, but there are similarities between evolution in the different fields, so it is not surprising that mathematical methods devised in one field can also be used to describe the dynamics of another.
IIASA is attempting to make progress in this direction. With this aim in view, the workshop was held at Laxenburg from 3 to 7 September 1984. These Proceedings cover a broad canvas, ranging from specific biological and economic problems to general aspects of dynamical systems and evolutionary theory.
Evolvability and organismal architecture: The blind watchmaker and the reminiscent architect
Organisms are constantly faced with the challenge of adapting to new circumstances. In this thesis, I argue that the ability to adapt to new circumstances, "evolvability", is deeply ingrained in the genetic, developmental, morphological, and physiological architecture of organisms. Using a blend of conceptual research, theoretical modelling, and multidisciplinary studies, I demonstrate how organismal architecture can evolve so that organisms cope increasingly well with future environmental challenges. As a first step, I systematically classify the many factors contributing to evolvability. Then I use a simulation approach to show how evolvability-enhancing structures can readily evolve in gene-regulatory networks. This happens via the evolution of "mutational transformers", structural elements that convert random mutations at the genetic level into adaptation-enhancing mutations at the phenotypic level. In another thesis chapter, I demonstrate that even if selection acts only sporadically, complex adaptations can evolve and persist over long time periods. In other words, complex adaptations do not require constant selection pressure. In an interdisciplinary contribution, I apply biological insights regarding the properties of an evolvability-enhancing mutation structure to the design of algorithms used in Artificial Intelligence. The result is the "Facilitated Mutation" method, which enhances the performance of the algorithms in various respects, highlighting the potential for leveraging biological principles in computational sciences. Finally, I embed my research findings in a philosophical context. I emphasise the importance of organismal architecture in retaining evolutionary memories and suggest future research directions to further enhance our understanding of evolvability.