
    Soft Computing Techniques for the Protein Folding Problem on High Performance Computing Architectures

    The protein-folding problem has been studied extensively over the last fifty years. Understanding the dynamics of a protein's global shape and its influence on biological function can help us discover new and more effective drugs for diseases of pharmacological relevance. Different computational approaches have been developed to predict the three-dimensional arrangement of a protein's atoms from its sequence. However, the computational complexity of this problem makes the search for new models, novel algorithmic strategies and hardware platforms that provide solutions in a reasonable time frame mandatory. In this review we survey past and current trends in protein folding simulation from both the hardware and the software perspective. Of particular interest to us are the use of inexact solutions to this computationally hard problem and the hardware platforms on which such Soft Computing techniques have been run. This work is jointly supported by the Fundación Séneca (Agencia Regional de Ciencia y Tecnología, Región de Murcia) under grants 15290/PI/2010 and 18946/JLI/13, by the Spanish MEC and the European Commission FEDER under grants TEC2012-37945-C02-02 and TIN2012-31345, and by the Nils Coordinated Mobility programme under grant 012-ABEL-CM-2014A, in part financed by the European Regional Development Fund (ERDF). We also thank NVIDIA for hardware donations within the UCAM GPU educational and research centers.

    A self-adaptive multimeme memetic algorithm co-evolving utility scores to control genetic operators and their parameter settings

    Memetic algorithms (MAs) are a class of well-studied metaheuristics which combine evolutionary algorithms and local search techniques. A meme represents a contagious piece of information in an adaptive information-sharing system. The canonical memetic algorithm uses a fixed meme, denoting a hill-climbing operator, to improve each solution in a population during the evolutionary search process. Given global parameters and multiple parametrised operators, adaptation often becomes a crucial constituent in the design of MAs. In this study, a self-adaptive, self-configuring steady-state multimeme memetic algorithm (SSMMA) variant is proposed. Along with the individuals (solutions), SSMMA co-evolves memes, encoding the utility score for each algorithmic component choice and relevant parameter setting option. An individual uses tournament selection to decide which operator and parameter setting to employ at a given step. The performance of the proposed algorithm is evaluated on six combinatorial optimisation problems from a cross-domain heuristic search benchmark. The results indicate the success of SSMMA when compared to static MAs as well as to a widely used self-adaptive multimeme memetic algorithm from the scientific literature.
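    As a loose illustration of the core idea (not the authors' SSMMA), the sketch below attaches a meme to an individual that stores one utility score per variation operator, and uses tournament selection over those scores to decide which operator to apply at each step. The operator pool, the reward and decay rules, and the one-max toy fitness are assumptions made purely for this example.

```python
import random

# Hypothetical operator pool; the real SSMMA memes also cover parameter settings.
def bit_flip(s):
    return [b ^ 1 if random.random() < 1.0 / len(s) else b for b in s]

def swap(s):
    s = s[:]
    i, j = random.sample(range(len(s)), 2)
    s[i], s[j] = s[j], s[i]
    return s

def inversion(s):
    s = s[:]
    i, j = sorted(random.sample(range(len(s)), 2))
    s[i:j + 1] = reversed(s[i:j + 1])
    return s

OPERATORS = {"bit_flip": bit_flip, "swap": swap, "inversion": inversion}

def choose_operator(meme, tournament_size=2):
    """Tournament selection over the meme's utility scores."""
    candidates = random.sample(list(meme), tournament_size)
    return max(candidates, key=meme.get)

def step(individual, meme, fitness):
    """Apply the selected operator; reward its utility score if the child improves."""
    name = choose_operator(meme)
    child = OPERATORS[name](individual)
    if fitness(child) > fitness(individual):
        meme[name] += 1.0           # reinforce the operator that produced an improvement
        return child
    meme[name] *= 0.95              # gently decay the score of an unhelpful choice
    return individual

# Usage on a toy one-max problem: maximise the number of ones in a bit string.
random.seed(0)
solution = [random.randint(0, 1) for _ in range(20)]
meme = {name: 1.0 for name in OPERATORS}
for _ in range(300):
    solution = step(solution, meme, fitness=sum)
print(sum(solution), meme)
```

    On this toy problem only bit_flip can change the fitness, so its utility score grows while the others decay, which is exactly the adaptive operator-selection behaviour the meme is meant to capture.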

    Cooperative Metaheuristics for Exploring Proteomic Data

    Most combinatorial optimization problems cannot be solved exactly. A class of methods, called metaheuristics, has proved its efficiency to give good approximated solutions in a reasonable time. Cooperative metaheuristics are a sub-set of metaheuristics, which implies a parallel exploration of the search space by several entities with information exchange between them. The importance of information exchange in the optimization process is related to the building block hypothesis of evolutionary algorithms, which is based on these two questions: what is the pertinent information of a given potential solution, and how can this information be shared? A classification of cooperative metaheuristic methods depending on the nature of the cooperation involved is presented, and the specific properties of each class, as well as a way to combine them, are discussed. Several improvements in the field of metaheuristics are also given. In particular, a method to regulate the use of classical genetic operators and to define new, more pertinent ones is proposed, taking advantage of a building-block-structured representation of the explored space. A hierarchical approach resting on multiple levels of cooperative metaheuristics is finally presented, leading to the definition of a complete concerted cooperation strategy. Some applications of these concepts to difficult proteomics problems, including automatic protein identification, biological motif inference and multiple sequence alignment, are presented. For each application, an innovative method based on the cooperation concept is given and compared with classical approaches. In the protein identification problem, a first level of cooperation using swarm intelligence is applied to the comparison of mass spectrometric data with a biological sequence database, followed by a genetic programming method to discover an optimal scoring function. The multiple sequence alignment problem is decomposed into three steps involving several evolutionary processes to infer different kinds of biological motifs and a concerted cooperation strategy to build the sequence alignment according to their motif content.
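    As one minimal, hypothetical reading of the cooperation idea, the sketch below uses an island model: several populations evolve independently and periodically exchange their best individuals. The one-max problem, the ring topology and the migration interval are placeholders, not the methods applied to the proteomics problems above.

```python
import random

def evolve(pop, fitness, mut_rate=0.05):
    """One generation of a very simple (mu + mu) evolutionary step."""
    def mutate(ind):
        return [b ^ 1 if random.random() < mut_rate else b for b in ind]
    children = [mutate(random.choice(pop)) for _ in pop]
    merged = sorted(pop + children, key=fitness, reverse=True)
    return merged[:len(pop)]

def migrate(islands, fitness):
    """Ring migration: each island receives the best solution of its neighbour."""
    bests = [max(pop, key=fitness) for pop in islands]
    for i, pop in enumerate(islands):
        incoming = bests[(i - 1) % len(islands)]
        worst = min(range(len(pop)), key=lambda k: fitness(pop[k]))
        pop[worst] = incoming[:]

# Usage: four cooperating islands solving one-max.
random.seed(1)
N, POP = 40, 10
islands = [[[random.randint(0, 1) for _ in range(N)] for _ in range(POP)]
           for _ in range(4)]
for gen in range(100):
    islands = [evolve(pop, fitness=sum) for pop in islands]
    if gen % 10 == 0:                 # information exchange every 10 generations
        migrate(islands, fitness=sum)
print(max(sum(ind) for pop in islands for ind in pop))
```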

    The inference of gene trees with species trees

    Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can co-exist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice-versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. In this article we review the various models that have been used to describe the relationship between gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none currently exists that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree-species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a better basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree-species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution.
    Comment: Review article in relation to the "Mathematical and Computational Evolutionary Biology" conference, Montpellier, 201

    Optimización de algoritmos bioinspirados en sistemas heterogéneos CPU-GPU.

    The scientific challenges of the 21st century call for the processing and analysis of a huge amount of information in what is known as the Big Data era. Future advances in different sectors of society, such as medicine, engineering or efficient energy production, to mention only a few examples, depend on the continued growth in the computing power of modern computers. However, this computational growth, traditionally guided by the well-known "Moore's Law", has been compromised in recent decades, mainly because of the physical limitations of silicon. Computer architects have produced numerous contributions (multicore, manycore, heterogeneity, dark silicon, etc.) to try to mitigate this slowdown, leaving in the background other factors that are fundamental to problem solving, such as programmability, reliability and precision. Software development, however, has followed a completely opposite path, in which ease of programming through abstraction models, automatic code debugging to avoid undesired effects, and deployment to production are key to the economic viability and efficiency of the digital business sector. This path often compromises the performance of the applications themselves, a consequence that is entirely unacceptable in a scientific context. The starting hypothesis of this doctoral thesis is to narrow the gap between the hardware and software fields in order to help solve the scientific challenges of the 21st century. Hardware development is marked by the consolidation of processors oriented towards massive data parallelism, mainly GPUs (Graphics Processing Units) and vector processors, which are combined to build heterogeneous processors or computers (HSA). Specifically, we focus on the use of GPUs to accelerate scientific applications. GPUs have established themselves as one of the platforms with the greatest potential for implementing algorithms that simulate complex scientific problems. Since their birth, the trajectory and history of graphics cards have been shaped by the world of video games, reaching very high levels of popularity as ever more realism was achieved in this area. An important milestone occurred in 2006, when NVIDIA (the leading graphics card manufacturer) carved out a place in high-performance computing and research with the development of CUDA (Compute Unified Device Architecture). This architecture makes it possible to use the GPU for the versatile development of scientific applications. Despite the importance of the GPU, the improvement that can be obtained by using it together with the CPU is of particular interest, which leads us to heterogeneous systems, as the title of this work indicates. It is in heterogeneous CPU-GPU environments where performance peaks, since it is not only the GPUs that support the researchers' scientific computing: it is in a heterogeneous system, combining different types of processors, that the highest performance can be reached. In such an environment the processors do not compete with one another; on the contrary, each architecture specialises in the part of the work where it can best exploit its capabilities.
The highest performance is reached in heterogeneous clusters, in which multiple nodes are interconnected and may differ not only in their CPU-GPU architectures but also in the computational capabilities within those architectures. With this kind of scenario in mind, new challenges arise in getting the chosen candidate software to run as efficiently as possible and to obtain the best possible results. These new platforms require a redesign of the software in order to make the most of the available computational resources. Existing algorithms must therefore be redesigned and optimised so that contributions in this field remain relevant, and we must find algorithms that, by their very nature, are candidates for optimal execution on such high-performance platforms. At this point we find a family of algorithms, known as bio-inspired algorithms, that use collective intelligence as the core of their problem solving. It is precisely this collective intelligence that makes them perfect candidates for implementation on these platforms under the new parallel computing paradigm, since solutions can be built from individuals that, through some form of communication, are able to construct a common solution together. This thesis focuses in particular on one of these bio-inspired algorithms, which falls under the term metaheuristics within the Soft Computing paradigm: Ant Colony Optimization (ACO). The algorithm is contextualised, studied and analysed; its most critical parts are identified and redesigned with optimisation and parallelisation in mind, while maintaining or improving the quality of its solutions. The resulting alternatives are then implemented and tested on several high-performance platforms, and the knowledge acquired in this theoretical and practical study is applied to real cases, in particular to protein folding. In this work we bring together the new high-performance hardware platforms and the software redesign and implementation of a bio-inspired algorithm applied to a scientific problem of great complexity, protein folding. When implementing a solution to a real problem it is necessary to carry out a prior study that allows an in-depth understanding of the problem, since anyone new to the field will encounter new terminology and issues; in this case, amino acids, molecules and simulation models that are unfamiliar to anyone without a biomedical background.
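    As a rough, purely illustrative sketch of the kind of algorithm the thesis studies (not its GPU implementation), the Python fragment below applies a basic Ant Colony Optimization loop to protein folding on the simplified 2D HP lattice model. The sequence, colony size, evaporation rate and reward rule are all assumptions made for this example, and the code runs sequentially on the CPU.

```python
import random

# Toy HP sequence (H = hydrophobic, P = polar); a made-up instance, not a benchmark.
SEQ = "HPHPPHHPHPPHPHHPPHPH"
MOVES = {"L": 1, "F": 0, "R": -1}           # relative turns applied to the current heading
DIRS = [(1, 0), (0, 1), (-1, 0), (0, -1)]   # headings on the square lattice

def fold(turns):
    """Turn a list of relative moves into lattice coordinates; None if the walk collides."""
    pos, heading = (0, 0), 0
    coords = [pos]
    for t in turns:
        heading = (heading + MOVES[t]) % 4
        pos = (pos[0] + DIRS[heading][0], pos[1] + DIRS[heading][1])
        if pos in coords:
            return None
        coords.append(pos)
    return coords

def energy(coords):
    """Negative count of non-consecutive H-H contacts (lower is better)."""
    e = 0
    for i in range(len(coords)):
        for j in range(i + 3, len(coords)):
            if SEQ[i] == "H" and SEQ[j] == "H" and \
               abs(coords[i][0] - coords[j][0]) + abs(coords[i][1] - coords[j][1]) == 1:
                e -= 1
    return e

def construct(pheromone):
    """One ant samples a conformation move by move, retrying if the walk self-collides."""
    for _ in range(50):
        turns = [random.choices(list(p), weights=list(p.values()))[0] for p in pheromone]
        coords = fold(turns)
        if coords is not None:
            return turns, coords
    return None, None

# Basic ACO loop: evaporation plus reinforcement of the best conformation found so far.
random.seed(2)
pheromone = [{m: 1.0 for m in MOVES} for _ in range(len(SEQ) - 1)]
best_turns, best_energy = None, 0
for iteration in range(200):
    for ant in range(10):
        turns, coords = construct(pheromone)
        if coords is None:
            continue
        e = energy(coords)
        if e < best_energy:
            best_turns, best_energy = turns, e
    for trail in pheromone:
        for m in trail:
            trail[m] *= 0.95                      # evaporation
    if best_turns is not None:
        for step, m in enumerate(best_turns):
            pheromone[step][m] += 1.0             # reinforce the best fold found so far
print("best energy found:", best_energy)
```

    The independence of the ants in the inner loop is what makes this family of algorithms a natural fit for GPU parallelisation: each ant's construction and energy evaluation can, in principle, run in a separate thread.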

    Evolution of molecular innovations in cyanobacterial light-perceiving systems

    Novel functional features are prominent throughout evolution, yet paradoxical: how does evolution create something innovative when all it can work with is variation of established biology? Neo-functionalization of proteins after gene duplication is one common explanation, driven by natural selection for a new, potentially innovative function. However, groundbreaking novelty may not be explained by adaptive diversification of existing proteins. In this thesis, we tackled the paradox of molecular innovation with molecular phylogenetics in two original research publications. The first article examined the evolution of cyanobacteriochromes (CBCRs), a class of phytochromes found exclusively in cyanobacteria. CBCRs gained the innovative ability to collectively sense the entire spectrum of visible light with a single-domain protein, in contrast to canonical tri-domain phytochromes that respond primarily to red- and far-red light signals. Using ancestral sequence reconstruction (ASR) and biochemical verification of resurrected proteins, we showed that the last common ancestor of CBCRs responded reversibly to green- and red-light signals. Latent blue-light perception and the ability to bind alternative chromophores, coupled with the minimalistic domain architecture, may have enabled the vast diversification of CBCRs. This indicates that molecular innovation can potentially be achieved by reducing protein complexity, which may open up sequence space for new functions, such as broader color perception. The second article focused on the evolution of a novel allosteric regulation in cyanobacterial photoprotection by direct protein-protein interaction. It is unclear whether such required protein surface compatibilities can only be built by selection in small incremental steps, or whether they can also emerge fortuitously. Here, we used ASR and biophysical protein characterization to retrace the evolution of the allosteric interaction between the orange carotenoid protein (OCP) and its unrelated regulator, the fluorescence recovery protein (FRP). This interaction evolved when a precursor of FRP was horizontally acquired by cyanobacteria. FRP's precursors could already interact with and regulate OCP even before these proteins first encountered each other in an ancestral cyanobacterium. The OCP–FRP interaction exploits an ancient dimer interface in OCP, which also predates the recruitment of FRP into the photoprotection system. This shows how evolution can easily fashion complex regulatory systems from pre-existing components, even without prior gene duplication. Together, we have shown that chance events may play an underestimated role in protein evolution and can indeed lead to groundbreaking innovations in biology.

    Communities of Local Optima as Funnels in Fitness Landscapes

    We conduct an analysis of local optima networks extracted from fitness landscapes of the Kauffman NK model under iterated local search. Applying the Markov Cluster Algorithm for community detection to the local optima networks, we find that the landscapes consist of multiple clusters. This result complements recent findings in the literature that landscapes often decompose into multiple funnels, which increases their difficulty for iterated local search. Our results suggest that the number of clusters and the size of the cluster in which the global optimum is located are correlated with the search difficulty of landscapes. We conclude that clusters found by community detection in local optima networks offer a new way to characterize the multi-funnel structure of fitness landscapes.
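    The sketch below is a hedged, scaled-down version of such a pipeline: a small NK landscape, a best-improvement hill climber that maps random starts to local optima, escape edges sampled by perturbation as in iterated local search, and community detection over the resulting local optima network. NetworkX's greedy modularity routine stands in for the Markov Cluster Algorithm used in the paper, and the values of N, K, the perturbation strength and the sample size are arbitrary choices for illustration.

```python
import random
import networkx as nx
from networkx.algorithms import community   # requires the third-party networkx package

random.seed(3)
N, K = 10, 3

# Random NK structure: each bit's contribution depends on itself and K other bits.
neigh = [sorted(random.sample([j for j in range(N) if j != i], K)) for i in range(N)]
table = [{} for _ in range(N)]              # contributions generated lazily, then memoised

def fitness(s):
    total = 0.0
    for i in range(N):
        key = (s[i],) + tuple(s[j] for j in neigh[i])
        if key not in table[i]:
            table[i][key] = random.random()
        total += table[i][key]
    return total / N

def hill_climb(s):
    """Best-improvement local search to the nearest local optimum."""
    while True:
        nbrs = [s[:i] + (1 - s[i],) + s[i + 1:] for i in range(N)]
        best = max(nbrs, key=fitness)
        if fitness(best) <= fitness(s):
            return s
        s = best

def perturb(s, strength=2):
    idx = set(random.sample(range(N), strength))
    return tuple(1 - b if i in idx else b for i, b in enumerate(s))

# Sample escape edges between local optima, as in iterated local search.
G = nx.DiGraph()
for _ in range(300):
    start = hill_climb(tuple(random.randint(0, 1) for _ in range(N)))
    G.add_node(start)
    target = hill_climb(perturb(start))
    if start != target:
        G.add_edge(start, target)

clusters = community.greedy_modularity_communities(G.to_undirected())
best = max(G.nodes, key=fitness)
print(f"{G.number_of_nodes()} local optima, {len(clusters)} clusters")
print("best local optimum found sits in a cluster of size",
      next(len(c) for c in clusters if best in c))
```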

    Red Queen Coevolution on Fitness Landscapes

    Species do not merely evolve, they also coevolve with other organisms. Coevolution is a major force driving interacting species to continuously evolve, exploring their fitness landscapes. Coevolution involves the coupling of species fitness landscapes, linking species genetic changes with their inter-specific ecological interactions. Here we first introduce the Red Queen hypothesis of evolution, commenting on some theoretical aspects and empirical evidence. As an introduction to the fitness landscape concept, we review key issues on evolution on simple and rugged fitness landscapes. Then we present key modeling examples of coevolution on different fitness landscapes at different scales, from RNA viruses to complex ecosystems and macroevolution.
    Comment: 40 pages, 12 figures. To appear in "Recent Advances in the Theory and Application of Fitness Landscapes" (H. Richter and A. Engelbrecht, eds.). Springer Series in Emergence, Complexity, and Computation, 201
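    To make the notion of coupled fitness landscapes concrete, here is a small sketch in the spirit of Kauffman's NKC model of coupled landscapes: each locus of one species draws its fitness contribution from K loci of its own genome and from C loci of the partner's genome, so every adaptive move by one species deforms the other's landscape and both keep climbing without settling. The parameter values and the alternating one-bit adaptive walk are illustrative assumptions, not the chapter's models.

```python
import random

random.seed(4)
N, K, C = 8, 2, 3   # genome length, intra-species epistasis, inter-species coupling

def make_species():
    """Random dependency structure and memoised contribution table for one species."""
    own = [random.sample([j for j in range(N) if j != i], K) for i in range(N)]
    other = [random.sample(range(N), C) for _ in range(N)]
    return {"own": own, "other": other, "table": [{} for _ in range(N)]}

def fitness(sp, genome, partner_genome):
    total = 0.0
    for i in range(N):
        key = (genome[i],
               tuple(genome[j] for j in sp["own"][i]),
               tuple(partner_genome[j] for j in sp["other"][i]))
        if key not in sp["table"][i]:
            sp["table"][i][key] = random.random()
        total += sp["table"][i][key]
    return total / N

def adaptive_step(sp, genome, partner_genome):
    """Accept a random one-bit mutation if it improves fitness on the current landscape."""
    i = random.randrange(N)
    mutant = genome[:i] + (1 - genome[i],) + genome[i + 1:]
    if fitness(sp, mutant, partner_genome) > fitness(sp, genome, partner_genome):
        return mutant
    return genome

# Two species take turns adapting; each move by one reshapes the other's landscape.
A, B = make_species(), make_species()
ga = tuple(random.randint(0, 1) for _ in range(N))
gb = tuple(random.randint(0, 1) for _ in range(N))
for step in range(2000):
    ga = adaptive_step(A, ga, gb)
    gb = adaptive_step(B, gb, ga)
print("final fitnesses:", round(fitness(A, ga, gb), 3), round(fitness(B, gb, ga), 3))
```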