12 research outputs found

    On the role of metaheuristic optimization in bioinformatics

    Get PDF
    Metaheuristic algorithms are employed to solve complex and large-scale optimization problems in many different fields, from transportation and smart cities to finance. This paper discusses how metaheuristic algorithms are being applied to solve different optimization problems in the area of bioinformatics. While the text provides references to many optimization problems in the area, it focuses on those that have attracted more interest from the optimization community. Among the problems analyzed, the paper discusses in more detail the molecular docking problem, the protein structure prediction, phylogenetic inference, and different string problems. In addition, references to other relevant optimization problems are also given, including those related to medical imaging or gene selection for classification. From the previous analysis, the paper generates insights on research opportunities for the Operations Research and Computer Science communities in the field of bioinformatics

    Parallel Multi-Objective Evolutionary Algorithms: A Comprehensive Survey

    Get PDF
    Multi-Objective Evolutionary Algorithms (MOEAs) are powerful search techniques that have been extensively used to solve difficult problems in a wide variety of disciplines. However, they can be very demanding in terms of computational resources. Parallel implementations of MOEAs (pMOEAs) provide considerable gains regarding performance and scalability and, therefore, their relevance in tackling computationally expensive applications. This paper presents a survey of pMOEAs, describing a refined taxonomy, an up-to-date review of methods and the key contributions to the field. Furthermore, some of the open questions that require further research are also briefly discussed

    Computational Design and Experimental Validation of Functional Ribonucleic Acid Nanostructures

    Get PDF
    In living cells, two major classes of ribonucleic acid (RNA) molecules can be found. The first class called the messenger RNA (mRNA) contains the genetic information that allows the ribosome to read and translate it into proteins. The second class called non-coding RNA (ncRNA), do not code for proteins and are involved with key cellular processes, such as gene expression regulation, splicing, differentiation, and development. NcRNAs fold into an ensemble of thermodynamically stable secondary structures, which will eventually lead the molecule to fold into a specific 3D structure. It is widely known that ncRNAs carry their functions via their 3D structures as well as their molecular composition. The secondary structure of ncRNAs is composed of different types of structural elements (motifs) such as stacking base pairs, internal loops, hairpin loops and pseudoknots. Pseudoknots are specifically difficult to model, are abundant in nature and known to stabilize the functional form of the molecule. Due to the diverse range of functions of ncRNAs, their computational design and analysis have numerous applications in nano-technology, therapeutics, synthetic biology, and materials engineering. The RNA design problem is to find novel RNA sequences that are predicted to fold into target structure(s) while satisfying specific qualitative characteristics and constraints. RNA design can be modeled as a combinatorial optimization problem (COP) and is known to be computationally challenging or more precisely NP-hard. Numerous algorithms to solve the RNA design problem have been developed over the past two decades, however mostly ignore pseudoknots and therefore limit application to only a slice of real-world modeling and design problems. Moreover, the few existing pseudoknot designer methods which were developed only recently, do not provide any evidence about the applicability of their proposed design methodology in biological contexts. The two objectives of this thesis are set to address these two shortcomings. First, we are interested in developing an efficient computational method for the design of RNA secondary structures including pseudoknots that show significantly improved in-silico quality characteristics than the state of the art. Second, we are interested in showing the real-world worthiness of the proposed method by validating it experimentally. More precisely, our aim is to design instances of certain types of RNA enzymes (i.e. ribozymes) and demonstrate that they are functionally active. This would likely only happen if their predicted folding matched their actual folding in the in-vitro experiments. In this thesis, we present four contributions. First, we propose a novel adaptive defect weighted sampling algorithm to efficiently solve the RNA secondary structure design problem where pseudoknots are included. We compare the performance of our design algorithm with the state of the art and show that our method generates molecules that are thermodynamically more stable and less defective than those generated by state of the art methods. Moreover, we show when the effect of fitness evaluation is decoupled from the search and optimization process, our optimization method converges faster than the non-dominated sorting genetic algorithm (NSGA II) and the ant colony optimization (ACO) algorithm do. Second, we use our algorithmic development to implement an RNA design pipeline called Enzymer and make it available as an open source package useful for wet lab practitioners and RNA bioinformaticians. Enzymer uses multiple sequence alignment (MSA) data to generate initial design templates for further optimization. Our design pipeline can then be used to re-engineer naturally occurring RNA enzymes such as ribozymes and riboswitches. Our first and second contributions are published in the RNA section of the Journal of Frontiers in Genetics. Third, we use Enzymer to reengineer three different species of pseudoknotted ribozymes: a hammerhead ribozyme from the mouse gut metagenome, a hammerhead ribozyme from Yarrowia lipolytica and a glmS ribozyme from Thermoanaerobacter tengcogensis. We designed a total of 18 ribozyme sequences and showed the 16 of them were active in-vitro. Our experimental results have been submitted to the RNA journal and strongly suggest that Enzymer is a reliable tool to design pseudoknotted ncRNAs with desired secondary structure. Finally, we propose a novel architecture for a new ribozyme-based gene regulatory network where a hammerhead ribozyme modulates expression of a reporter gene when an external stimulus IPTG is present. Our in-vivo results show expected results in 7 out of 12 cases

    New evolutionary approaches to protein structure prediction

    Get PDF
    Programa de doctorado en Biotecnología y Tecnología QuímicaThe problem of Protein Structure Prediction (PSP) is one of the principal topics in Bioinformatics. Multiple approaches have been developed in order to predict the protein structure of a protein. Determining the three dimensional structure of proteins is necessary to understand the functions of molecular protein level. An useful, and commonly used, representation for protein 3D structure is the protein contact map, which represents binary proximities (contact or non-contact) between each pair of amino acids of a protein. This thesis work, includes a compilation of the soft computing techniques for the protein structure prediction problem (secondary and tertiary structures). A novel evolutionary secondary structure predictor is also widely described in this work. Results obtained confirm the validity of our proposal. Furthermore, we also propose a multi-objective evolutionary approach for contact map prediction based on physico-chemical properties of amino acids. The evolutionary algorithm produces a set of decision rules that identifies contacts between amino acids. The rules obtained by the algorithm impose a set of conditions based on amino acid properties in order to predict contacts. Results obtained by our approach on four different protein data sets are also presented. Finally, a statistical study was performed to extract valid conclusions from the set of prediction rules generated by our algorithm.Universidad Pablo de Olavide. Centro de Estudios de Postgrad

    Implementing decision tree-based algorithms in medical diagnostic decision support systems

    Get PDF
    As a branch of healthcare, medical diagnosis can be defined as finding the disease based on the signs and symptoms of the patient. To this end, the required information is gathered from different sources like physical examination, medical history and general information of the patient. Development of smart classification models for medical diagnosis is of great interest amongst the researchers. This is mainly owing to the fact that the machine learning and data mining algorithms are capable of detecting the hidden trends between features of a database. Hence, classifying the medical datasets using smart techniques paves the way to design more efficient medical diagnostic decision support systems. Several databases have been provided in the literature to investigate different aspects of diseases. As an alternative to the available diagnosis tools/methods, this research involves machine learning algorithms called Classification and Regression Tree (CART), Random Forest (RF) and Extremely Randomized Trees or Extra Trees (ET) for the development of classification models that can be implemented in computer-aided diagnosis systems. As a decision tree (DT), CART is fast to create, and it applies to both the quantitative and qualitative data. For classification problems, RF and ET employ a number of weak learners like CART to develop models for classification tasks. We employed Wisconsin Breast Cancer Database (WBCD), Z-Alizadeh Sani dataset for coronary artery disease (CAD) and the databanks gathered in Ghaem Hospital’s dermatology clinic for the response of patients having common and/or plantar warts to the cryotherapy and/or immunotherapy methods. To classify the breast cancer type based on the WBCD, the RF and ET methods were employed. It was found that the developed RF and ET models forecast the WBCD type with 100% accuracy in all cases. To choose the proper treatment approach for warts as well as the CAD diagnosis, the CART methodology was employed. The findings of the error analysis revealed that the proposed CART models for the applications of interest attain the highest precision and no literature model can rival it. The outcome of this study supports the idea that methods like CART, RF and ET not only improve the diagnosis precision, but also reduce the time and expense needed to reach a diagnosis. However, since these strategies are highly sensitive to the quality and quantity of the introduced data, more extensive databases with a greater number of independent parameters might be required for further practical implications of the developed models

    Innovative hybrid MOEA/AD variants for solving multi-objective combinatorial optimization problems

    Get PDF
    Orientador : Aurora Trinidad Ramirez PozoCoorientador : Roberto SantanaTese (doutorado) - Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defesa: Curitiba, 16/12/2016Inclui referências : f. 103-116Resumo: Muitos problemas do mundo real podem ser representados como um problema de otimização combinatória. Muitas vezes, estes problemas são caracterizados pelo grande número de variáveis e pela presença de múltiplos objetivos a serem otimizados ao mesmo tempo. Muitas vezes estes problemas são difíceis de serem resolvidos de forma ótima. Suas resoluções tem sido considerada um desafio nas últimas décadas. Os algoritimos metaheurísticos visam encontrar uma aproximação aceitável do ótimo em um tempo computacional razoável. Os algoritmos metaheurísticos continuam sendo um foco de pesquisa científica, recebendo uma atenção crescente pela comunidade. Uma das têndencias neste cenário é a arbordagem híbrida, na qual diferentes métodos e conceitos são combinados objetivando propor metaheurísticas mais eficientes. Nesta tese, nós propomos algoritmos metaheurísticos híbridos para a solução de problemas combinatoriais multiobjetivo. Os principais ingredientes das nossas propostas são: (i) o algoritmo evolutivo multiobjetivo baseado em decomposição (MOEA/D framework), (ii) a otimização por colônias de formigas e (iii) e os algoritmos de estimação de distribuição. Em nossos frameworks, além dos operadores genéticos tradicionais, podemos instanciar diferentes modelos como mecanismo de reprodução dos algoritmos. Além disso, nós introduzimos alguns componentes nos frameworks objetivando balancear a convergência e a diversidade durante a busca. Nossos esforços foram direcionados para a resolução de problemas considerados difíceis na literatura. São eles: a programação quadrática binária sem restrições multiobjetivo, o problema de programação flow-shop permutacional multiobjetivo, e também os problemas caracterizados como deceptivos. Por meio de estudos experimentais, mostramos que as abordagens propostas são capazes de superar os resultados do estado-da-arte em grande parte dos casos considerados. Mostramos que as diretrizes do MOEA/D hibridizadas com outras metaheurísticas é uma estratégia promissora para a solução de problemas combinatoriais multiobjetivo. Palavras-chave: metaheuristicas, otimização multiobjetivo, problemas combinatoriais, MOEA/D, otimização por colônia de formigas, algoritmos de estimação de distribuição, programação quadrática binária sem restrições multiobjetivo, problema de programação flow-shop permutacional multiobjetivo, abordagens híbridas.Abstract: Several real-world problems can be stated as a combinatorial optimization problem. Very often, they are characterized by the large number of variables and the presence of multiple conflicting objectives to be optimized at the same time. These kind of problems are, usually, hard to be solved optimally, and their solutions have been considered a challenge for a long time. Metaheuristic algorithms aim at finding an acceptable approximation to the optimal solution in a reasonable computational time. The research on metaheuristics remains an attractive area and receives growing attention. One of the trends in this scenario are the hybrid approaches, in which different methods and concepts are combined aiming to propose more efficient approaches. In this thesis, we have proposed hybrid metaheuristic algorithms for solving multi-objective combinatorial optimization problems. Our proposals are based on (i) the multi-objective evolutionary algorithm based on decomposition (MOEA/D framework), (ii) the bio-inspired metaheuristic ant colony optimization, and (iii) the probabilistic models from the estimation of distribution algorithms. Our algorithms are considered MOEA/D variants. In our MOEA/D variants, besides the traditional genetic operators, we can instantiate different models as the variation step (reproduction). Moreover, we include some design modifications into the frameworks to control the convergence and the diversity during their search (evolution). We have addressed some important problems from the literature, e.g., the multi-objective unconstrained binary quadratic programming, the multiobjective permutation flowshop scheduling problem, and the problems characterized by deception. As a result, we show that our proposed frameworks are able to solve these problems efficiently by outperforming the state-of-the-art approaches in most of the cases considered. We show that the MOEA/D guidelines hybridized to other metaheuristic components and concepts is a powerful strategy for solving multi-objective combinatorial optimization problems. Keywords: meta-heuristics, multi-objective optimization, combinatorial problems, MOEA/D, ant colony optimization, estimation of distribution algorithms, unconstrained binary quadratic programming, permutation flowshop scheduling problem, hybrid approaches

    Using MapReduce Streaming for Distributed Life Simulation on the Cloud

    Get PDF
    Distributed software simulations are indispensable in the study of large-scale life models but often require the use of technically complex lower-level distributed computing frameworks, such as MPI. We propose to overcome the complexity challenge by applying the emerging MapReduce (MR) model to distributed life simulations and by running such simulations on the cloud. Technically, we design optimized MR streaming algorithms for discrete and continuous versions of Conway’s life according to a general MR streaming pattern. We chose life because it is simple enough as a testbed for MR’s applicability to a-life simulations and general enough to make our results applicable to various lattice-based a-life models. We implement and empirically evaluate our algorithms’ performance on Amazon’s Elastic MR cloud. Our experiments demonstrate that a single MR optimization technique called strip partitioning can reduce the execution time of continuous life simulations by 64%. To the best of our knowledge, we are the first to propose and evaluate MR streaming algorithms for lattice-based simulations. Our algorithms can serve as prototypes in the development of novel MR simulation algorithms for large-scale lattice-based a-life models.https://digitalcommons.chapman.edu/scs_books/1014/thumbnail.jp
    corecore