797 research outputs found

    On the role of metaheuristic optimization in bioinformatics

    Metaheuristic algorithms are employed to solve complex and large-scale optimization problems in many different fields, from transportation and smart cities to finance. This paper discusses how metaheuristic algorithms are being applied to solve different optimization problems in bioinformatics. While the text provides references to many optimization problems in the area, it focuses on those that have attracted the most interest from the optimization community. Among the problems analyzed, the paper discusses in more detail the molecular docking problem, protein structure prediction, phylogenetic inference, and different string problems. References to other relevant optimization problems are also given, including those related to medical imaging and gene selection for classification. From this analysis, the paper derives insights into research opportunities for the Operations Research and Computer Science communities in the field of bioinformatics.
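    The abstract lists string problems among the optimization problems tackled with metaheuristics. As a rough illustration, the sketch below applies simulated annealing to the closest string problem (find a string that minimises the maximum Hamming distance to a set of equal-length sequences). The choice of problem, the single-substitution move, the cooling schedule and all parameter values are assumptions for illustration, not taken from the paper.

```python
import math
import random


def max_hamming(candidate, strings):
    """Objective: largest Hamming distance from candidate to any input string."""
    return max(sum(a != b for a, b in zip(candidate, s)) for s in strings)


def closest_string_sa(strings, alphabet="ACGT", steps=5000, t0=2.0,
                      cooling=0.999, seed=0):
    """Simulated-annealing sketch for the closest string problem.

    Assumes all input strings have the same length.
    Returns (best_string, best_cost).
    """
    rng = random.Random(seed)
    length = len(strings[0])
    current = list(strings[0])            # start from one of the inputs
    current_cost = max_hamming(current, strings)
    best, best_cost = current[:], current_cost
    temperature = t0
    for _ in range(steps):
        # Neighbourhood move: substitute one position with a different symbol.
        pos = rng.randrange(length)
        candidate = current[:]
        candidate[pos] = rng.choice([c for c in alphabet if c != current[pos]])
        cost = max_hamming(candidate, strings)
        delta = cost - current_cost
        # Metropolis-style acceptance: always take improvements,
        # take worse moves with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, cost
            if cost < best_cost:
                best, best_cost = current[:], cost
        temperature = max(temperature * cooling, 1e-6)  # geometric cooling
    return "".join(best), best_cost
```

    For example, `closest_string_sa(["ACGTAC", "ACGTTC", "TCGTAC"])` returns a six-letter string whose maximum Hamming distance to the inputs is 1, the lower bound for this set. Annealing is only one of many metaheuristics the paper surveys; genetic algorithms or tabu search could use the same objective.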

    Seventh Biennial Report: June 2003–March 2005

    Development and Improvement of Tools and Algorithms for the Problem of Atom Type Perception and for the Assessment of Protein-Ligand-Complex Geometries

    In the context of the present work, a scoring function for protein-ligand complexes has been developed that is aimed not at affinity prediction but at a high recognition rate of near-native geometries. The developed program DSX uses the same formalism as the knowledge-based scoring function DrugScore, drawing on knowledge from crystallographic databases and atom-type-specific, distance-dependent distribution functions. It is based on newly defined atom types. Additionally, the program is augmented by two novel potentials that evaluate torsion angles and (de)solvation effects. DSX is validated on a comprehensive data set from the literature that allows comparison with other popular scoring functions. DSX is intended for the recognition of near-native binding modes; in this important task it outperforms its competitors, and it is also among the best scoring functions for ranking different compounds. Another essential step in the development of DSX was the automatic assignment of the new atom types, for which a powerful programming framework was implemented. Validation on a data set from the literature showed superior efficiency and quality compared with similar programs for which such data were available. The front-end fconv was developed to share this functionality with the scientific community; it also includes multiple features useful in computational drug-design workflows and was made freely available as an open-source project. Based on the potentials developed for DSX, a number of further applications were created and implemented. The program HotspotsX calculates favorable interaction fields in protein binding pockets that can serve as a starting point for pharmacophore models and indicate possible directions for the optimization of lead structures. The program DSFP calculates scores based on fingerprints for given binding geometries.
These fingerprints are compared with reference fingerprints derived from DSX interactions in known crystal structures of the particular target. Finally, the program DSX_wat was developed to predict stable water networks within a binding pocket; DSX interaction fields are used to calculate the putative water positions.
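    DSX follows the DrugScore formalism, in which distance distributions observed in crystal structures are turned into pair potentials. A generic inverse-Boltzmann sketch of that idea is shown below. The reference state (all atom-type pairs pooled), the binning, the 6 Å cutoff and the pseudocount are illustrative assumptions; DSX's own atom types, normalisation, and torsion and solvation terms are not reproduced.

```python
import math


def histogram(values, r_max, bin_width):
    """Count distances into equal-width bins covering [0, r_max)."""
    n_bins = int(round(r_max / bin_width))
    counts = [0] * n_bins
    for r in values:
        k = int(r // bin_width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts


def pair_potential(distances_ij, reference_distances, r_max=6.0,
                   bin_width=0.5, pseudocount=1.0):
    """Inverse-Boltzmann pair potential W_ij(r) = -ln(g_ij(r) / g_ref(r)), in kT.

    distances_ij: observed distances for one protein-ligand atom-type pair.
    reference_distances: distances pooled over all atom-type pairs.
    Returns (bin_centre, W) tuples; negative W marks a favourable distance.
    """
    n_ij = histogram(distances_ij, r_max, bin_width)
    n_ref = histogram(reference_distances, r_max, bin_width)
    # Radial shell-volume normalisation cancels in the ratio, so it is omitted.
    # Pseudocounts keep empty bins from producing log(0).
    total_ij = sum(n_ij) + pseudocount * len(n_ij)
    total_ref = sum(n_ref) + pseudocount * len(n_ref)
    potential = []
    for k, (a, b) in enumerate(zip(n_ij, n_ref)):
        p_ij = (a + pseudocount) / total_ij
        p_ref = (b + pseudocount) / total_ref
        potential.append(((k + 0.5) * bin_width, -math.log(p_ij / p_ref)))
    return potential
```

    A pose would then be scored by summing W over all protein-ligand atom pairs within the cutoff; distances that occur more often for a given pair than in the reference state contribute favourably.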

    Sixth Biennial Report: August 2001–May 2003

    Flexible molecular alignment: an industrial case study on quantum algorithmic techniques

    Master's dissertation in Engineering Physics. Flexible molecular alignment is a complex and challenging problem in medicinal chemistry. The current approach to this problem does not test all possible alignments; instead, it first analyzes all the variables and selects those with the greatest potential impact on the subsequent alignment. This procedure can lead to wrong “best alignments”, since not all of the data are considered. Quantum computation, owing to its natural parallelism, may improve algorithmic solutions for this kind of problem because it may test and/or simulate all possible solutions in one execution cycle. In a case study proposed by BIAL and carried out in collaboration with IBM, the main goal of this dissertation was to study and create quantum algorithms able to reformulate the molecular alignment problem in the setting of quantum computation. A subsequent goal was to compare the classical and quantum solutions. Owing to the complexity of the problem, and in order to produce a practical solution, we used a manageable number of conformations per molecule, revisited the classical solution, and developed a corresponding quantum algorithm. This algorithm was then tested both on a quantum simulator and on a real device. Despite the privileged collaboration with IBM, the quantum simulations could not be completed in a viable time, making them impractical for industrial applications. Nonetheless, given the current state of quantum hardware development, the proposed solutions still have potential for the future.

    Mapping biophysics through enhanced Monte Carlo techniques

    This thesis focuses on the study of molecular interactions at atomistic detail and is divided into one introductory chapter and four chapters addressing different problems and methodological approaches. All of them focus on the development and improvement of computational Monte Carlo algorithms to study, efficiently, the behavior of these systems at the classical molecular mechanics level. The four biophysical problems studied in this thesis are: induced-fit docking between protein and ligand and between DNA and ligand, to understand the binding mechanism; the response of proteins to stretching; and the generation and scoring of protein-protein docking poses. The thesis is organized as follows. The first chapter covers the state of the art in computational methods for studying biophysical interactions, which is the starting point of this thesis; our in-house PELE algorithm and the main standard methods, such as molecular dynamics, are explained in detail. Chapter two focuses on the main modifications made to PELE to add new features, such as a new force field, implicit solvent, and an anisotropic network specific to DNA simulation studies. We study, compare, and validate the conformations generated for six representative DNA fragments with the new PELE features, using molecular dynamics as a reference. Chapter three applies the new methods implemented and tested in PELE to protein-ligand and DNA-ligand interactions in four systems. First, we study porphyrin binding to the Gun4 protein by combining PELE and molecular dynamics simulations. We also provide a docking pose that was corroborated by a new crystal structure published during the revision of the submitted study, showing the accuracy of our predictions.
In the second project, we use our improved version of PELE to generate the first structural model of an alpha-glucose 1,6-bisphosphate substrate bound to human phosphomannomutase 2, demonstrating that this ligand can adopt two low-energy orientations. The third project studies DNA-ligand interactions for three cisplatin drugs, evaluating the binding free energy with Markov state models; we obtain excellent results compared with other free-energy methods studied with molecular dynamics. The last project studies the DNA intercalator daunomycin, simulating its binding process with PELE. Chapter four focuses on the computational study of force-extension profiles during protein unfolding. Following a procedure similar to that used in steered molecular dynamics, we added a dynamic harmonic constraint to our Monte Carlo approach to fix or pull selected atoms, forcing the protein to unfold in a defined direction. We implement this technique and compare it with steered molecular dynamics for the ubiquitin and azurin proteins. Moreover, we add this feature to a well-known algorithm called MCPRO, from William Jorgensen's group at Yale University, to evaluate the free energy associated with the unfolding of the deca-alanine system. Chapter five introduces a multiscale approach to protein-protein docking. A coarse-grained model is combined with Monte Carlo exploration, reducing the degrees of freedom so that thousands of protein-protein poses can be generated quickly. The poses produced by this procedure are then refined and ranked through a protocol of protonation, hydrogen-bond optimization, and minimization at the all-atom level to identify the best poses.
I present two test cases in which this procedure has been applied, showing good accuracy in the predictions: the tryptogalinin and ferredoxin/flavodoxin systems.
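    The Monte Carlo and pulling ideas in this abstract can be illustrated with a one-dimensional toy: a Metropolis sampler on a double-well energy, with a harmonic restraint whose anchor moves at constant speed, loosely analogous to the dynamic harmonic constraint described for PELE. The toy energy, spring constant, step size and anchor speed are all assumptions; this is not PELE's or MCPRO's implementation.

```python
import math
import random


def metropolis_accept(delta_e, kT, rng):
    """Metropolis criterion: accept downhill moves, uphill with exp(-dE/kT)."""
    return delta_e <= 0 or rng.random() < math.exp(-delta_e / kT)


def restrained_energy(x, anchor, k_spring, base_energy):
    """Base energy plus a harmonic restraint 0.5 * k * (x - anchor)^2."""
    return base_energy(x) + 0.5 * k_spring * (x - anchor) ** 2


def steered_mc(base_energy, x0, anchor_start, anchor_speed, k_spring=10.0,
               kT=1.0, n_steps=2000, step_size=0.1, seed=0):
    """Metropolis Monte Carlo with a moving harmonic anchor (toy, 1D).

    Returns a list of (anchor, x, restraint_force) per step.
    """
    rng = random.Random(seed)
    x = x0
    trajectory = []
    for step in range(n_steps):
        anchor = anchor_start + anchor_speed * step   # anchor moves each step
        e_old = restrained_energy(x, anchor, k_spring, base_energy)
        x_new = x + rng.uniform(-step_size, step_size)
        e_new = restrained_energy(x_new, anchor, k_spring, base_energy)
        if metropolis_accept(e_new - e_old, kT, rng):
            x = x_new
        force = k_spring * (anchor - x)   # restraint force pulling x to anchor
        trajectory.append((anchor, x, force))
    return trajectory
```

    Calling `steered_mc(lambda x: (x * x - 1.0) ** 2, x0=-1.0, anchor_start=-1.0, anchor_speed=0.001)` drags the particle from the left well to the right one; the recorded restraint forces give a crude force-extension profile of the kind compared against steered molecular dynamics.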

    Discovery and development of novel inhibitors for the kinase Pim-1 and G-Protein Coupled Receptor Smoothened

    Investigating the causes of disease is no easy business, particularly when one reflects on the lessons taught to us in antiquity. Before the beginning of the last century, the diagnosis and treatment of diseases such as cancers were so bereft of hope that physicians could offer little in the way of comfort, let alone treatment. One of the major insights from investigations into cancers this century has been that those involved in research leading to treatments are not dealing with a single malady but with multiple families of diseases with different mechanisms and modes of action. Therefore, although the end game is similar across cancers (uncontrolled growth and replication leading to cellular dysfunction), different diseases require different approaches to target them. This leads to a particular broad treatment approach: drug design. A drug is, in the classical sense, a small molecule that, once introduced into the body, interacts with biochemical targets to induce a wider biological effect, ideally with both an intended target and an intended effect. The conceptual basis of this 'lock-and-key' paradigm was elucidated over a century ago, and the primary occupation of biochemical researchers has since been to determine as much information as possible about both the protein locks and the drug keys. As the paradigm implies, molecular shape is all-important in determining and controlling action against the most important locks with the most potent and specific keys. For some time, the two most important target classes in drug discovery have been protein kinases and G protein-coupled receptors (GPCRs). Both classes are large protein families that perform very different tasks within the body. Kinases activate and inactivate many cellular processes by catalysing the transfer of a phosphate group from adenosine triphosphate (ATP) to other targets.
GPCRs interact with chemical signals and translate them into a biological response. Dysfunction of either type of protein in certain cells can lead to a loss of biological control and, ultimately, to cancer. Kinases and GPCRs have entirely different structures, so structural knowledge becomes crucial in any approach targeting cells in which dysfunction has occurred. For this thesis, therefore, one member of each class was investigated using a combination of structural approaches: from the kinase class, the kinase Proviral Integration site for MuLV (Pim-1), and from the GPCR class, the cell membrane-bound Smoothened receptor (SMO). The kinase Pim-1 was the target of various approaches in Chapter 3. Although it has been heavily studied since the mid-2000s, there is a paucity of inhibitors targeting residues remote from the structural features that define kinases. Extension possibilities are further limited because Pim-1 is constitutively active, so inhibitors targeting an inactive state are not possible. An initial project (Project 1) used the known binding properties of small molecules, or 'fragments', to elucidate structural and dynamic information useful for targeting Pim-1. This was followed by three projects, all aimed at inhibitor discovery but each with a different focus. In Project 2, fragment binding modes from Project 1 provided the basis for extending and developing drug-like inhibitors, with a focus on synthetic feasibility. In contrast, the inhibitors in Project 3 were found using a large-scale public dataset of purchasable molecules that possess drug-like properties. Finally, Project 4 took the truncated form of a particularly attractive fragment from Project 1 that had been crystallised with Pim-1, verified its binding mode, and then generated extensions, again with a focus on synthetic feasibility. The GPCR SMO has been the subject of fewer molecular studies, and much about its structural behaviour remains unknown.
As SMO is the most 'druggable' protein in the Hedgehog pathway, structural studies have primarily focussed on stabilising its inactive state to prevent signal transduction. In addition, there are generally few SMO inhibitors, and the drugs for cancers related to its dysfunction are vulnerable to mutations that significantly reduce their effectiveness or abrogate it entirely. The elucidation of structural information is therefore of high priority. An initial study attempting to identify an unknown molecule from prior experiments led to insights into the binding characteristics of specific moieties. This was particularly important for understanding not only where favourable moieties bind but also which sections of the SMO binding pocket are unfavourable for binding. In the two subsequent virtual screens performed in Chapter 4, the primary aim was to find new drug-like SMO inhibitors using large public datasets of commercially available molecules. The initial screen retrieved relatively few inhibitors, so the binding pocket was modified to find a structural state more amenable to small-molecule binding. These modifications led to a significant number of new, chemically novel SMO inhibitors, structural information useful for future inhibitors, and structure-activity relationships useful for inhibitor design. This supports the idea that structural information is of critical importance in the discovery and design of molecular inhibitors.
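    Selecting purchasable molecules that possess drug-like properties is often done with simple descriptor filters before any docking. The sketch below uses Lipinski's rule of five as an assumed example of such a filter; the descriptors are treated as precomputed inputs (for example from a cheminformatics toolkit), and the selection criteria actually used in the thesis are not stated in this abstract.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (assumed inputs)."""
    name: str
    mol_weight: float   # molecular weight in daltons
    clogp: float        # calculated octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def rule_of_five_violations(d):
    """Count how many of Lipinski's four thresholds a molecule exceeds."""
    return sum([
        d.mol_weight > 500,
        d.clogp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def drug_like(library, max_violations=1):
    """Keep molecules with at most `max_violations` rule-of-five violations."""
    return [d for d in library if rule_of_five_violations(d) <= max_violations]
```

    Allowing one violation follows the common reading of the rule; stricter pipelines set `max_violations=0` or add further filters before docking the remaining molecules.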

    Computational Analysis of T Cell Receptor Repertoire and Structure

    The human adaptive immune system has evolved to provide a sophisticated response to a vast body of pathogenic microbes and toxic substances. The primary mediators of this response are T and B lymphocytes. Antigenic peptides presented at the surface of infected cells by major histocompatibility complex (MHC) molecules are recognised by T cell receptors (TCRs) with exceptional specificity. This specificity arises from the enormous diversity in TCR sequence and structure generated through an imprecise process of somatic gene recombination that takes place during T cell development. Quantification of the TCR repertoire through the analysis of data produced by high-throughput RNA sequencing allows the immune response to disease to be characterised over time and between patients, and supports the development of methods for diagnosis and therapeutic design. The latest version of the software package Decombinator extracts and quantifies the TCR repertoire with improved accuracy and compatibility with complementary experimental protocols and external computational tools. The software has been extended for analysis of fragmented short-read data from single cells, comparing favourably with two alternative tools. The development of cell-based therapeutics and vaccines is incomplete without an understanding of molecular-level interactions. The breadth of TCR diversity and cross-reactivity presents a barrier to comprehensive structural resolution of the repertoire by traditional means. Computational modelling of TCR structures and TCR-pMHC complexes provides an efficient alternative. Four general-purpose protein-protein docking platforms were compared in their ability to accurately model TCR-pMHC complexes. Each platform was evaluated against an expanded benchmark of docking test cases and in the context of varying additional information about the binding interface. Continual innovation in structural modelling techniques sets the stage for novel automated tools for TCR design.
A prototype platform has been developed, integrating structural modelling and an optimisation routine, to engineer desirable features into TCR and TCR-pMHC complex models.
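    Repertoire quantification ultimately reduces annotated reads to clonotype abundances. As a hedged illustration, the sketch below counts clonotypes and computes Shannon diversity and clonality. Defining a clonotype as a (V gene, J gene, CDR3) tuple is an assumption for illustration; this is not Decombinator's code or output format.

```python
import math
from collections import Counter


def clonotype_counts(reads):
    """Collapse annotated reads into clonotype abundances.

    Each read is assumed to be a (v_gene, j_gene, cdr3) tuple.
    """
    return Counter(reads)


def shannon_diversity(counts):
    """Shannon entropy H = -sum(p * ln p) over clonotype frequencies."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total)
                for n in counts.values() if n > 0)


def clonality(counts):
    """1 - H / ln(richness): 0 for a perfectly even repertoire, 1 if monoclonal."""
    if len(counts) < 2:
        return 1.0
    return 1.0 - shannon_diversity(counts) / math.log(len(counts))
```

    Comparing such summaries across time points or patients is one simple way to track how a repertoire responds to disease; richer analyses would also account for sampling depth.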

    Bioinformatics

    This book is divided into different research areas relevant to bioinformatics, such as biological networks, next-generation sequencing, high-performance computing, molecular modeling, structural bioinformatics, and intelligent data analysis. Each section introduces the basic concepts and then explains their application to problems of great relevance, so that both novice and expert readers can benefit from the information and research presented here.