
    FATHMM: Frameshift Aware Translated Hidden Markov Models

    Recognition of short functional motifs in protein sequences

    The main goal of this study was to develop a method for computational de novo prediction of short linear motifs (SLiMs) in protein sequences that offers users advantages over existing solutions. The users are typically laboratory biologists who want to elucidate a protein function that may be mediated by a short motif, such as subcellular localization, secretion, post-translational modification or degradation. Conducting such studies with experimental techniques alone is often costly and carries a risk of inconclusive results. Because computational methods are fast and much less expensive, preliminary prediction of putative motifs makes it possible to generate hypotheses and therefore to plan experiments in a more directed and efficient way. To meet this goal, I developed HH-MOTiF, a web-based tool for de novo discovery of SLiMs in a set of protein sequences. While working on the project, I also detected patterns in the sequence properties of certain SLiMs that make their de novo prediction easier. As some of these patterns have not yet been described in the literature, I present them in this thesis. While evaluating and comparing motif prediction results, I identified conceptual gaps, both in theoretical studies and in existing practical solutions, for comparing two sets of positional data that annotate the same set of biological sequences. To close this gap and to carry out in-depth performance analyses of HH-MOTiF in comparison with other predictors, I developed a corresponding statistical method, SLALOM (StatisticaL Analysis of Locus Overlap Method). It is currently available as a standalone command-line tool.
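    To make the idea of comparing two positional annotations of the same sequences concrete, here is a minimal sketch of residue-level overlap statistics between a reference and a predicted annotation set. It is not SLALOM's implementation or interface: the data layout (sequence ID mapped to 1-based, inclusive (start, end) intervals) and the function names `covered_residues` and `residue_overlap` are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layout: sequence ID -> list of (start, end) intervals,
# 1-based and inclusive, e.g. annotated or predicted motif sites.
Annotations = Dict[str, List[Tuple[int, int]]]


def covered_residues(intervals: List[Tuple[int, int]]) -> Set[int]:
    """Return the set of residue positions covered by any interval."""
    positions: Set[int] = set()
    for start, end in intervals:
        positions.update(range(start, end + 1))
    return positions


def residue_overlap(reference: Annotations, predicted: Annotations) -> Dict[str, float]:
    """Residue-level overlap statistics, pooled over all sequences."""
    tp = fp = fn = 0
    for seq_id in set(reference) | set(predicted):
        ref = covered_residues(reference.get(seq_id, []))
        pred = covered_residues(predicted.get(seq_id, []))
        tp += len(ref & pred)   # residues in both annotations
        fp += len(pred - ref)   # predicted but not in the reference
        fn += len(ref - pred)   # in the reference but missed
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    union = tp + fp + fn
    jaccard = tp / union if union else 0.0
    return {"sensitivity": sensitivity, "precision": precision, "jaccard": jaccard}
```

    For a reference site at residues 10-19 and a prediction at 15-24 in one sequence, five residues are shared, so sensitivity and precision are both 0.5 and the Jaccard index is 1/3. Pooling residues is only one of several possible choices; a fuller analysis would also count overlaps at the level of whole sites and assess significance, which this sketch does not attempt.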

    Influence of non-synonymous sequence mutations on the architecture of HIV-1 clade C protease receptor site: docking and molecular dynamics studies

    Despite current interventions to prevent infections and AIDS-related deaths, sub-Saharan Africa remains the region most severely affected by the HIV/AIDS pandemic, and clade C is the dominant circulating HIV-1 strain there. The pol-encoded HIV-1 protease has been extensively exploited as a drug target. However, protease inhibitors were designed within the framework of clade B, the most common clade in America, Europe and Australia. Recent studies have demonstrated sequence and catalytic differences between clade B and clade C proteases that could alter drug susceptibility. The emergence of drug-resistance-associated mutations and the combinatorial explosion of variants due to recombination thwart attempts to stabilize the current highly active antiretroviral therapy (HAART) baseline. The project aimed to identify the structural and molecular mechanisms by which mutants affect the efficacy of both FDA-approved and Rhodes University (RU)-synthesized inhibitors, in order to define how current and/or future drugs should be modified or synthesized to combat drug resistance. The approach involved generating homology models of HIV-1 sequences from South African infants failing treatment with two protease inhibitors, lopinavir and ritonavir (as monitored by changes in surrogate markers: declining CD4 cell counts and rising viral load). Consistent with previous studies, we identified nine polymorphisms (12S, 15V, 19I, 36I, 41K, 63P, 69K, 89M and 93L) linked to the subtype C wild type, some of which are associated with protease inhibitor treatment in clade B. Although we predicted two occurrence patterns of the M46I, I54V and V82A mutations, V82A→I54V→M46I and I54V→V82A→M46V, other possibilities might exist. Mutations either expanded or contracted the active-site cleft, which led to differential drug responses.
The in silico docking indicated discordances in susceptibility between clades B and C for certain polymorphisms and non-polymorphisms. The RU-synthesized ligands displayed varied efficacies, which were below those of the FDA-approved protease inhibitors. The flaps underwent a wide range of structural motions to accommodate and stabilize the ligands. The computational analyses revealed the need for (de novo) drug designers to restructure these potential drugs to improve their binding fit, affinity, energy and interactions with multiple key protease residues, in order to target resilient HIV-1 strains. Accumulating evidence of contrasting drug-choice interpretations from the Stanford HIVdb should act as an impetus for customizing an HIVdb for sub-Saharan Africa.

    Studies on the Modular Evolution of Genes

    Gene evolution is primarily studied by comparing the point mutations accumulated between homologs. Genes also evolve through “remodelling”, the process of repurposing and reorganising genes and gene fragments into novel sequences. Gene remodelling is a relatively underappreciated evolutionary concept. Remodelling events underlie the development of novel sequences via fusion or fission events and through the shuffling of exons or domains. To date, all studies of remodelling have focussed on specific remodelling events, for example gene fusions in cancer samples, or have used small datasets (<15 species). As such, a comparative analysis of remodelling between two taxonomic kingdoms has yet to be completed. In 2018, CompositeSearch was developed to overcome the computational bottlenecks associated with mining all possible combinations that may be attributable to remodelling events. We used CompositeSearch to investigate the comparative extent of remodelling within large fungal (107 species) and plant (50 species) datasets. We found that approximately 50% of fungal genes and 61% of plant genes have a history of remodelling, despite robust controls against Type I errors. We observed that the rates of remodelled family birth and decay were clocklike in both datasets, and that remodelled genes were considerably more homoplastic than non-remodelled genes. Functional overrepresentation analysis indicated that remodelled genes were associated with rapidly evolving systems, such as secondary metabolism, and with phenotypic novelty, such as flowering in angiosperms. Remodelling events have been associated with the development of antimicrobial resistance (AMR). As CompositeSearch does not distinguish between a fusion event and any other remodelling event, we developed CompositeBLAST to detect novel AMR fusion events. CompositeBLAST was considerably faster and more sensitive than previously published fusion detection tools.
Using this software, we detected previously unreported mupirocin and vancomycin resistance genes derived from remodelling events.
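    The basic signal behind fusion (composite gene) detection is a query gene whose distinct, largely non-overlapping regions align to two different gene families. The sketch below flags such a query from a list of local alignment hits. It is a simplified illustration under stated assumptions, not the algorithm used by CompositeSearch or CompositeBLAST: the hit format, the function names `overlap_fraction` and `is_composite`, and the `max_overlap` threshold are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical hit format: (family_id, query_start, query_end), 1-based and
# inclusive, e.g. parsed from BLAST-like local alignments of one query gene.
Hit = Tuple[str, int, int]


def overlap_fraction(a: Hit, b: Hit) -> float:
    """Fraction of the shorter hit's query span that the two hits share."""
    shared = min(a[2], b[2]) - max(a[1], b[1]) + 1
    shorter = min(a[2] - a[1] + 1, b[2] - b[1] + 1)
    return max(shared, 0) / shorter


def is_composite(hits: List[Hit], max_overlap: float = 0.2) -> bool:
    """Flag a query whose separate regions match two different families.

    max_overlap is an illustrative threshold, not a published parameter.
    """
    for i in range(len(hits)):
        for j in range(i + 1, len(hits)):
            a, b = hits[i], hits[j]
            # Different families hitting (mostly) different query regions.
            if a[0] != b[0] and overlap_fraction(a, b) <= max_overlap:
                return True
    return False
```

    A query with hits `("famA", 1, 120)` and `("famB", 130, 260)` is flagged, whereas two hits to the same family, or two families hitting the same region, are not. A realistic pipeline would also need to confirm that the two families are unrelated to each other, so that a shared domain is not mistaken for a fusion.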

    Biological applications of discrete molecular dynamics

    Sequence, structure and dynamics form an inseparable triad for understanding protein function. Fortunately, evolution imposed a hierarchical relationship among them that facilitates analysis: dynamics are encoded in the structure, which in turn is encoded in the sequence. Deciphering the mechanisms governing protein function requires contributions from diverse fields, particularly to follow molecular motions. There are technological limitations to monitoring local, elemental protein movements, since they are too fast to be followed by current experimental set-ups. Theoretical models provide the necessary assistance in this regard, mainly through molecular simulations. However, atomistic simulations of large functional motions are currently computationally unaffordable. The problem is that large-scale motions are rooted in the very fast elemental ones, so in order to observe a biologically functional conformational change we have to keep track of all the elemental motions that occur. The gap in time scale between the two extremes is enormous: fast motions are over 10¹⁵ times faster than functional ones. In this Thesis, I present our contribution to extending the accessible simulation time range, in an effort towards more predictive computational models. We explored alternative methods to retrieve molecular motions from the underlying physical forces governing proteins. The method used, Discrete Molecular Dynamics, represents in itself a significant improvement in computational efficiency. To go further, we lowered the resolution of the protein models to a coarse-grained representation, both in the number of particles and in the interaction functions. We drew on several existing algorithms to simplify calculations while keeping the models as accurate as possible. Combining all these methodological innovations, we developed models to follow conformational transitions of proteins, from local rearrangements to motions that drastically change the protein structure.
We also applied novel computational approaches to account for protein flexibility when proteins recognize and bind other interacting proteins. In a second stage, we investigated how protein flexibility and dynamics are imprinted in the protein sequence. We observed that, over the evolutionary history of the sequence, proteins were tuned to adopt several conformations rather than a single native structure. We exploited this flexibility signature in the sequence to predict protein motions and, eventually, alternative protein conformations. Finally, we used our efficient tools to extend protein dynamics analysis to the proteome scale. We searched for all proteins with two known conformations, a symptom of a conformational transition, and then used those conformations to follow the motion from one state to the other. We analyzed and structured all of this dynamical information and connected our results to the most detailed simulation methods available, to dissect the fine details of protein dynamical behavior when required.
Sequence, structure and dynamics form an inescapable trio in the functioning of proteins. The evolutionary process encoded dynamics in the structure of proteins, which in turn is encoded in the sequence. Deciphering the mechanisms that govern protein movement requires combining experiments and theoretical models. Theoretical models provide necessary assistance through molecular simulations, but their computational cost is so high that it can prevent the study. The problem lies in the fact that biologically interesting movements are the consequence of an accumulation of high-frequency movements, which must be followed in order to understand the functional movements. The gap between the two time scales amounts to an impressive ratio of 10¹⁵.
In this Thesis, I present methods to increase the efficiency of molecular calculations with the aim of narrowing the gap between the time that can be simulated and the time that is biologically interesting. The method used is Discrete Molecular Dynamics, which represents in itself a significant improvement in computational efficiency. In summary, we have developed models to follow conformational transitions of proteins, from local movements to others that radically change the shape of the protein. These methods were applied both to conformational transitions and to protein-protein interactions. In a second stage, we searched the sequence for the imprint of the protein's flexibility pattern, with the aim of predicting conformational changes. Finally, using the methods we developed, we completed a large-scale analysis of protein dynamics, simulating all transitions whose two end states were determined experimentally. The results of these simulations were integrated with the most reliable simulation methods available, to increase the level of detail when necessary.
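    Discrete Molecular Dynamics gains its efficiency by replacing continuous interaction potentials with step functions (such as hard cores and square wells). Particles then move in straight lines between discrete collision events, and the simulation jumps from one event to the next instead of integrating the equations of motion at many tiny time steps. The sketch below shows the two core operations for the simplest case, hard spheres: ballistic free flight and the time until two spheres reach contact distance σ, found by solving |Δr + Δv·t| = σ for the earliest positive t. It is an illustration of the general technique under these assumptions, not the thesis's code; a real DMD engine also handles square-well events, velocity updates at each collision and an event queue. The names `collision_time` and `advance` are hypothetical.

```python
import math
from typing import List, Optional, Sequence

Vec = Sequence[float]


def dot(a: Vec, b: Vec) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def advance(r: Vec, v: Vec, t: float) -> List[float]:
    """Ballistic free flight between events: r(t) = r + v * t."""
    return [x + vx * t for x, vx in zip(r, v)]


def collision_time(r1: Vec, v1: Vec, r2: Vec, v2: Vec, sigma: float) -> Optional[float]:
    """Time until two hard spheres first reach contact distance sigma.

    Solves |dr + dv * t| = sigma, where dr and dv are the relative position
    and velocity, and returns the smaller root. Returns None if the spheres
    never meet. Assumes they do not already overlap at t = 0.
    """
    dr = [a - b for a, b in zip(r2, r1)]
    dv = [a - b for a, b in zip(v2, v1)]
    b = dot(dr, dv)
    if b >= 0:  # moving apart (or keeping their distance): no approach
        return None
    vv = dot(dv, dv)  # strictly positive here because b < 0
    disc = b * b - vv * (dot(dr, dr) - sigma * sigma)
    if disc < 0:  # closest approach stays farther apart than sigma
        return None
    return (-b - math.sqrt(disc)) / vv
```

    For two spheres 5 units apart approaching head-on at relative speed 2 with σ = 1, `collision_time` returns 2.0, and advancing both spheres by that time places their centres exactly 1 unit apart. Because nothing happens between events, the cost of a DMD run scales with the number of collisions rather than with a fixed time step.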

    Proceedings of the Eighth Italian Conference on Computational Linguistics CLiC-it 2021

    The eighth edition of the Italian Conference on Computational Linguistics (CLiC-it 2021) was held at Università degli Studi di Milano-Bicocca from 26th to 28th January 2022. After the 2020 edition, which was held fully virtually due to the Covid-19 health emergency, CLiC-it 2021 was the first opportunity for the Italian Computational Linguistics research community to meet in person after more than a year of full or partial lockdown.