762 research outputs found

    Computational Approaches to Drug Profiling and Drug-Protein Interactions

    Despite substantial increases in R&D spending within the pharmaceutical industry, de novo drug design has become a time-consuming endeavour. High attrition rates led to a long period of stagnation in drug approvals. Due to the extreme costs associated with introducing a drug to the market, locating and understanding the reasons for clinical failure is key to future productivity. As part of this PhD, three main contributions were made in this respect. First, the web platform LigNFam enables users to interactively explore similarity relationships between ‘drug-like’ molecules and the proteins they bind. Second, two deep-learning-based binding site comparison tools were developed, competing with the state of the art on benchmark datasets. The models can predict off-target interactions and potential candidates for target-based drug repurposing. Finally, the open-source ScaffoldGraph software was presented for the analysis of hierarchical scaffold relationships and has already been used in multiple projects, including integration into a virtual screening pipeline to increase the tractability of ultra-large screening experiments. Together, and with existing tools, the contributions made will aid in the understanding of drug-protein relationships, particularly in the fields of off-target prediction and drug repurposing, helping to design better drugs faster.

    Computational biology in the 21st century

    Computational biologists answer biological and biomedical questions by using computation in support of, or in place of, laboratory procedures, hoping to obtain more accurate answers at a greatly reduced cost. The past two decades have seen unprecedented technological progress with regard to generating biological data; next-generation sequencing, mass spectrometry, microarrays, cryo-electron microscopy, and other high-throughput approaches have led to an explosion of data. However, this explosion is a mixed blessing. On the one hand, the scale and scope of data should allow new insights into genetic and infectious diseases, cancer, basic biology, and even human migration patterns. On the other hand, researchers are generating datasets so massive that it has become difficult to analyze them to discover patterns that give clues to the underlying biological processes. Funding: National Institutes of Health (U.S.) (grant GM108348); Hertz Foundatio

    Planetary Biology and Microbial Ecology: Molecular Ecology and the Global Nitrogen cycle

    This report summarizes the results of the Planetary Biology and Molecular Ecology's summer 1991 program, which was held at the Marine Biological Laboratory in Woods Hole, Massachusetts. The purpose of the interdisciplinary PBME program is to integrate, via lectures and laboratory work, the contributions of university and NASA scientists and student interns. The goals of the 1991 program were to examine several aspects of the biogeochemistry of the nitrogen cycle and to teach the application of modern methods of molecular genetics to field studies of organisms. Descriptions of the laboratory projects and protocols, together with abstracts and references for the lectures, are presented.

    All Fingers Are Not the Same: Handling Variable-Length Sequences in a Discriminative Setting Using Conformal Multi-Instance Kernels

    Most string kernels for comparing genomic sequences are tied to the (absolute) positional information of the features in the individual sequences. This poses limitations when comparing variable-length sequences using such string kernels. For example, profiling chromatin interactions by 3C-based experiments results in variable-length genomic sequences (restriction fragments). Here, exact position-wise occurrence of signals in sequences may not be as important as in the analysis of promoter sequences, which typically have a transcription start site as reference. Existing position-aware string kernels have been shown to be useful for the latter scenario. In this work, we propose a novel approach for sequence comparison that enables larger positional freedom than most of the existing approaches, can identify a possibly dispersed set of features in comparing variable-length sequences, and can handle both the aforementioned scenarios. Our approach, CoMIK, identifies not just the features useful for classification but also their locations in the variable-length sequences, as evidenced by the results of three binary classification experiments, aided by recently introduced visualization techniques. Furthermore, we show that we are able to efficiently retrieve and interpret the weight vector for the complex setting of multiple multi-instance kernels.

    On Computable Protein Functions

    Proteins are biological machines that perform the majority of functions necessary for life. Nature has evolved many different proteins, each of which performs a subset of an organism’s functional repertoire. One aim of biology is to solve the sparse, high-dimensional problem of annotating all proteins with their true functions. Experimental characterisation remains the gold standard for assigning function, but is a major bottleneck due to resource scarcity. In this thesis, we develop a variety of computational methods to predict protein function, reduce the functional search space for proteins, and guide the design of experimental studies. Our methods take two distinct approaches: protein-centric methods that predict the functions of a given protein, and function-centric methods that predict which proteins perform a given function. We applied our methods to help solve a number of open problems in biology. First, we identified new proteins involved in the progression of Alzheimer’s disease using proteomics data of brains from a fly model of the disease. Second, we predicted novel plastic hydrolase enzymes in a large data set of 1.1 billion protein sequences from metagenomes. Finally, we optimised a neural network method that extracts a small number of informative features from protein networks, which we used to predict functions of fission yeast proteins.

    Cultivation of Phylogenetically Diverse and Metabolically Novel Atrazine Degrading Soil Bacteria using Bio-Sep® Beads

    The s-triazine herbicide atrazine is among the most widely used herbicides worldwide. The human health effects of atrazine exposure remain unclear, but atrazine and its metabolites appear to cause developmental abnormalities in amphibians. A mounting body of knowledge concerning the ecology of atrazine degradation suggests the current collection of microorganisms and genetic biomarkers of atrazine degradation cannot accurately predict the natural attenuation of atrazine. To this end, a novel in situ enrichment approach using highly porous, atrazine-impregnated Bio-Sep® beads was employed to isolate a taxonomically diverse group of atrazine-degrading bacteria from soil and wetland environments in Tennessee and Ohio. The study greatly increased the scope and diversity of organisms shown to degrade atrazine. Most notably, a novel lineage within the Bacteroidetes phylum, Dyadobacter sp., was obtained, constituting the first report of the atrazine-degrading phenotype within this division. Although not taxonomically novel, previously unreported atrazine-degrading taxa from Actinobacteria (Catellatospora, Microbacterium, and Glycomyces), Alpha-Proteobacteria (Methylobacterium, Methylopila, and Sphingomonas), Beta-Proteobacteria (Variovorax and Acidovorax), and Gamma-Proteobacteria (Acinetobacter, Rahnella, and Pantoea) were also isolated. Evidence for metabolic diversity in atrazine catabolism was observed in the collection. Most significantly, the atrazine-chlorohydrolase gene trzN was the only known catabolic gene detected in our collection, with the exception of the Arthrobacter strains, which typically also possessed atzB and atzC, genes that code for enzymes needed for sequential dealkylation of 2-hydroxy atrazine. No other known genes for the intermediate metabolism were detected in many of the isolates, suggesting the presence of alternative degradative pathways for atrazine among soil bacteria. Previously, trzN had only been reported in high G+C Gram-positive bacteria, but our results revealed that this catabolic gene is much more broadly distributed among classes, including the Alpha- and Beta-Proteobacteria. The results demonstrate that Bio-Sep® beads are a suitable matrix for recruiting a highly diverse subset of the bacterial community involved in atrazine degradation.

    Discovering meaning from biological sequences: focus on predicting misannotated proteins, binding patterns, and G4-quadruplex secondary

    Proteins are the principal catalytic agents, structural elements, signal transmitters, transporters, and molecular machines in cells. Experimental determination of protein function is expensive in time and resources compared to computational methods. Hence, assigning protein function, predicting protein binding patterns, and understanding protein regulation are important problems in functional genomics and key challenges in bioinformatics. This dissertation comprises three studies. In the first two papers, we apply machine-learning methods to (1) identify misannotated sequences and (2) predict the binding patterns of proteins. The third paper is (3) a genome-wide analysis of G4-quadruplex sequences in the maize genome. The first two papers are based on two-stage classification methods. The first stage uses machine-learning approaches that combine composition-based and sequence-based features. We use either decision trees (HDTree) or support vector machines (SVM) as second-stage classifiers and show that classification performance matches or exceeds that of more computationally expensive approaches. For study (1), our method identified potentially misannotated sequences within a well-characterized set of proteins in a popular bioinformatics database. We identified misannotated proteins and show that these proteins have contradictory AmiGO and UniProt annotations. For study (2), we developed a three-phase approach: Phase I classifies whether a protein binds with another protein. Phase II determines whether a protein-binding protein is a hub. Phase III classifies hub proteins based on the number of binding sites and the number of concurrent binding partners. For study (3), we carried out a computational genome-wide screen to identify non-telomeric G4-quadruplex (G4Q) elements in maize to explore their potential role in gene regulation for flowering plants. Analysis of G4Q-containing genes uncovered a striking tendency for their enrichment in genes of networks and pathways associated with electron transport, sugar degradation, and hypoxia responsiveness. The maize G4Q elements may play a previously unrecognized role in coordinating global regulation of gene expression in response to hypoxia to control carbohydrate metabolism for anaerobic metabolism. Together, the three studies provide new insights into classifying misannotated proteins, understanding protein binding patterns, and identifying a potentially new model for gene regulation.

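    Descoberta de novos vírus vegetais e estudo da diversidade viral intrahospedeiro a partir de dados gerados por sequenciamento em larga escala (Discovery of new plant viruses and study of intrahost viral diversity from data generated by high-throughput sequencing)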

    Dissertation (master's), Universidade de Brasília, Departamento de Biologia Celular, Programa de Pós-Graduação em Biologia Molecular, 2018. High-throughput sequencing (HTS) technologies allow the genomic characterization of viral communities present in plant and animal tissues and in environmental samples with high sensitivity and accuracy. Because many genomic sequences are sequenced simultaneously, the technique also enables the study of the high intrahost genetic diversity presented by RNA viruses. In this work, we studied and established a pipeline for plant virome analysis using the cucumber model, and we report the discovery of two novel grapevine viruses, Grapevine enamovirus-1 (GEV-1) and Grapevine virga-like virus (GVLV). After rapid amplification of cDNA ends (RACE) assays of the 5' end of the GEV-1 genome, we obtained the near-complete genomic sequence of this virus (6227 bp), enabling its classification as a member of the genus Enamovirus (family Luteoviridae) based on its genomic organization, phylogenetic studies, and criteria established by the International Committee on Taxonomy of Viruses (ICTV). However, the genome of GVLV remains only partially sequenced, in two parts: a 3348 bp contig containing the methyltransferase (Met) and helicase (Hel) domains, and a 1272 bp contig corresponding to a partial RNA-dependent RNA polymerase (RdRp). Based on phylogenetic studies, we were not able to classify this novel virus, which shows low identity with viruses in the families Virgaviridae and Bromoviridae. Additionally, this work presents a study of the intrahost genetic diversity of Grapevine leafroll-associated viruses (GLRaVs), focusing on the polyprotein of GLRaV-2 and GLRaV-3 (genera Closterovirus and Ampelovirus, respectively), as well as the in silico detection of a defective RNA molecule of GLRaV-4 (Ampelovirus) from HTS data. The intrahost populations of two GLRaV-2 isolates showed only 11 single nucleotide polymorphisms (SNPs) in common (~14% of the SNPs found in each isolate). The intrahost genetic diversity found in two GLRaV-3 isolates was low compared with that of the GLRaV-2 isolates. Funding: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) and Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq).

    Comparative Genomics of Ape Plasmodium Parasites Reveals Key Evolutionary Events Leading to Human Malaria

    African great apes are infected with at least six species of P. falciparum-like parasites, including the ancestor of P. falciparum. Comparative studies of these parasites and P. falciparum (collectively termed the Laverania subgenus) will provide insight into the evolutionary origins of P. falciparum and identify genetic features that influence host tropism. Here we show that ape Laverania parasites do not serve as a recurrent source of human malaria and use novel enrichment techniques to derive near full-length genomes of close and distant relatives of P. falciparum. Using a combination of single template amplification and deep sequencing, we observe no evidence of ape Laverania infections in forest-dwelling humans in Cameroon. This result supports previous findings that ape Laverania parasites are host specific and have successfully colonized humans only once. To understand the determinants of host specificity and identify genetic characteristics unique to P. falciparum, we develop a novel method for selective enrichment of Plasmodium DNA from sub-microscopically infected whole blood samples. We use this technique to enrich for Laverania genomic DNA from chimpanzee blood samples and assemble near full-length genomes for both close (P. reichenowi) and distant (P. gaboni) relatives of P. falciparum. Comparative analyses of these genomes with that of P. falciparum identify features that are conserved across the Laverania subgenus, including the expansion of the FIKK kinases and the presence of var-like multigene families in all Laverania species. Our analyses also identify genetic features that are unique to P. falciparum, such as very low within-species diversity and a complex evolutionary history of the essential invasion genes RH5 and CyRPA. This dissertation lays the groundwork for future comparative analyses of the Laverania subgenus, including population genomic analyses of ape parasites and comparisons of P. falciparum to its ancestor, P. praefalciparum.