3,990 research outputs found

    Crumple: A Method for Complete Enumeration of All Possible Pseudoknot-Free RNA Secondary Structures

    The computing for this project was performed at the OU Supercomputing Center for Education & Research (OSCER) at the University of Oklahoma (OU). OSCER director Henry Neeman and OSCER staff provided valuable technical expertise. The authors acknowledge and appreciate the discussions about this work with Dr. Changwook Kim, Adam Heck, Sean Lavelle, and Jui-wen Liu. Conceived and designed the experiments: SB SJS. Performed the experiments: SB JWS. Analyzed the data: SB JWS SJS. Wrote the paper: SB JWS SJS. The diverse landscape of RNA conformational space includes many canyons and crevices that are distant from the lowest minimum free energy valley and remain unexplored by traditional RNA structure prediction methods. A complete description of the entire RNA folding landscape can facilitate identification of biologically important conformations. The Crumple algorithm rapidly enumerates all possible non-pseudoknotted structures for an RNA sequence without consideration of thermodynamics while filtering the output with experimental data. The Crumple algorithm provides an alternative approach to traditional free energy minimization programs for RNA secondary structure prediction. A complete computation of all non-pseudoknotted secondary structures can reveal structures that would not be predicted by methods that sample the RNA folding landscape based on thermodynamic predictions. The free energy minimization approach is often successful but is limited by not considering RNA tertiary and protein interactions and the possibility that kinetics rather than thermodynamics determines the functional RNA fold. Efficient parallel computing and filters based on experimental data make practical the complete enumeration of all non-pseudoknotted structures. Efficient parallel computing for Crumple is implemented in a ring graph approach. 
Filters for experimental data include constraints from chemical probing of solvent accessibility, enzymatic cleavage of paired or unpaired nucleotides, phylogenetic covariation, and the minimum number and lengths of helices determined from crystallography or cryo-electron microscopy. The minimum number and length of helices have a significant effect on reducing conformational space. Pairing constraints reduce conformational space more than single nucleotide constraints. Examples with Alfalfa Mosaic Virus RNA and Trypanosoma brucei guide RNA demonstrate the importance of evaluating all possible structures when pseudoknots, RNA-protein interactions, and metastable structures are important for biological function. Crumple software is freely available at http://adenosine.chem.ou.edu/software.html.
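The exhaustive enumeration described in the Crumple abstract above can be illustrated with a minimal Python sketch. It is not the Crumple implementation: the function name `enumerate_structures`, the allowed pair set (Watson-Crick plus G-U), the minimum hairpin loop of three nucleotides, and the `unpaired` argument standing in for a single-nucleotide chemical probing constraint are all assumptions made for illustration, and the ring graph parallelization is not modeled.

```python
from functools import lru_cache

# Assumed pairing rules for this sketch: Watson-Crick pairs plus the G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def enumerate_structures(seq, min_loop=3, unpaired=frozenset()):
    """Return every pseudoknot-free structure of seq in dot-bracket notation.

    min_loop is the assumed minimum number of unpaired bases in a hairpin loop.
    unpaired is a set of 0-based positions forced to stay single-stranded,
    a stand-in for an experimental filter.
    """
    seq = seq.upper()
    unpaired = frozenset(unpaired)

    @lru_cache(maxsize=None)
    def fold(i, j):
        # All structures for the subsequence seq[i..j], inclusive.
        if i > j:
            return ("",)
        # Case 1: position i is unpaired.
        results = ["." + rest for rest in fold(i + 1, j)]
        # Case 2: position i pairs with some k; the inside and the outside
        # of the pair fold independently, so no pseudoknot can form.
        if i not in unpaired:
            for k in range(i + min_loop + 1, j + 1):
                if k in unpaired or (seq[i], seq[k]) not in PAIRS:
                    continue
                for inner in fold(i + 1, k - 1):
                    for outer in fold(k + 1, j):
                        results.append("(" + inner + ")" + outer)
        return tuple(results)

    return list(fold(0, len(seq) - 1))


if __name__ == "__main__":
    for structure in enumerate_structures("GGAAACC"):
        print(structure)
```

Because no energy model is used, the number of structures grows very quickly with sequence length, which is why the abstract stresses experimental filters: forcing even one position to be unpaired removes every structure that pairs it.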

    PhagePro: prophage finding tool

    Master's dissertation in Bioinformatics. Bacteriophages are viruses that infect bacteria and use them to reproduce. Their reproductive cycle can be lytic or lysogenic. The lytic cycle leads to the death of the bacterium, because the bacteriophage hijacks the host's machinery to produce the phage parts needed to assemble new complete bacteriophages, until cell wall lysis occurs. In the lysogenic cycle, on the other hand, the bacteriophage's genetic material is integrated into the bacterial genome, becoming a prophage. Sometimes, due to external stimuli, these prophages can be induced to perform a lytic cycle. Moreover, the lysogenic cycle can lead to significant modifications in bacteria, for example, antibiotic resistance. To that end, PhagePro was created. This tool finds and characterises prophages inserted in the bacterial genome. Using 42 features, three datasets were created and five machine learning algorithms were tested. All models were evaluated in two phases, during testing and with real bacterial cases. During testing, all three datasets reached the 98% F1 score mark in their best result. In the second phase, the models were used to predict real bacterial cases and their results were compared with those of two tools, Prophage Hunter and PHASTER. The best model found 110 zones out of 154, and the model with the best result in dataset 3 had 94 in common. As a final test, Agrobacterium fabrum strC68 was extensively analysed. The results show that PhagePro was capable of detecting more regions with proteins associated with phages than the other two tools. In light of the results obtained, PhagePro has shown great potential in the discovery and characterisation of bacterial alterations caused by prophages.
Bacteriophages are viruses that infect bacteria, using them to ensure the maintenance of their genome. This process can proceed through the lytic or the lysogenic cycle. The lytic cycle consists of using the cell for the phage's benefit, creating bacteriophages, and lysing the cell. In the lysogenic cycle, on the other hand, the bacteriophage inserts its genetic code into the bacterial genome, which can lead to the transfer of genes of interest, making the monitoring of prophages important. PhagePro was therefore developed, a tool capable of finding and characterising bacteriophages in bacterial genomes. Features were created to distinguish prophages from bacteria, yielding three datasets, and machine learning algorithms were applied. The models were evaluated in two phases, the testing phase and the real-case phase. In the testing phase, the best model for dataset 1 had an F1 score of 98%, dataset 2 had 98%, and dataset 3 also had 98%. For testing on real cases, all models were compared with the predictions of two tools, Prophage Hunter and PHASTER. The model with the best results had 110 of 154 zones in common with the two tools, and the dataset 3 model had 94 zones. Finally, the results for the bacterium Agrobacterium fabrum strC68 were analysed. They differ from, but are as valid as, those of the compared tools, since PhagePro can detect zones with phage-associated proteins that the other tools cannot. In view of the results obtained, PhagePro has shown that it is capable of finding and characterising prophages in bacteria. This study was supported by the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of the UIDB/04469/2020 unit. The work was also partially funded by Project PTDC/SAU-PUB/29182/2017 [POCI-01-0145-FEDER-029182].
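For readers unfamiliar with the F1 score quoted in the PhagePro abstract, the following minimal Python sketch shows how it is computed from prediction counts, along with the overlap fraction implied by the reported 110 of 154 zones. The function names and the example counts in the comments are hypothetical; the abstract reports only the final scores, not the underlying confusion matrix.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from raw counts.

    tp, fp, fn are true positives, false positives, and false negatives.
    Returns 0.0 when precision and recall are both undefined or zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def overlap_fraction(common, total):
    """Share of reference zones that a model also reports."""
    return common / total


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate a score near 98%.
    print(f1_score(tp=98, fp=2, fn=2))
    # Zone counts taken from the abstract: 110 of 154 zones.
    print(overlap_fraction(110, 154))
```

On the figures in the abstract, 110 of 154 zones corresponds to an overlap of roughly 71%, and 94 of 154 to roughly 61%.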

    Evolutionary and in silico analysis of the antiviral TRIM22 gene

    Tripartite motif protein 22 (TRIM22) is an evolutionarily ancient interferon-induced protein that has been shown to potently inhibit human immunodeficiency virus (HIV), hepatitis B virus (HBV), and influenza A virus (IAV) replication. Altered TRIM22 expression levels have also been linked to autoimmune disease, cancer, and cellular proliferation. Despite its important role in a number of biological processes, the factors that influence TRIM22 expression and/or antiviral activity remain largely unknown. To identify key functional sites in TRIM22, we performed extensive evolutionary and in silico analyses on the TRIM22 coding region. These tools allowed us to pinpoint multiple sites in TRIM22 that have evolved under positive selection during mammalian evolution, including one site that coincides with the location of a common non-synonymous SNP (nsSNP) in the human TRIM22 gene (TRIM22 rs1063303:G>C). Remarkably, we found that the frequency of TRIM22 rs1063303:G>C varied considerably among different ethnic populations and that African (AFR), American (AMR), and European (EUR) populations contained an excess of intermediate frequency TRIM22 rs1063303:G>C alleles when compared to a neutral model of evolution. The latter is typically indicative of balancing selection, a non-neutral selective process that maintains polymorphism in a population. Interestingly, we also found that the TRIM22 nsSNP rs1063303:G>C had an inverse impact on TRIM22 function. TRIM22 rs1063303:G>C increased TRIM22 expression levels, but decreased its anti-HIV activity and altered its subcellular localization pattern. In addition to these studies, we used a variety of in silico methods to prioritize and delineate other functional sites in TRIM22. We showed that the majority of positively selected sites in the C-terminal B30.2 domain of TRIM22 are located in one of four surface-exposed variable loops that are critical for the anti-HIV effects of the closely-related TRIM5α protein. 
Moreover, we used six different in silico nsSNP prediction programs to screen all of the nsSNPs in the TRIM22 gene and identified 14 high-risk nsSNPs that are predicted to be highly deleterious to TRIM22 function. Finally, to examine the TRIM22 nsSNP rs1063303:G>C in a more isolated population, we genotyped this nsSNP in two Inuit populations (Canadian and Greenlandic Inuit). We found that the TRIM22 rs1063303:C allele is inordinately prevalent in the Inuit compared to non-Inuit populations and that these two populations do not contain an excess of intermediate frequency TRIM22 rs1063303:G>C alleles compared to a neutral model of evolution, indicating that site TRIM22 rs1063303:G>C has not evolved under balancing selection in the Inuit. Lastly, we found an interesting association between the TRIM22 rs1063303:C allele and serum levels of triglycerides (TG) and high-density lipoprotein (HDL). Taken together, the results presented here identify a number of pertinent sites in the TRIM22 protein that likely influence its biological and/or antiviral functions.

    Computational Methods in Science and Engineering : Proceedings of the Workshop SimLabs@KIT, November 29 - 30, 2010, Karlsruhe, Germany

    This proceedings volume compiles contributed articles that cover applications from different research fields in equal measure and range from capacity to capability computing. Besides classical computing aspects such as parallelization, the proceedings focus on multi-scale approaches and on methods for tackling algorithm and data complexity. Practical aspects of using the HPC infrastructure and the tools and software available at the SCC are also presented.

    Deep Evolutionary Generative Molecular Modeling for RNA Aptamer Drug Design

    Deep Aptamer Evolutionary Model (DAPTEV Model). Typical drug development processes are costly, time consuming and often manual with regard to research. Aptamers are short, single-stranded oligonucleotides (RNA/DNA) that, much like antibodies, bind to and inhibit target proteins and other types of molecules. Compared with small-molecule drugs, these aptamers can bind to their targets with high affinity (binding strength) and specificity (designed to uniquely interact with the target only). The typical development process for aptamers utilizes a manual process known as Systematic Evolution of Ligands by Exponential Enrichment (SELEX), which is costly, slow, and often produces mild results. The focus of this research is to create a deep learning approach for generating and evolving aptamer sequences to support aptamer-based drug development. These sequences must be unique, contain at least some level of structural complexity, and have a high level of affinity and specificity for the intended target. Moreover, after training, the deep learning system, known as a Variational Autoencoder (VAE), must be able to be queried for new sequences without the need for further training. Currently, this research is applied to the SARS-CoV-2 (Covid-19) spike protein’s receptor-binding domain (RBD). However, the solution has been deliberately designed to be general enough for future viral applications. Each individual run took five and a half days to complete. Over the course of two months, three runs were performed for three different models. After some sequence, score, and statistical comparisons, it was observed that the deep learning model was able to produce structurally complex aptamers with strong binding affinities and specificities to the target Covid-19 RBD. Furthermore, due to the nature of VAEs, this model can indeed be queried for new aptamers of similar quality based on previous training. 
Results suggest that VAE-based deep learning methods are capable of optimizing aptamer-target binding affinities and specificities (multi-objective learning), and are a strong tool to aid in aptamer-based drug development

    Automated Genome-Wide Protein Domain Exploration

    Exploiting the exponentially growing genomics and proteomics data requires high quality, automated analysis. Protein domain modeling is a key area of molecular biology as it unravels the mysteries of evolution, protein structures, and protein functions. A plethora of sequences exist in protein databases with incomplete domain knowledge. Hence this research explores automated bioinformatics tools for faster protein domain analysis. Automated tool chains described in this dissertation generate new protein domain models thus enabling more effective genome-wide protein domain analysis. To validate the new tool chains, the Shewanella oneidensis and Escherichia coli genomes were processed, resulting in a new peptide domain database, detection of poor domain models, and identification of likely new domains. The automated tool chains will require months or years to model a small genome when executing on a single workstation. Therefore the dissertation investigates approaches with grid computing and parallel processing to significantly accelerate these bioinformatics tool chains