
    Automated Isolation of Translational Efficiency Bias that Resists the Confounding Effect of GC(AT)-Content

    Genomic sequencing projects are an abundant source of information for biological studies ranging from the molecular to the ecological in scale; however, much of the information present may yet be hidden from casual analysis. One such information domain, trends in codon usage, can provide a wealth of information about an organism's genes and their expression. Degeneracy in the genetic code allows more than one triplet codon to code for the same amino acid, and usage of these codons is often biased such that one or more of these synonymous codons is preferred. Detection of this bias is an important tool in the analysis of genomic data, particularly as a predictor of gene expressivity. Methods for identifying codon usage bias in genomic data that rely solely on genomic sequence data are susceptible to being confounded by the presence of several factors simultaneously influencing codon selection. Presented here is a new technique for removing the effects of one of the more common confounding factors, GC(AT)-content, and of visualizing the search-space for codon usage bias through the use of a solution landscape. This technique successfully isolates expressivity-related codon usage trends, using only genomic sequence information, where other techniques fail due to the presence of GC(AT)-content confounding influences.

    REPARATION : ribosome profiling assisted (re-)annotation of bacterial genomes

    Prokaryotic genome annotation is highly dependent on automated methods, as manual curation cannot keep up with the exponential growth of sequenced genomes. Current automated methods depend heavily on sequence composition and often underestimate the complexity of the proteome. We developed RibosomeE Profiling Assisted (re-)AnnotaTION (REPARATION), a de novo machine learning algorithm that takes advantage of experimental protein synthesis evidence from ribosome profiling (Ribo-seq) to delineate translated open reading frames (ORFs) in bacteria, independent of genome annotation (https://github.com/Biobix/REPARATION). REPARATION evaluates all possible ORFs in the genome and estimates minimum thresholds based on a growth curve model to screen for spurious ORFs. We applied REPARATION to three annotated bacterial species to obtain a more comprehensive mapping of their translation landscape in support of experimental data. In all cases, we identified hundreds of novel (small) ORFs including variants of previously annotated ORFs and >70% of all (variants of) annotated protein coding ORFs were predicted by REPARATION to be translated. Our predictions are supported by matching mass spectrometry proteomics data, sequence composition and conservation analysis. REPARATION is unique in that it makes use of experimental translation evidence to intrinsically perform a de novo ORF delineation in bacterial genomes irrespective of the sequence features linked to open reading frames.

    Codon usage bias and the evolution of influenza A viruses. Codon Usage Biases of Influenza Virus

    Background: The influenza A virus is an important infectious cause of morbidity and mortality in humans and was responsible for 3 pandemics in the 20th century. As the replication of the influenza virus is based on its host's machinery, codon usage of its viral genes might be subject to host selection pressures, especially after interspecies transmission. A better understanding of viral evolution and host adaptive responses might help control this disease. Results: Relative Synonymous Codon Usage (RSCU) values of the genes from segment 1 to segment 6 of avian and human influenza viruses, including pandemic H1N1, were studied via Correspondence Analysis (CA). The codon usage patterns of seasonal human influenza viruses were distinct among their subtypes and different from those of avian viruses. Newly isolated viruses could be added to the CA results, creating a tool to investigate the host origin and evolution of viral genes. It was found that the 1918 pandemic H1N1 virus contained genes with mammalian-like viral codon usage patterns, indicating that the introduction of this virus to humans was not through in toto transfer of an avian influenza virus. Many human viral genes had directional changes in codon usage over time of viral isolation, indicating the effect of host selection pressures. These changes reduced the overall GC content and the usage of G at the third codon position in the viral genome. Limited evidence of translational selection pressure was found in a few viral genes. Conclusions: Codon usage patterns from CA allowed identification of host origin and evolutionary trends in influenza viruses, providing an alternative method and a tool to understand the evolution of influenza viruses. Human influenza viruses are subject to selection pressure on codon usage which might assist in understanding the characteristics of newly emerging viruses.
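As an illustration of the RSCU measure named in this abstract, the following is a minimal sketch, not the authors' pipeline. It assumes the standard genetic code and the usual definition of RSCU (observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally). The function name `rscu` is hypothetical, and the Correspondence Analysis step is not shown.

```python
from collections import Counter, defaultdict

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(cds):
    """Return {codon: RSCU} for one in-frame coding sequence.

    Hypothetical helper: RNA input is converted to the DNA alphabet, and
    codons containing ambiguous symbols are ignored.
    """
    cds = cds.upper().replace("U", "T")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group synonymous codons by the amino acid they encode.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    values = {}
    for aa, family in families.items():
        total = sum(counts[c] for c in family)
        if aa == "*" or len(family) == 1 or total == 0:
            continue  # skip stops, Met/Trp, and amino acids not observed
        expected = total / len(family)  # count under uniform synonymous usage
        for codon in family:
            values[codon] = counts[codon] / expected
    return values


if __name__ == "__main__":
    for codon, value in sorted(rscu("CTGCTGCTGCTAGAAGAG").items()):
        print(codon, round(value, 2))
```

In the toy sequence, leucine is encoded three times by CTG and once by CTA, so with six synonymous leucine codons the expected count per codon is 4/6 and CTG has an RSCU of 4.5. A value of 1.0 indicates no bias within the synonymous family.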

    Prediction of Directional Changes of Influenza A Virus Genome Sequences with Emphasis on Pandemic H1N1/09 as a Model Case

    Influenza virus poses a significant threat to public health, as exemplified by the recent introduction of the new pandemic strain H1N1/09 into human populations. Pandemics have been initiated by the occurrence of novel changes in animal sources that eventually adapt to humans. One important issue in studies of viral genomes, particularly those of influenza virus, is to predict possible changes in genomic sequence that will become hazardous. We previously established a clustering method termed ‘BLSOM’ (batch-learning self-organizing map) that does not depend on sequence alignment and can characterize and compare even 1 million genomic sequences in one run. Strategies for comparing a vast number of genomic sequences simultaneously are becoming increasingly important in genome studies because of remarkable progress in nucleotide sequencing. In this study, we have constructed BLSOMs based on the oligonucleotide and codon composition of all influenza A viral strains available. Without prior information with regard to their hosts, sequences derived from strains isolated from avian or human sources were successfully clustered according to the hosts. Notably, the pandemic H1N1/09 strains have oligonucleotide and codon compositions that are clearly different from those of human seasonal influenza A strains. This enables us to infer future directional changes in the influenza A viral genome.
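The alignment-free input that a composition-based clustering method works from can be sketched as a vector of oligonucleotide frequencies. The example below is only an assumption about what such a feature vector might look like. It does not implement BLSOM, the abstract does not specify the preprocessing used, and the name `kmer_frequencies` is hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k):
    """Return relative frequencies of all 4**k DNA k-mers as a dict.

    Hypothetical helper: overlapping windows are counted, RNA input is
    converted to the DNA alphabet, and windows containing symbols other
    than A, C, G, T are ignored.
    """
    seq = seq.upper().replace("U", "T")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


if __name__ == "__main__":
    freqs = kmer_frequencies("ACGTACGT", 2)
    # Print only the dinucleotides that actually occur in the toy sequence.
    for kmer, value in freqs.items():
        if value > 0:
            print(kmer, round(value, 3))
```

Because every sequence maps to a vector of the same length (16 entries for dinucleotides, 64 for trinucleotides), sequences of different lengths can be compared without alignment.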

    Evolution of the Sequence Composition of Flaviviruses

    The adaptation of pathogens to their host(s) is a major factor in the emergence of infectious disease and the persistent survival of many of the infectious diseases within the population. Since many of the smaller viral pathogens are entirely dependent upon host machinery, it has been postulated that they are under selection for a composition similar to that of their host. Analyses of sequence composition have been conducted for numerous small viral species including the Flavivirus genus. Examination of the species within this particular genus that infect vertebrate hosts revealed that sequence composition proclivities do not correspond with vector transmission as the evolutionary history of this species suggests. Recent sequencing efforts have generated complete genomes for many viral species including members of the Flavivirus genus. A thorough comparison of the sequence composition was conducted for all of the available Flaviviruses for which the complete genome is publicly available. This effort expands the work of previous studies to include new vector-borne species as well as members of the insect-specific group which previously have not been explored. Metrics, including mono-, di-, and trinucleotide abundances as well as NC values and codon usage preferences, were explored both for the entire polyprotein sequence as well as for each individual coding region. Preferences for compositions correspond to host-range rather than evolutionary history; species which infect vertebrate hosts exhibited particular preferences similar to each other as well as in correspondence with their host’s preferences. Flaviviruses which do not infect vertebrate hosts, however, did not show these proclivities, with the exception of the Kamiti River virus, suggesting its recent (either past or present) infectivity of an unknown vertebrate host.

    Predicting reservoir hosts and arthropod vectors from evolutionary signatures in RNA virus genomes

    Identifying the animal origins of RNA viruses requires years of field and laboratory studies that stall responses to emerging infectious diseases. Using large genomic and ecological datasets, we demonstrate that animal reservoirs and the existence and identity of arthropod vectors can be predicted directly from viral genome sequences via machine learning. We illustrate the ability of these models to predict the epidemiology of diverse viruses across most human-infective families of single-stranded RNA viruses, including 69 viruses with previously elusive or never-investigated reservoirs or vectors. Models such as these, which capitalize on the proliferation of low-cost genomic sequencing, can narrow the time lag between virus discovery and targeted research, surveillance, and management.

    Prey range and genome evolution of Halobacteriovorax marinus predatory bacteria from an estuary

    © The Author(s), 2018. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in mSphere 3 (2018): e00508-17, doi:10.1128/mSphere.00508-17. Halobacteriovorax strains are saltwater-adapted predatory bacteria that attack Gram-negative bacteria and may play an important role in shaping microbial communities. To understand how Halobacteriovorax strains impact ecosystems and develop them as biocontrol agents, it is important to characterize variation in predation phenotypes and investigate Halobacteriovorax genome evolution. We isolated Halobacteriovorax marinus BE01 from an estuary in Rhode Island using Vibrio from the same site as prey. Small, fast-moving, attack-phase BE01 cells attach to and invade prey cells, consistent with the intraperiplasmic predation strategy of the H. marinus type strain, SJ. BE01 is a prey generalist, forming plaques on Vibrio strains from the estuary, Pseudomonas from soil, and Escherichia coli. Genome analysis revealed extremely high conservation of gene order and amino acid sequences between BE01 and SJ, suggesting strong selective pressure to maintain the genome in this H. marinus lineage. Despite this, we identified two regions of gene content difference that likely resulted from horizontal gene transfer. Analysis of modal codon usage frequencies supports the hypothesis that these regions were acquired from bacteria with different codon usage biases than H. marinus. In one of these regions, BE01 and SJ carry different genes associated with mobile genetic elements. Acquired functions in BE01 include the dnd operon, which encodes a pathway for DNA modification, and a suite of genes involved in membrane synthesis and regulation of gene expression that was likely acquired from another Halobacteriovorax lineage. This analysis provides further evidence that horizontal gene transfer plays an important role in genome evolution in predatory bacteria. This research was supported by an Institutional Development award (IDeA) from the National Institute of General Medical Sciences of the National Institutes of Health under grant no. P20GM103430 and funding from Providence College.

    Community-wide analysis of microbial genome sequence signatures

    Genome signatures are used to identify and cluster sequences de novo from an acid biofilm microbial community metagenomic dataset, revealing information about the low-abundance community members.

    Integrative omics analysis of Pseudomonas aeruginosa virus PA5oct highlights the molecular complexity of jumbo phages

    Pseudomonas virus vB_PaeM_PA5oct is proposed as a model jumbo bacteriophage to investigate phage-bacteria interactions and is a candidate for phage therapy applications. Combining hybrid sequencing, RNA-Seq and mass spectrometry allowed us to accurately annotate its 286,783 bp genome with 461 coding regions including four non-coding RNAs (ncRNAs) and 93 virion-associated proteins. PA5oct relies on the host RNA polymerase for the infection cycle and RNA-Seq revealed a gradual take-over of the total cell transcriptome from 21% in early infection to 93% in late infection. PA5oct is not organized into strictly contiguous regions of temporal transcription, but some genomic regions transcribed in early, middle and late phases of infection can be discriminated. Interestingly, we observe regions showing limited transcription activity throughout the infection cycle. We show that PA5oct upregulates specific bacterial operons during infection including operons pncA-pncB1-nadE involved in NAD biosynthesis, psl for exopolysaccharide biosynthesis and nap for periplasmic nitrate reductase production. We also observe a downregulation of T4P gene products suggesting mechanisms of superinfection exclusion. We used the proteome of PA5oct to position our isolate amongst other phages using a gene-sharing network. This integrative omics study illustrates the molecular diversity of jumbo viruses and raises new questions towards cellular regulation and phage-encoded hijacking mechanisms.

    A comprehensive analysis of genome composition and codon usage patterns of emerging coronaviruses

    An outbreak of atypical pneumonia caused by a novel Betacoronavirus (βCoV), named SARS-CoV-2, has been declared a public health emergency of international concern by the World Health Organization. To gain insight into the emergence, evolution and adaptation of SARS-CoV-2, a comprehensive analysis of genome composition and codon usage of βCoV circulating in China was performed. A biased nucleotide composition was found for the SARS-CoV-2 genome. This bias in genomic composition is reflected in its codon and amino acid usage patterns. Overall codon usage is similar among SARS-CoV-2 genomes and slightly biased. Most of the highly frequent codons are A- and U-ending, which strongly suggests that mutational bias is the main force shaping codon usage in this virus. Significant differences in relative synonymous codon usage frequencies between SARS-CoV-2 and human cells were found. These differences are due to codon usage preferences.
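The observation that frequent codons are A- and U-ending can be checked with a simple tally of bases at the third codon position. The sketch below is an illustration only and is not the analysis used in the study. The function names are hypothetical, and the input is assumed to be a single in-frame coding sequence.

```python
def third_position_composition(cds):
    """Return the fraction of A, C, G and U at the third codon position.

    Hypothetical helper: DNA input is converted to the RNA alphabet, a
    trailing partial codon is dropped, and ambiguous symbols are ignored.
    """
    cds = cds.upper().replace("T", "U")
    usable = len(cds) - len(cds) % 3
    thirds = [cds[i + 2] for i in range(0, usable, 3)]
    thirds = [base for base in thirds if base in "ACGU"]
    n = len(thirds)
    return {base: (thirds.count(base) / n if n else 0.0) for base in "ACGU"}


def gc3(cds):
    """Return the combined G+C fraction at the third codon position."""
    composition = third_position_composition(cds)
    return composition["G"] + composition["C"]


if __name__ == "__main__":
    toy = "AUGGCUAAA"  # codons AUG, GCU, AAA
    print(third_position_composition(toy))
    print(gc3(toy))
```

A low GC3 value corresponds to a preference for A- or U-ending codons. On its own it does not distinguish mutational bias from selection, which is why such tallies are usually read alongside overall genome composition.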