94 research outputs found

    VariantClassifier: A hierarchical variant classifier for annotated genomes

    BACKGROUND: High-throughput DNA sequencing has produced a large number of closed and well-annotated genomes. As the focus moves from whole-genome sequencing and assembly towards resequencing, variant data are becoming more accessible and large quantities of polymorphisms are being detected. An easy-to-use tool for quickly assessing the potential importance of these discovered variants is becoming ever more important. FINDINGS: Written in Perl, the VariantClassifier receives a list of polymorphisms and genome annotation, and generates a hierarchically structured classification for each variant. Depending on the available annotation, the VariantClassifier may assign each polymorphism to a large variety of feature types, such as intergenic or genic; upstream promoter region, intronic region, exonic region or downstream transcript region; 5' splice site or 3' splice site; 5' untranslated region (UTR), 3' UTR or coding sequence (CDS); impacted protein domain; substitution, insertion or deletion; synonymous or non-synonymous; conserved or unconserved; and frameshift or amino acid insertion or deletion (indel). If applicable, the truncated or altered protein sequence is also predicted. For organisms with annotation maintained at Ensembl, a software application for downloading the necessary annotation is also provided, although the classifier will function with properly formatted annotation provided through alternative means. CONCLUSIONS: We have used the VariantClassifier in several projects since its implementation to quickly assess hundreds of thousands of variations on several genomes, and have received requests to make the tool publicly available. The project website can be found at: http://www.jcvi.org/cms/research/projects/variantclassifier.
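
To illustrate the hierarchical idea in this abstract, the following Python sketch classifies a single variant position against one simplified gene model. It is not the VariantClassifier's Perl code. The `Gene` structure, the 1,000 bp promoter window, the nesting of the promoter label under "intergenic", and the restriction to forward-strand genes are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gene:
    # Hypothetical forward-strand gene model; coordinates are 1-based and inclusive.
    start: int
    end: int
    exons: List[Tuple[int, int]]
    cds_start: int
    cds_end: int


def classify(pos: int, gene: Gene, promoter_window: int = 1000) -> List[str]:
    """Return labels ordered from the most general to the most specific feature."""
    if pos < gene.start:
        # Assumption: positions just upstream of the gene count as promoter.
        if gene.start - pos <= promoter_window:
            return ["intergenic", "upstream promoter region"]
        return ["intergenic"]
    if pos > gene.end:
        return ["intergenic"]

    path = ["genic"]
    in_exon = any(s <= pos <= e for s, e in gene.exons)
    if not in_exon:
        path.append("intronic region")
        return path

    path.append("exonic region")
    if pos < gene.cds_start:
        path.append("5' UTR")
    elif pos > gene.cds_end:
        path.append("3' UTR")
    else:
        path.append("CDS")
    return path


gene = Gene(start=1000, end=2000, exons=[(1000, 1200), (1500, 2000)],
            cds_start=1100, cds_end=1900)
for p in (500, 1050, 1150, 1300, 1950, 2500):
    print(p, " > ".join(classify(p, gene)))
```

Returning the whole path rather than a single label keeps every level of the hierarchy available, so variants can be filtered at any depth (for example, all genic variants, or only those in the CDS).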

    Genetic Variation in an Individual Human Exome

    There is much interest in characterizing the variation in a human individual, because this may elucidate what contributes significantly to a person's phenotype, thereby enabling personalized genomics. We focus here on the variants in a person's ‘exome,’ which is the set of exons in a genome, because the exome is believed to harbor much of the functional variation. We provide an analysis of the ∼12,500 variants that affect the protein coding portion of an individual's genome. We identified ∼10,400 nonsynonymous single nucleotide polymorphisms (nsSNPs) in this individual, of which ∼15–20% are rare in the human population. We predict that ∼1,500 nsSNPs affect protein function, and these tend to be heterozygous, rare, or novel. Of the ∼700 coding indels, approximately half have lengths that are a multiple of three, which causes insertions or deletions of amino acids in the corresponding protein rather than introducing frameshifts. Coding indels also occur frequently at the termini of genes, so even if an indel causes a frameshift, an alternative start or stop site in the gene can still be used to make a functional protein. In summary, we reduced the set of ∼12,500 nonsilent coding variants by ∼8-fold to a set of variants that are most likely to have major effects on their proteins' functions. This is our first glimpse of an individual's exome and a snapshot of the current state of personalized genomics. The majority of coding variants in this individual are common and appear to be functionally neutral. Our results also indicate that some variants can be used to improve the current NCBI human reference genome. As more genomes are sequenced, many rare variants and non-SNP variants will be discovered. We present an approach to analyze the coding variation in humans by proposing multiple bioinformatic methods to home in on possible functional variation.
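
The multiple-of-three rule mentioned in this abstract can be shown with a short Python sketch. The function name `indel_effect` and the returned labels are hypothetical and are not taken from the paper. The check looks only at the net length change of the allele and ignores position-dependent effects such as the alternative start or stop sites the authors describe.

```python
def indel_effect(ref: str, alt: str) -> str:
    """Classify a coding indel by whether its net length change preserves the reading frame."""
    delta = len(alt) - len(ref)
    if delta == 0:
        return "substitution (no length change)"
    kind = "insertion" if delta > 0 else "deletion"
    # A net change of 3, 6, 9, ... bases adds or removes whole codons.
    if abs(delta) % 3 == 0:
        return f"in-frame amino acid {kind}"
    return f"frameshift {kind}"


for ref, alt in [("A", "ACGT"), ("ACG", "A"), ("ACGTACG", "A"), ("A", "T")]:
    print(ref, alt, indel_effect(ref, alt))
```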

    Nanoliter Reactors Improve Multiple Displacement Amplification of Genomes from Single Cells

    Since only a small fraction of environmental bacteria are amenable to laboratory culture, there is great interest in genomic sequencing directly from single cells. Sufficient DNA for sequencing can be obtained from one cell by the Multiple Displacement Amplification (MDA) method, thereby eliminating the need to develop culture methods. Here we used a microfluidic device to isolate individual Escherichia coli and amplify genomic DNA by MDA in 60-nl reactions. Our results confirm a report that reduced MDA reaction volume lowers nonspecific synthesis that can result from contaminant DNA templates and unfavourable interaction between primers. The quality of the genome amplification was assessed by qPCR and compared favourably to single-cell amplifications performed in standard 50-μl volumes. Amplification bias was greatly reduced in nanoliter volumes, thereby providing a more even representation of all sequences. Single-cell amplicons from both microliter and nanoliter volumes provided high-quality sequence data by high-throughput pyrosequencing, thereby demonstrating a straightforward route to sequencing genomes from single cells

    Characterization of Uncultivable Bat Influenza Virus Using a Replicative Synthetic Virus

    Bats harbor many viruses, which are periodically transmitted to humans, resulting in outbreaks of disease (e.g., Ebola, SARS-CoV). Recently, influenza virus-like sequences were identified in bats; however, the viruses could not be cultured. This discovery aroused great interest in understanding the evolutionary history and pandemic potential of bat-influenza. Using synthetic genomics, we were unable to rescue the wild type bat virus, but could rescue a modified bat-influenza virus that had the HA and NA coding regions replaced with those of A/PR/8/1934 (H1N1). This modified bat-influenza virus replicated efficiently in vitro and in mice, resulting in severe disease. Additional studies using a bat-influenza virus that had the HA and NA of A/swine/Texas/4199-2/1998 (H3N2) showed that the PR8 HA and NA contributed to the pathogenicity in mice. Unlike in other influenza viruses, engineering truncations hypothesized to reduce interferon antagonism into the NS1 protein did not attenuate bat-influenza. In contrast, substitution of a putative virulence mutation from the bat-influenza PB2 significantly attenuated the virus in mice, and introduction of a putative virulence mutation increased its pathogenicity. Mini-genome replication studies and virus reassortment experiments demonstrated that bat-influenza has very limited genetic and protein compatibility with Type A or Type B influenza viruses, yet it readily reassorts with another divergent bat-influenza virus, suggesting that the bat-influenza lineage may represent a new Genus/Species within the Orthomyxoviridae family. Collectively, our data indicate that the bat-influenza viruses recently identified are authentic viruses that pose little, if any, pandemic threat to humans; however, they provide new insights into the evolution and basic biology of influenza viruses.

    Influenza A virus evolution and spatio-temporal dynamics in Eurasian wild birds: a phylogenetic and phylogeographical study of whole-genome sequence data.

    Low pathogenic avian influenza A viruses (IAVs) have a natural host reservoir in wild waterbirds and the potential to spread to other host species. Here, we investigated the evolutionary, spatial and temporal dynamics of avian IAVs in Eurasian wild birds. We used whole-genome sequences collected as part of an intensive long-term Eurasian wild bird surveillance study, and combined this genetic data with temporal and spatial information to explore the virus evolutionary dynamics. Frequent reassortment and co-circulating lineages were observed for all eight genomic RNA segments over time. There was no apparent species-specific effect on the diversity of the avian IAVs. There was a spatial and temporal relationship between the Eurasian sequences and significant viral migration of avian IAVs from West Eurasia towards Central Eurasia. The observed viral migration patterns differed between segments. Furthermore, we discuss the challenges faced when analysing these surveillance and sequence data, and the caveats to be borne in mind when drawing conclusions from the apparent results of such analyses.

    We thank all ornithologists and other collaborators for their continuous support. We thank V. Munster, E. Skepner, O. Vuong, C. Baas, J. Guldemeester, M. Schutten, G. van der Water, D. Smith and E. Bortz for technical support and stimulating discussions. This manuscript was prepared while D.E. Wentworth was employed at the JCVI. The opinions expressed in this article are the author’s own and do not reflect the view of the Centers for Disease Control, the Department of Health and Human Services, or the United States government.

    This work was supported by NIAID/NIH contracts HHSN266200700010C, HHSN272201400008C, HHSN272201400006C and HHSN272200900007C; a Wellcome Trust Fellowship Strategic Travel Award under contract WT089235MF; a DTRA FRCWMD Broad Agency Announcement under contract HDTRA1-09-14-FRCWMD GRANT11177182; the EU Framework six program NewFluBird (044490); contracts with the Dutch Ministry of Economic Affairs; and a NIAID/NIH CEIRS travel grant under contract HHSN266200700010C. The Swedish sampling and analysis was supported by the Swedish Research Councils VR and FORMAS.

    This is the final version of the article. It first appeared from the Society for General Microbiology via http://dx.doi.org/10.1099/vir.0.00015

    Ecosystem Interactions Underlie the Spread of Avian Influenza A Viruses with Pandemic Potential

    Despite evidence for avian influenza A virus (AIV) transmission between wild and domestic ecosystems, the roles of bird migration and poultry trade in the spread of viruses remain enigmatic. In this study, we integrate ecosystem interactions into a phylogeographic model to assess the contribution of wild and domestic hosts to AIV distribution and persistence. Analysis of globally sampled AIV datasets shows frequent two-way transmission between wild and domestic ecosystems. In general, viral flow from domestic to wild bird populations was restricted to within a geographic region. In contrast, spillover from wild to domestic populations occurred both within and between regions. Wild birds mediated long-distance dispersal at intercontinental scales whereas viral spread among poultry populations was a major driver of regional spread. Viral spread between poultry flocks frequently originated from persistent lineages circulating in regions of intensive poultry production. Our analysis of long-term surveillance data demonstrates that meaningful insights can be inferred from integrating ecosystem into phylogeographic reconstructions that may be consequential for pandemic preparedness and livestock protection.

    Supported by the National Institutes of Health (U.S.), NIH Centers for Excellence in Influenza Research and Surveillance (CEIRS), contracts HHSN266200700010C, HHSN272201400008C and HHSN272201400006C.

    A Universal Next-Generation Sequencing Protocol To Generate Noninfectious Barcoded cDNA Libraries from High-Containment RNA Viruses

    ABSTRACT: Several biosafety level 3 and/or 4 (BSL-3/4) pathogens are high-consequence, single-stranded RNA viruses, and their genomes, when introduced into permissive cells, are infectious. Moreover, many of these viruses are select agents (SAs), and their genomes are also considered SAs. For this reason, cDNAs and/or their derivatives must be tested to ensure the absence of infectious virus and/or viral RNA before transfer out of the BSL-3/4 and/or SA laboratory. This tremendously limits the capacity to conduct viral genomic research, particularly the application of next-generation sequencing (NGS). Here, we present a sequence-independent method to rapidly amplify viral genomic RNA while simultaneously abolishing both viral and genomic RNA infectivity across multiple single-stranded positive-sense RNA (ssRNA+) virus families. The process generates barcoded DNA amplicons that range in length from 300 to 1,000 bp, which cannot be used to rescue a virus and are stable to transport at room temperature. Our barcoding approach allows for up to 288 barcoded samples to be pooled into a single library and run across various NGS platforms without potential reconstitution of the viral genome. Our data demonstrate that this approach provides full-length genomic sequence information not only from high-titer virion preparations but it can also recover specific viral sequence from samples with limited starting material in the background of cellular RNA, and it can be used to identify pathogens from unknown samples. In summary, we describe a rapid, universal standard operating procedure that generates high-quality NGS libraries free of infectious virus and infectious viral RNA.

    IMPORTANCE: This report establishes and validates a standard operating procedure (SOP) for select agents (SAs) and other biosafety level 3 and/or 4 (BSL-3/4) RNA viruses to rapidly generate noninfectious, barcoded cDNA amenable for next-generation sequencing (NGS). This eliminates the burden of testing all processed samples derived from high-consequence pathogens prior to transfer from high-containment laboratories to lower-containment facilities for sequencing. Our established protocol can be scaled up for high-throughput sequencing of hundreds of samples simultaneously, which can dramatically reduce the cost and effort required for NGS library construction. NGS data from this SOP can provide complete genome coverage from viral stocks and can also detect virus-specific reads from limited starting material. Our data suggest that the procedure can be implemented and easily validated by institutional biosafety committees across research laboratories.

    Analysis of the Aedes albopictus C6/36 genome provides insight into cell line utility for viral propagation

    BACKGROUND: The 50-year-old Aedes albopictus C6/36 cell line is a resource for the detection, amplification, and analysis of mosquito-borne viruses including Zika, dengue, and chikungunya. The cell line is derived from an unknown number of larvae from an unspecified strain of Aedes albopictus mosquitoes. Toward improved utility of the cell line for research in virus transmission, we present an annotated assembly of the C6/36 genome. RESULTS: The C6/36 genome assembly has the largest contig N50 (3.3 Mbp) of any mosquito assembly, presents the sequences of both haplotypes for most of the diploid genome, reveals independent null mutations in both alleles of the Dicer locus, and indicates a male-specific genome. Gene annotation was computed with publicly available mosquito transcript sequences. Gene expression data from cell line RNA sequence identified enrichment of growth-related pathways and conspicuous deficiency in aquaporins and inward rectifier K+ channels. As a test of utility, RNA sequence data from Zika-infected cells were mapped to the C6/36 genome and transcriptome assemblies. Host subtraction reduced the data set by 89%, enabling faster characterization of nonhost reads. CONCLUSIONS: The C6/36 genome sequence and annotation should enable additional uses of the cell line to study arbovirus vector interactions and interventions aimed at restricting the spread of human disease
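
The contig N50 statistic quoted in this abstract can be computed from a list of contig lengths. The sketch below uses the common definition: the length L such that contigs of length L or longer together cover at least half of the total assembly. The function name and the toy lengths are illustrative only and do not come from the C6/36 assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the contig N50 for a collection of contig lengths (in bp)."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contig lengths given")
    half = sum(ordered) / 2
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("contig lengths must be positive")


toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # toy values, total 54
print(n50(toy_contigs))
```

For the toy list, the three longest contigs (10, 9 and 8) sum to 27, exactly half of 54, so the function returns 8.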

    Avian Influenza Viruses in Wild Birds: Virus Evolution in a Multihost Ecosystem.

    Wild ducks and gulls are the major reservoirs for avian influenza A viruses (AIVs). The mechanisms that drive AIV evolution are complex at sites where various duck and gull species from multiple flyways breed, winter, or stage. The Republic of Georgia is located at the intersection of three migratory flyways: the Central Asian flyway, the East Africa/West Asia flyway, and the Black Sea/Mediterranean flyway. For six complete study years (2010 to 2016), we collected AIV samples from various duck and gull species that breed, migrate, and overwinter in Georgia. We found a substantial subtype diversity of viruses that varied in prevalence from year to year. Low-pathogenic AIV (LPAIV) subtypes included H1N1, H2N3, H2N5, H2N7, H3N8, H4N2, H6N2, H7N3, H7N7, H9N1, H9N3, H10N4, H10N7, H11N1, H13N2, H13N6, H13N8, and H16N3, and two highly pathogenic AIVs (HPAIVs) belonging to clade 2.3.4.4, H5N5 and H5N8, were found. Whole-genome phylogenetic trees showed significant host species lineage restriction for nearly all gene segments and significant differences in observed reassortment rates, as defined by quantification of phylogenetic incongruence, and in nucleotide sequence diversity for LPAIVs among different host species. Hemagglutinin clade 2.3.4.4 H5N8 viruses, which circulated in Eurasia during 2014 and 2015, did not reassort, but analysis after their subsequent dissemination during 2016 and 2017 revealed reassortment in all gene segments except NP and NS. 
Some virus lineages appeared to be unrelated to AIVs in wild bird populations in other regions, with maintenance of local AIVs in Georgia, whereas other lineages showed considerable genetic interrelationships with viruses circulating in other parts of Eurasia and Africa, despite relative undersampling in the area.

    IMPORTANCE: Waterbirds (e.g., gulls and ducks) are natural reservoirs of avian influenza viruses (AIVs) and have been shown to mediate the dispersal of AIVs at intercontinental scales during seasonal migration. The segmented genome of influenza viruses enables viral RNA from different lineages to mix or reassort when two viruses infect the same host. Such reassortant viruses have been identified in most major human influenza pandemics and several poultry outbreaks. Despite their importance, we have only recently begun to understand AIV evolution and reassortment in their natural host reservoirs. This comprehensive study illustrates AIV evolutionary dynamics within a multihost ecosystem at a stopover site where three major migratory flyways intersect. Our analysis of this ecosystem over a 6-year period provides a snapshot of how these viruses are linked to global AIV populations. Understanding the evolution of AIVs in the natural host is imperative to mitigating both the risk of incursion into domestic poultry and the potential risk to mammalian hosts, including humans.

    ANDES: Statistical tools for the ANalyses of DEep Sequencing

    BACKGROUND: The advancements in DNA sequencing technologies have allowed researchers to progress from the analyses of a single organism towards the deep sequencing of a sample of organisms. With sufficient sequencing depth, it is now possible to detect subtle variations between members of the same species, or between mixed species with shared biomarkers, such as the 16S rRNA gene. However, traditional sequencing analyses of samples from largely homogeneous populations are often still based on multiple sequence alignments (MSA), where each sequence is placed along a separate row and similarities between aligned bases can be followed down each column. While this visual format is intuitive for a small set of aligned sequences, the representation quickly becomes cumbersome as sequencing depths cover loci hundreds or thousands of reads deep. FINDINGS: We have developed ANDES, a software library and a suite of applications, written in Perl and R, for the statistical ANalyses of DEep Sequencing. The fundamental data structure underlying ANDES is the position profile, which contains the nucleotide distributions for each genomic position resultant from a multiple sequence alignment (MSA). Tools include the root mean square deviation (RMSD) plot, which allows for the visual comparison of multiple samples on a position-by-position basis, and the computation of base conversion frequencies (transition/transversion rates), variation (Shannon entropy), inter-sample clustering and visualization (dendrogram and multidimensional scaling (MDS) plot), threshold-driven consensus sequence generation and polymorphism detection, and the estimation of empirically determined sequencing quality values. CONCLUSIONS: As new sequencing technologies evolve, deep sequencing will become increasingly cost-efficient and the inter- and intra-sample comparisons of largely homogeneous sequences will become more common. We have provided a software package and demonstrated its application on various empirically derived datasets. Investigators may download the software from Sourceforge at https://sourceforge.net/projects/andestools.
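
The position profile described in this abstract, together with per-position Shannon entropy and RMSD, can be sketched in Python as follows. This is not ANDES code, which is written in Perl and R. The function names, the restriction to the four bases A, C, G and T, and the exact RMSD normalisation (mean over the four base frequencies) are assumptions made for the example.

```python
import math
from collections import Counter
from typing import Dict, List

BASES = "ACGT"


def position_profile(aligned: List[str]) -> List[Dict[str, float]]:
    """Per-column base frequencies for equal-length aligned sequences.

    Gaps and any symbol outside A/C/G/T are ignored (an assumption of this sketch).
    """
    profile = []
    for column in zip(*aligned):
        counts = Counter(b for b in column if b in BASES)
        total = sum(counts.values())
        profile.append({b: (counts[b] / total if total else 0.0) for b in BASES})
    return profile


def shannon_entropy(dist: Dict[str, float]) -> float:
    """Shannon entropy in bits of one position's base distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def rmsd(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Root mean square deviation between two base distributions at one position."""
    return math.sqrt(sum((p[b] - q[b]) ** 2 for b in BASES) / len(BASES))


sample_a = ["ACGT", "ACGA", "ACGT", "ACGA"]  # toy alignments, not real data
sample_b = ["ACGT", "ACGT", "ACGT", "ACGT"]
prof_a = position_profile(sample_a)
prof_b = position_profile(sample_b)
for i, (pa, pb) in enumerate(zip(prof_a, prof_b), start=1):
    print(i, round(shannon_entropy(pa), 3), round(rmsd(pa, pb), 3))
```

Because each position is reduced to a fixed-size frequency vector, the cost of comparing two samples depends on the alignment length and not on the read depth. This addresses the scaling problem with row-per-sequence alignments that the abstract raises.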