VariantClassifier: A hierarchical variant classifier for annotated genomes
BACKGROUND: High-throughput DNA sequencing has produced a large number of closed and well-annotated genomes. As the focus moves from whole-genome sequencing and assembly towards resequencing, variant data is becoming more accessible and large quantities of polymorphisms are being detected. An easy-to-use tool for quickly assessing the potential importance of these discovered variants is becoming ever more important. FINDINGS: Written in Perl, the VariantClassifier receives a list of polymorphisms and genome annotation, and generates a hierarchically structured classification for each variant. Depending on the available annotation, the VariantClassifier may assign each polymorphism to a large variety of feature types, such as intergenic or genic; upstream promoter region, intronic region, exonic region or downstream transcript region; 5' splice site or 3' splice site; 5' untranslated region (UTR), 3' UTR or coding sequence (CDS); impacted protein domain; substitution, insertion or deletion; synonymous or non-synonymous; conserved or unconserved; and frameshift or amino acid insertion or deletion (indel). If applicable, the truncated or altered protein sequence is also predicted. For organisms with annotation maintained at Ensembl, a software application for downloading the necessary annotation is also provided, although the classifier will function with properly formatted annotation provided through alternative means. CONCLUSIONS: We have utilized the VariantClassifier for several projects since its implementation to quickly assess hundreds of thousands of variations on several genomes and have received requests to make the tool publicly available. The project website can be found at: http://www.jcvi.org/cms/research/projects/variantclassifier
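The hierarchical classification described in this abstract can be illustrated with a minimal Python sketch. This is not the VariantClassifier itself (which is written in Perl); the `Gene` record, its fields, and the `classify` function are hypothetical names introduced only for illustration, and the sketch assumes forward-strand genes and covers only the first three levels of the hierarchy (intergenic/genic, intronic/exonic, UTR/CDS).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gene:
    """Hypothetical, simplified annotation record (forward strand only)."""
    start: int                    # transcript start, 1-based inclusive
    end: int                      # transcript end, 1-based inclusive
    exons: List[Tuple[int, int]]  # exon intervals, 1-based inclusive
    cds_start: int                # first coding base
    cds_end: int                  # last coding base


def classify(pos: int, genes: List[Gene]) -> List[str]:
    """Return the classification path for a position, broad to specific."""
    for gene in genes:
        if gene.start <= pos <= gene.end:
            path = ["genic"]
            if any(s <= pos <= e for s, e in gene.exons):
                path.append("exonic")
                # Within an exon, split into untranslated and coding parts.
                if pos < gene.cds_start:
                    path.append("5' UTR")
                elif pos > gene.cds_end:
                    path.append("3' UTR")
                else:
                    path.append("CDS")
            else:
                path.append("intronic")
            return path
    return ["intergenic"]


if __name__ == "__main__":
    toy = [Gene(100, 1000, [(100, 300), (500, 1000)], 150, 900)]
    for position in (50, 120, 400, 600, 950):
        print(position, " > ".join(classify(position, toy)))
```

Each level is only evaluated when the level above it applies, which is what makes the output a path rather than a flat set of labels. Deeper levels named in the abstract (splice sites, protein domains, synonymous versus non-synonymous) would extend the same pattern but need sequence and codon information that this sketch leaves out.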
Mechanism of chimera formation during the Multiple Displacement Amplification reaction
BACKGROUND: Multiple Displacement Amplification (MDA) is a method used for amplifying limiting DNA sources. The high molecular weight amplified DNA is ideal for DNA library construction. While this has enabled genomic sequencing from one or a few cells of unculturable microorganisms, the process is complicated by the tendency of MDA to generate chimeric DNA rearrangements in the amplified DNA. Determining the source of the DNA rearrangements would be an important step towards reducing or eliminating them. RESULTS: Here, we characterize the major types of chimeras formed by carrying out an MDA whole genome amplification from a single E. coli cell and sequencing by the 454 Life Sciences method. Analysis of 475 chimeras revealed the predominant reaction mechanisms that create the DNA rearrangements. The highly branched DNA synthesized in MDA can assume many alternative secondary structures. DNA strands extended on an initial template can be displaced, becoming available to prime on a second template and creating the chimeras. Evidence supports a model in which branch migration can displace 3'-ends, freeing them to prime on the new templates. More than 85% of the resulting DNA rearrangements were inverted sequences with intervening deletions that the model predicts. Intramolecular rearrangements were favored, with displaced 3'-ends reannealing to single-stranded 5'-strands contained within the same branched DNA molecule. In over 70% of the chimeric junctions, the 3' termini had initiated priming at complementary sequences of 2–21 nucleotides (nts) in the new templates. CONCLUSION: Formation of chimeras is an important limitation to the MDA method, particularly for whole genome sequencing. Identification of the mechanism for chimera formation provides new insight into the MDA reaction and suggests methods to reduce chimeras. The 454 sequencing approach used here will provide a rapid method to assess the utility of reaction modifications
Genetic Variation in an Individual Human Exome
There is much interest in characterizing the variation in a human individual, because this may elucidate what contributes significantly to a person's phenotype, thereby enabling personalized genomics. We focus here on the variants in a person's ‘exome,’ which is the set of exons in a genome, because the exome is believed to harbor much of the functional variation. We provide an analysis of the ∼12,500 variants that affect the protein coding portion of an individual's genome. We identified ∼10,400 nonsynonymous single nucleotide polymorphisms (nsSNPs) in this individual, of which ∼15–20% are rare in the human population. We predict ∼1,500 nsSNPs affect protein function and these tend to be heterozygous, rare, or novel. Of the ∼700 coding indels, approximately half tend to have lengths that are a multiple of three, which causes insertions/deletions of amino acids in the corresponding protein, rather than introducing frameshifts. Coding indels also occur frequently at the termini of genes, so even if an indel causes a frameshift, an alternative start or stop site in the gene can still be used to make a functional protein. In summary, we reduced the set of ∼12,500 nonsilent coding variants by ∼8-fold to a set of variants that are most likely to have major effects on their proteins' functions. This is our first glimpse of an individual's exome and a snapshot of the current state of personalized genomics. The majority of coding variants in this individual are common and appear to be functionally neutral. Our results also indicate that some variants can be used to improve the current NCBI human reference genome. As more genomes are sequenced, many rare variants and non-SNP variants will be discovered. We present an approach to analyze the coding variation in humans by proposing multiple bioinformatic methods to home in on possible functional variation
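The distinction this abstract draws between indels whose length is a multiple of three and those that shift the reading frame can be sketched in a few lines of Python. The function name `indel_effect` and the use of plain reference/alternate allele strings are assumptions made for illustration; this is not the authors' pipeline.

```python
def indel_effect(ref: str, alt: str) -> str:
    """Classify a coding variant by its effect on the reading frame.

    `ref` and `alt` are the reference and alternate allele strings
    (hypothetical inputs; no alignment or normalization is performed).
    """
    delta = len(alt) - len(ref)
    if delta == 0:
        return "substitution"
    kind = "insertion" if delta > 0 else "deletion"
    # A net length change divisible by three adds or removes whole codons.
    if abs(delta) % 3 == 0:
        return f"in-frame {kind}"
    return f"frameshift {kind}"


if __name__ == "__main__":
    for ref, alt in [("A", "ACGT"), ("ACG", "A"), ("A", "AC"), ("ACGT", "A")]:
        print(ref, alt, indel_effect(ref, alt))
```

The check only looks at net length change, so it does not capture the abstract's second point, that a frameshift near a gene terminus may be tolerated when an alternative start or stop site is available.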
Nanoliter Reactors Improve Multiple Displacement Amplification of Genomes from Single Cells
Since only a small fraction of environmental bacteria are amenable to laboratory culture, there is great interest in genomic sequencing directly from single cells. Sufficient DNA for sequencing can be obtained from one cell by the Multiple Displacement Amplification (MDA) method, thereby eliminating the need to develop culture methods. Here we used a microfluidic device to isolate individual Escherichia coli and amplify genomic DNA by MDA in 60-nl reactions. Our results confirm a report that reduced MDA reaction volume lowers nonspecific synthesis that can result from contaminant DNA templates and unfavourable interaction between primers. The quality of the genome amplification was assessed by qPCR and compared favourably to single-cell amplifications performed in standard 50-μl volumes. Amplification bias was greatly reduced in nanoliter volumes, thereby providing a more even representation of all sequences. Single-cell amplicons from both microliter and nanoliter volumes provided high-quality sequence data by high-throughput pyrosequencing, thereby demonstrating a straightforward route to sequencing genomes from single cells
Evaluation of next generation sequencing platforms for population targeted sequencing studies
Human sequence generated from three next-generation sequencing platforms reveals systematic variability in sequence coverage due to local sequence characteristics
Characterization of Uncultivable Bat Influenza Virus Using a Replicative Synthetic Virus
Bats harbor many viruses, which are periodically transmitted to humans resulting in outbreaks of disease (e.g., Ebola, SARS-CoV). Recently, influenza virus-like sequences were identified in bats; however, the viruses could not be cultured. This discovery aroused great interest in understanding the evolutionary history and pandemic potential of bat-influenza. Using synthetic genomics, we were unable to rescue the wild type bat virus, but could rescue a modified bat-influenza virus that had the HA and NA coding regions replaced with those of A/PR/8/1934 (H1N1). This modified bat-influenza virus replicated efficiently in vitro and in mice, resulting in severe disease. Additional studies using a bat-influenza virus that had the HA and NA of A/swine/Texas/4199-2/1998 (H3N2) showed that the PR8 HA and NA contributed to the pathogenicity in mice. Unlike other influenza viruses, engineering truncations hypothesized to reduce interferon antagonism into the NS1 protein did not attenuate bat-influenza. In contrast, substitution of a putative virulence mutation from the bat-influenza PB2 significantly attenuated the virus in mice and introduction of a putative virulence mutation increased its pathogenicity. Mini-genome replication studies and virus reassortment experiments demonstrated that bat-influenza has very limited genetic and protein compatibility with Type A or Type B influenza viruses, yet it readily reassorts with another divergent bat-influenza virus, suggesting that the bat-influenza lineage may represent a new Genus/Species within the Orthomyxoviridae family. Collectively, our data indicate that the bat-influenza viruses recently identified are authentic viruses that pose little, if any, pandemic threat to humans; however, they provide new insights into the evolution and basic biology of influenza viruses
Influenza A virus evolution and spatio-temporal dynamics in Eurasian wild birds: a phylogenetic and phylogeographical study of whole-genome sequence data.
Low pathogenic avian influenza A viruses (IAVs) have a natural host reservoir in wild waterbirds and the potential to spread to other host species. Here, we investigated the evolutionary, spatial and temporal dynamics of avian IAVs in Eurasian wild birds. We used whole-genome sequences collected as part of an intensive long-term Eurasian wild bird surveillance study, and combined this genetic data with temporal and spatial information to explore the virus evolutionary dynamics. Frequent reassortment and co-circulating lineages were observed for all eight genomic RNA segments over time. There was no apparent species-specific effect on the diversity of the avian IAVs. There was a spatial and temporal relationship between the Eurasian sequences and significant viral migration of avian IAVs from West Eurasia towards Central Eurasia. The observed viral migration patterns differed between segments. Furthermore, we discuss the challenges faced when analysing these surveillance and sequence data, and the caveats to be borne in mind when drawing conclusions from the apparent results of such analyses. We thank all ornithologists and other collaborators for their continuous support. We thank V. Munster, E. Skepner, O. Vuong, C. Baas, J. Guldemeester, M. Schutten, G. van der Water, D. Smith and E. Bortz for technical support and stimulating discussions. This manuscript was prepared while D.E. Wentworth was employed at the JCVI. The opinions expressed in this article are the author’s own and do not reflect the view of the Centers for Disease Control, the Department of Health and Human Services, or the United States government.
This work was supported by NIAID/NIH contract HHSN266200700010C, HHSN272201400008C, HHSN272201400006C and HHSN272200900007C, a Wellcome Trust Fellowship Strategic Travel Award under contract WT089235MF, a DTRA FRCWMD Broad Agency Announcement under contract HDTRA1-09-14-FRCWMD GRANT11177182, by the EU Framework six program NewFluBird (044490), by contracts with the Dutch Ministry of Economic Affairs and a NIAID/NIH CEIRS travel grant under contract HHSN266200700010C. The Swedish sampling and analysis was supported by the Swedish Research Councils VR and FORMAS. This is the final version of the article. It first appeared from the Society for General Microbiology via http://dx.doi.org/10.1099/vir.0.00015
Ecosystem Interactions Underlie the Spread of Avian Influenza A Viruses with Pandemic Potential
Despite evidence for avian influenza A virus (AIV) transmission between wild and domestic ecosystems, the roles of bird migration and poultry trade in the spread of viruses remain enigmatic. In this study, we integrate ecosystem interactions into a phylogeographic model to assess the contribution of wild and domestic hosts to AIV distribution and persistence. Analysis of globally sampled AIV datasets shows frequent two-way transmission between wild and domestic ecosystems. In general, viral flow from domestic to wild bird populations was restricted to within a geographic region. In contrast, spillover from wild to domestic populations occurred both within and between regions. Wild birds mediated long-distance dispersal at intercontinental scales whereas viral spread among poultry populations was a major driver of regional spread. Viral spread between poultry flocks frequently originated from persistent lineages circulating in regions of intensive poultry production. Our analysis of long-term surveillance data demonstrates that meaningful insights can be inferred from integrating ecosystem interactions into phylogeographic reconstructions that may be consequential for pandemic preparedness and livestock protection. National Institutes of Health (U.S.) (NIH Centers for Excellence in Influenza Research and Surveillance (CEIRS, contract # HHSN266200700010C)); National Institutes of Health (U.S.) (NIH Centers for Excellence in Influenza Research and Surveillance (CEIRS, contract # HHSN272201400008C)); National Institutes of Health (U.S.) (NIH Centers for Excellence in Influenza Research and Surveillance (CEIRS, contract # HHSN272201400006C))
A Universal Next-Generation Sequencing Protocol To Generate Noninfectious Barcoded cDNA Libraries from High-Containment RNA Viruses
ABSTRACT Several biosafety level 3 and/or 4 (BSL-3/4) pathogens are high-consequence, single-stranded RNA viruses, and their genomes, when introduced into permissive cells, are infectious. Moreover, many of these viruses are select agents (SAs), and their genomes are also considered SAs. For this reason, cDNAs and/or their derivatives must be tested to ensure the absence of infectious virus and/or viral RNA before transfer out of the BSL-3/4 and/or SA laboratory. This tremendously limits the capacity to conduct viral genomic research, particularly the application of next-generation sequencing (NGS). Here, we present a sequence-independent method to rapidly amplify viral genomic RNA while simultaneously abolishing both viral and genomic RNA infectivity across multiple single-stranded positive-sense RNA (ssRNA+) virus families. The process generates barcoded DNA amplicons that range in length from 300 to 1,000 bp, which cannot be used to rescue a virus and are stable to transport at room temperature. Our barcoding approach allows for up to 288 barcoded samples to be pooled into a single library and run across various NGS platforms without potential reconstitution of the viral genome. Our data demonstrate that this approach provides full-length genomic sequence information not only from high-titer virion preparations but it can also recover specific viral sequence from samples with limited starting material in the background of cellular RNA, and it can be used to identify pathogens from unknown samples. In summary, we describe a rapid, universal standard operating procedure that generates high-quality NGS libraries free of infectious virus and infectious viral RNA. IMPORTANCE This report establishes and validates a standard operating procedure (SOP) for select agents (SAs) and other biosafety level 3 and/or 4 (BSL-3/4) RNA viruses to rapidly generate noninfectious, barcoded cDNA amenable for next-generation sequencing (NGS). 
This eliminates the burden of testing all processed samples derived from high-consequence pathogens prior to transfer from high-containment laboratories to lower-containment facilities for sequencing. Our established protocol can be scaled up for high-throughput sequencing of hundreds of samples simultaneously, which can dramatically reduce the cost and effort required for NGS library construction. NGS data from this SOP can provide complete genome coverage from viral stocks and can also detect virus-specific reads from limited starting material. Our data suggest that the procedure can be implemented and easily validated by institutional biosafety committees across research laboratories
Analysis of the Aedes albopictus C6/36 genome provides insight into cell line utility for viral propagation
BACKGROUND: The 50-year-old Aedes albopictus C6/36 cell line is a resource for the detection, amplification, and analysis of mosquito-borne viruses including Zika, dengue, and chikungunya. The cell line is derived from an unknown number of larvae from an unspecified strain of Aedes albopictus mosquitoes. Toward improved utility of the cell line for research in virus transmission, we present an annotated assembly of the C6/36 genome. RESULTS: The C6/36 genome assembly has the largest contig N50 (3.3 Mbp) of any mosquito assembly, presents the sequences of both haplotypes for most of the diploid genome, reveals independent null mutations in both alleles of the Dicer locus, and indicates a male-specific genome. Gene annotation was computed with publicly available mosquito transcript sequences. Gene expression data from cell line RNA sequence identified enrichment of growth-related pathways and conspicuous deficiency in aquaporins and inward rectifier K+ channels. As a test of utility, RNA sequence data from Zika-infected cells were mapped to the C6/36 genome and transcriptome assemblies. Host subtraction reduced the data set by 89%, enabling faster characterization of nonhost reads. CONCLUSIONS: The C6/36 genome sequence and annotation should enable additional uses of the cell line to study arbovirus vector interactions and interventions aimed at restricting the spread of human disease
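The contig N50 quoted in this abstract is a standard assembly statistic: the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. A minimal Python sketch of that definition follows; the function name `n50` and the toy contig lengths are illustrative assumptions, not data from the C6/36 assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first contig that brings the running sum to >= 50%.
        if running * 2 >= total:
            return length
    return 0


if __name__ == "__main__":
    # Toy lengths, for illustration only.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Comparing `running * 2` with `total` keeps the arithmetic in integers and avoids rounding when the total length is odd.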
- …