Identity-by-descent filtering of exome sequence data for disease–gene identification in autosomal recessive disorders
Motivation: Next-generation sequencing and exome-capture technologies are currently revolutionizing the way geneticists screen for disease-causing mutations in rare Mendelian disorders. However, the identification of causal mutations is challenging due to the sheer number of variants that are identified in individual exomes. Although databases such as dbSNP or HapMap can be used to reduce the plethora of candidate genes by filtering out common variants, the remaining set of genes is still on the order of dozens
Improved genome-wide localization by ChIP-chip using double-round T7 RNA polymerase-based amplification
Chromatin immunoprecipitation combined with DNA microarrays (ChIP-chip) is a powerful technique to detect in vivo protein–DNA interactions. Due to low yields, ChIP assays of transcription factors generally require amplification of immunoprecipitated genomic DNA. Here, we present an adapted linear amplification method that involves two rounds of T7 RNA polymerase amplification (double-T7). Using this we could successfully amplify as little as 0.4 ng of ChIP DNA to sufficient amounts for microarray analysis. In addition, we compared the double-T7 method to the ligation-mediated polymerase chain reaction (LM-PCR) method in a ChIP-chip of the yeast transcription factor Gsm1p. The double-T7 protocol showed lower noise levels and stronger binding signals compared to LM-PCR. Both LM-PCR and double-T7 identified strongly bound genomic regions, but the double-T7 method increased sensitivity and specificity to allow detection of weaker binding sites
Phenotype Sequencing: Identifying the Genes That Cause a Phenotype Directly from Pooled Sequencing of Independent Mutants
Random mutagenesis and phenotype screening provide a powerful method for dissecting microbial functions, but their results can be laborious to analyze experimentally. Each mutant strain may contain 50–100 random mutations, necessitating extensive functional experiments to determine which one causes the selected phenotype. To solve this problem, we propose a “Phenotype Sequencing” approach in which genes causing the phenotype can be identified directly from sequencing of multiple independent mutants. We developed a new computational analysis method showing that 1. causal genes can be identified with high probability from even a modest number of mutant genomes; 2. costs can be cut many-fold compared with a conventional genome sequencing approach via an optimized strategy of library-pooling (multiple strains per library) and tag-pooling (multiple tagged libraries per sequencing lane). We have performed extensive validation experiments on a set of E. coli mutants with increased isobutanol biofuel tolerance. We generated a range of sequencing experiments varying from 3 to 32 mutant strains, with pooling on 1 to 3 sequencing lanes. Our statistical analysis of these data (4099 mutations from 32 mutant genomes) successfully identified 3 genes (acrB, marC, acrA) that have been independently validated as causing this experimental phenotype. It must be emphasized that our approach reduces mutant sequencing costs enormously. Whereas a conventional genome sequencing experiment would have cost 1200. In fact, our smallest experiments reliably identified acrB and marC at a cost of only 340
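The core idea in the abstract above, that a causal gene should be mutated in more independent mutants than chance predicts, can be illustrated with a minimal sketch. This is not the authors' published method: the Poisson null model (random hits proportional to gene length), the function names, and the toy gene names and lengths are all assumptions made for illustration.

```python
import math
from collections import Counter


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def rank_genes(strain_hits, gene_lengths):
    """Rank genes by how surprising their hit count is under a random null.

    strain_hits: list of sets, one per mutant strain, of mutated gene names.
    gene_lengths: dict mapping gene name -> length in bp.
    Assumed null: hits land on genes in proportion to gene length.
    """
    total_len = sum(gene_lengths.values())
    total_hits = sum(len(hits) for hits in strain_hits)
    # Count the number of independent strains in which each gene is mutated.
    counts = Counter(gene for hits in strain_hits for gene in hits)
    results = []
    for gene, k in counts.items():
        expected = total_hits * gene_lengths[gene] / total_len
        results.append((gene, k, poisson_sf(k, expected)))
    # Smallest p-value first: the strongest candidate causal genes.
    return sorted(results, key=lambda r: r[2])


# Hypothetical toy data: four mutant strains, four equal-length genes.
gene_lengths = {"geneA": 1000, "geneB": 1000, "geneC": 1000, "geneD": 1000}
strain_hits = [
    {"geneA", "geneB"},
    {"geneA", "geneC"},
    {"geneA"},
    {"geneA", "geneD"},
]

ranked = rank_genes(strain_hits, gene_lengths)
for gene, k, p in ranked:
    print(f"{gene}: mutated in {k} strains, p = {p:.3f}")
```

In this toy input, geneA is mutated in every strain and therefore ranks first. A real analysis would also need a multiple-testing correction across all genes and a way to handle pooled libraries, neither of which this sketch attempts.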
RAIphy: Phylogenetic classification of metagenomics samples using iterative refinement of relative abundance index profiles
Background: Computational analysis of metagenomes requires the taxonomical assignment of the genome contigs assembled from DNA reads of environmental samples. Because of the diverse nature of microbiomes, the length of the assemblies obtained can vary from a few hundred bp to a few hundred Kbp. Current taxonomic classification algorithms provide accurate classification for long contigs or for short fragments from organisms that have close relatives with annotated genomes. These are significant limitations for metagenome analysis because of the complexity of microbiomes and the paucity of existing annotated genomes.
Results: We propose a robust taxonomic classification method, RAIphy, that uses a novel sequence similarity metric with iterative refinement of taxonomic models and functions effectively without these limitations. We have tested RAIphy with synthetic metagenomics data ranging from 100 bp to 50 Kbp. Within a sequence read range of 100 bp-1000 bp, the sensitivity of RAIphy ranges from 38% to 81%, outperforming the currently popular composition-based methods for reads in this range. Comparison with computationally more intensive sequence similarity methods shows that RAIphy performs competitively while being significantly faster. The sensitivity-specificity characteristics for relatively longer contigs were compared with the PhyloPythia and TACOA algorithms. RAIphy performs better than these algorithms at varying clade-levels. For an acid mine drainage (AMD) metagenome, RAIphy was able to taxonomically bin the sequence read set more accurately than the currently available methods, Phymm and MEGAN, and more accurately in two out of three tests than the much more computationally intensive method, PhymmBL.
Conclusions: With the introduction of the relative abundance index metric and an iterative classification method, we propose a taxonomic classification algorithm that performs competitively for a large range of DNA contig lengths assembled from metagenome data. Because of its speed, simplicity, and accuracy, RAIphy can be successfully used in the binning process for a broad range of metagenomic data obtained from environmental samples
Next generation sequencing has lower sequence coverage and poorer SNP-detection capability in the regulatory regions
The rapid development of next generation sequencing (NGS) technology offers a new opportunity to extend the scale and resolution of genomic research. How to efficiently map millions of short reads to the reference genome and how to make accurate SNP calls are two major challenges in taking full advantage of NGS. In this article, we reviewed the current software tools for mapping and SNP calling, and evaluated their performance on samples from The Cancer Genome Atlas (TCGA) project. We found that BWA and Bowtie are better than the other alignment tools in comprehensive performance for the Illumina platform, while NovoalignCS showed the best overall performance for SOLiD. Furthermore, we showed that next-generation sequencing platforms have significantly lower coverage and poorer SNP-calling performance in the CpG islands, promoter and 5′-UTR regions of the genome. NGS experiments targeting these regions should use a higher sequencing depth than for other genomic regions
Evaluation of Intra-Host Variants of the Entire Hepatitis B Virus Genome
Genetic analysis of hepatitis B virus (HBV) frequently involves study of intra-host variants, identification of which is commonly achieved using short regions of the HBV genome. However, the use of short sequences significantly limits evaluation of genetic relatedness among HBV strains. Although analysis of HBV complete genomes using genetic cloning has been developed, its application is highly labor intensive and practiced only infrequently. We describe here a novel approach to whole genome (WG) HBV quasispecies analysis based on end-point, limiting-dilution real-time PCR (EPLD-PCR) for amplification of single HBV genome variants, and their subsequent sequencing. EPLD-PCR was used to analyze WG quasispecies from serum samples of patients (n = 38) infected with HBV genotypes A, B, C, D, E and G. Phylogenetic analysis of the EPLD-isolated HBV-WG quasispecies showed the presence of mixed genotypes, recombinant variants and sub-populations of the virus. A critical observation was that HBV-WG consensus sequences obtained by direct sequencing of PCR fragments without EPLD are genetically close, but not always identical to the major HBV variants in the intra-host population, thus indicating that consensus sequences should be judiciously used in genetic analysis. Sequence-based studies of HBV WG quasispecies should afford a more accurate assessment of HBV evolution in various clinical and epidemiological settings