
    Towards accurate detection and genotyping of expressed variants from whole transcriptome sequencing data

    BACKGROUND: Massively parallel transcriptome sequencing (RNA-Seq) is becoming the method of choice for studying functional effects of genetic variability and establishing causal relationships between genetic variants and disease. However, RNA-Seq poses new technical and computational challenges compared to genome sequencing. In particular, mapping transcriptome reads onto the genome is more challenging than mapping genomic reads due to splicing. Furthermore, detection and genotyping of single nucleotide variants (SNVs) require statistical models that are robust to variability in read coverage due to unequal transcript expression levels. RESULTS: In this paper we present a strategy to more reliably map transcriptome reads by taking advantage of the availability of both the genome reference sequence and transcript databases such as CCDS. We also present a novel Bayesian model for SNV discovery and genotyping based on quality scores. CONCLUSIONS: Experimental results on RNA-Seq data generated from blood cell tissue of three HapMap individuals show that our methods yield increased accuracy compared to several widely used methods. The open source code implementing our methods, released under the GNU General Public License, is available at http://dna.engr.uconn.edu/software/NGSTools/
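
    The abstract does not spell out the Bayesian model, so as a rough illustration of how base quality scores can feed a genotype posterior, here is a minimal Python sketch of a generic diploid genotype caller (not the authors' model). It assumes independent reads, errors spread evenly over the three other bases, and a hypothetical reference-favoring prior; the function names are illustrative only.

```python
import math

def phred_to_error(q):
    """Convert a Phred-scaled base quality to an error probability."""
    return 10 ** (-q / 10)

def genotype_posteriors(bases, quals, ref, alt, priors=None):
    """Posterior probability of each diploid genotype at a single site.

    Assumptions (illustrative, not from the paper): reads are independent,
    an error turns the true base into each of the other three bases with
    equal probability, and each read samples either chromosome with
    probability 1/2.
    """
    genotypes = [(ref, ref), (ref, alt), (alt, alt)]
    if priors is None:
        # Hypothetical prior favouring the reference genotype.
        priors = dict(zip(genotypes, (0.998, 0.001, 0.001)))
    log_post = {}
    for g in genotypes:
        log_lik = 0.0
        for base, q in zip(bases, quals):
            e = phred_to_error(q)
            # Average the per-allele likelihood over the two chromosomes.
            p = sum((1 - e) if base == allele else e / 3 for allele in g) / 2
            log_lik += math.log(p)
        log_post[g] = log_lik + math.log(priors[g])
    # Normalise in log space to avoid underflow at deep coverage.
    top = max(log_post.values())
    norm = sum(math.exp(v - top) for v in log_post.values())
    return {g: math.exp(v - top) / norm for g, v in log_post.items()}

# Usage: eight reads at Q30, half reference (A) and half alternative (G).
post = genotype_posteriors("AAAAGGGG", [30] * 8, ref="A", alt="G")
best = max(post, key=post.get)
print(best, round(post[best], 4))
```

    Lower quality scores raise e, so a mismatching low-quality base penalizes the homozygous-reference genotype less; this is one way such a model stays robust to noisy base calls.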

    Fosmid-based whole genome haplotyping of a HapMap trio child: evaluation of Single Individual Haplotyping techniques

    Determining the underlying haplotypes of individual human genomes is an essential but currently difficult step toward a complete understanding of genome function. Fosmid pool-based next-generation sequencing allows genome-wide generation of 40-kb haploid DNA segments, which can be phased computationally into contiguous molecular haplotypes by Single Individual Haplotyping (SIH). Many SIH algorithms have been proposed, but their accuracy has been difficult to assess due to the lack of real benchmark data. To address this problem, we generated whole-genome fosmid sequence data from a HapMap trio child, NA12878, for whom reliable haplotypes have already been produced. We assembled haplotypes using eight SIH algorithms and directly compared their accuracy, completeness and efficiency. Our comparisons indicate that fosmid-based haplotyping can deliver highly accurate results even at low coverage and that our SIH algorithm, ReFHap, efficiently produces high-quality haplotypes. We expanded the haplotypes for NA12878 by combining the existing haplotypes with our fosmid-based haplotypes, producing new, nearly complete gold-standard haplotypes that contain almost 98% of heterozygous SNPs. The added phase information covers notable fractions of disease-related and genome-wide association (GWA) SNPs. Integrated with other molecular biological data sets, this phase information will advance the emerging field of diploid genomics.
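
    The abstract does not describe ReFHap's algorithm. A common way to score SIH solutions is the minimum error correction (MEC) criterion: the number of fragment alleles that must be flipped so that every fragment agrees with one of the two haplotypes. The Python sketch below computes MEC and brute-forces the best haplotype for tiny instances; it is a generic illustration assuming heterozygous SNPs coded 0/1, not ReFHap itself, and the fragment data are made up.

```python
from itertools import product

def mec_score(haplotype, fragments):
    """Minimum error correction (MEC) score of a haplotype.

    haplotype: sequence of 0/1 alleles at heterozygous SNPs (the second
    haplotype is its complement). fragments: dicts mapping SNP index to
    the 0/1 allele observed in one fragment.
    """
    total = 0
    for frag in fragments:
        vs_h1 = sum(1 for i, a in frag.items() if haplotype[i] != a)
        vs_h2 = sum(1 for i, a in frag.items() if 1 - haplotype[i] != a)
        # Each fragment is assigned to whichever haplotype it fits better.
        total += min(vs_h1, vs_h2)
    return total

def best_haplotype(n_snps, fragments):
    """Exhaustive search; only feasible for a handful of SNPs."""
    best, best_score = None, None
    # Fix the first allele to 0: a haplotype and its complement score the same.
    for tail in product((0, 1), repeat=n_snps - 1):
        hap = (0,) + tail
        score = mec_score(hap, fragments)
        if best_score is None or score < best_score:
            best, best_score = list(hap), score
    return best, best_score

# Usage: four hypothetical fragments covering four heterozygous SNPs.
fragments = [{0: 0, 1: 0}, {1: 0, 2: 1}, {0: 1, 1: 1, 2: 0}, {2: 1, 3: 1}]
print(best_haplotype(4, fragments))
```

    The search space doubles with every SNP, which is why practical SIH tools rely on heuristics rather than the exhaustive loop shown here.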

    Approaches for integrating heterogeneous RNA-seq data reveal cross-talk between microbes and genes in asthmatic patients

    Sputum induction is a non-invasive method to evaluate the airway environment, particularly in asthma. RNA sequencing (RNA-seq) of sputum samples can be challenging to interpret because of the complex, heterogeneous mixture of human cells and exogenous (microbial) material. In this study, we develop a pipeline that integrates dimensionality reduction and statistical modeling to handle this heterogeneity. Our method, LDA-link, connects microbes to genes using reduced-dimensionality topics from latent Dirichlet allocation (LDA). We validate the method with single-cell RNA-seq and microscopy and then apply it to sputum from asthmatic patients to find known and novel relationships between microbes and genes.

    RNA integrity and RNA-seq expression data

    Detection of genes for cold tolerance in Drosophila albomicans using pooled RNA-seq

    Over the past few decades, Drosophila albomicans has expanded its distribution from the tropical zone to the temperate zone. The current northernmost limit of its distribution has been reported to be western or central Japan. Previous studies observed variation in cold tolerance among D. albomicans strains: the temperate-zone population has stronger cold tolerance than the tropical population. This suggests that D. albomicans expanded its distribution into the temperate zone by adapting to the cooler climate. I therefore tried to identify the genes responsible for this adaptation using pooled RNA-Seq, a challenging approach in which RNA-Seq is applied to a pool of many strains. Because the approach is very new, I had to create my own pipeline for analyzing the data from a next-generation sequencer. In return, it provided information on gene expression and single nucleotide polymorphisms (SNPs) in exons at the population level in a cost-effective way. Previous studies found that the cold tolerance of insects is enhanced by cold acclimation. In one strain of D. albomicans, cold acclimation at 20 ℃ for several days was reported to enhance cold tolerance, accompanied by changes in the expression levels of many genes, but these changes had not been examined at the population level. In this study, total RNA was extracted from cold-acclimated flies from three populations classified by location and time: Southeast Asia (SEA), Japan in 1991 (J-1991) and Japan in 2011 (J-2011). Using transcriptome data obtained by pooled RNA-Seq on an Illumina HiSeq2000 sequencer, I compared gene expression levels and genetic variation at the genomic level among the populations. I calculated Fst, a measure of population differentiation due to genetic structure.
    As a result, I found that the genetic structure of SEA differed from that of the J-1991 and J-2011 populations, whereas J-1991 and J-2011 showed little differentiation from each other compared with their differentiation from SEA. This indicates that the genetic structure of the western Japan population has not changed over the last 20 years, suggesting that this population has not been under strong natural selection to change its genetic structure. In addition, using Tajima's D, I estimated that the population size decreased during the range expansion into western Japan. Expression levels of 22 genes differed between SEA and the western Japan population. Three of these genes (Cyp12d1-1, CG13422 and CG11889) were among the genes whose expression level changed with cold acclimation according to Isobe (2014), making them candidate cold-tolerance genes. To examine the effect of natural selection for adaptation to the cold environment, I computed nucleotide diversity, Watterson's theta and Tajima's D for these candidate genes. All parameters indicate that J-1991 and J-2011 have lower genetic diversity than SEA, suggesting some effect of natural selection. However, the expected molecular functions of these genes are unlikely to be directly involved in cold tolerance, and the observed expression changes are more likely attributable to cross-talk with the signaling pathways of other genes responding to cold stress.
    Tokyo Metropolitan University, 2016-03-25, Master of Science
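
    The statistics named in this abstract have standard textbook definitions. The Python sketch below computes nucleotide diversity (π), Watterson's θ and Tajima's D using the Tajima (1989) formulas, plus a simple Nei-style Fst for one biallelic site. It assumes allele counts from n individually sampled sequences; pooled RNA-Seq data, as used in the thesis, normally need pool-specific corrections that are not shown here, so treat it as an illustration rather than the thesis pipeline.

```python
import math

def diversity_stats(counts, n):
    """Return (pi, theta_w, tajimas_d) for a region.

    counts: derived (or minor) allele count at each site, out of n sampled
    sequences. Sites with 0 or n copies are not segregating and are ignored.
    """
    if n < 4:
        raise ValueError("Tajima's D needs at least 4 sequences")
    seg = [k for k in counts if 0 < k < n]
    s = len(seg)
    # pi: mean number of pairwise differences, summed over sites.
    pi = sum(2 * k * (n - k) / (n * (n - 1)) for k in seg)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    theta_w = s / a1
    if s == 0:
        # D is undefined without segregating sites.
        return pi, theta_w, math.nan
    # Variance constants from Tajima (1989).
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    d = (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))
    return pi, theta_w, d

def nei_fst(freqs):
    """Fst = (H_T - H_S) / H_T for one biallelic site, equal population weights."""
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t

# Usage: an excess of singletons pushes Tajima's D below zero.
print(diversity_stats([1, 1, 1, 1, 1], n=10))
print(nei_fst([0.1, 0.8, 0.75]))
```

    A negative D (rare variants in excess) is consistent with population expansion or a recent bottleneck followed by growth, which is why the sign of D is informative about demographic history.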

    Exploiting natural selection to study adaptive behavior

    The research presented in this dissertation explores computational and modeling techniques that, combined with predictions from evolution by natural selection, enable the analysis of the adaptive behavior of populations under selective pressure. Three computational methods were developed for this thesis: EXPLoRA, EVORhA and SSA-ME. EXPLoRA finds genomic regions associated with a trait of interest (quantitative trait loci, QTL) by explicitly modeling the expected linkage disequilibrium of a population of segregants under selection. Data from bulk segregant analysis (BSA) experiments were analyzed to find genomic loci associated with ethanol tolerance. EVORhA exploits the interplay between driver and hitchhiking mutations during evolution to reconstruct the subpopulation structure of clonal bacterial populations from deep sequencing data; data from mixed infections and evolution experiments of E. coli were used to reconstruct their population structure. SSA-ME uses mutual exclusivity in cancer to prioritize cancer driver genes; TCGA breast cancer tumor samples were analyzed.
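
    The abstract does not describe SSA-ME's scoring. The usual intuition behind mutual exclusivity is that driver genes acting in the same pathway are rarely mutated together in one tumor. The Python sketch below computes two generic quantities for a candidate gene set, coverage and exclusivity; it is not SSA-ME's objective, and the gene and sample names are hypothetical.

```python
def exclusivity_stats(mutations):
    """Coverage and exclusivity of a candidate driver-gene set.

    mutations: dict mapping gene name -> set of sample IDs mutated in it.
    Returns (coverage, exclusivity): the number of samples hit by at least
    one gene, and the fraction of those samples hit by exactly one gene.
    """
    hits = {}
    for samples in mutations.values():
        for s in samples:
            hits[s] = hits.get(s, 0) + 1
    coverage = len(hits)
    if coverage == 0:
        return 0, 0.0
    exclusive = sum(1 for c in hits.values() if c == 1)
    return coverage, exclusive / coverage

# Usage: GENE_A and GENE_B (hypothetical) never co-occur; GENE_C overlaps GENE_A.
print(exclusivity_stats({"GENE_A": {"s1", "s2"}, "GENE_B": {"s3", "s4"}}))
print(exclusivity_stats({"GENE_A": {"s1", "s2"}, "GENE_C": {"s2", "s3"}}))
```

    A good driver-gene set scores high on both measures at once; a search procedure would explore gene sets to balance them.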

    Genomic Imprinting and X Chromosome Dosage Compensation in Domestic Ruminants

    In diploid cells, genes are presumed to be expressed from both alleles to maintain gene dosage for normal development. However, a small number of genes are haplosufficient, functioning with only one active allele per cell. Most of these genes are regulated through genomic imprinting and X chromosome inactivation (XCI). DNA methylation is an essential epigenetic mechanism for developmental programming in embryogenesis and plays crucial roles in genomic imprinting and XCI. This dissertation presents 1) the effects of maternal diet on genomic imprinting in fetal sheep (Chapter Two), 2) dosage compensation of the X chromosome in the bovine germline, embryos and somatic tissues (Chapter Three), and 3) whole-genome DNA methylation during bovine in vivo preimplantation development (Chapter Four). In Chapter Two, we report the first high-throughput study of genomic imprinting in sheep, identifying 13 new imprinted genes and demonstrating that maternal diet affects the expression of imprinted genes in fetuses. Our results show that maternal diet influences imprinted gene expression while the parent-of-origin expression pattern is not affected, suggesting that gene expression levels and imprinting patterns may be regulated through different epigenetic mechanisms. In Chapter Three, we report upregulation of the X chromosome in the bovine germline, embryos and somatic tissues, supporting balanced expression between a single active X and autosome pairs. However, deviating from Ohno’s theory, dosage compensation to rescue X haploinsufficiency appears to be incomplete for expressed genes as a whole but complete for “dosage-sensitive” genes. In Chapter Four, we adopted the scWGBS-seq method to comprehensively profile 5-methylcytosine (5mC) at single-cytosine resolution in bovine sperm, immature oocytes, in vivo- and in vitro-matured single oocytes, and in vivo-developed 2-, 4-, 8- and 16-cell single embryos.
    We observed global demethylation during bovine embryo cleavage up to the 8-cell stage and de novo methylation at the 16-cell stage. Our results refine current knowledge of bovine embryo DNA methylation dynamics and provide valuable resources for future studies.
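
    A common way to test Ohno's hypothesis is to compare the median expression of X-linked genes with that of autosomal genes: a ratio near 1 suggests the single active X is upregulated, while a ratio near 0.5 suggests it is not. The abstract does not give the dissertation's exact procedure, so the Python sketch below assumes per-gene expression values (for example TPM) and a hypothetical cutoff for calling a gene expressed.

```python
from statistics import median

def x_to_autosome_ratio(x_expr, auto_expr, min_expr=1.0):
    """Median X-linked / median autosomal expression among expressed genes.

    x_expr, auto_expr: per-gene expression values (e.g. TPM) for X-linked
    and autosomal genes. min_expr is a hypothetical cutoff; genes below it
    are treated as not expressed and excluded, so the cutoff can change
    the resulting ratio.
    """
    x_on = [v for v in x_expr if v >= min_expr]
    a_on = [v for v in auto_expr if v >= min_expr]
    if not x_on or not a_on:
        raise ValueError("no expressed genes left after filtering")
    return median(x_on) / median(a_on)

# Usage: made-up values; the 0.2 X-linked gene falls below the cutoff.
print(x_to_autosome_ratio([0.2, 5, 6, 7], [10, 12, 14]))
```

    Comparing the ratio for all expressed genes against a subset such as dosage-sensitive genes is one way to see the kind of partial versus complete compensation the abstract describes.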