4,903 research outputs found

    Computational methods for the analysis of next generation sequencing data

    Recently, next generation sequencing (NGS) technology has emerged as a powerful approach and has transformed biomedical research on an unprecedented scale. NGS is expected to replace the traditional hybridization-based microarray technology because of its affordable cost and high digital resolution. Although NGS has significantly extended the ability to study the human genome and to better understand the biology of genomes, the new technology has required profound changes to data analysis. There is a substantial need for computational methods that allow convenient analysis of these overwhelmingly high-throughput data sets and address the increasing number of compelling biological questions that are now approachable with NGS technology. This dissertation focuses on the development of computational methods for NGS data analysis. First, two methods are developed and implemented for detecting variants in individual or pooled DNA sequencing data. SNVer formulates variant calling as a hypothesis testing problem and employs a binomial-binomial model to test the significance of the observed allele frequency while taking sequencing error into account. SNVerGUI is a GUI-based desktop tool built upon the SNVer model to support the main users of NGS data, such as biologists, geneticists and clinicians, who often lack programming expertise. Second, a singleton-collapsing strategy is explored for associating rare variants in a DNA sequencing study. Specifically, a gene-based genome-wide scan based on singleton collapsing is performed on a whole genome sequencing data set, suggesting that collapsing singletons may boost signals in association studies of rare variants. Third, two approaches are proposed to address the 3’UTR switching problem. PolyASeeker is a novel bioinformatics pipeline for identifying polyadenylation cleavage sites from RNA sequencing data, which helps to enhance the knowledge of alternative polyadenylation mechanisms and their roles in gene regulation. A change-point model based on a likelihood ratio test is also proposed to detect 3’UTR switching in RNA sequencing data. To date, this is the first method for detecting 3’UTR switching without relying on any prior knowledge of polyadenylation cleavage sites.
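
The hypothesis-testing view of variant calling described in this abstract can be illustrated with a minimal sketch. This is not the SNVer implementation or its binomial-binomial model for pooled data. It is a simplified one-sided binomial test for a single site, and the function names, the default error rate of 0.01 and the significance level of 0.05 are all assumptions chosen for illustration.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def call_variant(alt_reads: int, depth: int,
                 error_rate: float = 0.01, alpha: float = 0.05):
    """Test whether the alternate-allele count exceeds what sequencing
    error alone would plausibly produce (hypothetical helper).

    Null hypothesis: every alternate read is a sequencing error, so the
    alternate count follows Binomial(depth, error_rate).
    Returns (p_value, is_variant).
    """
    p_value = binomial_tail(alt_reads, depth, error_rate)
    return p_value, p_value < alpha


if __name__ == "__main__":
    # 10 alternate reads out of 30 is far more than a 1% error rate explains.
    print(call_variant(10, 30))
    # 1 alternate read out of 100 is consistent with sequencing error.
    print(call_variant(1, 100))
```

A real caller would also correct for multiple testing across sites and, for pooled samples, model the number of variant chromosomes in the pool, which this sketch leaves out.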

    Upregulation of the microRNA cluster at the Dlk1-Dio3 locus in lung adenocarcinoma.

    Mice in which lung epithelial cells can be induced to express an oncogenic Kras(G12D) develop lung adenocarcinomas in a manner analogous to humans. A myriad of genetic changes accompany lung adenocarcinomas, many of which are poorly understood. To gain a comprehensive understanding of both the transcriptional and post-transcriptional changes that accompany lung adenocarcinomas, we took an omics approach, profiling both the coding genes and the non-coding small RNAs in an induced mouse model of lung adenocarcinoma. RNAseq transcriptome analysis of Kras(G12D) tumors from F1 hybrid mice revealed features specific to tumor samples. These include the repression of a network of GTPase-related genes (Prkg1, Gnao1 and Rgs9) in tumor samples and an enrichment of Apobec1-mediated cytosine to uridine RNA editing. Furthermore, analysis of known single-nucleotide polymorphisms revealed not only a change in expression of Cd22 but also that its expression became allele specific in tumors. The most salient finding, however, came from small RNA sequencing of the tumor samples, which revealed that a cluster of ∼53 microRNAs and mRNAs at the Dlk1-Dio3 locus on mouse chromosome 12qF1 was markedly and consistently increased in tumors. Activation of this locus occurred specifically in sorted tumor-originating cancer cells. Interestingly, the 12qF1 RNAs were repressed in cultured Kras(G12D) tumor cells but reactivated when transplanted in vivo. These microRNAs have been implicated in stem cell pluripotency, and proteins targeted by these microRNAs are involved in key pathways in cancer as well as embryogenesis. Taken together, our results strongly imply that these microRNAs represent key targets in unraveling the mechanism of lung oncogenesis.

    Tracking the In Vivo Dynamics of Antigenic Variation in the African Trypanosome

    Trypanosoma brucei, a causative agent of African sleeping sickness in humans and nagana in animals, constantly changes its dense variant surface glycoprotein (VSG) coat to avoid elimination by the immune system of its mammalian host, using an extensive repertoire of dedicated genes. Although this process, referred to as antigenic variation, is the major mechanism of pathogenesis for T. brucei, the dynamics of VSG expression in T. brucei during an infection are poorly understood. In this thesis, I describe the development of VSG-seq, a method for quantitatively examining the diversity of expressed VSGs in any population of trypanosomes. Using VSG-seq, I monitored VSG expression dynamics in vivo during both acute and chronic mouse infections. My experiments revealed unexpected diversity within parasite populations, and the expression of as much as one-third of the functional genomic VSG repertoire after only one month of infection. In addition to suggesting that the host-pathogen interaction in T. brucei infection is substantially more dynamic and nuanced than previously expected, this observed diversity highlighted the importance of the mechanisms by which T. brucei diversifies its genome-encoded VSG repertoire. During infection, the parasite can form mosaic VSGs, novel variants that arise through recombination events within the parasite genome. Though these novel variants had been identified previously, little was known about the mechanisms by which they form. VSG-seq facilitated the identification of mosaic VSGs during the infection, which allowed me to track their formation over time. My results provide the first temporal data on the formation of these variants and suggest that mosaic VSGs likely form at sites of VSG transcription. VSG-seq, which is based on the de novo assembly of VSGs, obviates the requirement for a reference genome for the analysis of expressed VSG populations. This allows the method to be used for the high-resolution study of VSG expression in any strain of T. brucei, whether in the lab or in the field. To this end, I have applied VSG-seq to samples grown in vitro, parasites isolated from natural infections, and extravascular parasites occupying various tissues in vivo. These extensions of the method reveal new aspects of T. brucei biology and demonstrate the potential of high-throughput approaches for studying antigenic variation, both in trypanosomes and in any pathogen that uses antigenic variation as a means of immune evasion.

    Root Hair Single Cell Type Specific Profiles of Gene Expression and Alternative Polyadenylation Under Cadmium Stress

    Transcriptional networks are tightly controlled in plant development and stress responses. Alternative polyadenylation (APA) has been found to regulate gene expression under abiotic stress by increasing the heterogeneity at mRNA 3′-ends. Heavy metals like cadmium pollute water and soil due to mining and industrial applications. Understanding how plants cope with heavy metal stress remains an interesting question. The Arabidopsis root hair was chosen as a single cell model to investigate the functional role of APA in cadmium stress response. Primary root growth inhibition and defective root hair morphotypes were observed. Poly(A) tag (PAT) libraries from single cell types, i.e., root hair cells, non-hair epidermal cells, and whole root tip under cadmium stress were prepared and sequenced. Interestingly, root hair cell type-specific gene expression was detected under short-term cadmium exposure, but not in relation to the prolonged treatment. Differentially expressed poly(A) sites were identified, which largely contributed to altered gene expression and were enriched in pentose and glucuronate interconversion pathways as well as phenylpropanoid biosynthesis pathways. Numerous genes with poly(A) site switching were found, particularly for functions in cell wall modification, root epidermal differentiation, and root hair tip growth. Our findings suggest that APA plays a functional role as a potential stress modulator in root hair cells under cadmium treatment.

    MicroRNA-23a promotes myelination in the central nervous system.

    Demyelinating disorders, including leukodystrophies, are devastating conditions that are still in need of better understanding, and both oligodendrocyte differentiation and myelin synthesis pathways are potential avenues for developing treatment. Overexpression of lamin B1 leads to leukodystrophy characterized by demyelination of the central nervous system, and microRNA-23 (miR-23) was found to suppress lamin B1 and enhance oligodendrocyte differentiation in vitro. Here, we demonstrated that miR-23a-overexpressing mice have increased myelin thickness, providing in vivo evidence that miR-23a enhances both oligodendrocyte differentiation and myelin synthesis. Using this mouse model, we explored possible miR-23a targets and revealed that the phosphatase and tensin homologue/phosphatidylinositol trisphosphate kinase/Akt/mammalian target of rapamycin pathway is modulated by miR-23a. Additionally, a long noncoding RNA, 2700046G09Rik, was identified as a miR-23a target that modulates phosphatase and tensin homologue itself in a miR-23a-dependent manner. The data presented here imply a unique role for miR-23a in the coordination of proteins and noncoding RNAs in generating and maintaining healthy myelin.

    T-ALL and thymocytes: a message of noncoding RNAs

    In the last decade, a role for noncoding RNAs in disease was clearly established, starting with microRNAs and later expanding to long noncoding RNAs. This was also the case for T cell acute lymphoblastic leukemia (T-ALL), which is a malignant blood disorder arising from oncogenic events during normal T cell development in the thymus. By studying the transcriptomic profile of protein-coding genes, several oncogenic events leading to T-ALL could be identified. In recent years, it became apparent that several of these oncogenes function via microRNAs and long noncoding RNAs. In this review, we give a detailed overview of the studies that describe the noncoding RNAome in T-ALL oncogenesis and normal T cell development.

    Untranslated Parts of Genes Interpreted: Making Heads or Tails of High-Throughput Transcriptomic Data via Computational Methods: Computational methods to discover and quantify isoforms with alternative untranslated regions

    In this review we highlight the importance of defining the untranslated parts of transcripts, and present a number of computational approaches for the discovery and quantification of alternative transcription start and poly‐adenylation events in high‐throughput transcriptomic data. The fate of eukaryotic transcripts is closely linked to their untranslated regions, which are determined by the positions at which transcription starts and ends at a genomic locus. Although the extent of alternative transcription starts and alternative poly‐adenylation sites has been revealed by sequencing methods focused on the ends of transcripts, these methods are not yet widely adopted by the community. We suggest that computational methods applied to standard high‐throughput technologies are a useful, albeit less accurate, alternative to the expertise‐demanding 5′ and 3′ sequencing, and that they are the only option for analysing legacy transcriptomic data. We review these methods here, focusing on technical challenges and arguing for the need to include better normalization of the data and more appropriate statistical models of the expected variation in the signal.

    Untranslated parts of genes interpreted: making heads or tails of high-throughput transcriptomic data via computational methods

    The fate of eukaryotic transcripts is closely linked to their untranslated regions, which are determined by where transcription starts and ends on a genomic locus. The extent of alternative transcription start and alternative poly-adenylation has been revealed by sequencing methods focused on the ends of transcripts, but these methods are not yet widely adopted by the community. In this review we highlight the importance of defining the untranslated parts of transcripts and suggest that computational methods applied to standard high-throughput technologies are a useful alternative to the expertise-demanding 5’ and 3’ sequencing. We present a number of computational approaches for the discovery and quantification of alternative transcription start and poly-adenylation events, focusing on technical challenges and arguing for the need to include better normalization of the data and more appropriate statistical models of the expected variation in the signal.