99 research outputs found

    Domestic chickens activate a piRNA defense against avian leukosis virus

    PIWI-interacting RNAs (piRNAs) protect the germ line by targeting transposable elements (TEs) through base-pair complementarity. We do not know how piRNAs co-evolve with TEs in chickens. Here we report that all active TEs in the chicken germ line are targeted by piRNAs, and as TEs lose their activity, the corresponding piRNAs erode away. We observed de novo piRNA birth as the host responds to a recent retroviral invasion. Avian leukosis virus (ALV) endogenized prior to chicken domestication, remains infectious, and threatens the poultry industry. Domestic fowl produce piRNAs targeting ALV from one ALV provirus that was known to render its host ALV resistant. This proviral locus does not produce piRNAs in undomesticated wild chickens. Our findings uncover rapid piRNA evolution reflecting contemporary TE activity, identify a new piRNA acquisition modality by activating a pre-existing genomic locus, and extend piRNA defense roles to include the period when endogenous retroviruses are still infectious. DOI: http://dx.doi.org/10.7554/eLife.24695.00

    Computational regulatory genomics : motifs, networks, and dynamics

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (p. 147-169). Gene regulation, the process responsible for taking a static genome and producing the diversity and complexity of life, is largely mediated through the sequence-specific binding of regulators. The short, degenerate nature of the recognized elements and the unknown rules through which they interact make deciphering gene regulation a significant challenge. In this thesis, we utilize comparative genomics and other approaches to exploit large-scale experimental datasets and better understand the sequence elements and regulators responsible for regulatory programs. In particular, we develop new computational approaches to (1) predict the binding sites of regulators using the genomes of many closely related species; (2) understand the sequence motifs associated with transcription factors; (3) discover and characterize microRNAs, an important class of regulators; (4) use static predictions for binding sites in conjunction with chromatin modifications to better understand the dynamics of regulation; and (5) systematically validate the predicted motif instances using a massively parallel reporter assay. We find that the predictions made by our algorithms are of high quality and are comparable to those made by leading experimental approaches. Moreover, we find that experimental and computational approaches are often complementary. Regions experimentally identified to be bound by a factor can be species and cell line specific, but they lack the resolution and unbiased nature of our predictions. 
Experimentally identified miRNAs have unmistakable signs of being processed, but cannot provide the same insights our machine learning framework does. Further emphasizing the importance of integration, combining chromatin mark annotations and gene expression from multiple cell types with our static motif instances increases our power and yields additional biologically relevant insights. We successfully apply the algorithms in this thesis to 29 mammals and 12 flies and expect them to be applicable to other clades of eukaryotic species. Moreover, we find that our performance has not yet plateaued and believe these methods will continue to be relevant as sequencing becomes increasingly commonplace and thousands of genomes become available. By Pouya Kheradpour. Ph.D.

    Exploiting gene expression and protein data for predicting remote homology and tissue specificity

    In this thesis I describe my investigations of applying machine learning methods to high-throughput experimental and predicted biological data. The importance of such analysis as a means of making inferences about biological functions is widely acknowledged in the bioinformatics community. Specifically, this work makes three novel contributions based on the systematic analysis of publicly archived data of protein sequences, three-dimensional structures, gene expression and functional annotations: (a) remote homology detection based on amino acid sequences and secondary structures; (b) the analysis of tissue-specific gene expression for predictive signals in the sequence and secondary structure of the resulting protein product; and (c) a study of ageing in the fruit fly, a commonly used model organism, in which tissue-specific and whole-organism gene expression changes are contrasted. In the problem of remote homology detection, a kernel-based method that combines pairwise alignment scores of amino acid sequences and secondary structures is shown to improve the prediction accuracies in a benchmark task defined using the Structural Classification of Proteins (SCOP) database. While the task of predicting SCOP superfamilies should be regarded as an easy one, with not much room for performance improvement, it is still widely accepted as the gold standard due to careful manual annotation by experts in the subject of protein evolution. A similar method is introduced to investigate whether tissue specificity of gene expression is correlated with the sequence and secondary structure of the resulting protein product. An information theoretic approach is adopted for sorting fruit fly and mouse genes according to their tissue specificity based on gene expression data. A classifier is then trained to predict the degree of specificity for these genes. 
The study concludes that the tissue specificity of gene expression is correlated with the sequence, and to a certain extent, with the secondary structure of the gene’s protein product. The sorted list of genes introduced in the previous chapter is used to investigate the tissue specificity of transcript profiles obtained from a study of ageing in the fruit fly. The same list is utilised to investigate how filtering tissue-restricted genes affects gene set enrichment analysis in the ageing study, and to examine the specificity of age-associated genes identified in the literature. The conclusion drawn in this chapter is that categorisation of genes according to their tissue specificity using Shannon’s information theory is useful for the interpretation of whole-fly gene expression data.

    Putting the Pieces Together: Exons and piRNAs: A Dissertation

    Analysis of gene expression has undergone a technological revolution. What was impossible 6 years ago is now routine. High-throughput DNA sequencing machines capable of generating hundreds of millions of reads allow, indeed force, a major revision toward the study of the genome’s functional output—the transcriptome. This thesis examines the history of DNA sequencing, measurement of gene expression by sequencing, isoform complexity driven by alternative splicing, and mammalian piRNA precursor biogenesis. Examination of these topics is framed around development of a novel RNA-templated DNA-DNA ligation assay (SeqZip) that allows for efficient analysis of abundant, complex, and functional long RNAs. The discussion focuses on the future of transcriptome analysis, development and applications of SeqZip, and challenges presented to biomedical researchers by extremely large and rich datasets.

    Consequences of DNA variation on gene regulation and human disease via RNA sequencing


    Molecular Mechanisms of piRNA Biogenesis and Function in Drosophila: A Dissertation

    In the Drosophila germ line, PIWI-interacting RNAs (piRNAs) ensure genomic stability by silencing endogenous selfish genetic elements such as retrotransposons and repetitive sequences. We examined the genetic requirements for the biogenesis and function of piRNAs in both the female and male germ lines. We found that piRNAs function through the PIWI, rather than the AGO, family Argonaute proteins, and the production of piRNAs requires neither microRNA (miRNA) nor small interfering RNA (siRNA) pathway machinery. These findings allowed the discovery of the third conserved small RNA silencing pathway, which is distinct from both the miRNA and RNAi pathways in its mechanisms of biogenesis and function. We also found that piRNAs in flies are modified. We determined that the chemical structure of the 3´-terminal modification is a 2´-O-methyl group, and also demonstrated that the same modification occurs on the 3´ termini of siRNAs in flies. Furthermore, we identified the RNA methyltransferase Drosophila Hen1, which catalyzes 2´-O-methylation on both siRNAs and piRNAs. Our data suggest that 2´-O-methylation by Hen1 is the final step of biogenesis in both the siRNA and piRNA pathways. Studies from the Hannon Lab and the Siomi Lab suggest a ping-pong amplification loop for piRNA biogenesis and function in the Drosophila germ line. In this model, an antisense piRNA, bound to Aubergine or Piwi, triggers production of a sense piRNA bound to the PIWI protein Argonaute3 (Ago3). In turn, the new piRNA is envisioned to produce a second antisense piRNA. We isolated loss-of-function mutations in ago3, allowing a direct genetic test of this model. We found that Ago3 acts to amplify piRNA pools and to enforce on them an antisense bias, increasing the number of piRNAs that can act to silence transposons. Moreover, we also discovered a second, Ago3-independent piRNA pathway in somatic ovarian follicle cells, suggesting a role for piRNAs beyond the germ line.

    Applied Bioinformatics for ncRNA Characterization - Case Studies Combining Next Generation Sequencing & Genomics

    Non-coding RNAs (ncRNAs) present a diverse class of functional molecules inherent in virtually all forms of cellular life. Besides the canonical protein-encoding mRNAs, the role of these abundant transcripts has been overlooked for decades. Defined by their highly conserved structure, ncRNAs are resistant to degradation and perform various regulatory functions. Despite the poor sequence conservation, comparative genomics can be employed to identify homologous ncRNAs based on their structure in related species. Through the availability of next generation sequencing techniques, a rich corpus of datasets is available which grants a detailed look into cellular processes. The combination of genomic and transcriptomic data allows for a detailed understanding of molecular mechanisms as well as characterization of individual gene functions and their evolution. However, analytical processing of modern high-throughput data is only made viable through optimized bioinformatic algorithms and reproducible automation pipelines. This thesis consists of four major parts highlighting the diverse roles of ncRNAs concerning the transcription process viewed from different vantage points. The first part concerns an unusually long untranslated region in Rhodobacter which harbors a ncRNA that regulates the expression of the downstream division cell wall cluster. Second, the degradation of 6S RNA in Bacillus subtilis is experimentally reconstructed to shed light on this final part of the RNA life cycle. This ncRNA is ubiquitous among bacteria and known to be a global transcription regulator itself. Next, the focus moves to the eukaryotic system and RNase P, an ancient ribozyme that is involved in tRNA maturation. Due to differences in composition with an optional RNA and multiple protein subunits, its phylogenetic distribution and deviant characteristics throughout the eukaryotic lineage are examined in order to trace its evolution. 
Finally, a diverse subgroup of non-translated RNAs are circRNAs, which recently received increased attention due to their abundance in neural tissue. Resulting from post-transcriptional back-splicing events, circRNAs compete with their host gene for expression. In a zoological study of social insects, circRNAs were identified for the first time in honeybees. The goal was to find task-related differences in circRNA expression between nurse bees and foragers and thus pinpoint potential functions of these elusive ncRNAs. The combination of genomic methods and transcriptomic data makes in-depth functional analysis of ncRNAs possible and enables us to understand the molecular mechanisms on multiple levels. Through structural predictions, a riboswitch-like transcriptional control of UpsM was revealed that is unique to Rhodobacteraceae. Transcriptomic analysis exposed that 6S RNA is primarily processed by RNase J1 for maturation and degraded at internal loops by RNase Y. Evolutionary comparison of organellar RNase P revealed that the RNA subunit is potentially less conserved than thought, while organellar protein-only variants are widespread, potentially due to horizontal gene transfer. In the case of circRNA, an entire group of ncRNAs was characterized in the social model organism of honeybees, and evidence of at least one gene where circRNA levels are significantly reduced during nurse-to-forager transition could be shown. Moreover, an unexpected link between elevated DNA methylation and RNA circularization was discovered. The bioinformatic findings in all of these cases provide a foundation for further experimental research and illustrate how scientific endeavors cannot be automated completely but require rigorous investigation with customized tools.

    Bioinformatic analysis of genome-scale data reveals insights into host-pathogen interactions in farm animals

    This thesis documents the contribution of my bioinformatics research activities, including novel software development, to a range of research projects aimed at investigating the interactions between bacterial and viral pathogens and their hosts. The focus is largely on farm animal species and their pathogens, although some of the research has a wider scientific impact. RNA interference (RNAi) refers to a variety of related regulatory pathways present in animals, plants and insects. The major pathways are microRNAs (miRNAs), small-interfering RNAs (siRNAs) and PIWI-interacting RNAs (piRNAs). Marek’s disease virus (MDV) is an important pathogen of poultry, causing T-cell lymphoma. We identified the presence and expression patterns of several MDV-encoded microRNAs, including the identification of 5 novel microRNAs. We also showed that not only do virus-encoded microRNAs dominate the miRNome within chicken cells, but also that specific host microRNAs are down-regulated. We also identified novel virus-encoded microRNAs in other Herpesviridae and provide the first evidence of miRNA evolution by duplication in viruses. In related work, we present a novel microRNA generated by the canonical miRNA biogenesis pathway in Avian Leukosis Virus, another avian oncogenic virus, and publish data showing the expression pattern of known chicken microRNAs across a range of important avian cells. Two of the other RNAi pathways (siRNA and piRNA) form an important part of the antiviral response in arthropods. We have published work demonstrating an siRNA antiviral response to bluetongue virus and Schmallenberg virus in cells from the Culicoides midge, an important insect vector, as well as work demonstrating the importance of the piRNA pathway in the antiviral response to Semliki Forest virus (SFV). 
Further work on flaviviruses in ticks demonstrates the active suppression of the siRNA response by Langat virus, as well as a key difference between the siRNA responses in mosquitoes compared to ticks. Salmonella is one of the most important zoonoses, with an estimated 1.4 million cases of human salmonellosis per annum in the USA alone. Salmonella infections of farm animals are an important route into the human food chain. This thesis presents work on the comparative structure and function of 13 fimbrial operons within Salmonella enterica serovar Enteritidis as well as a genomic comparison of that serovar with Salmonella enterica serovar Gallinarum, a chicken-specific serovar. We characterised the global expression profile of Salmonella enterica serovar Typhimurium during colonization of the chicken intestine, and we have published the genomes of four strains of Salmonella enterica serovars of well-defined virulence in food-producing animals. Our work in this area led to us publishing an important and comprehensive review of the automatic annotation of bacterial genomes. Finally, I present work on novel software development: ProGenExpress, a software tool that allows the easy and accurate integration and visualisation of quantitative data with the genome annotation of bacteria; Meta4, a web application that allows data sharing of bacterial genome annotations from metagenomes; CORNA, a software tool that allows scientists to link together microRNA targets, gene expression and functional annotation; viRome, a software tool for the analysis of siRNA and piRNA responses in virus-infection studies; DetectiV, a software tool for the analysis of pathogen-detection microarray data; and poRe, a software tool that enables users to organise and analyse nanopore sequencing data.