    The ever-evolving concept of the gene: The use of RNA/Protein experimental techniques to understand genome functions

    The completion of the human genome sequence, together with advances in sequencing technologies, has shifted the paradigm of the genome as a collection of discrete, heritable coding entities and has revealed the abundance of functional noncoding DNA. This part of the genome, previously dismissed as "junk" DNA, increases in proportion to organismal complexity and contributes to gene regulation beyond the boundaries of known protein-coding genes. Different classes of functionally relevant non-protein-coding RNAs are transcribed from noncoding DNA sequences. Among them are the long noncoding RNAs (lncRNAs), which are thought to participate in the basal regulation of protein-coding genes at both the transcriptional and post-transcriptional levels. Although knowledge in this field is still limited, the ability of lncRNAs to localize to different cellular compartments, to fold into specific secondary structures and to interact with different molecules (RNA or proteins) endows them with multiple regulatory mechanisms. It is becoming evident that lncRNAs may play a crucial role in most biological processes, such as the control of development, differentiation and cell growth. This review places the evolution of the concept of the gene in its historical context, from Darwin's hypothetical mechanism of heredity to the post-genomic era. We discuss how the original idea of protein-coding genes as the unique determinants of phenotypic traits has been reconsidered in light of the existence of noncoding RNAs. We summarize the technological developments that have been made in the genome-wide identification and study of lncRNAs and emphasize the methodologies that have aided our understanding of the complexity of lncRNA-protein interactions in recent years.

    Spatial and topological organization of DNA chains induced by gene co-localization

    Transcriptional activity has been shown to relate to the organization of chromosomes in the eukaryotic nucleus and in the bacterial nucleoid. In particular, highly transcribed genes, RNA polymerases and transcription factors gather into discrete spatial foci called transcription factories. However, the mechanisms underlying the formation of these foci and the resulting topological order of the chromosome remain to be elucidated. Here we consider a thermodynamic framework based on a worm-like chain model of chromosomes in which sparse designated sites along the DNA are able to interact whenever they are spatially close. This is motivated by recurrent evidence that physical interactions exist between genes that operate together. Three important results come out of this simple framework. First, the resulting formation of transcription foci can be viewed as a micro-phase separation of the interacting sites from the rest of the DNA. In this respect, a thermodynamic analysis suggests that transcription factors are appropriate candidates for mediating the physical interactions between genes. Next, numerical simulations of the polymer reveal a rich variety of phases associated with different topological orderings, each providing a way to increase the local concentration of the interacting sites. Finally, the numerical results show that both one-dimensional clustering and periodic location of the binding sites along the DNA, which have been observed in several organisms, make the spatial co-localization of multiple families of genes particularly efficient.

    Comment: Figures and Supplementary Material freely available on http://dx.doi.org/10.1371/journal.pcbi.100067
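As a rough illustration of the kind of energy such a framework works with, the sketch below scores one conformation of a discretized chain: a discrete worm-like chain bending term plus a fixed energy gain for every pair of designated sites closer than a contact distance. The function name `wlc_energy` and the parameters `kappa`, `epsilon` and `r_contact` are hypothetical; excluded volume is omitted, and this is not the authors' model or parameterization, only a minimal sketch under those assumptions.

```python
import math
from itertools import combinations


def wlc_energy(coords, sites, kappa=1.0, epsilon=1.0, r_contact=1.5):
    """Energy (in units of k_B T) of a discrete worm-like chain with sticky sites.

    Hypothetical illustration, not the published model:
      coords    -- list of (x, y, z) bead positions along the chain
      sites     -- indices of the designated beads that attract each other
      kappa     -- bending stiffness (assumed parameter)
      epsilon   -- energy gained per close pair of designated sites (assumed)
      r_contact -- distance below which two designated sites interact (assumed)
    Consecutive beads must not coincide (bond vectors need nonzero length).
    Excluded volume between beads is deliberately left out.
    """
    # Bond vectors between consecutive beads (q - p).
    bonds = [tuple(b - a for a, b in zip(p, q)) for p, q in zip(coords, coords[1:])]

    # Bending term: kappa * (1 - cos theta) for each pair of consecutive bonds.
    bend = 0.0
    for u, v in zip(bonds, bonds[1:]):
        cos_theta = sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
        bend += kappa * (1.0 - cos_theta)

    # Contact term: count pairs of designated sites that are spatially close.
    contacts = sum(
        1
        for i, j in combinations(sites, 2)
        if math.dist(coords[i], coords[j]) < r_contact
    )
    return bend - epsilon * contacts
```

A straight four-bead chain with sites at both ends has zero bending energy and no contact; folding it into a U shape costs bending energy but brings the two end sites into contact. An energy of this form could, for example, drive a Metropolis Monte Carlo sampler, which is one common way to simulate such polymers; whether the paper used that scheme is not stated in the abstract.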

    Transcription factor binding distribution and properties in prokaryotes

    The canonical model of transcriptional regulation in prokaryotes restricted binding-site locations to promoter regions and suggested that the binding sequences serve as the main determinants of binding. In this dissertation, I challenge these assumptions. As a member of the TB Systems Biology Consortium, I analyzed and validated ChIP-Seq and microarray experiments for over 100 transcription factors (TFs). To study the transcriptional functions of predicted binding sites, I integrated binding and expression data and assigned potential regulatory roles to 20% of the binding sites. Stronger binding sites were more often associated with regulation than weaker sites, suggesting a correlation between binding strength and regulatory impact. Seventy-six percent of the sites fell into annotated coding regions, and a significant proportion of these were assigned regulatory functions. To study the importance of binding sequences, I compared experimental sites with computational motif predictions. Although a conserved binding motif was found for most TFs, only a fraction of the motif occurrences appeared bound in the experiment. Some low-affinity binding sites appeared occupied by the corresponding TF, while many high-affinity binding sites were not. Interestingly, I found exactly the same nucleotide sequences (up to 15 residues long) bound in one area of the genome but not in another, pointing to DNA accessibility as an important factor for in vivo binding. To investigate the evolutionary conservation of binding-site occupancy, sequence and transcriptional impact, I analyzed ChIP-Seq and expression experiments for five conserved TFs in two to four mycobacterial relatives. The regulon composition showed significantly less conservation than expected from the overall level of gene conservation across mycobacteria. Contrary to expectations, sequence conservation was not a good indicator of whether a computationally predicted motif was bound experimentally; in some cases, a fully conserved motif was bound in one relative but not in the other. Conservation of genic binding sites was higher than expected under a random model, adding to the evidence that at least some genic sites are functional. Understanding the evolutionary history of binding sites allowed me to explain unusual site configurations, some of which indicated a role for DNA looping.

    Genome-wide in silico identification and analysis of cis natural antisense transcripts (cis-NATs) in ten species

    We developed a fast, integrative pipeline to identify cis natural antisense transcripts (cis-NATs) at genome scale. The pipeline mapped mRNAs and ESTs in UniGene to genome sequences in GoldenPath to find overlapping transcripts, and combined information from coding sequences, poly(A) signals, poly(A) tails and splice sites to deduce transcription orientation. We identified cis-NATs in 10 eukaryotic species, including 7830 candidate sense–antisense (SA) genes in 3915 SA pairs in human. The abundance of SA genes is remarkably low in worm, and this does not seem to be caused by the prevalence of operons. Hundreds of SA pairs are conserved across different species, even maintaining the same overlapping patterns. The convergent SA class is prevalent in fly, worm and sea squirt, but not in human or mouse, as reported previously. The percentage of SA genes among imprinted genes in human and mouse is 24–47%, a range between the two previous reports. There is a significant shortage of SA genes on Chromosome X in human and mouse but not in fly or worm, supporting X-inactivation in mammals as a possible cause. SA genes are over-represented in catalytic activities and basic metabolism functions. All candidate cis-NATs can be downloaded from
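The overlap step of such a pipeline can be sketched as follows, assuming the transcription orientation of each transcript is already known. The pipeline described above infers orientation from coding sequence, poly(A) signals and tails, and splice sites, which this sketch does not attempt. The `Transcript` type and the helper names are hypothetical, and so are the class labels "divergent" (5' ends overlapping, head to head) and "embedded" (one transcript contained in the other); only the convergent class (3' ends overlapping, tail to tail) is named in the abstract.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Transcript:
    """Hypothetical record for a transcript already mapped to the genome."""
    name: str
    chrom: str
    strand: str  # "+" or "-", assumed already deduced
    start: int   # 0-based, half-open interval [start, end)
    end: int


def classify_pair(a, b):
    """Return 'convergent', 'divergent', 'embedded', or None if not an SA pair."""
    # An SA pair must lie on the same chromosome, on opposite strands.
    if a.chrom != b.chrom or a.strand == b.strand:
        return None
    # The two intervals must overlap.
    if a.start >= b.end or b.start >= a.end:
        return None
    plus, minus = (a, b) if a.strand == "+" else (b, a)
    # One transcript fully contains the other.
    if (plus.start <= minus.start and plus.end >= minus.end) or (
        minus.start <= plus.start and minus.end >= plus.end
    ):
        return "embedded"
    # Partial overlap: if the plus-strand transcript lies to the left, the two
    # 3' ends overlap (tail to tail); otherwise the two 5' ends overlap.
    return "convergent" if plus.start < minus.start else "divergent"


def find_sa_pairs(transcripts):
    """Brute-force O(n^2) scan over all transcript pairs."""
    pairs = []
    for a, b in combinations(transcripts, 2):
        kind = classify_pair(a, b)
        if kind is not None:
            pairs.append((a.name, b.name, kind))
    return pairs
```

For example, a plus-strand transcript at [100, 500) and a minus-strand transcript at [400, 900) on the same chromosome are reported as a convergent pair. The pairwise scan keeps the logic easy to check; a genome-scale tool would more likely sort transcripts by coordinate or use an interval index to avoid comparing every pair.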

    Deep Sequencing Whole Transcriptome Exploration of the σE Regulon in Neisseria meningitidis

    Bacteria live in an ever-changing environment and must alter protein expression promptly to adapt to these changes and survive. Specific response genes that are regulated by a subset of alternative σ70-like transcription factors have evolved to respond to this changing environment. Recently, we described the existence of a σE regulon including the anti-σ-factor MseR in the obligate human bacterial pathogen Neisseria meningitidis. To unravel the complete σE regulon in N. meningitidis, we sequenced the total RNA transcriptional content of wild-type meningococci and compared it with that of mseR mutant cells (ΔmseR), in which σE is highly expressed. Eleven coding genes and one non-coding gene were found to be differentially expressed between H44/76 wild-type and H44/76ΔmseR cells. Five of the 6 genes of the σE operon, msrA/msrB, and the gene encoding a pepSY-associated TM helix family protein showed enhanced transcription, whilst aniA, encoding a nitrite reductase, and nspA, encoding the vaccine candidate Neisserial surface protein A, showed decreased transcription. Analysis of differential expression in intergenic regions (IGRs) showed enhanced transcription of a non-coding RNA molecule, identifying a σE-dependent small non-coding RNA. Together, this constitutes the first complete exploration of an alternative σ-factor regulon in N. meningitidis. The results point to a relatively small regulon, indicative of a strictly defined response consistent with a relatively stable niche, the human throat, where N. meningitidis resides.

    Homology-based annotation of non-coding RNAs in the genomes of Schistosoma mansoni and Schistosoma japonicum

    Background: Schistosomes are trematode parasites of the phylum Platyhelminthes. They are considered the most important of the human helminth parasites in terms of morbidity and mortality. Draft genome sequences are now available for Schistosoma mansoni and Schistosoma japonicum. Non-coding RNA (ncRNA) plays a crucial role in gene expression regulation, cellular function and defense, homeostasis, and pathogenesis. Genome-wide annotation of ncRNAs is a non-trivial task unless well-annotated genomes of closely related species are already available.

    Results: A homology search for structured ncRNAs in the genome of S. mansoni yielded 23 types of ncRNAs with conserved primary and secondary structure. Among these, we identified rRNA, snRNA, SL RNA, SRP, tRNAs and RNase P, and possibly also MRP and 7SK RNAs. In addition, we confirmed five miRNAs that have recently been reported in S. japonicum and found two additional homologs of known miRNAs. The tRNA complement of S. mansoni is comparable to that of the free-living planarian Schmidtea mediterranea, although for some amino acids differences of more than a factor of two are observed: Leu, Ser and His are overrepresented, while Cys, Met and Ile are underrepresented in S. mansoni. By contrast, the number of tRNAs in the genome of S. japonicum is reduced by more than a factor of four. Both schistosomes have a complete set of minor spliceosomal snRNAs. Several ncRNAs that are expected to exist in the S. mansoni genome were not found, among them the telomerase RNA, vault RNAs and Y RNAs.

    Conclusion: The ncRNA sequences and structures presented here represent the most complete dataset of ncRNA from any lophotrochozoan reported so far. This dataset provides an important reference for further analysis of the genomes of schistosomes and indeed of eukaryotic genomes at large.

    Methods for functional characterization of transcription factor binding sites in bacteria

    Thesis (Ph.D.)--Boston University

    Understanding gene regulation is necessary to gain insight into, and to model, important cellular processes, including disease. The current inability to combat many diseases stems partly from an incomplete understanding of gene circuitry. The regulatory mechanisms of Mycobacterium tuberculosis (Mtb), the causative agent of tuberculosis, are not well understood. A transcriptional regulatory network (TRN), comprising transcription factors (TFs) and their target genes, provides a powerful framework for analyzing the complete regulatory system. Chromatin immunoprecipitation followed by next-generation sequencing (ChIP-Seq) is becoming the method of choice for identifying transcription factor binding sites (TFBS) genome-wide. We therefore use ChIP-Seq on known transcription factors to reconstruct the TRN of Mtb and other bacteria. ChIP-Seq reveals TFBS but provides no information on the mechanism by which the corresponding TFs regulate their genes. Techniques that give more insight into these mechanisms include microarrays, knockout studies and qPCR. However, these techniques provide a static view of the network. They also report at the RNA level and mask regulation happening at the protein level. Therefore, to understand the mechanism of regulation at the protein level and to capture network dynamics, we built a synthetic gene circuit in Mycobacterium smegmatis and defined input-output relationships between key TFs and their target promoters. We validated this system on kstR, a TF that is a known repressor. KstR regulates genes involved in cholesterol degradation and has been shown to de-repress itself and its regulon genes in the presence of cholesterol, as well as in hypoxia, where no exogenous lipids are present. We explored the possibility that other by-products may be responsible for the de-repression of kstR and its regulon. The data suggest that propionyl-CoA, a by-product of the degradation of cholesterol, odd-numbered fatty acids and branched-chain amino acids, causes the de-repression of kstR and its regulon. ChIP-Seq data on transcription factors in Mtb as well as E. coli show that many TFBS are located immediately upstream of open reading frame start sites, consistent with our understanding of prokaryotic gene regulation. However, the data also suggest that many TFBS are located inside and downstream of open reading frames. One of our hypotheses is that these novel TFBS might be indirect binding sites that mediate chromatin looping. We therefore developed a 3C (Chromosome Conformation Capture) method to study regulation in the third dimension by analyzing chromosomal interactions. We optimized the protocol in E. coli and validated it using a known interaction mediated by the repressor GalR. We then identified two regions, 20 kbp apart, containing TFBS of StpA, a nucleoid-associated protein, that are not directly involved in regulating their downstream genes. Data from a 3C experiment on an E. coli strain with inducible StpA suggest that these two regions interact by an unknown mechanism. However, the interaction was not lost when a similar experiment was done in an StpA knockout strain, suggesting that StpA may not be the sole TF responsible for this interaction. Lastly, we developed a Hi-C method for E. coli genomic DNA to identify long-range interactions in a genome-wide and unbiased manner.

    RNA, the Epicenter of Genetic Information

    The origin story and emergence of molecular biology are muddled. The early triumphs in bacterial genetics and the complexity of animal and plant genomes complicate an intricate history. This book documents the many advances, as well as the prejudices and founder fallacies. It highlights the premature relegation of RNA to a mere intermediate between gene and protein, the underestimation of the amount of information required to program the development of multicellular organisms, and the dawning realization that RNA is the cornerstone of cell biology, development, brain function and probably evolution itself. Key personalities, their hubris and their prescient predictions are richly illustrated with quotes, archival material, photographs, diagrams and references that bring the people, ideas and discoveries to life, from the conceptual cradles of molecular biology to the current revolution in the understanding of genetic information.

    Key Features:
    - Documents the confused early history of DNA, RNA and proteins: a transformative history of molecular biology like no other.
    - Integrates the influences of biochemistry and genetics on the landscape of molecular biology.
    - Chronicles the important discoveries, preconceptions and misconceptions that retarded or misdirected progress.
    - Highlights major pioneers and contributors to molecular biology, with a focus on RNA and noncoding DNA.
    - Summarizes the mounting evidence for the central roles of non-protein-coding RNA in cell and developmental biology.
    - Provides a thought-provoking retrospective and forward-looking perspective for advanced students and professional researchers.

    Comprehensive analysis of the HEPN superfamily: identification of novel roles in intra-genomic conflicts, defense, pathogenesis and RNA processing

    BACKGROUND: The major role of enzymatic toxins that target nucleic acids in biological conflicts at all levels has become increasingly apparent, thanks in large part to advances in comparative genomics. Toxins typically evolve rapidly, which hampers their identification by sequence analysis. Here we analyze an unexpectedly widespread superfamily of toxin domains, most of which possess RNase activity.

    RESULTS: The HEPN superfamily comprises all-α-helical domains that were first identified in association with DNA polymerase β-type nucleotidyltransferases in prokaryotes and with animal Sacsin proteins. Using sensitive sequence and structure comparison methods, we vastly extend the HEPN superfamily by identifying numerous novel families and by detecting diverged HEPN domains in several known protein families. The new HEPN families include the RNase LS and LsoA catalytic domains, KEN domains (e.g. RNaseL and Ire1) and the RNase domains of RloC and PrrC. The majority of HEPN domains contain conserved motifs that constitute a metal-independent endoRNase active site. Some HEPN domains lacking this motif probably function as non-catalytic RNA-binding domains, as in the mannitol repressor MtlR. Our analysis shows that HEPN domains function as toxins that are shared by numerous systems implicated in intra-genomic, inter-genomic and intra-organismal conflicts across the three domains of cellular life. In prokaryotes, HEPN domains are essential components of numerous toxin-antitoxin (TA) and abortive infection (Abi) systems, and in addition are tightly associated with many restriction-modification (R-M) and CRISPR-Cas systems, and occasionally with other defense systems such as Pgl and Ter. We present evidence of multiple modes of action of HEPN domains in these systems, which include direct attack on viral RNAs (e.g. LsoA and RNase LS) in conjunction with other RNase domains (e.g. a novel RNase H fold domain, NamA), suicidal or dormancy-inducing attack on self RNAs (R-M systems and possibly CRISPR-Cas systems), and suicidal attack coupled with direct interaction with phage components (Abi systems). These findings are compatible with the hypothesis of coupling between pathogen-targeting (immunity) and self-directed (programmed cell death and dormancy induction) responses in the evolution of robust antiviral strategies. We propose that altruistic cell suicide mediated by HEPN domains and other functionally similar RNases was essential for the evolution of kin and group selection and cell cooperation. HEPN domains were repeatedly acquired by eukaryotes and incorporated into several core functions, such as endonucleolytic processing of the 5.8S-25S/28S rRNA precursor (Las1), a novel ER membrane-associated RNA degradation system (C6orf70), and sensing of unprocessed transcripts at the nuclear periphery (Swt1). Multiple lines of evidence suggest that, as in prokaryotes, HEPN proteins were recruited to antiviral, antitransposon and apoptotic systems, or to RNA-level responses to unfolded proteins (Sacsin and KEN domains), in several groups of eukaryotes.

    CONCLUSIONS: Extensive sequence and structure comparisons reveal an unexpectedly broad presence of the HEPN domain in an enormous variety of defense and stress response systems across the tree of life. In addition, HEPN domains have been recruited to perform essential functions, in particular in eukaryotic rRNA processing. These findings are expected to stimulate experiments that could shed light on diverse cellular processes across the three domains of life.

    REVIEWERS: This article was reviewed by Martijn Huynen, Igor Zhulin and Nick Grishi

    Comprehensive comparative-genomic analysis of Type 2 toxin-antitoxin systems and related mobile stress response systems in prokaryotes

    Background: The prokaryotic toxin-antitoxin systems (TAS, also referred to as TA loci) are widespread, mobile two-gene modules that can be viewed as selfish genetic elements because they evolved mechanisms to become addictive for the replicons and cells in which they reside, but they also possess "normal" cellular functions in various forms of stress response and management of prokaryotic populations. Several distinct TAS of type 1, in which the toxin is a protein and the antitoxin is an antisense RNA, and numerous unrelated TAS of type 2, in which both the toxin and the antitoxin are proteins, have been experimentally characterized, and many more are suspected to remain unidentified.

    Results: We report a comprehensive comparative-genomic analysis of Type 2 toxin-antitoxin systems in prokaryotes. Using sensitive methods for distant sequence similarity search, genome context analysis and a new approach for the identification of mobile two-component systems, we identified numerous, previously unnoticed protein families that are homologous to toxins and antitoxins of known type 2 TAS. In addition, we predict 12 new families of toxins and 13 families of antitoxins, and also predict TAS or TAS-like activity for several gene modules that were not previously suspected to function in that capacity. In particular, we present indications that the two-gene module encoding a minimal nucleotidyltransferase and the accompanying HEPN protein, which is extremely abundant in many archaea and bacteria, especially thermophiles, might comprise a novel TAS. We present a survey of previously known and newly predicted TAS in 750 complete genomes of archaea and bacteria, quantitatively demonstrate the exceptional mobility of the TAS, and explore the network of toxin-antitoxin pairings that combines plasticity with selectivity.

    Conclusion: The defining properties of the TAS, namely the typically small size of the toxin and antitoxin genes, fast evolution and extensive horizontal mobility, make comprehensive identification of these systems particularly challenging. However, these same properties can be exploited to develop context-based computational approaches, which, combined with exhaustive analysis of subtle sequence similarities, were employed in this work to substantially expand the current collection of TAS by predicting both previously unnoticed, derived versions of known toxins and antitoxins and putative novel TAS-like systems. In a broader context, the TAS belong to the resistome domain of the prokaryotic mobilome, which includes partially selfish, addictive gene cassettes involved in various aspects of stress response and organized under the same general principles as the TAS. The "selfish altruism", or "responsible selfishness", of TAS-like systems appears to be a defining feature of the resistome and an important characteristic of the entire prokaryotic pan-genome, given that in the prokaryotic world the mobilome and the "stable" chromosomes form a dynamic continuum.

    Reviewers: This paper was reviewed by Kenn Gerdes (nominated by Arcady Mushegian), Daniel Haft, Arcady Mushegian, and Andrei Osterman. For full reviews, go to the Reviewers' Reports section.