1,045 research outputs found

    Local symmetry dynamics in one-dimensional aperiodic lattices

    A unifying description of lattice potentials generated by aperiodic one-dimensional sequences is proposed in terms of their local reflection or parity symmetry properties. We demonstrate that the ranges and axes of local reflection symmetry possess characteristic distributional and dynamical properties which can be determined for every aperiodic binary lattice. A striking aspect of such a property is given by the return maps of sequential spacings of local symmetry axes, which typically traverse few-point symmetry orbits. This local symmetry dynamics allows for a classification of inherently different aperiodic lattices according to fundamental symmetry principles. Illustrating the local symmetry distributional and dynamical properties for several representative binary lattices, we further show that the renormalized axis spacing sequences follow precisely the particular type of underlying aperiodic order. Our analysis thus reveals that the long-range order of aperiodic lattices is characterized in a compellingly simple way by its local symmetry dynamics. Comment: 15 pages, 12 figures

    Global mapping of RNA homodimers in living cells

    RNA homodimerization is important for various physiological processes, including the assembly of membraneless organelles, RNA subcellular localization, and packaging of viral genomes. However, understanding RNA dimerization has been hampered by the lack of systematic in vivo detection methods. Here, we show that CLASH, PARIS, and other RNA proximity ligation methods detect RNA homodimers transcriptome-wide as “overlapping” chimeric reads that contain more than one copy of the same sequence. Analyzing published proximity ligation data sets, we show that RNA:RNA homodimers mediated by direct base-pairing are rare across the human transcriptome, but highly enriched in specific transcripts, including U8 snoRNA, U2 snRNA, and a subset of tRNAs. Mutations in the homodimerization domain of U8 snoRNA impede dimerization in vitro and disrupt zebrafish development in vivo, suggesting an evolutionarily conserved role of this domain. Analysis of virus-infected cells reveals homodimerization of SARS-CoV-2 and Zika genomes, mediated by specific palindromic sequences located within protein-coding regions of the N gene in SARS-CoV-2 and the NS2A gene in Zika. We speculate that regions of viral genomes involved in homodimerization may constitute effective targets for antiviral therapies.
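The "palindromic sequences" mentioned in this abstract are self-complementary: a stretch of RNA that equals its own reverse complement can base-pair with a second copy of itself. The sketch below only illustrates that property and is not the detection method of the paper, which relies on overlapping chimeric reads. The function names are hypothetical, and the sketch assumes strict Watson-Crick pairing (A-U, G-C), ignoring G-U wobble pairs.

```python
# Translation table for Watson-Crick complements in RNA (assumption: no wobble pairs).
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string over the alphabet ACGU."""
    return rna.translate(COMPLEMENT)[::-1]


def self_complementary_sites(rna: str, length: int) -> list:
    """List (start, window) pairs where a window of the given length
    equals its own reverse complement, i.e. could pair with a copy of itself."""
    sites = []
    for start in range(len(rna) - length + 1):
        window = rna[start:start + length]
        if window == reverse_complement(window):
            sites.append((start, window))
    return sites


if __name__ == "__main__":
    # Toy sequence, not taken from any viral genome.
    print(self_complementary_sites("AAGGAUCCUU", 6))
```

A window can only be self-complementary when its length is even, because a central base would have to pair with itself.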

    A computational study of nucleosomal binding and alternative isoforms of human transcription factors

    Eukaryotic transcription factors (TFs) are proteins that bind short DNA motifs and regulate gene transcription. Because genomic DNA is organised into nucleosomes via binding histone octamers, TFs compete with histones for binding DNA. Also, the functions of a TF are mainly defined by its domains; therefore, a TF gene can vary the characteristics of its protein product through the expression of alternative isoforms with different domains. However, the mechanisms of TF-nucleosome interactions and the functional importance of alternative TF isoforms are not fully understood. Here, I address these two problems computationally via the integrative analysis of publicly available in vivo human sequencing data. First, I evaluated a novel, gyre-spanning, mode of TF-nucleosome binding proposed recently by another lab based on in vitro evidence. Analysing the nucleosome occupancy and TF binding in the human genome, I found no evidence of such binding and concluded that it must be extremely rare, if at all present. Secondly, I studied the alternative isoforms of human TFs genome-wide. I found that independently of the gene length and the number of exons, TF genes more efficiently sample the set of possible alternative isoforms than non-TF genes, suggesting the particular importance of alternative isoforms for TFs. Also, I found that TF isoforms without a DNA-binding domain (DBD) are produced by almost a third of all human TFs, tend to be tissue-specific and likely reverse the transcription regulation effect of DBD-containing isoforms. Moreover, I demonstrated that the switches of the highest-expressed TF isoforms across human adult tissues may represent a widespread functional mechanism. Finally, I collected a compendium of human TFs with experimentally characterised alternative isoforms which will hopefully serve as a resource for future studies. 
In summary, my analysis further developed the fundamental knowledge about the TF-nucleosome interactions and the alternative isoforms of TFs in humans.

    Fast and systematic genome-wide discovery of conserved regulatory elements using a non-alignment based approach

    We describe a powerful new approach for discovering globally conserved regulatory elements between two genomes. The method is fast, simple and comprehensive, without requiring alignments. Its application to pairs of yeasts, worms, flies and mammals yields a large number of known and novel putative regulatory elements. Many of these are validated by independent biological observations, have spatial and/or orientation biases, are co-conserved with other elements and show surprising conservation across large phylogenetic distances

    Systematic prediction of control proteins and their DNA binding sites

    We present here the results of a systematic bioinformatics analysis of control (C) proteins, a class of DNA-binding regulators that control time-delayed transcription of their own genes as well as restriction endonuclease genes in many type II restriction-modification systems. More than 290 C protein homologs were identified and DNA-binding sites for ∼70% of new and previously known C proteins were predicted by a combination of phylogenetic footprinting and motif searches in DNA upstream of C protein genes. Additional analysis revealed that a large proportion of C protein genes are translated from leaderless RNA, which may contribute to the time-delayed nature of genetic switches operated by these proteins. Analysis of genetic contexts of newly identified C protein genes revealed that they are not exclusively associated with restriction-modification genes; numerous instances of associations with genes originating from mobile genetic elements were observed. These instances might be vestiges of ancient horizontal transfers and indicate that during evolution ancestral restriction-modification system genes were the sites of mobile element insertions.

    The pan-genome of Lactobacillus reuteri strains originating from the pig gastrointestinal tract

    Background: Lactobacillus reuteri is a gut symbiont of a wide variety of vertebrate species that has diversified into distinct phylogenetic clades which are to a large degree host-specific. Previous work demonstrated host specificity in mice and began to determine the mechanisms by which gut colonisation and host restriction are achieved. However, how L. reuteri strains colonise the gastrointestinal (GI) tract of pigs is unknown. Results: To gain insight into the ecology of L. reuteri in the pig gut, the genome sequence of the porcine small intestinal isolate L. reuteri ATCC 53608 was completed and consisted of a chromosome of 1.94 Mbp and two plasmids of 138.5 kbp and 9.09 kbp, respectively. Furthermore, we generated draft genomes of four additional L. reuteri strains isolated from pig faeces or lower GI tract, lp167-67, pg-3b, 20-2 and 3c6, and subjected all five genomes to a comparative genomic analysis together with the previously completed genome of strain I5007. A phylogenetic analysis based on whole genomes showed that porcine L. reuteri strains fall into two distinct clades, as previously suggested by multi-locus sequence analysis. These six pig L. reuteri genomes contained a core set of 1364 orthologous gene clusters, as determined by OrthoMCL analysis, that contributed to a pan-genome totalling 3373 gene clusters. Genome comparisons of the six pig L. reuteri strains with 14 L. reuteri strains from other host origins gave a total pan-genome of 5225 gene clusters that included a core genome of 851 gene clusters but revealed that there were no pig-specific genes per se. However, genes specific for and conserved among strains of the two pig phylogenetic lineages were detected, some of which encoded cell surface proteins that could contribute to the diversification of the two lineages and their observed host specificity. Conclusions: This study extends the phylogenetic analysis of L. reuteri strains at a genome-wide level, pointing to distinct evolutionary trajectories of porcine L. reuteri lineages, and providing new insights into the genomic events in L. reuteri that occurred during specialisation to their hosts. The occurrence of two distinct pig-derived clades may reflect differences in host genotype, environmental factors such as dietary components, or evolution from ancestral strains of human and rodent origin following contact with pig populations.

    Novel Sequence-Based Method for Identifying Transcription Factor Binding Sites in Prokaryotic Genomes

    Computational techniques for microbial genomic sequence analysis are becoming increasingly important. With next-generation sequencing technology and the human microbiome project underway, current sequencing capacity is significantly greater than the speed at which organisms of interest can be experimentally probed. We have developed a method that will primarily use available sequence data in order to determine prokaryotic transcription factor binding specificities. The prototypical prokaryotic transcription factor (TF) contains a helix-turn-helix (HTH) fold and binds DNA as a homodimer, leading to a palindromic motif specificity. The connection between the TF and its promoter is based on the autoregulation phenomenon noticed in E. coli. Approximately 55% of the TFs analyzed were estimated to be autoregulated. Our preliminary analysis using RegulonDB indicates that this value increases to 79% if one considers the neighboring operons. Given the TF family of interest, it is necessary to find the relevant TF proteins and their associated genomes. Due to the scale-free network topology of prokaryotic systems, many of the transcriptional regulators regulate only one or a few operons. Within a single genome, there would not be enough sequence-based signal to determine the binding site using standard computational methods. Therefore, multiple bacterial genomes are used to overcome this lack of signal within a single genome. We use a distance-based criterion to define the operon boundaries and their respective promoters. Several TF-DNA crystal structures are then used to determine the residues that interact with the DNA. These key residues are the basis for the TF comparison metric; the assumption being that similar residues should impart similar DNA binding specificities. After defining the sets of TF clusters using this metric, their respective promoters are used as input to a motif finding procedure.
This method has currently been tested on the LacI and TetR TF families with successful results. On external validation sets, the specificity of prediction is ∼80%. These results are important in developing methods to define the DNA binding preferences of the TF protein residues, known as the “recognition code”. This “recognition code” would allow computational design and prediction of novel DNA-binding specificities, enabling protein-engineering and synthetic biology applications.

    Real sequence effects on the search dynamics of transcription factors on DNA

    Recent experiments show that transcription factors (TFs) indeed use the facilitated diffusion mechanism to locate their target sequences on DNA in living bacteria cells: TFs alternate between sliding motion along DNA and relocation events through the cytoplasm. From simulations and theoretical analysis we study the TF-sliding motion for a large section of the DNA-sequence of a common E. coli strain, based on the two-state TF-model with a fast-sliding search state and a recognition state enabling target detection. For the probability to detect the target before dissociating from DNA the TF-search times self-consistently depend heavily on whether or not an auxiliary operator (an accessible sequence similar to the main operator) is present in the genome section. Importantly, within our model the extent to which the interconversion rates between search and recognition states depend on the underlying nucleotide sequence is varied. A moderate dependence maximises the capability to distinguish between the main operator and similar sequences. Moreover, these auxiliary operators serve as starting points for DNA looping with the main operator, yielding a spectrum of target detection times spanning several orders of magnitude. Auxiliary operators are shown to act as funnels facilitating target detection by TFs. Comment: 26 pages, 7 figures

    Longest substring palindrome after edit

    It is known that the length of the longest substring palindromes (LSPals) of a given string T of length n can be computed in O(n) time by Manacher's algorithm [J. ACM '75]. In this paper, we consider the problem of finding the LSPal after the string is edited. We present an algorithm that uses O(n) time and space for preprocessing, and answers the length of the LSPals in O(log (min {sigma, log n })) time after single character substitution, insertion, or deletion, where sigma denotes the number of distinct characters appearing in T. We also propose an algorithm that uses O(n) time and space for preprocessing, and answers the length of the LSPals in O(l + log n) time, after an existing substring in T is replaced by a string of arbitrary length l.
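For orientation, the O(n) baseline named in this abstract can be sketched as follows. This is a minimal textbook-style version of Manacher's algorithm, not the edit-aware data structure of the paper; the function name is hypothetical. Rerunning it from scratch after every edit costs O(n) per query, which is the cost the paper's preprocessing is designed to avoid.

```python
def longest_palindromic_substring_length(text: str) -> int:
    """Length of the longest palindromic substring, via Manacher's algorithm."""
    if not text:
        return 0
    # Interleave a separator so odd- and even-length palindromes share one case.
    # Comparisons only ever pair positions of equal parity, so the separator
    # may safely also occur in the input.
    padded = ["#"]
    for ch in text:
        padded.append(ch)
        padded.append("#")
    n = len(padded)
    radius = [0] * n      # radius[i]: palindrome radius around padded[i]
    center = right = 0    # centre and right edge of the rightmost palindrome seen
    best = 0
    for i in range(n):
        if i < right:
            # Reuse the radius of the mirror position, clipped to the known edge.
            radius[i] = min(right - i, radius[2 * center - i])
        lo, hi = i - radius[i] - 1, i + radius[i] + 1
        while lo >= 0 and hi < n and padded[lo] == padded[hi]:
            radius[i] += 1
            lo -= 1
            hi += 1
        if i + radius[i] > right:
            center, right = i, i + radius[i]
        best = max(best, radius[i])
    # A radius in the padded string equals a palindrome length in the original.
    return best


if __name__ == "__main__":
    original = "abacaba"
    edited = original[:3] + "x" + original[4:]   # single-character substitution
    print(longest_palindromic_substring_length(original))
    print(longest_palindromic_substring_length(edited))
```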

    Variable structure motifs for transcription factor binding sites.

    BACKGROUND: Classically, models of DNA-transcription factor binding sites (TFBSs) have been based on relatively few known instances and have treated them as sites of fixed length using position weight matrices (PWMs). Various extensions to this model have been proposed, most of which take account of dependencies between the bases in the binding sites. However, some transcription factors are known to exhibit some flexibility and bind to DNA in more than one possible physical configuration. In some cases this variation is known to affect the function of binding sites. With the increasing volume of ChIP-seq data available it is now possible to investigate models that incorporate this flexibility. Previous work on variable length models has been constrained by: a focus on specific zinc finger proteins in yeast using restrictive models; a reliance on hand-crafted models for just one transcription factor at a time; and a lack of evaluation on realistically sized data sets. RESULTS: We re-analysed binding sites from the TRANSFAC database and found motivating examples where our new variable length model provides a better fit. We analysed several ChIP-seq data sets with a novel motif search algorithm and compared the results to one of the best standard PWM finders and a recently developed alternative method for finding motifs of variable structure. All the methods performed comparably in held-out cross validation tests. Known motifs of variable structure were recovered for p53, Stat5a and Stat5b. In addition our method recovered a novel generalised version of an existing PWM for Sp1 that allows for variable length binding. This motif improved classification performance. CONCLUSIONS: We have presented a new gapped PWM model for variable length DNA binding sites that is not too restrictive nor over-parameterised. Our comparison with existing tools shows that on average it does not have better predictive accuracy than existing methods. 
However, it does provide more interpretable models of motifs of variable structure that are suitable for follow-up structural studies. To our knowledge, we are the first to apply variable length motif models to eukaryotic ChIP-seq data sets and consequently the first to show their value in this domain. The results include a novel motif for the ubiquitous transcription factor Sp1. RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief you may: copy, distribute, and display the work; make derivative works; or make commercial use of the work - under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.
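To make the idea of a gapped PWM concrete, the sketch below scores a sequence with two fixed half-site matrices separated by a spacer of variable length and reports the best-scoring placement. It is an illustration under stated assumptions, not the model or search algorithm of the paper: the names `consensus_pwm`, `score_block` and `best_gapped_match` are hypothetical, and the toy matrices use +1/-1 scores instead of log-odds estimated from data.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Pwm = List[Dict[str, float]]  # one column of per-base scores per motif position


def consensus_pwm(site: str, match: float = 1.0, mismatch: float = -1.0) -> Pwm:
    """Toy matrix rewarding the consensus base at each position (assumption)."""
    return [{base: (match if base == c else mismatch) for base in "ACGT"} for c in site]


def score_block(pwm: Pwm, seq: str, start: int) -> float:
    """Sum the column scores of one ungapped block placed at `start`."""
    return sum(column[seq[start + i]] for i, column in enumerate(pwm))


def best_gapped_match(
    left: Pwm, right: Pwm, seq: str, gaps: Iterable[int]
) -> Optional[Tuple[float, int, int]]:
    """Return (score, start, gap) of the best placement, or None if nothing fits."""
    best = None
    for gap in gaps:
        span = len(left) + gap + len(right)
        for start in range(len(seq) - span + 1):
            score = score_block(left, seq, start) + score_block(
                right, seq, start + len(left) + gap
            )
            if best is None or score > best[0]:
                best = (score, start, gap)
    return best


if __name__ == "__main__":
    left, right = consensus_pwm("GG"), consensus_pwm("CC")
    print(best_gapped_match(left, right, "ATGGACCTA", gaps=(0, 1, 2)))
```

Spacer positions contribute nothing to the score here; a fuller model would also assign a prior or penalty to each gap length.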