    A functional hierarchical organization of the protein sequence space

    BACKGROUND: It is a major challenge of computational biology to provide a comprehensive functional classification of all known proteins. Most existing methods seek recurrent patterns in known proteins based on manually-validated alignments of known protein families. Such methods can achieve high sensitivity, but are limited by the necessary manual labor. This makes our current view of the protein world incomplete and biased. This paper concerns ProtoNet, an automatic unsupervised global clustering system that generates a hierarchical tree of over 1,000,000 proteins, based solely on sequence similarity. RESULTS: In this paper we show that ProtoNet correctly captures functional and structural aspects of the protein world. Furthermore, a novel feature is an automatic procedure that reduces the tree to 12% of its original size. This procedure utilizes only parameters intrinsic to the clustering process. Despite the substantial reduction in size, the system's predictive power concerning biological functions is hardly affected. We then carry out an automatic comparison with existing functional protein annotations. Consequently, 78% of the clusters in the compressed tree (5,300 clusters) get assigned a biological function with high confidence. The clustering and compression processes are unsupervised and robust. CONCLUSIONS: We present an automatically generated unbiased method that provides a hierarchical classification of all currently known proteins.

    DNA-guided establishment of canonical nucleosome patterns in a eukaryotic genome [preprint]

    A conserved hallmark of eukaryotic chromatin architecture is the distinctive array of well-positioned nucleosomes downstream of transcription start sites (TSS). Recent studies indicate that trans-acting factors establish this stereotypical array. Here, we present the first genome-wide in vitro and in vivo nucleosome maps for the ciliate Tetrahymena thermophila. In contrast with previous studies in yeast, we find that the stereotypical nucleosome array is preserved in the in vitro reconstituted map, which is governed only by the DNA sequence preferences of nucleosomes. Remarkably, this average in vitro pattern arises from the presence of subsets of nucleosomes, rather than the whole array, in individual Tetrahymena genes. Variation in GC content contributes to the positioning of these sequence-directed nucleosomes, and affects codon usage and amino acid composition in genes. We propose that these ‘seed’ nucleosomes may aid the AT-rich Tetrahymena genome, which is intrinsically unfavorable for nucleosome formation, in establishing nucleosome arrays in vivo in concert with trans-acting factors, while minimizing changes to the coding sequences they are embedded within.

    Distinct Modes of Regulation by Chromatin Encoded through Nucleosome Positioning Signals

    The detailed positions of nucleosomes profoundly impact gene regulation and are partly encoded by the genomic DNA sequence. However, less is known about the functional consequences of this encoding. Here, we address this question using a genome-wide map of ~380,000 yeast nucleosomes that we sequenced in their entirety. Utilizing the high resolution of our map, we refine our understanding of how nucleosome organizations are encoded by the DNA sequence and demonstrate that the genomic sequence is highly predictive of the in vivo nucleosome organization, even across new nucleosome-bound sequences that we isolated from fly and human. We find that Poly(dA:dT) tracts are an important component of these nucleosome positioning signals and that their nucleosome-disfavoring action results in large nucleosome depletion over them and over their flanking regions and enhances the accessibility of transcription factors to their cognate sites. Our results suggest that the yeast genome may utilize these nucleosome positioning signals to regulate gene expression with different transcriptional noise and activation kinetics and DNA replication with different origin efficiency. These distinct functions may be achieved by encoding both relatively closed (nucleosome-covered) chromatin organizations over some factor binding sites, where factors must compete with nucleosomes for DNA access, and relatively open (nucleosome-depleted) organizations over other factor sites, where factors bind without competition.

    Sources of marine debris for Seychelles and other remote islands in the western Indian Ocean

    Vast quantities of debris are beaching at remote islands in the western Indian Ocean. We carry out marine dispersal simulations incorporating currents, waves, winds, beaching, and sinking, for both terrestrial and marine sources of debris, to predict where this debris comes from. Our results show that most terrestrial debris beaching at these remote western Indian Ocean islands drifts from Indonesia, India, and Sri Lanka. Debris associated with fisheries and shipping also poses a major risk. Debris accumulation at Seychelles is likely seasonal, peaking during February–April. This pattern is driven by monsoonal winds and may be amplified during positive Indian Ocean Dipole and El-Niño events. Our results underline the vulnerability of small island states to marine plastic pollution, and are a crucial step towards improved management of the issue. The trajectories used in this study are available for download, and our analyses can be rerun under different parameter choices.

    High Nucleosome Occupancy Is Encoded at Human Regulatory Sequences

    Active eukaryotic regulatory sites are characterized by open chromatin, and yeast promoters and transcription factor binding sites (TFBSs) typically have low intrinsic nucleosome occupancy. Here, we show that in contrast to yeast, DNA at human promoters, enhancers, and TFBSs generally encodes high intrinsic nucleosome occupancy. In most cases we examined, these elements also have high experimentally measured nucleosome occupancy in vivo. These regions typically have high G+C content, which correlates positively with intrinsic nucleosome occupancy, and are depleted for nucleosome-excluding poly-A sequences. We propose that high nucleosome preference is directly encoded at regulatory sequences in the human genome to restrict access to regulatory information that will ultimately be utilized in only a subset of differentiated cells.

    Two Components of Long-Distance Extraction: Successive Cyclicity in Dinka

    This article presents novel data from the Nilotic language Dinka, in which the syntax of successive-cyclic movement is remarkably transparent. We show that Dinka provides strong support for the view that long-distance extraction proceeds through the edge of every verb phrase and every clause on the path of movement (Chomsky 1986, 2000, 2001, 2008). In addition, long-distance dependencies in Dinka offer evidence that extraction from a CP requires agreement between v and the CP that is extracted from (Rackowski and Richards 2005, Den Dikken 2009b, 2012a,b). The claim that both of these components constrain long-distance movement is important, as much contemporary work on extraction incorporates only one of them. To accommodate this conclusion, we propose a modification of Rackowski and Richards 2005, in which both intermediate movement and Agree relations between phase heads are necessary steps in establishing a long-distance dependency.

    Gene expression divergence in yeast is coupled to evolution of DNA-encoded nucleosome organization

    Eukaryotic transcription occurs within a chromatin environment, whose organization plays an important regulatory role and is partly encoded in cis by the DNA sequence itself [1-6]. Here, we examine whether evolutionary changes in gene expression are linked to changes in the DNA-encoded nucleosome organization of promoters. We find that in aerobic yeast species, where cellular respiration genes are active under typical growth conditions, the promoter sequences of these genes encode a relatively open (nucleosome-depleted) chromatin organization. This nucleosome-depleted organization requires only DNA sequence information, is independent of any co-factors and of transcription, and is a general property of growth-related genes. In contrast, in anaerobic yeast species, where cellular respiration genes are inactive under typical growth conditions, respiration gene promoters encode relatively closed (nucleosome-occupied) chromatin organizations. Thus, our results suggest a previously unidentified genetic mechanism underlying phenotypic diversity, consisting of DNA sequence changes that directly alter the DNA-encoded nucleosome organization of promoters.