217 research outputs found

    Protein-RNA interactions: a structural analysis

    A detailed computational analysis of 32 protein-RNA complexes is presented. A number of physical and chemical properties of the intermolecular interfaces are calculated and compared with those observed in protein-double-stranded DNA and protein-single-stranded DNA complexes. The interface properties of the protein-RNA complexes reveal the diverse nature of the binding sites. Van der Waals contacts played a more prevalent role than hydrogen bond contacts, and preferential binding to guanine and uracil was observed. The positively charged residue, arginine, and the single aromatic residues, phenylalanine and tyrosine, all played key roles in the RNA binding sites. A comparison between protein-RNA and protein-DNA complexes showed that whilst base and backbone contacts (both hydrogen bonding and van der Waals) were observed with equal frequency in the protein-RNA complexes, backbone contacts were more dominant in the protein-DNA complexes. Although similar modes of secondary structure interactions have been observed in RNA and DNA binding proteins, the current analysis emphasises the differences that exist between the two types of nucleic acid binding protein at the atomic contact level.

    Integrated analysis sheds light on evolutionary trajectories of young transcription start sites in the human genome

    Understanding the molecular mechanisms and evolution of the gene regulatory system remains a major challenge in biology. Transcription start sites (TSSs) are especially interesting because they are central to initiating gene expression. Previous studies revealed widespread transcription initiation and fast turnover of TSSs in mammalian genomes. Yet, how new TSSs originate and how they evolve over time remain poorly understood. To address these questions, we analyzed ∼200,000 human TSSs by integrating evolutionary (inter- and intra-species) and functional genomic data, particularly focusing on evolutionarily young TSSs that emerged in the primate lineage. TSSs were grouped according to their evolutionary age using sequence alignment information as a proxy. Comparisons of young and old TSSs revealed that (1) new TSSs emerge through a combination of intrinsic factors, like the sequence properties of transposable elements and tandem repeats, and extrinsic factors, such as their proximity to existing regulatory modules; (2) new TSSs undergo rapid evolution that reduces the inherent instability of repeat sequences associated with a high propensity for TSS emergence; and (3) once established, the transcriptional competence of surviving TSSs is gradually enhanced, with evolutionary changes subject to temporal (fewer regulatory changes in younger TSSs) and spatial constraints (fewer regulatory changes in more isolated TSSs). These findings advance our understanding of how regulatory innovations arise in the genome throughout evolution and highlight the genomic robustness and evolvability of these processes.
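    The abstract uses sequence alignment as a proxy for TSS age. A minimal sketch of one way such a proxy can work, assuming a hypothetical list of species ordered by phylogenetic distance from human: each TSS is assigned the most distant species whose genome aligns to the sequence around it, so a TSS that aligns only to close primates counts as young. The species list and the `tss_age` helper are illustrative assumptions, not the authors' pipeline.

```python
from typing import Sequence, Set

# Hypothetical ordering from closest to most distant relative of human.
# Illustrative only; not the species set used in the study.
SPECIES_BY_DISTANCE = ["chimpanzee", "gorilla", "macaque", "marmoset", "mouse", "dog"]


def tss_age(aligned_species: Set[str],
            species_by_distance: Sequence[str] = SPECIES_BY_DISTANCE) -> str:
    """Return the most distant species whose genome aligns to the TSS region.

    Species are scanned from closest to most distant, so the last match is the
    most distant one. Returns "human-specific" when nothing aligns.
    """
    age = "human-specific"
    for species in species_by_distance:
        if species in aligned_species:
            age = species
    return age
```

    For example, a TSS whose flanking sequence aligns to chimpanzee and macaque but not to mouse would be labelled `"macaque"`, i.e. primate-specific. Alignment presence is only a proxy: an aligned sequence need not have been an active TSS in that species.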

    Backmasking in the yeast genome: encoding overlapping information for protein-coding and RNA degradation

    Backmasking is a recording technique used to hide a sound or message in a music track in reverse, meaning that it is only audible when the record is played backwards. Analogously, the compact yeast genome encodes diverse sources of information, such as overlapping coding and non-coding transcripts and protein-binding sites, on the two complementary DNA strands. Examples are the consensus binding-site sequences of the RNA-binding proteins Nrd1 and Nab3, which target non-coding transcripts for degradation. Here, by examining the overlap of stable (SUTs, stable unannotated transcripts) and unstable (CUTs, cryptic unstable transcripts) transcripts with protein-coding genes, we show that the predicted Nrd1- and Nab3-binding site sequences occur at differing frequencies. They are always depleted in the sense direction of protein-coding genes, thus avoiding degradation of the transcript. However, in the antisense direction, predicted binding sites occur at high frequencies in genes with overlapping unstable ncRNAs (CUTs), thereby limiting the availability of non-functional transcripts. In contrast, they are depleted in genes with overlapping stable ncRNAs (SUTs), presumably to avoid degrading the non-coding transcript. The protein-coding genes maintain similar amino-acid contents but display distinct codon usages, so that Nrd1- and Nab3-binding sites can arise at differing frequencies in antisense depending on the overlapping transcript type. Our study demonstrates how yeast has evolved to encode multiple layers of information (protein-coding genes on one strand and the relative chance of degrading antisense RNA on the other) in the same regions of a compact genome.
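    The key idea is that a motif on the antisense strand corresponds to its reverse complement on the coding strand, so synonymous codon choices can create or remove antisense binding sites without changing the protein. A minimal sketch that counts predicted sites on both strands, assuming the short consensus motifs often cited for Nrd1 (GUAA/GUAG) and Nab3 (UCUU), written here as DNA. The motif set and helper names are illustrative assumptions, not the study's exact site definitions or scoring.

```python
# Assumed consensus motifs written as DNA (T in place of U):
# Nrd1 GUAA/GUAG and Nab3 UCUU. Illustrative, not the study's definitions.
MOTIFS = ("GTAA", "GTAG", "TCTT")

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_overlapping(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlapping matches."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def motif_counts(coding_seq: str) -> dict:
    """Count predicted binding motifs on the sense and antisense strands."""
    antisense = reverse_complement(coding_seq)
    return {
        "sense": sum(count_overlapping(coding_seq, m) for m in MOTIFS),
        "antisense": sum(count_overlapping(antisense, m) for m in MOTIFS),
    }


# Both sequences encode Lys-Lys, but only the AAG codons create an antisense
# TCTT (Nab3-like) site: AAGAAG reads CTTCTT on the other strand.
print(motif_counts("AAGAAG"))  # {'sense': 0, 'antisense': 1}
print(motif_counts("AAAAAA"))  # {'sense': 0, 'antisense': 0}
```

    This toy case mirrors the abstract's point that protein-coding genes can keep the same amino-acid content while differing in antisense site frequency through codon usage alone.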

    Finding cell-specific expression patterns in the early Ciona embryo with single-cell RNA-seq

    Single-cell RNA-seq has been established as a reliable and accessible technique enabling new types of analyses, such as identifying cell types and studying spatial and temporal gene expression variation and change at single-cell resolution. Recently, single-cell RNA-seq has been applied to developing embryos, which offers great potential for finding and characterising genes controlling the course of development along with their expression patterns. In this study, we applied single-cell RNA-seq to the 16-cell stage embryo of Ciona, a marine chordate, and performed a computational search for cell-specific gene expression patterns. We recovered many known expression patterns from our single-cell RNA-seq data and, despite extensive previous screens, found new cell-specific patterns, which we validated by in situ hybridisation and single-cell qPCR.
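    A computational search for cell-specific patterns can be as simple as asking, for each gene, whether its mean expression in one cell identity clearly exceeds its mean in every other identity. A minimal sketch under that assumption; the fold-change threshold, pseudocount, gene names, and input layout are hypothetical and do not reflect the study's actual method.

```python
from statistics import mean
from typing import Dict, List, Sequence


def cell_specific_genes(expr: Dict[str, Sequence[float]],
                        cell_labels: Sequence[str],
                        target: str,
                        min_fold: float = 4.0,
                        pseudocount: float = 1.0) -> List[str]:
    """Return genes whose mean expression in `target` cells is at least
    `min_fold` times their highest mean in any other cell identity.

    expr maps gene name -> normalised expression per cell, in the same
    order as cell_labels. The pseudocount avoids division by zero.
    """
    identities = sorted(set(cell_labels))
    hits = []
    for gene, values in expr.items():
        # Mean expression of this gene within each cell identity.
        means = {
            ident: mean(v for v, lab in zip(values, cell_labels) if lab == ident)
            for ident in identities
        }
        best_other = max(m for ident, m in means.items() if ident != target)
        if (means[target] + pseudocount) / (best_other + pseudocount) >= min_fold:
            hits.append(gene)
    return hits


# Hypothetical toy data: two cells per identity, three genes.
labels = ["A", "A", "B", "B", "C", "C"]
expr = {
    "geneA": [10, 10, 0, 0, 0, 0],   # restricted to identity A
    "flat": [5, 5, 5, 5, 5, 5],      # uniform across identities
    "geneB": [0, 0, 20, 20, 0, 0],   # restricted to identity B
}
print(cell_specific_genes(expr, labels, "A"))  # ['geneA']
```

    Comparing against the highest other identity, rather than the average of all others, is a deliberately strict choice: a gene shared by two identities is not reported as specific to either.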

    Genomic landscape of oxidative DNA damage and repair reveals regioselective protection from mutagenesis

    BACKGROUND: DNA is subject to constant chemical modification and damage, which eventually results in variable mutation rates throughout the genome. Although detailed molecular mechanisms of DNA damage and repair are well understood, damage impact and execution of repair across a genome remain poorly defined. RESULTS: To bridge the gap between our understanding of DNA repair and mutation distributions, we developed a novel method, AP-seq, capable of mapping apurinic sites and 8-oxo-7,8-dihydroguanine bases at approximately 250-bp resolution on a genome-wide scale. We directly demonstrate that the accumulation rate of apurinic sites varies widely across the genome, with hot spots acquiring many times more damage than cold spots. Unlike single nucleotide variants (SNVs) in cancers, damage burden correlates with marks of open chromatin, notably H3K9ac and H3K4me2. Apurinic sites and oxidative damage are also highly enriched in transposable elements and other repetitive sequences. In contrast, we observe a reduction at chromatin loop anchors, with increased damage load towards inactive compartments. Less damage is found at promoters, exons, and termination sites, but not introns, in a seemingly transcription-independent but GC content-dependent manner. Leveraging cancer genomic data, we also find locally reduced SNV rates in promoters, coding sequence, and other functional elements. CONCLUSIONS: Our study reveals that oxidative DNA damage accumulation and repair differ strongly across the genome, but culminate in a previously unappreciated mechanism that safeguards the regulatory and coding regions of genes from mutations.
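    AP-seq maps damage at roughly 250-bp resolution, and "hot" and "cold" spots are statements about how damage counts vary between such windows. A minimal sketch, assuming hypothetical 0-based damage-site positions on a single chromosome: count sites in fixed 250-bp bins and express each bin relative to the mean count per bin. The input format and mean-based normalisation are assumptions; the abstract does not describe the study's read processing or normalisation.

```python
from collections import Counter
from typing import Iterable, List

BIN_SIZE = 250  # approximate AP-seq resolution quoted in the abstract


def binned_enrichment(positions: Iterable[int], chrom_length: int,
                      bin_size: int = BIN_SIZE) -> List[float]:
    """Return each bin's damage count divided by the mean count per bin.

    Assumes 0-based positions smaller than chrom_length. Values well above 1
    mark candidate hot spots; values near 0 mark candidate cold spots.
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = Counter(pos // bin_size for pos in positions)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * n_bins
    expected = total / n_bins  # mean sites per bin if damage were uniform
    return [counts.get(i, 0) / expected for i in range(n_bins)]


# Hypothetical toy input: five sites on a 750-bp sequence (three bins).
print(binned_enrichment([10, 20, 30, 260, 600], 750))
```

    Here the first bin holds three of the five sites and so sits at about 1.8 times the uniform expectation, while the other two bins sit at about 0.6.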

    High-resolution analysis of cell-state transitions in yeast suggests widespread transcriptional tuning by alternative starts

    Background: The transcription start and end sites (TSSs and TESs) of messenger RNAs are highly regulated, often in a cell-type-specific manner. Yet the contribution of transcript diversity to regulating gene expression remains largely elusive. We perform an integrative analysis of multiple highly synchronized cell-fate transitions and quantitative genomic techniques in Saccharomyces cerevisiae to identify regulatory functions associated with transcribing alternative isoforms. Results: Cell-fate transitions feature widespread increases in alternative TSS usage and, to a lesser degree, alternative TES usage. These dynamically regulated alternative TSSs are located mostly upstream of canonical TSSs, but also within gene bodies, possibly encoding protein isoforms. Increased upstream alternative TSS usage is linked to various effects on canonical TSS levels, which range from co-activation to repression. We identified two key features linked to these outcomes: the interplay between alternative and canonical promoter strengths, and the distance between alternative and canonical TSSs. These two regulatory properties offer a plausible explanation of how locally transcribed alternative TSSs control gene transcription. Additionally, we find that the chromatin modifiers Set2, Set3, and FACT play an important role in mediating gene repression via alternative TSSs, further supporting the idea that the act of upstream transcription drives the local changes in gene transcription. Conclusions: The integrative analysis of multiple cell-fate transitions suggests the presence of a regulatory control system of alternative TSSs that is important for dynamic tuning of gene expression. Our work provides a framework for understanding how TSS heterogeneity governs eukaryotic gene expression, particularly during cell-fate changes.

    Transcription levels of a noncoding RNA orchestrate opposing regulatory and cell fate outcomes in yeast

    Transcription through noncoding regions of the genome is pervasive. How these transcription events regulate gene expression remains poorly understood. Here, we report that, in S. cerevisiae, the levels of transcription through a noncoding region, IRT2, located upstream in the promoter of the inducer of meiosis, IME1, regulate opposing chromatin and transcription states. At low levels, the act of IRT2 transcription promotes histone exchange, delivering acetylated histone H3 lysine 56 to chromatin locally. The subsequent open chromatin state directs transcription factor recruitment and induces downstream transcription to repress the IME1 promoter and meiotic entry. Conversely, increasing transcription turns IRT2 into a repressor by promoting transcription-coupled chromatin assembly. The two opposing functions of IRT2 transcription shape a regulatory circuit, which ensures a robust cell-type-specific control of IME1 expression and yeast meiosis. Our data illustrate how intergenic transcription levels are key to controlling local chromatin state, gene expression, and cell fate outcomes.

    H3S28P Antibody Staining of Okinawan Oikopleura dioica Suggests the Presence of Three Chromosomes [version 2; peer review: 2 approved]

    Oikopleura dioica is a ubiquitous marine zooplankton of biological interest owing to features that include dioecious reproduction, a short life cycle, a conserved chordate body plan, and a compact genome. It is an important tunicate model for evolutionary and developmental research, as well as for investigations into marine ecosystems. The genome of north Atlantic O. dioica comprises three chromosomes. However, comparisons with the genomes of O. dioica sampled from mainland and southern Japan revealed extensive sequence differences. Moreover, historical studies have reported widely varying chromosome counts. We recently initiated a project to study the genomes of O. dioica individuals collected from the coastline of the Ryukyu (Okinawa) Islands in southern Japan. Given the potentially large extent of genomic diversity, we employed karyological techniques to count individual animals’ chromosomes in situ using centromere-specific antibodies directed against H3S28P, a prophase-metaphase cell cycle-specific marker of histone H3. Epifluorescence and confocal images of embryos and oocytes stained with two commercial anti-H3S28P antibodies (Abcam ab10543 and Thermo Fisher 07-145) were obtained. The data lead us to conclude that diploid cells from Okinawan O. dioica contain three pairs of chromosomes, in line with the north Atlantic populations. The finding facilitates the telomere-to-telomere assembly of Okinawan O. dioica genome sequences and gives insight into the genomic diversity of O. dioica from different geographical locations. The data deposited in the EBI BioImage Archive provide representative images of the antibodies’ staining properties for use in epifluorescence and confocal fluorescence microscopy.

    Chromatin-contact atlas reveals disorder-mediated protein interactions and moonlighting chromatin-associated RBPs

    RNA-binding proteins (RBPs) play diverse roles in regulating co-transcriptional RNA processing and chromatin functions, but our knowledge of the repertoire of chromatin-associated RBPs (caRBPs) and their interactions with chromatin remains limited. Here, we developed SPACE (Silica Particle Assisted Chromatin Enrichment) to isolate global and regional chromatin components with high specificity and sensitivity, and SPACEmap to identify the chromatin-contact regions in proteins. Applied to mouse embryonic stem cells, SPACE identified 1459 chromatin-associated proteins, ∼48% of which are annotated as RBPs, indicating their dual roles in chromatin and RNA binding. Additionally, SPACEmap stringently verified chromatin binding of 403 RBPs and identified their chromatin-contact regions. Notably, SPACEmap showed that about 40% of the caRBPs bind chromatin by intrinsically disordered regions (IDRs). Comparing SPACE and total proteome dynamics in mES cells grown in 2iL and serum medium revealed a significant correlation (R = 0.62). One of the most dynamic caRBPs is Dazl, which we find co-localised with PRC2 at transcription start sites of genes that are distinct from the mRNAs that Dazl binds. Dazl and other PRC2-co-localised caRBPs are rich in IDRs, which could contribute to the formation and regulation of phase-separated PRC condensates. Together, our approach provides an unprecedented insight into IDR-mediated interactions and caRBPs with moonlighting functions in native chromatin.

    Machine learning models in electronic health records can outperform conventional survival models for predicting patient mortality in coronary artery disease

    Prognostic modelling is important in clinical practice and epidemiology for patient management and research. Electronic health records (EHR) provide large quantities of data for such models, but conventional epidemiological approaches require significant researcher time to implement. Expert selection of variables, fine-tuning of variable transformations and interactions, and imputing missing values are time-consuming and could bias subsequent analysis, particularly given that missingness in EHR is high and may carry meaning. Using a cohort of 80,000 patients from the CALIBER programme, we compared traditional modelling and machine-learning approaches in EHR. First, we used Cox models and random survival forests with and without imputation on 27 expert-selected, preprocessed variables to predict all-cause mortality. We then used Cox models, random forests and elastic net regression on an extended dataset with 586 variables to build prognostic models and identify novel prognostic factors without prior expert input. We observed that data-driven models used on an extended dataset can outperform conventional models for prognosis, without data preprocessing or imputing missing values. An elastic net Cox regression based on 586 unimputed variables, with continuous values discretised, achieved a C-index of 0.801 (bootstrapped 95% CI 0.799 to 0.802), compared to 0.793 (0.791 to 0.794) for a traditional Cox model comprising 27 expert-selected variables with imputation for missing values. We also found that data-driven models allow identification of novel prognostic variables; that the absence of values for particular variables carries meaning and can have significant implications for prognosis; and that variables often have a nonlinear association with mortality, which discretised Cox models and random forests can elucidate. This demonstrates that machine-learning approaches applied to raw EHR data can be used to build models for use in research and clinical practice, and to identify novel predictive variables and their effects to inform future research.
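    The model comparison rests on the C-index (concordance index), which measures how well predicted risk ranks patients by observed survival. A minimal sketch of Harrell's C, assuming right-censored follow-up times, a binary event indicator (1 = death observed), and one risk score per patient where higher means higher predicted risk. This simplified version skips pairs with tied follow-up times and computes no bootstrap interval, so it is an illustration of the metric, not the study's evaluation code.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(times: Sequence[float],
                    events: Sequence[int],
                    risk_scores: Sequence[float]) -> float:
    """Fraction of comparable pairs in which the earlier death had higher risk.

    A pair is comparable when the patient with the shorter follow-up time had
    an observed event. Tied risk scores count as half-concordant. Pairs with
    tied follow-up times are skipped in this simplified sketch.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier patient was censored, so true order is unknown
        comparable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1.0
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Perfect ranking: the patient who dies first always has the highest risk.
print(harrell_c_index([1, 2, 3, 4], [1, 1, 1, 1], [4, 3, 2, 1]))  # 1.0
```

    A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported difference between 0.801 and 0.793 in context.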