
    Improved analysis of (e)CLIP data with RCRUNCH yields a compendium of RNA-binding protein binding sites and motifs

    We present RCRUNCH, an end-to-end solution to CLIP data analysis for the identification of binding sites and sequence specificity of RNA-binding proteins. RCRUNCH can analyze not only reads that map uniquely to the genome but also those that map to multiple genome locations or across splice boundaries, and it can consider various types of background in the estimation of read enrichment. By applying RCRUNCH to the eCLIP data from the ENCODE project, we have constructed a comprehensive and homogeneous resource of in-vivo-bound RBP sequence motifs. RCRUNCH automates the reproducible analysis of CLIP data, enabling studies of post-transcriptional control of gene expression.
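    As a rough illustration of what "read enrichment against a background" can mean, the sketch below computes a pseudocount-smoothed log2 ratio of library-normalized read counts in one window against several candidate backgrounds, and keeps the most conservative value. This is not RCRUNCH's own statistical model, which the abstract does not describe; the function names, the pseudocount, the background labels and the choice of taking the minimum across backgrounds are all assumptions made for illustration.

```python
import math


def log2_enrichment(fg_count, fg_total, bg_count, bg_total, pseudo=1.0):
    """log2 of (window reads / library size) in the CLIP sample over the same in a background.

    A pseudocount is added to every count to avoid division by zero
    (a hypothetical smoothing choice, not taken from RCRUNCH).
    """
    ratio = ((fg_count + pseudo) * (bg_total + pseudo)) / (
        (fg_total + pseudo) * (bg_count + pseudo)
    )
    return math.log2(ratio)


def conservative_enrichment(fg, backgrounds, pseudo=1.0):
    """Enrichment of one window against each background; report the smallest.

    fg: (reads in window, total reads) for the CLIP library.
    backgrounds: dict of background label -> (reads in window, total reads).
    """
    per_bg = {
        name: log2_enrichment(fg[0], fg[1], count, total, pseudo)
        for name, (count, total) in backgrounds.items()
    }
    return min(per_bg.values()), per_bg


if __name__ == "__main__":
    best, per_bg = conservative_enrichment(
        (8, 100), {"input": (1, 100), "smInput": (2, 100)}, pseudo=0
    )
    print(best, per_bg)
```

    Taking the minimum over backgrounds means a window is only called enriched if it stands out against every control, which is one (hypothetical) way to make use of several background types at once.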

    The first peptides: the evolutionary transition between prebiotic amino acids and early proteins

    The issues we attempt to tackle here are what the first peptides looked like when they emerged on the primitive Earth, and what simple catalytic activities they fulfilled. We conjecture that the early functional peptides were short (3 to 8 amino acids long), were made of those amino acids (Gly, Ala, Val and Asp) that are abundantly produced in many prebiotic synthesis experiments and observed in meteorites, and that the neutralization of Asp's negative charge was achieved by metal ions. We further assume that some traces of these prebiotic peptides still exist, in the form of active sites in present-day proteins. Searching these proteins for prebiotic peptide candidates led us to identify three main classes of motifs, bound mainly to Mg^{2+} ions: D(F/Y)DGD, corresponding to the active site in RNA polymerases; DGD(G/A)D, present in some kinds of mutases; and DAKVGDGD, in dihydroxyacetone kinase. All three motifs contain a DGD submotif, which is suggested to be the common ancestor of all active peptides. Moreover, all three manipulate phosphate groups, which was probably a very important biological function in the very first stages of life. The statistical significance of our results is supported by the frequency of these motifs in today's proteins, which is three times higher than expected by chance, with a P-value of 3 × 10^{-2}. The implications of our findings in the context of the appearance of life and the possibility of an experimental validation are discussed. Comment: 22 pages, 2 figures, J. Theor. Biol. (2009) in press
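    The three motif classes can be written as regular expressions, with "(F/Y)" and "(G/A)" becoming character classes. The sketch below scans a protein sequence for them and reports overlapping hits; it only shows how such a search might be expressed and is not the authors' search procedure. The dictionary labels and the example sequence are made up for illustration.

```python
import re

# Motif classes named in the abstract; "(F/Y)" and "(G/A)" become regex character classes.
MOTIFS = {
    "RNA polymerase active site": r"D[FY]DGD",
    "mutase": r"DGD[GA]D",
    "dihydroxyacetone kinase": r"DAKVGDGD",
}


def find_motifs(seq):
    """Return (motif label, 0-based start, matched text) for every hit, sorted by start."""
    hits = []
    for name, pattern in MOTIFS.items():
        # A zero-width lookahead lets overlapping occurrences be reported.
        for m in re.finditer(f"(?=({pattern}))", seq):
            hits.append((name, m.start(), m.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


def has_dgd_core(seq):
    """True if the sequence contains the shared DGD submotif."""
    return "DGD" in seq


if __name__ == "__main__":
    toy = "MKDFDGDAAKDGDGDLL"  # hypothetical sequence, not a real protein
    for hit in find_motifs(toy):
        print(hit)
```

    Every pattern above contains the literal substring DGD, which is the shared submotif the abstract highlights; `has_dgd_core` is therefore a necessary (but not sufficient) pre-filter for any of the three classes.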

    A Multi-Layered Study on Harmonic Oscillations in Mammalian Genomics and Proteomics

    Cellular, organ, and whole-animal physiology show temporal variation predominantly featuring 24-h (circadian) periodicity. Time-course mRNA gene expression profiling in mouse liver showed two subsets of genes oscillating at the second (12-h) and third (8-h) harmonics of the prime (24-h) frequency. The aim of our study was to identify specific genomic, proteomic, and functional properties of the ultradian and circadian subsets. We found hallmarks of the three oscillating gene subsets, including different (i) functional annotation, (ii) proteomic and electrochemical features, and (iii) transcription factor binding motifs in upstream regions of 8-h and 12-h oscillating genes that seemingly link the ultradian gene sets to a known circadian network. Our multifaceted bioinformatics analysis of circadian and ultradian genes suggests that the different rhythmicity of gene expression impacts physiological outcomes and may be related to transcriptional, translational and post-translational dynamics, as well as to phylogenetic and evolutionary components.
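    One common way to quantify oscillation at the 24-h period and its second (12-h) and third (8-h) harmonics is harmonic (multi-component cosinor) regression: fit a constant plus one cosine and one sine term per period by least squares, and read each component's amplitude as sqrt(a^2 + b^2). The sketch below assumes NumPy and a single gene's time course; it is not necessarily the method used in the study, and the function name is hypothetical.

```python
import numpy as np

PERIODS_H = (24.0, 12.0, 8.0)  # prime period and its 2nd and 3rd harmonics, in hours


def harmonic_amplitudes(t_hours, expr, periods=PERIODS_H):
    """Least-squares fit of expr ~ c + sum_p [a_p cos(2*pi*t/p) + b_p sin(2*pi*t/p)].

    Returns a dict mapping each period to its fitted amplitude sqrt(a_p^2 + b_p^2).
    """
    t = np.asarray(t_hours, dtype=float)
    y = np.asarray(expr, dtype=float)
    columns = [np.ones_like(t)]  # intercept (mean expression level)
    for p in periods:
        w = 2.0 * np.pi / p  # angular frequency for this period
        columns += [np.cos(w * t), np.sin(w * t)]
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    amps = {}
    for i, p in enumerate(periods):
        a, b = coef[1 + 2 * i], coef[2 + 2 * i]
        amps[p] = float(np.hypot(a, b))
    return amps


if __name__ == "__main__":
    t = np.arange(0, 48, 2)  # sampling every 2 h over two days
    y = 5 + 2 * np.cos(2 * np.pi * t / 12)  # synthetic 12-h oscillator
    amps = harmonic_amplitudes(t, y)
    print(max(amps, key=amps.get), amps)
```

    Fitting all three periods jointly, rather than one at a time, keeps a strong 24-h component from leaking into the estimate of a weaker 12-h or 8-h component when sampling is uneven.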

    Targeting determinants of dosage compensation in Drosophila

    The dosage compensation complex (DCC) in Drosophila melanogaster is responsible for up-regulating transcription from the single male X chromosome to equal the transcription from the two X chromosomes in females. Visualization of the DCC, a large ribonucleoprotein complex, on male larval polytene chromosomes reveals that the complex binds selectively to many interbands on the X chromosome. The targeting of the DCC is thought to be determined in part by DNA sequences that are enriched on the X. So far, lack of knowledge about DCC binding sites has prevented the identification of sequence determinants. Only three binding sites have been identified to date, but analysis of their DNA sequence did not allow the prediction of further binding sites. We have used chromatin immunoprecipitation to identify a number of new DCC binding fragments, characterized them in vivo by visualizing DCC binding to autosomal insertions of these fragments, and demonstrated that they possess a wide range of potential to recruit the DCC. By varying the in vivo concentration of the DCC, we provide evidence that this range of recruitment potential is due to differences in the affinity of the complex for these sites. We were also able to establish that DCC binding to ectopic high-affinity sites can allow nearby low-affinity sites to recruit the complex. Using the sequences of the newly identified and previously characterized binding fragments, we have uncovered a number of short sequence motifs, which in combination may contribute to DCC recruitment. Our findings suggest that the DCC is recruited to the X via a number of binding sites of decreasing affinities, and that the presence of high- and moderate-affinity sites on the X may ensure that lower-affinity sites are occupied in a context-dependent manner. Our bioinformatics analysis suggests that DCC binding sites may be composed of variable combinations of degenerate motifs.

    The La-Related Proteins, a Family with Connections to Cancer

    The evolutionarily conserved La-related protein (LARP) family currently comprises Genuine La, LARP1, LARP1b, LARP4, LARP4b, LARP6 and LARP7. Emerging evidence suggests that each LARP has a distinct role in transcription and/or mRNA translation that is attributable to subtle sequence variations within their La modules and specific C-terminal domains. As research uncovers the function of each LARP, it is evident that La, LARP1, LARP6, LARP7 and possibly LARP4a and 4b are dysregulated in cancer. Of these, LARP1 is the first to be demonstrated to drive oncogenesis. Here, we review the role of each LARP and the evidence linking it to malignancy. We discuss a future strategy of targeting members of this protein family as cancer therapy.

    Computational prediction of splicing regulatory elements shared by Tetrapoda organisms

    Background: Auxiliary splicing sequences play an important role in ensuring accurate and efficient splicing by promoting or repressing recognition of authentic splice sites. These cis-acting motifs have been termed splicing enhancers and silencers and are located in both introns and exons. They co-evolved into an intricate splicing code together with additional functional constraints, such as tissue-specific and alternative splicing patterns. We used orthologous exons extracted from the University of California Santa Cruz multiple genome alignments of human and 22 Tetrapoda organisms to predict candidate enhancers and silencers that have a reproducible and statistically significant bias towards annotated exonic boundaries. Results: A total of 2,546 Tetrapoda enhancers and silencers were clustered into 15 putative core motifs based on their Markov properties. Most of these elements have been identified previously, but 118 putative silencers and 260 enhancers (~15%) were novel. Examination of previously published experimental data for the presence of predicted elements showed that their mutation altered the splicing pattern as expected in 21/23 (91.3%) cases. Predicted intronic motifs flanking 3' and 5' splice sites had higher evolutionary conservation than other sequences within intronic flanks, and the intronic enhancers differed markedly between 3' and 5' intronic flanks. Conclusion: The difference in intronic enhancers supporting 5' and 3' splice sites suggests an independent splicing commitment for neighboring exons. The increased evolutionary conservation of intronic splicing enhancers and silencers (ISEs/ISSs) within intronic flanks, and the effect of modulating predicted elements on splicing, suggest that the identified elements are functionally significant in splicing regulation. Most of the identified elements were shown to have direct implications in human splicing and could therefore be useful for building computational splicing models in biomedical research.

    Analysis of the Genome of the Sexually Transmitted Insect Virus Helicoverpa zea Nudivirus 2

    The sexually transmitted insect virus Helicoverpa zea nudivirus 2 (HzNV-2) was determined to have a circular double-stranded DNA genome of 231,621 bp coding for an estimated 113 open reading frames (ORFs). HzNV-2 is most closely related to the nudiviruses, a sister group of the insect baculoviruses. Several putative ORFs that share homology with the baculovirus core genes were identified in the viral genome. However, HzNV-2 lacks several key genetic features of baculoviruses, including the late transcriptional regulation factor LEF-1 and the palindromic hrs, which serve as origins of replication. The HzNV-2 genome was found to code for three ORFs with significant sequence homology to cellular genes that are not generally found in viral genomes: a presumed juvenile hormone esterase gene, a gene coding for a putative zinc-dependent matrix metalloprotease, and a major facilitator superfamily protein gene. All three are believed to play a role in the cellular proliferation and tissue hypertrophy seen in the malformed reproductive organs of HzNV-2-infected corn earworm moths, Helicoverpa zea.

    Integration of CLIP experiments of RNA-binding proteins: a novel approach to predict context-dependent splicing factors from transcriptomic data

    Background: Splicing is a genetic process that has important implications in several diseases including cancer. Deciphering the complex rules of splicing regulation is crucial to understand and treat splicing-related diseases. Splicing factors and other RNA-binding proteins (RBPs) play a key role in the regulation of splicing. The specific binding sites of an RBP can be measured using CLIP experiments. However, to unveil which RBPs regulate a condition, it is necessary to have a priori hypotheses, as a single CLIP experiment targets a single protein. Results: In this work, we present a novel methodology to predict context-specific splicing factors from transcriptomic data. For this, we systematically collect, integrate and analyze more than 900 CLIP experiments stored in four CLIP databases: POSTAR2, CLIPdb, DoRiNA and StarBase. The analysis of these experiments shows the strong coherence between the binding sites of RBPs of similar families. Augmenting this information with expression changes, we are able to correctly predict the splicing factors that regulate splicing in two gold-standard experiments in which specific splicing factors are knocked-down. Conclusions: The methodology presented in this study allows the prediction of active splicing factors in either cancer or any other condition by only using the information of transcript expression. This approach opens a wide range of possible studies to understand the splicing regulation of different conditions. A tutorial with the source code and databases is available at https://gitlab.com/fcarazo.m/sfprediction
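    A simple way to combine CLIP binding sites with expression changes is an over-representation test: for each RBP, ask whether the splicing events it binds are over-represented among the events that change in a condition, using the upper tail of the hypergeometric distribution. This is a generic sketch under that assumption, not the scoring used in the sfprediction tutorial; the function names, event IDs and RBP names (RBP_A, RBP_B) are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N events, K bound by the RBP, n changed events)."""
    upper = min(K, n)
    if k > upper:
        return 0.0
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible terms contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return hits / comb(N, n)


def rank_rbps(bound_events, changed_events, all_events):
    """Rank RBPs by how strongly their CLIP-bound events overlap the changed events.

    bound_events: dict of RBP name -> iterable of event IDs with a CLIP binding site.
    changed_events: event IDs that change in the condition of interest.
    all_events: every event tested (the statistical universe).
    Returns (rbp, overlap count, p-value) tuples, smallest p-value first.
    """
    universe = set(all_events)
    changed = set(changed_events) & universe
    results = []
    for rbp, events in bound_events.items():
        bound = set(events) & universe
        k = len(bound & changed)
        p = hypergeom_sf(k, len(universe), len(bound), len(changed))
        results.append((rbp, k, p))
    return sorted(results, key=lambda r: r[2])


if __name__ == "__main__":
    ranked = rank_rbps({"RBP_A": {0, 1, 2}, "RBP_B": {7, 8, 9}}, {0, 1, 2}, range(10))
    for row in ranked:
        print(row)
```

    In practice the p-values would also need a multiple-testing correction across the hundreds of RBPs covered by the integrated CLIP experiments.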

    SARS-CoV-2 contributes to altering the post-transcriptional regulatory networks across human tissues by sponging RNA binding proteins and micro-RNAs

    The outbreak of the novel coronavirus SARS-CoV-2, responsible for the COVID-19 pandemic, has caused a worldwide public health emergency. Due to the constantly evolving nature of coronaviruses, SARS-CoV-2-mediated alteration of post-transcriptional gene regulation across human tissues remains elusive. In this study, we systematically dissected the crosstalk and dysregulation of human post-transcriptional regulatory networks governed by RNA-binding proteins (RBPs) and micro-RNAs (miRs) due to SARS-CoV-2 infection. We found that 13 of the 29 SARS-CoV-2-encoded proteins directly interact with 51 human RBPs, the majority of which are abundantly expressed in gonadal tissues and immune cells. We further performed functional analysis of differentially expressed genes in mock-treated versus SARS-CoV-2-infected lung cells, which revealed an enrichment for immune response, cytokine-mediated signaling, and metabolism-associated genes. This study also characterized the alternative splicing events in SARS-CoV-2-infected cells compared to controls, showing that skipped exons and mutually exclusive exons were the most abundant events and potentially contributed to differential outcomes in response to viral infection. Motif enrichment analysis of the SARS-CoV-2 RNA genome revealed an enrichment for binding sites of RBPs such as SRSFs, PCBPs, ELAVs and HNRNPs, illustrating the sponging of these RBPs by the SARS-CoV-2 genome. A similar analysis of the interactions of miRs with SARS-CoV-2 revealed the potential for several miRs to be sponged, suggesting that these interactions may contribute to altered post-transcriptional regulation across human tissues.
    Given the need to understand the interactions of SARS-CoV-2 with key post-transcriptional regulators in the human genome, this study provides a systematic analysis of the role of dysregulated post-transcriptional regulatory networks controlled by RBPs and miRs across tissue types during SARS-CoV-2 infection. This work was supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R01GM123314 (SCJ). We also thank the lab members for their valuable suggestions and for the supporting dataset required for completion of this project.