    Probabilistic Clustering of Sequences: Inferring new bacterial regulons by comparative genomics

    Genome-wide comparisons between enteric bacteria yield large sets of conserved putative regulatory sites, identified on a gene-by-gene basis, that need to be clustered into regulons. Assuming that regulatory sites can be represented as samples from weight matrices, we derive a unique probability distribution for assignments of sites into clusters. Our algorithm, 'PROCSE' (probabilistic clustering of sequences), uses Monte Carlo sampling of this distribution to partition and align thousands of short DNA sequences into clusters. The algorithm determines the number of clusters internally from the data and assigns significance to the resulting clusters. We place theoretical limits on the ability of any algorithm to correctly cluster sequences drawn from weight matrices (WMs) when these WMs are unknown. Our analysis suggests that the set of all putative sites for a single genome (e.g. E. coli) is largely inadequate for clustering. When sites from different genomes are combined and all the homologous sites from the various species are used as a block, clustering becomes feasible. We predict 50-100 new regulons as well as many new members of existing regulons, potentially doubling the number of known regulatory sites in E. coli.
    Comment: 27 pages including 9 figures and 3 tables
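    The abstract above rests on treating each regulatory site as a sample from a weight matrix. The Python sketch below illustrates only that modelling assumption, not PROCSE itself: PROCSE samples cluster assignments while the WMs are unknown, whereas here the WMs are assumed to be known. It scores a site by its log-probability under a WM (positions treated as independent draws), compares that with an assumed uniform background, and turns log-likelihoods under several WMs into assignment probabilities with equal priors. The matrix values and all function names are hypothetical.

```python
import math

# Hypothetical 4-bp weight matrix: one dict per position giving the
# probability of each base (every column sums to 1).
WM_EXAMPLE = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]

# Assumed uniform background distribution over bases.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_likelihood(site, wm):
    """Log-probability of `site` if each position is an independent draw from its WM column."""
    if len(site) != len(wm):
        raise ValueError("site length must equal the weight-matrix length")
    return sum(math.log(column[base]) for base, column in zip(site, wm))


def log_odds(site, wm, background=BACKGROUND):
    """Log-likelihood ratio of the WM model against the background model."""
    background_ll = sum(math.log(background[base]) for base in site)
    return log_likelihood(site, wm) - background_ll


def assignment_probabilities(site, wms):
    """Posterior probability that `site` came from each WM, assuming equal prior weights."""
    lls = [log_likelihood(site, wm) for wm in wms]
    top = max(lls)  # subtract the maximum before exponentiating, for numerical stability
    weights = [math.exp(ll - top) for ll in lls]
    total = sum(weights)
    return [w / total for w in weights]
```

    For a site such as "AGCT", which carries the preferred base of the example matrix at every position, the log-odds is 4·ln(0.7/0.25), roughly 4.12, so the WM explains it better than the background does. Unknown bases (for example "N") raise a KeyError in this sketch.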

    Effects of Dicer and Argonaute down-regulation on mRNA levels in human HEK293 cells

    RNA interference and the microRNA (miRNA) pathway can induce sequence-specific mRNA degradation and/or translational repression. The human genome encodes hundreds of miRNAs that can post-transcriptionally repress thousands of genes. Using reporter constructs, we observed that degradation of mRNAs bearing sites imperfectly complementary to the endogenous let-7 miRNA is considerably stronger in human HEK293 cells than in HeLa cells. The degradation did not result from Ago2-mediated endonucleolytic cleavage, but it was Dicer- and Ago2-dependent. We used this feature of HEK293 cells to estimate the size of the pool of transcripts regulated by RNA silencing in a single cell type. We generated HEK293 cell lines depleted of Dicer or of individual Ago proteins and used them for microarray analyses to obtain a comprehensive picture of RNA silencing. The 3'-untranslated region sequences of a few hundred transcripts that were commonly up-regulated upon Ago2 and Dicer knock-downs showed a significant enrichment of putative miRNA-binding sites. The up-regulation upon Ago2 and Dicer knock-downs was moderate, and we found no evidence, at the mRNA level, for activation of silenced genes. Taken together, our data suggest that, independent of the effect on translation, miRNAs affect the levels of a few hundred mRNAs in HEK293 cells.

    Structural and functional implications of the QUA2 domain on RNA recognition by GLD-1

    The STAR family comprises ribonucleic acid (RNA)-binding proteins that play key roles in RNA-regulatory processes. RNA recognition is achieved by a KH domain with an additional α-helix (QUA2) that seems to extend the RNA-binding surface to six nucleotides for SF1 (Homo sapiens) and seven nucleotides for GLD-1 (Caenorhabditis elegans). To understand the structural basis of this probable difference in specificity, we determined the solution structure of GLD-1 KH-QUA2 in complex with the complete consensus sequence identified in the tra-2 gene. Compared to SF1, the GLD-1 KH-QUA2 interface adopts a different conformation, which indeed results in an additional sequence-specific binding pocket for a uracil at the 5′ end. The functional relevance of this binding pocket is emphasized by our bioinformatics analysis, which shows that GLD-1 binding sites with this 5′-end uracil are more predictive of the functional response of the messenger RNAs to gld-1 knockout. We further show the importance of the KH-QUA2 interface in vitro and demonstrate that its alteration in vivo affects the level of translational repression in a manner dependent on the sequence of the GLD-1 binding motif. In conclusion, we demonstrate that the QUA2 domain distinguishes GLD-1 from other members of the STAR family and contributes more generally to modulating the RNA-binding affinity and specificity of KH-domain-containing proteins.

    Fifteen years SIB Swiss Institute of Bioinformatics: life science databases, tools and support

    The SIB Swiss Institute of Bioinformatics (www.isb-sib.ch) was created in 1998 as an institution to foster excellence in bioinformatics. It is renowned worldwide for its databases and software tools, such as UniProtKB/Swiss-Prot, PROSITE, SWISS-MODEL and STRING, all of which are accessible on ExPASy.org, SIB's Bioinformatics Resource Portal. This article provides an overview of the scientific and training resources SIB has consistently been offering to the life science community for more than 15 years.

    CLIP and complementary methods

    RNA molecules start assembling into ribonucleoprotein (RNP) complexes during transcription. Dynamic RNP assembly, largely directed by cis-acting elements on the RNA, coordinates all processes in which the RNA is involved. To identify the sites bound by a specific RNA-binding protein on endogenous RNAs, cross-linking and immunoprecipitation (CLIP) and complementary, proximity-based methods have been developed. In this Primer, we discuss the main variants of these protein-centric methods and the strategies for their optimization and quality assessment, as well as RNA-centric methods that identify the protein partners of a specific RNA. We summarize the main challenges of computational CLIP data analysis, including how to handle various sources of background and how to identify functionally relevant binding regions. We outline the various applications of CLIP and the available databases for data sharing. We discuss the prospect of integrating data obtained by CLIP with complementary methods to gain a comprehensive view of RNP assembly and remodelling, unravel the spatial and temporal dynamics of RNPs in specific cell types and subcellular compartments, and understand how defects in RNPs can lead to disease. Finally, we present open questions in the field and give directions for further development and applications.

    Untargeted sequencing of circulating microRNAs in a healthy and diseased older population

    We performed untargeted profiling of circulating microRNAs (miRNAs) in a well-characterized cohort of older adults to verify associations of health- and disease-related biomarkers with systemic miRNA expression. Differential expression analysis revealed 30 miRNAs that differed significantly between healthy active, healthy sedentary and sedentary cardiovascular risk patients. Increased expression of miR-193b-5p, miR-122-5p, miR-885-3p, miR-193a-5p, miR-34a-5p, miR-505-3p, miR-194-5p, miR-27b-3p, miR-885-5p, miR-23b-5p, miR-365a-3p, miR-365b-3p and miR-22-5p was associated with a higher metabolic risk profile, unfavourable macro- and microvascular health, and lower physical activity (PA) and cardiorespiratory fitness (CRF) levels. Increased expression of miR-342-3p, miR-1-3p, miR-92b-5p, miR-454-3p, miR-190a-5p and miR-375-3p was associated with a lower metabolic risk profile, favourable macro- and microvascular health, and higher PA and CRF. Of note, the first two principal components explained 20% and 11% of the data variance, respectively. miRNAs and their potential target genes appear to mediate disease- and health-related physiological and pathophysiological adaptations, which need to be validated and supported by further downstream analysis in future studies.
    Clinical Trial Registration: ClinicalTrials.gov: NCT02796976 (https://clinicaltrials.gov/ct2/show/NCT02796976)

    Transcriptional Enhancer Factor Domain Family member 4 Exerts an Oncogenic Role in Hepatocellular Carcinoma by Hippo-Independent Regulation of Heat Shock Protein 70 Family Members

    Transcriptional enhancer factor domain family member 4 (TEAD4) is a downstream effector of the conserved Hippo signaling pathway, regulating the expression of genes involved in cell proliferation and differentiation. It is up-regulated in several cancer types and is associated with metastasis and poor prognosis. However, its role in hepatocellular carcinoma (HCC) remains largely unexplored. Using data from The Cancer Genome Atlas, we found that TEAD4 was overexpressed in HCC and was associated with aggressive HCC features and worse outcome. Overexpression of TEAD4 significantly increased proliferation and migration rates in HCC cells in vitro as well as tumor growth in vivo. Additionally, RNA sequencing analysis of TEAD4-overexpressing HCC cells demonstrated that TEAD4 overexpression was associated with the up-regulation of genes involved in epithelial-to-mesenchymal transition, proliferation, and protein-folding pathways. Among the most up-regulated genes following TEAD4 overexpression were the 70-kDa heat shock protein (HSP70) family members HSPA6 and HSPA1A. Chromatin immunoprecipitation-quantitative real-time polymerase chain reaction experiments demonstrated that TEAD4 regulates HSPA6 and HSPA1A expression by directly binding to their promoter and enhancer regions. Pharmacologic inhibition of HSP70 expression in TEAD4-overexpressing cells reduced the effect of TEAD4 on cell proliferation. Finally, by overexpressing TEAD4 in yes-associated protein (YAP)/transcriptional coactivator with PDZ binding motif (TAZ)-knockdown HCC cells, we showed that the effect of TEAD4 on cell proliferation and its regulation of HSP70 expression do not require YAP and TAZ, the main effectors of the Hippo signaling pathway. Conclusion: TEAD4 promotes cell proliferation and tumor growth in HCC through a novel Hippo-independent mechanism, by directly regulating HSP70 family members.

    Penalized likelihood for sparse contingency tables with an application to full-length cDNA libraries

    Background: The joint analysis of several categorical variables is a common task in many areas of biology and is becoming central to systems biology investigations, whose goal is to identify potentially complex interactions among variables belonging to a network. Interactions of arbitrary complexity are traditionally modeled in statistics by log-linear models. It is challenging to extend these to the high-dimensional and potentially sparse data arising in computational biology. An important example, which provides the motivation for this article, is the analysis of so-called full-length cDNA libraries of alternatively spliced genes, where we investigate relationships among the presence of various exons in transcript species.
    Results: We develop methods to perform model selection and parameter estimation in log-linear models for the analysis of sparse contingency tables, to study the interaction of two or more factors. Maximum likelihood estimation of log-linear model coefficients might not be appropriate because of the presence of zeros in the table's cells, and new methods are required. We propose a computationally efficient ℓ1-penalization approach extending the Lasso algorithm to this context and compare it to other procedures in a simulation study. We then illustrate these algorithms on contingency tables arising from full-length cDNA libraries.
    Conclusion: We propose regularization methods that can be used successfully to detect complex interaction patterns among categorical variables in a broad range of biological problems.
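    The ℓ1-penalized log-linear approach summarized above can be illustrated on a toy table. The Python sketch below is not the authors' algorithm: it fits a Poisson log-linear model to a hypothetical 2x2x2 table (three exons, each present or absent) using plain proximal gradient descent (ISTA) with backtracking and a soft-thresholding step, which is one standard way to solve a Lasso-type problem. The counts, the 0/1 dummy coding, the penalty `lam` and every function name are illustrative assumptions. The intercept is left unpenalized; the penalty keeps the fitted coefficients finite even though several cells are zero, which is exactly where unpenalized maximum likelihood breaks down.

```python
import itertools
import math


def design_row(cell):
    """Dummy-coded log-linear terms for one cell (a, b, c) of a 2x2x2 table.

    Columns: intercept, a, b, c, a:b, a:c, b:c, a:b:c.
    """
    a, b, c = cell
    return [1.0, a, b, c, a * b, a * c, b * c, a * b * c]


def _safe_exp(x):
    # Treat huge linear predictors as infinite so backtracking rejects them
    # instead of raising OverflowError.
    return math.exp(x) if x < 700 else math.inf


def neg_log_lik(beta, X, y):
    """Poisson negative log-likelihood without the constant log(y!) term."""
    total = 0.0
    for row, count in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, row))
        total += _safe_exp(eta) - count * eta
    return total


def gradient(beta, X, y):
    """Gradient of neg_log_lik, i.e. X^T (mu - y)."""
    grad = [0.0] * len(beta)
    for row, count in zip(X, y):
        mu = _safe_exp(sum(b * x for b, x in zip(beta, row)))
        for j, x in enumerate(row):
            grad[j] += (mu - count) * x
    return grad


def soft_threshold(z, t):
    """Proximal operator of t * |z| (the Lasso shrinkage step)."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def penalized_objective(beta, X, y, lam):
    """Negative log-likelihood plus lam times the l1 norm of the non-intercept terms."""
    return neg_log_lik(beta, X, y) + lam * sum(abs(b) for b in beta[1:])


def fit_l1_loglinear(X, y, lam, iters=500, step=1.0):
    """ISTA with backtracking line search; column 0 (the intercept) is not penalized."""
    p = len(X[0])
    beta = [0.0] * p
    beta[0] = math.log(max(sum(y) / len(y), 1e-8))  # start from the mean cell count
    t = step  # step size is only ever shrunk, as in standard backtracking ISTA
    for _ in range(iters):
        f = neg_log_lik(beta, X, y)
        g = gradient(beta, X, y)
        while True:
            cand = [beta[j] - t * g[j] for j in range(p)]
            cand = [cand[0]] + [soft_threshold(v, t * lam) for v in cand[1:]]
            diff = [c - b for c, b in zip(cand, beta)]
            # Accept the step once the quadratic upper bound on the smooth part holds.
            bound = f + sum(gj * dj for gj, dj in zip(g, diff)) + sum(d * d for d in diff) / (2 * t)
            if neg_log_lik(cand, X, y) <= bound + 1e-12:
                break
            t *= 0.5
        beta = cand
    return beta


# Hypothetical counts for the 8 exon-presence patterns, in itertools.product order:
# (0,0,0), (0,0,1), (0,1,0), (0,1,1), (1,0,0), (1,0,1), (1,1,0), (1,1,1).
CELLS = list(itertools.product([0, 1], repeat=3))
X_EXAMPLE = [design_row(cell) for cell in CELLS]
COUNTS_EXAMPLE = [12, 0, 5, 3, 7, 0, 9, 0]
beta_example = fit_l1_loglinear(X_EXAMPLE, COUNTS_EXAMPLE, lam=0.5)
```

    With a very large `lam`, every main-effect and interaction term is shrunk to exactly zero and only the intercept (the log of the mean cell count) remains; smaller values typically let more terms become non-zero, which is how the penalty doubles as a model-selection device.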