
    A Bayesian Search for Transcriptional Motifs

    Identifying transcription factor (TF) binding sites (TFBSs) is an important step towards understanding transcriptional regulation. A common approach is to take gaplessly aligned, experimentally supported TFBSs for a particular TF and algorithmically search for further occurrences of the same TFBSs. The largest publicly available databases of TF binding specificities contain models represented as position weight matrices (PWMs). Other methods use more sophisticated representations, but these have more limited databases or are not publicly available. This paper therefore focuses on methods that search using one PWM per TF. An algorithm for identifying TFBSs corresponding to a particular PWM, MATCH™, is available, but it is not based on a rigorous statistical model of TF binding, which makes its parameters and output difficult to interpret or adjust. Furthermore, no public description of the algorithm is sufficient to reproduce it exactly. Another algorithm, MAST, computes a p-value for the presence of a TFBS using true probabilities of finding each base at each offset from that position. We developed a statistical model, BaSeTraM, for the binding of TFs to TFBSs that takes into account random variation in the base present at each position within a TFBS. Treating the counts in the matrices and the sequences of sites as random variables, we combine this TFBS composition model with a background model to obtain a Bayesian classifier, which we implemented in a package (SBaSeTraM). We tested SBaSeTraM against a MATCH™ implementation by searching all probes used in an experimental Saccharomyces cerevisiae TF binding dataset and comparing our predictions to the data. We found no statistically significant differences in sensitivity between the algorithms (at fixed selectivity), indicating that SBaSeTraM performs at least comparably to the leading currently available algorithm. 
Our software is freely available at: http://wiki.github.com/A1kmm/sbasetram/building-the-tools
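The classifier described above combines a TFBS composition model with a background model via Bayes' rule. The sketch below illustrates that general idea under simplifying assumptions of our own: a symmetric Dirichlet pseudocount on each PWM column (so each column's base probabilities are the Dirichlet posterior predictive given the counts), an i.i.d. background base composition, a fixed prior probability of a site per window, and forward-strand scanning only. The function names and the `pseudocount` and `prior` parameters are hypothetical and are not taken from the SBaSeTraM package.

```python
import math

BASES = "ACGT"


def site_log_likelihood(window, counts, pseudocount=1.0):
    """Log P(window | site) under a Dirichlet-smoothed count matrix (assumed prior)."""
    total = 0.0
    for base, column in zip(window, counts):
        n = sum(column[b] for b in BASES)
        # Posterior predictive of a symmetric Dirichlet prior updated with the counts
        total += math.log((column[base] + pseudocount) / (n + 4 * pseudocount))
    return total


def background_log_likelihood(window, background):
    """Log P(window | background) for an i.i.d. base composition model."""
    return sum(math.log(background[b]) for b in window)


def posterior_binding(window, counts, background, prior=1e-3, pseudocount=1.0):
    """Posterior probability that `window` is a binding site, by Bayes' rule."""
    log_odds = (math.log(prior) - math.log1p(-prior)
                + site_log_likelihood(window, counts, pseudocount)
                - background_log_likelihood(window, background))
    # Numerically stable logistic transform of the log posterior odds
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)


def scan(sequence, counts, background, prior=1e-3):
    """Posterior for every full-length window on the forward strand."""
    width = len(counts)
    return [(i, posterior_binding(sequence[i:i + width], counts, background, prior))
            for i in range(len(sequence) - width + 1)]
```

For example, with a three-column matrix built from ten aligned `ACG` sites and a uniform background, `scan("TTACGTT", counts, {b: 0.25 for b in BASES}, prior=0.5)` assigns the highest posterior to offset 2. Working with log odds keeps long motifs from underflowing, and the explicit prior makes the output a probability rather than an uncalibrated score.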

    Comparative analysis of human and mouse transcriptomes of Th17 cell priming

    Uncontrolled Th17 cell activity is associated with cancer and with autoimmune and inflammatory diseases. To validate the potential relevance of mouse models for targeting the Th17 pathway in human diseases, we used RNA sequencing to compare the expression of coding and non-coding transcripts during the priming of Th17 cell differentiation in both human and mouse. In addition to already known targets, several transcripts not previously linked to Th17 cell polarization were found in both species. Moreover, a considerable number of human-specific long non-coding RNAs were identified that responded to cytokines stimulating Th17 cell differentiation. We integrated our transcriptomics data with known disease-associated polymorphisms and show that conserved regulation pinpoints genes that are relevant to Th17 cell-mediated human diseases and that can be modelled in mouse. The substantial differences observed in non-coding transcriptomes between the two species, as well as the increased overlap between Th17 cell-specific gene expression and disease-associated polymorphisms, underline the need for parallel analysis of human and mouse models. Comprehensive analysis of genes regulated during Th17 cell priming, and their classification into conserved and non-conserved between human and mouse, facilitates translational research by pointing out which candidate targets identified in human are worth studying in in vivo mouse models.

    Partial Support for an Interaction Between a Polygenic Risk Score for Major Depressive Disorder and Prenatal Maternal Depressive Symptoms on Infant Right Amygdalar Volumes

    Psychiatric disease susceptibility partly originates prenatally and is shaped by an interplay of genetic and environmental risk factors. A recent study provided preliminary evidence that an offspring polygenic risk score for major depressive disorder (PRS-MDD), based on European ancestry, interacts with prenatal maternal depressive symptoms (GxE) on neonatal right amygdalar volumes (US and Asian cohorts) and hippocampal volumes (Asian cohort). However, this GxE interplay has so far been addressed by only one study and has not yet been examined in a European ancestry sample. We investigated in 105 Finnish mother-infant dyads (44 female infants, 11-54 days old) how offspring PRS-MDD interacts with prenatal maternal depressive symptoms (Edinburgh Postnatal Depression Scale, gestational weeks 14, 24, and 34) on infant amygdalar and hippocampal volumes. We found a GxE effect on right amygdalar volumes whose direction paralleled the US cohort findings; it was significant in the main analysis but not after multiple comparison correction or in some of the control analyses. Additional exploratory analyses suggested a sex-specific GxE effect on right hippocampal volumes. Our study is the first to provide support, though statistically weak, for an interplay of offspring PRS-MDD and prenatal maternal depressive symptoms on infant limbic brain volumes in a cohort matched to the PRS-MDD discovery sample.
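A GxE interaction of this kind is commonly tested with a linear model that contains a product term. The form below is a generic sketch, not necessarily the exact model used in the study: it assumes an infant volume $V_i$, the offspring score $\mathrm{PRS}_i$, a maternal symptom score $\mathrm{EPDS}_i$ (for example, summarised over the three gestational time points), and a vector of covariates $\mathbf{c}_i$ whose contents are not specified here.

```latex
V_i = \beta_0 + \beta_1\,\mathrm{PRS}_i + \beta_2\,\mathrm{EPDS}_i
    + \beta_3\,\left(\mathrm{PRS}_i \times \mathrm{EPDS}_i\right)
    + \boldsymbol{\gamma}^{\top}\mathbf{c}_i + \varepsilon_i
```

The GxE effect is $\beta_3$: a nonzero $\beta_3$ means that the slope of volume on prenatal depressive symptoms, $\beta_2 + \beta_3\,\mathrm{PRS}_i$, depends on the offspring's polygenic risk.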

    Robust computational reconstitution – a new method for the comparative analysis of gene expression in tissues and isolated cell fractions

    BACKGROUND: Biological tissues consist of various cell types that contribute differentially to physiological and pathophysiological processes. Determining and analyzing cell type-specific gene expression under diverse conditions is therefore a central aim of biomedical research. The present study compares gene expression profiles in whole tissues and in isolated cell fractions purified from these tissues in patients with rheumatoid arthritis and osteoarthritis. RESULTS: The expression profiles of the whole tissues were compared to computationally reconstituted expression profiles that combine the expression profiles of the isolated cell fractions (macrophages, fibroblasts, and non-adherent cells) according to their relative mRNA proportions in the tissue. The mRNA proportions were determined by trimmed robust regression using only the most robustly expressed genes (1/3 to 1/2 of all measured genes), i.e. those showing the most similar expression in tissue and isolated cell fractions. The relative mRNA proportions were determined using several different chip evaluation methods, among which the MAS 5.0 signal algorithm appeared to be the most robust. The computed mRNA proportions agreed well with the cell proportions determined by immunohistochemistry, except for a few outliers. Genes that were either regulated (i.e. differentially expressed in tissue and isolated cell fractions) or robustly expressed in all patients were identified using different test statistics. CONCLUSION: Robust Computational Reconstitution uses an intermediate number of robustly expressed genes to estimate the relative mRNA proportions. This avoids both exclusive dependence on the robust expression of individual, highly cell type-specific marker genes and the bias towards an equal distribution that arises when all genes are included in the computation.
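The reconstitution step treats each tissue profile as a weighted sum of the cell-fraction profiles, with the weights fitted only on the genes that agree best. The sketch below illustrates one way to do that: iteratively trimmed least squares (concentration steps in the style of least trimmed squares), refitting on the share `keep` of genes with the smallest absolute residuals. The no-intercept linear model, the `keep` and `iterations` parameters, the absence of a non-negativity constraint, and the final rescaling of the weights to sum to one are assumptions made for illustration; this is not necessarily the trimmed robust regression used in the paper.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def least_squares(rows, y):
    """Ordinary least squares via the normal equations (no intercept)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def trimmed_proportions(fractions, tissue, keep=0.5, iterations=20):
    """Estimate relative mRNA proportions by iteratively trimmed least squares.

    fractions: per-gene rows of expression in each isolated cell fraction.
    tissue:    per-gene expression in the whole tissue.
    keep:      share of genes retained as "robustly expressed".
    """
    n_genes = len(tissue)
    h = max(len(fractions[0]), int(keep * n_genes))
    subset = list(range(n_genes))
    for _ in range(iterations):
        w = least_squares([fractions[i] for i in subset], [tissue[i] for i in subset])

        def residual(i):
            return abs(tissue[i] - sum(a * b for a, b in zip(fractions[i], w)))

        # Keep the h genes whose tissue expression is best reconstituted
        new_subset = sorted(sorted(range(n_genes), key=residual)[:h])
        if new_subset == subset:
            break
        subset = new_subset
    # Refit on the final gene subset and rescale the weights to proportions
    w = least_squares([fractions[i] for i in subset], [tissue[i] for i in subset])
    total = sum(w)
    return [wi / total for wi in w], subset
```

Trimming down to a fixed share of genes, rather than using all genes or a few markers, mirrors the trade-off stated in the conclusion: strongly discordant genes drop out of the fit, while enough genes remain to stabilise the estimate.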

    Probabilistic Inference of Transcription Factor Binding from Multiple Data Sources

    An important problem in molecular biology is to build a complete understanding of transcriptional regulatory processes in the cell. We have developed a flexible, probabilistic framework to predict TF binding from multiple data sources that differs from the standard hypothesis testing (scanning) methods in several ways. Our probabilistic modeling framework estimates the probability of binding and thus naturally reflects our degree of belief in binding. Probabilistic modeling also allows easy and systematic integration of our binding predictions into other probabilistic modeling methods, such as expression-based gene network inference. The method answers the question of whether the whole analyzed promoter has a binding site, but it can also be extended to estimate the binding probability at each nucleotide position. Further, we introduce an extension to model combinatorial regulation by several TFs. Most importantly, the proposed methods can make principled probabilistic inference from multiple evidence sources, such as multiple statistical models (motifs) of the TFs, evolutionary conservation, regulatory potential, CpG islands, nucleosome positioning, DNase hypersensitive sites, ChIP-chip binding segments, and other (prior) sequence-based biological knowledge. We developed both a likelihood method and a Bayesian method, the latter implemented with a Markov chain Monte Carlo algorithm. Results on a carefully constructed test set from the mouse genome demonstrate that principled data fusion can significantly improve the performance of TF binding prediction methods. We also applied the probabilistic modeling framework to all promoters in the mouse genome, and the results indicate sparse connectivity between transcriptional regulators and their target promoters. 
To facilitate analysis of other sequences and additional data, we have developed an online web tool, ProbTF, which implements our probabilistic TF binding prediction method using multiple data sources. The test data set, web tool, source code, and supplementary data are available at: http://www.probtf.org
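To make the idea of principled data fusion concrete, the sketch below computes a promoter-level binding probability, plus a per-position breakdown, under a deliberately simple model of our own that is not ProbTF's actual formulation: the promoter holds at most one site, motif evidence enters as a per-position likelihood ratio (motif versus background), and all other evidence sources are pre-combined into a non-negative per-position prior weight. `promoter_binding`, `site_prior`, and the evidence weights are hypothetical.

```python
def promoter_binding(likelihood_ratios, evidence, site_prior=0.5):
    """Posterior that a promoter contains one binding site, and where.

    likelihood_ratios: P(window | motif) / P(window | background) per position.
    evidence:          non-negative prior weight per position from other data
                       (e.g. a hypothetical combined conservation/DNase score).
    site_prior:        prior probability that the promoter has a site at all.
    """
    total_evidence = sum(evidence)
    if total_evidence <= 0:
        raise ValueError("evidence must contain at least one positive weight")
    # Spread the prior mass of "a site exists" over positions by the evidence
    priors = [site_prior * e / total_evidence for e in evidence]
    weighted = [p * lr for p, lr in zip(priors, likelihood_ratios)]
    # The "no site" hypothesis competes with every single-site placement
    normaliser = (1.0 - site_prior) + sum(weighted)
    per_position = [w / normaliser for w in weighted]
    return sum(weighted) / normaliser, per_position
```

Setting a position's evidence weight to zero removes it from consideration however strong its motif match, while uninformative motif evidence (all likelihood ratios equal to 1) leaves the promoter-level probability at `site_prior`. This shows, in miniature, how external data can boost or veto motif-based predictions.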

    Peripheral blood DNA methylation differences in twin pairs discordant for Alzheimer's disease

    Background: Alzheimer's disease results from a neurodegenerative process that starts well before a diagnosis can be made. New prognostic or diagnostic markers enabling early intervention in the disease process would be highly valuable. Environmental and lifestyle factors largely modulate disease risk and may influence pathogenesis through epigenetic mechanisms such as DNA methylation. Because environmental and lifestyle factors may affect multiple tissues of the body, we hypothesized that disease-associated DNA methylation signatures are detectable in the peripheral blood of discordant twin pairs. Results: Comparison of 23 disease-discordant Finnish twin pairs with reduced representation bisulfite sequencing revealed peripheral blood DNA methylation differences in 11 genomic regions with at least 15.0% median methylation difference and an FDR-adjusted p value below the study's significance threshold. Conclusions: DNA methylation differences can be detected in the peripheral blood of twin pairs discordant for Alzheimer's disease. These DNA methylation signatures may have value as disease markers and provide insights into the molecular mechanisms of pathogenesis. We found no evidence that the DNA methylation marks were associated with gene expression in blood. Further studies are needed to elucidate the potential importance of the associated genes in neuronal functions and to validate the prognostic or diagnostic value of the individual marks or marker panels.
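The region filter in the Results combines an effect-size cut-off with a multiple-testing correction. The sketch below assumes that the FDR adjustment is Benjamini-Hochberg (the abstract does not name the procedure) and that each region's effect is the median of its per-pair methylation differences, expressed as fractions. It uses a hypothetical `alpha` of 0.05 because the actual threshold is not preserved in the text.

```python
from statistics import median


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p values (step-up, made monotone)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_regions(regions, alpha=0.05, min_median_diff=0.15):
    """Keep regions passing both the effect-size and the FDR filter.

    regions: list of (name, per_pair_differences, p_value); differences are
             methylation fractions, so 0.15 corresponds to 15.0%.
    alpha:   hypothetical FDR threshold.
    """
    adjusted = bh_adjust([p for _, _, p in regions])
    return [name
            for (name, diffs, _), q in zip(regions, adjusted)
            if abs(median(diffs)) >= min_median_diff and q < alpha]
```

Requiring both conditions guards against regions that are statistically significant but too small in effect to serve as practical markers, and against large but noisy differences.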

    Reconstruction and Validation of RefRec: A Global Model for the Yeast Molecular Interaction Network

    Molecular interaction networks underlie all cell biological processes. These networks are under intensive research, facilitated by new high-throughput measurement techniques for the detection, quantification, and characterization of molecules and their physical interactions. For the common model organism yeast, Saccharomyces cerevisiae, public databases store a significant part of the accumulated information, and a better understanding of cellular processes requires integrating this information into a consistent reconstruction of the molecular interaction network. This work presents and validates RefRec, the most comprehensive molecular interaction network reconstruction currently available for yeast. The reconstruction integrates protein synthesis pathways, a metabolic network, and a protein-protein interaction network from major biological databases. The core of the reconstruction is based on a reference object approach in which genes, transcripts, and proteins are identified by their primary sequences, enabling their unambiguous identification and non-redundant integration. In total, the reconstruction contains ∼67,000 distinct molecular species and interactions. To demonstrate the capacity of RefRec for functional predictions, we used it to simulate gene knockout damage propagation through the molecular interaction network for ∼590,000 experimentally validated mutant strains. Based on the simulation results, a statistical classifier was then able to correctly predict the viability of most of the strains. The results also showed that including different types of molecular species in the reconstruction is important for accurate phenotype prediction. In general, the findings demonstrate the benefits of global reconstructions of molecular interaction networks. 
With all molecular species and their physical interactions explicitly modeled, our reconstruction can serve as a valuable resource for additional analyses involving objects from multiple molecular -omes. For that purpose, RefRec is freely available in the Systems Biology Markup Language (SBML) format.
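RefRec's exact propagation rules are not given in the abstract. The sketch below shows one simple way gene knockout damage could spread through a reconstruction linking genes, transcripts, proteins, complexes, and metabolites, under the loudly simplified assumption that every listed dependency is essential (AND logic, with no alternative pathways or isoenzymes). The species names and both functions are hypothetical; a per-strain damage fraction like this is the kind of feature a downstream viability classifier could consume.

```python
from collections import deque


def propagate_knockout(requires, knocked_out):
    """Return every species lost after knocking out `knocked_out`.

    requires: species -> set of species it depends on (e.g. a protein on its
              transcript, a complex on each subunit). Every dependency is
              treated as essential, an assumption made for illustration.
    """
    # Invert the dependency map so damage can flow downstream
    dependents = {}
    for species, needs in requires.items():
        for need in needs:
            dependents.setdefault(need, set()).add(species)
    lost = set(knocked_out)
    queue = deque(knocked_out)
    while queue:  # breadth-first spread of damage to dependent species
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in lost:
                lost.add(dependent)
                queue.append(dependent)
    return lost


def damage_fraction(requires, knocked_out):
    """Share of all species in the network lost by the knockout."""
    species = set(requires) | {n for needs in requires.values() for n in needs}
    return len(propagate_knockout(requires, knocked_out)) / len(species)
```

For instance, in a toy network where a complex needs two proteins and a metabolite needs that complex, knocking out the gene behind either protein also removes the complex and the metabolite. Real reconstructions would need OR-dependencies (for example, a metabolite produced by alternative reactions) to avoid overestimating damage.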