111 research outputs found

    Sequential Monte Carlo multiple testing

    Motivation: In molecular biology, as in many other scientific fields, the scale of analyses is ever increasing. Often, complex Monte Carlo simulation is required, sometimes within a large-scale multiple testing setting. The resulting computational costs may be prohibitively high.

    Vitamin D receptor binding, chromatin states and association with multiple sclerosis.

    Both genetic and environmental factors contribute to the aetiology of multiple sclerosis (MS). More than 50 genomic regions have been associated with MS susceptibility and vitamin D status also influences the risk of this complex disease. However, how these factors interact in disease causation is unclear. We aimed to investigate the relationship between vitamin D receptor (VDR) binding in lymphoblastoid cell lines (LCLs), chromatin states in LCLs and MS-associated genomic regions. Using the Genomic Hyperbrowser, we found that VDR-binding regions overlapped with active regulatory regions [active promoter (AP) and strong enhancer (SE)] in LCLs more than expected by chance [45.3-fold enrichment for SE (P < 2.0e-05) and 63.41-fold enrichment for AP (P < 2.0e-05)]. Approximately 77% of VDR regions were covered by either AP or SE elements. The overlap between VDR binding and regulatory elements was significantly greater in LCLs than in non-immune cells (P < 2.0e-05). VDR binding also occurred within MS regions more than expected by chance (3.7-fold enrichment, P < 2.0e-05). Furthermore, regions of joint overlap SE-VDR and AP-VDR were even more enriched within MS regions and near to several disease-associated genes. These findings provide relevant insights into how vitamin D influences the immune system and the risk of MS through VDR interactions with the chromatin state inside MS regions. Furthermore, the data provide additional evidence for an important role played by B cells in MS. Further analyses in other immune cell types and functional studies are warranted to fully elucidate the role of vitamin D in the immune system.

    Bayesian Centroid Estimation for Motif Discovery

    Biological sequences may contain patterns that signal important biomolecular functions; a classical example is regulation of gene expression by transcription factors that bind to specific patterns in genomic promoter regions. In motif discovery we are given a set of sequences that share a common motif and aim to identify not only the motif composition, but also the binding sites in each sequence of the set. We present a Bayesian model that is an extended version of the model adopted by the Gibbs motif sampler, and propose a new centroid estimator that arises from a refined and meaningful loss function for binding site inference. We discuss the main advantages of centroid estimation for motif discovery, including computational convenience, and how its principled derivation offers further insights about the posterior distribution of binding site configurations. We also illustrate, using simulated and real datasets, that the centroid estimator can differ from the maximum a posteriori estimator. Comment: 24 pages, 9 figures.

    Immunologic Profiling of the Atlantic Salmon Gill by Single Nuclei Transcriptomics

    ACKNOWLEDGMENTS: The authors thank all of the animal staff at Kårvik havbruksstasjonen for their expert care of the research animals, and the University of Manchester Genomics Technology core facility (UK) for performing chromium 10x library preparation for snRNAseq. We also thank the reviewers for their constructive comments on the original manuscript. FUNDING: AW is supported by the Tromsø forskningsstiftelse (TFS) grant awarded to DH (TFS2016DH). The Sentinel North Transdisciplinary Research Program (Université Laval and UiT), awarded to DH, supports this work. SW is supported by a Tromsø forskningsstiftelse (TFS) starter grant (TFS2016SW). Experimental costs were covered by HFSP grant “Evolution of seasonal timers” RGP0030/2015 awarded to AL and DH. Storage resources were provided by the Norwegian National Infrastructure for Research Data (NIRD, project NS9055K). Peer reviewed.

    FITBAR: a web tool for the robust prediction of prokaryotic regulons

    Background: The binding of regulatory proteins to their specific DNA targets determines the accurate expression of the neighboring genes. The in silico prediction of new binding sites in completely sequenced genomes is a key aspect in the deeper understanding of gene regulatory networks. Several algorithms have been described to discriminate against false positives in the prediction of new binding targets; however, none of them has been implemented so far to assist the detection of binding sites at the genomic scale. Results: FITBAR (Fast Investigation Tool for Bacterial and Archaeal Regulons) is a web service designed to identify new protein binding sites on fully sequenced prokaryotic genomes. This tool consists of a workbench where the significance of the predictions can be compared using different statistical methods, a feature not found in existing resources. The Local Markov Model and the Compound Importance Sampling algorithms have been implemented to compute the P-value of newly discovered binding sites. In addition, FITBAR provides two optimized genomic scanning algorithms using either log-odds or entropy-weighted position-specific scoring matrices. Other significant features include the production of a detailed genomic context map for each detected binding site and the export of the search results in spreadsheet and portable document formats. FITBAR discovery of a high affinity Escherichia coli NagC binding site was validated experimentally in vitro as well as in vivo and published. Conclusions: FITBAR was developed in order to allow fast, accurate and statistically robust predictions of prokaryotic regulons. This feature constitutes the main advantage of this web tool over other matrix search programs and does not impair its performance. The web service is available at http://archaea.u-psud.fr/fitbar

    FunClust: a web server for the identification of structural motifs in a set of non-homologous protein structures

    The occurrence of very similar structural motifs brought about by different parts of non-homologous proteins is often indicative of a common function. Indeed, relatively small local structures can mediate binding to a common partner, be it a protein, a nucleic acid, a cofactor or a substrate. While it is relatively easy to identify short amino acid or nucleotide sequence motifs in a given set of proteins or genes, and many methods do exist for this purpose, much more challenging is the identification of common local substructures, especially if they are formed by non-consecutive residues in the sequence.

    A ChIP-Seq Benchmark Shows That Sequence Conservation Mainly Improves Detection of Strong Transcription Factor Binding Sites

    Transcription factors are important controllers of gene expression and mapping transcription factor binding sites (TFBS) is key to inferring transcription factor regulatory networks. Several methods for predicting TFBS exist, but there are no standard genome-wide datasets on which to assess the performance of these prediction methods. Also, it is believed that information about sequence conservation across different genomes can generally improve accuracy of motif-based predictors, but it is not clear under what circumstances use of conservation is most beneficial. Here we use published ChIP-seq data and an improved peak detection method to create comprehensive benchmark datasets for prediction methods which use known descriptors or binding motifs to detect TFBS in genomic sequences. We use this benchmark to assess the performance of five different prediction methods and find that the methods that use information about sequence conservation generally perform better than simpler motif-scanning methods. The difference is greater on high-affinity peaks and when using short and information-poor motifs. However, if the motifs are specific and information-rich, we find that simple motif-scanning methods can perform better than conservation-based methods. Our benchmark provides a comprehensive test that can be used to rank the relative performance of transcription factor binding site prediction methods. Moreover, our results show that, contrary to previous reports, sequence conservation is better suited for predicting strong than weak transcription factor binding sites.

    De-Novo Discovery of Differentially Abundant Transcription Factor Binding Sites Including Their Positional Preference

    Transcription factors are a main component of gene regulation as they activate or repress gene expression by binding to specific binding sites in promoters. The de-novo discovery of transcription factor binding sites in target regions obtained by wet-lab experiments is a challenging problem in computational biology, which has not been fully solved yet. Here, we present a de-novo motif discovery tool called Dispom for finding differentially abundant transcription factor binding sites that models existing positional preferences of binding sites and adjusts the length of the motif in the learning process. Evaluating Dispom, we find that its prediction performance is superior to existing tools for de-novo motif discovery for 18 benchmark data sets with planted binding sites, and for a metazoan compendium based on experimental data from micro-array, ChIP-chip, ChIP-DSL, and DamID as well as Gene Ontology data. Finally, we apply Dispom to find binding sites differentially abundant in promoters of auxin-responsive genes extracted from Arabidopsis thaliana microarray data, and we find a motif that can be interpreted as a refined auxin responsive element predominately positioned in the 250-bp region upstream of the transcription start site. Using an independent data set of auxin-responsive genes, we find in genome-wide predictions that the refined motif is more specific for auxin-responsive genes than the canonical auxin-responsive element. In general, Dispom can be used to find differentially abundant motifs in sequences of any origin. However, the positional distribution learned by Dispom is especially beneficial if all sequences are aligned to some anchor point like the transcription start site in the case of promoter sequences. We demonstrate that the combination of searching for differentially abundant motifs and inferring a position distribution from the data is beneficial for de-novo motif discovery. Hence, we make the tool freely available as a component of the open-source Java framework Jstacs and as a stand-alone application at http://www.jstacs.de/index.php/Dispom