18 research outputs found

    Motif Enrichment Analysis: a unified framework and an evaluation on ChIP data

    A major goal of molecular biology is determining the mechanisms that control the transcription of genes. Motif Enrichment Analysis (MEA) seeks to determine which DNA-binding transcription factors control the transcription of a set of genes by detecting enrichment of known binding motifs in the genes' regulatory regions. Typically, the biologist specifies a set of genes believed to be co-regulated and a library of known DNA-binding models for transcription factors, and MEA determines which (if any) of the factors may be direct regulators of the genes. Since the number of factors with known DNA-binding models is rapidly increasing as a result of high-throughput technologies, MEA is becoming increasingly useful. In this paper, we explore ways to make MEA applicable in more settings, and evaluate the efficacy of a number of MEA approaches. We first define a mathematical framework for Motif Enrichment Analysis that relaxes the requirement that the biologist input a selected set of genes. Instead, the input consists of all regulatory regions, each labeled with the level of a biological signal. We then define and implement a number of motif enrichment analysis methods. Some of these methods require a user-specified signal threshold, some identify an optimum threshold in a data-driven way, and two of our methods are threshold-free. We evaluate these methods, along with two existing methods (Clover and PASTAA), using yeast ChIP-chip data. Our novel threshold-free method based on linear regression performs best in our evaluation, followed by the data-driven PASTAA algorithm. The Clover algorithm performs as well as PASTAA if the user-specified threshold is chosen optimally. Data-driven methods based on three statistical tests (Fisher Exact Test, rank-sum test, and multi-hypergeometric test) perform poorly, even when the threshold is chosen optimally. These methods (and Clover) perform even worse when unrestricted data-driven threshold determination is used. Our novel, threshold-free linear regression method works well on ChIP-chip data. Methods using data-driven threshold determination can perform poorly unless the range of thresholds is limited a priori. The limits implemented in PASTAA, however, appear to be well-chosen. Our novel algorithms, AME (Analysis of Motif Enrichment), are available at http://bioinformatics.org.au/ame/
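
The threshold-free idea in the abstract above can be illustrated with a minimal sketch. It assumes each regulatory region already has one motif score and one signal level, and it fits an ordinary least-squares line between the two. The function name `regression_enrichment` and the use of the slope and R² as the enrichment summary are assumptions made for illustration only; the abstract does not specify how AME scores sequences or assesses significance.

```python
from statistics import mean


def regression_enrichment(motif_scores, signals):
    """Fit signal = a + b * motif_score by least squares.

    Returns (slope, r_squared). No signal threshold is needed:
    every region contributes to the fit.
    Hypothetical helper, not the published AME implementation.
    """
    if len(motif_scores) != len(signals) or len(signals) < 2:
        raise ValueError("need two equally long lists with at least 2 regions")

    mean_x, mean_y = mean(motif_scores), mean(signals)
    sxx = sum((x - mean_x) ** 2 for x in motif_scores)
    syy = sum((y - mean_y) ** 2 for y in signals)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(motif_scores, signals))

    # A constant score or constant signal carries no information.
    if sxx == 0 or syy == 0:
        return 0.0, 0.0

    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, r_squared


if __name__ == "__main__":
    # Toy data: motif score per region and the measured signal per region.
    scores = [0.0, 1.0, 2.0, 3.0]
    signal = [1.0, 3.0, 5.0, 7.0]
    print(regression_enrichment(scores, signal))
```

A motif whose scores track the signal closely gives a positive slope and an R² near 1, while a motif unrelated to the signal gives an R² near 0. Ranking a motif library by such a statistic is one plausible way to use the fit.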

    Convergence of marine megafauna movement patterns in coastal and open oceans

    Author Posting. © The Author(s), 2017. This is the author's version of the work. It is posted here for personal use, not for redistribution. The definitive version was published in Proceedings of the National Academy of Sciences of the United States of America 115 (2018): 3072-3077, doi:10.1073/pnas.1716137115. The extent of increasing anthropogenic impacts on large marine vertebrates partly depends on the animals’ movement patterns. Effective conservation requires identification of the key drivers of movement including intrinsic properties and extrinsic constraints associated with the dynamic nature of the environments the animals inhabit. However, the relative importance of intrinsic versus extrinsic factors remains elusive. We analyse a global dataset of 2.8 million locations from >2,600 tracked individuals across 50 marine vertebrates evolutionarily separated by millions of years and using different locomotion modes (fly, swim, walk/paddle). Strikingly, movement patterns show a remarkable convergence, being strongly conserved across species and independent of body length and mass, despite these traits ranging over 10 orders of magnitude among the species studied. This represents a fundamental difference between marine and terrestrial vertebrates not previously identified, likely linked to the reduced costs of locomotion in water. Movement patterns were primarily explained by the interaction between species-specific traits and the habitat(s) they move through, resulting in complex movement patterns when moving close to coasts compared to more predictable patterns when moving in open oceans. This distinct difference may be associated with greater complexity within coastal micro-habitats, highlighting a critical role of preferred habitat in shaping marine vertebrate global movements. Efforts to develop understanding of the characteristics of vertebrate movement should consider the habitat(s) through which they move to identify how movement patterns will alter with forecasted severe ocean changes, such as reduced Arctic sea ice cover, sea level rise and declining oxygen content. Workshops funding granted by the UWA Oceans Institute, AIMS, and KAUST. AMMS was supported by an ARC Grant DE170100841 and an IOMRC (UWA, AIMS, CSIRO) fellowship; JPR by MEDC (FPU program, Spain); DWS by UK NERC and Save Our Seas Foundation; NQ by FCT (Portugal); MMCM by a CAPES fellowship (Ministry of Education)

    NFIX regulates neural progenitor cell differentiation during hippocampal morphogenesis

    During forebrain development, radial glia generate neurons through the production of intermediate progenitor cells (IPCs). The production of IPCs is a central tenet underlying the generation of the appropriate number of cortical neurons, but the transcriptional logic underpinning this process remains poorly defined. Here, we examined IPC production using mice lacking the transcription factor nuclear factor I/X (Nfix). We show that Nfix deficiency delays IPC production and prolongs the neurogenic window, resulting in an increased number of neurons in the postnatal forebrain. Loss of additional Nfi alleles (Nfib) resulted in a severe delay in IPC generation while, conversely, overexpression of NFIX led to precocious IPC generation. Mechanistically, analyses of microarray and ChIP-seq datasets, coupled with the investigation of spindle orientation during radial glial cell division, revealed that NFIX promotes the generation of IPCs via the transcriptional upregulation of inscuteable (Insc). These data thereby provide novel insights into the mechanisms controlling the timely transition of radial glia into IPCs during forebrain development

    Epigenetic priors for identifying active transcription factor binding sites

    Motivation: Accurate knowledge of the genome-wide binding of transcription factors in a particular cell type or under a particular condition is necessary for understanding transcriptional regulation. Using epigenetic data such as histone modification and DNase I accessibility data has been shown to improve motif-based in silico methods for predicting such binding, but this approach has not yet been fully explored. Results: We describe a probabilistic method for combining one or more tracks of epigenetic data with a standard DNA sequence motif model to improve our ability to identify active transcription factor binding sites (TFBSs). We convert each data type into a position-specific probabilistic prior and combine these priors with a traditional probabilistic motif model to compute a log-posterior odds score. Our experiments, using histone modifications H3K4me1, H3K4me3, H3K9ac and H3K27ac, as well as DNase I sensitivity, show conclusively that the log-posterior odds score consistently outperforms a simple binary filter based on the same data. We also show that our approach performs competitively with a more complex method, CENTIPEDE, and suggest that the relative simplicity of the log-posterior odds scoring method makes it an appealing and very general method for identifying functional TFBSs on the basis of DNA and epigenetic evidence
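
The scoring step described in this abstract can be sketched as follows, assuming the textbook Bayesian form: the log-likelihood ratio of a candidate site under a position weight matrix versus a background model, plus the log prior odds at that position. The names `log_posterior_odds`, `pwm`, `background` and `prior` are hypothetical. How the paper derives the prior from histone or DNase I tracks is not given in the abstract, so the prior is simply passed in here.

```python
import math


def log_posterior_odds(site, pwm, background, prior):
    """Score one candidate binding site.

    site       -- DNA string with the same length as the motif
    pwm        -- list of dicts, one per motif column, base -> probability
    background -- dict, base -> background probability
    prior      -- assumed prior probability (0 < prior < 1) that a site
                  starts at this position, e.g. derived from epigenetic data

    Illustrative sketch only, not the authors' code.
    """
    if len(site) != len(pwm):
        raise ValueError("site length must equal motif width")
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")

    # Log-likelihood ratio: motif model versus background model.
    log_likelihood_ratio = sum(
        math.log(column[base] / background[base])
        for base, column in zip(site, pwm)
    )
    # Log prior odds shift the score up where the prior favours binding.
    log_prior_odds = math.log(prior / (1.0 - prior))
    return log_likelihood_ratio + log_prior_odds


if __name__ == "__main__":
    uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    toy_pwm = [
        {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
        {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    ]
    # Same sequence, scored in a low-prior and a high-prior position.
    print(log_posterior_odds("AG", toy_pwm, uniform, prior=0.01))
    print(log_posterior_odds("AG", toy_pwm, uniform, prior=0.20))
```

Unlike a binary filter, which discards every site outside an accessible region, the prior here only shifts the score, so a strong sequence match in a weakly supported position can still rank above a weak match in a well supported one.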

    Genome-wide in silico prediction of gene expression

    Motivation: Modelling the regulation of gene expression can provide insight into the regulatory roles of individual transcription factors (TFs) and histone modifications. Recently, Ouyang et al. in 2009 modelled gene expression levels in mouse embryonic stem (mES) cells using in vivo ChIP-seq measurements of TF binding. ChIP-seq TF binding data, however, are tissue-specific and relatively difficult to obtain. This limits the applicability of gene expression models that rely on ChIP-seq TF binding data. Results: In this study, we build regression-based models that relate gene expression to the binding of 12 different TFs, 7 histone modifications and chromatin accessibility (DNase I hypersensitivity) in two different tissues. We find that expression models based on computationally predicted TF binding can achieve similar accuracy to those using in vivo TF binding data and that including binding at weak sites is critical for accurate prediction of gene expression. We also find that incorporating histone modification and chromatin accessibility data results in additional accuracy. Surprisingly, we find that models that use no TF binding data at all, but only histone modification and chromatin accessibility data, can be as (or more) accurate than those based on in vivo TF binding data