10,993 research outputs found

    GPUmotif: An Ultra-Fast and Energy-Efficient Motif Analysis Program Using Graphics Processing Units

    Get PDF
    Computational detection of TF binding patterns has become an indispensable tool in functional genomics research. With the rapid advance of new sequencing technologies, large amounts of protein-DNA interaction data have been produced. Analyzing this data can provide substantial insight into the mechanisms of transcriptional regulation. However, the massive amount of sequence data presents daunting challenges. In our previous work, we have developed a novel algorithm called Hybrid Motif Sampler (HMS) that enables more scalable and accurate motif analysis. Despite much improvement, HMS is still time-consuming due to the requirement to calculate matching probabilities position-by-position. Using the NVIDIA CUDA toolkit, we developed a graphics processing unit (GPU)-accelerated motif analysis program named GPUmotif. We proposed a “fragmentation" technique to hide data transfer time between memories. Performance comparison studies showed that commonly-used model-based motif scan and de novo motif finding procedures such as HMS can be dramatically accelerated when running GPUmotif on NVIDIA graphics cards. As a result, energy consumption can also be greatly reduced when running motif analysis using GPUmotif. The GPUmotif program is freely available at http://sourceforge.net/projects/gpumotif

    MTAP: The Motif Tool Assessment Platform

    Get PDF
    Background: In recent years, substantial effort has been applied to de novo regulatory motif discovery. At this time, more than 150 software tools exist to detect regulatory binding sites given a set of genomic sequences. As the number of software packages increases, it becomes more important to identify the tools with the best performance characteristics for specific problem domains. Identifying the correct tool is difficult because of the great variability in motif detection software. Consequently, many labs spend considerable effort testing methods to find one that works well in their problem of interest. Results: In this work, we propose a method (MTAP) that substantially reduces the effort required to assess de novo regulatory motif discovery software. MTAP differs from previous attempts at regulatory motif assessment in that it automates motif discovery tool pipelines (something that traditionally required many manual steps), automatically constructs orthologous upstream sequences, and provides automated benchmarks for many popular tools. As a proof of concept, we have run benchmarks over human, mouse, fly, yeast, E. coli and B. subtilis. Conclusion: MTAP presents a new approach to the challenging problem of assessing regulatory motif discovery methods. The most current version of MTAP can be downloaded from http://biobase.ist.unomaha.edu

    Statistical methods for biological sequence analysis for DNA binding motifs and protein contacts

    Get PDF
    Over the last decades a revolution in novel measurement techniques has permeated the biological sciences filling the databases with unprecedented amounts of data ranging from genomics, transcriptomics, proteomics and metabolomics to structural and ecological data. In order to extract insights from the vast quantity of data, computational and statistical methods are nowadays crucial tools in the toolbox of every biological researcher. In this thesis I summarize my contributions in two data-rich fields in biological sciences: transcription factor binding to DNA and protein structure prediction from protein sequences with shared evolutionary ancestry. In the first part of my thesis I introduce our work towards a web server for analysing transcription factor binding data with Bayesian Markov Models. In contrast to classical PWM or di-nucleotide models, Bayesian Markov models can capture complex inter-nucleotide dependencies that can arise from shape-readout and alternative binding modes. In addition to giving access to our methods in an easy-to-use, intuitive web-interface, we provide our users with novel tools and visualizations to better evaluate the biological relevance of the inferred binding motifs. We hope that our tools will prove useful for investigating weak and complex transcription factor binding motifs which cannot be predicted accurately with existing tools. The second part discusses a statistical attempt to correct out the phylogenetic bias arising in co-evolution methods applied to the contact prediction problem. Co-evolution methods have revolutionized the protein-structure prediction field more than 10 years ago, and, until very recently, have retained their importance as crucial input features to deep neural networks. As the co-evolution information is extracted from evolutionarily related sequences, we investigated whether the phylogenetic bias to the signal can be corrected out in a principled way using a variation of the Felsenstein's tree-pruning algorithm applied in combination with an independent-pair assumption to derive pairwise amino counts that are corrected for the evolutionary history. Unfortunately, the contact prediction derived from our corrected pairwise amino acid counts did not yield a competitive performance.2021-09-2

    MTAP: The Motif Tool Assessment Platform

    Get PDF
    Background: In recent years, substantial effort has been applied to de novo regulatory motif discovery. At this time, more than 150 software tools exist to detect regulatory binding sites given a set of genomic sequences. As the number of software packages increases, it becomes more important to identify the tools with the best performance characteristics for specific problem domains. Identifying the correct tool is difficult because of the great variability in motif detection software. Consequently, many labs spend considerable effort testing methods to find one that works well in their problem of interest. Results: In this work, we propose a method (MTAP) that substantially reduces the effort required to assess de novo regulatory motif discovery software. MTAP differs from previous attempts at regulatory motif assessment in that it automates motif discovery tool pipelines (something that traditionally required many manual steps), automatically constructs orthologous upstream sequences, and provides automated benchmarks for many popular tools. As a proof of concept, we have run benchmarks over human, mouse, fly, yeast, E. coli and B. subtilis. Conclusion: MTAP presents a new approach to the challenging problem of assessing regulatory motif discovery methods. The most current version of MTAP can be downloaded from http:// biobase.ist.unomaha.edu

    A catalog of stability-associated sequence elements in 3' UTRs of yeast mRNAs

    Get PDF
    BACKGROUND: In recent years, intensive computational efforts have been directed towards the discovery of promoter motifs that correlate with mRNA expression profiles. Nevertheless, it is still not always possible to predict steady-state mRNA expression levels based on promoter signals alone, suggesting that other factors may be involved. Other genic regions, in particular 3' UTRs, which are known to exert regulatory effects especially through controlling RNA stability and localization, were less comprehensively investigated, and deciphering regulatory motifs within them is thus crucial. RESULTS: By analyzing 3' UTR sequences and mRNA decay profiles of Saccharomyces cerevisiae genes, we derived a catalog of 53 sequence motifs that may be implicated in stabilization or destabilization of mRNAs. Some of the motifs correspond to known RNA-binding protein sites, and one of them may act in destabilization of ribosome biogenesis genes during stress response. In addition, we present for the first time a catalog of 23 motifs associated with subcellular localization. A significant proportion of the 3' UTR motifs is highly conserved in orthologous yeast genes, and some of the motifs are strikingly similar to recently published mammalian 3' UTR motifs. We classified all genes into those regulated only at transcription initiation level, only at degradation level, and those regulated by a combination of both. Interestingly, different biological functionalities and expression patterns correspond to such classification. CONCLUSION: The present motif catalogs are a first step towards the understanding of the regulation of mRNA degradation and subcellular localization, two important processes which - together with transcription regulation - determine the cell transcriptome
    • …
    corecore