
2,727 research outputs found

Spectral Sequence Motif Discovery

Author: Colombo Nicolò, Vlassis Nikos
Publication date: 01/01/2014

Sequence discovery tools play a central role in several fields of computational biology. In transcription factor binding studies, motif-finding algorithms of increasingly high performance are required to process the large datasets produced by new high-throughput sequencing technologies. Most existing algorithms are computationally demanding and often cannot handle the size of new experimental datasets. We present a new motif discovery algorithm built on a recent machine learning technique known as the Method of Moments. Based on spectral decompositions, this method is robust under model misspecification and is not prone to locally optimal solutions. We obtain an algorithm that is extremely fast and designed for the analysis of big sequencing data. In a few minutes, we can process datasets of hundreds of thousands of sequences and extract motif profiles that match those computed by various state-of-the-art algorithms. Comment: 20 pages, 3 figures, 1 table

arXiv.org e-Print Archive

CiteSeerX

Open Repository and Bibliography - Luxembourg

FisherMP: fully parallel algorithm for detecting combinatorial motifs from large ChIP-seq datasets.

Author: Chen Yong, Liang Ying, Su Zhengchang, Wang Xiangyun, Zhang Shaoqiang
Publication venue: Rowan Digital Works
Publication date: 01/06/2019

Detecting binding motifs of combinatorial transcription factors (TFs) from chromatin immunoprecipitation sequencing (ChIP-seq) experiments is an important and challenging computational problem for understanding gene regulation. Although a number of motif-finding algorithms have been presented, most are either time consuming or have sub-optimal accuracy when processing large-scale datasets. In this article, we present a fully parallelized algorithm for detecting combinatorial motifs from ChIP-seq datasets using Fisher's combined method and an OpenMP parallel design. Large-scale validations on both synthetic data and 350 ChIP-seq datasets from the ENCODE database showed that FisherMP is not only very fast on large datasets but also highly accurate when compared with multiple popular methods. Using FisherMP, we successfully detected combinatorial motifs of CTCF, YY1, MAZ, STAT3 and USF2 in chromosome X, suggesting that they are functional co-players in gene regulation and chromosomal organization. Integrative and statistical analysis of these TF-binding peaks clearly demonstrates that they are not only highly coordinated with each other but also correlated with histone modifications. FisherMP can be applied for integrative analysis of binding motifs and for predicting cis-regulatory modules from a large number of ChIP-seq datasets.
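
The abstract above names Fisher's combined method but does not say how FisherMP applies it. As a minimal sketch, assuming this refers to the standard Fisher combined probability test, the snippet below merges k independent p-values into one using the statistic X = -2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The function name `fisher_combined_pvalue` is hypothetical and is not taken from the FisherMP code base.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method (illustrative sketch).

    The statistic X = -2 * sum(ln p_i) is chi-square distributed with 2k
    degrees of freedom under the null. For an even number of degrees of
    freedom the survival function has the closed form
        exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    so no external statistics library is needed.
    """
    if not pvalues:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in the interval (0, 1]")

    half_stat = -sum(math.log(p) for p in pvalues)  # this is X / 2
    k = len(pvalues)

    term = 1.0   # (X/2)^0 / 0!
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i  # update to (X/2)^i / i!
        total += term
    return math.exp(-half_stat) * total


if __name__ == "__main__":
    # Two moderately significant, independent tests combined into one p-value.
    print(fisher_combined_pvalue([0.05, 0.05]))
```

A single p-value is returned unchanged, which is a quick sanity check on the closed form. The test assumes the input p-values are independent; whether that holds for co-occurring motif sites is a modelling question the abstract does not address.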

Rowan University

On the detection and refinement of transcription factor binding sites using ChIP-Seq data

Author: Arul M. Chinnaiyan, Bailey, Bailey, Barash, Barski, Benos, Bulyk, Bussemaker, Bussemaker, Choi, Conlon, Fejes, Gupta, Hanai, Iyer, Jensen, Jeremy M. G. Taylor, Ji, Jindan Yu, Johnson, Jothi, Kharchenko, Kim, King, Lawrence, Lawrence, Leach, Lee, Liu, Liu, Liu, Liu, Lockhart, Man, McCue, Mikkelsen, Ming Hu, Neuwald, Nix, Orlando, Ren, Robertson, Roth, Rozowsky, Schena, Schneider, Shim, Solomon, Staden, Stormo, Tompa, Tusher, Valouev, Zhang, Zhaohui S. Qin, Zhou
Publication venue: Oxford University Press
Publication date: 01/01/2010

Coupling chromatin immunoprecipitation (ChIP) with recently developed massively parallel sequencing technologies has enabled genome-wide detection of protein–DNA interactions with unprecedented sensitivity and specificity. This new technology, ChIP-Seq, presents opportunities for in-depth analysis of transcription regulation. In this study, we explore the value of using ChIP-Seq data to better detect and refine transcription factor binding sites (TFBS). We introduce a novel computational algorithm named Hybrid Motif Sampler (HMS), specifically designed for TFBS motif discovery in ChIP-Seq data. We propose a Bayesian model that incorporates sequencing depth information to aid motif identification. Our model also allows intra-motif dependency to describe more accurately the underlying motif pattern. Our algorithm combines stochastic sampling and deterministic ‘greedy’ search steps into a novel hybrid iterative scheme. This combination accelerates the computation process. Simulation studies demonstrate favorable performance of HMS compared to other existing methods. When applying HMS to real ChIP-Seq datasets, we find that (i) the accuracy of existing TFBS motif patterns can be significantly improved; and (ii) there is significant intra-motif dependency inside all the TFBS motifs we tested; modeling these dependencies further improves the accuracy of these TFBS motif patterns. These findings may offer new biological insights into the mechanisms of transcription factor regulation

CiteSeerX

Crossref

PubMed Central

Discovery and prediction of protein binding sites in DNA and RNA sequences using Bayesian Markov models

Author: Ge Wanwan
Publication date: 10/07/2020

Georg-August-University Göttingen

A highly efficient and effective motif discovery method for ChIP-seq/ChIP-chip data using positional information

Author: Ashwinikumar Kulkarni, Bailey, Bailey, Barski, Berger, Bradley, Buhler, Cao, Chen, Corbo, Dean, Eskin, Ettwiller, Fratkin, Gerstein, Hu, Ji, Johnson, Jothi, Keilwagen, Kim, Kim, Kulakovskiy, Lawrence, Liang, Linhart, Liu, Mahony, Marsan, Michael Q. Zhang, Narang, Pavesi, Portales-Casamar, Robert Serfling, Robertson, Roth, Roy, Schmid, Schones, Sinha, Smith, Stormo, Sumazin, Tompa, Tuteja, Valouev, Vaquerizas, Vardhanabhuti, Wederell, Wei, Whitington, Wilbanks, Xiaotu Ma, Zhang, Zhang, Zhenyu Xuan, Zhihua Zhang
Publication venue: Oxford University Press

Identification of DNA motifs from ChIP-seq/ChIP-chip [chromatin immunoprecipitation (ChIP)] data is a powerful method for understanding the transcriptional regulatory network. However, most established methods are designed for small sample sizes and are inefficient for ChIP data. Here we propose a new k-mer occurrence model to reflect the fact that functional DNA k-mers often cluster around ChIP peak summits. With this model, we introduce a new measure to discover functional k-mers. Using simulation, we demonstrate that our method is more robust against noise in ChIP data than available methods. A novel word clustering method is also implemented to group similar k-mers into position weight matrices (PWMs). Our method was applied to a diverse set of ChIP experiments to demonstrate its high sensitivity and specificity. Importantly, our method is much faster than several other methods for large sample sizes. Thus, we have developed an efficient and effective motif discovery method for ChIP experiments.
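
The abstract does not define the k-mer occurrence measure or the word clustering step, so the following is only a toy illustration of the two ideas it mentions: k-mers concentrating near peak summits, and similar k-mers being summarized as a position matrix. It assumes each input sequence is centered on its peak summit, and both function names (`central_fraction`, `position_frequency_matrix`) are hypothetical stand-ins rather than the authors' method.

```python
from collections import Counter


def central_fraction(seqs, k, window):
    """Map each k-mer to (total count, fraction of occurrences near the summit).

    Assumption: the summit sits at the midpoint of every sequence. An
    occurrence counts as "near" when the center of the k-mer lies within
    `window` bases of that midpoint.
    """
    total, central = Counter(), Counter()
    for seq in seqs:
        mid = len(seq) // 2  # assumed summit position
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            total[kmer] += 1
            if abs(i + k // 2 - mid) <= window:
                central[kmer] += 1
    return {kmer: (n, central[kmer] / n) for kmer, n in total.items()}


def position_frequency_matrix(kmers):
    """Column-wise base frequencies for equal-length, already aligned k-mers."""
    length = len(kmers[0])
    if any(len(kmer) != length for kmer in kmers):
        raise ValueError("all k-mers must have the same length")
    pfm = []
    for pos in range(length):
        counts = Counter(kmer[pos] for kmer in kmers)
        pfm.append({base: counts[base] / len(kmers) for base in "ACGT"})
    return pfm


if __name__ == "__main__":
    stats = central_fraction(["AAAACGTAAAA"], k=3, window=1)
    print(stats["CGT"], stats["AAA"])
    print(position_frequency_matrix(["ACG", "ACT"]))
```

A k-mer whose occurrences fall mostly inside the central window is a candidate for being functional under this simplification. A real measure would also need a background model and a significance test, neither of which is sketched here.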

Crossref

PubMed Central

Development of Computational Techniques for Regulatory DNA Motif Identification Based on Big Biological Data

Author: Yang Jinyu
Publication venue: Open PRAIRIE: Open Public Research Access Institutional Repository and Information Exchange
Publication date: 01/01/2017

Accurate regulatory DNA motif (or motif) identification plays a fundamental role in elucidating transcriptional regulatory mechanisms in a cell and can strongly support regulatory network construction for both prokaryotic and eukaryotic organisms. Next-generation sequencing techniques generate a huge amount of biological data for motif identification. Specifically, chromatin immunoprecipitation followed by high-throughput DNA sequencing (ChIP-seq) enables researchers to identify motifs on a genome scale. Recently, technological improvements have allowed DNA structural information to be obtained in a high-throughput manner, which can provide four DNA shape features. DNA shape has been found to complement genomic sequence in predicting transcription factor (TF)-DNA binding specificity with traditional machine learning models. Recent studies have demonstrated that deep learning (DL), especially the convolutional neural network (CNN), enables identification of motifs directly from DNA sequence. Although numerous algorithms and tools have been proposed and developed in this field, (1) the lack of intuitive and integrative web servers impedes effective use of emerging algorithms and tools; (2) DNA shape has not been integrated with DL; and (3) existing DL models still suffer from high false positive and false negative rates in motif identification. This thesis focuses on developing an integrated web server for motif identification based on DNA sequences either from users or from built-in databases. This web server allows further motif-related analysis and Cytoscape-like network interpretation and visualization. We then propose a DL framework for both sequence and shape motif identification from ChIP-seq data using a binomial distribution strategy. This framework can accept different combinations of DNA sequence and DNA shape as input. Finally, we develop a gated convolutional neural network (GCNN) for capturing motif dependencies among long DNA sequences. Results show that our web server provides more comprehensive motif analysis functionality than existing web servers. The DL framework can identify motifs using an optimized threshold and reveals the strong predictive power of DNA shape in TF-DNA binding specificity. The identified sequence and shape motifs can contribute to the interpretation of TF-DNA binding mechanisms. Additionally, GCNN improves TF-DNA binding specificity prediction over CNN on most of the datasets.

Public Research Access Institutional Repository and Information Exchange