
2,188 research outputs found

MER41 Repeat Sequences Contain Inducible STAT1 Binding Sites

Author: A Mortazavi
A Siepel
A Tanay
A Valouev
AG Robertson
AI Su
AP Fejes
C Iseli
CD Schmid
Christoph D. Schmid
D Karolchik
D Laperriere
DE Levy
E van Nimwegen
F Schutz
G Badis
G Bourque
G Robertson
GB Ehret
GD Stormo
GR Stark
Guillaume Bourque
H Ji
H Xu
J Hu
J Jurka
J Rozowsky
JA Yoder
JE Darnell Jr
LC Platanias
LD Ward
M Tompa
N Kaplan
OG Berg
P Lecine
P Polak
Philipp Bucher
PV Kharchenko
R Johnson
R Johnson
R Jothi
S Wormald
SB Montgomery
T Sepp
T Wang
T Whitington
TL Bailey
TL Bailey
U Ackermann-Liebrich
U Vinkemeier
X Li
ZD Zhang
Publication venue: Public Library of Science
Publication date: 01/01/2010

Chromatin immunoprecipitation combined with massively parallel sequencing (ChIP-seq) is becoming the standard approach for studying interactions of transcription factors (TFs) with genomic sequences. Using public STAT1 ChIP-seq data sets as an example, we present novel approaches for the interpretation of ChIP-seq data.

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

Public Library of Science (PLOS)

Crossref

edoc

Directory of Open Access Journals

PubMed Central

ZINBA integrates local covariates with DNA-seq data to identify broad and narrow regions of enrichment, even within amplified genomic regions

Author: Giresi Paul G
Ibrahim Joseph G
Lieb Jason D
Rashid Naim U
Sun Wei
Publication venue: BioMed Central
Publication date: 01/01/2011

ZINBA (Zero-Inflated Negative Binomial Algorithm) identifies genomic regions enriched in a variety of ChIP-seq and related next-generation sequencing experiments (DNA-seq), calling both broad and narrow modes of enrichment across a range of signal-to-noise ratios. ZINBA models and accounts for factors that co-vary with background or experimental signal, such as G/C content, and identifies enrichment in genomes with complex local copy number variations. ZINBA provides a single unified framework for analyzing DNA-seq experiments in challenging genomic contexts
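As a rough illustration of the distribution the ZINBA name refers to, the sketch below evaluates a generic zero-inflated negative binomial probability mass function. It is a textbook form under stated assumptions, not ZINBA's actual implementation: the parameter names `mu`, `size`, and `pi_zero` are hypothetical, and the abstract does not say how covariates such as G/C content enter the model.

```python
import math


def nb_pmf(k, mu, size):
    """Negative binomial pmf parameterised by mean `mu` and dispersion `size`."""
    log_p = (
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(size / (size + mu))
        + k * math.log(mu / (size + mu))
    )
    return math.exp(log_p)


def zinb_pmf(k, mu, size, pi_zero):
    """Zero-inflated NB: with probability `pi_zero` the count is a structural zero,
    otherwise it is drawn from the negative binomial component."""
    base = (1.0 - pi_zero) * nb_pmf(k, mu, size)
    return pi_zero + base if k == 0 else base


# Hypothetical parameter values, chosen only for illustration.
total = sum(zinb_pmf(k, mu=3.0, size=2.0, pi_zero=0.3) for k in range(500))
```

The zero-inflation term lets windows with no reads be explained separately from the count distribution, which is the general motivation for this family of models.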

Crossref

Springer - Publisher Connector

PubMed Central

Carolina Digital Repository

FisherMP: fully parallel algorithm for detecting combinatorial motifs from large ChIP-seq datasets.

Author: Chen Yong
Liang Ying
Su Zhengchang
Wang Xiangyun
Zhang Shaoqiang
Publication venue: Rowan Digital Works
Publication date: 01/06/2019

Detecting binding motifs of combinatorial transcription factors (TFs) from chromatin immunoprecipitation sequencing (ChIP-seq) experiments is an important and challenging computational problem for understanding gene regulation. Although a number of motif-finding algorithms have been presented, most are either time-consuming or have sub-optimal accuracy when processing large-scale datasets. In this article, we present a fully parallelized algorithm for detecting combinatorial motifs from ChIP-seq datasets using Fisher's combined method and an OpenMP parallel design. Large-scale validations on both synthetic data and 350 ChIP-seq datasets from the ENCODE database showed that FisherMP is not only very fast on large datasets but also highly accurate when compared with multiple popular methods. Using FisherMP, we successfully detected combinatorial motifs of CTCF, YY1, MAZ, STAT3 and USF2 in chromosome X, suggesting that they are functional co-players in gene regulation and chromosomal organization. Integrative and statistical analysis of these TF-binding peaks clearly demonstrates that they are not only highly coordinated with each other but also correlated with histone modifications. FisherMP can be applied for integrative analysis of binding motifs and for predicting cis-regulatory modules from a large number of ChIP-seq datasets.
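For readers unfamiliar with the statistic named in the abstract, the sketch below shows the classical Fisher's combined probability test for independent p-values. It illustrates the general method only; how FisherMP applies it to motifs is not described in the abstract, and the function name `fisher_combined` is hypothetical.

```python
import math


def fisher_combined(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom under the null. For even degrees of freedom the
    survival function has a closed form, so no statistics library is needed.
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # equals X / 2
    term, series = 1.0, 1.0
    for j in range(1, k):
        term *= half_x / j  # (X/2)^j / j!
        series += term
    return math.exp(-half_x) * series


# Two individually modest p-values combine into stronger joint evidence.
combined = fisher_combined([0.01, 0.01])
```

With a single p-value the function returns that p-value unchanged, which is a convenient sanity check.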

Rowan University

Discovery and prediction of protein binding sites in DNA and RNA sequences using Bayesian Markov models

Author: Ge Wanwan
Publication date: 10/07/2020

Georg-August-University Göttingen

An Entropy-Based Position Projection Algorithm for Motif Discovery

Publication venue: Hindawi Limited
Publication date: 01/01/2016

Crossref

Novel pattern recognition approaches for transcriptomics data analysis

Author: Rezaeian Iman
Publication venue: University of Windsor Leddy Library
Publication date: 01/01/2014

We proposed a family of methods for transcriptomics and genomics data analysis based on a multi-level thresholding approach, such as OMTG for sub-grid and spot detection in DNA microarrays, and OMT for detecting significant regions in next-generation sequencing data. Extensive experiments on real-life datasets and a comparison with other methods show that the proposed methods perform these tasks fully automatically and with a very high degree of accuracy. Moreover, unlike previous methods, the proposed approaches can be used in various types of transcriptome analysis problems, such as microarray image gridding with different resolutions and spot sizes, as well as finding the regions of DNA that interact with a protein of interest using ChIP-Seq data, without any need for parameter adjustment. We also developed constrained multi-level thresholding (CMT), an algorithm for detecting enriched regions in ChIP-Seq data with the ability to target regions within a specific range. We show that CMT has higher accuracy in detecting enriched regions (peaks) by objectively assessing its performance relative to other previously proposed peak finders. This is shown by testing three algorithms on the well-known FoxA1 data set, four transcription factors (with a total of six antibodies) for Drosophila melanogaster, and the H3K4ac antibody dataset. Finally, we propose a tree-based approach that conducts gene selection and builds a classifier simultaneously, in order to select the minimal number of genes that would reliably predict a given breast cancer subtype. Our results support that this modified approach to gene selection yields a small subset of genes that can predict subtypes with greater than 95% overall accuracy. In addition to providing a valuable list of targets for diagnostic purposes, the gene ontologies of the selected genes suggest that these methods have isolated a number of potential genes involved in breast cancer biology, etiology and potentially novel therapeutics.

Scholarship at UWindsor

Quality assessment and refinement of chromatin accessibility data using a sequence-based predictive model

Author: Chakravarti Aravinda
Han Seong Kyu
Humphreys Benjamin D
Lee Dongwon
Muto Yoshiharu
Sampson Matthew G
Wilson Parker C
Publication venue: Digital Commons@Becker
Publication date: 20/12/2022

Chromatin accessibility assays are central to the genome-wide identification of gene regulatory elements associated with transcriptional regulation. However, the data have highly variable quality arising from several biological and technical factors. To surmount this problem, we developed a sequence-based machine learning method to evaluate and refine chromatin accessibility data. Our framework, gapped k-mer SVM quality check (gkmQC), provides the quality metrics for a sample based on the prediction accuracy of the trained models. We tested 886 DNase-seq samples from the ENCODE/Roadmap projects to demonstrate that gkmQC can effectively identify high-quality (HQ) samples with low conventional quality scores owing to marginal read depths. Peaks identified in HQ samples are more accurately aligned at functional regulatory elements, show greater enrichment of regulatory elements harboring functional variants, and explain greater heritability of phenotypes from their relevant tissues. Moreover, gkmQC can optimize the peak-calling threshold to identify additional peaks, especially for rare cell types in single-cell chromatin accessibility data
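To make the "gapped k-mer" idea concrete, the sketch below counts gapped k-mer features in a DNA sequence: every window of `word_len` bases is expanded into all patterns that keep `informative` positions and mask the rest. This is a minimal, assumed illustration of the feature type only; it is not the gkmQC or gkm-SVM implementation, and the function and parameter names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def gapped_kmer_counts(seq, word_len, informative):
    """Count gapped k-mers: windows of `word_len` bases in which only
    `informative` positions are kept and the others are masked with '.'."""
    counts = Counter()
    for start in range(len(seq) - word_len + 1):
        window = seq[start:start + word_len]
        for keep in combinations(range(word_len), informative):
            keep_set = set(keep)
            feature = "".join(
                base if i in keep_set else "." for i, base in enumerate(window)
            )
            counts[feature] += 1
    return counts


# Toy sequence; real peaks would be hundreds of bases long.
features = gapped_kmer_counts("ACGT", word_len=3, informative=2)
```

Feature vectors of this kind could then be fed to a classifier; the abstract states that gkmQC derives its quality metric from the prediction accuracy of such trained models.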

Digital Commons@Becker

An Integrated Pipeline for the Genome-Wide Analysis of Transcription Factor Binding Sites from ChIP-Seq

ChIP-Seq has become the standard method for genome-wide profiling of transcription factor association with DNA. To simplify the analysis and interpretation of ChIP-Seq data, which typically involves using multiple applications, we describe an integrated, open source, R-based analysis pipeline. The pipeline addresses data input, peak detection, sequence and motif analysis, visualization, and data export, and can readily be extended via other R and Bioconductor packages. Using a standard multicore computer, it can be used with datasets consisting of tens of thousands of enriched regions. We demonstrate its effectiveness on published human ChIP-Seq datasets for FOXA1, ER, CTCF and STAT1, where it detected co-occurring motifs that were consistent with the literature but not detected by other methods. Our pipeline provides the first complete set of Bioconductor tools for sequence and motif analysis of ChIP-Seq and ChIP-chip data.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central