16 research outputs found

Empirical methods for controlling false positives and estimating confidence in ChIP-Seq peaks

Author: AA Bhinge
AJ Cox
David A Nix
DS Johnson
G Robertson
IV Yang
JD Storey
JP Shaffer
Kenneth M Boucher
ML Bulyk
P Ng
Samir J Courdy
Publication venue: BioMed Central

Crossref

PubMed Central

Shape-based peak identification for ChIP-Seq

Author: A Barski
AA Bhinge
B Wold
EG Wilbanks
ET Wang
G Carlsson
G Robertson
GR Grimmett
J Rozowsky
Lior Pachter
M Lupien
MB Noyes
PJ Park
R Development Core Team
RK Bradley
S Bhamidi
S Evans
S MacArthur
S Pepke
SN Evans
Steven N Evans
T Barrett
T Laajala
Valerie Hower
WJ Kent
Y Benjamini
Y Benjamini
Y Zhang
Publication date: 05/05/2010

We present a new algorithm for the identification of bound regions from ChIP-seq experiments. Our method for identifying statistically significant peaks from read coverage is inspired by the notion of persistence in topological data analysis and provides a non-parametric approach that is robust to noise in experiments. Specifically, our method reduces the peak calling problem to the study of tree-based statistics derived from the data. We demonstrate the accuracy of our method on existing datasets, and we show that it can discover previously missed regions and can more clearly discriminate between multiple binding events. The software T-PIC (Tree shape Peak Identification for ChIP-Seq) is available at http://math.berkeley.edu/~vhower/tpic.html
Comment: 12 pages, 6 figures

arXiv.org e-Print Archive

Crossref

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

Caltech Authors

Inherent Signals in Sequencing-Based Chromatin-ImmunoPrecipitation Control Libraries

Author: AA Bhinge
CL Wei
CY Lin
D Karolchik
DE Schones
DS Johnson
Edwin Cheung
G Bourque
I. King Jordan
JC Dohm
L Conti
LW Hillier
Nallasivam Palanisamy
S Impey
TS Mikkelsen
VB Vega
Vinsensius B. Vega
Wing-Kin Sung
X Chen
Y Benjamini
Y Zhang
Publication venue: Public Library of Science
Publication date: 01/01/2009

The growth of sequencing-based Chromatin Immuno-Precipitation studies calls for a more in-depth understanding of the nature of the technology and of the resultant data to reduce false positives and false negatives. Control libraries are typically constructed to complement such studies in order to mitigate the effect of systematic biases that might be present in the data. In this study, we explored multiple control libraries to obtain a better understanding of what they truly represent. First, we analyzed the genome-wide profiles of various sequencing-based libraries at a low resolution of 1 Mbp, and compared them with each other as well as against aCGH data. We found that copy number has a major influence on both ChIP-enriched and control libraries. Following that, we inspected the repeat regions to assess the extent of mapping bias. Next, significantly tag-rich 5 kbp regions were identified and associated with various genomic landmarks. For instance, we discovered that gene boundaries were surprisingly enriched with sequenced tags. Further, profiles between different cell types were noticeably distinct although the cell types were somewhat related and similar. We found that control libraries bear traces of systematic biases. The biases can be attributed to genomic copy number, inherent sequencing bias, plausible mapping ambiguity, and cell-type specific chromatin structure. Our results suggest that careful analysis of control libraries can reveal promising biological insights.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

ScholarBank@NUS

Genomic sequencing in clinical trials

Author: A Mortazavi
A Zimprich
AA Bhinge
BJ O'Roak
C Allen
C Betancur
C Mele
C Vilarino-Guell
CS Ku
DA Rasko
DI Shalowitz
DS Johnson
E Hodges
EE Schadt
ER Mardis
ES Martens-Uzunova
ET Wang
EW Clayton
G Xu
GH Fernald
GJ Porreca
H Greulich
J Amberger
J Rios
JF Thompson
K Kannan
K Musunuru
Karen K Mestan
KJ Buckingham
Leonard Ilkhanoff
LG Biesecker
M Allison
M Kircher
MA Chapman
O Harismendy
P Aldhous
PJ Campbell
Q Zhao
R Drmanac
RM Toydemir
Samdeep Mouli
Simon Lin
SM Hawkins
SP Shah
ST Bennett
TJ Ley
W Lee
Publication venue: BioMed Central
Publication date: 01/01/2011

Human genome sequencing is the process by which the exact order of nucleic acid base pairs in the 24 human chromosomes is determined. Since the completion of the Human Genome Project in 2003, genomic sequencing is rapidly becoming a major part of our translational research efforts to understand and improve human health and disease. This article reviews the current and future directions of clinical research with respect to genomic sequencing, a technology that is just beginning to find its way into clinical trials both nationally and worldwide. We highlight the currently available types of genomic sequencing platforms, outline the advantages and disadvantages of each, and compare first- and next-generation techniques with respect to capabilities, quality, and cost. We describe the current geographical distributions and types of disease conditions in which these technologies are used, and how next-generation sequencing is strategically being incorporated into new and existing studies. Lastly, recent major breakthroughs and the ongoing challenges of using genomic sequencing in clinical research are discussed.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A Feature-Based Approach to Modeling Protein–DNA Interactions

Author: A Brazma
A Ng
A Sandelin
A Tomovic
AA Bhinge
AR Oliphant
AS Weinmann
B Ren
B Ren
C Grandori
CE Horak
CL Wei
CT Harbison
DC Look
DL Solomon
DS Johnson
DT Odom
E Birney
E Eden
E Segal
Eilon Sharon
EP Xing
Eran Segal
FP Roth
G Robertson
Gary Stormo
HJ Bussemaker
I Ben-Gal
I Simon
J Pearl
JL Reid
JS Yedidia
K Ellrott
KD MacIsaac
KI Zeller
L Elnitski
L Gold
L Narlikar
LA Boyer
M Ashburner
M Kellis
M Renda
MB Eisen
MF Berger
ML Bulyk
ML Bulyk
ML Bulyk
P Cliften
P Hong
Q Zhou
R Elkon
R Pudimat
R Tibshirani
S Perkins
S Sinha
S Tavazoie
S-I Lee
SD Pietra
Shai Lubliner
SJ Maerkl
T Heinemeyer
T Minka
TH Kim
TI Lee
TI Lee
VR Iyer
X Liu
X Xie
X Xie
X Zhao
Y Barash
Y Barash
Y Benjamini
Y Pilpel
Y Qi
YH Loh
Publication venue: Public Library of Science
Publication date: 01/01/2008

Transcription factor (TF) binding to its DNA target site is a fundamental regulatory interaction. The most common model used to represent TF binding specificities is a position specific scoring matrix (PSSM), which assumes independence between binding positions. However, in many cases, this simplifying assumption does not hold. Here, we present feature motif models (FMMs), a novel probabilistic method for modeling TF–DNA interactions, based on log-linear models. Our approach uses sequence features to represent TF binding specificities, where each feature may span multiple positions. We develop the mathematical formulation of our model and devise an algorithm for learning its structural features from binding site data. We also developed a discriminative motif finder, which discovers de novo FMMs that are enriched in target sets of sequences compared to background sets. We evaluate our approach on synthetic data and on the widely used TF chromatin immunoprecipitation (ChIP) dataset of Harbison et al. We then apply our algorithm to high-throughput TF ChIP data from mouse and human, reveal sequence features that are present in the binding specificities of mouse and human TFs, and show that FMMs explain TF binding significantly better than PSSMs. Our FMM learning and motif finder software are available at http://genie.weizmann.ac.il/
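The independence assumption that the abstract attributes to PSSMs can be illustrated with a minimal sketch. Everything below is hypothetical and for illustration only: the three-position motif, its probabilities, the uniform background, and the name `pssm_score` are assumptions, not taken from the paper or its software. Under a PSSM, the log-odds score of a site is a plain sum of per-position terms, so changing one base shifts the score by the same amount regardless of the other bases.

```python
import math

# Hypothetical 3-position motif: per-position base probabilities.
# Values are illustrative assumptions, not from any real dataset.
PROBS = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
BACKGROUND = 0.25  # assumed uniform background frequency for every base


def pssm_score(site):
    """Log-odds score of a site: a sum of independent per-position terms."""
    if len(site) != len(PROBS):
        raise ValueError("site length must match motif length")
    return sum(
        math.log2(column[base] / BACKGROUND)
        for column, base in zip(PROBS, site)
    )


if __name__ == "__main__":
    # The effect of swapping position 1 (A -> T) does not depend on position 3.
    print(pssm_score("AGC") - pssm_score("TGC"))
    print(pssm_score("AGT") - pssm_score("TGT"))
```

The two printed differences agree, which is exactly the additivity that a model with multi-position features, such as the FMMs described above, is meant to relax.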

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

Replication Fork Polarity Gradients Revealed by Megabase-Sized U-Shaped Replication Timing Domains in Human Cell Lines

Author: A Arneodo
A Baker
A Goldar
AA Bhinge
AJ McNairn
Alain Arneodo
Antoine Baker
Antoine Leleu
AP Boyle
AP Boyle
APS de Moura
Arach Goldar
Aurélien Rappailles
B Audit
B Audit
Benjamin Audit
Benoit Moindrot
C Conti
C Hou
Chun-Long Chen
CL Chen
CL Chen
Claude Thermes
Cédric Vaillant
D Karolchik
DM Gilbert
E Chargaff
E Lieberman-Aiden
E Yaffe
EB Brodie of Brodie
F Antequera
Fabien Mongelard
G Guilbaud
Guillaume Guilbaud
I Hiratani
I Hiratani
JA Bogan
JC Cadoret
JE Phillips
JL Hamlin
JL Hamlin
JR Lobry
JW Fickett
K Woodfine
L Handoko
L Ponger
LD Mesner
M Buongiorno-Nardelli
M Huvet
M Méchali
M Méchali
M Touchon
M Touchon
M Touchon
MI Aladjem
MM Suzuki
N Gilbert
N Karnani
N Sueoka
Olivier Hyrien
P Green
PJ Sabo
R Berezney
R Desprat
R Ohlsson
R Rudner
RS Hansen
S Courbet
S Farkash-Amar
S Nicolay
SCH Yang
SP Bell
T Ryba
WH Li
William Stafford Noble
Yves d'Aubenton-Carafa
Publication venue: Public Library of Science
Publication date: 01/01/2012

In higher eukaryotes, replication program specification in different cell types remains to be fully understood. We show for seven human cell lines that about half of the genome is divided into domains that display a characteristic U-shaped replication timing profile with early initiation zones at borders and late replication at centers. Significant overlap is observed between U-domains of different cell lines and also with germline replication domains exhibiting an N-shaped nucleotide compositional skew. From the demonstration that the average fork polarity is directly reflected by both the compositional skew and the derivative of the replication timing profile, we argue that the N-shape of this derivative in U-domains supports the existence of large-scale gradients of replication fork polarity in somatic and germline cells. Analysis of chromatin interaction (Hi-C) and chromatin marker data reveals that U-domains correspond to high-order chromatin structural units. We discuss possible models for replication origin activation within U/N-domains. The compartmentalization of the genome into replication U/N-domains provides new insights into the organization of the replication program in the human genome.

HAL-ENS-LYON

Public Library of Science (PLOS)

Crossref

HAL-Inserm

Directory of Open Access Journals

PubMed Central

HAL-CEA

ProdInra

The Francis Crick Institute

Catalytic residues in hydrolases: analysis of methods designed for ligand-binding site prediction

Author: A Armon
A Bhinge
A Eichinger
A Gutteridge
A Pingoud
A Shulman-Peleg
A Stark
A Stark
A Stark
AA Bliznyuk
AC Stuart
AC Wallace
AH Elcock
AJ Chalk
ATR Laurie
ATR Laurie
B Huang
B Lee
B Zhang
C Taroni
CA Orengo
CM Seibert
CT Porter
D Pantoja-Uceda
DG Levitt
DJ Vocadlo
DT-H Chang
E Kellenberger
E Youn
FX Gomis-Rüth
G Nimrod
G Pugalenthi
GG Hammes
GJ Bartlett
GJ Kleywegt
GL Holliday
GL Holliday
GP Brady
H Yao
HM Berman
I Botos
Irena Roterman
J An
J An
J Dundas
J Liang
J Teyra
J Weigelt
J-M Chandonia
JA Barker
JM Yon
K Henrick
K Katayanagi
K Kinoshita
K Kinoshita
K Stummeyer
K Zhang
KA Snyder
Katarzyna Prymula
KP Peters
M Bryliński
M Grabowski
M Hendlich
M Jambon
M Jambon
M Kanehisa
M Landau
M Levitt
M Stahl
MA Kurowski
MJ Ondrechen
MP Liang
MR Landon
N Kallenbach
O Gileadi
O Goldenberg
O Lichtarge
O Lichtarge
P Aloy
P Baldi
P Reis
PJ Hajduk
PJ Hajduk
PP Wangikar
R Landgraf
RA Laskowski
RA Laskowski
RA Laskowski
RV Spriggs
S Madabushi
S Vajda
SE Brenner
T Fawcett
T Kortvelyesi
T Pupko
T Tadokoro
T Zhang
TA Binkowski
Tomasz Jadczyk
UniProt Consortium The Universal Protein Resource (UniProt)
V Siksnys
W Kabsch
Y Dou
Y Oda
Y Tsunaka
Y-R Tang
Publication venue: Springer Netherlands
Publication date: 01/01/2010

A comparison of eight tools applicable to ligand-binding site prediction is presented. The methods examined cover three types of approaches: the geometrical (CASTp, PASS, Pocket-Finder), the physicochemical (Q-SiteFinder, FOD) and the knowledge-based (ConSurf, SuMo, WebFEATURE). The accuracy of predictions was measured in reference to the catalytic residues documented in the Catalytic Site Atlas. The test was performed on a set comprising selected chains of hydrolases. The results were analysed with regard to size, polarity, secondary structure, and solvent-accessible area of predicted sites, as well as parameters commonly used in machine learning (F-measure, MCC). The relative accuracies of predictions are presented in the ROC space, allowing determination of the optimal methods by means of the ROC convex hull. Additionally, a minimum expected cost analysis was performed. Both advantages and disadvantages of the eight methods are presented. A characterization of protein chains with respect to the level of difficulty of active site prediction is introduced. The main reasons for failures are discussed. Overall, SuMo offers the best performance, followed by FOD, while Pocket-Finder is the best method among the geometrical approaches.
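For readers unfamiliar with the two machine-learning parameters named in the abstract, here is a minimal sketch of how the F-measure and the Matthews correlation coefficient (MCC) are computed from confusion-matrix counts. The function names and the example counts are assumptions for illustration; they are not taken from the paper, and returning 0.0 for an undefined ratio is one common convention rather than the authors' choice.

```python
import math


def f_measure(tp, fp, fn):
    """F-measure (F1): harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    # Convention assumed here: return 0.0 when the score is undefined.
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical protein chain with 100 residues, 4 of them catalytic;
    # a predictor flags 4 residues and gets 3 of them right.
    print(f_measure(tp=3, fp=1, fn=1))
    print(mcc(tp=3, fp=1, tn=95, fn=1))
```

Because catalytic residues are a small minority of any chain, MCC is often reported alongside the F-measure: it also accounts for true negatives, which the F-measure ignores.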

Crossref

Springer - Publisher Connector

PubMed Central

Jagiellonian University Repository

A user's guide to the Encyclopedia of DNA elements (ENCODE)

Author: Abdelhamid RF
Absher DM
Abyzov A
Aken B
Alioto T
Altshuler RC
Amrhein H
Antonarakis SE
Auerbach RK
Balasubramanian S
Bansal A
Barber GP
Battenhouse A
Batut P
Batzoglou S
Beal K
Bell I
Bell K
Bernstein BE
Bhardwaj N
Bhinge AA
Bickel PJ
Bignell A
Bild N
Birney E
Blahnik KR
Boley N
Borel C
Bowling KM
Boychenko V
Boyle AP
Brazma A
Brent M
Brown JB
Brown RH
Buske OJ
Canfield T
Cao AR
Carninci P
Cayting P
Chakrabortty S
Charos A
Chen X
Cheng C
Chittur S
Chrast J
Cline MS
Collins PJ
Coyne MJ
Crawford GE
Davis CA
Dekker J
Derrien T
DeSalvo G
Despacio-Reyes G
Diekhans M
Dillon LAL
Dilocker JA
Djebali S
Dobin A
Dong X
Doyle F
Drenkow J
Dreszer TR
Du J
Dumais E
Dumais J
Dunham I
Durham T
Ebersol AK
Elnitski L
Epstein CB
Ernst J
Euskirchen G
Farnham PJ
Feingold EA
Fejes K
Fisher K
Fleming JD
Frankish A
Frietze S
Frum T
Fujita PA
Furey TS
Gao H
Gerstein M
Gertz J
Gibson T
Giddings MC
Gingeras TR
Giresi PG
Giste E
Good PJ
Gordon A
Graison EAY
Grasfeder LL
Green ED
Grossman RL
Grubert F
Guigo R
Gunawardena H
Habegger L
Hannon G
Hardison RC
Hariharan M
Harris RS
Harrow J
Harte R
Haugen E
Haussler D
Hayashizaki Y
Herrero J
Hoffman MM
Howald C
Huang H
Hubbard T
Humbert R
Hunt T
Issner R
Iyengar S
Iyer VR
Jain P
Jameel N
Jee J
Jha S
Johnson A
Johnson EM
Kapranov P
Karmakar S
Karolchik D
Kasowski M
Kaul R
Kay M
Keefe D
Kellis M
Kent WJ
Khatun J
Kheradpour P
Khurana E
Kim SKC
King B
Kingswood C
Kirilusha A
Knowles DG
Kokocinski F
Ku M
Kuhn RM
Kundaje A
Kutyavin T
Lacroute P
Lagarde J
Lajoie BR
Lam H
Lamarre N
Landt SG
Lassmann T
Learned K
Lee B-K
Lee K
Leng J
Li Q
Lian J
Libbrecht M
Lieb JD
Lin M
Lin MF
Lin W
Lindahl M
Liu Z
Lochovsky L
London D
Lotakis D
Lowdon RF
Lu Z
Lukk M
Luscombe NM
Maier C
Malladi VS
Margulies EH
Marinov G
Mariotti M
McCue K
McDaniell RM
Merkel A
Meyer LR
Mikkelsen TS
Miller W
Miotto B
Monahan H
Moqtaderi Z
Mortazavi A
Mukherjee G
Muratet MA
Myers RM
Navas PA
Neph S
Neri J
Nesmith AS
Newberry KM
Newburger P
Nguyen ED
Noble WS
O'Geen H
Parker SCJ
Parker SL
Partridge EC
Patacsil D
Paten B
Pauli F
Penalva LO
Pepke S
Poh WT
Preall J
Pusey B
Raha D
Raney BJ
Rauch R
Reddy TE
Reed B
Reymond A
Reynolds A
Rhead B
Ribeca P
Risk B
Roach V
Roberts K
Robilotto R
Rodriguez JM
Rosenbloom KR
Roskin K
Rozowsky J
Ruan X
Ruan Y
Rynes E
Sabo PJ
Sammeth M
Sanchez ME
Sandstrom R
Sanyal A
Saunders G
Sboner A
Schlesinger F
Searle S
Shafer T
Shahab A
Sheffer HH
Sheffield NC
Sherlock G
Shestak C
Shi M
Shibata Y
Shoresh N
Showers KA
Sidow A
Slifer T
Sloan CA
Snyder M
Sobral D
Song G
Song L
Sotirova V
Sprouse RO
Stamatoyannopoulos J
Stamatoyannopoulos JA
Struhl K
Suh B
Swing VK
Takahashi H
Tanzer A
Tenenbaum SA
Thibeault K
Thurman RE
Tilgner H
Tress M
Trinklein ND
Trout D
Truong T
Tullius TD
Ucla C
Valencia A
Vales T
van Baren MJ
Vaquerizas JM
Varley KE
Victorsen A
Vielmetter J
Vong S
Waite LL
Wang H
Wang J
Wang L
Wang T
Ward LD
Weaver M
Wei C-L
Weissman SM
Weng Z
White KP
Whitfield TW
Wilder SP
Williams B
Winter D
Wold B
Wu L
Xi H
Xu X
Xu Y
Yan K-K
Yang X
Yip KY
Yu Y
Zaleski C
Zhang X
Zhang Z
Zweig AS
Publication date: 19/04/2011

The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating and illustrate the application of ENCODE data to interpret the human genome.

UCL Discovery

Design and analysis of ChIP-seq experiments for DNA-binding proteins

Author: A Barski
A Mortazavi
AA Bhinge
AD Smith
DR Bentley
DS Johnson
F Eckhardt
G Robertson
Michael Y Tolstorukov
Peter J Park
Peter V Kharchenko
RM Price
S Impey
S Peng
TH Kim
TH Kim
TL Bailey
TY Roh
V Matys
WE Johnson
WJ Kent
Y Qi
Publication venue: Springer Science and Business Media LLC

Crossref

OCT4 and PAX6 determine the dual function of SOX2 in human ESCs as a key pluripotent or neural factor

Author: A Bhinge
A Rada-Iglesias
A Remenyi
AA Avilion
AJ Bass
AK Teo
AM Bolger
AM Tsankov
B Langmead
C Zhou
H Fong
H Kondoh
H Mi
J Feng
J Gertz
JC Yeo
JL Kopp
JQ Wu
JR Timmer
JS Yu
K Adachi
K Narasimhan
K Takahashi
KL Ring
KM Loh
L Gerrard
L Gerrard
LA Boyer
M Iwafuchi-Doi
M Thomson
M Wiznerowicz
MA Lodato
O Gafni
P Noisa
P Ovando-Roche
PJ Ross
R Edgar
S Heinz
S Masui
S Tanaka
S Zhang
S Zhao
SY Ng
T Vierbuchen
TW Theunissen
V Graham
X Ji
X Zhang
Y Kamachi
Y Kamachi
Z Wang
Z Yao
Publication venue: Springer Science and Business Media LLC

Crossref