110 research outputs found

Modeling ChIP Sequencing In Silico with Applications

Author: Chang Joseph
Gerstein Mark
Rozowsky Joel
Snyder Michael
Zhang Zhengdong D.
Publication venue: Public Library of Science
Publication date: 01/08/2008

ChIP sequencing (ChIP-seq) is a new method for genome-wide mapping of protein binding sites on DNA. It has generated much excitement in functional genomics. To score data and determine adequate sequencing depth, both the genomic background and the binding sites must be properly modeled. To develop a computational foundation to tackle these issues, we first performed a study to characterize the observed statistical nature of this new type of high-throughput data. By linking sequence tags into clusters, we show that there are two components to the distribution of tag counts observed in a number of recent experiments: an initial power-law distribution and a subsequent long right tail. Then we develop in silico ChIP-seq, a computational method to simulate the experimental outcome by placing tags onto the genome according to particular assumed distributions for the actual binding sites and for the background genomic sequence. In contrast to current assumptions, our results show that both the background and the binding sites need to have a markedly nonuniform distribution in order to correctly model the observed ChIP-seq data, with, for instance, the background tag counts modeled by a gamma distribution. On the basis of these results, we extend an existing scoring approach by using a more realistic genomic-background model. This enables us to identify transcription-factor binding sites in ChIP-seq data in a statistically rigorous fashion.
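The simulation idea in this abstract (placing tags onto a genome with a nonuniform, gamma-distributed background plus nonuniform binding sites) can be illustrated with a toy sketch. This is not the authors' in silico ChIP-seq method: the function name, the bin-based genome, the gamma shape, and the `site_boost` factor are all assumptions chosen only for illustration.

```python
import random

def simulate_tag_counts(n_bins, n_sites, n_tags, shape=0.5, site_boost=50.0, seed=0):
    """Toy tag placement: returns (tag count per bin, sorted binding-site bins)."""
    rng = random.Random(seed)
    # Nonuniform background: every bin draws its own gamma-distributed weight
    # (shape and scale are hypothetical, not values from the paper).
    weights = [rng.gammavariate(shape, 1.0) for _ in range(n_bins)]
    # Hypothetical binding sites get an extra, also nonuniform, weight.
    sites = rng.sample(range(n_bins), n_sites)
    for s in sites:
        weights[s] += site_boost * rng.gammavariate(shape, 1.0)
    # Place each tag in a bin with probability proportional to the bin weight.
    counts = [0] * n_bins
    for b in rng.choices(range(n_bins), weights=weights, k=n_tags):
        counts[b] += 1
    return counts, sorted(sites)

counts, sites = simulate_tag_counts(n_bins=1000, n_sites=10, n_tags=5000)
print("max tags in a bin:", max(counts))
print("bins with zero tags:", counts.count(0))
```

Replacing the gamma weights with a constant would give the uniform background that the abstract argues does not match observed data.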

Directory of Open Access Journals

PubMed Central

Identification and analysis of unitary pseudogenes: historic and contemporary gene losses in humans and other primates

Author: Frankish Adam
Gerstein Mark
Harrow Jennifer
Hunt Toby
Zhang Zhengdong D
Publication venue: BioMed Central
Publication date: 01/01/2010

Novel human pseudogenes that were previously functional genes are identified, and their ages are estimated. Loss-of-function events occurred at a uniform rate.

Crossref

PubMed Central

Substance abuse and the risk of severe COVID-19: Mendelian randomization confirms the causal role of opioids but hints a negative causal effect for cannabinoids

Author: M. Reza Jabalameli
Zhengdong D. Zhang
Publication venue: Frontiers Media SA
Publication date: 01/12/2022

Since the start of the COVID-19 global pandemic, our understanding of the underlying disease mechanism and factors associated with the disease severity has dramatically increased. A recent study investigated the relationship between substance use disorders (SUD) and the risk of severe COVID-19 in the United States and concluded that the risk of hospitalization and death due to COVID-19 is directly correlated with substance abuse, including opioid use disorder (OUD) and cannabis use disorder (CUD). While we found this analysis fascinating, we believe this observation may be biased due to comorbidities (such as hypertension, diabetes, and cardiovascular disease) confounding the direct effect of SUD on severe COVID-19 illness. To answer this question, we sought to investigate the causal relationship between substance abuse and medication-taking history (as a proxy trait for comorbidities) with the risk of COVID-19 adverse outcomes. Our Mendelian randomization analysis confirms the causal relationship between OUD and severe COVID-19 illness but suggests an inverse causal effect for cannabinoids. Considering that COVID-19 mortality is largely attributed to disturbed immune regulation, the possible modulatory impact of cannabinoids in alleviating cytokine storms merits further investigation

Directory of Open Access Journals

Tilescope: online analysis pipeline for high-density tiling microarray data

Author: Du Jiang
Gerstein Mark
Lam Hugo YK
Rozowsky Joel
Snyder Michael
Zhang Zhengdong D
Publication venue: BioMed Central
Publication date: 01/01/2007

Tilescope is a fully integrated and automated new data-processing pipeline for analyzing high-density tiling-array data

CiteSeerX

Crossref

Springer - Publisher Connector

PubMed Central

Identification of genomic indels and structural variations using split reads

Author: Abyzov Alex
Du Jiang
Gerstein Mark
Lam Hugo
Snyder Michael
Urban Alexander E
Zhang Zhengdong D
Publication venue: BioMed Central
Publication date: 01/07/2011

Background: Recent studies have demonstrated the genetic significance of insertions, deletions, and other more complex structural variants (SVs) in the human population. With the development of the next-generation sequencing technologies, high-throughput surveys of SVs on the whole-genome level have become possible. Here we present split-read identification, calibrated (SRiC), a sequence-based method for SV detection. Results: We start by mapping each read to the reference genome in standard fashion using gapped alignment. Then to identify SVs, we score each of the many initial mappings with an assessment strategy designed to take into account both sequencing and alignment errors (e.g. scoring more highly events gapped in the center of a read). All current SV calling methods have multilevel biases in their identifications due to both experimental and computational limitations (e.g. calling more deletions than insertions). A key aspect of our approach is that we calibrate all our calls against synthetic data sets generated from simulations of high-throughput sequencing (with realistic error models). This allows us to calculate sensitivity and the positive predictive value under different parameter-value scenarios and for different classes of events (e.g. long deletions vs. short insertions). We run our calculations on representative data from the 1000 Genomes Project. Coupling the observed numbers of events on chromosome 1 with the calibrations gleaned from the simulations (for different length events) allows us to construct a relatively unbiased estimate for the total number of SVs in the human genome across a wide range of length scales. We estimate in particular that an individual genome contains ~670,000 indels/SVs. Conclusions: Compared with the existing read-depth and read-pair approaches for SV identification, our method can pinpoint the exact breakpoints of SV events, reveal the actual sequence content of insertions, and cover the whole size spectrum for deletions. Moreover, with the advent of the third-generation sequencing technologies that produce longer reads, we expect our method to be even more useful.

Directory of Open Access Journals

PubMed Central

Mechanosignaling activation of TGFβ maintains intervertebral disc homeostasis

Author: Bian Qin
Cao Xu
Crane Janet L.
Guo X. Edward
Jain Amit
Kebaish Khaled
Ma Lei
Riley Lee H.
Sponseller Paul D.
Séguin Cheryle A.
Wan Mei
Wang Yongjun
Zhang Zhengdong
Publication venue: Scholarship@Western
Publication date: 21/03/2017

Intervertebral disc (IVD) degeneration is the leading cause of disability with no disease-modifying treatment. IVD degeneration is associated with unstable mechanical loading in the spine, but little is known about how mechanical stress regulates nucleus notochordal (NC) cells to maintain IVD homeostasis. Here we report that mechanical stress can result in excessive integrin αvβ6-mediated activation of transforming growth factor beta (TGFβ), decreased NC cell vacuoles, and increased matrix proteoglycan production, leading to degenerative disc disease (DDD). Knockout of TGFβ type II receptor (TβRII) or integrin αv in the NC cells inhibited functional activity of postnatal NC cells and also resulted in DDD under mechanical loading. Administration of RGD peptide, TGFβ, and αvβ6-neutralizing antibodies attenuated IVD degeneration. Thus, integrin-mediated activation of TGFβ plays a critical role in mechanical signaling transduction to regulate IVD cell function and homeostasis. Manipulation of this signaling pathway may be a potential therapeutic target to modify DDD.

Scholarship@Western

PubMed Central

The DNA Repair Gene APE1 T1349G Polymorphism and Risk of Gastric Cancer in a Chinese Population

Author: A Ramos-De la Medina
D Gu
D Palli
D Wu
DM Parkin
DM Wilson 3rd
Dongying Gu
E Canbay
EC Friedberg
H Zhu
JH Hoeijmakers
Jinfei Chen
JJ Hu
KD Crew
L Yang
M Christmann
Meilin Wang
MR Kelley
MZ Hadi
O Popanda
Paolo Peterlongo
Q Zhao
RD Wood
RR Misra
SA Miller
Shizhi Wang
SS Hecht
T Izumi
T Xi
WQ Li
Zhengdong Zhang
ZX Li
Publication venue: Public Library of Science
Publication date: 01/01/2011

Background: Apurinic/apyrimidinic endonuclease 1 (APE1) has a central role in the repair of apurinic/apyrimidinic sites through both its endonuclease and its phosphodiesterase activities. A common APE1 polymorphism, T1349G (rs3136820), was previously shown to be associated with the risk of cancers. Objective: We hypothesized that the APE1 T1349G polymorphism is also associated with risk of gastric cancer. Methods: In a hospital-based case-control study of 338 case patients with newly diagnosed gastric cancer and 362 cancer-free controls frequency-matched by age and sex, we genotyped the T1349G polymorphism and assessed its associations with risk of gastric cancer. Results: Compared with the APE1 TT genotype, individuals with the variant TG/GG genotypes had a significantly increased risk of gastric cancer (odds ratio = 1.69, 95% confidence interval = 1.19–2.40), which was more pronounced among the subgroups aged ≤60 years, males, ever smokers, and ever drinkers. Further analyses revealed that the variant genotypes were associated with an increased risk for diffuse-type, low depth of tumor infiltration (T1 and T2), and lymph node metastasis gastric cancer. Conclusions: The APE1 T1349G polymorphism may be a marker for the development of gastric cancer in the Chinese population. Larger studies are required to validate these findings in diverse populations.
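As a reading aid for the odds ratio reported above, the sketch below checks that the interval 1.19–2.40 is symmetric around 1.69 on the log scale, which is what one would expect if the interval is a standard Wald-type interval (an assumption; the abstract does not state how it was computed). The function name and the z-value of 1.96 are illustrative choices, and only the three numbers from the abstract are used as input.

```python
import math

def log_scale_summary(ci_low, ci_high, z=1.96):
    """Return (geometric midpoint of the CI, implied standard error of log OR)."""
    # For an interval symmetric on the log scale, the point estimate
    # equals the geometric mean of the two bounds.
    midpoint = math.sqrt(ci_low * ci_high)
    # The interval width on the log scale is 2 * z * SE.
    se_log_or = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return midpoint, se_log_or

midpoint, se = log_scale_summary(1.19, 2.40)
print(f"geometric midpoint of CI: {midpoint:.2f}")  # compare with the reported 1.69
print(f"implied SE of log(OR): {se:.3f}")
```

The geometric midpoint matches the reported odds ratio to two decimals, so the three figures in the abstract are mutually consistent under that assumption.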

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Integrating Sequencing Technologies in Personal Genomics: Optimal Low Cost Reconstruction of Structural Variants

Author: AE Urban
D Pinkel
DA Wheeler
DR Bentley
DR Zerbino
F Sanger
GH Perry
J Butler
J Rozowsky
JC Dohm
JC Venter
Jiang Du
JO Korbel
JY Hehir-Kwa
M Margulies
M Pop
Mark B. Gerstein
Michael Snyder
MJ Chaisson
PA Pevzner
R Lippert
R Redon
R Schmid
RL Warren
Robert D. Bjornson
RR Selzer
S Batzoglou
S Levy
SMD Goldberg
V Bansal
William Stafford Noble
Yong Kong
Zhengdong D. Zhang
Publication venue: Public Library of Science
Publication date: 01/07/2009

The goal of human genome re-sequencing is obtaining an accurate assembly of an individual's genome. Recently, there has been great excitement in the development of many technologies for this (e.g. medium and short read sequencing from companies such as 454 and SOLiD, and high-density oligo-arrays from Affymetrix and NimbleGen), with even more expected to appear. The costs and sensitivities of these technologies differ considerably from each other. As an important goal of personal genomics is to reduce the cost of re-sequencing to an affordable point, it is worthwhile to consider optimally integrating technologies. Here, we build a simulation toolbox that will help us optimally combine different technologies for genome re-sequencing, especially in reconstructing large structural variants (SVs). SV reconstruction is considered the most challenging step in human genome re-sequencing. (It is sometimes even harder than de novo assembly of small genomes because of the duplications and repetitive sequences in the human genome.) To this end, we formulate canonical problems that are representative of issues in reconstruction and are of small enough scale to be computationally tractable and simulatable. Using semi-realistic simulations, we show how we can combine different technologies to optimally solve the assembly at low cost. With mappability maps, our simulations efficiently handle the inhomogeneous repeat-containing structure of the human genome and the computational complexity of practical assembly algorithms. They quantitatively show how combining different read lengths is more cost-effective than using one length, how an optimal mixed sequencing strategy for reconstructing large novel SVs usually also gives accurate detection of SNPs/indels, how paired-end reads can improve reconstruction efficiency, and how adding in arrays is more efficient than just sequencing for disentangling some complex SVs. Our strategy should facilitate the sequencing of human genomes at maximum accuracy and low cost.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Detection of copy number variation from array intensity and sequencing read depth using a stepwise Bayesian model

Author: AB Olshen
AE Urban
AJ Iafrate
C Erdman
CL Myers
E Ben-Yaacov
E Tuzun
F Forozan
F Picard
J Fridlyand
J Sebat
J Shendure
K Jong
L Hsu
LY Wu
M Bredel
M Fedurco
M Margulies
MA Newton
Mark B Gerstein
N Metropolis
OC Lingjaerde
OM Rueda
P Broet
P Cahan
P Hupe
P Wang
PH Eilers
R Development Core Team
R Pique-Regi
R Redon
S Geman
SP Shah
V Jobanputra
WK Hastings
WR Lai
Zhengdong D Zhang
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Copy number variants (CNVs) have been demonstrated to occur at a high frequency and are now widely believed to make a significant contribution to the phenotypic variation in human populations. Array-based comparative genomic hybridization (array-CGH) and the newly developed read-depth approach through ultrahigh-throughput genomic sequencing both provide rapid, robust, and comprehensive methods to identify CNVs on a whole-genome scale. Results: We developed a Bayesian statistical analysis algorithm for the detection of CNVs from both types of genomic data. The algorithm can analyze such data obtained from PCR-based bacterial artificial chromosome arrays, high-density oligonucleotide arrays, and more recently developed high-throughput DNA sequencing. Treating the parameters that define the underlying data-generating process (e.g., the number of CNVs, the position of each CNV, and the data noise level) as random variables, our approach derives the posterior distribution of the genomic CNV structure given the observed data. Sampling from the posterior distribution using a Markov chain Monte Carlo method, we get not only best estimates for these unknown parameters but also Bayesian credible intervals for the estimates. We illustrate the characteristics of our algorithm by applying it to both synthetic and experimental data sets in comparison to other segmentation algorithms. Conclusions: In particular, the synthetic data comparison shows that our method is more sensitive than other approaches at low false positive rates. Furthermore, given its Bayesian origin, our method can also be seen as a technique to refine CNVs identified by fast point-estimate methods and also as a framework to integrate array-CGH and sequencing data with other CNV-related biological knowledge, all through informative priors.
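The combination of posterior sampling and credible intervals described in this abstract can be illustrated on a much smaller problem. The sketch below is not the authors' stepwise model: it assumes a single segment with a Gaussian likelihood, a known noise level, and a flat prior, and uses a plain random-walk Metropolis sampler to estimate the segment's mean signal. All names, parameter values, and the synthetic data are hypothetical.

```python
import math
import random

def metropolis_mean(data, sigma, n_iter=5000, step=0.05, seed=0):
    """Sample the posterior of one segment mean; return (estimate, 95% credible interval)."""
    rng = random.Random(seed)

    def log_post(mu):
        # Flat prior plus Gaussian likelihood with known noise level sigma.
        return -sum((x - mu) ** 2 for x in data) / (2 * sigma ** 2)

    mu, current = 0.0, log_post(0.0)
    samples = []
    for _ in range(n_iter):
        proposal = mu + rng.gauss(0.0, step)  # symmetric random-walk proposal
        candidate = log_post(proposal)
        # Metropolis rule: accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            mu, current = proposal, candidate
        samples.append(mu)
    kept = sorted(samples[n_iter // 5:])  # discard the first 20% as burn-in
    low = kept[int(0.025 * len(kept))]
    high = kept[int(0.975 * len(kept)) - 1]
    return sum(kept) / len(kept), (low, high)

# Synthetic log-ratio values for one hypothetical amplified segment.
data_rng = random.Random(1)
data = [data_rng.gauss(0.58, 0.2) for _ in range(50)]
estimate, (low, high) = metropolis_mean(data, sigma=0.2)
print(f"posterior mean {estimate:.3f}, 95% credible interval ({low:.3f}, {high:.3f})")
```

A full CNV model would also sample the number and positions of segments, which this toy example leaves out.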

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Associations of IL-4, IL-4R, and IL-13 Gene Polymorphisms in Coal Workers' Pneumoconiosis in China: A Case-Control Study

Author: AE Kelly-Welch
B Beghe
B Liu
B Yucesoy
BS Choi
C Modesto
Chunhui Ni
D Vercelli
G Grunig
H Mitsuyasu
I Ates
I Franjkovic
I Shirakawa
J Cisneros-Lira
JA Elias
Jianwei Zhou
Jos H. Verbeek
KC Chang
KM Murphy
L Cameron
LJ Rosenwasser
M Wang
M Wills-Karp
M Yazdanbakhsh
Meilin Wang
MV Rockman
N Noben-Trauth
P Chomarat
P Miossec
PA Hessel
PE Graves
S Kruse
S Zhu
Shasha Wang
TA Wynn
TP Ng
TS Nawrot
WE Paul
X Huang
Xiaomin Ji
Z Song
Zhengdong Zhang
Zhifang Song
Publication venue: Public Library of Science
Publication date: 01/01/2011

Background: The IL-4, IL-4 receptor (IL-4R), and IL-13 genes are crucial immune factors and may influence the course of various diseases. In the present study, we investigated the association between the potential functional polymorphisms in IL-4, IL-4R, and IL-13 and coal workers' pneumoconiosis (CWP) risk in a Chinese population. Methods: Six polymorphisms (C-590T in IL-4; Ile50Val, Ser478Pro, and Gln551Arg in IL-4R; C-1055T and Arg130Gln in IL-13) were genotyped and analyzed in a case-control study of 556 CWP and 541 control subjects. Results: Our results revealed that the IL-4 CT/CC genotypes were associated with a significantly decreased risk of CWP (odd

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central