A hierarchical Bayesian model for inference of copy number variants and their association to gene expression
A number of statistical models have been successfully developed for the
analysis of high-throughput data from a single source, but few methods are
available for integrating data from different sources. Here we focus on
integrating gene expression levels with comparative genomic hybridization (CGH)
array measurements collected on the same subjects. We specify a measurement
error model that relates the gene expression levels to latent copy number
states which, in turn, are related to the observed surrogate CGH measurements
via a hidden Markov model. We employ selection priors that exploit the
dependencies across adjacent copy number states and investigate MCMC stochastic
search techniques for posterior inference. Our approach results in a unified
modeling framework for simultaneously inferring copy number variants (CNV) and
identifying their significant associations with mRNA transcripts abundance. We
show its performance on simulated data and illustrate an application to data from a
genomic study on human cancer cell lines.
Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/)
by the Institute of Mathematical Statistics (http://www.imstat.org) at
http://dx.doi.org/10.1214/13-AOAS705.
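The hidden Markov part of such a model can be illustrated with a minimal sketch. It is not the authors' hierarchical Bayesian sampler. It assumes fixed, known parameters: three hypothetical states (loss, neutral, gain) with Gaussian emission means `state_means`, a shared noise level `sigma`, and a "sticky" transition probability `stay_prob`. It then decodes the most probable copy number state path for a sequence of CGH log-ratios with the Viterbi algorithm. The sticky transitions are what make adjacent probes tend to share a copy number state.

```python
import numpy as np


def viterbi_copy_number(log_ratios, state_means, sigma, stay_prob):
    """Most likely hidden copy number state path for a sequence of CGH log-ratios.

    Illustrative assumptions: Gaussian emissions with state-specific means and a
    shared standard deviation, a uniform initial distribution, and transitions
    that stay in the same state with probability stay_prob and otherwise move
    uniformly to one of the other states.
    """
    x = np.asarray(log_ratios, dtype=float)
    mu = np.asarray(state_means, dtype=float)
    n, k = len(x), len(mu)
    # Sticky transition matrix in log space.
    trans = np.full((k, k), (1.0 - stay_prob) / (k - 1))
    np.fill_diagonal(trans, stay_prob)
    log_trans = np.log(trans)
    # Gaussian log-likelihoods (n x k); the constant term is dropped because it
    # does not change which path is best.
    log_emit = -0.5 * ((x[:, None] - mu[None, :]) / sigma) ** 2
    score = np.log(1.0 / k) + log_emit[0]
    back = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        # cand[i, j]: best score of a path in state i at t-1 that moves to j at t.
        cand = score[:, None] + log_trans
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(k)] + log_emit[t]
    # Trace the best path backwards from the best final state.
    path = np.empty(n, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(n - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path
```

Using the hypothetical log-ratios `[0.02, -0.05, 0.01, 0.48, 0.55, 0.51, -0.03, 0.0]`, the state means `[-0.5, 0.0, 0.5]`, `sigma=0.15` and `stay_prob=0.9`, the decoded path is `[1, 1, 1, 2, 2, 2, 1, 1]`. This is a single gain segment flanked by neutral probes.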
Cancer gene prioritization by integrative analysis of mRNA expression and DNA copy number data: a comparative review
A variety of genome-wide profiling techniques are available to probe
complementary aspects of genome structure and function. Integrative analysis of
heterogeneous data sources can reveal higher-level interactions that cannot be
detected based on individual observations. A standard integration task in
cancer studies is to identify altered genomic regions that induce changes in
the expression of the associated genes based on joint analysis of genome-wide
gene expression and copy number profiling measurements. In this review, we
provide a comparison among various modeling procedures for integrating
genome-wide profiling data of gene copy number and transcriptional alterations
and highlight common approaches to genomic data integration. A transparent
benchmarking procedure is introduced to quantitatively compare the cancer gene
prioritization performance of the alternative methods. The benchmarking
algorithms and data sets are available at http://intcomp.r-forge.r-project.org.
Comment: PDF file including supplementary material. 9 pages. Preprint.
Sparse integrative clustering of multiple omics data sets
High resolution microarrays and second-generation sequencing platforms are
powerful tools to investigate genome-wide alterations in DNA copy number,
methylation and gene expression associated with a disease. An integrated
genomic profiling approach measures multiple omics data types simultaneously in
the same set of biological samples. Such an approach provides an integrated view of the data
at a resolution that would not be available with any single data type. In this
study, we use penalized latent variable regression methods for joint modeling
of multiple omics data types to identify common latent variables that can be
used to cluster patient samples into biologically and clinically relevant
disease subtypes. We consider lasso [J. Roy. Statist. Soc. Ser. B 58 (1996)
267-288], elastic net [J. R. Stat. Soc. Ser. B Stat. Methodol. 67 (2005)
301-320] and fused lasso [J. R. Stat. Soc. Ser. B Stat. Methodol. 67 (2005)
91-108] methods to induce sparsity in the coefficient vectors, revealing
important genomic features that have significant contributions to the latent
variables. An iterative ridge regression is used to compute the sparse
coefficient vectors. In model selection, a uniform design [Monographs on
Statistics and Applied Probability (1994) Chapman & Hall] is used to seek
"experimental" points that are scattered uniformly across the search domain for
efficient sampling of tuning parameter combinations. We compared our method to
sparse singular value decomposition (SVD) and penalized Gaussian mixture model
(GMM) using both real and simulated data sets. The proposed method is applied
to integrate genomic, epigenomic and transcriptomic data for subtype analysis
in breast and lung cancer data sets.
Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/)
by the Institute of Mathematical Statistics (http://www.imstat.org) at
http://dx.doi.org/10.1214/12-AOAS578.
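The idea of computing sparse coefficients by iterative ridge regression can be sketched for a single lasso-penalized regression. This is an illustration under assumptions, not the authors' latent variable algorithm. It minimizes 0.5 * ||y - X b||^2 + lam * ||b||_1 using a local quadratic approximation. Each |b_j| is replaced by b_j^2 / (2 |b_j_old|) plus a constant, so every iteration becomes a weighted ridge regression. The tolerances `eps` and `zero_tol` and the iteration count are hypothetical choices.

```python
import numpy as np


def lasso_iterative_ridge(X, y, lam, n_iter=200, eps=1e-8, zero_tol=1e-6):
    """Approximate lasso coefficients by repeated weighted ridge regressions.

    Objective (assumed scaling): 0.5 * ||y - X b||^2 + lam * ||b||_1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = X.shape[1]
    xtx = X.T @ X
    xty = X.T @ y
    # Start from a (very lightly regularized) least-squares fit.
    beta = np.linalg.solve(xtx + 1e-6 * np.eye(p), xty)
    for _ in range(n_iter):
        # Local quadratic approximation: the L1 penalty becomes a ridge penalty
        # with coordinate-specific weights 1 / |b_j| from the previous iterate.
        w = 1.0 / np.maximum(np.abs(beta), eps)
        beta = np.linalg.solve(xtx + lam * np.diag(w), xty)
    # Coefficients only shrink toward zero, so set the tiny ones exactly to zero.
    beta[np.abs(beta) < zero_tol] = 0.0
    return beta
```

For an orthonormal design such as `X = np.eye(3)`, the lasso solution is soft thresholding. So with `y = [3.0, -2.5, 0.5]` and `lam = 1.0`, the iterations should approach `[2.0, -1.5, 0.0]`. The final threshold is needed because this approximation drives small coefficients toward zero without ever reaching it exactly.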
Analysing multiple types of molecular profiles simultaneously: connecting the needles in the haystack
It has been shown that a random-effects framework can be used to test the
association between a gene's expression level and the number of DNA copies of a
set of genes. This gene-set modelling framework was later applied to find
associations between mRNA expression and microRNA expression, by defining the
gene sets using target prediction information.
Here, we extend the model introduced by Menezes et al. (2009) to consider the
effect on gene expression levels of not just copy number, but also of other molecular
profiles such as methylation changes and loss of heterozygosity (LOH). Again, we
consider sets of measurements, to improve the robustness of
results and increase the power to find associations. Our approach can be used
genome-wide to find associations, yields a test to help separate true
associations from noise and can include confounders.
We apply our method to colon and to breast cancer samples, for which
genome-wide copy number, methylation and gene expression profiles are
available. Our findings include interesting gene expression-regulating
mechanisms, which may involve only one of copy number or methylation, or both
for the same samples. We are even able to find effects due to different
molecular mechanisms in different samples.
Our method can equally well be applied to cases where other types of
molecular (high-dimensional) data are collected, such as LOH, SNP genotype and
microRNA expression data. Computationally efficient, it represents a flexible
and powerful tool to study associations between high-dimensional datasets. The
method is freely available via the SIM Bioconductor package.
Modeling association between DNA copy number and gene expression with constrained piecewise linear regression splines
DNA copy number and mRNA expression are widely used data types in cancer
studies, and together they provide more insight than either does separately. Whereas
the existing literature fixes the form of the relationship between these two types of
markers a priori, in this paper we model the form of their association. We employ
piecewise linear regression splines (PLRS), which combine good interpretation
with sufficient flexibility to identify any plausible type of relationship. The
specification of the model leads to estimation and model selection in a
constrained, nonstandard setting. We provide methodology for testing the effect
of DNA on mRNA and choosing the appropriate model. Furthermore, we present a
novel approach to obtain reliable confidence bands for constrained PLRS, which
incorporates model uncertainty. The procedures are applied to colorectal and
breast cancer data. Common assumptions are found to be potentially misleading
for biologically relevant genes. More flexible models may bring more insight into
the interaction between the two markers.
Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/)
by the Institute of Mathematical Statistics (http://www.imstat.org) at
http://dx.doi.org/10.1214/12-AOAS605.
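A piecewise linear regression spline with a single knot can be written with a hinge basis: mrna = b0 + b1 * dna + b2 * max(dna - knot, 0). The slope therefore changes from b1 to b1 + b2 at the knot. The sketch below fits that model by ordinary least squares for a fixed, hypothetical knot. It deliberately omits the constraints, the knot placement and the model selection that the constrained PLRS methodology adds, so it shows only the basic building block.

```python
import numpy as np


def fit_hinge_spline(dna, mrna, knot):
    """Unconstrained least-squares fit of mrna = b0 + b1*dna + b2*max(dna - knot, 0)."""
    x = np.asarray(dna, dtype=float)
    y = np.asarray(mrna, dtype=float)
    # Design matrix: intercept, linear term, and a hinge term that is zero left of the knot.
    design = np.column_stack([np.ones_like(x), x, np.maximum(x - knot, 0.0)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def segment_slopes(coef):
    """Slopes of the fitted line to the left and to the right of the knot."""
    return coef[1], coef[1] + coef[2]
```

Noise-free data generated as `1 + 0.5 * x + 1.5 * max(x - 2, 0)` on `x = 0, ..., 5` with `knot=2` return coefficients close to `(1.0, 0.5, 1.5)`. The corresponding slopes are 0.5 below the knot and 2.0 above it.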
Bayesian DNA copy number analysis
BACKGROUND: Some diseases, like tumors, can be related to chromosomal aberrations, leading to
changes of DNA copy number. The copy number of an aberrant genome can be represented as a
piecewise constant function, since it can exhibit regions of deletions or gains. In contrast, in a healthy
cell the copy number is two because we inherit one copy of each chromosome from each of our
parents.
Bayesian Piecewise Constant Regression (BPCR) is a Bayesian regression method for data that are
noisy observations of a piecewise constant function. The method estimates the unknown segment
number, the endpoints of the segments and the value of the segment levels of the underlying
piecewise constant function. The Bayesian Regression Curve (BRC) estimates the same data with
a smoothing curve. However, in the original formulation, some estimators failed to properly
determine the corresponding parameters. For example, the boundary estimator did not take into
account the dependency among the boundaries and could therefore place more than one
breakpoint at the same position, thereby losing segments.
RESULTS: We derived an improved version of the BPCR (called mBPCR) and BRC, changing the
segment number estimator and the boundary estimator to enhance the fitting procedure. We also
proposed an alternative estimator of the variance of the segment levels, which is useful in the case of
very noisy data. Using artificial data, we compared the original and the modified version of
BPCR and BRC with other regression methods, showing that our improved version of BPCR
generally outperformed all the others. Similar results were also observed on real data.
CONCLUSION: We propose an improved method for DNA copy number estimation, mBPCR, which
performed very well compared to previously published algorithms. In particular, mBPCR was more
powerful in the detection of the true position of the breakpoints and of small aberrations in very
noisy data. Hence, from a biological point of view, our method can be very useful, for example, to
find targets of genomic aberrations in clinical cancer samples.
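Estimating the boundaries and levels of a piecewise constant function can be illustrated with a simpler, non-Bayesian relative of BPCR: exact least-squares segmentation by dynamic programming for a fixed, user-chosen number of segments. This is a swapped-in technique for illustration only. Unlike mBPCR, it neither estimates the number of segments nor uses Bayesian estimators. Because every segment must contain at least one probe, two breakpoints can never fall at the same position.

```python
import numpy as np


def segment_piecewise_constant(y, n_segments):
    """Least-squares optimal split of y into n_segments constant pieces.

    Returns half-open (start, end) index ranges and the fitted segment levels.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    if not 1 <= n_segments <= n:
        raise ValueError("need 1 <= n_segments <= len(y)")
    # Prefix sums give the squared error of any segment y[i:j] in O(1).
    s1 = np.concatenate(([0.0], np.cumsum(y)))
    s2 = np.concatenate(([0.0], np.cumsum(y ** 2)))

    def sse(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    # cost[k, j]: smallest error for y[:j] split into k non-empty segments.
    cost = np.full((n_segments + 1, n + 1), np.inf)
    start = np.zeros((n_segments + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1, i] + sse(i, j)
                if c < cost[k, j]:
                    cost[k, j] = c
                    start[k, j] = i
    # Backtrack the segment start positions from the end of the sequence.
    bounds = []
    j = n
    for k in range(n_segments, 0, -1):
        i = int(start[k, j])
        bounds.append((i, j))
        j = i
    bounds.reverse()
    levels = [float(y[i:j].mean()) for i, j in bounds]
    return bounds, levels
```

For the hypothetical profile `[2, 2, 2, 1, 1, 1, 1, 3, 3]` with three segments, the boundaries are `[(0, 3), (3, 7), (7, 9)]` and the levels are `[2.0, 1.0, 3.0]`. The triple loop costs O(K n^2), which is acceptable for a sketch but slow for dense arrays.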
Methods for Joint Normalization and Comparison of Hi-C data
The development of chromatin conformation capture technology has opened new avenues of study into the 3D structure and function of the genome. Chromatin structure is known to influence gene regulation, and differences in structure are now emerging as a mechanism of regulation between, e.g., cell differentiation and disease vs. normal states. Hi-C sequencing technology now provides a way to study the 3D interactions of the chromatin over the whole genome. However, like all sequencing technologies, Hi-C suffers from several forms of bias stemming from both the technology and the DNA sequence itself. Several normalization methods have been developed for normalizing individual Hi-C datasets, but little work has been done on developing joint normalization methods for comparing two or more Hi-C datasets. To make full use of Hi-C data, joint normalization and statistical comparison techniques are needed to carry out experiments to identify regions where chromatin structure differs between conditions.
We develop methods for the joint normalization and comparison of two Hi-C datasets, which we then extended to more complex experimental designs. Our normalization method is novel in that it makes use of the distance-dependent nature of chromatin interactions. Our modification of the Minus vs. Average (MA) plot to the Minus vs. Distance (MD) plot allows for a nonparametric data-driven normalization technique using loess smoothing. Additionally, we present a simple statistical method using Z-scores for detecting differentially interacting regions between two datasets. Our initial method was published as the Bioconductor R package HiCcompare [http://bioconductor.org/packages/HiCcompare/](http://bioconductor.org/packages/HiCcompare/).
We then further extended our normalization and comparison method for use in complex Hi-C experiments with more than two datasets and optional covariates. We extended the normalization method to jointly normalize any number of Hi-C datasets by using a cyclic loess procedure on the MD plot. The cyclic loess normalization technique can remove between dataset biases efficiently and effectively even when several datasets are analyzed at one time. Our comparison method implements a generalized linear model-based approach for comparing complex Hi-C experiments, which may have more than two groups and additional covariates. The extended methods are also available as a Bioconductor R package [http://bioconductor.org/packages/multiHiCcompare/](http://bioconductor.org/packages/multiHiCcompare/). Finally, we demonstrate the use of HiCcompare and multiHiCcompare in several test cases on real data in addition to comparing them to other similar methods (https://doi.org/10.1002/cpbi.76).
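The MD-plot idea can be sketched as follows. For each pair of bins, M is the log2 ratio of the two datasets' interaction frequencies and D is the distance between the bins, measured here in bin units. HiCcompare fits a loess curve of M on D and removes it. The sketch below swaps in a simpler per-distance median as the bias estimate. It then standardizes the adjusted M values within each distance to obtain Z-scores, which is also an assumption. This is an illustration only, not the package's implementation or API, and it expects strictly positive interaction frequencies.

```python
import numpy as np


def md_normalize(bin_i, bin_j, if1, if2):
    """Per-distance normalization of log2 ratios between two Hi-C datasets.

    Returns distances D, normalized M values, and within-distance Z-scores.
    """
    bin_i = np.asarray(bin_i)
    bin_j = np.asarray(bin_j)
    if1 = np.asarray(if1, dtype=float)
    if2 = np.asarray(if2, dtype=float)
    d = np.abs(bin_j - bin_i)   # D: distance between the two bins
    m = np.log2(if2 / if1)      # M: log2 ratio of interaction frequencies
    m_norm = np.empty_like(m)
    z = np.zeros_like(m)
    for dist in np.unique(d):
        idx = d == dist
        # Remove the distance-specific bias (a median instead of a loess fit).
        m_norm[idx] = m[idx] - np.median(m[idx])
        sd = m_norm[idx].std()
        if sd > 0:
            # Standardize within this distance; single-pair distances keep Z = 0.
            z[idx] = (m_norm[idx] - m_norm[idx].mean()) / sd
    return d, m_norm, z
```

With four hypothetical pairs at distance 1, where dataset 2 is twice dataset 1 except for one pair that is eight times larger, the normalized M values are `[0, 0, 0, 2]`. The outlying pair receives the largest Z-score, sqrt(3) (about 1.73).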
Negative association of the chemokine receptor CCR5 d32 polymorphism with systemic inflammatory response, extra-articular symptoms and joint erosion in rheumatoid arthritis
Introduction: Chemokines and their receptors control immune cell migration during infections as well as in autoimmune responses. A 32 bp deletion in the gene of the chemokine receptor CCR5 confers protection against HIV infection, but has also been reported to decrease susceptibility to rheumatoid arthritis (RA). The influence of this deletion variant on the clinical course of this autoimmune disease was investigated.
Methods: Genotyping for CCR5d32 was performed by PCR and subsequent electrophoretic fragment length determination. For the clinical analysis, the following extra-articular manifestations of RA were documented by the rheumatologist following the patient: presence of rheumatoid nodules, major organ vasculitis, pulmonary fibrosis, serositis or Raynaud's syndrome. All documented CRP levels were analyzed retrospectively, and the last available hand and foot radiographs were analyzed with regard to the presence or absence of erosive disease.
Results: Analysis of the CCR5 polymorphism in 503 RA patients and in 459 age-matched healthy controls revealed a significantly decreased disease susceptibility for carriers of the CCR5d32 deletion (Odds ratio 0.67, P = 0.0437). Within the RA patient cohort, CCR5d32 was significantly less frequent in patients with extra-articular manifestations compared with those with limited, articular disease (13.2% versus 22.8%, P = 0.0374). In addition, the deletion was associated with significantly lower average CRP levels over time (median 8.85 vs. median 14.1, P = 0.0041) and had a protective effect against the development of erosive disease (OR = 0.40, P = 0.0047). Intriguingly, homozygosity for the RA-associated DNASE2 -1066 G allele had an additive effect on the disease susceptibility conferred by the wild-type (wt) allele of CCR5 (OR = 2.24, P = 0.0051 for carriers of both RA-associated alleles).
Conclusions: The presence of CCR5d32 significantly influenced disease susceptibility to and clinical course of RA in a German study population. The protective effect of this deletion, which has been described to lead to a decreased receptor expression in heterozygous patients, underlines the importance of chemokines in the pathogenesis of RA.
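The odds ratios reported above compare the odds of carrying an allele among cases with the odds among controls. Take a 2x2 table with a exposed cases, b unexposed cases, c exposed controls and d unexposed controls. Then OR = (a*d)/(b*c). A common approximate 95% interval (Woolf's method) is exp(ln OR ± 1.96*sqrt(1/a + 1/b + 1/c + 1/d)). The sketch below computes both. The counts in the usage note are hypothetical and are not the study's data, and the study's own statistical tests may have differed.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval for a 2x2 table.

    a: exposed cases, b: unexposed cases, c: exposed controls, d: unexposed controls.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive for this simple interval")
    or_ = (a * d) / (b * c)
    # Standard error of ln(OR) under Woolf's approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi
```

With hypothetical counts a=12, b=30, c=20 and d=25, the odds ratio is (12*25)/(30*20) = 0.5, meaning exposure is associated with lower odds of being a case.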
Tandem quadruplication of HMA4 in the zinc (Zn) and cadmium (Cd) hyperaccumulator Noccaea caerulescens
Zinc (Zn) and cadmium (Cd) hyperaccumulation may have evolved twice in the Brassicaceae, in Arabidopsis halleri and in the Noccaea genus. Tandem gene duplication and deregulated expression of the Zn transporter, HMA4, have previously been linked to Zn/Cd hyperaccumulation in A. halleri. Here, we tested the hypothesis that tandem duplication and deregulation of HMA4 expression also occur in Noccaea. A Noccaea caerulescens genomic library was generated, containing 36,864 fosmid pCC1FOS™ clones with insert sizes of ~20–40 kbp, and screened with a PCR-generated HMA4 genomic probe. Gene copy number within the genome was estimated through DNA fingerprinting and pooled fosmid pyrosequencing. Gene copy numbers within individual clones were determined by PCR analyses with novel locus-specific primers. Entire fosmids were then sequenced individually, and reads equivalent to 20-fold coverage were assembled to generate complete contigs. Four tandem HMA4 repeats were identified in a contiguous sequence of 101,480 bp based on sequence overlap identities. These were flanked by regions syntenic with the upstream and downstream regions of AtHMA4 in Arabidopsis thaliana. Promoter-reporter β-glucuronidase (GUS) fusion analysis of an NcHMA4 promoter in A. thaliana revealed deregulated expression in roots and shoots, analogous to AhHMA4 promoters but distinct from AtHMA4 expression, which localised to the root vascular tissue. This remarkable consistency in tandem duplication and deregulated expression of metal transport genes between N. caerulescens and A. halleri, which last shared a common ancestor >40 mya, provides intriguing evidence that parallel evolutionary pathways may underlie Zn/Cd hyperaccumulation in the Brassicaceae.
Expansion of the Parkinson disease-associated SNCA-Rep1 allele upregulates human alpha-synuclein in transgenic mouse brain.
The alpha-synuclein (SNCA) gene has been implicated in the development of rare forms of familial Parkinson disease (PD). Recently, it was shown that an increase in SNCA copy number leads to elevated levels of wild-type SNCA-mRNA and protein and is sufficient to cause early-onset, familial PD. A critical question concerning the molecular pathogenesis of PD is what contributory role, if any, is played by the SNCA gene in sporadic PD. The expansion of SNCA-Rep1, an upstream, polymorphic microsatellite of the SNCA gene, is associated with elevated risk for sporadic PD. However, whether SNCA-Rep1 is the causal variant, and the mechanism by which its effect is mediated, remained elusive. We report here the effects of three distinct SNCA-Rep1 variants in the brains of 72 mice transgenic for the entire human SNCA locus. Human SNCA-mRNA and protein levels were increased 1.7- and 1.25-fold, respectively, in homozygotes for the expanded, PD risk-conferring allele compared with homozygotes for the shorter, protective allele. When adjusting for the total SNCA-protein concentration (endogenous mouse and transgenic human) expressed in each brain, the expanded risk allele contributed 2.6-fold more to the SNCA steady-state than the shorter allele. Furthermore, targeted deletion of Rep1 resulted in the lowest human SNCA-mRNA and protein concentrations in the murine brain. In contrast, the Rep1 effect was not observed in blood lysates from the same mice. These results demonstrate that Rep1 regulates human SNCA expression by enhancing its transcription in the adult nervous system and suggest that homozygosity for the expanded Rep1 allele may mimic locus multiplication, thereby elevating PD risk.