A hierarchical Bayesian model for inference of copy number variants and their association to gene expression

Author: Cassese Alberto
Falciani Francesco
Guindani Michele
Tadesse Mahlet G.
Vannucci Marina
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2014

A number of statistical models have been successfully developed for the analysis of high-throughput data from a single source, but few methods are available for integrating data from different sources. Here we focus on integrating gene expression levels with comparative genomic hybridization (CGH) array measurements collected on the same subjects. We specify a measurement error model that relates the gene expression levels to latent copy number states which, in turn, are related to the observed surrogate CGH measurements via a hidden Markov model. We employ selection priors that exploit the dependencies across adjacent copy number states and investigate MCMC stochastic search techniques for posterior inference. Our approach results in a unified modeling framework for simultaneously inferring copy number variants (CNV) and identifying their significant associations with mRNA transcript abundance. We show performance on simulated data and illustrate an application to data from a genomic study on human cancer cell lines. Comment: Published at http://dx.doi.org/10.1214/13-AOAS705 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

Maastricht University Research Portal

Crossref

Florence Research

PubMed Central

eScholarship - University of California

DSpace at Rice University

Cancer gene prioritization by integrative analysis of mRNA expression and DNA copy number data: a comparative review

Author: Akavia
Andrews
Baasiri
Chin
Dai
De Bie
Futreal
H.-U. Klein
Haverty
Hawkins
Hyman
Johnson
Kao
L. Lahti
M. Dugas
M. Schafer
McLendon
Menezes
Mullighan
Myllykangas
Olshen
Ortiz-Estevez
Phillips
Qin
S. Bicciato
Solvang
Soneson
Stranger
van Wieringen
Publication venue: 'Oxford University Press (OUP)'
Publication date: 20/11/2011

A variety of genome-wide profiling techniques are available to probe complementary aspects of genome structure and function. Integrative analysis of heterogeneous data sources can reveal higher-level interactions that cannot be detected based on individual observations. A standard integration task in cancer studies is to identify altered genomic regions that induce changes in the expression of the associated genes based on joint analysis of genome-wide gene expression and copy number profiling measurements. In this review, we provide a comparison among various modeling procedures for integrating genome-wide profiling data of gene copy number and transcriptional alterations and highlight common approaches to genomic data integration. A transparent benchmarking procedure is introduced to quantitatively compare the cancer gene prioritization performance of the alternative methods. The benchmarking algorithms and data sets are available at http://intcomp.r-forge.r-project.org. Comment: PDF file including supplementary material. 9 pages. Preprin

arXiv.org e-Print Archive

Crossref

PubMed Central

Wageningen University & Research Publications

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

Sparse integrative clustering of multiple omics data sets

Author: Mo Qianxing
Shen Ronglai
Wang Sijian
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 13/02/2012

High-resolution microarrays and second-generation sequencing platforms are powerful tools to investigate genome-wide alterations in DNA copy number, methylation and gene expression associated with a disease. An integrated genomic profiling approach measures multiple omics data types simultaneously in the same set of biological samples. Such an approach renders an integrated data resolution that would not be available with any single data type. In this study, we use penalized latent variable regression methods for joint modeling of multiple omics data types to identify common latent variables that can be used to cluster patient samples into biologically and clinically relevant disease subtypes. We consider lasso [J. Roy. Statist. Soc. Ser. B 58 (1996) 267-288], elastic net [J. R. Stat. Soc. Ser. B Stat. Methodol. 67 (2005) 301-320] and fused lasso [J. R. Stat. Soc. Ser. B Stat. Methodol. 67 (2005) 91-108] methods to induce sparsity in the coefficient vectors, revealing important genomic features that have significant contributions to the latent variables. An iterative ridge regression is used to compute the sparse coefficient vectors. In model selection, a uniform design [Monographs on Statistics and Applied Probability (1994) Chapman & Hall] is used to seek "experimental" points that are scattered uniformly across the search domain for efficient sampling of tuning parameter combinations. We compared our method to sparse singular value decomposition (SVD) and penalized Gaussian mixture model (GMM) using both real and simulated data sets. The proposed method is applied to integrate genomic, epigenomic and transcriptomic data for subtype analysis in breast and lung cancer data sets. Comment: Published at http://dx.doi.org/10.1214/12-AOAS578 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

Crossref

PubMed Central

Collection Of Biostatistics Research Archive

Analysing multiple types of molecular profiles simultaneously: connecting the needles in the haystack

Author: Boer Judith
Goeman Jelle
Menezes Renée
Mohammadi Leila
Publication date: 08/10/2015

It has been shown that a random-effects framework can be used to test the association between a gene's expression level and the number of DNA copies of a set of genes. This gene-set modelling framework was later applied to find associations between mRNA expression and microRNA expression, by defining the gene sets using target prediction information. Here, we extend the model introduced by Menezes et al. (2009) to consider the effect of not just copy number, but also of other molecular profiles such as methylation changes and loss-of-heterozygosity (LOH), on gene expression levels. We again consider sets of measurements, to improve robustness of results and increase the power to find associations. Our approach can be used genome-wide to find associations, yields a test to help separate true associations from noise and can include confounders. We apply our method to colon and to breast cancer samples, for which genome-wide copy number, methylation and gene expression profiles are available. Our findings include interesting gene expression-regulating mechanisms, which may involve only one of copy number or methylation, or both for the same samples. We are even able to find effects due to different molecular mechanisms in different samples. Our method can equally well be applied to cases where other types of molecular (high-dimensional) data are collected, such as LOH, SNP genotype and microRNA expression data. Computationally efficient, it represents a flexible and powerful tool to study associations between high-dimensional datasets. The method is freely available via the SIM BioConductor package.

arXiv.org e-Print Archive

Crossref

PubMed Central

EUR Research Repository

Leiden University Scholarly Publications

Erasmus University Digital Repository

FigShare

Modeling association between DNA copy number and gene expression with constrained piecewise linear regression splines

Author: Leday Gwenaël G. R.
van de Wiel Mark A.
van der Vaart Aad W.
van Wieringen Wessel N.
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2013

DNA copy number and mRNA expression are widely used data types in cancer studies, which, combined, provide more insight than either does separately. Whereas in existing literature the form of the relationship between these two types of markers is fixed a priori, in this paper we model their association. We employ piecewise linear regression splines (PLRS), which combine good interpretation with sufficient flexibility to identify any plausible type of relationship. The specification of the model leads to estimation and model selection in a constrained, nonstandard setting. We provide methodology for testing the effect of DNA on mRNA and choosing the appropriate model. Furthermore, we present a novel approach to obtain reliable confidence bands for constrained PLRS, which incorporates model uncertainty. The procedures are applied to colorectal and breast cancer data. Common assumptions are found to be potentially misleading for biologically relevant genes. More flexible models may bring more insight into the interaction between the two markers. Comment: Published at http://dx.doi.org/10.1214/12-AOAS605 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

VU Research Portal

Crossref

Leiden University Scholarly Publications

Bayesian DNA copy number analysis

Author: Bertoni Francesco
Hutter Marcus
Kwee Ivo
Rancoita P M V
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 24/02/2016

BACKGROUND: Some diseases, like tumors, can be related to chromosomal aberrations, leading to changes of DNA copy number. The copy number of an aberrant genome can be represented as a piecewise constant function, since it can exhibit regions of deletions or gains. In a healthy cell, by contrast, the copy number is two because we inherit one copy of each chromosome from each of our parents. Bayesian Piecewise Constant Regression (BPCR) is a Bayesian regression method for data that are noisy observations of a piecewise constant function. The method estimates the unknown segment number, the endpoints of the segments and the value of the segment levels of the underlying piecewise constant function. The Bayesian Regression Curve (BRC) estimates the same data with a smoothing curve. However, in the original formulation, some estimators failed to properly determine the corresponding parameters. For example, the boundary estimator did not take into account the dependency among the boundaries and could place more than one breakpoint at the same position, losing segments. RESULTS: We derived an improved version of the BPCR (called mBPCR) and BRC, changing the segment number estimator and the boundary estimator to enhance the fitting procedure. We also proposed an alternative estimator of the variance of the segment levels, which is useful in case of data with high noise. Using artificial data, we compared the original and the modified version of BPCR and BRC with other regression methods, showing that our improved version of BPCR generally outperformed all the others. Similar results were also observed on real data. CONCLUSION: We propose an improved method for DNA copy number estimation, mBPCR, which performed very well compared to previously published algorithms. In particular, mBPCR was more powerful in the detection of the true position of the breakpoints and of small aberrations in very noisy data. Hence, from a biological point of view, our method can be very useful, for example, to find targets of genomic aberrations in clinical cancer samples.

The Australian National University

Methods for Joint Normalization and Comparison of Hi-C data

Author: Stansfield John C
Publication venue: VCU Scholars Compass
Publication date: 01/01/2019

The development of chromatin conformation capture technology has opened new avenues of study into the 3D structure and function of the genome. Chromatin structure is known to influence gene regulation, and differences in structure are now emerging as a mechanism of regulation between, e.g., cell differentiation and disease vs. normal states. Hi-C sequencing technology now provides a way to study the 3D interactions of the chromatin over the whole genome. However, like all sequencing technologies, Hi-C suffers from several forms of bias stemming from both the technology and the DNA sequence itself. Several normalization methods have been developed for normalizing individual Hi-C datasets, but little work has been done on developing joint normalization methods for comparing two or more Hi-C datasets. To make full use of Hi-C data, joint normalization and statistical comparison techniques are needed to carry out experiments to identify regions where chromatin structure differs between conditions. We develop methods for the joint normalization and comparison of two Hi-C datasets, which we then extend to more complex experimental designs. Our normalization method is novel in that it makes use of the distance-dependent nature of chromatin interactions. Our modification of the Minus vs. Average (MA) plot to the Minus vs. Distance (MD) plot allows for a nonparametric data-driven normalization technique using loess smoothing. Additionally, we present a simple statistical method using Z-scores for detecting differentially interacting regions between two datasets. Our initial method was published as the Bioconductor R package HiCcompare (http://bioconductor.org/packages/HiCcompare/). We then further extended our normalization and comparison method for use in complex Hi-C experiments with more than two datasets and optional covariates. We extended the normalization method to jointly normalize any number of Hi-C datasets by using a cyclic loess procedure on the MD plot. The cyclic loess normalization technique can remove between-dataset biases efficiently and effectively even when several datasets are analyzed at one time. Our comparison method implements a generalized linear model-based approach for comparing complex Hi-C experiments, which may have more than two groups and additional covariates. The extended methods are also available as the Bioconductor R package multiHiCcompare (http://bioconductor.org/packages/multiHiCcompare/). Finally, we demonstrate the use of HiCcompare and multiHiCcompare in several test cases on real data in addition to comparing them to other similar methods (https://doi.org/10.1002/cpbi.76).

VCU Scholars Compass

Negative association of the chemokine receptor CCR5 d32 polymorphism with systemic inflammatory response, extra-articular symptoms and joint erosion in rheumatoid arthritis

Author: Arnold Sybille
Baerwald Christoph
Burkhardt Harald
Keyßer Gernot
Pierer Matthias
Rossol Manuela
Wagner Ulf
Publication date: 01/01/2009

Introduction: Chemokines and their receptors control immune cell migration during infections as well as in autoimmune responses. A 32 bp deletion in the gene of the chemokine receptor CCR5 confers protection against HIV infection, but has also been reported to decrease susceptibility to rheumatoid arthritis (RA). The influence of this deletion variant on the clinical course of this autoimmune disease was investigated. Methods: Genotyping for CCR5d32 was performed by PCR and subsequent electrophoretic fragment length determination. For the clinical analysis, the following extra-articular manifestations of RA were documented by the rheumatologist following the patient: presence of rheumatoid nodules, major organ vasculitis, pulmonary fibrosis, serositis or Raynaud's syndrome. All documented CRP levels were analyzed retrospectively, and the last available hand and feet radiographs were analyzed with regard to the presence or absence of erosive disease. Results: Analysis of the CCR5 polymorphism in 503 RA patients and in 459 age-matched healthy controls revealed a significantly decreased disease susceptibility for carriers of the CCR5d32 deletion (odds ratio 0.67, P = 0.0437). Within the RA patient cohort, CCR5d32 was significantly less frequent in patients with extra-articular manifestations compared with those with limited, articular disease (13.2% versus 22.8%, P = 0.0374). In addition, the deletion was associated with significantly lower average CRP levels over time (median 8.85 vs. median 14.1, P = 0.0041) and had a protective effect against the development of erosive disease (OR = 0.40, P = 0.0047). Intriguingly, homozygosity for the RA-associated DNASE2 -1066 G allele had an additive effect on the disease susceptibility conferred by the wt allele of CCR5 (OR = 2.24, P = 0.0051 for carriers of both RA-associated alleles). Conclusions: The presence of CCR5d32 significantly influenced disease susceptibility to and clinical course of RA in a German study population. The protective effect of this deletion, which has been described to lead to a decreased receptor expression in heterozygous patients, underlines the importance of chemokines in the pathogenesis of RA.

Crossref

PubMed Central

Hochschulschriftenserver - Universität Frankfurt am Main

Tandem quadruplication of HMA4 in the zinc (Zn) and cadmium (Cd) hyperaccumulator Noccaea caerulescens

Author: A Papoyan
AJ Pollard
B Frey
C Bernard
C Koncz
CK Wong
D Bikard
D Hussain
D Roze
E Lombi
E Pettersson
EP Colangelo
F Verret
G Jiménez-Ambriz
Graham J. King
H Küpper
Helen C. Bowen
IN Talke
Ivan Baxter
J Sambrook
J Wild
JE van de Mortel
JF Ma
John P. Hammond
JP Hammond
K Higgins
K Munamenhof
L Santuari
M Courbot
M Hanikenne
M Koch
M Meyer
MA Beilstein
Martin R. Broadley
MJ Haydon
MR Broadley
MR Macnair
Neil S. Graham
P Nyrén
Philip J. White
PJ White
R Riley
RA Jefferson
RA Swanson-Wagner
RF Mills
Rupert G. Fray
S Dubois
S Ó Lochlainn
SA Sinclair
Seosamh Ó Lochlainn
SI Taylor
SJ Clough
T Nakagawa
U Krämer
UJ Kim
WA Peer
X Wang
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2011

Zinc (Zn) and cadmium (Cd) hyperaccumulation may have evolved twice in the Brassicaceae, in Arabidopsis halleri and in the Noccaea genus. Tandem gene duplication and deregulated expression of the Zn transporter, HMA4, has previously been linked to Zn/Cd hyperaccumulation in A. halleri. Here, we tested the hypothesis that tandem duplication and deregulation of HMA4 expression also occurs in Noccaea. A Noccaea caerulescens genomic library was generated, containing 36,864 fosmid pCC1FOS™ clones with insert sizes ~20–40 kbp, and screened with a PCR-generated HMA4 genomic probe. Gene copy number within the genome was estimated through DNA fingerprinting and pooled fosmid pyrosequencing. Gene copy numbers within individual clones were determined by PCR analyses with novel locus-specific primers. Entire fosmids were then sequenced individually and reads equivalent to 20-fold coverage were assembled to generate complete whole contigs. Four tandem HMA4 repeats were identified in a contiguous sequence of 101,480 bp based on sequence overlap identities. These were flanked by regions syntenous with up and downstream regions of AtHMA4 in Arabidopsis thaliana. Promoter-reporter β-glucuronidase (GUS) fusion analysis of a NcHMA4 in A. thaliana revealed deregulated expression in roots and shoots, analogous to AhHMA4 promoters, but distinct from AtHMA4 expression which localised to the root vascular tissue. This remarkable consistency in tandem duplication and deregulated expression of metal transport genes between N. caerulescens and A. halleri, which last shared a common ancestor >40 mya, provides intriguing evidence that parallel evolutionary pathways may underlie Zn/Cd hyperaccumulation in Brassicaceae.

Central Archive at the University of Reading

Public Library of Science (PLOS)

ePublications@SCU

Crossref

Directory of Open Access Journals

PubMed Central

Warwick Research Archives Portal Repository

Rothamsted Repository

Expansion of the Parkinson disease-associated SNCA-Rep1 allele upregulates human alpha-synuclein in transgenic mouse brain.

Author: Bernard David J
Chiba-Falek Ornit
Cronin Kenneth D
El-Agnaf Omar MA
Ge Dongliang
Linnertz Colton
Manninger Paul
Nussbaum Robert L
Orrison Bonnie M
Rossoshek Anna
Schlossmacher Michael G
Publication venue: eScholarship, University of California
Publication date: 04/06/2009

The alpha-synuclein (SNCA) gene has been implicated in the development of rare forms of familial Parkinson disease (PD). Recently, it was shown that an increase in SNCA copy numbers leads to elevated levels of wild-type SNCA-mRNA and protein and is sufficient to cause early-onset, familial PD. A critical question concerning the molecular pathogenesis of PD is what contributory role, if any, is played by the SNCA gene in sporadic PD. The expansion of SNCA-Rep1, an upstream, polymorphic microsatellite of the SNCA gene, is associated with elevated risk for sporadic PD. However, whether SNCA-Rep1 is the causal variant, and the underlying mechanism by which its effect is mediated, have remained elusive. We report here the effects of three distinct SNCA-Rep1 variants in the brains of 72 mice transgenic for the entire human SNCA locus. Human SNCA-mRNA and protein levels were increased 1.7- and 1.25-fold, respectively, in homozygotes for the expanded, PD risk-conferring allele compared with homozygotes for the shorter, protective allele. When adjusting for the total SNCA-protein concentration (endogenous mouse and transgenic human) expressed in each brain, the expanded risk allele contributed 2.6-fold more to the SNCA steady-state than the shorter allele. Furthermore, targeted deletion of Rep1 resulted in the lowest human SNCA-mRNA and protein concentrations in murine brain. In contrast, the Rep1 effect was not observed in blood lysates from the same mice. These results demonstrate that Rep1 regulates human SNCA expression by enhancing its transcription in the adult nervous system and suggest that homozygosity for the expanded Rep1 allele may mimic locus multiplication, thereby elevating PD risk.

PubMed Central

eScholarship - University of California