1,727 research outputs found

Modeling association between DNA copy number and gene expression with constrained piecewise linear regression splines

Author: Leday Gwenaël G. R.
van de Wiel Mark A.
van der Vaart Aad W.
van Wieringen Wessel N.
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2013

DNA copy number and mRNA expression are widely used data types in cancer studies, which, combined, provide more insight than either does separately. Whereas in existing literature the form of the relationship between these two types of markers is fixed a priori, in this paper we model their association. We employ piecewise linear regression splines (PLRS), which combine good interpretation with sufficient flexibility to identify any plausible type of relationship. The specification of the model leads to estimation and model selection in a constrained, nonstandard setting. We provide methodology for testing the effect of DNA on mRNA and choosing the appropriate model. Furthermore, we present a novel approach to obtain reliable confidence bands for constrained PLRS, which incorporates model uncertainty. The procedures are applied to colorectal and breast cancer data. Common assumptions are found to be potentially misleading for biologically relevant genes. More flexible models may bring more insight into the interaction between the two markers. Comment: Published at http://dx.doi.org/10.1214/12-AOAS605 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
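To make the idea of a piecewise linear regression spline concrete, the sketch below evaluates a continuous spline whose slope changes at fixed knots. It is a generic illustration, not the constrained PLRS estimator from the paper: the function name `plrs_predict`, the knot position, and the coefficient values are all assumptions chosen for the example, and no fitting, constraints, or model selection are shown.

```python
def plrs_predict(x, intercept, slope, knots, slope_changes):
    """Evaluate a continuous piecewise linear spline at x.

    The slope starts at `slope` and changes by `slope_changes[j]`
    once x passes `knots[j]` (hinge basis: max(0, x - knot)).
    All names and values here are illustrative assumptions.
    """
    y = intercept + slope * x
    for knot, change in zip(knots, slope_changes):
        y += change * max(0.0, x - knot)
    return y


# Hypothetical gene: expression rises with copy number up to a knot
# at 0.0, then plateaus (the slope change cancels the initial slope).
for copy_number in (-1.0, 0.0, 2.0):
    print(copy_number, plrs_predict(copy_number, 1.0, 0.5, [0.0], [-0.5]))
```

Placing knots at the boundaries between copy-number states would let each state have its own slope, which is one way such a model can stay interpretable while remaining flexible.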

arXiv.org e-Print Archive

VU Research Portal

Crossref

Leiden University Scholarly Publications

Bayesian Hidden Markov Modeling of Array CGH Data

Author: Guha Subharup
Li Yi
Neuberg Donna
Publication venue: Collection of Biostatistics Research Archive
Publication date: 13/10/2006

Genomic alterations have been linked to the development and progression of cancer. The technique of Comparative Genomic Hybridization (CGH) yields data consisting of fluorescence intensity ratios of test and reference DNA samples. The intensity ratios provide information about the number of copies in DNA. Practical issues such as the contamination of tumor cells in tissue specimens and normalization errors necessitate the use of statistics for learning about the genomic alterations from array-CGH data. As increasing amounts of array CGH data become available, there is a growing need for automated algorithms for characterizing genomic profiles. Specifically, there is a need for algorithms that can identify gains and losses in the number of copies based on statistical considerations, rather than merely detect trends in the data. We adopt a Bayesian approach, relying on the hidden Markov model to account for the inherent dependence in the intensity ratios. Posterior inferences are made about gains and losses in copy number. Localized amplifications (associated with oncogene mutations) and deletions (associated with mutations of tumor suppressors) are identified using posterior probabilities. Global trends such as extended regions of altered copy number are detected. Since the posterior distribution is analytically intractable, we implement a Metropolis-within-Gibbs algorithm for efficient simulation-based inference. Publicly available data on pancreatic adenocarcinoma, glioblastoma multiforme and breast cancer are analyzed, and comparisons are made with some widely-used algorithms to illustrate the reliability and success of the technique
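As a rough illustration of how a hidden Markov model can label intensity ratios as losses, neutral regions, or gains, the sketch below runs Viterbi decoding over a three-state model with Gaussian emissions. Note that this swaps in a simpler technique: the paper performs Bayesian posterior inference with a Metropolis-within-Gibbs sampler, whereas this example uses fixed parameters and a single most-likely path. The state means, standard deviation, and stay probability are assumed values, not taken from the paper.

```python
import math

STATES = ("loss", "neutral", "gain")
# Assumed emission means for log intensity ratios, shared std dev,
# and probability of remaining in the same state between probes.
MEANS = {"loss": -0.5, "neutral": 0.0, "gain": 0.5}
SIGMA = 0.2
STAY = 0.9


def log_gauss(x, mu, sigma):
    """Log density of a normal distribution."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def viterbi(ratios):
    """Return the most likely state sequence for a list of log ratios."""
    if not ratios:
        return []
    log_stay = math.log(STAY)
    log_move = math.log((1 - STAY) / (len(STATES) - 1))

    def trans(prev, cur):
        return log_stay if prev == cur else log_move

    # Uniform initial distribution over states.
    score = {s: -math.log(len(STATES)) + log_gauss(ratios[0], MEANS[s], SIGMA)
             for s in STATES}
    back = []
    for x in ratios[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + trans(p, s))
            new_score[s] = (score[best_prev] + trans(best_prev, s)
                            + log_gauss(x, MEANS[s], SIGMA))
            pointers[s] = best_prev
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    path = [max(STATES, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


ratios = [0.02, -0.05, 0.55, 0.48, 0.60, 0.01, -0.03, -0.45, -0.52, -0.50]
print(viterbi(ratios))
```

The high stay probability encodes the dependence between neighbouring probes, so isolated noisy values are less likely to be called as separate gains or losses.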

CiteSeerX

Collection Of Biostatistics Research Archive

Finding Recurrent Regions of Copy Number Variation: A Review

Author: Diaz-Uriarte Ramon
Rueda Oscar M.
Publication venue: Collection of Biostatistics Research Archive
Publication date: 18/11/2008

Copy number variation (CNV) in genomic DNA is linked to a variety of human diseases, and array-based CGH (aCGH) is currently the main technology to locate CNVs. Although many methods have been developed to analyze aCGH from a single array/subject, disease-critical genes are more likely to be found in regions that are common or recurrent among subjects. Unfortunately, finding recurrent CNV regions remains a challenge. We review existing methods for the identification of recurrent CNV regions. The working definition of "common" or "recurrent" region differs between methods, leading to approaches that use different types of input (discretized output from a previous CGH segmentation analysis or intensity ratios), or that incorporate to varied degrees biological considerations (which play a role in the identification of "interesting" regions and in the details of null models used to assess statistical significance). Very few approaches use and/or return probabilities, and code is not easily available for several methods. We suggest that finding recurrent CNVs could benefit from reframing the problem in a biclustering context. We also emphasize that, when analyzing data from complex diseases with significant among-subject heterogeneity, methods should be able to identify CNVs that affect only a subset of subjects. We make some recommendations about choice among existing methods, and we suggest further methodological research

Collection Of Biostatistics Research Archive

Functional Copy-Number Alterations in Cancer

Understanding the molecular basis of cancer requires characterization of its genetic defects. DNA microarray technologies can provide detailed raw data about chromosomal aberrations in tumor samples. Computational analysis is needed (1) to deduce from raw array data actual amplification or deletion events for chromosomal fragments and (2) to distinguish causal chromosomal alterations from functionally neutral ones. We present a comprehensive computational approach, RAE, designed to robustly map chromosomal alterations in tumor samples and assess their functional importance in cancer. To demonstrate the methodology, we experimentally profile copy number changes in a clinically aggressive subtype of soft-tissue sarcoma, pleomorphic liposarcoma, and computationally derive a portrait of candidate oncogenic alterations and their target genes. Many affected genes are known to be involved in sarcomagenesis; others are novel, including mediators of adipocyte differentiation, and may include valuable therapeutic targets. Taken together, we present a statistically robust methodology applicable to high-resolution genomic data to assess the extent and function of copy-number alterations in cancer

Public Library of Science (PLOS)

Crossref

Harvard University - DASH

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Inferring copy number and genotype in tumour exome data

Author: Amarasinghe Kaushalya C.
Campbell Ian G.
Cowin Prue A
Halgamuge Saman
Hunter Sally M
Li Jason
Ryland Georgina L.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 29/11/2018

Background: Using whole exome sequencing to predict aberrations in tumours is a cost-effective alternative to whole genome sequencing; however, it is predominantly used for variant detection and infrequently utilised for detection of somatic copy number variation. Results: We propose a new method to infer copy number and genotypes using whole exome data from paired tumour/normal samples. Our algorithm uses two Hidden Markov Models to predict copy number and genotypes and computationally resolves polyploidy/aneuploidy, normal cell contamination and signal baseline shift. Our method explicitly detects chromosome-arm-level events, which are commonly found in tumour samples. The methods are combined into a package named ADTEx (Aberration Detection in Tumour Exome). We applied our algorithm to a cohort of 17 in-house generated and 18 TCGA paired ovarian cancer/normal exomes and evaluated the performance by comparing against the copy number variations and genotypes predicted using Affymetrix SNP 6.0 data of the same samples. Further, we carried out a comparison study to show that ADTEx outperformed its competitors in terms of precision and F-measure. Conclusions: Our proposed method, ADTEx, uses both depth of coverage ratios and B allele frequencies calculated from whole exome sequencing data to predict copy number variations along with their genotypes. ADTEx is implemented as a user-friendly software package using Python and the R statistical language. Source code and sample data are freely available under GNU license (GPLv3) at http://adtex.sourceforge.net/
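The two input signals named in the abstract, depth of coverage ratios and B allele frequencies, can be illustrated with a short sketch. This is not ADTEx code: the function names, the library-size normalisation, and the read counts are assumptions made for the example, and the Hidden Markov Models that consume these signals are not shown.

```python
import math


def log2_coverage_ratio(tumour_depth, normal_depth, tumour_total, normal_total):
    """Log2 ratio of tumour to normal read depth for one exon.

    Depths are divided by each sample's total read count first, so
    that differences in sequencing depth between the two libraries
    do not masquerade as copy number changes (assumed normalisation).
    """
    tumour_fraction = tumour_depth / tumour_total
    normal_fraction = normal_depth / normal_total
    return math.log2(tumour_fraction / normal_fraction)


def b_allele_frequency(ref_reads, alt_reads):
    """Fraction of reads at a heterozygous site carrying the B (alt) allele."""
    total = ref_reads + alt_reads
    if total == 0:
        return None  # no coverage, frequency undefined
    return alt_reads / total


# Hypothetical exon with 1.5x relative coverage in the tumour.
print(log2_coverage_ratio(150, 100, 1_000_000, 1_000_000))
# Hypothetical heterozygous site skewed away from the expected 0.5.
print(b_allele_frequency(30, 10))
```

A positive log ratio suggests a gain and a negative one a loss, while a B allele frequency far from 0.5 at a germline-heterozygous site points to allelic imbalance; using the two together is what allows copy number and genotype to be inferred jointly.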

The Australian National University

A statistical approach for detecting genomic aberrations in heterogeneous tumor samples from single nucleotide polymorphism genotyping data

Author: Colella Stefano
Harris Adrian
Holmes Christopher C
Jorissen Robert N
Mirza Ghazala
Mouradov Dmitri
Ragoussis Jiannis
Sieber Oliver
Steers Graham
Yau Christopher
Publication venue: BioMed Central
Publication date: 01/01/2010

We describe a statistical method for the characterization of genomic aberrations in single nucleotide polymorphism microarray data acquired from cancer genomes. Our approach allows us to model the joint effect of polyploidy, normal DNA contamination and intra-tumour heterogeneity within a single unified Bayesian framework. We demonstrate the efficacy of our method on numerous datasets including laboratory generated mixtures of normal-cancer cell lines and real primary tumours

Springer - Publisher Connector

PubMed Central

Oxford University Research Archive

ProdInra

University of Melbourne Institutional Repository

FISH Oracle: a web server for flexible visualization of DNA copy number data in a genomic context

Author: Kurtz Stefan
Mader Malte
Simon Ronald
Steinbiss Sascha
Publication venue: BioMed Central
Publication date: 01/01/2011

Crossref

Springer - Publisher Connector

PubMed Central

A Multi-Sample Based Method for Identifying Common CNVs in Normal Human Genomic Structure Using High-Resolution aCGH Data

Author: AB Olshen
AJ Iafrate
BE Stranger
C Alkan
C Xie
Chihyun Park
DF Conrad
DP Locke
E Ben-Yaacov
F Hormozdiari
F Picard
GH Perry
H Lee
H Park
H Willenbrock
J Huang
JA Berger
Jaegyoon Ahn
JC Marioni
JI Kim
K Bleakley
K Wang
KK Wong
M Wigler
NP Carter
NR Zhang
Olivier Lespinet
OM Rueda
P Hupe
PHC Eilers
QY Zhang
R Pique-Regi
R Redon
R Tibshirani
RM Durbin
S Levy
Sanghyun Park
SJ Diskin
SP Shah
T LaFramboise
TS Price
TW Yu
WR Lai
Youngmi Yoon
Publication venue: Public Library of Science
Publication date

BACKGROUND: It is difficult to identify copy number variations (CNV) in normal human genomic data due to noise and non-linear relationships between different genomic regions and signal intensity. A high-resolution array comparative genomic hybridization (aCGH) containing 42 million probes, which is very large compared to previous arrays, was recently published. Most existing CNV detection algorithms do not work well because of noise associated with the large amount of input data and because most of the current methods were not designed to analyze normal human samples. Normal human genome analysis often requires a joint approach across multiple samples. However, the majority of existing methods can only identify CNVs from a single sample. METHODOLOGY AND PRINCIPAL FINDINGS: We developed a multi-sample-based genomic variations detector (MGVD) that uses segmentation to identify common breakpoints across multiple samples and a k-means-based clustering strategy. Unlike previous methods, MGVD simultaneously considers multiple samples with different genomic intensities and identifies CNVs and CNV zones (CNVZs); CNVZ is a more precise measure of the location of a genomic variant than the CNV region (CNVR). CONCLUSIONS AND SIGNIFICANCE: We designed a specialized algorithm to detect common CNVs from extremely high-resolution multi-sample aCGH data. MGVD showed high sensitivity and a low false discovery rate for a simulated data set, and outperformed most current methods when real, high-resolution HapMap datasets were analyzed. MGVD also had the fastest runtime compared to the other algorithms evaluated when actual, high-resolution aCGH data were analyzed. The CNVZs identified by MGVD can be used in association studies for revealing relationships between phenotypes and genomic aberrations. Our algorithm was developed with standard C++ and is available in Linux and MS Windows format in the STL library. It is freely available at: http://embio.yonsei.ac.kr/~Park/mgvd.php

Crossref

Directory of Open Access Journals

PubMed Central