A Faster Circular Binary Segmentation Algorithm for the Analysis of Array CGH Data
Motivation: Array CGH technologies enable the simultaneous measurement of DNA copy number for thousands of sites on a genome. We developed the circular binary segmentation (CBS) algorithm to divide the genome into regions of equal copy number (Olshen et al., 2004). The algorithm tests for change-points using a maximal t-statistic with a permutation reference distribution to obtain the corresponding p-value. The number of computations required for the maximal test statistic is O(N^2), where N is the number of markers. This makes the full permutation approach computationally prohibitive for the newer arrays that contain tens of thousands of markers and highlights the need for a faster algorithm.
Results: We present a hybrid approach to obtain the p-value of the test statistic in linear time. We also introduce a rule for stopping early when there is strong evidence for the presence of a change. We show through simulations that the hybrid approach provides a substantial gain in speed with only a negligible loss in accuracy and that the stopping rule further increases speed. We also present the analysis of array CGH data from a breast cancer cell line to show the impact of the new approaches on the analysis of real data.
Availability: An R (R Development Core Team, 2006) version of the CBS algorithm has been implemented in the "DNAcopy" package of the Bioconductor project (Gentleman et al., 2004). The proposed hybrid method for the p-value is available in version 1.2.1 or higher and the stopping rule for declaring a change early is available in version 1.5.1 or higher.
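The full permutation test described in the abstract above can be illustrated with a minimal Python sketch. It is not the DNAcopy implementation and does not include the hybrid p-value or the early stopping rule. It only shows the quadratic-cost baseline: a maximal two-sample t-statistic taken over every split of a circularised sequence into an arc and its complement, with a p-value from random permutations. The function names, the pooled-variance form of the statistic, and the (hits + 1) / (n_perm + 1) p-value convention are assumptions made for this illustration.

```python
import math
import random


def max_circular_t(x):
    """Largest |t| over all splits of a circularised sequence into an arc and its complement."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 markers")
    # Prefix sums give any arc's sum in O(1); the double loop below is the O(N^2) part.
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
    total = prefix[n]
    grand_mean = total / n
    total_ss = sum((v - grand_mean) ** 2 for v in x)
    tol = 1e-12 * max(total_ss, 1.0)  # guards against division by a vanishing variance
    best = 0.0
    for i in range(n):
        for j in range(i + 1, n + 1):
            k = j - i          # markers inside the arc x[i:j]
            m = n - k          # markers in the complement
            if m == 0:
                continue       # the arc covers everything; nothing to compare
            arc_sum = prefix[j] - prefix[i]
            diff = arc_sum / k - (total - arc_sum) / m
            between = diff * diff * k * m / n
            # Within-group sum of squares = total SS minus between-group SS.
            within = max(total_ss - between, 0.0)
            if within <= tol:
                t = math.inf if between > tol else 0.0
            else:
                t = abs(diff) / math.sqrt(within / (n - 2) * (1.0 / k + 1.0 / m))
            best = max(best, t)
    return best


def permutation_p_value(x, n_perm=200, seed=0):
    """Permutation p-value for the maximal statistic (illustrative convention)."""
    rng = random.Random(seed)
    observed = max_circular_t(x)
    shuffled = list(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_circular_t(shuffled) >= observed:
            hits += 1
    # Adding 1 to numerator and denominator keeps the estimate away from zero.
    return observed, (hits + 1) / (n_perm + 1)
```

Each call to max_circular_t scans on the order of N^2 arcs, and the permutation loop repeats that scan n_perm times, which is the cost the abstract describes as prohibitive for arrays with tens of thousands of markers.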
GENOME WIDE DNA METHYLATION PROFILING IS PREDICTIVE OF OUTCOME IN JUVENILE MYELOMONOCYTIC LEUKEMIA
Statistical Evaluation of Evidence for Clonal Allelic Alterations in array-CGH Experiments
In recent years, numerous investigators have conducted genetic studies of pairs of tumor specimens from the same patient to determine whether the tumors share a clonal origin. These studies have the potential to be of considerable clinical significance, especially in clinical settings where the distinction of a new primary cancer and metastatic spread of a previous cancer would lead to radically different indications for treatment. Studies of clonality have typically involved comparison of the patterns of somatic mutations in the tumors at candidate genetic loci to see if the patterns are sufficiently similar to indicate a clonal origin. More recently, some investigators have explored the use of array CGH for this purpose. Standard clustering approaches have been used to analyze the data, but these existing statistical methods are not suited to this problem due to the paired nature of the data, and the fact that there exists no “gold standard” diagnosis to provide a definitive determination of which pairs are clonal and which pairs are of independent origin. In this article we propose a new statistical method that focuses on the individual allelic gains or losses that have been identified in both tumors, and a statistical test is developed that assesses the degree of matching of the locations of the markers that indicate the endpoints of the allelic change. The validity and statistical power of the test are evaluated, and it is shown to be a promising approach for establishing clonality in tumor samples.
A Metastasis or a Second Independent Cancer? Evaluating the Clonal Origin of Tumors Using Array-CGH Data
When a cancer patient develops a new tumor, it is necessary to determine if this is a recurrence (metastasis) of the original cancer, or an entirely new occurrence of the disease. This is accomplished by assessing the histo-pathology of the lesions, and it is frequently relatively straightforward. However, there are many clinical scenarios in which this pathological diagnosis is difficult. Since each tumor is characterized by a genetic fingerprint of somatic mutations, a more definitive diagnosis is possible in principle in these difficult clinical scenarios by comparing the fingerprints. In this article we develop and evaluate a statistical strategy for this comparison when the data are derived from array comparative genomic hybridization, a technique designed to identify all of the somatic allelic gains and losses across the genome. Our method involves several stages. First, a segmentation algorithm is used to estimate the regions of allelic gain and loss. Then the broad correlation in these patterns between the two tumors is assessed, leading to an initial likelihood ratio for the two diagnoses. This is then further refined by comparing in detail each plausibly clonal mutation within individual chromosome arms, and the results are aggregated to determine a final likelihood ratio. The method is employed to diagnose patients from several clinical scenarios, and the results show that in many cases a strong clonal signal emerges, occasionally contradicting the clinical diagnosis. The “quality” of the arrays can be summarized by a parameter that characterizes the clarity with which allelic changes are detected. Sensitivity analyses show that most of the diagnoses are robust when the data are of high quality.
Changes in gene expression during the development of mammary tumors in MMTV-Wnt-1 transgenic mice
BACKGROUND: In human breast cancer, normal mammary cells typically develop into hyperplasia, ductal carcinoma in situ, invasive cancer, and metastasis. The changes in gene expression associated with this stepwise progression are unclear. Mice transgenic for mouse mammary tumor virus (MMTV)-Wnt-1 exhibit discrete steps of mammary tumorigenesis, including hyperplasia, invasive ductal carcinoma, and distant metastasis. These mice might therefore be useful models for discovering changes in gene expression during cancer development. RESULTS: We used cDNA microarrays to determine the expression profiles of five normal mammary glands, seven hyperplastic mammary glands and 23 mammary tumors from MMTV-Wnt-1 transgenic mice, and 12 mammary tumors from MMTV-Neu transgenic mice. Adipose tissues were used to control for fat cells in the vicinity of the mammary glands. In these analyses, we found that the progression of normal virgin mammary glands to hyperplastic tissues and to mammary tumors is accompanied by differences in the expression of several hundred genes at each step. Some of these differences appear to be unique to the effects of Wnt signaling; others seem to be common to tumors induced by both Neu and Wnt-1 oncogenes. CONCLUSION: We described gene-expression patterns associated with breast-cancer development in mice, and identified genes that may be significant targets for oncogenic events. The expression data developed provide a resource for illuminating the molecular mechanisms involved in breast cancer development, especially through the identification of genes that are critical in cancer initiation and progression.
Recurrent epimutations activate gene body promoters in primary glioblastoma
Aberrant DNA hypomethylation may play an important role in the growth rate of glioblastoma (GBM), but the functional impact on transcription remains poorly understood. We assayed the GBM methylome with MeDIP-seq and MRE-seq, adjusting for copy number differences, in a small set of non-glioma CpG island methylator phenotype (non-G-CIMP) primary tumors. Recurrent hypomethylated loci were enriched within a region of chromosome 5p15 that is specified as a cancer amplicon and also encompasses TERT, encoding telomerase reverse transcriptase, which plays a critical role in tumorigenesis. Overall, 76 gene body promoters were recurrently hypomethylated, including TERT and the oncogenes GLI3 and TP73. Recurring hypomethylation also affected previously unannotated alternative promoters, and luciferase reporter assays for three of four of these promoters confirmed strong promoter activity in GBM cells. Histone H3 lysine 4 trimethylation (H3K4me3) ChIP-seq on tissue from the GBMs uncovered peaks that coincide precisely with tumor-specific decrease of DNA methylation at 200 loci, 133 of which are in gene bodies. Detailed investigation of TP73 and TERT gene body hypomethylation demonstrated increased expression of corresponding alternate transcripts, which in TP73 encodes a truncated p73 protein with oncogenic function and in TERT encodes a putative reverse transcriptase-null protein. Our findings suggest that recurring gene body promoter hypomethylation events, along with histone H3K4 trimethylation, alter the transcriptional landscape of GBM through the activation of a limited number of normally silenced promoters within gene bodies, in at least one case leading to expression of an oncogenic protein.
Copy number and gene expression differences between African American and Caucasian American prostate cancer
Background: The goal of our study was to investigate the molecular underpinnings associated with the relatively aggressive clinical behavior of prostate cancer (PCa) in African American (AA) compared to Caucasian American (CA) patients using a genome-wide approach. Methods: AA and CA patients treated with radical prostatectomy (RP) were frequency matched for age at RP, Gleason grade, and tumor stage. Array-CGH (BAC SpectralChip2600) was used to identify genomic regions with significantly different DNA copy number between the groups. Gene expression profiling of the same set of tumors was also evaluated using Affymetrix HG-U133 Plus 2.0 arrays. Concordance between copy number alteration and gene expression was examined. A second aCGH analysis was performed in a larger validation cohort using an oligo-based platform (Agilent 244K). Results: BAC-based array identified 27 chromosomal regions with significantly different copy number changes between the AA and CA tumors in the first cohort (Fisher's exact test, P < 0.05). Copy number alterations in these 27 regions were also significantly associated with gene expression changes. aCGH performed in a larger, independent cohort of AA and CA tumors validated 4 of the 27 (15%) most significantly altered regions from the initial analysis (3q26, 5p15-p14, 14q32, and 16p11). Functional annotation of overlapping genes within the 4 validated regions of AA/CA DNA copy number changes revealed significant enrichment of genes related to immune response. Conclusions: Our data reveal molecular alterations at the level of gene expression and DNA copy number that are specific to African American and Caucasian prostate cancer and may be related to underlying differences in immune response.
A classification model for distinguishing copy number variants from cancer-related alterations
Background: Both somatic copy number alterations (CNAs) and germline copy number variants (CNVs) that are prevalent in healthy individuals can appear as recurrent changes in comparative genomic hybridization (CGH) analyses of tumors. In order to identify important cancer genes, CNAs and CNVs must be distinguished. Although the Database of Genomic Variants (DGV) contains a list of all known CNVs, there is no standard methodology to use the database effectively. Results: We develop a prediction model that distinguishes CNVs from CNAs based on the information contained in the DGV and several other variables, including a segment's length, height, closeness to a telomere or centromere, and occurrence in other patients. The models are fitted on data from glioblastoma and their corresponding normal samples that were collected as part of The Cancer Genome Atlas project and hybridized to Agilent 244K arrays. Conclusions: Using the DGV alone, CNVs in the test set can be correctly identified with about 85% accuracy if the outliers are removed before segmentation and with 72% accuracy if the outliers are included, and additional variables improve the prediction by about 2-3% and 12%, respectively. Final models applied to data from ovarian tumors have about 90% accuracy with all the variables and 86% accuracy with the DGV alone.
- …