
    Exploiting noise in array CGH data to improve detection of DNA copy number change

    Developing effective methods for analyzing array-CGH data to detect chromosomal aberrations is very important for the diagnosis of pathogenesis of cancer and other diseases. Current analysis methods, being largely based on smoothing and/or segmentation, are not quite capable of detecting both the aberration regions and the boundary break points very accurately. Furthermore, when evaluating the accuracy of an algorithm for analyzing array-CGH data, it is commonly assumed that noise in the data follows a normal distribution. A fundamental question is whether noise in array-CGH is indeed Gaussian, and if not, can one exploit the characteristics of the noise to develop novel analysis methods that are capable of accurately detecting the aberration regions and the boundary break points simultaneously? By analyzing bacterial artificial chromosome (BAC) arrays with an average resolution of 1 Mb, 19k oligo arrays with an average probe spacing below 100 kb, and 385k oligo arrays with an average probe spacing of about 6 kb, we show that when there are aberrations, noise in all three types of arrays is highly non-Gaussian and possesses long-range spatial correlations, and that such noise leads to worse performance of existing methods for detecting aberrations in array-CGH than in the Gaussian noise case. We further develop a novel method, which optimally exploits the character of the noise and is capable of identifying both aberration regions and boundary break points very accurately. Finally, we propose a new concept, the posteriori signal-to-noise ratio (p-SNR), to assign a confidence level to each detected aberration region and its boundaries.

    arrEYE : a customized platform for high-resolution copy number analysis of coding and noncoding regions of known and candidate retinal dystrophy genes and retinal noncoding RNAs

    Purpose: Our goal was to design a customized microarray, arrEYE, for high-resolution copy number variant (CNV) analysis of known and candidate genes for inherited retinal dystrophy (iRD) and retina-expressed noncoding RNAs (ncRNAs). Methods: arrEYE contains probes for the full genomic region of 106 known iRD genes, including those implicated in retinitis pigmentosa (RP) (the most frequent iRD), cone rod dystrophies, macular dystrophies, and an additional 60 candidate iRD genes and 196 ncRNAs. Eight CNVs in iRD genes identified by other techniques were used as positive controls. The test cohort consisted of 57 patients with autosomal dominant, X-linked, or simplex RP. Results: In an RP patient, a novel heterozygous deletion of exons 7 and 8 of the HGSNAT gene was identified: c.634-408_820+338delinsAGAATATG, p.(Glu212Glyfs*2). A known variant was found on the second allele: c.1843G>A, p.(Ala615Thr). Furthermore, we expanded the allelic spectrum of USH2A and RCBTB1 with novel CNVs. Conclusion: The arrEYE platform revealed subtle single-exon to larger CNVs in iRD genes that could be characterized at the nucleotide level, facilitated by the high resolution of the platform. We report the first CNV in HGSNAT that, combined with another mutation, leads to RP, further supporting its recently identified role in nonsyndromic iRD.

    Joint segmentation of many aCGH profiles using fast group LARS

    Array-based comparative genomic hybridization (aCGH) is a method used to search for genomic regions with copy number variations. For a given aCGH profile, one challenge is to accurately segment it into regions of constant copy number. Subjects sharing the same disease status, for example a type of cancer, often have aCGH profiles with similar copy number variations, due to duplications and deletions relevant to that particular disease. We introduce a constrained optimization algorithm that jointly segments the aCGH profiles of many subjects. It simultaneously penalizes the amount of freedom the set of profiles has to jump from one level of constant copy number to another, at genomic locations known as breakpoints. We show that breakpoints shared by many different profiles tend to be found first by the algorithm, even in the presence of significant amounts of noise. The algorithm can be formulated as a group LARS problem. We propose an extremely fast way to find the solution path, i.e., a sequence of shared breakpoints in order of importance. At no extra cost, the algorithm smooths all of the aCGH profiles into piecewise-constant regions of equal copy number, giving low-dimensional versions of the original data. These can be shown for all profiles on a single graph, allowing for intuitive visual interpretation. Simulations and an implementation of the algorithm on bladder cancer aCGH profiles are provided.
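One way to make the joint penalty concrete is the constrained least-squares problem below. It is an assumed formalization for illustration, not a statement taken from the abstract: Y is a hypothetical n-by-p matrix holding p profiles of n probes each, U is its piecewise-constant approximation, and the budget \mu is an assumed tuning parameter.

```latex
\min_{U \in \mathbb{R}^{n \times p}} \; \lVert Y - U \rVert_F^2
\quad \text{subject to} \quad
\sum_{i=1}^{n-1} \lVert U_{i+1,\bullet} - U_{i,\bullet} \rVert_2 \le \mu
```

Under this reading, each term of the sum is the Euclidean norm of the jump between consecutive probes taken across all profiles at once, so a jump is cheap only when many profiles share it. That is consistent with the abstract's claim that shared breakpoints tend to be found first.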

    Combining chromosomal arm status and significantly aberrant genomic locations reveals new cancer subtypes

    Many types of tumors exhibit chromosomal losses or gains, as well as local amplifications and deletions. Within any given tumor type, sample-specific amplifications and deletions are also observed. Typically, a region that is aberrant in more tumors, or whose copy number change is stronger, would be considered a more promising candidate to be biologically relevant to cancer. We sought an intuitive method to define such aberrations and prioritize them. We define V, the volume associated with an aberration, as the product of three factors: (a) the fraction of patients with the aberration, (b) the aberration's length, and (c) its amplitude. Our algorithm compares the values of V derived from real data to a null distribution obtained by permutations, and yields the statistical significance (p-value) of the measured value of V. We detected genetic locations that were significantly aberrant and combined them with chromosomal arm status to create a succinct fingerprint of the tumor genome. This genomic fingerprint is used to visualize the tumors, highlighting events that are co-occurring or mutually exclusive. We apply the method to three different public array CGH datasets of Medulloblastoma and Neuroblastoma, and demonstrate its ability to detect chromosomal regions that were known to be altered in the tested cancer types, as well as to suggest new genomic locations to be tested. We identified a potential new subtype of Medulloblastoma, which is analogous to Neuroblastoma type 1. Comment: 34 pages, 3 figures; to appear in Cancer Informatics
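The volume statistic and its permutation test can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the 0.3 log-ratio threshold, the rule that a patient carries the aberration when the mean log-ratio over the region exceeds that threshold, and the choice to shuffle probe positions independently within each patient are all hypothetical.

```python
import numpy as np


def volume(log_ratios, start, end, threshold=0.3):
    """V = fraction of carriers * region length * mean amplitude.

    log_ratios: array of shape (patients, probes).
    A patient counts as a carrier when the absolute mean log-ratio over
    probes [start, end) exceeds `threshold` (an assumed calling rule).
    """
    region_means = log_ratios[:, start:end].mean(axis=1)
    carriers = np.abs(region_means) > threshold
    if not carriers.any():
        return 0.0
    fraction = carriers.mean()               # (a) fraction of patients
    length = end - start                     # (b) length, in probes
    amplitude = np.abs(region_means[carriers]).mean()  # (c) amplitude
    return float(fraction * length * amplitude)


def permutation_p_value(log_ratios, start, end, n_perm=200, seed=0,
                        threshold=0.3):
    """Compare the observed V with a null built by shuffling probes.

    Each permutation shuffles probe positions independently within every
    patient (an assumed null model), then recomputes V for the region.
    """
    rng = np.random.default_rng(seed)
    observed = volume(log_ratios, start, end, threshold)
    exceed = 0
    for _ in range(n_perm):
        shuffled = rng.permuted(log_ratios, axis=1)
        if volume(shuffled, start, end, threshold) >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (1 + exceed) / (1 + n_perm)
```

Measuring length in probes rather than base pairs is a simplification; with real array data the genomic span of the region would be a more natural choice.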

    Generalized Species Sampling Priors with Latent Beta reinforcements

    Many popular Bayesian nonparametric priors can be characterized in terms of exchangeable species sampling sequences. However, in some applications, exchangeability may not be appropriate. We introduce a novel and probabilistically coherent family of non-exchangeable species sampling sequences characterized by a tractable predictive probability function with weights driven by a sequence of independent Beta random variables. We compare their theoretical clustering properties with those of the Dirichlet process and the two-parameter Poisson-Dirichlet process. Unlike existing work, the proposed construction provides a complete characterization of the joint process. We then propose the use of such a process as a prior distribution in a hierarchical Bayes modeling framework, and we describe a Markov chain Monte Carlo sampler for posterior inference. We evaluate the performance of the prior and the robustness of the resulting inference in a simulation study, providing a comparison with popular Dirichlet process mixtures and hidden Markov models. Finally, we develop an application to the detection of chromosomal aberrations in breast cancer by leveraging array CGH data. Comment: For correspondence purposes, Edoardo M. Airoldi's email is [email protected]; Federico Bassetti's email is [email protected]; Michele Guindani's email is [email protected]; Fabrizio Leisen's email is [email protected]. To appear in the Journal of the American Statistical Association

    DNA polymerase β deficiency is linked to aggressive breast cancer: a comprehensive analysis of gene copy number, mRNA and protein expression in multiple cohorts

    The short arm of chromosome 8 is a hot spot for chromosomal breaks, losses and amplifications in breast cancer. Although such genetic changes may have phenotypic consequences, the identity of the candidate gene(s) remains to be clearly defined. The pol β gene is localized to chromosome 8p12-p11 and encodes a key DNA base excision repair protein. Pol β may be a tumour suppressor and involved in breast cancer pathogenesis. We conducted the first and the largest study to comprehensively evaluate pol β in breast cancer. We investigated pol β gene copy number changes in two cohorts (n=128 & n=1952), pol β mRNA expression in two cohorts (n=249 & n=1952) and pol β protein expression in two cohorts (n=1406 & n=252). Artificial neural network analysis for pol β interacting genes was performed in 249 tumours. For mechanistic insights, pol β gene copy number changes, mRNA and protein levels were investigated together in 128 tumours and validated in 1952 tumours. Low pol β mRNA expression as well as low pol β protein expression was associated with high grade, lymph node positivity, pleomorphism, triple negative and basal-like phenotypes, and poor survival (ps<0.001). In the oestrogen receptor (ER) positive subgroup that received tamoxifen, low pol β protein remained associated with aggressive phenotype and poor survival (ps<0.001). Artificial neural network analysis revealed ER as a top pol β interacting gene. Mechanistically, there was a strong positive correlation between pol β gene copy number changes and pol β mRNA expression (p<0.0000001) and between pol β mRNA and pol β protein expression (p<0.0000001). This is the first study to provide evidence that pol β deficiency is linked to aggressive breast cancer and may have prognostic and predictive significance in patients.