Exploiting noise in array CGH data to improve detection of DNA copy number change
Developing effective methods for analyzing array-CGH data to detect chromosomal aberrations is very important for diagnosing the pathogenesis of cancer and other diseases. Current analysis methods, being largely based on smoothing and/or segmentation, cannot detect both the aberration regions and the boundary break points very accurately. Furthermore, when evaluating the accuracy of an algorithm for analyzing array-CGH data, it is commonly assumed that noise in the data follows a normal distribution. A fundamental question is whether noise in array-CGH is indeed Gaussian, and if not, whether one can exploit the characteristics of the noise to develop novel analysis methods that accurately detect the aberration regions and the boundary break points simultaneously. By analyzing bacterial artificial chromosome (BAC) arrays with an average 1 Mb resolution, 19k oligo arrays with an average probe spacing <100 kb, and 385k oligo arrays with an average probe spacing of about 6 kb, we show that when there are aberrations, noise in all three types of arrays is highly non-Gaussian and possesses long-range spatial correlations, and that such noise leads to worse performance of existing methods for detecting aberrations in array-CGH than in the Gaussian noise case. We further develop a novel method that optimally exploits the character of the noise and identifies both aberration regions and boundary break points very accurately. Finally, we propose a new concept, posteriori signal-to-noise ratio (p-SNR), to assign a confidence level to each detected aberration region and its boundaries.
arrEYE : a customized platform for high-resolution copy number analysis of coding and noncoding regions of known and candidate retinal dystrophy genes and retinal noncoding RNAs
Purpose: Our goal was to design a customized microarray, arrEYE, for high-resolution copy number variant (CNV) analysis of known and candidate genes for inherited retinal dystrophy (iRD) and retina expressed noncoding RNAs (ncRNAs).
Methods: arrEYE contains probes for the full genomic region of 106 known iRD genes, including those implicated in retinitis pigmentosa (RP) (the most frequent iRD), cone rod dystrophies, macular dystrophies, and an additional 60 candidate iRD genes and 196 ncRNAs. Eight CNVs in iRD genes identified by other techniques were used as positive controls. The test cohort consisted of 57 patients with autosomal dominant, X-linked, or simplex RP.
Results: In an RP patient, a novel heterozygous deletion of exons 7 and 8 of the HGSNAT gene was identified: c.634-408_820+338delinsAGAATATG, p.(Glu212Glyfs*2). A known variant was found on the second allele: c.1843G>A, p.(Ala615Thr). Furthermore, we expanded the allelic spectrum of USH2A and RCBTB1 with novel CNVs.
Conclusion: The arrEYE platform revealed subtle single-exon to larger CNVs in iRD genes that could be characterized at the nucleotide level, facilitated by the high resolution of the platform. We report the first CNV in HGSNAT that, combined with another mutation, leads to RP, further supporting its recently identified role in nonsyndromic iRD.
Joint segmentation of many aCGH profiles using fast group LARS
Array-Based Comparative Genomic Hybridization (aCGH) is a method used to
search for genomic regions with copy number variations. For a given aCGH
profile, one challenge is to accurately segment it into regions of constant
copy number. Subjects sharing the same disease status, for example a type of
cancer, often have aCGH profiles with similar copy number variations, due to
duplications and deletions relevant to that particular disease. We introduce a
constrained optimization algorithm that jointly segments aCGH profiles of many
subjects. It simultaneously penalizes the amount of freedom the set of profiles
have to jump from one level of constant copy number to another, at genomic
locations known as breakpoints. We show that breakpoints shared by many
different profiles tend to be found first by the algorithm, even in the
presence of significant amounts of noise. The algorithm can be formulated as a
group LARS problem. We propose an extremely fast way to find the solution path,
i.e., a sequence of shared breakpoints in order of importance. For no extra
cost, the algorithm smooths all of the aCGH profiles into piecewise-constant
regions of equal copy number, giving low-dimensional versions of the original
data. These can be shown for all profiles on a single graph, allowing for
intuitive visual interpretation. Simulations and an implementation of the
algorithm on bladder cancer aCGH profiles are provided.
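The idea that a breakpoint shared by many profiles is found first can be illustrated with a much simpler stand-in than the group LARS solution path described in the abstract. The sketch below only finds a single shared breakpoint: it scores every candidate split by the total reduction in squared error, summed over all profiles, when each profile is replaced by two constant levels. The function name `first_shared_breakpoint` and the scoring rule are assumptions made for illustration, not the authors' algorithm.

```python
import numpy as np


def first_shared_breakpoint(profiles):
    """Return the split index i (1 <= i < n_probes) that best separates
    probes [0, i) from probes [i, n_probes) across all profiles at once.

    profiles: array-like of shape (n_profiles, n_probes).
    Illustrative stand-in only; this is not the group LARS path.
    """
    profiles = np.asarray(profiles, dtype=float)
    n = profiles.shape[1]
    # Center each profile so that its total sum is zero.
    centered = profiles - profiles.mean(axis=1, keepdims=True)
    # csum[:, i-1] holds the sum of the first i centered values, i = 1..n-1.
    csum = np.cumsum(centered, axis=1)[:, :-1]
    i = np.arange(1, n)
    # Squared-error reduction when one constant level is replaced by two:
    # S_i^2 * n / (i * (n - i)), with S_i the centered cumulative sum.
    gain = csum ** 2 * n / (i * (n - i))
    # Sum the gains over profiles so that shared jumps reinforce each other.
    return int(np.argmax(gain.sum(axis=0))) + 1
```

Because the gains are squared, jumps of opposite sign in different profiles still add up, which mirrors the abstract's point that a location is favored when many profiles change level there, whatever the direction.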
Combining chromosomal arm status and significantly aberrant genomic locations reveals new cancer subtypes
Many types of tumors exhibit chromosomal losses or gains, as well as local
amplifications and deletions. Within any given tumor type, sample-specific
amplifications and deletions are also observed. Typically, a region that is
aberrant in more tumors, or whose copy number change is stronger, would be
considered a more promising candidate to be biologically relevant to cancer.
We sought an intuitive method to define such aberrations and prioritize
them. We define V, the volume associated with an aberration, as the product of
three factors: (a) the fraction of patients with the aberration, (b) the
aberration's length, and (c) its amplitude. Our algorithm compares the values
of V derived from real data to a null distribution obtained by permutations,
and yields the statistical significance (p-value) of the measured value of V.
We detected genetic locations that were significantly aberrant and combined
them with chromosomal arm status to create a succinct fingerprint of the tumor
genome. This genomic fingerprint is used to visualize the tumors, highlighting
events that are co-occurring or mutually exclusive. We apply the method to
three different public array CGH datasets of Medulloblastoma and Neuroblastoma,
and demonstrate its ability to detect chromosomal regions that were known to be
altered in the tested cancer types, as well as to suggest new genomic locations
to be tested. We identified a potential new subtype of Medulloblastoma, which
is analogous to Neuroblastoma type 1.
Comment: 34 pages, 3 figures; to appear in Cancer Informatics
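The volume statistic V and its permutation test can be sketched as follows. The abstract only states that V is the product of the fraction of patients, the aberration length, and the amplitude, and that significance comes from permutations. Everything else here is an assumption made for illustration: the function names, the gain threshold of 0.3 on log-ratios, measuring length in probes, and shuffling probe positions within each patient to build the null distribution.

```python
import numpy as np


def aberration_volume(log_ratios, start, end, threshold=0.3):
    """V = fraction of patients * region length * mean amplitude.

    log_ratios: array of shape (n_patients, n_probes).
    The region is probes [start, end); only gains are considered.
    The threshold value is a hypothetical choice.
    """
    per_patient = log_ratios[:, start:end].mean(axis=1)
    aberrant = per_patient > threshold
    if not aberrant.any():
        return 0.0
    fraction = aberrant.mean()                 # (a) fraction of patients
    length = end - start                       # (b) length, in probes
    amplitude = per_patient[aberrant].mean()   # (c) amplitude
    return float(fraction * length * amplitude)


def permutation_p_value(log_ratios, start, end, n_perm=200,
                        threshold=0.3, seed=0):
    """Estimate the p-value of V against a permutation null.

    Assumed null model: probe positions are shuffled independently
    within each patient, which destroys any spatial structure.
    """
    rng = np.random.default_rng(seed)
    observed = aberration_volume(log_ratios, start, end, threshold)
    exceed = 0
    for _ in range(n_perm):
        permuted = np.array([rng.permutation(row) for row in log_ratios])
        if aberration_volume(permuted, start, end, threshold) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

With this null model, a region earns a small p-value only when its high log-ratios are contiguous and shared across patients, which is the prioritization the abstract describes.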
Generalized Species Sampling Priors with Latent Beta reinforcements
Many popular Bayesian nonparametric priors can be characterized in terms of
exchangeable species sampling sequences. However, in some applications,
exchangeability may not be appropriate. We introduce a novel and
probabilistically coherent family of non-exchangeable species sampling
sequences characterized by a tractable predictive probability function with
weights driven by a sequence of independent Beta random variables. We compare
their theoretical clustering properties with those of the Dirichlet Process and
the two-parameter Poisson-Dirichlet process. The proposed construction
provides a complete characterization of the joint process, unlike
existing work. We then propose the use of such a process as a prior distribution in
a hierarchical Bayes modeling framework, and we describe a Markov Chain Monte
Carlo sampler for posterior inference. We evaluate the performance of the prior
and the robustness of the resulting inference in a simulation study, providing
a comparison with popular Dirichlet Process mixtures and Hidden Markov
Models. Finally, we develop an application to the detection of chromosomal
aberrations in breast cancer by leveraging array CGH data.
Comment: For correspondence purposes, Edoardo M. Airoldi's email is
[email protected]; Federico Bassetti's email is
[email protected]; Michele Guindani's email is
[email protected]; Fabrizio Leisen's email is
[email protected]. To appear in the Journal of the American
Statistical Association
DNA polymerase β deficiency is linked to aggressive breast cancer: a comprehensive analysis of gene copy number, mRNA and protein expression in multiple cohorts
The short arm of chromosome 8 is a hot spot for chromosomal breaks, losses and amplifications in breast cancer. Although such genetic changes may have phenotypic consequences, the identity of candidate gene(s) remains to be clearly defined. The pol β gene is localized to chromosome 8p12-p11 and encodes a key DNA base excision repair protein. Pol β may be a tumour suppressor and may be involved in breast cancer pathogenesis. We conducted the first and the largest study to comprehensively evaluate pol β in breast cancer. We investigated pol β gene copy number changes in two cohorts (n=128 & n=1952), pol β mRNA expression in two cohorts (n=249 & n=1952) and pol β protein expression in two cohorts (n=1406 & n=252). Artificial neural network analysis for pol β interacting genes was performed in 249 tumours. For mechanistic insights, pol β gene copy number changes, mRNA and protein levels were investigated together in 128 tumours and validated in 1952 tumours. Low pol β mRNA expression as well as low pol β protein expression was associated with high grade, lymph node positivity, pleomorphism, triple-negative and basal-like phenotypes, and poor survival (ps<0.001). In the oestrogen receptor (ER)-positive subgroup that received tamoxifen, low pol β protein remained associated with aggressive phenotype and poor survival (ps<0.001). Artificial neural network analysis revealed ER as a top pol β interacting gene. Mechanistically, there was a strong positive correlation between pol β gene copy number changes and pol β mRNA expression (p<0.0000001) and between pol β mRNA and pol β protein expression (p<0.0000001). This is the first study to provide evidence that pol β deficiency is linked to aggressive breast cancer and may have prognostic and predictive significance in patients.