Combining chromosomal arm status and significantly aberrant genomic locations reveals new cancer subtypes
Many types of tumors exhibit chromosomal losses or gains, as well as local
amplifications and deletions. Within any given tumor type, sample-specific
amplifications and deletions are also observed. Typically, a region that is
aberrant in more tumors, or whose copy number change is stronger, would be
considered a more promising candidate to be biologically relevant to cancer.
We sought an intuitive method to define such aberrations and prioritize
them. We define V, the volume associated with an aberration, as the product of
three factors: a. the fraction of patients with the aberration, b. the aberration's
length and c. its amplitude. Our algorithm compares the values of V derived
from real data to a null distribution obtained by permutations, and yields the
statistical significance (p-value) of the measured value of V. We detected
genetic locations that were significantly aberrant and combined them with
chromosomal arm status to create a succinct fingerprint of the tumor genome.
This genomic fingerprint is used to visualize the tumors, highlighting events
that are co-occurring or mutually exclusive. We apply the method to three
different public array CGH datasets of Medulloblastoma and Neuroblastoma, and
demonstrate its ability to detect chromosomal regions that were known to be
altered in the tested cancer types, as well as to suggest new genomic locations
to be tested. We identified a potential new subtype of Medulloblastoma, which
is analogous to Neuroblastoma type 1.
Comment: 34 pages, 3 figures; to appear in Cancer Informatics
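A minimal sketch of the volume statistic and its permutation test, in Python with NumPy. The abstract does not say how aberrant samples, length, amplitude or the permutations are defined, so the following are assumptions: a sample counts as aberrant when its mean log-ratio over a candidate region `[start, end)` exceeds a `cutoff`, length is measured in probes, amplitude is the mean region log-ratio among aberrant samples, and the null distribution is built by circularly shifting each sample's profile by an independent random offset. The function names are hypothetical.

```python
import numpy as np


def volume(X, start, end, cutoff):
    """Hypothetical V = fraction aberrant * length * amplitude for region [start, end).

    X: (n_samples, n_probes) array of log-ratios. Only gains are handled;
    pass -X to score a deletion.
    """
    region_means = X[:, start:end].mean(axis=1)
    aberrant = region_means > cutoff
    if not aberrant.any():
        return 0.0
    fraction = aberrant.mean()                    # a. fraction of patients
    length = end - start                          # b. length, in probes
    amplitude = region_means[aberrant].mean()     # c. amplitude
    return float(fraction * length * amplitude)


def permutation_pvalue(X, start, end, cutoff, n_perm=199, seed=0):
    """Compare observed V with V after independently rolling each sample."""
    rng = np.random.default_rng(seed)
    v_obs = volume(X, start, end, cutoff)
    n_samples, n_probes = X.shape
    hits = 0
    for _ in range(n_perm):
        shifts = rng.integers(0, n_probes, size=n_samples)
        X_perm = np.stack([np.roll(row, s) for row, s in zip(X, shifts)])
        if volume(X_perm, start, end, cutoff) >= v_obs:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return v_obs, (hits + 1) / (n_perm + 1)
```

Circular shifting keeps each sample's own profile intact while breaking the alignment of aberrations across samples, which is what a large V rewards; other permutation schemes would be equally plausible readings of the abstract.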
Finding Recurrent Regions of Copy Number Variation: A Review
Copy number variation (CNV) in genomic DNA is linked to a variety of human diseases, and array-based CGH (aCGH) is currently the main technology to locate CNVs. Although many methods have been developed to analyze aCGH data from a single array/subject, disease-critical genes are more likely to be found in regions that are common or recurrent among subjects. Unfortunately, finding recurrent CNV regions remains a challenge. We review existing methods for the identification of recurrent CNV regions. The working definition of a “common” or “recurrent” region differs between methods, leading to approaches that use different types of input (discretized output from a previous CGH segmentation analysis, or intensity ratios) or that incorporate biological considerations to varying degrees (these considerations play a role in the identification of “interesting” regions and in the details of the null models used to assess statistical significance). Very few approaches use and/or return probabilities, and code is not easily available for several methods. We suggest that finding recurrent CNVs could benefit from reframing the problem in a biclustering context. We also emphasize that, when analyzing data from complex diseases with significant among-subject heterogeneity, methods should be able to identify CNVs that affect only a subset of subjects. We make some recommendations about choice among existing methods, and we suggest further methodological research.
Detection of recurrent copy number alterations in the genome: taking among-subject heterogeneity seriously
A PDF file containing the research data is attached, titled "Supplementary Material for 'Detection of Recurrent Copy
Number Alterations in the Genome: taking among-subject
heterogeneity seriously'".
Background: Alterations in the number of copies of genomic DNA that are common or recurrent
among diseased individuals are likely to contain disease-critical genes. Unfortunately, defining
common or recurrent copy number alteration (CNA) regions remains a challenge. Moreover, the
heterogeneous nature of many diseases requires that we search for common or recurrent CNA
regions that affect only some subsets of the samples (without knowledge of the regions and subsets
affected), but this is neglected by most methods.
Results: We have developed two methods to define recurrent CNA regions from aCGH data.
Our methods are unique and qualitatively different from existing approaches: they detect both regions
that are common over the complete set of arrays and alterations that are common only to some subsets of the
samples (i.e., alterations that might characterize previously unknown groups); they use probabilities
of alteration as input and return probabilities of being a common region, thus allowing researchers
to modify thresholds as needed; the two parameters of the methods have an immediate,
straightforward, biological interpretation. Using data from previous studies, we show that we can
detect patterns that other methods miss and that researchers can modify, as needed, thresholds of
immediate interpretability and develop custom statistics to answer specific research questions.
Conclusion: These methods represent a qualitative advance in the location of recurrent CNA
regions, highlight the relevance of population heterogeneity for definitions of recurrence, and can
facilitate the clustering of samples with respect to patterns of CNA. Ultimately, the methods
developed can become important tools in the search for genomic regions harboring disease-critical
genes.
Funding provided by Fundación de Investigación Médica Mutua
Madrileña. Publication charges covered by projects CONSOLIDER:
CSD2007-00050 of the Spanish Ministry of Science and Innovation and by
RTIC COMBIOMED RD07/0067/0014 of the Spanish Health Ministry.
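The idea of detecting alterations shared by only a subset of samples, without knowing the subset in advance, can be illustrated with a small, hypothetical scan in Python. It is not the authors' algorithm: it assumes a matrix `P` of per-probe, per-sample alteration probabilities (probes in genomic order), uses a probability `threshold` and a minimum number of samples `min_samples` as its two parameters, and greedily grows runs of consecutive probes over which at least `min_samples` samples stay above the threshold.

```python
from typing import FrozenSet, List, Sequence, Tuple

# (first probe, last probe, samples sharing the alteration)
Region = Tuple[int, int, FrozenSet[int]]


def shared_subset_regions(
    P: Sequence[Sequence[float]], threshold: float, min_samples: int
) -> List[Region]:
    """Hypothetical greedy scan for regions altered in a subset of samples.

    P[i][s] is the probability that probe i is altered in sample s.
    """
    # Samples called altered at each probe under the probability threshold.
    calls = [frozenset(s for s, p in enumerate(row) if p >= threshold) for row in P]
    regions: List[Region] = []
    i = 0
    while i < len(calls):
        shared = calls[i]
        if len(shared) < min_samples:
            i += 1
            continue
        j = i + 1
        # Extend while enough samples remain altered at every probe of the run.
        while j < len(calls) and len(shared & calls[j]) >= min_samples:
            shared = shared & calls[j]
            j += 1
        regions.append((i, j - 1, shared))
        i = j
    return regions
```

For example, with `P = [[0.9, 0.9, 0.1], [0.95, 0.8, 0.2], [0.1, 0.9, 0.9], [0.0, 0.1, 0.0]]`, `threshold=0.7` and `min_samples=2`, the scan reports probes 0 to 1 shared by samples {0, 1} and probe 2 shared by samples {1, 2}: two regions supported by different groups of samples.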
Joint segmentation of many aCGH profiles using fast group LARS
Array-Based Comparative Genomic Hybridization (aCGH) is a method used to
search for genomic regions with copy number variations. For a given aCGH
profile, one challenge is to accurately segment it into regions of constant
copy number. Subjects sharing the same disease status, for example a type of
cancer, often have aCGH profiles with similar copy number variations, due to
duplications and deletions relevant to that particular disease. We introduce a
constrained optimization algorithm that jointly segments aCGH profiles of many
subjects. It simultaneously penalizes the amount of freedom the set of profiles
has to jump from one level of constant copy number to another, at genomic
locations known as breakpoints. We show that breakpoints shared by many
different profiles tend to be found first by the algorithm, even in the
presence of significant amounts of noise. The algorithm can be formulated as a
group LARS problem. We propose an extremely fast way to find the solution path,
i.e., a sequence of shared breakpoints in order of importance. For no extra
cost the algorithm smooths all of the aCGH profiles into piecewise-constant
regions of equal copy number, giving low-dimensional versions of the original
data. These can be shown for all profiles on a single graph, allowing for
intuitive visual interpretation. Simulations and an application of the
algorithm to bladder cancer aCGH profiles are provided.
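Group LARS itself is beyond a short example, so the following Python/NumPy sketch swaps in a simpler greedy technique, shared binary segmentation, to illustrate two properties described above: a candidate breakpoint is scored by pooling evidence across all profiles (here, the Euclidean norm over profiles of a standardized difference between left and right segment means), and once breakpoints are chosen every profile is smoothed into piecewise-constant segments. The scoring statistic, the greedy search and the function names are assumptions, not the authors' algorithm.

```python
import numpy as np


def breakpoint_scores(Y):
    """Score every split of Y (n_positions x n_profiles) pooled over profiles.

    Entry k scores a breakpoint with k + 1 positions on the left.
    """
    n = Y.shape[0]
    csum = np.cumsum(Y, axis=0)
    total = csum[-1]
    left = np.arange(1, n)[:, None]               # left segment sizes 1..n-1
    left_mean = csum[:-1] / left
    right_mean = (total - csum[:-1]) / (n - left)
    # Standardized mean difference per profile, then group norm across profiles.
    stat = np.sqrt(left * (n - left) / n) * (right_mean - left_mean)
    return np.linalg.norm(stat, axis=1)


def shared_breakpoints(Y, k):
    """Greedily add k breakpoints shared by all profiles (binary segmentation)."""
    n = Y.shape[0]
    bps = []
    for _ in range(k):
        bounds = [0] + sorted(bps) + [n]
        best = None
        for a, b in zip(bounds[:-1], bounds[1:]):
            if b - a < 2:
                continue                          # segment too short to split
            scores = breakpoint_scores(Y[a:b])
            j = int(np.argmax(scores))
            if best is None or scores[j] > best[0]:
                best = (scores[j], a + j + 1)     # index where the new segment starts
        if best is None:
            break
        bps.append(best[1])
    return bps


def smooth(Y, bps):
    """Replace each profile by its segment means (piecewise-constant fit)."""
    bounds = [0] + sorted(bps) + [Y.shape[0]]
    out = np.empty_like(Y, dtype=float)
    for a, b in zip(bounds[:-1], bounds[1:]):
        out[a:b] = Y[a:b].mean(axis=0)
    return out
```

In a noise-free example where all three profiles step up at position 40 and only the first profile steps again at position 70, `shared_breakpoints(Y, 2)` returns `[40, 70]`: the breakpoint shared by every profile is found first.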
Detection of Recurrent Copy Number Alterations in the Genome: a Probabilistic Approach
Copy number variation (CNV) in genomic DNA is linked to a variety of human diseases (including cancer, HIV acquisition, autoimmune and neurodegenerative diseases), and array-based CGH (aCGH) is currently the main technology to locate CNVs. Several methods can analyze aCGH data at the single sample level, but disease-critical genes are more likely to be found in regions that are common or recurrent among samples. Unfortunately, defining recurrent CNV regions remains a challenge. Moreover, the heterogeneous nature of many diseases requires that we search for CNVs that affect only some subsets of the samples (without prior knowledge of which regions and subsets of samples are affected), but this is neglected by current methods.
We have developed two methods to define recurrent CNV regions. Our methods are unique and qualitatively different from existing approaches: they detect both regions over the complete set of arrays and alterations that are common only to some subsets of the samples and, thus, CNV alterations that might characterize previously unknown groups; they use probabilities of alteration as input (not discretized gain/loss calls, which discard uncertainty and variability) and return probabilities of being a shared common region, thus allowing researchers to modify thresholds as needed; the two parameters of the methods have an immediate, straightforward, biological interpretation. Using data from previous studies, we show that we can detect patterns that other methods miss and, by using probabilities, that researchers can modify, as needed, thresholds of immediate interpretability to answer specific research questions.
These methods are a qualitative advance in the location of recurrent CNV regions and will be instrumental in efforts to standardize definitions of recurrent CNVs and cluster samples with respect to patterns of CNV, and ultimately in the search for genomic regions harboring disease-critical genes.
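One way to turn per-sample probabilities of alteration into a probability that a probe lies in a common region is sketched below in Python. It assumes, as a simplification that may differ from the authors' definitions, that samples are independent, so the number of altered samples at a probe follows a Poisson-binomial distribution; a probe is flagged when the probability that at least `min_samples` samples are altered reaches `threshold`. These two parameters and the function names are hypothetical stand-ins for the methods' two parameters.

```python
from typing import List, Sequence, Tuple


def prob_at_least(probs: Sequence[float], k: int) -> float:
    """P(at least k samples altered), assuming independent samples.

    Computes the Poisson-binomial distribution by dynamic programming.
    """
    dist = [1.0]  # dist[j] = P(exactly j altered among samples seen so far)
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            new[j] += q * (1.0 - p)   # this sample is not altered
            new[j + 1] += q * p       # this sample is altered
        dist = new
    return sum(dist[k:])


def recurrent_probes(
    P: Sequence[Sequence[float]], min_samples: int, threshold: float
) -> List[Tuple[float, bool]]:
    """For each probe (row of P), return (probability of being common, flag)."""
    out = []
    for row in P:
        pr = prob_at_least(row, min_samples)
        out.append((pr, pr >= threshold))
    return out
```

Returning the probability itself, not only the flag, lets a user change `threshold` afterwards without recomputing anything, in the spirit of the adjustable thresholds described above.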
RJaCGH: Bayesian analysis of aCGH arrays for detecting copy number changes and recurrent regions
Summary: Several methods have been proposed to detect copy number changes and recurrent regions of copy number variation from aCGH, but few methods return probabilities of alteration explicitly, which are the direct answer to the question ‘is this probe/region altered?’ RJaCGH fits a Non-Homogeneous Hidden Markov model to the aCGH data using Markov Chain Monte Carlo with Reversible Jump, and returns the probability that each probe is gained or lost. Using these probabilities, recurrent regions (over sets of individuals) of copy number alteration can be found.
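The probe-level gain and loss probabilities described above can be read as a Bayesian model average over models with different numbers of hidden states. The Python sketch below assumes a hypothetical list of MCMC draws, each recording the number of states `k` of the model visited and a loss/normal/gain label (-1, 0, +1) per probe; this is not RJaCGH's data structure or API. It weights the per-model frequency of each label by the estimated posterior probability of that model, which for reversible jump draws gives the same answer as pooling all draws.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def probe_probabilities(
    draws: Sequence[Tuple[int, Sequence[int]]], n_probes: int
) -> Tuple[Dict[int, float], List[float], List[float]]:
    """Model-averaged P(gain) and P(loss) per probe from hypothetical MCMC draws."""
    n = len(draws)
    # Posterior model probability estimated by the share of draws in each model.
    model_prob = {k: c / n for k, c in Counter(k for k, _ in draws).items()}
    p_gain = [0.0] * n_probes
    p_loss = [0.0] * n_probes
    for k, weight in model_prob.items():
        in_model = [states for kk, states in draws if kk == k]
        for i in range(n_probes):
            # P(label | model k, data), weighted by P(model k | data).
            p_gain[i] += weight * sum(s[i] == 1 for s in in_model) / len(in_model)
            p_loss[i] += weight * sum(s[i] == -1 for s in in_model) / len(in_model)
    return model_prob, p_gain, p_loss
```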
Flexible and Accurate Detection of Genomic Copy-Number Changes from aCGH
Genomic DNA copy-number alterations (CNAs) are associated with complex diseases, including cancer: CNAs are indeed related to tumoral grade, metastasis, and patient survival. CNAs discovered from array-based comparative genomic hybridization (aCGH) data have been instrumental in identifying disease-related genes and potential therapeutic targets. To be immediately useful in both clinical and basic research scenarios, aCGH data analysis requires accurate methods that do not impose unrealistic biological assumptions and that provide direct answers to the key question, “What is the probability that this gene/region has CNAs?” Current approaches fail, however, to meet these requirements. Here, we introduce reversible jump aCGH (RJaCGH), a new method for identifying CNAs from aCGH; we use a nonhomogeneous hidden Markov model fitted via reversible jump Markov chain Monte Carlo; and we incorporate model uncertainty through Bayesian model averaging. RJaCGH provides an estimate of the probability that a gene/region has CNAs while incorporating interprobe distance and the capability to analyze data on a chromosome or genome-wide basis. RJaCGH outperforms alternative methods, and the performance difference is even larger with noisy data and highly variable interprobe distance, both commonly found features in aCGH data. Furthermore, our probabilistic method allows us to identify minimal common regions of CNAs among samples and can be extended to incorporate expression data. In summary, we provide a rigorous statistical framework for locating genes and chromosomal regions with CNAs with potential applications to cancer and other complex human diseases.
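To illustrate how interprobe distance can enter a nonhomogeneous hidden Markov model, the following Python/NumPy sketch runs a scaled forward-backward pass with Gaussian emissions and a transition matrix that depends on the distance between consecutive probes. The parametrization `P(d) = w*I + (1 - w)*1*pi^T` with `w = exp(-d / scale)` is an assumption chosen for simplicity, not RJaCGH's; the number of states is fixed (no reversible jump), and the state means, standard deviations, `pi` and `scale` are supplied by hand rather than sampled by MCMC.

```python
import numpy as np


def transition_matrix(d, scale, pi):
    """Hypothetical distance-dependent transitions: each row mixes 'stay' and pi."""
    w = np.exp(-d / scale)
    K = len(pi)
    return w * np.eye(K) + (1.0 - w) * np.tile(pi, (K, 1))


def posterior_states(y, pos, means, sds, pi, scale):
    """Posterior state probabilities per probe (scaled forward-backward).

    y: log-ratios in genomic order; pos: probe positions; means, sds, pi: arrays
    of length K (one entry per hidden state, e.g. loss/normal/gain).
    """
    y = np.asarray(y, dtype=float)
    n, K = len(y), len(pi)
    # Gaussian emission densities, shape (n, K).
    em = np.exp(-0.5 * ((y[:, None] - means) / sds) ** 2) / (sds * np.sqrt(2 * np.pi))
    # Ts[i] is the transition matrix from probe i to probe i + 1.
    Ts = [transition_matrix(pos[i + 1] - pos[i], scale, pi) for i in range(n - 1)]

    alpha = np.zeros((n, K))
    c = np.zeros(n)                       # scaling constants avoid underflow
    alpha[0] = pi * em[0]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for i in range(1, n):
        alpha[i] = (alpha[i - 1] @ Ts[i - 1]) * em[i]
        c[i] = alpha[i].sum()
        alpha[i] /= c[i]

    beta = np.ones((n, K))
    for i in range(n - 2, -1, -1):
        beta[i] = Ts[i] @ (em[i + 1] * beta[i + 1]) / c[i + 1]

    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)
```

With this choice, closely spaced probes tend to share a state, while for distant probes each row of the transition matrix approaches `pi`, so a state change between distant probes is penalized less.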
- …