Performance evaluation of DNA copy number segmentation methods
A number of bioinformatic or biostatistical methods are available for
analyzing DNA copy number profiles measured from microarray or sequencing
technologies. In the absence of rich enough gold standard data sets, the
performance of these methods is generally assessed using unrealistic simulation
studies, or based on small real data analyses. We have designed and implemented
a framework to generate realistic DNA copy number profiles of cancer samples
with known truth. These profiles are generated by resampling real SNP
microarray data from genomic regions with known copy-number state. The original
real data were extracted from dilution series of tumor cell lines with
matched blood samples at several concentrations. Therefore, the signal-to-noise
ratio of the generated profiles can be controlled through the (known)
percentage of tumor cells in the sample. In this paper, we describe this
framework and illustrate some of the benefits of the proposed data generation
approach on a practical use case: a comparison study between methods for
segmenting DNA copy number profiles from SNP microarrays. This study indicates
that no single method is uniformly better than all others. It also helps
identify the pros and cons of the compared methods as a function of
biologically informative parameters, such as the fraction of tumor cells in the
sample and the proportion of heterozygous markers. Availability: R package
jointSeg: http://r-forge.r-project.org/R/?group_id=156
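The resampling idea described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the jointSeg implementation: the function name `resample_profile`, the `pools` dictionary (signal values grouped by known copy-number state) and the toy numbers are all hypothetical.

```python
import random

def resample_profile(pools, breakpoints, states, n, rng):
    """Build a synthetic profile of length n whose true segmentation is known.

    pools:       dict mapping a copy-number state label to a list of signal
                 values observed in regions of that state (hypothetical input;
                 the toy values used below are made up for illustration).
    breakpoints: sorted positions in 1..n-1 where the state changes.
    states:      one state label per segment, len(breakpoints) + 1 entries.
    """
    if len(states) != len(breakpoints) + 1:
        raise ValueError("need exactly one state per segment")
    edges = [0, *breakpoints, n]
    signal, truth = [], []
    for start, end, state in zip(edges[:-1], edges[1:], states):
        # Draw with replacement from the pool matching the segment's true state.
        signal.extend(rng.choices(pools[state], k=end - start))
        truth.extend([state] * (end - start))
    return signal, truth

rng = random.Random(0)
pools = {
    "normal": [-0.05, 0.00, 0.02, 0.04],  # toy values, not real data
    "gain": [0.35, 0.40, 0.45],           # toy values, not real data
}
signal, truth = resample_profile(
    pools, [40, 70], ["normal", "gain", "normal"], 100, rng
)
```

Because `truth` is returned alongside `signal`, the breakpoints found by any segmentation method can be scored against the known ones. Keeping one set of pools per tumor fraction would be one way to mimic the control over signal-to-noise ratio that the dilution series provides, though that extension is an assumption and is not shown here.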
Patterns of chromosomal copy-number alterations in intrahepatic cholangiocarcinoma
Background: Intrahepatic cholangiocarcinomas (ICC) are relatively rare malignant tumors associated with a poor prognosis. Recent studies using genome-wide sequencing technologies have mainly focused on identifying new driver mutations. There is nevertheless a need to investigate the spectrum of copy number aberrations in order to identify potential target genes in the altered chromosomal regions. The aim of this study was to characterize the patterns of chromosomal copy-number alterations (CNAs) in ICC. Methods: 53 patients with ICC and available frozen material were selected. In 47 cases, DNA hybridization was performed on a genome-wide SNP array. A procedure with a segmentation step and a calling step classified genomic regions into copy-number aberration states. We identified the exclusively amplified and exclusively deleted recurrent genomic areas. These areas are those showing the highest estimated propensity level for copy loss (resp. copy gain) together with the lowest level for copy gain (resp. copy loss). We investigated ICC clustering. We analyzed the relationships between CNAs and clinico-pathological characteristics. Results: The overall genomic profile of ICC showed many alterations, with higher rates of deletions. Exclusively deleted genomic areas were 1p, 3p and 14q. The main exclusively amplified genomic areas were 1q, 7p, 7q and 8q. Based on the exclusively deleted/amplified genomic areas, a clustering analysis identified three tumor groups: the first group characterized by copy loss of 1p and copy gain of 7p, the second group characterized by 1p and 3p copy losses without 7p copy gain, and the third group characterized mainly by very few CNAs. In univariate analyses, the number of tumors, the size of the largest tumor and the stage were significantly associated with shorter time to recurrence. We found no relationship between the number of altered cytobands or tumor groups and time to recurrence.
Conclusion: This study describes the spectrum of chromosomal aberrations across the whole genome. Some of the recurrent exclusive CNAs harbor candidate target genes. Despite the absence of correlation between CNAs and clinico-pathological characteristics, the co-occurrence of 7p gain and 1p loss in a subgroup of patients may suggest a differential activation of EGFR and its downstream pathways, which may have a potential effect on targeted therapies.
Weighted Consensus Segmentations
The problem of segmenting linearly ordered data is frequently encountered in time-series analysis, computational biology, and natural language processing. Segmentations obtained independently from replicate data sets or from the same data with different methods or parameter settings pose the problem of computing an aggregate or consensus segmentation. This Segmentation Aggregation problem amounts to finding a segmentation that minimizes the sum of distances to the input segmentations. It is again a segmentation problem and can be solved by dynamic programming. The aim of this contribution is (1) to gain a better mathematical understanding of the Segmentation Aggregation problem and its solutions and (2) to demonstrate that consensus segmentations have useful applications. Extending previously known results, we show that for a large class of distance functions only breakpoints present in at least one input segmentation appear in the consensus segmentation. Furthermore, we derive a bound on the size of consensus segments. As showcase applications, we investigate a yeast transcriptome and show that consensus segments provide a robust means of identifying transcriptomic units. This approach is particularly suited for dense transcriptomes with polycistronic transcripts, operons, or a lack of separation between transcripts. As a second application, we demonstrate that consensus segmentations can be used to robustly identify growth regimes from sets of replicate growth curves.
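The objective stated in this abstract, a segmentation minimizing the (weighted) sum of distances to the inputs, can be made concrete on a toy instance. The sketch below is not the dynamic programming approach the abstract mentions: it uses exhaustive search over all segmentations of a short sequence, and it assumes the disagreement distance (the number of point pairs joined in one segmentation but split in the other) as one possible distance function. The names `labels`, `disagreement` and `consensus`, and the way weights enter the sum, are illustrative assumptions.

```python
from itertools import combinations

def labels(breaks, n):
    """Segment index of each of n points; a break at b starts a new segment at b."""
    breaks = set(breaks)
    out, seg = [], 0
    for i in range(n):
        if i in breaks:
            seg += 1
        out.append(seg)
    return out

def disagreement(a, b, n):
    """Count point pairs that are together in one segmentation and apart in the other."""
    la, lb = labels(a, n), labels(b, n)
    return sum(
        (la[i] == la[j]) != (lb[i] == lb[j])
        for i, j in combinations(range(n), 2)
    )

def consensus(inputs, n, weights=None):
    """Exhaustive search for a segmentation minimizing the weighted distance sum.

    Only practical for small n (there are 2**(n-1) candidates); shown purely
    to make the objective explicit, not as an efficient algorithm.
    """
    weights = weights or [1] * len(inputs)
    best, best_cost = None, None
    for k in range(n):
        for cand in combinations(range(1, n), k):
            cost = sum(
                w * disagreement(cand, s, n) for w, s in zip(weights, inputs)
            )
            if best_cost is None or cost < best_cost:
                best, best_cost = cand, cost
    return set(best), best_cost

inputs = [{3, 6}, {3, 7}, {3, 6}]
print(consensus(inputs, 10))             # ({3, 6}, 6)
print(consensus(inputs, 10, [1, 3, 1]))  # ({3, 7}, 12)
```

In both runs the consensus breakpoints come from the union of the input breakpoints, which is consistent with the result quoted in the abstract. Raising the weight of the second input is enough to move the consensus from `{3, 6}` to `{3, 7}`.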
Driver genes, mutational signatures and the timing of mutations in oesophageal adenocarcinoma
The development of oesophageal adenocarcinoma (OAC) from Barrett’s oesophagus provides an excellent model of the step-wise progression of malignancy. This process is strongly associated with the reflux of stomach contents into the oesophagus. However, the exact mechanism by which low pH and bile acids contribute to the development of OAC remains unclear. The disease mostly presents late and treatment options are limited, resulting in poor outcomes. A paucity of information regarding the mutations and mutational processes that drive OAC is likely contributing to this. However, to understand the development of a cancer it is not enough simply to identify commonly mutated genes; it is also crucial to identify the times at which these mutations occur in the development of disease.
My aims in this thesis were to develop pipelines for the identification of somatic mutations using next-generation sequencing and to utilize these to provide an initial insight into the mutational signatures and genes that drive OAC. Using the unique opportunity presented by having material from multiple stages of disease development I aimed to understand better the timing of mutations in the development of cancer.
By studying single nucleotide variants from 43 tumours I was able to identify the signatures of 7 mutational processes acting on the OAC genome. These include ageing, enzymatic DNA damage (by the APOBEC enzymes) and homologous recombination deficiency. Two novel signatures dominated the genomes and were seen only very rarely in tumours from other sites. These signatures may represent the action of mutagens in the novel environment found around the oesophagus and stomach with bile acids and low pH being potential culprits.
Previous work has suggested that OAC may harbour large numbers of complex rearrangements and that reflux may contribute to this. I have developed and validated a pipeline for the sensitive and specific detection of structural variants in cancer. As part of this I have explored the factors contributing to false positive structural variant calls in ‘next-generation’ sequencing and developed filters to remove these. I have shown that mismappings and germline variants are the greatest source of error and therefore that choice of a highly accurate aligner is essential. Most importantly I have shown that a simple filter using mismapped reads seen in a large panel of normals is capable of filtering the vast majority of false positive variants. I also provide here a detailed sensitivity estimate for our pipeline. Using this pipeline I was able to identify a further 5 structural variant mutational processes molding the OAC genome. Finally I have identified new potential OAC driver genes including members of the SWI/SNF complex and the toll-like receptor signaling pathway.
To understand the role these mutations play in the development of OAC, I screened for these mutations in samples representing multiple stages in the progression from Barrett’s oesophagus to cancer. Intriguingly, almost all putative driver genes were found mutated at the earliest stages of disease development at the same frequency as seen in cancer. Importantly, this calls into question the role of these mutations in the development of the malignant phenotype.
This first-pass analysis of the OAC genome has highlighted novel mutational signatures that point to the central role of the unique mutagenic exposures seen in OAC. Most importantly, I have shown that the majority of OAC drivers are mutated early in the development of disease and do not predict risk of progression to invasive cancer.
Wellcome Trust Translational Medicine and Therapeutics (TMAT) Fellowship