2,352 research outputs found

    A robust penalized method for the analysis of noisy DNA copy number data

    Background: Deletions and amplifications in human genomic DNA copy number cause numerous diseases, such as various forms of cancer. The detection of DNA copy number variations (CNV) is therefore important for understanding the genetic basis of many diseases. Various techniques and platforms have been developed for genome-wide analysis of DNA copy number, such as array-based comparative genomic hybridization (aCGH) and high-resolution mapping with high-density tiling oligonucleotide arrays. Because these platforms involve complicated biological and experimental processes, their data can be contaminated by outliers. Results: We propose a penalized least absolute deviation (LAD) regression model with the adaptive fused lasso penalty for detecting CNV. The method is robust to outliers and incorporates both the spatial dependence and the sparsity of CNV into the analysis. Our simulation studies and real data analysis indicate that the proposed method can correctly detect the number and locations of the true breakpoints while appropriately controlling false positives. Conclusions: The proposed method has three advantages for detecting CNV change points: it is robust; it incorporates both spatial dependence and sparsity; and it accurately estimates the true value at each marker.
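A penalized LAD criterion with an adaptive fused lasso penalty can be written in the generic form below. This is a sketch under stated assumptions, not the paper's exact formulation. Here $y_i$ is the observed log-ratio at marker $i$, with markers ordered by genomic position, and $\beta_i$ is the underlying true value. The weights $w_i$ and $v_i$ are hypothetical data-driven adaptive weights. The abstract does not specify these weights or how $\lambda_1$ and $\lambda_2$ are tuned.

```latex
\hat{\boldsymbol\beta} = \arg\min_{\boldsymbol\beta}\;
  \sum_{i=1}^{n} \lvert y_i - \beta_i \rvert
  + \lambda_1 \sum_{i=1}^{n} w_i \,\lvert \beta_i \rvert
  + \lambda_2 \sum_{i=2}^{n} v_i \,\lvert \beta_i - \beta_{i-1} \rvert
```

Each term plays a distinct role:
- The absolute loss supplies robustness to outliers.
- The first penalty encodes sparsity, since most markers carry no copy number change.
- The second penalty encodes spatial dependence by favouring piecewise-constant profiles.

Markers where $\hat\beta_i \neq \hat\beta_{i-1}$ are the estimated breakpoints.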

    Copynumber: Efficient algorithms for single- and multi-track copy number segmentation.

    BACKGROUND: Cancer progression is associated with genomic instability and an accumulation of gains and losses of DNA. The growing variety of tools for measuring genomic copy numbers, including various types of array-CGH, SNP arrays and high-throughput sequencing, calls for a coherent framework offering unified and consistent handling of single- and multi-track segmentation problems. In addition, the emergence of very high density copy number scans creates a demand for highly efficient segmentation algorithms. RESULTS: A comprehensive Bioconductor package for copy number analysis is presented. The package offers a unified framework for single-sample, multi-sample and multi-track segmentation and is based on statistically sound penalized least squares principles. Conditional on the number of breakpoints, the estimates are optimal in the least squares sense. A novel, computationally highly efficient algorithm is proposed that uses vector-based operations in R. Three case studies are presented. CONCLUSIONS: The R package copynumber is a software suite for segmentation of single- and multi-track copy number data using algorithms based on coherent least squares principles. RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license, which is similar to the 'Creative Commons Attribution Licence'. In brief, you may copy, distribute, and display the work; make derivative works; or make commercial use of the work, under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.
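The following is a minimal sketch of the penalized least squares principle. It performs exact penalized least-squares segmentation of a single track using the generic "optimal partitioning" dynamic-programming recursion. The per-segment penalty `gamma` and all function names are hypothetical. This is not the copynumber package's algorithm or API; the abstract describes that algorithm only as using vector-based operations in R. The O(n²) loop below favours clarity over speed.

```python
from itertools import accumulate

def segment_cost(cs, cs2, i, j):
    """Residual sum of squares of y[i:j] around its own mean, from prefix sums."""
    s = cs[j] - cs[i]
    s2 = cs2[j] - cs2[i]
    return s2 - s * s / (j - i)

def optimal_partition(y, gamma):
    """Exact minimizer of (sum of within-segment RSS) + gamma * (number of segments).

    Hypothetical helper for illustration; returns the sorted segment start indices.
    """
    n = len(y)
    cs = [0.0] + list(accumulate(y))                 # prefix sums of y
    cs2 = [0.0] + list(accumulate(v * v for v in y))  # prefix sums of y^2
    best = [0.0] + [float("inf")] * n  # best[j]: optimal penalized cost of y[:j]
    last = [0] * (n + 1)               # last[j]: start of the final segment in that optimum
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + segment_cost(cs, cs2, i, j) + gamma
            if c < best[j]:
                best[j], last[j] = c, i
    # Backtrack from the end of the track to recover all segment starts.
    starts, j = [], n
    while j > 0:
        starts.append(last[j])
        j = last[j]
    return sorted(starts)
```

For example, `optimal_partition([0, 0, 0, 0, 5, 5, 5, 5], 1.0)` returns `[0, 4]`, while a large `gamma` such as 100 collapses the track to a single segment `[0]`. The returned segmentation is also the least-squares optimum among all segmentations with the same number of segments. This matches the abstract's remark that estimates are optimal conditional on the number of breakpoints.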

    Median inverse problem and approximating the number of $k$-median inverses of a permutation

    We introduce the "Median Inverse Problem" for metric spaces. In particular, given a permutation $\pi$ in the symmetric group $S_n$ (endowed with the breakpoint distance), we study the set of all $k$-subsets $\{x_1,\dots,x_k\}\subset S_n$ for which $\pi$ is a breakpoint median. The set of all $k$-tuples $(x_1,\dots,x_k)$ with this property is called the $k$-median inverse of $\pi$. By finding an upper bound for the cardinality of this set, we provide an asymptotic upper bound for the probability that $\pi$ is a breakpoint median of $k$ permutations $\xi_1^{(n)},\dots,\xi_k^{(n)}$ chosen uniformly and independently at random from $S_n$.
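For very small $n$, the definitions above can be checked by brute force. This sketch assumes unsigned permutations framed by 0 and $n+1$. It counts one breakpoint for each adjacency of one permutation that is not an unordered adjacency of the other. The paper's exact convention may differ, and all function names are hypothetical. Enumerating all $n!$ permutations limits this to toy sizes, which is one reason bounds of the kind described in the abstract are of interest.

```python
from itertools import permutations, product

def adjacencies(perm):
    """Unordered adjacencies of an unsigned permutation framed by 0 and n+1 (assumed convention)."""
    framed = [0, *perm, len(perm) + 1]
    return {frozenset(pair) for pair in zip(framed, framed[1:])}

def breakpoint_distance(p, q):
    """Number of adjacencies of p that are missing from q."""
    return len(adjacencies(p) - adjacencies(q))

def is_breakpoint_median(pi, xs):
    """True if pi minimizes the total breakpoint distance to xs over all of S_n.

    Brute force over n! candidates, so only usable for toy sizes.
    """
    def total(s):
        return sum(breakpoint_distance(s, x) for x in xs)
    best = min(total(s) for s in permutations(range(1, len(pi) + 1)))
    return total(pi) == best

def median_inverse_size(pi, k):
    """Size of the k-median inverse of pi: k-tuples (x_1, ..., x_k) having pi as a breakpoint median."""
    perms = list(permutations(range(1, len(pi) + 1)))
    return sum(1 for xs in product(perms, repeat=k) if is_breakpoint_median(pi, xs))
```

As a sanity check, `median_inverse_size((1, 2, 3), 1)` returns 1. With a single permutation $x_1$, the only zero-distance median is $x_1$ itself.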

    Phylogenetic reconstruction from transpositions

    Background: Because of the advent of high-throughput sequencing and the consequent reduction in sequencing cost, many organisms have been completely sequenced and most of their genes identified. It has thus become possible to represent whole genomes as ordered lists of gene identifiers and to study the rearrangement of these entities computationally. As a result, genome rearrangement data have attracted increasing attention from both biologists and computer scientists as a new type of data for phylogenetic analysis. The main genome rearrangement events include inversions, transpositions and transversions. To date, GRAPPA and MGR are the most accurate methods for rearrangement phylogeny, and both assume inversion is the only event. However, because transposition distance is complex to compute, datasets in which transpositions dominate are very difficult to analyze. Results: We extend GRAPPA to handle transpositions. The new method, named GRAPPA-TP, has two major extensions: a heuristic method to estimate transposition distance, and a new transposition median solver for three genomes. Although GRAPPA-TP uses a greedy approach to compute the transposition distance, it is very accurate when genomes are relatively close. GRAPPA-TP is available from http://phylo.cse.sc.edu/. Conclusion: Our extensive testing on simulated datasets shows that GRAPPA-TP is very accurate in terms of ancestral genome inference and phylogenetic reconstruction. Simulation results also suggest that model match is critical in genome rearrangement analysis: it is not accurate to simulate transpositions with other events, including inversions.
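The sketch below shows what a transposition does to a gene order and why exact transposition distance is costly. Here the exact value is found by breadth-first search over all $n!$ orders, which is feasible only for tiny $n$. The sketch assumes unsigned gene orders written as permutations of $1..n$. It also computes the standard breakpoint lower bound: one transposition removes at most three breakpoints. This is not GRAPPA-TP's greedy heuristic, whose details the abstract does not give, and all function names are hypothetical.

```python
from collections import deque

def transpose(perm, i, j, k):
    """Swap the adjacent blocks perm[i:j] and perm[j:k] (0-based, i < j < k <= n)."""
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]

def transposition_breakpoints(perm):
    """Ordered breakpoints: consecutive (a, b) in the framed order with b != a + 1."""
    framed = (0, *perm, len(perm) + 1)
    return sum(1 for a, b in zip(framed, framed[1:]) if b != a + 1)

def transposition_lower_bound(perm):
    """ceil(breakpoints / 3), since one transposition removes at most 3 breakpoints."""
    return -(-transposition_breakpoints(perm) // 3)

def transposition_distance(perm):
    """Exact transposition distance to the identity by breadth-first search (tiny n only)."""
    perm = tuple(perm)
    n = len(perm)
    target = tuple(range(1, n + 1))
    seen = {perm: 0}
    queue = deque([perm])
    while queue:
        cur = queue.popleft()
        if cur == target:
            return seen[cur]
        # Enumerate every transposition (i, j, k) and visit unseen orders.
        for i in range(n):
            for j in range(i + 1, n):
                for k in range(j + 1, n + 1):
                    nxt = transpose(cur, i, j, k)
                    if nxt not in seen:
                        seen[nxt] = seen[cur] + 1
                        queue.append(nxt)
```

For example, `(3, 2, 1)` has four breakpoints, so its lower bound is 2, and the exact search confirms a distance of 2. The BFS frontier grows factorially with $n$. This is why practical tools such as GRAPPA-TP fall back on heuristics.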

    Recurrent patterns of DNA copy number alterations in tumors reflect metabolic selection pressures.

    Copy number alteration (CNA) profiling of human tumors has revealed recurrent patterns of DNA amplifications and deletions across diverse cancer types. These patterns are suggestive of conserved selection pressures during tumor evolution but cannot be fully explained by known oncogenes and tumor suppressor genes. Using a pan-cancer analysis of CNA data from patient tumors and experimental systems, here we show that principal component analysis-defined CNA signatures are predictive of glycolytic phenotypes, including ¹⁸F-fluorodeoxyglucose (FDG) avidity of patient tumors, and increased proliferation. The primary CNA signature is enriched for p53 mutations and is associated with glycolysis through coordinate amplification of glycolytic genes and other cancer-linked metabolic enzymes. A pan-cancer and cross-species comparison of CNAs highlighted 26 consistently altered DNA regions, containing 11 enzymes in the glycolysis pathway in addition to known cancer-driving genes. Furthermore, exogenous expression of hexokinase and enolase enzymes in an experimental immortalization system altered the subsequent copy number status of the corresponding endogenous loci, supporting the hypothesis that these metabolic genes act as drivers within the conserved CNA amplification regions. Taken together, these results demonstrate that metabolic stress acts as a selective pressure underlying the recurrent CNAs observed in human tumors, and further cast genomic instability as an enabling event in tumorigenesis and metabolic evolution.
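The following is a minimal sketch of how principal component analysis can define CNA signatures. It assumes a samples × regions matrix of copy-number values and computes PCA by SVD of the column-centered matrix. The abstract does not describe the study's actual preprocessing, number of components, or downstream association tests. The toy data below are synthetic, and the function name is hypothetical.

```python
import numpy as np

def cna_signatures(cna, n_components=2):
    """PCA of a samples x regions copy-number matrix via SVD of the centered data.

    Returns (scores, loadings, explained_variance_ratio) for the leading components.
    """
    X = np.asarray(cna, dtype=float)
    Xc = X - X.mean(axis=0)                       # center each region (column)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # per-sample signature scores
    loadings = Vt[:n_components]                     # per-region signature weights
    explained = (s ** 2) / np.sum(s ** 2)            # variance fraction per component
    return scores, loadings, explained[:n_components]

# Hypothetical toy data: 50 "tumors" x 6 "regions" driven by one shared latent pattern.
rng = np.random.default_rng(0)
pattern = np.array([1.0, 1.0, 1.0, 0.0, 0.0, -1.0])  # amplified, neutral, deleted regions
latent = rng.normal(size=(50, 1))                    # per-tumor strength of the pattern
cna = latent * pattern + 0.1 * rng.normal(size=(50, 6))
scores, loadings, explained = cna_signatures(cna, n_components=1)
```

The sign of a principal component is arbitrary, so the recovered signature `loadings[0]` matches `pattern` only up to sign. Per-tumor `scores` of this kind are what one would then relate to phenotypes such as FDG avidity.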

    Reconstructing the Genomic Architecture of Mammalian Ancestors Using Multispecies Comparative Maps

    Rapidly developing comparative gene maps in selected mammal species are providing an opportunity to reconstruct the genomic architecture of mammalian ancestors and study rearrangements that transformed this ancestral genome into existing mammalian genomes. Here, the recently developed Multiple Genome Rearrangement (MGR) algorithm is applied to human, mouse, cat and cattle comparative maps (with 311-470 shared markers) to impute the ancestral mammalian genome. Reconstructed ancestors consist of 70-100 conserved segments shared across the genomes that have been exchanged by rearrangement events along the ordinal lineages leading to modern species genomes. Genomic distances between species, dominated by inversions (reversals) and translocations, are presented in a first multispecies attempt using ordered mapping data to reconstruct the evolutionary exchanges that preceded modern placental mammal genomes.
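The following sketch makes "conserved segments" and inversions concrete on a single chromosome. It assumes markers are written as signed integers whose sign gives orientation. The code splits one marker order into maximal runs that stay adjacent, in either reading direction, in another order. This is not the MGR algorithm, which reconstructs ancestral genomes by reasoning about rearrangement events across all species at once. The sketch also ignores translocations between chromosomes, and the function names are hypothetical.

```python
def signed_reversal(order, i, j):
    """Invert the block order[i:j]: reverse it and flip each marker's orientation."""
    return order[:i] + [-g for g in reversed(order[i:j])] + order[j:]

def conserved_segments(a, b):
    """Split non-empty order `a` into maximal runs whose consecutive pairs are also
    adjacent in `b`, either as (x, y) or, read on the other strand, as (-y, -x)."""
    adj_b = set(zip(b, b[1:]))
    adj_b |= {(-y, -x) for x, y in adj_b}
    segments = [[a[0]]]
    for x, y in zip(a, a[1:]):
        if (x, y) in adj_b:
            segments[-1].append(y)  # adjacency preserved: extend the current segment
        else:
            segments.append([y])    # breakpoint: start a new segment
    return segments
```

For example, inverting markers 2 to 4 of `[1, 2, 3, 4, 5]` gives `[1, -4, -3, -2, 5]`. Comparing the two orders yields the conserved segments `[[1], [2, 3, 4], [5]]`, with the two breakpoints lying at the ends of the inverted block.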