977 research outputs found

    Software comparison for evaluating genomic copy number variation for Affymetrix 6.0 SNP array platform

    Background: Copy number data are routinely being extracted from genome-wide association study chips using a variety of software. We empirically evaluated and compared four freely-available software packages designed for Affymetrix SNP chips to estimate copy number: Affymetrix Power Tools (APT), Aroma.Affymetrix, PennCNV and CRLMM. Our evaluation used 1,418 GENOA samples that were genotyped on the Affymetrix Genome-Wide Human SNP Array 6.0. We compared bias and variance in the locus-level copy number data, the concordance amongst regions of copy number gains/deletions and the false-positive rate amongst deleted segments. Results: APT had median locus-level copy numbers closest to a value of two, whereas PennCNV and Aroma.Affymetrix had the smallest variability associated with the median copy number. Of those evaluated, only PennCNV provides copy number specific quality-control metrics and identified 136 poor CNV samples. Regions of copy number variation (CNV) were detected using the hidden Markov models provided within PennCNV and CRLMM/VanillaIce. PennCNV detected more CNVs than CRLMM/VanillaIce; the median number of CNVs detected per sample was 39 and 30, respectively. PennCNV detected most of the regions that CRLMM/VanillaIce did as well as additional CNV regions. The median concordance between PennCNV and CRLMM/VanillaIce was 47.9% for duplications and 51.5% for deletions. The estimated false-positive rate associated with deletions was similar for PennCNV and CRLMM/VanillaIce. Conclusions: If the objective is to perform statistical tests on the locus-level copy number data, our empirical results suggest that PennCNV or Aroma.Affymetrix is optimal. If the objective is to perform statistical tests on the summarized segmented data then PennCNV would be preferred over CRLMM/VanillaIce. Specifically, PennCNV allows the analyst to estimate locus-level copy number, perform segmentation and evaluate CNV-specific quality-control metrics within a single software package. PennCNV has relatively small bias, small variability and detects more regions while maintaining a similar estimated false-positive rate as CRLMM/VanillaIce. More generally, we advocate that software developers need to provide guidance with respect to evaluating and choosing optimal settings in order to obtain optimal results for an individual dataset. Until such guidance exists, we recommend trying multiple algorithms, evaluating concordance/discordance and subsequently considering the union of regions for downstream association tests.

    An integrated analysis tool for analyzing hybridization intensities and genotypes using new-generation population-optimized human arrays

    The cross-sample plot of the multipoint LOH/LCSH analyses of the three samples used in Fig. 5. The plot comprises four panels: (a) The top-left panel is a cross-sample and cross-chromosome plot. The vertical axis is the index of study samples, and the horizontal axis is the physical position (Mb) on each of the 23 chromosomes. The blue and red bars represent SNPs without and with LOH/LCSH, respectively. (b) The top-right panel is a histogram of cross-chromosome aberration frequency. The vertical axis is the index of study samples, and the horizontal axis is the cross-chromosome aberration frequency of the corresponding samples. A pink (sky-blue) background indicates that the genetic gender of a sample is female (male). The histogram represents the aberration frequency of LOH/LCSH SNPs across the chromosomes of the corresponding samples. (c) The bottom-left panel is a histogram of the cross-sample aberration frequency. The vertical axis is the cross-sample aberration frequency of a SNP, and the horizontal axis is the physical position (Mb) on each of the 23 chromosomes. The purple line represents the proportion of samples carrying the SNPs with LOH/LCSH. (d) The bottom-right panel is the legend of the genetic gender used in panel (b), where a pink (sky-blue) background indicates that the genetic gender of a sample is female (male). (TIFF 1656 kb)

    Comprehensive Assessment of CNV Calling Algorithms: A Family Based Study Involving Monozygotic Twins Discordant for Schizophrenia

    Genetic variability is essential to human individuality. Genetic variation includes differences in sequence at the single nucleotide level to structural variations of large segments of DNA called copy number variations (CNVs). CNVs within a genome can be identified using microarray technology; however, the analysis of microarray results resulting in the “calling” of CNVs is not always precise. The research included in this manuscript describes the identification and analysis of CNVs using three commercially-available packages, Affymetrix® Genotyping Console™, Partek® Genome Suite™ and PennCNV, that are most commonly used in the analysis of SNP and CNV data. Specifically, this research assessed the ability of these platforms to successfully analyze Affymetrix® Genome-Wide Human SNP Array 6.0 data for CNVs within two families, each with a set of monozygotic twins discordant for schizophrenia. Results show that the three methods identified a set of CNVs in each individual, but the specific sets identified were not identical across the software packages. Affymetrix® Genotyping Console™ detected a wide variety of sizes of CNVs while the other two methods were able to identify only CNVs greater than 1 Mb in size. Interestingly, all platforms showed that monozygotic twins differ for some CNVs, a difference that may be acquired during their somatic development. This suggests that CNV differences between monozygotic twins may offer an explanation for discordance of phenotype, such as schizophrenia. Also, this analysis of CNVs within related individuals may identify previously unreported unusual features, including the repeated CNVs on chromosome 13q observed in the father of family 2. Such results support the use of CNV in familial studies, but argue for a careful assessment of CNVs including a careful selection of analysis tools and the necessity of independent confirmation

    Biological relevance of CNV calling methods using familial relatedness including monozygotic twins

    Studies involving the analysis of structural variation including Copy Number Variation (CNV) have recently exploded in the literature. Furthermore, CNVs have been associated with a number of complex diseases and neurodevelopmental disorders. Common methods for CNV detection use SNP, CNV, or CGH arrays, where the signal intensities of consecutive probes are used to define the number of copies associated with a given genomic region. These practices pose a number of challenges that interfere with the ability of available methods to accurately call CNVs. It has, therefore, become necessary to develop experimental protocols to test the reliability of CNV calling methods from microarray data so that researchers can properly discriminate biologically relevant data from noise

    An optimization framework for unsupervised identification of rare copy number variation from SNP array data

    A highly sensitive and configurable method for calling copy number variants from SNP array data is presented that can identify even rare CNV

    A Computational Framework Discovers New Copy Number Variants with Functional Importance

    Structural variants which cause changes in copy numbers constitute an important component of genomic variability. They account for 0.7% of genomic differences in two individual genomes, of which copy number variants (CNVs) are the largest component. A recent population-based CNV study revealed the need for better characterization of CNVs, especially the small ones (<500 bp). We propose a three-step computational framework (Identification of germline Changes in Copy Number or IgC2N) to discover and genotype germline CNVs. First, we detect candidate CNV loci by combining information across multiple samples without imposing restrictions on the number of coverage markers or on the variant size. Second, we fine-tune the detection of rare variants and infer the putative copy number classes for each locus. Last, for each variant we combine the relative distance between consecutive copy number classes with genetic information in a novel attempt to estimate the reference model bias. This computational approach is applied to genome-wide data from 1250 HapMap individuals. Novel variants were discovered and characterized in terms of size, minor allele frequency, type of polymorphism (gains, losses or both), and mechanism of formation. Using data generated for a subset of individuals by a 42 million marker platform, we validated the majority of the variants; the highest validation rate (66.7%) was for variants larger than 1 kb. Finally, we queried transcriptomic data from 129 individuals determined by RNA-sequencing as further validation and to assess the functional role of the new variants. We investigated the possible enrichment for regulatory effects of the variants and found that smaller variants (<1 kb) are more likely to regulate gene transcripts than larger variants (p-value = 2.04e-08). Our results support the validity of the computational framework to detect novel variants relevant to disease susceptibility studies and provide evidence of the importance of genetic variants in regulatory network studies

    The degree of segmental aneuploidy measured by total copy number abnormalities predicts survival and recurrence in superficial gastroesophageal adenocarcinoma

    Background: Prognostic biomarkers are needed for superficial gastroesophageal adenocarcinoma (EAC) to predict clinical outcomes and select therapy. Although recurrent mutations have been characterized in EAC, little is known about their clinical and prognostic significance. Aneuploidy is predictive of clinical outcome in many malignancies but has not been evaluated in superficial EAC. Methods: We quantified copy number changes in 41 superficial EAC using Affymetrix SNP 6.0 arrays. We identified recurrent chromosomal gains and losses and calculated the total copy number abnormality (CNA) count for each tumor as a measure of aneuploidy. We correlated CNA count with overall survival and time to first recurrence in univariate and multivariate analyses. Results: Recurrent segmental gains and losses involved multiple genes, including: HER2, EGFR, MET, CDK6, KRAS (recurrent gains); and FHIT, WWOX, CDKN2A/B, SMAD4, RUNX1 (recurrent losses). There was a 40-fold variation in CNA count across all cases. Tumors with the lowest and highest quartile CNA count had significantly better overall survival (p = 0.032) and time to first recurrence (p = 0.010) compared to those with intermediate CNA counts. These associations persisted when controlling for other prognostic variables. Significance: SNP arrays facilitate the assessment of recurrent chromosomal gain and loss and allow high resolution, quantitative assessment of segmental aneuploidy (total CNA count). The non-monotonic association of segmental aneuploidy with survival has been described in other tumors. The degree of aneuploidy is a promising prognostic biomarker in a potentially curable form of EAC. © 2014 Davison et al

    COLONOMICS - integrative omics data of one hundred paired normal-tumoral samples from colon cancer patients

    Colonomics is a multi-omics dataset that includes 250 samples: 50 samples from healthy colon mucosa donors and 100 paired samples from colon cancer patients (tumor/adjacent). From these samples, the Colonomics project includes data from genotyping, DNA methylation, gene expression, whole exome sequencing and micro-RNA (miRNA) expression. It also includes copy number variation (CNV) data from tumoral samples. In addition, clinical data for all these samples are available. The aims of the project were to explore and integrate these datasets to describe colon cancer at the molecular level, to compare normal and tumoral tissues, and to improve screening by finding biomarkers for the diagnosis and prognosis of colon cancer. This project has its own website including four browsers that allow users to explore the Colonomics datasets. Since the generated data could be reused by the scientific community for exploratory or validation purposes, here we describe the omics datasets included in the Colonomics project as well as results from the integration of multi-omics layers

    Inferring copy number and genotype in tumour exome data

    Background: Using whole exome sequencing to predict aberrations in tumours is a cost-effective alternative to whole genome sequencing; however, it is predominantly used for variant detection and infrequently utilised for detection of somatic copy number variation. Results: We propose a new method to infer copy number and genotypes using whole exome data from paired tumour/normal samples. Our algorithm uses two Hidden Markov Models to predict copy number and genotypes and computationally resolves polyploidy/aneuploidy, normal cell contamination and signal baseline shift. Our method explicitly detects chromosome-arm-level events, which are commonly found in tumour samples. The methods are combined into a package named ADTEx (Aberration Detection in Tumour Exome). We applied our algorithm to a cohort of 17 in-house generated and 18 TCGA paired ovarian cancer/normal exomes and evaluated the performance by comparing against the copy number variations and genotypes predicted using Affymetrix SNP 6.0 data of the same samples. Further, we carried out a comparison study to show that ADTEx outperformed its competitors in terms of precision and F-measure. Conclusions: Our proposed method, ADTEx, uses both depth of coverage ratios and B allele frequencies calculated from whole exome sequencing data to predict copy number variations along with their genotypes. ADTEx is implemented as a user-friendly software package using Python and the R statistical language. Source code and sample data are freely available under the GNU license (GPLv3) at http://adtex.sourceforge.net/