77 research outputs found

    Software comparison for evaluating genomic copy number variation for Affymetrix 6.0 SNP array platform

    Background: Copy number data are routinely being extracted from genome-wide association study chips using a variety of software. We empirically evaluated and compared four freely-available software packages designed for Affymetrix SNP chips to estimate copy number: Affymetrix Power Tools (APT), Aroma.Affymetrix, PennCNV and CRLMM. Our evaluation used 1,418 GENOA samples that were genotyped on the Affymetrix Genome-Wide Human SNP Array 6.0. We compared bias and variance in the locus-level copy number data, the concordance amongst regions of copy number gains/deletions and the false-positive rate amongst deleted segments. Results: APT had median locus-level copy numbers closest to a value of two, whereas PennCNV and Aroma.Affymetrix had the smallest variability associated with the median copy number. Of those evaluated, only PennCNV provides copy number specific quality-control metrics and identified 136 poor CNV samples. Regions of copy number variation (CNV) were detected using the hidden Markov models provided within PennCNV and CRLMM/VanillaIce. PennCNV detected more CNVs than CRLMM/VanillaIce; the median number of CNVs detected per sample was 39 and 30, respectively. PennCNV detected most of the regions that CRLMM/VanillaIce did as well as additional CNV regions. The median concordance between PennCNV and CRLMM/VanillaIce was 47.9% for duplications and 51.5% for deletions. The estimated false-positive rate associated with deletions was similar for PennCNV and CRLMM/VanillaIce. Conclusions: If the objective is to perform statistical tests on the locus-level copy number data, our empirical results suggest that PennCNV or Aroma.Affymetrix is optimal. If the objective is to perform statistical tests on the summarized segmented data then PennCNV would be preferred over CRLMM/VanillaIce. Specifically, PennCNV allows the analyst to estimate locus-level copy number, perform segmentation and evaluate CNV-specific quality-control metrics within a single software package. PennCNV has relatively small bias, small variability and detects more regions while maintaining an estimated false-positive rate similar to that of CRLMM/VanillaIce. More generally, we advocate that software developers provide guidance on evaluating and choosing optimal settings in order to obtain optimal results for an individual dataset. Until such guidance exists, we recommend trying multiple algorithms, evaluating concordance/discordance and subsequently considering the union of regions for downstream association tests.
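
    The recommendation to evaluate concordance between callers and then take the union of regions can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the abstract does not say how concordance was computed, so the sketch uses base pairs called by both callers divided by base pairs called by either, on a single chromosome, with made-up coordinates. The function names are hypothetical and not part of PennCNV or CRLMM/VanillaIce.

```python
from typing import List, Tuple

# Half-open interval [start, end) in base pairs on one chromosome.
Interval = Tuple[int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list.

    Applied to the concatenated calls of two callers, this is their union.
    """
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersection(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the regions covered by both call sets (two-pointer sweep)."""
    a, b = merge(a), merge(b)
    shared: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            shared.append((lo, hi))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


def total_length(intervals: List[Interval]) -> int:
    """Total number of base pairs covered, counting overlaps once."""
    return sum(end - start for start, end in merge(intervals))


def concordance(a: List[Interval], b: List[Interval]) -> float:
    """Assumed definition: shared base pairs / base pairs called by either."""
    union_bp = total_length(a + b)
    if union_bp == 0:
        return 0.0
    return total_length(intersection(a, b)) / union_bp


# Hypothetical deletion calls for one sample from two callers.
caller_a = [(100, 200), (500, 800)]
caller_b = [(150, 250), (500, 600), (900, 950)]

union_regions = merge(caller_a + caller_b)
print(union_regions)
print(concordance(caller_a, caller_b))
```

    A per-sample value computed this way could be summarized across samples by its median, separately for duplications and deletions, and the merged regions carried forward to association testing.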

    Bayesian estimation of genomic copy number with single nucleotide polymorphism genotyping arrays

    Background: The identification of copy number aberration in the human genome is an important area in cancer research. We develop a model for determining genomic copy numbers using high-density single nucleotide polymorphism genotyping microarrays. The method is based on a Bayesian spatial normal mixture model with an unknown number of components corresponding to true copy numbers. A reversible jump Markov chain Monte Carlo algorithm is used to implement the model and perform posterior inference. Results: The performance of the algorithm is examined on both simulated and real cancer data, and it is compared with the popular CNAG algorithm for copy number detection. Conclusions: We demonstrate that our Bayesian mixture model performs at least as well as the hidden Markov model based CNAG algorithm and in certain cases does better. One of the added advantages of our method is the flexibility of modeling normal cell contamination in tumor samples.

    The quest for genetic risk factors for Crohn's disease in the post-GWAS era

    Multiple genome-wide association studies (GWASs) and two large scale meta-analyses have been performed for Crohn's disease and have identified 71 susceptibility loci. These findings have contributed greatly to our current understanding of the disease pathogenesis. Yet, these loci only explain approximately 23% of the disease heritability. One of the future challenges in this post-GWAS era is to identify potential sources of the remaining heritability. Such sources may include common variants with limited effect size, rare variants with higher effect sizes, structural variations, or even more complicated mechanisms such as epistatic, gene-environment and epigenetic interactions. Here, we outline potential sources of this hidden heritability, focusing on Crohn's disease and the currently available data. We also discuss future strategies to determine more about the heritability; these strategies include expanding current GWAS, fine-mapping, whole genome sequencing or exome sequencing, and using family-based approaches. Despite the current limitations, such strategies may help to transfer research achievements into clinical practice and guide the improvement of preventive and therapeutic measures

    DNA isolation protocol effects on nuclear DNA analysis by microarrays, droplet digital PCR, and whole genome sequencing, and on mitochondrial DNA copy number estimation.

    Potential bias introduced during DNA isolation is inadequately explored, although it could have significant impact on downstream analysis. To investigate this in human brain, we isolated DNA from cerebellum and frontal cortex using spin columns under different conditions, and salting-out. We first analysed DNA using array CGH, which revealed a striking wave pattern suggesting primarily GC-rich cerebellar losses, even against matched frontal cortex DNA, with a similar pattern on a SNP array. The aCGH changes varied with the isolation protocol. Droplet digital PCR of two genes also showed protocol-dependent losses. Whole genome sequencing showed GC-dependent variation in coverage with spin column isolation from cerebellum. We also extracted and sequenced DNA from substantia nigra using salting-out and phenol / chloroform. The mtDNA copy number, assessed by reads mapping to the mitochondrial genome, was higher in substantia nigra when using phenol / chloroform. We thus provide evidence for significant method-dependent bias in DNA isolation from human brain, as reported in rat tissues. This may contribute to array "waves", and could affect copy number determination, particularly if mosaicism is being sought, and sequencing coverage. Variations in isolation protocol may also affect apparent mtDNA abundance

    Genomic characteristics of cattle copy number variations

    Background: Copy number variation (CNV) represents another important source of genetic variation complementary to single nucleotide polymorphism (SNP). High-density SNP array data have been routinely used to detect human CNVs, many of which have significant functional effects on gene expression and human diseases. In the dairy industry, a large quantity of SNP genotyping results are becoming available and can be used for CNV discovery to understand and accelerate genetic improvement for complex traits. Results: We performed a systematic analysis of CNV using the Bovine HapMap SNP genotyping data, including 539 animals of 21 modern cattle breeds and 6 outgroups. After correcting genomic waves and considering the pedigree information, we identified 682 candidate CNV regions, which represent 139.8 megabases (~4.60%) of the genome. Selected CNVs were further experimentally validated and we found that copy number "gain" CNVs were predominantly clustered in tandem rather than existing as interspersed duplications. Many CNV regions (~56%) overlap with cattle genes (1,263), which are significantly enriched for immunity, lactation, reproduction and rumination. The overlap of this new dataset and other published CNV studies was less than 40%; however, our discovery of large, high frequency (> 5% of animals surveyed) CNV regions showed 90% agreement with other studies. These results highlight the differences and commonalities between technical platforms. Conclusions: We present a comprehensive genomic analysis of cattle CNVs derived from SNP data which will be a valuable genomic variation resource. Combined with SNP detection assays, gene-containing CNV regions may help identify genes undergoing artificial selection in domesticated animals.

    A Multi-Sample Based Method for Identifying Common CNVs in Normal Human Genomic Structure Using High-Resolution aCGH Data

    BACKGROUND: It is difficult to identify copy number variations (CNV) in normal human genomic data due to noise and non-linear relationships between different genomic regions and signal intensity. A high-resolution array comparative genomic hybridization (aCGH) containing 42 million probes, which is very large compared to previous arrays, was recently published. Most existing CNV detection algorithms do not work well because of noise associated with the large amount of input data and because most of the current methods were not designed to analyze normal human samples. Normal human genome analysis often requires a joint approach across multiple samples. However, the majority of existing methods can only identify CNVs from a single sample. METHODOLOGY AND PRINCIPAL FINDINGS: We developed a multi-sample-based genomic variations detector (MGVD) that uses segmentation to identify common breakpoints across multiple samples and a k-means-based clustering strategy. Unlike previous methods, MGVD simultaneously considers multiple samples with different genomic intensities and identifies CNVs and CNV zones (CNVZs); CNVZ is a more precise measure of the location of a genomic variant than the CNV region (CNVR). CONCLUSIONS AND SIGNIFICANCE: We designed a specialized algorithm to detect common CNVs from extremely high-resolution multi-sample aCGH data. MGVD showed high sensitivity and a low false discovery rate for a simulated data set, and outperformed most current methods when real, high-resolution HapMap datasets were analyzed. MGVD also had the fastest runtime compared to the other algorithms evaluated when actual, high-resolution aCGH data were analyzed. The CNVZs identified by MGVD can be used in association studies for revealing relationships between phenotypes and genomic aberrations. Our algorithm was developed with standard C++ and is available in Linux and MS Windows format in the STL library. It is freely available at: http://embio.yonsei.ac.kr/~Park/mgvd.php
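
    As a rough illustration of what a k-means-based clustering step over segmented data can look like, the Python sketch below assigns per-segment mean log2 ratios to loss, neutral and gain clusters. It is a generic one-dimensional k-means and not MGVD's actual algorithm, which the abstract does not detail. The function name, the segment values and the initial centers are all hypothetical.

```python
from typing import List, Tuple


def kmeans_1d(
    values: List[float], centers: List[float], iterations: int = 50
) -> Tuple[List[float], List[int]]:
    """Plain 1-D k-means (Lloyd's algorithm) from caller-supplied initial centers."""
    centers = list(centers)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: abs(v - centers[k]))
            for v in values
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            new_centers.append(sum(members) / len(members) if members else centers[k])
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers, labels


# Hypothetical mean log2 ratios for seven segments pooled across samples.
segment_means = [-0.9, -1.1, 0.05, -0.02, 0.0, 0.55, 0.6]
# Assumed starting centers for loss, neutral and gain.
centers, labels = kmeans_1d(segment_means, [-1.0, 0.0, 0.58])

states = ["loss", "neutral", "gain"]
print([states[lab] for lab in labels])
```

    Fixing the initial centers keeps the example deterministic; a real analysis would need to choose the number of clusters and their starting points from the data.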

    The use of race, ethnicity and ancestry in human genetic research

    Post-Human Genome Project progress has enabled a new wave of population genetic research, and intensified controversy over the use of race/ethnicity in this work. At the same time, the development of methods for inferring genetic ancestry offers more empirical means of assigning group labels. Here, we provide a systematic analysis of the use of race/ethnicity and ancestry in current genetic research. We base our analysis on key published recommendations for the use and reporting of race/ethnicity which advise that researchers: explain why the terms/categories were used and how they were measured, carefully define them, and apply them consistently. We studied 170 population genetic research articles from high impact journals, published 2008–2009. A comparative perspective was obtained by aligning study metrics with similar research from articles published 2001–2004. Our analysis indicates a marked improvement in compliance with some of the recommendations/guidelines for the use of race/ethnicity over time, while showing that important shortfalls still remain: no article using ‘race’, ‘ethnicity’ or ‘ancestry’ defined or discussed the meaning of these concepts in context; a third of articles still do not provide a rationale for their use, with those using ‘ancestry’ being the least likely to do so. Further, no article discussed potential socio-ethical implications of the reported research. As such, there remains a clear imperative for highlighting the importance of consistent and comprehensive reporting on human populations to the genetics/genomics community globally, to generate explicit guidelines for the uses of ancestry and genetic ancestry, and importantly, to ensure that guidelines are followed

    Mesenchymal Transition and PDGFRA Amplification/Mutation Are Key Distinct Oncogenic Events in Pediatric Diffuse Intrinsic Pontine Gliomas

    Diffuse intrinsic pontine glioma (DIPG) is one of the most frequent malignant pediatric brain tumors and its prognosis is universally fatal. No significant improvement has been made in the last thirty years over the standard treatment with radiotherapy. To address the paucity of understanding of DIPGs, we have carried out integrated molecular profiling of a large series of samples obtained with stereotactic biopsy at diagnosis. While chromosomal imbalances did not distinguish DIPG and supratentorial tumors on CGH arrays, gene expression profiling revealed clear differences between them, with brainstem gliomas resembling midline/thalamic tumors, indicating a closely-related origin. Two distinct subgroups of DIPG were identified. The first subgroup displayed mesenchymal and pro-angiogenic characteristics, with stem cell marker enrichment consistent with the possibility of growing tumor stem cells from these biopsies. The other subgroup displayed oligodendroglial features, and appeared largely driven by PDGFRA, in particular through amplification and/or novel missense mutations in the extracellular domain. Patients in this latter group had a significantly worse outcome, with a hazard ratio for early death (i.e., before 10 months) 8-fold greater than for patients in the other subgroup (p = 0.041, Cox regression model). The worse outcome of patients with the oligodendroglial type of tumor was confirmed on a series of 55 paraffin-embedded biopsy samples at diagnosis (median OS of 7.73 versus 12.37 months, p = 0.045, log-rank test). Two distinct transcriptional subclasses of DIPG with specific genomic alterations can be defined at diagnosis by oligodendroglial differentiation or mesenchymal transition, respectively. Classifying these tumors by signal transduction pathway activation and by mutation in pathway member genes may be particularly valuable for the development of targeted therapies.