15 research outputs found

    Resample-smoothing of Voronoi intensity estimators

    Get PDF
    Voronoi estimators are non-parametric and adaptive estimators of the intensity of a point process. The intensity estimate at a given location is equal to the reciprocal of the size of the Voronoi/Dirichlet cell containing that location. Their major drawback is that they tend to paradoxically under-smooth the data in regions where the point density of the observed point pattern is high, and over-smooth where the point density is low. To remedy this behaviour, we propose to apply an additional smoothing operation to the Voronoi estimator, based on resampling the point pattern by independent random thinning. Through a simulation study we show that our resample-smoothing technique improves the estimation substantially. In addition, we study statistical properties such as unbiasedness and variance, and propose a rule-of-thumb and a data-driven cross-validation approach to choose the amount of smoothing to apply. Finally we apply our proposed intensity estimation scheme to two datasets: locations of pine saplings (planar point pattern) and motor vehicle traffic accidents (linear network point pattern)

    Genome-wide association study of copy number variation with lung function identifies a novel signal of association near BANP for forced vital capacity

    Get PDF
    Background: Genome-wide association studies of Single Nucleotide Polymorphisms (SNPs) have identified 55 SNPs associated with lung function. However, little is known about the effect of copy number variants (CNVs) on lung function, although CNVs represent a significant proportion of human genetic polymorphism. To assess the effect of CNVs on lung function quantitative traits, we measured copy number at 2788 previously characterised, common copy number variable regions in 6 independent cohorts (n = 24,237) using intensity data from SNP genotyping experiments. We developed a pipeline for genome-wide association analysis and meta-analysis of CNV genotypes measured across multiple studies using SNP genotype array intensity data from different platform technologies. We then undertook cohort-level genome-wide association studies of CNV with lung function in a subset of 4 cohorts (n < = 12,403) with lung function measurements and meta-analysed the results. Follow-up was undertaken for CNVs which were well tagged by SNPs, in up to 146,871 individuals.Results: We generated robust copy number calls for 1962 out of 2788 (70 %) known CNV regions genome-wide, with 1103 measured with compatible class frequencies in at least 2 cohorts. We report a novel CNV association (discovery P = 0.0007) with Forced Vital Capacity (FVC) downstream of BANP on chromosome 16 that shows evidence of replication by a tag SNP in two independent studies (replication P = 0.004). In addition, we provide suggestive evidence (discovery P = 0.0002) for a role of complex copy number variation at a previously reported lung function locus, containing the rootletin gene CROCC, that is not tagged by SNPs.Conclusions: We demonstrate how common CNV regions can be reliably and consistently called across cohorts, using an existing calling algorithm and rigorous quality control steps, using SNP genotyping array intensity data. Although many common biallelic CNV regions were well-tagged by common SNPs, we also identified associations with untagged mulitallelic CNV regions thereby illustrating the potential of our approach to identify some of the missing heritability of complex traits

    Pan-cancer analysis of whole genomes

    Get PDF
    Cancer is driven by genetic change, and the advent of massively parallel sequencing has enabled systematic documentation of this variation at the whole-genome scale(1-3). Here we report the integrative analysis of 2,658 whole-cancer genomes and their matching normal tissues across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). We describe the generation of the PCAWG resource, facilitated by international data sharing using compute clouds. On average, cancer genomes contained 4-5 driver mutations when combining coding and non-coding genomic elements; however, in around 5% of cases no drivers were identified, suggesting that cancer driver discovery is not yet complete. Chromothripsis, in which many clustered structural variants arise in a single catastrophic event, is frequently an early event in tumour evolution; in acral melanoma, for example, these events precede most somatic point mutations and affect several cancer-associated genes simultaneously. Cancers with abnormal telomere maintenance often originate from tissues with low replicative activity and show several mechanisms of preventing telomere attrition to critical levels. Common and rare germline variants affect patterns of somatic mutation, including point mutations, structural variants and somatic retrotransposition. A collection of papers from the PCAWG Consortium describes non-coding mutations that drive cancer beyond those in the TERT promoter(4); identifies new signatures of mutational processes that cause base substitutions, small insertions and deletions and structural variation(5,6); analyses timings and patterns of tumour evolution(7); describes the diverse transcriptional consequences of somatic mutation on splicing, expression levels, fusion genes and promoter activity(8,9); and evaluates a range of more-specialized features of cancer genomes(8,10-18).Peer reviewe
    corecore