72 research outputs found

    Statistical Models of Repeated Categorical Ratings: The R package rater

    A common occurrence in many disciplines is the need to assign a set of items into categories or classes with known labels. This is often done by one or more expert raters, or sometimes by an automated process. If these assignments, or 'ratings', are difficult to do, a common tactic is to repeat them by different raters, or even by the same rater multiple times on different occasions. We present an R package, rater, available on CRAN, that implements Bayesian versions of several statistical models that allow analysis of repeated categorical rating data. Inference is possible for the true underlying (latent) class of each item, as well as the accuracy of each rater. The models are based on, and include, the Dawid-Skene model. We use the Stan probabilistic programming language as the main computational engine. We illustrate usage of rater through a few examples. We also discuss in detail the techniques of marginalisation and conditioning, which are necessary for these models but also apply more generally to other models implemented in Stan. Comment: 28 pages, 6 figures

    Quantifying the Underestimation of Relative Risks from Genome-Wide Association Studies

    Genome-wide association studies (GWAS) have identified hundreds of associated loci across many common diseases. Most risk variants identified by GWAS will merely be tags for as-yet-unknown causal variants. It is therefore possible that identification of the causal variant, by fine mapping, will identify alleles with larger effects on genetic risk than those currently estimated from GWAS replication studies. We show that under plausible assumptions, whilst the majority of the per-allele relative risks (RR) estimated from GWAS data will be close to the true risk at the causal variant, some could be considerable underestimates. For example, for an estimated RR in the range 1.2–1.3, there is approximately a 38% chance that it exceeds 1.4 and a 10% chance that it is over 2. We show how these probabilities can vary depending on the true effects associated with low-frequency variants and on the minor allele frequency (MAF) of the most associated SNP. We investigate the consequences of the underestimation of effect sizes for predictions of an individual's disease risk and interpret our results for the design of fine mapping experiments. Although these effects mean that the amount of heritability explained by known GWAS loci is expected to be larger than current projections, this increase is likely to explain a relatively small amount of the so-called “missing” heritability

    Adaptively Weighted Audits of Instant-Runoff Voting Elections: AWAIRE

    An election audit is risk-limiting if the audit limits (to a pre-specified threshold) the chance that an erroneous electoral outcome will be certified. Extant methods for auditing instant-runoff voting (IRV) elections are either not risk-limiting or require cast vote records (CVRs), the voting system's electronic record of the votes on each ballot. CVRs are not always available, for instance, in jurisdictions that tabulate IRV contests manually. We develop an RLA method (AWAIRE) that uses adaptively weighted averages of test supermartingales to efficiently audit IRV elections when CVRs are not available. The adaptive weighting 'learns' an efficient set of hypotheses to test to confirm the election outcome. When accurate CVRs are available, AWAIRE can use them to increase the efficiency to match the performance of existing methods that require CVRs. We provide an open-source prototype implementation that can handle elections with up to six candidates. Simulations using data from real elections show that AWAIRE is likely to be efficient in practice. We discuss how to extend the computational approach to handle elections with more candidates. Adaptively weighted averages of test supermartingales are a general tool, useful beyond election audits to test collections of hypotheses sequentially while rigorously controlling the familywise error rate. Comment: 16 pages, 3 figures, accepted for E-Vote-ID 202

    Random errors are not necessarily politically neutral

    Errors are inevitable in the implementation of any complex process. Here we examine the effect of random errors on Single Transferable Vote (STV) elections, a common approach to deciding multi-seat elections. It is usually expected that random errors should have nearly equal effects on all candidates, and thus be fair. We find to the contrary that random errors can introduce systematic bias into election results. This is because, even if the errors are random, votes for different candidates occur in different patterns that are affected differently by random errors. In the STV context, the most important effect of random errors is to invalidate the ballot. This removes far more votes for those candidates whose supporters tend to list a lot of preferences, because their ballots are much more likely to be invalidated by random error. Different validity rules for different voting styles mean that errors are much more likely to penalise some types of votes than others. For close elections this systematic bias can change the result of the election

    Auditing Ranked Voting Elections with Dirichlet-Tree Models: First Steps

    Ranked voting systems, such as instant-runoff voting (IRV) and single transferable vote (STV), are used in many places around the world. They are more complex than plurality and scoring rules, presenting a challenge for auditing their outcomes: there is no known risk-limiting audit (RLA) method for STV other than a full hand count. We present a new approach to auditing ranked systems that uses a statistical model, a Dirichlet-tree, that can cope with high-dimensional parameters in a computationally efficient manner. We demonstrate this approach with a ballot-polling Bayesian audit for IRV elections. Although the technique is not known to be risk-limiting, we suggest some strategies that might allow it to be calibrated to limit risk

    Bayesian test for colocalisation between pairs of genetic association studies using summary statistics.

    Genetic association studies, in particular the genome-wide association study (GWAS) design, have provided a wealth of novel insights into the aetiology of a wide range of human diseases and traits, in particular cardiovascular diseases and lipid biomarkers. The next challenge consists of understanding the molecular basis of these associations. The integration of multiple association datasets, including gene expression datasets, can contribute to this goal. We have developed a novel statistical methodology to assess whether two association signals are consistent with a shared causal variant. An application is the integration of disease scans with expression quantitative trait locus (eQTL) studies, but any pair of GWAS datasets can be integrated in this framework. We demonstrate the value of the approach by re-analysing a gene expression dataset in 966 liver samples with a published meta-analysis of lipid traits including >100,000 individuals of European ancestry. Combining all lipid biomarkers, our re-analysis supported 26 out of 38 reported colocalisation results with eQTLs and identified 14 new colocalisation results, hence highlighting the value of a formal statistical test. In three cases of reported eQTL-lipid pairs (SYPL2, IFT172, TBKBP1) for which our analysis suggests that the eQTL pattern is not consistent with the lipid association, we identify alternative colocalisation results with SORT1, GCKR, and KPNB1, indicating that these genes are more likely to be causal in these genomic intervals. A key feature of the method is the ability to derive the output statistics from single SNP summary statistics, hence making it possible to perform systematic meta-analysis type comparisons across multiple GWAS datasets (implemented online at http://coloc.cs.ucl.ac.uk/coloc/). 
    Our methodology provides information about candidate causal genes in associated intervals and has direct implications for the understanding of complex diseases as well as the design of drugs to target disease pathways

    Large-Scale Imputation of KIR Copy Number and HLA Alleles in North American and European Psoriasis Case-Control Cohorts Reveals Association of Inhibitory KIR2DL2 With Psoriasis

    Killer cell immunoglobulin-like receptors (KIR) regulate immune responses in NK and CD8+ T cells via interaction with HLA ligands. KIR genes, including KIR2DS1, KIR3DL1, and KIR3DS1 have previously been implicated in psoriasis susceptibility. However, these previous studies were constrained to small sample sizes, in part due to the time and expense required for direct genotyping of KIR genes. Here, we implemented KIR*IMP to impute KIR copy number from single-nucleotide polymorphisms (SNPs) on chromosome 19 in the discovery cohort (n=11,912) from the PAGE consortium, University of California San Francisco, and the University of Dundee, and in a replication cohort (n=66,357) from Kaiser Permanente Northern California. Stratified multivariate logistic regression that accounted for patient ancestry and high-risk HLA alleles revealed that KIR2DL2 copy number was significantly associated with psoriasis in the discovery cohort (p ≤ 0.05). The KIR2DL2 copy number association was replicated in the Kaiser Permanente replication cohort. This is the first reported association of KIR2DL2 copy number with psoriasis and highlights the importance of KIR genetics in the pathogenesis of psoriasis

    Anthropology on Economic Development in Hanoi, Capital of Vietnam Analysis of Commercial Activities of Hanghom Paint Shops Street

    The killer immunoglobulin-like receptors (KIRs), found predominantly on the surface of natural killer (NK) cells and some T-cells, are a collection of highly polymorphic activating and inhibitory receptors with variable specificity for class I human leukocyte antigen (HLA) ligands. Fifteen KIR genes are inherited in haplotypes of diverse gene content across the human population, and the repertoire of independently inherited KIR and HLA alleles is known to alter risk for immune-mediated and infectious disease by shifting the threshold of lymphocyte activation. We have conducted the largest disease-association study of KIR-HLA epistasis to date, enabled by the imputation of KIR gene and HLA allele dosages from genotype data for 12,214 healthy controls and 8,107 individuals with the HLA-B*27-associated immune-mediated arthritis, ankylosing spondylitis (AS). We identified epistatic interactions between KIR genes and their ligands (at both HLA subtype and allele resolution) that increase risk of disease, replicating analyses in a semi-independent cohort of 3,497 cases and 14,844 controls. We further confirmed that the strong AS-association with a pathogenic variant in the endoplasmic reticulum aminopeptidase gene ERAP1, known to alter the HLA-B*27 presented peptidome, is not modified by carriage of the canonical HLA-B receptor KIR3DL1/S1. Overall, our data suggests that AS risk is modified by the complement of KIRs and HLA ligands inherited, beyond the influence of HLA-B*27 alone, which collectively alter the proinflammatory capacity of KIR-expressing lymphocytes to contribute to disease immunopathogenesis

    Imputation of KIR Types from SNP Variation Data.

    Large population studies of immune system genes are essential for characterizing their role in diseases, including autoimmune conditions. Of key interest are a group of genes encoding the killer cell immunoglobulin-like receptors (KIRs), which have known and hypothesized roles in autoimmune diseases, resistance to viruses, reproductive conditions, and cancer. These genes are highly polymorphic, which makes typing expensive and time consuming. Consequently, despite their importance, KIRs have been little studied in large cohorts. Statistical imputation methods developed for other complex loci (e.g., human leukocyte antigen [HLA]) on the basis of SNP data provide an inexpensive high-throughput alternative to direct laboratory typing of these loci and have enabled important findings and insights for many diseases. We present KIR∗IMP, a method for imputation of KIR copy number. We show that KIR∗IMP is highly accurate and thus allows the study of KIRs in large cohorts and enables detailed investigation of the role of KIRs in human disease. This work was supported by the Australian National Health and Medical Research Council (NHMRC), Career Development Fellowship ID 1053756 (S.L.); by a Victorian Life Sciences Computation Initiative (VLSCI) grant number VR0240 on its Peak Computing Facility at the University of Melbourne, an initiative of the Victorian Government, Australia (S.L.); by the UK Multiple Sclerosis Society, grant 894/08 (S.S.); and by the Wellcome Trust and the MRC with partial funding from the National Institute of Health Cambridge Biomedical Research Centre (J.T., J.A.T.). Research at the Murdoch Childrens Research Institute was supported by the Victorian Government's Operational Infrastructure Support Program. This is the final version of the article. It first appeared from Elsevier via http://dx.doi.org/10.1016/j.ajhg.2015.09.00