29 research outputs found

    Exact and Approximate Conformal Inference for Multi-Output Regression

    It is common in machine learning to estimate a response y given covariate information x. However, these predictions alone do not quantify any uncertainty associated with said predictions. One way to overcome this deficiency is with conformal inference methods, which construct a set containing the unobserved response y with a prescribed probability. Unfortunately, even with a one-dimensional response, conformal inference is computationally expensive despite recent encouraging advances. In this paper, we explore multi-output regression, delivering exact derivations of conformal inference p-values when the predictive model can be described as a linear function of y. Additionally, we propose unionCP and a multivariate extension of rootCP as efficient ways of approximating the conformal prediction region for a wide array of multi-output predictors, both linear and nonlinear, while preserving computational advantages. We also provide both theoretical and empirical evidence of the effectiveness of these methods using both real-world and simulated data. Comment: 20 pages, 6 figures
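
    For orientation, the idea of a set that contains the unobserved response with a prescribed probability can be sketched with split conformal prediction, a simpler and cheaper variant than the full conformal methods this abstract addresses. This is not the paper's unionCP or rootCP: the function name, the least-squares line used as the predictor, and the absolute-residual score are all illustrative assumptions, and the response here is one-dimensional.

```python
import numpy as np


def split_conformal_interval(x_train, y_train, x_cal, y_cal, x_new, alpha=0.1):
    """Split conformal interval for a 1-D response (illustrative sketch).

    Hypothetical helper: fits a least-squares line on the training split,
    then calibrates the interval width on a separate calibration split.
    """
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_cal = np.asarray(x_cal, dtype=float)
    y_cal = np.asarray(y_cal, dtype=float)
    x_new = np.asarray(x_new, dtype=float)

    # Fit intercept and slope by ordinary least squares on the training split.
    design = np.column_stack([np.ones_like(x_train), x_train])
    coef, *_ = np.linalg.lstsq(design, y_train, rcond=None)

    def predict(x):
        return coef[0] + coef[1] * x

    # Nonconformity scores: absolute residuals on the calibration split.
    scores = np.abs(y_cal - predict(x_cal))
    n = scores.size

    # Rank of the calibration score used as the interval half-width.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    center = predict(x_new)
    if k > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return np.full_like(center, -np.inf), np.full_like(center, np.inf)

    half_width = np.sort(scores)[k - 1]
    return center - half_width, center + half_width


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, 1200)
    y = 2 * x + 1 + rng.normal(0, 1, 1200)
    lower, upper = split_conformal_interval(
        x[:500], y[:500], x[500:1000], y[500:1000], x[1000:], alpha=0.1
    )
    covered = np.mean((y[1000:] >= lower) & (y[1000:] <= upper))
    print(f"empirical coverage: {covered:.3f}")
```

    Under exchangeability of the calibration and new points, intervals built this way cover the new response with probability at least 1 - alpha. Full conformal inference, the setting of the abstract, avoids the data split at the cost of refitting the model for each candidate response value.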

    A Risk Based Approach to Node Insertion within Social Networks

    Social Network Analysis (SNA) is a primary tool for counter-terrorism operations, ranging from resiliency and influence to interdiction on threats stemming from illicit overt and clandestine network operations. In an ideal world, SNA would provide a perfect course of action to eliminate dangerous situations that terrorist organizations bring. Unfortunately, the covert nature of terrorist networks makes the effects of these techniques unknown and possibly detrimental. To avoid potentially harmful changes to enemy networks, tactical involvement must evolve, beginning with the intelligent use of network infiltration through the application of the node insertion problem. The framework for the node insertion problem includes a risk-benefit model to assess the utility of various node insertion scenarios. This model incorporates local, intermediate, and global SNA measures, such as Laplacian centrality and assortative mixing, to account for the benefit and risk. Application of the model to the Zachary Karate Club produces a set of recommended insertion scenarios. A designed experiment validates the robustness of the methodology against network structure and characteristics. Ultimately, the research provides an SNA method to identify optimal and near-optimal node insertion strategies and extends past node utility models into a general form with the inclusion of benefit, risk, and bias functions.

    Using Conformal Win Probability to Predict the Winners of the Cancelled 2020 NCAA Basketball Tournaments

    The COVID-19 pandemic was responsible for the cancellation of both the men's and women's 2020 National Collegiate Athletic Association (NCAA) Division 1 basketball tournaments. Starting from the point at which the Division 1 tournaments and any unfinished conference tournaments were cancelled, we deliver closed-form probabilities for each team of making the Division 1 tournaments, had they not been cancelled, aided by use of conformal predictive distributions. We also deliver probabilities of a team winning March Madness, given a tournament bracket. We then compare single-game win probabilities generated with conformal predictive distributions, aptly named conformal win probabilities, to those generated through linear and logistic regression on seven years of historical college basketball data, specifically from the 2014-2015 season through the 2020-2021 season. Conformal win probabilities are shown to be better calibrated than other methods, resulting in more accurate win probability estimates, while requiring fewer distributional assumptions. Comment: preprint submitted to Journal of Quantitative Analysis in Sports, 28 pages without figures; figures included at end of document

    Using Conformal Win Probability to Predict the Winners of the Canceled 2020 NCAA Basketball Tournaments

    The COVID-19 pandemic was responsible for the cancellation of both the men’s and women’s 2020 National Collegiate Athletic Association (NCAA) Division I basketball tournaments. Starting from the point when the Division I tournaments and unfinished conference tournaments were canceled, we deliver closed-form probabilities for each team of making the Division I tournaments, had they not been canceled, under a simplified method for tournament selection. We also determine probabilities of a team winning March Madness, given a tournament bracket. Our calculations make use of conformal win probabilities derived from conformal predictive distributions. We compare these conformal win probabilities to those generated through linear and logistic regression on college basketball data spanning the 2011–2012 and 2022–2023 seasons, as well as to other publicly available win probability methods. Conformal win probabilities are shown to be well calibrated, while requiring fewer distributional assumptions than most alternative methods. This article is published as Johnstone, C., & Nettleton, D. (2023). Using Conformal Win Probability to Predict the Winners of the Canceled 2020 NCAA Basketball Tournaments. The American Statistician, 78(3), 304–317. https://doi.org/10.1080/00031305.2023.2283199

    Anomaly Detection in the Molecular Structure of Gallium Arsenide Using Convolutional Neural Networks

    This paper concerns the development of a machine learning tool to detect anomalies in the molecular structure of Gallium Arsenide. We employ a combination of a CNN and a PCA reconstruction to create the model, using real images taken with an electron microscope in training and testing. The methodology allows a defect detection model to be built without requiring any labeled images of defects for training. The model performed well on all tests under the established assumptions, allowing for reliable anomaly detection. To the best of our knowledge, such methods are not currently available in the open literature; thus, this work fills a gap in current capabilities.

    H2G2-Net: A Hierarchical Heterogeneous Graph Generative Network Framework for Discovery of Multi-Modal Physiological Responses

    Discovering human cognitive and emotional states using multi-modal physiological signals draws attention across various research applications. Physiological responses of the human body are influenced by human cognition and are commonly used to analyze cognitive states. From a network science perspective, the interactions of these heterogeneous physiological modalities in a graph structure may provide insightful information to support prediction of cognitive states. However, there is no clear way to derive the exact connectivity between heterogeneous modalities, and there exists a hierarchical structure of sub-modalities. Existing graph neural networks are designed to learn on non-hierarchical homogeneous graphs with pre-defined graph structures; they fail to learn from hierarchical, multi-modal physiological data without a pre-defined graph structure. To this end, we propose a hierarchical heterogeneous graph generative network (H2G2-Net) that automatically learns a graph structure without domain knowledge, as well as a powerful representation on the hierarchical heterogeneous graph in an end-to-end fashion. We validate the proposed method on the CogPilot dataset that consists of multi-modal physiological signals. Extensive experiments demonstrate that our proposed method outperforms the state-of-the-art GNNs by 5%-20% in prediction accuracy. Comment: Paper accepted in Human-Centric Representation Learning workshop at AAAI 2024 (https://hcrl-workshop.github.io/2024/)

    Shape-restricted random forests and semiparametric prediction intervals

    This dissertation is made up of three projects, all of which focus on prediction or uncertainty quantification with parametric and nonparametric methods. Chapter 2 introduces novel approaches to generating semiparametric prediction intervals for linear models. We compare these new methods to other prediction interval methods with simulated and real-world data. We show our method is competitive with other methods in most cases, and better in a subset of cases. We provide multiple theorems related to the marginal coverage of the new methods. We also use these methods to provide estimation for event outcomes in the sports realm. The results show the effectiveness of our methods in providing asymptotically valid, semiparametric prediction intervals. Chapter 3 introduces a new R package that implements multiple state-of-the-art methodologies to generate prediction intervals for random forests. We compare these methods via simulation. We also apply a subset of these methods to a drug-discovery data analysis problem. Chapter 4 introduces multiple monotone restriction methods for random forest predictions. We compare our methods to other tree-based monotone restriction methods, showing that our method stays competitive, while guaranteeing partially monotone predictions. We also extend our monotone restriction methods to generate monotone restricted prediction intervals for random forests.
