Exact and Approximate Conformal Inference for Multi-Output Regression
It is common in machine learning to estimate a response given covariate
information. However, these predictions alone do not quantify any
uncertainty associated with said predictions. One way to overcome this
deficiency is with conformal inference methods, which construct a set
containing the unobserved response with a prescribed probability.
Unfortunately, even with a one-dimensional response, conformal inference is
computationally expensive despite recent encouraging advances. In this paper,
we explore multi-output regression, delivering exact derivations of conformal
inference p-values when the predictive model can be described as a linear
function of the response. Additionally, we propose \texttt{unionCP} and a multivariate
extension of \texttt{rootCP} as efficient ways of approximating the conformal
prediction region for a wide array of multi-output predictors, both linear and
nonlinear, while preserving computational advantages. We also provide both
theoretical and empirical evidence of the effectiveness of these methods using
both real-world and simulated data.
Comment: 20 pages, 6 figures
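As a rough illustration of the conformal p-values mentioned in this abstract, the sketch below computes a full-conformal p-value for one candidate response and collects the candidates that are not rejected. It is a minimal one-dimensional example, not the paper's exact or approximate method: the least-squares model, the absolute-residual score, the grid search, and the names `conformal_p_value` and `conformal_set` are all assumptions made for illustration.

```python
import numpy as np

def conformal_p_value(X, y, x_new, y_cand):
    """Full-conformal p-value for a candidate response y_cand at x_new (assumed OLS fit)."""
    X_aug = np.vstack([X, x_new])      # augment the data with the test covariates
    y_aug = np.append(y, y_cand)       # and with the candidate response
    beta, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    scores = np.abs(y_aug - X_aug @ beta)  # absolute residuals as nonconformity scores
    # Fraction of scores at least as large as the test point's own score.
    return np.mean(scores >= scores[-1])

def conformal_set(X, y, x_new, grid, alpha=0.1):
    """Grid candidates whose p-value exceeds alpha (hypothetical helper)."""
    return np.array([c for c in grid if conformal_p_value(X, y, x_new, c) > alpha])
```

Refitting the model once per grid candidate is what makes this naive version expensive, which is the cost the abstract refers to.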
A Risk Based Approach to Node Insertion within Social Networks
Social Network Analysis (SNA) is a primary tool for counter-terrorism operations, ranging from resiliency and influence to interdiction on threats stemming from illicit overt and clandestine network operations. In an ideal world, SNA would provide a perfect course of action to eliminate dangerous situations that terrorist organizations bring. Unfortunately, the covert nature of terrorist networks makes the effects of these techniques unknown and possibly detrimental. To avoid potentially harmful changes to enemy networks, tactical involvement must evolve, beginning with the intelligent use of network infiltration through the application of the node insertion problem. The framework for the node insertion problem includes a risk-benefit model to assess the utility of various node insertion scenarios. This model incorporates local, intermediate and global SNA measures, such as Laplacian centrality and assortative mixing, to account for the benefit and risk. Application of the model to the Zachary Karate Club produces a set of recommended insertion scenarios. A designed experiment validates the robustness of the methodology against network structure and characteristics. Ultimately, the research provides an SNA method to identify optimal and near-optimal node insertion strategies and extends past node utility models into a general form with the inclusion of benefit, risk, and bias functions.
Using Conformal Win Probability to Predict the Winners of the Cancelled 2020 NCAA Basketball Tournaments
The COVID-19 pandemic was responsible for the cancellation of both the men's
and women's 2020 National Collegiate Athletic Association (NCAA) Division 1
basketball tournaments. Starting from the point at which the Division 1
tournaments and any unfinished conference tournaments were cancelled, we
deliver closed-form probabilities for each team of making the Division 1
tournaments, had they not been cancelled, aided by use of conformal predictive
distributions. We also deliver probabilities of a team winning March Madness,
given a tournament bracket. We then compare single-game win probabilities
generated with conformal predictive distributions, aptly named conformal win
probabilities, to those generated through linear and logistic regression on
seven years of historical college basketball data, specifically from the
2014-2015 season through the 2020-2021 season. Conformal win probabilities are
shown to be better calibrated than other methods, resulting in more accurate
win probability estimates, while requiring fewer distributional assumptions.
Comment: preprint submitted to Journal of Quantitative Analysis in Sports, 28
pages without figures; figures included at end of document
Using Conformal Win Probability to Predict the Winners of the Canceled 2020 NCAA Basketball Tournaments
The COVID-19 pandemic was responsible for the cancellation of both the men’s and women’s 2020 National Collegiate Athletic Association (NCAA) Division I basketball tournaments. Starting from the point when the Division I tournaments and unfinished conference tournaments were canceled, we deliver closed-form probabilities for each team of making the Division I tournaments, had they not been canceled, under a simplified method for tournament selection. We also determine probabilities of a team winning March Madness, given a tournament bracket. Our calculations make use of conformal win probabilities derived from conformal predictive distributions. We compare these conformal win probabilities to those generated through linear and logistic regression on college basketball data spanning the 2011–2012 and 2022–2023 seasons, as well as to other publicly available win probability methods. Conformal win probabilities are shown to be well calibrated, while requiring fewer distributional assumptions than most alternative methods.
This article is published as Johnstone, C., & Nettleton, D. (2023). Using Conformal Win Probability to Predict the Winners of the Canceled 2020 NCAA Basketball Tournaments. The American Statistician, 78(3), 304–317. https://doi.org/10.1080/00031305.2023.2283199
Anomaly Detection in the Molecular Structure of Gallium Arsenide Using Convolutional Neural Networks
This paper concerns the development of a machine learning tool to detect anomalies in the molecular structure of Gallium Arsenide. We employ a combination of a CNN and a PCA reconstruction to create the model, using real images taken with an electron microscope in training and testing. The methodology developed allows for the creation of a defect detection model, without any labeled images of defects being required for training. The model performed well on all tests under the established assumptions, allowing for reliable anomaly detection. To the best of our knowledge, such methods are not currently available in the open literature; thus, this work fills a gap in current capabilities.
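The PCA-reconstruction idea in this abstract can be sketched as follows: fit principal components on features from defect-free samples only, then flag samples whose reconstruction error is unusually large. This is an illustrative sketch and not the paper's pipeline. It assumes the feature vectors have already been produced by some CNN, and the names `fit_pca` and `reconstruction_error` are hypothetical.

```python
import numpy as np

def fit_pca(features, n_components):
    """Fit PCA on defect-free feature vectors (rows); returns the mean and top components."""
    mean = features.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruction_error(features, mean, components):
    """Euclidean distance between each feature vector and its PCA reconstruction."""
    centered = features - mean
    reconstructed = centered @ components.T @ components  # project, then map back
    return np.linalg.norm(centered - reconstructed, axis=1)
```

One possible decision rule, again an assumption, is to set the threshold at a high quantile of the errors on the defect-free training features, so that no labeled defect images are needed.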
H2G2-Net: A Hierarchical Heterogeneous Graph Generative Network Framework for Discovery of Multi-Modal Physiological Responses
Discovering human cognitive and emotional states using multi-modal
physiological signals draws attention across various research applications.
Physiological responses of the human body are influenced by human cognition and
commonly used to analyze cognitive states. From a network science perspective,
the interactions of these heterogeneous physiological modalities in a graph
structure may provide insightful information to support prediction of cognitive
states. However, there is no clear way to derive the exact connectivity between
heterogeneous modalities, and the sub-modalities have a hierarchical
structure. Existing graph neural networks are designed to learn on
non-hierarchical homogeneous graphs with pre-defined graph structures; they
fail to learn from hierarchical, multi-modal physiological data without a
pre-defined graph structure. To this end, we propose a hierarchical
heterogeneous graph generative network (H2G2-Net) that automatically learns a
graph structure without domain knowledge, as well as a powerful representation
on the hierarchical heterogeneous graph in an end-to-end fashion. We validate
the proposed method on the CogPilot dataset that consists of multi-modal
physiological signals. Extensive experiments demonstrate that our proposed
method outperforms the state-of-the-art GNNs by 5%-20% in prediction accuracy.
Comment: Paper accepted in Human-Centric Representation Learning workshop at
AAAI 2024 (https://hcrl-workshop.github.io/2024/)
Shape-restricted random forests and semiparametric prediction intervals
This dissertation is made up of three projects, all of which focus on prediction or uncertainty quantification with parametric and nonparametric methods.
Chapter 2 introduces novel approaches to generating semiparametric prediction intervals for linear models. We compare these new methods to other prediction interval methods with simulated and real-world data. We show our method is competitive with other methods in most cases, and better in a subset of cases. We provide multiple theorems related to the marginal coverage of the new methods. We also use these methods to provide estimation for event outcomes in the sports realm. The results show the effectiveness of our methods in providing asymptotically valid, semiparametric prediction intervals.
Chapter 3 introduces a new R package that implements multiple state-of-the-art methodologies to generate prediction intervals for random forests. We compare these methods via simulation. We also apply a subset of these methods to a drug-discovery data analysis problem.
Chapter 4 introduces multiple monotone restriction methods for random forest predictions. We compare our methods to other tree-based monotone restriction methods, showing that our method stays competitive, while guaranteeing partially monotone predictions. We also extend our monotone restriction methods to generate monotone restricted prediction intervals for random forests.