Prediction without Preclusion: Recourse Verification with Reachable Sets
Machine learning models are often used to decide who will receive a loan, a
job interview, or a public benefit. Standard techniques to build these models
use features about people but overlook their actionability. In turn, models can
assign predictions that are fixed, meaning that consumers who are denied loans,
interviews, or benefits may be permanently locked out from access to credit,
employment, or assistance. In this work, we introduce a formal testing
procedure, which we call recourse verification, to flag models that assign
fixed predictions. We develop machinery to reliably determine if a given model can
provide recourse to its decision subjects from a set of user-specified
actionability constraints. We demonstrate how our tools can ensure recourse and
adversarial robustness in real-world datasets and use them to study the
infeasibility of recourse in real-world lending datasets. Our results highlight
how models can inadvertently assign fixed predictions that permanently bar
access, and we provide tools to design algorithms that account for
actionability when developing models.
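The idea of checking recourse over a reachable set can be illustrated with a minimal sketch. It is not the authors' implementation: the features, the constraint function `feasible_values`, and the toy classifier `model` are all hypothetical, and the reachable set is enumerated by brute force, which only works for small discrete feature spaces.

```python
from itertools import product

def feasible_values(x):
    """Hypothetical actionability constraints for x = (is_senior, income_band, n_accounts).

    is_senior is immutable; the other two features can stay or increase, up to 2.
    """
    is_senior, income_band, n_accounts = x
    return [[is_senior], range(income_band, 3), range(n_accounts, 3)]

def reachable_set(x):
    """Enumerate every feature vector reachable from x (including x itself)."""
    return list(product(*feasible_values(x)))

def model(x):
    """Toy classifier (an assumption): approve (1) when the score reaches 2."""
    is_senior, income_band, n_accounts = x
    return int(income_band + n_accounts - 3 * is_senior >= 2)

def has_recourse(x):
    """True if at least one reachable point receives the positive prediction."""
    return any(model(z) == 1 for z in reachable_set(x))

# A non-senior applicant can raise income_band to 2 and be approved.
print(has_recourse((0, 0, 0)))
# A senior applicant is denied at every reachable point: a fixed prediction.
print(has_recourse((1, 0, 0)))
```

In this toy setup the second applicant is flagged because no combination of permitted changes offsets the immutable feature, which is the kind of fixed prediction the abstract describes.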
Responsible and Regulatory Conform Machine Learning for Medicine: A Survey of Challenges and Solutions
Machine learning is expected to fuel significant improvements in medical
care. To ensure that fundamental principles such as beneficence, respect for
human autonomy, prevention of harm, justice, privacy, and transparency are
respected, medical machine learning systems must be developed responsibly. Many
high-level declarations of ethical principles have been put forth for this
purpose, but there is a severe lack of technical guidelines explicating the
practical consequences for medical machine learning. Similarly, there is
currently considerable uncertainty regarding the exact regulatory requirements
placed upon medical machine learning systems. This survey provides an overview
of the technical and procedural challenges involved in creating medical machine
learning systems responsibly and in conformity with existing regulations, as
well as possible solutions to address these challenges. First, a brief review
of existing regulations affecting medical machine learning is provided, showing
that properties such as safety, robustness, reliability, privacy, security,
transparency, explainability, and nondiscrimination are all demanded already by
existing law and regulations - albeit, in many cases, to an uncertain degree.
Next, the key technical obstacles to achieving these desirable properties are
discussed, as well as important techniques to overcome these obstacles in the
medical context. We notice that distribution shift, spurious correlations,
model underspecification, uncertainty quantification, and data scarcity
represent severe challenges in the medical context. Promising solution
approaches include the use of large and representative datasets and federated
learning as a means to that end, the careful exploitation of domain knowledge,
the use of inherently transparent models, comprehensive out-of-distribution
model testing and verification, as well as algorithmic impact assessments.
Cumulative Distribution Functions As The Foundation For Probabilistic Models
This thesis discusses applications of probabilistic and connectionist models for
constructing and training cumulative distribution functions (CDFs). First, it is shown
how existing tools from the copula literature can be combined to build probabilistic
models. It is found that this simple construction leads to numerical and scalability
issues that make training and inference challenging.
Next, several innovative ideas combining neural networks, automatic differentiation,
and copula functions are introduced to show how to assemble black-box probabilistic
models. The basic building block is a cumulative distribution function that is straightforward
to construct, composed of arithmetic operations and nonlinear functions.
There is no need to assume any specific parametric probability density function
(PDF), making the model flexible and normalisation unnecessary. The only requirement
is to design a computational graph that parameterises monotonically
non-decreasing functions with a constrained range. Training can then be performed
using standard tools from any neural network software library.
Finally, factorial hidden Markov models (FHMMs) for sequential data are
presented. It is shown how to leverage cumulative distribution functions in the
form of the Gaussian copula and amortised stochastic variational method to encode
hidden Markov chains coherently. This approach enables efficient learning and
inference to model long sequences of high-dimensional data with long-range dependencies.
Tackling such complex problems was impossible with the established
FHMM approximate inference algorithm.
It is empirically verified on several problems that some of the estimators introduced
in this work can perform comparably to or better than the currently popular
models, especially for tasks requiring tail-area or marginal probabilities that can be
read directly from a cumulative distribution function.
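As a rough illustration of a computational graph that parameterises a monotonically non-decreasing function with a constrained range, the sketch below builds a one-dimensional CDF as a convex combination of sigmoids with positive slopes. This particular construction and the parameter names (`log_slopes`, `offsets`, `logits`) are assumptions for illustration, not the architecture used in the thesis.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def softmax(v):
    """Map unconstrained logits to positive weights that sum to 1."""
    m = max(v)
    e = [math.exp(t - m) for t in v]
    s = sum(e)
    return [t / s for t in e]

def cdf(x, log_slopes, offsets, logits):
    """F(x) = sum_k w_k * sigmoid(exp(a_k) * x + b_k).

    exp(a_k) > 0 makes each term non-decreasing in x, and the softmax
    weights keep the range inside (0, 1), so no normalisation is needed.
    """
    weights = softmax(logits)
    return sum(w * sigmoid(math.exp(a) * x + b)
               for w, a, b in zip(weights, log_slopes, offsets))

def pdf(x, log_slopes, offsets, logits):
    """Density obtained by differentiating the CDF analytically."""
    weights = softmax(logits)
    total = 0.0
    for w, a, b in zip(weights, log_slopes, offsets):
        s = sigmoid(math.exp(a) * x + b)
        total += w * math.exp(a) * s * (1.0 - s)
    return total

# Hypothetical parameters; in practice they would be learned by gradient descent.
params = ([0.0, 1.0], [-1.0, 2.0], [0.3, -0.2])
tail = 1.0 - cdf(1.5, *params)  # tail-area probability P(X > 1.5), read off the CDF
```

Because every parameter is unconstrained, such a model could be trained with any automatic differentiation library by maximising the log of `pdf` over the data.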
Game theoretic and machine learning techniques for balancing games
Game balance is the problem of determining the fairness of actions or sets of actions in competitive, multiplayer games. This problem primarily arises in the context of designing board and video games. Traditionally, balance has been achieved through large amounts of play-testing and trial-and-error on the part of the designers. In this thesis, it is our intent to lay down the beginnings of a framework for a formal and analytical solution to this problem, combining techniques from game theory and machine learning. We first develop a set of game-theoretic definitions for different forms of balance, and then introduce the concept of a strategic abstraction. We show how machine classification techniques can be used to identify high-level player strategy in games, using the two principal methods of sequence alignment and Naive Bayes classification. Bioinformatics sequence alignment, when combined with a 3-nearest neighbor classification approach, can, with only 3 exemplars of each strategy, correctly identify the strategy used in 55% of cases using all data, and 77% of cases on data that experts indicated actually had a strategic class. Naive Bayes classification achieves similar results, with 65% accuracy on all data and 75% accuracy on data rated to have an actual class. We then show how these game theoretic and machine learning techniques can be combined to automatically build matrices that can be used to analyze game balance properties.
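The combination of sequence alignment with 3-nearest neighbor classification might look like the following sketch. It uses a standard Needleman-Wunsch global alignment score as the similarity measure. The scoring parameters, the action encoding (one letter per action), and the strategy labels are hypothetical, and the thesis may use a different alignment variant.

```python
from collections import Counter

def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score between two action sequences."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

def classify(seq, exemplars, k=3):
    """Majority vote among the k exemplars with the highest alignment score."""
    ranked = sorted(exemplars, key=lambda e: alignment_score(seq, e[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Three hypothetical exemplars per strategy, each a string of action codes.
exemplars = [
    ("BBAAA", "rush"), ("BAAAA", "rush"), ("BBAAAA", "rush"),
    ("WWWBW", "economy"), ("WWBWW", "economy"), ("WWWWB", "economy"),
]
print(classify("BAAA", exemplars))
print(classify("WWWW", exemplars))
```

With only three exemplars per class, every vote carries a lot of weight, which may partly explain why accuracy improves on games that experts judged to have a clear strategic class.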