Semantic variation operators for multidimensional genetic programming
Multidimensional genetic programming represents candidate solutions as sets
of programs, and thereby provides an interesting framework for exploiting
building block identification. Towards this goal, we investigate the use of
machine learning as a way to bias which components of programs are promoted,
and propose two semantic operators to choose where useful building blocks are
placed during crossover. A forward stagewise crossover operator we propose
leads to significant improvements on a set of regression problems, and produces
state-of-the-art results in a large benchmark study. We discuss this
architecture and others in terms of their propensity for allowing heuristic
search to utilize information during the evolutionary process. Finally, we look
at the collinearity and complexity of the data representations that result from
these architectures, with a view towards disentangling factors of variation in
application.
Comment: 9 pages, 8 figures, GECCO 201
Fair admission risk prediction with proportional multicalibration
Fair calibration is a widely desirable fairness criterion in risk prediction
contexts. One way to measure and achieve fair calibration is with
multicalibration. Multicalibration constrains calibration error among
flexibly-defined subpopulations while maintaining overall calibration. However,
multicalibrated models can exhibit a higher percent calibration error among
groups with lower base rates than groups with higher base rates. As a result,
it is possible for a decision-maker to learn to trust or distrust model
predictions for specific groups. To alleviate this, we propose
\emph{proportional multicalibration}, a criterion that constrains the percent
calibration error among groups and within prediction bins. We prove that
satisfying proportional multicalibration bounds a model's multicalibration as
well as its \emph{differential calibration}, a fairness criterion that directly
measures how closely a model approximates sufficiency. Therefore,
proportionally calibrated models limit the ability of decision-makers to
distinguish between model performance on different patient groups, which may
make the models more trustworthy in practice. We provide an efficient algorithm
for post-processing risk prediction models for proportional multicalibration
and evaluate it empirically. We conduct simulation studies and investigate a
real-world application of PMC-postprocessing to prediction of emergency
department patient admissions. We observe that proportional multicalibration is
a promising criterion for controlling simultaneous measures of calibration
fairness of a model over intersectional groups with virtually no cost in terms
of classification performance.
Comment: Published in the 2023 Conference on Health, Inference, and Learning
(CHIL). Best paper awar
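The contrast the abstract draws between absolute and percent calibration error can be illustrated with a small sketch. This is not the paper's algorithm or its exact definition: the function name `calibration_errors`, the equal-width binning, and the choice to normalize by the observed outcome rate in each (group, bin) cell are all assumptions made for illustration.

```python
from collections import defaultdict


def calibration_errors(y_true, y_prob, groups, n_bins=5):
    """Return (max absolute, max proportional) calibration error
    over all non-empty (group, bin) cells.

    Assumption: the proportional error divides the absolute error by the
    observed outcome rate in the cell; the paper's normalizer may differ.
    """
    cells = defaultdict(list)
    for y, p, g in zip(y_true, y_prob, groups):
        b = min(int(p * n_bins), n_bins - 1)  # equal-width bin index
        cells[(g, b)].append((y, p))

    max_abs, max_prop = 0.0, 0.0
    for pairs in cells.values():
        mean_y = sum(y for y, _ in pairs) / len(pairs)  # observed rate
        mean_p = sum(p for _, p in pairs) / len(pairs)  # mean predicted risk
        abs_err = abs(mean_y - mean_p)
        max_abs = max(max_abs, abs_err)
        if mean_y > 0:  # cells with no positive outcomes are skipped here
            max_prop = max(max_prop, abs_err / mean_y)
    return max_abs, max_prop


# Two groups with the same absolute error but different base rates.
y = [0] * 9 + [1] + [1] * 8 + [0] * 2
p = [0.15] * 10 + [0.85] * 10
g = ["A"] * 10 + ["B"] * 10
max_abs, max_prop = calibration_errors(y, p, g)
```

In this toy data both groups are off by about 0.05 in absolute terms, yet the low-base-rate group A has a much larger error relative to its outcome rate, which is the kind of disparity a percent-based criterion is meant to constrain.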
Genetic programming approaches to learning fair classifiers
Society has come to rely on algorithms like classifiers for important
decision making, giving rise to the need for ethical guarantees such as
fairness. Fairness is typically defined by asking that some statistic of a
classifier be approximately equal over protected groups within a population. In
this paper, current approaches to fairness are discussed and used to motivate
algorithmic proposals that incorporate fairness into genetic programming for
classification. We propose two ideas. The first is to incorporate a fairness
objective into multi-objective optimization. The second is to adapt lexicase
selection to define cases dynamically over intersections of protected groups.
We describe why lexicase selection is well suited to pressure models to perform
well across the potentially infinitely many subgroups over which fairness is
desired. We use a recent genetic programming approach to construct models on
four datasets for which fairness constraints are necessary, and empirically
compare performance to prior methods utilizing game-theoretic solutions.
Methods are assessed based on their ability to generate trade-offs of subgroup
fairness and accuracy that are Pareto optimal. The results show that genetic
programming methods in general, and random search in particular, are well
suited to this task.
Comment: 9 pages, 7 figures. GECCO 202
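For readers unfamiliar with lexicase selection, a minimal sketch of the standard procedure follows. It is not the adapted variant proposed in the abstract: here the cases are a fixed matrix of per-case errors (which could be, for example, per-subgroup errors), whereas the paper defines cases dynamically over intersections of protected groups. The function name `lexicase_select` and the `errors` layout are assumptions for illustration.

```python
import random


def lexicase_select(errors, rng=random):
    """Select one candidate by standard lexicase selection.

    errors[i][c] is the error of candidate i on case c (lower is better).
    Returns the index of the selected candidate.
    """
    candidates = list(range(len(errors)))
    cases = list(range(len(errors[0])))
    rng.shuffle(cases)  # each selection event uses a fresh random case order

    for c in cases:
        # Keep only the candidates that are best on the current case.
        best = min(errors[i][c] for i in candidates)
        candidates = [i for i in candidates if errors[i][c] == best]
        if len(candidates) == 1:
            break
    return rng.choice(candidates)  # break remaining ties at random


# Hypothetical errors: rows are candidate models, columns are cases.
errors = [
    [0.0, 1.0],  # best on case 0
    [1.0, 0.0],  # best on case 1
    [1.0, 1.0],  # best on neither case
]
parent = lexicase_select(errors, random.Random(0))
```

Because the case order is reshuffled at every selection, a candidate survives only if it is best on whichever case happens to come first, so models that excel on different cases can each be selected while a model that is best on none is not.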
A System for Accessible Artificial Intelligence
While artificial intelligence (AI) has become widespread, many commercial AI
systems are not yet accessible to individual researchers or the general public
due to the deep knowledge of the systems required to use them. We believe that
AI has matured to the point where it should be an accessible technology for
everyone. We present an ongoing project whose ultimate goal is to deliver an
open source, user-friendly AI system that is specialized for machine learning
analysis of complex data in the biomedical and health care domains. We discuss
how genetic programming can aid in this endeavor, and highlight specific
examples where genetic programming has automated machine learning analyses in
previous projects.
Comment: 14 pages, 5 figures, submitted to Genetic Programming Theory and
Practice 2017 worksho