Penalizing Unfairness in Binary Classification
We present a new approach for mitigating unfairness in learned classifiers.
In particular, we focus on binary classification tasks over individuals from
two populations, where, as our criterion for fairness, we wish to achieve
similar false positive rates in both populations, and similar false negative
rates in both populations. As a proof of concept, we implement our approach and
empirically evaluate its ability to achieve both fairness and accuracy, using
datasets from the fields of criminal risk assessment, credit, lending, and
college admissions.
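The fairness criterion described above (similar false positive rates and similar false negative rates across two populations) can be made concrete with a small sketch. This is not the authors' penalization method; it only measures the two gaps such a method would try to shrink. The function names, the group labels "A" and "B", and the toy data are all hypothetical.

```python
from collections import defaultdict


def group_error_rates(y_true, y_pred, group):
    """Return {group: (false_positive_rate, false_negative_rate)}."""
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "neg": 0, "pos": 0})
    for t, p, g in zip(y_true, y_pred, group):
        c = counts[g]
        if t == 1:
            c["pos"] += 1
            c["fn"] += (p == 0)  # positive individual predicted negative
        else:
            c["neg"] += 1
            c["fp"] += (p == 1)  # negative individual predicted positive
    rates = {}
    for g, c in counts.items():
        fpr = c["fp"] / c["neg"] if c["neg"] else float("nan")
        fnr = c["fn"] / c["pos"] if c["pos"] else float("nan")
        rates[g] = (fpr, fnr)
    return rates


def error_rate_gaps(rates, a, b):
    """Absolute FPR and FNR differences between populations a and b."""
    return abs(rates[a][0] - rates[b][0]), abs(rates[a][1] - rates[b][1])


# Toy labels and predictions (made up for illustration only).
y_true = [1, 0, 1, 0, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
group = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = group_error_rates(y_true, y_pred, group)
fpr_gap, fnr_gap = error_rate_gaps(rates, "A", "B")
print(rates, fpr_gap, fnr_gap)
```

In this toy case the two populations share the same false negative rate but differ in false positive rate, so only the first gap is nonzero.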
Revisiting chaos in stimulus-driven spiking networks: signal encoding and discrimination
Highly connected recurrent neural networks often produce chaotic dynamics,
meaning their precise activity is sensitive to small perturbations. What are
the consequences for how such networks encode streams of temporal stimuli? On
the one hand, chaos is a strong source of randomness, suggesting that small
changes in stimuli will be obscured by intrinsically generated variability. On
the other hand, recent work shows that the type of chaos that occurs in spiking
networks can have a surprisingly low-dimensional structure, suggesting that
there may be "room" for fine stimulus features to be precisely resolved. Here
we show that strongly chaotic networks produce patterned spikes that reliably
encode time-dependent stimuli: using a decoder sensitive to spike times on
timescales of tens of ms, one can easily distinguish responses to very similar
inputs. Moreover, recurrence serves to distribute signals throughout chaotic
networks so that small groups of cells can encode substantial information about
signals arriving elsewhere. A conclusion is that the presence of strong chaos
in recurrent networks does not prohibit precise stimulus encoding.
Comment: 8 figure
Actionable Recourse in Linear Classification
Machine learning models are increasingly used to automate decisions that
affect humans - deciding who should receive a loan, a job interview, or a
social service. In such applications, a person should have the ability to
change the decision of a model. When a person is denied a loan by a credit
score, for example, they should be able to alter its input variables in a way
that guarantees approval. Otherwise, they will be denied the loan as long as
the model is deployed. More importantly, they will lack the ability to
influence a decision that affects their livelihood.
In this paper, we frame these issues in terms of recourse, which we define as
the ability of a person to change the decision of a model by altering
actionable input variables (e.g., income vs. age or marital status). We present
integer programming tools to ensure recourse in linear classification problems
without interfering in model development. We demonstrate how our tools can
inform stakeholders through experiments on credit scoring problems. Our results
show that recourse can be significantly affected by standard practices in model
development, and motivate the need to evaluate recourse in practice.
Comment: Extended version. ACM Conference on Fairness, Accountability and
Transparency [FAT2019
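The notion of recourse defined above can be illustrated with a minimal sketch for a linear classifier. The paper uses integer programming; the code below swaps in a brute-force search over a small grid of allowed changes, which is only practical for toy problems. The function `find_recourse`, the weights, the feature names, and the unweighted cost (sum of absolute changes) are all assumptions made for illustration.

```python
from itertools import product


def find_recourse(weights, bias, x, action_grid):
    """Cheapest change to actionable features that makes w.x + b >= 0.

    action_grid maps a feature index to the list of allowed changes
    (including 0); features absent from it are treated as immutable.
    Returns (cost, new_x), or None when no listed action flips the decision.
    """
    idx = sorted(action_grid)
    best = None
    for deltas in product(*(action_grid[j] for j in idx)):
        new_x = list(x)
        for j, d in zip(idx, deltas):
            new_x[j] += d
        score = sum(w * v for w, v in zip(weights, new_x)) + bias
        if score >= 0:
            cost = sum(abs(d) for d in deltas)  # assumed unweighted cost
            if best is None or cost < best[0]:
                best = (cost, new_x)
    return best


# Hypothetical credit model over [income_k, open_accounts, age].
weights = [0.5, -1.0, 0.25]
bias = -30.0
applicant = [30, 3, 40]  # score = 15 - 3 + 10 - 30 = -8, so denied

# Income and open accounts are actionable; age is not listed, so immutable.
grid = {0: [0, 5, 10, 15, 20], 1: [0, -1, -2]}
result = find_recourse(weights, bias, applicant, grid)
no_actions = find_recourse(weights, bias, applicant, {})
print(result, no_actions)
```

When no feature is actionable the search returns `None`, which corresponds to the situation the abstract warns about: the person is denied for as long as the model is deployed.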
Modeling the Impact of Baryons on Subhalo Populations with Machine Learning
We identify subhalos in dark matter-only (DMO) zoom-in simulations that are
likely to be disrupted due to baryonic effects by using a random forest
classifier trained on two hydrodynamic simulations of Milky Way (MW)-mass host
halos from the Latte suite of the Feedback in Realistic Environments (FIRE)
project. We train our classifier using five properties of each disrupted and
surviving subhalo: pericentric distance and scale factor at first pericentric
passage after accretion, and scale factor, virial mass, and maximum circular
velocity at accretion. Our five-property classifier identifies disrupted
subhalos in the FIRE simulations with an out-of-bag classification
score. We predict surviving subhalo populations in DMO simulations of the FIRE
host halos, finding excellent agreement with the hydrodynamic results; in
particular, our classifier outperforms DMO zoom-in simulations that include the
gravitational potential of the central galactic disk in each hydrodynamic
simulation, indicating that it captures both the dynamical effects of a central
disk and additional baryonic physics. We also predict surviving subhalo
populations for a suite of DMO zoom-in simulations of MW-mass host halos,
finding that baryons impact each system consistently and that the predicted
amount of subhalo disruption is larger than the host-to-host scatter among the
subhalo populations. Although the small size and specific baryonic physics
prescription of our training set limit the generality of our results, our work
suggests that machine-learning classification algorithms trained on
hydrodynamic zoom-in simulations can efficiently predict realistic subhalo
populations.
Comment: 20 pages, 14 figures. Updated to published version. Code available at
https://github.com/ollienad/subhalo_randomfores
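To show what an out-of-bag classification score measures, here is a minimal standard-library sketch. It is not the authors' code and not a random forest: it bags one-feature threshold rules (decision stumps) as a stand-in for trees, and scores each sample using only the estimators whose bootstrap draw left that sample out. The feature values, the two feature meanings (pericentric distance and maximum circular velocity), and all function names are invented for illustration.

```python
import random


def fit_stump(X, y):
    """Best single-feature threshold rule (feature, cut, sign) by accuracy."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Midpoints between distinct values; fall back to the value itself.
        cuts = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
        for cut in cuts:
            for sign in (1, -1):
                correct = sum(
                    (1 if sign * (row[j] - cut) >= 0 else 0) == label
                    for row, label in zip(X, y)
                )
                if best is None or correct > best[0]:
                    best = (correct, j, cut, sign)
    return best[1:]


def predict_stump(stump, row):
    j, cut, sign = stump
    return 1 if sign * (row[j] - cut) >= 0 else 0


def oob_score(X, y, n_estimators=25, seed=0):
    """Accuracy of majority votes cast only by estimators that did not
    see the sample in their bootstrap draw."""
    rng = random.Random(seed)
    n = len(X)
    votes = [[0, 0] for _ in range(n)]
    for _ in range(n_estimators):
        sample = [rng.randrange(n) for _ in range(n)]
        stump = fit_stump([X[i] for i in sample], [y[i] for i in sample])
        in_bag = set(sample)
        for i in range(n):
            if i not in in_bag:
                votes[i][predict_stump(stump, X[i])] += 1
    scored = [(v, label) for v, label in zip(votes, y) if sum(v) > 0]
    correct = sum((1 if v[1] > v[0] else 0) == label for v, label in scored)
    return correct / len(scored)


# Invented toy subhalos: [pericentric distance, max circular velocity].
X = [[5, 12], [8, 30], [10, 18], [12, 25], [15, 9], [20, 22],
     [70, 15], [75, 28], [80, 11], [85, 20], [90, 26], [100, 14]]
y = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # 1 = disrupted, 0 = surviving

score = oob_score(X, y)
print(score)
```

Because every sample is judged only by estimators that never trained on it, the score behaves like a built-in validation estimate and needs no separate held-out set.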
Predicting diabetes-related hospitalizations based on electronic health records
OBJECTIVE: To derive a predictive model to identify patients likely to be hospitalized during the following year due to complications attributed to Type II diabetes.
METHODS: A variety of supervised machine learning classification methods were tested. A new method was also developed that discovers hidden patient clusters in the positive class (hospitalized) while, at the same time, deriving sparse linear support vector machine classifiers to separate positive samples from negative ones (non-hospitalized). The convergence of the new method was established, and theoretical guarantees were proved on how the classifiers it produces generalize to a test set not seen during training.
RESULTS: The methods were tested on a large set of patients from the Boston Medical Center, the largest safety net hospital in New England. Our new joint clustering/classification method achieves an accuracy of 89% (measured in terms of area under the ROC curve) and yields informative clusters that can help interpret the classification results, thus increasing physicians' trust in the algorithmic output and providing some guidance towards preventive measures. While it is possible to increase accuracy to 92% with other methods, this comes with increased computational cost and a lack of interpretability. The analysis shows that even a modest probability of preventive actions being effective (more than 19%) suffices to generate significant hospital care savings.
CONCLUSIONS: Predictive models are proposed that can help avert hospitalizations, improve health outcomes and drastically reduce hospital expenditures. The scope for savings is significant, as it has been estimated that in the USA alone about $5.8 billion is spent each year on diabetes-related hospitalizations that could be prevented.
Accepted manuscrip
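Since the results above are reported as area under the ROC curve, a short sketch of that metric may help. It uses the standard pairwise interpretation: the probability that a randomly chosen positive (hospitalized) patient receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name and the toy scores are hypothetical and unrelated to the study's data.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Each positive/negative pair contributes 1 if ranked correctly,
    # 0.5 on a tie, and 0 otherwise.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy risk scores (made up): 1 = hospitalized, 0 = not hospitalized.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
auc = roc_auc(y_true, scores)
print(auc)
```

The pairwise form is quadratic in the number of samples, so it suits small illustrations; rank-based formulas compute the same quantity more efficiently on large cohorts.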