Interpretable multiclass classification by MDL-based rule lists
Interpretable classifiers have recently witnessed an increase in attention
from the data mining community because they are inherently easier to understand
and explain than their more complex counterparts. Examples of interpretable
classification models include decision trees, rule sets, and rule lists.
Learning such models often involves optimizing hyperparameters, which typically
requires substantial amounts of data and may result in relatively large models.
In this paper, we consider the problem of learning compact yet accurate
probabilistic rule lists for multiclass classification. Specifically, we
propose a novel formalization based on probabilistic rule lists and the minimum
description length (MDL) principle. This results in virtually parameter-free
model selection that naturally allows trading off model complexity against
goodness of fit, effectively avoiding overfitting and the need for
hyperparameter tuning. Finally, we introduce the Classy algorithm, which
greedily finds rule lists according to the proposed criterion. We empirically
demonstrate that Classy selects small probabilistic rule lists that outperform
state-of-the-art classifiers when it comes to the combination of predictive
performance and interpretability. We show that Classy is insensitive to its
only parameter, i.e., the candidate set, and that compression on the training
set correlates with classification performance, validating our MDL-based
selection criterion.
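The MDL trade-off described in this abstract can be illustrated with a minimal sketch: score a candidate rule list as model bits (cost of encoding the rules) plus data bits (cost of encoding the labels given each rule's class distribution), and prefer the list with the lower total. The encoding below (a fixed `bits_per_condition` cost and Laplace-smoothed class probabilities) is a simplified illustration, not the actual encoding or search used by Classy.

```python
# Illustrative MDL scoring for a probabilistic rule list (not Classy's
# actual encoding). A rule list is a sequence of (n_conditions, class_counts)
# pairs, where class_counts are the label counts of instances the rule covers.
from math import log2

def data_bits(counts):
    """Bits to encode labels under the rule's (Laplace-smoothed) class
    distribution: the negative log2-likelihood of the counts."""
    total = sum(counts)
    k = len(counts)
    probs = [(c + 1) / (total + k) for c in counts]  # Laplace smoothing
    return -sum(c * log2(p) for c, p in zip(counts, probs))

def mdl_score(rule_list, bits_per_condition=8.0):
    """Total description length = model bits + data bits (lower is better).
    bits_per_condition is an assumed, fixed per-condition model cost."""
    model_bits = sum(n_conds * bits_per_condition for n_conds, _ in rule_list)
    fit_bits = sum(data_bits(counts) for _, counts in rule_list)
    return model_bits + fit_bits

# A longer rule list is only selected if its extra model bits are repaid
# by a better fit (fewer data bits) -- this is the complexity/fit trade-off.
short = [(1, [40, 10]), (0, [5, 45])]                 # last rule = default
longer = [(1, [40, 10]), (2, [3, 2]), (0, [2, 43])]
best = min([short, longer], key=mdl_score)
```

A greedy learner in this spirit would repeatedly add the candidate rule that most decreases the total score and stop when no addition helps, which is what makes the selection effectively parameter-free.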
Caveat Emptor: The Risks of Using Big Data for Human Development
"Big Data" has the potential to facilitate sustainable development in many sectors of life such as education, health, agriculture, and in combating humanitarian crises and violent conflicts. However, lurking beneath the immense promises of Big Data are some significant risks such as 1) the potential use of Big Data for unethical ends; 2) its ability to mislead through reliance on unrepresentative and biased data; and 3) the various privacy and security challenges associated with data (including the danger of an adversary tampering with the data to harm people). These risks can have severe consequences and a better understanding of these risks is the first step towards their mitigation of these risks. In this article, we highlight the potential dangers associated with using Big Data, particularly for human development
The Hidden Inconsistencies Introduced by Predictive Algorithms in Judicial Decision Making
Algorithms, from simple automation to machine learning, have been introduced
into judicial contexts to ostensibly increase the consistency and efficiency of
legal decision making. In this paper, we describe four types of inconsistencies
introduced by risk prediction algorithms. These inconsistencies threaten to
violate the principle of treating similar cases similarly and often arise from
the need to operationalize legal concepts and human behavior into specific
measures that enable the building and evaluation of predictive algorithms.
These inconsistencies, however, are likely to be hidden from their end-users:
judges, parole officers, lawyers, and other decision-makers. We describe the
inconsistencies, their sources, and propose various possible indicators and
solutions. We also consider the issue of inconsistencies due to the use of
algorithms in light of current trends towards more autonomous algorithms and
less human-understandable behavioral big data. We conclude by discussing judges
and lawyers' duties of technological ("algorithmic") competence and call for
greater alignment between the evaluation of predictive algorithms and
corresponding judicial goals.
European Union regulations on algorithmic decision-making and a "right to explanation"
We summarize the potential impact that the European Union's new General Data
Protection Regulation will have on the routine use of machine learning
algorithms. Slated to take effect as law across the EU in 2018, it will
restrict automated individual decision-making (that is, algorithms that make
decisions based on user-level predictors) which "significantly affect" users.
The law will also effectively create a "right to explanation," whereby a user
can ask for an explanation of an algorithmic decision that was made about them.
We argue that while this law will pose large challenges for industry, it
highlights opportunities for computer scientists to take the lead in designing
algorithms and evaluation frameworks which avoid discrimination and enable
explanation. Comment: presented at the 2016 ICML Workshop on Human
Interpretability in Machine Learning (WHI 2016), New York, NY.
Bridging the Gap: Towards an Expanded Toolkit for ML-Supported Decision-Making in the Public Sector
Machine Learning (ML) systems are becoming instrumental in the public sector,
with applications spanning areas like criminal justice, social welfare,
financial fraud detection, and public health. While these systems offer great
potential benefits to institutional decision-making processes, such as improved
efficiency and reliability, they still face the challenge of aligning intricate
and nuanced policy objectives with the precise formalization that ML models
require. In this paper, we aim to bridge the gap between ML
and public sector decision-making by presenting a comprehensive overview of key
technical challenges where disjunctions between policy goals and ML models
commonly arise. We concentrate on pivotal points of the ML pipeline that
connect the model to its operational environment, delving into the significance
of representative training data and highlighting the importance of a model
setup that facilitates effective decision-making. Additionally, we link these
challenges with emerging methodological advancements, encompassing causal ML,
domain adaptation, uncertainty quantification, and multi-objective
optimization, illustrating the path forward for harmonizing ML and public
sector objectives.