A Nutritional Label for Rankings
Algorithmic decisions often result in scoring and ranking individuals to
determine creditworthiness, qualifications for college admissions and
employment, and compatibility as dating partners. While automatic and seemingly
objective, ranking algorithms can discriminate against individuals and
protected groups, and exhibit low diversity. Furthermore, ranked results are
often unstable: small changes in the input data or in the ranking
methodology may lead to drastic changes in the output, making the result
uninformative and easy to manipulate. Similar concerns apply in cases where
items other than individuals are ranked, including colleges, academic
departments, or products.
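The instability claim above can be illustrated with a small sketch (the items, attributes, and weights below are hypothetical, not from the paper): items are scored as a weighted sum of two attributes, and a tiny shift in the weights reorders the ranking completely.

```python
# Hypothetical example of ranking instability: four items with two
# near-tied attribute values each.
items = {
    "A": (0.50, 0.51),
    "B": (0.51, 0.50),
    "C": (0.49, 0.52),
    "D": (0.52, 0.49),
}

def rank(weights):
    # Score each item as a weighted sum of its attributes and
    # sort by score, highest first.
    w1, w2 = weights
    scores = {k: w1 * a + w2 * b for k, (a, b) in items.items()}
    return sorted(scores, key=scores.get, reverse=True)

print(rank((0.51, 0.49)))  # slight preference for the first attribute
print(rank((0.49, 0.51)))  # slight preference for the second attribute
```

Shifting each weight by only 0.02 reverses the ranking end to end, which is exactly the kind of output instability a nutritional label is meant to surface.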
In this demonstration we present Ranking Facts, a Web-based application that
generates a "nutritional label" for rankings. Ranking Facts is made up of a
collection of visual widgets that implement our latest research results on
fairness, stability, and transparency for rankings, and that communicate
details of the ranking methodology, or of the output, to the end user. We will
showcase Ranking Facts on real datasets from different domains, including
college rankings, criminal risk assessment, and financial services.
Comment: 4 pages, SIGMOD demo, 3 figures, ACM SIGMOD 201
Interpretable multiclass classification by MDL-based rule lists
Interpretable classifiers have recently witnessed an increase in attention
from the data mining community because they are inherently easier to understand
and explain than their more complex counterparts. Examples of interpretable
classification models include decision trees, rule sets, and rule lists.
Learning such models often involves optimizing hyperparameters, which typically
requires substantial amounts of data and may result in relatively large models.
In this paper, we consider the problem of learning compact yet accurate
probabilistic rule lists for multiclass classification. Specifically, we
propose a novel formalization based on probabilistic rule lists and the minimum
description length (MDL) principle. This results in virtually parameter-free
model selection that naturally allows trading off model complexity against
goodness of fit, effectively avoiding overfitting and the need for
hyperparameter tuning. Finally, we introduce the Classy algorithm, which
greedily finds rule lists according to the proposed criterion. We empirically
demonstrate that Classy selects small probabilistic rule lists that outperform
state-of-the-art classifiers when it comes to the combination of predictive
performance and interpretability. We show that Classy is insensitive to its
only parameter, i.e., the candidate set, and that compression on the training
set correlates with classification performance, validating our MDL-based
selection criterion.
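The two-part MDL criterion behind this kind of selection can be sketched as follows. This is a minimal illustration under assumed encodings (a flat per-rule model cost, and a data cost equal to the negative log-likelihood of the covered labels), not the paper's exact formalization; all names are hypothetical.

```python
import math

def data_cost(counts):
    # Bits needed to encode the class labels covered by one rule,
    # under the rule's empirical class distribution (negative
    # log-likelihood in bits).
    total = sum(counts)
    return -sum(c * math.log2(c / total) for c in counts if c > 0)

def total_cost(rule_list, bits_per_rule=8.0):
    # Two-part code: model cost L(M) (assumed flat per rule)
    # plus data cost L(D | M).
    model_cost = bits_per_rule * len(rule_list)
    return model_cost + sum(data_cost(c) for c in rule_list)

# Each rule is represented only by the class counts it covers.
# One catch-all rule over a mixed 50/50 sample of 100 labels...
one_rule = [(50, 50)]
# ...versus two rules that each cover a nearly pure subset.
two_rules = [(48, 2), (2, 48)]

print(total_cost(one_rule))   # 8 bits of model + 100 bits of data
print(total_cost(two_rules))  # larger model, far cheaper data
```

Despite paying for an extra rule, the two-rule list compresses the labels much better, so MDL prefers it; conversely, rules that do not reduce the data cost enough to pay for themselves are rejected, which is how model complexity and goodness of fit are traded off without a tuned hyperparameter.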