Climbing the Kaggle Leaderboard by Exploiting the Log-Loss Oracle
In the context of data-mining competitions (e.g., Kaggle, KDDCup, ILSVRC
Challenge), we show how access to an oracle that reports a contestant's
log-loss score on the test set can be exploited to deduce the ground-truth of
some of the test examples. By applying this technique iteratively to batches of
k examples (for small k), all of the test labels can eventually be inferred. In
this paper, (1) we demonstrate this attack on the first stage of a
recent Kaggle competition (Intel & MobileODT Cancer Screening) and use it to
achieve a log-loss of (and thus attain a rank of #4 out of 848
contestants), without ever training a classifier to solve the actual task. (2)
We prove an upper bound on the batch size k as a function of the
floating-point resolution of the probability estimates that the contestant
submits for the labels. (3) We derive, and demonstrate in simulation, a more
flexible attack that can be used even when the oracle reports the accuracy on
an unknown (but fixed) subset of the test set's labels. These results underline
the importance of evaluating contestants based only on test data that the
oracle does not examine.
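The core trick can be illustrated with a toy version of the attack: predict 0.5 for every test example except one, and the reported log-loss alone reveals that example's label. The following is a minimal sketch, not the paper's batched algorithm; the simulated oracle, the probe value q = 0.9, and the function names are illustrative assumptions.

```python
import math

def oracle(preds, labels):
    # Simulated competition oracle: reports mean log-loss on hidden test labels.
    n = len(labels)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for p, y in zip(preds, labels)) / n

def infer_labels(n, query, q=0.9):
    # One oracle query per example: predict q for example i and 0.5 elsewhere.
    # Then n*L - (n-1)*ln(2) isolates example i's own loss term, which equals
    # -ln(q) if its label is 1 and -ln(1-q) if its label is 0.
    labels = []
    for i in range(n):
        preds = [0.5] * n
        preds[i] = q
        loss_i = n * query(preds) - (n - 1) * math.log(2.0)
        labels.append(1 if abs(loss_i + math.log(q)) <
                           abs(loss_i + math.log(1 - q)) else 0)
    return labels
```

Recovering all n labels this way costs n queries; the paper's batched variant packs k labels into a single query, with k bounded by the floating-point resolution of the reported score.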
Beyond the Leaderboard: Insight and Deployment Challenges to Address Research Problems
In the medical image analysis field, organizing challenges with associated
workshops at international conferences began in 2007 and has grown to include
over 150 challenges. Several of these challenges have had a major impact in the
field. However, whereas well-designed challenges have the potential to unite
and focus the field on creating solutions to important problems, poorly
designed and documented challenges can equally impede a field and lead to
pursuing incremental improvements in metric scores with no theoretical or
clinical significance. This is supported by a critical assessment of challenges
at the international MICCAI conference, whose main observation was that small
changes to the underlying challenge data can drastically change the ranking
order on the leaderboard. Related to this is the practice of
leaderboard climbing, which is characterized by participants focusing on
incrementally improving metric results rather than advancing science or solving
the driving problem of a challenge. In this abstract, we look beyond the
leaderboard of a challenge and instead examine the conclusions that can be
drawn from it with respect to the research problem that it is
addressing. Research study design is well described in other research areas and
can be translated to challenge design when viewing challenges as research
studies on algorithm performance that address a research problem. Based on the
two main types of scientific research study design, we propose two main
challenge types, which we think would benefit other research areas as well: 1)
an insight challenge that is based on a qualitative study design and 2) a
deployment challenge that is based on a quantitative study design. In addition,
we briefly touch upon related considerations with respect to statistical
significance versus practical significance, generalizability, and data
saturation.
Comment: This two-page abstract was accepted for the NIPS 2018 Challenges in
Machine Learning (CiML) workshop "Machine Learning competitions 'in the wild':
Playing in the real world or in real time" on Saturday, December 8, 2018, in
the Palais des congrès de Montréal, Canada.
On Primes, Log-Loss Scores and (No) Privacy
Membership inference attacks exploit the vulnerabilities that arise from
exposing models trained on customer data to queries by an adversary. In a
recently proposed
implementation of an auditing tool for measuring privacy leakage from sensitive
datasets, more refined aggregates like the Log-Loss scores are exposed for
simulating inference attacks as well as to assess the total privacy leakage
based on the adversary's predictions. In this paper, we prove that this
additional information enables the adversary to infer the membership of any
number of datapoints with full accuracy in a single query, causing a complete
breach of membership privacy. Our approach requires neither training an attack
model nor side knowledge on the adversary's part. Moreover, our algorithms are
agnostic to the model under attack and hence, enable perfect membership
inference even for models that do not memorize or overfit. In particular, our
observations provide insight into the extent of information leakage from
statistical aggregates and how they can be exploited.
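The single-query idea can be sketched with a simplified encoding: assign example i a confidence whose log-loss contribution differs between labels 0 and 1 by exactly the weight 2^i, so the total reported score decodes like a binary number. This is an illustrative stand-in for the paper's prime-based construction, not its actual algorithm; the power-of-two weights and all names are assumptions, and this variant only scales to about ten datapoints before double-precision resolution runs out.

```python
import math

def oracle(preds, labels):
    # Simulated audit aggregate: mean log-loss on the hidden labels.
    n = len(labels)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for p, y in zip(preds, labels)) / n

def single_query_membership(n, query):
    # Weight datapoint i by w_i = 2**i and submit p_i = 1/(1 + e^{w_i}),
    # so that ln((1 - p_i)/p_i) = w_i: each label-1 datapoint adds exactly
    # w_i to the total loss beyond a known label-independent baseline.
    w = [float(2 ** i) for i in range(n)]            # avoids exp overflow for n <= 10
    p = [1.0 / (1.0 + math.exp(wi)) for wi in w]
    total = n * query(p)
    # Subtract the baseline, leaving s = sum over i of y_i * w_i.
    s = total - sum(-math.log(1.0 - pi) for pi in p)
    labels = [0] * n
    for i in reversed(range(n)):                     # greedy binary decode
        if s >= w[i] - 0.5:
            labels[i] = 1
            s -= w[i]
    return labels
```

The paper's prime-based scheme achieves the same one-query recovery with better scaling; the point of the sketch is only that distinguishable per-example contributions make the aggregate score fully invertible.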