3 research outputs found

    Climbing the Kaggle Leaderboard by Exploiting the Log-Loss Oracle

    In the context of data-mining competitions (e.g., Kaggle, KDDCup, the ILSVRC Challenge), we show how access to an oracle that reports a contestant's log-loss score on the test set can be exploited to deduce the ground truth of some of the test examples. By applying this technique iteratively to batches of m examples (for small m), all of the test labels can eventually be inferred. In this paper, (1) we demonstrate this attack on the first stage of a recent Kaggle competition (Intel & MobileODT Cancer Screening) and use it to achieve a log-loss of 0.00000 (and thus attain a rank of #4 out of 848 contestants), without ever training a classifier to solve the actual task; (2) we prove an upper bound on the batch size m as a function of the floating-point resolution of the probability estimates that the contestant submits for the labels; and (3) we derive, and demonstrate in simulation, a more flexible attack that can be used even when the oracle reports the accuracy on an unknown (but fixed) subset of the test set's labels. These results underline the importance of evaluating contestants based only on test data that the oracle does not examine.
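    To make the label-leakage mechanism concrete, here is a minimal simulation in the spirit of the batched attack. It is a sketch under our own assumptions, not the paper's exact construction: the probe encoding (per-example log-odds of delta * 2^k, so the batch's labels appear as the bits of one integer in the reported loss) and the names logloss_oracle and infer_batch are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 100                               # test-set size in the simulation
y_true = rng.integers(0, 2, size=n)   # hidden ground-truth binary labels

def logloss_oracle(p):
    """What the leaderboard reports: mean binary log-loss over the test set."""
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def infer_batch(start, m=10, delta=0.125):
    """Infer labels y[start:start+m] from a single oracle query."""
    p = np.full(n, 0.5)                 # unprobed examples each contribute log 2
    idx = np.arange(start, start + m)
    # Give probed example k the probability with log((1-p)/p) = delta * 2^k,
    # so sum_k y_k * delta * 2^k encodes the batch's bits in one real number.
    d = delta * 2.0 ** np.arange(m)
    p[idx] = 1.0 / (1.0 + np.exp(d))
    loss = logloss_oracle(p)
    # Subtract the label-independent part of n * loss, then read off the bits.
    const = (n - m) * np.log(2) - np.sum(np.log(1.0 - p[idx]))
    k = int(round((loss * n - const) / delta))
    return np.array([(k >> j) & 1 for j in range(m)])

recovered = np.concatenate([infer_batch(s) for s in range(0, n, 10)])
assert (recovered == y_true).all()
print(f"recovered all {n} labels with {n // 10} oracle queries")
```

    The batch size here is limited exactly as the paper's bound suggests: the largest log-odds term delta * 2^(m-1) must stay within the floating-point resolution of both the submitted probabilities and the reported score.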

    Beyond the Leaderboard: Insight and Deployment Challenges to Address Research Problems

    In the medical image analysis field, organizing challenges with associated workshops at international conferences began in 2007 and has grown to include over 150 challenges. Several of these challenges have had a major impact on the field. However, whereas well-designed challenges have the potential to unite and focus the field on creating solutions to important problems, poorly designed and documented challenges can equally impede a field and lead to the pursuit of incremental improvements in metric scores with no theoretical or clinical significance. This is supported by a critical assessment of challenges at the international MICCAI conference, in which the main observation was that small changes to the underlying challenge data can drastically change the ranking order on the leaderboard. Related to this is the practice of leaderboard climbing, in which participants focus on incrementally improving metric results rather than advancing science or solving the driving problem of a challenge. In this abstract we look beyond the leaderboard of a challenge and instead examine the conclusions that can be drawn from a challenge with respect to the research problem it addresses. Research study design is well described in other research areas and can be translated to challenge design by viewing challenges as research studies on algorithm performance that address a research problem. Based on the two main types of scientific research study design, we propose two main challenge types, which we think would benefit other research areas as well: 1) an insight challenge, based on a qualitative study design, and 2) a deployment challenge, based on a quantitative study design. In addition, we briefly touch upon related considerations with respect to statistical versus practical significance, generalizability, and data saturation.
    Comment: This two-page abstract was accepted for the NIPS 2018 Challenges in Machine Learning (CiML) workshop "Machine Learning competitions 'in the wild': Playing in the real world or in real time" on Saturday, December 8, 2018, in the Palais des congrès de Montréal, Canada.

    On Primes, Log-Loss Scores and (No) Privacy

    Membership inference attacks exploit the vulnerabilities of exposing models trained on customer data to queries by an adversary. In a recently proposed implementation of an auditing tool for measuring privacy leakage from sensitive datasets, finer-grained aggregates such as log-loss scores are exposed, both to simulate inference attacks and to assess the total privacy leakage based on the adversary's predictions. In this paper, we prove that this additional information enables the adversary to infer the membership of any number of datapoints with full accuracy in a single query, causing a complete breach of membership privacy. Our approach obviates the need for the adversary to train an attack model or to have access to side knowledge. Moreover, our algorithms are agnostic to the model under attack and hence enable perfect membership inference even for models that do not memorize or overfit. In particular, our observations provide insight into the extent of information leakage from statistical aggregates and how it can be exploited.
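    As a rough illustration of how a prime-number gadget can turn one aggregate into full membership disclosure, consider the following exact-arithmetic sketch. It is our simplification, not the paper's construction: we assume the reported log-loss is available exactly (so exp(n * logloss) can be manipulated as a rational), and the adversary submits the hypothetical confidences p_i = q_i / (q_i + 1) for distinct primes q_i; unique factorization then reads each membership bit off as a divisibility test.

```python
from fractions import Fraction

# One distinct prime per audited datapoint; hidden bits the adversary wants.
primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
membership = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]

def exact_loss_product(y):
    """exp(n * logloss) as an exact rational for predictions p_i = q_i/(q_i+1).
    A member (y_i = 1) contributes 1/p_i; a non-member contributes 1/(1-p_i)."""
    prod = Fraction(1)
    for yi, q in zip(y, primes):
        p = Fraction(q, q + 1)
        prod *= 1 / p if yi else 1 / (1 - p)
    return prod

# Single "query": the aggregate equals prod(q_i + 1) * prod(q_i^(-y_i)), so
# stripping the known factor leaves R = prod(q_i^(y_i)); unique factorization
# makes each bit recoverable as a divisibility test by q_i.
prod = exact_loss_product(membership)
known = Fraction(1)
for q in primes:
    known *= q + 1
R = known / prod                      # an integer: product of primes with y_i = 1
assert R.denominator == 1
recovered = [int(R.numerator % q == 0) for q in primes]
assert recovered == membership
print("membership bits recovered in one query:", recovered)
```

    The exact-arithmetic assumption is the sketch's main simplification; accounting for the finite precision of a reported floating-point score requires the more careful analysis done in the paper.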