
    Look and You Will Find It: Fairness-Aware Data Collection through Active Learning

    Machine learning models are often trained on data sets subject to selection bias. In particular, selection bias can be hard to avoid in scenarios where the proportion of positives is low and labeling is expensive, such as fraud detection. However, when selection bias is related to sensitive characteristics such as gender and race, it can result in an unequal distribution of burdens across sensitive groups, where marginalized groups are misrepresented and disproportionately scrutinized. Moreover, when the predictions of existing systems affect the selection of new labels, a feedback loop can occur in which selection bias is amplified over time. In this work, we explore the effectiveness of active learning approaches to mitigate fairness-related harm caused by selection bias. Active learning approaches aim to select the most informative instances from unlabeled data. We hypothesize that this characteristic steers data collection towards underexplored areas of the feature space and away from overexplored areas – including areas affected by selection bias. Our preliminary simulation results confirm the intuition that active learning can mitigate the negative consequences of selection bias, compared to both the baseline scenario and random sampling.
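    The abstract above describes active learning as selecting the most informative unlabeled instances. A minimal sketch of one common query strategy, uncertainty sampling, is shown below; the synthetic data, seed-set construction, and choice of logistic regression are illustrative assumptions, not the paper's actual setup.

    ```python
    # Minimal uncertainty-sampling sketch: iteratively label the unlabeled
    # instance the current model is least certain about (closest to p = 0.5).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 2))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # synthetic binary labels

    # Small seed set with both classes present (stand-in for a biased initial sample).
    labeled = list(np.where(y == 1)[0][:5]) + list(np.where(y == 0)[0][:5])
    pool = [i for i in range(500) if i not in labeled]

    model = LogisticRegression()
    for _ in range(20):  # 20 labeling rounds
        model.fit(X[labeled], y[labeled])
        proba = model.predict_proba(X[pool])[:, 1]
        # Query the pool instance closest to the decision boundary.
        query = pool[int(np.argmin(np.abs(proba - 0.5)))]
        labeled.append(query)
        pool.remove(query)

    accuracy = model.score(X, y)
    ```

    Because queries concentrate near the decision boundary rather than where past predictions were positive, the labeled set drifts toward underexplored regions, which is the mechanism the abstract hypothesizes counteracts selection bias.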

    Characterizing Data Scientists' Mental Models of Local Feature Importance

    Feature importance is an approach that helps to explain machine learning model predictions. It works through assigning importance scores to input features of a particular model. Different techniques exist to derive these scores, with widely varying underlying assumptions of what importance means. Little research has been done to verify whether these assumptions match the expectations of the target user, which is imperative to ensure that feature importance values are not misinterpreted. In this work, we explore data scientists’ mental models of (local) feature importance and compare these with the conceptual models of the techniques. We first identify several properties of local feature importance techniques that could potentially lead to misinterpretations. Subsequently, we explore the expectations data scientists have about local feature importance through an exploratory (qualitative and quantitative) survey of 34 data scientists in industry. We compare the identified expectations to the theory and assumptions behind the techniques and find that the two are not (always) in agreement.
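    The abstract notes that different local feature importance techniques embed very different assumptions about what "importance" means. One simple, assumption-laden notion is perturbation-based importance: how much the model's prediction for a single instance changes when one feature is replaced by its dataset mean. The sketch below illustrates that one notion only; it is not one of the specific techniques the paper surveys, and the data and model are invented for illustration.

    ```python
    # Perturbation-based local importance: score feature j by the change in
    # predicted probability when feature j is replaced by its column mean.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3))
    y = (X[:, 0] - X[:, 2] > 0).astype(int)  # feature 1 is irrelevant by design

    model = LogisticRegression().fit(X, y)

    def local_importance(model, X, instance):
        """Importance of each feature for one instance's prediction."""
        base = model.predict_proba(instance.reshape(1, -1))[0, 1]
        scores = []
        for j in range(X.shape[1]):
            perturbed = instance.copy()
            perturbed[j] = X[:, j].mean()  # "remove" feature j's information
            scores.append(base - model.predict_proba(perturbed.reshape(1, -1))[0, 1])
        return np.array(scores)

    scores = local_importance(model, X, X[0])
    ```

    Even in this toy version, the scores depend on the baseline chosen (here, the column mean), which is exactly the kind of hidden assumption a user's mental model may not match.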

    Have it both ways: from A/B testing to A&B testing with exceptional model mining

    In traditional A/B testing, we have two variants of the same product, a pool of test subjects, and a measure of success. In a randomized experiment, each test subject is presented with one of the two variants, and the measure of success is aggregated per variant. The variant of the product associated with the most success is retained, while the other variant is discarded. This, however, presumes that the company producing the products only has enough capacity to maintain one of the two product variants. If more capacity is available, then advanced data science techniques can extract more profit for the company from the A/B testing results. Exceptional Model Mining (EMM) is one such advanced data science technique, which specializes in identifying subgroups that behave differently from the overall population. Using the association model class for EMM, we can find subpopulations that prefer variant A where the general population prefers variant B, and vice versa. This data science technique is applied on data from StudyPortals, a global study choice platform that ran an A/B test on the design of aspects of their website.
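    The core idea above — finding subgroups whose A-vs-B preference deviates from the overall population's — can be sketched with a naive exhaustive search over single-attribute subgroups. This is only a toy stand-in for Exceptional Model Mining's association model class; the attribute, the simulated preference structure, and the deviation score are all invented for illustration.

    ```python
    # Toy subgroup search: find the attribute value whose A-vs-B success gap
    # deviates most from the overall population's gap.
    import numpy as np

    rng = np.random.default_rng(2)
    n = 1000
    country = rng.choice(["NL", "DE", "US"], size=n)
    variant = rng.choice(["A", "B"], size=n)
    # Simulated behavior: overall, variant B converts better (0.6 vs 0.4),
    # but users from "US" strongly prefer variant A (0.8).
    p = np.where(variant == "B", 0.6, 0.4)
    p = np.where((country == "US") & (variant == "A"), 0.8, p)
    success = rng.random(n) < p

    def ab_gap(mask):
        """Success-rate difference (A minus B) within a boolean subgroup mask."""
        a = success[mask & (variant == "A")].mean()
        b = success[mask & (variant == "B")].mean()
        return a - b

    overall = ab_gap(np.ones(n, dtype=bool))
    # The "most exceptional" subgroup: largest deviation from the overall gap.
    best = max(set(country), key=lambda c: abs(ab_gap(country == c) - overall))
    ```

    Here the search recovers the planted subgroup that prefers A while the population prefers B, which is the A&B insight: serve variant A to that subgroup and variant B to everyone else.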