17 research outputs found
Improve learning combining crowdsourced labels by weighting Areas Under the Margin
In supervised learning -- for instance in image classification -- modern massive datasets are commonly labeled by a crowd of workers. The labels obtained in this crowdsourcing setting are then aggregated for training. The aggregation step generally leverages a per-worker trust score. Yet, such worker-centric approaches discard each task's ambiguity. Some intrinsically ambiguous tasks might even fool expert workers, which could ultimately harm the learning step. In a standard supervised learning setting -- with one label per task and balanced classes -- the Area Under the Margin (AUM) statistic is tailored to identify mislabeled data. We adapt the AUM to identify ambiguous tasks in crowdsourced learning scenarios, introducing the Weighted AUM (WAUM). The WAUM is an average of AUMs weighted by worker- and task-dependent scores. We show that the WAUM can help discard ambiguous tasks from the training set, leading to better generalization or calibration performance. We report improvements over feature-blind aggregation strategies both for simulated settings and for the CIFAR-10H crowdsourced dataset.
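To make these statistics concrete, here is a minimal NumPy sketch of the AUM for a single task and of a weighted average of per-worker AUMs. The function names and the generic worker_scores argument are illustrative assumptions; the actual worker- and task-dependent weights are defined in the paper and not reproduced here.

    import numpy as np

    def aum(logit_history, label):
        # Area Under the Margin for one task: average, over training epochs,
        # of the margin between the logit of the reference label and the
        # largest other logit. logit_history has shape (n_epochs, n_classes).
        margins = []
        for z in logit_history:
            others = np.delete(z, label)
            margins.append(z[label] - others.max())
        return float(np.mean(margins))

    def waum(logit_history, worker_labels, worker_scores):
        # Weighted AUM: weighted average of the AUMs computed with each
        # worker's proposed label, using worker- and task-dependent scores
        # (here passed in as an opaque array for illustration).
        aums = np.array([aum(logit_history, y) for y in worker_labels])
        w = np.asarray(worker_scores, dtype=float)
        return float(np.sum(w * aums) / np.sum(w))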
Benchopt: Reproducible, efficient and collaborative optimization benchmarks
Numerical validation is at the core of machine learning research, as it allows researchers to assess the actual impact of new methods and to confirm the agreement between theory and practice. Yet, the rapid development of the field poses several challenges: researchers are confronted with a profusion of methods to compare, limited transparency and consensus on best practices, as well as tedious re-implementation work. As a result, validation is often very partial, which can lead to wrong conclusions that slow down the progress of research. We propose Benchopt, a collaborative framework to automate, reproduce and publish optimization benchmarks in machine learning across programming languages and hardware architectures. Benchopt simplifies benchmarking for the community by providing an off-the-shelf tool for running, sharing and extending experiments. To demonstrate its broad usability, we showcase benchmarks on three standard learning tasks: ℓ2-regularized logistic regression, Lasso, and ResNet18 training for image classification. These benchmarks highlight key practical findings that give a more nuanced view of the state of the art for these problems, showing that for practical evaluation, the devil is in the details. We hope that Benchopt will foster collaborative work in the community, hence improving the reproducibility of research findings.
Comment: Accepted in the proceedings of NeurIPS 2022; the Benchopt library documentation is available at https://benchopt.github.io
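For reference, the first two benchmarked problems are the standard convex formulations written below; the notation (design matrix X with rows x_i, targets y, regularization parameter λ > 0) is ours and not taken from the benchmark code, and the ResNet18 task is a non-convex deep learning problem not written out here.

    \[
    \min_{w \in \mathbb{R}^p} \; \frac{1}{n}\sum_{i=1}^{n} \log\bigl(1 + \exp(-y_i\, x_i^\top w)\bigr) + \frac{\lambda}{2}\,\lVert w \rVert_2^2
    \qquad (\ell_2\text{-regularized logistic regression})
    \]
    \[
    \min_{w \in \mathbb{R}^p} \; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2 + \lambda\,\lVert w \rVert_1
    \qquad (\text{Lasso})
    \]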
Peerannot: classification for crowdsourced image datasets with Python
Crowdsourcing is a quick and easy way to collect labels for large datasets, involving many workers. However, workers often disagree with each other. Sources of error can arise from the workers’ skills, but also from the intrinsic difficulty of the task. We present peerannot, a Python library for managing and learning from crowdsourced labels for classification. Our library allows users to aggregate labels with common noise models or to train a deep-learning-based classifier directly from crowdsourced labels. In addition, we provide an identification module to easily explore the task difficulty of datasets and the capabilities of workers.
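As a point of reference for the simplest form of label aggregation such a library supports, here is a standalone majority-vote sketch in plain Python; it deliberately does not use peerannot's actual API, whose commands and function names should be taken from the library's documentation.

    from collections import Counter

    def majority_vote(votes_per_task):
        # Naive aggregation: for each task, return the most frequent label
        # among the workers' votes (ties broken arbitrarily by Counter).
        return {task: Counter(votes).most_common(1)[0][0]
                for task, votes in votes_per_task.items()}

    # Example: three tasks annotated by a few workers each.
    votes = {"img_1": [0, 0, 1], "img_2": [2, 2, 2], "img_3": [1, 0, 1, 1]}
    print(majority_vote(votes))  # {'img_1': 0, 'img_2': 2, 'img_3': 1}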
Cooperative learning of Pl@ntNet’s Artificial Intelligence algorithm using label aggregation
The Pl@ntNet system enables global data collection by allowing users to upload and annotate plant observations, leading to noisy labels due to diverse user skills. Achieving consensus is crucial for training, but the vast scale of the collected data makes traditional label aggregation strategies challenging. Additionally, as many species are rarely observed, user expertise cannot be evaluated through inter-user agreement: otherwise, botanical experts would carry less weight in the training step than the average user, since they contribute fewer but more accurate annotations. Our proposed label aggregation strategy aims to cooperatively train plant identification models. This strategy estimates user expertise as a trust score per worker, based on their ability to identify plant species from crowdsourced data. The trust score is recursively estimated from correctly identified species given the current estimated labels. This interpretable score exploits botanical experts’ knowledge and the heterogeneity of users. We evaluate our strategy on a large subset of the Pl@ntNet database focused on European flora, comprising over 6 000 000 observations and 800 000 users. We demonstrate that estimating users’ skills based on the diversity of their expertise enhances labeling performance.
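As an illustration of the recursion described above, here is a simplified sketch that alternates a trust-weighted vote with a per-user agreement update; the variable names and the exact form of the trust update are assumptions made for illustration and differ from the algorithm specified in the paper.

    from collections import defaultdict

    def iterative_trust_aggregation(annotations, n_iter=10):
        # annotations: list of (user_id, observation_id, species) triples.
        # Alternates (1) a trust-weighted vote per observation with
        # (2) re-estimating each user's trust as the fraction of their
        # annotations that match the current estimated labels.
        trust = defaultdict(lambda: 1.0)   # start with equal trust
        labels = {}
        for _ in range(n_iter):
            # (1) trust-weighted vote per observation
            scores = defaultdict(lambda: defaultdict(float))
            for user, obs, species in annotations:
                scores[obs][species] += trust[user]
            labels = {obs: max(cand, key=cand.get) for obs, cand in scores.items()}
            # (2) trust = agreement with the current estimated labels
            correct, total = defaultdict(float), defaultdict(float)
            for user, obs, species in annotations:
                total[user] += 1.0
                correct[user] += float(labels[obs] == species)
            trust = defaultdict(lambda: 1.0,
                                {u: correct[u] / total[u] for u in total})
        return labels, dict(trust)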
Cooperative learning of Pl@ntNet's Artificial Intelligence algorithm: how does it work and how can we improve it?
Deep learning models for plant species identification rely on large annotated datasets. The Pl@ntNet system enables global data collection by allowing users to upload and annotate plant observations, leading to noisy labels due to diverse user skills. Achieving consensus is crucial for training, but the vast scale of the collected data makes traditional label aggregation strategies challenging. Existing methods either retain all observations, resulting in noisy training data, or selectively keep those with sufficient votes, discarding valuable information. Additionally, as many species are rarely observed, user expertise cannot be evaluated through inter-user agreement: otherwise, botanical experts would carry less weight in the AI training step than the average user. Our proposed label aggregation strategy aims to cooperatively train plant identification AI models. This strategy estimates user expertise as a trust score per user, based on their ability to identify plant species from crowdsourced data. The trust score is recursively estimated from correctly identified species given the current estimated labels. This interpretable score exploits botanical experts' knowledge and the heterogeneity of users. Subsequently, our strategy removes unreliable observations but retains those with limited trusted annotations, unlike other approaches. We evaluate Pl@ntNet's strategy on a released large subset of the Pl@ntNet database focused on European flora, comprising over 6M observations and 800K users. We demonstrate that estimating users' skills based on the diversity of their expertise enhances labeling performance. Our findings emphasize the synergy of human annotation and data filtering in improving AI performance on a refined dataset. We explore incorporating AI-based votes alongside human input, which can further enhance human-AI interactions to detect unreliable observations.
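To illustrate the retention rule mentioned above (dropping unreliable observations while keeping sparsely annotated ones backed by trusted users), here is a hypothetical filtering sketch; the inputs, the threshold and the exact criterion are assumptions for illustration and are not the criterion specified in the paper.

    def keep_observation(weighted_votes, n_trusted_annotators, conf_threshold=0.5):
        # weighted_votes: dict mapping candidate species to accumulated trust.
        # n_trusted_annotators: number of annotators of this observation whose
        # trust score exceeds some reliability threshold (hypothetical input).
        total = sum(weighted_votes.values())
        if total == 0:
            return False
        confidence = max(weighted_votes.values()) / total
        # Keep confidently labeled observations, or sparsely annotated ones
        # backed by at least one trusted user.
        return confidence >= conf_threshold or n_trusted_annotators >= 1

    print(keep_observation({"Quercus robur": 0.9, "Quercus petraea": 0.2},
                           n_trusted_annotators=0))  # True (confidence ~0.82)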