9,727 research outputs found
Leveraging Labeled and Unlabeled Data for Consistent Fair Binary Classification
We study the problem of fair binary classification under the notion of Equal Opportunity, which requires the true positive rate to be equal across the sensitive groups. Within this setting, we show that the fair optimal classifier is obtained by recalibrating the Bayes classifier with a group-dependent threshold, and we provide a constructive expression for that threshold. This result motivates us to devise a plug-in classification procedure based on both labeled and unlabeled datasets: the labeled data are used to learn the output conditional probability, while the unlabeled data are used for calibration. The overall procedure can be computed in polynomial time and is shown to be statistically consistent in terms of both the classification error and the fairness measure. Finally, we present numerical experiments indicating that our method is often superior or competitive with state-of-the-art methods on benchmark datasets.
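The abstract's core mechanism, a group-dependent threshold on an estimated conditional probability, can be sketched in a few lines. The snippet below is a simplified illustration with synthetic data and a grid search for per-group thresholds that equalize the true positive rate; it is not the paper's constructive calibration procedure (which uses unlabeled data), and all variable names and the TPR-matching target are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic toy data: eta approximates P(Y = 1 | X); s is a binary group.
n = 4000
s = rng.integers(0, 2, n)
y = rng.integers(0, 2, n)
# A group-dependent score shift creates a TPR gap under one global threshold.
eta = np.clip(0.5 * y + 0.2 * s + rng.normal(0.0, 0.15, n), 0.0, 1.0)

def tpr(scores, labels, thr):
    """True positive rate of the rule 'predict 1 iff score >= thr'."""
    pos = labels == 1
    return (scores[pos] >= thr).mean()

# Group-dependent thresholds: for each group, pick the threshold whose TPR is
# closest to a common target (here, the overall TPR at the 0.5 threshold).
target = tpr(eta, y, 0.5)
grid = np.linspace(0.0, 1.0, 501)
thresholds = {}
for g in (0, 1):
    mask = s == g
    gaps = [abs(tpr(eta[mask], y[mask], t) - target) for t in grid]
    thresholds[g] = grid[int(np.argmin(gaps))]

# Plug-in classifier: recalibrate the score with the group's own threshold.
yhat = eta >= np.where(s == 0, thresholds[0], thresholds[1])
for g in (0, 1):
    m = s == g
    print(g, thresholds[g], round(tpr(eta[m], y[m], thresholds[g]), 3))
```

Using one threshold per group, rather than a single global cutoff, is what closes the TPR gap between groups while leaving the underlying score function untouched.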
Label Independent Memory for Semi-Supervised Few-shot Video Classification.
In this paper, we propose to leverage freely available unlabeled video data to facilitate few-shot video classification. In this semi-supervised few-shot video classification task, millions of unlabeled videos are available for each episode during training. These videos can be extremely imbalanced and exhibit rich visual and motion dynamics. To tackle the semi-supervised few-shot video classification problem, we make the following contributions. First, we propose a label-independent memory (LIM) to cache label-related features, which enables a similarity search over a large set of videos. LIM produces a class prototype for few-shot training; this prototype is an aggregated embedding for each class, which is more robust to noisy video features. Second, we integrate a multi-modality compound memory network to capture both RGB and flow information. We store the RGB and flow representations in two separate memory networks that are jointly optimized via a unified loss; in this way, mutual communication between the two modalities is leveraged to achieve better classification performance. Third, we conduct extensive experiments on the few-shot Kinetics-100 and Something-Something-100 datasets, which validate the effectiveness of leveraging accessible unlabeled data for few-shot classification.
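The prototype-from-memory idea in this abstract can be sketched independently of the full LIM architecture. The snippet below is a minimal illustration (not the paper's implementation) of a similarity search over a memory of cached features followed by aggregation into a class prototype; the cosine-similarity search, the top-k aggregation rule, and all dimensions are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

# A "memory" of cached video features (rows are L2-normalised embeddings).
memory = rng.normal(size=(500, 16))
memory /= np.linalg.norm(memory, axis=1, keepdims=True)

def prototype(query, memory, k=10):
    """Aggregate the k memory entries most similar to `query` into a prototype."""
    q = query / np.linalg.norm(query)
    sims = memory @ q                 # cosine similarity against every entry
    top = np.argsort(sims)[-k:]       # indices of the k nearest features
    proto = memory[top].mean(axis=0)  # aggregated embedding for the class
    return proto / np.linalg.norm(proto)

query = rng.normal(size=16)
proto = prototype(query, memory)
print(proto.shape)
```

Averaging many retrieved neighbours is what makes the aggregated prototype more robust to a single noisy feature than the raw query embedding alone.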
Generative Adversarial Positive-Unlabelled Learning
In this work, we consider the task of classifying binary positive-unlabeled (PU) data. Existing discriminative PU models attempt to seek an optimal reweighting strategy for the U data so that a decent decision boundary can be found. However, given limited P data, conventional PU models tend to suffer from overfitting when adapted to very flexible deep neural networks. In contrast, we are the first to attack the binary PU task from the perspective of generative learning, leveraging powerful generative adversarial networks (GANs). Our generative positive-unlabeled (GenPU) framework incorporates an array of discriminators and generators that are endowed with different roles in simultaneously producing realistic positive and negative samples. We provide theoretical analysis justifying that, at equilibrium, GenPU is capable of recovering both the positive and negative data distributions. Moreover, we show that GenPU is generalizable and closely related to semi-supervised classification. Given rather limited P data, experiments on both synthetic and real-world datasets demonstrate the effectiveness of our proposed framework. With the infinite stream of realistic and diverse samples generated by GenPU, a very flexible classifier can then be trained using deep neural networks.
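The final step the abstract describes, training a classifier on sample streams from the learned generators, can be sketched without reproducing the adversarial training itself. In the snippet below the two generators are stand-ins: fixed Gaussian samplers (purely an illustrative assumption, not GenPU's trained networks) that emit positive and negative samples, on which a simple logistic-regression classifier is fit by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-ins for trained GenPU generators. In the paper these are adversarially
# trained networks; here they are fixed Gaussian samplers, assumed only so the
# downstream training step can be shown end to end.
def sample_positive(n):
    return rng.normal(loc=+2.0, scale=1.0, size=(n, 2))

def sample_negative(n):
    return rng.normal(loc=-2.0, scale=1.0, size=(n, 2))

# Train a logistic-regression classifier on the generated sample stream.
w, b = np.zeros(2), 0.0
lr = 0.1
for _ in range(200):
    X = np.vstack([sample_positive(64), sample_negative(64)])
    y = np.concatenate([np.ones(64), np.zeros(64)])
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid predictions
    g = p - y                                # gradient of the log-loss
    w -= lr * (X.T @ g) / len(y)
    b -= lr * g.mean()

# Evaluate on fresh generated data: the two streams should be well separated.
acc = ((sample_positive(500) @ w + b > 0).mean()
       + (sample_negative(500) @ w + b < 0).mean()) / 2
print(round(float(acc), 3))
```

Because the generators can emit an unbounded stream, each training step sees fresh samples, which is the property the abstract highlights for avoiding overfitting with flexible classifiers.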
Learning to Infer from Unlabeled Data: A Semi-supervised Learning Approach for Robust Natural Language Inference
Natural Language Inference (NLI), or Recognizing Textual Entailment (RTE), aims at predicting the relation between a pair of sentences (premise and hypothesis) as entailment, contradiction, or semantic independence. Although deep learning models have shown promising performance for NLI in recent years, they rely on large-scale, expensive human-annotated datasets. Semi-supervised learning (SSL) is a popular technique for reducing the reliance on human annotation by leveraging unlabeled data for training. However, despite its substantial success on single-sentence classification tasks, where the challenge in using unlabeled data is to assign "good enough" pseudo-labels, the nature of unlabeled data for NLI is more complex: one of the sentences in the pair (usually the hypothesis) and the class label are missing and would require human annotation, which makes SSL for NLI more challenging. In this paper, we propose a novel way to incorporate unlabeled data in SSL for NLI, using a conditional language model, BART, to generate hypotheses for the unlabeled sentences (used as premises). Our experiments show that our SSL framework successfully exploits unlabeled data and substantially improves performance on four NLI datasets in low-resource settings. We release our code at https://github.com/msadat3/SSL_for_NLI. Comment: Accepted in EMNLP 2022 (Findings).
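The data-augmentation step this abstract describes can be sketched as a small pipeline. The snippet below is one plausible reading, not the paper's implementation: `generate_hypothesis` is a template stub standing in for label-conditioned BART generation (an assumption made for the sketch), and each unlabeled premise is expanded into pseudo-labeled (premise, hypothesis, label) training triples.

```python
# Minimal sketch of the augmentation step: a conditional generator (BART in
# the paper; stubbed here with templates, purely an illustrative assumption)
# turns unlabeled premises into (premise, hypothesis, label) triples.

LABELS = ("entailment", "contradiction", "neutral")

def generate_hypothesis(premise, label):
    """Stub standing in for a label-conditioned generator such as BART."""
    templates = {
        "entailment": f"{premise} (restated)",
        "contradiction": f"It is not the case that {premise.lower()}",
        "neutral": "Something unrelated happened.",
    }
    return templates[label]

def build_pseudo_pairs(premises):
    """Expand each unlabeled premise into one pseudo-labeled pair per class."""
    return [(p, generate_hypothesis(p, lab), lab)
            for p in premises for lab in LABELS]

pairs = build_pseudo_pairs(["A man is playing a guitar."])
for premise, hypothesis, label in pairs:
    print(label, "|", hypothesis)
```

The resulting triples can then be mixed into the labeled training set, which is how the unlabeled premises end up contributing supervised-style NLI examples.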