Visually grounded learning of keyword prediction from untranscribed speech
During language acquisition, infants have the benefit of visual cues to
ground spoken language. Robots similarly have access to audio and visual
sensors. Recent work has shown that images and spoken captions can be mapped
into a meaningful common space, allowing images to be retrieved using speech
and vice versa. In this setting of images paired with untranscribed spoken
captions, we consider whether computer vision systems can be used to obtain
textual labels for the speech. Concretely, we use an image-to-words multi-label
visual classifier to tag images with soft textual labels, and then train a
neural network to map from the speech to these soft targets. We show that the
resulting speech system is able to predict which words occur in an
utterance---acting as a spoken bag-of-words classifier---without seeing any
parallel speech and text. We find that the model often confuses semantically
related words, e.g. "man" and "person", making it even more effective as a
semantic keyword spotter.
Comment: 5 pages, 3 figures, 5 tables; small updates, added link to code;
accepted to Interspeech 201
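The training setup described above — a speech network regressed onto soft word labels from a visual tagger — can be sketched as a sigmoid cross-entropy loss against soft multi-label targets. This is a minimal illustration with made-up numbers; the function name, vocabulary, and values are assumptions, not taken from the paper.

```python
import numpy as np

# Minimal sketch (illustrative names and numbers): the speech network is
# trained to match soft word probabilities emitted by an image tagger.

def soft_target_loss(logits, soft_targets):
    """Sigmoid cross-entropy against soft (non-binary) word labels."""
    probs = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-9
    return -np.mean(soft_targets * np.log(probs + eps)
                    + (1.0 - soft_targets) * np.log(1.0 - probs + eps))

# Soft visual tags for one caption, e.g. over ["man", "dog", "person", "car"]
soft_labels = np.array([0.9, 0.1, 0.7, 0.05])

# Speech-network outputs that agree well vs. poorly with the visual tags
good_logits = np.array([3.0, -3.0, 2.0, -3.0])
bad_logits = np.array([-3.0, 3.0, -2.0, 3.0])
```

Minimizing this loss pushes the speech network toward the tagger's word distribution, which is what lets it act as a spoken bag-of-words classifier without parallel text.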
Neural Network-based Word Alignment through Score Aggregation
We present a simple neural network for word alignment that builds source and
target word window representations to compute alignment scores for sentence
pairs. To enable unsupervised training, we use an aggregation operation that
summarizes the alignment scores for a given target word. A soft-margin
objective increases scores for true target words while decreasing scores for
target words that are not present. Compared to the popular Fast Align model,
our approach improves alignment accuracy by 7 AER on English-Czech, by 6 AER on
Romanian-English, and by 1.7 AER on English-French alignment.
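The aggregation-plus-soft-margin idea can be sketched numerically as follows. Using max as the aggregation operation and drawing toy score matrices are simplifying assumptions for illustration, not the paper's exact design:

```python
import numpy as np

# Illustrative sketch (not the paper's exact architecture): alignment
# scores between source and target windows are aggregated per target word,
# and a soft-margin objective separates true targets from sampled negatives.

def aggregate(scores):
    """Summarize alignment scores for each target word (max over sources)."""
    return scores.max(axis=0)

def soft_margin_loss(pos_scores, neg_scores):
    """Raise scores of true target words, lower scores of negatives."""
    return (np.log1p(np.exp(-pos_scores)).sum()
            + np.log1p(np.exp(neg_scores)).sum())

rng = np.random.default_rng(0)
scores_true = rng.normal(2.0, 0.5, size=(4, 3))   # 4 source x 3 true targets
scores_neg = rng.normal(-2.0, 0.5, size=(4, 3))   # sampled negative targets

loss = soft_margin_loss(aggregate(scores_true), aggregate(scores_neg))
```

The aggregation is what makes unsupervised training possible: the objective only needs to know which target words occur in the sentence, not which source word each aligns to.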
Exploring Feature Representation Learning for Semi-supervised Medical Image Segmentation
This paper presents a simple yet effective two-stage framework for
semi-supervised medical image segmentation. Our key insight is to explore the
feature representation learning with labeled and unlabeled (i.e., pseudo
labeled) images to enhance the segmentation performance. In the first stage, we
present an aleatoric uncertainty-aware method, namely AUA, to improve the
segmentation performance for generating high-quality pseudo labels. Considering
the inherent ambiguity of medical images, AUA adaptively regularizes the
consistency on images with low ambiguity. To enhance the representation
learning, we propose a stage-adaptive contrastive learning method, including a
boundary-aware contrastive loss to regularize the labeled images in the first
stage and a prototype-aware contrastive loss to optimize both labeled and
pseudo labeled images in the second stage. The boundary-aware contrastive loss
only optimizes pixels around the segmentation boundaries to reduce the
computational cost. The prototype-aware contrastive loss fully leverages both
labeled and pseudo labeled images by building a centroid for each class,
reducing the computational cost of pair-wise comparisons. Our method achieves the
best results on two public medical image segmentation benchmarks. Notably, our
method outperforms the prior state-of-the-art by 5.7% on Dice for colon tumor
segmentation relying on just 5% labeled images.
Comment: On submission to TM
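The centroid idea behind the prototype-aware loss can be sketched as below: pixels attract their own class centroid and repel the others, so the cost scales with the number of classes rather than the number of pixel pairs. The function name, temperature, and toy data are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Hedged sketch of a prototype (centroid) contrastive loss: each pixel
# feature is pulled toward its class centroid and pushed away from the
# other centroids, avoiding O(N^2) pixel-pair comparisons.

def prototype_contrastive_loss(features, labels, tau=0.1):
    classes = np.unique(labels)
    # One centroid per class instead of all pixel pairs
    protos = np.stack([features[labels == c].mean(axis=0) for c in classes])
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    sim = f @ p.T / tau                      # cosine similarity to centroids
    sim -= sim.max(axis=1, keepdims=True)    # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    own = np.searchsorted(classes, labels)   # index of each pixel's class
    return -log_prob[np.arange(len(labels)), own].mean()

rng = np.random.default_rng(0)
labels = np.array([0] * 5 + [1] * 5)
# Well-separated class clusters vs. unstructured features
tight = np.vstack([rng.normal([1.0, 0.0], 0.05, (5, 2)),
                   rng.normal([0.0, 1.0], 0.05, (5, 2))])
mixed = rng.normal(0.0, 1.0, (10, 2))
```

Features that cluster tightly around their class centroid incur a much lower loss than unstructured features, which is the representation-learning pressure the second stage applies.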
Regularized Optimal Transport Layers for Generalized Global Pooling Operations
Global pooling is one of the most significant operations in many machine
learning models and tasks, which works for information fusion and structured
data (like sets and graphs) representation. However, without solid mathematical
foundations, its practical implementations often rely on empirical
mechanisms and thus yield sub-optimal or even unsatisfactory performance. In
this work, we develop a novel and generalized global pooling framework through
the lens of optimal transport. The proposed framework is interpretable from the
perspective of expectation-maximization. Essentially, it aims at learning an
optimal transport across sample indices and feature dimensions, making the
corresponding pooling operation maximize the conditional expectation of input
data. We demonstrate that most existing pooling methods are equivalent to
solving a regularized optimal transport (ROT) problem with different
specializations, and more sophisticated pooling operations can be implemented
by hierarchically solving multiple ROT problems. Making the parameters of the
ROT problem learnable, we develop a family of regularized optimal transport
pooling (ROTP) layers. We implement the ROTP layers as a new kind of deep
implicit layer. Their model architectures correspond to different optimization
algorithms. We test our ROTP layers in several representative set-level machine
learning scenarios, including multi-instance learning (MIL), graph
classification, graph set representation, and image classification.
Experimental results show that applying our ROTP layers can reduce the
difficulty of the design and selection of global pooling -- our ROTP layers may
either imitate some existing global pooling methods or lead to some new pooling
layers fitting data better. The code is available at
\url{https://github.com/SDS-Lab/ROT-Pooling}
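The pooling-as-transport view can be sketched with a standard Sinkhorn solver: a transport plan between sample indices and feature dimensions supplies the weights of a generalized mean pooling. The cost choice and rescaling below are illustrative assumptions for a minimal sketch, not the paper's exact ROTP layer:

```python
import numpy as np

# Hedged sketch of pooling as regularized optimal transport: a Sinkhorn
# solver produces a transport plan between sample indices and feature
# dimensions, and the plan weights a generalized mean pooling.

def sinkhorn(C, a, b, eps=0.1, iters=200):
    """Entropic-regularized OT plan with marginals a, b and cost C."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def rot_pool(X, eps=0.1):
    """Pool an n x d feature matrix into a d-vector via an OT plan."""
    n, d = X.shape
    C = -X                       # cheap to move mass toward large activations
    a = np.full(n, 1.0 / n)      # uniform marginal over samples
    b = np.full(d, 1.0 / d)      # uniform marginal over feature dimensions
    P = sinkhorn(C, a, b, eps)
    # Columns of d*P sum to one, so each output is a convex combination of
    # that feature's values across samples.
    return (P * X).sum(axis=0) * d
```

A uniform plan recovers exactly mean pooling, while a sharper plan (small regularization) shifts weight toward large activations, toward max pooling — mirroring the claim that ROTP layers can imitate existing global pooling methods.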
Empowering Collaborative Filtering with Principled Adversarial Contrastive Loss
Contrastive Learning (CL) has achieved impressive performance in
self-supervised learning tasks, showing superior generalization ability.
Inspired by the success, adopting CL into collaborative filtering (CF) is
prevailing in semi-supervised top-K recommendations. The basic idea is to
routinely conduct heuristic-based data augmentation and apply contrastive
losses (e.g., InfoNCE) on the augmented views. Yet, several CF-specific
challenges make this adoption suboptimal, such as out-of-distribution issues,
the risk of false negatives, and the nature of top-K evaluation. These necessitate
the CL-based CF scheme to focus more on mining hard negatives and
distinguishing false negatives from the vast unlabeled user-item interactions,
for informative contrast signals. Worse still, there is limited understanding
of contrastive loss in CF methods, especially w.r.t. its generalization
ability. To bridge the gap, we delve into the reasons underpinning the success
of contrastive loss in CF, and propose a principled Adversarial InfoNCE loss
(AdvInfoNCE), which is a variant of InfoNCE, specially tailored for CF methods.
AdvInfoNCE adaptively explores and assigns hardness to each negative instance
in an adversarial fashion and further utilizes a fine-grained hardness-aware
ranking criterion to empower the recommender's generalization ability. Training
CF models with AdvInfoNCE, we validate the effectiveness of AdvInfoNCE on both
synthetic and real-world benchmark datasets, thus showing its generalization
ability to mitigate out-of-distribution problems. Given the theoretical
guarantees and empirical superiority of AdvInfoNCE over most contrastive loss
functions, we advocate its adoption as a standard loss in recommender systems,
particularly for the out-of-distribution tasks. Codes are available at
https://github.com/LehengTHU/AdvInfoNCE.
Comment: Accepted to NeurIPS 202
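The hardness mechanism can be sketched as an InfoNCE loss where each negative's logit is shifted by an adversarially chosen hardness term. The toy "adversary" and all names below are simplifying assumptions, not the AdvInfoNCE formulation itself:

```python
import numpy as np

# Hedged sketch: InfoNCE where each negative item's score is shifted by a
# hardness term chosen adversarially (a simplification of the idea, with
# illustrative names and values).

def hardness_infonce(pos_score, neg_scores, delta, tau=0.2):
    """InfoNCE with a per-negative hardness shift added to the logits."""
    logits = np.concatenate(([pos_score], neg_scores + delta)) / tau
    logits -= logits.max()                  # numerical stability
    return -(logits[0] - np.log(np.exp(logits).sum()))

rng = np.random.default_rng(1)
neg = rng.normal(-1.0, 0.5, size=8)         # scores of sampled negatives
pos = 1.5                                   # score of the positive item

# Toy inner "adversary": shift hardness toward higher-scoring (harder)
# negatives while keeping the total shift zero.
adv_delta = 0.5 * (neg - neg.mean())

base = hardness_infonce(pos, neg, np.zeros(8))
adv = hardness_infonce(pos, neg, adv_delta)
```

Because the adversary concentrates hardness on the negatives the model already scores highly, the adversarial loss upper-bounds the plain one, so minimizing it trains the recommender against a harder contrast — the intuition behind the improved generalization.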
Fairness guarantee in multi-class classification
Algorithmic Fairness is an established area of machine learning that aims to
reduce the influence of biases in the data. Yet, despite its wide range of
applications, very few works consider the multi-class classification setting
from the fairness perspective. We extend both definitions of exact and
approximate fairness in the case of Demographic Parity to multi-class
classification. We specify the corresponding expressions of the optimal fair
classifiers. This suggests a plug-in data-driven procedure, for which we
establish theoretical guarantees. The enhanced estimator is proved to mimic the
behavior of the optimal rule both in terms of fairness and risk. Notably,
fairness guarantees are distribution-free. The approach is evaluated on both
synthetic and real datasets and turns out to be very effective in decision
making with a preset level of unfairness. In addition, our method is
competitive with the state-of-the-art in-processing fairlearn approach in the
specific binary classification setting.
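The multi-class Demographic Parity notion extended above can be made concrete as a per-class prediction-rate gap across sensitive groups. This is a toy metric sketch with made-up data; the paper's plug-in estimator is not reproduced here:

```python
import numpy as np

# Toy sketch of multi-class Demographic Parity: each class should be
# predicted at (approximately) the same rate in every sensitive group.

def dp_violation(preds, groups, n_classes):
    """Max over classes of the prediction-rate gap between groups."""
    gaps = []
    for k in range(n_classes):
        rates = [np.mean(preds[groups == g] == k) for g in np.unique(groups)]
        gaps.append(max(rates) - min(rates))
    return max(gaps)

groups = np.array([0, 0, 0, 0, 1, 1, 1, 1])
fair_preds = np.array([0, 1, 2, 0, 1, 2, 0, 0])    # same class rates per group
unfair_preds = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # group determines the class
```

Exact fairness asks this gap to be zero for every class; the approximate version bounds it by a preset level, which is the knob the plug-in procedure trades off against risk.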