Adversarial Training in Affective Computing and Sentiment Analysis: Recent Advances and Perspectives
Over the past few years, adversarial training has become an extremely active
research topic and has been successfully applied to various Artificial
Intelligence (AI) domains. Because adversarial training is a potentially
crucial technique for the development of the next generation of emotional AI
systems, we herein provide a comprehensive overview of its application to
affective computing and sentiment analysis. Various representative adversarial
training algorithms are explained and discussed, each aimed at tackling a
different challenge associated with emotional AI systems. Further, we highlight a range
of potential future research directions. We expect that this overview will help
facilitate the development of adversarial training for affective computing and
sentiment analysis in both the academic and industrial communities.
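The core idea the survey covers can be sketched in a few lines. Below is a minimal, illustrative example of adversarial training in the fast-gradient (FGSM) style, applied to a toy logistic-regression "sentiment" classifier on synthetic data; all names and data are hypothetical and this is not any surveyed author's implementation.

```python
import numpy as np

# Toy data: 200 five-dimensional examples with linearly separable labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 1.5])
y = (X @ w_true > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(5)
lr, eps = 0.1, 0.05
for _ in range(200):
    # FGSM step: perturb each input in the sign direction of the
    # per-example input gradient, which locally increases the loss.
    p = sigmoid(X @ w)
    grad_x = np.outer(p - y, w)          # d(loss)/d(input), one row per example
    X_adv = X + eps * np.sign(grad_x)
    # Train the model on the perturbed examples.
    p_adv = sigmoid(X_adv @ w)
    grad_w = X_adv.T @ (p_adv - y) / len(y)
    w -= lr * grad_w

acc = np.mean((sigmoid(X @ w) > 0.5) == (y > 0.5))  # clean accuracy
```

The same inner loop (perturb, then train on the perturbation) is the template that generative-adversarial and virtual-adversarial variants discussed in such surveys elaborate on.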
Weakly-Supervised Alignment of Video With Text
Suppose that we are given a set of videos, along with natural language
descriptions in the form of multiple sentences (e.g., manual annotations, movie
scripts, sports summaries, etc.), and that these sentences appear in the same
temporal order as their visual counterparts. We propose in this paper a method
for aligning the two modalities, i.e., automatically providing a time stamp for
every sentence. Given vectorial features for both video and text, we propose to
cast this task as a temporal assignment problem, with an implicit linear
mapping between the two feature modalities. We formulate this problem as an
integer quadratic program, and solve its continuous convex relaxation using an
efficient conditional gradient algorithm. Several rounding procedures are
proposed to construct the final integer solution. After demonstrating
significant improvements over the state of the art on the related task of
aligning video with symbolic labels [7], we evaluate our method on a
challenging dataset of videos with associated textual descriptions [36], using
both bag-of-words and continuous representations for text.
Comment: ICCV 2015 - IEEE International Conference on Computer Vision, Dec 2015, Santiago, Chile
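The conditional-gradient (Frank-Wolfe) scheme the abstract describes can be illustrated on a toy assignment problem: relax the set of permutation matrices to its convex hull, use the Hungarian algorithm as the linear minimization oracle, and round at the end. The synthetic features and the quadratic objective below are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(1)
n, d = 6, 16
V = rng.normal(size=(n, d))                  # "video" features, one per step
perm = rng.permutation(n)
T = V[perm] + 0.01 * rng.normal(size=(n, d)) # "sentence" features, reordered

def grad(P):
    # Gradient of the relaxed quadratic objective f(P) = ||P V - T||_F^2.
    return 2.0 * (P @ V - T) @ V.T

P = np.full((n, n), 1.0 / n)                 # start at the barycenter
for k in range(50):
    r, c = linear_sum_assignment(grad(P))    # LMO: best permutation vertex
    S = np.zeros_like(P)
    S[r, c] = 1.0
    gamma = 2.0 / (k + 2)                    # standard Frank-Wolfe step size
    P = (1 - gamma) * P + gamma * S

rows, cols = linear_sum_assignment(-P)       # rounding: keep the heaviest matching
```

The paper's formulation additionally enforces temporal-order constraints and learns the linear map between modalities, which this unconstrained sketch omits.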
VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation
We introduce VoxPopuli, a large-scale multilingual corpus providing 100K
hours of unlabelled speech data in 23 languages. It is the largest open dataset
to date for unsupervised representation learning as well as semi-supervised
learning. VoxPopuli also contains 1.8K hours of transcribed speeches in 16
languages and their aligned oral interpretations into 5 other languages
totaling 5.1K hours. We provide speech recognition baselines and validate the
versatility of VoxPopuli unlabelled data in semi-supervised learning under
challenging out-of-domain settings. We will release the corpus at
https://github.com/facebookresearch/voxpopuli under an open license.
Comment: Accepted to ACL 2021 (long paper)
Bridging the Gap between Pre-Training and Fine-Tuning for End-to-End Speech Translation
End-to-end speech translation, a topic of intense recent interest, aims to
translate a segment of audio into a specific language with a single end-to-end
model. Conventional approaches employ multi-task learning and pre-training
methods for this task, but they suffer from a large gap between pre-training
and fine-tuning. To address this issue, we propose a Tandem Connectionist
Encoding Network (TCEN) which bridges the gap by reusing all subnets in
fine-tuning, keeping the roles of subnets consistent, and pre-training the
attention module. Furthermore, we propose two simple but effective methods to
guarantee that the speech encoder outputs and the MT encoder inputs are
consistent in terms of semantic representation and sequence length.
Experimental results show that our model outperforms baselines by 2.2 BLEU on a
large benchmark dataset.
Comment: AAAI 2020
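One common way to reconcile sequence lengths between a speech encoder (frame-rate outputs) and an MT encoder (token-rate inputs), in the spirit of what the abstract describes, is to collapse CTC-style frame predictions: merge repeated labels, drop blanks, and average the hidden states within each collapsed segment. The sketch below is a plain-Python illustration with hypothetical names, not TCEN's actual implementation.

```python
BLANK = 0  # conventional CTC blank index

def shrink(frame_labels, frame_states):
    """Collapse CTC-style frame labels into token-level averaged states."""
    out_labels, out_states = [], []
    i, n = 0, len(frame_labels)
    while i < n:
        lab = frame_labels[i]
        j = i
        while j < n and frame_labels[j] == lab:   # merge repeated labels
            j += 1
        if lab != BLANK:                          # drop blank segments
            seg = frame_states[i:j]               # states for this segment
            mean = [sum(col) / len(seg) for col in zip(*seg)]
            out_labels.append(lab)
            out_states.append(mean)
        i = j
    return out_labels, out_states

# Ten frames, 2-dim states; labels 3, 5, 2 separated by blanks.
labels = [0, 3, 3, 0, 5, 5, 5, 0, 0, 2]
states = [[float(k)] * 2 for k in range(10)]
toks, vecs = shrink(labels, states)
# toks == [3, 5, 2]; vecs holds the per-segment mean state vectors
```

After shrinking, the sequence fed onward has one vector per (non-blank) token, which is the granularity a pre-trained MT encoder expects.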