Time-Contrastive Learning Based Deep Bottleneck Features for Text-Dependent Speaker Verification
There are a number of studies about extraction of bottleneck (BN) features
from deep neural networks (DNNs) trained to discriminate speakers, pass-phrases
and triphone states for improving the performance of text-dependent speaker
verification (TD-SV). However, only moderate success has been achieved. A recent
study [1] presented a time contrastive learning (TCL) concept to explore the
non-stationarity of brain signals for classification of brain states. Speech
signals have a similar non-stationarity property, and TCL further has the
advantage of requiring no labeled data. We therefore present a TCL-based
BN feature extraction method. The method uniformly partitions each speech
utterance in a training dataset into a predefined number of multi-frame
segments. Each segment in an utterance corresponds to one class, and class
labels are shared across utterances. DNNs are then trained to discriminate all
speech frames among the classes to exploit the temporal structure of speech. In
addition, we propose a segment-based unsupervised clustering algorithm to
re-assign class labels to the segments. TD-SV experiments were conducted on the
RedDots challenge database. The TCL-DNNs were trained using speech data of
fixed pass-phrases that were excluded from the TD-SV evaluation set, so the
learned features can be considered phrase-independent. We compare the
performance of the proposed TCL bottleneck (BN) feature with those of
short-time cepstral features and BN features extracted from DNNs discriminating
speakers, pass-phrases, speaker+pass-phrase, as well as monophones whose labels
and boundaries are generated by three different automatic speech recognition
(ASR) systems. Experimental results show that the proposed TCL-BN outperforms
cepstral features and speaker+pass-phrase discriminant BN features, and its
performance is on par with those of ASR-derived BN features. Moreover,....
Comment: Copyright (c) 2019 IEEE. Personal use of this material is permitted.
Permission from IEEE must be obtained for all other uses, in any current or
future media, including reprinting/republishing this material for advertising
or promotional purposes, creating new collective works, for resale or
redistribution to servers or lists, or reuse of any copyrighted component of
this work in other works.
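The uniform partitioning step described in the abstract above can be illustrated with a minimal sketch. It assumes an utterance is represented only by its frame count and that frame `t` of `T` is mapped to segment `floor(t * K / T)`. The function name `tcl_frame_labels` and this exact mapping are illustrative assumptions, not the authors' implementation, and the segment-based re-clustering of labels is not shown.

```python
def tcl_frame_labels(num_frames, num_segments):
    """Return one class label per frame for a single utterance.

    Hypothetical helper: the utterance is split uniformly into
    `num_segments` multi-frame segments, and each frame receives the
    index of its segment as its class label. Because the label is just
    the segment index, labels are shared across utterances.
    """
    if num_segments <= 0:
        raise ValueError("num_segments must be positive")
    if num_frames < num_segments:
        raise ValueError("utterance is shorter than the number of segments")
    # Integer arithmetic keeps segment boundaries as even as possible.
    return [(t * num_segments) // num_frames for t in range(num_frames)]


if __name__ == "__main__":
    # Two utterances of different lengths share the same label set.
    print(tcl_frame_labels(10, 4))
    print(tcl_frame_labels(7, 4))
```

A DNN trained on these frame-level labels would then supply the bottleneck features. The network itself is omitted here because the listing gives no architectural details.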
Cross-lingual Word Clusters for Direct Transfer of Linguistic Structure
It has been established that incorporating word cluster features derived from large unlabeled corpora can significantly improve prediction of linguistic structure. While previous work has focused primarily on English, we extend these results to other languages along two dimensions. First, we show that these results hold true for a number of languages across families. Second, and more interestingly, we provide an algorithm for inducing cross-lingual clusters and we show that features derived from these clusters significantly improve the accuracy of cross-lingual structure prediction. Specifically, we show that by augmenting direct-transfer systems with cross-lingual cluster features, the relative error of delexicalized dependency parsers, trained on English treebanks and transferred to foreign languages, can be reduced by up to 13%. When applying the same method to direct transfer of named-entity recognizers, we observe relative improvements of up to 26%.
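To make the "relative error" figure above concrete, here is a small sketch of how a relative error reduction is usually computed. It assumes error is defined as one minus accuracy. The function name and the accuracy values are hypothetical and chosen only so the arithmetic comes out to 13%; they are not results from the paper.

```python
def relative_error_reduction(baseline_accuracy, new_accuracy):
    """Fraction of the baseline error removed by the new system.

    Assumes error = 1 - accuracy (an assumption of this sketch).
    """
    baseline_error = 1.0 - baseline_accuracy
    new_error = 1.0 - new_accuracy
    if baseline_error <= 0.0:
        raise ValueError("baseline has no error to reduce")
    return (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    # Hypothetical numbers: accuracy rises from 70.0% to 73.9%,
    # so error falls from 30.0% to 26.1%.
    reduction = relative_error_reduction(0.700, 0.739)
    print(f"relative error reduction: {reduction:.1%}")
```

Note that a relative reduction of this size corresponds to a much smaller absolute gain in accuracy (3.9 points in the hypothetical case above).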
Weakly-supervised learning of visual relations
This paper introduces a novel approach for modeling visual relations between
pairs of objects. We define a relation as a triplet of the form (subject,
predicate, object), where the predicate is typically a preposition (e.g. 'under',
'in front of') or a verb ('hold', 'ride') that links a pair of objects (subject, object).
Learning such relations is challenging as the objects have different spatial
configurations and appearances depending on the relation in which they occur.
Another major challenge comes from the difficulty of obtaining annotations,
especially at box-level, for all possible triplets, which makes both learning
and evaluation difficult. The contributions of this paper are threefold. First,
we design strong yet flexible visual features that encode the appearance and
spatial configuration for pairs of objects. Second, we propose a
weakly-supervised discriminative clustering model to learn relations from
image-level labels only. Third, we introduce a new challenging dataset of
unusual relations (UnRel) together with an exhaustive annotation that enables
accurate evaluation of visual relation retrieval. We show experimentally that
our model achieves state-of-the-art results on the visual relationship
dataset, significantly improving performance on previously unseen relations
(zero-shot learning), and confirm this observation on our newly introduced
UnRel dataset.
Lexicon Infused Phrase Embeddings for Named Entity Resolution
Most state-of-the-art approaches for named-entity recognition (NER) use
semi-supervised information in the form of word clusters and lexicons. Recently,
neural network-based language models have been explored, as they generate, as a
byproduct, highly informative vector representations for words, known as word
embeddings. In this paper we present two contributions: a new form of learning
word embeddings that can leverage information from relevant lexicons to improve
the representations, and the first system to use neural word embeddings to
achieve state-of-the-art results on named-entity recognition in both CoNLL and
Ontonotes NER. Our system achieves an F1 score of 90.90 on the test set for
CoNLL 2003---significantly better than any previous system trained on public
data, and matching a system employing massive private industrial query-log
data.
Comment: Accepted in CoNLL 201