3,277 research outputs found
Recent Advances in Transfer Learning for Cross-Dataset Visual Recognition: A Problem-Oriented Perspective
This paper takes a problem-oriented perspective and presents a comprehensive
review of transfer learning methods, both shallow and deep, for cross-dataset
visual recognition. Specifically, it categorises the cross-dataset recognition
into seventeen problems based on a set of carefully chosen data and label
attributes. Such a problem-oriented taxonomy has allowed us to examine how
different transfer learning approaches tackle each problem and how well each
problem has been researched to date. The comprehensive problem-oriented review
of the advances in transfer learning with respect to the problem has not only
revealed the challenges in transfer learning for visual recognition, but also
the problems (e.g. eight of the seventeen problems) that have been scarcely
studied. This survey not only presents an up-to-date technical review for
researchers, but also a systematic approach and a reference for a machine
learning practitioner to categorise a real problem and to look up for a
possible solution accordingly
Current trends in multilingual speech processing
In this paper, we describe recent work at Idiap Research Institute in the domain of multilingual speech processing and provide some insights into emerging challenges for the research community. Multilingual speech processing has been a topic of ongoing interest to the research community for many years and the field is now receiving renewed interest owing to two strong driving forces. Firstly, technical advances in speech recognition and synthesis are posing new challenges and opportunities to researchers. For example, discriminative features are seeing wide application by the speech recognition community, but additional issues arise when using such features in a multilingual setting. Another example is the apparent convergence of speech recognition and speech synthesis technologies in the form of statistical parametric methodologies. This convergence enables the investigation of new approaches to unified modelling for automatic speech recognition and text-to-speech synthesis (TTS) as well as cross-lingual speaker adaptation for TTS. The second driving force is the impetus being provided by both government and industry for technologies to help break down domestic and international language barriers, these also being barriers to the expansion of policy and commerce. Speech-to-speech and speech-to-text translation are thus emerging as key technologies at the heart of which lies multilingual speech processin
Adversarial Speaker Adaptation
We propose a novel adversarial speaker adaptation (ASA) scheme, in which
adversarial learning is applied to regularize the distribution of deep hidden
features in a speaker-dependent (SD) deep neural network (DNN) acoustic model
to be close to that of a fixed speaker-independent (SI) DNN acoustic model
during adaptation. An additional discriminator network is introduced to
distinguish the deep features generated by the SD model from those produced by
the SI model. In ASA, with a fixed SI model as the reference, an SD model is
jointly optimized with the discriminator network to minimize the senone
classification loss, and simultaneously to mini-maximize the SI/SD
discrimination loss on the adaptation data. With ASA, a senone-discriminative
deep feature is learned in the SD model with a similar distribution to that of
the SI model. With such a regularized and adapted deep feature, the SD model
can perform improved automatic speech recognition on the target speaker's
speech. Evaluated on the Microsoft short message dictation dataset, ASA
achieves 14.4% and 7.9% relative word error rate improvements for supervised
and unsupervised adaptation, respectively, over an SI model trained from 2600
hours data, with 200 adaptation utterances per speaker.Comment: 5 pages, 2 figures, ICASSP 201
Crossing Generative Adversarial Networks for Cross-View Person Re-identification
Person re-identification (\textit{re-id}) refers to matching pedestrians
across disjoint yet non-overlapping camera views. The most effective way to
match these pedestrians undertaking significant visual variations is to seek
reliably invariant features that can describe the person of interest
faithfully. Most of existing methods are presented in a supervised manner to
produce discriminative features by relying on labeled paired images in
correspondence. However, annotating pair-wise images is prohibitively expensive
in labors, and thus not practical in large-scale networked cameras. Moreover,
seeking comparable representations across camera views demands a flexible model
to address the complex distributions of images. In this work, we study the
co-occurrence statistic patterns between pairs of images, and propose to
crossing Generative Adversarial Network (Cross-GAN) for learning a joint
distribution for cross-image representations in a unsupervised manner. Given a
pair of person images, the proposed model consists of the variational
auto-encoder to encode the pair into respective latent variables, a proposed
cross-view alignment to reduce the view disparity, and an adversarial layer to
seek the joint distribution of latent representations. The learned latent
representations are well-aligned to reflect the co-occurrence patterns of
paired images. We empirically evaluate the proposed model against challenging
datasets, and our results show the importance of joint invariant features in
improving matching rates of person re-id with comparison to semi/unsupervised
state-of-the-arts.Comment: 12 pages. arXiv admin note: text overlap with arXiv:1702.03431 by
other author
- …