Weakly Supervised Training of Speaker Identification Models
We propose an approach for training speaker identification models in a weakly
supervised manner. We concentrate on the setting where the training data
consists of a set of audio recordings and the speaker annotation is provided
only at the recording level. The method uses speaker diarization to find unique
speakers in each recording, and i-vectors to project the speech of each speaker
to a fixed-dimensional vector. A neural network is then trained to map
i-vectors to speakers, using a special objective function that allows the model
to be optimized with recording-level speaker labels. We report experiments
on two different real-world datasets. On the VoxCeleb dataset, the method
provides 94.6% accuracy on a closed set speaker identification task, surpassing
the baseline performance by a large margin. On an Estonian broadcast news
dataset, the method provides 66% time-weighted speaker identification recall at
93% precision.
Comment: Odyssey 2018 The Speaker and Language Recognition Workshop
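The abstract above does not spell out its objective function, so the following is only a minimal sketch of one way a recording-level label could drive training. It assumes each diarized cluster yields one vector of speaker logits and that the loss rewards the best-matching cluster for each labelled speaker. The function names and this max-over-clusters formulation are illustrative assumptions, not the paper's method.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def recording_level_loss(cluster_logits, recording_speakers):
    """Hypothetical weakly supervised loss for one recording.

    cluster_logits: one list of speaker logits per diarized cluster.
    recording_speakers: indices of speakers known to occur in the recording.
    For each labelled speaker, only the cluster with the highest posterior
    for that speaker contributes (an assumed formulation).
    """
    posteriors = [softmax(logits) for logits in cluster_logits]
    loss = 0.0
    for speaker in recording_speakers:
        best = max(p[speaker] for p in posteriors)
        loss -= math.log(best)
    return loss


# Two clusters, three candidate speakers.
logits = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]]
loss_matching = recording_level_loss(logits, [0, 1])
loss_mismatch = recording_level_loss(logits, [2])
```

Under these assumptions, a label set that agrees with the cluster posteriors gives a lower loss than one that does not, which is the property a recording-level objective needs.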
Imitating Targets from all sides: An Unsupervised Transfer Learning method for Person Re-identification
Person re-identification (Re-ID) models usually show limited performance
when they are trained on one dataset and tested on another, due to
inter-dataset bias (e.g. completely different identities and backgrounds) and
intra-dataset difference (e.g. camera invariance). To address this issue,
given a labelled source training set and an unlabelled target training set, we
propose an unsupervised transfer learning method characterized by 1) bridging
inter-dataset bias and intra-dataset difference via a proposed ImitateModel
simultaneously; 2) regarding the unsupervised person Re-ID problem as a
semi-supervised learning problem formulated by a dual classification loss to
learn a discriminative representation across domains; 3) exploiting the
underlying commonality across different domains from the class-style space to
improve the generalization ability of re-ID models. Extensive experiments are
conducted on two widely employed benchmarks, including Market-1501 and
DukeMTMC-reID, and experimental results demonstrate that the proposed method
achieves performance competitive with other state-of-the-art
unsupervised Re-ID approaches.
A Survey of Deep Learning Techniques for Mobile Robot Applications
Advancements in deep learning over the years have attracted research into how
deep artificial neural networks can be used in robotic systems. This survey
summarizes current research, with a specific focus on the gains and obstacles
in applying deep learning to mobile robotics.
Deep video gesture recognition using illumination invariants
In this paper we present architectures based on deep neural nets for gesture
recognition in videos, which are invariant to local scaling. We amalgamate
autoencoder and predictor architectures using an adaptive weighting scheme
to cope with a small labeled dataset while enriching our models with
enormous unlabeled sets. We further improve robustness to lighting conditions
by introducing a new adaptive filter based on temporal local scale
normalization. We report superior results over known methods, including
recently reported approaches based on neural nets.
Recent Advances in Zero-shot Recognition
With the recent renaissance of deep convolutional neural networks, encouraging
breakthroughs have been achieved on supervised recognition tasks, where
each class has sufficient, fully annotated training data.
However, scaling recognition to a large number of classes with few or no
training samples for each class remains an unsolved problem. One approach to
scaling up recognition is to develop models capable of recognizing unseen
categories without any training instances, i.e. zero-shot recognition/learning.
This article provides a comprehensive review of existing zero-shot recognition
techniques, covering various aspects ranging from model representations to
datasets and evaluation settings. We also overview related recognition
tasks, including one-shot and open-set recognition, which can be used as natural
extensions of zero-shot recognition when a limited number of class samples becomes
available or when zero-shot recognition is implemented in a real-world setting.
Importantly, we highlight the limitations of existing approaches and point out
future research directions in this exciting new research area.
Comment: accepted by IEEE Signal Processing Magazine
cvpaper.challenge in 2015 - A review of CVPR2015 and DeepSurvey
The "cvpaper.challenge" is a group composed of members from AIST, Tokyo Denki
Univ. (TDU), and Univ. of Tsukuba that aims to systematically summarize papers
on computer vision, pattern recognition, and related fields. For this
particular review, we focused on reading all 602 conference papers
presented at CVPR2015, the premier annual computer vision event held in
June 2015, in order to grasp the trends in the field. Further, we propose
"DeepSurvey" as a mechanism embodying the entire process, from reading
all the papers and generating ideas to writing a paper.
Comment: Survey Paper
Pseudo-positive regularization for deep person re-identification
An intrinsic challenge of person re-identification (re-ID) is the annotation
difficulty. This typically means 1) few training samples per identity, and 2)
consequently a lack of diversity among the training samples. As a result, we face
a high risk of over-fitting when training the convolutional neural network (CNN),
a state-of-the-art method in person re-ID. To reduce the risk of over-fitting,
this paper proposes a Pseudo Positive Regularization (PPR) method to enrich the
diversity of the training data. Specifically, unlabeled data from an
independent pedestrian database is retrieved using the target training data as
query. A small proportion of these retrieved samples are randomly selected as
the Pseudo Positive samples and added to the target training set for the
supervised CNN training. The addition of Pseudo Positive samples is therefore a
data augmentation method to reduce the risk of over-fitting during CNN
training. We implement our idea in identification CNN models (i.e.,
CaffeNet, VGGNet-16 and ResNet-50). On the CUHK03 and Market-1501 datasets,
experimental results demonstrate that the proposed method consistently improves
on the baseline and yields performance competitive with state-of-the-art person
re-ID methods.
Comment: 12 pages, 6 figures
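The retrieve-then-sample step described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions: features are plain lists of floats, retrieval uses Euclidean distance, and each selected sample inherits the identity label of its query. The names `select_pseudo_positives`, `top_k`, `proportion`, and `seed` are hypothetical and do not come from the paper.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_pseudo_positives(queries, unlabeled, top_k=3, proportion=0.5, seed=0):
    """Pick pseudo-positive samples from an unlabeled pool.

    queries: list of (identity_label, feature) pairs from the target training set.
    unlabeled: list of features from an independent, unlabeled database.
    Returns (identity_label, unlabeled_index) pairs. Assigning the query's
    label to the retrieved sample is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    selected = []
    for label, query in queries:
        # Rank the unlabeled pool by distance to the query and keep the top_k.
        ranked = sorted(range(len(unlabeled)),
                        key=lambda i: euclidean(query, unlabeled[i]))
        candidates = ranked[:top_k]
        # Randomly keep only a small proportion of the retrieved candidates.
        n_keep = min(len(candidates), max(1, int(len(candidates) * proportion)))
        for index in rng.sample(candidates, n_keep):
            selected.append((label, index))
    return selected


pool = [[0.1, 0.0], [5.0, 5.0], [0.0, 0.2], [9.0, 9.0]]
picked = select_pseudo_positives([(0, [0.0, 0.0])], pool, top_k=2, proportion=0.5)
```

The returned pairs could then be appended to the labelled training set before supervised training; how many to keep is a tunable choice in this sketch.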
Cross-Domain Visual Matching via Generalized Similarity Measure and Feature Learning
Cross-domain visual data matching is one of the fundamental problems in many
real-world vision tasks, e.g., matching persons across ID photos and
surveillance videos. Conventional approaches to this problem usually involve
two steps: i) projecting samples from different domains into a common space,
and ii) computing (dis-)similarity in this space based on a certain distance.
In this paper, we present a novel pairwise similarity measure that advances
existing models by i) expanding traditional linear projections into affine
transformations and ii) fusing affine Mahalanobis distance and Cosine
similarity by a data-driven combination. Moreover, we unify our similarity
measure with feature representation learning via deep convolutional neural
networks. Specifically, we incorporate the similarity measure matrix into the
deep architecture, enabling end-to-end model optimization. We
extensively evaluate our generalized similarity model in several challenging
cross-domain matching tasks: person re-identification under different views and
face verification over different modalities (i.e., faces from still images and
videos, older and younger faces, and sketch and photo portraits). The
experimental results demonstrate superior performance of our model over other
state-of-the-art methods.
Comment: To appear in IEEE Transactions on Pattern Analysis and Machine
Intelligence (T-PAMI), 201
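To make the idea of combining a distance with cosine similarity after affine projections concrete, here is a simplified sketch. It assumes each domain has its own affine map and that the two terms are mixed with fixed scalar weights `w_dist` and `w_cos`. In the abstract the combination is data-driven and the distance is an affine Mahalanobis distance, so this fixed-weight, squared-Euclidean version is an illustrative simplification, not the paper's formulation.

```python
import math


def affine(matrix, bias, vec):
    """Apply the affine map vec -> matrix * vec + bias."""
    return [sum(m * v for m, v in zip(row, vec)) + b
            for row, b in zip(matrix, bias)]


def generalized_similarity(x, y, A, a, B, b, w_dist=1.0, w_cos=1.0):
    """Hypothetical similarity: weighted cosine minus weighted squared distance.

    x, y: samples from two different domains.
    (A, a), (B, b): domain-specific affine projections into a common space.
    w_dist, w_cos: assumed fixed mixing weights (learned in a real system).
    """
    px = affine(A, a, x)
    py = affine(B, b, y)
    sq_dist = sum((u - v) ** 2 for u, v in zip(px, py))
    dot = sum(u * v for u, v in zip(px, py))
    norm = math.sqrt(sum(u * u for u in px)) * math.sqrt(sum(v * v for v in py))
    cosine = dot / norm if norm > 0 else 0.0
    return w_cos * cosine - w_dist * sq_dist


identity = [[1.0, 0.0], [0.0, 1.0]]
zero = [0.0, 0.0]
same = generalized_similarity([1.0, 0.0], [1.0, 0.0], identity, zero, identity, zero)
different = generalized_similarity([1.0, 0.0], [0.0, 1.0], identity, zero, identity, zero)
```

With identity projections, identical inputs score higher than orthogonal ones, which is the ordering a matching score should produce.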
cvpaper.challenge in 2016: Futuristic Computer Vision through 1,600 Papers Survey
The paper presents futuristic challenges discussed in cvpaper.challenge. In
2015 and 2016, we thoroughly studied 1,600+ papers from several
conferences and journals, such as CVPR, ICCV, ECCV, NIPS, PAMI, and IJCV.
Rare Event Detection using Disentangled Representation Learning
This paper presents a novel method for rare event detection from an image
pair with class-imbalanced datasets. A straightforward approach for event
detection tasks is to train a detection network from a large-scale dataset in
an end-to-end manner. However, in many applications such as building change
detection on satellite images, few positive samples are available for
training. Moreover, scene image pairs contain many trivial events, such as
illumination changes or background motion. These trivial events and the
class imbalance problem lead to false alarms in rare event detection. In order
to overcome these difficulties, we propose a novel method to learn disentangled
representations from only low-cost negative samples. The proposed method
disentangles different aspects in a pair of observations: variant and invariant
factors that represent trivial events and image contents, respectively. The
effectiveness of the proposed approach is verified by quantitative
evaluations on four change detection datasets, and qualitative analysis
shows that the proposed method can acquire representations that disentangle
rare events from trivial ones.