The Microsoft 2016 Conversational Speech Recognition System
We describe Microsoft's conversational speech recognition system, in which we
combine recent developments in neural-network-based acoustic and language
modeling to advance the state of the art on the Switchboard recognition task.
Inspired by machine learning ensemble techniques, the system uses a range of
convolutional and recurrent neural networks. I-vector modeling and lattice-free
MMI training provide significant gains for all acoustic model architectures.
Language model rescoring with multiple forward and backward running RNNLMs, and
word posterior-based system combination provide a 20% boost. The best single
system uses a ResNet architecture acoustic model with RNNLM rescoring, and
achieves a word error rate of 6.9% on the NIST 2000 Switchboard task. The
combined system has an error rate of 6.2%, representing an improvement over
previously reported results on this benchmark task.
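The word error rate quoted above is conventionally the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that metric, not the scoring tool used for the paper; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly;
    real scoring pipelines also normalize text before comparing.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For scale, moving from 6.9% to 6.2% is a relative reduction of (6.9 - 6.2) / 6.9, or roughly 10%.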
Incremental multi-domain learning with network latent tensor factorization
The prominence of deep learning, large amounts of annotated data and
increasingly powerful hardware have made it possible to reach remarkable performance
for supervised classification tasks, in many cases saturating the training
sets. However, the resulting models are specialized to a single very specific
task and domain. Adapting the learned classification to new domains is a hard
problem for at least three reasons: (1) the new domains and the tasks might
be drastically different; (2) there might be a very limited amount of annotated
data on the new domain; and (3) full training of a new model for each new task
is prohibitive in terms of computation and memory, due to the sheer number of
parameters of deep CNNs. In this paper, we present a method to learn
new domains and tasks incrementally, building on prior knowledge from already
learned tasks and without catastrophic forgetting. We do so by jointly
parametrizing weights across layers using low-rank Tucker structure. The core
is task-agnostic while a set of task-specific factors is learnt on each new
domain. We show that leveraging tensor structure enables better performance
than simply using matrix operations. Joint tensor modelling also naturally
leverages correlations across different layers. Compared with previous methods
which have focused on adapting each layer separately, our approach results in
more compact representations for each new task/domain. We apply the proposed
method to the 10 datasets of the Visual Decathlon Challenge and show that our
method offers on average about a 7.5x reduction in the number of parameters and
competitive performance in terms of both classification accuracy and Decathlon
score.
Comment: AAAI2
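The idea of a shared core with task-specific factors can be illustrated with a toy Tucker reconstruction. This is only a sketch under simplifying assumptions, not the paper's implementation: it uses a 3-way tensor, plain Python lists, and the hypothetical helper names `tucker_reconstruct`, `per_task_parameters` and `full_parameters`.

```python
from itertools import product


def tucker_reconstruct(core, factors):
    """Rebuild a 3-way tensor T from a Tucker core G and factors U1, U2, U3.

    T[i][j][k] = sum over a, b, c of G[a][b][c] * U1[i][a] * U2[j][b] * U3[k][c]
    The core would be shared across tasks; each task supplies its own factors.
    """
    u1, u2, u3 = factors
    r1, r2, r3 = len(core), len(core[0]), len(core[0][0])
    ranks = list(product(range(r1), range(r2), range(r3)))
    return [
        [
            [
                sum(core[a][b][c] * u1[i][a] * u2[j][b] * u3[k][c]
                    for a, b, c in ranks)
                for k in range(len(u3))
            ]
            for j in range(len(u2))
        ]
        for i in range(len(u1))
    ]


def per_task_parameters(dims, ranks):
    """Parameters a new task adds when only the factor matrices are learnt."""
    return sum(d * r for d, r in zip(dims, ranks))


def full_parameters(dims):
    """Parameters of an unfactorized weight tensor with the same shape."""
    total = 1
    for d in dims:
        total *= d
    return total
```

With illustrative sizes (not taken from the paper) of `dims=(64, 64, 9)` and `ranks=(8, 8, 3)`, a new task would add 1,051 factor parameters instead of the 36,864 needed for a full tensor of that shape.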
Multi-component Image Translation for Deep Domain Generalization
Domain adaptation (DA) and domain generalization (DG) are two closely related
methods which are both concerned with the task of assigning labels to an
unlabeled data set. The only difference between these approaches is that DA
can access the target data during the training phase, while the target data is
totally unseen during the training phase in DG. The task of DG is challenging
as we have no prior knowledge of the target samples. If DA methods are
applied directly to DG by a simple exclusion of the target data from training,
poor performance will result for a given task. In this paper, we tackle the
domain generalization challenge in two ways. In our first approach, we propose
a novel deep domain generalization architecture utilizing synthetic data
generated by a Generative Adversarial Network (GAN). The discrepancy between
the generated images and synthetic images is minimized using existing domain
discrepancy metrics such as maximum mean discrepancy or correlation alignment.
In our second approach, we introduce a protocol for applying DA methods to a DG
scenario by excluding the target data from the training phase, splitting the
source data into training and validation parts, and treating the validation data
as target data for DA. We conduct extensive experiments on four cross-domain
benchmark datasets. Experimental results indicate that our proposed model outperforms
the current state-of-the-art methods for DG.
Comment: Accepted in WACV 201
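Maximum mean discrepancy, one of the domain discrepancy metrics named above, compares two sample sets through average kernel similarities. The sketch below computes the standard biased estimate of squared MMD with a Gaussian (RBF) kernel. It is a generic illustration rather than the loss used in the paper; the kernel choice, the `gamma` parameter and the function names are assumptions.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian kernel exp(-gamma * ||x - y||^2) between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD between sample sets xs and ys.

    MMD^2 = mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)
    """
    def mean_kernel(left, right):
        total = sum(rbf_kernel(a, b, gamma) for a in left for b in right)
        return total / (len(left) * len(right))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)
```

The estimate is zero when both sets are identical and grows as the two feature distributions move apart, which is why it can serve as a training penalty that pulls distributions together.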
Learning Rigid Image Registration - Utilizing Convolutional Neural Networks for Medical Image Registration
Many traditional computer vision tasks, such as segmentation, have seen large step-changes in accuracy and/or speed with the application of Convolutional Neural Networks (CNNs). Image registration, the alignment of two or more images to a common space, is a fundamental step in many medical imaging workflows. In this paper we investigate whether these techniques can also bring tangible benefits to the registration task. We describe and evaluate the use of CNNs for both mono- and multi-modality registration and compare their performance to more traditional schemes, namely multi-scale, iterative registration. This paper also investigates incorporating inverse consistency of the learned spatial transformations to impose additional constraints on the network during training, and examines any resulting benefit in accuracy during detection. The approaches are validated with a series of artificial mono-modal registration tasks utilizing T1-weighted MR brain images from the Open Access Series of Imaging Studies (OASIS) study and the IXI brain development dataset, and a series of real multi-modality registration tasks using T1-weighted and T2-weighted MR brain images from the 2015 Ischemic Stroke Lesion Segmentation (ISLES) challenge. The results demonstrate that CNNs give excellent performance for both mono- and multi-modality head and neck registration compared to the baseline method, with significantly fewer outliers and lower mean errors.
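Inverse consistency means that the transform estimated from image A to image B, followed by the transform estimated from B to A, should return every point to where it started. The sketch below measures that round-trip error for 2D rigid transforms. It is a toy illustration, not the paper's training constraint: the paper works with volumetric MR images, and the `(theta, tx, ty)` parametrization and function names here are assumptions.

```python
import math


def rigid_apply(theta, tx, ty, point):
    """Rotate a 2D point by theta (radians), then translate by (tx, ty)."""
    x, y = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y + tx, s * x + c * y + ty)


def rigid_inverse(theta, tx, ty):
    """Exact inverse of a rigid transform: rotation R^T, translation -R^T t."""
    c, s = math.cos(theta), math.sin(theta)
    return (-theta, -(c * tx + s * ty), s * tx - c * ty)


def inverse_consistency_error(forward, backward, points):
    """Mean distance between each point and backward(forward(point))."""
    total = 0.0
    for p in points:
        q = rigid_apply(*backward, rigid_apply(*forward, p))
        total += math.hypot(p[0] - q[0], p[1] - q[1])
    return total / len(points)
```

A perfectly consistent pair, such as a transform and its `rigid_inverse`, gives an error of zero up to floating-point rounding; a non-zero value could be added to a training loss as a penalty.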