Search CORE

3,290 research outputs found

The Microsoft 2016 Conversational Speech Recognition System

Author: Droppo J.
Huang X.
Seide F.
Seltzer M.
Stolcke A.
Xiong W.
Yu D.
Zweig G.
Publication venue
Publication date: 25/01/2017
Field of study

We describe Microsoft's conversational speech recognition system, in which we combine recent developments in neural-network-based acoustic and language modeling to advance the state of the art on the Switchboard recognition task. Inspired by machine learning ensemble techniques, the system uses a range of convolutional and recurrent neural networks. I-vector modeling and lattice-free MMI training provide significant gains for all acoustic model architectures. Language model rescoring with multiple forward and backward running RNNLMs, and word posterior-based system combination provide a 20% boost. The best single system uses a ResNet architecture acoustic model with RNNLM rescoring, and achieves a word error rate of 6.9% on the NIST 2000 Switchboard task. The combined system has an error rate of 6.2%, representing an improvement over previously reported results on this benchmark task

arXiv.org e-Print Archive

Crossref

Incremental multi-domain learning with network latent tensor factorization

Author: Bulat Adrian
Kossaifi Jean
Pantic Maja
Tzimiropoulos Georgios
Publication venue
Publication date: 22/11/2019
Field of study

The prominence of deep learning, large amount of annotated data and increasingly powerful hardware made it possible to reach remarkable performance for supervised classification tasks, in many cases saturating the training sets. However the resulting models are specialized to a single very specific task and domain. Adapting the learned classification to new domains is a hard problem due to at least three reasons: (1) the new domains and the tasks might be drastically different; (2) there might be very limited amount of annotated data on the new domain and (3) full training of a new model for each new task is prohibitive in terms of computation and memory, due to the sheer number of parameters of deep CNNs. In this paper, we present a method to learn new-domains and tasks incrementally, building on prior knowledge from already learned tasks and without catastrophic forgetting. We do so by jointly parametrizing weights across layers using low-rank Tucker structure. The core is task agnostic while a set of task specific factors are learnt on each new domain. We show that leveraging tensor structure enables better performance than simply using matrix operations. Joint tensor modelling also naturally leverages correlations across different layers. Compared with previous methods which have focused on adapting each layer separately, our approach results in more compact representations for each new task/domain. We apply the proposed method to the 10 datasets of the Visual Decathlon Challenge and show that our method offers on average about 7.5x reduction in number of parameters and competitive performance in terms of both classification accuracy and Decathlon score.Comment: AAAI2

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Multi-component Image Translation for Deep Domain Generalization

Author: Baktashmotlagh Mahsa
Fookes Clinton
Rahman Mohammad Mahfujur
Sridharan Sridha
Publication venue
Publication date: 21/12/2018
Field of study

Domain adaption (DA) and domain generalization (DG) are two closely related methods which are both concerned with the task of assigning labels to an unlabeled data set. The only dissimilarity between these approaches is that DA can access the target data during the training phase, while the target data is totally unseen during the training phase in DG. The task of DG is challenging as we have no earlier knowledge of the target samples. If DA methods are applied directly to DG by a simple exclusion of the target data from training, poor performance will result for a given task. In this paper, we tackle the domain generalization challenge in two ways. In our first approach, we propose a novel deep domain generalization architecture utilizing synthetic data generated by a Generative Adversarial Network (GAN). The discrepancy between the generated images and synthetic images is minimized using existing domain discrepancy metrics such as maximum mean discrepancy or correlation alignment. In our second approach, we introduce a protocol for applying DA methods to a DG scenario by excluding the target data from the training phase, splitting the source data to training and validation parts, and treating the validation data as target data for DA. We conduct extensive experiments on four cross-domain benchmark datasets. Experimental results signify our proposed model outperforms the current state-of-the-art methods for DG.Comment: Accepted in WACV 201

arXiv.org e-Print Archive

Crossref

Queensland University of Technology ePrints Archive

Learning Rigid Image Registration - Utilizing Convolutional Neural Networks for Medical Image Registration

Author: Goatman K.A.
Siebert J.P.
Sloan J.M.
Publication venue: 'Scitepress'
Publication date: 01/01/2018
Field of study

Many traditional computer vision tasks, such as segmentation, have seen large step-changes in accuracy and/or speed with the application of Convolutional Neural Networks (CNNs). Image registration, the alignment of two or more images to a common space, is a fundamental step in many medical imaging workflows. In this paper we investigate whether these techniques can also bring tangible benefits to the registration task. We describe and evaluate the use of convolutional neural networks (CNNs) for both mono- and multi- modality registration and compare their performance to more traditional schemes, namely multi-scale, iterative registration. This paper also investigates incorporating inverse consistency of the learned spatial transformations to impose additional constraints on the network during training and investigate any benefit in accuracy during detection. The approaches are validated with a series of artificial mono-modal registration tasks utilizing T1-weighted MR brain i mages from the Open Access Series of Imaging Studies (OASIS) study and IXI brain development dataset and a series of real multi-modality registration tasks using T1-weighted and T2-weighted MR brain images from the 2015 Ischemia Stroke Lesion segmentation (ISLES) challenge. The results demonstrate that CNNs give excellent performance for both mono- and multi- modality head and neck registration compared to the baseline method with significantly fewer outliers and lower mean errors

Crossref

Enlighten