In-Band Disparity Compensation for Multiview Image Compression and View Synthesis

Author: Anantrasirichai N
Bull DR
Canagarajah CN
Redmill DW
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/04/2010

Subword and Crossword Units for CTC Acoustic Models

Author: Metze Florian
Sanabria Ramon
Waibel Alex
Zenkel Thomas
Publication date: 18/06/2018

This paper proposes a novel approach to creating a unit set for CTC-based speech recognition systems. Using Byte Pair Encoding, we learn a unit set of arbitrary size from a given training text. In contrast to using characters or words as units, this allows us to find a good trade-off between the size of the unit set and the available training data. We evaluate both Crossword units, which may span multiple words, and Subword units. By combining this approach with decoding methods that use a separate language model, we achieve state-of-the-art results for grapheme-based CTC systems. Comment: Current version accepted at Interspeech 201

arXiv.org e-Print Archive

Crossref

Representation Learning: A Review and New Perspectives

Author: Bengio Yoshua
Courville Aaron
Vincent Pascal
Publication date: 01/01/2014

The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide, to a greater or lesser degree, the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, auto-encoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation and manifold learning.

arXiv.org e-Print Archive

CiteSeerX

Low-rank and Sparse Soft Targets to Learn Better DNN Acoustic Models

Author: Asaei Afsaneh
Bourlard Herve
Dighe Pranay
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 18/10/2016

Conventional deep neural networks (DNN) for speech acoustic modeling rely on Gaussian mixture models (GMM) and hidden Markov models (HMM) to obtain binary class labels as the targets for DNN training. Subword classes in speech recognition systems correspond to context-dependent tied states or senones. The present work addresses some limitations of GMM-HMM senone alignments for DNN training. We hypothesize that the senone probabilities obtained from a DNN trained with binary labels can provide more accurate targets to learn better acoustic models. However, DNN outputs bear inaccuracies which are exhibited as high-dimensional unstructured noise, whereas the informative components are structured and low-dimensional. We exploit principal component analysis (PCA) and sparse coding to characterize the senone subspaces. Enhanced probabilities obtained from low-rank and sparse reconstructions are used as soft targets for DNN acoustic modeling, which also enables training with untranscribed data. Experiments conducted on the AMI corpus show a 4.6% relative reduction in word error rate.

arXiv.org e-Print Archive

Crossref

Analysis, Visualization, and Transformation of Audio Signals Using Dictionary-based Methods

Author: Aaron McLeran
Bob L. Sturm
Boyd S.
Curtis Roads
Gabor D.
Gersho A.
John J. Shynk
Mallat S.
Meyer C.
Roads C.
Sturm B. L.
Xenakis I.
Publication venue: 'Informa UK Limited'
Publication date: 17/12/2009

Crossref

Queen Mary Research Online

Paraunitary oversampled filter bank design for channel coding

Author: A Papoulis
C Liu
F Labeau
F Lorenzelli
H Bölcskei
J Kliewer
JG McWhirter
M Harteneck
PP Vaidyanathan
S Redif
S Weiss
T Esmailian
T Tanaka
W Kellermann
WH Neo
Z Cvetković
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2006

Oversampled filter banks (OSFBs) have been considered for channel coding, since their redundancy can be utilised to permit the detection and correction of channel errors. In this paper, we propose an OSFB-based channel coder for a correlated additive Gaussian noise channel whose noise covariance matrix is assumed to be known. Based on a suitable factorisation of this matrix, we develop a design for the decoder's synthesis filter bank that minimises the noise power in the decoded signal, subject to admitting perfect reconstruction through paraunitarity of the filter bank. We demonstrate that this approach can lead to a significant reduction of the noise interference by exploiting both the correlation of the channel and the redundancy of the filter banks. Simulation results offering some insight into these mechanisms are presented.

Crossref

University of Strathclyde Institutional Repository

Online Research @ Cardiff

Springer - Publisher Connector

Directory of Open Access Journals

Data compression techniques applied to high resolution high frame rate video technology

Author: Alexovich Robert E.
Hartz William G.
Neustadter Marc S.

An investigation is presented of video data compression applied to microgravity space experiments using High Resolution High Frame Rate Video Technology (HHVT). An extensive survey of methods of video data compression described in the open literature was conducted. The survey examines compression methods employing digital computing. The results of the survey are presented. They include a description of each method and an assessment of image degradation and video data parameters. An assessment is made of present and near-term future technology for implementing video data compression in high-speed imaging systems. Results of the assessment are discussed and summarized. The results of a study of a baseline HHVT video system, and approaches for implementing video data compression, are presented. Case studies of three microgravity experiments are presented, and specific compression techniques and implementations are recommended.

NASA Technical Reports Server

Semi-blind speech-music separation using sparsity and continuity priors

Author: Erdogan Hakan
Erdoğan Hakan
Grais Emad Mounir
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/08/2010

In this paper we propose an approach to the problem of single-channel source separation of speech and music signals. Our approach is based on representing each source's power spectral density using dictionaries and nonlinearly projecting the mixture signal spectrum onto the combined span of the dictionary entries. We encourage sparsity and continuity of the dictionary coefficients using penalty terms (or log-priors) in an optimization framework. We propose a novel coordinate descent technique for optimization, which handles nonnegativity constraints and nonquadratic penalty terms. We use an adaptive Wiener filter and spectral subtraction to reconstruct both sources from the mixture data after the corresponding power spectral densities (PSDs) are estimated for each source. Using conventional metrics, we measure the performance of the system on simulated mixtures of single-person speech and piano music sources. The results indicate that the proposed method is a promising technique for low speech-to-music ratio conditions and that sparsity and continuity priors help improve the performance of the proposed system.

CiteSeerX

Crossref

University of Surrey

Sabanci University Research Database

Surrey Research Insight

Distributed video coding for wireless video sensor networks: a review of the state-of-the-art architectures

Author: Fong A.C.M.
Imran Noreen
Seet Boon-Chong
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 17/09/2015

Distributed video coding (DVC) is a relatively new video coding architecture originating from two fundamental theorems, namely Slepian–Wolf and Wyner–Ziv. Recent research developments have made DVC attractive for applications in the emerging domain of wireless video sensor networks (WVSNs). This paper reviews the state-of-the-art DVC architectures with a focus on understanding their opportunities and gaps in addressing the operational requirements and application needs of WVSNs.

Springer - Publisher Connector

PubMed Central

Enlighten