Wavenet based low rate speech coding
Traditional parametric coding of speech facilitates low bit rates but provides
poor reconstruction quality because of the inadequacy of the model used. We
describe how a WaveNet generative speech model can be used to generate high
quality speech from the bit stream of a standard parametric coder operating at
2.4 kb/s. We compare this parametric coder with a waveform coder based on the
same generative model and show that approximating the signal waveform incurs a
large rate penalty. Our experiments confirm the high performance of the WaveNet
based coder and show that the system additionally performs implicit bandwidth
extension and does not significantly impair recognition of the original speaker
by human listeners, even when that speaker was not included in the training of
the generative model.
Comment: 5 pages, 2 figures
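The decoding idea described above can be sketched in miniature. This is an illustrative stand-in, not the paper's system: the tiny random-weight network below merely plays the structural role of a trained WaveNet, generating samples one at a time conditioned on per-frame parametric coder features.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_conditional_net(history, cond, weights):
    """Stand-in for a trained WaveNet: logits over 256 mu-law levels."""
    h = np.tanh(history @ weights["w_hist"] + cond @ weights["w_cond"])
    return h @ weights["w_out"]

def decode(cond_frames, frame_len=80, receptive_field=64, n_levels=256):
    """Autoregressive decoding conditioned on parametric coder features."""
    weights = {  # random weights for illustration only
        "w_hist": rng.standard_normal((receptive_field, 32)) * 0.1,
        "w_cond": rng.standard_normal((cond_frames.shape[1], 32)) * 0.1,
        "w_out": rng.standard_normal((32, n_levels)) * 0.1,
    }
    samples = np.zeros(receptive_field, dtype=int)  # zero-padded history
    out = []
    for cond in cond_frames:            # one feature vector per frame
        for _ in range(frame_len):      # generate frame_len samples
            hist = samples[-receptive_field:] / n_levels - 0.5
            logits = toy_conditional_net(hist, cond, weights)
            p = np.exp(logits - logits.max())
            p /= p.sum()
            s = rng.choice(n_levels, p=p)   # sample the next mu-law level
            samples = np.append(samples, s)
            out.append(s)
    return np.array(out)

wave = decode(rng.standard_normal((3, 4)))  # 3 frames of 4-dim features
```

The key structural point is that the bitstream's decoded features condition every sample, so the generative model, rather than the 2.4 kb/s parameters alone, determines waveform quality.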
A Generative Product-of-Filters Model of Audio
We propose the product-of-filters (PoF) model, a generative model that
decomposes audio spectra as sparse linear combinations of "filters" in the
log-spectral domain. PoF makes similar assumptions to those used in the classic
homomorphic filtering approach to signal processing, but replaces hand-designed
decompositions built of basic signal processing operations with a learned
decomposition based on statistical inference. This paper formulates the PoF
model and derives a mean-field method for posterior inference and a variational
EM algorithm to estimate the model's free parameters. We demonstrate PoF's
potential for audio processing on a bandwidth expansion task, and show that PoF
can serve as an effective unsupervised feature extractor for a speaker
identification task.
Comment: ICLR 2014 conference-track submission. Added link to the source code.
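The core generative assumption can be shown in a few lines. This is a hedged sketch of the decomposition only, not the paper's mean-field inference or variational EM; the dictionary here is random where the real model learns it, and recovery is done by plain least squares rather than posterior inference.

```python
import numpy as np

rng = np.random.default_rng(1)
F, L = 64, 8                      # frequency bins, number of filters
W = rng.standard_normal((F, L))   # filter dictionary (assumed learned)
a_true = np.zeros(L)
a_true[[1, 5]] = [0.8, 1.3]       # sparse activations

log_spectrum = W @ a_true         # linear combination in the log domain
spectrum = np.exp(log_spectrum)   # equivalently, a product of filters

# Recover the activations by least squares in the log domain
a_hat, *_ = np.linalg.lstsq(W, np.log(spectrum), rcond=None)
```

The `np.exp` line is the point of the name: a sparse sum in the log-spectral domain is a product of (exponentiated) filters in the linear domain, mirroring homomorphic filtering with a learned decomposition.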
The Study of Correlation Structures of DNA Sequences: A Critical Review
The study of correlation structure in the primary sequences of DNA is
reviewed. The issues reviewed include: symmetries among 16 base-base
correlation functions, accurate estimation of correlation measures, the
relationship between 1/f and Lorentzian spectra, heterogeneity in DNA
sequences, different modeling strategies of the correlation structure of DNA
sequences, the difference of correlation structure between coding and
non-coding regions (besides the period-3 pattern), and the source of the broad
distribution of domain sizes. Although some of the results remain
controversial, a body of work on this topic constitutes a good starting point
for future studies.
Comment: LaTeX, two figures; postscript is expected to be 46 pages. To appear
in the special issue of Computers & Chemistry (1997).
A Subband-Based SVM Front-End for Robust ASR
This work proposes a novel support vector machine (SVM) based robust
automatic speech recognition (ASR) front-end that operates on an ensemble of
the subband components of high-dimensional acoustic waveforms. The key issues
of selecting the appropriate SVM kernels for classification in frequency
subbands and the combination of individual subband classifiers using ensemble
methods are addressed. The proposed front-end is compared with state-of-the-art
ASR front-ends in terms of robustness to additive noise and linear filtering.
Experiments performed on the TIMIT phoneme classification task demonstrate the
benefits of the proposed subband based SVM front-end: it outperforms the
standard cepstral front-end in the presence of noise and linear filtering for
signal-to-noise ratios (SNR) below 12 dB. A combination of the proposed
front-end with a conventional front-end such as MFCC yields further
improvements over the individual front-ends across the full range of noise
levels.
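The subband-ensemble idea can be sketched on synthetic data. This toy uses invented two-class waveforms (not TIMIT) and a nearest-centroid stand-in for each subband SVM so the example stays dependency-free; the structure — one classifier per frequency subband, combined by majority vote — is the point.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two toy classes that differ in overall energy, so every subband is
# informative; a real front-end would rely on subtler spectral cues.
def make_wave(cls, n=256):
    return (1.0 if cls == 0 else 2.0) * rng.standard_normal(n)

y = np.array([0, 1] * 40)
X = np.array([make_wave(c) for c in y])
spec = np.abs(np.fft.rfft(X, axis=1))      # magnitude spectra
bands = np.array_split(spec, 4, axis=1)    # 4 frequency subbands

train, test = slice(0, 60), slice(60, 80)
centroids = [(b[train][y[train] == 0].mean(axis=0),
              b[train][y[train] == 1].mean(axis=0)) for b in bands]

# Each subband classifier votes; the ensemble takes the majority.
votes = np.array([
    (np.linalg.norm(b[test] - c1, axis=1)   # closer to class-1 centroid?
     < np.linalg.norm(b[test] - c0, axis=1)).astype(int)
    for b, (c0, c1) in zip(bands, centroids)])
pred = (votes.mean(axis=0) > 0.5).astype(int)
acc = (pred == y[test]).mean()
```

The robustness argument in the abstract rests on this structure: narrowband noise or linear filtering corrupts some subbands while leaving others intact, and the vote lets the clean subbands dominate.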
Using a low-bit rate speech enhancement variable post-filter as a speech recognition system pre-filter to improve robustness to GSM speech
The performance of speech recognition systems degrades when they are used to recognize speech that has been transmitted through GSM (Global System for Mobile Communications) voice communication channels (GSM speech). This degradation is mainly due to GSM speech coding and GSM channel noise on speech signals transmitted through the network. This poor recognition of GSM channel speech limits the use of speech recognition applications over GSM networks. If speech recognition technology is to be used widely over GSM networks, the recognition accuracy of GSM channel speech has to be improved. Different channel normalization techniques have been developed in an attempt to improve recognition accuracy of voice-channel-modified speech in general (not specifically for GSM channel speech). These techniques can be classified into three broad categories: model modification, signal pre-processing and feature processing techniques. In this work, as a contribution toward improving the robustness of speech recognition systems to GSM speech, the use of a low-bit rate speech enhancement post-filter as a speech recognition system pre-filter is proposed. This filter is to be used in recognition systems in combination with channel normalization techniques.
Information Loss in the Human Auditory System
From the eardrum to the auditory cortex, where acoustic stimuli are decoded,
there are several stages of auditory processing and transmission where
information may potentially get lost. In this paper, we aim at quantifying the
information loss in the human auditory system by using information theoretic
tools.
To do so, we consider a speech communication model, where words are uttered
and sent through a noisy channel, and then received and processed by a human
listener.
We define a notion of information loss that is related to the human word
recognition rate. To assess the word recognition rate of humans, we conduct a
closed-vocabulary intelligibility test. We derive upper and lower bounds on the
information loss. Simulations reveal that the bounds are tight and we observe
that the information loss in the human auditory system increases as the signal
to noise ratio (SNR) decreases. Our framework also allows us to study whether
humans are optimal in terms of speech perception in a noisy environment.
Towards that end, we derive optimal classifiers and compare the human and
machine performance in terms of information loss and word recognition rate. We
observe a higher information loss and lower word recognition rate for humans
compared to the optimal classifiers. In fact, depending on the SNR, the machine
classifier may outperform humans by as much as 8 dB. This implies that for the
speech-in-stationary-noise setup considered here, the human auditory system is
sub-optimal for recognizing noisy words.
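A Fano-style bound makes the link between word recognition rate and information loss concrete. This is a textbook inequality consistent with the setup above, not necessarily the paper's exact bounds, and the vocabulary size below is hypothetical.

```python
import numpy as np

def binary_entropy(p):
    """h(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def fano_upper_bound(pe, M):
    """Fano's inequality: for M equiprobable words and word error
    probability pe, the equivocation H(W | guess) -- one way to
    quantify information loss -- satisfies
        H(W | guess) <= h(pe) + pe * log2(M - 1)   (bits)."""
    return binary_entropy(pe) + pe * np.log2(M - 1)

M = 50                       # hypothetical closed-vocabulary size
for pe in (0.05, 0.2, 0.5):  # word error rate grows as SNR drops
    print(f"pe={pe:.2f}: information loss <= "
          f"{fano_upper_bound(pe, M):.2f} bits")
```

The bound rises monotonically in the error rate, matching the observation above that information loss grows as SNR decreases; comparing human and optimal-classifier error rates through such bounds is what allows the 8 dB gap to be stated in information-loss terms.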