15 research outputs found

Redundant Hash Addressing for Large-Scale Query by Example Spoken Query Detection

Author: Asaei Afsaneh
Bourlard Hervé
Ram Dhananjay
Publication venue: Idiap
Publication date: 19/12/2016

State-of-the-art query-by-example spoken term detection (QbE-STD) systems rely on representing speech as sequences of class-conditional posterior probabilities estimated by a deep neural network (DNN). The posteriors are often used for pattern matching or dynamic time warping (DTW). Using posterior probabilities as the speech representation offers several advantages in a classification system. One key property of posterior representations is that they admit a highly effective hashing strategy that enables indexing a large archive in divisions to reduce the search complexity. Moreover, posterior indexing leads to a compressed representation and enables pronunciation dewarping and partial detection with no need for DTW. We exploit these characteristics of the posterior space in the context of redundant hash addressing for QbE-STD. We evaluate the QbE-STD system on the AMI corpus and demonstrate that tremendous speedup and superior accuracy are achieved compared to state-of-the-art pattern matching and DTW solutions. The system has great potential to enable massively large-scale query detection.

Infoscience - École polytechnique fédérale de Lausanne

PhonVoc: A Phonetic and Phonological Vocoding Toolkit

Author: Cernak Milos
Garner Philip N.
Publication date: 19/06/2016

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Sound Pattern Matching for Automatic Prosodic Event Detection

Author: Asaei Afsaneh
Bourlard Hervé
Cernak Milos
Garner Philip N.
Honnet Pierre-Edouard
Publication date: 19/06/2016

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Sound Pattern Matching for Automatic Prosodic Event Detection

Author: Asaei Afsaneh
Bourlard Hervé
Cernak Milos
Garner Philip N.
Honnet Pierre-Edouard Jean Charles
Publication venue: Idiap
Publication date: 19/04/2016

Prosody in speech is manifested by variations of loudness, exaggeration of pitch, and specific phonetic variations of prosodic segments. For example, stressed and unstressed syllables differ in place or manner of articulation; vowels in unstressed syllables may have a more central articulation; and vowel reduction may occur when a vowel changes from a stressed to an unstressed position. In this paper, we characterize the sound patterns using phonological posteriors to capture the phonetic variations in a concise manner. The phonological posteriors quantify the posterior probabilities of the phonological features given the input speech acoustics, and they are obtained using a deep neural network (DNN). Built on the assumption that there are unique sound patterns in different prosodic segments, we devise a sound pattern matching (SPM) method based on a 1-nearest neighbour classifier. In this work, we focus on automatic detection of prosodic stress placed on words, also called emphasized words. We evaluate the SPM method on English and French data with emphasized words. The word emphasis detection also works very well on cross-lingual tests, that is, using a French classifier on English data and vice versa.

Infoscience - École polytechnique fédérale de Lausanne

Phonetic and Phonological Posterior Search Space Hashing Exploiting Class-Specific Sparsity Structures

Author: Asaei Afsaneh
Bourlard Hervé
Cernak Milos
Luyet Gil
Publication venue: Idiap
Publication date: 19/04/2016

This paper shows that exemplar-based speech processing using class-conditional posterior probabilities admits a highly effective search strategy relying on the posteriors' intrinsic sparsity structures. The posterior probabilities are estimated for phonetic and phonological classes using a deep neural network (DNN) computational framework. Exploiting the class-specific sparsity leads to a simple quantized posterior hashing procedure to reduce the search space of posterior exemplars. To that end, a small number of quantized posteriors are regarded as representatives of the posterior space and used as hash keys to index subsets of neighboring exemplars. The k-nearest neighbor (kNN) method is applied to posterior-based classification problems. The phonetic posterior probabilities are used as exemplars for phonetic classification, whereas the phonological posteriors are used as exemplars for automatic prosodic event detection. Experimental results demonstrate that posterior hashing drastically improves the efficiency of kNN classification. This work encourages the use of posteriors as discriminative exemplars appropriate for large-scale speech classification tasks.
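The hashing idea in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the 0.5 quantization threshold, the squared Euclidean distance, the fallback to a full search when a bucket is empty, and the names `hash_key`, `build_index` and `knn_classify`. It quantizes each posterior vector to a binary tuple, uses that tuple as a hash key, and runs the kNN vote only inside the query's bucket.

```python
from collections import Counter, defaultdict


def hash_key(posterior, threshold=0.5):
    """Quantize a posterior vector to a binary tuple (assumed threshold)."""
    return tuple(1 if p >= threshold else 0 for p in posterior)


def build_index(exemplars, threshold=0.5):
    """Map each hash key to the indices of the exemplars that share it."""
    index = defaultdict(list)
    for i, posterior in enumerate(exemplars):
        index[hash_key(posterior, threshold)].append(i)
    return index


def knn_classify(query, exemplars, labels, index, k=3, threshold=0.5):
    """Majority vote over the k nearest exemplars in the query's bucket.

    Falls back to searching all exemplars when the bucket is empty
    (an assumption of this sketch).
    """
    candidates = index.get(hash_key(query, threshold)) or range(len(exemplars))

    def sq_dist(i):
        return sum((a - b) ** 2 for a, b in zip(query, exemplars[i]))

    nearest = sorted(candidates, key=sq_dist)[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]


# Toy posteriors over three hypothetical classes.
exemplars = [
    [0.9, 0.05, 0.05],
    [0.8, 0.1, 0.1],
    [0.1, 0.85, 0.05],
    [0.05, 0.9, 0.05],
    [0.1, 0.1, 0.8],
]
labels = ["a", "a", "b", "b", "c"]
index = build_index(exemplars)
predicted = knn_classify([0.7, 0.2, 0.1], exemplars, labels, index, k=3)
```

In this toy example the query is compared against two exemplars instead of five, which is the kind of search-space reduction the abstract describes.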

Infoscience - École polytechnique fédérale de Lausanne

Crossref

PAoS Markers: Trajectory Analysis of Selective Phonological Posteriors for Assessment of Progressive Apraxia of Speech

Author: Asaei Afsaneh
Cernak Milos
Laganaro Marina
Publication date: 19/08/2016

Progressive apraxia of speech (PAoS) is a progressive motor speech disorder associated with neurodegenerative disease causing impairment of phonetic encoding and motor speech planning. Clinical observation and acoustic studies show that duration analysis provides reliable cues for diagnosis of the disease progression and severity of articulatory disruption. The goal of this paper is to develop computational methods for objective evaluation of duration and trajectory of speech articulation. We use phonological posteriors as speech features. Phonological posteriors consist of probabilities of phonological classes estimated for every short segment of the speech signal. PAoS encompasses lengthening of duration, which is more pronounced in vowels; we thus hypothesize that a small subset of phonological classes provides stronger evidence for duration and trajectory analysis. These classes are determined through analysis of linear prediction coefficients (LPC). To enable trajectory analysis without phonetic alignment, we exploit phonological structures defined through quantization of phonological posteriors. Duration and trajectory analysis are conducted on blocks of multiple consecutive segments possessing similar phonological structures. Moreover, unique phonological structures are identified for every severity condition.
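The block-based duration analysis mentioned in this abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the paper's method: posteriors are quantized with an assumed 0.5 threshold, "similar" structures are approximated as identical ones, the 10 ms frame shift is a placeholder value, and the names `binary_structure` and `structure_blocks` are hypothetical.

```python
from itertools import groupby


def binary_structure(posterior, threshold=0.5):
    """Quantize one frame of phonological posteriors to a binary tuple."""
    return tuple(int(p >= threshold) for p in posterior)


def structure_blocks(frames, frame_shift_ms=10.0, threshold=0.5):
    """Group consecutive frames that share the same quantized structure.

    Returns one (structure, n_frames, duration_ms) tuple per block.
    The frame shift is an assumed value for this sketch.
    """
    structures = [binary_structure(f, threshold) for f in frames]
    blocks = []
    for structure, run in groupby(structures):
        n_frames = len(list(run))
        blocks.append((structure, n_frames, n_frames * frame_shift_ms))
    return blocks


# Toy posteriors for two hypothetical phonological classes.
frames = [
    [0.9, 0.1], [0.8, 0.2], [0.7, 0.4],
    [0.2, 0.9], [0.1, 0.8],
    [0.9, 0.1],
]
blocks = structure_blocks(frames)
```

Each block's duration can then be compared across recordings without a phonetic alignment, since the blocks come from the quantized posteriors alone.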

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Phonetic and Phonological Posterior Search Space Hashing Exploiting Class-Specific Sparsity Structures

Author: Azzalini
Basellini
Bergeron-Boucher
Bohk-Ewald
Booth
Brouard
Canudas-Romo
Colchero
Coles
Congdon
Davison
Edwards
Fries
Gage
Garg
Gillespie
Gompertz
Graunt
Guillot
Heligman
Horiuchi
Kaergaard
Kannisto
Lexis
Makeham
Mazzuco
Missov
Pearson
Perks
Preston
Riley
Rogers
Shkolnikov
Siler
Tabeau
van Raalte
Vaupel
Wilmoth
Zanotto
Publication venue: Idiap
Publication date: 01/01/2016

[…] k-nearest neighbor (kNN) […] kNN classification drastically. This work encourages the use of posteriors as discriminative exemplars appropriate for large scale speech classification tasks

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

The Australian National University

Archivio istituzionale della ricerca - Università di Padova

Efficient Posterior Exemplar Search Space Hashing Exploiting Class-Specific Sparsity Structures

Author: Asaei Afsaneh
Bourlard Hervé
Cernak Milos
Luyet Gil
Publication date: 19/08/2016

This paper shows that exemplar-based speech processing using class-conditional posterior probabilities admits a highly effective search strategy relying on the posteriors' intrinsic sparsity structures. The posterior probabilities are estimated for phonetic and phonological classes using a deep neural network (DNN) computational framework. Exploiting the class-specific sparsity leads to a simple quantized posterior hashing procedure to reduce the search space of posterior exemplars. To that end, a small subset of quantized posteriors are regarded as representatives of the posterior space and used as hash keys to index subsets of similar exemplars. The k-nearest neighbor (kNN) method is applied to posterior-based classification problems. The phonetic posterior probabilities are used as exemplars for phoneme classification, whereas the phonological posteriors are used as exemplars for automatic prosodic event detection. Experimental results demonstrate that posterior hashing drastically improves the efficiency of kNN classification. This work encourages the use of posteriors as discriminative exemplars appropriate for large-scale speech classification tasks.

Infoscience - École polytechnique fédérale de Lausanne

Information Theoretic Analysis of Production-Perception Efficiency: Case Study of Speech Pathology

Author: Asaei Afsaneh
Bourlard Hervé
Cernak Milos
Publication venue: Idiap
Publication date: 19/12/2016

Infoscience - École polytechnique fédérale de Lausanne

Composition of Deep and Spiking Neural Networks for Very Low Bit Rate Speech Coding

Author: Asaei Afsaneh
Cernak Milos
Garner Philip N.
Lazaridis Alexandros
Publication venue: Idiap
Publication date: 19/04/2016

Most current very low bit rate (VLBR) speech coding systems use hidden Markov model (HMM) based speech recognition/synthesis techniques. This allows transmission of information (such as phonemes) segment by segment, which decreases the bit rate. However, an encoder based on phoneme speech recognition may create bursts of segmental errors. Segmental errors are further propagated to optional suprasegmental (such as syllable) information coding. Together with the errors of voicing detection in pitch parametrization, HMM-based speech coding creates speech discontinuities and unnatural speech sound artefacts. In this paper, we propose a novel VLBR speech coding framework based on neural networks (NNs) for end-to-end speech analysis and synthesis without HMMs. The speech coding framework relies on a phonological (sub-phonetic) representation of speech, and it is designed as a composition of deep and spiking NNs: a bank of phonological analysers at the transmitter and a phonological synthesizer at the receiver, both realised as deep NNs, and a spiking NN as an incremental and robust encoder of syllable boundaries for coding of continuous fundamental frequency (F0). A combination of phonological features defines many more sound patterns than the phonetic features defined by HMM-based speech coders, and the finer analysis/synthesis code contributes to smoother encoded speech. Listeners significantly prefer the NN-based approach due to fewer discontinuities and speech artefacts in the encoded speech. A single forward pass is required during speech encoding and decoding. The proposed VLBR speech coding operates at a bit rate of approximately 360 bits/s.

Infoscience - École polytechnique fédérale de Lausanne

arXiv.org e-Print Archive