Optoelectronic Reservoir Computing
Reservoir computing is a recently introduced, highly efficient bio-inspired
approach for processing time-dependent data. The basic scheme of reservoir
computing consists of a nonlinear recurrent dynamical system coupled to a
single input layer and a single output layer. Within these constraints many
implementations are possible. Here we report an opto-electronic implementation
of reservoir computing based on a recently proposed architecture consisting of
a single nonlinear node and a delay line. Our implementation is sufficiently
fast for real-time information processing. We illustrate its performance on
tasks of practical importance such as nonlinear channel equalization and speech
recognition, and obtain results comparable to state-of-the-art digital
implementations.
Comment: Contains main paper and two Supplementary Materials
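The single-nonlinear-node-plus-delay-line architecture described above can be sketched in software: the delay line is unrolled into virtual nodes, each driven by a masked copy of the input plus delayed feedback, and only a linear readout is trained. The following toy sketch uses illustrative parameters and a sine nonlinearity chosen for convenience; it is not a model of the optoelectronic hardware in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions and scalings (hypothetical, not the paper's hardware values).
N, T = 50, 1000                 # virtual nodes along the delay line, time steps
mask = rng.uniform(-1.0, 1.0, N)  # input mask spreading u(t) over the delay line
u = rng.uniform(-0.5, 0.5, T)     # random scalar input sequence

# Drive the single nonlinear node: each virtual node sees its masked input
# plus delayed feedback from the previous pass through the delay line.
states = np.zeros((T, N))
x = np.zeros(N)
for t in range(T):
    x = np.sin(0.9 * mask * u[t] + 0.5 * x)  # illustrative nonlinearity
    states[t] = x

# Only the linear readout is trained (ridge regression); here the toy task
# is recalling the input from two steps back, probing the reservoir's memory.
delay = 2
X, y = states[delay:], u[:-delay]
w = np.linalg.solve(X.T @ X + 1e-6 * np.eye(N), X.T @ y)
mse = np.mean((X @ w - y) ** 2)
```

The key design point is that the recurrent part is fixed and random; all task-specific learning is confined to the cheap linear readout `w`, which is what makes hardware implementations attractive.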
Articulatory and bottleneck features for speaker-independent ASR of dysarthric speech
Rapid population aging has stimulated the development of assistive
devices that provide personalized medical support to people suffering from
various etiologies. One prominent clinical application is a computer-assisted
speech training system which enables personalized speech therapy for patients
impaired by communicative disorders in the patient's home environment. Such a
system relies on robust automatic speech recognition (ASR) technology to be
able to provide accurate articulation feedback. With the long-term aim of
developing off-the-shelf ASR systems that can be incorporated in a clinical
context without prior speaker information, we compare the ASR performance of
speaker-independent bottleneck and articulatory features on dysarthric speech,
used in conjunction with dedicated neural network-based acoustic models that
have been shown to be robust against spectrotemporal deviations. We report the
ASR performance of these systems on two dysarthric speech datasets of different
characteristics to quantify the achieved performance gains. Despite the
remaining performance gap between dysarthric and normal speech, significant
improvements are reported on both datasets using speaker-independent ASR
architectures.
Comment: to appear in Computer Speech & Language -
https://doi.org/10.1016/j.csl.2019.05.002 - arXiv admin note: substantial
text overlap with arXiv:1807.1094
Relating Objective and Subjective Performance Measures for AAM-based Visual Speech Synthesizers
We compare two approaches for synthesizing visual speech using Active Appearance Models (AAMs): one that uses acoustic features as input, and one that uses a phonetic transcription as input. Both synthesizers are trained on the same data and their performance is measured using both objective and subjective testing. We investigate the impact of likely sources of error in the synthesized visual speech by introducing typical errors into real visual speech sequences and subjectively measuring the perceived degradation. When only a small region (e.g. a single syllable) of ground-truth visual speech is incorrect, we find that the subjective score for the entire sequence is lower than for sequences generated by our synthesizers. This observation motivates further consideration of an often-ignored issue: to what extent are subjective measures correlated with objective measures of performance? Significantly, we find that the most commonly used objective measures of performance are not necessarily the best indicators of viewer-perceived quality. We empirically evaluate alternatives and show that the cost of a dynamic time warp of synthesized visual speech parameters to the respective ground-truth parameters is a better indicator of subjective quality.
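The dynamic-time-warp cost mentioned at the end of the abstract is the cumulative frame-to-frame distance along the optimal monotone alignment between two parameter trajectories. A minimal sketch of such a measure is below; the Euclidean frame distance and the path-length normalisation are illustrative assumptions, not necessarily the exact variant the paper uses.

```python
import numpy as np

def dtw_cost(a, b):
    """Cost of the optimal dynamic time warp between two parameter
    trajectories a (n, d) and b (m, d), using Euclidean frame distance
    and normalising by a bound on the warping-path length."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three monotone predecessors.
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)
```

Because the alignment absorbs timing differences, a synthesized trajectory that is merely stretched relative to the ground truth scores far better than one whose parameter values are wrong, which is the property that makes this cost a plausible proxy for perceived quality.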
Deep audio-visual speech recognition
Decades of research in acoustic speech recognition have led to systems that we use in our everyday life. However, even the most advanced speech recognition systems fail in the presence of noise. The degraded performance can be compensated by introducing visual speech information. However, Visual Speech Recognition (VSR) in naturalistic conditions is very challenging, in part due to the lack of architectures and annotations.
This thesis contributes to the problem of Audio-Visual Speech Recognition (AVSR) from different aspects. Firstly, we develop AVSR models for isolated words. In contrast to previous state-of-the-art methods, which consist of a two-step approach (feature extraction followed by recognition), we present an End-to-End (E2E) approach inside a deep neural network, and this has led to significant improvements in audio-only, visual-only and audio-visual experiments. We further replace Bi-directional Gated Recurrent Units (BGRUs) with Temporal Convolutional Networks (TCNs) to greatly simplify the training procedure.
Secondly, we extend our AVSR model to continuous speech by presenting a hybrid Connectionist Temporal Classification (CTC)/Attention model that can be trained in an end-to-end manner. We then propose the addition of prediction-based auxiliary tasks to a VSR model and highlight the importance of hyper-parameter optimisation and appropriate data augmentations.
Next, we present a self-supervised framework, Learning visual speech Representations from Audio via self-supervision (LiRA). Specifically, we train a ResNet+Conformer model to predict acoustic features from unlabelled visual speech, and find that this pre-trained model can be leveraged towards word-level and sentence-level lip-reading.
We also investigate the influence of the Lombard effect on an end-to-end AVSR system; this is the first work to use end-to-end deep architectures for this purpose and to present results on unseen speakers. We show that when even a relatively small amount of Lombard speech is added to the training set, performance in a real scenario, where noisy Lombard speech is present, can be significantly improved.
Lastly, we propose a method for detecting adversarial examples in an AVSR system that leverages the strong correlation between the audio and visual streams. A synchronisation confidence score serves as a proxy for audio-visual correlation, and based on it we can detect adversarial attacks. We apply recent adversarial attacks to two AVSR models, and the experimental results demonstrate that the proposed approach is an effective way of detecting such attacks.
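The detection idea above rests on a simple principle: a perturbation crafted against one modality destroys the natural audio-visual correlation, so a low synchronisation score flags the input as suspicious. The sketch below uses mean cosine similarity between time-aligned embeddings as a stand-in for the thesis's synchronisation confidence score; the embedding inputs and the threshold are hypothetical.

```python
import numpy as np

def sync_confidence(audio_emb, video_emb):
    """Mean cosine similarity between time-aligned audio and video
    embeddings, shape (T, d) each. A simplified stand-in for a
    synchronisation confidence score; not the thesis's exact network."""
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    return float(np.mean(np.sum(a * v, axis=1)))

def is_adversarial(audio_emb, video_emb, threshold=0.5):
    # An attack on one stream breaks the cross-modal correlation,
    # pushing the confidence below the (hypothetical) threshold.
    return sync_confidence(audio_emb, video_emb) < threshold
```

A usage pattern: embed both streams of an incoming clip, compute the score, and reject the clip before it reaches the recogniser whenever the score falls below the threshold calibrated on clean data.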
Reconstructing Speech from Human Auditory Cortex
Direct brain recordings from neurosurgical patients listening to speech reveal that the acoustic speech signals can be reconstructed from neural activity in auditory cortex.
Contributions of local speech encoding and functional connectivity to audio-visual speech perception
Seeing a speaker’s face enhances speech intelligibility in adverse environments. We investigated the underlying network mechanisms by quantifying local speech representations and directed connectivity in MEG data obtained while human participants listened to speech of varying acoustic SNR and visual context. During high acoustic SNR, speech encoding by temporally entrained brain activity was strong in temporal and inferior frontal cortex, while during low SNR, strong entrainment emerged in premotor and superior frontal cortex. These changes in local encoding were accompanied by changes in directed connectivity along the ventral stream and the auditory-premotor axis. Importantly, the behavioral benefit arising from seeing the speaker’s face was not predicted by changes in local encoding but rather by enhanced functional connectivity between temporal and inferior frontal cortex. Our results demonstrate a role of auditory-frontal interactions in visual speech representations and suggest that functional connectivity along the ventral pathway facilitates speech comprehension in multisensory environments.