26 research outputs found

Biomimetic multi-resolution analysis for robust speaker recognition

Author: C Schreiner
D Garcia-Romero
D Zotkin
Dmitry N Zotkin
H Beigi
H Hermansky
H Hirsch
H Steeneken
H Versnel
J Woojay
JS Garofolo
K O’Connor
K Wang
L Miller
M Elhilali
Mounya Elhilali
P Kenny
P Loizou
Q Wu
R Auckenthaler
R Drullman
Ramani Duraiswami
S Greenberg
Sridhar Krishna Nemala
T Arai
T Cover
T Elliott
T Kinnunen
X Yang
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Single-channel acoustic echo cancellation in noise based on gradient-based adaptive filtering

Author: AWH Khong
B Widrow
C Beaugeant
C Breining
E Hänsler
F Guangzeng
F Lindstrom
G Schmidt
H Yasukawa
JS Garofolo
JS Lim
M Berouti
M Omair Ahmad
M Yukawa
R Martin
R Nath
R Topa
S Boll
S Haykin
S Wu
Shaikh Anowarul Fattah
SM Kuo
SV Vaseghi
U Mahbub
Upal Mahbub
V Myllylä
Wei-Ping Zhu
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Novel adaptive muting technique for packet loss concealment of ITU-T G.722 using optimized parametric shaping functions

Author: AV Aho
B Goode
B Kovesi
B-K Lee
Bong-Ki Lee
BW Wah
C Padhye
CA Rodbro
E Gunduzhan
ITU-T Rec. G.191
ITU-T Rec. G.722 Appendix III
ITU-T Rec. G.722 Appendix IV
ITU-T Rec. P.800
ITU-T Rec. P.862.2
J Lindblom
J Ramsey
J Suzuki
J Thyssen
JH James
Joon-Hyuk Chang
JS Garofolo
L Jeremie
MK Lee
N Aoki
P Mermelstein
S Bruhn
S Floyd
S Lingfen
S Quackenbush
S Subasingha
T Chua
U Tadeus
Y Hu
Publication venue: 'Springer Science and Business Media LLC'

Crossref

An efficient solution to sparse linear prediction analysis of speech

Author: B Atal
D Giacobello
D Meng
D Wong
E Candès
E Denoel
EJ Candès
JS Garofolo
K Murty
Khalid Daoudi
M Thomas
MA Little
N Hurley
S Boyd
S Singhal
T Drugman
TF Quatieri
Vahid Khanagha
WC CHU
Publication venue: 'Springer Science and Business Media LLC'

Crossref

The Natural Statistics of Audiovisual Speech

Author: AA Ghazanfar
AL Giraud
Alice Caplier
Andrea Trubanova
Asif A. Ghazanfar
C Abry
C Chandrasekaran
C Kayser
C Rajkai
CE Schroeder
Chandramouli Chandrasekaran
CR Lansing
D Poeppel
D Sodoyer
E Ahissar
E Vatikiotis-Bateson
EP Simoncelli
G Buzsaki
G Monaci
GS Pollack
H Barlow
H Luo
H McGurk
H Yehia
HC Yehia
IJ Hirsh
J Kim
J Ohala
J Westbury
JS Garofolo
JX Maier
K Munhall
K Saberi
K von Kriegstein
Karl J. Friston
KG Munhall
KMG Fu
KW Grant
L Smith
LD Rosenblum
M Cooke
M Kamachi
M Lungarella
M Sams
M Vitkovitch
MR Jarvis
N Eveno
NC Singh
NF Dixon
P Cosi
P Lakatos
P Lieberman
P Suppes
PP Mitra
Q Summerfield
R Campbell
R Drullman
R Pfeifer
RT Canolty
RV Shannon
S Greenberg
S Stillittano
SJ Kiebel
Sébastien Stillittano
T Lallouache
U Werner-Reiss
V van Wassenhove
ZM Smith
Publication venue: Public Library of Science
Publication date: 01/07/2009

Humans, like other animals, are exposed to a continuous stream of signals, which are dynamic, multimodal, extended, and time varying in nature. This complex input space must be transduced and sampled by our sensory systems and transmitted to the brain where it can guide the selection of appropriate actions. To simplify this process, it has been suggested that the brain exploits statistical regularities in the stimulus space. Tests of this idea have largely been confined to unimodal signals and natural scenes. One important class of multisensory signals for which a quantitative input space characterization is unavailable is human speech. We do not understand what signals our brain has to actively piece together from an audiovisual speech stream to arrive at a percept versus what is already embedded in the signal structure of the stream itself. In essence, we do not have a clear understanding of the natural statistics of audiovisual speech. In the present study, we identified the following major statistical features of audiovisual speech. First, we observed robust correlations and close temporal correspondence between the area of the mouth opening and the acoustic envelope. Second, we found the strongest correlation between the area of the mouth opening and vocal tract resonances. Third, we observed that both the area of the mouth opening and the voice envelope are temporally modulated in the 2–7 Hz frequency range. Finally, we show that the timing of mouth movements relative to the onset of the voice is consistently between 100 and 300 ms. We interpret these data in the context of recent neural theories of speech which suggest that speech communication is a reciprocally coupled, multisensory event, whereby the outputs of the signaler are matched to the neural processes of the receiver.

VTech Works (Virginia Tech)

Public Library of Science (PLOS)

Princeton University Open Access Repository

Crossref

Hal - Université Grenoble Alpes

Directory of Open Access Journals

PubMed Central

HAL: Hyper Article en Ligne

Automatic Phonetic Segmentation and Pronunciation Detection with Various Approaches of Acoustic Modeling

Author: DT Toledano
J Matoušek
J Nouza
JS Garofolo
KF Lee
P Mizera
V Peddinti
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Perceptual evaluation of blind source separation in object-based audio production

Author: A Alinaghi
AJR Simpson
CH Taal
E Vincent
J Herre
JS Garofolo
MI Mandel
R McGill
V Emiya
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 05/06/2018

Object-based audio has the potential to enable multimedia content to be tailored to individual listeners and their reproduction equipment. In general, object-based production assumes that the objects (the assets comprising the scene) are free of noise and interference. However, there are many applications in which signal separation could be useful to an object-based audio workflow, e.g., extracting individual objects from channel-based recordings or legacy content, or recording a sound scene with a single microphone array. This paper describes the application and evaluation of blind source separation (BSS) for sound recording in a hybrid channel-based and object-based workflow, in which BSS-estimated objects are mixed with the original stereo recording. A subjective experiment was conducted using simultaneously spoken speech recorded with omnidirectional microphones in a reverberant room. Listeners mixed a BSS-extracted speech object into the scene to make the quieter talker clearer, while retaining acceptable audio quality, compared to the raw stereo recording. Objective evaluations show that the relative short-term objective intelligibility and speech quality scores increase using BSS. Further objective evaluations are used to discuss the influence of the BSS method on the remixing scenario; the scenario shown by human listeners to be useful in object-based audio is shown to be a worse-case scenario.

Crossref

University of Surrey

Surrey Research Insight