
Wilfrid Laurier University

A corpus of audio-visual Lombard speech with frontal and profile views

Author: Junqua J.-C.
King D. E.
Lee B.
Pisoni D.
Povey A.
Vatikiotis-Bateson E.
Publication venue: 'Acoustical Society of America (ASA)'
Publication date: 01/01/2018

This paper presents a bi-view (front and side) audiovisual Lombard speech corpus, which is freely available for download. It contains 5400 utterances (2700 Lombard and 2700 plain reference utterances), produced by 54 talkers, with each utterance in the dataset following the same sentence format as the audiovisual “Grid” corpus [Cooke, Barker, Cunningham, and Shao (2006). J. Acoust. Soc. Am. 120(5), 2421–2424]. Analysis of this dataset confirms previous research, showing prominent acoustic, phonetic, and articulatory speech modifications in Lombard speech. In addition, gender differences are observed in the size of the Lombard effect. Specifically, female talkers exhibit a greater increase in estimated vowel duration and a greater reduction in F2 frequency

White Rose Research Online

HAL AMU

Rapid Change in Articulatory Lip Movement Induced by Preceding Auditory Feedback during Production of Bilabial Plosives

Author: A Postma
A Stuart
BS Lee
DE Callan
E Lombard
E Saltzman
E Vatikiotis-Bateson
FH Guenther
FH Guenther
G Curio
G Fairbanks
GJ Borden
H Kawahara
HG Mueller
Hiroaki Gomi
JA Jones
JA Kelso
JA Kelso
JA Tourville
JF Houde
JF Houde
JH Abbs
Makio Kashino
Paul L. Gribble
RW Peters
S Hibi
S Nonaka
S Sapir
S Zanini
SE Blumstein
SJ Eliades
TA Burnett
TA Burnett
Takemi Mochida
TH Heinks-Maldonado
VL Gracco
VL Gracco
VM Villacorta
WR Tiffany
Publication venue: Public Library of Science
Publication date: 08/11/2010

BACKGROUND: There has been plentiful evidence of kinesthetically induced rapid compensation for unanticipated perturbation in speech articulatory movements. However, the role of auditory information in stabilizing articulation has been little studied except for the control of voice fundamental frequency, voice amplitude and vowel formant frequencies. Although the influence of auditory information on the articulatory control process is evident in unintended speech errors caused by delayed auditory feedback, the direct and immediate effect of auditory alteration on the movements of articulators has not been clarified. METHODOLOGY/PRINCIPAL FINDINGS: This work examined whether temporal changes in the auditory feedback of bilabial plosives immediately affect the subsequent lip movement. We conducted experiments with an auditory feedback alteration system that enabled us to replace or block speech sounds in real time. Participants were asked to produce the syllable /pa/ repeatedly at a constant rate. During the repetition, normal auditory feedback was interrupted, and one of three pre-recorded syllables /pa/, /Φa/, or /pi/, spoken by the same participant, was presented once at a different timing from the anticipated production onset, while no feedback was presented for subsequent repetitions. Comparisons of the labial distance trajectories under altered and normal feedback conditions indicated that the movement quickened during the short period immediately after the alteration onset, when /pa/ was presented 50 ms before the expected timing. Such change was not significant under other feedback conditions we tested. CONCLUSIONS/SIGNIFICANCE: The earlier articulation rapidly induced by the progressive auditory input suggests that a compensatory mechanism helps to maintain a constant speech rate by detecting errors between the internally predicted and actually provided auditory information associated with self movement.
The timing- and context-dependent effects of feedback alteration suggest that the sensory error detection works in a temporally asymmetric window where acoustic features of the syllable to be produced may be coded

The listening talker: A review of human and algorithmic context-induced modifications of speech

Author: Adriaans
Albin
Alcántara
Andruski
ANSI S3.5-1997
Arai
Assmann
Assmann
Aubanel
Aubanel
Aubanel
Babel
Babel
Bailly
Baran
Barker
Batliner
Beautemps
Beckford Wassink
Beckman
Beckman
Bele
Bell
Benoit
Best
Biersack
Bird
Blamey
Boike
Bond
Bond
Bond
Boril
Bradlow
Bradlow
Bradlow
Bradlow
Branigan
Bregman
Bronkhorst
Brungart
Brungart
Brunskog
Burnham
Burnham
Burnham
Burnham
Castellanos
Chen
Cheskin
Cheyne
Chládková
Chung
Church
Cole
Cooke
Cooke
Cooke
Cooke
Cooke
Cooke
Cooper
Cooper
Cox
Cox
Cristia
Cristià
Cutler
Darwin
Dau
Davis
Davis
Dejonckere
Delvaux
Dodane
Dreher
Dudley
Dunst
Egan
Englund
Eriksson
Erting
Estival
Falk
Farris
Ferguson
Ferguson
Fernald
Fernald
Fernald
Fernald
Fernald
Field
Fisher
Fisher
Fitzpatrick
Floccia
Fogerty
Fogerty
Fowler
Fowler
Freed
Fux
Fux
Fux
Gagne
Gagne
Gagne
Galati
Garnier
Garnier
Garnier
Garnier
Garnier
Garnier
Garnier
Garrod
Giles
Goldwater
Golinkoff
Golinkoff
Gordon-Salant
Granlund
Granlund
Green
Grieser
Hawley
Hazan
Hazan
Hazan
Hazan
Healey
Helfer
Helfer
Hornsby
Horwitz
Howell
Imaizumi
Imaizumi
Ishizuka
Janarthanam
Johnson
Jun
Jung
Junqua
Junqua
Junqua
Kadiri
Kang
Kaplan
Kappes
Kawahara
Kewley-Port
Kim
Kim
Kirchhoff
Kitamura
Kitamura
Kondaurova
Kondaurova
Korn
Krause
Krause
Krause
Krause
Krause
Kretsinger
Kryter
Kuhl
Kusumoto
Lam
Lane
Laures
Laures
Lee
Lienard
Lindblom
Lindblom
Little
Liu
Liu
Liu
Lombard
Long
Long
Lu
Lu
Lu
Malsheen
Maniwa
Marin
Martin Cooke
Masataka
Matthies
Mattys
Mattys
Mattys
Maye
Maye
Mayo
Maëva Garnier
Metz
Michael
Miller
Mokbel
Monsen
Montgomery
Moon
Moon
Moore
Moore
Moulines
Naoi
Natale
Nejime
Newport
Niederjohn
Niwano
Niwano
Ostroff
Oviatt
Owren
Papoušek
Papoušek
Papoušek
Pardo
Patel
Patel
Payne
Payton
Pegg
Pelegrín-García
Perkell
Petkov
Peutz
Phillips
Picheny
Picheny
Picheny
Pickering
Pickett
Pickett
Pisoni
Pittman
Pollack
Pucher
Pye
Rasetshwane
Ratner
Ratner
Ratner
Rieser
Rogers
Rostolland
Rostolland
Ryan
Räsänen
Sachs
Sankowska
Sauert
Scarborough
Schmitt
Schulman
Schum
Shimron
Simon King
Sims
Singh
Skowronski
Smiljanic
Smith
Snow
Song
Stanton
Stern
Stilp
Stylianou
Summers
Summers
Sundberg
Sundberg
Sundberg
Suni
Synnestvedt
Taal
Taal
Tang
Tang
Tang
Tartter
Ternström
Thanavisuth
Titze
Torick
Trainor
Trainor
Traunmuller
Uchanski
Uchanski
Uther
Valentini-Botinhao
Valentini-Botinhao
Valian
Valian
van de Weijer
van Rooij
Vatikiotis-Bateson
Villegas
Vincent Aubanel
Vitevitch
Wang
Warner
Warren
Watson
Webster
Welby
Welby
Werker
World Health Organisation
Xu
Xu
Yamagishi
Yang
Yoo
Zajdó
Zampini
Zangl
Zhao
Zipf
Zorilă
Publication venue: 'Elsevier BV'
Publication date: 01/01/2014

Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised - at least for some listeners - by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns as a response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output

Hal - Université Grenoble Alpes

Edinburgh Research Explorer

Western Sydney ResearchDirect

What Affects Social Attention? Social Presence, Eye Contact and Autistic Traits

Author: A Frischen
A Glenberg
A Kingstone
A Klin
A Nadig
A Senju
AC Gallup
Alan Kingstone
B Guerin
B Noris
C Lord
C von Hofsten
CF Norbury
CL Kleinke
DC Richardson
DM Riby
E Birmingham
E Risko
E Vatikiotis-Bateson
EF Risko
F Chen
G Bird
G Csibra
G Doherty-Sneddon
G Doherty-Sneddon
GT Baranek
J Droll
J Osterling
K Laidlaw
K Pierce
Kevin Paterson
KEW Laidlaw
L Schilbach
LM Ponkanen
M Freeth
M Freeth
M Hayhoe
MD Rutherford
Megan Freeth
ML Spezio
PC Ellsworth
S Baron-Cohen
S Brown-Schmidt
S Fletcher-Watson
T Farroni
T Foulsham
T Foulsham
T Nakano
T Stein
Tom Foulsham
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

Social understanding is facilitated by effectively attending to other people and the subtle social cues they generate. In order to more fully appreciate the nature of social attention and what drives people to attend to social aspects of the world, one must investigate the factors that influence social attention. This is especially important when attempting to create models of disordered social attention, e.g. a model of social attention in autism. Here we analysed participants' viewing behaviour during one-to-one social interactions with an experimenter. Interactions were conducted either live or via video (social presence manipulation). The participant was asked and then required to answer questions. Experimenter eye-contact was either direct or averted. Additionally, the influence of participant self-reported autistic traits was also investigated. We found that regardless of whether the interaction was conducted live or via a video, participants frequently looked at the experimenter's face, and they did this more often when being asked a question than when answering. Critical differences in social attention between the live and video interactions were also observed. Modifications of experimenter eye contact influenced participants' eye movements in the live interaction only; and increased autistic traits were associated with less looking at the experimenter for video interactions only. We conclude that analysing patterns of eye-movements in response to strictly controlled video stimuli and natural real-world stimuli furthers the field's understanding of the factors that influence social attention

University of Essex Research Repository

CiteSeerX

White Rose Research Online

The Natural Statistics of Audiovisual Speech

Author: AA Ghazanfar
AA Ghazanfar
AA Ghazanfar
AA Ghazanfar
AA Ghazanfar
AA Ghazanfar
AL Giraud
Alice Caplier
Andrea Trubanova
Asif A. Ghazanfar
C Abry
C Chandrasekaran
C Kayser
C Rajkai
CE Schroeder
Chandramouli Chandrasekaran
CR Lansing
D Poeppel
D Sodoyer
D Sodoyer
E Ahissar
E Vatikiotis-Bateson
EP Simoncelli
G Buzsaki
G Monaci
GS Pollack
H Barlow
H Luo
H McGurk
H Yehia
HC Yehia
IJ Hirsh
J Kim
J Ohala
J Westbury
JS Garofolo
JX Maier
JX Maier
K Munhall
K Munhall
K Munhall
K Saberi
K von Kriegstein
K von Kriegstein
Karl J. Friston
KG Munhall
KG Munhall
KMG Fu
KW Grant
L Smith
LD Rosenblum
LD Rosenblum
LD Rosenblum
LD Rosenblum
LD Rosenblum
M Cooke
M Kamachi
M Lungarella
M Sams
M Vitkovitch
M Vitkovitch
MR Jarvis
N Eveno
NC Singh
NF Dixon
P Cosi
P Lakatos
P Lakatos
P Lieberman
P Suppes
PP Mitra
Q Summerfield
Q Summerfield
R Campbell
R Drullman
R Drullman
R Pfeifer
RT Canolty
RV Shannon
S Greenberg
S Stillittano
SJ Kiebel
Sébastien Stillittano
T Lallouache
U Werner-Reiss
V van Wassenhove
V van Wassenhove
ZM Smith
Publication venue: Public Library of Science
Publication date: 01/07/2009

Humans, like other animals, are exposed to a continuous stream of signals, which are dynamic, multimodal, extended, and time varying in nature. This complex input space must be transduced and sampled by our sensory systems and transmitted to the brain where it can guide the selection of appropriate actions. To simplify this process, it's been suggested that the brain exploits statistical regularities in the stimulus space. Tests of this idea have largely been confined to unimodal signals and natural scenes. One important class of multisensory signals for which a quantitative input space characterization is unavailable is human speech. We do not understand what signals our brain has to actively piece together from an audiovisual speech stream to arrive at a percept versus what is already embedded in the signal structure of the stream itself. In essence, we do not have a clear understanding of the natural statistics of audiovisual speech. In the present study, we identified the following major statistical features of audiovisual speech. First, we observed robust correlations and close temporal correspondence between the area of the mouth opening and the acoustic envelope. Second, we found the strongest correlation between the area of the mouth opening and vocal tract resonances. Third, we observed that both area of the mouth opening and the voice envelope are temporally modulated in the 2–7 Hz frequency range. Finally, we show that the timing of mouth movements relative to the onset of the voice is consistently between 100 and 300 ms. We interpret these data in the context of recent neural theories of speech which suggest that speech communication is a reciprocally coupled, multisensory event, whereby the outputs of the signaler are matched to the neural processes of the receiver

Princeton University Open Access Repository

Hal - Université Grenoble Alpes

Towards an articulatory phonology

Author: Abercrombie
Abraham
Anderson
Anderson
Anderson
Aronoff
Benguerel
Bernstein
Cairns
Catford
Chomsky
Clements
Ewen
Feinstein
Fel'dman
Fourakis
Fowler
Fowler
Fowler
Fromkin
Fukui
Goldsmith
Haggard
Halle
Hayes
Herbert
Hockett
Hooper
Hooper
Jespersen
Kahn
Keating
Kelso
Kelso
Kelso
Kelso
Kuipers
Ladefoged
Ladefoged
Lass
Lehiste
Liberman
Lindau
Lindblom
Lindblom
Lisker
Lovins
Löfqvist
Löfqvist
McCarthy
Mitleb
Nurse
Ohala
Port
Pétursson
Rosen
Saltzman
Selkirk
Stelmach
Tuller
Turvey
Vatikiotis-Bateson
Walsh
Publication venue: 'Cambridge University Press (CUP)'
Publication date

Predicting 3D lip movement using facial sEMG: a first step towards estimating functional and aesthetic outcome of oral cancer surgery

Author: A Phinyomark
A Van Boxtel
Alfons J. M. Balm
AM Kreeft
AM Kreeft
BJ Betts
Dieta Brandsma
E Vatikiotis-Bateson
F Vogt
Ferdinand van der Heijden
I Stavness
I Stavness
J-PV Pelteret
JC Lucero
JP Shah
K Honda
LD Robertson
Ludi E. Smeele
M Hamedi
Maarten J. A. van Alphen
Merijn Eskes
MJA Alphen Van
N Son Van
NP Schumann
PM Prendergast
R Campbell
R Siegel
SP Arjunan
X Wu
Publication venue: 'Springer Science and Business Media LLC'
Publication date

Monkeys and Humans Share a Common Computation for Face/Voice Integration

Author: A Diederich
A Diederich
A Diederich
A Diederich
A Ghazanfar
A Izumi
AA Ghazanfar
AA Ghazanfar
AA Ghazanfar
AA Ghazanfar
AA Ghazanfar
AA Ghazanfar
AH Bell
AK Churchland
AM Burrows
Andrea Trubanova
Asif A. Ghazanfar
BA Rowland
BD Corneil
BE Stein
BE Stein
BE Stein
BE Stein
BE Stein
C Cappe
C Chandrasekaran
C Chandrasekaran
CC Sherwood
CC Sherwood
CC Sherwood
CC Sherwood
Chandramouli Chandrasekaran
D Alais
D Reisberg
D Senkowski
DE Callan
DE Shub
DH Raab
E Huber
E Huber
E Kohler
E Vatikiotis-Bateson
FM Plat
G Gourevitch
G Musacchia
GA Calvert
H Colonius
H Colonius
H McGurk
H Yehia
HC Hughes
I Skaliora
IJ Hirsh
IR Lansing
J Besle
J Miller
J Miller
J Miller
J Miller
J Navarra
J Ohala
J Sliwa
J Todd
J-L Schwartz
JD Roitman
JL Flanagan
JP Egan
K von Kriegstein
KG Munhall
KG Munhall
KW Grant
LA Parr
LA Ross
LC Populin
LD Rosenblum
LE Bernstein
LH Arnal
Luis Lemus
M Avillac
M Giray
M Gondan
M Gondan
M Hershenson
M Murase
MA Frens
MA Meredith
Matthias Gondan
MD Hauser
MD Hauser
MH Giard
ML Patterson
ML Patterson
MO Ernst
MO Ernst
NE Barraclough
NF Dixon
Olaf Sporns
PK Kuhl
Q Summerfield
Q Summerfield
RA Stevenson
RJ Andrew
S Ouni
SR Partan
T Sugihara
T Yang
TA Evans
TE Rowell
TM Wright
TR Stanford
V Klucharev
V van Wassenhove
V van Wassenhove
W Jiang
W Schwarz
W Schwarz
WH Sumby
WJ Ma
Publication venue: Public Library of Science
Publication date: 01/01/2011

Speech production involves the movement of the mouth and other regions of the face resulting in visual motion cues. These visual cues enhance intelligibility and detection of auditory speech. As such, face-to-face speech is fundamentally a multisensory phenomenon. If speech is fundamentally multisensory, it should be reflected in the evolution of vocal communication: similar behavioral effects should be observed in other primates. Old World monkeys share with humans vocal production biomechanics and communicate face-to-face with vocalizations. It is unknown, however, if they, too, combine faces and voices to enhance their perception of vocalizations. We show that they do: monkeys combine faces and voices in noisy environments to enhance their detection of vocalizations. Their behavior parallels that of humans performing an identical task. We explored what common computational mechanism(s) could explain the pattern of results we observed across species. Standard explanations or models such as the principle of inverse effectiveness and a “race” model failed to account for their behavior patterns. Conversely, a “superposition model”, positing the linear summation of activity patterns in response to visual and auditory components of vocalizations, served as a straightforward but powerful explanatory mechanism for the observed behaviors in both species. As such, it represents a putative homologous mechanism for integrating faces and voices across primates

Princeton University Open Access Repository

University of Regensburg Publication Server