Search CORE

22 research outputs found

Audio-visual speech perception: a developmental ERP investigation

Author: Dick Frederic
Karmiloff-Smith Annette
Knowland Victoria C.P.
Mercure E.
Thomas Michael S.C.
Publication venue: Wiley
Publication date: 31/10/2013

Being able to see a talking face confers a considerable advantage for speech perception in adulthood. However, behavioural data currently suggest that children fail to make full use of these available visual speech cues until age 8 or 9. This is particularly surprising given the potential utility of multiple informational cues during language learning. We therefore explored this at the neural level. The event-related potential (ERP) technique has been used to assess the mechanisms of audio-visual speech perception in adults, with visual cues reliably modulating auditory ERP responses to speech. Previous work has shown congruence-dependent shortening of auditory N1/P2 latency and congruence-independent attenuation of amplitude in the presence of auditory and visual speech signals, compared to auditory alone. The aim of this study was to chart the development of these well-established modulatory effects over mid-to-late childhood. Experiment 1 employed an adult sample to validate a child-friendly stimulus set and paradigm by replicating previously observed effects of N1/P2 amplitude and latency modulation by visual speech cues; it also revealed greater attenuation of component amplitude given incongruent audio-visual stimuli, pointing to a new interpretation of the amplitude modulation effect. Experiment 2 used the same paradigm to map cross-sectional developmental change in these ERP responses between 6 and 11 years of age. The effect of amplitude modulation by visual cues emerged over development, while the effect of latency modulation was stable over the child sample. These data suggest that auditory ERP modulation by visual speech represents separable underlying cognitive processes, some of which show earlier maturation than others over the course of development.

City Research Online

Goldsmiths Research Online

Crossref

UCL Discovery

PubMed Central

Birkbeck Institutional Research Online

Adaptive Decision Fusion for Audio-Visual Speech Recognition

Author: Cheol Hoon Park
Jong-Seok Lee
Publication venue: IntechOpen
Publication date: 01/11/2008

IntechOpen

Crossref

A study of lip movements during spontaneous dialog and its application to voice activity detection

Author: Girin Laurent
Jutten Christian
Rivet Bertrand
Savariaux Christophe
Schwartz Jean-Luc
Sodoyer David
Publication venue: Acoustical Society of America (ASA)
Publication date: 01/01/2009

This paper presents a quantitative and comprehensive study of the lip movements of a given speaker in different speech/nonspeech contexts, with a particular focus on silences (i.e., when no sound is produced by the speaker). The aim is to characterize the relationship between "lip activity" and "speech activity" and then to use visual speech information as a voice activity detector (VAD). To this aim, an original audiovisual corpus was recorded with two speakers involved in a face-to-face spontaneous dialog, although being in separate rooms. Each speaker communicated with the other using a microphone, a camera, a screen, and headphones. This system was used to capture separate audio stimuli for each speaker and to synchronously monitor the speaker's lip movements. A comprehensive analysis was carried out on the lip shapes and lip movements in either silence or nonsilence (i.e., speech+nonspeech audible events). A single visual parameter, defined to characterize the lip movements, was shown to be efficient for the detection of silence sections. This results in a visual VAD that can be used in any kind of environment noise, including intricate and highly nonstationary noises, e.g., multiple and/or moving noise sources or competing speech signals.

Crossref

Hal - Université Grenoble Alpes

The Natural Statistics of Audiovisual Speech

Author: AA Ghazanfar
AL Giraud
Alice Caplier
Andrea Trubanova
Asif A. Ghazanfar
C Abry
C Chandrasekaran
C Kayser
C Rajkai
CE Schroeder
Chandramouli Chandrasekaran
CR Lansing
D Poeppel
D Sodoyer
E Ahissar
E Vatikiotis-Bateson
EP Simoncelli
G Buzsaki
G Monaci
GS Pollack
H Barlow
H Luo
H McGurk
H Yehia
HC Yehia
IJ Hirsh
J Kim
J Ohala
J Westbury
JS Garofolo
JX Maier
K Munhall
K Saberi
K von Kriegstein
Karl J. Friston
KG Munhall
KMG Fu
KW Grant
L Smith
LD Rosenblum
M Cooke
M Kamachi
M Lungarella
M Sams
M Vitkovitch
MR Jarvis
N Eveno
NC Singh
NF Dixon
P Cosi
P Lakatos
P Lieberman
P Suppes
PP Mitra
Q Summerfield
R Campbell
R Drullman
R Pfeifer
RT Canolty
RV Shannon
S Greenberg
S Stillittano
SJ Kiebel
Sébastien Stillittano
T Lallouache
U Werner-Reiss
V van Wassenhove
ZM Smith
Publication venue: Public Library of Science
Publication date: 01/07/2009

Humans, like other animals, are exposed to a continuous stream of signals, which are dynamic, multimodal, extended, and time varying in nature. This complex input space must be transduced and sampled by our sensory systems and transmitted to the brain where it can guide the selection of appropriate actions. To simplify this process, it has been suggested that the brain exploits statistical regularities in the stimulus space. Tests of this idea have largely been confined to unimodal signals and natural scenes. One important class of multisensory signals for which a quantitative input space characterization is unavailable is human speech. We do not understand what signals our brain has to actively piece together from an audiovisual speech stream to arrive at a percept versus what is already embedded in the signal structure of the stream itself. In essence, we do not have a clear understanding of the natural statistics of audiovisual speech. In the present study, we identified the following major statistical features of audiovisual speech. First, we observed robust correlations and close temporal correspondence between the area of the mouth opening and the acoustic envelope. Second, we found the strongest correlation between the area of the mouth opening and vocal tract resonances. Third, we observed that both area of the mouth opening and the voice envelope are temporally modulated in the 2–7 Hz frequency range. Finally, we show that the timing of mouth movements relative to the onset of the voice is consistently between 100 and 300 ms. We interpret these data in the context of recent neural theories of speech which suggest that speech communication is a reciprocally coupled, multisensory event, whereby the outputs of the signaler are matched to the neural processes of the receiver.

Public Library of Science (PLOS)

Princeton University Open Access Repository

Crossref

Hal - Université Grenoble Alpes

Directory of Open Access Journals

PubMed Central

Detection and attention for auditory, visual, and audiovisual speech in children with hearing loss

Author: Alves
Balota
Bergeson
Bernstein
Betts
Biederman
Brandwein
Briscoe
Calvert
Campbell
Chen
Cooley
Corbetta
Fritz
Gilley
Gustafson
Heathcote
Hervey
Jerger
Key
Kim
Lalonde
Langner
Lansing
Laurienti
Lewis
Lickliter
Mavica
McConachie
McVay
Mordkoff
Nissen
Otsuka
O’Toole
Parris
Ratcliff
Reinvang
Rollins
Scaltritti
Seitz
Smith
Stevenson
Tharpe
Thillay
Tjan
Tsao
Tse
Weissman
Whelan
Whyte
Wild
Wingfield
Woods
Worster
Publication venue: Ovid Technologies (Wolters Kluwer Health)
Publication date: 07/10/2019

Crossref

Explore Bristol Research

Recommended from our members

Monkey lipsmacking develops like the human speech rhythm

Author: Ackermann
Bell
Bohland
Buzsaki
Campbell
Caruana
Catrin Blank
Chandrasekaran
Clancy
Crystal
Davis
Deacon
Dolata
Dronkers
Drullman
Fant
Ferrari
Finlay
Fitch
Ghazanfar
Gibson
Giraud
Goldstein
Goodall
Gottlieb
Green
Greenberg
Hinde
Keysers
Kiliaridis
Kim
Kingsbury
Levitt
Lindblom
Locke
Lund
Luo
Lynch
MacNeilage
Maestripieri
Malecot
Malkova
Moore
Nathani
Oller
Parr
Pinker
Poeppel
Preuschoff
Redican
Ruppenthal
Saberi
Schneirla
Schroeder
Shannon
Smith
Steeve
Thelen
Tingley
Van Hooff
Vitkovitch
Publication venue: Wiley
Publication date: 01/01/2012

Across all languages studied to date, audiovisual speech exhibits a consistent rhythmic structure. This rhythm is critical to speech perception. Some have suggested that the speech rhythm evolved de novo in humans. An alternative account, the one we explored here, is that the rhythm of speech evolved through the modification of rhythmic facial expressions. We tested this idea by investigating the structure and development of macaque monkey lipsmacks and found that their developmental trajectory is strikingly similar to the one that leads from human infant babbling to adult speech. Specifically, we show that: 1) younger monkeys produce slower, more variable mouth movements and as they get older, these movements become faster and less variable; and 2) this developmental pattern does not occur for another cyclical mouth movement: chewing. These patterns parallel human developmental patterns for speech and chewing. They suggest that, in both species, the two types of rhythmic mouth movements use different underlying neural circuits that develop in different ways. Ultimately, both lipsmacking and speech converge on a ~5 Hz rhythm that represents the frequency that characterizes the speech rhythm of human adults. We conclude that monkey lipsmacking and human speech share a homologous developmental mechanism, lending strong empirical support for the idea that the human speech rhythm evolved from the rhythmic facial expressions of our primate ancestors.

Princeton University Open Access Repository

Archivio istituzionale della Ricerca - Università degli Studi di Parma

Crossref

Nottingham Trent Institutional Repository (IRep)

PubMed Central