7,431 research outputs found

End-to-end Audiovisual Speech Activity Detection with Bimodal Recurrent Neural Models

Author: Busso Carlos
Tao Fei
Publication date: 12/09/2018

Speech activity detection (SAD) plays an important role in current speech processing systems, including automatic speech recognition (ASR). SAD is particularly difficult in environments with acoustic noise. A practical solution is to incorporate visual information, increasing the robustness of the SAD approach. An audiovisual system has the advantage of being robust to different speech modes (e.g., whisper speech) or background noise. Recent advances in audiovisual speech processing using deep learning have opened opportunities to capture in a principled way the temporal relationships between acoustic and visual features. This study explores this idea, proposing a bimodal recurrent neural network (BRNN) framework for SAD. The approach models the temporal dynamics of the sequential audiovisual data, improving the accuracy and robustness of the proposed SAD system. Instead of estimating hand-crafted features, the study investigates an end-to-end training approach, where acoustic and visual features are directly learned from the raw data during training. The experimental evaluation considers a large audiovisual corpus with over 60.8 hours of recordings, collected from 105 speakers. The results demonstrate that the proposed framework leads to absolute improvements of up to 1.2% under practical scenarios over an audio-only VAD baseline implemented with a deep neural network (DNN). The proposed approach achieves a 92.7% F1-score when it is evaluated using the sensors from a portable tablet in a noisy acoustic environment, which is only 1.0% lower than the performance obtained under ideal conditions (e.g., clean speech obtained with a high-definition camera and a close-talking microphone). Comment: Submitted to Speech Communication

arXiv.org e-Print Archive

The COGs (context, object, and goals) in multisensory processing

Author: A Alsius
A Amedi
A Baddeley
A Finisguerra
A Fort
A Jones
A Klapetek
A Thelen
A Thillay
A Vatakis
A Walker-Andrews
AA Ghazanfar
AM Cravo
AO Diaconescu
AR Powers
B Baier
BA Rowland
BE Stein
BK Barakat
BR Sarmiento
C Cappe
C Chandrasekaran
C Kayser
C Lunghi
C Spence
C Summerfield
CA Sutherland
CE Schroeder
CI Baker
CJ Mondloch
CL Folk
CR Fetsch
CV Parise
D Amso
D Brang
D Nardo
D Sanabria
D Senkowski
D Talsma
DE Astle
DJ Froyen
DJ Lewkowicz
DR Bach
DS Barth
E Barenholtz
E Burg van der
E Orchard-Mills
EM Zion Golumbic
FC Rind
G Musacchia
G Scerif
H McGurk
I Holloway
IC Fiebelkorn
J Besle
J Duncan
J Heron
J Tuomainen
J Vroomen
JJ Stekelenburg
JT Coull
JX Maier
L Iordanescu
L Naci
L Spierer
LH Arnal
LM Fernández
M Aller
M Baart
M Bar
M Gori
M Nardini
MA Meredith
MH Giard
Micah M. Murray
MM Murray
MS Beauchamp
N Altieri
N Atteveldt van
N Bien
N Ikumi
N Lavie
Nienke van Atteveldt
NM Atteveldt van
O Doehrmann
O Nahorna
P Belin
P Fries
P Lakatos
P Niemi
PA Neil
Pawel J. Matusz
PJ Laurienti
PJ Matusz
R Cecere
R De Meo
R Desimone
R Ee van
R Frost
R Martuzzi
RA Stevenson
RB Welch
S Dehaene
S Masterberdino
S Molholm
S Soto-Faraco
S ten Oever
S Tyll
S Werner
S Yuval-Greenberg
SA Los
Salvador Soto-Faraco
Sanne ten Oever
SJ Luck
SL Fairhall
T Raij
T Rohe
TD Palmer
TS Braver
UR Beierholm
V Romei
V Santangelo
V Wassenhove Van
VA Lamme
Vincenzo Romei
W Fujisaki
W Schiff
WA Teder-Sälejärvi
Y Ding
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Our understanding of how perception operates in real-world environments has been substantially advanced by studying both multisensory processes and “top-down” control processes influencing sensory processing via activity from higher-order brain areas, such as attention, memory, and expectations. As the two topics have been traditionally studied separately, the mechanisms orchestrating real-world multisensory processing remain unclear. Past work has revealed that the observer’s goals gate the influence of many multisensory processes on brain and behavioural responses, whereas some other multisensory processes might occur independently of these goals. Consequently, other forms of top-down control beyond goal dependence are necessary to explain the full range of multisensory effects currently reported at the brain and the cognitive level. These forms of control include sensitivity to stimulus context as well as the detection of matches (or lack thereof) between a multisensory stimulus and categorical attributes of naturalistic objects (e.g. tools, animals). In this review we discuss and integrate the existing findings that demonstrate the importance of such goal-, object- and context-based top-down control over multisensory processing. We then put forward a few principles emerging from this literature review with respect to the mechanisms underlying multisensory processing and discuss their possible broader implications.

University of Essex Research Repository

Maastricht University Research Portal

Crossref

VU Research Portal

Serveur académique lausannois

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

UPF Digital Repository

MPG.PuRe

CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

Author: Boujemaa Nozha
Compañó Ramón
Dosch Christoph
Geurts Joost
Karlgren Jussi
King Paul
Kompatsiaris Yiannis
Köhler Joachim
Le Moine Jean-Yves
Ortgies Robert
Point Jean-Charles
Rotenberg Boris
Rudström Åsa
Sebe Nicu
Publication venue: Chorus Project Consortium
Publication date: 01/01/2007

Based on the information provided by European projects and national initiatives related to multimedia search, as well as domain experts who participated in the CHORUS Think-tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and a socio-economic perspective. The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives to measure the performance of multimedia search engines. From a socio-economic perspective, we inventory the impact and legal consequences of these technical advances and point out future directions of research.

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

Time-delay neural network for continuous emotional dimension prediction from facial expression sequences

Author: Hongying Meng
Jinkuang Cheng
John Cosmas
Nadia Bianchi-Berthouze
Yangdong Deng
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/04/2016

"(c) 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works."

Automatic continuous affective state prediction from naturalistic facial expression is a very challenging research topic but very important in human-computer interaction. One of the main challenges is modeling the dynamics that characterize naturalistic expressions. In this paper, a novel two-stage automatic system is proposed to continuously predict affective dimension values from facial expression videos. In the first stage, traditional regression methods are used to classify each individual video frame, while in the second stage, a Time-Delay Neural Network (TDNN) is proposed to model the temporal relationships between consecutive predictions. The two-stage approach separates the emotional state dynamics modeling from an individual emotional state prediction step based on input features. In doing so, the temporal information used by the TDNN is not biased by the high variability between features of consecutive frames and allows the network to more easily exploit the slow changing dynamics between emotional states. The system was fully tested and evaluated on three different facial expression video datasets. Our experimental results demonstrate that the use of a two-stage approach combined with the TDNN to take into account previously classified frames significantly improves the overall performance of continuous emotional state estimation in naturalistic facial expressions. The proposed approach has won the affect recognition sub-challenge of the third international Audio/Visual Emotion Recognition Challenge (AVEC2013).

CiteSeerX

Crossref

UCL Discovery

Brunel University Research Archive

First impressions: A survey on vision-based apparent personality trait analysis

Author: Andújar Gran Carlos Antonio
Baró Solé Xavier
Escalante Balderas Hugo Jair
Escalera Guerrero Sergio
Guyon Isabelle
Güçlü Umut
Güçlütürk Yagmur
Jacques Junior Julio
Pérez Quintana Marc
van Gerven Marcel A. J.
van Lier Rob
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019

© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Personality analysis has been widely studied in psychology, neuropsychology, and signal processing fields, among others. In the past few years, it has also become an attractive research area in visual computing. From the computational point of view, speech and text have been by far the most considered cues of information for analyzing personality. However, recently there has been an increasing interest from the computer vision community in analyzing personality from visual data. Recent computer vision approaches are able to accurately analyze human faces, body postures and behaviors, and use this information to infer apparent personality traits. Because of the overwhelming research interest in this topic, and of the potential impact that this sort of method could have on society, we present in this paper an up-to-date review of existing vision-based approaches for apparent personality trait recognition. We describe seminal and cutting-edge works on the subject, discussing and comparing their distinctive features and limitations. Future avenues of research in the field are identified and discussed. Furthermore, aspects of the subjectivity in data labeling/evaluation, as well as current datasets and challenges organized to push the research in the field, are reviewed.

Peer Reviewed. Postprint (author's final draft)

UPCommons. Portal del coneixement obert de la UPC

VBN

Radboud Repository

Musical experience may help the brain respond to second language reading

Author: Li Fali
Tao Qin
Tao Sha
Tervaniemi Mari
Wang Cuicui
Xu Peng
Publication date: 01/11/2020

A person's native language background exerts constraints on the brain's automatic responses while learning a second language. It remains unclear, however, whether and how musical experience may help the brain overcome such constraints and meet the requirements of a second language. This study compared native Chinese English learners who were musicians, non-musicians and native English readers on their automatic brain integration of English letter-sounds with an ERP cross-modal audiovisual mismatch negativity (MMN) paradigm. The results showed that native Chinese-speaking musicians successfully integrated English letters and sounds, but their non-musician peers did not, despite their comparable English learning experience and proficiency level. However, native Chinese-speaking musicians demonstrated enhanced cross-modal MMN for both synchronized and delayed letter-sound integration, while native English readers only showed enhanced cross-modal MMN for synchronized integration. Moreover, native Chinese-speaking musicians showed stronger theta oscillations when integrating English letters and sounds, suggesting that they had better top-down modulation. In contrast, native English readers showed stronger delta oscillations for synchronized integration, and their cross-modal delta oscillations significantly correlated with English reading performance. These findings suggest that long-term professional musical experience may enhance top-down modulation and thereby help the brain efficiently integrate the letter-sounds required by the second language. Such benefits from musical experience may be different from those from specific language experience in shaping the brain's automatic responses to reading. Peer reviewed

Helsingin yliopiston digitaalinen arkisto

Psychophysiology-based QoE assessment : a survey

Author: Antons J. N.
Arndt S.
Bosse S.
Brunnstrom K.
Cham K.
Darcy D.
Engelke U.
Martini M.G.
Mulliken G.
Ramzan N.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 15/09/2016

We present a survey of psychophysiology-based assessment for quality of experience (QoE) in advanced multimedia technologies. We provide a classification of methods relevant to QoE and describe related psychological processes, experimental design considerations, and signal analysis techniques. We summarize multimodal techniques and discuss several important aspects of psychophysiology-based QoE assessment, including the synergies with psychophysical assessment and the need for standardized experimental design. This survey is not considered to be exhaustive but serves as a guideline for those interested to further explore this emerging field of research.

Crossref

Mid Sweden University

Fraunhofer-ePrints

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Kingston University Research Repository

espace@Curtin