45 research outputs found

Environmental assessment of incinerator residue utilisation

Author: Baumann, Birgisdottir, Carpenter, E. Kärrman, Ekvall, Finnveden, Hung, J.P. Gustafsson, Mroueh, Olsson, Rendek, Ribbing, Roth, S. Toller, SETAC-Europe, Tillman, Weiss, Y. Magnusson
Publication venue: 'Elsevier BV'

Crossref

Sobolev Descent

Author: Mroueh Y., Raj A., Sercu T.
Publication date: 01/01/2019

MPG.PuRe

Sobolev GAN

Author: Cheng Y., Li C., Mroueh Y., Raj A., Sercu T.
Publication date: 01/01/2018

MPG.PuRe

Exploring ROI size in deep learning based lipreading

Author: Koumparoulis A., Potamianos G., Mroueh Y., Rennie S.J.
Publication venue: 'International Speech Communication Association'
Publication date: 01/01/2017

Automatic speechreading systems have increasingly exploited deep learning advances, resulting in dramatic gains over traditional methods. State-of-the-art systems typically employ convolutional neural networks (CNNs), operating on a video region-of-interest (ROI) that contains the speaker’s mouth. However, little or no attention has been paid to the effects of ROI physical coverage and resolution on the resulting recognition performance within the deep learning framework. In this paper, we investigate such choices for a visual-only speech recognition system based on CNNs and long short-term memory models that we present in detail. Further, we employ a separate CNN to perform face detection and facial landmark localization, driving the ROI extraction process. We conduct experiments on a multi-speaker corpus of connected-digit utterances, recorded in ideal visual conditions. Our results show that ROI design choices affect automatic speechreading performance significantly: the best visual-only word error rate (5.07%) corresponds to an ROI that contains a large part of the lower face, in addition to just the mouth, and at a relatively high resolution. Notably, the result represents a 27% relative error reduction compared to employing the entire lower face as the ROI. © 2017 14th International Conference on Auditory-Visual Speech Processing, AVSP 2017. All rights reserved.

University of Thessaly Institutional Repository