Visual Speech Enhancement

Author: Gabbay Aviv
Peleg Shmuel
Shamir Asaph
Publication date: 13/06/2018

When video is shot in a noisy environment, the voice of a speaker seen in the video can be enhanced using the visible mouth movements, reducing background noise. While most existing methods use audio-only inputs, improved performance is obtained with our visual speech enhancement, based on an audio-visual neural network. We include in the training data videos to which we added the voice of the target speaker as background noise. Since the audio input is not sufficient to separate the voice of a speaker from his own voice, the trained model better exploits the visual input and generalizes well to different noise types. The proposed model outperforms prior audio-visual methods on two public lipreading datasets. It is also the first to be demonstrated on a dataset not designed for lipreading, such as the weekly addresses of Barack Obama.
Comment: Accepted to Interspeech 2018. Supplementary video: https://www.youtube.com/watch?v=nyYarDGpcY

arXiv.org e-Print Archive

Crossref

Seeing Through Noise: Visually Driven Speaker Separation and Enhancement

Author: Ephrat Ariel
Gabbay Aviv
Halperin Tavi
Peleg Shmuel
Publication date: 09/02/2018

Isolating the voice of a specific person while filtering out other voices or background noises is challenging when video is shot in noisy environments. We propose audio-visual methods to isolate the voice of a single speaker and eliminate unrelated sounds. First, face motions captured in the video are used to estimate the speaker's voice by passing the silent video frames through a video-to-speech neural network-based model. Then the speech predictions are applied as a filter on the noisy input audio. This approach avoids using mixtures of sounds in the learning process, as the number of such possible mixtures is huge and would inevitably bias the trained model. We evaluate our method on two audio-visual datasets, GRID and TCD-TIMIT, and show that it attains significant SDR and PESQ improvements over both the raw video-to-speech predictions and a well-known audio-only method.
Comment: Supplementary video: https://www.youtube.com/watch?v=qmsyj7vAzo

arXiv.org e-Print Archive

Crossref

Distribution of alcohol and sorbitol dehydrogenases. Assessment of mRNA species in mammalian tissues

Crossref

Statement in Support of Revising the Uniform Determination of Death Act and in Opposition to a Proposed Revision

Author: Abe
Allen
American Academy of Neurology - Quality Standards Subcommittee [American Academy of Neurology]
Andrews
Aramesh
Aranake
Asai
Ashwal
Aviv
Bagheri
Beecher
Berkowitz
Bernat
Bhagat
Bolton
Brilli
Campbell
Celizic
Choong
Coimbra
Collins
Cruse
Dalle Ave
Datar
de Mattei
Dimancescu
Dodd-Sullivan
Edlow
Feldman
Gabbay
Giacino
Goldfine
Green
Greer
Gómez-Lobo
Hansen
Haun
Imanaka
Ingvar
Jackson
Joffe
Johnson
Karakatsanis
Kato
Kohrman
La Puma
Lang
Latorre
Laureys
Leemputte
Lewis
Lock
Machado
Mashour
McGee
Miller
Morales
Morioka
Muramoto
Nair-Collins
Nakagawa
Newberg
Nguyen
Okamoto
Olick
Orliaguet
Paolin
Paquette
Paret
Parnia
Pasternak
Pellegrino
President’s Commission for the Study of Ethical Problems in Medicine and Biomedical and Behavioral Research [President’s Commission]
President’s Council on Bioethics
Rady
Repertinger
Ringel
Robbins
Roberts
Ross
Roth
Russell
Salih
Sanchez
Sass
Schiff
Shah
Shewmon
Swedish Committee on Defining Death
Sánchez Sorondo
Takeuchi
Taylor
Thompson
Tibballs
Tosch
Truog
Vardis
Veatch
Verheijde
Watanabe
Webb
Wijdicks
Yanke
Young
Youngner
Publication venue: Oxford University Press (OUP)

Crossref