75,655 research outputs found
Visually Guided Sound Source Separation using Cascaded Opponent Filter Network
The objective of this paper is to recover the original component signals from
a mixture audio with the aid of visual cues of the sound sources. Such task is
usually referred as visually guided sound source separation. The proposed
Cascaded Opponent Filter (COF) framework consists of multiple stages, which
recursively refine the source separation. A key element in COF is a novel
opponent filter module that identifies and relocates residual components
between sources. The system is guided by the appearance and motion of the
source, and, for this purpose, we study different representations based on
video frames, optical flows, dynamic images, and their combinations. Finally,
we propose a Sound Source Location Masking (SSLM) technique, which, together
with COF, produces a pixel level mask of the source location. The entire system
is trained end-to-end using a large set of unlabelled videos. We compare COF
with recent baselines and obtain the state-of-the-art performance in three
challenging datasets (MUSIC, A-MUSIC, and A-NATURAL). Project page:
https://ly-zhu.github.io/cof-net.Comment: main paper 14 pages, ref 3 pages, and supp 7 pages. Revised argument
in section 3 and
Visually Guided Sound Source Separation Using Cascaded Opponent Filter Network
The objective of this paper is to recover the original component signals from a mixture audio with the aid of visual cues of the sound sources. Such task is usually referred as visually guided sound source separation. The proposed Cascaded Opponent Filter (COF) framework consists of multiple stages, which recursively refine the source separation. A key element in COF is a novel opponent filter module that identifies and relocates residual components between sources. The system is guided by the appearance and motion of the source, and, for this purpose, we study different representations based on video frames, optical flows, dynamic images, and their combinations. Finally, we propose a Sound Source Location Masking (SSLM) technique, which, together with COF, produces a pixel level mask of the source location. The entire system is trained in an end-to-end manner using a large set of unlabelled videos. We compare COF with recent baselines and obtain the state-of-the-art performance in three challenging datasets (MUSIC, A-MUSIC, and A-NATURAL).acceptedVersionPeer reviewe
Multi-modal Blind Source Separation with Microphones and Blinkies
We propose a blind source separation algorithm that jointly exploits
measurements by a conventional microphone array and an ad hoc array of low-rate
sound power sensors called blinkies. While providing less information than
microphones, blinkies circumvent some difficulties of microphone arrays in
terms of manufacturing, synchronization, and deployment. The algorithm is
derived from a joint probabilistic model of the microphone and sound power
measurements. We assume the separated sources to follow a time-varying
spherical Gaussian distribution, and the non-negative power measurement
space-time matrix to have a low-rank structure. We show that alternating
updates similar to those of independent vector analysis and Itakura-Saito
non-negative matrix factorization decrease the negative log-likelihood of the
joint distribution. The proposed algorithm is validated via numerical
experiments. Its median separation performance is found to be up to 8 dB more
than that of independent vector analysis, with significantly reduced
variability.Comment: Accepted at IEEE ICASSP 2019, Brighton, UK. 5 pages. 3 figure
Separating Invisible Sounds Toward Universal Audiovisual Scene-Aware Sound Separation
The audio-visual sound separation field assumes visible sources in videos,
but this excludes invisible sounds beyond the camera's view. Current methods
struggle with such sounds lacking visible cues. This paper introduces a novel
"Audio-Visual Scene-Aware Separation" (AVSA-Sep) framework. It includes a
semantic parser for visible and invisible sounds and a separator for
scene-informed separation. AVSA-Sep successfully separates both sound types,
with joint training and cross-modal alignment enhancing effectiveness.Comment: Accepted at ICCV 2023 - AV4D, 4 figures, 3 table
A Semantic Web Annotation Tool for a Web-Based Audio Sequencer
Music and sound have a rich semantic structure which is so clear to the composer and the listener, but that remains mostly hidden to computing machinery. Nevertheless, in recent years, the introduction of software tools for music production have enabled new opportunities for migrating this knowledge from humans to machines. A new generation of these tools may exploit sound samples and semantic information coupling for the creation not only of a musical, but also of a "semantic" composition. In this paper we describe an ontology driven content annotation framework for a web-based audio editing tool. In a supervised approach, during the editing process, the graphical web interface allows the user to annotate any part of the composition with concepts from publicly available ontologies. As a test case, we developed a collaborative web-based audio sequencer that provides users with the functionality to remix the audio samples from the Freesound website and subsequently annotate them. The annotation tool can load any ontology and thus gives users the opportunity to augment the work with annotations on the structure of the composition, the musical materials, and the creator's reasoning and intentions. We believe this approach will provide several novel ways to make not only the final audio product, but also the creative process, first class citizens of the Semantic We
- ā¦