Search CORE

Visually Guided Sound Source Separation using Cascaded Opponent Filter Network

Author: Rahtu Esa
Zhu Lingyu
Publication venue
Publication date: 14/07/2020
Field of study

The objective of this paper is to recover the original component signals from an audio mixture with the aid of visual cues of the sound sources. Such a task is usually referred to as visually guided sound source separation. The proposed Cascaded Opponent Filter (COF) framework consists of multiple stages, which recursively refine the source separation. A key element in COF is a novel opponent filter module that identifies and relocates residual components between sources. The system is guided by the appearance and motion of the source, and, for this purpose, we study different representations based on video frames, optical flows, dynamic images, and their combinations. Finally, we propose a Sound Source Location Masking (SSLM) technique, which, together with COF, produces a pixel-level mask of the source location. The entire system is trained end-to-end using a large set of unlabelled videos. We compare COF with recent baselines and obtain state-of-the-art performance on three challenging datasets (MUSIC, A-MUSIC, and A-NATURAL). Project page: https://ly-zhu.github.io/cof-net. Comment: main paper 14 pages, ref 3 pages, and supp 7 pages. Revised argument in section 3 and

arXiv.org e-Print Archive

Trepo - Institutional Repository of Tampere University

Visually Guided Sound Source Separation Using Cascaded Opponent Filter Network

Author: Rahtu Esa
Zhu Lingyu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 26/02/2021
Field of study

The objective of this paper is to recover the original component signals from an audio mixture with the aid of visual cues of the sound sources. Such a task is usually referred to as visually guided sound source separation. The proposed Cascaded Opponent Filter (COF) framework consists of multiple stages, which recursively refine the source separation. A key element in COF is a novel opponent filter module that identifies and relocates residual components between sources. The system is guided by the appearance and motion of the source, and, for this purpose, we study different representations based on video frames, optical flows, dynamic images, and their combinations. Finally, we propose a Sound Source Location Masking (SSLM) technique, which, together with COF, produces a pixel-level mask of the source location. The entire system is trained in an end-to-end manner using a large set of unlabelled videos. We compare COF with recent baselines and obtain state-of-the-art performance on three challenging datasets (MUSIC, A-MUSIC, and A-NATURAL). Accepted version. Peer reviewed.

Trepo - Institutional Repository of Tampere University

Multi-modal Blind Source Separation with Microphones and Blinkies

Author: Ono Nobutaka
Scheibler Robin
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 03/04/2019
Field of study

We propose a blind source separation algorithm that jointly exploits measurements by a conventional microphone array and an ad hoc array of low-rate sound power sensors called blinkies. While providing less information than microphones, blinkies circumvent some difficulties of microphone arrays in terms of manufacturing, synchronization, and deployment. The algorithm is derived from a joint probabilistic model of the microphone and sound power measurements. We assume the separated sources to follow a time-varying spherical Gaussian distribution, and the non-negative power measurement space-time matrix to have a low-rank structure. We show that alternating updates similar to those of independent vector analysis and Itakura-Saito non-negative matrix factorization decrease the negative log-likelihood of the joint distribution. The proposed algorithm is validated via numerical experiments. Its median separation performance is found to be up to 8 dB higher than that of independent vector analysis, with significantly reduced variability. Comment: Accepted at IEEE ICASSP 2019, Brighton, UK. 5 pages. 3 figures

arXiv.org e-Print Archive

Crossref
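
The abstract above mentions updates similar to those of Itakura-Saito non-negative matrix factorization (IS-NMF) that decrease a negative log-likelihood. As a rough illustration of that one ingredient only, the following is a minimal pure-Python sketch of standard IS-NMF multiplicative updates on a small positive matrix. It is not the authors' algorithm: it omits the independent vector analysis part and the joint microphone/blinky model. The function names (`is_nmf`, `is_divergence`, `matmul`, `transpose`) and all parameters are hypothetical, and the square-root exponent is the majorization-minimization variant, chosen here because it is known not to increase the divergence.

```python
import math
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def is_divergence(V, Y):
    """Itakura-Saito divergence between V and its model Y, summed over entries."""
    total = 0.0
    for v_row, y_row in zip(V, Y):
        for v, y in zip(v_row, y_row):
            ratio = v / y
            total += ratio - math.log(ratio) - 1.0
    return total


def _scaled(M, num, den):
    """Multiply M entrywise by sqrt(num / den) (the multiplicative update)."""
    return [[m * math.sqrt(n / d) for m, n, d in zip(mr, nr, dr)]
            for mr, nr, dr in zip(M, num, den)]


def is_nmf(V, rank, n_iter=50, seed=0):
    """Factor a strictly positive matrix V as W @ H under the IS divergence.

    Returns (W, H, history), where history holds the divergence before the
    first iteration and after every iteration.
    """
    rng = random.Random(seed)
    W = [[rng.uniform(0.5, 1.5) for _ in range(rank)] for _ in V]
    H = [[rng.uniform(0.5, 1.5) for _ in V[0]] for _ in range(rank)]
    history = [is_divergence(V, matmul(W, H))]
    for _ in range(n_iter):
        # Update W with H fixed.
        Y = matmul(W, H)
        Ht = transpose(H)
        num = matmul([[v / (y * y) for v, y in zip(vr, yr)] for vr, yr in zip(V, Y)], Ht)
        den = matmul([[1.0 / y for y in yr] for yr in Y], Ht)
        W = _scaled(W, num, den)
        # Update H with the new W fixed.
        Y = matmul(W, H)
        Wt = transpose(W)
        num = matmul(Wt, [[v / (y * y) for v, y in zip(vr, yr)] for vr, yr in zip(V, Y)])
        den = matmul(Wt, [[1.0 / y for y in yr] for yr in Y])
        H = _scaled(H, num, den)
        history.append(is_divergence(V, matmul(W, H)))
    return W, H, history


if __name__ == "__main__":
    # Toy "power measurement" matrix; the values are made up for illustration.
    V = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.5], [0.5, 1.0, 1.2]]
    W, H, history = is_nmf(V, rank=2, n_iter=30)
    print("first and last divergence:", history[0], history[-1])
```

The `history` list makes it easy to check that the objective does not increase from one iteration to the next, which mirrors the kind of monotonic-decrease property the abstract claims for its own alternating updates.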

Making the Connection II: Designing the Language Lab to Meet Educational Objectives

Author: Trometer Ruth
Publication venue: 'The University of Kansas'
Publication date: 15/01/1994
Field of study

The University of Kansas: Journals@KU

Biodiversity Informatics

Separating Invisible Sounds Toward Universal Audiovisual Scene-Aware Sound Separation

Author: Deng Shijian
Su Yiyang
Tian Yapeng
Vosoughi Ali
Xu Chenliang
Publication venue
Publication date: 18/10/2023
Field of study

The audio-visual sound separation field assumes visible sources in videos, but this excludes invisible sounds beyond the camera's view. Current methods struggle with such sounds lacking visible cues. This paper introduces a novel "Audio-Visual Scene-Aware Separation" (AVSA-Sep) framework. It includes a semantic parser for visible and invisible sounds and a separator for scene-informed separation. AVSA-Sep successfully separates both sound types, with joint training and cross-modal alignment enhancing effectiveness. Comment: Accepted at ICCV 2023 - AV4D, 4 figures, 3 tables

arXiv.org e-Print Archive

A Semantic Web Annotation Tool for a Web-Based Audio Sequencer

Author: Akkermans V.
Restagno L.
Rizzo Giuseppe
Servetti Antonio
Publication venue: Springer
Publication date: 01/01/2011
Field of study

Music and sound have a rich semantic structure that is clear to the composer and the listener but remains mostly hidden to computing machinery. Nevertheless, in recent years, the introduction of software tools for music production has enabled new opportunities for migrating this knowledge from humans to machines. A new generation of these tools may exploit the coupling of sound samples and semantic information to create not only a musical but also a "semantic" composition. In this paper we describe an ontology-driven content annotation framework for a web-based audio editing tool. In a supervised approach, during the editing process, the graphical web interface allows the user to annotate any part of the composition with concepts from publicly available ontologies. As a test case, we developed a collaborative web-based audio sequencer that provides users with the functionality to remix the audio samples from the Freesound website and subsequently annotate them. The annotation tool can load any ontology and thus gives users the opportunity to augment the work with annotations on the structure of the composition, the musical materials, and the creator's reasoning and intentions. We believe this approach will provide several novel ways to make not only the final audio product, but also the creative process, first-class citizens of the Semantic Web.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino