Search CORE

44,904 research outputs found

Multimodal music information processing and retrieval: survey and future challenges

Author: Avanzini Federico
Ntalampiras Stavros
Simonetta Federico
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 14/02/2019
Field of study

Towards improving the performance in various music information processing tasks, recent studies exploit different modalities able to capture diverse aspects of music. Such modalities include audio recordings, symbolic music scores, mid-level representations, motion, and gestural data, video recordings, editorial or cultural tags, lyrics and album cover arts. This paper critically reviews the various approaches adopted in Music Information Processing and Retrieval and highlights how multimodal algorithms can help Music Computing applications. First, we categorize the related literature based on the application they address. Subsequently, we analyze existing information fusion approaches, and we conclude with the set of challenges that Music Information Retrieval and Sound and Music Computing research communities should focus in the next years

arXiv.org e-Print Archive

Crossref

Speech Separation Using Partially Asynchronous Microphone Arrays Without Resampling

Author: Corey Ryan M.
Singer Andrew C.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 31/07/2018
Field of study

We consider the problem of separating speech sources captured by multiple spatially separated devices, each of which has multiple microphones and samples its signals at a slightly different rate. Most asynchronous array processing methods rely on sample rate offset estimation and resampling, but these offsets can be difficult to estimate if the sources or microphones are moving. We propose a source separation method that does not require offset estimation or signal resampling. Instead, we divide the distributed array into several synchronous subarrays. All arrays are used jointly to estimate the time-varying signal statistics, and those statistics are used to design separate time-varying spatial filters in each array. We demonstrate the method for speech mixtures recorded on both stationary and moving microphone arrays.Comment: To appear at the International Workshop on Acoustic Signal Enhancement (IWAENC 2018

arXiv.org e-Print Archive

Crossref

The role of perceived source location in auditory stream segregation: separation affects sound organization, common fate does not

Author: Andreou Andreas G.
Bendixen Alexandra
Bőhm Tamás M.
Cassidy Andrew
Denham Susan L.
Garreau Guillame
Georgiou Julius
Pouliquen Philippe
Shestopalova Lidia
Winkler István
Publication venue: 'Akademiai Kiado Zrt.'
Publication date: 19/06/2013
Field of study

The human auditory system is capable of grouping sounds originating from different sound sources into coherent auditory streams, a process termed auditory stream segregation. Several cues can inﬂuence auditory stream segregation, but the full set of cues and the way in which they are integrated is still unknown. In the current study, we tested whether auditory motion can serve as a cue for segregating sequences of tones. Our hypothesis was that, following the principle of common fate, sounds emitted by sources moving together in space along similar trajectories will be more likely to be grouped into a single auditory stream, while sounds emitted by independently moving sources will more often be heard as two streams. Stimuli were derived from sound recordings in which the sound source motion was induced by walking humans. Although the results showed a clear effect of spatial separation, auditory motion had a negligible inﬂuence on stream segregation. Hence, auditory motion may not be used as a primitive cue in auditory stream segregation

Repository of the Academy's Library

A mechatronic approach to supernormal auditory localisation

Author: Harrison C.S.
Mair G. M.
Publication venue: 'Elsevier BV'
Publication date: 01/11/2007
Field of study

Remote audio perception is a fundamental requirement for telepresence and teleoperation in applications that range from work in hostile environments to security and entertainment. The following paper presents the use of a mechatronic system to test the efficacy of audio for telepresence. It describes work to determine whether the use of supernormal inter-aural distance is a valid means of approaching an enhanced method of hearing for telepresence. The particular audio variable investigated is the azimuth angle of error and the construction of a dedicated mechatronic test rig is reported and the results obtained. The paper concludes by observing that the combination of the mechatronic system and supernormal audition does enhance the ability to localise sound sources and that further work in this area is justified

University of Strathclyde Institutional Repository

Singing voice correction using canonical time warping

Author: Chen Ming-Tso
Chi Tai-Shih
Luo Yin-Jyun
Su Li
Publication venue
Publication date: 23/11/2017
Field of study

Expressive singing voice correction is an appealing but challenging problem. A robust time-warping algorithm which synchronizes two singing recordings can provide a promising solution. We thereby propose to address the problem by canonical time warping (CTW) which aligns amateur singing recordings to professional ones. A new pitch contour is generated given the alignment information, and a pitch-corrected singing is synthesized back through the vocoder. The objective evaluation shows that CTW is robust against pitch-shifting and time-stretching effects, and the subjective test demonstrates that CTW prevails the other methods including DTW and the commercial auto-tuning software. Finally, we demonstrate the applicability of the proposed method in a practical, real-world scenario

arXiv.org e-Print Archive

Crossref

Acoustic Space Learning for Sound Source Separation and Localization on Binaural Manifolds

Author: Deleforge Antoine
Forbes Florence
Horaud Radu
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 20/03/2014
Field of study

In this paper we address the problems of modeling the acoustic space generated by a full-spectrum sound source and of using the learned model for the localization and separation of multiple sources that simultaneously emit sparse-spectrum sounds. We lay theoretical and methodological grounds in order to introduce the binaural manifold paradigm. We perform an in-depth study of the latent low-dimensional structure of the high-dimensional interaural spectral data, based on a corpus recorded with a human-like audiomotor robot head. A non-linear dimensionality reduction technique is used to show that these data lie on a two-dimensional (2D) smooth manifold parameterized by the motor states of the listener, or equivalently, the sound source directions. We propose a probabilistic piecewise affine mapping model (PPAM) specifically designed to deal with high-dimensional data exhibiting an intrinsic piecewise linear structure. We derive a closed-form expectation-maximization (EM) procedure for estimating the model parameters, followed by Bayes inversion for obtaining the full posterior density function of a sound source direction. We extend this solution to deal with missing data and redundancy in real world spectrograms, and hence for 2D localization of natural sound sources such as speech. We further generalize the model to the challenging case of multiple sound sources and we propose a variational EM framework. The associated algorithm, referred to as variational EM for source separation and localization (VESSL) yields a Bayesian estimation of the 2D locations and time-frequency masks of all the sources. Comparisons of the proposed approach with several existing methods reveal that the combination of acoustic-space learning with Bayesian inference enables our method to outperform state-of-the-art methods.Comment: 19 pages, 9 figures, 3 table

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Fusion of Multimodal Information in Music Content Analysis

Author: Essid Slim
Publication venue: Dagstuhl Follow-Ups. Multimodal Music Processing
Publication date: 01/01/2012
Field of study

Music is often processed through its acoustic realization. This is restrictive in the sense that music is clearly a highly multimodal concept where various types of heterogeneous information can be associated to a given piece of music (a musical score, musicians\u27 gestures, lyrics, user-generated metadata, etc.). This has recently led researchers to apprehend music through its various facets, giving rise to "multimodal music analysis" studies. This article gives a synthetic overview of methods that have been successfully employed in multimodal signal analysis. In particular, their use in music content processing is discussed in more details through five case studies that highlight different multimodal integration techniques. The case studies include an example of cross-modal correlation for music video analysis, an audiovisual drum transcription system, a description of the concept of informed source separation, a discussion of multimodal dance-scene analysis, and an example of user-interactive music analysis. In the light of these case studies, some perspectives of multimodality in music processing are finally suggested

Dagstuhl Research Online Publication Server

Mapping brain function during naturalistic viewing using high-density diffuse optical tomography

Author: Bergonzi Karla M.
Burns-Yocum Tracy M.
Culver Joseph P.
Eggebrecht Adam T.
Fishell Andrew K.
Publication venue: Digital Commons@Becker
Publication date: 01/01/2019
Field of study

Digital Commons@Becker