312 research outputs found

Application of sound source separation methods to advanced spatial audio systems

Author: Cobos Serrano Máximo
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 03/12/2010

This thesis is related to the field of Sound Source Separation (SSS). It addresses the development and evaluation of these techniques for their application in the resynthesis of high-realism sound scenes by means of Wave Field Synthesis (WFS). Because the vast majority of audio recordings are preserved in two-channel stereo format, special up-converters are required to use advanced spatial audio reproduction formats, such as WFS. This is because WFS needs the original source signals to be available in order to accurately synthesize the acoustic field inside an extended listening area. Thus, an object-based mixing is required. Source separation problems in digital signal processing are those in which several signals have been mixed together and the objective is to find out what the original signals were. Therefore, SSS algorithms can be applied to existing two-channel mixtures to extract the different objects that compose the stereo scene. Unfortunately, most stereo mixtures are underdetermined, i.e., there are more sound sources than audio channels. This condition makes the SSS problem especially difficult, and stronger assumptions have to be made, often related to the sparsity of the sources under some signal transformation. This thesis is focused on the application of SSS techniques to the spatial sound reproduction field. As a result, its contributions can be categorized within these two areas. First, two underdetermined SSS methods are proposed to deal efficiently with the separation of stereo sound mixtures. These techniques are based on a multi-level thresholding segmentation approach, which enables fast and unsupervised separation of sound sources in the time-frequency domain. Although both techniques rely on the same clustering type, the features considered by each of them are related to different localization cues that enable the separation of either instantaneous or real mixtures. Additionally, two post-processing techniques aimed at improving the isolation of the separated sources are proposed. The performance achieved by several SSS methods in the resynthesis of WFS sound scenes is afterwards evaluated by means of listening tests, paying special attention to the change observed in the perceived spatial attributes. Although the estimated sources are distorted versions of the original ones, the masking effects involved in their spatial remixing make artifacts less perceptible, which improves the overall assessed quality. Finally, some novel developments related to the application of time-frequency processing to source localization and enhanced sound reproduction are presented.

Cobos Serrano, M. (2009). Application of sound source separation methods to advanced spatial audio systems [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8969
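The idea of separating an instantaneous stereo mixture by thresholding a localization cue in the time-frequency domain can be illustrated with a minimal Python sketch. It is not the thesis's method: the cue (`pan_index`, a simple level ratio), the function names, and the hand-picked thresholds are all assumptions made for illustration, and the abstract does not say how the thresholds are actually chosen. The sketch only shows the sparsity assumption at work: if each time-frequency bin is dominated by one source, labelling bins by their pan position groups them by source.

```python
from bisect import bisect_right


def pan_index(left_mag, right_mag):
    """Hypothetical localization cue: 0.0 = fully left, 1.0 = fully right."""
    total = left_mag + right_mag
    # A silent bin carries no localization information; place it in the centre.
    return 0.5 if total == 0 else right_mag / total


def segment_bins(left_mags, right_mags, thresholds):
    """Label each time-frequency bin with a class index from 0 to len(thresholds).

    `thresholds` must be sorted in ascending order. They are supplied by the
    caller here; choosing them automatically is outside this sketch.
    """
    return [
        bisect_right(thresholds, pan_index(l, r))
        for l, r in zip(left_mags, right_mags)
    ]


# Three bins, each dominated by a differently panned (hypothetical) source.
left_mags = [1.0, 0.5, 0.0]
right_mags = [0.0, 0.5, 1.0]
labels = segment_bins(left_mags, right_mags, thresholds=[0.33, 0.66])
print(labels)  # prints [0, 1, 2]
```

Each label can then serve as a binary mask: keeping only the bins with a given label and inverting the transform yields one estimated source.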

RiuNet

Real-time Sound Source Separation For Music Applications

Author: Barry Dan
Publication venue: Dublin Institute of Technology
Publication date: 01/01/2019

Sound source separation refers to the task of extracting individual sound sources from some number of mixtures of those sound sources. In this thesis, a novel sound source separation algorithm for musical applications is presented. It leverages the fact that the vast majority of commercially recorded music since the 1950s has been mixed down for two-channel reproduction, more commonly known as stereo. The algorithm presented in Chapter 3 of this thesis requires no prior knowledge or learning and performs the task of separation based purely on azimuth discrimination within the stereo field. The algorithm exploits the use of the pan pot as a means to achieve image localisation within stereophonic recordings. As such, only an interaural intensity difference exists between left and right channels for a single source. We use gain scaling and phase cancellation techniques to expose frequency-dependent nulls across the azimuth domain, from which source separation and resynthesis are carried out. The algorithm is demonstrated to be state of the art in the field of sound source separation, and also to be a useful pre-process for other tasks such as music segmentation and surround sound upmixing.
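The gain-scaling and phase-cancellation step described in the abstract can be illustrated with a minimal Python sketch. This is a simplified illustration, not the thesis's implementation: the function name `azimuth_nulls`, the resolution parameter `beta`, and the toy spectra are assumptions. The sketch only handles sources panned toward the right channel, so that the left-channel gain lies between 0 and 1 relative to the right. For each frequency bin, the right channel is scaled by a set of candidate gains and subtracted from the left; the gain at which the residual is smallest marks the null, and hence the pan position of the source dominating that bin.

```python
def azimuth_nulls(left, right, beta=10):
    """Return, per frequency bin, the gain index i in 0..beta that minimises
    |L(k) - (i / beta) * R(k)|.

    `left` and `right` are sequences of complex spectrum values for one frame.
    `beta` is an assumed azimuth resolution (number of gain steps).
    """
    null_indices = []
    for l_k, r_k in zip(left, right):
        # Scale the right channel by each candidate gain and cancel it
        # against the left channel; a panned source vanishes at its own gain.
        residuals = [abs(l_k - (i / beta) * r_k) for i in range(beta + 1)]
        null_indices.append(residuals.index(min(residuals)))
    return null_indices


# Two hypothetical sources, each occupying its own frequency bin.
source_a = 1 + 2j   # panned so that left = 0.2 * right
source_b = 3 - 1j   # panned so that left = 0.7 * right
left = [0.2 * source_a, 0.7 * source_b]
right = [source_a, source_b]
print(azimuth_nulls(left, right))  # prints [2, 7]
```

Bins whose nulls fall at the same gain index can be grouped and resynthesised as one source; sources panned toward the left would need the mirrored computation, with the left channel scaled and subtracted from the right.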

Arrow@TUDublin

A computational framework for sound segregation in music signals

Author: Martins Luís Gustavo Pereira Marques
Publication date: 01/01/2008

Doctoral thesis. Electrical and Computer Engineering. Faculdade de Engenharia. Universidade do Porto. 200

Repositório Aberto da Universidade do Porto

Spectromorphology and Spatiomorphology: Wave terrain synthesis as a framework for controlling timbre spatialisation in the frequency domain

Author: James Stuart
Publication venue: Edith Cowan University, Research Online, Perth, Western Australia
Publication date: 01/01/2015

This research project examines the scope of the technique of timbre spatialisation in the frequency domain that can be realised and controlled in live performance by a single performer. Existing implementations of timbre spatialisation take either a psychoacoustical approach – employing control rate signals for determining azimuth and distance cues – or an adoption of abstract structures for determining frequency-space modulations. This research project aims to overcome the logistical constraints of real-time multi-parameter mapping by developing an overarching multi-signal framework for control: wave terrain synthesis, an interactive control rate and audio rate system. Due to the precise timing requirements of vector-based FFT processes, spectral control data are generated in frames. Performed in MaxMSP, the project addresses notions of space and immersion using a practice-led methodology contributing to the creation of a number of compositions, performance software and an accompanying exegesis. In addition, the development and evaluation of timbre spatialisation software by the author is accompanied by a categorical definition of the spatial sound shapes generated.

Research Online @ ECU

Prediction-driven computational auditory scene analysis

Author: Ellis Daniel P. W.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/1996

The sound of a busy environment, such as a city street, gives rise to a perception of numerous distinct events in a human listener: the 'auditory scene analysis' of the acoustic information. Recent advances in the understanding of this process from experimental psychoacoustics have led to several efforts to build a computer model capable of the same function. This work is known as 'computational auditory scene analysis'. The dominant approach to this problem has been as a sequence of modules, the output of one forming the input to the next. Sound is converted to its spectrum, cues are picked out, and representations of the cues are grouped into an abstract description of the initial input. This 'data-driven' approach has some specific weaknesses in comparison to the auditory system: it will interpret a given sound in the same way regardless of its context, and it cannot 'infer' the presence of a sound for which direct evidence is hidden by other components. The 'prediction-driven' approach is presented as an alternative, in which analysis is a process of reconciliation between the observed acoustic features and the predictions of an internal model of the sound-producing entities in the environment. In this way, predicted sound events will form part of the scene interpretation as long as they are consistent with the input sound, regardless of whether direct evidence is found. A blackboard-based implementation of this approach is described which analyzes dense, ambient sound examples into a vocabulary of noise clouds, transient clicks, and a correlogram-based representation of wide-band periodic energy called the weft. The system is assessed through experiments that firstly investigate subjects' perception of distinct events in ambient sound examples, and secondly collect quality judgments for sound events resynthesized by the system. Although the resyntheses were rated as far from perfect, there was good agreement between the events detected by the model and by the listeners. In addition, the experimental procedure does not depend on special aspects of the algorithm (other than the generation of resyntheses), and is applicable to the assessment and comparison of other models of human auditory organization.

DSpace@MIT

Columbia University Academic Commons

Sound Source Separation

Author: Evangelista G, Marchand S, Plumbley MD, Vincent E
Publication venue: 'Wiley'
Publication date: 01/01/2011

This is the author's accepted pre-print of the article, first published as G. Evangelista, S. Marchand, M. D. Plumbley and E. Vincent. Sound source separation. In U. Zölzer (ed.), DAFX: Digital Audio Effects, 2nd edition, Chapter 14, pp. 551-588. John Wiley & Sons, March 2011. ISBN 9781119991298. DOI: 10.1002/9781119991298.ch14

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

Queen Mary Research Online

Surrey Research Insight

HAL-Rennes 1

Statistical models for natural sounds

Author: Turner R.E.
Publication venue: UCL (University College London)
Publication date: 01/01/2010

It is important to understand the rich structure of natural sounds in order to solve important tasks, like automatic speech recognition, and to understand auditory processing in the brain. This thesis takes a step in this direction by characterising the statistics of simple natural sounds. We focus on the statistics because perception often appears to depend on them, rather than on the raw waveform. For example, the perception of auditory textures, like running water, wind, fire and rain, depends on summary statistics, like the rate of falling rain droplets, rather than on the exact details of the physical source. In order to analyse the statistics of sounds accurately it is necessary to improve a number of traditional signal processing methods, including those for amplitude demodulation, time-frequency analysis, and sub-band demodulation. These estimation tasks are ill-posed and therefore it is natural to treat them as Bayesian inference problems. The new probabilistic versions of these methods have several advantages. For example, they perform more accurately on natural signals and are more robust to noise; they can also fill in missing sections of data and provide error bars. Furthermore, free parameters can be learned from the signal. Using these new algorithms we demonstrate that the energy, sparsity, modulation depth and modulation time-scale in each sub-band of a signal are critical statistics, together with the dependencies between the sub-band modulators. In order to validate this claim, a model containing co-modulated coloured noise carriers is shown to be capable of generating a range of realistic sounding auditory textures. Finally, we explore the connection between the statistics of natural sounds and perception. We demonstrate that inference in the model for auditory textures qualitatively replicates the primitive grouping rules that listeners use to understand simple acoustic scenes. This suggests that the auditory system is optimised for the statistics of natural sounds.

CiteSeerX

UCL Discovery

Expression of gender in the human voice: investigating the “gender code”

Author: Cartei Valentina
Publication date: 01/01/2014

We can easily and reliably identify the gender of an unfamiliar interlocutor over the telephone. This is because our voice is “sexually dimorphic”: men typically speak with a lower fundamental frequency (F0 - lower pitch) and lower vocal tract resonances (ΔF – “deeper” timbre) than women. While the biological bases of these differences are well understood, and mostly down to size differences between men and women, very little is known about the extent to which we can play with these differences to accentuate or de-emphasise our perceived gender, masculinity and femininity in a range of social roles and contexts. The general aim of this thesis is to investigate the behavioural basis of gender expression in the human voice in both children and adults. More specifically, I hypothesise that, on top of the biologically determined sexual dimorphism, humans use a “gender code” consisting of vocal gestures (global F0 and ΔF adjustments) aimed at altering the gender attributes conveyed by their voice. In order to test this hypothesis, I first explore how acoustic variation of sexually dimorphic acoustic cues (F0 and ΔF) relates to physiological differences in pre-pubertal speakers (vocal tract length) and adult speakers (body height and salivary testosterone levels), and show that voice gender variation cannot be solely explained by static, biologically determined differences in vocal apparatus and body size of speakers. Subsequently, I show that both children and adult speakers can spontaneously modify their voice gender by lowering (raising) F0 and ΔF to masculinise (feminise) their voice, a key ability for the hypothesised control of voice gender. Finally, I investigate the interplay between voice gender expression and social context in relation to cultural stereotypes. I report that listeners spontaneously integrate stereotypical information in the auditory and visual domain to make stereotypical judgments about children’s gender and that adult actors manipulate their gender expression in line with stereotypical gendered notions of homosexuality. Overall, this corpus of data supports the existence of a “gender code” in human nonverbal vocal communication. This “gender code” provides not only a methodological framework with which to empirically investigate variation in voice gender and its role in expressing gender identity, but also a unifying theoretical structure to understand the origins of such variation from both evolutionary and social perspectives.

Sussex Research Online

A new user interface for musical timbre design

Author: Seago Allan
Publication date: 01/01/2009

This thesis characterises and addresses problems and issues associated with the design of intuitive user interfaces for timbral control. The usability of a range of synthesis methods and representative implementations of these methods is assessed, and three interface architectures - fixed architecture, architecture specification and direct specification - are identified. The characteristics of each of these architectures, as well as the usability problems inherent to each of them, are discussed; it is argued that none of them provides intuitive tools for the manipulation and control of timbre. The study examines the nature of timbre and the notion of timbre space; different kinds of timbre space are considered and criteria are proposed for the selection of suitable timbre spaces as vehicles for synthesis. A number of listening tests, designed to demonstrate the feasibility of subsequent work, were devised and carried out; the results of these tests provide evidence that, where Euclidean distances between sounds located in a given timbre space are reflected in perceptual distances, the ability of subjects to detect relative distances in different parts of the space varies with the perceptual granularity of the space. Three contrasting timbre spaces conforming to the proposed criteria for use in synthesis are constructed; the purpose of these spaces is to provide an environment for a novel user interaction approach for timbral design which incorporates a search strategy based on weighted centroid localization. Two prototypes which exemplify the proposed approach in alternative ways are designed, implemented and tested with potential users in order to validate the approach; a third prototype, which represents a simple contrasting alternative, is tested for purposes of comparison. The results of these tests are evaluated and discussed, and areas of further work are identified.

Open Research Online (The Open University)

OpenGrey Repository

Measurement-based auralization methodology for the assessment of noise mitigation measures

Author: Algazi, Aylor, Blauert, Chen, Cho, Defrance, Dick Botteldooren, Evans, Forssén, Gardner, Golub, Grantham, Jagla, Jonasson, Kistler, Langendijk, Na, Perrott, Pierce, Pieter Thomas, Pollow, Ribeiro, Salomons, Timothy Van Renterghem, Van Renterghem, Weigang Wei, Zhang
Publication venue: 'Elsevier BV'
Publication date: 01/01/2016

Crossref

Ghent University Academic Bibliography