321 research outputs found

Modification of multichannel audio for non-standard loudspeaker configurations

Author: Porkka Oskari
Publication venue: Teknillinen korkeakoulu
Publication date: 01/01/2009

In this thesis, analysis and decomposition methods for multichannel audio are studied. The objective of the work is to adapt multichannel recordings to new loudspeaker configurations so that the spatial properties of the sound are preserved. Spatial hearing of the human auditory system, signal-based similarity and localization measures, and information-technological source separation methods are described as background theory. Then, different multichannel audio transform methods from the literature are reviewed.
The experimental part of the work starts with an analysis of DVD recordings to gain information about the production methods of such recordings for the further development of audio transform methods. The analysis reveals that the three frontal channels only rarely share common sound components with the two rear channels. The properties of compact loudspeaker systems are investigated in two listening tests. The first test studies the differences between three-channel loudspeaker layouts that exploit the reflections of sound waves from the room walls. The second test applies three transform methods known from the literature to widen the spatial impression of a three-channel compact loudspeaker system in comparison to a reference stereo system. These methods are a stereo signal transform method based on signal powers and interchannel cross-correlations, a primary-ambient signal decomposition based on principal component analysis (PCA), and directional audio coding (DirAC). The test subjects ranked the methods in this descending order of preference, and two of the methods were found to improve the spatial properties studied.
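The PCA-based decomposition named in the abstract above can be illustrated with a minimal sketch. It assumes a broadband, time-domain stereo signal and treats the dominant eigenvector of the channel covariance as the "primary" direction; the function name `pca_primary_ambient` is hypothetical, and the thesis may well apply the idea per frequency band or per time frame instead.

```python
import numpy as np

def pca_primary_ambient(stereo):
    """Split a (2, n) stereo array into primary and ambient parts via PCA.

    Assumption: the primary (directional) sound is the component that is
    most correlated between the two channels; the residual is ambience.
    """
    cov = stereo @ stereo.T / stereo.shape[1]    # 2x2 channel covariance
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    u = eigvecs[:, -1:]                          # principal direction, shape (2, 1)
    primary = u @ (u.T @ stereo)                 # projection onto that direction
    ambient = stereo - primary                   # whatever the projection leaves
    return primary, ambient
```

By construction the two parts sum back to the input, so the split can be remixed to a different loudspeaker layout without losing signal energy.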

Aaltodoc Publication Archive

Application of sound source separation methods to advanced spatial audio systems

Author: Cobos Serrano Máximo
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 03/12/2010

This thesis is related to the field of Sound Source Separation (SSS). It addresses the development and evaluation of these techniques for their application in the resynthesis of high-realism sound scenes by means of Wave Field Synthesis (WFS). Because the vast majority of audio recordings are preserved in two-channel stereo format, special up-converters are required to use advanced spatial audio reproduction formats such as WFS. This is because WFS needs the original source signals to be available in order to accurately synthesize the acoustic field inside an extended listening area. Thus, object-based mixing is required. Source separation problems in digital signal processing are those in which several signals have been mixed together and the objective is to find out what the original signals were. Therefore, SSS algorithms can be applied to existing two-channel mixtures to extract the different objects that compose the stereo scene. Unfortunately, most stereo mixtures are underdetermined, i.e., there are more sound sources than audio channels. This condition makes the SSS problem especially difficult, and stronger assumptions have to be made, often related to the sparsity of the sources under some signal transformation. This thesis is focused on the application of SSS techniques to the spatial sound reproduction field. As a result, its contributions can be categorized within these two areas. First, two underdetermined SSS methods are proposed to deal efficiently with the separation of stereo sound mixtures. These techniques are based on a multi-level thresholding segmentation approach, which enables fast and unsupervised separation of sound sources in the time-frequency domain.
Although both techniques rely on the same clustering type, the features considered by each of them are related to different localization cues that enable the separation of either instantaneous or real mixtures. Additionally, two post-processing techniques aimed at improving the isolation of the separated sources are proposed. The performance achieved by several SSS methods in the resynthesis of WFS sound scenes is afterwards evaluated by means of listening tests, paying special attention to the change observed in the perceived spatial attributes. Although the estimated sources are distorted versions of the original ones, the masking effects involved in their spatial remixing make artifacts less perceptible, which improves the overall assessed quality. Finally, some novel developments related to the application of time-frequency processing to source localization and enhanced sound reproduction are presented.
Cobos Serrano, M. (2009). Application of sound source separation methods to advanced spatial audio systems [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8969

RiuNet

Binaural to multichannel audio upmix

Author: Jakka Julia
Publication venue: Teknillinen korkeakoulu
Publication date: 01/01/2005

The increasing diversity of popular audio recording and playback systems makes it important that recordings made with any equipment, as well as any synthesised audio, can be adapted for playback on all types of devices. In this thesis, a method is introduced for upmixing binaural audio into a multichannel format while preserving the directional information of the signal.
This type of upmix is required when a binaural recording is to be reproduced spatially over a multichannel loudspeaker setup, a scenario typical of, e.g., prospective telepresence applications. In the upmix method, the sound source directions are first estimated from the binaural signal by using the interaural time difference. The signal is then downmixed into a monophonic format, and the data given by the azimuth estimation is stored as side information. The monophonic signal is upmixed for an arbitrary multichannel loudspeaker setup by panning it on the basis of the spatial side information. The method thus effectively converts interaural time differences into interchannel level differences, and it employs and combines existing techniques for azimuth estimation and discrete panning. The method was tested in an informal listening test, as well as by adding binaural background noise to the samples before upmixing and evaluating its influence on the sound quality of the upmixed samples. The method was found to perform acceptably well in maintaining both the spatiality and the sound quality, considering that its development is still at an early stage.
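The chain described in this abstract (estimate the interaural time difference, map it to an azimuth, then pan a mono downmix) can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the ITD is taken from the peak of a broadband cross-correlation, the azimuth comes from a simple far-field sine model with an assumed ear spacing of 0.18 m, and the panning uses the tangent law for a loudspeaker pair at ±30°. All function names and constants are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s
EAR_DISTANCE = 0.18      # m, assumed spacing between the ears

def estimate_itd(left, right, fs):
    """ITD in seconds; positive when the right-ear signal lags (source on the left)."""
    corr = np.correlate(right, left, mode="full")
    lags = np.arange(-(len(left) - 1), len(right))
    max_lag = int(np.ceil(EAR_DISTANCE / SPEED_OF_SOUND * fs))
    keep = np.abs(lags) <= max_lag               # physically plausible lags only
    return lags[keep][np.argmax(corr[keep])] / fs

def itd_to_azimuth(itd):
    """Far-field sine model: itd = (d / c) * sin(azimuth). Returns radians."""
    s = np.clip(itd * SPEED_OF_SOUND / EAR_DISTANCE, -1.0, 1.0)
    return np.arcsin(s)

def tangent_law_gains(azimuth, half_span=np.radians(30)):
    """Power-normalised (left, right) gains for a loudspeaker pair at +/- half_span."""
    az = np.clip(azimuth, -half_span, half_span)
    ratio = np.tan(az) / np.tan(half_span)       # equals (gL - gR) / (gL + gR)
    g_left, g_right = 1.0 + ratio, 1.0 - ratio
    norm = np.hypot(g_left, g_right)
    return g_left / norm, g_right / norm
```

A mono downmix such as `0.5 * (left + right)` would then be scaled by the two gains, which is how a time difference between the ears ends up as a level difference between loudspeaker channels.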

Aaltodoc Publication Archive

Natural sound rendering for headphones: . . .

Author: Kaushik Sunder et al.
Publication date: 01/01/2015

With the strong growth of assistive and personal listening devices, natural sound rendering over headphones is becoming a necessity for prolonged listening in multimedia and virtual reality applications. The aim of natural sound rendering is to recreate sound scenes with spatial and timbral quality that is as natural as possible, so as to achieve a truly immersive listening experience. However, rendering natural sound over headphones faces many challenges. This tutorial paper presents signal processing techniques that tackle these challenges to assist human listening.

CiteSeerX

Spatial Audio Production for Immersive Media Experiences: Perspectives on practice-led approaches to designing immersive audio content

Author: Baume Chris, Murphy Damian Thomas, Pike Chris, Turner Daniel
Publication venue: 'Intellect'
Publication date: 01/10/2022

Sound design with the goal of immersion is not new. However, sound design for Immersive Media Experiences (IMEs) utilizing spatial audio can still be considered a relatively new area of practice, with less well-defined methods that require a new and still emerging set of skills and tools. There is, at present, a lack of formal literature on the challenges introduced by this relatively new content form and the tools used to create it, and on how these may differ from audio production for traditional media. Through semi-structured interviews and an online questionnaire, this article explores what audio practitioners view as the defining features of IMEs, the challenges in creating audio content for IMEs, and how current practices for traditional stereo productions are being adapted for use within 360 interactive soundfields. It also highlights potential directions for future research and technological development, and the importance of practitioner involvement in research and development in ensuring that future tools and technologies satisfy current needs.

White Rose Research Online

Real-time Sound Source Separation For Music Applications

Author: Barry Dan
Publication venue: Dublin Institute of Technology
Publication date: 01/01/2019

Sound source separation refers to the task of extracting individual sound sources from some number of mixtures of those sound sources. In this thesis, a novel sound source separation algorithm for musical applications is presented. It leverages the fact that the vast majority of commercially recorded music since the 1950s has been mixed down for two-channel reproduction, more commonly known as stereo. The algorithm presented in Chapter 3 of this thesis requires no prior knowledge or learning and performs the task of separation based purely on azimuth discrimination within the stereo field. The algorithm exploits the use of the pan pot as a means to achieve image localisation within stereophonic recordings. As such, only an interaural intensity difference exists between left and right channels for a single source. We use gain scaling and phase cancellation techniques to expose frequency-dependent nulls across the azimuth domain, from which source separation and resynthesis are carried out. The algorithm is demonstrated to be state of the art in the field of sound source separation and also to be a useful pre-process for other tasks such as music segmentation and surround sound upmixing.
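The gain-scaling and cancellation idea in this abstract can be illustrated with a short sketch. It assumes a single analysis frame and sources that are intensity-panned so that the left channel is the quieter one; for each frequency bin it scans candidate gains and looks for the gain at which the scaled right channel cancels the left. The function names are hypothetical, and the sketch covers only the null-finding step, not the separation and resynthesis stages of the thesis's algorithm.

```python
import numpy as np

def azimuth_plane(left_frame, right_frame, n_gains=101):
    """Return candidate gains and |L(k) - g * R(k)| for every bin k and gain g."""
    L = np.fft.rfft(left_frame)
    R = np.fft.rfft(right_frame)
    gains = np.linspace(0.0, 1.0, n_gains)
    plane = np.abs(L[:, None] - gains[None, :] * R[:, None])
    return gains, plane

def null_gain_per_bin(gains, plane):
    """Gain at which each frequency bin is most strongly cancelled."""
    return gains[np.argmin(plane, axis=1)]
```

For a source mixed as `left = a * s` and `right = s` with `0 <= a <= 1`, the null for that source's bins falls at `g = a`, so the position of the null along the gain axis indicates where the source sits in the stereo field.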

Arrow@TUDublin

Audio for Virtual, Augmented and Mixed Realities: Proceedings of ICSA 2019 ; 5th International Conference on Spatial Audio ; September 26th to 28th, 2019, Ilmenau, Germany

Author: Verband Deutscher Tonmeister
Publication date: 20/11/2019

ICSA 2019 brings together developers, scientists, users, and content creators of and for spatial audio systems and services across disciplines. A special focus is on audio for so-called virtual, augmented, and mixed realities. The fields of ICSA 2019 are:
- Development and scientific investigation of technical systems and services for spatial audio recording, processing and reproduction
- Creation of content for reproduction via spatial audio systems and services
- Use and application of spatial audio systems and content presentation services
- Media impact of content and spatial audio systems and services from the point of view of media science
ICSA 2019 is organized by VDT and TU Ilmenau with the support of the Fraunhofer Institute for Digital Media Technology IDMT.

Digitale Bibliothek Thüringen

SIGNAL TRANSFORMATIONS FOR IMPROVING INFORMATION REPRESENTATION, FEATURE EXTRACTION AND SOURCE SEPARATION

Author: G. Presti
Publication venue: Università degli Studi di Milano
Publication date: 28/02/2017

This thesis is about new methods of signal representation in the time-frequency domain, such that the information sought is rendered as explicit dimensions of a new space.
In particular, two transforms are presented: the Bivariate Mixture Space and the Spectro-Temporal Structure-Field. The former aims at highlighting the latent components of a bivariate signal based on the behaviour of each frequency component (e.g. for source separation purposes), whereas the latter aims at folding the neighbourhood information of each point of a function on R^2 into a vector associated with that point, so as to describe some topological properties of the function. In the audio signal processing domain, the Bivariate Mixture Space can be interpreted as a way to investigate the stereophonic space for source separation and Music Information Retrieval tasks, whereas the Spectro-Temporal Structure-Field can be used to inspect the spectro-temporal space (to segregate pitched from percussive sounds or to track frequency modulations). These transforms are investigated and tested against state-of-the-art techniques in fields such as source separation, information retrieval and data visualization. In the field of sound and music computing, these techniques aim at improving the time-frequency representation of signals so that the spectrum can also be explored in alternative spaces, such as the stereophonic panorama or a virtual percussive vs. pitched dimension.

AIR Universita degli studi di Milano

Perception of Reverberation in Domestic and Automotive Environments

Author: Kaplanis Neofytos
Publication venue: Aalborg Universitetsforlag
Publication date: 01/12/2016

nrpages: 227; status: published

Lirias

VBN

The role that sound spatialization plays in improving performance in an interactive installation : study of the correlation between gesture and localization of sound sources in space

Author: Franco Diogo Leichsenring
Publication date: 01/01/2018

The main objective of this research work is to study the correlation between gesture and the localization of sound sources in space within the framework of interactive installations, based on theories of hearing and gesture. When we began our investigation, we found several studies that addressed the topics of "gesture" and "sound source localization" in various ways: 1) independently of each other and/or in contexts other than interactive installations, as in Blauert (1997), Pulkki (1999), Pulkki & Karjalainen (2001), Pulkki (2001a), Bates et al. (2007), Hammershøi (2009), McNeill (1992), Coutaz & Crowley (1995), Choi (2000), Cadoz & Wanderley (2000), Nehaniv (2005), Campbell (2005), or Godøy & Leman (2010); 2) from a more technical point of view, as in Harada et al. (1992), Jensenius et al. (2006), Marshall et al. (2006), Schacher (2007), Neukom & Schacher (2008), Zelli (2009), Marshall et al. (2009), Bhuiyan & Picking (2009), or Schumacher & Bresson (2010); or 3) from a more artistic point of view, as in Bencina et al. (2008) or Grigoriou & Floros (2010). There were, however, very few studies that involved both topics and analysed their relationships jointly from a more perceptual point of view, as in Gröhn (2002), de Götzen (2004) or Marentakis et al. (2008). It is this last perspective that we decided to follow and that we explore here.
We therefore chose the experimental method, applying a repeated-measures design and developing an interactive installation with which we carry out three different experiments, in which a subject's hand is tracked by a Microsoft Kinect depth camera (motion capture) and a deictic gesture is used to trigger recorded music sounds and to identify their localization in the horizontal listening plane. We thus manipulate the direction of sound and measure the percentage of correct perceptual sound source localizations, derived from the participants' responses to a questionnaire survey, in comparison with the actual directions of the deictic gesture and of the perceptual sound sources provided by the software used in our work. As our target population we considered people with musical knowledge and people with little or no musical knowledge, which led us to ask a large number of people for their voluntary, anonymous and unconstrained participation in our study. This was done mainly by sending e-mail to friends, to students of different fields attending, and colleagues working at, the School of Arts of the Universidade Católica Portuguesa (EA-UCP), the Escola Superior de Música e Artes do Espetáculo of the Instituto Politécnico do Porto, and the Academia de Música de Espinho.
In addition, it was crucial to talk to friends and family and to inform as many people as possible about our research by putting up informational posters on the corridor walls of the Universidade Católica a few days before the experiments were carried out in the Motion Capture Laboratory of EA-UCP. Finally, a descriptive and inferential statistical analysis of the collected data is performed. The main results show that it is easier to determine the origin of a sound when its incidence is frontal in the horizontal listening plane, and that auditory perception is more accurate in that direction, just as sound source localization theory predicts. Whereas 86.1% of all participants consider that their deictic gesture coincides with the origin of sound in experiment 1, in which using the gesture in a given direction triggers a sound from that direction, only 58.1% report the same in experiment 3, in which the same gesture is used to identify the system-predetermined localization of a perceptual sound source within an angle of 260° around the subject. This latter percentage seems to be due to the fact that most sounds in experiment 3 are produced from lateral directions, taking the position of the head facing the camera as the reference. At least 55.9% of all volunteers do not notice that their gesture could not have coincided with the origin of sound in experiment 2, since the sound is produced from the opposite surround direction. This seems to demonstrate that, when sounds are produced frontally or from behind and a person has the task of controlling their movements with a deictic gesture at the same time, their ability to identify the origin of sound is generally even lower, in addition to the already well-known reduced ability to identify it when the sound lies in the median plane, if the head is not rotated.
Most participants feel immediate control over the sound in experiments 1 and 2, but the times they estimate are considerably longer than the approximately 650 milliseconds a human needs to hear and react to a sound in our interactive installation. We also found that the average time needed to localize sounds using a deictic gesture in experiment 3 is about 10 seconds, considerably longer than the 3 seconds we had assumed. In addition, volunteers make on average 2 attempts to localize sounds with their deictic gestures, and on average need to hear each sound in full only once to localize it. The deviations to the left and to the right made by most participants relative to the true sound directions, when they try to identify the system-predetermined localizations of the perceptual sound sources with their deictic gestures in the peripheral zone of the body, are on average 7.97° and -7.19°, respectively. The mean absolute deviation is thus 7.76°. Comparing these deviations with those made by participants using the left hand (6.86° to the left and -6.35° to the right of the true sound directions) and with those using the right hand (8.46° to the left and -7.38° to the right of the true sound directions), we conclude that the results are quite similar to one another. We found that most volunteers estimate a much longer time than the 2 seconds we had assumed experimentally for understanding each of the three experiments.
Moreover, this time estimated by the participants decreases from the first to the last experiment, apparently owing to familiarization with our interactive system, which we deliberately induced by imposing the same sequence of experiments on every participant, although they consider that they understood each of the three experiments quickly. Furthermore, most volunteers interact easily with our installation and agree that the gesture we suggested was appropriately chosen for each of the three experiments. We also find that participants consider the system's response to the gesture to be immediate in all three experiments, that is, they estimate about 1 second, which is consistent with the measured system latency of about 470 milliseconds. In addition, most volunteers feel enveloped by the sound in our interactive installation using Ambisonics Equivalent Panning. We therefore conclude that, using an interactive installation like ours with a target audience similar to the one we had, there is a relatively high correlation between gesture and the localization of sound sources in space, but that it is not as perfect as it could be owing to the limitations of the human auditory system and, apparently, to the natural dependence of head movement on gesture. It thus appears that sound spatialization can improve performance in an interactive installation, but only moderately. Even so, we argue that a system like ours could be applied to advantage in diverse domains such as those we present as examples.

Repositório Institucional da Universidade Católica Portuguesa