1,971 research outputs found

    Resynthesis of Acoustic Scenes Combining Sound Source Separation and WaveField Synthesis Techniques

    Full text link
    Source Separation has been a subject of intense research in many signal processing applications, ranging from speech processing to medical image analysis. Applied to spatial audio systems, it can be used to overcome a fundamental limitation in 3D scene resynthesis: the need to have the independent signal of each source available. Wave Field Synthesis (WFS) is a spatial sound reproduction system that can synthesize an acoustic field by means of loudspeaker arrays and position several sources in space. However, the individual signals corresponding to these sources must be available, and obtaining them is often a difficult problem. In this work, we propose to use Sound Source Separation techniques to obtain the different tracks from mono or stereo mixtures. Several separation methods have been implemented and tested, one of them developed by the author. Although existing algorithms are far from achieving hi-fi quality, subjective tests show that an optimal separation is not necessary to obtain acceptable results in 3D scene reproduction.
    Cobos Serrano, M. (2007). Resynthesis of Acoustic Scenes Combining Sound Source Separation and WaveField Synthesis Techniques. http://hdl.handle.net/10251/12515
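
    Once the tracks have been separated, each one is rendered by the loudspeaker array as a virtual source. As a rough illustration of that rendering stage, here is a minimal delay-and-attenuate sketch of WFS driving signals for a virtual point source; the geometry handling and the simplified 1/sqrt(r) amplitude model are assumptions for illustration, not the thesis implementation.

```python
# Minimal sketch: feed one separated source track to a WFS loudspeaker
# array as a virtual point source, using a textbook delay-and-attenuate
# simplification. Names and geometry are illustrative assumptions.
import numpy as np

C = 343.0  # speed of sound (m/s)

def wfs_feeds(track, fs, src_pos, speaker_pos):
    """track: mono source signal; src_pos: (x, y) of the virtual source;
    speaker_pos: (N, 2) loudspeaker positions. Returns (N, M) feeds."""
    dists = np.linalg.norm(speaker_pos - np.asarray(src_pos), axis=1)
    delays = np.round(dists / C * fs).astype(int)    # per-speaker delay (samples)
    gains = 1.0 / np.sqrt(np.maximum(dists, 1e-3))   # simplified amplitude decay
    out = np.zeros((len(speaker_pos), len(track) + int(delays.max())))
    for n, (d, g) in enumerate(zip(delays, gains)):
        out[n, d:d + len(track)] = g * track         # delayed, scaled copy
    return out
```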

    Natural sound rendering for headphones: . . .

    Get PDF
    With the strong growth of assistive and personal listening devices, natural sound rendering over headphones is becoming a necessity for prolonged listening in multimedia and virtual reality applications. The aim of natural sound rendering is to recreate sound scenes with spatial and timbral quality that is as natural as possible, so as to achieve a truly immersive listening experience. However, rendering natural sound over headphones encounters many challenges. This tutorial paper presents signal processing techniques to tackle these challenges and assist human listening.
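
    The abstract does not name specific techniques, but binaural rendering via head-related impulse response (HRIR) convolution is a standard building block of headphone spatialization; the sketch below is an illustration under that assumption, with placeholder HRIR inputs.

```python
# Illustrative sketch only: convolve a mono source with an HRIR pair so it
# appears to arrive from the direction the pair encodes. A real system
# would load a measured HRTF set rather than placeholder arrays.
import numpy as np
from scipy.signal import fftconvolve

def binauralize(mono, hrir_left, hrir_right):
    """Render a mono signal at the direction encoded by an HRIR pair
    (both HRIRs assumed to have equal length)."""
    left = fftconvolve(mono, hrir_left)
    right = fftconvolve(mono, hrir_right)
    return np.stack([left, right])  # (2, N) binaural signal for headphones
```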

    The Bird's Ear View: Audification for the Spectral Analysis of Heliospheric Time Series Data.

    Full text link
    The sciences are inundated with a tremendous volume of data, and the analysis of rapidly expanding data archives presents a persistent challenge. Previous research in the field of data sonification suggests that auditory display may serve a valuable function in the analysis of complex data sets. This dissertation uses the heliospheric sciences as a case study to empirically evaluate the use of audification (a specific form of sonification) for the spectral analysis of large time series. Three primary research questions guide this investigation, the first of which addresses the comparative capabilities of auditory and visual analysis methods in applied analysis tasks. A number of controlled within-subject studies revealed a strong correlation between auditory and visual observations, and demonstrated that auditory analysis provided heightened sensitivity and accuracy in the detection of spectral features. The second research question addresses the capability of audification methods to reveal features that may be overlooked through visual analysis of spectrograms. A number of open-ended analysis tasks quantitatively demonstrated that participants using audification regularly discovered a greater percentage of embedded phenomena such as low-frequency wave storms. In addition, four case studies document collaborative research initiatives in which audification contributed to the acquisition of new domain-specific knowledge. The final question explores the potential benefits of audification when introduced into the workflow of a research scientist. A case study is presented in which a heliophysicist incorporated audification into their working practice, and the “Think-Aloud” protocol is applied to gain a sense of how audification augmented the researcher’s analytical abilities. Auditory observations are demonstrated to make significant contributions to ongoing research, including the detection of previously unidentified equipment-induced artifacts. This dissertation provides three primary contributions to the field: 1) an increased understanding of the comparative capabilities of auditory and visual analysis methods, 2) a methodological framework for conducting audification that may be transferred across scientific domains, and 3) a set of well-documented cases in which audification was applied to extract new knowledge from existing data archives. Collectively, this work presents a “bird’s ear view” afforded by audification methods: a macro understanding of time series data that preserves micro-level detail.
    PhD dissertation, Design Science, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/111561/1/rlalexan_1.pd
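
    Audification, in the sense evaluated here, maps the data samples directly to an audio waveform so that long stretches of measurements compress into seconds of sound. A minimal sketch of that mapping follows; the 44.1 kHz playback rate and the normalization scheme are illustrative choices, not those of the dissertation.

```python
# Minimal audification sketch: the raw time series itself becomes the
# audio waveform. Playback rate and normalization are assumptions.
import numpy as np
from scipy.io import wavfile

def audify(series, out_path, playback_rate=44100):
    x = np.asarray(series, dtype=np.float64)
    x = x - x.mean()                   # remove DC offset
    x = x / (np.abs(x).max() + 1e-12)  # normalize to [-1, 1]
    wavfile.write(out_path, playback_rate, (x * 32767).astype(np.int16))
```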

    Modification of multichannel audio for non-standard loudspeaker configurations

    Get PDF
    In this thesis, analysis and decomposition methods for multichannel audio are studied. The objective of the work is to transform multichannel recordings to new reproduction systems so that the spatial properties of the sound are preserved. Spatial hearing of the human auditory system, signal-based similarity and localization measures, and source separation methods are described as background theory. Then, different multichannel audio transform methods from the literature are reviewed. The experimental part of the work starts with an analysis of DVD recordings to gain helpful information about the production methods of such recordings for further development of audio transform methods. The test reveals that the three frontal channels rarely share common sound sources with the two rear channels. The properties of compact loudspeaker systems are investigated in two listening tests. The first test studies the differences between three-channel loudspeaker layouts that exploit the reflections of sound waves from room boundaries. The second test applies three transform methods known from the literature to widen the spatial dimensions of a three-channel compact loudspeaker system in comparison to a reference stereo system. These methods are a stereo signal transform based on signal powers and interchannel cross-correlations, a primary-ambient signal decomposition based on principal component analysis (PCA), and directional audio coding (DirAC). The test subjects ranked the methods in this descending order of preference.
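
    Of the three transforms compared, the PCA-based primary-ambient decomposition is the most compact to illustrate. Below is a minimal broadband sketch; practical systems apply the same idea per time-frequency tile, so this simplification is an assumption for illustration.

```python
# Minimal sketch of primary-ambient decomposition of a stereo signal via
# PCA: the dominant eigenvector of the channel covariance defines the
# primary (directional) component, the residual is the ambient part.
import numpy as np

def primary_ambient_pca(stereo):
    """stereo: (2, N) array. Returns (primary, ambient), both (2, N)."""
    cov = stereo @ stereo.T / stereo.shape[1]  # 2x2 channel covariance
    _, v = np.linalg.eigh(cov)                 # eigenvectors, ascending order
    u = v[:, -1:]                              # dominant direction = primary
    primary = u @ (u.T @ stereo)               # projection onto that axis
    ambient = stereo - primary                 # residual = ambient component
    return primary, ambient
```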

Signal Transformations for Improving Information Representation, Feature Extraction and Source Separation

    Get PDF
    This thesis concerns new methods of signal representation in the time-frequency domain, such that the information of interest is rendered as explicit dimensions of a new space. In particular, two transformations are presented: the Bivariate Mixture Space and the Spectro-Temporal Structure-Field. The former aims at highlighting latent components of a bivariate signal based on the behaviour of each frequency base (e.g. for source separation purposes), whereas the latter aims at folding neighbourhood information of each point of an R^2 function into a vector, so as to describe some topological properties of the function. In the audio signal processing domain, the Bivariate Mixture Space can be interpreted as a way to investigate the stereophonic space for source separation and Music Information Retrieval tasks, whereas the Spectro-Temporal Structure-Field can be used to inspect the spectro-temporal dimension (segregating pitched versus percussive sounds or tracking pitch modulations). These transformations are investigated and tested against state-of-the-art techniques in fields such as source separation, information retrieval and data visualization. In the field of sound and music computing, these techniques aim at improving the frequency-domain representation of signals so that the spectrum can also be explored in alternative spaces, such as the stereophonic panorama or a virtual percussive-versus-pitched dimension.
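
    The Spectro-Temporal Structure-Field itself is not reproduced here; as an illustration of the pitched-versus-percussive segregation it targets, the sketch below uses the well-known median-filtering harmonic-percussive separation of Fitzgerald (2010), a related but distinct technique.

```python
# Related illustration, not the thesis method: harmonic energy is smooth
# along time, percussive energy is smooth along frequency, so median
# filtering the spectrogram in each direction yields separation masks.
import numpy as np
from scipy.ndimage import median_filter
from scipy.signal import stft

def hpss_masks(x, fs, kernel=17):
    _, _, X = stft(x, fs, nperseg=1024)
    S = np.abs(X)                                # magnitude spectrogram (freq, time)
    harm = median_filter(S, size=(1, kernel))    # median along time -> harmonic
    perc = median_filter(S, size=(kernel, 1))    # median along frequency -> percussive
    mask_h = harm / (harm + perc + 1e-12)        # soft, Wiener-like masks
    return mask_h, 1.0 - mask_h, X               # apply masks to X, then istft
```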

    Application of sound source separation methods to advanced spatial audio systems

    Full text link
    This thesis is related to the field of Sound Source Separation (SSS). It addresses the development and evaluation of these techniques for their application in the resynthesis of high-realism sound scenes by means of Wave Field Synthesis (WFS). Because the vast majority of audio recordings are preserved in two-channel stereo format, special up-converters are required to use advanced spatial audio reproduction formats such as WFS. This is due to the fact that WFS needs the original source signals to be available in order to accurately synthesize the acoustic field inside an extended listening area; thus, object-based mixing is required. Source separation problems in digital signal processing are those in which several signals have been mixed together and the objective is to find out what the original signals were. Therefore, SSS algorithms can be applied to existing two-channel mixtures to extract the different objects that compose the stereo scene. Unfortunately, most stereo mixtures are underdetermined, i.e., there are more sound sources than audio channels. This condition makes the SSS problem especially difficult, and stronger assumptions have to be made, often related to the sparsity of the sources under some signal transformation. This thesis is focused on the application of SSS techniques to the spatial sound reproduction field, and its contributions can be categorized within these two areas. First, two underdetermined SSS methods are proposed to deal efficiently with the separation of stereo sound mixtures. These techniques are based on a multi-level thresholding segmentation approach, which enables fast, unsupervised separation of sound sources in the time-frequency domain. Although both techniques rely on the same clustering type, the features considered by each of them are related to different localization cues that enable separation of either instantaneous or real mixtures. Additionally, two post-processing techniques aimed at improving the isolation of the separated sources are proposed. The performance achieved by several SSS methods in the resynthesis of WFS sound scenes is then evaluated by means of listening tests, paying special attention to the change observed in the perceived spatial attributes. Although the estimated sources are distorted versions of the original ones, the masking effects involved in their spatial remixing make artifacts less perceptible, which improves the overall assessed quality. Finally, some novel developments related to the application of time-frequency processing to source localization and enhanced sound reproduction are presented.
    Cobos Serrano, M. (2009). Application of sound source separation methods to advanced spatial audio systems [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8969
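
    As a highly simplified sketch of the kind of time-frequency masking this line of work builds on, the code below classifies each STFT bin by its stereo panning cue and resynthesizes one signal per cluster. The single fixed threshold and hard binary mask are illustrative assumptions; the thesis itself uses multi-level thresholding to handle more than two sources.

```python
# Simplified pan-cue masking: bins panned left of a threshold go to one
# source, the rest to the other. Not the thesis algorithm, just the idea.
import numpy as np
from scipy.signal import istft, stft

def separate_by_pan(left, right, fs, threshold=0.5):
    _, _, L = stft(left, fs, nperseg=2048)
    _, _, R = stft(right, fs, nperseg=2048)
    pan = np.abs(L) / (np.abs(L) + np.abs(R) + 1e-12)  # 0 = hard right, 1 = hard left
    mix = 0.5 * (L + R)                                # mono downmix in the STFT domain
    mask = pan > threshold                             # hard mask from the pan cue
    _, s1 = istft(mix * mask, fs)                      # bins panned left of centre
    _, s2 = istft(mix * ~mask, fs)                     # bins panned right of centre
    return s1, s2
```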

    A novel lip geometry approach for audio-visual speech recognition

    Get PDF
    By identifying lip movements and characterizing their associations with speech sounds, the performance of speech recognition systems can be improved, particularly when operating in noisy environments. Various methods have been studied by research groups around the world in recent years to incorporate lip movements into speech recognition; however, exactly how best to incorporate the additional visual information is still not known. This study aims to extend the knowledge of the relationships between visual and speech information, specifically using lip geometry information due to its robustness to head rotation and the small number of features required to represent movement. A new method has been developed to extract lip geometry information, to perform classification and to integrate the visual and speech modalities. This thesis makes several contributions. First, it presents a new method to extract lip geometry features using the combination of a skin-colour filter, a border-following algorithm and a convex hull approach. The proposed method was found to improve lip shape extraction performance compared to existing approaches. Lip geometry features including height, width, ratio, area, perimeter and various combinations of these were evaluated to determine which performs best when representing speech in the visual domain. Second, a novel template matching technique has been developed that adapts to dynamic differences in the way words are uttered by speakers, determining the best fit of an unseen feature signal to those stored in a database template. Third, following an evaluation of integration strategies, a novel method has been developed based on an alternative decision fusion strategy, in which the outcome from either the visual or the speech modality is chosen by measuring the quality of the audio through kurtosis and skewness analysis, driven by white-noise confusion. Finally, the performance of the new methods introduced in this work is evaluated using the CUAVE and LUNA-V data corpora under a range of signal-to-noise ratio conditions using the NOISEX-92 dataset.
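
    A minimal sketch of the extraction pipeline the abstract describes follows: a skin-colour filter, border following (OpenCV's findContours implements a Suzuki-style border-following algorithm) and a convex hull, from which the geometry features are read off. The YCrCb thresholds are rough assumptions that would need tuning per dataset, and the mouth region is assumed to be pre-cropped.

```python
# Illustrative lip-geometry extraction under the assumptions stated above.
import cv2
import numpy as np

def lip_geometry_features(bgr_mouth_roi):
    ycrcb = cv2.cvtColor(bgr_mouth_roi, cv2.COLOR_BGR2YCrCb)
    mask = cv2.inRange(ycrcb, (0, 150, 85), (255, 200, 135))  # rough lip-colour band
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    lip = max(contours, key=cv2.contourArea)  # largest blob taken as the lips
    hull = cv2.convexHull(lip)
    x, y, w, h = cv2.boundingRect(hull)
    return {
        "height": h, "width": w, "ratio": h / w,
        "area": cv2.contourArea(hull),
        "perimeter": cv2.arcLength(hull, True),  # closed contour perimeter
    }
```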

    Audio Mastering as a Musical Competency

    Get PDF
    In this dissertation, I demonstrate that audio mastering is a musical competency by elucidating the most significant, and clearly audible, facets of this competence. In fact, the mastering process impacts traditionally valued musical aspects of records, such as timbre and dynamics. By applying the emerging creative scholarship method used within the field of music production studies, this dissertation will aid scholars seeking to hear and understand audio mastering by elucidating its core practices as musical endeavours. In so doing, I hope to enable increased clarity and accuracy in future scholarly discussions of audio mastering, as well as of the end product of the mastering process: records. Audio mastering produces a so-called master of a record, that is, a finished version of a record optimized for duplication and distribution via available formats (e.g., vinyl LP, audio cassette, compact disc, mp3, wav, and so on). This musical process plays a crucial role in determining how records finally sound, and it is not, as is so often implied in research, the sole concern of a few technicians working in isolated rooms at a record label's corporate headquarters. In fact, as Mark Cousins and Russ Hepworth-Sawyer (2013: 2) explain, nowadays “all musicians and engineers, to a lesser or greater extent, have to actively engage in the mastering process.” Thus, this dissertation clarifies the creative nature of audio mastering through an investigation of how mastering engineers hear records and how they use technology to achieve the sonic goals they conceptualize.

    Audio-Visual Learning for Scene Understanding

    Get PDF
    Multimodal deep learning aims at combining the complementary information of different modalities. Among all modalities, audio and video are the predominant ones that humans use to explore the world. In this thesis, we therefore focus our study on audio-visual deep learning, so that our networks mimic how humans perceive the world. Our research includes images, audio signals and acoustic images. The latter provide spatial audio information and are obtained from a planar array of microphones by combining their raw audio signals with a beamforming algorithm. They better mimic the human auditory system, whose spatial hearing cannot be replicated by a single microphone, which alone cannot provide spatial sound cues. However, as microphone arrays are not widespread, we also study how to handle the missing spatialized audio modality at test time. As a solution, we propose to distill acoustic-image content into audio features during training in order to handle its absence at test time. This is done for supervised audio classification using the generalized distillation framework, which we also extend to self-supervised learning. Next, we devise a method for reconstructing acoustic images given a single microphone and an RGB frame; therefore, when only a standard video is available, we are able to synthesize spatial audio, which is useful for many audio-visual tasks, including sound localization. Lastly, as another example of restoring one modality from the available ones, we inpaint degraded images using audio features, reconstructing the missing region so that it is not only visually plausible but also semantically consistent with the related sound. This also includes cross-modal generation in the limit case of a completely missing or hidden visual modality: our method naturally deals with it, being able to generate images from sound. In summary, we show how audio can help visual learning and vice versa by transferring knowledge between the two modalities at training time, in order to distill, reconstruct or restore the missing modality at test time.
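
    A minimal sketch of a generalized-distillation objective of the kind described, where an audio-only student fits both the ground-truth labels and the softened outputs of an acoustic-image teacher, follows. The function names, temperature and weighting are illustrative assumptions, not the thesis implementation.

```python
# Generalized-distillation loss sketch: combine hard-label cross-entropy
# with KL divergence to the teacher's temperature-softened distribution.
import torch
import torch.nn.functional as F

def generalized_distillation_loss(student_logits, teacher_logits, labels,
                                  lam=0.5, T=2.0):
    hard = F.cross_entropy(student_logits, labels)  # fit the labels
    soft = F.kl_div(                                # imitate the teacher
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                                     # standard T^2 rescaling
    return (1.0 - lam) * hard + lam * soft
```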