11 research outputs found

    Rétroingénierie du son pour l'écoute active et autres applications [Reverse audio engineering for active listening and other applications]

    This work deals with the problem of reverse audio engineering for active listening. The format under consideration is the audio CD. The musical content is viewed as the result of a chain of composition, recording, mixing, and mastering. Inverting the last two stages constitutes the core of the problem at hand. The audio signal is treated as a post-nonlinear mixture: the mixture is first decompressed and then decomposed into audio tracks. The problem is tackled in an informed context, in which the inversion is accompanied by information that is specific to the production of the content. In this manner, the quality of the inversion is significantly improved. The size of this information is reduced by means of quantization, coding, and psychoacoustic principles. The proposed methods run in real time and have low complexity. The results obtained advance the state of the art and contribute new insights.
    BORDEAUX1-Bib.electronique (335229901) / Sudoc, France
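    The "decompress, then decompose" order described in this abstract can be illustrated with a toy post-nonlinear mixture. The sketch below is not the thesis's method: it assumes a memoryless tanh compressor, two sources mixed into two channels, and side information consisting of the compressor drive and the 2x2 gain matrix. All function names and parameters are hypothetical.

```python
import math

def compress(x, drive=2.0):
    # Assumed mastering stage: a memoryless, invertible tanh compressor.
    return math.tanh(drive * x)

def decompress(y, drive=2.0):
    # Inverse of compress(); valid for |y| < 1.
    return math.atanh(y) / drive

def mix(sources, gains):
    # Linear mixing stage: channel[c][t] = sum_i gains[c][i] * sources[i][t].
    n = len(sources[0])
    return [[sum(g * s[t] for g, s in zip(row, sources)) for t in range(n)]
            for row in gains]

def unmix_2x2(channels, gains):
    # Invert a known 2x2 gain matrix (assumed to be transmitted as side information).
    (a, b), (c, d) = gains
    det = a * d - b * c
    left, right = channels
    s1 = [(d * l - b * r) / det for l, r in zip(left, right)]
    s2 = [(-c * l + a * r) / det for l, r in zip(left, right)]
    return [s1, s2]

def invert_post_nonlinear_mix(compressed, gains, drive=2.0):
    # Step 1: undo the nonlinearity. Step 2: undo the linear mix.
    linear = [[decompress(v, drive) for v in ch] for ch in compressed]
    return unmix_2x2(linear, gains)
```

    Real stereo mixes usually contain more sources than channels, so the linear step cannot be inverted exactly as it is here; that gap is what the side information in the informed setting is meant to close.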

    Object-based radio : effects on production and audience experience

    This thesis analyses the benefits of using object-based audio as a production and delivery format in order to enable new audience experiences. This is achieved through a series of case studies, each focusing on a different user experience enabled by the use of object-based audio. Each study considers the impact of using object-based audio on the creative process, production workflow and audience experience.
    The first study analyses the audience’s use of the ability to personalise the mix of a live football match. It demonstrates that there was not a single audio mix favoured by all, and that the ability to change the mix was valued by the audience. While listeners did adjust the mix initially, they tended to leave it at that setting and did not interact much once they had made their initial selection. While there were three favoured mixes, over 50% of listeners did not choose one of these three mixes, indicating that only offering three options would not satisfy everyone.
    Modes of listening model the ways listeners deconstruct complex sound scenes into foreground and background categories, ascribing different salience to foreground and background sounds. The second study uses this model to inform a series of card sorting exercises which result in similar foreground and background categories. However, rather than being unimportant, background sounds were present to convey ancillary information or to affect emotional responses, and foreground sounds to expose plot or story events. This study demonstrated that this grouping was a meaningful categorisation for broadcast sound and evaluated how beneficial allowing different foreground and background audio mixes would be for audiences. It contains analysis of audio objects in the context of foreground and background sounds based on the opinions of the content creators. It also includes subjective testing of audience preferences for different mixes of foreground versus background audio levels across five different genres and four different loudspeaker layouts. It shows that there is no clustering of listeners based on their preference for foreground versus background balances. It also shows that there is significant variation in foreground and background balance preference between loudspeaker layouts.
    The final study goes beyond tailoring audio levels, balances and loudspeaker layouts and analyses the benefit to audiences of being able to adapt the story of a drama in order to set it in a location that is familiar to the listener. It shows that being able to set a radio drama in the location where the listening is taking place improves the audience’s enjoyment of the programme. 75% of listeners who experienced the tailored version of the drama reported liking the story, compared with 65% of listeners who experienced a non-tailored version.
    The three studies also analyse the impact of object-based content creation on production workflows by documenting the challenges faced and discussing possible solutions, for example providing writers with constraints when they are designing dynamic content and allowing sound designers time to develop trust in the technology when mixing content for multiple loudspeaker layouts.
    The original contribution to knowledge is to establish a new listening model applicable to constructed and designed sound experiences based on functional analysis of audio objects. This work also establishes, for the first time, a framework for the definition of an audio object based on the creator’s intended range of audience experiences. In addition, the thesis provides insights into how audiences interact with object-based content experiences and into audience attitudes towards using personal data to personalise object-based content experiences. Each study addresses the potential advantages of delivering object-based audio, assesses any impact on the quality of the audience’s experience and analyses the challenges faced by production in the creation of these new experiences.

    Application of sound source separation methods to advanced spatial audio systems

    This thesis is related to the field of Sound Source Separation (SSS). It addresses the development and evaluation of these techniques for their application in the resynthesis of high-realism sound scenes by means of Wave Field Synthesis (WFS). Because the vast majority of audio recordings are preserved in two-channel stereo format, special up-converters are required to use advanced spatial audio reproduction formats such as WFS. This is because WFS needs the original source signals to be available in order to accurately synthesize the acoustic field inside an extended listening area. Thus, an object-based mix is required. Source separation problems in digital signal processing are those in which several signals have been mixed together and the objective is to find out what the original signals were. Therefore, SSS algorithms can be applied to existing two-channel mixtures to extract the different objects that compose the stereo scene. Unfortunately, most stereo mixtures are underdetermined, i.e., there are more sound sources than audio channels. This condition makes the SSS problem especially difficult, and stronger assumptions have to be made, often related to the sparsity of the sources under some signal transformation. This thesis is focused on the application of SSS techniques to the spatial sound reproduction field. As a result, its contributions can be categorized within these two areas. First, two underdetermined SSS methods are proposed to deal efficiently with the separation of stereo sound mixtures. These techniques are based on a multi-level thresholding segmentation approach, which enables fast, unsupervised separation of sound sources in the time-frequency domain. Although both techniques rely on the same clustering type, the features considered by each of them are related to different localization cues, which enable the separation of either instantaneous or real mixtures.
    Additionally, two post-processing techniques aimed at improving the isolation of the separated sources are proposed. The performance achieved by several SSS methods in the resynthesis of WFS sound scenes is then evaluated by means of listening tests, paying special attention to the change observed in the perceived spatial attributes. Although the estimated sources are distorted versions of the original ones, the masking effects involved in their spatial remixing make artifacts less perceptible, which improves the overall assessed quality. Finally, some novel developments related to the application of time-frequency processing to source localization and enhanced sound reproduction are presented.
    Cobos Serrano, M. (2009). Application of sound source separation methods to advanced spatial audio systems [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8969
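    To make the idea of threshold-based separation on a localization cue concrete, here is a minimal sketch. It assumes the time-frequency magnitudes of the left and right channels are already available as flat lists, uses a simple panning ratio as the cue, and picks a single threshold by maximizing between-class variance (an Otsu-style criterion). The abstract does not say which cue or criterion the thesis uses, and the thesis works with multiple threshold levels, so every name and choice below is an illustrative assumption.

```python
def pan_cue(left_mag, right_mag, eps=1e-12):
    # Panning ratio per time-frequency bin: near 0 = left-panned, near 1 = right-panned.
    return [r / (l + r + eps) for l, r in zip(left_mag, right_mag)]

def otsu_threshold(values):
    # Try every split of the sorted values (at least two are needed for a real split)
    # and keep the one that maximizes the between-class variance w0 * w1 * (mu0 - mu1)^2.
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    best_t, best_score = ordered[0], -1.0
    running = 0.0
    for k in range(1, n):
        running += ordered[k - 1]
        w0, w1 = k / n, (n - k) / n
        mu0 = running / k
        mu1 = (total - running) / (n - k)
        score = w0 * w1 * (mu0 - mu1) ** 2
        if score > best_score:
            best_score = score
            best_t = 0.5 * (ordered[k - 1] + ordered[k])
    return best_t

def binary_masks(cues, threshold):
    # Assign each bin to exactly one of two sources.
    left_mask = [1.0 if c < threshold else 0.0 for c in cues]
    right_mask = [1.0 - m for m in left_mask]
    return left_mask, right_mask
```

    Multiplying each mask with the mixture's time-frequency representation and inverting the transform would yield the two source estimates; separating more than two sources requires several thresholds rather than one.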

    An investigation into the use of intuitive control interfaces and distributed processing for enhanced three dimensional sound localization

    This thesis investigates the feasibility of using gestures as a means of control for localizing three-dimensional (3D) sound sources in a distributed immersive audio system. A prototype system that uses state-of-the-art technology to achieve the stated goals was implemented and tested. A Windows Kinect is used for gesture recognition: the prototype system translates human gestures into control messages and then performs actions based on the recognized gestures. The term distributed, in the context of this system, refers to the audio processing capacity. The prototype system partitions and allocates the processing load between a number of endpoints. The reallocated processing load consists of the mixing of audio samples according to a specification. The endpoints used in this research are XMOS AVB endpoints. The firmware on these endpoints was modified to include the audio mixing capability, which was controlled through a state-of-the-art audio distribution networking standard, Ethernet AVB. The hardware used for the implementation of the prototype system is relatively cost-efficient in comparison to professional audio hardware, and is also commercially available to end users. The successful implementation and the results from user testing of the prototype system demonstrate that it is a feasible option for recording the localization of a sound source. The ability to partition the processing provides a modular approach to building immersive sound systems. This removes the constraint of a centralized mixing console with a predetermined speaker configuration.

    Audio for Virtual, Augmented and Mixed Realities: Proceedings of ICSA 2019 ; 5th International Conference on Spatial Audio ; September 26th to 28th, 2019, Ilmenau, Germany

    ICSA 2019 brings together developers, scientists, users, and content creators of and for spatial audio systems and services across disciplines. A special focus is on audio for so-called virtual, augmented, and mixed realities. The fields of ICSA 2019 are:
    - Development and scientific investigation of technical systems and services for spatial audio recording, processing and reproduction
    - Creation of content for reproduction via spatial audio systems and services
    - Use and application of spatial audio systems and content presentation services
    - Media impact of content and spatial audio systems and services from the point of view of media science
    ICSA 2019 is organized by VDT and TU Ilmenau with the support of the Fraunhofer Institute for Digital Media Technology IDMT.

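    深層学習に基づく音源情報推定のための確率論的目的関数の研究 [A study of probabilistic objective functions for deep-learning-based sound source information estimation]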

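    This research concerns the estimation of "sound source information", that is, information related to sound such as the source signal and the type and state of the sound source, from acoustic signals observed with microphones. As subjects of sound source information estimation, it focuses on "source enhancement", which estimates the source signal from an observed signal in which the source signal and noise are superimposed, and "anomalous sound detection", which estimates the type and state of the environmental sounds contained in the observed signal in order to predict or detect danger in the surroundings. If source enhancement can take latent source information such as the type and state of the source into account, the voice of a particular player or the sound of a ball being kicked could be estimated in a soccer stadium engulfed in loud cheering, offering users a way of experiencing content as if they had slipped into the stadium. If anomalous sound detection is realized, it becomes possible to estimate from the operating sound of a machine whether its operation is normal or anomalous (its state), which makes manufacturing and maintenance work more efficient.
    Approaches based on statistical machine learning have been studied as methods for estimating sound source information, and in recent years applying deep learning to the task has greatly improved estimation accuracy. In deep-learning-based sound source information estimation, a neural network is used as a nonlinear mapping function from the observed signal to the desired source information. The neural network is obtained so as to maximize or minimize the value of an "objective function" that evaluates the estimation accuracy of the source information. In most deep learning, deterministic objective functions such as the squared error function or the cross-entropy function are used.
    In sound source information estimation, designing the objective function is equivalent to defining the properties and estimation accuracy of the desired source information. For some kinds of source information, a deterministic objective function cannot define these properties or the estimation accuracy, or defining them in that way is not appropriate. For example, a deterministic objective function cannot be adopted for estimating a source signal that maximizes human subjective sound-quality ratings, or for estimating the state of a source for which anomalous sounds (labeled data) cannot be collected. To solve this problem, not only the network structure but also the objective function used to train the neural network must be made more sophisticated.
    To estimate source information for which the objective function cannot be designed as a deterministic function, this research studies objective functions for deep-learning-based sound source information estimation. It approaches the problem from the idea of defining the properties and estimation accuracy of the desired source information as a probability distribution or set of values that the inputs and outputs should take, depending on the characteristics of the source information to be estimated and the problem to be solved, and of describing the statistical properties that the inputs and outputs of the neural network should satisfy as the objective function.
    Chapter 3 proposes a method for enhancing source signals for which sufficient labeled data does not exist, such as the sounds of sports play. To train a neural network with a small amount of training data, acoustic features that were designed or selected in advance must be extracted from the observed signal, and source enhancement must be performed with a small neural network. Chapter 3 examines a method that selects appropriate acoustic features for enhancing the desired source on the basis of mutual information maximization. It considers applying "kernel dimension reduction", which computes mutual information accurately, to acoustic feature selection with high-dimensional feature candidates, derives a differentiable objective function based on sparse regularization, and proposes an acoustic feature selection method that can select appropriate features from a large number of candidates by gradient descent. Objective evaluation showed that SDR improved compared with conventional acoustic feature selection methods, and subjective evaluation showed that selecting features with the proposed method improved the clarity of the source signal compared with conventional methods. As a result, it became possible to estimate source signals for which sufficient training data cannot be obtained, which had been considered difficult to estimate, as well as source signals that had not previously been targets of estimation and whose appropriate acoustic features are unknown.
    Chapter 4 aims to improve the subjective quality of the output of source enhancement and proposes a method for enhancing source signals for which labeled data cannot be uniquely determined and for which defining estimation accuracy with objective functions such as the squared error is not appropriate. Conventional deep-learning-based source enhancement has used, for example, the amplitude spectrum of the source signal as labeled data and has trained the neural network to minimize the squared error between its output and the labeled data. As a result, distortion arises in the output sound and subjective quality degrades. Chapter 4 therefore proposes an objective function that, instead of requiring labeled data, maximizes a sound-quality score (perceptual score) that correlates highly with subjective ratings. Objective evaluation confirmed that the proposed objective function allows the neural network to be trained to maximize the perceptual score. Subjective evaluation showed that the proposed method achieves source enhancement with higher subjective quality than source enhancement using the conventional objective function based on squared error minimization. As a result, "higher-order" evaluation measures such as perceptual scores and human ratings, which previously could not be used for training source enhancement, can now be used as objective functions, broadening the range of applications of neural-network-based source enhancement.
    Chapter 5 aims to realize "anomalous sound detection", which detects machine failures by detecting sounds that do not normally occur (anomalous sounds), such as abnormal motor rotation sounds or bearing collision sounds, and by judging whether the operating state of the machine is normal or anomalous. The difficulty of this problem is that, because machine failures are extremely rare, anomalous operating sounds (labeled data) cannot be collected, so cross-entropy, the usual objective function of neural networks for classification, cannot be used. Chapter 5 therefore treats anomalous sound detection as hypothesis testing by defining an anomalous sound as a sound that differs statistically from the probability distribution followed by normal sounds, and derives the "Neyman-Pearson metric" from the Neyman-Pearson lemma, the optimality criterion of hypothesis testing, as an objective function for optimizing the anomalous sound detector. Objective evaluation showed that the harmonic mean improved compared with conventional methods, indicating that the proposed method detects anomalous sounds more stably than conventional methods. Experiments in real environments showed that the method can detect sudden anomalous sounds of 3D printers and air blower pumps as well as sustained anomalous sounds caused by, for example, scratches on bearings. As a result, state classification problems for which anomalous sound data cannot be gathered can be solved stably, and the approach can be applied to various sound source information estimation tasks in which negative examples are difficult to collect, such as security-oriented techniques like gunshot detection and unknown speaker detection.
    The University of Electro-Communications, 201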
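    The hypothesis-testing view of anomalous sound detection in Chapter 5 can be sketched in a few lines. This is not the thesis's "Neyman-Pearson metric" and involves no neural network: it assumes a one-dimensional feature, models normal data with a single Gaussian, scores each sample by its negative log-likelihood, and sets the decision threshold so that at most a chosen fraction of normal samples is flagged. All names and the Gaussian model are hypothetical choices made for illustration.

```python
import math
import statistics

def fit_gaussian(normal_samples):
    # Model of "normal" data only; no anomalous examples are needed.
    return statistics.mean(normal_samples), statistics.pstdev(normal_samples)

def anomaly_score(x, mean, std):
    # Negative log-likelihood of x under the fitted Gaussian.
    z = (x - mean) / std
    return 0.5 * z * z + math.log(std) + 0.5 * math.log(2.0 * math.pi)

def threshold_for_fpr(normal_scores, target_fpr):
    # Smallest observed score that at most a target_fpr fraction of normal scores exceed.
    ordered = sorted(normal_scores)
    k = math.ceil((1.0 - target_fpr) * len(ordered)) - 1
    k = min(max(k, 0), len(ordered) - 1)
    return ordered[k]

def is_anomalous(x, mean, std, threshold):
    # Reject the "normal" hypothesis when the score exceeds the threshold.
    return anomaly_score(x, mean, std) > threshold
```

    Fixing the false-positive rate on normal data and then seeking the highest possible detection rate is the trade-off the Neyman-Pearson lemma formalizes, which is why only normal recordings are needed to set the threshold.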

    Approche informée pour l'analyse du son et de la musique [An informed approach to sound and music analysis]

    In the field of audio signal processing, analysis is an essential step that makes it possible to understand and interact with existing signals. The quality of transformed or synthesized audio signals depends on the accuracy of the estimated model parameters. However, theoretical limits exist and show that the best accuracy a classic estimator can reach may be insufficient for the most demanding applications (e.g. active listening of music). The work developed in this thesis revisits well-known audio analysis problems such as spectral analysis, automatic transcription of music, and audio source separation using a novel "informed" approach. This approach takes advantage of a specific configuration, found in today's music studios, in which the parameters of the elementary signals that compose a mixture are known before the mixing process. Using the tools proposed in this thesis, minimal side information is computed and transmitted with the mixture signal. This allows transformations of the mixture signal while guaranteeing the resulting level of quality. When compatibility with existing audio formats is required, the side information is embedded inaudibly in the mixture signal itself using an audio watermarking technique. This work describes several theoretical and practical aspects of audio signal processing. We show that a classic estimator combined with sufficient side information can achieve better performance than the usual approaches (uninformed estimation or pure coding).
    BORDEAUX1-Bib.electronique (335229901) / Sudoc, France

    Sonic Interactions in Virtual Environments

    This open access book tackles the design of 3D spatial interactions from an audio-centered and audio-first perspective, providing the fundamental notions related to the creation and evaluation of immersive sonic experiences. The key elements that enhance the sensation of place in a virtual environment (VE) are:
    - Immersive audio: the computational aspects of the acoustical-space properties of Virtual Reality (VR) technologies
    - Sonic interaction: the human-computer interplay through auditory feedback in VE
    - VR systems: naturally support multimodal integration, impacting different application domains
    Sonic Interactions in Virtual Environments will feature state-of-the-art research on real-time auralization, sonic interaction design in VR, quality of the experience in multimodal scenarios, and applications. Contributors and editors include interdisciplinary experts from the fields of computer science, engineering, acoustics, psychology, design, humanities, and beyond. Their mission is to shape an emerging new field of study at the intersection of sonic interaction design and immersive media, embracing an archipelago of existing research spread across different audio communities, and to increase awareness among VR communities, researchers, and practitioners of the importance of sonic elements when designing immersive environments.
