Perception of Reverberation in Domestic and Automotive Environments
Application of sound source separation methods to advanced spatial audio systems
This thesis is related to the field of Sound Source Separation (SSS). It addresses the development
and evaluation of these techniques for their application in the resynthesis of high-realism sound scenes by
means of Wave Field Synthesis (WFS). Because the vast majority of audio recordings are preserved in two-channel
stereo format, special up-converters are required to use advanced spatial audio reproduction formats,
such as WFS. This is due to the fact that WFS needs the original source signals to be available, in order to
accurately synthesize the acoustic field inside an extended listening area. Thus, an object-based mixing is
required.
Source separation problems in digital signal processing are those in which several signals have been mixed
together and the objective is to find out what the original signals were. Therefore, SSS algorithms can be applied
to existing two-channel mixtures to extract the different objects that compose the stereo scene. Unfortunately,
most stereo mixtures are underdetermined, i.e., there are more sound sources than audio channels. This
condition makes the SSS problem especially difficult, and stronger assumptions have to be made, often related to
the sparsity of the sources under some signal transformation.
This thesis is focused on the application of SSS techniques to the spatial sound reproduction field. As a result,
its contributions can be categorized within these two areas. First, two underdetermined SSS methods are
proposed to deal efficiently with the separation of stereo sound mixtures. These techniques are based on a
multi-level thresholding segmentation approach, which enables fast, unsupervised separation of
sound sources in the time-frequency domain. Although both techniques rely on the same clustering type, the
features considered by each of them are related to different localization cues that enable the separation
of either instantaneous or real mixtures. Additionally, two post-processing techniques aimed at
improving the isolation of the separated sources are proposed. The performance achieved by
several SSS methods in the resynthesis of WFS sound scenes is afterwards evaluated by means of
listening tests, paying special attention to the change observed in the perceived spatial attributes.
Although the estimated sources are distorted versions of the original ones, the masking effects
involved in their spatial remixing make artifacts less perceptible, which improves the overall
assessed quality. Finally, some novel developments related to the application of time-frequency
processing to source localization and enhanced sound reproduction are presented.

Cobos Serrano, M. (2009). Application of sound source separation methods to advanced spatial audio systems [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8969
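The sparsity-based time-frequency masking on which such stereo up-mixing rests can be sketched as follows. This is a toy illustration with fixed, evenly spaced thresholds on a panning cue, not the thesis's adaptive multi-level thresholding method; all parameter choices here are illustrative.

```python
import numpy as np
from scipy.signal import stft, istft

def pan_mask_separate(left, right, fs, n_sources=2, eps=1e-12):
    """Toy sparsity-based stereo separation: classify each time-frequency
    bin by a panning cue, then mask the mixture spectrogram. Fixed thresholds
    stand in for the thesis's adaptive multi-level thresholding."""
    _, _, L = stft(left, fs, nperseg=1024)
    _, _, R = stft(right, fs, nperseg=1024)
    # Panning cue in [0, 1]: 0 = fully left-panned bin, 1 = fully right.
    cue = np.abs(R) / (np.abs(L) + np.abs(R) + eps)
    # Assign every bin to one of n_sources evenly spaced cue classes.
    idx = np.digitize(cue, np.linspace(0.0, 1.0, n_sources + 1)[1:-1])
    sources = []
    for k in range(n_sources):
        # Binary mask applied to the mono downmix spectrogram, then resynthesis.
        _, est = istft(0.5 * (L + R) * (idx == k), fs)
        sources.append(est)
    return sources
```

Because disjoint (sparse) time-frequency supports are assumed, each bin is attributed to exactly one source; this is exactly the underdetermined-mixture assumption discussed above.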
Spatial Multizone Soundfield Reproduction Design
It is desirable for people sharing a physical space to access different multimedia information streams simultaneously. For a good user experience, the interference between the different streams should be held to a minimum. This is straightforward for the video component, but currently difficult for the audio component. Spatial multizone soundfield reproduction, which aims to provide an individual sound environment to each of a set of listeners without the use of physical isolation or headphones, has drawn significant attention from researchers in recent years. The realization of multizone soundfield reproduction is a conceptually challenging problem, as most current soundfield reproduction techniques concentrate on a single zone.
This thesis considers the theory and design of a multizone soundfield reproduction system using arrays of loudspeakers in given complex environments. We first introduce a novel method for spatial multizone soundfield reproduction based on describing the desired multizone soundfield as an orthogonal expansion of formulated basis functions over the desired reproduction region. This provides the theoretical basis of both 2-D (height-invariant) and 3-D soundfield reproduction for this work. We then extend the reproduction of the multizone soundfield over the desired region to reverberant environments, based on the identification of the acoustic transfer functions (ATFs) from the loudspeakers over the desired reproduction region using sparse methods. The simulation results confirm that the method leads to a significantly reduced number of required microphones for accurate multizone sound reproduction compared with the state of the art, while it also facilitates reproduction over a wide frequency range.
In addition, we focus on improvements of the proposed multizone reproduction system with regard to practical implementation. The so-called 2.5D multizone soundfield reproduction is considered to accurately reproduce the desired multizone soundfield over a selected 2-D plane at a height approximately level with the listener's ears, using a single array of loudspeakers in 3-D reverberant settings. Then, we propose an adaptive reverberation cancelation method for the multizone soundfield reproduction within the desired region and simplify the prior soundfield measurement process. Simulation results suggest that the proposed method provides a faster convergence rate than the comparative approaches under the same hardware provision. Finally, we conduct the real-world implementation based on the proposed theoretical work. The experimental results show that we can achieve a very noticeable acoustic energy contrast between the signals recorded in the bright zone and the quiet zone, especially for the system implementation with reverberation equalization.
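The bright-zone/quiet-zone energy contrast mentioned above can be illustrated with the classic acoustic contrast control formulation. This is a free-field monopole sketch, not the sparse reverberant-ATF method of the thesis; the geometry and parameter names are made up for illustration.

```python
import numpy as np
from scipy.linalg import eigh

def green(src, rcv, k):
    """Free-field monopole transfer matrix between source and receiver point sets."""
    r = np.linalg.norm(rcv[:, None, :] - src[None, :, :], axis=-1)
    return np.exp(-1j * k * r) / (4 * np.pi * r)

def acoustic_contrast_weights(spk, bright, quiet, k, reg=1e-6):
    """Loudspeaker weights maximizing the bright/quiet-zone energy ratio:
    the dominant eigenvector of the generalized eigenproblem A w = lambda B w."""
    Gb, Gq = green(spk, bright, k), green(spk, quiet, k)
    A = Gb.conj().T @ Gb                           # bright-zone energy kernel
    B = Gq.conj().T @ Gq + reg * np.eye(len(spk))  # quiet-zone energy + regularization
    _, vecs = eigh(A, B)                           # eigenvalues in ascending order
    return vecs[:, -1], Gb, Gq

def contrast_db(w, Gb, Gq):
    """Mean-energy contrast between the two zones, in dB."""
    eb = np.mean(np.abs(Gb @ w) ** 2)
    eq = np.mean(np.abs(Gq @ w) ** 2)
    return 10 * np.log10(eb / eq)
```

With more loudspeakers than quiet-zone control points, the quiet-zone energy can be driven very low, which is why the regularization term is needed to bound the driving effort.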
Implementation of an Autonomous Impulse Response Measurement System
Data collection is crucial for researchers, as it can provide important insights for describing phenomena. In acoustics, acoustic phenomena are characterized by Room Impulse Responses (RIRs) occurring when sound propagates in a room. Room impulse responses are needed in vast quantities for various reasons, including the prediction of acoustical parameters and the rendering of virtual acoustical spaces. Recently, mobile robots navigating within indoor spaces have become increasingly used to acquire information about their environment. However, little research has attempted to utilize robots for the collection of room acoustic data.
This thesis presents an adaptable automated system to measure room impulse responses in multi-room environments, using mobile and stationary measurement platforms. The system, known as the Autonomous Impulse Response Measurement System (AIRMS), is divided into two stages: data collection and post-processing. To automate data collection, a mobile robotic platform was developed to perform acoustic measurements within a room. The robot was equipped with spatial microphones, multiple loudspeakers and an indoor localization system, which reported the real-time location of the robot. Additionally, stationary platforms were installed in specific locations inside and outside the room. The mobile and stationary platforms communicated wirelessly with one another to perform the acoustical tests systematically. Since a major requirement of the system is adaptability, researchers can define the elements of the system according to their needs, including the mounted equipment and the number of platforms. Post-processing included the extraction of sine sweeps and the calculation of impulse responses. Extraction of the sine sweeps refers to the process of framing every acoustical test signal from the raw recordings. These signals are then processed to calculate the room impulse responses. The automatically collected information was complemented with manually produced data, which included a rendered 3D model of the room and a panoramic picture.
The performance of the system was tested under two conditions: a single-room and a multi-room setting. Room impulse responses were calculated for each of the test conditions, representing typical characteristics of the signals and showing the effects of the proximity of sources and receivers, as well as the presence of boundaries. This prototype produces RIR measurements in a fast and reliable manner.
Although some shortcomings were noted in the compact loudspeakers used to produce the sine sweeps and in the accuracy of the indoor localization system, the proposed autonomous measurement system yielded reasonable results. Future work could expand the number of impulse response measurements in order to further refine the artificial intelligence algorithms.
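The sweep-to-impulse-response step described above is commonly performed with Farina's exponential sine sweep deconvolution; a minimal sketch follows. The abstract does not specify the AIRMS processing chain, so the signal parameters and function names here are illustrative.

```python
import numpy as np
from scipy.signal import fftconvolve

def ess(f1, f2, dur, fs):
    """Exponential sine sweep plus its amplitude-compensated inverse filter
    (time-reversed sweep with a -6 dB/octave envelope), after Farina."""
    t = np.arange(int(dur * fs)) / fs
    R = np.log(f2 / f1)  # log frequency ratio of the sweep
    sweep = np.sin(2 * np.pi * f1 * dur / R * (np.exp(t * R / dur) - 1))
    inv = sweep[::-1] * np.exp(-t * R / dur)  # whitens the sweep's pink spectrum
    return sweep, inv

def impulse_response(recording, inv):
    """Deconvolve a room recording of the sweep: convolving with the inverse
    filter compresses the sweep into an impulse, yielding the (scaled) RIR."""
    h = fftconvolve(recording, inv)
    return h / np.max(np.abs(h))
```

The linear impulse response appears around sample `len(sweep) - 1` of the deconvolved signal; harmonic distortion products, one advantage of the exponential sweep, land earlier and can be windowed out.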
Effects of errorless learning on the acquisition of velopharyngeal movement control
Session 1pSC - Speech Communication: Cross-Linguistic Studies of Speech Sound Learning of the Languages of Hong Kong (Poster Session)

The implicit motor learning literature suggests a benefit for learning if errors are minimized during practice. This study investigated whether the same principle holds for learning velopharyngeal movement control. Normal speaking participants learned to produce hypernasal speech in either an errorless learning condition (in which the possibility for errors was limited) or an errorful learning condition (in which the possibility for errors was not limited). The nasality level of the participants' speech was measured by nasometer and reflected by nasalance scores (in %). Errorless learners practiced producing hypernasal speech with a threshold nasalance score of 10% at the beginning, which gradually increased to a threshold of 50% at the end. The same set of threshold targets was presented to errorful learners, but in reversed order. Errors were defined by the proportion of speech with a nasalance score below the threshold. The results showed that, relative to errorful learners, errorless learners displayed fewer errors (50.7% vs. 17.7%) and a higher mean nasalance score (31.3% vs. 46.7%) during the acquisition phase. Furthermore, errorless learners outperformed errorful learners in both retention and novel transfer tests. Acknowledgment: Supported by The University of Hong Kong Strategic Research Theme for Sciences of Learning. © 2012 Acoustical Society of America
Implementation of the Radiation Characteristics of Musical Instruments in Wave Field Synthesis Applications
In this thesis a method to implement the radiation characteristics of musical instruments in wave field synthesis systems is developed. It is applied and tested in two loudspeaker systems. Because the loudspeaker systems have a comparably low number of loudspeakers, the wave field is synthesized at discrete listening positions by solving a linear equation system. Thus, for every constellation of listening and source position, all loudspeakers can be used for the synthesis. The calculations are done in the spectral domain, neglecting sound propagation velocity at first. This approach causes artefacts in the loudspeaker signals and synthesis errors in the listening area, which are compensated by means of psychoacoustic methods. With these methods, the aliasing frequency is determined by the extent of the listening area, whereas in other wave field synthesis systems it is determined by the distance of adjacent loudspeakers. Musical instruments are simplified as complex point sources in order to capture, store and propagate their radiation characteristics. This method is the basis of the newly developed "Radiation Method", which improves the matrix conditioning of the equation system and the precision of the wave field synthesis by implementing the radiation characteristics of the driven loudspeakers. In this work, the "Minimum Energy Method", originally developed for acoustic holography, is applied to wave field synthesis for the first time. It guarantees a robust solution and creates softer loudspeaker driving signals than the Radiation Method, but yields a worse approximation of the wave field beyond the discrete listening positions. Psychoacoustic considerations allow for a successful wave field synthesis: integration times of the auditory system determine the spatial dimensions in which the wave field synthesis approach works despite different arrival times and directions of wave fronts.
By separating the spectrum into frequency bands of critical bandwidth, masking effects are utilized to reduce the amount of calculation with hardly audible consequences. By applying the "Precedence Fade", the precedence effect is used to manipulate the perceived source position and improve the reproduction of the initial transients of notes. Based on Auditory Scene Analysis principles, "Fading Based Panning" creates precise phantom source positions between the actual loudspeaker positions. Physical measurements, simulations and listening tests provide evidence for the introduced methods and reveal their precision. Furthermore, results of the listening tests show that the perceived spaciousness of an instrumental sound does not necessarily go along with distinctness of localization. The introduced methods are compatible with conventional multi-channel audio systems as well as with other wave field synthesis applications.
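The "solve a linear equation system at discrete listening positions" idea can be sketched per frequency bin. The minimum-norm least-squares solution below is in the spirit of the Minimum Energy Method's "softest" driving signals, using an idealized monopole loudspeaker model; the geometry and names are illustrative, not the thesis's actual implementation.

```python
import numpy as np

def driving_signals(spk, pts, p_target, k):
    """Per-frequency loudspeaker driving signals reproducing a target pressure
    at discrete listening positions by solving G s = p. The pseudo-inverse
    returns the least-squares solution of minimum norm, i.e. the 'softest'
    driving signals, in the spirit of the Minimum Energy Method."""
    r = np.linalg.norm(pts[:, None, :] - spk[None, :, :], axis=-1)
    G = np.exp(-1j * k * r) / (4 * np.pi * r)  # monopole transfer matrix
    return np.linalg.pinv(G) @ p_target, G

def point_source(src, pts, k):
    """Target pressure of a virtual monopole at the listening positions."""
    r = np.linalg.norm(pts - src, axis=-1)
    return np.exp(-1j * k * r) / (4 * np.pi * r)
```

With more loudspeakers than listening positions the system is underdetermined, so the field is matched exactly at the control points; how well it is approximated in between is precisely the trade-off the abstract discusses.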
Audio for Virtual, Augmented and Mixed Realities: Proceedings of ICSA 2019 ; 5th International Conference on Spatial Audio ; September 26th to 28th, 2019, Ilmenau, Germany
The ICSA 2019 focuses on bringing together, across disciplines, developers, scientists, users, and content creators of and for spatial audio systems and services. A special focus is on audio for so-called virtual, augmented, and mixed realities.
The fields of ICSA 2019 are:
- Development and scientific investigation of technical systems and services for spatial audio recording, processing and reproduction
- Creation of content for reproduction via spatial audio systems and services
- Use and application of spatial audio systems and content presentation services
- Media impact of content and spatial audio systems and services from the point of view of media science
The ICSA 2019 is organized by VDT and TU Ilmenau with support of the Fraunhofer Institute for Digital Media Technology IDMT.