8 research outputs found

    Spatial analysis of the sound field for parametric spatial sound reproduction using sparse microphone arrays

    In spatial audio capture, the aim is to store information about the sound field so that it can be reproduced without a perceptual difference from the original. This is needed in applications such as virtual reality and teleconferencing. Traditionally the sound field has been captured with a B-format microphone, but this is not always feasible because of size and cost constraints. Alternatively, arrays of omnidirectional microphones can be used, as is common in devices such as mobile phones. If the microphone array is sparse, i.e., the microphone spacings are relatively large, the estimate of the sound's Direction of Arrival (DoA) becomes ambiguous at higher frequencies. This is due to spatial aliasing, a common problem in narrowband DoA estimation. This thesis examined the spatial aliasing problem and studied its effect on DoA estimation and on spatial sound synthesis with Directional Audio Coding (DirAC). The aim was to find methods for unambiguous narrowband DoA estimation. The current state-of-the-art methods can remove aliased estimates but cannot estimate the DoA at the optimal time-frequency resolution. In this thesis, similar results were obtained with parameter extrapolation when only a single broadband source exists. The main contribution of the thesis was the development of a correlation-based method, which uses previously known, array-specific information on aliasing for each DoA and frequency. The correlation-based method was tested and found to be the best option for overcoming spatial aliasing: it resolved spatial aliasing even with multiple sources or when the source's frequency content lies completely above the spatial aliasing frequency.
In a listening test, the correlation-based method was found to provide a major improvement in the quality of the DirAC-synthesized spatial image compared with an aliased estimator.
In spatial audio capture, the goal is to record the properties of the sound field so that the sound field can later be synthesized without an audible difference from the original. This is needed in various applications, such as virtual reality and teleconferencing. Traditionally, the properties of the sound field have been recorded with a B-format microphone, whose use is not always possible for reasons of size and cost. Alternatively, microphone arrays consisting of omnidirectional microphones can be used. If the distances between the microphones are too large, i.e., the array is sparse, determining the direction of arrival of sound becomes ambiguous at higher frequencies. This is caused by a phenomenon called spatial aliasing. The purpose of this master's thesis was to study spatial aliasing and its effect on direction-of-arrival estimation and on spatial sound synthesis with the Directional Audio Coding (DirAC) method. In addition, methods were studied by which the direction of arrival could be determined correctly even in the presence of spatial aliasing. The work found that current solutions to the aliasing problem cannot produce correct direction estimates at the optimal time-frequency resolution. In this work, similar results were obtained for a broadband sound source by extrapolating direction estimates from below the aliasing cutoff frequency. The main part of the work was to develop a correlation-based direction-of-arrival estimation method that can produce reliable estimates above the cutoff frequency and in environments with several sound sources. The method exploits an aliasing pattern that is characteristic of the microphone array and depends on direction of arrival and frequency.
In a listening test, the correlation-based method was found to bring a considerable improvement in the quality of the synthesized spatial sound image compared with synthesis using aliased direction estimates.
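
As a rough illustration of why a sparse array becomes ambiguous, the sketch below assumes a single far-field plane wave, one pair of omnidirectional microphones, and a speed of sound of 343 m/s. The function names and the example spacing are hypothetical and are not taken from the thesis. It shows the aliasing ambiguity itself, not the correlation-based method described in the abstract.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def spatial_aliasing_frequency(spacing_m, c=SPEED_OF_SOUND):
    """Frequency above which a microphone pair can give ambiguous phase differences."""
    return c / (2.0 * spacing_m)


def candidate_doas(phase_diff_rad, freq_hz, spacing_m, c=SPEED_OF_SOUND):
    """Return every arrival angle (degrees from broadside) that is consistent
    with a wrapped inter-microphone phase difference.

    Assumed model: phase = 2*pi*f*d*sin(theta)/c, observed modulo 2*pi.
    """
    scale = 2.0 * math.pi * freq_hz * spacing_m / c
    # Number of 2*pi wraps that could still map to a physical angle.
    max_k = int(math.floor(freq_hz * spacing_m / c + 0.5)) + 1
    candidates = []
    for k in range(-max_k, max_k + 1):
        sin_theta = (phase_diff_rad + 2.0 * math.pi * k) / scale
        if -1.0 <= sin_theta <= 1.0:
            candidates.append(math.degrees(math.asin(sin_theta)))
    return sorted(candidates)


# Hypothetical 10 cm spacing: one candidate below the aliasing frequency,
# several candidates above it.
below = candidate_doas(0.5, 1000.0, 0.1)
above = candidate_doas(0.5, 4000.0, 0.1)
```

With these assumptions, the same measured phase difference maps to a single angle below the aliasing frequency but to several angles above it, which is the ambiguity a DoA estimator has to resolve.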

    Parametric coding of stereo audio

    Parametric-stereo coding is a technique to efficiently code a stereo audio signal as a monaural signal plus a small amount of parametric overhead to describe the stereo image. The stereo properties are analyzed, encoded, and reinstated in a decoder according to spatial psychoacoustical principles. The monaural signal can be encoded using any (conventional) audio coder. Experiments show that the parameterized description of spatial properties enables a highly efficient, high-quality stereo audio representation.
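
The idea of "mono signal plus parameters" can be sketched as follows. The abstract does not list the parameters, so this example assumes two commonly used cues, an inter-channel level difference and a normalized inter-channel correlation, computed over one full-band frame. The function name and the simple averaging downmix are illustrative assumptions, not the coder described in the paper.

```python
import math


def stereo_parameters(left, right, eps=1e-12):
    """Analyze one frame: return (mono downmix, level difference in dB, correlation).

    Assumed cues: full-band level difference and normalized correlation.
    A real coder would typically compute such cues per frequency band.
    """
    energy_l = sum(x * x for x in left)
    energy_r = sum(x * x for x in right)
    cross = sum(l * r for l, r in zip(left, right))

    level_diff_db = 10.0 * math.log10((energy_l + eps) / (energy_r + eps))
    correlation = cross / math.sqrt((energy_l + eps) * (energy_r + eps))
    mono = [0.5 * (l + r) for l, r in zip(left, right)]
    return mono, level_diff_db, correlation


# Hypothetical frame: the right channel is the left channel at half amplitude.
left = [1.0, -1.0, 1.0, -1.0]
right = [0.5 * x for x in left]
mono, level_diff_db, correlation = stereo_parameters(left, right)
```

Only `mono` would go to the conventional audio coder; the two scalar parameters are the small overhead that a decoder could use to restore the stereo image.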

    Glottal-synchronous speech processing

    Glottal-synchronous speech processing is a field of speech science where the pseudoperiodicity of voiced speech is exploited. Traditionally, speech processing involves segmenting and processing short speech frames of predefined length; this may fail to exploit the inherent periodic structure of voiced speech, which glottal-synchronous speech frames have the potential to harness. Glottal-synchronous frames are often derived from the glottal closure instants (GCIs) and glottal opening instants (GOIs). The SIGMA algorithm was developed for the detection of GCIs and GOIs from the electroglottograph signal, with a measured accuracy of up to 99.59%. For GCI and GOI detection from speech signals, the YAGA algorithm provides a measured accuracy of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to reverberation than single-channel algorithms. The GCIs are applied to real-world applications including speech dereverberation, where SNR is improved by up to 5 dB, and prosodic manipulation, where the importance of voicing detection in glottal-synchronous algorithms is demonstrated by subjective testing. The GCIs are further exploited in a new area of data-driven speech modelling, providing new insights into speech production and a set of tools to aid deployment into real-world applications. The technique is shown to be applicable in areas of speech coding, identification and artificial bandwidth extension of telephone speech.

    Data-driven time-frequency analysis of multivariate data

    Empirical Mode Decomposition (EMD) is a data-driven method for the decomposition and time-frequency analysis of real-world nonstationary signals. Its main advantages over other time-frequency methods are its locality, data-driven nature, multiresolution-based decomposition, higher time-frequency resolution and its ability to capture oscillation of any type (nonharmonic signals). These properties have made EMD a viable tool for real-world nonstationary data analysis. Recent advances in sensor and data acquisition technologies have brought to light new classes of signals that typically contain several data channels. Currently, such signals are almost invariably processed channel-wise, which is suboptimal. It is, therefore, imperative to design multivariate extensions of the existing nonlinear and nonstationary analysis algorithms, as they are expected to give more insight into the dynamics and the interdependence between multiple channels of such signals. To this end, this thesis presents multivariate extensions of the empirical mode decomposition algorithm and illustrates their advantages with regard to multivariate nonstationary data analysis. Some important properties of such extensions are also explored, including their ability to exhibit wavelet-like dyadic filter bank structures for white Gaussian noise (WGN), and their capacity to align similar oscillatory modes from multiple data channels. Owing to the generality of the proposed methods, an improved multivariate EMD-based algorithm is introduced which solves some inherent problems in the original EMD algorithm. Finally, to demonstrate the potential of the proposed methods, simulations on the fusion of multiple real-world signals (wind, images and inertial body motion data) support the analysis.

    Estimation of Interchannel Time Difference in Frequency Subbands Based on Nonuniform Discrete Fourier Transform

    Binaural cue coding (BCC) is an efficient technique for spatial audio rendering using side information such as the interchannel level difference (ICLD), interchannel time difference (ICTD), and interchannel correlation (ICC). Of this side information, the ICTD plays an important role in the auditory spatial image. However, inaccurate estimation of the ICTD may lead to audio quality degradation. In this paper, we develop a novel ICTD estimation algorithm based on the nonuniform discrete Fourier transform (NDFT) and integrate it with the BCC approach to improve the decoded auditory image. Furthermore, a new subjective assessment method is proposed for the evaluation of auditory image widths of decoded signals. The test results demonstrate that the NDFT-based scheme can achieve a much wider and more externalized auditory image than the existing BCC scheme based on the discrete Fourier transform (DFT). It is found that the present technique, regardless of the image width, does not deteriorate the sound quality at the decoder compared to the traditional scheme without ICTD estimation.
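
To make the ICTD concrete, the sketch below estimates a time difference between two channels as the lag that maximizes their full-band, time-domain cross-correlation. This is a plain baseline chosen for illustration, not the NDFT-based subband estimator proposed in the paper; the function names, the test signal, and the 48 kHz sample rate are assumptions.

```python
def estimate_ictd_samples(left, right, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation.

    A positive result means the right channel is delayed relative to the left.
    Plain time-domain search, used here only as an illustrative baseline.
    """
    n = min(len(left), len(right))
    best_lag, best_value = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                acc += left[i] * right[j]
        if acc > best_value:
            best_value, best_lag = acc, lag
    return best_lag


def lag_to_seconds(lag_samples, sample_rate_hz):
    """Convert a lag in samples to a time difference in seconds."""
    return lag_samples / sample_rate_hz


# Hypothetical signals: the right channel is the left channel delayed by 2 samples.
left = [0.0, 0.0, 1.0, 0.5, -0.3, 0.0, 0.0, 0.0, 0.0, 0.0]
right = [0.0, 0.0, 0.0, 0.0, 1.0, 0.5, -0.3, 0.0, 0.0, 0.0]
lag = estimate_ictd_samples(left, right, max_lag=4)
ictd_seconds = lag_to_seconds(lag, 48000.0)
```

A subband scheme such as the one in the abstract would estimate one time difference per frequency band rather than a single full-band lag.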

    Quantum mechanical transport in submicron electronic devices

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1991. Includes bibliographical references. By Philip Frederick Bagwell.