
    Graphic equalization using frequency-warped digital filters (Graafinen ekvalisointi taajuusvarpattujen digitaalisten suotimien avulla)

    The aim of this thesis is to design a graphic equalizer with frequency-warped digital filters. The proposed design consists of a warped FIR filter for the low-frequency bands and a standard FIR filter for the high-frequency bands. This design is used to implement both an octave and a one-third-octave equalizer in Matlab. Low-frequency equalization with FIR filters requires high filter orders: the frequency resolution needed for the lowest band of a graphic equalizer demands orders that are impractical for real-life applications. Frequency warping lowers the required filter orders, so that a practical graphic equalizer can be designed. The design also avoids the gain build-up problem common to most IIR designs, in which the gain of one equalizer band affects the neighboring bands. The proposed equalizer is found to be accurate and comparable to previous equalizer designs, and the required filter orders are small enough for the design to be used in real-life applications. Gain build-up is avoided because several equalizer bands are filtered with a single filter. The computational cost of the design is higher than that of the other compared designs; however, the difference can be reduced if the accuracy requirements are relaxed.
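The warped FIR structure mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the common textbook construction in which every unit delay of an FIR filter is replaced by a first-order allpass section with warping parameter `lam`. The function names, the tap values, and the choice of `lam` are hypothetical and are not taken from the thesis, which was implemented in Matlab.

```python
import numpy as np

def allpass(x, lam):
    """First-order allpass D(z) = (z^-1 - lam) / (1 - lam * z^-1)."""
    y = np.zeros(len(x))
    x_prev = 0.0
    y_prev = 0.0
    for n, xn in enumerate(x):
        # Difference equation: y[n] = -lam*x[n] + x[n-1] + lam*y[n-1]
        y[n] = -lam * xn + x_prev + lam * y_prev
        x_prev, y_prev = xn, y[n]
    return y

def warped_fir(x, taps, lam):
    """FIR filter whose unit delays are replaced by allpass sections.

    With lam = 0 each allpass reduces to a plain delay, so the result
    equals an ordinary FIR filter (assumed structure, not the thesis code).
    """
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    stage = x
    for k, b in enumerate(taps):
        if k > 0:
            stage = allpass(stage, lam)  # signal after k allpass sections
        out += b * stage
    return out

def warped_frequency(omega, lam):
    """Map a normalized frequency omega (rad/sample) to the warped axis."""
    return omega + 2.0 * np.arctan(lam * np.sin(omega) / (1.0 - lam * np.cos(omega)))
```

For a positive `lam`, `warped_frequency` stretches the low end of the frequency axis, which is why a short warped filter can resolve low-frequency bands that would otherwise need a very long standard FIR filter.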

    Reality (sound)bites: Audio tricks from the film and TV studio

    Proceedings of the 9th International Conference on Auditory Display (ICAD), Boston, MA, July 7-9, 2003. In the example-filled session that accompanies this paper, we'll listen to some of the ways sound designers fix (or sometimes break) voices, music, and effects to help serve a director's vision. We'll start with how phoneme-level editing can change content, affect a dialect, or merge one voice with a completely different one. If time permits, we'll give some quick examples of how individual cookbook processes, such as equalization or delay, can be used in creative ways to change the nature of a sound. Finally, we'll examine how these processes are strung together in unusual ways to simulate everything from an airplane interior to the sound of a classroom movie projector.

    Deep Learning for Audio Effects Modeling

    PhD Thesis. Audio effects modeling is the process of emulating an audio effect unit and seeks to recreate the sound, behaviour and main perceptual features of an analog reference device. Audio effect units are analog or digital signal processing systems that transform certain characteristics of the sound source. These transformations can be linear or nonlinear, time-invariant or time-varying, and with short-term or long-term memory. The most typical audio effect transformations are based on dynamics, such as compression; tone, such as distortion; frequency, such as equalization; and time, such as artificial reverberation or modulation-based audio effects. The digital simulation of these audio processors is normally done by designing mathematical models of these systems. This is often difficult because it seeks to accurately model all components within the effect unit, which usually contains mechanical elements together with nonlinear and time-varying analog electronics. Most existing methods for audio effects modeling are either simplified or optimized for a very specific circuit or type of audio effect and cannot be efficiently translated to other types of audio effects. This thesis aims to explore deep learning architectures for music signal processing in the context of audio effects modeling. We investigate deep neural networks as black-box modeling strategies to solve this task, i.e. by using only input-output measurements. We propose different DSP-informed deep learning models to emulate each type of audio effect transformation. Through objective perceptual-based metrics and subjective listening tests we explore the performance of these models when modeling various analog audio effects. We also analyze how the given tasks are accomplished and what the models are actually learning.
We show virtual analog models of nonlinear effects, such as a tube preamplifier; nonlinear effects with memory, such as a transistor-based limiter; and electromechanical nonlinear time-varying effects, such as a Leslie speaker cabinet and plate and spring reverberators. We report that the proposed deep learning architectures represent an improvement over the state of the art in black-box modeling of audio effects, and we give the respective directions of future work.

    An investigation of the utility of monaural sound source separation via nonnegative matrix factorization applied to acoustic echo and reverberation mitigation for hands-free telephony

    In this thesis we investigate the applicability and utility of Monaural Sound Source Separation (MSSS) via Nonnegative Matrix Factorization (NMF) for various problems related to audio for hands-free telephony. We first investigate MSSS via NMF as an alternative acoustic echo reduction approach to existing approaches such as Acoustic Echo Cancellation (AEC). To this end, we present the single-channel acoustic echo problem as an MSSS problem, in which the objective is to extract the user's signal from a mixture also containing acoustic echo and noise. To perform separation, NMF is used to decompose the near-end microphone signal onto the union of two nonnegative bases in the magnitude Short Time Fourier Transform (STFT) domain. One of these bases is for the spectral energy of the acoustic echo signal, and is formed from the incoming far-end user's speech, while the other basis is for the spectral energy of the near-end speaker, and is trained with speech data a priori. In comparison to AEC, the speaker extraction approach obviates Double-Talk Detection (DTD), and is demonstrated to attain its maximal echo mitigation performance immediately upon initiation and to maintain that performance during and after room changes, with similar computational requirements. Speaker extraction is also shown to introduce distortion of the near-end speech signal during double-talk, which is quantified by means of a speech distortion measure and compared to that of AEC. Subsequently, we address DTD for block-based AEC algorithms. We propose a novel block-based DTD algorithm that uses the available signals and the estimate of the echo signal that is produced by NMF-based speaker extraction to compute a suitably normalized correlation-based decision variable, which is compared to a fixed threshold to decide on double-talk.
Using a standard evaluation technique, the proposed algorithm is shown to have detection performance comparable to that of an existing conventional block-based DTD algorithm. It is also demonstrated to inherit the room change insensitivity of speaker extraction, with the proposed DTD algorithm generating minimal false double-talk indications upon initiation and in response to room changes in comparison to the existing conventional DTD. We also show that this property allows its paired AEC to converge at a rate close to the optimum. Another focus of this thesis is the problem of inverting a single measurement of a non-minimum-phase Room Impulse Response (RIR). We describe the process by which perceptually detrimental all-pass phase distortion arises in reverberant speech filtered by the inverse of the minimum-phase component of the RIR; in short, such distortion arises from inverting the magnitude response of the high-Q maximum-phase zeros of the RIR. We then propose two novel partial inversion schemes that precisely mitigate this distortion. One of these schemes employs NMF-based MSSS to separate the all-pass phase distortion from the target speech in the magnitude STFT domain, while the other approach modifies the inverse minimum-phase filter such that the magnitude response of the maximum-phase zeros of the RIR is not fully compensated. Subjective listening tests reveal that the proposed schemes generally produce better quality output speech than a comparable inversion technique.
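The decomposition onto a union of two fixed nonnegative bases, as described in the abstract above, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the cost function or the reconstruction step, so the Euclidean cost with Lee-Seung multiplicative updates and the soft-mask reconstruction used here are assumptions, and the function and variable names are hypothetical.

```python
import numpy as np

def separate_with_fixed_bases(V, W_echo, W_speech, n_iter=200, eps=1e-12, seed=0):
    """Decompose a magnitude spectrogram V (freq x frames) onto [W_echo, W_speech].

    Both bases stay fixed; only the activations H are estimated.
    Returns magnitude estimates (echo_hat, speech_hat).
    """
    W = np.hstack([W_echo, W_speech])            # union of the two bases
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        # Multiplicative update for the Euclidean cost ||V - W H||^2
        # (assumed cost; the thesis may use a different divergence)
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    k = W_echo.shape[1]
    echo_part = W_echo @ H[:k]                   # echo contribution
    speech_part = W_speech @ H[k:]               # near-end speaker contribution
    total = echo_part + speech_part + eps
    # Soft masks share out the observed magnitude between the two sources
    return V * echo_part / total, V * speech_part / total
```

In this sketch `W_echo` would be built from the far-end signal and `W_speech` trained beforehand on speech data, matching the roles the abstract assigns to the two bases. The two returned estimates sum to the observed magnitude, up to the small constant `eps`.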

    Analogue filter networks: developments in theory, design and analyses

    Not available