
    Refining a Phase Vocoder for Vocal Modulation

    Vocal harmonies are a highly sought-after effect in the music industry, as they allow singers to convey more emotion and meaning through their voices. The chords heard in nearly any modern song are built from simple frequency ratios (e.g., the recipe for a major triad is 4:5:6). Vocal harmonies are currently only readily obtainable through a few methods, including backup singers, looper-effect systems, and post-process overdubbing. The problem with all of these is that no publicly available code allows solo artists to modulate input audio to whatever chord structure is desired while maintaining the same duration and timbre in the successive layers. This thesis addresses this issue using the phase vocoder method. If the modulation technique is successful, it could revolutionize the way vocalists perform. Real-time self-harmonization would give artists access to emphasized lyrical phrases and vocals without needing to hire and train backup vocalists. This phase vocoder would also allow for more vocal improvisation, as the individual would only need to know how to harmonize with themselves and would not be relying on interpreting how backup vocalists plan to move the melody when creating spontaneously.
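    The ratio arithmetic above can be sketched in a few lines of Python. This is an illustrative calculation only (the helper names are not from the thesis): it derives the pitch-shift factors a harmonizer would apply to the lead vocal to stack a 4:5:6 major triad, and converts them to semitone offsets.

```python
import math

def triad_shift_ratios(chord=(4, 5, 6)):
    """Frequency ratios of each chord voice relative to the root."""
    root = chord[0]
    return [n / root for n in chord]

def ratio_to_semitones(ratio):
    """Express a frequency ratio as a semitone offset (12-TET)."""
    return 12 * math.log2(ratio)

ratios = triad_shift_ratios()                      # [1.0, 1.25, 1.5]
offsets = [ratio_to_semitones(r) for r in ratios]  # ~[0.0, 3.86, 7.02]
```

    A phase-vocoder harmonizer would time-stretch the vocal by each ratio and resample back to the original length, yielding pitch-shifted copies of identical duration that can be mixed under the lead.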

    Developing a flexible and expressive realtime polyphonic wave terrain synthesis instrument based on a visual and multidimensional methodology

    The Jitter extended library for Max/MSP is distributed with a gamut of tools for the generation, processing, storage, and visual display of multidimensional data structures. With additional support for a wide range of media types, and for interaction between these media, the environment is an ideal working ground for Wave Terrain Synthesis. This research details the practical development of a realtime Wave Terrain Synthesis instrument within the Max/MSP programming environment using the Jitter extended library. Various graphical processing routines are explored in relation to their potential use in Wave Terrain Synthesis.
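    Wave Terrain Synthesis generates audio by tracing an orbit across a two-dimensional surface and reading the surface height along the path. The following minimal Python sketch illustrates the principle only; the terrain function and orbit parameters are arbitrary assumptions, not the Jitter-based instrument described above.

```python
import math

def terrain(x, y):
    # Example terrain surface; any bounded 2-D function works.
    return math.sin(2 * math.pi * x) * math.cos(2 * math.pi * y)

def wave_terrain(n_samples, sr=44100, fx=220.0, fy=331.0, r=0.5):
    # Trace an elliptical orbit across the terrain; the terrain
    # height along the path becomes the audio signal.
    out = []
    for n in range(n_samples):
        t = n / sr
        x = r * math.cos(2 * math.pi * fx * t)
        y = r * math.sin(2 * math.pi * fy * t)
        out.append(terrain(x, y))
    return out

sig = wave_terrain(1000)
```

    In the Jitter-based instrument the terrain would instead be held in a jit.matrix (for example a processed image frame) and the orbit driven by control signals, which is what enables the visual, multidimensional methodology.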

    Spectro-Temporal Analysis of Auscultatory Sounds


    Models and experiments of binaural interactions

    This dissertation presents models and experiments of binaural interactions in human hearing. Rate-code models of the medial and lateral superior olive (MSO and LSO) are presented. The models are inspired by recent neurophysiological findings and were published in Bouse et al., J. Acoust. Soc. Am. 2019. A key feature of these models is that they contain central stages for processing the interaural time difference and interaural level difference (ITD and ILD); these stages express subjective lateralization in absolute numbers. The predictions of both the MSO and LSO models are compared with subjective data on the lateralization of pure tones and narrow-band noises, discrimination of ITD and ILD, and discrimination of phase warp. Both the lateralization and discrimination experiments show good agreement between model predictions and subjective data. The published models are further improved in this thesis to reduce their computational demands. The improved models' predictions are compared with the subjective data and with the original models' predictions on the same test pool; additionally, a pure-tone lateralization experiment on ITD versus IPD (interaural phase difference) is added to the test pool. Both versions of the models show good agreement with the lateralization and discrimination subjective data, and in some cases the new models perform better than the old ones. The binaural-interaction experiments reported in this thesis are the lateralization of 1-ERB (equivalent rectangular bandwidth) wide narrow-band noises with IPD or ILD, and the audible-quality assessment of DHRTF (differential head-related transfer function) artifact-reduction methods, presented in Bouse et al., J. Acoust. Soc. Am. 2019, and Storek et al., J. Audio Eng. Soc. 2016.
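    The pure-tone comparison of ITD versus IPD rests on the standard identity ITD = IPD / (2πf), sketched below in Python (the function name is illustrative, not from the thesis).

```python
import math

def ipd_to_itd(ipd_rad, freq_hz):
    """Interaural time difference (seconds) equivalent to a given
    interaural phase difference (radians) at frequency freq_hz."""
    return ipd_rad / (2 * math.pi * freq_hz)

# A quarter-cycle (pi/2) phase lead at 500 Hz corresponds to a 500 us ITD.
itd = ipd_to_itd(math.pi / 2, 500.0)
```

    The same ITD thus maps to a larger IPD at higher frequencies, which is why the two cues can be dissociated experimentally.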

    A Phase Vocoder based on Nonstationary Gabor Frames

    We propose a new algorithm for time stretching music signals based on the theory of nonstationary Gabor frames (NSGFs). The algorithm extends the techniques of the classical phase vocoder (PV) by incorporating adaptive time-frequency (TF) representations and adaptive phase locking. The adaptive TF representations provide good time resolution for the onsets of attack transients and good frequency resolution for the sinusoidal components. We estimate the phase values only at peak channels, and the remaining phases are then locked to the values of the peaks in an adaptive manner. During attack transients we keep the stretch factor equal to one, and we propose a new strategy for determining which channels are relevant for reinitializing the corresponding phase values. In contrast to previously published algorithms, we use a non-uniform NSGF to obtain a low redundancy of the corresponding TF representation. We show that with just three times as many TF coefficients as signal samples, artifacts such as phasiness and transient smearing can be greatly reduced compared to the classical PV. The proposed algorithm is tested on both synthetic and real-world signals and compared with state-of-the-art algorithms in a reproducible manner. (Comment: 10 pages, 6 figures.)
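    The per-channel phase update that such phase vocoders perform can be sketched as follows. This is the textbook PV propagation step with illustrative names, not the authors' NSGF formulation: the instantaneous frequency is estimated from two consecutive analysis phases, then the synthesis phase is advanced by the synthesis hop.

```python
import math

def princarg(phase):
    # Wrap a phase value into [-pi, pi].
    return phase - 2 * math.pi * round(phase / (2 * math.pi))

def propagate_phase(phi_prev_a, phi_curr_a, phi_prev_s,
                    k, n_fft, hop_a, hop_s):
    """One phase-vocoder phase update for FFT bin k.

    Estimates the instantaneous frequency from two consecutive
    analysis phases, then advances the synthesis phase by the
    (longer or shorter) synthesis hop.
    """
    omega_k = 2 * math.pi * k / n_fft       # bin centre frequency (rad/sample)
    delta = princarg(phi_curr_a - phi_prev_a - hop_a * omega_k)
    omega_inst = omega_k + delta / hop_a    # refined frequency estimate
    return phi_prev_s + hop_s * omega_inst
```

    Phase locking then applies this update only at spectral peaks and sets the neighbouring channels' phases relative to their peak, which is the mechanism the adaptive locking above refines.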

    Doctor of Philosophy

    Hearing aids suffer from acoustic feedback, which limits the gain they can provide. Moreover, the output sound quality of hearing aids may be compromised in the presence of background acoustic noise. Digital hearing aids use advanced signal processing to reduce acoustic feedback and background noise and thereby improve output sound quality. However, the output sound quality of digital hearing aids is known to deteriorate as the hearing aid gain is increased. Furthermore, the subband or transform-domain digital signal processing popular in modern hearing aids introduces analysis-synthesis delays in the forward path. Long forward-path delays are undesirable because the processed sound combines with the unprocessed sound arriving at the cochlea through the vent and changes the sound quality. In this dissertation, we employ a variable, frequency-dependent gain function that is lower at frequencies of the incoming signal where the information is perceptually insignificant. In addition, the method automatically identifies and suppresses residual acoustic feedback components at frequencies that have the potential to drive the system to instability; the suppressed frequency components are monitored, and the suppression is removed once those frequencies no longer threaten to drive the hearing aid into instability. Together, these measures provide more stable gain than traditional methods by reducing acoustic coupling between the microphone and the loudspeaker of a hearing aid. The method also performs the necessary hearing aid signal processing with low-delay characteristics. The central idea of the low-delay processing is a spectral gain shaping method (SGSM) that employs parallel parametric equalization (EQ) filters. Parameters of the parametric EQ filters and the associated gain values are selected with a least-squares approach to obtain the desired spectral response. Finally, the method switches to a least-squares adaptation scheme with linear complexity at the onset of howling; it adapts quickly to the altered feedback path so that the patient does not lose perceivable information. The complexity of the least-squares estimate is reduced by reformulating it as a Toeplitz system and solving it with a direct Toeplitz solver. The increase in stable gain over traditional methods and the output sound quality were evaluated in psychoacoustic experiments on normal-hearing listeners with speech and music signals. The results indicate that the method provides 8 to 12 dB more hearing aid gain than feedback cancelers with traditional fixed gain functions. Furthermore, experimental results obtained with real-world hearing aid gain profiles indicate that the method introduces less distortion in the output sound quality than classical feedback cancelers, enabling the use of more comfortable hearing aid styles for patients with moderate to profound hearing loss. Extensive MATLAB simulations and subjective evaluations indicate that the method exhibits much smaller forward-path delays with superior howling suppression capability.
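    A common way to realize the parallel parametric EQ filters mentioned above is as second-order peaking sections. The sketch below follows the widely used Audio EQ Cookbook (R. Bristow-Johnson) peaking-EQ formulas; it is a plausible building block, not necessarily the dissertation's exact design.

```python
import math

def peaking_eq(fc, sr, gain_db, q):
    """Biquad coefficients (b0, b1, b2, a1, a2) for a parametric
    peaking equalizer, per the RBJ Audio EQ Cookbook.
    Coefficients are normalized so that a0 = 1."""
    a = 10 ** (gain_db / 40)                 # sqrt of linear gain
    w0 = 2 * math.pi * fc / sr               # centre frequency (rad/sample)
    alpha = math.sin(w0) / (2 * q)           # bandwidth parameter
    b0 = 1 + alpha * a
    b1 = -2 * math.cos(w0)
    b2 = 1 - alpha * a
    a0 = 1 + alpha / a
    a1 = -2 * math.cos(w0)
    a2 = 1 - alpha / a
    return (b0 / a0, b1 / a0, b2 / a0, a1 / a0, a2 / a0)
```

    A least-squares gain-shaping stage would then choose each section's centre frequency fc, bandwidth (via q), and gain_db so that the summed response of the parallel sections matches the desired frequency-dependent gain function.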

    Singing voice resynthesis using concatenative-based techniques

    Doctoral thesis. Informatics Engineering. Faculdade de Engenharia, Universidade do Porto. 201

    Listening to the magnetosphere: How best to make ULF waves audible

    Observations across the heliosphere typically rely on in situ spacecraft measurements producing time-series data. While these data are often analysed visually, they lend themselves more naturally to our sense of sound. The simplest method of converting oscillatory data into audible sound is audification, a one-to-one mapping of data samples to audio samples, which has the benefit that no information is lost, so the result is a true representation of the original data. However, audification can make some magnetospheric ULF wave observations pass by too quickly for a listener to follow effectively. For this reason, we detail various existing audio time-scale modification techniques developed for music, apply them to ULF wave observations from spacecraft, and explore how they affect the properties of the resulting audio. Through a public dialogue we arrive at recommendations for ULF wave researchers on rendering these waves audible, and we discuss the scientific and educational possibilities of these new methods.
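    Audification in this one-to-one sense can be sketched with the Python standard library alone; the function name and 16-bit scaling below are illustrative assumptions.

```python
import struct
import wave

def audify(samples, path, sr=44100):
    """Map each data sample to one audio sample (one-to-one),
    rescale to 16-bit PCM, and write a mono WAV file."""
    peak = max(abs(s) for s in samples) or 1.0   # avoid divide-by-zero
    frames = b"".join(
        struct.pack("<h", int(32767 * s / peak)) for s in samples)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 16-bit PCM
        w.setframerate(sr)     # playback rate, not the data cadence
        w.writeframes(frames)
```

    Played back at 44.1 kHz, data sampled at 1 Hz are sped up by a factor of 44 100, so a 10 mHz ULF wave becomes an audible 441 Hz tone, but a whole day of observations lasts only about two seconds; hence the appeal of the time-scale modification techniques surveyed above.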