2,117 research outputs found

    Wavenet based low rate speech coding

    Full text link
    Traditional parametric coding of speech facilitates low rate but provides poor reconstruction quality because of the inadequacy of the model used. We describe how a WaveNet generative speech model can be used to generate high quality speech from the bit stream of a standard parametric coder operating at 2.4 kb/s. We compare this parametric coder with a waveform coder based on the same generative model and show that approximating the signal waveform incurs a large rate penalty. Our experiments confirm the high performance of the WaveNet based coder and show that the speech produced by the system is able to additionally perform implicit bandwidth extension and does not significantly impair recognition of the original speaker for the human listener, even when that speaker has not been used during the training of the generative model.Comment: 5 pages, 2 figure

    Perceptually smooth timbral guides by state-space analysis of phase-vocoder parameters

    Get PDF
    Sculptor is a phase-vocoder-based package of programs that allows users to explore timbral manipulation of sound in real time. It is the product of a research program seeking ultimately to perform gestural capture by analysis of the sound a performer makes using a conventional instrument. Since the phase-vocoder output is of high dimensionality — typically more than 1,000 channels per analysis frame—mapping phase-vocoder output to appropriate input parameters for a synthesizer is only feasible in theory

    Techniques for the Regeneration of Wideband Speech from Narrowband Speech

    Get PDF
    This paper addresses the problem of reconstructing wideband speech signals from observed narrowband speech signals. The goal of this work is to improve the perceived quality of speech signals which have been transmitted through narrowband channels or degraded during acquisition. We describe a system, based on linear predictive coding, for estimating wideband speech from narrowband. This system employs both previously identified and novel techniques. Experimental results are provided in order to illustrate the system’s ability to improve speech quality. Both objective and subjective criteria are used to evaluate the quality of the processed speech signals

    Uses of the pitch-scaled harmonic filter in speech processing

    No full text
    The pitch-scaled harmonic filter (PSHF) is a technique for decomposing speech signals into their periodic and aperiodic constituents, during periods of phonation. In this paper, the use of the PSHF for speech analysis and processing tasks is described. The periodic component can be used as an estimate of the part attributable to voicing, and the aperiodic component can act as an estimate of that attributable to turbulence noise, i.e., from fricative, aspiration and plosive sources. Here we present the algorithm for separating the periodic and aperiodic components from the pitch-scaled Fourier transform of a short section of speech, and show how to derive signals suitable for time-series analysis and for spectral analysis. These components can then be processed in a manner appropriate to their source type, for instance, extracting zeros as well as poles from the aperiodic spectral envelope. A summary of tests on synthetic speech-like signals demonstrates the robustness of the PSHF's performance to perturbations from additive noise, jitter and shimmer. Examples are given of speech analysed in various ways: power spectrum, short-time power and short-time harmonics-to-noise ratio, linear prediction and mel-frequency cepstral coefficients. Besides being valuable for speech production and perception studies, the latter two analyses show potential for incorporation into speech coding and speech recognition systems. Further uses of the PSHF are revealing normally-obscured acoustic features, exploring interactions of turbulence-noise sources with voicing, and pre-processing speech to enhance subsequent operations

    Audio Analysis/synthesis System

    Get PDF
    A method and apparatus for the automatic analysis, synthesis and modification of audio signals, based on an overlap-add sinusoidal model, is disclosed. Automatic analysis of amplitude, frequency and phase parameters of the model is achieved using an analysis-by-synthesis procedure which incorporates successive approximation, yielding synthetic waveforms which are very good approximations to the original waveforms and are perceptually identical to the original sounds. A generalized overlap-add sinusoidal model is introduced which can modify audio signals without objectionable artifacts. In addition, a new approach to pitch-scale modification allows for the use of arbitrary spectral envelope estimates and addresses the problems of high-frequency loss and noise amplification encountered with prior art methods. The overlap-add synthesis method provides the ability to synthesize sounds with computational efficiency rivaling that of synthesis using the discrete short-time Fourier transform (DSTFT) while eliminating the modification artifacts associated with that method.Georgia Tech Research Corporatio

    Computer Models for Musical Instrument Identification

    Get PDF
    PhDA particular aspect in the perception of sound is concerned with what is commonly termed as texture or timbre. From a perceptual perspective, timbre is what allows us to distinguish sounds that have similar pitch and loudness. Indeed most people are able to discern a piano tone from a violin tone or able to distinguish different voices or singers. This thesis deals with timbre modelling. Specifically, the formant theory of timbre is the main theme throughout. This theory states that acoustic musical instrument sounds can be characterised by their formant structures. Following this principle, the central point of our approach is to propose a computer implementation for building musical instrument identification and classification systems. Although the main thrust of this thesis is to propose a coherent and unified approach to the musical instrument identification problem, it is oriented towards the development of algorithms that can be used in Music Information Retrieval (MIR) frameworks. Drawing on research in speech processing, a complete supervised system taking into account both physical and perceptual aspects of timbre is described. The approach is composed of three distinct processing layers. Parametric models that allow us to represent signals through mid-level physical and perceptual representations are considered. Next, the use of the Line Spectrum Frequencies as spectral envelope and formant descriptors is emphasised. Finally, the use of generative and discriminative techniques for building instrument and database models is investigated. Our system is evaluated under realistic recording conditions using databases of isolated notes and melodic phrases

    Sparsity in Linear Predictive Coding of Speech

    Get PDF
    nrpages: 197status: publishe
    • …
    corecore