
    SWIPE: A Sawtooth Waveform Inspired Pitch Estimator for Speech and Music

    Available at: http://www.cise.ufl.edu/~acamacho/publications/dissertation.pdf. A Sawtooth Waveform Inspired Pitch Estimator (SWIPE) has been developed for processing speech and music. SWIPE is shown to outperform existing algorithms on several publicly available speech and musical-instrument databases and a disordered-speech database. SWIPE estimates the pitch as the fundamental frequency of the sawtooth waveform whose spectrum best matches the spectrum of the input signal. A decaying cosine kernel extends older frequency-based, sieve-type estimation algorithms by providing smooth peaks with decaying amplitudes to correlate with the harmonics of the signal. A further improvement is achieved by using only the first and prime harmonics, which significantly reduces the subharmonic errors commonly found in other pitch estimation algorithms.
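    The prime-harmonics idea can be illustrated with a toy score function: sample the magnitude spectrum at the first and prime harmonics of each pitch candidate, with decaying weights, and pick the candidate with the highest score. This is a sketch of the principle only, not the SWIPE implementation (which correlates the spectrum with smooth cosine-lobe kernels); the function names and the 1/sqrt(k) weighting below are illustrative assumptions.

```python
import numpy as np

def prime_harmonics(n):
    """Return 1 plus the primes up to n; using only the first and prime
    harmonics suppresses subharmonic (octave-below) errors."""
    ks = [1]
    for k in range(2, n + 1):
        if all(k % d for d in range(2, int(k ** 0.5) + 1)):
            ks.append(k)
    return ks

def swipe_like_score(spectrum, freqs, f0, n_harm=16):
    """Toy pitch-candidate score: sum spectral magnitude at the first
    and prime harmonics of f0 with decaying weights. SWIPE proper uses
    smooth kernels; nearest-bin sampling here is a simplification."""
    score = 0.0
    for k in prime_harmonics(n_harm):
        fk = k * f0
        if fk > freqs[-1]:
            break
        i = int(np.argmin(np.abs(freqs - fk)))
        score += spectrum[i] / np.sqrt(k)   # decaying harmonic weight
    return score
```

    For a sawtooth-like spectrum with fundamental 200 Hz, the score at 200 Hz beats both the subharmonic (100 Hz) and the octave (400 Hz) candidates, because a subharmonic candidate only collects energy at its even harmonics, and only k = 2 of those is prime.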

    Automatic acoustic analysis of waveform perturbations


    Exploiting correlogram structure for robust speech recognition with multiple speech sources

    This paper addresses the problem of separating and recognising speech in a monaural acoustic mixture in the presence of competing speech sources. The proposed system treats sound source separation and speech recognition as tightly coupled processes. In the first stage, sound source separation is performed in the correlogram domain. For periodic sounds, the correlogram exhibits symmetric tree-like structures whose stems are located at the delay corresponding to multiple pitch periods. These pitch-related structures are exploited in the study to group spectral components at each time frame. Local pitch estimates are then computed for each spectral group and are used to form simultaneous pitch tracks for temporal integration. These processes segregate a spectral representation of the acoustic mixture into several time-frequency regions such that the energy in each region is likely to have originated from a single periodic sound source. The identified time-frequency regions, together with the spectral representation, are employed by a 'speech fragment decoder', which uses 'missing data' techniques with clean speech models to simultaneously search for the acoustic evidence that best matches the model sequences. The paper presents evaluations based on artificially mixed simultaneous speech utterances. A coherence-measuring experiment is first reported, which quantifies the consistency of the identified fragments with a single source. The system is then evaluated in a speech recognition task and compared to a conventional fragment generation approach. Results show that the proposed system produces more coherent fragments across different conditions, which results in significantly better recognition accuracy.
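    The correlogram at the heart of this approach is a per-channel short-time autocorrelation; summed across channels, a periodic source produces a common peak at its pitch period. The sketch below shows that summary-correlogram step only, under stated assumptions: the gammatone filterbank front end is omitted (plain band signals stand in for channel outputs), and both function names are hypothetical.

```python
import numpy as np

def correlogram(channels, max_lag):
    """Per-channel autocorrelation for one time frame.
    channels: 2-D array (n_channels, n_samples), e.g. rectified
    outputs of an auditory filterbank (filterbank omitted here)."""
    n_ch, n = channels.shape
    acg = np.zeros((n_ch, max_lag))
    for c in range(n_ch):
        x = channels[c] - channels[c].mean()
        for lag in range(max_lag):
            acg[c, lag] = np.dot(x[:n - lag], x[lag:])
    return acg

def summary_pitch_lag(acg, min_lag=2):
    """Summed across channels, a periodic source shows a common peak
    at its pitch period; the peak lag gives a local pitch estimate."""
    summary = acg.sum(axis=0)
    return min_lag + int(np.argmax(summary[min_lag:]))
```

    For two channels carrying the first two harmonics of a 200 Hz source at an 8 kHz sampling rate, the summary correlogram peaks at the shared 40-sample pitch period.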

    A Phase Vocoder based on Nonstationary Gabor Frames

    We propose a new algorithm for time stretching music signals based on the theory of nonstationary Gabor frames (NSGFs). The algorithm extends the techniques of the classical phase vocoder (PV) by incorporating adaptive time-frequency (TF) representations and adaptive phase locking. The adaptive TF representations imply good time resolution for the onsets of attack transients and good frequency resolution for the sinusoidal components. We estimate the phase values only at peak channels and the remaining phases are then locked to the values of the peaks in an adaptive manner. During attack transients we keep the stretch factor equal to one and we propose a new strategy for determining which channels are relevant for reinitializing the corresponding phase values. In contrast to previously published algorithms we use a non-uniform NSGF to obtain a low redundancy of the corresponding TF representation. We show that with just three times as many TF coefficients as signal samples, artifacts such as phasiness and transient smearing can be greatly reduced compared to the classical PV. The proposed algorithm is tested on both synthetic and real world signals and compared with state-of-the-art algorithms in a reproducible manner. Comment: 10 pages, 6 figures.
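    For reference, the classical PV that this NSGF method extends works by analysing the signal at one hop size, estimating each bin's instantaneous frequency from the inter-frame phase increment, and resynthesising at a different hop size. The sketch below is that uniform-frame baseline only (no adaptive frames, phase locking, or transient handling); the function name and parameter defaults are illustrative.

```python
import numpy as np

def phase_vocoder(x, stretch, n_fft=1024, hop_a=256):
    """Classical phase-vocoder time stretch: analysis hop hop_a,
    synthesis hop hop_a * stretch, phases propagated per bin from the
    estimated instantaneous frequency."""
    hop_s = int(round(hop_a * stretch))
    win = np.hanning(n_fft)
    omega = 2 * np.pi * np.arange(n_fft) / n_fft  # bin centre freqs (rad/sample)
    frames = range(0, len(x) - n_fft, hop_a)
    y = np.zeros(len(frames) * hop_s + n_fft)
    phase = np.zeros(n_fft)
    prev = None
    for j, i in enumerate(frames):
        spec = np.fft.fft(win * x[i:i + n_fft])
        if prev is None:
            phase = np.angle(spec)
        else:
            # heterodyned phase increment -> instantaneous frequency
            dphi = np.angle(spec) - np.angle(prev) - omega * hop_a
            dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))  # wrap to [-pi, pi]
            phase += (omega + dphi / hop_a) * hop_s
        prev = spec
        out = np.fft.ifft(np.abs(spec) * np.exp(1j * phase)).real
        y[j * hop_s: j * hop_s + n_fft] += win * out  # overlap-add
    return y
```

    Stretching a steady sinusoid by a factor of two roughly doubles its duration while leaving the dominant frequency in place; the phasiness and transient smearing that the paper targets appear when this per-bin phase propagation is applied to richer signals.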

    DESIGN AND EVALUATION OF HARMONIC SPEECH ENHANCEMENT AND BANDWIDTH EXTENSION

    Improving the quality and intelligibility of speech signals continues to be an important topic in mobile communications and hearing aid applications. This thesis explored the possibilities of improving the quality of corrupted speech by cascading a log Minimum Mean Square Error (logMMSE) noise reduction system with a Harmonic Speech Enhancement (HSE) system. In HSE, an adaptive comb filter is deployed to harmonically filter the useful speech signal and suppress the noisy components to the noise floor. A Bandwidth Extension (BWE) algorithm was applied to the enhanced speech for further improvements in speech quality. Performance of this algorithm combination was evaluated using objective speech quality metrics across a variety of noisy and reverberant environments. Results showed that the logMMSE and HSE combination enhanced the speech quality in any reverberant environment and in the presence of multi-talker babble. The objective improvements associated with the BWE were found to be minimal.
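    The principle behind a harmonic comb filter is that pitch-synchronous copies of the signal add coherently for components periodic in the pitch period, while uncorrelated noise is attenuated. The toy FIR version below shows only that principle; the thesis's adaptive filter tracks the pitch period over time (not shown), and the function name and tap count are illustrative assumptions.

```python
import numpy as np

def comb_enhance(x, period, taps=4):
    """Toy harmonic comb filter: average `taps` copies of the signal
    delayed by multiples of the pitch period. Periodic components add
    coherently; white noise power drops by roughly a factor of taps."""
    y = np.zeros_like(x, dtype=float)
    for k in range(taps):
        d = k * period
        y[d:] += x[:len(x) - d]
    return y / taps
```

    In steady state (past the longest delay), a signal that is exactly periodic in `period` passes unchanged, while the variance of added white noise is reduced to about a quarter with four taps.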

    A Parametric Sound Object Model for Sound Texture Synthesis

    This thesis deals with the analysis and synthesis of sound textures based on parametric sound objects. An overview is provided about the acoustic and perceptual principles of textural acoustic scenes, and technical challenges for analysis and synthesis are considered. Four essential processing steps for sound texture analysis are identified, and existing sound texture systems are reviewed, using the four-step model as a guideline. A theoretical framework for analysis and synthesis is proposed. A parametric sound object synthesis (PSOS) model is introduced, which is able to describe individual recorded sounds through a fixed set of parameters. The model, which applies to harmonic and noisy sounds, is an extension of spectral modeling and uses spline curves to approximate spectral envelopes, as well as the evolution of parameters over time. In contrast to standard spectral modeling techniques, this representation uses the concept of objects instead of concatenated frames, and it provides a direct mapping between sounds of different length. Methods for automatic and manual conversion are shown. An evaluation is presented in which the ability of the model to encode a wide range of different sounds has been examined. Although there are aspects of sounds that the model cannot accurately capture, such as polyphony and certain types of fast modulation, the results indicate that high quality synthesis can be achieved for many different acoustic phenomena, including instruments and animal vocalizations. In contrast to many other forms of sound encoding, the parametric model facilitates various techniques of machine learning and intelligent processing, including sound clustering and principal component analysis. Strengths and weaknesses of the proposed method are reviewed, and possibilities for future development are discussed.
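    The key property of the fixed-size parameter set is that any envelope, whatever its original length, maps to the same small number of control points and can be re-expanded to any target length. The sketch below illustrates that round trip with a piecewise-linear stand-in (`np.interp`) where the PSOS model uses spline curves; both function names and the control-point count are illustrative assumptions.

```python
import numpy as np

def envelope_params(mag, n_points=12):
    """Reduce an envelope to a fixed number of control points
    (PSOS uses splines; linear interpolation is a stand-in here).
    A fixed-size parameter set gives a direct mapping between
    sounds of different length."""
    pos = np.linspace(0, len(mag) - 1, n_points)
    return np.interp(pos, np.arange(len(mag)), mag)

def envelope_render(params, length):
    """Re-expand the control points to any target length."""
    pos = np.linspace(0, len(params) - 1, length)
    return np.interp(pos, np.arange(len(params)), params)
```

    A smooth envelope survives the round trip with small error, and the same 12 parameters render equally well at a different length, which is what makes object-level operations such as clustering and PCA straightforward.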

    Object coding of music using expressive MIDI

    Structured audio uses a high level representation of a signal to produce audio output. When it was first introduced in 1998, creating a structured audio representation from an audio signal was beyond the state-of-the-art. Inspired by object coding and structured audio, we present a system to reproduce audio using Expressive MIDI, with high-level parameters used to represent pitch expression from an audio signal. This allows a low bit-rate MIDI sketch of the original audio to be produced. We examine optimisation techniques which may be suitable for inferring Expressive MIDI parameters from estimated pitch trajectories, considering the effect of data codings on the difficulty of optimisation. We look at some less common Gray codes and examine their effect on algorithm performance on standard test problems. We build an Expressive MIDI system, estimating parameters from audio and synthesising output from those parameters. When the parameter estimation succeeds, we find that the system produces note pitch trajectories which match the source audio to within 10 pitch cents. We consider the quality of the system in terms of both parameter estimation and the final output, finding that improvements to core components (audio segmentation and pitch estimation, both active research fields) would produce a better system. We examine the current state-of-the-art in pitch estimation, and find that some estimators produce high precision estimates but are prone to harmonic errors, whilst other estimators produce fewer harmonic errors but are less precise. Inspired by this, we produce a novel pitch estimator combining the outputs of existing estimators.
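    Gray codes matter for optimisation because adjacent parameter values differ in exactly one bit, removing the "Hamming cliffs" of plain binary encodings (e.g. 7 = 0111 vs 8 = 1000). The thesis studies less common Gray codes; shown here is only the textbook binary-reflected one.

```python
def gray_encode(n):
    """Binary-reflected Gray code: consecutive integers map to
    codewords differing in exactly one bit."""
    return n ^ (n >> 1)

def gray_decode(g):
    """Invert gray_encode by cumulative XOR of right shifts."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n
```

    Under this coding, a single-bit mutation in a genetic or local-search algorithm always moves the decoded parameter to a nearby value, which changes the search landscape compared to plain binary.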

    Psychophysical and signal-processing aspects of speech representation


    Implementation and optimization of the synthesis of musical instrument tones using frequency modulation

    Frequency modulation (FM) is an efficient and increasingly important method for synthesizing musical sounds in computer music. In this thesis, methods for fundamental frequency estimation and for the FM synthesis of musical instrument tones are analysed, evaluated, improved and implemented. An FM analysis and synthesis environment was developed in which the methods presented in this thesis were implemented. For the estimation of the fundamental frequency of music signals, a novel algorithm based on harmonic pattern match (HPM) was designed to achieve higher estimation accuracy than previously used methods. After a suitable subset of the spectral data is selected, the autocorrelation is analysed in both the time and frequency domains to obtain candidates for the fundamental frequency, and an efficient mechanism then evaluates the match between each candidate and the harmonic pattern of the musical signal. The proposed algorithm was evaluated against several other fundamental frequency estimation algorithms, and its practical applicability was demonstrated. For the implementation of FM synthesis, a spectrum-matching procedure based on genetic algorithms (GA) is presented, including the formulation of the GA problem and the search for optimal FM parameters. 
    To optimize FM synthesis, the requirements on the carrier and the modulator were analysed and the parameter space was examined, so that the parameter space can be predetermined for accurate synthesis results. For data reduction in FM synthesis, a piecewise linear approximation of the carrier amplitude envelope was developed. A further optimization step combines formants in the spectrum-matching procedure, weighting the formant harmonics with appropriate coefficients to approximate the timbre of the synthesized sound considerably more accurately; to this end, spectral envelope estimation and formant extraction were analysed and implemented. The testing environment developed in this work provides parameter estimation as well as analysis and evaluation of the resulting FM synthesis.
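    The two-stage HPM idea (autocorrelation proposes fundamental-frequency candidates, a harmonic-pattern match ranks them) can be sketched roughly as follows. This is not the thesis implementation: the frequency-domain autocorrelation step is omitted, and all function names, the candidate count, and the harmonic count are illustrative assumptions.

```python
import numpy as np

def f0_candidates_acf(x, fs, fmin=60, fmax=500):
    """Candidate fundamental frequencies from the lags with the
    largest time-domain autocorrelation values."""
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lags = lo + np.argsort(acf[lo:hi])[::-1][:5]  # top-5 lags
    return fs / lags

def harmonic_pattern_score(spectrum, freqs, f0, n_harm=10):
    """Match a candidate against the harmonic pattern of the signal:
    sum spectral magnitude near each multiple of f0."""
    score = 0.0
    for k in range(1, n_harm + 1):
        fk = k * f0
        if fk > freqs[-1]:
            break
        score += spectrum[int(np.argmin(np.abs(freqs - fk)))]
    return score

def estimate_f0(x, fs, n_fft=4096):
    """Rank the autocorrelation candidates by harmonic-pattern score."""
    spectrum = np.abs(np.fft.rfft(x, n_fft))
    freqs = np.fft.rfftfreq(n_fft, 1 / fs)
    cands = f0_candidates_acf(x, fs)
    return max(cands, key=lambda f: harmonic_pattern_score(spectrum, freqs, f))
```

    The pattern-match stage is what rejects the subharmonic candidates that raw autocorrelation also proposes: a candidate at half the true fundamental only collects energy at its even harmonics, so its score is lower.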