SWIPE: A Sawtooth Waveform Inspired Pitch Estimator for Speech and Music
Available at: http://www.cise.ufl.edu/~acamacho/publications/dissertation.pdf
A Sawtooth Waveform Inspired Pitch Estimator (SWIPE) has been developed for processing speech and music. SWIPE is shown to outperform existing algorithms on several publicly available speech and musical-instrument databases and a disordered speech database. SWIPE estimates the pitch as the fundamental frequency of the sawtooth waveform whose spectrum best matches the spectrum of the input signal. A decaying-cosine kernel extends older frequency-based, sieve-type estimation algorithms by providing smooth peaks with decaying amplitudes to correlate with the harmonics of the signal. A further improvement is achieved by using only the first and prime harmonics, which significantly reduces the subharmonic errors commonly found in other pitch estimation algorithms.
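The prime-harmonic idea can be illustrated with a small sketch. The function below (our own illustrative code, not the published SWIPE implementation) scores a pitch candidate by sampling a magnitude spectrum at the first and prime harmonics with decaying weights; on a synthetic harmonic spectrum this is already enough to see why the true pitch outscores its subharmonic and its octave:

```python
import math

def prime_harmonics(n):
    """Return 1 plus the primes up to n (the harmonics the sketch keeps)."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = [False] * len(sieve[i * i::i])
    return [1] + [k for k in range(2, n + 1) if sieve[k]]

def swipe_like_score(spectrum, bin_hz, f0, max_harm=16):
    """Score a pitch candidate f0 against a magnitude spectrum by sampling
    it at the first and prime harmonics with decaying (1/sqrt(k)) weights,
    a simplified sketch of the prime-harmonic matching idea."""
    score = 0.0
    for k in prime_harmonics(max_harm):
        b = (k * f0) / bin_hz            # harmonic position in bins
        if b >= len(spectrum) - 1:
            break
        lo = int(b)
        frac = b - lo                    # linear interpolation between bins
        mag = (1 - frac) * spectrum[lo] + frac * spectrum[lo + 1]
        score += mag / math.sqrt(k)      # decaying harmonic weight
    return score
```

Because a subharmonic candidate (half the true f0) only lines up with the spectrum at its even harmonics, and even harmonics beyond 2 are excluded by the prime sieve, its score collapses relative to the true pitch.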
Exploiting correlogram structure for robust speech recognition with multiple speech sources
This paper addresses the problem of separating and recognising speech in a monaural acoustic mixture with the presence of competing speech sources. The proposed system treats sound source separation and speech recognition as
tightly coupled processes. In the first stage, sound source separation is performed in the correlogram domain. For periodic sounds, the correlogram exhibits symmetric tree-like structures whose stems are located at delays
corresponding to multiples of the pitch period. These pitch-related structures are exploited to group spectral components at each time frame. Local
pitch estimates are then computed for each spectral group and are used to form simultaneous pitch tracks for temporal integration. These processes segregate a spectral representation of the acoustic mixture into several time-frequency regions such that the energy in each region is likely to have originated from a single periodic sound source. The identified time-frequency regions, together
with the spectral representation, are employed by a 'speech fragment decoder' which applies 'missing data' techniques with clean speech models to simultaneously search for the acoustic evidence that best matches model sequences. The paper presents evaluations based on artificially mixed simultaneous speech utterances. A coherence-measuring experiment is first reported which quantifies the consistency of the identified fragments with a single source. The system is then evaluated in a speech recognition task and compared to a conventional fragment generation approach. Results show that the proposed system produces more coherent fragments over different conditions,
which results in significantly better recognition accuracy.
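As a rough single-channel illustration of the correlogram idea (the paper's system runs this per frequency channel and builds the tree-like structures across channels), the short-time autocorrelation of a periodic signal peaks at delays equal to multiples of the pitch period:

```python
import math

def autocorr(x, max_lag):
    """Short-time autocorrelation: one channel row of a correlogram."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) / (n - lag)
            for lag in range(max_lag)]

def pitch_period(x, min_lag, max_lag):
    """Pick the delay where the autocorrelation peaks, i.e. where the
    correlogram's pitch-related structure lines up."""
    ac = autocorr(x, max_lag)
    return max(range(min_lag, max_lag), key=lambda lag: ac[lag])
```

For a sinusoid with a 40-sample period, `pitch_period` recovers the delay of 40 samples; restricting the lag search range plays the role of a plausible-pitch prior.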
A Phase Vocoder based on Nonstationary Gabor Frames
We propose a new algorithm for time stretching music signals based on the
theory of nonstationary Gabor frames (NSGFs). The algorithm extends the
techniques of the classical phase vocoder (PV) by incorporating adaptive
time-frequency (TF) representations and adaptive phase locking. The adaptive TF
representations imply good time resolution for the onsets of attack transients
and good frequency resolution for the sinusoidal components. We estimate the
phase values only at peak channels and the remaining phases are then locked to
the values of the peaks in an adaptive manner. During attack transients we keep
the stretch factor equal to one and we propose a new strategy for determining
which channels are relevant for reinitializing the corresponding phase values.
In contrast to previously published algorithms we use a non-uniform NSGF to
obtain a low redundancy of the corresponding TF representation. We show that
with just three times as many TF coefficients as signal samples, artifacts such
as phasiness and transient smearing can be greatly reduced compared to the
classical PV. The proposed algorithm is tested on both synthetic and real world
signals and compared with state-of-the-art algorithms in a reproducible manner.
Comment: 10 pages, 6 figures
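The phase propagation that both the classical PV and its NSGF extensions build on can be stated compactly. The following single-bin sketch (names and structure are ours, not taken from the paper) unwraps the measured phase increment against the bin's nominal advance, derives an instantaneous frequency, and re-advances it with the synthesis hop:

```python
import math

def pv_phase_advance(prev_phase, phase, bin_freq, ra, rs):
    """One phase-vocoder propagation step for a single bin: unwrap the
    measured phase increment over the analysis hop ra, estimate the
    instantaneous frequency, then advance over the synthesis hop rs.
    Frequencies are in radians per sample."""
    expected = bin_freq * ra                           # nominal advance over ra
    dev = phase - prev_phase - expected                # heterodyned increment
    dev = (dev + math.pi) % (2 * math.pi) - math.pi    # principal value
    inst_freq = bin_freq + dev / ra                    # instantaneous frequency
    return inst_freq * rs                              # synthesis phase advance
```

For a sinusoid sitting exactly on a bin, the deviation is zero and the synthesis phase advances by `bin_freq * rs`; phase-locked variants apply this only at spectral peaks and lock neighbouring channels to them, as the abstract describes.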
DESIGN AND EVALUATION OF HARMONIC SPEECH ENHANCEMENT AND BANDWIDTH EXTENSION
Improving the quality and intelligibility of speech signals continues to be an important topic in mobile communications and hearing aid applications. This thesis explored the possibility of improving the quality of corrupted speech by cascading a log Minimum Mean Square Error (logMMSE) noise reduction system with a Harmonic Speech Enhancement (HSE) system. In HSE, an adaptive comb filter is deployed to harmonically filter the useful speech signal and suppress the noisy components to the noise floor. A Bandwidth Extension (BWE) algorithm was applied to the enhanced speech for further improvements in speech quality. The performance of this algorithm combination was evaluated using objective speech quality metrics across a variety of noisy and reverberant environments. Results showed that the logMMSE and HSE combination enhanced speech quality in any reverberant environment and in the presence of multi-talker babble. The objective improvements associated with the BWE were found to be minimal.
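The harmonic filtering step can be illustrated with a fixed feedforward comb (HSE adapts the delay to the tracked pitch; this is only a sketch of the principle). A delay of one pitch period adds in phase at the harmonics of the pitch and cancels components midway between them:

```python
import math

def comb_enhance(x, period, gain=1.0):
    """Feedforward comb filter tuned to a pitch period (in samples):
    y[n] = 0.5 * (x[n] + gain * x[n - period]).
    Components periodic in `period` are reinforced; components whose
    half-period equals `period` are cancelled."""
    y = []
    for n, v in enumerate(x):
        delayed = x[n - period] if n >= period else 0.0
        y.append(0.5 * (v + gain * delayed))
    return y
```

In steady state a sinusoid with the comb's period passes unchanged, while a sinusoid at half that frequency (delayed copy arrives in antiphase) is driven to zero, which is the sense in which noise between harmonics is pushed toward the noise floor.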
A Parametric Sound Object Model for Sound Texture Synthesis
This thesis deals with the analysis and synthesis of sound textures based on parametric sound objects. An overview is provided about the acoustic and perceptual principles of textural acoustic scenes, and technical challenges for analysis and synthesis are considered. Four essential processing steps for sound texture analysis are identified, and existing sound texture systems are reviewed, using the four-step model as a guideline. A theoretical framework for analysis and synthesis is proposed. A parametric sound object synthesis (PSOS) model is introduced, which is able to describe individual recorded sounds through a fixed set of parameters. The model, which applies to harmonic and noisy sounds, is an extension of spectral modeling and uses spline curves to approximate spectral envelopes, as well as the evolution of parameters over time. In contrast to standard spectral modeling techniques, this representation uses the concept of objects instead of concatenated frames, and it provides a direct mapping between sounds of different length. Methods for automatic and manual conversion are shown. An evaluation is presented in which the ability of the model to encode a wide range of different sounds has been examined. Although there are aspects of sounds that the model cannot accurately capture, such as polyphony and certain types of fast modulation, the results indicate that high quality synthesis can be achieved for many different acoustic phenomena, including instruments and animal vocalizations. In contrast to many other forms of sound encoding, the parametric model facilitates various techniques of machine learning and intelligent processing, including sound clustering and principal component analysis. Strengths and weaknesses of the proposed method are reviewed, and possibilities for future development are discussed.
Object coding of music using expressive MIDI
Structured audio uses a high-level representation of a signal to produce audio output.
When it was first introduced in 1998, creating a structured audio representation
from an audio signal was beyond the state-of-the-art. Inspired by object coding and
structured audio, we present a system to reproduce audio using Expressive MIDI,
with high-level parameters representing pitch expression from an audio signal.
This allows a low bit-rate MIDI sketch of the original audio to be produced.
We examine optimisation techniques which may be suitable for inferring Expressive
MIDI parameters from estimated pitch trajectories, considering the effect of data
codings on the difficulty of optimisation. We look at some less common Gray codes
and examine their effect on algorithm performance on standard test problems.
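For reference, the standard reflected binary Gray code, the baseline against which less common Gray codes are usually compared, and its inverse:

```python
def to_gray(b):
    """Reflected binary Gray code: adjacent integers differ in one bit."""
    return b ^ (b >> 1)

def from_gray(g):
    """Invert the Gray code by folding the shifted bits back down."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b
```

Under a Gray coding, consecutive parameter values are one bit-flip apart, which changes which solutions are neighbours under bit-flip mutation and hence the difficulty of the search landscape; that is the sense in which the choice of data coding affects optimisation.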
We build an expressive MIDI system, estimating parameters from audio and synthesising
output from those parameters. When the parameter estimation succeeds,
we find that the system produces note pitch trajectories which match source audio to
within 10 pitch cents. We consider the quality of the system in terms of both parameter
estimation and the final output, finding that improvements to core components (audio
segmentation and pitch estimation, both active research fields) would produce
a better system.
We examine the current state-of-the-art in pitch estimation, and find that some
estimators produce high precision estimates but are prone to harmonic errors, whilst
other estimators produce fewer harmonic errors but are less precise. Inspired by this,
we produce a novel pitch estimator that combines the outputs of existing estimators.
Implementation and optimization of the synthesis of musical instrument tones using frequency modulation
Frequency modulation (FM) is an efficient method of synthesizing musical sounds and is of great importance in computer music. In this thesis, fundamental frequency estimation, the FM synthesis of musical instrument tones,
and optimizations of FM synthesis were analysed, evaluated, improved, and implemented.
An FM analysis and synthesis environment was developed, in which the
methods presented in this thesis were implemented.
For the estimation of the fundamental frequency of music signals, an algorithm based on
harmonic pattern matching (HPM) was designed to achieve more reliable estimation
accuracy. After a suitable subset of the spectral data is selected, autocorrelation is
applied in both the time and frequency domains to obtain candidates for the fundamental
frequency, and an efficient mechanism evaluates the match between each candidate
and the harmonic pattern of the musical signal. The proposed algorithm was evaluated
together with several other estimation algorithms.
For the implementation of FM synthesis, a spectrum-matching procedure using a
genetic algorithm (GA) is described, including the formulation of the task for the GA
and the search for optimized FM parameters. To optimize FM synthesis, the
requirements on the carrier and the modulator were analysed and the parameter space
examined; on this basis, a method for predetermining the parameter space was designed
to achieve accurate synthesis results. For data reduction in FM synthesis, a piecewise
linear approximation of the carrier amplitude envelope was designed.
A further optimization of FM synthesis combines formants in the spectrum-matching
procedure: the formant harmonics are weighted with appropriate coefficients to achieve
a more accurate timbre in the synthesized sounds. For this purpose, spectral envelope
estimation and formant extraction were analysed and implemented. A testing
environment was developed for the analysis and implementation of FM synthesis,
offering parameter estimation and performance evaluation.
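The FM operator underlying the synthesis described above is simple to state. A minimal Chowning-style sketch (parameter names are illustrative, and real systems add the amplitude envelope discussed above):

```python
import math

def fm_tone(fc, fm, index, dur, sr=8000):
    """Simple FM operator: a carrier at fc (Hz) whose phase is modulated
    by a sinusoid at fm (Hz) with the given modulation index, producing
    sidebands at fc +/- k*fm (harmonic when fc/fm is a simple ratio)."""
    return [math.sin(2 * math.pi * fc * n / sr
                     + index * math.sin(2 * math.pi * fm * n / sr))
            for n in range(int(dur * sr))]
```

With `index = 0` the output reduces to a pure sine at the carrier frequency; increasing the index spreads energy into the sidebands, which is the parameter space the GA-based matching procedure searches over.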