1,036 research outputs found
Auditory Spectrum-Based Pitched Instrument Onset Detection
In this paper, a method for onset detection of music signals using auditory spectra is proposed. The auditory spectrogram provides a time-frequency representation that employs a sound processing model resembling the human auditory system. Recent work on onset detection employs DFT-based features describing spectral energy and phase differences, as well as pitch-based features. These features are often combined to maximize detection performance. Here, the spectral flux and phase slope features are derived in the auditory framework and a novel fundamental frequency estimation algorithm based on auditory spectra is introduced. An onset detection algorithm is proposed, which processes and combines the aforementioned features at the decision level. Experiments are conducted on a dataset covering 11 pitched instrument types, consisting of 1829 onsets in total. Results indicate that auditory representations outperform various state-of-the-art approaches, with the onset detection algorithm reaching an F-measure of 82.6%.
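As a rough illustration of the spectral flux feature mentioned in this abstract, the sketch below computes a half-wave rectified spectral difference on a toy signal. It is not the paper's method: it uses plain DFT magnitudes instead of an auditory spectrogram, and the frame length, hop size, test signal and function names are all assumptions made for the example.

```python
import cmath
import math

FRAME_LEN = 64  # assumed frame length in samples
HOP = 32        # assumed hop size in samples

def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 of one frame (naive O(N^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

def spectral_flux(signal, frame_len=FRAME_LEN, hop=HOP):
    """Sum of positive magnitude increases between consecutive frames."""
    flux = []
    prev = [0.0] * (frame_len // 2 + 1)
    for start in range(0, len(signal) - frame_len + 1, hop):
        mag = magnitude_spectrum(signal[start:start + frame_len])
        flux.append(sum(max(m - p, 0.0) for m, p in zip(mag, prev)))
        prev = mag
    return flux

# Toy signal: 256 samples of silence, then a sinusoid (8 cycles per 64 samples).
true_onset = 256
signal = [0.0] * true_onset + [math.sin(2 * math.pi * 8 * i / 64) for i in range(256)]

flux = spectral_flux(signal)
peak_frame = max(range(len(flux)), key=flux.__getitem__)
onset_estimate = peak_frame * HOP  # start sample of the frame with the largest flux
```

The largest flux value falls on the first frame that contains part of the new tone, so the estimate lands within one frame of the true onset. A practical detector would add windowing, smoothing and an adaptive threshold.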
High-resolution sinusoidal analysis for resolving harmonic collisions in music audio signal processing
Many music signals can largely be considered an additive combination of
multiple sources, such as musical instruments or voice. If the musical sources
are pitched instruments, the spectra they produce are predominantly harmonic,
and are thus well suited to an additive sinusoidal model. However,
due to resolution limits inherent in time-frequency analyses, when the harmonics
of multiple sources occupy equivalent time-frequency regions, their
individual properties are additively combined in the time-frequency representation
of the mixed signal. Any such time-frequency point in a mixture
where multiple harmonics overlap produces a single observation from which
the contributions owed to each of the individual harmonics cannot be trivially
deduced. These overlaps are referred to as overlapping partials or harmonic
collisions. If one wishes to infer some information about individual sources in
music mixtures, the information carried in regions where collided harmonics
exist becomes unreliable due to interference from other sources. This interference
has ramifications in a variety of music signal processing applications
such as multiple fundamental frequency estimation, source separation, and
instrumentation identification.
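The harmonic collision problem described above can be made concrete with a small sketch that lists which partials of two pitched sources fall closer together than the analysis resolution. The function name, the 10 Hz resolution and the example fundamentals are assumptions chosen for illustration, not values from the thesis.

```python
def colliding_partials(f0_a, f0_b, n_harmonics=6, resolution_hz=10.0):
    """Return (h_a, h_b) harmonic-number pairs whose frequencies lie closer
    together than the assumed analysis resolution."""
    pairs = []
    for h_a in range(1, n_harmonics + 1):
        for h_b in range(1, n_harmonics + 1):
            if abs(h_a * f0_a - h_b * f0_b) < resolution_hz:
                pairs.append((h_a, h_b))
    return pairs

# Two sources a perfect fifth apart (frequency ratio 3:2).
collisions = colliding_partials(220.0, 330.0)
# The 3rd and 6th partials of the 220 Hz source coincide with the
# 2nd and 4th partials of the 330 Hz source (660 Hz and 1320 Hz).
```

Consonant intervals such as fifths and octaves produce many such coincidences, which is why collisions are common in real music mixtures.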
This thesis addresses harmonic collisions in music signal processing applications.
As a solution to the harmonic collision problem, a class of signal
subspace-based high-resolution sinusoidal parameter estimators is explored.
Specifically, the direct matrix pencil method, or equivalently, the Estimation
of Signal Parameters via Rotational Invariance Techniques (ESPRIT)
method, is used with the goal of producing estimates of the salient parameters
of individual harmonics that occupy equivalent time-frequency regions. This
estimation method is adapted here to be applicable to time-varying signals
such as musical audio. While high-resolution methods have been previously
explored in the context of music signal processing, previous work has not
addressed whether or not such methods truly produce high-resolution sinusoidal parameter estimates in real-world music audio signals. Therefore, this
thesis answers the question of whether high-resolution sinusoidal parameter
estimators are really high-resolution for real music signals.
This work directly explores the capabilities of this form of sinusoidal parameter
estimation to resolve collided harmonics. The capabilities of this
analysis method are also explored in the context of music signal processing
applications. Potential benefits of high-resolution sinusoidal analysis are
examined in experiments involving multiple fundamental frequency estimation
and audio source separation. This work shows that there are indeed
benefits to high-resolution sinusoidal analysis in music signal processing applications,
especially when compared to methods that produce sinusoidal
parameter estimates based on more traditional time-frequency representations.
The benefits of this form of sinusoidal analysis are made most evident
in multiple fundamental frequency estimation applications, where substantial
performance gains are seen. High-resolution analysis in the context of
computational auditory scene analysis-based source separation shows similar
performance to existing comparable methods.
Object coding of music using expressive MIDI
PhD
Structured audio uses a high-level representation of a signal to produce audio output.
When it was first introduced in 1998, creating a structured audio representation
from an audio signal was beyond the state-of-the-art. Inspired by object coding and
structured audio, we present a system to reproduce audio using Expressive MIDI,
high-level parameters being used to represent pitch expression from an audio signal.
This allows a low bit-rate MIDI sketch of the original audio to be produced.
We examine optimisation techniques which may be suitable for inferring Expressive
MIDI parameters from estimated pitch trajectories, considering the effect of data
codings on the difficulty of optimisation. We look at some less common Gray codes
and examine their effect on algorithm performance on standard test problems.
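For readers unfamiliar with Gray codes as a data coding for optimisation, here is a minimal sketch of the standard binary-reflected Gray code. The thesis examines less common Gray codes, which are not reproduced here; the helper names are assumptions for this example.

```python
def to_gray(n):
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)

def from_gray(g):
    """Inverse of to_gray."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

# Plain binary: stepping from 127 to 128 flips eight bits at once.
binary_step = hamming(127, 128)
# Gray coded: the same step flips a single bit.
gray_step = hamming(to_gray(127), to_gray(128))
```

Because neighbouring parameter values differ by one bit under a Gray coding, a single-bit mutation can always reach an adjacent value, which is one reason the choice of coding can affect how hard a search problem is.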
We build an expressive MIDI system, estimating parameters from audio and synthesising
output from those parameters. When the parameter estimation succeeds,
we find that the system produces note pitch trajectories which match source audio to
within 10 pitch cents. We consider the quality of the system in terms of both parameter
estimation and the final output, finding that improvements to core components
(audio segmentation and pitch estimation, both active research fields) would produce
a better system.
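The 10-cent figure quoted above refers to the usual logarithmic pitch distance, where 100 cents equal one equal-tempered semitone. A small sketch of how such a check could be written follows; the function names and the example trajectories are assumptions, not data from the thesis.

```python
import math

def cents(f_est, f_ref):
    """Signed pitch distance in cents between two frequencies in Hz."""
    return 1200.0 * math.log2(f_est / f_ref)

def within_tolerance(est_track, ref_track, tol_cents=10.0):
    """True if every estimated frequency is within tol_cents of its reference."""
    return all(abs(cents(e, r)) <= tol_cents for e, r in zip(est_track, ref_track))

# Hypothetical pitch trajectories in Hz, sampled at the same time instants.
reference = [440.0, 440.0, 466.16]
estimate = [441.0, 439.0, 467.0]
matches = within_tolerance(estimate, reference)
```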
We examine the current state-of-the-art in pitch estimation, and find that some
estimators produce high precision estimates but are prone to harmonic errors, whilst
other estimators produce fewer harmonic errors but are less precise. Inspired by this,
we produce a novel pitch estimator combining the output of existing estimators.
Feature Extraction for Music Information Retrieval
Copyright © 2009 Jesper Højvang Jensen, except where otherwise stated
Computational Tonality Estimation: Signal Processing and Hidden Markov Models
PhD
This thesis investigates computational musical tonality estimation from an audio signal. We
present a hidden Markov model (HMM) in which relationships between chords and keys are
expressed as probabilities of emitting observable chords from a hidden key sequence. The model
is tested first using symbolic chord annotations as observations, and gives excellent global key
recognition rates on a set of Beatles songs.
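The idea of a hidden key sequence emitting observable chords can be sketched with a toy two-key HMM decoded by the Viterbi algorithm. All probabilities, the chord vocabulary and the names below are invented for illustration and are not taken from the thesis.

```python
import math

KEYS = ["C major", "G major"]
START = {"C major": 0.5, "G major": 0.5}
# Assumed key-to-key transition probabilities (keys tend to persist).
TRANS = {
    "C major": {"C major": 0.9, "G major": 0.1},
    "G major": {"C major": 0.1, "G major": 0.9},
}
# Assumed probabilities of each key emitting each chord.
EMIT = {
    "C major": {"C": 0.40, "F": 0.30, "G": 0.25, "D": 0.05},
    "G major": {"G": 0.40, "D": 0.30, "C": 0.25, "F": 0.05},
}

def viterbi(chords):
    """Most likely hidden key sequence for an observed chord sequence."""
    score = {k: math.log(START[k]) + math.log(EMIT[k][chords[0]]) for k in KEYS}
    backpointers = []
    for chord in chords[1:]:
        new_score, pointers = {}, {}
        for k in KEYS:
            best_prev = max(KEYS, key=lambda p: score[p] + math.log(TRANS[p][k]))
            new_score[k] = (score[best_prev] + math.log(TRANS[best_prev][k])
                            + math.log(EMIT[k][chord]))
            pointers[k] = best_prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(KEYS, key=score.get)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]

key_path = viterbi(["C", "F", "G", "C"])
```

A global key estimate can then be read off as the most frequent key in the decoded path.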
The initial model is extended for audio input by using an existing chord recognition algorithm,
which allows it to be tested on a much larger database. We show that a simple model of the
upper partials in the signal improves percentage scores. We also present a variant of the HMM
which has a continuous observation probability density, but show that the discrete version gives
better performance.
Then follows a detailed analysis of the effects on key estimation and computation time of
changing the low level signal processing parameters. We find that much of the high frequency
information can be omitted without loss of accuracy, and significant computational savings can
be made by applying a threshold to the transform kernels. Results show that there is no single
ideal set of parameters for all music, but that tuning the parameters can make a difference to
accuracy.
We discuss methods of evaluating more complex tonal changes than a single global key, and
compare a metric that measures similarity to a ground truth to metrics that are rooted in music
retrieval. We show that the two measures give different results, and so recommend that the choice
of evaluation metric is determined by the intended application.
Finally we draw together our conclusions and use them to suggest areas for continuation of this
research, in the areas of tonality model development, feature extraction, evaluation methodology,
and applications of computational tonality estimation.
Engineering and Physical Sciences Research Council (EPSRC)
Implementation and optimization of the synthesis of musical instrument tones using frequency modulation
Frequency modulation (FM) is an efficient method for synthesizing musical sounds and is
of great importance in electronic and computer music. In this thesis, methods for the estimation
of fundamental frequency and for the FM synthesis of musical instrument tones
are analysed, evaluated, improved and implemented.
An FM analysis and synthesis environment was developed, in which the
methods presented in this thesis were implemented.
For the estimation of the fundamental frequency of music signals, a new algorithm based on
harmonic pattern match (HPM) was designed, which offers higher estimation
accuracy than previously used methods. After a suitable subset of the spectral data is
selected, the autocorrelation is analysed in both the time and the frequency domain to
determine candidates for the fundamental frequency, and an efficient
mechanism evaluates the match between each candidate and the harmonic pattern
of the musical signal. The proposed algorithm was evaluated against several
other estimation algorithms, and its practical applicability was demonstrated.
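The candidate-evaluation step can be illustrated with a simple scoring rule: for each candidate fundamental, average the peak amplitudes found at its predicted harmonics, so that candidates predicting many missing harmonics are penalised. This rule, the tolerance and the peak list are assumptions for the sketch and are not the HPM algorithm of the thesis.

```python
def harmonic_match_score(f0, peaks, rel_tol=0.03):
    """Average peak amplitude found at the harmonics predicted by f0.

    peaks is a list of (frequency_hz, amplitude) spectral peaks.
    """
    max_freq = max(f for f, _ in peaks)
    matched, predicted, h = 0.0, 0, 1
    while h * f0 <= max_freq * (1 + rel_tol):
        target = h * f0
        near = [a for f, a in peaks if abs(f - target) <= rel_tol * target]
        matched += max(near, default=0.0)  # a missing harmonic contributes nothing
        predicted += 1
        h += 1
    return matched / predicted if predicted else 0.0

# Hypothetical spectral peaks of a 220 Hz tone, and three F0 candidates.
peaks = [(220.0, 1.0), (440.0, 0.5), (660.0, 0.3), (880.0, 0.2)]
candidates = [110.0, 220.0, 440.0]
best_f0 = max(candidates, key=lambda f0: harmonic_match_score(f0, peaks))
```

Here the sub-octave candidate (110 Hz) matches every peak but predicts four harmonics that are absent, while the octave candidate (440 Hz) misses the strongest peak, so 220 Hz scores highest.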
For the implementation of FM synthesis, a procedure for matching spectra using a
genetic algorithm (GA) is described, including the definition of the task for the GA
and the search for optimal FM parameters. For the optimization
of FM synthesis, the requirements on the carrier and the modulator were analysed
and the parameter space was examined; on this basis, a method for predetermining
the parameter space was designed to achieve accurate synthesis results. For
data reduction in FM synthesis, a piecewise linear approximation of the carrier
amplitude envelope was developed.
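A minimal sketch of single-modulator FM synthesis with a piecewise linear carrier envelope may help here. It uses the textbook form y(t) = A(t) sin(2π f_c t + I sin(2π f_m t)); the breakpoints, frequencies, modulation index and sample rate are assumed values, and no GA-based parameter search is included.

```python
import math

def envelope_at(t, breakpoints):
    """Linearly interpolate an amplitude envelope from (time, amplitude) breakpoints."""
    if t <= breakpoints[0][0]:
        return breakpoints[0][1]
    for (t0, a0), (t1, a1) in zip(breakpoints, breakpoints[1:]):
        if t <= t1:
            return a0 + (a1 - a0) * (t - t0) / (t1 - t0)
    return breakpoints[-1][1]

def fm_tone(fc, fm, index, breakpoints, duration, sample_rate=8000):
    """Samples of A(t) * sin(2*pi*fc*t + index * sin(2*pi*fm*t))."""
    samples = []
    for n in range(int(duration * sample_rate)):
        t = n / sample_rate
        phase = 2 * math.pi * fc * t + index * math.sin(2 * math.pi * fm * t)
        samples.append(envelope_at(t, breakpoints) * math.sin(phase))
    return samples

# Four breakpoints stand in for a sampled envelope: attack, decay, release.
ENVELOPE = [(0.0, 0.0), (0.01, 1.0), (0.1, 0.6), (0.5, 0.0)]
tone = fm_tone(fc=440.0, fm=440.0, index=2.0, breakpoints=ENVELOPE, duration=0.5)
```

Storing a handful of breakpoints instead of one amplitude value per sample is the data reduction the abstract refers to; how many breakpoints are needed for a given instrument is not addressed by this sketch.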
A further optimization of FM synthesis was achieved by including
formants in the spectrum matching procedure: the formant harmonics
are emphasized by weighting coefficients, which gives a considerably more accurate
approximation of the timbre of the synthesized sound. To this end, spectral envelope
estimation and formant extraction were analysed and implemented. The testing
environment developed for this thesis supports parameter estimation and the
analysis and evaluation of the resulting FM synthesis output.