Score extraction using MPEG-4 T/F partial encoding
This paper describes preliminary work on the development of an MPEG-4 audio transcoder between the time/frequency (T/F) and structured audio (SA) formats. Rather than decoding from the T/F format to waveform data and then encoding back to SA, our approach extracts the score information from an intermediate stage. For this intermediate form we have chosen the input of the filterbank and block switching tool, which consists of frequency data. This data is the result of windowing the signal and applying the modified discrete cosine transform (MDCT). The window size is determined on a frame-by-frame basis by a psychoacoustic analysis of the data. In this paper we show that this approach is feasible by developing a system that extracts the score information from the output of the filterbank and block switching tool in an MPEG-4 T/F encoder, adapting and fine-tuning some existing processing techniques.
Peer Reviewed. Postprint (published version)
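To make the intermediate representation concrete, here is a minimal sketch of the kind of frequency data involved: a long-block MDCT with a sine window and 50% overlap, followed by a naive dominant-frequency readout. It assumes a fixed 2048-sample window (the AAC long-block size); block switching, the psychoacoustic window decision, and the paper's actual score-extraction techniques are not modelled, and the function names are hypothetical.

```python
import numpy as np

def sine_window(two_n: int) -> np.ndarray:
    """Sine window of length 2N (satisfies the Princen-Bradley condition)."""
    n = np.arange(two_n)
    return np.sin(np.pi * (n + 0.5) / two_n)

def mdct_frames(x: np.ndarray, two_n: int = 2048) -> np.ndarray:
    """Long-block MDCT: 50%-overlapping frames of 2N samples -> N coefficients each."""
    n_half = two_n // 2
    n = np.arange(two_n)
    k = np.arange(n_half)
    # X[k] = sum_n w[n] x[n] cos(pi/N * (n + 1/2 + N/2) * (k + 1/2))
    basis = np.cos(np.pi / n_half * np.outer(k + 0.5, n + 0.5 + n_half / 2))
    window = sine_window(two_n)
    starts = range(0, len(x) - two_n + 1, n_half)
    return np.array([basis @ (window * x[s:s + two_n]) for s in starts])

def dominant_frequency(coeffs: np.ndarray, fs: float) -> float:
    """Centre frequency of the strongest MDCT bin after averaging |X| over frames."""
    n_half = coeffs.shape[1]
    k = int(np.argmax(np.abs(coeffs).mean(axis=0)))
    return (k + 0.5) * fs / (2 * n_half)
```

Bin k of an N-coefficient MDCT is centred at (k + 0.5)·fs/(2N), so with N = 1024 at 44.1 kHz the resolution is about 21.5 Hz. Because MDCT coefficients fluctuate with the phase of the signal, averaging magnitudes over frames gives a steadier peak than any single frame.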
Tracking the Frequency Components of Musical Tones Based on Global Waveform Fitting
Abstract: A novel approach to estimating the sinusoidal parameters of musical tones is presented. The proposed algorithm uses a quadratic polynomial phase sinusoidal model and a global waveform fitting criterion to obtain optimal model parameter estimates.
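The quadratic polynomial phase model can be sketched as a nonlinear least-squares fit of amplitude, initial phase, frequency, and chirp rate over a whole frame, which is one reading of a "global waveform fitting criterion". This is an illustrative sketch, not the authors' algorithm: the FFT-peak initialisation, the multi-start over initial phase, and the use of SciPy's `least_squares` are assumptions, and the function names are hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares

def qpp_model(params, t):
    """Quadratic-polynomial-phase sinusoid: a*cos(phi0 + 2*pi*(f*t + 0.5*c*t**2))."""
    a, phi0, f, c = params
    return a * np.cos(phi0 + 2 * np.pi * (f * t + 0.5 * c * t**2))

def fit_qpp(x, fs, n_phase_starts=8):
    """Least-squares fit of [a, phi0, f, c] to the whole frame x."""
    # Time axis centred on the frame, so f is the frequency at the frame centre
    t = (np.arange(len(x)) - (len(x) - 1) / 2) / fs
    # Initial frequency from a zero-padded FFT peak, amplitude from the RMS
    n_fft = 8 * len(x)
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x)), n_fft))
    f_init = np.argmax(spec) * fs / n_fft
    a_init = np.sqrt(2 * np.mean(x**2))
    best = None
    for phi in np.linspace(-np.pi, np.pi, n_phase_starts, endpoint=False):
        res = least_squares(lambda p: qpp_model(p, t) - x,
                            [a_init, phi, f_init, 0.0], x_scale="jac")
        if best is None or res.cost < best.cost:  # keep the lowest squared error
            best = res
    return best.x  # [a, phi0, f, c]
```

Referencing time to the frame centre means the FFT peak (roughly the mean frequency over the frame) is already a good starting value for f, which keeps the optimiser inside the main lobe of the cost function.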
Separation of musical sources and structure from single-channel polyphonic recordings
EThOS - Electronic Theses Online Service (United Kingdom)
Signal separation of musical instruments: simulation-based methods for musical signal decomposition and transcription
This thesis presents techniques for the modelling of musical signals, with particular regard to monophonic and polyphonic pitch estimation. Musical signals are modelled as a set of notes, each comprising a set of harmonically related sinusoids. A hierarchical model is presented that is very general and applicable to any signal that can be decomposed as the sum of basis functions. Parameter estimation is posed within a Bayesian framework, allowing the incorporation of prior information about model parameters. The resulting posterior distribution is of variable dimension, so reversible jump MCMC simulation techniques are employed for the parameter estimation task. The extension of the model to time-varying signals with high posterior correlations between model parameters is described. The parameters and hyperparameters of several frames of data are estimated jointly to achieve more robust detection. A general model for the description of time-varying homogeneous and heterogeneous multiple-component signals is developed and then applied to the analysis of musical signals. The importance of high-level musical and perceptual psychological knowledge in the formulation of the model is highlighted, and attention is drawn to the limitations of pure signal processing techniques for dealing with musical signals. Gestalt psychological grouping principles motivate the hierarchical signal model, and component identifiability is considered in terms of perceptual streaming, where each component establishes its own context. A major emphasis of this thesis is the practical application of MCMC techniques, which are generally deemed too slow for many applications. Through the design of efficient transition kernels highly optimised for harmonic models, and by careful choice of assumptions and approximations, implementations approaching real-time performance are viable.
Engineering and Physical Sciences Research Council
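A full reversible jump sampler, which also proposes adding and removing notes and harmonics, does not fit in a short example. The sketch below shows only a fixed-dimension core under stated simplifications: one note with a known number of harmonics, a flat prior on f0, amplitudes replaced by their least-squares values (a profile likelihood, not the thesis's hierarchical model), and a random-walk Metropolis step on f0. All names and parameter values are hypothetical.

```python
import numpy as np

def harmonic_basis(f0, n_harm, t):
    """Columns cos(2*pi*h*f0*t) and sin(2*pi*h*f0*t) for h = 1..n_harm."""
    h = np.arange(1, n_harm + 1)
    arg = 2 * np.pi * f0 * np.outer(t, h)
    return np.hstack([np.cos(arg), np.sin(arg)])

def log_likelihood(f0, x, t, n_harm, sigma):
    """Gaussian log-likelihood with amplitudes set to their least-squares values."""
    basis = harmonic_basis(f0, n_harm, t)
    amps, *_ = np.linalg.lstsq(basis, x, rcond=None)
    resid = x - basis @ amps
    return -0.5 * np.dot(resid, resid) / sigma**2

def metropolis_f0(x, fs, f_init, n_harm=4, sigma=0.1, step=0.02,
                  n_iter=3000, seed=0):
    """Random-walk Metropolis over f0 (flat prior, fixed number of harmonics)."""
    rng = np.random.default_rng(seed)
    t = np.arange(len(x)) / fs
    f, ll = f_init, log_likelihood(f_init, x, t, n_harm, sigma)
    samples = np.empty(n_iter)
    for i in range(n_iter):
        f_prop = f + step * rng.standard_normal()
        ll_prop = log_likelihood(f_prop, x, t, n_harm, sigma)
        if np.log(rng.uniform()) < ll_prop - ll:  # Metropolis acceptance test
            f, ll = f_prop, ll_prop
        samples[i] = f
    return samples
```

The step size has to be comparable to the posterior width of f0, which shrinks quickly as the frame gets longer; starting near the true F0 (for example from a spectral peak picker) keeps the chain away from the secondary modes that the higher harmonics create.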
Effect of the glottal source and the vocal tract on the partials amplitude of vibrato in male voices
In this paper, the production of vocal vibrato is investigated. The most relevant features of the acoustic vibrato signal, namely the frequency and amplitude variations of the partials, are related to the voice production features: the glottal source (GS) and the vocal tract response (VTR). Unlike previous related work, this approach identifies, in recordings of natural singing voice, the effect of each of these voice production features on the amplitude variations of the partials. Moreover, special care is taken over the reliability of the measurements; to this end, a noninteractive vibrato production model is also proposed to describe the vibrato production process and, more importantly, to validate the measurements carried out on natural vibrato. Based on this study, it is shown that during a few vibrato cycles the glottal pulse characteristics, as well as the VTR, do not change significantly, and only the fundamental frequency of the GS varies. As a result, the pitch variations can be attributed to the GS, and these variations, combined with the vocal tract filtering effect, result in frequency and amplitude variations of the acoustic signal partials.
This work was supported in part by the Ministerio de Educación y Ciencia under Grant FPU, AP2000-4674. The Gobierno de Navarra and the Universidad Pública de Navarra are gratefully acknowledged for financial support.
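The source-filter argument in this abstract can be illustrated numerically: if the glottal source keeps a fixed harmonic spectrum and only f0 varies, each partial's amplitude follows the vocal tract gain at h·f0(t). The sketch below assumes a single analog resonance at 500 Hz as a stand-in for the VTR and a fixed 1/h² source level per harmonic; these values and the function names are hypothetical, not taken from the paper.

```python
import numpy as np

def resonance_gain(f, fc=500.0, q=5.0):
    """|H(j*2*pi*f)| of an analog second-order resonance (hypothetical one-formant VTR)."""
    w, wc = 2 * np.pi * f, 2 * np.pi * fc
    return wc**2 / np.sqrt((wc**2 - w**2) ** 2 + (wc * w / q) ** 2)

def partial_amplitudes(f0_track, n_partials=6, tilt=2.0, use_vtr=True):
    """Amplitude trajectory of each partial: fixed GS level times VTR gain at h*f0(t)."""
    h = np.arange(1, n_partials + 1)[:, None]
    gs_level = 1.0 / h**tilt              # GS harmonic levels, assumed constant over time
    freqs = h * f0_track[None, :]         # partial frequencies follow f0(t)
    vtr = resonance_gain(freqs) if use_vtr else np.ones_like(freqs)
    return gs_level * vtr                 # shape (n_partials, n_times)

# 1 s of vibrato: 120 Hz mean f0, +/-3 % depth, 5.5 Hz rate (illustrative values)
t = np.arange(0.0, 1.0, 1 / 200.0)
f0 = 120.0 * (1 + 0.03 * np.sin(2 * np.pi * 5.5 * t))
amps = partial_amplitudes(f0)
```

With these settings the third partial (around 360 Hz) lies below the resonance and gets louder as f0 rises, while the fifth (around 600 Hz) lies above it and gets quieter, so partials can be amplitude-modulated in or out of phase with the pitch; with a flat VTR the amplitudes do not vary at all.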
A Parametric Sound Object Model for Sound Texture Synthesis
This thesis deals with the analysis and synthesis of sound textures based on parametric sound objects. An overview is provided of the acoustic and perceptual principles of textural acoustic scenes, and technical challenges for analysis and synthesis are considered. Four essential processing steps for sound texture analysis are identified, and existing sound texture systems are reviewed, using the four-step model as a guideline. A theoretical framework for analysis and synthesis is proposed. A parametric sound object synthesis (PSOS) model is introduced, which is able to describe individual recorded sounds through a fixed set of parameters. The model, which applies to harmonic and noisy sounds, is an extension of spectral modeling and uses spline curves to approximate spectral envelopes, as well as the evolution of parameters over time. In contrast to standard spectral modeling techniques, this representation uses the concept of objects instead of concatenated frames, and it provides a direct mapping between sounds of different length. Methods for automatic and manual conversion are shown. An evaluation is presented in which the ability of the model to encode a wide range of different sounds has been examined. Although there are aspects of sounds that the model cannot accurately capture, such as polyphony and certain types of fast modulation, the results indicate that high-quality synthesis can be achieved for many different acoustic phenomena, including instruments and animal vocalizations. In contrast to many other forms of sound encoding, the parametric model facilitates various techniques of machine learning and intelligent processing, including sound clustering and principal component analysis. Strengths and weaknesses of the proposed method are reviewed, and possibilities for future development are discussed.
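The idea of describing a sound through a fixed set of parameters, with a direct mapping between sounds of different length, can be sketched by fitting a per-frame parameter trajectory with a cubic B-spline on normalised time [0, 1]. This is a minimal illustration assuming a clamped knot vector with four interior knots and SciPy's `make_lsq_spline`; it is not the PSOS model itself, and the helper names are hypothetical.

```python
import numpy as np
from scipy.interpolate import BSpline, make_lsq_spline

K = 3  # cubic B-splines

def knot_vector(n_interior=4, k=K):
    """Clamped knot vector on normalised time [0, 1]."""
    interior = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    return np.concatenate([np.zeros(k + 1), interior, np.ones(k + 1)])

def encode(trajectory, knots, k=K):
    """Least-squares spline fit of a per-frame parameter; returns fixed-size coefficients."""
    u = np.linspace(0.0, 1.0, len(trajectory))  # map any duration onto [0, 1]
    return make_lsq_spline(u, trajectory, knots, k=k).c

def decode(coeffs, knots, n_frames, k=K):
    """Render the coefficient vector at an arbitrary number of frames."""
    return BSpline(knots, coeffs, k)(np.linspace(0.0, 1.0, n_frames))
```

Because every trajectory is mapped onto the same knot vector, a 50-frame and a 200-frame version of the same contour yield the same eight coefficients, and decoding at a different frame count stretches or compresses the parameter evolution in time.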
Towards the automated analysis of simple polyphonic music : a knowledge-based approach
PhD
Music understanding is a process closely related to the knowledge and experience
of the listener. The amount of knowledge required depends on the
complexity of the task at hand.
This dissertation is concerned with the problem of automatically decomposing
musical signals into a score-like representation. It proposes that, as
with humans, an automatic system requires knowledge about the signal and
its expected behaviour to correctly analyse music.
The proposed system uses the blackboard architecture to combine the
use of knowledge with data provided by the bottom-up processing of the
signal's information. Methods are proposed for the estimation of pitches,
onset times and durations of notes in simple polyphonic music.
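The blackboard architecture can be sketched as a shared store of hypotheses at several levels, a set of knowledge sources that each declare when they can contribute, and a control loop that schedules them. The single knowledge source below, which explains spectral peaks as harmonics of the lowest remaining peak, and the first-come scheduler are hypothetical simplifications chosen for brevity, not the system described in the thesis.

```python
from dataclasses import dataclass, field

@dataclass
class Blackboard:
    """Shared hypothesis store; the two levels here are illustrative."""
    peaks: list = field(default_factory=list)   # low level: spectral peak frequencies (Hz)
    notes: list = field(default_factory=list)   # high level: note F0 hypotheses (Hz)

class HarmonicGrouper:
    """Hypothetical knowledge source: explains peaks as harmonics of the lowest one."""
    def __init__(self, tol=0.03):
        self.tol = tol  # allowed deviation from an integer frequency ratio

    def can_contribute(self, bb):
        return bool(bb.peaks)

    def contribute(self, bb):
        f0 = min(bb.peaks)
        explained = [p for p in bb.peaks if abs(p / f0 - round(p / f0)) < self.tol]
        bb.notes.append(f0)
        bb.peaks = [p for p in bb.peaks if p not in explained]

def run(bb, sources, max_cycles=100):
    """Control loop: let the first applicable knowledge source act until none can."""
    for _ in range(max_cycles):
        ready = [ks for ks in sources if ks.can_contribute(bb)]
        if not ready:
            break
        ready[0].contribute(bb)
    return bb
```

A real system would register several knowledge sources (onset, pitch, instrument models) and a scheduler that ranks them by how well their preconditions are met.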
A method for onset detection is presented. It provides an alternative to
conventional energy-based algorithms by using phase information. Statistical
analysis is used to create a detection function that evaluates the expected
behaviour of the signal regarding onsets.
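A phase-based detection function can be sketched as follows: for a steady partial the phase in each STFT bin advances linearly from frame to frame, so the wrapped second difference of the phase stays near zero and departs from it at onsets. This sketch uses a magnitude-weighted phase deviation, one common variant, rather than the statistical detection function developed in the thesis; the frame size, hop, and function names are assumptions.

```python
import numpy as np

def princarg(phase):
    """Map phase values to [-pi, pi)."""
    return np.mod(phase + np.pi, 2 * np.pi) - np.pi

def weighted_phase_deviation(x, n_fft=512, hop=128):
    """Onset detection function: magnitude-weighted mean |second phase difference|."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([window * x[i * hop:i * hop + n_fft] for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)
    # For a steady partial the phase advances linearly, so this second difference is ~0
    d2 = princarg(phase[2:] - 2 * phase[1:-1] + phase[:-2])
    odf = np.zeros(n_frames)
    odf[2:] = np.mean(mag[2:] * np.abs(d2), axis=1)
    return odf
```

Weighting by magnitude suppresses bins that contain little energy, whose phases are essentially random and would otherwise dominate the average.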
Two methods for multi-pitch estimation are introduced. The first concentrates
on the grouping of harmonic information in the frequency-domain.
Its performance and limitations emphasise the case for the use of high-level
knowledge.
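Frequency-domain harmonic grouping can be sketched as an iterative estimate-and-cancel loop: score each candidate F0 by summing the spectral peaks found near its harmonics, keep the best, remove those harmonics from the spectrum, and repeat. The 1/h weighting, the 1 Hz candidate grid, the ±3 Hz tolerance, and the function names are assumptions for illustration, not the method proposed in the thesis.

```python
import numpy as np

def harmonic_salience(mag, freqs, f0, n_harm, tol_hz):
    """Sum of 1/h-weighted peak magnitudes found within tol_hz of each harmonic h*f0."""
    score = 0.0
    for h in range(1, n_harm + 1):
        band = np.abs(freqs - h * f0) <= tol_hz
        if band.any():
            score += mag[band].max() / h
    return score

def iterative_multipitch(x, fs, n_notes, f_min=100.0, f_max=600.0,
                         n_harm=5, tol_hz=3.0, n_fft=16384):
    """Pick the most salient F0, cancel its harmonics, and repeat n_notes times."""
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x)), n_fft))
    freqs = np.fft.rfftfreq(n_fft, 1.0 / fs)
    candidates = np.arange(f_min, f_max, 1.0)  # 1 Hz candidate grid
    found = []
    for _ in range(n_notes):
        scores = [harmonic_salience(mag, freqs, f0, n_harm, tol_hz) for f0 in candidates]
        best = float(candidates[int(np.argmax(scores))])
        found.append(best)
        for h in range(1, n_harm + 1):  # remove the explained harmonics from the spectrum
            mag[np.abs(freqs - h * best) <= tol_hz] = 0.0
    return found
```

The 1/h weighting penalises subharmonic candidates (an F0 at half the true value collects only the even harmonics, at weights 1/2, 1/4, ...), but harmonics shared between notes remain a weak point of purely bottom-up grouping.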
This knowledge, in the form of the individual waveforms of a single
instrument, is used in the second proposed approach. The method is based
on a time-domain linear additive model and it presents an alternative to
common frequency-domain approaches.
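The time-domain linear additive model can be sketched as fitting the signal with a nonnegative combination of note waveforms. In this sketch the individual waveforms of a single instrument are replaced by synthetic decaying harmonic tones, the notes are assumed to start at the beginning of the analysis window, and the gains are found with SciPy's non-negative least squares (`nnls`); these choices and the function names are illustrative and not necessarily the estimator used in the thesis.

```python
import numpy as np
from scipy.optimize import nnls

def note_template(f0, fs, length, n_harm=6, decay=3.0):
    """Hypothetical stand-in for a recorded note: a decaying harmonic tone."""
    t = np.arange(length) / fs
    tone = sum(np.sin(2 * np.pi * h * f0 * t) / h for h in range(1, n_harm + 1))
    return np.exp(-decay * t) * tone

def estimate_gains(x, templates):
    """Fit x ~ sum_i g_i * template_i with g_i >= 0 (time-domain additive model)."""
    dictionary = np.column_stack(templates)
    gains, residual_norm = nnls(dictionary, x)
    return gains, residual_norm
```

Nonzero gains indicate which notes are present; in practice the templates also have to be aligned with the detected onsets.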
Results are presented and discussed for all methods, showing that, if the
knowledge is reliably generated, its use can significantly improve the quality
of the analysis.
Joint Information Systems Committee (JISC) in the UK; National Science Foundation (NSF) in the United States; Fundacion Gran Mariscal Ayacucho in Venezuela
Contributions to automatic multiple F0 detection in polyphonic music signals
Multiple fundamental frequency estimation, or multi-pitch estimation (MPE), is a key problem in automatic music transcription (AMT) and many other related audio processing tasks. Applications of AMT are numerous, ranging from musical genre classification to automatic piano tutoring, and they form a significant part of music information retrieval tasks. Current AMT systems still perform considerably below human experts, and there is a consensus that developing an automated system for full transcription of polyphonic music, regardless of its complexity, is still an open problem. The goal of this work is to propose contributions to the automatic detection of multiple fundamental frequencies in polyphonic music signals. A reference MPE method is selected, studied, and implemented, and a modification is proposed to improve its performance. Finally, three refinement strategies are proposed for incorporation into the modified method to increase the quality of the results. Experimental tests show that these refinements improve the overall performance of the system on average, although each one behaves differently depending on the signal characteristics.
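The abstract does not say which metrics the experimental tests use. MPE systems are commonly compared with frame-level precision, recall, and F-measure, counting an estimated pitch as correct when it lies within a tolerance of a still-unmatched reference pitch in the same frame. The sketch below assumes a ±50-cent tolerance and a simple greedy matching rather than an optimal one-to-one assignment, and `frame_prf` is a hypothetical name; treat it as an illustration only.

```python
import numpy as np

def frame_prf(ref_frames, est_frames, tol_cents=50.0):
    """Frame-level precision, recall and F-measure for multi-pitch estimates (Hz)."""
    tp = fp = fn = 0
    for ref, est in zip(ref_frames, est_frames):
        unmatched = list(ref)
        for f in est:
            # Greedy: take the first unmatched reference pitch within the tolerance
            match = next((r for r in unmatched
                          if abs(1200 * np.log2(f / r)) <= tol_cents), None)
            if match is None:
                fp += 1
            else:
                tp += 1
                unmatched.remove(match)
        fn += len(unmatched)  # reference pitches left without an estimate
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure
```

For example, references `[[220, 277], [220], []]` against estimates `[[221, 330], [220], [110]]` give two correct pitches, two false alarms, and one miss.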
- …