15 research outputs found

    A quick search method for audio signals based on a piecewise linear representation of feature trajectories

    This paper presents a new method for quick similarity-based search through long unlabeled audio streams to detect and locate audio clips provided by users. The method involves feature-dimension reduction based on a piecewise linear representation of a sequential feature trajectory extracted from a long audio stream. Two techniques enable us to obtain a piecewise linear representation: the dynamic segmentation of feature trajectories and the segment-based Karhunen-Loève (KL) transform. In principle, the proposed search method guarantees the same search results as the search method without the proposed feature-dimension reduction. Experimental results indicate significant improvements in search speed. For example, the proposed method reduced the total search time to approximately 1/12 that of previous methods and detected queries in approximately 0.3 seconds from a 200-hour audio database. Comment: 20 pages, to appear in IEEE Transactions on Audio, Speech and Language Processing.

    Analysis and resynthesis of polyphonic music

    This thesis examines applications of Digital Signal Processing to the analysis, transformation, and resynthesis of musical audio. First I give an overview of the human perception of music. I then examine in detail the requirements for a system that can analyse, transcribe, process, and resynthesise monaural polyphonic music, and describe and compare the possible hardware and software platforms. After this I describe a prototype hybrid system that attempts to carry out these tasks using a method based on additive synthesis. Next I present results from its application to a variety of musical examples, and critically assess its performance and limitations. I then address these issues in the design of a second system based on Gabor wavelets. I conclude by summarising the research and outlining suggestions for future developments.

    Computational Tonality Estimation: Signal Processing and Hidden Markov Models

    This PhD thesis investigates computational musical tonality estimation from an audio signal. We present a hidden Markov model (HMM) in which relationships between chords and keys are expressed as probabilities of emitting observable chords from a hidden key sequence. The model is tested first using symbolic chord annotations as observations, and gives excellent global key recognition rates on a set of Beatles songs. The initial model is extended for audio input by using an existing chord recognition algorithm, which allows it to be tested on a much larger database. We show that a simple model of the upper partials in the signal improves percentage scores. We also present a variant of the HMM which has a continuous observation probability density, but show that the discrete version gives better performance. Then follows a detailed analysis of the effects on key estimation and computation time of changing the low-level signal processing parameters. We find that much of the high-frequency information can be omitted without loss of accuracy, and that significant computational savings can be made by applying a threshold to the transform kernels. Results show that there is no single ideal set of parameters for all music, but that tuning the parameters can make a difference to accuracy. We discuss methods of evaluating more complex tonal changes than a single global key, and compare a metric that measures similarity to a ground truth with metrics that are rooted in music retrieval. We show that the two measures give different results, and so recommend that the choice of evaluation metric be determined by the intended application. Finally we draw together our conclusions and use them to suggest directions for continuing this research in tonality model development, feature extraction, evaluation methodology, and applications of computational tonality estimation. Engineering and Physical Sciences Research Council (EPSRC).
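
The abstract above describes an HMM with a hidden key sequence that emits observable chords. A minimal sketch of how such a model could be decoded follows, using standard Viterbi decoding. The two keys, four chords, and every probability below are invented for illustration and are not taken from the thesis; the function name `viterbi` and its parameters are likewise hypothetical.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for the observations."""
    # Log-probabilities avoid numerical underflow on long sequences.
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev, score = max(
                ((p, best[-1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            scores[s] = score + math.log(emit_p[s][obs])
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy model (assumed values): keys are hidden states, chords are observations.
keys = ["C major", "G major"]
start_p = {"C major": 0.5, "G major": 0.5}
trans_p = {
    "C major": {"C major": 0.9, "G major": 0.1},
    "G major": {"C major": 0.1, "G major": 0.9},
}
emit_p = {
    "C major": {"C": 0.40, "F": 0.30, "G": 0.25, "D": 0.05},
    "G major": {"C": 0.25, "F": 0.05, "G": 0.40, "D": 0.30},
}

chords = ["C", "F", "C", "G", "D", "G"]
path = viterbi(chords, keys, start_p, trans_p, emit_p)
print(path)
# ['C major', 'C major', 'C major', 'G major', 'G major', 'G major']
```

With these toy numbers the decoder places a single key change where the chord D starts to appear, because the high self-transition probability penalises frequent modulation.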

    Towards Real-Time Non-Stationary Sinusoidal Modelling of Kick and Bass Sounds for Audio Analysis and Modification

    Sinusoidal Modelling is a powerful and flexible parametric method for analysing and processing audio signals. These signals have an underlying structure that modern spectral models aim to exploit by separating the signal into sinusoidal, transient, and noise components. Each of these can then be modelled in a manner most appropriate to that component's inherent structure. The accuracy of the estimated parameters is directly related to the quality of the model's representation of the signal, and the assumptions made about its underlying structure. For sinusoidal models, these assumptions generally affect the non-stationary estimates related to amplitude and frequency modulations, and the type of amplitude change curve. This is especially true when using a single analysis frame in a non-overlapping framework, where biased estimates can result in discontinuities at frame boundaries. It is therefore desirable for such a model to distinguish between the shapes of different amplitude changes and adapt its estimation accordingly. Intra-frame amplitude change can be interpreted as a change in the windowing function applied to a stationary sinusoid, which can be estimated from the derivative of the phase with respect to frequency at magnitude peaks in the DFT spectrum. A method for measuring monotonic linear amplitude change from single-frame estimates using the first-order derivative of the phase with respect to frequency (approximated by the first-order difference) is presented, along with a method of distinguishing between linear and exponential amplitude change. An adaptation of the popular matching pursuit algorithm for refining model parameters in a segmented framework has been investigated using a dictionary composed of sinusoids with parameters varying slightly from model estimates, based on Modelled Pursuit (MoP). Modelling of the residual signal using a segmented undecimated Wavelet Transform (segUWT) is presented. A generalisation for both the forward and inverse transforms is presented, covering delay compensation and overlap extension for different wavelet lengths and numbers of decomposition levels in an Overlap Save (OLS) implementation that deals with convolution block-based artefacts. This shift-invariant implementation of the DWT is a popular tool for de-noising and shows promising results for the separation of transients from noise.

    Recent Advances in Signal Processing

    Signal processing is a critical issue in the majority of new technological inventions and challenges, across a variety of applications in both science and engineering. Classical signal processing techniques have largely worked with mathematical models that are linear, local, stationary, and Gaussian. They have always favored closed-form tractability over real-world accuracy. These constraints were imposed by the lack of powerful computing tools. During the last few decades, signal processing theories, developments, and applications have matured rapidly and now include tools from many areas of mathematics, computer science, physics, and engineering. This book is targeted primarily toward students and researchers who want to be exposed to a wide variety of signal processing techniques and algorithms. It includes 27 chapters that can be categorized into five different areas depending on the application at hand. These five categories are ordered to address image processing, speech processing, communication systems, time-series analysis, and educational packages, respectively. The book has the advantage of providing a collection of applications that are completely independent and self-contained; thus, the interested reader can choose any chapter and skip to another without losing continuity.

    Proceedings of the 7th Sound and Music Computing Conference

    Proceedings of the SMC2010 - 7th Sound and Music Computing Conference, July 21st - July 24th 2010

    The perceptual flow of phonetic feature processing
