Computer Models for Musical Instrument Identification
PhD thesis. A particular aspect of the perception of sound is concerned with what is
commonly termed texture or timbre. From a perceptual perspective, timbre is what
allows us to distinguish sounds that have similar pitch and loudness. Indeed, most
people are able to discern a piano tone from a violin tone, or to distinguish different
voices or singers.
This thesis deals with timbre modelling. Specifically, the formant theory of timbre
is the main theme throughout. This theory states that acoustic musical instrument
sounds can be characterised by their formant structures. Following this principle, the
central point of our approach is to propose a computer implementation for building
musical instrument identification and classification systems.
Although the main thrust of this thesis is to propose a coherent and unified
approach to the musical instrument identification problem, it is oriented towards the
development of algorithms that can be used in Music Information Retrieval (MIR)
frameworks. Drawing on research in speech processing, a complete supervised system
taking into account both physical and perceptual aspects of timbre is described.
The approach is composed of three distinct processing layers. Parametric models
that allow us to represent signals through mid-level physical and perceptual representations
are considered. Next, the use of the Line Spectrum Frequencies as spectral
envelope and formant descriptors is emphasised. Finally, the use of generative and
discriminative techniques for building instrument and database models is investigated.
Our system is evaluated under realistic recording conditions using databases of isolated
notes and melodic phrases.
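As a minimal sketch of the Line Spectrum Frequency representation emphasised above, the following derives LSFs from a frame of audio, assuming the textbook autocorrelation/Levinson-Durbin route to the LPC polynomial; it illustrates the representation only, not the thesis's actual implementation:

```python
import numpy as np

def lpc(frame, order):
    """LPC coefficients via the autocorrelation method (Levinson-Durbin)."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]  # lags 0..order
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -np.dot(a[:i], r[i:0:-1]) / err       # reflection coefficient
        a[:i + 1] = a[:i + 1] + k * a[:i + 1][::-1]
        err *= 1.0 - k * k
    return a

def lsf(a):
    """Line Spectrum Frequencies (radians in (0, pi)); assumes even LPC order."""
    a_ext = np.concatenate([a, [0.0]])
    P = a_ext + a_ext[::-1]            # palindromic: known root at z = -1
    Q = a_ext - a_ext[::-1]            # antipalindromic: known root at z = +1
    P = np.polydiv(P, [1.0, 1.0])[0]   # deflate the trivial roots before
    Q = np.polydiv(Q, [1.0, -1.0])[0]  # root finding
    angles = np.concatenate([np.angle(np.roots(P)), np.angle(np.roots(Q))])
    return np.sort(angles[angles > 0])
```

For a minimum-phase LPC polynomial, the roots of P and Q interlace on the unit circle, so the sorted angles give a monotone set of formant-related frequencies.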
Polyphonic music information retrieval based on multi-label cascade classification system
Recognition and separation of sounds played by various instruments is very useful for labeling audio files with semantic information. This is a non-trivial task requiring sound analysis, but the results can aid automatic indexing and browsing of music data when searching for melodies played by user-specified instruments. Melody matching based on pitch-detection technology has drawn much attention, and many MIR systems have been developed to fulfill this task. However, musical instrument recognition remains an unsolved problem in the domain. Numerous approaches to acoustic feature extraction have already been proposed for timbre recognition. Unfortunately, none of these monophonic timbre estimation algorithms can be successfully applied to polyphonic sounds, which are the more common case in real-world music. This has stimulated research on multi-label instrument classification and the development of new features for content-based automatic music information retrieval. The original audio signals are large volumes of unstructured sequential values, which are not suitable for traditional data mining algorithms, while the acoustical features are sometimes insufficient for instrument recognition in polyphonic sounds because they are higher-level representations of the raw signal that lack details of the original information. To capture patterns which evolve on the time scale, new temporal features are introduced to supply more temporal information for timbre recognition. We introduce a multi-label classification system to estimate multiple timbres from polyphonic sound by classification based on acoustic features and short-term power-spectrum matching. To achieve a higher estimation rate, we introduce a hierarchically structured cascade classification system inspired by the human perceptual process.
This cascade classification system makes a first estimate of the higher-level decision attribute, which stands for the musical instrument family. Further estimation is then performed within that specific family. Experiments showed better performance for the hierarchical system than for the traditional flat classification method, which estimates the instrument directly without the higher-level family analysis.
Traditional hierarchical structures were constructed from human semantics, which are meaningful from a human perspective but not appropriate for the cascade system. We introduce a new hierarchical instrument schema derived from clustering of the acoustic features. This new schema better describes the similarity among different instruments and among different playing techniques of the same instrument. The classification results show higher accuracy for the cascade system with the new schema than with the traditional schemas. A query-answering system is built on the cascade classifier.
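The two-stage idea (estimate the instrument family first, then the instrument within that family) can be sketched with a toy nearest-centroid cascade; the feature values and instrument names below are invented for illustration, and the abstract's actual system uses richer classifiers:

```python
import numpy as np

class NearestCentroidCascade:
    """Toy two-stage cascade: classify the instrument family first,
    then the specific instrument within the predicted family."""

    def fit(self, X, families, instruments):
        X = np.asarray(X, dtype=float)
        fam = np.asarray(families)
        ins = np.asarray(instruments)
        # Stage 1: one centroid per family.
        self.family_centroids = {f: X[fam == f].mean(axis=0)
                                 for f in np.unique(fam)}
        # Stage 2: per-family centroids for each instrument.
        self.instrument_centroids = {
            f: {i: X[(fam == f) & (ins == i)].mean(axis=0)
                for i in np.unique(ins[fam == f])}
            for f in np.unique(fam)}
        return self

    def predict(self, x):
        x = np.asarray(x, dtype=float)
        fam = min(self.family_centroids,
                  key=lambda f: np.linalg.norm(x - self.family_centroids[f]))
        ins = min(self.instrument_centroids[fam],
                  key=lambda i: np.linalg.norm(x - self.instrument_centroids[fam][i]))
        return fam, ins
```

Restricting the second stage to the predicted family is what lets a cascade outperform a flat classifier when families are well separated in feature space.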
Analysis and resynthesis of polyphonic music
This thesis examines applications of Digital Signal Processing to the analysis, transformation, and resynthesis of musical audio. First, I give an overview of the human perception of music. I then examine in detail the requirements for a system that can analyse, transcribe, process, and resynthesise monaural polyphonic music, and describe and compare the possible hardware and software platforms. After this, I describe a prototype hybrid system that attempts to carry out these tasks using a method based on additive synthesis. Next, I present results from its application to a variety of musical examples, and critically assess its performance and limitations. I then address these issues in the design of a second system based on Gabor wavelets. I conclude by summarising the research and outlining suggestions for future developments.
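As an illustration of the additive-synthesis principle the prototype is based on, resynthesis from a list of analysed partials can be sketched as follows (the parameter names are invented; a real system would also track time-varying amplitudes and frequencies per partial):

```python
import numpy as np

def additive_resynthesis(partials, sr=44100, duration=0.5):
    """Resynthesise a tone as a sum of sinusoidal partials.

    partials: list of (frequency_hz, amplitude) pairs, e.g. the
    output of a prior sinusoidal analysis stage.
    """
    t = np.arange(int(sr * duration)) / sr
    out = np.zeros_like(t)
    for freq, amp in partials:
        out += amp * np.sin(2 * np.pi * freq * t)
    return out
```

Transformations (pitch shifting, time stretching, timbre morphing) then reduce to editing the partial list before resynthesis.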
Feature Extraction of Musical Instrument Tones using FFT and Segment Averaging
A feature extraction method for musical instrument tones based on a transform-domain approach was proposed in this paper. The aim of the proposed feature extraction was to obtain a small number of feature coefficients. In general, the proposed feature extraction was carried out as follows. First, the input signal was transformed using the FFT (Fast Fourier Transform). Second, the left half of the transformed signal was divided into a number of segments. Finally, the averages of those segments formed the feature vector of the input signal. Based on the test results, the proposed feature extraction was highly efficient for tones that have many significant local peaks in the Fourier-transform domain, because it required as few as four feature coefficients to represent each tone.
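The three steps described above can be sketched directly; the segment count is a free parameter, and the abstract reports that as few as four coefficients sufficed:

```python
import numpy as np

def fft_segment_features(signal, n_segments=4):
    """Feature extraction by FFT and segment averaging:
    1. transform the signal with the FFT,
    2. keep the left half of the magnitude spectrum,
    3. split it into n_segments parts and average each."""
    spectrum = np.abs(np.fft.fft(signal))
    left_half = spectrum[:len(spectrum) // 2]
    segments = np.array_split(left_half, n_segments)
    return np.array([seg.mean() for seg in segments])
```

Each coefficient summarises the energy in one broad frequency band, which is why tones with strong, well-localised spectral peaks compress so well.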
A new user interface for musical timbre design
This thesis characterises and addresses problems and issues associated with the design of intuitive user interfaces for timbral control. The usability of a range of synthesis methods and representative implementations of these methods is assessed, and three interface architectures - fixed architecture, architecture specification and direct specification - are identified. The characteristics of each of these architectures, as well as problems of usability inherent to each of them are discussed; it is argued that none of them provide intuitive tools for the manipulation and control of timbre.
The study examines the nature of timbre and the notion of timbre space; different kinds of timbre space are considered and criteria are proposed for the selection of suitable timbre spaces as vehicles for synthesis.
A number of listening tests, designed to demonstrate the feasibility of subsequent work, were devised and carried out; the results of these tests provide evidence that, where Euclidean distances between sounds located in a given timbre space are reflected in perceptual distances, the ability of subjects to detect relative distances in different parts of the space varies with the perceptual granularity of the space.
Three contrasting timbre spaces conforming to the proposed criteria for use in synthesis are constructed; the purpose of these spaces is to provide an environment for a novel user-interaction approach to timbral design which incorporates a search strategy based on weighted centroid localization. Two prototypes which exemplify the proposed approach in alternative ways are designed, implemented and tested with potential users in order to validate the approach; a third prototype, representing a simple contrasting alternative, is tested for purposes of comparison. The results of these tests are evaluated and discussed, and areas of further work are identified.
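The weighted-centroid localization at the core of the search strategy can be sketched in a few lines; in this illustration the weights would come from a user's similarity judgements against known anchor sounds, and all values are invented:

```python
import numpy as np

def weighted_centroid(anchor_points, weights):
    """Locate a target timbre as the weighted mean of anchor
    positions in timbre space; anchors judged more similar to
    the target receive larger weights."""
    anchors = np.asarray(anchor_points, dtype=float)
    w = np.asarray(weights, dtype=float)
    return (w[:, None] * anchors).sum(axis=0) / w.sum()
```

Raising the weight of one anchor pulls the estimate toward it, so iterating judge-and-relocate steps walks the user through the space toward the desired timbre.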
Deriving musical control features from a real-time timbre analysis of the clarinet
Thesis (M.Eng.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1995. Includes bibliographical references (leaves 65-67). By Eran Baruch Egozy.
Discriminating music performers by timbre: On the relation between instrumental gesture, tone quality and perception in classical cello performance
Classical music performers use instruments to transform the symbolic notation of the score into sound which is ultimately perceived by a listener. For acoustic instruments, the timbre of the resulting sound is assumed to be strongly linked to the physical and acoustical properties of the instrument itself. However, rather little is known about how much influence the player has over the timbre of the sound: is it possible to discriminate music performers by timbre? This thesis explores player-dependent aspects of timbre, serving as an individual means of musical expression. With the research scope narrowed to analysis of solo cello recordings, the differences in tone quality of six performers who played the same musical excerpts on the same cello are investigated from three perspectives: perceptual, acoustical and gestural. In order to understand how the physical actions that a performer exerts on an instrument affect the spectro-temporal features of the sound produced, which can then be perceived as the player's unique tone quality, a series of experiments is conducted, starting with the creation of dedicated multi-modal cello recordings extended by performance gesture information (bowing control parameters). In the first study, selected tone samples of the six cellists are perceptually evaluated across various musical contexts via timbre dissimilarity and verbal attribute ratings. Spectro-temporal analysis follows in the second experiment, with the aim of identifying the acoustic features which best describe the varying timbral characteristics of the players. Finally, in the third study, individual combinations of bowing controls are examined in search of bowing patterns which might characterise each cellist regardless of the music being performed. The results show that the different players can be discriminated perceptually, by timbre, and that this perceptual discrimination can be projected back through the acoustical and gestural domains.
By extending current understanding of human-instrument dependencies for qualitative tone production, this research may have further applications in computer-aided musical training and performer-informed instrumental sound synthesis. This work was supported by a UK EPSRC DTA studentship (EP/P505054/1) and the EPSRC-funded OMRAS2 project (EP/E017614/1).
A flexible bio-inspired hierarchical model for analyzing musical timbre
A flexible and multipurpose bio-inspired hierarchical model for analyzing musical timbre is presented in this paper. Inspired by findings in the fields of neuroscience, computational neuroscience, and psychoacoustics, not only does the model extract spectral and temporal characteristics of a signal, but it also analyzes amplitude modulations on different timescales. It uses a cochlear filter bank to resolve the spectral components of a sound, lateral inhibition to enhance spectral resolution, and a modulation filter bank to extract the global temporal envelope and roughness of the sound from amplitude modulations. The model was evaluated in three applications. First, it was used to simulate subjective data from two roughness experiments. Second, it was used for musical instrument classification using the k-NN algorithm and a Bayesian network. Third, it was applied to find the features that characterize sounds whose timbres were labeled in an audiovisual experiment. The successful application of the proposed model in these diverse tasks revealed its potential in capturing timbral information.
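The k-NN stage of the classification experiment can be sketched in a few lines; the feature vectors and labels below are invented for illustration, whereas the paper's actual features come from the cochlear and modulation filter-bank front end:

```python
import numpy as np
from collections import Counter

def knn_classify(train_features, train_labels, query, k=3):
    """Classify a query feature vector by majority vote among
    its k nearest training examples (Euclidean distance)."""
    X = np.asarray(train_features, dtype=float)
    dists = np.linalg.norm(X - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

Because k-NN makes no distributional assumptions, it is a common baseline for judging whether a timbre feature set separates instrument classes at all.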