40 research outputs found

    Computer Models for Musical Instrument Identification

    PhD. A particular aspect of the perception of sound concerns what is commonly termed texture or timbre. From a perceptual perspective, timbre is what allows us to distinguish sounds that have similar pitch and loudness. Indeed, most people are able to discern a piano tone from a violin tone, or to distinguish different voices or singers. This thesis deals with timbre modelling. Specifically, the formant theory of timbre is the main theme throughout. This theory states that acoustic musical instrument sounds can be characterised by their formant structures. Following this principle, the central point of our approach is to propose a computer implementation for building musical instrument identification and classification systems. Although the main thrust of this thesis is to propose a coherent and unified approach to the musical instrument identification problem, it is oriented towards the development of algorithms that can be used in Music Information Retrieval (MIR) frameworks. Drawing on research in speech processing, a complete supervised system taking into account both physical and perceptual aspects of timbre is described. The approach is composed of three distinct processing layers. First, parametric models that allow us to represent signals through mid-level physical and perceptual representations are considered. Next, the use of Line Spectrum Frequencies as spectral envelope and formant descriptors is emphasised. Finally, the use of generative and discriminative techniques for building instrument and database models is investigated. Our system is evaluated under realistic recording conditions using databases of isolated notes and melodic phrases.
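
    The abstract highlights Line Spectrum Frequencies (LSFs) as spectral-envelope and formant descriptors. The following is a minimal sketch, not the thesis code, of deriving LSFs from an LPC fit of a single audio frame; the file name, window length and model order are illustrative assumptions.

```python
import numpy as np
import librosa

def lpc_to_lsf(a):
    """Convert LPC coefficients a = [1, a1, ..., ap] to LSFs in radians."""
    a_ext = np.concatenate([a, [0.0]])
    p_poly = a_ext + a_ext[::-1]   # symmetric (palindromic) polynomial P(z)
    q_poly = a_ext - a_ext[::-1]   # antisymmetric polynomial Q(z)
    # LSFs are the angles of the unit-circle roots of P(z) and Q(z),
    # excluding the trivial roots at angles 0 and pi.
    angles = np.concatenate([np.angle(np.roots(p_poly)),
                             np.angle(np.roots(q_poly))])
    return np.sort(angles[(angles > 1e-9) & (angles < np.pi - 1e-9)])

# Usage on a hypothetical isolated-note recording "note.wav":
y, sr = librosa.load("note.wav", sr=None)
frame = y[:2048] * np.hanning(2048)          # one analysis window
a = librosa.lpc(frame, order=12)             # all-pole spectral-envelope model
lsf_hz = lpc_to_lsf(a) * sr / (2 * np.pi)    # descriptors in Hz
print(lsf_hz)
```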

    Making music through real-time voice timbre analysis: machine learning and timbral control

    PhD. People can achieve rich musical expression through vocal sound: see, for example, human beatboxing, which achieves a wide timbral variety through a range of extended techniques. Yet the vocal modality is under-exploited as a controller for music systems. If we can analyse a vocal performance suitably in real time, then this information could be used to create voice-based interfaces with the potential for intuitive and fulfilling levels of expressive control. Conversely, many modern techniques for music synthesis do not imply any particular interface. Should a given parameter be controlled via a MIDI keyboard, a slider/fader, or a rotary dial? Automatic vocal analysis could provide a fruitful basis for expressive interfaces to such electronic musical instruments. The principal questions in applying vocal-based control are how to extract musically meaningful information from the voice signal in real time, and how to convert that information suitably into control data. In this thesis we address these questions, with a focus on timbral control, and in particular we develop approaches that can be used with a wide variety of musical instruments by applying machine learning techniques to automatically derive the mappings between expressive audio input and control output. The vocal audio signal is construed to include a broad range of expression, in particular encompassing the extended techniques used in human beatboxing. The central contribution of this work is the application of supervised and unsupervised machine learning techniques to automatically map vocal timbre to synthesiser timbre and controls. Component contributions include a delayed decision-making strategy for low-latency sound classification, a regression-tree method to learn associations between regions of two unlabelled datasets, a fast estimator of multidimensional differential entropy, and a qualitative method for evaluating musical interfaces based on discourse analysis.
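
    As a hedged illustration of the supervised variant of such a mapping (not the thesis's actual pipeline), one can learn a regression tree from per-frame vocal timbre features to synthesiser control values. The feature choice (MFCCs), the file name and the stand-in control targets below are assumptions made for the example.

```python
import numpy as np
import librosa
from sklearn.tree import DecisionTreeRegressor

def timbre_features(y, sr):
    """Per-frame timbre description of a vocal signal (MFCC frames as rows)."""
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T

# Hypothetical paired training data: vocal frames and the synth control
# vectors (e.g. filter cutoff, resonance) chosen for them during training.
y_voice, sr = librosa.load("beatbox_take.wav", sr=None)   # assumed file
X = timbre_features(y_voice, sr)
controls = np.random.rand(len(X), 2)                      # stand-in targets

model = DecisionTreeRegressor(max_depth=8).fit(X, controls)

# At run time, each incoming analysis frame is mapped to control data.
new_frame_features = X[:1]
print(model.predict(new_frame_features))   # -> predicted synth controls
```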

    Perceptual synthesis engine: an audio-driven timbre generator

    Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2001. Includes bibliographical references (leaves 68-75). A real-time synthesis engine which models and predicts the timbre of acoustic instruments based on perceptual features extracted from an audio stream is presented. The thesis describes the modeling sequence, including the analysis of natural sounds, the inference step that finds the mapping between control and output parameters, the timbre prediction step, and the sound synthesis. The system enables applications such as cross-synthesis, pitch shifting or compression of acoustic instruments, and timbre morphing between instrument families. It is fully implemented in the Max/MSP environment. The Perceptual Synthesis Engine was developed for the Hyperviolin as a novel, generic and perceptually meaningful synthesis technique for non-discretely pitched instruments. By Tristan Jehan. S.M.
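
    A rough sketch of the idea, written in Python rather than the thesis's Max/MSP environment: extract perceptual controls (pitch, loudness, brightness) per frame and learn a mapping from them to the instrument's spectral envelope. The model choice, feature set and file name are illustrative assumptions, not the engine's actual design.

```python
import numpy as np
import librosa
from sklearn.neural_network import MLPRegressor

y, sr = librosa.load("violin_phrase.wav", sr=None)        # assumed recording
S = np.abs(librosa.stft(y, n_fft=1024))

f0 = librosa.yin(y, fmin=80, fmax=2000, sr=sr, frame_length=1024)
loudness = librosa.feature.rms(S=S)[0]
brightness = librosa.feature.spectral_centroid(S=S, sr=sr)[0]

n = min(len(f0), S.shape[1])
controls = np.stack([f0[:n], loudness[:n], brightness[:n]], axis=1)
envelope = librosa.amplitude_to_db(S[:, :n]).T            # target timbre frames

# Inference step: fit the control -> spectral-envelope mapping.
model = MLPRegressor(hidden_layer_sizes=(64,), max_iter=500).fit(controls, envelope)

# Prediction step: new control values drive a predicted envelope, which a
# synthesiser could then render (the synthesis stage is not shown here).
predicted_env = model.predict(controls[:1])
```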

    Singing voice analysis/synthesis

    Thesis (Ph. D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2003. Includes bibliographical references (p. 109-115). The singing voice is the oldest and most variable of musical instruments. By combining music, lyrics, and expression, the voice is able to affect us in ways that no other instrument can. As listeners, we are innately drawn to the sound of the human voice, and when present it is almost always the focal point of a musical piece. But the acoustic flexibility of the voice in intimating words, shaping phrases, and conveying emotion also makes it the most difficult instrument to model computationally. Moreover, while all voices are capable of producing the common sounds necessary for language understanding and communication, each voice possesses distinctive features independent of phonemes and words. These unique acoustic qualities are the result of a combination of innate physical factors and expressive characteristics of performance, reflecting an individual's vocal identity. A great deal of prior research has focused on speech recognition and speaker identification, but relatively little work has been performed specifically on singing. There are significant differences between speech and singing in terms of both production and perception. Traditional computational models of speech have focused on the intelligibility of language, often sacrificing sound quality for model simplicity. Such models, however, are detrimental to the goal of singing, which relies on acoustic authenticity for the non-linguistic communication of expression and emotion. These differences between speech and singing dictate that a different and specialized representation is needed to capture the sound quality and musicality most valued in singing. This dissertation proposes an analysis/synthesis framework specifically for the singing voice that models the time-varying physical and expressive characteristics unique to an individual voice. The system operates by jointly estimating source-filter voice model parameters, representing vocal physiology, and modeling the dynamic behavior of these features over time to represent aspects of expression. This framework is demonstrated to be useful for several applications, such as singing voice coding, automatic singer identification, and voice transformation. By Youngmoo Edmund Kim. Ph.D.
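
    A simplified sketch of per-frame source-filter analysis of a sung note, assuming an LPC all-pole filter stands in for the vocal tract and the inverse-filter residual stands in for the glottal source. This illustrates the modelling idea only, not the dissertation's joint estimator; the file name, frame size and model order are assumptions.

```python
import numpy as np
import librosa
from scipy.signal import lfilter

y, sr = librosa.load("sung_vowel.wav", sr=16000)   # assumed recording
frame_len, hop = 512, 256
filters, sources = [], []

for start in range(0, len(y) - frame_len, hop):
    frame = y[start:start + frame_len] * np.hanning(frame_len)
    a = librosa.lpc(frame, order=18)               # vocal-tract (filter) estimate
    residual = lfilter(a, [1.0], frame)            # inverse filtering -> source
    filters.append(a)
    sources.append(residual)

# The trajectory of filter coefficients over frames is what a dynamic
# model of expression would then describe.
filters = np.array(filters)
print(filters.shape)   # (n_frames, order + 1)
```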

    Auditory group theory with applications to statistical basis methods for structured audio

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1998. Includes bibliographical references (p. 161-172). By Michael Anthony Casey. Ph.D.

    Separation of musical sources and structure from single-channel polyphonic recordings

    EThOS - Electronic Theses Online Service, GB, United Kingdom

    Discriminating music performers by timbre: On the relation between instrumental gesture, tone quality and perception in classical cello performance

    Classical music performers use instruments to transform the symbolic notation of the score into sound, which is ultimately perceived by a listener. For acoustic instruments, the timbre of the resulting sound is assumed to be strongly linked to the physical and acoustical properties of the instrument itself. However, rather little is known about how much influence the player has over the timbre of the sound — is it possible to discriminate music performers by timbre? This thesis explores player-dependent aspects of timbre, serving as an individual means of musical expression. With a research scope narrowed to the analysis of solo cello recordings, the differences in tone quality of six performers who played the same musical excerpts on the same cello are investigated from three different perspectives: perceptual, acoustical and gestural. In order to understand how the physical actions that a performer exerts on an instrument affect the spectro-temporal features of the sound produced, which can then be perceived as the player's unique tone quality, a series of experiments is conducted, starting with the creation of dedicated multi-modal cello recordings extended by performance gesture information (bowing control parameters). In the first study, selected tone samples of six cellists are perceptually evaluated across various musical contexts via timbre dissimilarity and verbal attribute ratings. The spectro-temporal analysis follows in the second experiment, with the aim of identifying acoustic features which best describe the varying timbral characteristics of the players. Finally, in the third study, individual combinations of bowing controls are examined in search of bowing patterns which might characterise each cellist regardless of the music being performed. The results show that the different players can be discriminated perceptually, by timbre, and that this perceptual discrimination can be projected back through the acoustical and gestural domains. By extending current understanding of human-instrument dependencies for qualitative tone production, this research may have further applications in computer-aided musical training and performer-informed instrumental sound synthesis. This work was supported by a UK EPSRC DTA studentship EP/P505054/1 and the EPSRC-funded OMRAS2 project EP/E017614/1.
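
    One hedged sketch of the acoustic-analysis step: can a classifier separate players from spectro-temporal features alone? The file names, feature set and classifier below are illustrative assumptions, not the thesis's actual protocol.

```python
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def excerpt_features(path):
    """Summarise one excerpt by mean and std of MFCCs and spectral centroid."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
    feats = np.vstack([mfcc, centroid])
    return np.concatenate([feats.mean(axis=1), feats.std(axis=1)])

# Hypothetical corpus: the same ten excerpts recorded by six cellists.
paths = [f"cellist{p}_excerpt{e}.wav" for p in range(1, 7) for e in range(1, 11)]
X = np.array([excerpt_features(p) for p in paths])
y_labels = np.repeat(np.arange(1, 7), 10)

# Above-chance cross-validated accuracy would indicate the players are
# acoustically discriminable, mirroring the perceptual result.
print(cross_val_score(SVC(kernel="rbf"), X, y_labels, cv=5).mean())
```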

    Ultra-high-speed imaging of bubbles interacting with cells and tissue

    Ultrasound contrast microbubbles are exploited in molecular imaging, where bubbles are directed to target cells and where their high scattering cross-section to ultrasound allows for the detection of pathologies at a molecular level. In therapeutic applications, vibrating bubbles close to cells may alter the permeability of cell membranes, and these systems are therefore highly interesting for drug and gene delivery applications using ultrasound. In a more extreme regime, bubbles are driven through shock waves to sonoporate or kill cells through intense stresses or jets following inertial bubble collapse. Here, we elucidate some of the underlying mechanisms using the 25-Mfps camera Brandaris128, resolving the bubble dynamics and its interactions with cells. We quantify acoustic microstreaming around oscillating bubbles close to rigid walls and evaluate the shear stresses on nonadherent cells. In a study on the fluid-dynamical interaction of cavitation bubbles with adherent cells, we find that the nonspherical collapse of bubbles is responsible for cell detachment. We also visualized the dynamics of vibrating microbubbles in contact with endothelial cells, followed by fluorescent imaging of the transport of propidium iodide, used as a membrane integrity probe, into these cells, showing a direct correlation between cell deformation and cell membrane permeability.

    Pitch perception as probabilistic inference

    Pitch is a fundamental and salient perceptual attribute of many behaviourally important sounds, including animal calls, human speech and music. Human listeners perceive pitch without conscious effort or attention. These and similar observations have prompted a search for mappings from acoustic stimulus to percept that can be easily computed from peripheral neural responses at early stages of the central auditory pathway. This tenet, however, is not supported by physiological evidence: how the percept of pitch is encoded in neural firing patterns across the brain, and where – if at all – such a representation may be localised, remain as yet unsolved questions. Here, instead of seeking an explanation guided by putative mechanisms, we take a more abstract stance in developing a model, asking what computational goal the auditory system is set up to achieve during pitch perception. Many natural pitch-evoking sounds are approximately periodic within short observation time windows. We posit that pitch reflects a near-optimal estimate of the underlying periodicity of sounds from noisy evoked responses in the auditory nerve, exploiting statistical knowledge about the regularities and irregularities occurring during sound generation and transduction. We compute (or approximate) the statistically optimal estimate using a Bayesian probabilistic framework. Model predictions match the pitch reported by human listeners for a wide range of well-documented, pitch-evoking stimuli, both periodic and aperiodic. We then present new psychophysical data on octave biases and pitch-timbre interactions in human perception which further demonstrate the validity of our approach, while posing difficulties for alternative models based on autocorrelation analysis or simple spectral pattern matching. Our model embodies the concept of perception as unconscious inference, originally proposed by von Helmholtz as an interface bridging optics and vision. Our results support the view that even apparently primitive acoustic percepts may derive from subtle statistical inference, suggesting that such inferential processes operate at all levels across our sensory systems.
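
    A toy sketch of the core idea: treat pitch as a MAP estimate of signal periodicity under a probabilistic model. The simple Gaussian-residual likelihood, BIC-style complexity penalty and log-normal pitch prior used here are illustrative assumptions, far cruder than the auditory-nerve model developed in the thesis.

```python
import numpy as np

def pitch_map_estimate(x, sr, f_lo=80.0, f_hi=800.0):
    candidates = np.arange(int(sr / f_hi), int(sr / f_lo))   # periods in samples
    log_post = np.empty(len(candidates))
    for i, T in enumerate(candidates):
        n = (len(x) // T) * T
        folded = x[:n].reshape(-1, T)                 # stack whole periods
        template = folded.mean(axis=0)                # periodic component
        resid_var = np.mean((folded - template) ** 2) + 1e-12
        # Gaussian residual model with a BIC-style penalty for the T
        # template parameters (guards against octave-down overfitting).
        log_lik = -0.5 * n * np.log(resid_var) - 0.5 * T * np.log(n)
        f = sr / T
        log_prior = -0.5 * ((np.log(f) - np.log(200.0)) / 0.8) ** 2
        log_post[i] = log_lik + log_prior
    return sr / candidates[np.argmax(log_post)]       # MAP pitch in Hz

# Usage: a 200 Hz harmonic complex in noise is recovered near 200 Hz.
sr, t = 16000, np.arange(0, 0.1, 1 / 16000)
x = sum(np.sin(2 * np.pi * 200 * k * t) for k in (1, 2, 3)) + 0.3 * np.random.randn(len(t))
print(pitch_map_estimate(x, sr))
```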