
    Timbral Learning for Musical Robots

    The tradition of building musical robots and automata is thousands of years old. Despite this rich history, even today musical robots do not play with as much nuance and subtlety as human musicians. In particular, most instruments allow the player to manipulate timbre while playing; if a violinist is told to sustain an E, they will select which string to play it on, how much bow pressure and velocity to use, whether to use the entire bow or only the portion near the tip or the frog, how close to the bridge or fingerboard to contact the string, whether or not to use a mute, and so forth. Each of these choices affects the resulting timbre, and navigating this timbre space is part of the art of playing the instrument. Nonetheless, this type of timbral nuance has been largely ignored in the design of musical robots. This dissertation therefore introduces a suite of techniques that deal with timbral nuance in musical robots. Chapter 1 provides the motivating ideas and introduces Kiki, a robot designed by the author to explore timbral nuance. Chapter 2 provides a long history of musical robots, establishing the under-researched nature of timbral nuance. Chapter 3 is a comprehensive treatment of dynamic timbre production in percussion robots and, using Kiki as a case study, provides a variety of techniques for designing striking mechanisms that produce a range of timbres similar to those produced by human players. Chapter 4 introduces a machine-learning algorithm for recognizing timbres, so that a robot can transcribe timbres played by a human during live performance. Chapter 5 introduces a technique that allows a robot to learn how to produce isolated instances of particular timbres by listening to a human play examples of those timbres. The sixth and final chapter introduces a method that allows a robot to learn the musical context of different timbres; this is done in realtime during interactive improvisation between a human and the robot, wherein the robot builds a statistical model of which timbres the human plays in which contexts, and uses this to inform its own playing.
    Dissertation/Thesis: Doctoral Dissertation, Media Arts and Sciences, 201
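    The dissertation's algorithms are not specified at this level of detail, so the following is a purely illustrative sketch of the recognition task Chapter 4 addresses: one common approach extracts spectral features from each recorded stroke and classifies them against labelled examples. The file names, class labels, and nearest-centroid scheme are hypothetical, not the author's method.

```python
# Illustrative timbre recognition sketch: mean MFCC features per stroke,
# nearest-centroid classification. Not the dissertation's actual method.
import numpy as np
import librosa

def timbre_features(path):
    """Summarize a recorded stroke as its mean MFCC vector."""
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)

# Hypothetical labelled examples of two percussive timbres.
train = {"center": ["center1.wav", "center2.wav"],
         "rim": ["rim1.wav", "rim2.wav"]}
centroids = {label: np.mean([timbre_features(p) for p in paths], axis=0)
             for label, paths in train.items()}

def classify(path):
    """Assign the timbre class whose centroid is nearest in feature space."""
    f = timbre_features(path)
    return min(centroids, key=lambda lbl: np.linalg.norm(f - centroids[lbl]))

print(classify("unknown_stroke.wav"))
```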

    Generative rhythmic models

    A system for generative rhythmic modeling is presented. The work aims to explore computational models of creativity, realizing them in a system designed for realtime generation of semi-improvisational music. This is envisioned as an attempt to develop musical intelligence in the context of structured improvisation, and thereby to enable and encourage new forms of musical control and performance; the systems described in this work, already capable of realtime creation, have been designed with the explicit intention of embedding them in a variety of performance-based systems. A model of qaida, a solo tabla form, is presented, along with the results of an online survey comparing it to a professional tabla player's recording on dimensions of musicality, creativity, and novelty. The qaida model generates a bank of rhythmic variations by reordering subphrases; selections from this bank are sequenced using a feature-based approach. An experimental extension into modeling layer- and loop-based forms of electronic music is presented, in which the initial modeling approach is generalized. Starting from a seed track, the layer-based model uses audio analysis techniques such as blind source separation and onset-based segmentation to generate layers, which are shuffled and recombined to produce novel music in a manner analogous to the qaida model.
    M.S. Committee Chair: Chordia, Parag; Committee Member: Freeman, Jason; Committee Member: Weinberg, Gi
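    The generate-and-select scheme the abstract describes can be illustrated with a minimal sketch; the subphrase representation, the order-sensitive feature, and all names below are toy assumptions, not the thesis's code.

```python
# Toy qaida-style variation generator: reorder the theme's subphrases to
# build a bank, then sequence by a feature-based selection criterion.
import itertools
import random

def generate_variations(subphrases, n=20):
    """Build a bank of variations by reordering subphrases."""
    perms = list(itertools.permutations(range(len(subphrases))))
    chosen = random.sample(perms, min(n, len(perms)))
    return [[subphrases[i] for i in order] for order in chosen]

def front_weight(variation):
    """Order-sensitive toy feature: how front-loaded the strokes are."""
    weights = range(len(variation), 0, -1)  # earlier subphrases weigh more
    return sum(w * len(p) for w, p in zip(weights, variation))

def select_next(bank, target):
    """Feature-based sequencing: pick the variation nearest a target value."""
    return min(bank, key=lambda v: abs(front_weight(v) - target))

theme = [["dha", "ge"], ["ti", "ra", "ki", "ta"], ["dha"], ["tin", "na"]]
bank = generate_variations(theme)
print(select_next(bank, target=24))  # a busy, front-loaded variation
```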

    Real-time Percussive Technique Recognition and Embedding Learning for the Acoustic Guitar

    Real-time music information retrieval (RT-MIR) has much potential to augment the capabilities of traditional acoustic instruments. We develop RT-MIR techniques aimed at augmenting percussive fingerstyle, which blends acoustic guitar playing with guitar body percussion. We formulate several design objectives for RT-MIR systems for augmented instrument performance: (i) causal constraint, (ii) perceptually negligible action-to-sound latency, (iii) control intimacy support, (iv) synthesis control support. We present and evaluate real-time guitar body percussion recognition and embedding learning techniques based on convolutional neural networks (CNNs) and CNNs jointly trained with variational autoencoders (VAEs). We introduce a taxonomy of guitar body percussion based on hand part and location. We follow a cross-dataset evaluation approach by collecting three datasets labelled according to the taxonomy. The embedding quality of the models is assessed using KL divergence across distributions corresponding to different taxonomic classes. Results indicate that the networks are strong classifiers, especially in a simplified 2-class recognition task, and the VAEs yield improved class separation compared to CNNs, as evidenced by increased KL divergence across distributions. We argue that the VAE embedding quality could support control intimacy and rich interaction when the latent space's parameters are used to control an external synthesis engine. Further design challenges around generalisation to different datasets have been identified.
    Comment: Accepted at the 24th Int. Society for Music Information Retrieval Conf., Milan, Italy, 202
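    As a rough illustration of the embedding-quality measure mentioned above (the fitting and estimator details are assumptions, not necessarily the paper's): fit a diagonal Gaussian to each class's embeddings and use the closed-form KL divergence between the resulting distributions; larger values indicate better class separation.

```python
# Hedged sketch: score class separation in an embedding space via the
# closed-form KL divergence between per-class diagonal Gaussian fits.
import numpy as np

def fit_diag_gaussian(x, eps=1e-6):
    """Fit a diagonal Gaussian to rows of x (n_samples, n_dims)."""
    return x.mean(axis=0), x.var(axis=0) + eps

def kl_diag_gaussians(mu0, var0, mu1, var1):
    """KL( N(mu0, var0) || N(mu1, var1) ) for diagonal covariances."""
    return 0.5 * np.sum(np.log(var1 / var0)
                        + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)

# Toy usage: random "embeddings" standing in for two taxonomic classes.
rng = np.random.default_rng(0)
z_palm = rng.normal(0.0, 1.0, size=(200, 8))
z_finger = rng.normal(1.5, 1.0, size=(200, 8))
kl = kl_diag_gaussians(*fit_diag_gaussian(z_palm), *fit_diag_gaussian(z_finger))
print(f"KL divergence between class embeddings: {kl:.2f}")
```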

    Automatic Labelling of Tabla Signals

    Most of the recent developments in the field of music indexing and music information retrieval are focused on Western music. In this paper, we present an automatic music transcription system dedicated to tabla, a North Indian percussion instrument. Our approach is based on three main steps: first, the audio signal is segmented into adjacent segments, each representing a single stroke. Second, rhythmic information such as relative durations is calculated using beat detection techniques. Finally, the transcription (recognition of the strokes) is performed by means of a statistical model based on a Hidden Markov Model (HMM). The structure of this model is designed to represent the time dependencies between successive strokes and to take into account the specificities of tabla score notation (transcription symbols may be context dependent). This transcriber makes realtime transcription of tabla solos (or performances) possible with an error rate of 6.5%. The transcription system, along with additional features such as sound synthesis and phrase correction, is integrated in a user-friendly environment called Tablascope.
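    A minimal sketch of the HMM decoding step this describes, with mocked parameters (the paper's topology and probabilities are not given here): per-segment stroke likelihoods are combined with a transition model encoding which strokes tend to follow which, and Viterbi decoding recovers the most likely stroke sequence.

```python
# Hedged Viterbi decoding sketch for stroke recognition; the emission and
# transition models here are random stand-ins, not the paper's.
import numpy as np

def viterbi(log_emit, log_trans, log_init):
    """Most likely state path; log_emit has shape (segments, strokes)."""
    T, S = log_emit.shape
    delta = log_init + log_emit[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans       # (from_state, to_state)
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

strokes = ["dha", "dhin", "ta", "tin", "na"]
rng = np.random.default_rng(1)
log_emit = np.log(rng.dirichlet(np.ones(5), size=8))   # 8 segments
log_trans = np.log(rng.dirichlet(np.ones(5), size=5))  # stroke-to-stroke
log_init = np.log(np.full(5, 0.2))
print([strokes[s] for s in viterbi(log_emit, log_trans, log_init)])
```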

    Culturally sensitive strategies for automatic music prediction

    Thesis (Ph.D.), Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2012. Cataloged from PDF version of thesis. Includes bibliographical references (p. 103-112).
    Music has been shown to form an essential part of the human experience: every known society engages in music. However, as universal as it may be, music has evolved into a variety of genres peculiar to particular cultures. In fact, people acquire musical skill, understanding, and appreciation specific to the music they have been exposed to. This process of enculturation builds mental structures that form the cognitive basis for musical expectation. In this thesis I argue that in order for machines to perform musical tasks like humans do, in particular to predict music, they need to be subjected to a similar enculturation process by design. This work is grounded in an information-theoretic framework that takes cultural context into account. I introduce a measure of musical entropy to analyze the predictability of musical events as a function of prior musical exposure. Then I discuss computational models for music representation that are informed by genre-specific containers for musical elements such as notes. Finally, I propose a software framework for automatic music prediction. The system extracts a lexicon of melodic (or timbral) and rhythmic primitives from audio, and generates a hierarchical grammar to represent the structure of a particular musical form. To improve prediction accuracy, context can be switched with cultural plug-ins that are designed for specific musical instruments and genres. In listening experiments involving music synthesis, a culture-specific design fares significantly better than a culture-agnostic one. Hence my findings support the importance of computational enculturation for automatic music prediction. Furthermore, I suggest that in order to sustain and cultivate the diversity of musical traditions around the world, it is indispensable that we design culturally sensitive music technology.
    by Mihir Sarkar. Ph.D.
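    The thesis's entropy measure is not reproduced here; as a hedged illustration of the idea, the entropy of a next-event distribution under a simple bigram model trained on a culture-specific corpus quantifies how predictable events are, given prior exposure. The corpus and event names are toy examples.

```python
# Toy "musical entropy" sketch: train a bigram model on a corpus, then
# measure the Shannon entropy of the predicted next-event distribution.
from collections import Counter, defaultdict
import math

def train_bigrams(sequences):
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def next_event_entropy(counts, context):
    """Entropy (bits) of the next event after `context` (must be seen)."""
    total = sum(counts[context].values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts[context].values())

corpus = [["dha", "ge", "na", "dha", "ge", "na"],
          ["dha", "tin", "na", "dha", "ge", "na"]]
model = train_bigrams(corpus)
print(next_event_entropy(model, "dha"))  # low entropy = highly predictable
```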

    Real-time detection of overlapping sound events with non-negative matrix factorization

    In this paper, we investigate the problem of real-time detection of overlapping sound events by employing non-negative matrix factorization (NMF) techniques. We consider a setup where audio streams arrive in real time and are decomposed onto a dictionary of event templates learned offline prior to the decomposition. An important drawback of existing approaches in this context is the lack of control over the decomposition. We propose and compare two provably convergent algorithms that address this issue by controlling, respectively, the sparsity of the decomposition and its trade-off between the different frequency components. Sparsity regularization is considered in the framework of convex quadratic programming, while the frequency compromise is introduced by employing the beta-divergence as a cost function. The two algorithms are evaluated on the multi-source detection tasks of polyphonic music transcription, drum transcription, and environmental sound recognition. The obtained results show how the proposed approaches can improve detection in such applications while maintaining low computational costs suitable for real-time processing.
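    For context, here is a minimal sketch of the baseline template-based decomposition the paper builds on, not its two proposed constrained algorithms: with the dictionary W learned offline and held fixed, each incoming magnitude-spectrum frame updates only its activations, using the standard multiplicative rule for the beta-divergence.

```python
# Baseline real-time NMF sketch: fixed dictionary W, per-frame activation
# updates under the beta-divergence (beta=1 recovers KL divergence).
import numpy as np

def decompose_frame(v, W, beta=1.0, n_iter=30, eps=1e-12):
    """Solve v ~= W @ h with h >= 0 by multiplicative updates."""
    h = np.full(W.shape[1], 1e-2)
    for _ in range(n_iter):
        wh = W @ h + eps
        h *= (W.T @ (wh ** (beta - 2) * v)) / (W.T @ (wh ** (beta - 1)) + eps)
    return h

# Toy usage: a dictionary of 3 event templates over 6 frequency bins.
rng = np.random.default_rng(2)
W = np.abs(rng.normal(size=(6, 3)))
v = 0.7 * W[:, 0] + 0.3 * W[:, 2]        # frame mixing events 0 and 2
print(decompose_frame(v, W))             # activations peak at indices 0, 2
```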

    A methodology for investigation of bowed string performance through measurement of violin bowing technique

    Thesis (Ph.D.), Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2007. Includes bibliographical references (leaves 181-186).
    Virtuosic bowed string performance in many ways exemplifies the incredible potential of human physical performance and expression. A great deal is known today about the physics of the violin family and the factors responsible for its sound capabilities. However, much remains to be discovered about the intricacies of how players control these instruments in order to achieve their characteristic range and nuance of sound. Technology now offers the ability to study this player control under realistic, unimpeded playing conditions, leading to greater understanding of these performance skills. Presented here is a new methodology for investigating bowed string performance that uses a playable hardware measurement system to capture the gestures of right-hand violin bowing technique. Building upon previous Hyperstring research, this measurement system was optimized to be small, lightweight, and portable, and was installed on a carbon fiber violin bow and an electric violin to enable the study of realistic, unencumbered violin performances. The system includes inertial and force sensors and an electric field position sensor. To maximize the applicability of the gesture data to related fields of interest, all of the sensors were calibrated in SI units. The gesture data captured by these sensors are recorded together with the audio from the violin as they are produced by violinists in typical playing scenarios. To explore the potential of the bowing measurement system, a study of standard bowing techniques, such as détaché, martelé, and spiccato, was conducted with expert violinist participants. Gesture data from these trials were evaluated and input to a classifier to examine physical distinctions between bowing techniques, as well as between players. Results from this analysis, and their implications for the methodology, are presented. In addition to this examination of bowing techniques, applications of the measurement system to the study of bowed string acoustics and digital music instrument performance, with a focus on virtual instruments created from physical models, are discussed.
    by Diana Young. Ph.D.
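    A hedged sketch of the classification step mentioned above, with mocked sensor streams; the thesis's actual features and classifier are not specified in this abstract, so the summary statistics and SVM below are illustrative.

```python
# Toy bowing-technique classifier: per-stroke summary statistics of
# calibrated force (N) and acceleration (m/s^2) streams feed an SVM.
import numpy as np
from sklearn.svm import SVC

def stroke_features(force, accel):
    """Mean and peak bow force, plus RMS acceleration, for one stroke."""
    return [np.mean(force), np.max(force), np.sqrt(np.mean(accel ** 2))]

rng = np.random.default_rng(3)
# Mocked strokes: detache (sustained force) vs. spiccato (short impulses).
X = ([stroke_features(rng.uniform(0.5, 1.0, 200), rng.normal(0, 1, 200))
      for _ in range(20)] +
     [stroke_features(rng.uniform(0.0, 0.3, 200), rng.normal(0, 4, 200))
      for _ in range(20)])
y = ["detache"] * 20 + ["spiccato"] * 20
clf = SVC().fit(X, y)
print(clf.predict([stroke_features(rng.uniform(0.5, 1.0, 200),
                                   rng.normal(0, 1, 200))]))
```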

    Extending physical instruments using sampled acoustics

    Thesis (Ph.D.), Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2007. Includes bibliographical references (p. 133-138).
    This thesis presents a system architecture for creating hybrid digital-acoustic percussion instruments by combining extensions of existing signal processing techniques with specially designed semi-acoustic physical controllers. This work aims to provide greater realism to digital percussion, gaining much of the richness and understandability of acoustic instruments while preserving the flexibility of digital systems. For this thesis, I have collaborated with percussionists to develop a range of instruments, to refine and extend the algorithmic and physical designs, and to determine successful models of interaction. Conventional percussion controllers discretize the intensity of strikes into trigger messages, ignoring the timbre of the hits and failing to track more ambiguous input. In this work, the continuous acoustic output of a struck physical object is processed to add the resonance of a sampled instrument. This is achieved by employing existing low-latency convolution algorithms, extended to give the player control over features such as damping, spectral flattening, nonlinear effects, and pitch. One advantage of this approach is that light taps, scrapes, rubs, or stirring with brushes all take on a hybrid timbre of the real and sampled sound that is surprisingly realistic and controllable. Since part of its behavior is inherently acoustic, a player's intuition about interacting with physical objects can be applied to controlling it. The ability to transform the apparent acoustic properties of objects also suggests applications in HCI and product design contexts.
    by Roberto Mario Aimi. Ph.D.
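    An offline sketch of the core signal path described above. The thesis relies on low-latency (partitioned) convolution for real-time use; plain FFT convolution is shown here for brevity, the file names are placeholders, and mono 16-bit input is assumed.

```python
# Convolve the acoustic signal of a struck object with a sampled
# instrument's impulse response, with a simple damping control.
import numpy as np
from scipy.io import wavfile
from scipy.signal import fftconvolve

rate, tap = wavfile.read("tap.wav")        # struck-object input (mono)
_, ir = wavfile.read("drum_ir.wav")        # sampled instrument resonance
tap, ir = tap.astype(np.float64), ir.astype(np.float64)

# Exponential damping shortens the resonance, one of the player-facing
# controls the abstract mentions.
damping = 3.0                              # larger = faster decay
t = np.arange(len(ir)) / rate
out = fftconvolve(tap, ir * np.exp(-damping * t))

out /= np.abs(out).max() + 1e-12           # normalize to avoid clipping
wavfile.write("hybrid.wav", rate, (out * 32767).astype(np.int16))
```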

    Development of an Augmented Reality musical instrument

    Nowadays, Augmented Reality (AR) and Virtual Reality (VR) are concepts of which people are becoming increasingly aware, owing largely to their application in the video-game industry (especially in the case of VR). This rise is partly due to the decreasing cost of Head Mounted Displays, which are consequently becoming more accessible to the public and to developers worldwide. These novelties, along with the frenetic development of Information Technologies applied to essentially all markets, have also made digital artists and manufacturers aware of the never-ending interaction possibilities these paradigms provide, and a variety of systems have appeared offering innovative creative capabilities. Owing to the author's personal interest in music and the technologies surrounding its creation by digital means, this document covers the application of the Virtuality-Reality Continuum paradigms (VR and AR) to the field of interfaces for musical expression. More precisely, it covers the development of an electronic drumset that integrates Arduino-compatible hardware with a 3D visualisation application (developed on Unity) to create a complete, functioning musical instrument. The system presented in this document attempts to combine three-dimensional visual feedback with tangible interaction based on hitting, which is translated directly into sound and visuals by the sound generation application. Furthermore, the present paper provides an in-depth study of the multiple technologies and areas that are ultimately applied to the target system. Hardware concerns, timing requirements, approaches to the creation of NIMEs (New Interfaces for Musical Expression), Virtual Musical Instrument (VMI) design, musical-data transmission protocols (MIDI and OSC), and 3D modelling constitute the fundamental topics discussed throughout the document. At the end of the paper, the conclusions reflect on the difficulties encountered during the project, the unfulfilled objectives, and the deviations from the initial concept that occurred during development. Future lines of work are then listed and briefly described, and personal comments are included, along with some humble advice aimed at readers interested in taking on an ambitious project of their own.
    Today, the concepts of Augmented Reality (AR) and Virtual Reality (VR) are increasingly familiar to the general public, largely because of their application to video games, where development for HMDs is booming. This popularity owes much to the falling cost of such devices, which are ever more accessible to the public and to developers around the world. These novelties, added to the frenetic development of the IT industry, have caught the attention of artists and companies who see in these paradigms (VR and AR) an opportunity to provide new and unlimited forms of interaction and artistic creation. Owing to the personal interest of the author of this final-year project in music and the technologies that enable musical creation by digital means, this document explores the application of the paradigms of Milgram's Virtuality-Reality Continuum (AR and VR) to the field of interfaces for musical creation. Specifically, it details the development of an electronic drum kit that combines a tangible interface built with Arduino-compatible hardware with a sound-generation and visualisation application developed on Unity. The system pursues natural user interaction by embedding the hardware in a pair of drumsticks, which detect strikes against any kind of surface and convert them into MIDI messages that the sound-generation system uses to provide feedback to the user, both visual and auditory; the system is thus distinguished by supporting interaction through physically striking objects (e.g. a bed), whereas other, similar systems base their interaction on "air-drumming". In addition, the system seeks to overcome some of the main drawbacks associated with drummers and their usually troublesome instrument, such as space limitations, the lack of flexibility in the sounds that can be generated, and the high cost of the equipment. The document also details various aspects of the system in question, giving the reader a complete overview of systems similar to the one proposed, and describes the most important aspects of the project's development, such as musical-data transmission protocols (MIDI and OSC), control algorithms, design guidelines for musical interfaces (NIMEs), and 3D modelling. A complete Software Engineering process is included to maintain formality and ensure a more organized development, and the methodology used for this process is discussed. Finally, the document reflects on the difficulties encountered, lists possibilities for future work, and closes with some personal conclusions drawn from this research.
    Ingeniería Informátic
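    A hedged sketch of the strike-to-MIDI mapping described above, written in Python with the mido library rather than the project's Arduino firmware; the threshold, note number, and mocked peak intensities are illustrative assumptions.

```python
# Map detected strike intensity to a MIDI note-on, as the drumstick
# hardware does before the Unity application renders sound and visuals.
import mido  # requires a MIDI backend such as python-rtmidi

def hit_to_midi(peak, note=38, threshold=0.05):
    """Convert a normalized peak amplitude (0..1) to a note-on message."""
    if peak < threshold:
        return None                        # ignore sensor noise
    velocity = min(127, int(peak * 127))
    return mido.Message("note_on", note=note, velocity=velocity, channel=9)

port = mido.open_output()                  # default MIDI output port
for peak in (0.02, 0.4, 0.9):              # mocked strike intensities
    msg = hit_to_midi(peak)
    if msg is not None:
        port.send(msg)
```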