
    Real-time online musical collaboration system for Indian percussion

    Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2007. Includes bibliographical references (p. 111-119).

    Thanks to the Internet, musicians located in different countries can now aspire to play with each other almost as if they were in the same room. However, the time delays due to the inherent latency in computer networks (up to several hundred milliseconds over long distances) are unsuitable for musical applications. Some musical collaboration systems address this issue by transmitting compressed audio streams (such as MP3) over low-latency, high-bandwidth networks (e.g. LANs or Internet2) to constrain time delays and optimize musician synchronization. Other systems, on the contrary, increase time delays to a musically relevant value, such as one phrase or one chord progression cycle, and play it in a loop, thereby constraining the music being performed. In this thesis I propose TablaNet, a real-time online musical collaboration system for the tabla, a pair of North Indian hand drums. This system is based on a novel approach that combines machine listening and machine learning. Trained for a particular instrument, here the tabla, the system recognizes individual drum strokes played by the musician and sends them as symbols over the network. A computer at the receiving end identifies the musical structure from the incoming sequence of symbols by mapping them dynamically to known musical constructs. To deal with transmission delays, the receiver predicts the next events by analyzing previous patterns before receiving the original events, and synthesizes an audio output estimate with the appropriate timing. Although prediction approximations may result in a slightly different musical experience at both ends, we find that this system demonstrates a fair level of playability by tabla players of various levels, and functions well as an educational tool.

    By Mihir Sarkar. S.M.
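
    To make the latency-masking idea concrete, here is a minimal sketch of a receive-side stroke predictor. This is an illustration under stated assumptions, not the thesis's actual algorithm: the `StrokePredictor` class, the order-2 Markov model, and the stroke symbols are all hypothetical stand-ins for the dynamic mapping to musical constructs described in the abstract.

```python
# Hypothetical sketch of TablaNet's receive-side prediction (not the
# author's code): the sender transmits stroke symbols ("dha", "dhin", ...),
# and the receiver masks network latency by predicting the next stroke
# from an order-2 Markov model of recently observed patterns.
from collections import Counter, defaultdict, deque

class StrokePredictor:
    """Order-2 Markov model over tabla stroke symbols (an assumption)."""
    def __init__(self):
        self.counts = defaultdict(Counter)
        self.history = deque(maxlen=2)

    def observe(self, symbol: str) -> None:
        if len(self.history) == 2:
            self.counts[tuple(self.history)][symbol] += 1
        self.history.append(symbol)

    def predict(self) -> str | None:
        """Most likely next stroke given the last two observed strokes."""
        dist = self.counts.get(tuple(self.history))
        return dist.most_common(1)[0][0] if dist else None

predictor = StrokePredictor()
for sym in ["dha", "dhin", "dhin", "dha", "dha", "dhin", "dhin", "dha"]:
    predictor.observe(sym)    # symbols arriving over the network
print(predictor.predict())    # synthesize this stroke on time, correct later
```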

    Introducing Experimental Stereoscopic into Networked Live Performance with Very Limited System Resources

    Stereoscopy and its wide range of applications have recently become popular. In this paper, we examine potential limitations that might hinder the direct introduction of stereoscopy into networked live performance. The paper also suggests a plausible example of a simple, general configuration that supports stereoscopic networked live performance with very limited system resources. As a result, experimental-quality stereoscopic networked performance was achieved with minimal resources by using the anaglyph method.
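
    The anaglyph method mentioned above is cheap because both eyes' views travel in a single ordinary RGB frame. A minimal sketch of red-cyan anaglyph compositing follows; it is an illustration of the general technique, not the paper's implementation, and the synthetic test frames are assumptions.

```python
# Minimal sketch of the red-cyan anaglyph method: the left view supplies
# the red channel and the right view supplies green and blue, so a single
# RGB frame can carry both eyes' views over an ordinary video stream.
import numpy as np

def anaglyph(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """left, right: HxWx3 uint8 RGB frames from the two cameras."""
    out = np.empty_like(left)
    out[..., 0] = left[..., 0]     # red channel from the left eye
    out[..., 1:] = right[..., 1:]  # green/blue channels from the right eye
    return out

# Usage: composite two synthetic frames (placeholders for camera input).
left = np.full((480, 640, 3), 200, dtype=np.uint8)
right = np.full((480, 640, 3), 50, dtype=np.uint8)
frame = anaglyph(left, right)      # view with red-cyan glasses
```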

    Culturally sensitive strategies for automatic music prediction

    Thesis (Ph.D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2012. Cataloged from PDF version of thesis. Includes bibliographical references (p. 103-112).

    Music has been shown to form an essential part of the human experience: every known society engages in music. However, as universal as it may be, music has evolved into a variety of genres, peculiar to particular cultures. In fact, people acquire musical skill, understanding, and appreciation specific to the music they have been exposed to. This process of enculturation builds mental structures that form the cognitive basis for musical expectation. In this thesis I argue that in order for machines to perform musical tasks like humans do, in particular to predict music, they need to be subjected to a similar enculturation process by design. This work is grounded in an information-theoretic framework that takes cultural context into account. I introduce a measure of musical entropy to analyze the predictability of musical events as a function of prior musical exposure. Then I discuss computational models for music representation that are informed by genre-specific containers for musical elements like notes. Finally I propose a software framework for automatic music prediction. The system extracts a lexicon of melodic, or timbral, and rhythmic primitives from audio, and generates a hierarchical grammar to represent the structure of a particular musical form. To improve prediction accuracy, context can be switched with cultural plug-ins that are designed for specific musical instruments and genres. In listening experiments involving music synthesis, a culture-specific design fares significantly better than a culture-agnostic one. Hence my findings support the importance of computational enculturation for automatic music prediction. Furthermore I suggest that in order to sustain and cultivate the diversity of musical traditions around the world it is indispensable that we design culturally sensitive music technology.

    By Mihir Sarkar. Ph.D.
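
    The notion of entropy as a function of prior exposure can be illustrated with a toy model. The sketch below is inspired by, not taken from, the thesis: it trains a simple bigram model on a culture-specific corpus and computes the Shannon entropy of the next-note distribution; the corpus and note names are illustrative assumptions.

```python
# Hedged sketch of "musical entropy given prior exposure": train a bigram
# model on a culture-specific corpus, then compute the Shannon entropy of
# the next-note distribution. More exposure to a style lowers entropy,
# i.e. makes events more predictable for the enculturated listener.
import math
from collections import Counter, defaultdict

def bigram_model(corpus):
    """corpus: list of note sequences; returns next-note count tables."""
    counts = defaultdict(Counter)
    for seq in corpus:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    return counts

def next_note_entropy(counts, context):
    """Shannon entropy (bits) of the event following `context`."""
    dist = counts[context]
    total = sum(dist.values())
    return -sum((n / total) * math.log2(n / total) for n in dist.values())

corpus = [["Sa", "Re", "Ga", "Re", "Sa"], ["Sa", "Re", "Ga", "Ma", "Ga"]]
model = bigram_model(corpus)
print(next_note_entropy(model, "Re"))  # predictability of the event after "Re"
```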

    Vocal imitation for query by vocalisation

    PhD Thesis.

    The human voice presents a rich and powerful medium for expressing sonic ideas such as musical sounds. This capability extends beyond the sounds used in speech, evidenced for example in the art form of beatboxing and in recent studies highlighting the utility of vocal imitation for communicating sonic concepts. Meanwhile, the advance of digital audio has resulted in huge libraries of sounds at the disposal of music producers and sound designers. This presents a compelling search problem: with larger search spaces, the task of navigating sound libraries has become increasingly difficult. The versatility and expressive nature of the voice provides a seemingly ideal medium for querying sound libraries, raising the question of how well humans are able to vocally imitate musical sounds, and how we might use the voice as a tool for search. In this thesis we address these questions by investigating the ability of musicians to vocalise synthesised and percussive sounds, and evaluate the suitability of different audio features for predicting the perceptual similarity between vocal imitations and imitated sounds. In the first experiment, musicians were tasked with imitating synthesised sounds with one or two time-varying feature envelopes applied. The results show that participants were able to imitate pitch, loudness, and spectral centroid features accurately, and that imitation accuracy was generally preserved when the imitated stimuli combined two features that were not necessarily congruent. This demonstrates the viability of using the voice as a natural means of expressing time series of two features simultaneously. The second experiment consisted of two parts. In a vocal production task, musicians were asked to imitate drum sounds. Listeners were then asked to rate the similarity between the imitations and sounds from the same category (e.g. kick, snare, etc.). The results show that drum sounds received the highest similarity ratings when rated against their own imitations (as opposed to imitations of another sound), and overall more than half of the imitated sounds were correctly identified from the imitations with above-chance accuracy, although this varied considerably between drum categories. The findings from the vocal imitation experiments highlight the capacity of musicians to vocally imitate musical sounds, as well as some limitations of non-verbal vocal expression. Finally, we investigated the performance of different audio features as predictors of perceptual similarity between the imitations and imitated sounds from the second experiment. We show that features learned using convolutional auto-encoders outperform a number of popular heuristic features for this task, and that preservation of temporal information is more important than spectral resolution for differentiating between vocal imitations and same-category drum sounds.
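
    As a point of reference for the query-by-vocalisation task, here is a sketch of one of the heuristic baselines the thesis compares against learned features: ranking library sounds by the distance between MFCC trajectories aligned with dynamic time warping. The file names are placeholders, and this is an illustration of the baseline family, not the thesis's evaluation code.

```python
# Illustrative sketch of ranking drum sounds in a library by distance to a
# vocal imitation, using MFCC trajectories aligned with dynamic time
# warping. The thesis finds that convolutional auto-encoder features
# outperform heuristic features of this kind.
import librosa

def mfcc_dtw_distance(path_a: str, path_b: str) -> float:
    """Accumulated DTW cost between the MFCC sequences of two sounds."""
    ya, sra = librosa.load(path_a, sr=22050)
    yb, srb = librosa.load(path_b, sr=22050)
    A = librosa.feature.mfcc(y=ya, sr=sra, n_mfcc=13)
    B = librosa.feature.mfcc(y=yb, sr=srb, n_mfcc=13)
    D, wp = librosa.sequence.dtw(X=A, Y=B, metric="cosine")
    return D[-1, -1] / len(wp)    # normalize by warping-path length

# Usage (file names are placeholders): rank candidates for a vocal query.
library = ["kick_01.wav", "snare_03.wav", "hat_02.wav"]
ranked = sorted(library, key=lambda f: mfcc_dtw_distance("imitation.wav", f))
print(ranked[0])                  # best-matching drum sound
```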

    CONEqNet: convolutional music equalizer network

    The process of parametric equalization of musical pieces seeks to highlight their qualities by cutting and/or boosting certain frequencies. In this work, we present a neural model capable of equalizing a song according to the musical genre being played at a given moment. It is normal that (1) the equalization should adapt throughout the song rather than remain the same for the whole song; and (2) songs do not always belong to a single musical genre and may contain touches of several genres. The neural model designed in this work, called CONEqNet (convolutional music equalizer network), takes these aspects into account: it adapts to the changes that occur throughout a song and can mix nuances of different musical genres. For training, we used the well-known GTZAN dataset, which provides 1,000 song fragments of 30 seconds each, divided into 10 genres. The paper presents proof-of-concept results for the neural model's performance.

    This work was funded by the private research project of Company BQ, the public research projects of the Spanish Ministry of Science and Innovation PID2020-118249RB-C22 and PDC2021-121567-C22 - AEI/10.13039/501100011033, and the Madrid Government (Comunidad de Madrid, Spain) under the Multiannual Agreement with UC3M in the line of Excellence of University Professors (EPUC3M17) and in the context of the V PRICIT (Regional Programme of Research and Technological Innovation).
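
    To ground what "parametric equalization" means here, the sketch below implements a single peaking-EQ band using the standard RBJ Audio EQ Cookbook biquad; a model like CONEqNet would then predict band parameters (center frequency, Q, gain) per genre segment. The parameter values are illustrative assumptions, and this is not the paper's code.

```python
# Minimal sketch of one parametric EQ band: a peaking biquad (RBJ Audio EQ
# Cookbook) that boosts (gain_db > 0) or cuts (gain_db < 0) around f0.
import numpy as np
from scipy.signal import lfilter

def peaking_eq(x, fs, f0, q, gain_db):
    """Apply a peaking EQ band to signal x sampled at fs Hz."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    b = [1 + alpha * A, -2 * np.cos(w0), 1 - alpha * A]
    a = [1 + alpha / A, -2 * np.cos(w0), 1 - alpha / A]
    # Normalize by a0 and filter.
    return lfilter([bi / a[0] for bi in b], [ai / a[0] for ai in a], x)

# Usage: boost 1 kHz by 6 dB on a one-second test tone.
fs = 44100
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)
y = peaking_eq(x, fs, f0=1000, q=1.0, gain_db=6.0)
```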

    Timbral Learning for Musical Robots

    The tradition of building musical robots and automata is thousands of years old. Despite this rich history, even today musical robots do not play with as much nuance and subtlety as human musicians. In particular, most instruments allow the player to manipulate timbre while playing; if a violinist is told to sustain an E, they will select which string to play it on, how much bow pressure and velocity to use, whether to use the entire bow or only the portion near the tip or the frog, how close to the bridge or fingerboard to contact the string, whether or not to use a mute, and so forth. Each one of these choices affects the resulting timbre, and navigating this timbre space is part of the art of playing the instrument. Nonetheless, this type of timbral nuance has been largely ignored in the design of musical robots. Therefore, this dissertation introduces a suite of techniques that deal with timbral nuance in musical robots. Chapter 1 provides the motivating ideas and introduces Kiki, a robot designed by the author to explore timbral nuance. Chapter 2 provides a long history of musical robots, establishing the under-researched nature of timbral nuance. Chapter 3 is a comprehensive treatment of dynamic timbre production in percussion robots and, using Kiki as a case study, provides a variety of techniques for designing striking mechanisms that produce a range of timbres similar to those produced by human players. Chapter 4 introduces a machine-learning algorithm for recognizing timbres, so that a robot can transcribe timbres played by a human during live performance. Chapter 5 introduces a technique that allows a robot to learn how to produce isolated instances of particular timbres by listening to a human play examples of those timbres. The sixth and final chapter introduces a method that allows a robot to learn the musical context of different timbres; this is done in real time during interactive improvisation between a human and the robot, wherein the robot builds a statistical model of which timbres the human plays in which contexts, and uses this model to inform its own playing.

    Doctoral Dissertation, Media Arts and Sciences, 201
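
    The timbre-recognition step (the task of Chapter 4, though not necessarily its algorithm) can be sketched with standard tools: summarize each recorded stroke with mean MFCCs and train an off-the-shelf classifier so the robot can transcribe a human performance. The file names, timbre labels, and choice of SVM are all assumptions for illustration.

```python
# Hypothetical sketch of timbre recognition for a percussion robot:
# mean-MFCC features per stroke plus a standard classifier. File lists
# and labels are placeholders, not data from the dissertation.
import librosa
import numpy as np
from sklearn.svm import SVC

def timbre_features(path: str) -> np.ndarray:
    """One feature vector per stroke: MFCCs averaged over time."""
    y, sr = librosa.load(path, sr=22050)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    return mfcc.mean(axis=1)

train_files = [("edge_hit_01.wav", "edge"), ("center_hit_01.wav", "center"),
               ("rim_hit_01.wav", "rim")]
X = np.stack([timbre_features(f) for f, _ in train_files])
y = [label for _, label in train_files]

clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict([timbre_features("unknown_hit.wav")]))  # transcribed timbre
```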