303 research outputs found

    On Using Backpropagation for Speech Texture Generation and Voice Conversion

    Inspired by recent work on neural network image generation that relies on backpropagation towards the network inputs, we present a proof-of-concept system for speech texture synthesis and voice conversion based on two mechanisms: approximate inversion of the representation learned by a speech recognition neural network, and matching statistics of neuron activations between different source and target utterances. Similar to image texture synthesis and neural style transfer, the system works by optimizing a cost function with respect to the input waveform samples. To this end we use a differentiable mel-filterbank feature extraction pipeline and train a convolutional CTC speech recognition network. Our system is able to extract speaker characteristics from very limited amounts of target speaker data, as little as a few seconds, and can be used to generate realistic speech babble or reconstruct an utterance in a different voice. Comment: Accepted to ICASSP 201

    The analysis of composition techniques in utp_: Synthetic composition for electroacoustic ensembles

    This thesis attempts to analyse and describe a number of spectrally oriented composition techniques for composing music for electroacoustic ensemble. These techniques aim to achieve a synthetic approach to combining electronic and acoustic sound sources in live performance. To achieve this, an in-depth analysis of utp_ (2008) by Alva Noto and Ryuichi Sakamoto in collaboration with Ensemble Modern is conducted. utp_ utilises a large acoustic ensemble, live electronic processing, prerecorded electronic sound and video projections in performance. The discussion also queries the possible problems of electroacoustic performance, and examines ways to resolve the most prevalent issues. This involves a discussion of the materials of electroacoustic works, timbral differences in acoustic and electronic sounds and liveness in electroacoustic music performance. The analysis involves using spectral and score analysis to identify composition techniques. The final section describes the way these composition techniques are applied in my own work for electroacoustic ensemble, lucidity

    Audio-Material Modeling and Reconstruction for Multimodal Interaction

    Interactive virtual environments enable the creation of training simulations, games, and social applications. These virtual environments can create a sense of presence in the environment: a sensation that its user is truly in another location. To maintain presence, interactions with virtual objects should engage multiple senses. Furthermore, multisensory input should be consistent, e.g. a virtual bowl that visually appears plastic should also sound like plastic when dropped on the floor. In this dissertation, I propose methods to improve the perceptual realism of virtual object impact sounds and ensure consistency between those sounds and the input from other senses. Recreating the impact sound of a real-world object requires an accurate estimate of that object's material parameters. The material parameters that affect impact sound---collectively forming the audio-material---include the material damping parameters for a damping model. I propose and evaluate damping models and use them to estimate material damping parameters for real-world objects. I also consider how interaction with virtual objects can be made more consistent between the senses of sight, hearing, and touch. First, I present a method for modeling the damping behavior of impact sounds, using generalized proportional damping to both estimate more expressive material damping parameters from recorded impact sounds and perform impact sound synthesis. Next, I present a method for estimating material damping parameters in the presence of confounding factors and with no knowledge of the object's shape. To accomplish this, a probabilistic damping model captures various external effects to produce robust damping parameter estimates. Next, I present a method for consistent multimodal interaction with textured surfaces. Texture maps serve as a single unified representation of mesoscopic detail for the purposes of visual rendering, sound synthesis, and rigid-body simulation. 
    Finally, I present a method for geometry and material classification using multimodal audio-visual input. Using this method, a real-world scene can be scanned and virtually reconstructed while accurately modeling both the visual appearances and audio-material parameters of each object. Doctor of Philosophy

    Spectral tourist: joystick operated sound production software for Macintosh computers (OS 9.1 & .2) (documentation and CD recording of performances)

    • SPECTRAL TOURIST, September 2002 - June 2003. Software documentation.
    • THE SPECTRAL TOURIST. Audio CD, duration 29.10. The eight tracks on this short CD represent some of the possibilities for performing solos with the Spectral Tourist. Each track was recorded in one take, travelling through spectrograms that were generated from sound files stored on my laptop hard disk. Of course, the Spectral Tourist is not limited to solos and it is just as possible to work with a live musician.
    • Hell's Angles [sic]. Generative software for Macintosh computer running OS 9.1 or .2. Martin Parker, January 2003.
    • HAZE ver.1, for computer with Clarinet in Bb, Tenor Trombone and Cello. Duration ca 5 minutes. Martin Parker, October 2002.
    • Environment for Stone Violin and Computer. Duration max. 15 minutes. Martin Parker, September 2002.
    • THE VIEW. DVD video, duration 12.18. Martin Parker, February 2002.
    • Shonky Music for Tracker action Organs. Duration ca 6 minutes. Martin Parker, February 2001.
    • Sounds of Line - Melody; Sounds of Line - Rhythm. For 4 prepared French horns. Duration ca. 7 minutes. Martin Parker, September 2000.
    • In formation II. For Treble recorder (doubling descant and tenor) and CD. Martin Parker, April 2000.
    • Antiorp 2 Clarinets in Bb Bassoon Contrabass Recorder 7 string Viola da gamba (scordatura) Large Bass Drum. Duration ca 15 minutes. Martin Parker, November 1999.
    • In formation I. For bass recorder, CD and Digital Delay. Martin Parker and Laurie Crump, September 199

    Cyberflesh: The Me I've Made for You

    This thesis deals with the impact of digital life on individuals by examining how real things become virtual bodies of information. Throughout this text, I weave key theoretical and literary references between my own thoughts and experiences which led to the work that appears in this book. I find it useful to attend to the spaces in between entities, theories, and technologies. In these undefined spaces lies the spiritual dimension of media, the everyday magic that makes digital life possible. These ‘in-betweens’ are the site of transcendence granted by technology. In my practice, I’ve learned that things have a certain resistance to being captured, so the transcendence we gain is incomplete and muddled by corruption, distortion, and loss. I conclude with the suggestion of a new term to refer to the separate entity of data that constitutes a fractured and fragmented digital double. This cyberflesh is an evolving, vulnerable being, captured by media and contained within the global infrastructure of technology

    Classification of sound textures

    Thesis (M.S.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1995. Includes bibliographical references (leaves 87-89). By Nicolas Saint-Arnaud. M.S.