
    Comparison of input devices in an ISEE direct timbre manipulation task

    The representation and manipulation of sound within multimedia systems is an important and currently under-researched area. The paper gives an overview of the authors' work on the direct manipulation of audio information, and describes a solution based upon the navigation of four-dimensional scaled timbre spaces. Three hardware input devices were experimentally evaluated for use in a timbre space navigation task: the Apple Standard Mouse, the Gravis Advanced Mousestick II joystick (in absolute and relative modes) and the Nintendo Power Glove. Results show that the usability of these devices significantly affected the efficacy of the system, and that conventional low-cost, low-dimensional devices provided better performance than the low-cost, multidimensional dataglove.

    Enhancing timbre model using MFCC and its time derivatives for music similarity estimation

    One popular method for content-based music similarity estimation is to model timbre by fitting the MFCCs with a single multivariate Gaussian with full covariance matrix, then comparing models with the symmetric Kullback-Leibler divergence. Borrowing from speech recognition, we propose applying the same approach to the MFCCs' time derivatives to enhance the timbre model. The Gaussian models for the delta and acceleration coefficients are each used to build their own distance matrix, and these matrices are then combined linearly to form a full distance matrix for music similarity estimation. In experiments on two datasets, this approach outperforms using MFCCs alone. Moreover, genre classification with k-NN on the combined distances achieves accuracies close to the state-of-the-art.
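The core distance in the abstract above is the symmetric KL divergence between two full-covariance Gaussians fitted to MFCC frames. A minimal NumPy sketch (function names such as `fit_gaussian` and `symmetric_kl` are illustrative, not from the paper):

```python
import numpy as np

def fit_gaussian(feats):
    """Fit a single multivariate Gaussian to a (n_frames, n_coeffs)
    matrix of MFCC (or delta / acceleration) frames."""
    return feats.mean(axis=0), np.cov(feats, rowvar=False)

def gaussian_kl(m1, S1, m2, S2):
    """KL( N(m1, S1) || N(m2, S2) ) in closed form for Gaussians."""
    d = m1.shape[0]
    S2_inv = np.linalg.inv(S2)
    diff = m2 - m1
    return 0.5 * (np.trace(S2_inv @ S1)          # tr(S2^-1 S1)
                  + diff @ S2_inv @ diff         # Mahalanobis term
                  - d                            # dimensionality
                  + np.log(np.linalg.det(S2) / np.linalg.det(S1)))

def symmetric_kl(m1, S1, m2, S2):
    """Symmetrized KL divergence used as the timbre distance."""
    return gaussian_kl(m1, S1, m2, S2) + gaussian_kl(m2, S2, m1, S1)
```

Given per-track models for the MFCCs, deltas, and accelerations, the three resulting distance matrices would then be combined as a weighted sum (the weights are a tuning choice the abstract does not specify).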

    TimbreTron: A WaveNet(CycleGAN(CQT(Audio))) Pipeline for Musical Timbre Transfer

    In this work, we address the problem of musical timbre transfer, where the goal is to manipulate the timbre of a sound sample from one instrument to match another instrument while preserving other musical content, such as pitch, rhythm, and loudness. In principle, one could apply image-based style transfer techniques to a time-frequency representation of an audio signal, but this depends on having a representation that allows independent manipulation of timbre as well as high-quality waveform generation. We introduce TimbreTron, a method for musical timbre transfer which applies "image" domain style transfer to a time-frequency representation of the audio signal, and then produces a high-quality waveform using a conditional WaveNet synthesizer. We show that the Constant Q Transform (CQT) representation is particularly well-suited to convolutional architectures due to its approximate pitch equivariance. Based on human perceptual evaluations, we confirmed that TimbreTron recognizably transferred the timbre while otherwise preserving the musical content, for both monophonic and polyphonic samples.
    (17 pages; published as a conference paper at ICLR 2019.)
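The pitch equivariance the abstract attributes to the CQT comes from its geometrically spaced frequency bins: a pitch shift by s semitones multiplies every frequency by 2^(s/12), which on a log-frequency axis is a pure translation. A minimal NumPy sketch of this property (the parameter values are illustrative, not from the paper):

```python
import numpy as np

def cqt_center_freqs(f_min, n_bins, bins_per_octave):
    """Center frequencies of a CQT: geometrically spaced,
    `bins_per_octave` bins per octave starting at `f_min`."""
    k = np.arange(n_bins)
    return f_min * 2.0 ** (k / bins_per_octave)

# Example: 7 octaves above C1 (~32.70 Hz), 12 bins per octave.
freqs = cqt_center_freqs(32.70, 84, 12)

# Shifting up one octave (doubling every frequency) corresponds to
# a pure 12-bin translation along the CQT's frequency axis:
octave_is_translation = np.allclose(freqs[12:], 2.0 * freqs[:-12])
```

This translation property is what makes convolutional architectures, which are themselves translation-equivariant, a natural fit for CQT "images"; the equivariance is only approximate in practice because the CQT's time resolution varies across bins.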