A Critical Analysis of Synthesizer User Interfaces for Timbre
In this paper, we review and analyse categories of user interface used in hardware and software electronic music synthesizers. Problems with the user specification and modification of timbre are discussed. Three principal types of user interface for controlling timbre are distinguished. A problem common to all three categories is identified: that the core language of each category has no well-defined mapping onto the task languages of subjective timbre categories as used by musicians.
Comparison of input devices in an ISEE direct timbre manipulation task
The representation and manipulation of sound within multimedia systems is an important and currently under-researched area. The paper gives an overview of the authors' work on the direct manipulation of audio information, and describes a solution based upon the navigation of four-dimensional scaled timbre spaces. Three hardware input devices were experimentally evaluated for use in a timbre space navigation task: the Apple Standard Mouse, the Gravis Advanced Mousestick II joystick (absolute and relative), and the Nintendo Power Glove. Results show that the usability of these devices significantly affected the efficacy of the system, and that conventional low-cost, low-dimensional devices provided better performance than the low-cost, multidimensional dataglove.
Enhancing timbre model using MFCC and its time derivatives for music similarity estimation
One of the popular methods for content-based music similarity estimation is to model timbre with MFCCs as a single multivariate Gaussian with full covariance matrix, then use the symmetric Kullback-Leibler divergence. Drawing on the field of speech recognition, we propose to apply the same approach to the MFCCs’ time derivatives to enhance the timbre model. The Gaussian models for the delta and acceleration coefficients are used to create their respective distance matrices. The distance matrices are then combined linearly to form a full distance matrix for music similarity estimation. In our experiments on two datasets, our novel approach performs better than using MFCCs alone. Moreover, performing genre classification using k-NN showed that the accuracies obtained are already close to the state of the art.
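A minimal sketch of the model described above: a single full-covariance Gaussian is fitted to the MFCCs, their deltas, and their acceleration coefficients, track pairs are scored with the closed-form symmetric Kullback-Leibler divergence, and the three distances are combined linearly. The use of librosa for feature extraction and the combination weights are illustrative assumptions, not the authors' code or values.

```python
import numpy as np
import librosa

def gaussian_model(features):
    """Fit a single multivariate Gaussian (mean + full covariance) to a feature matrix."""
    return features.mean(axis=1), np.cov(features)

def symmetric_kl(model_a, model_b):
    """Closed-form symmetric Kullback-Leibler divergence between two Gaussians."""
    (mu1, s1), (mu2, s2) = model_a, model_b
    d = mu1.shape[0]
    inv1, inv2 = np.linalg.inv(s1), np.linalg.inv(s2)
    diff = mu1 - mu2
    logdet1, logdet2 = np.linalg.slogdet(s1)[1], np.linalg.slogdet(s2)[1]
    kl_ab = 0.5 * (np.trace(inv2 @ s1) + diff @ inv2 @ diff - d + logdet2 - logdet1)
    kl_ba = 0.5 * (np.trace(inv1 @ s2) + diff @ inv1 @ diff - d + logdet1 - logdet2)
    return kl_ab + kl_ba

def track_models(path, n_mfcc=20):
    """Gaussian models for the MFCCs, their deltas, and their acceleration coefficients."""
    y, sr = librosa.load(path)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    delta = librosa.feature.delta(mfcc)
    accel = librosa.feature.delta(mfcc, order=2)
    return [gaussian_model(f) for f in (mfcc, delta, accel)]

def combined_distance(models_a, models_b, weights=(0.6, 0.2, 0.2)):
    """Linear combination of the three per-feature distances (weights are placeholders)."""
    return sum(w * symmetric_kl(a, b) for w, a, b in zip(weights, models_a, models_b))
```

Computing `combined_distance(track_models("a.wav"), track_models("b.wav"))` for every track pair yields the full distance matrix used for similarity estimation; the file names here are hypothetical.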
TimbreTron: A WaveNet(CycleGAN(CQT(Audio))) Pipeline for Musical Timbre Transfer
In this work, we address the problem of musical timbre transfer, where the goal is to manipulate the timbre of a sound sample from one instrument to match another instrument while preserving other musical content, such as pitch, rhythm, and loudness. In principle, one could apply image-based style transfer techniques to a time-frequency representation of an audio signal, but this depends on having a representation that allows independent manipulation of timbre as well as high-quality waveform generation. We introduce TimbreTron, a method for musical timbre transfer which applies "image" domain style transfer to a time-frequency representation of the audio signal, and then produces a high-quality waveform using a conditional WaveNet synthesizer. We show that the Constant Q Transform (CQT) representation is particularly well-suited to convolutional architectures due to its approximate pitch equivariance. Based on human perceptual evaluations, we confirmed that TimbreTron recognizably transferred the timbre while otherwise preserving the musical content, for both monophonic and polyphonic samples.
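A brief sketch of the approximate pitch equivariance that the paper attributes to the CQT: transposing a tone by k semitones roughly translates its CQT magnitude by k bins along the frequency axis, which is what makes the representation convenient for convolutional networks. The use of librosa and the synthetic test tones are assumptions for illustration only, not the paper's pipeline.

```python
import numpy as np
import librosa

sr = 16000
t = np.linspace(0, 1.0, sr, endpoint=False)
tone = np.sin(2 * np.pi * 440.0 * t)                       # A4
shifted = np.sin(2 * np.pi * 440.0 * 2 ** (3 / 12) * t)    # three semitones higher

# 12 bins per octave, so one semitone corresponds to one CQT bin
C1 = np.abs(librosa.cqt(tone, sr=sr, bins_per_octave=12, n_bins=84))
C2 = np.abs(librosa.cqt(shifted, sr=sr, bins_per_octave=12, n_bins=84))

# The CQT of the transposed tone approximately equals the original CQT
# translated three bins upward along the frequency axis.
err = np.mean(np.abs(C2[3:] - C1[:-3]))
print("mean translation error:", err)
```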
Timbre space as synthesis space: towards a navigation based approach to timbre specification
Much research into timbre, its perception and classification over the last forty years has modelled timbre as an n-dimensional co-ordinate space or timbre space, whose axes are measurable acoustical quantities (variously spectral density, simultaneity of partial onsets, etc.). Typically, these spaces have been constructed from data generated from similarity/dissimilarity listening tests, using multidimensional scaling (MDS) analysis techniques. Our current research is the computer-assisted synthesis of new timbres using a timbre space search strategy, in which a previously constructed simple timbre space is used as a search space by an algorithm designed to synthesize desired new timbres, steered by iterative user input. The success of such an algorithm clearly depends on establishing a suitable mapping between its quantifiable features and its perceptual features. We therefore present here, firstly, some of the findings of a series of listening tests aimed at establishing the perceptual topography and granularity of a simple, predefined timbre space, and secondly, the results of preliminary tests of two search strategies designed to navigate this space. The behaviour of these strategies in a circumscribed space of this kind, together with the corresponding user experience, is intended to provide a baseline for applications in a more complex space.
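As a hedged illustration of how such a timbre space is typically constructed from listening-test data, the sketch below embeds a handful of sounds in two dimensions using multidimensional scaling on a precomputed dissimilarity matrix. The matrix values and the use of scikit-learn are invented for illustration and are not taken from the study.

```python
import numpy as np
from sklearn.manifold import MDS

# Hypothetical sounds A-D; entries are averaged dissimilarity ratings
# from listening tests (symmetric, zero diagonal).
dissim = np.array([
    [0.0, 2.1, 4.3, 3.0],
    [2.1, 0.0, 3.5, 2.8],
    [4.3, 3.5, 0.0, 1.9],
    [3.0, 2.8, 1.9, 0.0],
])

# Embed the sounds in a 2-D coordinate space whose inter-point distances
# approximate the judged dissimilarities; the axes can then be interpreted
# against acoustic measures such as spectral density or onset synchrony.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissim)
print(coords)  # one 2-D coordinate per sound: the "timbre space"
```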
Searching for pitch invariant representations in auditory cortex [oral presentation]
Pitch constancy relates to perceiving the same pitch from tones with differing spectral shapes, and is one key criterion for identifying a pitch-selective neural representation in auditory cortex. Here we used an event-related potential (ERP) adaptation study and a behavioural task (target same/different) to investigate whether pitch coding is invariant to changes in timbre. Adaptation is observed as a decrease in the N100-P200 response when the same stimulus is repeated, because overlapping neuronal populations encode the stimulus. Reduced adaptation indicates that new neuronal populations are recruited to encode a change in an acoustic feature of interest (i.e. pitch, timbre or both). If neurons are selective to pitch (invariant to timbre), reduced adaptation should occur for pitch changes only. If selective to both (non-invariant to timbre), reduced adaptation should occur for pitch and timbre changes. Similarly, stimulus discrimination during the behavioural task should not require any additional processing resources if neurons are selective to pitch only, and hence reaction times and accuracy should be equivalent across conditions. If neurons are selective to both pitch and timbre, longer reaction times and poorer accuracy should be observed for timbre changes. We found reduced adaptation in the N100-P200 and increased reaction times and poorer accuracy for timbre changes. This suggests that neurons in auditory cortex are selective to both pitch and timbre, i.e. pitch coding is non-invariant to timbre. This supports recent evidence suggesting an interdependence between pitch and timbre coding.
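A toy sketch of the adaptation measure described above: the N100-P200 peak-to-peak amplitude is computed for a repeated-stimulus condition and a changed-stimulus condition, with reduced adaptation appearing as a larger response to the change. The simulated waveforms and the time windows are illustrative assumptions, not the study's data or analysis code.

```python
import numpy as np

sfreq = 500                              # Hz, assumed sampling rate
times = np.arange(0, 0.4, 1 / sfreq)     # 0-400 ms post-stimulus

def n1_p2_amplitude(erp, times):
    """Peak-to-peak amplitude: N100 trough (80-150 ms) to P200 peak (150-250 ms)."""
    n1 = erp[(times >= 0.08) & (times <= 0.15)].min()
    p2 = erp[(times >= 0.15) & (times <= 0.25)].max()
    return p2 - n1

def simulated_erp(gain):
    """Crude ERP: negative deflection near 100 ms, positive deflection near 200 ms."""
    return gain * (-np.exp(-((times - 0.10) ** 2) / 0.0004)
                   + np.exp(-((times - 0.20) ** 2) / 0.0008))

repeat_erp = simulated_erp(gain=0.6)   # adapted (attenuated) response to repetition
change_erp = simulated_erp(gain=1.0)   # response released from adaptation by a change

print("repeat N1-P2:", n1_p2_amplitude(repeat_erp, times))
print("change N1-P2:", n1_p2_amplitude(change_erp, times))
```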
