393 research outputs found

    Transient and steady-state component separation for audio signals

    Get PDF
    In this work the problem of transient and steady-state component separation of an audio signal was addressed. In particular, a recently proposed method for separation of transient and steady-state components based on the median filter was investigated. For a better understanding of the processes involved, a modification of the filtering stage of the algorithm was proposed. This modification was evaluated subjectively by listening tests and objectively by an application-based comparison. Also some extensions to the model were presented in conjunction with different possible applications for the transient and steady-state decomposition in the area of audio editing and processing

    A review of differentiable digital signal processing for music and speech synthesis

    Get PDF
    The term “differentiable digital signal processing” describes a family of techniques in which loss function gradients are backpropagated through digital signal processors, facilitating their integration into neural networks. This article surveys the literature on differentiable audio signal processing, focusing on its use in music and speech synthesis. We catalogue applications to tasks including music performance rendering, sound matching, and voice transformation, discussing the motivations for and implications of the use of this methodology. This is accompanied by an overview of digital signal processing operations that have been implemented differentiably, which is further supported by a web book containing practical advice on differentiable synthesiser programming (https://intro2ddsp.github.io/). Finally, we highlight open challenges, including optimisation pathologies, robustness to real-world conditions, and design trade-offs, and discuss directions for future research

    Separation of musical sources and structure from single-channel polyphonic recordings

    Get PDF
    EThOS - Electronic Theses Online ServiceGBUnited Kingdo

    A Parametric Sound Object Model for Sound Texture Synthesis

    Get PDF
    This thesis deals with the analysis and synthesis of sound textures based on parametric sound objects. An overview is provided about the acoustic and perceptual principles of textural acoustic scenes, and technical challenges for analysis and synthesis are considered. Four essential processing steps for sound texture analysis are identifi ed, and existing sound texture systems are reviewed, using the four-step model as a guideline. A theoretical framework for analysis and synthesis is proposed. A parametric sound object synthesis (PSOS) model is introduced, which is able to describe individual recorded sounds through a fi xed set of parameters. The model, which applies to harmonic and noisy sounds, is an extension of spectral modeling and uses spline curves to approximate spectral envelopes, as well as the evolution of parameters over time. In contrast to standard spectral modeling techniques, this representation uses the concept of objects instead of concatenated frames, and it provides a direct mapping between sounds of diff erent length. Methods for automatic and manual conversion are shown. An evaluation is presented in which the ability of the model to encode a wide range of di fferent sounds has been examined. Although there are aspects of sounds that the model cannot accurately capture, such as polyphony and certain types of fast modulation, the results indicate that high quality synthesis can be achieved for many different acoustic phenomena, including instruments and animal vocalizations. In contrast to many other forms of sound encoding, the parametric model facilitates various techniques of machine learning and intelligent processing, including sound clustering and principal component analysis. Strengths and weaknesses of the proposed method are reviewed, and possibilities for future development are discussed

    DDSP-Piano: A Neural Sound Synthesizer Informed by Instrument Knowledge

    Get PDF
    Instrument sound synthesis using deep neural networks has received numerous improvements over the last couple of years. Among them, the Differentiable Digital Signal Processing (DDSP) framework has modernized the spectral modeling paradigm by including signal-based synthesizers and effects into fully differentiable architectures. The present work extends the applications of DDSP to the task of polyphonic sound synthesis, with the proposal of a differentiable piano synthesizer conditioned on MIDI inputs. The model architecture is motivated by high-level acoustic modeling knowledge of the instrument, which, along with the sound structure priors inherent to the DDSP components, makes for a lightweight, interpretable, and realistic-sounding piano model. A subjective listening test has revealed that the proposed approach achieves better sound quality than a state-of-the-art neural-based piano synthesizer, but physical-modeling-based models still hold the best quality. Leveraging its interpretability and modularity, a qualitative analysis of the model behavior was also conducted: it highlights where additional modeling knowledge and optimization procedures could be inserted in order to improve the synthesis quality and the manipulation of sound properties. Eventually, the proposed differentiable synthesizer can be further used with other deep learning models for alternative musical tasks handling polyphonic audio and symbolic data

    SOUND SYNTHESIS WITH CELLULAR AUTOMATA

    Get PDF
    This thesis reports on new music technology research which investigates the use of cellular automata (CA) for the digital synthesis of dynamic sounds. The research addresses the problem of the sound design limitations of synthesis techniques based on CA. These limitations fundamentally stem from the unpredictable and autonomous nature of these computational models. Therefore, the aim of this thesis is to develop a sound synthesis technique based on CA capable of allowing a sound design process. A critical analysis of previous research in this area will be presented in order to justify that this problem has not been previously solved. Also, it will be discussed why this problem is worthwhile to solve. In order to achieve such aim, a novel approach is proposed which considers the output of CA as digital signals and uses DSP procedures to analyse them. This approach opens a large variety of possibilities for better understanding the self-organization process of CA with a view to identifying not only mapping possibilities for making the synthesis of sounds possible, but also control possibilities which enable a sound design process. As a result of this approach, this thesis presents a technique called Histogram Mapping Synthesis (HMS), which is based on the statistical analysis of CA evolutions by histogram measurements. HMS will be studied with four different automatons, and a considerable number of control mechanisms will be presented. These will show that HMS enables a reasonable sound design process. With these control mechanisms it is possible to design and produce in a predictable and controllable manner a variety of timbres. Some of these timbres are imitations of sounds produced by acoustic means and others are novel. All the sounds obtained present dynamic features and many of them, including some of those that are novel, retain important characteristics of sounds produced by acoustic means

    Towards the automated analysis of simple polyphonic music : a knowledge-based approach

    Get PDF
    PhDMusic understanding is a process closely related to the knowledge and experience of the listener. The amount of knowledge required is relative to the complexity of the task in hand. This dissertation is concerned with the problem of automatically decomposing musical signals into a score-like representation. It proposes that, as with humans, an automatic system requires knowledge about the signal and its expected behaviour to correctly analyse music. The proposed system uses the blackboard architecture to combine the use of knowledge with data provided by the bottom-up processing of the signal's information. Methods are proposed for the estimation of pitches, onset times and durations of notes in simple polyphonic music. A method for onset detection is presented. It provides an alternative to conventional energy-based algorithms by using phase information. Statistical analysis is used to create a detection function that evaluates the expected behaviour of the signal regarding onsets. Two methods for multi-pitch estimation are introduced. The first concentrates on the grouping of harmonic information in the frequency-domain. Its performance and limitations emphasise the case for the use of high-level knowledge. This knowledge, in the form of the individual waveforms of a single instrument, is used in the second proposed approach. The method is based on a time-domain linear additive model and it presents an alternative to common frequency-domain approaches. Results are presented and discussed for all methods, showing that, if reliably generated, the use of knowledge can significantly improve the quality of the analysis.Joint Information Systems Committee (JISC) in the UK National Science Foundation (N.S.F.) in the United states. Fundacion Gran Mariscal Ayacucho in Venezuela

    Assessing Music Perception in Young Children: Evidence for and Psychometric Features of the M-Factor

    Get PDF
    Given the relationship between language acquisition and music processing, musical perception (MP) skills have been proposed as a tool for early diagnosis of speech and language difficulties; therefore, a psychometric instrument is needed to assess music perception in children under 10 years of age, a crucial period in neurodevelopment. We created a set of 80 musical stimuli encompassing seven domains of music perception to inform perception of tonal, atonal, and modal stimuli, in a random sample of 1006 children, 6–13 years of age, equally distributed from first to fifth grades, from 14 schools (38% private schools) in So Paulo State. The underlying model was tested using confirmatory factor analysis. A model encompassing seven orthogonal specific domains (contour, loudness, scale, timbre, duration, pitch, and meter) and one general music perception factor, the “m-factor,” showed excellent fit indices. The m-factor, previously hypothesized in the literature but never formally tested, explains 93% of the reliable variance in measurement, while only 3.9% of the reliable variance could be attributed to the multidimensionality caused by the specific domains. The 80 items showed no differential item functioning based on sex, age, or enrolment in public vs. private school, demonstrating the important psychometric feature of invariance. Like Charles Spearman's g-factor of intelligence, the m-factor is robust and reliable. It provides a convenient measure of auditory stimulus apprehension that does not rely on verbal information, offering a new opportunity to probe biological and psychological relationships with music perception phenomena and the etiologies of speech and language disorders

    Analysis, Visualization, and Transformation of Audio Signals Using Dictionary-based Methods

    Get PDF
    date-added: 2014-01-07 09:15:58 +0000 date-modified: 2014-01-07 09:15:58 +0000date-added: 2014-01-07 09:15:58 +0000 date-modified: 2014-01-07 09:15:58 +000
    • …
    corecore