    Cortical and subcortical speech-evoked responses in young and older adults: Effects of background noise, arousal states, and neural excitability

    This thesis investigated how the adult human brain processes speech signals across a wide age range, using electroencephalography (EEG) to record activity in the sensory auditory systems. It focused on two types of speech-evoked phase-locked responses: (i) cortical responses (theta-band phase-locked responses), which reflect processing of the low-frequency, slowly varying envelope of speech; and (ii) subcortical/peripheral responses (frequency-following responses, FFRs), which reflect encoding of speech periodicity and temporal fine structure. Across three studies, the aim was to elucidate how these neural activities are affected by internal factors (ageing, hearing loss, level of arousal, and neural excitability) and external factors (background noise) encountered in everyday life. Study 1 examined theta-band phase-locking and FFRs in young and older adults under quiet and noisy conditions, asking how ageing and hearing loss affect these activities and how the activities relate to speech-in-noise perception. The results showed that ageing and hearing loss affect speech-evoked phase-locked responses through different mechanisms, and that the effects of ageing on cortical and subcortical activities play different roles in speech-in-noise perception. Study 2 examined how the level of arousal, or consciousness, affects phase-locked responses in young and older adults. Both theta-band phase-locking and FFRs decreased as the level of arousal decreased. The neuro-regulatory role of sleep spindles on theta-band phase-locking further differed between young and older adults, indicating that the mechanisms regulating phase-locked responses across arousal states are age-dependent. Study 3 established a causal relationship between auditory cortical excitability and FFRs using combined transcranial direct current stimulation (tDCS) and EEG. FFRs were measured before and after tDCS was applied over the auditory cortices. The results showed that changing the neural excitability of the right auditory cortex can alter FFR magnitudes along the contralateral pathway, a finding with theoretical and clinical implications that causally links the function of the auditory cortex with the neural encoding of speech periodicity. Taken together, the findings of this thesis advance our understanding of how speech signals are processed via neural phase-locking in everyday life across the lifespan.
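    As an illustration of the kind of measure described above, the following is a minimal sketch of how theta-band phase locking is commonly quantified: inter-trial phase coherence (ITPC) computed from band-pass filtered, Hilbert-transformed EEG epochs. The sampling rate, band edges, filter order, and synthetic data are assumptions for illustration, not the thesis's actual pipeline.

    ```python
    # Minimal ITPC sketch: quantify theta-band (4-8 Hz) phase locking across
    # EEG trials. All parameters and the data are illustrative assumptions.
    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert

    fs = 500                                     # assumed sampling rate (Hz)
    rng = np.random.default_rng(0)
    epochs = rng.standard_normal((100, 2 * fs))  # 100 trials x 2 s, synthetic

    # Band-pass in the theta band, then take the instantaneous phase.
    b, a = butter(4, [4 / (fs / 2), 8 / (fs / 2)], btype="band")
    phase = np.angle(hilbert(filtfilt(b, a, epochs, axis=1), axis=1))

    # ITPC: length of the mean unit phase vector across trials, per sample.
    # 1 = perfect phase locking across trials, ~0 = random phase.
    itpc = np.abs(np.exp(1j * phase).mean(axis=0))
    print(itpc.max())
    ```

    For real speech-evoked data, higher theta-band ITPC indicates more consistent tracking of the speech envelope across trials.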

    A Parametric Sound Object Model for Sound Texture Synthesis

    This thesis deals with the analysis and synthesis of sound textures based on parametric sound objects. An overview is provided about the acoustic and perceptual principles of textural acoustic scenes, and technical challenges for analysis and synthesis are considered. Four essential processing steps for sound texture analysis are identified, and existing sound texture systems are reviewed, using the four-step model as a guideline. A theoretical framework for analysis and synthesis is proposed. A parametric sound object synthesis (PSOS) model is introduced, which is able to describe individual recorded sounds through a fixed set of parameters. The model, which applies to harmonic and noisy sounds, is an extension of spectral modeling and uses spline curves to approximate spectral envelopes, as well as the evolution of parameters over time. In contrast to standard spectral modeling techniques, this representation uses the concept of objects instead of concatenated frames, and it provides a direct mapping between sounds of different length. Methods for automatic and manual conversion are shown. An evaluation is presented in which the ability of the model to encode a wide range of different sounds has been examined. Although there are aspects of sounds that the model cannot accurately capture, such as polyphony and certain types of fast modulation, the results indicate that high quality synthesis can be achieved for many different acoustic phenomena, including instruments and animal vocalizations. In contrast to many other forms of sound encoding, the parametric model facilitates various techniques of machine learning and intelligent processing, including sound clustering and principal component analysis. Strengths and weaknesses of the proposed method are reviewed, and possibilities for future development are discussed.
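    To make the envelope-modelling idea concrete, here is a minimal sketch of approximating a single frame's spectral envelope with a smoothing spline, the general technique the PSOS model applies to spectral envelopes and parameter trajectories. The FFT size, smoothing factor, and synthetic frame are illustrative assumptions, not the model's actual parametrisation.

    ```python
    # Sketch: describe a frame's spectral envelope with a smooth spline
    # instead of storing every FFT bin. Signal and smoothing are assumptions.
    import numpy as np
    from scipy.interpolate import UnivariateSpline

    fs, n_fft = 44100, 2048
    t = np.arange(n_fft) / fs
    frame = np.sin(2 * np.pi * 440 * t) + 0.3 * np.random.randn(n_fft)

    spectrum = np.abs(np.fft.rfft(frame * np.hanning(n_fft)))
    freqs = np.fft.rfftfreq(n_fft, 1 / fs)
    log_mag = 20 * np.log10(spectrum + 1e-12)

    # The spline is a compact parametric stand-in for the bin-by-bin
    # envelope; its coefficients can be stored, interpolated, or compared.
    spline = UnivariateSpline(freqs, log_mag, s=40 * len(freqs))
    envelope = spline(freqs)
    print(np.mean((envelope - log_mag) ** 2))    # rough fit error in dB^2
    ```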

    Automatic annotation of musical audio for interactive applications

    PhD thesis. As machines become more and more portable and part of our everyday life, it becomes apparent that developing interactive and ubiquitous systems is an important aspect of new music applications created by the research community. We are interested in developing a robust layer for the automatic annotation of audio signals, to be used in various applications, from music search engines to interactive installations, and in various contexts, from embedded devices to audio content servers. We propose adaptations of existing signal processing techniques to a real-time context. Amongst these annotation techniques, we concentrate on low- and mid-level tasks such as onset detection, pitch tracking, tempo extraction and note modelling. We present a framework to extract these annotations and evaluate the performance of different algorithms. The first task is to detect onsets and offsets in audio streams within short latencies. The segmentation of audio streams into temporal objects enables various manipulations and analyses of metrical structure. The evaluation of different algorithms and their adaptation to real time are described. We then tackle the problem of fundamental frequency estimation, again trying to reduce both the delay and the computational cost. Different algorithms are implemented for real time and tested on monophonic recordings and complex signals. Spectral analysis can be used to label the temporal segments, and the estimation of higher-level descriptions is approached. Techniques for the modelling of note objects and the localisation of beats are implemented and discussed. Applications of our framework include live and interactive music installations and, more generally, tools for composers and sound engineers. Speed optimisations may bring a significant improvement to various automated tasks, such as automatic classification and recommendation systems. We describe the design of our software solution, for our research purposes and in view of its integration within other systems. Funding: EU-FP6-IST-507142 project SIMAC (Semantic Interaction with Music Audio Contents); EPSRC grants GR/R54620 and GR/S75802/01.
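    As one concrete example of the low-level tasks mentioned above, the following sketches a standard low-latency onset detector based on half-wave rectified spectral flux with simple peak picking. The window, hop, and threshold rule are illustrative choices, not the evaluated algorithms from the thesis.

    ```python
    # Sketch of spectral-flux onset detection: flag frames where spectral
    # magnitude increases sharply. Window, hop, threshold are assumptions.
    import numpy as np

    def spectral_flux_onsets(x, fs, win=1024, hop=512):
        window = np.hanning(win)
        mags = np.array([np.abs(np.fft.rfft(window * x[i:i + win]))
                         for i in range(0, len(x) - win, hop)])
        # Half-wave rectified frame-to-frame magnitude increase.
        flux = np.maximum(np.diff(mags, axis=0), 0.0).sum(axis=1)
        # Crude peak picking: frames far above the typical flux level.
        peaks = np.flatnonzero(flux > np.median(flux) + 0.5 * flux.max())
        return (peaks + 1) * hop / fs            # onset times in seconds

    fs = 44100
    t = np.arange(fs) / fs
    x = 0.5 * np.sin(2 * np.pi * 220 * t)
    x[fs // 2] += 1.0                            # a click half-way through
    print(spectral_flux_onsets(x, fs))           # ~0.49 s, near the click
    ```

    A real-time variant computes the same flux frame by frame and applies a causal threshold, trading a few frames of latency for robustness.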

    Modelling Professional Singers: A Bayesian Machine Learning Approach with Enhanced Real-time Pitch Contour Extraction and Onset Processing from an Extended Dataset.

    Singing signals are among the inputs that computer systems need to analyse, and singing is part of every culture in the world. Although audio signal processing has been studied for over three decades, it remains an active research area, because most of the algorithms available in the literature require improvement given the complexity of audio and music signals. More effort is needed to analyse sound and music in a real-time environment, since such algorithms may use only past data, whereas an offline system has all the required data available. The data become more complex still when the audio comes from singing, because the unique features of singing signals (such as the vocal system, vibration, pitch drift, and tuning approach) make them different from, and more complicated than, instrumental signals. This thesis focuses on analysing singing signals and on better understanding how trained professional singers produce the pitch frequency and duration of notes according to their position in a piece of music and the singing technique applied. It is shown that by incorporating singing features such as gender and BPM, a real-time pitch detection algorithm can be selected that estimates fundamental frequencies with fewer errors. In addition, two novel algorithms are proposed: one for smoothing pitch contours and another for estimating onsets, offsets, and the transitions between notes. Both algorithms compare favourably with several state-of-the-art alternatives. Moreover, a new vocal dataset that includes several annotations for 2688 singing files was published. Finally, the thesis presents two models for calculating the pitch and duration of notes according to their position in a piece of music. In conclusion, obtaining good results from pitch-oriented Music Information Retrieval (MIR) algorithms requires adapting or selecting them based on the unique characteristics of the signals; a universal algorithm that performs exceptionally well on all data types remains a formidable challenge given the current state of technology.
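    To illustrate the real-time constraint described above, here is a minimal causal pitch-contour smoother that uses only past frames, as a real-time system must. This running-median baseline is for illustration only; it is not the smoothing algorithm proposed in the thesis.

    ```python
    # Causal smoother sketch: each output frame uses only past f0 estimates.
    # Window length and the unvoiced convention (f0 = 0) are assumptions.
    import numpy as np

    def smooth_pitch_causal(f0, window=5, voiced_floor=20.0):
        """Running median over past voiced frames; 0 marks unvoiced frames."""
        out = np.zeros(len(f0))
        for i in range(len(f0)):
            past = f0[max(0, i - window + 1):i + 1]
            voiced = past[past > voiced_floor]   # skip unvoiced frames
            out[i] = np.median(voiced) if voiced.size else 0.0
        return out

    track = np.array([220.0, 221, 0, 440, 222, 223, 0, 224])  # 440 = glitch
    print(smooth_pitch_causal(track))            # the octave error is damped
    ```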

    Automatic characterization and generation of music loops and instrument samples for electronic music production

    Repurposing audio material to create new music - also known as sampling - was a foundation of electronic music and is a fundamental component of this practice. Currently, large-scale databases of audio offer vast collections of audio material for users to work with. The navigation on these databases is heavily focused on hierarchical tree directories. Consequently, sound retrieval is tiresome and often identified as an undesired interruption in the creative process. We address two fundamental methods for navigating sounds: characterization and generation. Characterizing loops and one-shots in terms of instruments or instrumentation allows for organizing unstructured collections and a faster retrieval for music-making. The generation of loops and one-shot sounds enables the creation of new sounds not present in an audio collection through interpolation or modification of the existing material. To achieve this, we employ deep-learning-based data-driven methodologies for classification and generation.
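    As a sketch of the classification side, the following shows one plausible shape for such a deep-learning model: a small convolutional network over mel-spectrogram patches that predicts an instrument class for a loop or one-shot. The layer sizes, input shape, and ten-class setup are assumptions, not the architecture used in the thesis.

    ```python
    # Illustrative CNN for instrument classification of loops/one-shots from
    # mel-spectrogram patches. All sizes and the class count are assumptions.
    import torch
    import torch.nn as nn

    class LoopClassifier(nn.Module):
        def __init__(self, n_classes=10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.AdaptiveAvgPool2d(1),         # one value per channel
            )
            self.head = nn.Linear(32, n_classes)

        def forward(self, x):                    # x: (batch, 1, mels, frames)
            return self.head(self.features(x).flatten(1))

    model = LoopClassifier()
    logits = model(torch.randn(4, 1, 64, 128))   # four fake spectrogram patches
    print(logits.shape)                          # torch.Size([4, 10])
    ```

    The generation side would typically pair an encoder of this kind with a decoder, so that interpolating in the learned latent space yields new sounds between existing ones.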

    A computational framework for sound segregation in music signals

    Doctoral thesis. Electrical and Computer Engineering. Faculdade de Engenharia, Universidade do Porto. 200

    Tangible interface for composing music with limited degrees of freedom

    Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2012. Cataloged from the PDF version of the thesis. Includes bibliographical references (p. 83-88). This thesis presents SoundStrand, a novel tangible interface for composing music, together with a new paradigm: one that allows music composition with limited degrees of freedom and is therefore well suited to music creation through tangible interfaces. SoundStrand comprises a set of building blocks that represent pre-composed musical segments. By sequentially connecting building blocks to one another, the user arranges these segments into a musical theme; by individually twisting, stretching and bending the blocks, the user introduces variations of the melodic, harmonic and rhythmic content. Software tools are made available to program the musical segments and govern SoundStrand's behavior. Additional work, namely the Coda system, is presented to place SoundStrand and the described paradigm in a wider context as tools for music sharing and learning. By Eyal Shahar. S.M.
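    To make the paradigm concrete, here is a toy data model for the described interaction: each block carries a pre-composed segment, and its three physical gestures map to variation parameters. The field names and the gesture-to-parameter mapping are illustrative assumptions, not SoundStrand's actual implementation.

    ```python
    # Toy model of the SoundStrand paradigm: a chain of blocks, each holding
    # a pre-composed segment plus three low-DOF variation parameters. Names
    # and the gesture mapping below are illustrative assumptions.
    from dataclasses import dataclass

    @dataclass
    class Block:
        segment_id: str
        twist: float = 0.0    # e.g. mapped to melodic variation
        stretch: float = 0.0  # e.g. mapped to rhythmic variation
        bend: float = 0.0     # e.g. mapped to harmonic variation

    def render_theme(chain):
        """Arrange connected blocks, in order, into a theme description."""
        return [(b.segment_id, b.twist, b.stretch, b.bend) for b in chain]

    theme = [Block("intro"), Block("motif", twist=0.4), Block("cadence", bend=0.7)]
    print(render_theme(theme))
    ```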