1,252 research outputs found

    Scalable and perceptual audio compression

    This thesis deals with scalable perceptual audio compression. Two scalable perceptual solutions as well as a scalable-to-lossless solution are proposed and investigated. One of the scalable perceptual solutions is built around sinusoidal modelling of the audio signal, whilst the other is built on a transform coding paradigm. The scalable coders are shown to scale in both a waveform-matching manner and a psychoacoustic manner. To measure the psychoacoustic scalability of the systems investigated in this thesis, the similarity between the original signal's psychoacoustic parameters and those of the synthesized signal is assessed. The psychoacoustic parameters used are loudness, sharpness, tonality and roughness. This analysis technique is a novel method introduced in this thesis, and it gives insight into the perceptual distortion introduced by any coder analyzed in this manner.
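A psychoacoustic-similarity comparison of this kind can be sketched as follows. This is a minimal numpy illustration using crude loudness and sharpness proxies (a Stevens-style power law and the spectral centroid); the thesis itself uses full psychoacoustic models, so the exponent and centroid measure here are assumptions, not the thesis's metrics.

```python
import numpy as np

def psychoacoustic_proxies(x, sr):
    """Crude single-number proxies for loudness and sharpness.
    Illustrative stand-ins for full Zwicker-style models."""
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1.0 / sr)
    energy = np.sum(spec ** 2)
    loudness = energy ** 0.3                        # Stevens-style power law
    centroid = np.sum(freqs * spec) / np.sum(spec)  # spectral centroid as sharpness proxy
    return {"loudness": loudness, "sharpness": centroid}

def perceptual_similarity(original, coded, sr):
    """Relative agreement of the proxies between original and coded signal
    (1.0 means the parameter values are identical)."""
    p = psychoacoustic_proxies(original, sr)
    q = psychoacoustic_proxies(coded, sr)
    return {k: 1.0 - abs(p[k] - q[k]) / max(p[k], q[k]) for k in p}
```

An identical synthesized signal scores 1.0 on every parameter, while a coder that attenuates the signal lowers the loudness agreement.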

    On the Use of Perceptual Properties for Melody Estimation

    This paper is about the use of perceptual principles for melody estimation. The melody stream is understood as being generated by the most dominant source. Since the source with the strongest energy may not be the perceptually most dominant one, three perceptual properties are studied for melody estimation: loudness, masking effect and timbre similarity. The related criteria are integrated into a melody estimation system and their respective contributions are evaluated. The effectiveness of these perceptual criteria is confirmed by evaluation results on more than one hundred excerpts of music recordings.
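The paper's premise that raw energy and perceptual dominance can disagree is easy to illustrate. The sketch below uses the standard IEC A-weighting curve as a stand-in for a loudness criterion; the actual system's loudness, masking and timbre-similarity criteria are more elaborate, so this is an assumption for illustration only.

```python
import numpy as np

def a_weight_db(f):
    """Standard IEC A-weighting curve in dB (approximately 0 dB at 1 kHz)."""
    f2 = np.asarray(f, dtype=float) ** 2
    ra = (12194.0 ** 2 * f2 ** 2) / (
        (f2 + 20.6 ** 2)
        * np.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * np.log10(ra) + 2.0

def dominant_source(candidates):
    """candidates: list of (frequency_hz, energy) pairs.  Returns the index
    of the perceptually loudest candidate after A-weighting, which need not
    be the candidate with the strongest raw energy."""
    scores = [10.0 * np.log10(e) + float(a_weight_db(f)) for f, e in candidates]
    return int(np.argmax(scores))
```

A strong 50 Hz component loses to a weaker 1 kHz component once the weighting is applied, which is exactly the situation motivating perceptual criteria for melody estimation.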

    MEG, PSYCHOPHYSICAL AND COMPUTATIONAL STUDIES OF LOUDNESS, TIMBRE, AND AUDIOVISUAL INTEGRATION

    Natural scenes and ecological signals are inherently complex, and our understanding of their perception and processing is incomplete. For example, a speech signal not only contains information at various frequencies but is also concurrently modulated in time. In addition, an auditory signal may be paired with additional sensory information, as in the case of audiovisual speech. To make sense of the signal, a human observer must process the information provided by low-level sensory systems and integrate it across sensory modalities and with cognitive information (e.g., object identification information, phonetic information). The observer must then create functional relationships between the signals encountered to form a coherent percept. The neuronal and cognitive mechanisms underlying this integration can be quantified in several ways: by taking physiological measurements, assessing behavioral output for a given task, and modeling signal relationships. While ecological tokens are complex in a way that exceeds our current understanding, progress can be made by utilizing synthetic signals that capture specific essential features of ecological signals. The experiments presented here cover five aspects of complex signal processing using approximations of ecological signals: (i) auditory integration of complex tones comprised of different frequencies and component power levels; (ii) audiovisual integration approximating that of human speech; (iii) behavioral measurement of signal discrimination; (iv) signal classification via simple computational analyses; and (v) neuronal processing of synthesized auditory signals approximating speech tokens. To investigate neuronal processing, magnetoencephalography (MEG) is employed to assess cortical processing non-invasively. Behavioral measures are employed to evaluate observer acuity in signal discrimination and to test the limits of perceptual resolution. Computational methods are used to examine the relationships in perceptual space and physiological processing between synthetic auditory signals, using features of the signals themselves as well as biologically motivated models of auditory representation. Together, the various methodologies and experimental paradigms advance the understanding of ecological signal analytics concerning the complex interactions in ecological signal structure.
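Stimuli of the first kind, complex tones with controlled component frequencies and power levels, can be synthesized in a few lines. This is an illustrative sketch; the duration, sampling rate and peak normalization are arbitrary choices, not the parameters of the experiments described above.

```python
import numpy as np

def complex_tone(freqs_hz, levels_db, dur=0.5, sr=16000):
    """Synthesize a complex tone: one sinusoid per (frequency, level) pair,
    with levels in dB relative to the strongest component.
    The output is peak-normalized to +/-1."""
    t = np.arange(int(dur * sr)) / sr
    amps = 10.0 ** (np.asarray(levels_db, dtype=float) / 20.0)
    tone = sum(a * np.sin(2.0 * np.pi * f * t) for f, a in zip(freqs_hz, amps))
    return tone / np.max(np.abs(tone))
```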

    Computer Models for Musical Instrument Identification

    A particular aspect of sound perception concerns what is commonly termed texture or timbre. From a perceptual perspective, timbre is what allows us to distinguish sounds that have similar pitch and loudness. Indeed, most people are able to discern a piano tone from a violin tone or to distinguish different voices or singers. This thesis deals with timbre modelling. Specifically, the formant theory of timbre is the main theme throughout. This theory states that acoustic musical instrument sounds can be characterised by their formant structures. Following this principle, the central point of our approach is to propose a computer implementation for building musical instrument identification and classification systems. Although the main thrust of this thesis is to propose a coherent and unified approach to the musical instrument identification problem, it is oriented towards the development of algorithms that can be used in Music Information Retrieval (MIR) frameworks. Drawing on research in speech processing, a complete supervised system taking into account both physical and perceptual aspects of timbre is described. The approach is composed of three distinct processing layers. First, parametric models that represent signals through mid-level physical and perceptual representations are considered. Next, the use of Line Spectrum Frequencies as spectral envelope and formant descriptors is emphasised. Finally, the use of generative and discriminative techniques for building instrument and database models is investigated. Our system is evaluated under realistic recording conditions using databases of isolated notes and melodic phrases.
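The Line Spectrum Frequency descriptors mentioned above can be sketched in a few lines of numpy. This is a minimal illustration of the standard LPC-to-LSF conversion (autocorrelation-method LPC via Levinson-Durbin, then the angles of the roots of the symmetric and antisymmetric polynomials P and Q); the model order and test signal are arbitrary, and the thesis's actual feature pipeline is not reproduced here.

```python
import numpy as np

def lpc(x, order):
    """LPC coefficients via the autocorrelation method (Levinson-Durbin)."""
    n = len(x)
    r = [float(np.dot(x[:n - i], x[i:])) for i in range(order + 1)]
    a = [1.0]
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                       # reflection coefficient
        a = [1.0] + [a[j] + k * a[i - j] for j in range(1, i)] + [k]
        err *= (1.0 - k * k)
    return np.array(a)

def lsf(a):
    """Line Spectral Frequencies of the LPC polynomial A(z) (a[0] == 1).
    P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z) have
    roots on the unit circle; their sorted angles in (0, pi) are the LSFs."""
    P = np.concatenate([a, [0.0]]) + np.concatenate([[0.0], a[::-1]])
    Q = np.concatenate([a, [0.0]]) - np.concatenate([[0.0], a[::-1]])
    angles = []
    for poly in (P, Q):
        for root in np.roots(poly):
            ang = np.angle(root)
            if 1e-7 < ang < np.pi - 1e-7:    # drop the fixed roots at z = +/-1
                angles.append(ang)
    return np.sort(np.array(angles))
```

For an even model order p this yields exactly p frequencies, and the interlacing of the P and Q roots makes the sorted sequence strictly increasing, which is one reason LSFs are attractive as quantizable envelope descriptors.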

    Sound mosaics: a graphical user interface for sound synthesis based on audio-visual associations.

    This thesis presents the design of a Graphical User Interface (GUI) for computer-based sound synthesis to support users in the externalisation of their musical ideas when interacting with the system in order to create and manipulate sound. The approach taken consisted of three research stages. The first stage was the formulation of a novel visualisation framework to display perceptual dimensions of sound in visual terms. This framework was based on the findings of existing related studies and a series of empirical investigations of the associations between auditory and visual percepts that we performed for the first time in the area of computer-based sound synthesis. The results of our empirical investigations suggested associations between the colour dimensions of brightness and saturation and the auditory dimensions of pitch and loudness respectively, as well as associations between the multidimensional percepts of visual texture and timbre. The second stage of the research involved the design and implementation of Sound Mosaics, a prototype GUI for sound synthesis based on direct manipulation of visual representations that make use of the visualisation framework developed in the first stage. We followed an iterative design approach that involved the design and evaluation of an initial Sound Mosaics prototype. The insights gained during this first iteration assisted us in revising various aspects of the original design and visualisation framework, leading to a revised implementation of Sound Mosaics. The final stage of this research involved an evaluation study of the revised Sound Mosaics prototype comprising two controlled experiments. First, a comparison experiment with the widely used frequency-domain representations of sound indicated that visual representations created with Sound Mosaics were more comprehensible and intuitive. Comprehensibility was measured as the level of accuracy in a series of sound-image association tasks, while intuitiveness was related to subjects' response times and perceived levels of confidence. Second, we conducted a formative evaluation of Sound Mosaics, in which it was exposed to a number of users with and without musical backgrounds. Three usability factors were measured: effectiveness, efficiency, and subjective satisfaction. Sound Mosaics was demonstrated to perform satisfactorily on all three factors for music subjects, although non-music subjects yielded less satisfactory results, which can primarily be attributed to their unfamiliarity with the task of sound synthesis. Overall, our research has set the necessary groundwork for empirically derived and validated associations between auditory and visual dimensions that can be used in the design of cognitively useful GUIs for computer-based sound synthesis and related areas.
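The reported associations (pitch with brightness, loudness with saturation) can be sketched as a simple colour mapping. This is illustrative only: the log-frequency scaling, the fixed hue and the assumption of loudness already normalized to [0, 1] are choices made here, not the mapping actually used in Sound Mosaics.

```python
import colorsys
import math

def sound_to_color(pitch_hz, loudness, hue=0.6, f_lo=20.0, f_hi=20000.0):
    """Map pitch to brightness (HSV value) on a log-frequency scale and
    loudness to saturation; hue is held fixed.  Returns (r, g, b) in [0, 1]."""
    v = math.log(pitch_hz / f_lo) / math.log(f_hi / f_lo)
    v = min(max(v, 0.0), 1.0)          # clamp brightness
    s = min(max(loudness, 0.0), 1.0)   # clamp saturation
    return colorsys.hsv_to_rgb(hue, s, v)
```

Under this mapping a silent sound is rendered grey (zero saturation) and higher pitches yield brighter colours, mirroring the associations the empirical studies suggested.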

    A Parametric Sound Object Model for Sound Texture Synthesis

    This thesis deals with the analysis and synthesis of sound textures based on parametric sound objects. An overview is provided about the acoustic and perceptual principles of textural acoustic scenes, and technical challenges for analysis and synthesis are considered. Four essential processing steps for sound texture analysis are identified, and existing sound texture systems are reviewed, using the four-step model as a guideline. A theoretical framework for analysis and synthesis is proposed. A parametric sound object synthesis (PSOS) model is introduced, which is able to describe individual recorded sounds through a fixed set of parameters. The model, which applies to harmonic and noisy sounds, is an extension of spectral modeling and uses spline curves to approximate spectral envelopes, as well as the evolution of parameters over time. In contrast to standard spectral modeling techniques, this representation uses the concept of objects instead of concatenated frames, and it provides a direct mapping between sounds of different length. Methods for automatic and manual conversion are shown. An evaluation is presented in which the ability of the model to encode a wide range of different sounds has been examined. Although there are aspects of sounds that the model cannot accurately capture, such as polyphony and certain types of fast modulation, the results indicate that high quality synthesis can be achieved for many different acoustic phenomena, including instruments and animal vocalizations. In contrast to many other forms of sound encoding, the parametric model facilitates various techniques of machine learning and intelligent processing, including sound clustering and principal component analysis. Strengths and weaknesses of the proposed method are reviewed, and possibilities for future development are discussed.
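The spline-envelope idea can be illustrated with a small numpy sketch: a natural cubic spline through a handful of control points on the log-magnitude spectrum. The knot placement and count here are arbitrary assumptions, and the PSOS model itself is considerably richer than this stand-in.

```python
import numpy as np

def cubic_spline(xk, yk, x):
    """Evaluate a natural cubic spline through knots (xk, yk) at points x."""
    n = len(xk)
    h = np.diff(xk)
    A = np.zeros((n, n))
    rhs = np.zeros(n)
    A[0, 0] = A[-1, -1] = 1.0   # natural boundary: zero curvature at the ends
    for i in range(1, n - 1):
        A[i, i - 1], A[i, i], A[i, i + 1] = h[i - 1], 2.0 * (h[i - 1] + h[i]), h[i]
        rhs[i] = 3.0 * ((yk[i + 1] - yk[i]) / h[i] - (yk[i] - yk[i - 1]) / h[i - 1])
    c = np.linalg.solve(A, rhs)            # second-derivative coefficients
    i = np.clip(np.searchsorted(xk, x) - 1, 0, n - 2)
    dx = x - xk[i]
    b = (yk[i + 1] - yk[i]) / h[i] - h[i] * (2.0 * c[i] + c[i + 1]) / 3.0
    d = (c[i + 1] - c[i]) / (3.0 * h[i])
    return yk[i] + b * dx + c[i] * dx ** 2 + d * dx ** 3

def spectral_envelope(x, n_knots=12):
    """Approximate the log-magnitude spectrum of x with a cubic spline
    through n_knots evenly spaced control points."""
    logm = np.log(np.abs(np.fft.rfft(x)) + 1e-12)
    bins = np.arange(len(logm), dtype=float)
    idx = np.linspace(0, len(logm) - 1, n_knots).astype(int)
    return cubic_spline(bins[idx], logm[idx], bins)
```

Because the envelope is reduced to a fixed, small set of knot values, sounds of different lengths map onto parameter vectors of the same size, which is the property that makes such representations convenient for clustering and principal component analysis.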

    The Bird's Ear View: Audification for the Spectral Analysis of Heliospheric Time Series Data.

    The sciences are inundated with a tremendous volume of data, and the analysis of rapidly expanding data archives presents a persistent challenge. Previous research in the field of data sonification suggests that auditory display may serve a valuable function in the analysis of complex data sets. This dissertation uses the heliospheric sciences as a case study to empirically evaluate the use of audification (a specific form of sonification) for the spectral analysis of large time series. Three primary research questions guide this investigation, the first of which addresses the comparative capabilities of auditory and visual analysis methods in applied analysis tasks. A number of controlled within-subject studies revealed a strong correlation between auditory and visual observations, and demonstrated that auditory analysis provided a heightened sensitivity and accuracy in the detection of spectral features. The second research question addresses the capability of audification methods to reveal features that may be overlooked through visual analysis of spectrograms. A number of open-ended analysis tasks quantitatively demonstrated that participants using audification regularly discovered a greater percentage of embedded phenomena such as low-frequency wave storms. In addition, four case studies document collaborative research initiatives in which audification contributed to the acquisition of new domain-specific knowledge. The final question explores the potential benefits of audification when introduced into the workflow of a research scientist. A case study is presented in which a heliophysicist incorporated audification into their working practice, and the “Think-Aloud” protocol is applied to gain a sense for how audification augmented the researcher’s analytical abilities. Auditory observations are demonstrated to make significant contributions to ongoing research, including the detection of previously unidentified equipment-induced artifacts. 
This dissertation provides three primary contributions to the field: 1) an increased understanding of the comparative capabilities of auditory and visual analysis methods, 2) a methodological framework for conducting audification that may be transferred across scientific domains, and 3) a set of well-documented cases in which audification was applied to extract new knowledge from existing data archives. Collectively, this work presents a "bird's ear view" afforded by audification methods: a macro understanding of time series data that preserves micro-level detail.
    PhD, Design Science, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/111561/1/rlalexan_1.pd
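The core audification operation, playing raw time-series samples back directly at an audio rate, is straightforward to sketch with the standard library. The sampling rate and 16-bit peak normalization below are illustrative choices, not the dissertation's exact workflow.

```python
import wave

import numpy as np

def audify(series, out_path, sr=44100):
    """Audify a time series: center and normalize the samples to the 16-bit
    range, then write them as mono PCM audio played back at rate sr,
    compressing e.g. days of measurements into seconds of sound.
    Returns the playback duration in seconds."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                       # remove DC offset
    peak = np.max(np.abs(x)) or 1.0        # avoid division by zero on silence
    pcm = np.round(x / peak * 32767).astype(np.int16)
    with wave.open(out_path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)                  # 16-bit samples
        w.setframerate(sr)
        w.writeframes(pcm.tobytes())
    return len(pcm) / sr
```

A year of one-minute-cadence magnetometer data (about 526,000 samples) would audify to roughly twelve seconds of sound at this rate, which is what allows an analyst to scan large archives quickly by ear.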

    An approach toward function allocation between humans and machines in space station activities

    Basic guidelines and data are provided to assist in the allocation of functions between humans and automated systems in a manned permanent space station. Human capabilities and limitations are described, and criteria and guidelines for various levels of automation and human participation are presented. A collection of human factors data is included.

    Psychophysiological Evidence of an Autocorrelation Mechanism in the Human Auditory System

    This article details a model for evaluating sound quality in the human auditory system. The model includes an autocorrelation function (ACF) mechanism; accordingly, we conducted physiological and psychological experiments to search for evidence of such a mechanism in the human auditory system. To evaluate physiological responses related to the peak amplitude of the ACF of an auditory signal, which represents the degree of temporal regularity of the sound, we used magnetoencephalography (MEG) to record auditory evoked fields (AEFs). To evaluate psychological responses related to the envelope of the ACF of an auditory signal, which is a measure of the signal's repetitive features, we examined perceptions of loudness and annoyance. The results of the MEG experiments showed that the amplitude of the N1m response, found above the left and right temporal lobes around 100 ms after stimulus onset, was a function of the peak amplitude of the ACF and its delay time, or of the degree of envelope decay of the ACF. The results of the psychological experiments indicated that loudness and annoyance increased for sounds with ACF envelope decay in a certain range. These results suggest that an autocorrelation mechanism exists in the human auditory system.
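The ACF quantities discussed (the amplitude of the first major peak, its delay time, and the decay of the ACF envelope) can be sketched as follows. This is a minimal numpy illustration; the peak-picking rule and the straight-line decay fit are simplified assumptions, not the article's analysis procedure.

```python
import numpy as np

def acf_features(x, sr, max_lag_s=0.02):
    """Normalized autocorrelation features of a signal: the first major
    peak (amplitude phi_1 at delay tau_1) and a crude envelope-decay slope
    fitted in the log domain over the detected ACF peaks."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    acf = np.correlate(x, x, mode="full")[n - 1:]
    acf = acf / acf[0]                            # phi(0) = 1
    seg = acf[:int(max_lag_s * sr)]
    peaks = [i for i in range(1, len(seg) - 1)
             if seg[i - 1] < seg[i] >= seg[i + 1]]
    if not peaks:
        return {"tau1_s": 0.0, "phi1": 0.0, "decay": 0.0}
    i1 = max(peaks, key=lambda i: seg[i])         # highest peak after zero lag
    lags = np.array(peaks) / sr
    decay = np.polyfit(lags, np.log(np.maximum(seg[peaks], 1e-9)), 1)[0]
    return {"tau1_s": i1 / sr, "phi1": float(seg[i1]), "decay": float(decay)}
```

For a periodic sound, tau_1 falls at the period and phi_1 stays near 1 (high temporal regularity), while for white noise phi_1 is small; it is variation along these dimensions that the MEG and loudness/annoyance experiments probed.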