155 research outputs found

    Modeling the auditory organization of speech: a summary and some comments

    The preceding three chapters have been concerned with the issues arising as a result of the inconvenient fact that our ears are rarely presented with the sound of a single speaker in isolation, but more often with a combination of several speech and nonspeech sounds which may also have been further altered by the acoustic environment. Faced with such a mixture, the listener evidently needs to consider each source separately, and this process of information segregation is known as auditory organization or auditory scene analysis (Bregman, 1990). Pure curiosity as well as the possibility of applications in automatic signal interpretation drive us to investigate auditory scene analysis through psychological experiments and computational modeling. Having sketched this framework and the current limits to our understanding of the process of auditory organization, we can now examine the material of each of the three chapters in more detail, seeing how it fits into this framework and also where the framework may be inadequate. Following these discussions, we will conclude with some remarks suggested by the particular combination of results in this section

    Computational Models of Auditory Scene Analysis: A Review

    Auditory scene analysis (ASA) refers to the process(es) of parsing the complex acoustic input into auditory perceptual objects representing either physical sources or temporal sound patterns, such as melodies, which contributed to the sound waves reaching the ears. A number of new computational models accounting for some of the perceptual phenomena of ASA have been published recently. Here we provide a theoretically motivated review of these computational models, aiming to relate their guiding principles to the central issues of the theoretical framework of ASA. Specifically, we ask how they achieve the grouping and separation of sound elements and whether they implement some form of competition between alternative interpretations of the sound input. We consider the extent to which they include predictive processes, as important current theories suggest that perception is inherently predictive, and also how they have been evaluated. We conclude that current computational models of ASA are fragmentary in the sense that rather than providing general competing interpretations of ASA, they focus on assessing the utility of specific processes (or algorithms) for finding the causes of the complex acoustic signal. This leaves open the possibility for integrating complementary aspects of the models into a more comprehensive theory of ASA

    Auditory Perceptual Organisation

    Traveling pressure waves (i.e., sounds) are produced by the movements or actions of objects, so sounds primarily convey information about what is happening in the environment. In addition, some information about the structure of the environment and the surface features of objects can be extracted by determining how the original (self-generated or exogenous) sounds are filtered or distorted by the environment (e.g., the notion of "acoustic daylight"; Fay 2009). In this article we consider how the auditory system processes sound signals to extract information about the environment and the objects within it

    Neural Basis and Computational Strategies for Auditory Processing

    Our senses are our window to the world, and hearing is the window through which we perceive the world of sound. While seemingly effortless, the process of hearing involves complex transformations by which the auditory system consolidates acoustic information from the environment into perceptual and cognitive experiences. Studies of auditory processing try to elucidate the mechanisms underlying the function of the auditory system, and infer computational strategies that are valuable both clinically and intellectually, hence contributing to our understanding of the function of the brain. In this thesis, we adopt both experimental and computational approaches to tackle various aspects of auditory processing. We first investigate the neural basis underlying the function of the auditory cortex, and explore the dynamics and computational mechanisms of cortical processing. Our findings offer physiological evidence for a role of primary cortical neurons in the integration of sound features at different time constants, and possibly in the formation of auditory objects. Based on physiological principles of sound processing, we explore computational implementations to tackle specific perceptual questions. We exploit our knowledge of the neural mechanisms of cortical auditory processing to formulate models addressing the problems of speech intelligibility and auditory scene analysis. The intelligibility model focuses on a computational approach for evaluating loss of intelligibility, inspired by mammalian physiology and human perception. It is based on a multi-resolution filter-bank implementation of cortical response patterns, which extends into a robust metric for assessing loss of intelligibility in communication channels and speech recordings. This same cortical representation is extended further to develop a computational scheme for auditory scene analysis. The model maps perceptual principles of auditory grouping and stream formation into a computational system that combines aspects of bottom-up, primitive sound processing with an internal representation of the world. It is based on a framework of unsupervised adaptive learning with Kalman estimation. The model is extremely valuable in exploring various aspects of sound organization in the brain, allowing us to gain interesting insight into the neural basis of auditory scene analysis, as well as to develop practical implementations for sound separation in "cocktail-party" situations
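
    The intelligibility metric and the scene-analysis scheme themselves are not reproduced here, but the following minimal Python sketch illustrates the general idea of a multi-resolution modulation ("cortical") representation: an auditory-style spectrogram is filtered at several temporal rates and spectral scales, and the modulation-energy patterns of a clean and a degraded signal are compared. The filter shapes, rates, scales, STFT front end, and the similarity score are illustrative assumptions, not the thesis's implementation.

    import numpy as np
    from scipy.signal import stft, fftconvolve

    def log_spectrogram(x, fs, nperseg=512):
        _, t, Z = stft(x, fs=fs, nperseg=nperseg)
        return np.log1p(np.abs(Z)), t

    def modulation_filter(rate_hz, scale_cpb, t_step, size=(32, 32)):
        """2-D Gabor-like kernel tuned to a temporal rate (Hz) and a spectral scale (cycles/bin)."""
        nt, nf = size
        t = (np.arange(nt) - nt // 2) * t_step        # seconds
        k = np.arange(nf) - nf // 2                   # frequency-bin offsets
        T, K = np.meshgrid(t, k, indexing="ij")
        envelope = np.exp(-(T / t.max()) ** 2 - (K / k.max()) ** 2)
        carrier = np.cos(2 * np.pi * (rate_hz * T + scale_cpb * K))
        return envelope * carrier

    def modulation_energies(x, fs, rates=(2, 4, 8, 16), scales=(0.02, 0.05, 0.1)):
        """Energy of the log-spectrogram after filtering at each (rate, scale) pair."""
        S, t = log_spectrogram(x, fs)
        t_step = t[1] - t[0]
        E = np.zeros((len(rates), len(scales)))
        for i, r in enumerate(rates):
            for j, s in enumerate(scales):
                h = modulation_filter(r, s, t_step)
                E[i, j] = np.mean(fftconvolve(S.T, h, mode="same") ** 2)   # time x freq
        return E

    if __name__ == "__main__":
        fs = 16000
        rng = np.random.default_rng(0)
        clean = rng.standard_normal(fs)               # placeholder for a clean speech signal
        degraded = clean + 0.5 * rng.standard_normal(fs)
        E_ref, E_deg = modulation_energies(clean, fs), modulation_energies(degraded, fs)
        # A crude intelligibility-style score: similarity of the two modulation-energy patterns.
        print("modulation-pattern similarity:",
              round(float(np.corrcoef(E_ref.ravel(), E_deg.ravel())[0, 1]), 3))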

    Models and analysis of vocal emissions for biomedical applications

    This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003 in Firenze, Italy. The workshop is organised every two years and aims to stimulate contact between specialists active in research and industrial development in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies

    Pitch Estimation for Noisy Speech

    In this dissertation a biologically plausible system for pitch estimation is proposed. The system is designed from the bottom up to be robust to challenging noise conditions. This robustness to noise in the signal is achieved by developing a new representation of the speech signal, based on the operation of damped harmonic oscillators and temporal mode analysis of their output. The resulting representation is shown to possess qualities that are not degraded in the presence of noise. A harmonic-grouping-based system is used to estimate the pitch frequency. A detailed statistical analysis is performed on the system, and its performance is compared with some of the most established and recent pitch estimation and tracking systems. The detailed analysis includes results of experiments with a variety of noises over a large range of signal-to-noise ratios, under different signal conditions. Situations where the interfering "noise" is speech from another speaker are also considered. The proposed system is able to estimate the pitch of both the main speaker and the interfering speaker, thus emulating the phenomena of auditory streaming and the "cocktail party effect" in terms of pitch perception. The results of the extensive statistical analysis show that the proposed system exhibits some very interesting properties in its ability to handle noise. The results also show that the proposed system’s overall performance is much better than that of any of the other systems tested, especially in the presence of very large amounts of noise. The system is also shown to successfully simulate some very interesting psychoacoustical pitch perception phenomena. Through a detailed and comparative computational requirements analysis, it is also demonstrated that the proposed system is comparatively inexpensive in terms of processing and memory requirements
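
    The dissertation's system is not reproduced here, but the recipe it describes lends itself to a minimal sketch: a bank of damped harmonic oscillators (implemented below as two-pole resonators) filters the signal, and a harmonic-grouping score over the channel energies selects the pitch. The centre-frequency spacing, damping, and grouping rule are illustrative assumptions, not the dissertation's actual design.

    import numpy as np
    from scipy.signal import lfilter

    def resonator_bank_energies(x, fs, cfs, damping=0.995):
        """Mean output energy of one damped two-pole resonator per centre frequency."""
        energies = []
        for cf in cfs:
            w = 2 * np.pi * cf / fs
            a = [1.0, -2.0 * damping * np.cos(w), damping ** 2]   # poles at damping * exp(+/- jw)
            y = lfilter([1.0 - damping], a, x)
            energies.append(np.mean(y ** 2))
        return np.asarray(energies)

    def estimate_pitch(x, fs, f0_range=(80.0, 400.0), n_harmonics=8):
        """Pick the candidate f0 whose harmonic series collects the most channel energy."""
        cfs = np.arange(50.0, 4000.0, 25.0)                       # resonator centre frequencies (Hz)
        energy = resonator_bank_energies(x, fs, cfs)
        candidates = np.arange(f0_range[0], f0_range[1], 1.0)
        scores = [np.interp(f0 * np.arange(1, n_harmonics + 1), cfs, energy).sum()
                  for f0 in candidates]
        return candidates[int(np.argmax(scores))]

    if __name__ == "__main__":
        fs = 16000
        t = np.arange(fs) / fs
        clean = sum(np.sin(2 * np.pi * 150 * h * t) / h for h in range(1, 6))   # 150 Hz harmonic tone
        noisy = clean + np.random.default_rng(0).standard_normal(t.size)        # add broadband noise
        print("estimated pitch (Hz):", estimate_pitch(noisy, fs))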

    Analysis and resynthesis of polyphonic music

    This thesis examines applications of Digital Signal Processing to the analysis, transformation, and resynthesis of musical audio. First I give an overview of the human perception of music. I then examine in detail the requirements for a system that can analyse, transcribe, process, and resynthesise monaural polyphonic music. I then describe and compare the possible hardware and software platforms. After this I describe a prototype hybrid system that attempts to carry out these tasks using a method based on additive synthesis. Next I present results from its application to a variety of musical examples, and critically assess its performance and limitations. I then address these issues in the design of a second system based on Gabor wavelets. I conclude by summarising the research and outlining suggestions for future developments
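
    As a much-reduced illustration of the additive-synthesis idea (not the thesis's system), the sketch below picks the strongest spectral peaks of an STFT analysis and resynthesises the signal as a sum of sinusoids with frame-interpolated amplitude envelopes. The peak-selection rule, window length, and test signal are illustrative choices; a real transcription system would track partials frame by frame.

    import numpy as np
    from scipy.signal import stft

    def additive_resynthesis(x, fs, n_partials=6, nperseg=2048):
        """Rebuild x as a sum of fixed-frequency sinusoids with frame-interpolated amplitudes."""
        freqs, times, Z = stft(x, fs=fs, nperseg=nperseg)
        mag = np.abs(Z)                                   # shape: (frequency bin, frame)
        peak_bins = np.argsort(mag.mean(axis=1))[-n_partials:]   # strongest bins of the average spectrum
        t = np.arange(x.size) / fs
        out = np.zeros_like(x, dtype=float)
        for k in peak_bins:
            amp = np.interp(t, times, mag[k])              # amplitude envelope at audio rate
            out += amp * np.sin(2 * np.pi * freqs[k] * t)
        peak = np.max(np.abs(out))
        if peak > 0:
            out = out / peak
        return out, freqs[peak_bins]

    if __name__ == "__main__":
        fs = 16000
        t = np.arange(2 * fs) / fs
        # Two-note test tone with decaying envelopes, standing in for polyphonic audio.
        x = (np.exp(-t) * np.sin(2 * np.pi * 220 * t)
             + np.exp(-2 * t) * np.sin(2 * np.pi * 330 * t))
        y, partials = additive_resynthesis(x, fs)
        print("partial frequencies used (Hz):", np.sort(np.round(partials, 1)))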

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) came into being in 1999 from the strongly felt need to share know-how, objectives and results between areas that until then had seemed quite distinct, such as bioengineering, medicine and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the neonate to the adult and elderly. Over the years the initial topics have grown and spread into other areas of research, such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years, always in Firenze, Italy. This edition celebrates twenty years of uninterrupted and successful research in the field of voice analysis

    The Case of the Missing Pitch Templates: How Harmonic Templates Emerge in the Early Auditory System

    Periodicity pitch is the most salient and important of all pitch percepts. Psycho-acoustical models of this percept have long postulated the existence of internalized harmonic templates against which incoming resolved spectra can be compared, and pitch determined according to the best matching templates (Goldstein). However, it has been a mystery where and how such harmonic templates can come about. Here we present a biologically plausible model for how such templates can form in the early stages of the auditory system. The model demonstrates that any broadband stimulus, such as noise or random click trains, suffices for generating the templates, and that there is no need for any delay-lines, oscillators, or other neural temporal structures. The model consists of two key stages: cochlear filtering followed by coincidence detection. The cochlear stage provides responses analogous to those seen on the auditory nerve and cochlear nucleus. Specifically, it performs moderately sharp frequency analysis via a filter-bank with tonotopically ordered center frequencies (CFs); the rectified and phase-locked filter responses are further enhanced temporally to resemble the synchronized responses of cells in the cochlear nucleus. The second stage is a matrix of coincidence detectors that compute the average pair-wise instantaneous correlation (or product) between responses from all CFs across the channels. Model simulations show that for any broadband stimulus, high coincidences occur between cochlear channels that are exactly harmonic distances apart. Accumulating coincidences over time results in the formation of harmonic templates for all fundamental frequencies in the phase-locking frequency range. The model explains the critical role played by three subtle but important factors in cochlear function: the nonlinear transformations following the filtering stage; the rapid phase-shifts of the traveling wave near its resonance; and the spectral resolution of the cochlear filters. Finally, we discuss the physiological correlates and location of such a process and its resulting templates
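
    The two-stage computation lends itself to a very reduced numerical sketch, shown below: Butterworth band-pass filters stand in for the cochlear stage, half-wave rectification for the nonlinearity, and a time-averaged pairwise product of channel responses for the coincidence matrix. It omits the phase-locking enhancement, the rapid near-resonance phase shifts, and the cochlear nonlinearities that the abstract identifies as critical, so it illustrates the structure of the computation rather than reproducing the templates quantitatively; the filter design and channel spacing are placeholder assumptions.

    import numpy as np
    from scipy.signal import butter, lfilter

    def cochlear_channels(x, fs, cfs, q=4.0):
        """Band-pass filter and half-wave rectify the input, one channel per centre frequency."""
        outputs = []
        for cf in cfs:
            bw = cf / q
            lo = max(cf - bw / 2, 1.0) / (fs / 2)
            hi = min(cf + bw / 2, fs / 2 - 1.0) / (fs / 2)
            b, a = butter(2, [lo, hi], btype="bandpass")
            outputs.append(np.maximum(lfilter(b, a, x), 0.0))   # half-wave rectification
        return np.asarray(outputs)

    def coincidence_matrix(x, fs, cfs):
        """Time-averaged pairwise product of the rectified, RMS-normalised channel responses."""
        ch = cochlear_channels(x, fs, cfs)
        ch = ch / (np.sqrt(np.mean(ch ** 2, axis=1, keepdims=True)) + 1e-12)
        return (ch @ ch.T) / ch.shape[1]

    if __name__ == "__main__":
        fs = 16000
        noise = np.random.default_rng(0).standard_normal(4 * fs)   # "any broadband stimulus"
        cfs = np.arange(200.0, 3200.0, 50.0)
        C = coincidence_matrix(noise, fs, cfs)
        i200, i400, i500 = (int(np.argmin(np.abs(cfs - f))) for f in (200.0, 400.0, 500.0))
        # In the full model, the harmonically related pair (200, 400 Hz) accumulates a higher
        # coincidence than an inharmonic pair such as (200, 500 Hz).
        print("C(200, 400) =", round(float(C[i200, i400]), 4),
              "  C(200, 500) =", round(float(C[i200, i500]), 4))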

    Computational Methods for Cognitive and Cooperative Robotics

    In recent decades, design methods in control engineering have made substantial progress in the areas of robotics and computer animation, and nowadays these methods incorporate the newest developments in machine learning and artificial intelligence. However, the flexible and online-adaptive combination of motor behaviors remains challenging for human-like animation and for humanoid robotics. In this context, biologically motivated methods for the analysis and re-synthesis of human motor programs provide new insights into, and models for, anticipatory motion synthesis. This thesis presents the author’s achievements in the areas of cognitive and developmental robotics, cooperative and humanoid robotics, and intelligent and machine learning methods in computer graphics. The first part of the thesis, in the chapter “Goal-directed Imitation for Robots”, considers imitation learning in cognitive and developmental robotics. The work presented here details the author’s progress in the development of hierarchical motion recognition and planning inspired by recent discoveries of the functions of mirror-neuron cortical circuits in primates. The overall architecture is capable of ‘learning for imitation’ and ‘learning by imitation’. The complete system includes a low-level, real-time-capable path planning subsystem for obstacle avoidance during arm reaching. The learning-based path planning subsystem is universal for all types of anthropomorphic robot arms, and is capable of knowledge transfer at the level of individual motor acts. Next, the problems of learning and synthesis of motor synergies, the spatial and spatio-temporal combinations of motor features in sequential multi-action behavior, and the problems of task-related action transitions are considered in the second part of the thesis, “Kinematic Motion Synthesis for Computer Graphics and Robotics”. In this part, a new approach to modeling complex full-body human actions by mixtures of time-shift invariant motor primitives is presented. The online-capable full-body motion generation architecture, based on dynamic movement primitives driving the time-shift invariant motor synergies, was implemented as an online-reactive adaptive motion synthesis for computer graphics and robotics applications. The last chapter of the thesis, entitled “Contraction Theory and Self-organized Scenarios in Computer Graphics and Robotics”, is dedicated to optimal control strategies in multi-agent scenarios of large crowds of agents exhibiting highly nonlinear behaviors. This last part presents new mathematical tools for stability analysis and synthesis of multi-agent cooperative scenarios
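
    Since the motion-synthesis architecture builds on dynamic movement primitives (DMPs), a single-degree-of-freedom discrete DMP in the standard Ijspeert-style formulation is sketched below as background. The gains, basis-function count, and demonstration signal are illustrative; this is not the thesis's multi-primitive, time-shift-invariant architecture.

    import numpy as np

    def learn_dmp(y_demo, dt, n_basis=30, alpha_z=25.0, beta_z=6.25, alpha_x=4.0):
        """Fit the forcing-term weights of one discrete DMP to a demonstrated trajectory."""
        T = len(y_demo)
        tau = T * dt
        y0, g = y_demo[0], y_demo[-1]
        yd = np.gradient(y_demo, dt)
        ydd = np.gradient(yd, dt)
        x = np.exp(-alpha_x * np.arange(T) * dt / tau)                 # canonical phase variable
        f_target = tau ** 2 * ydd - alpha_z * (beta_z * (g - y_demo) - tau * yd)
        centers = np.exp(-alpha_x * np.linspace(0, 1, n_basis))
        widths = 1.0 / (np.diff(centers, append=centers[-1]) ** 2 + 1e-6)
        psi = np.exp(-widths * (x[:, None] - centers) ** 2)            # (T, n_basis) basis activations
        xi = x * (g - y0)
        # Locally weighted regression for each basis weight.
        w = np.array([(xi * psi[:, i] * f_target).sum() / ((xi ** 2 * psi[:, i]).sum() + 1e-9)
                      for i in range(n_basis)])
        return dict(w=w, centers=centers, widths=widths, y0=y0, g=g, tau=tau,
                    alpha_z=alpha_z, beta_z=beta_z, alpha_x=alpha_x)

    def run_dmp(p, dt, n_steps):
        """Integrate the learned DMP forward with simple Euler steps."""
        y, v, x = p["y0"], 0.0, 1.0
        out = []
        for _ in range(n_steps):
            psi = np.exp(-p["widths"] * (x - p["centers"]) ** 2)
            f = (psi @ p["w"]) / (psi.sum() + 1e-9) * x * (p["g"] - p["y0"])
            vdot = (p["alpha_z"] * (p["beta_z"] * (p["g"] - y) - v) + f) / p["tau"]
            ydot = v / p["tau"]
            xdot = -p["alpha_x"] * x / p["tau"]
            v += vdot * dt
            y += ydot * dt
            x += xdot * dt
            out.append(y)
        return np.array(out)

    if __name__ == "__main__":
        dt = 0.01
        t = np.arange(0, 1, dt)
        demo = np.sin(np.pi * t) * 0.5 + t            # a demonstrated reaching profile
        dmp = learn_dmp(demo, dt)
        repro = run_dmp(dmp, dt, len(t))
        print("max reproduction error:", round(float(np.abs(repro - demo).max()), 3))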