155 research outputs found
Modeling the auditory organization of speech: a summary and some comments
The preceding three chapters have been concerned with the issues arising from the inconvenient fact that our ears are rarely presented with the sound of a single speaker in isolation, but more often with a combination of several speech and nonspeech sounds which may have been further altered by the acoustic environment. Faced with such a mixture, the listener evidently needs to consider each source separately, and this process of information segregation is known as auditory organization or auditory scene analysis (Bregman, 1990). Pure curiosity, as well as the possibility of applications in automatic signal interpretation, drives us to investigate auditory scene analysis through psychological experiments and computational modeling. Having sketched this framework and the current limits to our understanding of the process of auditory organization, we can now examine the material of each of the three chapters in more detail, seeing how it fits into this framework and also where the framework may be inadequate. Following these discussions, we conclude with some remarks suggested by the particular combination of results in this section.
Computational Models of Auditory Scene Analysis: A Review
Auditory scene analysis (ASA) refers to the process(es) of parsing the complex acoustic input into auditory perceptual objects representing either physical sources or temporal sound patterns, such as melodies, which contributed to the sound waves reaching the ears. A number of new computational models accounting for some of the perceptual phenomena of ASA have been published recently. Here we provide a theoretically motivated review of these computational models, aiming to relate their guiding principles to the central issues of the theoretical framework of ASA. Specifically, we ask how they achieve the grouping and separation of sound elements and whether they implement some form of competition between alternative interpretations of the sound input. We consider the extent to which they include predictive processes, as important current theories suggest that perception is inherently predictive, and also how they have been evaluated. We conclude that current computational models of ASA are fragmentary in the sense that rather than providing general competing interpretations of ASA, they focus on assessing the utility of specific processes (or algorithms) for finding the causes of the complex acoustic signal. This leaves open the possibility for integrating complementary aspects of the models into a more comprehensive theory of ASA.
Auditory Perceptual Organisation
Traveling pressure waves (i.e., sounds) are produced by the movements or actions of objects, so sounds primarily convey information about what is happening in the environment. In addition, some information about the structure of the environment and the surface features of objects can be extracted by determining how the original (self-generated or exogenous) sounds are filtered or distorted by the environment (e.g., the notion of “acoustic daylight”; Fay 2009). In this article we consider how the auditory system processes sound signals to extract information about the environment and the objects within it.
Neural Basis and Computational Strategies for Auditory Processing
Our senses are our window to the world, and hearing is the window through which we perceive the world of sound. While seemingly effortless, the process of hearing involves complex transformations by which the auditory system consolidates acoustic information from the environment into perceptual and cognitive experiences. Studies of auditory processing try to elucidate the mechanisms underlying the function of the auditory system, and infer computational strategies that are valuable both clinically and intellectually, hence contributing to our understanding of the function of the brain.
In this thesis, we adopt both an experimental and computational approach in tackling various aspects of auditory processing. We first investigate the neural basis underlying the function of the auditory cortex, and explore the dynamics and computational mechanisms of cortical processing. Our findings offer physiological evidence for a role of primary cortical neurons in the integration of sound features at different time constants, and possibly in the formation of auditory objects.
Based on physiological principles of sound processing, we explore computational implementations in tackling specific perceptual questions. We exploit our knowledge of the neural mechanisms of cortical auditory processing to formulate models addressing the problems of speech intelligibility and auditory scene analysis. The intelligibility model focuses on a computational approach for evaluating loss of intelligibility, inspired from mammalian physiology and human perception. It is based on a multi-resolution filter-bank implementation of cortical response patterns, which extends into a robust metric for assessing loss of intelligibility in communication channels and speech recordings.
This same cortical representation is extended further to develop a computational scheme for auditory scene analysis. The model maps perceptual principles of auditory grouping and stream formation onto a computational system that combines aspects of bottom-up, primitive sound processing with an internal representation of the world. It is based on a framework of unsupervised adaptive learning with Kalman estimation. The model is valuable for exploring various aspects of sound organization in the brain, allowing us to gain insight into the neural basis of auditory scene analysis as well as practical implementations for sound separation in “cocktail-party” situations.
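As a toy illustration of the Kalman-estimation idea mentioned above, the sketch below tracks a single slowly varying stream feature (say, a pitch trajectory) from noisy frame-by-frame observations with a one-dimensional random-walk Kalman filter. All parameter values and the observation sequence are illustrative assumptions, not taken from the thesis.

```python
# Minimal 1-D Kalman filter: state model x_k = x_{k-1} + w (random walk),
# observation model z_k = x_k + v. q and r are the assumed process and
# observation noise variances (illustrative values only).

def kalman_track(observations, q=0.01, r=0.5):
    x, p = observations[0], 1.0      # initial state estimate and its variance
    estimates = [x]
    for z in observations[1:]:
        p += q                       # predict: variance grows by process noise
        k = p / (p + r)              # Kalman gain
        x += k * (z - x)             # update toward the new observation
        p *= (1 - k)                 # posterior variance shrinks
        estimates.append(x)
    return estimates

# Noisy observations of a feature drifting from about 100 to 102
zs = [100.0, 100.6, 99.8, 101.1, 100.9, 101.7, 101.4, 102.2]
est = kalman_track(zs)
```

The filtered trajectory follows the drift while smoothing the frame-to-frame jitter, which is the property that makes such an estimator useful for tracking a stream attribute through interruptions.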
Models and analysis of vocal emissions for biomedical applications
This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003 in Firenze, Italy. The workshop is organised every two years, and aims to stimulate contacts between specialists active in research and industrial developments in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies.
PITCH ESTIMATION FOR NOISY SPEECH
In this dissertation a biologically plausible system of pitch estimation is proposed. The system is designed from the bottom up to be robust to challenging noise conditions. This robustness to the presence of noise in the signal is achieved by developing a new representation of the speech signal, based on the operation of damped harmonic oscillators and temporal mode analysis of their output. The resulting representation is shown to possess qualities that are not degraded in the presence of noise. A harmonic-grouping-based system is used to estimate the pitch frequency. A detailed statistical analysis is performed on the system, and its performance is compared with some of the most established and recent pitch estimation and tracking systems. The detailed analysis includes results of experiments with a variety of noises over a large range of signal-to-noise ratios, under different signal conditions. Situations where the interfering "noise" is speech from another speaker are also considered. The proposed system is able to estimate the pitch of both the main speaker and the interfering speaker, thus emulating the phenomena of auditory streaming and the "cocktail party effect" in terms of pitch perception. The results of the extensive statistical analysis show that the proposed system exhibits some very interesting properties in its ability to handle noise. The results also show that the proposed system's overall performance is much better than that of any of the other systems tested, especially in the presence of very large amounts of noise. The system is also shown to successfully simulate some interesting psychoacoustical pitch perception phenomena. Through a detailed and comparative analysis of computational requirements, it is also demonstrated that the proposed system is comparatively inexpensive in terms of processing and memory requirements.
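To make the harmonic-grouping step concrete, here is a deliberately minimal pitch estimator in that spirit: it scores each candidate fundamental by the spectral energy at its first few harmonics and picks the best-scoring candidate. The damped-oscillator front end and temporal mode analysis of the dissertation are not reproduced; a plain Fourier magnitude serves as a stand-in, and all signal parameters are invented for the demo.

```python
import cmath, math

def spectral_mag(x, f, fs):
    """Magnitude of the DFT of x evaluated at frequency f (Goertzel-style sum)."""
    return abs(sum(v * cmath.exp(-2j * math.pi * f * n / fs)
                   for n, v in enumerate(x)))

def harmonic_pitch(x, fs, candidates, n_harmonics=3):
    """Pick the candidate f0 whose first few harmonics carry the most energy."""
    def score(f0):
        return sum(spectral_mag(x, m * f0, fs) for m in range(1, n_harmonics + 1))
    return max(candidates, key=score)

fs = 8000
# Synthetic "voiced" frame: 200 Hz fundamental plus two weaker harmonics
x = [math.sin(2 * math.pi * 200 * n / fs)
     + 0.6 * math.sin(2 * math.pi * 400 * n / fs)
     + 0.3 * math.sin(2 * math.pi * 600 * n / fs)
     for n in range(800)]
f0 = harmonic_pitch(x, fs, candidates=range(100, 310, 10))   # f0 -> 200
```

Note how the harmonic sum resolves the octave ambiguity: a 100 Hz candidate collects energy only at 200 Hz, while the true 200 Hz candidate collects energy at 200, 400, and 600 Hz.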
Analysis and resynthesis of polyphonic music
This thesis examines applications of Digital Signal Processing to the analysis, transformation, and resynthesis of musical audio. First I give an overview of the human perception of music. I then examine in detail the requirements for a system that can analyse, transcribe, process, and resynthesise monaural polyphonic music. I then describe and compare the possible hardware and software platforms. After this I describe a prototype hybrid system that attempts to carry out these tasks using a method based on additive synthesis. Next I present results from its application to a variety of musical examples, and critically assess its performance and limitations. I then address these issues in the design of a second system based on Gabor wavelets. I conclude by summarising the research and outlining suggestions for future developments.
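The resynthesis half of such an additive-synthesis system can be sketched in a few lines: once analysis has produced a list of sinusoidal partials (frequency, amplitude, phase), resynthesis simply re-evaluates their sum. The peak-picking and partial-tracking analysis stages, and the Gabor-wavelet variant, are omitted here; the partial list is assumed for illustration.

```python
import math

def resynthesize(partials, fs, n_samples):
    """Additive resynthesis: partials is a list of (freq_hz, amplitude, phase)."""
    return [sum(a * math.sin(2 * math.pi * f * n / fs + ph)
                for f, a, ph in partials)
            for n in range(n_samples)]

fs = 8000
partials = [(440.0, 1.0, 0.0), (880.0, 0.5, 0.0)]   # a tone and its octave
y = resynthesize(partials, fs, 160)                  # 20 ms of audio
```

Transformations such as pitch-shifting or time-stretching then amount to editing the partial list (scaling frequencies, resampling amplitude envelopes) before resynthesis.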
Models and Analysis of Vocal Emissions for Biomedical Applications
The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) came into being in 1999 from the particularly felt need to share know-how, objectives and results between areas that until then had seemed quite distinct, such as bioengineering, medicine and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the neonate to the adult and elderly. Over the years the initial issues have grown and spread into other areas of research, such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years, always in Firenze, Italy. This edition celebrates twenty years of uninterrupted and successful research in the field of voice analysis.
The Case of the Missing Pitch Templates: How Harmonic Templates Emerge in the Early Auditory System
Periodicity pitch is the most salient and important of all pitch percepts. Psychoacoustical models of this percept have long postulated the existence of internalized harmonic templates against which incoming resolved spectra can be compared, with pitch determined according to the best-matching templates (Goldstein). However, it has been a mystery where and how such harmonic templates can come about. Here we present a biologically plausible model for how such templates can form in the early stages of the auditory system. The model demonstrates that any broadband stimulus, such as noise or random click trains, suffices for generating the templates, and that there is no need for any delay lines, oscillators, or other neural temporal structures. The model consists of two key stages: cochlear filtering followed by coincidence detection. The cochlear stage provides responses analogous to those seen on the auditory nerve and cochlear nucleus. Specifically, it performs moderately sharp frequency analysis via a filter bank with tonotopically ordered center frequencies (CFs); the rectified and phase-locked filter responses are further enhanced temporally to resemble the synchronized responses of cells in the cochlear nucleus. The second stage is a matrix of coincidence detectors that compute the average pairwise instantaneous correlation (or product) between responses from all CFs across the channels. Model simulations show that for any broadband stimulus, high coincidences occur between cochlear channels that are exactly harmonic distances apart. Accumulating coincidences over time results in the formation of harmonic templates for all fundamental frequencies in the phase-locking frequency range. The model explains the critical role played by three subtle but important factors in cochlear function: the nonlinear transformations following the filtering stage; the rapid phase shifts of the traveling wave near its resonance; and the spectral resolution of the cochlear filters. Finally, we discuss the physiological correlates and location of such a process and its resulting templates.
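A much-simplified rendition of the two-stage model can be sketched as follows: a bank of damped second-order resonators stands in for the cochlear filters, half-wave rectification for the subsequent nonlinearity, and the "template" is the matrix of normalized average pairwise products between rectified channel outputs. The filter parameters, channel CFs, and click-train stimulus are assumptions for the demo, not values from the paper.

```python
import math

def resonator(x, cf, fs, r=0.99):
    """Damped second-order resonator: poles at radius r, angle 2*pi*cf/fs."""
    a1, a2 = 2 * r * math.cos(2 * math.pi * cf / fs), -r * r
    y1 = y2 = 0.0
    out = []
    for v in x:
        y = v + a1 * y1 + a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out

fs = 8000
clicks = [1.0 if n % 40 == 0 else 0.0 for n in range(4000)]  # 200 Hz click train
cfs = [200, 400, 600, 800]                                   # toy tonotopic axis
# Stage 1: filter and half-wave rectify each channel
chans = [[max(0.0, v) for v in resonator(clicks, cf, fs)] for cf in cfs]

def coincidence(a, b):
    """Normalized average instantaneous product of two rectified channels."""
    num = sum(x * y for x, y in zip(a, b))
    return num / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

# Stage 2: matrix of pairwise coincidences across all channel pairs
C = [[coincidence(a, b) for b in chans] for a in chans]
```

In the full model, accumulating such coincidence matrices over many broadband stimuli is what gradually carves out the harmonic template for each fundamental.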
Computational Methods for Cognitive and Cooperative Robotics
In the last decades design methods in control engineering made substantial progress in
the areas of robotics and computer animation. Nowadays these methods incorporate the
newest developments in machine learning and artificial intelligence. But the problems
of flexible and online-adaptive combinations of motor behaviors remain challenging for
human-like animations and for humanoid robotics. In this context, biologically-motivated
methods for the analysis and re-synthesis of human motor programs provide new insights into, and models for, anticipatory motion synthesis.
This thesis presents the author’s achievements in the areas of cognitive and developmental robotics, cooperative and humanoid robotics and intelligent and machine learning methods in computer graphics. The first part of the thesis in the chapter “Goal-directed Imitation for Robots” considers imitation learning in cognitive and developmental robotics.
The work presented here details the author’s progress in the development of hierarchical
motion recognition and planning inspired by recent discoveries of the functions of mirror-neuron cortical circuits in primates. The overall architecture is capable of ‘learning for
imitation’ and ‘learning by imitation’. The complete system includes a low-level real-time
capable path planning subsystem for obstacle avoidance during arm reaching. The learning-based path planning subsystem is universal for all types of anthropomorphic robot arms, and is capable of knowledge transfer at the level of individual motor acts.
Next, the problems of learning and synthesis of motor synergies (the spatial and spatio-temporal combinations of motor features in sequential multi-action behavior), together with the problems of task-related action transitions, are considered in the second part of the thesis, “Kinematic Motion Synthesis for Computer Graphics and Robotics”. In this part, a new approach to modeling complex full-body human actions by mixtures of time-shift-invariant motor primitives is presented. The online-capable full-body motion generation architecture
based on dynamic movement primitives driving the time-shift invariant motor synergies
was implemented as an online-reactive adaptive motion synthesis for computer graphics
and robotics applications.
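A one-dimensional dynamic movement primitive of the kind driving the synergies above can be sketched as a damped spring pulled toward a goal, with a phase-gated forcing term shaping the transient. The forcing term here is a fixed toy function rather than one learned from motion data, and all gains are illustrative assumptions.

```python
import math

def dmp_rollout(y0, goal, steps=200, dt=0.01, alpha=25.0, beta=6.25, tau=1.0):
    """Euler rollout of a 1-D DMP: critically damped spring plus phase-gated forcing."""
    y, yd = y0, 0.0
    s = 1.0                                        # canonical phase, decays 1 -> 0
    traj = [y]
    for _ in range(steps):
        f = s * math.sin(4 * math.pi * s)          # toy forcing term, vanishes with s
        ydd = (alpha * (beta * (goal - y) - yd) + f) / tau
        yd += ydd * dt
        y += yd * dt
        s += (-2.0 * s / tau) * dt                 # first-order phase dynamics
        traj.append(y)
    return traj

traj = dmp_rollout(y0=0.0, goal=1.0)
```

Because the forcing term is gated by the decaying phase, the spring dynamics dominate at the end of the movement and the trajectory converges to the goal regardless of the transient shape, which is what makes such primitives convenient building blocks for sequencing.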
The last chapter of the thesis entitled “Contraction Theory and Self-organized Scenarios
in Computer Graphics and Robotics” is dedicated to optimal control strategies in multi-agent scenarios of large crowds of agents expressing highly nonlinear behaviors. This last
part presents new mathematical tools for stability analysis and synthesis of multi-agent
cooperative scenarios.