17 research outputs found

    Bayesian and echoic log-surprise for auditory saliency detection

    International Mention in the doctoral degree. Attention is defined as the mechanism that allows the brain to categorize and prioritize information acquired through the senses and to act according to the environmental context and the available mental resources. The attention mechanism can be subdivided into two types: top-down and bottom-up. Top-down attention is goal- or task-driven and implies that a participant has some previous knowledge about the task that he or she is trying to solve. Bottom-up attention, by contrast, depends only on the perceived features of the target object and its surroundings; it is a very fast mechanism that is believed to be crucial for human survival. Bottom-up attention is commonly known as saliency or salience, and can be defined as a property of perceived signals that makes them attentionally prominent. This thesis concerns saliency detection in audio signals using automatic algorithms. In recent years, progress in visual saliency research, where the goal is to detect which objects or content in a visual scene are prominent enough to capture a spectator's attention, has been remarkable. However, this progress has not carried over to other modalities. This is the case for auditory saliency, where there is still no consensus on how to measure the saliency of an event, and consequently there are no dedicated labeled datasets with which to compare new algorithms and proposals. In this work two new auditory saliency detection algorithms are presented and evaluated. For their evaluation, we use Acoustic Event Detection/Classification datasets, whose labels include onset times among other aspects. We use such datasets and labels because there is psychological evidence suggesting that human beings are quite sensitive to the spontaneous appearance of acoustic objects. 
We use three datasets: DCASE 2016 (Task 2), MIVIA road audio events and UPC-TALP, totalling 3400 labeled acoustic events. The benchmark algorithms comprise the saliency detection techniques designed by Kayser and Kalinli, a voice activity detector, an energy thresholding method and four music information retrieval onset detectors: NWPD, WPD, CD and SF. We put forward two auditory saliency algorithms: Bayesian Log-surprise and Echoic Log-surprise. The former is an evolution of Bayesian Surprise, a methodology that detects anomalous or salient events by means of the Kullback-Leibler divergence computed between two consecutive temporal windows. Because the output Surprise signal has some drawbacks, we introduce improvements that lead to the approach we name Bayesian Log-surprise. These include an amplitude compression stage and the addition of perceptual knowledge to pre-process the input signal. The latter, Echoic Log-surprise, fuses several Bayesian Log-surprise signals computed with different memory lengths that represent different temporal scales. The fusion is performed using statistical divergences, resulting in saliency signals with a significantly lower background noise level and noticeably higher detection scores. Moreover, since the original Echoic Log-surprise presents certain limitations, we propose a set of improvements: we test alternative statistical divergences, we introduce a new fusion strategy, and we replace the static thresholding mechanism used to decide whether the final output signal is salient with a dynamic thresholding algorithm. Results show that the last of these is the most significant modification in terms of performance: it reduces the dispersion observed in the scores produced by the system and enables online operation. 
Finally, our last analysis concerns the robustness of all the algorithms presented in this thesis against environmental noise. We use noises of different natures, from stationary white noise to pre-recorded noises acquired in real environments such as cafeterias and train stations. The results suggest that, across different signal-to-noise ratios, the most robust algorithm is Echoic Log-surprise, since its detection capabilities are the least affected by noise.
Doctoral Programme in Multimedia and Communications, Universidad Carlos III de Madrid and Universidad Rey Juan Carlos. President: Fernando Díaz de María. Secretary: Rubén Solera Ureña. Member: José Luis Pérez Córdob
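
The core idea named in this abstract (a Kullback-Leibler divergence between two consecutive temporal windows, followed by amplitude compression) can be illustrated with a minimal sketch. This is not the thesis's implementation: the per-window Gaussian model, the `log1p` compression, the window length and the function names `gaussian_kl` and `log_surprise` are all assumptions made for illustration, and the perceptual pre-processing stage is omitted.

```python
import numpy as np

def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL(P || Q) between univariate Gaussians P and Q (closed form)."""
    return 0.5 * (np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def log_surprise(signal, win=256, eps=1e-8):
    """Hypothetical sketch: KL divergence between consecutive windows, log-compressed.

    Each window is summarized by its sample mean and variance (an assumed
    Gaussian model). Entry i of the result compares window i + 1 with window i.
    """
    signal = np.asarray(signal, dtype=float)
    n = len(signal) // win
    frames = signal[: n * win].reshape(n, win)
    mu = frames.mean(axis=1)
    var = frames.var(axis=1) + eps  # eps avoids division by zero on silent windows
    # Divergence of each window from the window that precedes it.
    kl = gaussian_kl(mu[1:], var[1:], mu[:-1], var[:-1])
    # Amplitude compression (assumed here to be log(1 + x)).
    return np.log1p(kl)

# Usage: quiet background noise with a loud burst in the fifth window.
rng = np.random.default_rng(0)
x = 0.01 * rng.standard_normal(256 * 8)
x[256 * 4 : 256 * 5] += rng.standard_normal(256)
print(log_surprise(x, win=256))
```

In this toy signal the largest value falls on the transition into the burst, which is the kind of onset sensitivity the abstract evaluates against labeled acoustic events.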

    Auditory Displays and Assistive Technologies: the use of head movements by visually impaired individuals and their implementation in binaural interfaces

    Visually impaired people rely upon audition for a variety of purposes, among them the use of sound to identify the position of objects in their surrounding environment. This is not limited to localising sound-emitting objects; it extends to obstacles and environmental boundaries, thanks to their ability to extract information from reverberation and sound reflections. All of these can contribute to effective and safe navigation, and can serve a function in certain assistive technologies thanks to the advent of binaural auditory virtual reality. It is known that head movements in the presence of sound elicit changes in the acoustical signals which arrive at each ear, and that these changes can reduce common auditory localisation problems in headphone-based auditory virtual reality, such as front-to-back reversals. The goal of the work presented here is to investigate whether visually impaired people naturally engage head movement to facilitate auditory perception, and to what extent this may be applicable to the design of virtual auditory assistive technology. Three novel experiments are presented: a field study of head movement behaviour during navigation; a questionnaire assessing the self-reported use of head movement in auditory perception by visually impaired individuals (each comparing visually impaired and sighted participants); and an acoustical analysis of interaural differences and cross-correlations as a function of head angle and sound source distance. It is found that visually impaired people self-report using head movement for auditory distance perception. This is supported by head movements observed during the field study, whilst the acoustical analysis showed that interaural correlations for sound sources within 5 m of the listener were reduced as head angle or distance to the sound source increased, and that interaural differences and correlations in reflected sound were generally lower than those of direct sound. 
Subsequently, relevant guidelines for designers of assistive auditory virtual reality are proposed.

    Brain Responses Track Patterns in Sound

    This thesis uses specifically structured sound sequences, with electroencephalography (EEG) recording and behavioural tasks, to understand how the brain forms and updates a model of the auditory world. Experimental chapters 3-7 address different effects arising from statistical predictability, stimulus repetition and surprise. Stimuli comprised tone sequences, with frequencies varying in regular or random patterns. In Chapter 3, EEG data demonstrate fast recognition of predictable patterns, shown by an increase in responses to regular relative to random sequences. Behavioural experiments investigate attentional capture by stimulus structure, suggesting that regular sequences are easier to ignore. Responses to repetitive stimulation generally exhibit suppression, thought to form a building block of regularity learning. However, the patterns used in this thesis show the opposite effect, where predictable patterns show a strongly enhanced brain response, compared to frequency-matched random sequences. Chapter 4 presents a study which reconciles auditory sequence predictability and repetition in a single paradigm. Results indicate a system for automatic predictability monitoring which is distinct from, but concurrent with, repetition suppression. The brain’s internal model can be investigated via the response to rule violations. Chapters 5 and 6 present behavioural and EEG experiments where violations are inserted in the sequences. Outlier tones within regular sequences evoked a larger response than matched outliers in random sequences. However, this effect was not present when the violation comprised a silent gap. Chapter 7 concerns the ability of the brain to update an existing model. Regular patterns transitioned to a different rule, keeping the frequency content constant. Responses show a period of adjustment to the rule change, followed by a return to tracking the predictability of the sequence. 
These findings are consistent with the notion that the brain continually maintains a detailed representation of ongoing sensory input and that this representation shapes the processing of incoming information.

    The Role of The Locus Coeruleus Noradrenergic System in Tracking the Statistics of Rapid Sound Sequences

    The sensory world is full of uncertainty; most perception-relevant statistics are highly dynamic, featuring frequently changing patterns. Rapid adaptation to the ever-changing world requires brain sensitivity to environmental changes and resetting of functional neural networks as needed. Norepinephrine (NE) is proposed to mediate this process by initiating functional resetting (Dayan and Yu, 2006; Sara and Bouret, 2012) via the Locus Coeruleus (LC)-NE system. This doctoral thesis employs pupil diameter measurements, a reliable indicator of NE neural activity in the LC (Aston-Jones and Cohen, 2005; Joshi et al., 2016). Human participants listened to sequences of adjoined 50 ms tone-pips (adapted from Barascud et al., 2016) containing transitions from random to regular frequency patterns and vice versa. Participants were instructed to detect occasionally inserted silent gaps, ensuring attention to the auditory stream rather than to the transition itself. Although both transitions (regular-to-random and random-to-regular) are clearly detectable behaviourally and evoke strong MEG responses (Barascud et al., 2016), only violations of regularity (prediction errors) appear to elicit pupil responses. Notably, this response is driven by pattern changes and not merely by deviant detection. Stimuli containing pattern emergences (precision increase), however, evoke no measurable pupil response; this is not due to pre-transition pupillary saturation, as transitions from random patterns to repeating single tones (random-to-repeating) evoke transient pupil dilation. Random-to-regular transitions evoked pupil dilations only when subjects actively reported the changes by button press. An investigation of the effect of task on evoked pupil responses found no response when subjects were not continuously tracking the sequences, e.g. with attention directed to visual or tactile stimuli. 
Multiple self-replications of these findings provide robust evidence that NE release acts as an automatic switch, resetting the brain's internal model of the sensory environment, and demonstrate that the unexpected-uncertainty signalling process operates over much faster timescales than previously known, implicating NE in the fundamental bases of perception.

    Early visual processing in ageing and Alzheimer's disease.

    SIGLE. Available from British Library Document Supply Centre-DSC:DXN029928 / BLDSC - British Library Document Supply Centre. GB. United Kingdo

    The robustness of echoic log-surprise auditory saliency detection

    The concept of saliency describes how relevant a stimulus is for humans. This phenomenon has been studied from different perspectives and in different modalities, such as audio, visual, or both. It has been employed in intelligent systems that interact with their environment in an attempt to emulate or even outperform human behavior in tasks such as surveillance and alarm systems or even robotics. In this paper, we focus on the aural modality, and our goal is to measure the robustness of Echoic log-surprise in comparison with a set of auditory saliency techniques when tested in noisy environments on the task of saliency detection. The acoustic saliency methods that we have analyzed include Kalinli's saliency model, Bayesian log-surprise, and our proposed algorithm, Echoic log-surprise. This last method combines an unsupervised approach based on the Bayesian log-surprise and the biological concept of echoic or auditory sensory memory by means of a statistical fusion scheme, in which different distance metrics or statistical divergences, such as Renyi's or Jensen-Shannon's, are considered. Additionally, for comparison purposes, we have also evaluated some classical onset detection techniques, such as those based on voice activity detection or energy thresholding. Results show that Echoic log-surprise outperforms the rest of the techniques analyzed in this paper under a great variety of noises and signal-to-noise ratios, corroborating its robustness in noisy environments. In particular, our algorithm with the Jensen-Shannon fusion scheme produces the best F-scores. To better understand the behavior of Echoic log-surprise, we have also studied the influence of its control parameters, depth and memory, at different noise levels. This work is partially supported by the Spanish Government-MinECo projects TEC2014-5390-P and TEC2017-84395-P. Publicad
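
The abstract names the Jensen-Shannon divergence as one of the statistical divergences considered in the fusion scheme but does not spell out how the fusion is carried out. As a hedged illustration of the divergence itself only, the sketch below computes it between two non-negative curves after normalizing each to sum to one; the function name `jensen_shannon` and the normalization step are assumptions, not the paper's method.

```python
import numpy as np

def jensen_shannon(p, q):
    """Jensen-Shannon divergence (natural log) between two non-negative vectors.

    Both inputs are normalized to sum to one before comparison (an assumption
    made for this sketch). The result lies between 0 and log(2).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)  # mixture distribution

    def kl(a, b):
        # Terms with a == 0 contribute nothing; b > 0 wherever a > 0 because b is the mixture.
        mask = a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Usage: compare two hypothetical saliency curves computed at different memory lengths.
short_memory = [0.1, 0.1, 0.9, 0.2]
long_memory = [0.1, 0.2, 0.6, 0.4]
print(jensen_shannon(short_memory, long_memory))
```

Unlike the Kullback-Leibler divergence, this measure is symmetric and bounded, which may be one reason it is attractive for combining signals.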

    Sensing the world through predictions and errors

    Attention Restraint, Working Memory Capacity, and Mind Wandering: Do Emotional Valence or Intentionality Matter?

    Attention restraint appears to mediate the relationship between working memory capacity (WMC) and mind wandering (Kane et al., 2016). Prior work has identified two dimensions of mind wandering: emotional valence and intentionality. However, less is known about how WMC and attention restraint correlate with these dimensions. The current study examined the relationship between WMC, attention restraint, and mind wandering by emotional valence and intentionality. A confirmatory factor analysis demonstrated that WMC and attention restraint were strongly correlated, but only attention restraint was related to overall mind wandering, consistent with prior findings. However, when examining the emotional valence of mind wandering, attention restraint and WMC were related to negatively and positively valenced, but not neutral, mind wandering. Attention restraint was also related to intentional but not unintentional mind wandering. These results suggest that WMC and attention restraint predict some, but not all, types of mind wandering.

    Modelling Learning to Count in Humanoid Robots

    In reference to IEEE copyrighted material which is used with permission in this thesis, the IEEE does not endorse any of Plymouth University's products or services. Internal or personal use of this material is permitted. If interested in reprinting/republishing IEEE copyrighted material for advertising or promotional purposes or for creating new collective works for resale or redistribution, please go to http://www.ieee.org/publications_standards/publications/rights/rights_link.html to learn how to obtain a License from RightsLink. This thesis concerns the formulation of novel developmental robotics models of embodied phenomena in number learning. Learning to count is believed to be of paramount importance for acquiring the remarkable fluency with which humans later manipulate numbers and the abstract concepts derived from them. The ever-increasing evidence for the embodied nature of human mathematical thinking suggests that investigating numerical cognition with robotic cognitive models has high potential to improve understanding of the mechanisms involved. This thesis focuses on two particular groups of embodied effects tightly linked with learning to count. The first phenomenon considered is the contribution of counting gestures to the counting accuracy of young children while they are acquiring the skill. The second phenomenon, which arises over a longer time scale, is the human tendency to internally associate numbers with space, which results, among other things, in the widely studied SNARC effect. The PhD research contributes to knowledge in the subject by formulating novel neuro-robotic cognitive models of these phenomena, and by employing them in two series of simulation experiments. 
In the context of counting gestures, the simulations provide evidence for the importance of learning the number words prior to learning to count, for the usefulness of the proprioceptive information connected with gestures in improving counting accuracy, and for the significance of the spatial correspondence between the indicative acts and the objects being enumerated. In the context of the model of spatial-numerical associations, the simulations demonstrate for the first time that these may arise as a consequence of the consistent spatial biases present when children are learning to count. Finally, based on the experience gathered throughout both modelling experiments, specific guidelines concerning future efforts in the application of robotic modelling to mathematical cognition are formulated. This research has been supported by the EU project RobotDoC (235065) from the FP7 Marie Curie Actions ITN.

    Methods in prosody

    This book presents a collection of pioneering papers reflecting current methods in prosody research with a focus on Romance languages. The rapid expansion of the field of prosody research in recent decades has given rise to a proliferation of methods that has left little room for their critical assessment. The aim of this volume is to bridge this gap by bringing together original contributions in which experts in the field assess, reflect on, and discuss different methods of data gathering and analysis. The book might thus be of interest to scholars and established researchers as well as to students and young academics who wish to explore the topic of prosody, an expanding and promising area of study.