17 research outputs found
Bayesian and echoic log-surprise for auditory saliency detection
International Mention in the doctoral degree
Attention is defined as the mechanism that allows the brain to categorize
and prioritize information acquired using our senses and act according to
the environmental context and the available mental resources. The attention
mechanism can be further subdivided into two types: top-down and bottom-up.
Top-down attention is goal or task-driven and implies that a participant
has some previous knowledge about the task that he or she is trying to solve.
Alternatively, bottom-up attention only depends on the perceived features
of the target object and its surroundings and is a very fast mechanism that
is believed to be crucial for human survival.
Bottom-up attention is commonly known as saliency or salience, and can
be defined as a property of the signals perceived by our senses that
makes them attentionally prominent for some reason.
This thesis concerns saliency detection in audio signals using automatic
algorithms. In recent years, progress has been remarkable in visual
saliency research, where the goal is to detect which objects or content in
a visual scene are prominent enough to capture the attention of a
spectator. However, this progress has not carried over to other
modalities. This is the case of auditory saliency, where there is still no
consensus about how to measure the saliency of an event, and consequently
there are no specific labeled datasets to compare new algorithms and
proposals.
In this work two new auditory saliency detection algorithms are presented
and evaluated. For their evaluation, we make use of Acoustic Event
Detection/Classification datasets, whose labels include onset times among
other aspects. We use such datasets and labeling since there is psychological
evidence suggesting that human beings are quite sensitive to the spontaneous
appearance of acoustic objects. We use three datasets: DCASE 2016
(Task 2), MIVIA road audio events and UPC-TALP, totalling 3400 labeled
acoustic events. Regarding the algorithms that we employ for benchmarking,
these comprise techniques for saliency detection designed by Kayser and
Kalinli, a voice activity detector, an energy thresholding method and four
music information retrieval onset detectors: NWPD, WPD, CD and SF.
We put forward two auditory saliency algorithms: Bayesian Log-surprise
and Echoic Log-surprise. The former is an evolution of Bayesian Surprise,
a methodology that detects anomalous or salient events by computing the
Kullback-Leibler divergence between two consecutive temporal windows.
Because the output Surprise signal has some drawbacks, we introduce
improvements that lead to the approach we name Bayesian Log-surprise.
These include an amplitude compression stage and the addition of perceptual
knowledge to pre-process the input signal.
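As a rough illustration of the idea described above, the following sketch computes the Kullback-Leibler divergence between Gaussians fitted to two consecutive windows and then compresses the result logarithmically. The Gaussian model, the window length, the direction of the divergence and the function names (`gaussian_kl`, `window_stats`, `log_surprise`) are assumptions made for this example; the abstract does not specify the thesis's actual model or its perceptual pre-processing.

```python
import math

def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL(P || Q) for two univariate Gaussians with the given means and variances."""
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q
                  - 1.0)

def window_stats(window, eps=1e-8):
    """Mean and (regularised) variance of one window; eps avoids division by zero."""
    mean = sum(window) / len(window)
    var = sum((x - mean) ** 2 for x in window) / len(window)
    return mean, var + eps

def log_surprise(signal, win=4):
    """One log-compressed surprise value per pair of consecutive windows.

    Assumption: the current window plays the role of the updated belief and
    the previous window the prior, so the score is log(1 + KL(current || previous)).
    """
    scores = []
    for start in range(win, len(signal) - win + 1, win):
        prev_mean, prev_var = window_stats(signal[start - win:start])
        curr_mean, curr_var = window_stats(signal[start:start + win])
        kl = gaussian_kl(curr_mean, curr_var, prev_mean, prev_var)
        scores.append(math.log1p(kl))  # amplitude compression stage
    return scores

# Hypothetical feature sequence: three quiet windows, then a sudden loud one.
quiet = [0.0, 0.1, 0.0, 0.1]
loud = [5.0, 5.1, 5.0, 5.1]
scores = log_surprise(quiet * 3 + loud, win=4)
```

In this toy input the score stays at zero while consecutive windows are identical and rises sharply at the window where the loud segment begins, which is the behaviour an onset-oriented saliency measure is expected to show.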
The latter, named Echoic Log-surprise, fuses several Bayesian Log-surprise
signals computed considering different memory lengths that represent different
temporal scales. The fusion process is performed using statistical
divergences, resulting in saliency signals with certain advantages such as a
significant reduction in the background noise level and a noticeable increase
in the detection scores.
Moreover, since the original Echoic Log-surprise presents certain limitations,
we propose a set of improvements: we test some alternative statistical
divergences, we introduce a new fusion strategy and we replace the
thresholding mechanism used to determine whether the final output signal is
salient with a dynamic thresholding algorithm. Results show that the most
significant modification in terms of performance is the latter, a proposal
that reduces the dispersion observed in the scores produced by the system
and enables online functioning.
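The abstract does not describe the dynamic thresholding algorithm itself. The sketch below shows one common causal scheme, offered only as an assumption: a sample is flagged as salient when it exceeds the mean plus `k` standard deviations of the most recent `history` samples. The function name and both parameters are hypothetical.

```python
import math
from collections import deque

def dynamic_threshold(saliency, history=5, k=2.0):
    """Flag samples that exceed mean + k * std of the previous `history` samples.

    Only past samples are used, so the decision can be taken online.
    """
    past = deque(maxlen=history)  # sliding buffer of recent saliency values
    flags = []
    for value in saliency:
        if len(past) < 2:
            flags.append(False)  # not enough context to estimate a threshold
        else:
            mean = sum(past) / len(past)
            std = math.sqrt(sum((x - mean) ** 2 for x in past) / len(past))
            flags.append(value > mean + k * std)
        past.append(value)
    return flags

# Hypothetical saliency signal with a single prominent peak at index 5.
flags = dynamic_threshold([0.1, 0.2, 0.1, 0.2, 0.1, 3.0, 0.2, 0.1])
```

Because the threshold adapts to the recent level of the signal, the same settings can be reused across recordings with different background levels, which is one plausible reason a dynamic scheme would reduce the spread of scores between files.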
Finally, our last analysis concerns the robustness of all the algorithms
presented in this thesis against environmental noise. We use noises of different
natures, from stationary noise to pre-recorded noises acquired in real
environments such as cafeterias, train stations, etc. The results suggest
that for different signal-to-noise ratios the most robust algorithm is Echoic
Log-surprise, since its detection capabilities are the least influenced by noise.
Attention is defined as the mechanism that allows our brain to categorize
and prioritize the information perceived through our senses, while helping
us act according to the context and the available mental resources. This
mechanism can be divided into two variants: top-down and bottom-up.
Top-down attention has a goal that the subject intends to achieve, and
implies that the individual has some prior knowledge about the task being
performed. Bottom-up attention, on the other hand, depends exclusively on
the physical characteristics perceived from an object and its surroundings,
and acts on that information autonomously and quickly. This mechanism is
theorized to be crucial for the survival of individuals facing sudden
threats.
Bottom-up attention is commonly called saliency, and is defined as a
property of the signals perceived by our senses that, for some reason,
stand out from the rest of the acquired information.
This thesis deals with the automatic detection of saliency in acoustic
signals by means of algorithms. In recent years, progress in visual
saliency research has been notable; in that field the main goal is to
detect which objects or content in a visual scene are prominent enough to
capture a spectator's attention. However, these advances have not been
carried over to other modalities. Such is the case of auditory saliency,
where there is still no consensus on how to measure the prominence of an
acoustic event, and consequently there are no specialized datasets for
comparing new algorithms and models.
In this work we evaluate several auditory saliency detection algorithms.
To do so, we use datasets for acoustic event detection and classification,
whose labels include the onset time of those events among other
characteristics. Our hypothesis is based on psychological studies
suggesting that human beings are very sensitive to the appearance of
acoustic objects. We use three datasets: DCASE 2016 (Task 2), MIVIA road
audio events and UPC-TALP, which together total 3400 labeled events. As for
the algorithms used in our benchmark, we include the saliency algorithms
designed by Kayser and Kalinli, a voice activity detector (VAD), an energy
thresholding method and four techniques for onset detection in music:
NWPD, WPD, CD and SF.
We present two auditory saliency algorithms: Bayesian Log-surprise and
Echoic Log-surprise. The first is an evolution of Bayesian Surprise, a
methodology that uses the Kullback-Leibler divergence to detect salient
events or anomalies between consecutive time windows. Since the signal
produced by Bayesian Surprise has certain drawbacks, we introduce a series
of improvements, most notably an amplitude compression stage for the output
signal and pre-processing of the input signal using perceptual knowledge.
We call this methodology Bayesian Log-surprise.
Our second algorithm, named Echoic Log-surprise, combines the information
from multiple saliency signals produced by Bayesian Log-surprise at
different temporal scales. The fusion process is carried out using
statistical divergences, and the output signals have a lower noise level as
well as better performance when detecting salient events.
In addition, we propose a series of improvements to Echoic Log-surprise,
since we observed that it had certain limitations: we add new statistical
divergences to the system for the fusion, we design a new strategy for
carrying out that process, and we modify the thresholding system originally
used to determine whether a signal fragment was salient or not. That
mechanism was initially static, and we propose updating it so that it
behaves dynamically. This last change proves to be the most significant
improvement in terms of performance, since it reduces the dispersion
observed in the evaluation scores across different audio files while also
allowing the algorithm to run online.
The last analysis we propose studies the robustness of the algorithms
discussed in this thesis against environmental noise. We use noise of
various kinds, from stationary white noise to signals pre-recorded in real
environments such as cafeterias, train stations, etc. The results suggest
that, for different signal-to-noise ratios, the most robust algorithm is
Echoic Log-surprise, since its detection capabilities are the least
affected by noise.
Doctoral Programme in Multimedia and Communications of Universidad Carlos III de Madrid and Universidad Rey Juan Carlos
Chair: Fernando Díaz de María. Secretary: Rubén Solera Ureña. Member: José Luis Pérez Córdob
Auditory Displays and Assistive Technologies: the use of head movements by visually impaired individuals and their implementation in binaural interfaces
Visually impaired people rely upon audition for a variety of purposes, among them the use of sound to identify the position of objects in their surrounding environment. This is not limited to localising sound-emitting objects, but extends to obstacles and environmental boundaries, thanks to their ability to extract information from reverberation and sound reflections, all of which can contribute to effective and safe navigation, as well as serving a function in certain assistive technologies thanks to the advent of binaural auditory virtual reality. It is known that head movements in the presence of sound elicit changes in the acoustical signals which arrive at each ear, and these changes can alleviate common auditory localisation problems in headphone-based auditory virtual reality, such as front-to-back reversals. The goal of the work presented here is to investigate whether the visually impaired naturally engage head movement to facilitate auditory perception and to what extent it may be applicable to the design of virtual auditory assistive technology. Three novel experiments are presented: a field study of head movement behaviour during navigation, a questionnaire assessing the self-reported use of head movement in auditory perception by visually impaired individuals (each comparing visually impaired and sighted participants) and an acoustical analysis of interaural differences and cross-correlations as a function of head angle and sound source distance. It is found that visually impaired people self-report using head movement for auditory distance perception. This is supported by head movements observed during the field study, whilst the acoustical analysis showed that interaural correlations for sound sources within 5 m of the listener were reduced as head angle or distance to sound source increased, and that interaural differences and correlations in reflected sound were generally lower than those of direct sound.
Subsequently, relevant guidelines for designers of assistive auditory virtual reality are proposed.
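To make the acoustical analysis mentioned above more concrete, the sketch below computes a normalised interaural cross-correlation between a left-ear and a right-ear signal over a range of lags. It is a generic textbook-style formulation, not the procedure used in the thesis; the function name, the lag range and the toy signals are assumptions for illustration.

```python
import math

def interaural_cross_correlation(left, right, max_lag):
    """Return (peak normalised correlation, lag in samples at the peak).

    Both signals must have the same length. A positive lag means the
    right-ear signal is delayed relative to the left-ear signal.
    """
    norm = math.sqrt(sum(x * x for x in left) * sum(y * y for y in right))
    if norm == 0.0:
        return 0.0, 0  # at least one channel is silent
    n = len(left)
    best_value, best_lag = -1.0, 0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                acc += left[i] * right[j]
        value = acc / norm
        if value > best_value:
            best_value, best_lag = value, lag
    return best_value, best_lag

# Hypothetical pulse that reaches the right ear two samples after the left ear.
left_ear = [0, 0, 1, 2, 1, 0, 0, 0]
right_ear = [0, 0, 0, 0, 1, 2, 1, 0]
peak, lag = interaural_cross_correlation(left_ear, right_ear, max_lag=3)
```

A peak close to 1 indicates that the two ear signals are nearly identical up to a delay, while lower peaks indicate decorrelated signals, the kind of reduction the abstract reports for larger head angles and distances.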
Brain Responses Track Patterns in Sound
This thesis uses specifically structured sound sequences, with electroencephalography (EEG) recording and behavioural tasks, to understand how the brain forms and updates a model of the auditory world. Experimental chapters 3-7 address different effects arising from statistical predictability, stimulus repetition and surprise. Stimuli comprised tone sequences, with frequencies varying in regular or random patterns. In Chapter 3, EEG data demonstrate fast recognition of predictable patterns, shown by an increase in responses to regular relative to random sequences. Behavioural experiments investigate attentional capture by stimulus structure, suggesting that regular sequences are easier to ignore. Responses to repetitive stimulation generally exhibit suppression, thought to form a building block of regularity learning. However, the patterns used in this thesis show the opposite effect, where predictable patterns show a strongly enhanced brain response, compared to frequency-matched random sequences. Chapter 4 presents a study which reconciles auditory sequence predictability and repetition in a single paradigm. Results indicate a system for automatic predictability monitoring which is distinct from, but concurrent with, repetition suppression. The brain’s internal model can be investigated via the response to rule violations. Chapters 5 and 6 present behavioural and EEG experiments where violations are inserted in the sequences. Outlier tones within regular sequences evoked a larger response than matched outliers in random sequences. However, this effect was not present when the violation comprised a silent gap. Chapter 7 concerns the ability of the brain to update an existing model. Regular patterns transitioned to a different rule, keeping the frequency content constant. Responses show a period of adjustment to the rule change, followed by a return to tracking the predictability of the sequence. 
These findings are consistent with the notion that the brain continually maintains a detailed representation of ongoing sensory input and that this representation shapes the processing of incoming information.
The Role of The Locus Coeruleus Noradrenergic System in Tracking the Statistics of Rapid Sound Sequences
The sensory world is full of uncertainty; most perception-relevant statistics are highly dynamic, featuring frequently changing patterns. Rapid adaptation to the ever-changing world requires brain sensitivity to environmental changes and resetting of functional neural networks as needed. Norepinephrine (NE) is proposed to mediate this process by initiating functional resetting (Dayan and Yu, 2006; Sara and Bouret, 2012) via the Locus Coeruleus (LC)-NE system. This doctoral thesis employs pupil diameter measurements, a reliable indicator of NE neural activity in the LC (Aston-Jones and Cohen, 2005; Joshi et al. 2016). Human participants listened to sequences of adjoined 50 ms tone-pips (adapted from Barascud et al., 2016) containing transitions from random to regular frequency patterns and vice versa. Participants were instructed to detect occasionally inserted silent gaps, ensuring attention to the auditory stream, not the transition itself. Although both transitions (regular-to-random and random-to-regular) are clearly detectable behaviourally and evoke strong MEG responses (Barascud et al., 2016), only violations of regularity (prediction errors) appear to elicit pupil responses. Notably, this response is driven by pattern changes and not merely deviant detection. However, stimuli containing pattern emergences (precision increase) evoke no measurable pupil response; this is not due to pre-transition pupillary saturation, as transitions from random patterns to repeating single tones (random-to-repeating) evoke transient pupil dilation. Only when subjects actively reported changes by button press did random-to-regular transitions evoke pupil dilations. Investigating the effect of task on evoked pupil responses found no response if subjects were not continuously tracking the sequences, e.g. with attention directed to visual or tactile stimuli.
Multiple self-replications of these findings provide robust evidence that NE release acts as an automatic switch, resetting the brain’s internal model of the sensory environment and demonstrating that the unexpected uncertainty signalling process operates over much faster timescales than previously known, implicating NE in the fundamental bases of perception.
Early visual processing in ageing and Alzheimer's disease.
SIGLE. Available from British Library Document Supply Centre-DSC:DXN029928 / BLDSC - British Library Document Supply Centre. GB. United Kingdo
The robustness of echoic log-surprise auditory saliency detection
The concept of saliency describes how relevant a stimulus is for humans. This phenomenon has been studied under different perspectives and modalities, such as audio, visual, or both. It has been employed in intelligent systems to interact with their environment in an attempt to emulate or even outperform human behavior in tasks such as surveillance and alarm systems or even robotics. In this paper, we focus on the aural modality and our goal consists in measuring the robustness of Echoic log-surprise in comparison with a set of auditory saliency techniques when tested in noisy environments for the task of saliency detection. The acoustic saliency methods that we have analyzed include Kalinli's saliency model, Bayesian log-surprise, and our proposed algorithm, Echoic log-surprise. This last method combines an unsupervised approach based on the Bayesian log-surprise and the biological concept of echoic or auditory sensory memory by means of a statistical fusion scheme, where the use of different distance metrics or statistical divergences, such as Renyi's or Jensen-Shannon's among others, is considered. Additionally, for comparison purposes, we have also evaluated some classical onset detection techniques, such as those based on voice activity detection or energy thresholding. Results show that Echoic log-surprise outperforms the detection capabilities of the rest of the techniques analyzed in this paper under a great variety of noises and signal-to-noise ratios, corroborating its robustness in noisy environments. In particular, our algorithm with the Jensen-Shannon fusion scheme produces the best F-scores. With the aim of better understanding the behavior of Echoic log-surprise, we have also studied the influence of its control parameters, depth and memory, at different noise levels. This work is partially supported by the Spanish Government-MinECo projects TEC2014-5390-P and TEC2017-84395-P. Publicad
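Robustness tests of this kind typically require mixing a clean recording with noise at a chosen signal-to-noise ratio. The sketch below shows the standard power-based scaling for doing so; it is a generic illustration rather than the authors' evaluation code, and the function names and toy signals are assumptions.

```python
import math

def mean_power(samples):
    """Average power of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)

def mix_at_snr(signal, noise, snr_db):
    """Scale `noise` so that signal power / noise power equals `snr_db`, then add it.

    Assumes both sequences have the same length and the noise is not silent.
    """
    target_ratio = 10 ** (snr_db / 10)  # linear power ratio
    gain = math.sqrt(mean_power(signal) / (mean_power(noise) * target_ratio))
    return [s + gain * n for s, n in zip(signal, noise)]

def measured_snr_db(signal, mixed):
    """SNR in dB of a mixture, given the clean signal it was built from."""
    residual = [m - s for m, s in zip(mixed, signal)]
    return 10 * math.log10(mean_power(signal) / mean_power(residual))

# Hypothetical clean signal and noise, mixed at 10 dB SNR.
clean = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
mixed = mix_at_snr(clean, noise, snr_db=10)
```

Sweeping `snr_db` over several values and re-running a detector on each mixture gives the kind of performance-versus-noise comparison the abstract describes.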
Attention Restraint, Working Memory Capacity, and Mind Wandering: Do Emotional Valence or Intentionality Matter?
Attention restraint appears to mediate the relationship between working memory capacity (WMC) and mind wandering (Kane et al., 2016). Prior work has identified two dimensions of mind wandering: emotional valence and intentionality. However, less is known about how WMC and attention restraint correlate with these dimensions. The current study examined the relationship between WMC, attention restraint, and mind wandering by emotional valence and intentionality. A confirmatory factor analysis demonstrated that WMC and attention restraint were strongly correlated, but only attention restraint was related to overall mind wandering, consistent with prior findings. However, when examining the emotional valence of mind wandering, attention restraint and WMC were related to negatively and positively valenced, but not neutral, mind wandering. Attention restraint was also related to intentional but not unintentional mind wandering. These results suggest that WMC and attention restraint predict some, but not all, types of mind wandering.
Modelling Learning to Count in Humanoid Robots
In reference to IEEE copyrighted material which is used with permission in this thesis, the IEEE does not endorse any of Plymouth University's products or services. Internal or personal use of this material is permitted. If interested in reprinting/republishing IEEE copyrighted material for advertising or promotional purposes or for creating new collective works for resale or redistribution, please go to http://www.ieee.org/publications_standards/publications/rights/rights_link.html to learn how to obtain a License from RightsLink.
This thesis concerns the formulation of novel developmental robotics models of embodied phenomena in number learning. Learning to count is believed to be of paramount importance for the acquisition of the remarkable fluency with which humans are able to manipulate numbers and other abstract concepts derived from them later in life. The ever-increasing amount of evidence for the embodied nature of human mathematical thinking suggests that the investigation of numerical cognition with the use of robotic cognitive models has a high potential of contributing toward the better understanding of the involved mechanisms. This thesis focuses on two particular groups of embodied effects tightly linked with learning to count. The first considered phenomenon is the contribution of the counting gestures to the counting accuracy of young children during the period of their acquisition of the skill. The second phenomenon, which arises over a longer time scale, is the human tendency to internally associate numbers with space, which results in, among other effects, the widely studied SNARC effect. The PhD research contributes to the knowledge in the subject by formulating novel neuro-robotic cognitive models of these phenomena, and by employing these in two series of simulation experiments.
In the context of the counting gestures the simulations provide evidence for the importance of learning the number words prior to learning to count, for the usefulness of the proprioceptive information connected with gestures to improving counting accuracy, and for the significance of the spatial correspondence between the indicative acts and the objects being enumerated. In the context of the model of spatial-numerical associations the simulations demonstrate for the first time that these may arise as a consequence of the consistent spatial biases present when children are learning to count. Finally, based on the experience gathered throughout both modelling experiments, specific guidelines concerning future efforts in the application of robotic modelling in mathematical cognition are formulated.
This research has been supported by the EU project RobotDoC (235065) from the FP7 Marie Curie Actions ITN.
Methods in prosody
This book presents a collection of pioneering papers reflecting current methods in prosody research with a focus on Romance languages. The rapid expansion of the field of prosody research in the last decades has given rise to a proliferation of methods that has left little room for the critical assessment of these methods. The aim of this volume is to bridge this gap by embracing original contributions, in which experts in the field assess, reflect, and discuss different methods of data gathering and analysis. The book might thus be of interest to scholars and established researchers as well as to students and young academics who wish to explore the topic of prosody, an expanding and promising area of study