428 research outputs found
A comparison of audio-based deep learning methods for detecting anomalous road events
Road surveillance systems play an important role in monitoring roads and safeguarding their users. Many of these systems are based on video streams acquired from urban video-surveillance infrastructures, from which it is possible to reconstruct the dynamics of accidents and detect other events. However, such systems may lack accuracy in adverse environmental settings: for instance, poor lighting, weather conditions, and occlusions can reduce the effectiveness of automatic detection and consequently increase the rate of false or missed alarms. These issues can be mitigated by integrating such solutions with audio analysis modules, which can improve the ability to recognize distinctive events such as car crashes. For this purpose, in this work we present a preliminary analysis of solutions based on Deep Learning techniques for the automatic identification of hazardous events through the analysis of audio spectrograms.
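As a rough illustration of the kind of input such models consume, the sketch below computes a log-magnitude spectrogram with plain NumPy. This is a minimal sketch, not the paper's pipeline: the frame length, hop size, and synthetic test signal are our own assumptions.

```python
import numpy as np

def log_spectrogram(signal, frame_len=512, hop=256, eps=1e-10):
    """Compute a log-magnitude spectrogram from a 1-D audio signal."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectrum = np.abs(np.fft.rfft(frames, axis=1))  # magnitude per frame
    return np.log(spectrum + eps)                   # (n_frames, frame_len//2 + 1)

# One second of synthetic audio at 16 kHz: a 1 kHz tone plus noise.
rng = np.random.default_rng(0)
audio = (np.sin(2 * np.pi * 1000 * np.arange(16000) / 16000)
         + 0.1 * rng.standard_normal(16000))
spec = log_spectrogram(audio)
print(spec.shape)  # → (61, 257)
```

A 2-D array of this shape can then be fed to an image-style classifier, which is the general idea behind spectrogram-based audio event detection.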
COMPARISON OF POTENTIAL ROAD ACCIDENT DETECTION ALGORITHMS FOR MODERN MACHINE VISION SYSTEM
Nowadays robotics is a rapidly developing industry. Robots are becoming more sophisticated, and this requires more sophisticated technologies. One of them is robot vision, which is needed for robots that perceive the environment through cameras instead of a batch of sensors. These data are used to analyze the situation at hand and develop a real-time action plan for the given scenario. This article explores the most suitable algorithm for detecting potential road accidents, specifically focusing on the scenario of turning left across one or more oncoming lanes. The selection of the optimal algorithm is based on a comparative analysis of evaluation and testing results, including metrics such as maximum frames per second for video processing during detection on the robot's hardware. The study categorises potential accidents into two classes: danger and not-danger. The YOLOv7 and Detectron2 algorithms are compared, and the article aims to create simple models with the potential for future refinement. The article also provides conclusions and recommendations regarding the practical implementation of the proposed models and algorithms.
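The frames-per-second metric used above can be measured with a small timing harness like the one below. This is a generic sketch, not the article's benchmark code; `dummy_detector` is a hypothetical stand-in for a real model such as YOLOv7 or Detectron2.

```python
import time
import numpy as np

def measure_fps(detector, frames, warmup=5):
    """Measure the average frames per second of a detection callable."""
    for f in frames[:warmup]:          # warm-up runs excluded from timing
        detector(f)
    start = time.perf_counter()
    for f in frames:
        detector(f)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed

# Stand-in "detector": thresholds mean brightness into danger / not-danger.
def dummy_detector(frame):
    return "danger" if frame.mean() > 0.5 else "not-danger"

frames = [np.random.rand(480, 640) for _ in range(50)]
fps = measure_fps(dummy_detector, frames)
print(f"{fps:.0f} FPS")
```

Using `time.perf_counter` and discarding warm-up runs avoids counting one-off initialization costs (model loading, cache warm-up) against the steady-state throughput.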
"You Tube and I Find" - personalizing multimedia content access
Recent growth in broadband access and the proliferation of small personal devices that capture images and videos have led to explosive growth of multimedia content available everywhere, from personal disks to the Web. While digital media capture and upload have become nearly universal with newer device technology, there is still a need for better tools and technologies to search large collections of multimedia data and to find and deliver the right content to a user according to her current needs and preferences. A renewed focus on the subjective dimension in the multimedia lifecycle, from creation and distribution to delivery and consumption, is required to address this need beyond what is feasible today. Integrating the subjective aspects of the media itself (its affective, perceptual, and physiological potential, both intended and achieved) with those of the users themselves will allow for personalizing content access beyond today's facility. This integration, transforming the traditional multimedia information retrieval (MIR) indexes to more effectively answer specific user needs, will allow a richer degree of personalization predicated on user intention and mode of interaction, relationship to the producer, content of the media, and the user's history and lifestyle. In this paper, we identify the challenges in achieving this integration, review current approaches to interpreting content creation processes, to user modelling and profiling, and to personalized content selection, and we detail future directions. The structure of the paper is as follows. In Section I, we introduce the problem and present some definitions. In Section II, we review the aspects of personalized content and current approaches to them. Section III discusses the problem of obtaining the metadata required for personalized media creation and presents eMediate as a case study of an integrated media capture environment.
Section IV presents the MAGIC system as a case study of capturing effective descriptive data and putting users first in distributed learning delivery. The aspects of modelling the user are presented as a case study in using a user's personality as a way to personalize summaries in Section V. Finally, Section VI concludes the paper with a discussion of the emerging challenges and open problems.
Exploring Sparse, Unstructured Video Collections of Places
The abundance of mobile devices and digital cameras with video capture makes it easy to obtain large collections of video clips that contain the same location, environment, or event. However, such an unstructured collection is difficult to comprehend and explore. We propose a system that analyses collections of unstructured but related video data to create a Videoscape: a data structure that enables interactive exploration of video collections by visually navigating — spatially and/or temporally — between different clips. We automatically identify transition opportunities, or portals. From these portals, we construct the Videoscape, a graph whose edges are video clips and whose nodes are portals between clips. Now structured, the videos can be interactively explored by walking the graph or by geographic map. Given this system, we gauge preference for different video transition styles in a user study, and generate heuristics that automatically choose an appropriate transition style. We evaluate our system using three further user studies, which allows us to conclude that Videoscapes provides significant benefits over related methods. Our system leads to previously unseen ways of interactive spatio-temporal exploration of casually captured videos, and we demonstrate this on several video collections
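The Videoscape described above is, structurally, a graph with portals as nodes and clips as edges. The sketch below is a toy illustration of that data structure only: the class name, portal labels, and clip identifiers are hypothetical, and the real system discovers portals automatically through visual matching rather than having them declared by hand.

```python
from collections import defaultdict

class Videoscape:
    """Toy Videoscape: nodes are portals (visually matching moments shared
    by two clips); edges are the video clips that connect portals."""

    def __init__(self):
        self.adj = defaultdict(list)      # portal -> [(portal, clip), ...]

    def add_clip(self, clip_id, portal_a, portal_b):
        # A clip links the portal where it can be entered to the one
        # where it can be exited (undirected in this sketch).
        self.adj[portal_a].append((portal_b, clip_id))
        self.adj[portal_b].append((portal_a, clip_id))

    def routes(self, start, goal):
        """All portal-to-portal routes, as clip sequences, without revisiting a portal."""
        paths, stack = [], [(start, [start], [])]
        while stack:
            node, visited, clips = stack.pop()
            if node == goal:
                paths.append(clips)
                continue
            for nxt, clip in self.adj[node]:
                if nxt not in visited:
                    stack.append((nxt, visited + [nxt], clips + [clip]))
        return paths

vs = Videoscape()
vs.add_clip("clip1", "fountain", "bridge")
vs.add_clip("clip2", "bridge", "market")
vs.add_clip("clip3", "fountain", "market")
paths = vs.routes("fountain", "market")
print(paths)  # → [['clip3'], ['clip1', 'clip2']]
```

Walking the graph then amounts to enumerating or choosing among such clip sequences, with a rendered transition played at each portal.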
Bayesian and echoic log-surprise for auditory saliency detection
International Mention in the doctoral degree.
Attention is defined as the mechanism that allows the brain to categorize and prioritize information acquired through our senses and to act according to the environmental context and the available mental resources. The attention mechanism can be subdivided into two types: top-down and bottom-up. Top-down attention is goal- or task-driven and implies that a participant has some previous knowledge about the task that he or she is trying to solve. Bottom-up attention, by contrast, depends only on the perceived features of the target object and its surroundings, and is a very fast mechanism that is believed to be crucial for human survival.
Bottom-up attention is commonly known as saliency or salience, and can be defined as a property of the signals perceived by our senses that makes them attentionally prominent for some reason.
This thesis concerns the detection of saliency in audio signals using automatic algorithms. In recent years, progress in visual saliency research has been remarkable; there, the goal is to detect which objects or content in a visual scene are prominent enough to capture a spectator's attention. However, this progress has not carried over to other modalities. This is the case for auditory saliency, where there is still no consensus on how to measure the saliency of an event, and consequently there are no specific labeled datasets on which to compare new algorithms and proposals.
In this work, two new auditory saliency detection algorithms are presented and evaluated. For their evaluation, we use Acoustic Event Detection/Classification datasets, whose labels include onset times among other aspects. We use such datasets and labeling because there is psychological evidence suggesting that human beings are quite sensitive to the spontaneous appearance of acoustic objects. We use three datasets: DCASE 2016 (Task 2), MIVIA road audio events, and UPC-TALP, totalling 3400 labeled acoustic events. The algorithms that we employ for benchmarking comprise the saliency detection techniques designed by Kayser and Kalinli, a voice activity detector, an energy thresholding method, and four music information retrieval onset detectors: NWPD, WPD, CD, and SF.
We put forward two auditory saliency algorithms: Bayesian Log-surprise and Echoic Log-surprise. The former is an evolution of Bayesian Surprise, a methodology that detects anomalous or salient events by means of the Kullback-Leibler divergence computed between two consecutive temporal windows. As the output Surprise signal has some drawbacks that should be overcome, we introduce improvements that lead to the approach we named Bayesian Log-surprise. These include an amplitude compression stage and the addition of perceptual knowledge to pre-process the input signal.
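The core surprise idea can be sketched as follows. This is a minimal illustration, not the thesis implementation: fitting a Gaussian to each window and compressing with `log1p` are our own simplifying assumptions, standing in for the thesis's full probabilistic model and perceptual pre-processing.

```python
import numpy as np

def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL divergence KL(p || q) between two univariate Gaussians."""
    return 0.5 * (np.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def log_surprise(signal, win=256, eps=1e-8):
    """Per-window surprise: KL between Gaussians fitted to consecutive
    windows, followed by log-amplitude compression."""
    n = len(signal) // win
    windows = signal[: n * win].reshape(n, win)
    mu, var = windows.mean(axis=1), windows.var(axis=1) + eps
    kl = gaussian_kl(mu[1:], var[1:], mu[:-1], var[:-1])
    return np.log1p(kl)   # compress the large dynamic range of raw surprise

# Quiet noise with a sudden loud burst: the burst should stand out.
rng = np.random.default_rng(1)
x = 0.01 * rng.standard_normal(4096)
x[2048:2304] += rng.standard_normal(256)   # salient acoustic event
s = log_surprise(x)
print(int(np.argmax(s)))  # → 7 (the transition into the loud window)
```

The surprise peaks exactly where the statistics of one window diverge from those of its predecessor, which is why spontaneous acoustic onsets score high.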
The latter, named Echoic Log-surprise, fuses several Bayesian Log-surprise signals computed with different memory lengths that represent different temporal scales. The fusion is performed using statistical divergences, resulting in saliency signals with certain advantages, such as a significant reduction in the background noise level and a noticeable increase in detection scores.
Moreover, since the original Echoic Log-surprise presents certain limitations, we propose a set of improvements: we test alternative statistical divergences, we introduce a new fusion strategy, and we replace the thresholding mechanism used to determine whether the final output signal is salient with a dynamic thresholding algorithm. Results show that the last modification is the most significant in terms of performance: it reduces the dispersion observed in the scores produced by the system and enables online operation.
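To illustrate why a dynamic threshold enables online operation, the sketch below flags samples against running statistics that are updated as data streams in. This is a generic exponential-moving-average scheme chosen for illustration; the thesis's actual dynamic thresholding algorithm is not reproduced here, and the parameters `alpha` and `k` are our assumptions.

```python
import numpy as np

def dynamic_threshold(signal, alpha=0.05, k=3.0):
    """Flag samples exceeding a running mean + k * running std.
    Statistics are exponential moving averages updated online, so the
    detector adapts to slow level changes and works in streaming mode."""
    mean, var = signal[0], 1e-6
    flags = []
    for x in signal:
        std = var ** 0.5
        flags.append(x > mean + k * std)            # decide, then update
        mean = (1 - alpha) * mean + alpha * x
        var = (1 - alpha) * var + alpha * (x - mean) ** 2
    return np.array(flags)

rng = np.random.default_rng(2)
s = 0.1 * rng.standard_normal(500)
s[300] = 3.0                      # isolated salient peak
flags = dynamic_threshold(s)
print(bool(flags[300]))  # → True
```

Unlike a fixed threshold tuned per file, the adaptive one tracks each recording's own noise floor, which is consistent with the reduced score dispersion reported above.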
Finally, our last analysis concerns the robustness of all the algorithms presented in this thesis against environmental noise. We use noises of different natures, from stationary noise to pre-recorded noises acquired in real environments such as cafeterias, train stations, etc. The results suggest that, across different signal-to-noise ratios, the most robust algorithm is Echoic Log-surprise, since its detection capabilities are the least influenced by noise.
Doctoral Programme in Multimedia and Communications, Universidad Carlos III de Madrid and Universidad Rey Juan Carlos. President: Fernando Díaz de María. Secretary: Rubén Solera Ureña. Committee member: José Luis Pérez Córdob
Comprehensive Survey and Analysis of Techniques, Advancements, and Challenges in Video-Based Traffic Surveillance Systems
The challenges inherent in video surveillance are compounded by several factors, such as dynamic lighting conditions, the coordination of object matching, diverse environmental scenarios, the tracking of heterogeneous objects, and coping with fluctuations in object poses, occlusions, and motion blur. This research endeavor undertakes a rigorous and in-depth analysis of deep-learning-oriented models used for object identification and tracking. Emphasizing the development of effective model design methodologies, this study furnishes an exhaustive analysis of object tracking and identification models within the specific domain of video surveillance.
Varieties of interaction: from User Experience to Neuroergonomics: On the occasion of the Human Factors and Ergonomics Society Europe Chapter Annual Meeting in Rome, Italy 2017
Proceedings of the HFES European Chapter conference