
    Sparse Coding on Symmetric Positive Definite Manifolds using Bregman Divergences

    This paper introduces sparse coding and dictionary learning for Symmetric Positive Definite (SPD) matrices, which are often used in machine learning, computer vision and related areas. Unlike traditional sparse coding schemes that work in vector spaces, we discuss how SPD matrices can be described by a sparse combination of dictionary atoms, where the atoms are also SPD matrices. We propose to seek sparse coding by embedding the space of SPD matrices into Hilbert spaces through two types of Bregman matrix divergences. This leads not only to an efficient way of performing sparse coding, but also to an online and iterative scheme for dictionary learning. We apply the proposed methods to several computer vision tasks where images are represented by region covariance matrices. Our proposed algorithms outperform state-of-the-art methods on a wide range of classification tasks, including face recognition, action recognition, material classification and texture categorization.
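    The abstract above does not name the two Bregman matrix divergences it uses. As a hedged illustration only, the sketch below computes the Stein divergence (also called the Jensen-Bregman LogDet divergence), a well-known symmetrized Bregman divergence on SPD matrices, S(X, Y) = log det((X + Y)/2) - (1/2) log det(XY). The function names `cholesky_logdet` and `stein_divergence` are hypothetical, and nothing here should be read as the paper's actual algorithm.

```python
import math


def cholesky_logdet(a):
    """Return log(det(a)) for an SPD matrix given as a list of lists.

    Uses a plain Cholesky factorization a = L L^T, so that
    log det(a) = 2 * sum(log(L[i][i])). Raises ValueError if the
    matrix is not positive definite.
    """
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    logdet = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(d)
                logdet += 2.0 * math.log(low[i][j])
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return logdet


def stein_divergence(x, y):
    """Stein (Jensen-Bregman LogDet) divergence between SPD matrices x and y."""
    n = len(x)
    mid = [[0.5 * (x[i][j] + y[i][j]) for j in range(n)] for i in range(n)]
    return cholesky_logdet(mid) - 0.5 * (cholesky_logdet(x) + cholesky_logdet(y))


if __name__ == "__main__":
    # Two toy 2x2 "covariance" matrices (illustrative values, not real data).
    x = [[2.0, 0.0], [0.0, 2.0]]
    y = [[1.0, 0.0], [0.0, 1.0]]
    print(stein_divergence(x, y))
```

    The divergence is zero when both arguments are equal and symmetric in its arguments, which is what makes it usable as a dissimilarity between an SPD data point and an SPD dictionary atom.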

    Exploring Human Vision Driven Features for Pedestrian Detection

    Motivated by the center-surround mechanism in the human visual attention system, we propose to use average contrast maps for pedestrian detection in street scenes, based on the observation that pedestrians exhibit discriminative contrast texture. Our main contributions are, first, the design of a local, statistical multi-channel descriptor that incorporates both color and gradient information. Second, we introduce a multi-direction and multi-scale contrast scheme based on grid-cells in order to integrate expressive local variations. To address the selection of the most discriminative features for assessment and classification, we perform extensive comparisons with respect to statistical descriptors, contrast measurements, and scale structures. In this way, we obtain reasonable results under various configurations. Empirical findings from applying our optimized detector on the INRIA and Caltech pedestrian datasets show that our features yield state-of-the-art performance in pedestrian detection. Comment: Accepted for publication in IEEE Transactions on Circuits and Systems for Video Technology (TCSVT).

    Going Deeper into Action Recognition: A Survey

    Understanding human actions in visual data is tied to advances in complementary research areas including object recognition, human dynamics, domain adaptation and semantic segmentation. Over the last decade, human action analysis has evolved from earlier schemes that were often limited to controlled environments to today's advanced solutions that can learn from millions of videos and apply to almost all daily activities. Given the broad range of applications, from video surveillance to human-computer interaction, scientific milestones in action recognition are achieved ever more rapidly, so that methods that used to work well become obsolete within a short time. This motivated us to provide a comprehensive review of the notable steps taken towards recognizing human actions. To this end, we start our discussion with the pioneering methods that use handcrafted representations, and then navigate into the realm of deep learning based approaches. We aim to remain objective throughout this survey, touching upon encouraging improvements as well as inevitable fallbacks, in the hope of raising fresh questions and motivating new research directions for the reader.

    Statistical region-based active contours for segmentation: an overview

    In this paper we present a brief survey of geometric variational approaches, and more precisely of statistical region-based active contours for medical image segmentation. In these approaches, image features are considered as random variables whose distribution may be either parametric, belonging to the exponential family, or non-parametric, estimated with a kernel density method. Statistical region-based terms are listed and reviewed, showing that these terms can describe a wide spectrum of segmentation problems. A shape prior can also be incorporated into these statistical terms. A discussion of some optimization schemes available to solve the variational problem is also provided. Examples on real medical images are given to illustrate some of the given criteria.

    Bayesian and echoic log-surprise for auditory saliency detection

    International Mention in the doctoral degree. Attention is defined as the mechanism that allows the brain to categorize and prioritize information acquired through our senses and to act according to the environmental context and the available mental resources. The attention mechanism can be further subdivided into two types: top-down and bottom-up. Top-down attention is goal- or task-driven and implies that a participant has some previous knowledge about the task that he or she is trying to solve. Bottom-up attention, in contrast, depends only on the perceived features of the target object and its surroundings, and is a very fast mechanism that is believed to be crucial for human survival. Bottom-up attention is commonly known as saliency or salience, and can be defined as a property of the signals perceived by our senses that makes them attentionally prominent for some reason. This thesis concerns saliency detection in audio signals using automatic algorithms. In recent years, progress has been remarkable in visual saliency research, where the goal is to detect which objects or content in a visual scene are prominent enough to capture the attention of a spectator. However, this progress has not carried over to other modalities. This is the case of auditory saliency, where there is still no consensus about how to measure the saliency of an event, and consequently there are no specific labeled datasets for comparing new algorithms and proposals. In this work two new auditory saliency detection algorithms are presented and evaluated. For their evaluation, we make use of Acoustic Event Detection/Classification datasets, whose labels include onset times among other aspects. We use such datasets and labeling since there is psychological evidence suggesting that human beings are quite sensitive to the spontaneous appearance of acoustic objects.
We use three datasets: DCASE 2016 (Task 2), MIVIA road audio events and UPC-TALP, totalling 3400 labeled acoustic events. The algorithms that we employ for benchmarking comprise the saliency detection techniques designed by Kayser and Kalinli, a voice activity detector, an energy thresholding method and four music information retrieval onset detectors: NWPD, WPD, CD and SF. We put forward two auditory saliency algorithms: Bayesian Log-surprise and Echoic Log-surprise. The former is an evolution of Bayesian Surprise, a methodology that detects anomalous or salient events by means of the Kullback-Leibler divergence computed between two consecutive temporal windows. As the output Surprise signal has some drawbacks, we introduce improvements that lead to the approach that we name Bayesian Log-surprise. These include an amplitude compression stage and the addition of perceptual knowledge to pre-process the input signal. The latter, named Echoic Log-surprise, fuses several Bayesian Log-surprise signals computed with different memory lengths that represent different temporal scales. The fusion process is performed using statistical divergences, resulting in saliency signals with certain advantages, such as a significant reduction in the background noise level and a noticeable increase in the detection scores. Moreover, since the original Echoic Log-surprise presents certain limitations, we propose a set of improvements: we test some alternative statistical divergences, we introduce a new fusion strategy, and we replace the static thresholding mechanism used to determine whether the final output signal is salient with a dynamic thresholding algorithm. Results show that the most significant modification in terms of performance is the latter, a proposal that reduces the dispersion observed in the scores produced by the system and enables online functioning.
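The core idea described above, a Kullback-Leibler divergence between two consecutive temporal windows followed by amplitude compression, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: each window is modeled as a univariate Gaussian, the divergence is taken from the current window to the previous one, and the compression is `log(1 + KL)`. The function names `gaussian_kl` and `log_surprise`, the `window` parameter and the toy signal are all hypothetical.

```python
import math
from statistics import fmean, pvariance


def gaussian_kl(mean_p, var_p, mean_q, var_q):
    """KL(P || Q) between two univariate Gaussians given means and variances."""
    return 0.5 * (
        math.log(var_q / var_p)
        + (var_p + (mean_p - mean_q) ** 2) / var_q
        - 1.0
    )


def log_surprise(signal, window, eps=1e-8):
    """Log-compressed KL divergence between consecutive non-overlapping windows.

    Assumptions (not from the source): Gaussian window model, KL direction
    current -> previous, and log1p as the amplitude compression stage.
    """
    scores = []
    for start in range(window, len(signal) - window + 1, window):
        prev = signal[start - window:start]
        curr = signal[start:start + window]
        # eps keeps the variance strictly positive for constant windows.
        m0, v0 = fmean(prev), pvariance(prev) + eps
        m1, v1 = fmean(curr), pvariance(curr) + eps
        kl = max(gaussian_kl(m1, v1, m0, v0), 0.0)
        scores.append(math.log1p(kl))
    return scores


if __name__ == "__main__":
    # Toy feature sequence: two quiet windows followed by a sudden loud one.
    toy = [0.0, 0.1] * 4 + [5.0, 5.1] * 2
    print(log_surprise(toy, window=4))
```

On the toy sequence, the score for the second quiet window is zero because its statistics match the previous window, while the onset of the loud segment produces a large score. Fusing several such signals computed with different window lengths, as Echoic Log-surprise is described as doing, is outside the scope of this sketch.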
Finally, our last analysis concerns the robustness of all the algorithms presented in this thesis against environmental noise. We use noises of different natures, from stationary white noise to pre-recorded noises acquired in real environments such as cafeterias and train stations. The results suggest that, across different signal-to-noise ratios, the most robust algorithm is Echoic Log-surprise, since its detection capabilities are the least influenced by noise.
Doctoral Program in Multimedia and Communications of Universidad Carlos III de Madrid and Universidad Rey Juan Carlos. Chair: Fernando Díaz de María. Secretary: Rubén Solera Ureña. Committee member: José Luis Pérez Córdob

    Machine Learning in Image Analysis and Pattern Recognition

    This book charts the progress in applying machine learning, including deep learning, to a broad range of image analysis and pattern recognition problems and applications. In this book, we have assembled original research articles making unique contributions to the theory, methodology and applications of machine learning in image analysis and pattern recognition.