172 research outputs found

    Efficient Pedestrian Detection in Urban Traffic Scenes

    Get PDF
    Pedestrians are important participants in urban traffic environments, and thus act as an interesting category of objects for autonomous cars. Automatic pedestrian detection is an essential task for protecting pedestrians from collision. In this thesis, we investigate and develop novel approaches by interpreting spatial and temporal characteristics of pedestrians, in three different aspects: shape, cognition and motion. The special up-right human body shape, especially the geometry of the head and shoulder area, is the most discriminative characteristic for pedestrians from other object categories. Inspired by the success of Haar-like features for detecting human faces, which also exhibit a uniform shape structure, we propose to design particular Haar-like features for pedestrians. Tailored to a pre-defined statistical pedestrian shape model, Haar-like templates with multiple modalities are designed to describe local difference of the shape structure. Cognition theories aim to explain how human visual systems process input visual signals in an accurate and fast way. By emulating the center-surround mechanism in human visual systems, we design multi-channel, multi-direction and multi-scale contrast features, and boost them to respond to the appearance of pedestrians. In this way, our detector is considered as a top-down saliency system. In the last part of this thesis, we exploit the temporal characteristics for moving pedestrians and then employ motion information for feature design, as well as for regions of interest (ROIs) selection. Motion segmentation on optical flow fields enables us to select those blobs most probably containing moving pedestrians; a combination of Histogram of Oriented Gradients (HOG) and motion self difference features further enables robust detection. We test our three approaches on image and video data captured in urban traffic scenes, which are rather challenging due to dynamic and complex backgrounds. The achieved results demonstrate that our approaches reach and surpass state-of-the-art performance, and can also be employed for other applications, such as indoor robotics or public surveillance. In this thesis, we investigate and develop novel approaches by interpreting spatial and temporal characteristics of pedestrians, in three different aspects: shape, cognition and motion. The special up-right human body shape, especially the geometry of the head and shoulder area, is the most discriminative characteristic for pedestrians from other object categories. Inspired by the success of Haar-like features for detecting human faces, which also exhibit a uniform shape structure, we propose to design particular Haar-like features for pedestrians. Tailored to a pre-defined statistical pedestrian shape model, Haar-like templates with multiple modalities are designed to describe local difference of the shape structure. Cognition theories aim to explain how human visual systems process input visual signals in an accurate and fast way. By emulating the center-surround mechanism in human visual systems, we design multi-channel, multi-direction and multi-scale contrast features, and boost them to respond to the appearance of pedestrians. In this way, our detector is considered as a top-down saliency system. In the last part of this thesis, we exploit the temporal characteristics for moving pedestrians and then employ motion information for feature design, as well as for regions of interest (ROIs) selection. Motion segmentation on optical flow fields enables us to select those blobs most probably containing moving pedestrians; a combination of Histogram of Oriented Gradients (HOG) and motion self difference features further enables robust detection. We test our three approaches on image and video data captured in urban traffic scenes, which are rather challenging due to dynamic and complex backgrounds. The achieved results demonstrate that our approaches reach and surpass state-of-the-art performance, and can also be employed for other applications, such as indoor robotics or public surveillance

    Fast Uncertainty Estimation for Deep Learning Based Optical Flow

    Get PDF
    We present a novel approach to reduce the processing time required to derive the estimation uncertainty map in deep learning-based optical flow determination methods. Without uncertainty aware reasoning, the optical flow model, especially when it is used for mission critical fields such as robotics and aerospace, can cause catastrophic failures. Although several approaches such as the ones based on Bayesian neural networks have been proposed to handle this issue, they are computationally expensive. Thus, to speed up the processing time, our approach applies a generative model, which is trained by input images and an uncertainty map derived through a Bayesian approach. By using synthetically generated images of spacecraft, we demonstrate that the trained generative model can produce the uncertainty map 100∼700 times faster than the conventional uncertainty estimation method used for training the generative model itself. We also show that the quality of uncertainty map derived by the generative model is close to that of the original uncertainty map. By applying the proposed approach, the deep learning model operated in real-time can avoid disastrous failures by considering the uncertainty as well as achieving better performance removing uncertain portions of the prediction result

    Fast Uncertainty Estimation for Deep Learning Based Optical Flow

    Get PDF
    We present a novel approach to reduce the processing time required to derive the estimation uncertainty map in deep learning-based optical flow determination methods. Without uncertainty aware reasoning, the optical flow model, especially when it is used for mission critical fields such as robotics and aerospace, can cause catastrophic failures. Although several approaches such as the ones based on Bayesian neural networks have been proposed to handle this issue, they are computationally expensive. Thus, to speed up the processing time, our approach applies a generative model, which is trained by input images and an uncertainty map derived through a Bayesian approach. By using synthetically generated images of spacecraft, we demonstrate that the trained generative model can produce the uncertainty map 100∼700 times faster than the conventional uncertainty estimation method used for training the generative model itself. We also show that the quality of uncertainty map derived by the generative model is close to that of the original uncertainty map. By applying the proposed approach, the deep learning model operated in real-time can avoid disastrous failures by considering the uncertainty as well as achieving better performance removing uncertain portions of the prediction result

    Spatial and temporal background modelling of non-stationary visual scenes

    Get PDF
    PhDThe prevalence of electronic imaging systems in everyday life has become increasingly apparent in recent years. Applications are to be found in medical scanning, automated manufacture, and perhaps most significantly, surveillance. Metropolitan areas, shopping malls, and road traffic management all employ and benefit from an unprecedented quantity of video cameras for monitoring purposes. But the high cost and limited effectiveness of employing humans as the final link in the monitoring chain has driven scientists to seek solutions based on machine vision techniques. Whilst the field of machine vision has enjoyed consistent rapid development in the last 20 years, some of the most fundamental issues still remain to be solved in a satisfactory manner. Central to a great many vision applications is the concept of segmentation, and in particular, most practical systems perform background subtraction as one of the first stages of video processing. This involves separation of ‘interesting foreground’ from the less informative but persistent background. But the definition of what is ‘interesting’ is somewhat subjective, and liable to be application specific. Furthermore, the background may be interpreted as including the visual appearance of normal activity of any agents present in the scene, human or otherwise. Thus a background model might be called upon to absorb lighting changes, moving trees and foliage, or normal traffic flow and pedestrian activity, in order to effect what might be termed in ‘biologically-inspired’ vision as pre-attentive selection. This challenge is one of the Holy Grails of the computer vision field, and consequently the subject has received considerable attention. This thesis sets out to address some of the limitations of contemporary methods of background segmentation by investigating methods of inducing local mutual support amongst pixels in three starkly contrasting paradigms: (1) locality in the spatial domain, (2) locality in the shortterm time domain, and (3) locality in the domain of cyclic repetition frequency. Conventional per pixel models, such as those based on Gaussian Mixture Models, offer no spatial support between adjacent pixels at all. At the other extreme, eigenspace models impose a structure in which every image pixel bears the same relation to every other pixel. But Markov Random Fields permit definition of arbitrary local cliques by construction of a suitable graph, and 3 are used here to facilitate a novel structure capable of exploiting probabilistic local cooccurrence of adjacent Local Binary Patterns. The result is a method exhibiting strong sensitivity to multiple learned local pattern hypotheses, whilst relying solely on monochrome image data. Many background models enforce temporal consistency constraints on a pixel in attempt to confirm background membership before being accepted as part of the model, and typically some control over this process is exercised by a learning rate parameter. But in busy scenes, a true background pixel may be visible for a relatively small fraction of the time and in a temporally fragmented fashion, thus hindering such background acquisition. However, support in terms of temporal locality may still be achieved by using Combinatorial Optimization to derive shortterm background estimates which induce a similar consistency, but are considerably more robust to disturbance. A novel technique is presented here in which the short-term estimates act as ‘pre-filtered’ data from which a far more compact eigen-background may be constructed. Many scenes entail elements exhibiting repetitive periodic behaviour. Some road junctions employing traffic signals are among these, yet little is to be found amongst the literature regarding the explicit modelling of such periodic processes in a scene. Previous work focussing on gait recognition has demonstrated approaches based on recurrence of self-similarity by which local periodicity may be identified. The present work harnesses and extends this method in order to characterize scenes displaying multiple distinct periodicities by building a spatio-temporal model. The model may then be used to highlight abnormality in scene activity. Furthermore, a Phase Locked Loop technique with a novel phase detector is detailed, enabling such a model to maintain correct synchronization with scene activity in spite of noise and drift of periodicity. This thesis contends that these three approaches are all manifestations of the same broad underlying concept: local support in each of the space, time and frequency domains, and furthermore, that the support can be harnessed practically, as will be demonstrated experimentally

    Audio source separation for music in low-latency and high-latency scenarios

    Get PDF
    Aquesta tesi proposa mètodes per tractar les limitacions de les tècniques existents de separació de fonts musicals en condicions de baixa i alta latència. En primer lloc, ens centrem en els mètodes amb un baix cost computacional i baixa latència. Proposem l'ús de la regularització de Tikhonov com a mètode de descomposició de l'espectre en el context de baixa latència. El comparem amb les tècniques existents en tasques d'estimació i seguiment dels tons, que són passos crucials en molts mètodes de separació. A continuació utilitzem i avaluem el mètode de descomposició de l'espectre en tasques de separació de veu cantada, baix i percussió. En segon lloc, proposem diversos mètodes d'alta latència que milloren la separació de la veu cantada, gràcies al modelatge de components específics, com la respiració i les consonants. Finalment, explorem l'ús de correlacions temporals i anotacions manuals per millorar la separació dels instruments de percussió i dels senyals musicals polifònics complexes.Esta tesis propone métodos para tratar las limitaciones de las técnicas existentes de separación de fuentes musicales en condiciones de baja y alta latencia. En primer lugar, nos centramos en los métodos con un bajo coste computacional y baja latencia. Proponemos el uso de la regularización de Tikhonov como método de descomposición del espectro en el contexto de baja latencia. Lo comparamos con las técnicas existentes en tareas de estimación y seguimiento de los tonos, que son pasos cruciales en muchos métodos de separación. A continuación utilizamos y evaluamos el método de descomposición del espectro en tareas de separación de voz cantada, bajo y percusión. En segundo lugar, proponemos varios métodos de alta latencia que mejoran la separación de la voz cantada, gracias al modelado de componentes que a menudo no se toman en cuenta, como la respiración y las consonantes. Finalmente, exploramos el uso de correlaciones temporales y anotaciones manuales para mejorar la separación de los instrumentos de percusión y señales musicales polifónicas complejas.This thesis proposes specific methods to address the limitations of current music source separation methods in low-latency and high-latency scenarios. First, we focus on methods with low computational cost and low latency. We propose the use of Tikhonov regularization as a method for spectrum decomposition in the low-latency context. We compare it to existing techniques in pitch estimation and tracking tasks, crucial steps in many separation methods. We then use the proposed spectrum decomposition method in low-latency separation tasks targeting singing voice, bass and drums. Second, we propose several high-latency methods that improve the separation of singing voice by modeling components that are often not accounted for, such as breathiness and consonants. Finally, we explore using temporal correlations and human annotations to enhance the separation of drums and complex polyphonic music signals

    Predicting human behavior in smart environments: theory and application to gaze prediction

    Get PDF
    Predicting human behavior is desirable in many application scenarios in smart environments. The existing models for eye movements do not take contextual factors into account. This addressed in this thesis using a systematic machine-learning approach, where user profiles for eye movements behaviors are learned from data. In addition, a theoretical innovation is presented, which goes beyond pure data analysis. The thesis proposed the modeling of eye movements as a Markov Decision Processes. It uses Inverse Reinforcement Learning paradigm to infer the user eye movements behaviors

    Multimodal Computational Attention for Scene Understanding

    Get PDF
    Robotic systems have limited computational capacities. Hence, computational attention models are important to focus on specific stimuli and allow for complex cognitive processing. For this purpose, we developed auditory and visual attention models that enable robotic platforms to efficiently explore and analyze natural scenes. To allow for attention guidance in human-robot interaction, we use machine learning to integrate the influence of verbal and non-verbal social signals into our models

    World Modeling for Intelligent Autonomous Systems

    Get PDF
    The functioning of intelligent autonomous systems requires constant situation awareness and cognition analysis. Thus, it needs a memory structure that contains a description of the surrounding environment (world model) and serves as a central information hub. This book presents a row of theoretical and experimental results in the field of world modeling. This includes areas of dynamic and prior knowledge modeling, information fusion, management and qualitative/quantitative information analysis
    • …
    corecore