Recognizing human activity using RGBD data
Traditional computer vision algorithms try to understand the world using visible-light cameras, but this type of data source has inherent limitations. First, visible-light images are sensitive to illumination changes and background clutter. Second, the 3D structural information of the scene is lost when the 3D world is projected onto 2D images, and recovering that information from 2D images is a challenging problem. Range sensors, which capture the 3D characteristics of a scene, have existed for over thirty years. However, earlier range sensors were too expensive, difficult to use in human environments, slow at acquiring data, or poor at estimating distance. Recently, easy access to RGBD data at real-time frame rates has been leading to a revolution in perception and has inspired much new research using RGBD data. I propose algorithms to detect persons and understand their activities using RGBD data. I demonstrate that the solutions to many computer vision problems may be improved with the added depth channel. The 3D structural information may make it faster and easier to obtain algorithms with real-time and view-invariant properties. When both data sources are available, the features extracted from the depth channel may be combined with traditional features computed from the RGB channels to build more robust systems with enhanced recognition abilities, which may be able to deal with more challenging scenarios. As a starting point, the first problem is to find persons in various poses in the scene, whether moving or static. Localizing humans in RGB images is limited by lighting conditions and background clutter. Depth images offer alternative ways to find the humans in the scene. In the past, detection of humans from range data was usually achieved by tracking, which does not work for indoor person detection.
In this thesis, I propose a model-based approach to detect persons using the structural information embedded in the depth image. I propose a 2D head contour model and a 3D head surface model to look for the head-shoulder part of the person. Then, a segmentation scheme is proposed to segment the full human body from the background and extract its contour. I also give a tracking algorithm based on the detection result. I further investigate the recognition of human actions and activities, and propose two features for this purpose. The first feature is drawn from the skeletal joint locations estimated from a depth image. It is a compact representation of the human posture called histograms of 3D joint locations (HOJ3D). This representation is view-invariant, and the whole algorithm runs in real time. This feature may benefit many applications that need a fast estimate of the posture and action of the human subject. The second feature is a spatio-temporal feature for depth video called the Depth Cuboid Similarity Feature (DCSF). Interest points are extracted using an algorithm that effectively suppresses noise and finds salient human motions. DCSF is extracted around each interest point and forms the description of the video content. This descriptor can be used to recognize activities with no dependence on skeleton information or pre-processing steps such as motion segmentation, tracking, or even image de-noising or hole-filling. It is therefore more flexible and widely applicable to many scenarios. Finally, all the features developed herein are combined to solve a novel problem: first-person human activity recognition using RGBD data. Traditional activity recognition algorithms focus on recognizing activities from a third-person perspective. I propose to recognize activities from a first-person perspective with RGBD data.
This task is novel and extremely challenging because of the large amount of camera motion, caused either by self-exploration or by the response to the interaction. To describe the first-person videos, I extract 3D optical flow features as motion descriptors, 3D skeletal joint features as posture descriptors, and spatio-temporal features as local appearance descriptors. To address the ego-motion of the camera, I propose an attention mask that guides the recognition procedure and separates the features in the ego-motion region from those in the independent-motion region. The 3D features are very useful for summarizing the discriminative information of the activities. In addition, combining the 3D features with existing 2D features brings more robust recognition results and makes the algorithm capable of dealing with more challenging cases.
Electrical and Computer Engineerin
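The HOJ3D idea described in this abstract can be illustrated with a minimal Python sketch. It assumes, beyond what the abstract states, that the y axis points up, that joints are expressed in spherical coordinates around a hip-centre joint, that the azimuth is measured from the left-hip to right-hip direction to obtain view invariance, and that each joint casts a single hard vote. The function name `hoj3d_sketch`, the joint names, and the 12 x 7 bin layout are illustrative choices, not the thesis's implementation.

```python
import math


def hoj3d_sketch(joints, origin="hip_center", ref_a="hip_left",
                 ref_b="hip_right", n_azimuth=12, n_inclination=7):
    """Histogram of 3D joint directions around `origin` (illustrative sketch).

    joints: dict mapping a joint name to an (x, y, z) tuple, with y pointing up.
    Returns a flat list of n_azimuth * n_inclination normalised bin counts.
    """
    ox, oy, oz = joints[origin]

    # Reference direction (assumed: left hip -> right hip), projected onto
    # the horizontal (x, z) plane. Measuring azimuth from this direction
    # cancels rotations of the subject about the vertical axis.
    ax, _, az = joints[ref_a]
    bx, _, bz = joints[ref_b]
    ref_angle = math.atan2(bz - az, bx - ax)

    hist = [0.0] * (n_azimuth * n_inclination)
    skipped = {origin, ref_a, ref_b}  # joints that only define the frame
    votes = 0
    for name, (x, y, z) in joints.items():
        if name in skipped:
            continue
        dx, dy, dz = x - ox, y - oy, z - oz
        radius = math.sqrt(dx * dx + dy * dy + dz * dz)
        if radius == 0.0:
            continue  # joint coincides with the origin, no direction
        azimuth = (math.atan2(dz, dx) - ref_angle) % (2.0 * math.pi)
        inclination = math.acos(max(-1.0, min(1.0, dy / radius)))  # 0..pi
        a_bin = min(int(azimuth / (2.0 * math.pi) * n_azimuth), n_azimuth - 1)
        i_bin = min(int(inclination / math.pi * n_inclination),
                    n_inclination - 1)
        hist[i_bin * n_azimuth + a_bin] += 1.0
        votes += 1

    # Normalise so the descriptor does not depend on the number of joints.
    if votes:
        hist = [h / votes for h in hist]
    return hist
```

Because the origin and the reference direction move with the subject, translating the skeleton or rotating it about the vertical axis leaves the histogram unchanged, provided no joint lies exactly on a bin boundary. Hard voting is used here only for brevity; it makes the descriptor sensitive to joints near bin edges.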
Design, implementation and evaluation of automated surveillance systems
Pattern recognition has reached a level of sophistication that allows us to recognize different types of events, including dangers, and to act accordingly to minimize the impact of a difficult situation and handle it in the best possible way. However, we believe that more efficient applications with more accurate algorithms can still be achieved. Our application seeks to incorporate the new programming paradigm of neural networks. Our initial idea was to explore the alternative offered by the new convolutional neural networks, whose example videos showed the high detection and identification rate that, for example, YOLOv2 could achieve. After comparing their characteristics, we found that YOLOv3 offered a good balance between accuracy and speed, as we discuss later. Because of the low detection rate, we use Kalman filters to help with the re-identification of people and objects. In this project we also survey the video surveillance alternatives available from companies in the sector and the kinds of products they offer, and we look at the work of research groups at other universities that most closely resembles our objective. We therefore use this neural network to detect events such as abandoned backpacks and to display the transit density at specific locations, and we use a more traditional methodology, optical flow, to detect abnormal behaviour in a crowd.
Automatic surveillance systems are becoming more and more sophisticated with the increasing computing
power that computers are reaching. The aim of this project is to take advantage of these tools and,
with the new classification and detection technology brought by neural networks, develop a surveillance
application that can recognize certain behaviours: detection of lost backpacks and suitcases,
detection of abnormal crowd activity, and a heatmap of occupation density. To develop this program,
Python was the selected programming language, with YOLO and OpenCV forming the backbone of
this project. Testing showed that, due to the constraints of detecting
small objects, the project does not perform as it should for real-world deployment, but it still shows potential
for the detection of lost backpacks in certain videos from the GBA dataset [1] and the PETS2006 dataset [2].
The abnormal activity detection for crowds uses a simple algorithm that seems to perform well,
detecting the anomalies in all the test data used, which was generated by the University of Minnesota [3].
Finally, the heatmap correctly displays the projection of people on the ground for five seconds, just as
intended. The objective of this software is to be part of the core of what could be a future application
with more modules that will be able to perform fully automated surveillance tasks and gather useful
information; these advances and future proposals are explained in this report.
Máster Universitario en Ingeniería Industrial (M141
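The abstract mentions Kalman filters as a way to cope with a low detection rate. As a rough illustration only, the sketch below tracks one coordinate of an object with a constant-velocity Kalman filter and keeps predicting through frames where the detector returns nothing. The class name `ConstantVelocityKalman1D`, the helper `track_1d`, the noise variances, and the constant-velocity motion model are all assumptions made for this example; the project's actual filter configuration is not given in the abstract.

```python
class ConstantVelocityKalman1D:
    """Kalman filter over the state [position, velocity] (illustrative)."""

    def __init__(self, position, process_var=1.0, measurement_var=4.0):
        self.position = float(position)
        self.velocity = 0.0
        # State covariance: position known to measurement accuracy,
        # velocity initially very uncertain (assumed values).
        self.P = [[measurement_var, 0.0], [0.0, 100.0]]
        self.q = process_var
        self.r = measurement_var

    def predict(self, dt=1.0):
        """Advance the state one step: x = F x, P = F P F^T + Q."""
        self.position += self.velocity * dt
        (p00, p01), (p10, p11) = self.P
        q = self.q
        # F = [[1, dt], [0, 1]]; Q follows a white-noise acceleration model.
        self.P = [
            [p00 + dt * (p01 + p10) + dt * dt * p11 + q * dt ** 4 / 4.0,
             p01 + dt * p11 + q * dt ** 3 / 2.0],
            [p10 + dt * p11 + q * dt ** 3 / 2.0,
             p11 + q * dt ** 2],
        ]
        return self.position

    def update(self, measured_position):
        """Correct the state with a position measurement (H = [1, 0])."""
        (p00, p01), (p10, p11) = self.P
        s = p00 + self.r                 # innovation variance
        k0, k1 = p00 / s, p10 / s        # Kalman gain
        residual = measured_position - self.position
        self.position += k0 * residual
        self.velocity += k1 * residual
        # P = (I - K H) P
        self.P = [
            [(1.0 - k0) * p00, (1.0 - k0) * p01],
            [p10 - k1 * p00, p11 - k1 * p01],
        ]
        return self.position


def track_1d(detections, dt=1.0):
    """Filter a sequence of positions; None marks a missed detection.

    The first entry must be a real detection, since it seeds the filter.
    """
    kf = ConstantVelocityKalman1D(detections[0])
    estimates = [kf.position]
    for z in detections[1:]:
        kf.predict(dt)
        if z is not None:
            kf.update(z)
        estimates.append(kf.position)
    return estimates
```

For a bounding-box centre, one such filter per axis would be a simple starting point. During missed detections the estimate follows the last estimated velocity, which is what allows a reappearing detection to be matched to the same track.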
Similarity learning for person re-identification and semantic video retrieval
Many computer vision problems boil down to the learning of a good visual similarity function that calculates a score of how likely two instances share the same semantic concept. In this thesis, we focus on two problems related to similarity learning: Person Re-Identification, and Semantic Video Retrieval.
Person Re-Identification aims to maintain the identity of an individual in diverse locations through different non-overlapping camera views. Starting with two cameras, we propose a novel visual word co-occurrence based appearance model to measure the similarities between pedestrian images. This model naturally accounts for spatial similarities and variations caused by pose, illumination and configuration changes across camera views. As a generalization to multiple camera views, we introduce the Group Membership Prediction (GMP) problem. The GMP problem involves predicting whether a collection of instances shares the same semantic property. In this context, we propose a novel probability model and introduce latent view-specific and view-shared random variables to jointly account for the view-specific appearance and cross-view similarities among data instances. Our method is tested on various benchmarks, demonstrating superior accuracy over the state of the art.
Semantic Video Retrieval seeks to match complex activities in a surveillance video to user-described queries. In surveillance scenarios, where noise and clutter are usually present, visual uncertainties introduced by error-prone low-level detectors, classifiers and trackers make up a significant part of the semantic gap between user-defined queries and the archived video. To bridge the gap, we propose a novel probabilistic activity localization formulation that incorporates learning of object attributes, between-object relationships, and object re-identification without activity-level training data. Our experiments demonstrate that the introduction of similarity learning components effectively compensates for noise and error in previous stages, and results in favorable performance on both aerial and ground surveillance videos.
Considering the computational complexity of our similarity learning models, we attempt to develop a way of training complicated models efficiently while retaining good performance. As a proof of concept, we propose training deep neural networks for supervised learning of hash codes. With slight changes in the optimization formulation, we could explore the possibilities of incorporating the training framework for Person Re-Identification and related problems.
2019-07-09T00:00:00
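To make the role of hash codes concrete, the sketch below shows one common way learned binary codes are used at retrieval time: real-valued network outputs are thresholded into bits, and gallery items are ranked by Hamming distance to the query code. This is an assumption about usage, not the thesis's training method; the function names `binarize`, `hamming_distance`, and `rank_gallery`, and the zero threshold, are hypothetical.

```python
def binarize(outputs):
    """Pack real-valued outputs into an integer code, one bit per output.

    Assumed convention: an output greater than zero maps to bit 1.
    The first output becomes the most significant bit.
    """
    code = 0
    for value in outputs:
        code = (code << 1) | (1 if value > 0 else 0)
    return code


def hamming_distance(code_a, code_b):
    """Number of bit positions in which two integer codes differ."""
    return bin(code_a ^ code_b).count("1")


def rank_gallery(query_code, gallery):
    """Return gallery identifiers sorted by Hamming distance to the query.

    gallery: dict mapping an identifier to its integer hash code.
    Ties keep the dictionary's insertion order because sorted() is stable.
    """
    return sorted(gallery,
                  key=lambda item_id: hamming_distance(query_code,
                                                       gallery[item_id]))
```

Comparing compact binary codes with XOR and a bit count is far cheaper than evaluating a full similarity model on every gallery image, which is the efficiency motivation the abstract alludes to.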