112 research outputs found
Recommended from our members
Recognizing human activity using RGBD data
textTraditional computer vision algorithms try to understand the world using visible light cameras. However, there are inherent limitations of this type of data source. First, visible light images are sensitive to illumination changes and background clutter. Second, the 3D structural information of the scene is lost when projecting the 3D world to 2D images. Recovering the 3D information from 2D images is a challenging problem. Range sensors have existed for over thirty years, which capture 3D characteristics of the scene. However, earlier range sensors were either too expensive, difficult to use in human environments, slow at acquiring data, or provided a poor estimation of distance. Recently, the easy access to the RGBD data at real-time frame rate is leading to a revolution in perception and inspired many new research using RGBD data. I propose algorithms to detect persons and understand the activities using RGBD data. I demonstrate the solutions to many computer vision problems may be improved with the added depth channel. The 3D structural information may give rise to algorithms with real-time and view-invariant properties in a faster and easier fashion. When both data sources are available, the features extracted from the depth channel may be combined with traditional features computed from RGB channels to generate more robust systems with enhanced recognition abilities, which may be able to deal with more challenging scenarios. As a starting point, the first problem is to find the persons of various poses in the scene, including moving or static persons. Localizing humans from RGB images is limited by the lighting conditions and background clutter. Depth image gives alternative ways to find the humans in the scene. In the past, detection of humans from range data is usually achieved by tracking, which does not work for indoor person detection. In this thesis, I propose a model based approach to detect the persons using the structural information embedded in the depth image. I propose a 2D head contour model and a 3D head surface model to look for the head-shoulder part of the person. Then, a segmentation scheme is proposed to segment the full human body from the background and extract the contour. I also give a tracking algorithm based on the detection result. I further research on recognizing human actions and activities. I propose two features for recognizing human activities. The first feature is drawn from the skeletal joint locations estimated from a depth image. It is a compact representation of the human posture called histograms of 3D joint locations (HOJ3D). This representation is view-invariant and the whole algorithm runs at real-time. This feature may benefit many applications to get a fast estimation of the posture and action of the human subject. The second feature is a spatio-temporal feature for depth video, which is called Depth Cuboid Similarity Feature (DCSF). The interest points are extracted using an algorithm that effectively suppresses the noise and finds salient human motions. DCSF is extracted centered on each interest point, which forms the description of the video contents. This descriptor can be used to recognize the activities with no dependence on skeleton information or pre-processing steps such as motion segmentation, tracking, or even image de-noising or hole-filling. It is more flexible and widely applicable to many scenarios. Finally, all the features herein developed are combined to solve a novel problem: first-person human activity recognition using RGBD data. Traditional activity recognition algorithms focus on recognizing activities from a third-person perspective. I propose to recognize activities from a first-person perspective with RGBD data. This task is very novel and extremely challenging due to the large amount of camera motion either due to self exploration or the response of the interaction. I extracted 3D optical flow features as the motion descriptor, 3D skeletal joints features as posture descriptors, spatio-temporal features as local appearance descriptors to describe the first-person videos. To address the ego-motion of the camera, I propose an attention mask to guide the recognition procedures and separate the features on the ego-motion region and independent-motion region. The 3D features are very useful at summarizing the discerning information of the activities. In addition, the combination of the 3D features with existing 2D features brings more robust recognition results and make the algorithm capable of dealing with more challenging cases.Electrical and Computer Engineerin
SDR-GAIN: A High Real-Time Occluded Pedestrian Pose Completion Method for Autonomous Driving
To mitigate the challenges arising from partial occlusion in human pose
keypoint based pedestrian detection methods , we present a novel pedestrian
pose keypoint completion method called the separation and dimensionality
reduction-based generative adversarial imputation networks (SDR-GAIN) .
Firstly, we utilize OpenPose to estimate pedestrian poses in images. Then, we
isolate the head and torso keypoints of pedestrians with incomplete keypoints
due to occlusion or other factors and perform dimensionality reduction to
enhance features and further unify feature distribution. Finally, we introduce
two generative models based on the generative adversarial networks (GAN)
framework, which incorporate Huber loss, residual structure, and L1
regularization to generate missing parts of the incomplete head and torso pose
keypoints of partially occluded pedestrians, resulting in pose completion. Our
experiments on MS COCO and JAAD datasets demonstrate that SDR-GAIN outperforms
basic GAIN framework, interpolation methods PCHIP and MAkima, machine learning
methods k-NN and MissForest in terms of pose completion task. In addition, the
runtime of SDR-GAIN is approximately 0.4ms, displaying high real-time
performance and significant application value in the field of autonomous
driving
Automatic visual detection of human behavior: a review from 2000 to 2014
Due to advances in information technology (e.g., digital video cameras, ubiquitous sensors), the automatic detection of human behaviors from video is a very recent research topic. In this paper, we perform a systematic and recent literature review on this topic, from 2000 to 2014, covering a selection of 193 papers that were searched from six major scientific publishers. The selected papers were classified into three main subjects: detection techniques, datasets and applications. The detection techniques were divided into four categories (initialization, tracking, pose estimation and recognition). The list of datasets includes eight examples (e.g., Hollywood action). Finally, several application areas were identified, including human detection, abnormal activity detection, action recognition, player modeling and pedestrian detection. Our analysis provides a road map to guide future research for designing automatic visual human behavior detection systems.This work is funded by the Portuguese Foundation for Science and Technology (FCT - Fundacao para a Ciencia e a Tecnologia) under research Grant SFRH/BD/84939/2012
Human Pose Estimation from Monocular Images : a Comprehensive Survey
Human pose estimation refers to the estimation of the location of body parts and how they are connected in an image. Human pose estimation from monocular images has wide applications (e.g., image indexing). Several surveys on human pose estimation can be found in the literature, but they focus on a certain category; for example, model-based approaches or human motion analysis, etc. As far as we know, an overall review of this problem domain has yet to be provided. Furthermore, recent advancements based on deep learning have brought novel algorithms for this problem. In this paper, a comprehensive survey of human pose estimation from monocular images is carried out including milestone works and recent advancements. Based on one standard pipeline for the solution of computer vision problems, this survey splits the problema into several modules: feature extraction and description, human body models, and modelin methods. Problem modeling methods are approached based on two means of categorization in this survey. One way to categorize includes top-down and bottom-up methods, and another way includes generative and discriminative methods. Considering the fact that one direct application of human pose estimation is to provide initialization for automatic video surveillance, there are additional sections for motion-related methods in all modules: motion features, motion models, and motion-based methods. Finally, the paper also collects 26 publicly available data sets for validation and provides error measurement methods that are frequently used
Efficient Pedestrian Detection in Urban Traffic Scenes
Pedestrians are important participants in urban traffic environments, and thus act as an interesting category of objects for autonomous cars. Automatic pedestrian detection is an essential task for protecting pedestrians from collision. In this thesis, we investigate and develop novel approaches by interpreting spatial and temporal characteristics of pedestrians, in three different aspects: shape, cognition and motion. The special up-right human body shape, especially the geometry of the head and shoulder area, is the most discriminative characteristic for pedestrians from other object categories. Inspired by the success of Haar-like features for detecting human faces, which also exhibit a uniform shape structure, we propose to design particular Haar-like features for pedestrians. Tailored to a pre-defined statistical pedestrian shape model, Haar-like templates with multiple modalities are designed to describe local difference of the shape structure. Cognition theories aim to explain how human visual systems process input visual signals in an accurate and fast way. By emulating the center-surround mechanism in human visual systems, we design multi-channel, multi-direction and multi-scale contrast features, and boost them to respond to the appearance of pedestrians. In this way, our detector is considered as a top-down saliency system. In the last part of this thesis, we exploit the temporal characteristics for moving pedestrians and then employ motion information for feature design, as well as for regions of interest (ROIs) selection. Motion segmentation on optical flow fields enables us to select those blobs most probably containing moving pedestrians; a combination of Histogram of Oriented Gradients (HOG) and motion self difference features further enables robust detection. We test our three approaches on image and video data captured in urban traffic scenes, which are rather challenging due to dynamic and complex backgrounds. The achieved results demonstrate that our approaches reach and surpass state-of-the-art performance, and can also be employed for other applications, such as indoor robotics or public surveillance. In this thesis, we investigate and develop novel approaches by interpreting spatial and temporal characteristics of pedestrians, in three different aspects: shape, cognition and motion. The special up-right human body shape, especially the geometry of the head and shoulder area, is the most discriminative characteristic for pedestrians from other object categories. Inspired by the success of Haar-like features for detecting human faces, which also exhibit a uniform shape structure, we propose to design particular Haar-like features for pedestrians. Tailored to a pre-defined statistical pedestrian shape model, Haar-like templates with multiple modalities are designed to describe local difference of the shape structure. Cognition theories aim to explain how human visual systems process input visual signals in an accurate and fast way. By emulating the center-surround mechanism in human visual systems, we design multi-channel, multi-direction and multi-scale contrast features, and boost them to respond to the appearance of pedestrians. In this way, our detector is considered as a top-down saliency system. In the last part of this thesis, we exploit the temporal characteristics for moving pedestrians and then employ motion information for feature design, as well as for regions of interest (ROIs) selection. Motion segmentation on optical flow fields enables us to select those blobs most probably containing moving pedestrians; a combination of Histogram of Oriented Gradients (HOG) and motion self difference features further enables robust detection. We test our three approaches on image and video data captured in urban traffic scenes, which are rather challenging due to dynamic and complex backgrounds. The achieved results demonstrate that our approaches reach and surpass state-of-the-art performance, and can also be employed for other applications, such as indoor robotics or public surveillance
Human Pose Tracking from Monocular Image Sequences
This thesis proposes various novel approaches for improving the performance of automatic 2D human pose tracking system including multi-scale strategy, mid-level spatial dependencies to constrain more relations of multiple body parts, additional constraints between symmetric body parts and the left/right confusion correction by a head orientation estimator. These proposed approaches are employed to develop a complete human pose tracking system. The experimental results demonstrate significant improvements of all the proposed approaches towards accuracy and efficiency
CAMBADA@Home: deteção e seguimento de humanos
Mestrado em Engenharia Electrónica e TelecomunicaçõesEste trabalho apresenta uma abordagem ao problema da deteção e seguimento
de humanos, usando uma câmara RGB-D. Existem soluções propostas
para este tipo de problema, no entanto, algumas são baseadas em
técnicas de extração de fundo ou outras e, como tal, necessitam que a
câmara se encontre numa posição estacionária. Com o sistema proposto,
a deteção e seguimento podem ser desempenhadas enquanto a câmara se
move, em tempo real.
O objetivo deste projeto é a implementação de um sistema de deteção
e seguimento de pessoas para o robô de serviço CAMBADA@Home, permitindo
assim o desenvolvimento de futuras aplicações na área da interação
humano-robô.
O sistema aqui descrito permite realizar deteção, classificação e monitorização de múltiplas pessoas. Na primeira etapa, regiões de interesse (ROIs)
são segmentadas através da análise do histograma da imagem de profundidade
seguido da utilização de um algoritmo de preenchimento. Na etapa
seguinte, cada região é classificada como humana ou não-humana através
de uma técnica de correspondência de modelos, baseada no algoritmo de
descida de gradiantes RPROP, com suporte para múltiplos modelos. A terceira
e última etapa permite a monitorização de várias pessoas, através de
um método de atribuição de identificadores únicos baseado em comparação
de histogramas, assim como estimação de pose e localização.
Os resultados obtidos em ambiente não controlado são encorajadores, com
altas taxas de deteção, e, em geral, os algoritmos de estimação de pose
e localização são executados como esperado. Para além disto, o projeto
CAMBADA@Home foi premiado com o primeiro lugar no Desafio Free Bots,
que teve lugar durante o campeonato nacional de robótica, Robótica 2013,
onde o robô provou ser capaz de executar rondas autónomas num ambiente
desconhecido enquanto detetava e monitorizava pessoas com as quais se
cruzava.This work presents an approach to the people detection and tracking problem,
using an RGB-D camera. While there are already solutions for this
problem, some are based on background extraction techniques or other,
which require the camera to be in a stationary position. With the proposed
method, detection and tracking can be performed while the camera is moving,
in real time.
The aim of this project is the implementation of a people detection and
tracking system for the CAMBADA@Home service robot, enabling the development
of further human-robot interaction applications.
The system here described enables object detection, classi cation and multiple
person tracking. In the rst stage, regions of interest (ROIs) are
segmented through the analysis of the depth image histogram and using a
ood ll algorithm. On the next stage, each region is classi ed as human
or not-human using a template matching technique, based on the RPROP
gradient descent algorithm, with support for multiple templates. The third
and last stage enables the tracking for multiple persons, using a unique
identi cation assignment method based on histogram comparison, as well
as pose and location estimation.
The results obtained in unconstrained environments are encouraging, with
high detection rates, and, in general, the algorithms for pose and location
estimation perform as expected. Furthermore the CAMBADA@Home
project has been awarded with the rst place in the Free Bots Challenge,
which took place on the Rob otica 2013 robotics national championship,
where the robot was proven to be capable of performing autonomous tours
in an unknown environment while at the same time detecting and tracking
people it came across
The Evolution of First Person Vision Methods: A Survey
The emergence of new wearable technologies such as action cameras and
smart-glasses has increased the interest of computer vision scientists in the
First Person perspective. Nowadays, this field is attracting attention and
investments of companies aiming to develop commercial devices with First Person
Vision recording capabilities. Due to this interest, an increasing demand of
methods to process these videos, possibly in real-time, is expected. Current
approaches present a particular combinations of different image features and
quantitative methods to accomplish specific objectives like object detection,
activity recognition, user machine interaction and so on. This paper summarizes
the evolution of the state of the art in First Person Vision video analysis
between 1997 and 2014, highlighting, among others, most commonly used features,
methods, challenges and opportunities within the field.Comment: First Person Vision, Egocentric Vision, Wearable Devices, Smart
Glasses, Computer Vision, Video Analytics, Human-machine Interactio
- …