Search CORE

112 research outputs found

Recommended from our members

Recognizing human activity using RGBD data

Author: Xia Lu, active 21st century
Publication date: 03/07/2014

Traditional computer vision algorithms try to understand the world using visible light cameras. However, this type of data source has inherent limitations. First, visible light images are sensitive to illumination changes and background clutter. Second, the 3D structural information of the scene is lost when the 3D world is projected onto 2D images, and recovering that information from 2D images is a challenging problem. Range sensors, which capture the 3D characteristics of a scene, have existed for over thirty years. However, earlier range sensors were either too expensive, difficult to use in human environments, slow at acquiring data, or provided poor distance estimates. Recently, easy access to RGBD data at real-time frame rates has led to a revolution in perception and inspired much new research using RGBD data. I propose algorithms to detect persons and understand their activities using RGBD data. I demonstrate that solutions to many computer vision problems may be improved by the added depth channel. The 3D structural information may make it faster and easier to build algorithms that run in real time and are view-invariant. When both data sources are available, features extracted from the depth channel may be combined with traditional features computed from the RGB channels to build more robust systems with enhanced recognition abilities, which may be able to handle more challenging scenarios. As a starting point, the first problem is to find persons in various poses in the scene, whether moving or static. Localizing humans in RGB images is limited by lighting conditions and background clutter. Depth images offer alternative ways to find the humans in the scene. In the past, detection of humans from range data was usually achieved by tracking, which does not work for indoor person detection.
In this thesis, I propose a model-based approach to detect persons using the structural information embedded in the depth image. I propose a 2D head contour model and a 3D head surface model to look for the head-shoulder part of the person. Then, a segmentation scheme is proposed to segment the full human body from the background and extract its contour. I also give a tracking algorithm based on the detection result. I further study the recognition of human actions and activities, for which I propose two features. The first feature is drawn from the skeletal joint locations estimated from a depth image. It is a compact representation of human posture called histograms of 3D joint locations (HOJ3D). This representation is view-invariant, and the whole algorithm runs in real time. This feature may benefit many applications that need a fast estimate of the posture and action of the human subject. The second feature is a spatio-temporal feature for depth video called the Depth Cuboid Similarity Feature (DCSF). Interest points are extracted using an algorithm that effectively suppresses noise and finds salient human motions. A DCSF is extracted centered on each interest point, and together these features form the description of the video contents. This descriptor can be used to recognize activities with no dependence on skeleton information or on pre-processing steps such as motion segmentation, tracking, or even image de-noising or hole-filling, which makes it more flexible and applicable to many scenarios. Finally, all the features developed herein are combined to solve a novel problem: first-person human activity recognition using RGBD data. Traditional activity recognition algorithms focus on recognizing activities from a third-person perspective. I propose to recognize activities from a first-person perspective with RGBD data.
This task is novel and extremely challenging because of the large amount of camera motion, caused either by self-exploration or by the response to an interaction. I extracted 3D optical flow features as motion descriptors, 3D skeletal joint features as posture descriptors, and spatio-temporal features as local appearance descriptors to describe the first-person videos. To address the ego-motion of the camera, I propose an attention mask that guides the recognition procedure and separates the features in the ego-motion region from those in the independent-motion region. The 3D features are very useful for summarizing the discriminative information of the activities. In addition, combining the 3D features with existing 2D features yields more robust recognition results and makes the algorithm capable of dealing with more challenging cases.
Electrical and Computer Engineering
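The HOJ3D descriptor is described above only at a high level. As a rough illustration, a minimal sketch of a histogram of 3D joint locations might bin each skeletal joint by its direction from a reference joint on a sphere. The joint names, the 12 x 7 azimuth/elevation partition, and the hard (one-bin) voting are all assumptions for illustration, not the thesis's exact design. Because positions are taken relative to the reference joint, this sketch is only translation-invariant; the view invariance claimed in the abstract would additionally require a body-aligned reference frame, which is omitted here.

```python
import math
from typing import Dict, List, Tuple

Point3D = Tuple[float, float, float]


def joint_direction_histogram(
    joints: Dict[str, Point3D],
    center: str = "hip_center",  # hypothetical joint name used as the origin
    n_azimuth: int = 12,         # assumed number of horizontal bins
    n_elevation: int = 7,        # assumed number of vertical bins
) -> List[float]:
    """Histogram of joint directions around a reference joint (HOJ3D-style sketch).

    Each non-reference joint is hard-assigned to one (elevation, azimuth) bin of a
    sphere centered on the reference joint; counts are normalized to sum to 1.
    """
    cx, cy, cz = joints[center]
    hist = [0.0] * (n_azimuth * n_elevation)
    for name, (x, y, z) in joints.items():
        if name == center:
            continue
        dx, dy, dz = x - cx, y - cy, z - cz
        r = math.sqrt(dx * dx + dy * dy + dz * dz)
        if r == 0.0:
            continue  # a joint at the origin has no direction
        # Azimuth in the horizontal x-z plane, mapped to [0, 2*pi).
        azimuth = math.atan2(dz, dx) % (2.0 * math.pi)
        # Elevation above the horizontal plane, in [-pi/2, pi/2] (y is "up").
        elevation = math.asin(max(-1.0, min(1.0, dy / r)))
        a_bin = min(int(azimuth / (2.0 * math.pi) * n_azimuth), n_azimuth - 1)
        e_bin = min(int((elevation + math.pi / 2.0) / math.pi * n_elevation), n_elevation - 1)
        hist[e_bin * n_azimuth + a_bin] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total > 0.0 else hist


skeleton = {
    "hip_center": (0.0, 0.0, 2.0),
    "head": (0.0, 0.6, 2.0),
    "left_hand": (-0.4, 0.2, 2.0),
    "right_hand": (0.4, 0.2, 2.0),
}
descriptor = joint_direction_histogram(skeleton)
print(len(descriptor))  # 84 bins = 12 azimuth x 7 elevation
```

Hard voting keeps the sketch short; a smoother descriptor could spread each joint's vote over neighboring bins so that small pose changes near a bin boundary do not flip the histogram.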

Texas ScholarWorks

SDR-GAIN: A High Real-Time Occluded Pedestrian Pose Completion Method for Autonomous Driving

Author: Fu Honghao
Publication date: 06/06/2023

To mitigate the challenges arising from partial occlusion in pedestrian detection methods based on human pose keypoints, we present a novel pedestrian pose keypoint completion method called separation and dimensionality reduction-based generative adversarial imputation networks (SDR-GAIN). First, we use OpenPose to estimate pedestrian poses in images. Then, for pedestrians whose keypoints are incomplete due to occlusion or other factors, we isolate the head and torso keypoints and perform dimensionality reduction to enhance features and further unify the feature distribution. Finally, we introduce two generative models based on the generative adversarial network (GAN) framework, which incorporate Huber loss, a residual structure, and L1 regularization to generate the missing parts of the incomplete head and torso pose keypoints of partially occluded pedestrians, resulting in pose completion. Our experiments on the MS COCO and JAAD datasets demonstrate that SDR-GAIN outperforms the basic GAIN framework, the interpolation methods PCHIP and MAkima, and the machine learning methods k-NN and MissForest on the pose completion task. In addition, the runtime of SDR-GAIN is approximately 0.4 ms, showing high real-time performance and significant application value in the field of autonomous driving.
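The abstract lists Huber loss among the ingredients of the SDR-GAIN generators without giving its exact form. In the original GAIN framework, the generator's reconstruction term is computed only on observed entries; a minimal sketch, assuming Huber loss simply replaces the squared error in that term, could look like the following. The `delta` threshold, the function names, and the flattened keypoint layout are assumptions, and this is not the SDR-GAIN loss itself.

```python
from typing import Sequence


def huber(residual: float, delta: float = 1.0) -> float:
    """Huber loss: quadratic for |r| <= delta, linear beyond it."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


def masked_huber_loss(
    pred: Sequence[float],
    target: Sequence[float],
    observed: Sequence[int],  # 1 = coordinate was observed, 0 = missing (to be imputed)
    delta: float = 1.0,       # assumed threshold, not taken from the paper
) -> float:
    """Mean Huber loss over observed coordinates only (GAIN-style reconstruction term)."""
    terms = [huber(p - t, delta) for p, t, m in zip(pred, target, observed) if m]
    return sum(terms) / len(terms) if terms else 0.0


# Flattened (x, y) coordinates of two keypoints; the second keypoint is occluded.
pred = [0.5, 2.0, 3.0, 4.0]
target = [0.0, 0.0, 9.0, 9.0]
observed = [1, 1, 0, 0]
print(masked_huber_loss(pred, target, observed))  # (0.125 + 1.5) / 2 = 0.8125
```

Huber loss behaves quadratically for small residuals and linearly for large ones, so a few badly estimated keypoints pull on the gradient less than they would under squared error.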

arXiv.org e-Print Archive

Automatic visual detection of human behavior: a review from 2000 to 2014

Author: Afsar Palwasha
Cortez Paulo
Santos Henrique
Publication venue: 'Elsevier BV'
Publication date: 01/10/2015

Due to advances in information technology (e.g., digital video cameras, ubiquitous sensors), the automatic detection of human behaviors from video is a very recent research topic. In this paper, we perform a systematic and recent literature review on this topic, from 2000 to 2014, covering a selection of 193 papers retrieved from six major scientific publishers. The selected papers were classified into three main subjects: detection techniques, datasets and applications. The detection techniques were divided into four categories (initialization, tracking, pose estimation and recognition). The list of datasets includes eight examples (e.g., Hollywood action). Finally, several application areas were identified, including human detection, abnormal activity detection, action recognition, player modeling and pedestrian detection. Our analysis provides a road map to guide future research on designing automatic visual human behavior detection systems.
This work is funded by the Portuguese Foundation for Science and Technology (FCT - Fundação para a Ciência e a Tecnologia) under research Grant SFRH/BD/84939/2012.

Universidade do Minho: RepositoriUM

Human Pose Estimation from Monocular Images: A Comprehensive Survey

Author: Bouwmans Thierry
Gong Wenjuan
Gonzàlez i Sabaté Jordi
Sobral Andrews
Tu Changhe
Zahzah El-hadi
Zhang Xuena
Publication venue: 'MDPI AG'
Publication date: 01/01/2016

Human pose estimation refers to the estimation of the locations of body parts and how they are connected in an image. Human pose estimation from monocular images has wide applications (e.g., image indexing). Several surveys on human pose estimation can be found in the literature, but each focuses on a certain category, for example model-based approaches or human motion analysis. As far as we know, an overall review of this problem domain has yet to be provided. Furthermore, recent advances based on deep learning have brought novel algorithms for this problem. In this paper, a comprehensive survey of human pose estimation from monocular images is carried out, including milestone works and recent advancements. Based on a standard pipeline for solving computer vision problems, this survey splits the problem into several modules: feature extraction and description, human body models, and modeling methods. Problem modeling methods are categorized in two ways in this survey: into top-down and bottom-up methods, and into generative and discriminative methods. Considering that one direct application of human pose estimation is to provide initialization for automatic video surveillance, there are additional sections on motion-related methods in all modules: motion features, motion models, and motion-based methods. Finally, the paper also collects 26 publicly available data sets for validation and describes frequently used error measurement methods.

Multidisciplinary Digital Publishing Institute

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Directory of Open Access Journals

PubMed Central

Dipòsit Digital de Documents de la UAB

Efficient Pedestrian Detection in Urban Traffic Scenes

Author: Zhang Shanshan
Publication venue: Universitäts- und Landesbibliothek Bonn

Pedestrians are important participants in urban traffic environments and are therefore an interesting object category for autonomous cars. Automatic pedestrian detection is an essential task for protecting pedestrians from collisions. In this thesis, we investigate and develop novel approaches by interpreting spatial and temporal characteristics of pedestrians in three different aspects: shape, cognition and motion. The distinctive upright human body shape, especially the geometry of the head and shoulder area, is the most discriminative characteristic distinguishing pedestrians from other object categories. Inspired by the success of Haar-like features for detecting human faces, which also exhibit a uniform shape structure, we propose Haar-like features designed specifically for pedestrians. Tailored to a pre-defined statistical pedestrian shape model, Haar-like templates with multiple modalities are designed to describe local differences in the shape structure. Cognition theories aim to explain how the human visual system processes input visual signals accurately and quickly. By emulating the center-surround mechanism of the human visual system, we design multi-channel, multi-direction and multi-scale contrast features and boost them to respond to the appearance of pedestrians. In this way, our detector can be regarded as a top-down saliency system. In the last part of this thesis, we exploit the temporal characteristics of moving pedestrians and employ motion information for feature design as well as for selecting regions of interest (ROIs). Motion segmentation on optical flow fields enables us to select the blobs most likely to contain moving pedestrians; a combination of Histogram of Oriented Gradients (HOG) and motion self-difference features then enables robust detection. We test our three approaches on image and video data captured in urban traffic scenes, which are rather challenging due to dynamic and complex backgrounds.
The achieved results demonstrate that our approaches reach and surpass state-of-the-art performance, and can also be employed for other applications, such as indoor robotics or public surveillance.
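The pedestrian-specific Haar-like templates built on the statistical shape model are not specified in the abstract. As background, a generic two-rectangle Haar-like feature evaluated with an integral image, the constant-time rectangle-sum technique used in Viola-Jones face detection, can be sketched as follows; the window geometry and function names are hypothetical.

```python
from typing import List

Image = List[List[float]]


def integral_image(img: Image) -> Image:
    """ii[y][x] = sum of img[r][c] for r < y and c < x (zero-padded first row/column)."""
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Image, x: int, y: int, w: int, h: int) -> float:
    """Sum of the w x h rectangle with top-left corner (x, y), using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_horizontal_edge(ii: Image, x: int, y: int, w: int, h: int) -> float:
    """Two-rectangle Haar-like feature: upper half minus lower half of the window."""
    half = h // 2
    return rect_sum(ii, x, y, w, half) - rect_sum(ii, x, y + half, w, half)


# Toy 4x4 patch: bright upper half, dark lower half.
patch = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
ii = integral_image(patch)
print(two_rect_horizontal_edge(ii, 0, 0, 4, 4))  # 8.0
```

Once the integral image is built, every rectangle sum costs four lookups regardless of its size, which is what makes evaluating many templates at many positions and scales affordable.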

bonndoc – Der Publikationsserver der Universität Bonn

Human Pose Tracking from Monocular Image Sequences

Author: Tian Jinglan
Publication venue: Curtin University
Publication date: 01/01/2016

This thesis proposes various novel approaches for improving the performance of an automatic 2D human pose tracking system, including a multi-scale strategy, mid-level spatial dependencies that constrain more relations among multiple body parts, additional constraints between symmetric body parts, and left/right confusion correction using a head orientation estimator. These approaches are employed to develop a complete human pose tracking system. The experimental results demonstrate that all the proposed approaches significantly improve accuracy and efficiency.

espace@Curtin

CAMBADA@Home: deteção e seguimento de humanos (human detection and tracking)

Author: Ferreira Luís Francisco Bento
Publication venue: Universidade de Aveiro
Publication date: 01/01/2013

Master's in Electronics and Telecommunications Engineering (Mestrado em Engenharia Electrónica e Telecomunicações)
This work presents an approach to the people detection and tracking problem using an RGB-D camera. While solutions to this problem already exist, some are based on background extraction or similar techniques, which require the camera to be stationary. With the proposed method, detection and tracking can be performed in real time while the camera is moving. The aim of this project is to implement a people detection and tracking system for the CAMBADA@Home service robot, enabling the development of further human-robot interaction applications. The system described here performs detection, classification and tracking of multiple people. In the first stage, regions of interest (ROIs) are segmented by analyzing the depth image histogram and then applying a flood fill algorithm. In the next stage, each region is classified as human or non-human using a template matching technique based on the RPROP gradient descent algorithm, with support for multiple templates. The third and last stage tracks multiple persons using a unique identification assignment method based on histogram comparison, together with pose and location estimation. The results obtained in unconstrained environments are encouraging, with high detection rates, and, in general, the pose and location estimation algorithms perform as expected.
Furthermore, the CAMBADA@Home project was awarded first place in the Free Bots Challenge, held at Robótica 2013, the national robotics championship, where the robot proved capable of performing autonomous tours in an unknown environment while detecting and tracking the people it came across.
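Only the flood-fill part of the ROI segmentation stage is sketched below; the depth-histogram analysis that precedes it in the described system is omitted. The sketch assumes a depth map in meters with 0 marking missing readings, 4-connectivity, and a hypothetical `max_step` threshold on the depth difference between neighboring pixels; the function name is likewise illustrative.

```python
from collections import deque
from typing import List, Tuple

Depth = List[List[float]]


def depth_flood_fill(depth: Depth, max_step: float = 0.05) -> Tuple[List[List[int]], int]:
    """Label 4-connected regions whose neighboring depths differ by at most max_step.

    Depth values <= 0 are treated as missing sensor readings and keep label 0.
    Returns (label image, number of regions).
    """
    h, w = len(depth), len(depth[0])
    labels = [[0] * w for _ in range(h)]
    n_regions = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] or depth[sy][sx] <= 0:
                continue  # already labeled, or no depth reading
            n_regions += 1
            labels[sy][sx] = n_regions
            queue = deque([(sy, sx)])
            while queue:  # breadth-first flood fill from the seed pixel
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and labels[ny][nx] == 0
                            and depth[ny][nx] > 0
                            and abs(depth[ny][nx] - depth[y][x]) <= max_step):
                        labels[ny][nx] = n_regions
                        queue.append((ny, nx))
    return labels, n_regions


# Toy depth map in meters: a near object (about 1 m) next to a far wall (about 3 m);
# the 0 is a missing reading.
depth = [
    [1.00, 1.00, 3.00],
    [1.02, 0.00, 3.00],
    [1.00, 1.00, 3.01],
]
labels, n = depth_flood_fill(depth)
print(n)  # 2
```

Each labeled region would then be a candidate passed to the human/non-human classification stage; comparing neighboring pixels (rather than each pixel against the seed) lets a region follow gently sloping surfaces such as a body seen at an angle.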

Repositório Institucional da Universidade de Aveiro

Vision-Based 2D and 3D Human Activity Recognition

Author: Holte Michael Boelstoft
Publication venue: Department of Architecture, Design & Media Technology, Aalborg University
Publication date: 01/01/2012

VBN

The Evolution of First Person Vision Methods: A Survey

Author: Betancourt Alejandro
Morerio Pietro
Rauterberg Matthias
Regazzoni Carlo S.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015

The emergence of new wearable technologies such as action cameras and smart glasses has increased the interest of computer vision scientists in the First Person perspective. Nowadays, this field is attracting the attention and investment of companies aiming to develop commercial devices with First Person Vision recording capabilities. Due to this interest, an increasing demand for methods to process these videos, possibly in real time, is expected. Current approaches present particular combinations of different image features and quantitative methods to accomplish specific objectives such as object detection, activity recognition, user-machine interaction and so on. This paper summarizes the evolution of the state of the art in First Person Vision video analysis between 1997 and 2014, highlighting, among others, the most commonly used features, methods, challenges and opportunities within the field.
Comment: First Person Vision, Egocentric Vision, Wearable Devices, Smart Glasses, Computer Vision, Video Analytics, Human-machine Interaction

arXiv.org e-Print Archive

CiteSeerX

Pure OAI Repository

Archivio istituzionale della ricerca - Università di Genova