2,545 research outputs found
Automatic visual detection of human behavior: a review from 2000 to 2014
Due to advances in information technology (e.g., digital video cameras, ubiquitous sensors), the automatic detection of human behaviors from video is a very recent research topic. In this paper, we perform a systematic and recent literature review on this topic, from 2000 to 2014, covering a selection of 193 papers that were searched from six major scientific publishers. The selected papers were classified into three main subjects: detection techniques, datasets and applications. The detection techniques were divided into four categories (initialization, tracking, pose estimation and recognition). The list of datasets includes eight examples (e.g., Hollywood action). Finally, several application areas were identified, including human detection, abnormal activity detection, action recognition, player modeling and pedestrian detection. Our analysis provides a road map to guide future research for designing automatic visual human behavior detection systems.This work is funded by the Portuguese Foundation for Science and Technology (FCT - Fundacao para a Ciencia e a Tecnologia) under research Grant SFRH/BD/84939/2012
Multi-Action Recognition via Stochastic Modelling of Optical Flow and Gradients
In this paper we propose a novel approach to multi-action recognition that
performs joint segmentation and classification. This approach models each
action using a Gaussian mixture using robust low-dimensional action features.
Segmentation is achieved by performing classification on overlapping temporal
windows, which are then merged to produce the final result. This approach is
considerably less complicated than previous methods which use dynamic
programming or computationally expensive hidden Markov models (HMMs). Initial
experiments on a stitched version of the KTH dataset show that the proposed
approach achieves an accuracy of 78.3%, outperforming a recent HMM-based
approach which obtained 71.2%
A Survey of Applications and Human Motion Recognition with Microsoft Kinect
Microsoft Kinect, a low-cost motion sensing device, enables users to interact with computers or game consoles naturally through gestures and spoken commands without any other peripheral equipment. As such, it has commanded intense interests in research and development on the Kinect technology. In this paper, we present, a comprehensive survey on Kinect applications, and the latest research and development on motion recognition using data captured by the Kinect sensor. On the applications front, we review the applications of the Kinect technology in a variety of areas, including healthcare, education and performing arts, robotics, sign language recognition, retail services, workplace safety training, as well as 3D reconstructions. On the technology front, we provide an overview of the main features of both versions of the Kinect sensor together with the depth sensing technologies used, and review literatures on human motion recognition techniques used in Kinect applications. We provide a classification of motion recognition techniques to highlight the different approaches used in human motion recognition. Furthermore, we compile a list of publicly available Kinect datasets. These datasets are valuable resources for researchers to investigate better methods for human motion recognition and lower-level computer vision tasks such as segmentation, object detection and human pose estimation
Recommended from our members
Generating 3D product design models in real-time using hand motion and gesture
This thesis was submitted for the degree of Master of Philosophy and awarded by Brunel University.Three dimensional product design models are widely used in conceptual design and in the early stage of prototyping during the design processes. A product design specification often demands a substantial amount of 3D models to be constructed within a short period of time. Current methods begin with designers sketching product concepts in 2D using pencil and paper, which in turn are then translated into 3D models by a design individual with CAD expertise, using a 3D modelling software package such as Pro Engineer, Solid Works, Auto CAD etc. Several novel methods have been used to incorporate hand motion as a way of interacting with computers. There are three main types of technology available to capture motion data, capable of translating human motion into numeric data which can be read by a computer system. The first being, hand gesture glove-based systems such as “Cyberglove”, these systems are generally used to capture hand gesture and joint angle information. The second is full body motion capture systems, optical and non-optical-based, and finally vision based gesture recognition systems which capture full degree of - freedom (DOF) hand motion estimation. There has yet to be a method using any of the above mentioned input devices to rapidly produce 3D product design models in real time, using hand motion and gestures. In this research, a novel method is presented, using a motion capture system to capture hand gestures and motion in real time, to recreate 3D curves and surfaces, which can be translated into 3D product design models. The main aim of this research is to develop a hand motion and gesture-based rapid 3D product modelling method, allowing designers to interactively sketch out 3D concepts in real time using a virtual workspace.
A database of a number of hand signs was built for both architectural hand signs (preliminary study) and Product Design hand signs. A marker set model with a total of eight markers (five on the left hand and three on right hand/marker pen) was designed and used in the capture of hand gestures with the use of an Optical Motion Capture System. A preliminary testing session was successfully completed to determine whether the Motion Capture system would be suitable for a real-time application, by effectively modelling a train station in an offline state using hand motion and gesture. An OpenGL software application was programmed using C++ and the Microsoft Foundation Classes which was used to communicate and pass information of captured motion from the EVaRT system to the user
On the 3D point cloud for human-pose estimation
This thesis aims at investigating methodologies for estimating a human pose from a 3D point cloud that is captured by a static depth sensor. Human-pose estimation (HPE) is important for a range of applications, such as human-robot interaction, healthcare, surveillance, and so forth. Yet, HPE is challenging because of the uncertainty in sensor measurements and the complexity of human poses. In this research, we focus on addressing challenges related to two crucial components in the estimation process, namely, human-pose feature extraction and human-pose modeling.
In feature extraction, the main challenge involves reducing feature ambiguity. We propose a 3D-point-cloud feature called viewpoint and shape feature histogram (VISH) to reduce feature ambiguity by capturing geometric properties of the 3D point cloud of a human. The feature extraction consists of three steps: 3D-point-cloud pre-processing, hierarchical structuring, and feature extraction. In the pre-processing step, 3D points corresponding to a human are extracted and outliers from the environment are removed to retain the 3D points of interest. This step is important because it allows us to reduce the number of 3D points by keeping only those points that correspond to the human body for further processing. In the hierarchical structuring, the pre-processed 3D point cloud is partitioned and replicated into a tree structure as nodes. Viewpoint feature histogram (VFH) and shape features are extracted from each node in the tree to provide a descriptor to represent each node. As the features are obtained based on histograms, coarse-level details are highlighted in large regions and fine-level details are highlighted in small regions. Therefore, the features from the point cloud in the tree can capture coarse level to fine level information to reduce feature ambiguity.
In human-pose modeling, the main challenges involve reducing the dimensionality of human-pose space and designing appropriate factors that represent the underlying probability distributions for estimating human poses. To reduce the dimensionality, we propose a non-parametric action-mixture model (AMM). It represents high-dimensional human-pose space using low-dimensional manifolds in searching human poses. In each manifold, a probability distribution is estimated based on feature similarity. The distributions in the manifolds are then redistributed according to the stationary distribution of a Markov chain that models the frequency of human actions. After the redistribution, the manifolds are combined according to a probability distribution determined by action classification. Experiments were conducted using VISH features as input to the AMM. The results showed that the overall error and standard deviation of the AMM were reduced by about 7.9% and 7.1%, respectively, compared with a model without action classification.
To design appropriate factors, we consider the AMM as a Bayesian network and propose a mapping that converts the Bayesian network to a neural network called NN-AMM. The proposed mapping consists of two steps: structure identification and parameter learning. In structure identification, we have developed a bottom-up approach to build a neural network while preserving the Bayesian-network structure. In parameter learning, we have created a part-based approach to learn synaptic weights by decomposing a neural network into parts. Based on the concept of distributed representation, the NN-AMM is further modified into a scalable neural network called NND-AMM. A neural-network-based system is then built by using VISH features to represent 3D-point-cloud input and the NND-AMM to estimate 3D human poses. The results showed that the proposed mapping can be utilized to design AMM factors automatically. The NND-AMM can provide more accurate human-pose estimates with fewer hidden neurons than both the AMM and NN-AMM can. Both the NN-AMM and NND-AMM can adapt to different types of input, showing the advantage of using neural networks to design factors
Portuguese sign language recognition via computer vision and depth sensor
Sign languages are used worldwide by a multitude of individuals. They are mostly used by the deaf communities and their teachers, or people associated with them by ties of friendship or family. Speakers are a minority of citizens, often segregated, and over the years not much attention has been given to this form of communication, even by the scientific community. In fact, in Computer Science there is some, but limited, research and development in this area. In the particular case of sign Portuguese Sign Language-PSL that fact is more evident and, to our knowledge there isn’t yet an efficient system to perform the automatic recognition of PSL signs. With the advent and wide spreading of devices such as depth sensors, there are new possibilities to address this problem.
In this thesis, we have specified, developed, tested and preliminary evaluated, solutions that we think will bring valuable contributions to the problem of Automatic Gesture Recognition, applied to Sign Languages, such as the case of Portuguese Sign Language.
In the context of this work, Computer Vision techniques were adapted to the case of Depth Sensors. A proper gesture taxonomy for this problem was proposed, and techniques for feature extraction, representation, storing and classification were presented. Two novel algorithms to solve the problem of real-time recognition of isolated static poses were specified, developed, tested and evaluated. Two other algorithms for isolated dynamic movements for gesture recognition (one of them novel), have been also specified, developed, tested and evaluated. Analyzed results compare well with the literature.As Línguas Gestuais são utilizadas em todo o Mundo por uma imensidão de indivíduos. Trata-se na sua grande maioria de surdos e/ou mudos, ou pessoas a eles associados por laços familiares de amizade ou professores de Língua Gestual. Tratando-se de uma minoria, muitas vezes segregada, não tem vindo a ser dada ao longo dos anos pela comunidade científica, a devida atenção a esta forma de comunicação.
Na área das Ciências da Computação existem alguns, mas poucos trabalhos de investigação e desenvolvimento. No caso particular da Língua Gestual Portuguesa - LGP esse facto é ainda mais evidente não sendo nosso conhecimento a existência de um sistema eficaz e efetivo para fazer o reconhecimento automático de gestos da LGP.
Com o aparecimento ou massificação de dispositivos, tais como sensores de profundidade, surgem novas possibilidades para abordar este problema.
Nesta tese, foram especificadas, desenvolvidas, testadas e efectuada a avaliação preliminar de soluções que acreditamos que trarão valiosas contribuições para o problema do Reconhecimento Automático de Gestos, aplicado às Línguas Gestuais, como é o caso da Língua Gestual Portuguesa.
Foram adaptadas técnicas de Visão por Computador ao caso dos Sensores de Profundidade.
Foi proposta uma taxonomia adequada ao problema, e apresentadas técnicas para a extração, representação e armazenamento de características. Foram especificados, desenvolvidos, testados e avaliados dois algoritmos para resolver o problema do reconhecimento em tempo real de poses estáticas isoladas. Foram também especificados, desenvolvidos, testados e avaliados outros dois algoritmos para o Reconhecimento de Movimentos Dinâmicos Isolados de Gestos(um deles novo).Os resultados analisados são comparáveis à literatura.Las lenguas de Signos se utilizan en todo el Mundo por una multitud de personas. En su mayoría son personas sordas y/o mudas, o personas asociadas con ellos por vínculos de amistad o familiares y profesores de Lengua de Signos. Es una minoría de personas, a menudo segregadas, y no se ha dado en los últimos años por la comunidad científica, la atención debida a esta forma de comunicación.
En el área de Ciencias de la Computación hay alguna pero poca investigación y desarrollo. En el caso particular de la Lengua de Signos Portuguesa - LSP, no es de nuestro conocimiento la existencia de un sistema eficiente y eficaz para el reconocimiento automático.
Con la llegada en masa de dispositivos tales como Sensores de Profundidad, hay nuevas posibilidades para abordar el problema del Reconocimiento de Gestos.
En esta tesis se han especificado, desarrollado, probado y hecha una evaluación preliminar de soluciones, aplicada a las Lenguas de Signos como el caso de la Lengua de Signos Portuguesa - LSP.
Se han adaptado las técnicas de Visión por Ordenador para el caso de los Sensores de Profundidad.
Se propone una taxonomía apropiada para el problema y se presentan técnicas para la extracción, representación y el almacenamiento de características.
Se desarrollaran, probaran, compararan y analizan los resultados de dos nuevos algoritmos para resolver el problema del Reconocimiento Aislado y Estático de Posturas. Otros dos algoritmos (uno de ellos nuevo) fueran también desarrollados, probados, comparados y analizados los resultados, para el Reconocimiento de Movimientos Dinámicos Aislados de los Gestos
- …