Hand gesture recognition with jointly calibrated Leap Motion and depth sensor
Novel 3D acquisition devices such as depth cameras and the Leap Motion have recently reached the market. Depth cameras provide a complete 3D description of the framed scene, while the Leap Motion is a sensor explicitly targeted at hand gesture recognition that provides only a limited set of relevant points. This paper shows how to jointly exploit the two types of sensors for accurate gesture recognition. An ad-hoc solution for the joint calibration of the two devices is first presented. A set of novel feature descriptors is then introduced for both the Leap Motion and the depth data. Various schemes based on the distances of the hand samples from the centroid, on the curvature of the hand contour, and on the convex hull of the hand shape are employed, and the use of Leap Motion data to aid feature extraction is also considered. The proposed feature sets are fed to two different classifiers, one based on multi-class SVMs and one exploiting Random Forests. Different feature selection algorithms have also been tested in order to reduce the complexity of the approach. Experimental results show that the proposed method achieves very high accuracy, and the current implementation is also able to run in real time.
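One of the feature schemes named in the abstract, distances of the hand samples from the centroid, can be illustrated with a minimal sketch. The histogram binning and normalization below are assumptions for illustration, not the paper's exact descriptor:

```python
import numpy as np

def centroid_distance_features(points, n_bins=16):
    """Distance-from-centroid descriptor for a set of 3D hand samples.

    A simplified sketch of one feature scheme: distances of the samples
    from the hand centroid, scale-normalized and binned into a histogram.
    The bin count and normalization are illustrative assumptions.
    """
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    # Euclidean distance of every sample from the centroid
    d = np.linalg.norm(points - centroid, axis=1)
    # divide by the maximum distance for scale invariance
    d = d / d.max()
    # histogram of normalized distances as a fixed-length feature vector
    hist, _ = np.histogram(d, bins=n_bins, range=(0.0, 1.0))
    return hist / hist.sum()
```

A vector like this could then be concatenated with the curvature and convex-hull features and passed to an SVM or Random Forest classifier, as the abstract describes.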
Affine differential geometry analysis of human arm movements
Humans interact with their environment through sensory information and motor actions. These interactions may be understood via the underlying geometry of both perception and action. While the motor space is typically considered by default to be Euclidean, persistent behavioral observations point to a different underlying geometric structure. These observed regularities include the “two-thirds power law”, which connects path curvature with velocity, and “local isochrony”, which prescribes the relation between movement time and its extent. Starting with these empirical observations, we have developed a mathematical framework based on differential geometry, Lie group theory and Cartan’s moving frame method for the analysis of human hand trajectories. We also use this method to identify possible motion primitives, i.e., elementary building blocks from which more complicated movements are constructed. We show that a natural geometric description of continuous repetitive hand trajectories is not Euclidean but equi-affine. Specifically, equi-affine velocity is piecewise constant along movement segments, and movement execution time for a given segment is proportional to its equi-affine arc-length. Using this mathematical framework, we then analyze experimentally recorded drawing movements. To examine movement segmentation and classification, the two fundamental equi-affine differential invariants, equi-affine arc-length and equi-affine curvature, are calculated for the recorded movements. We also discuss the possible role of conic sections, i.e., curves with constant equi-affine curvature, as motor primitives, and focus in more detail on parabolas, the equi-affine geodesics. Finally, we explore possible schemes for the internal neural coding of motor commands by showing that the equi-affine framework is compatible with the common model of population coding of the hand velocity vector when combined with a simple assumption on its dynamics.
We then discuss several alternative explanations for the role that the equi-affine metric may play in internal representations of motion perception and production.
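The equi-affine arc-length central to this abstract can be computed numerically for a planar trajectory: for a curve (x(t), y(t)), the equi-affine arc-length element is ds = |x'y'' - y'x''|^(1/3) dt, and constant equi-affine velocity along a segment is equivalent to the two-thirds power law. A minimal numerical sketch, assuming uniformly sampled data and finite-difference derivatives (not the authors' actual analysis code):

```python
import numpy as np

def equi_affine_arclength(x, y, dt=1.0):
    """Numerical equi-affine arc-length of a uniformly sampled planar curve.

    Uses ds = |x'y'' - y'x''|**(1/3) dt with derivatives estimated by
    central finite differences. A sketch for illustration only.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    # first and second derivatives via finite differences
    xp, yp = np.gradient(x, dt), np.gradient(y, dt)
    xpp, ypp = np.gradient(xp, dt), np.gradient(yp, dt)
    # equi-affine arc-length element at each sample
    ds = np.abs(xp * ypp - yp * xpp) ** (1.0 / 3.0) * dt
    return ds.sum()
```

For the parabola (t, t^2), the highlighted equi-affine geodesic, the integrand is the constant 2^(1/3), so the arc-length over t in [0, 1] is 2^(1/3), which the numerical sketch reproduces closely.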
STV-based Video Feature Processing for Action Recognition
In comparison to still-image-based processing, video features can provide rich and intuitive information about dynamic events occurring over a period of time, such as human actions, crowd behaviours, and other changes in subject patterns. Although substantial progress has been made in the last decade on image processing, with successful applications in face matching and object recognition, video-based event detection still remains one of the most difficult challenges in computer vision research due to its complex continuous or discrete input signals, arbitrary dynamic feature definitions, and often ambiguous analytical methods. In this paper, a Spatio-Temporal Volume (STV) and region intersection (RI) based 3D shape-matching method is proposed to facilitate the definition and recognition of human actions recorded in videos. The distinctive characteristics and the performance gain of the devised approach stem from a coefficient-factor-boosted 3D region intersection and matching mechanism developed in this research. The paper also reports on techniques for efficient STV data filtering that reduce the number of voxels (volumetric pixels) to be processed in each operational cycle of the implemented system. The encouraging features and the improvements in operational performance registered in the experiments are discussed at the end.