Log-Euclidean Bag of Words for Human Action Recognition
Representing videos by densely extracted local space-time features has recently become a popular approach for analysing actions. In this paper, we tackle the problem of categorising human actions by devising Bag of Words (BoW) models based on covariance matrices of spatio-temporal features, with the features formed from histograms of optical flow. Since covariance matrices lie on a special type of Riemannian manifold, the space of Symmetric Positive Definite (SPD) matrices, non-Euclidean geometry should be taken into account when discriminating between them. To this end, we propose to embed SPD manifolds into Euclidean spaces via a diffeomorphism and extend the BoW approach to its Riemannian version. The proposed BoW approach takes the manifold geometry of SPD matrices into account during the generation of the codebook and histograms. Experiments on challenging human action datasets show that the proposed method obtains notable improvements in discrimination accuracy in comparison to several state-of-the-art methods.
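The log-Euclidean embedding referred to in the title is the matrix logarithm, which maps the SPD manifold diffeomorphically to the Euclidean space of symmetric matrices, where ordinary distances and k-means codebook learning apply. A minimal sketch of this building block (the feature dimensions and regularisation constant are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def log_euclidean_embedding(cov):
    """Map an SPD covariance matrix to a Euclidean vector via the matrix log.

    The matrix logarithm of an SPD matrix is computed from its
    eigendecomposition and is a real symmetric matrix.
    """
    w, U = np.linalg.eigh(cov)
    log_cov = (U * np.log(w)) @ U.T
    # Vectorise the upper triangle; off-diagonal entries are scaled by sqrt(2)
    # so the Euclidean norm of the vector matches the Frobenius norm.
    idx = np.triu_indices_from(log_cov)
    vec = log_cov[idx]
    vec[idx[0] != idx[1]] *= np.sqrt(2.0)
    return vec

# Example: covariance of local spatio-temporal descriptors (hypothetical data).
features = np.random.randn(500, 12)                       # 500 descriptors, 12-D
cov = np.cov(features, rowvar=False) + 1e-6 * np.eye(12)  # regularise to SPD
v = log_euclidean_embedding(cov)                          # vector for BoW coding
```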
Mining Mid-level Features for Action Recognition Based on Effective Skeleton Representation
Recently, mid-level features have shown promising performance in computer vision. Mid-level features learned by incorporating class-level information are potentially more discriminative than traditional low-level local features. In this paper, an effective method is proposed to extract mid-level features from Kinect skeletons for 3D human action recognition. Firstly, the orientation of each limb connecting two skeleton joints is computed and encoded into one of 27 states indicating the spatial relationship of the joints. Secondly, limbs are combined into parts and the limb states are mapped into part states. Finally, frequent pattern mining is employed to mine the most frequent and relevant (discriminative, representative and non-redundant) part states over several consecutive frames. These parts are referred to as Frequent Local Parts, or FLPs. The FLPs allow us to build a powerful bag-of-FLP-based action representation. This new representation yields state-of-the-art results on the MSR DailyActivity3D and MSR ActionPairs3D datasets.
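The abstract does not spell out the 27-state encoding; one natural reading, offered here purely as an illustrative assumption, quantises each component of the limb's direction vector into one of three symbols (negative, near zero, positive), giving 3^3 = 27 states:

```python
import numpy as np

def limb_state(joint_a, joint_b, eps=0.1):
    """Encode the orientation of the limb from joint_a to joint_b as one of
    27 discrete states (an assumed 3-way sign quantisation per axis)."""
    d = np.asarray(joint_b, float) - np.asarray(joint_a, float)
    d /= np.linalg.norm(d) + 1e-12                 # unit direction vector
    # Each axis contributes one of three symbols: -1, 0 (near zero), +1.
    symbols = np.where(d > eps, 1, np.where(d < -eps, -1, 0))
    # Pack the three symbols into a single state index in [0, 26].
    return int((symbols[0] + 1) * 9 + (symbols[1] + 1) * 3 + (symbols[2] + 1))

# Example: a roughly vertical limb (hypothetical joint coordinates).
print(limb_state([0.0, 0.0, 0.0], [0.0, 1.0, 0.05]))   # -> 16
```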
Local Rotation Invariant Patch Descriptors for 3D Vector Fields
In this paper, we present two novel methods for the fast computation of local rotation invariant patch descriptors for 3D vectorial data. Patch-based algorithms have recently become a very popular approach for a wide range of 2D computer vision problems. Our local rotation invariant patch descriptors allow an extension of these methods to 3D vector fields. Our approaches are based on a harmonic representation of local spherical 3D vector field patches, which enables us to derive fast algorithms for the computation of rotation invariant power spectrum and bispectrum feature descriptors of such patches. Keywords: local feature; 3D vector field; invariance.
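The power spectrum invariant mentioned above can be computed from the expansion coefficients of a patch in a harmonic basis: a 3D rotation mixes coefficients only within each degree l (via a unitary Wigner-D matrix), so the per-degree energy is unchanged. A minimal sketch under that assumption, with the coefficient layout (`coeffs[l]` holding the 2l+1 orders) chosen here for illustration:

```python
import numpy as np

def power_spectrum(coeffs):
    """Rotation-invariant power spectrum from harmonic expansion coefficients.

    coeffs: list indexed by degree l; coeffs[l] holds the 2l+1 complex
    coefficients for orders m = -l..l. Since a rotation acts unitarily
    within each degree-l block, the squared norm per block is invariant.
    """
    return np.array([np.sum(np.abs(np.asarray(c)) ** 2) for c in coeffs])

# Example with random coefficients up to degree L = 3 (hypothetical patch).
rng = np.random.default_rng(0)
coeffs = [rng.standard_normal(2 * l + 1) + 1j * rng.standard_normal(2 * l + 1)
          for l in range(4)]
print(power_spectrum(coeffs))   # one invariant value per degree l
```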
Face recognition in the wild.
Research in face recognition deals with problems related to Age, Pose, Illumination and Expression (A-PIE), and seeks approaches that are invariant to these factors. Video images add a temporal aspect to the image acquisition process. Another degree of complexity, above and beyond A-PIE recognition, occurs when multiple pieces of information are known about people, which may be distorted, partially occluded, or disguised, and when the imaging conditions are totally unorthodox. A-PIE recognition in these circumstances becomes truly “wild”, and Face Recognition in the Wild has therefore emerged as a field of research in the past few years. Its main purpose is to challenge constrained approaches to automatic face recognition, emulating some of the virtues of the Human Visual System (HVS), which is very tolerant to age, occlusion and distortions in the imaging process. The HVS also integrates information about individuals and adds context to recognize people within an activity or behavior. Machine vision has a very long road ahead to emulate the HVS, and face recognition in the wild is a step along that road. In this thesis, Face Recognition in the Wild is defined as unconstrained face recognition under A-PIE+; the (+) connotes any alterations to the design scenario of the face recognition system.

This thesis evaluates the Biometric Optical Surveillance System (BOSS) developed at the CVIP Lab, using low-resolution imaging sensors. Specifically, the thesis tests BOSS using cell phone cameras, and examines the potential of facial biometrics on smart portable devices like iPhones, iPads, and tablets. For quantitative evaluation, the thesis focused on a specific testing scenario of the BOSS software using iPhone 4 cell phones and a laptop. Testing was carried out indoors, at the CVIP Lab, using 21 subjects at distances of 5, 10 and 15 feet, with three poses, two expressions and two illumination levels. The three steps (detection, representation and matching) of the BOSS system were tested in this imaging scenario. False positives in face detection increased with distance and with pose angles above ±15°. The overall identification rate (face detection at confidence levels above 80%) also degraded with distance, pose, and expression. The indoor lighting added further challenges by inducing shadows, which affected the image quality and the overall performance of the system.

While this limited number of subjects and somewhat constrained imaging environment does not fully support a “wild” imaging scenario, it did provide deep insight into the issues with automatic face recognition. The recognition rate curves demonstrate the limits of low-resolution cameras for face recognition at a distance (FRAD), yet they also provide a plausible defense for possible A-PIE face recognition on portable devices.
Local Spherical Harmonics Improve Skeleton-Based Hand Action Recognition
Hand action recognition is essential: communication, human-robot interaction, and gesture control all depend on it. Skeleton-based action recognition traditionally includes hands, yet hands remain among the classes that are challenging to recognize correctly to date. We propose a method specifically designed for hand action recognition which uses relative angular embeddings and local Spherical Harmonics to create novel hand representations. The use of Spherical Harmonics creates rotation-invariant representations, which makes hand action recognition more robust against inter-subject differences and viewpoint changes. We conduct extensive experiments on the hand joints in the First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations, and on the NTU RGB+D 120 dataset, demonstrating the benefit of using Local Spherical Harmonics Representations. Our code is available at https://github.com/KathPra/LSHR_LSHT.
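The abstract does not detail the representation; as a loose illustration of the basic building block, the spherical harmonic basis can be evaluated at a joint's direction relative to a reference joint. The wrist-relative framing and degree cutoff below are assumptions for the example, not the paper's method (rotation invariance in the paper comes from how such terms are combined, not from a single evaluation):

```python
import numpy as np
from scipy.special import sph_harm

def angular_embedding(joint, ref, max_degree=2):
    """Embed the direction from ref to joint using spherical harmonics.

    theta is the azimuthal angle and phi the polar angle, following the
    scipy.special.sph_harm(m, n, theta, phi) convention. Returns one
    complex array of 2l+1 values per degree l <= max_degree.
    """
    d = np.asarray(joint, float) - np.asarray(ref, float)
    r = np.linalg.norm(d) + 1e-12
    theta = np.arctan2(d[1], d[0])                   # azimuth in [-pi, pi]
    phi = np.arccos(np.clip(d[2] / r, -1.0, 1.0))    # inclination in [0, pi]
    return [np.array([sph_harm(m, l, theta, phi) for m in range(-l, l + 1)])
            for l in range(max_degree + 1)]

# Example: fingertip position relative to the wrist (hypothetical values).
emb = angular_embedding([0.1, 0.2, 0.4], [0.0, 0.0, 0.0])
```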
Learning Equivariant Representations
State-of-the-art deep learning systems often require large amounts of data and computation. For this reason, leveraging known or unknown structure in the data is paramount. Convolutional neural networks (CNNs) are a successful example of this principle, their defining characteristic being shift equivariance: because the same filter slides over the entire input, when the input shifts, the response shifts by the same amount, exploiting the structure of natural images, where semantic content is independent of absolute pixel position. This property is essential to the success of CNNs in audio, image and video recognition tasks. In this thesis, we extend equivariance to other kinds of transformations, such as rotation and scaling. We propose equivariant models for different transformations defined by groups of symmetries. The main contributions are (i) polar transformer networks, achieving equivariance to the group of similarities on the plane, (ii) equivariant multi-view networks, achieving equivariance to the group of symmetries of the icosahedron, (iii) spherical CNNs, achieving equivariance to the continuous 3D rotation group, (iv) cross-domain image embeddings, achieving equivariance to 3D rotations for 2D inputs, and (v) spin-weighted spherical CNNs, generalizing the spherical CNNs and achieving equivariance to 3D rotations for spherical vector fields. Applications include image classification, 3D shape classification and retrieval, panoramic image classification and segmentation, shape alignment and pose estimation. What these models have in common is that they leverage symmetries in the data to reduce sample and model complexity and improve generalization performance. The advantages are most significant on (but not limited to) challenging tasks where data is limited or where input perturbations such as arbitrary rotations are present.
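The shift equivariance described above can be checked numerically: convolving a shifted input equals shifting the convolved output, exactly so under circular padding, which removes boundary effects. A small sketch using PyTorch for the convolution:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 1, 32, 32)                 # input image
w = torch.randn(1, 1, 3, 3)                   # convolution filter

def conv(img):
    # Circular padding makes translation an exact symmetry of the operation.
    return F.conv2d(F.pad(img, (1, 1, 1, 1), mode="circular"), w)

def shift(img):
    return torch.roll(img, shifts=(5, 7), dims=(2, 3))

# Equivariance: conv(shift(x)) == shift(conv(x))
print(torch.allclose(conv(shift(x)), shift(conv(x)), atol=1e-5))  # True
```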
Autonomous navigation for guide following in crowded indoor environments
The requirements for assisted living are rapidly changing as the number of elderly patients over the age of 60 continues to increase. This rise places a high level of stress on nurse practitioners, who must care for more patients than they can manage. As this trend is expected to continue, new technology will be required to help care for patients. Mobile robots present an opportunity to help alleviate the stress on nurse practitioners by monitoring and performing remedial tasks for elderly patients. In order to produce mobile robots with the ability to perform these tasks, however, many challenges must be overcome.
The hospital environment requires a high level of safety to prevent patient injury. Any facility that uses mobile robots, therefore, must be able to ensure that no harm will come to patients whilst in a care environment. This requires the robot to build a high level of understanding about the environment and the people in close proximity to the robot. Hitherto, most mobile robots have used vision-based sensors or 2D laser range finders. 3D time-of-flight sensors have recently been introduced and provide dense 3D point clouds of the environment at real-time frame rates. This provides mobile robots with previously unavailable dense information in real time. In this thesis, I investigate the use of time-of-flight cameras for mobile robot navigation in crowded environments. A unified framework that allows the robot to follow a guide through an indoor environment safely and efficiently is presented. Each component of the framework is analyzed in detail, with real-world scenarios illustrating its practical use.
Time-of-flight cameras are relatively new sensors and therefore have inherent problems that must be overcome to obtain consistent and accurate data. In this thesis, I propose a novel and practical probabilistic framework that overcomes many of these problems. The framework fuses multiple depth maps with color information, forming a reliable and consistent view of the world. In order for the robot to interact with the environment, contextual information is required. To this end, I propose a region-growing segmentation algorithm that groups points based on surface characteristics, namely surface normal and surface curvature. The segmentation process creates a distinct set of surfaces; however, only a limited amount of contextual information is available to allow for interaction. Therefore, a novel classifier based on spherical harmonics is proposed to differentiate people from all other objects.
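As a rough illustration of the region-growing idea described above (not the thesis's actual algorithm), points can be grouped by comparing neighbouring normals and curvature against thresholds; the neighbourhood structure and threshold values below are assumptions:

```python
import numpy as np

def region_grow(points, normals, curvature, neighbors,
                angle_thresh_deg=10.0, curv_thresh=0.05):
    """Group points into smooth surface regions (illustrative sketch).

    points/normals: (N, 3) arrays; curvature: (N,) array;
    neighbors: list of index lists, e.g. from a k-d tree radius search.
    Returns an (N,) array of region labels.
    """
    cos_thresh = np.cos(np.radians(angle_thresh_deg))
    labels = np.full(len(points), -1)
    region = 0
    for seed in np.argsort(curvature):        # grow from the flattest points
        if labels[seed] != -1:
            continue
        stack = [seed]
        labels[seed] = region
        while stack:
            p = stack.pop()
            for q in neighbors[p]:
                if labels[q] != -1:
                    continue
                # Add q if its normal is close to p's and its surface is smooth.
                if (abs(np.dot(normals[p], normals[q])) >= cos_thresh
                        and curvature[q] <= curv_thresh):
                    labels[q] = region
                    stack.append(q)
        region += 1
    return labels
```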
The added ability to identify people allows the robot to find potential candidates to follow. However, for safe navigation, the robot must continuously track all visible objects to obtain position and velocity information. A multi-object tracking system is investigated that tracks visible objects reliably using multiple cues, namely shape and color. The tracking system allows the robot to react to the dynamic nature of people by building an estimate of the motion flow. This flow provides the robot with the necessary information to determine where, and at what speed, it is safe to drive. In addition, a novel search strategy is proposed to allow the robot to recover a guide who has left the field of view. To achieve this, a search map is constructed in which areas of the environment are ranked according to how likely they are to reveal the guide’s true location. The robot can then approach the most likely search area to recover the guide. Finally, all of the components presented are combined to follow a guide through an indoor environment. The results achieved demonstrate the efficacy of the proposed components.
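As a toy sketch of such a search map (the grid, the Gaussian scoring, and the motion prediction are assumptions, not the thesis's formulation), each cell can hold a likelihood of containing the lost guide, derived from the last known position and velocity, and the cells can then be ranked to pick the next search goal:

```python
import numpy as np

def build_search_map(shape, last_seen, velocity, dt=1.0, sigma=2.0):
    """Toy search map: a Gaussian around the guide's predicted position.

    shape: grid size (rows, cols); last_seen/velocity: grid coordinates.
    Cells with higher values are more likely to reveal the guide.
    """
    pred = np.asarray(last_seen, float) + dt * np.asarray(velocity, float)
    rows, cols = np.indices(shape)
    d2 = (rows - pred[0]) ** 2 + (cols - pred[1]) ** 2
    search_map = np.exp(-d2 / (2.0 * sigma ** 2))
    return search_map / search_map.sum()      # normalise to a distribution

# Rank cells: the robot heads for the highest-scoring reachable cell.
m = build_search_map((20, 20), last_seen=(5, 5), velocity=(1.0, 0.5))
best = np.unravel_index(np.argmax(m), m.shape)
print(best)                                   # most promising search cell
```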