Fast, invariant representation for human action in the visual system
Humans can effortlessly recognize others' actions in the presence of complex
transformations, such as changes in viewpoint. Several studies have located the
regions in the brain involved in invariant action recognition; however, the
underlying neural computations remain poorly understood. We use
magnetoencephalography (MEG) decoding and a dataset of well-controlled,
naturalistic videos of five actions (run, walk, jump, eat, drink) performed by
different actors at different viewpoints to study the computational steps used
to recognize actions across complex transformations. In particular, we ask when
the brain discounts changes in 3D viewpoint relative to when it initially
discriminates between actions. We measure the latency difference between
invariant and non-invariant action decoding when subjects view full videos as
well as form-depleted and motion-depleted stimuli. Our results show no
difference in decoding latency or temporal profile between invariant and
non-invariant action recognition in full videos. However, when either form or
motion information is removed from the stimulus set, we observe a decrease and
delay in invariant action decoding. Our results suggest that the brain
recognizes actions and builds invariance to complex transformations at the same
time, and that both form and motion information are crucial for fast, invariant
action recognition.
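The comparison between invariant and non-invariant decoding can be illustrated with a minimal sketch. It is not the authors' pipeline. The nearest-centroid classifier, the array layout (trials x sensors x time points), the accuracy threshold, and the names `decode_over_time` and `onset_latency` are all assumptions made for illustration. Under these assumptions, non-invariant decoding would train and test on trials from the same viewpoint, while invariant decoding would train on one viewpoint and test on another.

```python
import numpy as np


def decode_over_time(train_X, train_y, test_X, test_y):
    """Nearest-centroid decoding accuracy at each time point.

    train_X, test_X: arrays of shape (trials, sensors, time points).
    train_y, test_y: integer action labels, one per trial.
    The classifier is a stand-in chosen for brevity (assumption).
    """
    classes = np.unique(train_y)
    n_times = train_X.shape[2]
    acc = np.empty(n_times)
    for t in range(n_times):
        # One mean sensor pattern per action at this time point.
        centroids = np.stack(
            [train_X[train_y == c, :, t].mean(axis=0) for c in classes]
        )
        # Distance from every test trial to every class centroid.
        dists = np.linalg.norm(
            test_X[:, None, :, t] - centroids[None], axis=2
        )
        pred = classes[dists.argmin(axis=1)]
        acc[t] = np.mean(pred == test_y)
    return acc


def onset_latency(acc, times, threshold):
    """First time at which accuracy exceeds a (hypothetical) threshold."""
    above = np.flatnonzero(acc > threshold)
    return times[above[0]] if above.size else None
```

With data split by viewpoint, calling `decode_over_time` once with same-view train and test sets and once with different-view sets, then comparing the two `onset_latency` values, gives a latency difference of the kind the abstract describes. A real analysis would also need cross-validation and a statistical test for the onset, which this sketch omits.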
Histogram of Oriented Principal Components for Cross-View Action Recognition
Existing techniques for 3D action recognition are sensitive to viewpoint
variations because they extract features from depth images, which are viewpoint
dependent. In contrast, we directly process pointclouds for cross-view action
recognition from unknown and unseen views. We propose the Histogram of Oriented
Principal Components (HOPC) descriptor that is robust to noise, viewpoint,
scale and action speed variations. At a 3D point, HOPC is computed by
projecting the three scaled eigenvectors of the pointcloud within its local
spatio-temporal support volume onto the vertices of a regular dodecahedron.
HOPC is also used for the detection of Spatio-Temporal Keypoints (STK) in 3D
pointcloud sequences so that view-invariant STK descriptors (or Local HOPC
descriptors) at these key locations only are used for action recognition. We
also propose a global descriptor computed from the normalized spatio-temporal
distribution of STKs in 4-D, which we refer to as STK-D. We have evaluated the
performance of our proposed descriptors against nine existing techniques on two
cross-view and three single-view human action recognition datasets. The
experimental results show that our techniques provide significant improvement
over state-of-the-art methods.
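The core HOPC step described above (projecting the three scaled eigenvectors of a local point neighbourhood onto the vertices of a regular dodecahedron) can be sketched as follows. This is a simplified illustration based only on the abstract, not the authors' implementation. The sign rule for the eigenvectors, the clipping of negative projections, the final L2 normalisation, and the names `dodecahedron_vertices` and `hopc_sketch` are assumptions.

```python
import numpy as np


def dodecahedron_vertices():
    """Return the 20 vertices of a regular dodecahedron as unit vectors."""
    phi = (1 + np.sqrt(5)) / 2
    inv = 1 / phi
    verts = [(x, y, z) for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]
    for a in (-inv, inv):
        for b in (-phi, phi):
            verts += [(0, a, b), (a, b, 0), (b, 0, a)]
    v = np.array(verts, dtype=float)
    return v / np.linalg.norm(v, axis=1, keepdims=True)


def hopc_sketch(points, center):
    """Simplified HOPC-style descriptor for the points around `center`.

    points: (n, 3) array of 3D points inside the local support volume
            (for a sequence, points gathered over neighbouring frames).
    center: (3,) array, the point at which the descriptor is computed.
    Returns a 60-dimensional vector (3 eigenvectors x 20 vertices).
    """
    offsets = points - center
    cov = offsets.T @ offsets / len(points)
    evals, evecs = np.linalg.eigh(cov)       # ascending eigenvalues
    order = np.argsort(evals)[::-1]          # largest first
    evals = np.clip(evals[order], 0, None)   # guard against rounding
    evecs = evecs[:, order]
    verts = dodecahedron_vertices()
    parts = []
    for lam, vec in zip(evals, evecs.T):
        # Assumed sign rule: point each eigenvector toward the side
        # where most of the neighbouring points lie.
        if np.sum(np.sign(offsets @ vec)) < 0:
            vec = -vec
        proj = verts @ (lam * vec)           # projection on 20 directions
        parts.append(np.clip(proj, 0, None)) # assumed: keep positive bins
    desc = np.concatenate(parts)
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc
```

For example, `hopc_sketch(cloud, cloud[0])` on an `(n, 3)` array `cloud` returns a non-negative vector of length 60. Keypoint detection and the STK-D global descriptor are not covered by this sketch.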