Temporal Extension of Scale Pyramid and Spatial Pyramid Matching for Action Recognition
Historically, researchers in the field have spent a great deal of effort to
create image representations that have scale invariance and retain spatial
location information. This paper proposes to encode equivalent temporal
characteristics in video representations for action recognition. To achieve
temporal scale invariance, we develop a method called temporal scale pyramid
(TSP). To encode temporal information, we present and compare two methods
called temporal extension descriptor (TED) and temporal division pyramid (TDP).
Our purpose is to suggest solutions for matching complex actions that have
large variation in velocity and appearance, which is missing from most current
action representations. The experimental results on four benchmark datasets,
UCF50, HMDB51, Hollywood2 and Olympic Sports, support our approach, which
significantly outperforms state-of-the-art methods. Most notably, we achieve
65.0% mean accuracy and 68.2% mean average precision on the challenging HMDB51
and Hollywood2 datasets, which constitute absolute improvements over the
state of the art of 7.8% and 3.9%, respectively.
The footprint of cometary dust analogs: I. Laboratory experiments of low-velocity impacts and comparison with Rosetta data
Cometary dust provides a unique window on dust growth mechanisms during the
onset of planet formation. Measurements by the Rosetta spacecraft show that the
dust in the coma of comet 67P/Churyumov-Gerasimenko has a granular structure at
size scales from sub-µm up to several hundreds of µm, indicating hierarchical
growth took place across these size scales. However, these dust particles may
have been modified during their collection by the spacecraft instruments. Here
we present the results of laboratory experiments that simulate the impact of
dust on the collection surfaces of COSIMA and MIDAS, instruments onboard the
Rosetta spacecraft. We map the size and structure of the footprints left by the
dust particles as a function of their initial size (up to several hundred µm)
and velocity (up to 6 m/s). We find that in most collisions, only part of the
dust particle is left on the target; velocity is the main driver of the
appearance of these deposits. A boundary between sticking/bouncing and
fragmentation as an outcome of the particle-target collision is found at v ~ 2
m/s. For velocities below this value, particles either stick and leave a single
deposit on the target plate, or bounce, leaving a shallow footprint of
monomers. At velocities > 2 m/s and sizes > 80 µm, particles fragment upon
collision, transferring up to 50 per cent of their mass in a rubble-pile-like
deposit on the target plate. The amount of mass transferred increases with the
impact velocity. The morphologies of the deposits are qualitatively similar to
those found by the COSIMA instrument.
Comment: 14 pages, 12 figures, accepted for publication in MNRA
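The collision regimes reported in the abstract above can be written as a small lookup. This is only an illustrative sketch: the function name `collision_outcome` and its parameters are hypothetical, the boundary at about 2 m/s is approximate in the source, and the abstract does not say what happens to particles faster than 2 m/s but no larger than 80 µm, so that case is returned as unspecified.

```python
def collision_outcome(velocity_m_s, size_um,
                      v_boundary_m_s=2.0, size_boundary_um=80.0):
    """Return the collision regime described in the abstract.

    The thresholds are the approximate values quoted in the abstract
    (v ~ 2 m/s, size > 80 um); this is not the authors' analysis code.
    """
    if velocity_m_s < 0 or size_um <= 0:
        raise ValueError("velocity must be >= 0 and size must be > 0")
    if velocity_m_s < v_boundary_m_s:
        # Below the boundary, particles either stick or bounce.
        return "stick or bounce"
    if velocity_m_s > v_boundary_m_s and size_um > size_boundary_um:
        # Fast, large particles fragment and leave a rubble-pile-like deposit.
        return "fragmentation"
    # The abstract gives no outcome for this combination.
    return "not specified in abstract"


print(collision_outcome(1.0, 200.0))
print(collision_outcome(5.0, 200.0))
```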
Key-Pose Prediction in Cyclic Human Motion
In this paper we study the problem of estimating innercyclic time intervals
within repetitive motion sequences of top-class swimmers in a swimming channel.
Interval limits are given by temporal occurrences of key-poses, i.e.
distinctive postures of the body. A key-pose is defined by means of only one or
two specific features of the complete posture. It is often difficult to detect
such subtle features directly. We therefore propose the following method: Given
that we observe the swimmer from the side, we build a pictorial structure of
poselets to robustly identify random support poses within the regular motion of
a swimmer. We formulate a maximum likelihood model which predicts a key-pose
given the occurrences of multiple support poses within one stroke. The maximum
likelihood can be extended with prior knowledge about the temporal location of
a key-pose in order to improve the prediction recall. We experimentally show
that our models reliably and robustly detect key-poses with a high precision
and that their performance can be improved by extending the framework with
additional camera views.
Comment: Accepted at WACV 2015, 8 pages, 3 figure
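To make the maximum likelihood idea in the abstract above concrete, here is a minimal sketch under strong assumptions that are not stated in the source: each detected support pose is assumed to predict the key-pose time as its own detection time plus a known mean offset, with independent Gaussian noise. Under those assumptions the maximum likelihood estimate is the precision-weighted mean of the individual predictions. The function name and the tuple layout are hypothetical.

```python
def predict_key_pose_time(support_poses):
    """Maximum likelihood key-pose time under independent Gaussian offsets.

    support_poses: iterable of (detection_time, mean_offset, offset_std),
    where detection_time + mean_offset is one support pose's prediction
    of the key-pose time. All three fields are illustrative assumptions.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for detection_time, mean_offset, offset_std in support_poses:
        if offset_std <= 0:
            raise ValueError("offset_std must be positive")
        weight = 1.0 / (offset_std ** 2)  # precision of this prediction
        weighted_sum += weight * (detection_time + mean_offset)
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one support pose is required")
    return weighted_sum / weight_total


# Two support poses; the second is noisier, so it counts for less.
estimate = predict_key_pose_time([(10.0, 5.0, 1.0), (12.0, 4.0, 2.0)])
print(estimate)
```

A prior on the key-pose time, which the abstract mentions as an extension, could be folded in as one more Gaussian term with its own mean and standard deviation.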
Beyond Gaussian Pyramid: Multi-skip Feature Stacking for Action Recognition
Most state-of-the-art action feature extractors involve differential
operators, which act as highpass filters and tend to attenuate low frequency
action information. This attenuation introduces bias to the resulting features
and generates ill-conditioned feature matrices. The Gaussian Pyramid has been
used as a feature enhancing technique that encodes scale-invariant
characteristics into the feature space in an attempt to deal with this
attenuation. However, at the core of the Gaussian Pyramid is a convolutional
smoothing operation, which makes it incapable of generating new features at
coarse scales. In order to address this problem, we propose a novel feature
enhancing technique called Multi-skIp Feature Stacking (MIFS), which stacks
features extracted using a family of differential filters parameterized with
multiple time skips and encodes shift-invariance into the frequency space. MIFS
compensates for information lost from using differential operators by
recapturing information at coarse scales. This recaptured information allows us
to match actions at different speeds and ranges of motion. We prove that MIFS
enhances the learnability of differential-based features exponentially. The
resulting feature matrices from MIFS have much smaller condition numbers and
variances than those from conventional methods. Experimental results show
significantly improved performance on challenging action recognition and event
detection tasks. Specifically, our method exceeds the state of the art on the
Hollywood2, UCF101 and UCF50 datasets and is comparable to the state of the art
on the HMDB51 and Olympic Sports datasets. MIFS can also be used as a speedup
strategy for feature extraction with minimal or no accuracy cost.
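The stacking idea described in the abstract above can be sketched as follows. This is not the authors' implementation: a plain frame difference stands in for the differential filter family, the skip values are arbitrary, and the function name `multi_skip_stack` is hypothetical. The point is only that features computed at several time skips are stacked into one feature matrix.

```python
import numpy as np


def multi_skip_stack(frames, skips=(1, 2, 4)):
    """Stack temporal-difference features computed at several time skips.

    frames: array-like of shape (T, H, W).
    For each skip s, frames[t + s] - frames[t] is used as a crude
    differential feature; rows from all skips are stacked together.
    """
    frames = np.asarray(frames, dtype=np.float64)
    stacked = []
    for s in skips:
        if s < 1 or s >= frames.shape[0]:
            continue  # skip values that the clip is too short for
        diff = frames[s:] - frames[:-s]
        # One row per frame pair, with the spatial dimensions flattened.
        stacked.append(diff.reshape(diff.shape[0], -1))
    if not stacked:
        raise ValueError("no valid skip for a clip of this length")
    return np.vstack(stacked)


# Toy clip: 5 frames of 2x2 pixels with linearly increasing intensity.
clip = np.arange(5 * 2 * 2).reshape(5, 2, 2)
features = multi_skip_stack(clip)
print(features.shape)
```

Larger skips respond to slower motion, which is one way to read the abstract's claim that coarse-scale information is recaptured.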
Eye in the Sky: Real-time Drone Surveillance System (DSS) for Violent Individuals Identification using ScatterNet Hybrid Deep Learning Network
Drone systems have been deployed by various law enforcement agencies to
monitor hostiles, spy on foreign drug cartels, conduct border control
operations, etc. This paper introduces a real-time drone surveillance system to
identify violent individuals in public areas. The system first uses the Feature
Pyramid Network to detect humans from aerial images. The image region with the
human is used by the proposed ScatterNet Hybrid Deep Learning (SHDL) network
for human pose estimation. The orientations between the limbs of the estimated
pose are next used to identify the violent individuals. The proposed deep
network can learn meaningful representations quickly using ScatterNet and
structural priors with relatively fewer labeled examples. The system detects
the violent individuals in real-time by processing the drone images in the
cloud. This research also introduces the aerial violent individual dataset used
for training the deep network, which we hope will encourage researchers
interested in using deep learning for aerial surveillance. The pose estimation
and violent individuals identification performance is compared with the
state-of-the-art techniques.
Comment: To appear in the Efficient Deep Learning for Computer Vision (ECV)
workshop at IEEE Computer Vision and Pattern Recognition (CVPR) 2018. YouTube
demo: https://www.youtube.com/watch?v=zYypJPJipY
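The abstract above says that orientations between the limbs of the estimated pose are used to identify violent individuals, without giving the formula. A minimal sketch of one plausible reading is the angle at a joint between two limbs, computed from 2D keypoints. The function name and the keypoint layout are assumptions, and the classification step itself is not shown.

```python
import math


def limb_angle(joint_a, joint_b, joint_c):
    """Angle in degrees at joint_b between limbs b->a and b->c.

    Each joint is an (x, y) pair of image coordinates, for example
    shoulder, elbow and wrist from a pose estimator (an assumption).
    """
    v1 = (joint_a[0] - joint_b[0], joint_a[1] - joint_b[1])
    v2 = (joint_c[0] - joint_b[0], joint_c[1] - joint_b[1])
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0 or n2 == 0:
        raise ValueError("degenerate limb: coincident keypoints")
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle))


print(limb_angle((1.0, 0.0), (0.0, 0.0), (0.0, 1.0)))
```

A vector of such angles over all limb pairs could then be passed to any standard classifier.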
Active User Authentication for Smartphones: A Challenge Data Set and Benchmark Results
In this paper, automated user verification techniques for smartphones are
investigated. A unique non-commercial dataset, the University of Maryland
Active Authentication Dataset 02 (UMDAA-02) for multi-modal user authentication
research is introduced. This paper focuses on three sensors - front camera,
touch sensor and location service while providing a general description for
other modalities. Benchmark results for face detection, face verification,
touch-based user identification and location-based next-place prediction are
presented, which indicate that more robust methods fine-tuned to the mobile
platform are needed to achieve satisfactory verification accuracy. The dataset
will be made available to the research community for promoting additional
research.
Comment: 8 pages, 12 figures, 6 tables. Best poster award at BTAS 201