Most state-of-the-art action feature extractors involve differential
operators, which act as highpass filters and tend to attenuate low frequency
action information. This attenuation introduces bias to the resulting features
and generates ill-conditioned feature matrices. The Gaussian Pyramid has been
used as a feature enhancing technique that encodes scale-invariant
characteristics into the feature space in an attempt to deal with this
attenuation. However, at the core of the Gaussian Pyramid is a convolutional
smoothing operation, which makes it incapable of generating new features at
coarse scales. In order to address this problem, we propose a novel feature
enhancing technique called Multi-skIp Feature Stacking (MIFS), which stacks
features extracted using a family of differential filters parameterized with
multiple time skips and encodes shift-invariance into the frequency space. MIFS
compensates for information lost from using differential operators by
recapturing information at coarse scales. This recaptured information allows us
to match actions at different speeds and ranges of motion. We prove that MIFS
enhances the learnability of differential-based features exponentially. The
resulting feature matrices from MIFS have much smaller conditional numbers and
variances than those from conventional methods. Experimental results show
significantly improved performance on challenging action recognition and event
detection tasks. Specifically, our method exceeds the state-of-the-arts on
Hollywood2, UCF101 and UCF50 datasets and is comparable to state-of-the-arts on
HMDB51 and Olympics Sports datasets. MIFS can also be used as a speedup
strategy for feature extraction with minimal or no accuracy cost