Convolutional Neural Network on Three Orthogonal Planes for Dynamic Texture Classification
Dynamic Textures (DTs) are sequences of images of moving scenes that exhibit
certain stationarity properties in time such as smoke, vegetation and fire. The
analysis of DT is important for recognition, segmentation, synthesis or
retrieval for a range of applications including surveillance, medical imaging
and remote sensing. Deep learning methods have shown impressive results and are
now the new state of the art for a wide range of computer vision tasks
including image and video recognition and segmentation. In particular,
Convolutional Neural Networks (CNNs) have recently proven to be well suited for
texture analysis with a design similar to a filter bank approach. In this
paper, we develop a new approach to DT analysis based on a CNN method applied
on three orthogonal planes xy, xt and yt. We train CNNs on spatial frames
and temporal slices extracted from the DT sequences and combine their outputs
to obtain a competitive DT classifier. Our results on a wide range of commonly
used DT classification benchmark datasets prove the robustness of our approach.
Significant improvement of the state of the art is shown on the larger
datasets.
Comment: 19 pages, 10 figures
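As an illustration of the three-orthogonal-planes idea in the abstract above, the sketch below slices a video stored as nested lists `video[t][y][x]` into one xy frame, one xt slice and one yt slice. The function names are hypothetical, and the score averaging in `combine_scores` is only an assumed fusion rule: the abstract says the CNN outputs are combined but does not say how.

```python
def xy_plane(video, t):
    """Spatial frame at time t: rows indexed by y, columns by x."""
    return [row[:] for row in video[t]]


def xt_plane(video, y):
    """Temporal slice at fixed row y: rows indexed by t, columns by x."""
    return [frame[y][:] for frame in video]


def yt_plane(video, x):
    """Temporal slice at fixed column x: rows indexed by t, columns by y."""
    return [[row[x] for row in frame] for frame in video]


def combine_scores(per_plane_scores):
    """Average class scores from the per-plane classifiers (assumed fusion rule)."""
    n = len(per_plane_scores)
    return [sum(scores) / n for scores in zip(*per_plane_scores)]


# Toy video with T=2 frames, H=3 rows, W=4 columns; each value encodes (t, y, x).
video = [[[100 * t + 10 * y + x for x in range(4)] for y in range(3)]
         for t in range(2)]
```

In a full pipeline each kind of slice would feed its own CNN, and `combine_scores` would merge the three class-score vectors.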
Beyond Gauss: Image-Set Matching on the Riemannian Manifold of PDFs
State-of-the-art image-set matching techniques typically implicitly model
each image-set with a Gaussian distribution. Here, we propose to go beyond
these representations and model image-sets as probability distribution
functions (PDFs) using kernel density estimators. To compare and match
image-sets, we exploit Csiszar f-divergences, which bear strong connections to
the geodesic distance defined on the space of PDFs, i.e., the statistical
manifold. Furthermore, we introduce valid positive definite kernels on the
statistical manifolds, which let us make use of more powerful classification
schemes to match image-sets. Finally, we introduce a supervised dimensionality
reduction technique that learns a latent space where f-divergences reflect the
class labels of the data. Our experiments on diverse problems, such as
video-based face recognition and dynamic texture classification, evidence the
benefits of our approach over the state-of-the-art image-set matching methods.
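To make the idea of comparing image-sets as PDFs concrete, here is a minimal one-dimensional sketch: each set of scalar features is modelled with a Gaussian kernel density estimate, and two sets are compared with the squared Hellinger distance, which is one member of the f-divergence family. The choice of divergence, the bandwidth, the scalar features and the grid integration are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function p(x) estimating the density of 1-D samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return pdf


def hellinger_sq(p, q, lo, hi, steps=2000):
    """Squared Hellinger distance, 0.5 * integral of (sqrt(p) - sqrt(q))^2,
    approximated with the midpoint rule on [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += (math.sqrt(p(x)) - math.sqrt(q(x))) ** 2
    return 0.5 * total * dx


# Three toy "image-sets", each reduced to a handful of scalar features.
set_a = gaussian_kde([0.0, 0.1, -0.1, 0.2], bandwidth=0.5)
set_b = gaussian_kde([0.05, 0.15, -0.05, 0.25], bandwidth=0.5)
set_c = gaussian_kde([5.0, 5.1, 4.9, 5.2], bandwidth=0.5)

d_ab = hellinger_sq(set_a, set_b, -5.0, 10.0)
d_ac = hellinger_sq(set_a, set_c, -5.0, 10.0)
```

Sets drawn from similar distributions give a small divergence (`d_ab`), while well-separated sets give a value close to 1 (`d_ac`).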
Self-supervised Video Representation Learning by Uncovering Spatio-temporal Statistics
This paper proposes a novel pretext task to address the self-supervised video
representation learning problem. Specifically, given an unlabeled video clip,
we compute a series of spatio-temporal statistical summaries, such as the
spatial location and dominant direction of the largest motion, the spatial
location and dominant color of the largest color diversity along the temporal
axis, etc. Then a neural network is built and trained to yield the statistical
summaries given the video frames as inputs. In order to alleviate the learning
difficulty, we employ several spatial partitioning patterns to encode rough
spatial locations instead of exact spatial Cartesian coordinates. Our approach
is inspired by the observation that the human visual system is sensitive to rapidly
changing contents in the visual field, and only needs impressions about rough
spatial locations to understand the visual contents. To validate the
effectiveness of the proposed approach, we conduct extensive experiments with
four 3D backbone networks, i.e., C3D, 3D-ResNet, R(2+1)D and S3D-G. The results
show that our approach outperforms the existing approaches across these
backbone networks on four downstream video analysis tasks including action
recognition, video retrieval, dynamic scene recognition, and action similarity
labeling. The source code is publicly available at:
https://github.com/laura-wang/video_repres_sts.
Comment: Accepted by TPAMI. An extension of our previous work at
arXiv:1904.0359
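The rough-location encoding described in the abstract above can be sketched as follows: find the pixel with the largest motion magnitude, then report the index of the grid cell that contains it instead of its exact coordinates. A uniform `rows` x `cols` grid is an assumed stand-in here; the abstract mentions several partitioning patterns without describing them, and the function names are hypothetical.

```python
def argmax_2d(magnitude):
    """Return (y, x) of the largest value in a 2-D list (first one on ties)."""
    best_y, best_x = 0, 0
    for y, row in enumerate(magnitude):
        for x, value in enumerate(row):
            if value > magnitude[best_y][best_x]:
                best_y, best_x = y, x
    return best_y, best_x


def block_label(y, x, height, width, rows, cols):
    """Row-major index of the grid cell containing pixel (y, x)."""
    r = min(y * rows // height, rows - 1)
    c = min(x * cols // width, cols - 1)
    return r * cols + c


# Toy 4x4 motion-magnitude map with its peak at row 3, column 1.
motion = [
    [0.1, 0.0, 0.2, 0.1],
    [0.0, 0.3, 0.1, 0.0],
    [0.2, 0.1, 0.0, 0.4],
    [0.1, 0.9, 0.2, 0.0],
]
peak_y, peak_x = argmax_2d(motion)
label = block_label(peak_y, peak_x, height=4, width=4, rows=2, cols=2)
```

A network trained on such labels solves a small classification problem per statistic, which is easier than regressing exact Cartesian coordinates.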
Delta Descriptors: Change-Based Place Representation for Robust Visual Localization
Visual place recognition is challenging because there are so many factors
that can cause the appearance of a place to change, from day-night cycles to
seasonal change to atmospheric conditions. In recent years a large range of
approaches have been developed to address this challenge including deep-learnt
image descriptors, domain translation, and sequential filtering, all with
shortcomings including generality and velocity-sensitivity. In this paper we
propose a novel descriptor derived from tracking changes in any learned global
descriptor over time, dubbed Delta Descriptors. Delta Descriptors mitigate the
offsets induced in the original descriptor matching space in an unsupervised
manner by considering temporal differences across places observed along a
route. Like all other approaches, Delta Descriptors have a shortcoming -
volatility on a frame to frame basis - which can be overcome by combining them
with sequential filtering methods. Using two benchmark datasets, we first
demonstrate the high performance of Delta Descriptors in isolation, before
showing new state-of-the-art performance when combined with sequence-based
matching. We also present results demonstrating the approach working with four
different underlying descriptor types, and two other beneficial properties of
Delta Descriptors in comparison to existing techniques: their increased
inherent robustness to variations in camera motion and a reduced rate of
performance degradation as dimensional reduction is applied. Source code is
made available at https://github.com/oravus/DeltaDescriptors.
Comment: 8 pages and 7 figures. Published in 2020 IEEE Robotics and Automation
Letters (RA-L)
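A minimal sketch of the change-based idea in the abstract above: given a sequence of global descriptors observed along a route, take the difference between descriptors `gap` frames apart. This plain difference is an assumed simplification for illustration and is not taken from the released code; `delta_descriptors` and `gap` are hypothetical names. The example also shows why a constant offset in the descriptor space, such as one caused by an appearance change, cancels out.

```python
def delta_descriptors(descriptors, gap=1):
    """Element-wise difference between descriptors `gap` frames apart."""
    return [
        [later - earlier
         for earlier, later in zip(descriptors[i], descriptors[i + gap])]
        for i in range(len(descriptors) - gap)
    ]


# Toy route: three frames, each with a 2-D global descriptor.
route = [[1.0, 2.0], [2.0, 4.0], [4.0, 3.0]]

# The same route seen under different conditions, modelled as a constant offset.
offset = [10.0, -5.0]
route_shifted = [[v + o for v, o in zip(d, offset)] for d in route]

deltas = delta_descriptors(route)
deltas_shifted = delta_descriptors(route_shifted)
```

Both traversals yield the same deltas. A sequence of N descriptors produces N - `gap` deltas, and a single noisy frame perturbs the deltas it takes part in, which is consistent with the frame-to-frame volatility the abstract mentions.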
“This is the way the world ends, not…”: towards a polis of performing ecology
In the opening decade of the twenty-first century humans faced a rising surplus of historical double binds that threatened no shortage of highly charged political and ethical dilemmas. For example, humanity’s success at performing survival began to outstrip the carrying capacity of Earth. And, of course, such blatant global dramas offer no obvious denouement. When all futures seem to promise only impossible scenarios, such as an end to ‘history’ or even ‘nature’, what kinds of performance paradigm might offer some glimmers of hope? This presentation approaches that prospect paradoxically by attempting to treat it lightly, as if we are always already such stuff as dreams are made on. So it delves into an end to all ethics and the onset of an especially extreme state of political exception for Homo sapiens as the species passes under a rainbow called climate change. For this particular specimen, on the left is a 1970s Hawaiian happening titled H.C.A.W. – Happy Cleaner Air Week – to the right a recent land-based installation known as A Meadow Meander. Between these unlikely materials it aims to conjure up a few random poles of a dynamic dispersal of Earthly doom that goes by the dubious bioethical alias of ‘performing ecology’.