
    Convolutional Neural Network on Three Orthogonal Planes for Dynamic Texture Classification

    Dynamic Textures (DTs) are sequences of images of moving scenes, such as smoke, vegetation, and fire, that exhibit certain stationarity properties in time. The analysis of DTs is important for recognition, segmentation, synthesis, and retrieval in a range of applications including surveillance, medical imaging, and remote sensing. Deep learning methods have shown impressive results and are now the state of the art for a wide range of computer vision tasks, including image and video recognition and segmentation. In particular, Convolutional Neural Networks (CNNs) have recently proven to be well suited to texture analysis, with a design similar to a filter-bank approach. In this paper, we develop a new approach to DT analysis based on a CNN method applied on three orthogonal planes: xy, xt, and yt. We train CNNs on spatial frames and temporal slices extracted from the DT sequences and combine their outputs to obtain a competitive DT classifier. Our results on a wide range of commonly used DT classification benchmark datasets demonstrate the robustness of our approach, with significant improvements over the state of the art on the larger datasets. Comment: 19 pages, 10 figures.
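    As a rough illustration of the three-plane idea, the sketch below slices a grayscale video tensor along the xy, xt, and yt planes and fuses per-plane predictions by averaging; the CNN callables and the late-fusion rule are hypothetical stand-ins, not the authors' exact architecture.

```python
import numpy as np

def orthogonal_plane_slices(video):
    """Split a (T, H, W) grayscale video into xy, xt, and yt plane stacks."""
    T, H, W = video.shape
    xy = [video[t, :, :] for t in range(T)]  # spatial frames, each H x W
    xt = [video[:, y, :] for y in range(H)]  # temporal slices, each T x W
    yt = [video[:, :, x] for x in range(W)]  # temporal slices, each T x H
    return xy, xt, yt

def classify_dt(video, cnn_xy, cnn_xt, cnn_yt):
    """Late fusion (one plausible choice): average each plane's class
    probabilities, then pick the class with the largest combined score."""
    xy, xt, yt = orthogonal_plane_slices(video)
    p_xy = np.mean([cnn_xy(s) for s in xy], axis=0)
    p_xt = np.mean([cnn_xt(s) for s in xt], axis=0)
    p_yt = np.mean([cnn_yt(s) for s in yt], axis=0)
    return int(np.argmax(p_xy + p_xt + p_yt))  # predicted DT class
```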

    Beyond Gauss: Image-Set Matching on the Riemannian Manifold of PDFs

    State-of-the-art image-set matching techniques typically model each image set, implicitly, with a Gaussian distribution. Here, we propose to go beyond these representations and model image sets as probability distribution functions (PDFs) using kernel density estimators. To compare and match image sets, we exploit Csiszár f-divergences, which bear strong connections to the geodesic distance defined on the space of PDFs, i.e., the statistical manifold. Furthermore, we introduce valid positive-definite kernels on the statistical manifold, which let us make use of more powerful classification schemes to match image sets. Finally, we introduce a supervised dimensionality reduction technique that learns a latent space in which f-divergences reflect the class labels of the data. Our experiments on diverse problems, such as video-based face recognition and dynamic texture classification, evidence the benefits of our approach over state-of-the-art image-set matching methods.
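    A minimal sketch of the core comparison step, assuming per-image feature vectors are already extracted: each set is modeled with a Gaussian kernel density estimate and compared via the squared Hellinger distance, one member of the Csiszár f-divergence family. The Monte Carlo approximation and the choice of divergence are illustrative, not the paper's exact pipeline.

```python
import numpy as np
from scipy.stats import gaussian_kde

def hellinger_divergence(set_a, set_b, n_samples=2000):
    """Approximate squared Hellinger distance between two image sets.

    set_a, set_b: (n_images, d) arrays of per-image feature vectors.
    """
    p = gaussian_kde(set_a.T)            # KDE over set A (expects d x n)
    q = gaussian_kde(set_b.T)            # KDE over set B
    samples = p.resample(n_samples)      # Monte Carlo samples from p, (d, m)
    ratio = q(samples) / np.maximum(p(samples), 1e-12)
    # H^2(p, q) = 1 - E_p[sqrt(q/p)], a Csiszar f-divergence with f(t) = 1 - sqrt(t)
    return float(1.0 - np.mean(np.sqrt(ratio)))
```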

    Self-supervised Video Representation Learning by Uncovering Spatio-temporal Statistics

    This paper proposes a novel pretext task to address the self-supervised video representation learning problem. Specifically, given an unlabeled video clip, we compute a series of spatio-temporal statistical summaries, such as the spatial location and dominant direction of the largest motion, and the spatial location and dominant color of the largest color diversity along the temporal axis. A neural network is then built and trained to yield these statistical summaries given the video frames as input. To alleviate the learning difficulty, we employ several spatial partitioning patterns to encode rough spatial locations instead of exact spatial Cartesian coordinates. Our approach is inspired by the observation that the human visual system is sensitive to rapidly changing contents in the visual field and needs only impressions of rough spatial locations to understand visual content. To validate the effectiveness of the proposed approach, we conduct extensive experiments with four 3D backbone networks, i.e., C3D, 3D-ResNet, R(2+1)D, and S3D-G. The results show that our approach outperforms the existing approaches across these backbone networks on four downstream video analysis tasks: action recognition, video retrieval, dynamic scene recognition, and action similarity labeling. The source code is publicly available at: https://github.com/laura-wang/video_repres_sts. Comment: Accepted by TPAMI; an extension of our previous work at arXiv:1904.0359
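    One such pretext label can be sketched as follows, assuming a (T, H, W) grayscale clip: the cell of a coarse spatial grid that accumulates the most frame-to-frame motion becomes the prediction target. The 4x4 partition and the absolute-difference motion proxy are simplifying assumptions, not the paper's exact statistics.

```python
import numpy as np

def largest_motion_cell(clip, grid=4):
    """Return the index of the grid cell with the largest accumulated motion.

    clip: (T, H, W) grayscale video; returns an integer in [0, grid*grid).
    """
    # Sum absolute frame differences over time as a crude motion-energy map.
    motion = np.abs(np.diff(clip.astype(np.float32), axis=0)).sum(axis=0)
    H, W = motion.shape
    hs, ws = H // grid, W // grid
    energy = np.array([[motion[i*hs:(i+1)*hs, j*ws:(j+1)*ws].sum()
                        for j in range(grid)] for i in range(grid)])
    return int(np.argmax(energy))  # coarse spatial location as a class label
```

    A network trained to predict this label from raw frames must attend to where motion happens, without any human annotation.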

    Delta Descriptors: Change-Based Place Representation for Robust Visual Localization

    Visual place recognition is challenging because many factors can cause the appearance of a place to change, from day-night cycles to seasonal change to atmospheric conditions. In recent years a large range of approaches has been developed to address this challenge, including deep-learnt image descriptors, domain translation, and sequential filtering, all with shortcomings including limited generality and velocity sensitivity. In this paper we propose a novel descriptor derived from tracking changes in any learned global descriptor over time, dubbed Delta Descriptors. Delta Descriptors mitigate the offsets induced in the original descriptor matching space in an unsupervised manner by considering temporal differences across places observed along a route. Like all other approaches, Delta Descriptors have a shortcoming, volatility on a frame-to-frame basis, which can be overcome by combining them with sequential filtering methods. Using two benchmark datasets, we first demonstrate the high performance of Delta Descriptors in isolation, before showing new state-of-the-art performance when combined with sequence-based matching. We also present results demonstrating the approach working with four different underlying descriptor types, along with two other beneficial properties of Delta Descriptors in comparison to existing techniques: increased inherent robustness to variations in camera motion and a reduced rate of performance degradation as dimensionality reduction is applied. Source code is made available at https://github.com/oravus/DeltaDescriptors. Comment: 8 pages and 7 figures. Published in IEEE Robotics and Automation Letters (RA-L), 2020.
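    The core transformation admits a short sketch, assuming an (n_frames, d) matrix of precomputed global descriptors along a route; the running-mean smoothing, window length, and L2 normalisation below are illustrative choices in the spirit of the method rather than the published implementation.

```python
import numpy as np

def delta_descriptors(desc, window=8):
    """Change-based descriptors from a sequence of global descriptors.

    desc: (n_frames, d) array; returns (n_frames - window, d) delta descriptors.
    """
    # Smooth each descriptor dimension over time with a running mean.
    kernel = np.ones(window) / window
    smoothed = np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="same"), 0, desc)
    # Difference across the window: how the descriptor changed along the route.
    delta = smoothed[window:] - smoothed[:-window]
    norms = np.linalg.norm(delta, axis=1, keepdims=True)
    return delta / np.maximum(norms, 1e-12)  # L2-normalise each row
```

    Matching then proceeds on the delta descriptors (e.g., by cosine similarity) instead of the raw descriptors, which is what removes the appearance-induced offsets in an unsupervised way.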

    “This is the way the world ends, not…”: towards a polis of performing ecology

    In the opening decade of the twenty-first century humans faced a rising surplus of historical double binds that threatened no shortage of highly charged political and ethical dilemmas. For example, humanity’s success at performing survival began to outstrip the carrying capacity of Earth. And, of course, such blatant global dramas offer no obvious denouement. When all futures seem to promise only impossible scenarios, such as an end to ‘history’ or even ‘nature’, what kinds of performance paradigm might offer some glimmers of hope? This presentation approaches that prospect paradoxically by attempting to treat it lightly, as if we are always already such stuff as dreams are made on. So it delves into an end to all ethics and the onset of an especially extreme state of political exception for Homo sapiens as the species passes under a rainbow called climate change. For this particular specimen, on the left is a 1970s Hawaiian happening titled H.C.A.W. – Happy Cleaner Air Week – and to the right a recent land-based installation known as A Meadow Meander. Between these unlikely materials it aims to conjure up a few random poles of a dynamic dispersal of Earthly doom that goes by the dubious bioethical alias of ‘performing ecology’.