
    Multi-stage Factorized Spatio-Temporal Representation for RGB-D Action and Gesture Recognition

    RGB-D action and gesture recognition remain interesting topics in human-centered scene understanding, primarily due to the multiple granularities and large variations in human motion. Although many RGB-D based action and gesture recognition approaches have demonstrated remarkable results by utilizing highly integrated spatio-temporal representations across multiple modalities (i.e., RGB and depth data), they still encounter several challenges. First, vanilla 3D convolution struggles to capture fine-grained motion differences between local clips under different modalities. Second, the intricate nature of highly integrated spatio-temporal modeling can lead to optimization difficulties. Third, duplicate and unnecessary information adds complexity and further entangles spatio-temporal modeling. To address these issues, we propose an innovative heuristic architecture called Multi-stage Factorized Spatio-Temporal (MFST) for RGB-D action and gesture recognition. The proposed MFST model comprises a 3D Central Difference Convolution Stem (CDC-Stem) module and multiple factorized spatio-temporal stages. The CDC-Stem enriches fine-grained temporal perception, and the multiple hierarchical spatio-temporal stages construct dimension-independent higher-order semantic primitives. Specifically, the CDC-Stem module captures bottom-level spatio-temporal features and passes them successively to the following factorized spatio-temporal stages, which capture hierarchical spatial and temporal features through a Multi-Scale Convolution and Transformer (MSC-Trans) hybrid block and a Weight-shared Multi-Scale Transformer (WMS-Trans) block. The seamless integration of these designs yields a robust spatio-temporal representation that outperforms state-of-the-art approaches on RGB-D action and gesture recognition datasets. Comment: ACM MM'23
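
    The abstract does not give the exact stem configuration, so the following is a minimal PyTorch sketch of the central-difference convolution idea behind a CDC-Stem; the class name, theta value, and kernel size are illustrative assumptions, and theta = 0 recovers a plain 3D convolution.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class CDC3d(nn.Module):
            """Hypothetical 3D central-difference convolution (CDC) layer.

            Mixes a vanilla 3D convolution with a central-difference term
            that responds to local intensity changes rather than absolute
            intensities; theta = 0 recovers plain Conv3d.
            """
            def __init__(self, in_ch, out_ch, kernel_size=3, theta=0.7):
                super().__init__()
                self.conv = nn.Conv3d(in_ch, out_ch, kernel_size,
                                      padding=kernel_size // 2, bias=False)
                self.theta = theta

            def forward(self, x):
                out = self.conv(x)  # vanilla 3D convolution
                # Central-difference term: conv over (x(p) - x(p0)) equals
                # conv(x) minus x(p0) weighted by the summed kernel.
                w_sum = self.conv.weight.sum(dim=(2, 3, 4), keepdim=True)
                center = F.conv3d(x, w_sum)  # 1x1x1 conv with summed weights
                return out - self.theta * center

        x = torch.randn(2, 3, 16, 56, 56)  # (batch, channels, frames, H, W)
        y = CDC3d(3, 64)(x)                # -> (2, 64, 16, 56, 56)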

    A novel wideband dynamic directional indoor channel model based on a Markov process


    Temporal Extension of Scale Pyramid and Spatial Pyramid Matching for Action Recognition

    Historically, researchers have spent a great deal of effort creating image representations that are scale invariant and retain spatial location information. This paper proposes encoding equivalent temporal characteristics in video representations for action recognition. To achieve temporal scale invariance, we develop a method called the temporal scale pyramid (TSP). To encode temporal information, we present and compare two methods, the temporal extension descriptor (TED) and the temporal division pyramid (TDP). Our purpose is to suggest solutions for matching complex actions that vary widely in velocity and appearance, which is missing from most current action representations. Experimental results on four benchmark datasets, UCF50, HMDB51, Hollywood2, and Olympic Sports, support our approach, which significantly outperforms state-of-the-art methods. Most noticeably, we achieve 65.0% mean accuracy and 68.2% mean average precision on the challenging HMDB51 and Hollywood2 datasets, which constitute absolute improvements over the state of the art of 7.8% and 3.9%, respectively.
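
    As background, the following sketch illustrates a temporal analogue of spatial pyramid matching of the kind a temporal division pyramid suggests; the function name, level set, and average pooling are assumptions, not the paper's exact encoding.

        import numpy as np

        def temporal_division_pyramid(frame_feats, levels=(1, 2, 4)):
            """Pool per-frame descriptors over a pyramid of temporal divisions.

            At level k the video is cut into k equal segments and each segment
            is average-pooled; the pooled segments are concatenated, so an
            input of shape (T, D) yields a sum(levels) * D vector.
            Assumes T >= max(levels).
            """
            T, _ = frame_feats.shape
            chunks = []
            for k in levels:
                bounds = np.linspace(0, T, k + 1).astype(int)
                for i in range(k):
                    segment = frame_feats[bounds[i]:bounds[i + 1]]
                    chunks.append(segment.mean(axis=0))
            return np.concatenate(chunks)

        feats = np.random.rand(120, 256)              # 120 frames, 256-D descriptors
        video_vec = temporal_division_pyramid(feats)  # (1 + 2 + 4) * 256 = 1792-D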

    Modeling Dynamic Swarms

    This paper poses the problem of modeling video sequences of dynamic swarms (DS). We define a DS as a large layout of stochastically repetitive spatial configurations of dynamic objects (swarm elements) whose motions exhibit local spatiotemporal interdependency and stationarity, i.e., the motions are similar in any small spatiotemporal neighborhood. Examples of DS abound in nature, e.g., herds of animals and flocks of birds. To capture the local spatiotemporal properties of the DS, we present a probabilistic model that learns both the spatial layout of swarm elements and their joint dynamics, which are modeled as linear transformations. To this end, a spatiotemporal neighborhood is associated with each swarm element, in which local stationarity is enforced both spatially and temporally. We assume that the prior on the swarm dynamics is distributed according to a Markov random field (MRF) in both space and time. Embedding this model in a maximum a posteriori (MAP) framework, we iterate between learning the spatial layout of the swarm and its dynamics. We learn the swarm transformations using iterated conditional modes (ICM), which alternates between estimating these transformations and updating their distribution in the spatiotemporal neighborhoods. We demonstrate the validity of our method through experiments on real video sequences of birds, geese, robot swarms, and pedestrians, which confirm the applicability of our model to real-world data. Comment: 11 pages, 17 figures, conference paper, computer vision
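
    The abstract names ICM but not its swarm-specific energies, so the sketch below shows generic iterated conditional modes on a grid MRF with a Potts smoothness prior standing in for the paper's model; all names and energy terms here are illustrative.

        import numpy as np

        def icm(unary, smoothness, n_iters=10):
            """Iterated conditional modes on a 4-connected grid MRF.

            unary: (H, W, K) per-site costs for K labels; smoothness is a
            Potts penalty for disagreeing with a neighbor. Each sweep sets
            every site to its locally optimal label given its neighbors,
            monotonically decreasing the energy to a local minimum.
            """
            H, W, K = unary.shape
            labels = unary.argmin(axis=2)  # independent initialization
            for _ in range(n_iters):
                changed = False
                for y in range(H):
                    for x in range(W):
                        cost = unary[y, x].copy()
                        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                            if 0 <= ny < H and 0 <= nx < W:
                                cost += smoothness * (np.arange(K) != labels[ny, nx])
                        best = int(cost.argmin())
                        if best != labels[y, x]:
                            labels[y, x] = best
                            changed = True
                if not changed:
                    break  # converged to a local minimum
            return labels

        noisy = np.random.rand(32, 32, 3)          # toy per-site label costs
        label_map = icm(noisy, smoothness=0.5)     # (32, 32) label map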

    Ground Motion Model of the SLAC Site

    We present a ground motion model for the SLAC site. The model is based on recent ground motion studies performed at SLAC as well as on historical data. It includes wave-like, diffusive, and systematic types of motion. An attempt is made to relate measurable secondary properties of the ground motion to more basic characteristics such as the layered geological structure of the surrounding earth, the depth of the tunnel, etc. This model is an essential step in evaluating sites for a future linear collider. Comment: submitted to the XX International Linac Conference
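
    As hedged background on the diffusive component such accelerator models typically include, the sketch below evaluates the standard "ATL" ground-motion law, in which the variance of the relative displacement of two points grows linearly with elapsed time T and separation L; the parameter value is illustrative, not a SLAC measurement.

        import numpy as np

        def atl_rms_displacement(A, T, L):
            """RMS relative displacement under the diffusive ATL law.

            The ATL law states that the variance of the relative misalignment
            of two points grows linearly with elapsed time T and their
            separation L: sigma^2 = A * T * L, with A a site-dependent
            diffusion parameter.
            """
            return np.sqrt(A * T * L)

        # Illustrative numbers only: A in um^2 / (s * m), T in s, L in m.
        A = 1e-5
        print(atl_rms_displacement(A, T=24 * 3600, L=100))  # ~9.3 um over a day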

    Stochastic representation of the Reynolds transport theorem: revisiting large-scale modeling

    We explore the potential of a formulation of the Navier-Stokes equations that incorporates a random description of the small-scale velocity component. This model, established from a version of the Reynolds transport theorem adapted to a stochastic representation of the flow, gives rise to a large-scale description of the flow dynamics in which an anisotropic subgrid tensor emerges, reminiscent of the Reynolds stress tensor, together with a drift correction due to inhomogeneous turbulence. The corresponding subgrid model, which depends on the small-scale velocity variance, generalizes the Boussinesq eddy viscosity assumption. However, it is no longer obtained from an analogy with molecular dissipation but follows rigorously from the random modeling of the flow. This principle allows us to propose several subgrid models defined directly on the resolved flow component. We numerically assess and compare these models on a standard Taylor-Green vortex flow at a Reynolds number of 1600. The numerical simulations, carried out with an accurate divergence-free scheme, outperform classical large-eddy simulation formulations and provide a simple demonstration of the pertinence of the proposed large-scale modeling.
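
    For readers unfamiliar with the benchmark, the sketch below builds the standard Taylor-Green vortex initial condition, which is divergence-free by construction; the grid size and normalization are illustrative, and the paper's solver and subgrid models are not reproduced here.

        import numpy as np

        def taylor_green_ic(n=64, V0=1.0):
            """Initial velocity field of the Taylor-Green vortex benchmark.

            Defined on the periodic box [0, 2*pi)^3; the field is
            divergence-free by construction. For the Reynolds number 1600
            case, the viscosity would be nu = V0 * L / 1600 with L = 1.
            """
            s = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
            x, y, z = np.meshgrid(s, s, s, indexing="ij")
            u = V0 * np.sin(x) * np.cos(y) * np.cos(z)
            v = -V0 * np.cos(x) * np.sin(y) * np.cos(z)
            w = np.zeros_like(u)
            return u, v, w

        u, v, w = taylor_green_ic()  # three (64, 64, 64) velocity components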

    Scalable Dense Non-rigid Structure-from-Motion: A Grassmannian Perspective

    This paper addresses the task of dense non-rigid structure-from-motion (NRSfM) using multiple images. State-of-the-art methods for this problem are often hindered by poor scalability, expensive computation, and noisy measurements. Further, recent approaches to NRSfM usually either assume a small number of sparse feature points or ignore local non-linearities of shape deformations, and thus cannot reliably model complex non-rigid deformations. To address these issues, we propose a new approach to dense NRSfM that models the problem on a Grassmann manifold. Specifically, we assume that the complex non-rigid deformations lie on a union of local linear subspaces both spatially and temporally. This naturally allows a compact representation of the complex non-rigid deformation across frames. We provide experimental results on several synthetic and real benchmark datasets. The results clearly demonstrate that our method, apart from being scalable and more accurate than state-of-the-art methods, is also more robust to noise and generalizes to highly non-linear deformations. Comment: 10 pages, 7 figures, 4 tables. Accepted for publication at the Conference on Computer Vision and Pattern Recognition (CVPR), 2018; typos fixed and acknowledgement added
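
    As background on the Grassmannian machinery the abstract invokes (not the paper's full NRSfM algorithm), the following sketch represents a subspace as a point on the Grassmann manifold and measures the geodesic distance between two subspaces via their principal angles; the matrix sizes are illustrative.

        import numpy as np

        def grassmann_point(X, d):
            """Orthonormal basis of the leading d-dimensional column subspace
            of X, i.e. a point on the Grassmann manifold Gr(d, X.shape[0])."""
            U, _, _ = np.linalg.svd(X, full_matrices=False)
            return U[:, :d]

        def grassmann_distance(U1, U2):
            """Geodesic distance between two subspaces via principal angles:
            the singular values of U1.T @ U2 are the cosines of the angles."""
            cosines = np.linalg.svd(U1.T @ U2, compute_uv=False)
            angles = np.arccos(np.clip(cosines, -1.0, 1.0))
            return float(np.linalg.norm(angles))

        # Two nearby 3-D shape subspaces spanned by 30 tracked points.
        A = np.random.rand(30, 5)
        U1 = grassmann_point(A, 3)
        U2 = grassmann_point(A + 0.05 * np.random.rand(30, 5), 3)
        print(grassmann_distance(U1, U2))  # small perturbation -> small distance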
