
    ConViViT -- A Deep Neural Network Combining Convolutions and Factorized Self-Attention for Human Activity Recognition

    The Transformer architecture has gained significant popularity in computer vision due to its capacity to generalize and to capture long-range dependencies, which makes it well suited for generating spatiotemporal tokens from videos. Convolutions, on the other hand, are the fundamental backbone for processing images and videos: they efficiently aggregate information within small local neighborhoods to create spatial tokens that describe the spatial dimension of a video. While both CNN-based and pure Transformer architectures have been extensively studied and used, the effective combination of these two backbones has not received comparable attention in activity recognition. In this research, we propose a novel hybrid architecture that leverages the strengths of both CNNs and Transformers to perform activity recognition from RGB videos. Specifically, we employ a CNN to enhance the video representation by generating a 128-channel video that effectively separates the person performing the activity from the background. The output of the CNN module is then fed into a Transformer that extracts spatiotemporal tokens, which are used for classification. Our architecture achieves new state-of-the-art results of 90.05%, 99.6%, and 95.09% on HMDB51, UCF101, and ETRI-Activity3D, respectively.
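
    The abstract does not detail the Transformer stage, but the title refers to factorized self-attention. The following minimal NumPy sketch assumes a ViViT-style factorization (attention among the spatial tokens of each frame, then attention across frames at each spatial position) with a single head, random weights, and no MLP or normalization layers. The function names and the `params` layout are hypothetical; this is an illustration of the technique, not the authors' code.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product attention over axis -2.

    x: (..., L, D) tokens; wq, wk, wv: (D, D) projections. Returns (..., L, D).
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])  # (..., L, L)
    return softmax(scores, axis=-1) @ v


def factorized_attention(tokens, params):
    """Spatial-then-temporal self-attention with residual connections (hypothetical layout).

    tokens: (T, N, D) = frames x spatial tokens per frame x channels.
    params: {"spatial": (wq, wk, wv), "temporal": (wq, wk, wv)}.
    """
    # Spatial step: each frame attends only among its own N tokens.
    x = tokens + self_attention(tokens, *params["spatial"])
    # Temporal step: swap axes so each spatial position attends across the T frames.
    xt = np.swapaxes(x, 0, 1)  # (N, T, D)
    xt = xt + self_attention(xt, *params["temporal"])
    return np.swapaxes(xt, 0, 1)  # back to (T, N, D)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, N, D = 4, 16, 8  # frames, tokens per frame, channels (illustrative sizes)
    make = lambda: tuple(0.1 * rng.normal(size=(D, D)) for _ in range(3))
    params = {"spatial": make(), "temporal": make()}
    video_tokens = rng.normal(size=(T, N, D))
    print(factorized_attention(video_tokens, params).shape)  # (4, 16, 8)
```

    Factorizing attention this way replaces one attention over all T·N tokens, with (T·N)² scores, by N² scores per frame plus T² scores per spatial position; for T = 4 and N = 16 that is 1,280 scores instead of 4,096.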

    Geometric Deep Learning on Skeleton Sequences for 2D/3D Action Recognition

    International audience

    3D Geodesic Shape Spectrum Descriptor for object retrieval

    In this paper, we address the problem of 3D object retrieval based on 3D shape descriptors. The proposed approach builds a new descriptor that is intrinsically invariant to geometric transformations and robust to topology changes and remeshing. The 3D Shape Spectrum Descriptor (3D SSD), adopted in the MPEG-7 standard (Zaharia, 2001), is computed in the neighborhood of interest points on the outer surface of the shape. The neighborhood around each interest point is obtained by intersecting geodesic level curves with radial lines, where the level curves consist of points at equal geodesic distance from the interest point. Experiments carried out on the SHREC’09 and SHREC’11 datasets demonstrate the properties of the proposed descriptor and validate the relevance of our approach compared with descriptors proposed in the literature (Lian, 2009; 2011).
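
    For reference, the MPEG-7 shape spectrum is a histogram of the shape index, which maps the principal curvatures k1 >= k2 at a surface point to I = 1/2 - (1/π)·arctan((k1 + k2)/(k1 - k2)), a value in [0, 1]. The sketch below assumes per-vertex principal curvatures and geodesic distances from an interest point are already available, and it replaces the paper's intersection of geodesic level curves and radial lines with a plain geodesic-radius neighborhood. The bin count, the planarity threshold, and the function names are illustrative assumptions; the standard's exact binning and area weighting are not reproduced.

```python
import numpy as np


def shape_index(k1, k2):
    """MPEG-7-style shape index in [0, 1] from principal curvatures (ordered so k1 >= k2)."""
    k1, k2 = np.maximum(k1, k2), np.minimum(k1, k2)
    # arctan2 equals arctan((k1 + k2) / (k1 - k2)) for k1 > k2 and handles umbilics (k1 == k2).
    return 0.5 - np.arctan2(k1 + k2, k1 - k2) / np.pi


def shape_spectrum(k1, k2, n_bins=100, planar_eps=1e-6):
    """Normalized shape-index histogram plus one extra bin for near-planar points."""
    k1 = np.asarray(k1, dtype=float)
    k2 = np.asarray(k2, dtype=float)
    curvedness = np.sqrt((k1 ** 2 + k2 ** 2) / 2.0)
    planar = curvedness < planar_eps  # shape index is undefined when k1 = k2 = 0
    si = shape_index(k1[~planar], k2[~planar])
    hist, _ = np.histogram(si, bins=n_bins, range=(0.0, 1.0))
    spectrum = np.append(hist, planar.sum()).astype(float)
    return spectrum / max(spectrum.sum(), 1.0)


def local_descriptor(k1, k2, geodesic_dist, radius, n_bins=100):
    """Spectrum over vertices within `radius` geodesic distance of one interest point.

    Simplification: a geodesic disc stands in for the level-curve/radial-line neighborhood.
    """
    mask = np.asarray(geodesic_dist) <= radius
    return shape_spectrum(np.asarray(k1)[mask], np.asarray(k2)[mask], n_bins)


if __name__ == "__main__":
    k1 = np.array([1.0, 1.0, -1.0])
    k2 = np.array([1.0, -1.0, -1.0])
    print(shape_index(k1, k2))
    print(local_descriptor(k1, k2, geodesic_dist=[0.0, 0.5, 2.0], radius=1.0, n_bins=10))
```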

    Edge-aware wedgelet estimation for depth maps compression

    In recent years, Multi-view Video plus Depth (MVD) compression has received much attention because of its relevance to free-viewpoint applications. Efficient compression, causing minimal distortion without an excessive increase in rate and complexity, is essential, particularly for depth maps. Depth maps can be compressed efficiently by the 3D extension of High Efficiency Video Coding (3D-HEVC), which makes use of wedgelets. These functions provide favorable rate-distortion tradeoffs. However, they incur a very high computational complexity because of the exhaustive search used to estimate the wedgelet subdivision line. In this paper, we propose a fast localization of this line using an edge detection approach. Experimental results show that the proposed approach considerably reduces encoding delay while providing better depth map and synthesized view quality than the exhaustive search approach.
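
    To make the complexity argument concrete, the following sketch implements a brute-force wedgelet search on an n x n block: every pair of border pixels defines a candidate line, each side of the line is approximated by its mean, and the pair with the lowest squared error wins. The edge-guided variant simply keeps the k border pixels with the strongest gradient as endpoint candidates. That heuristic, the helper names, and the error measure are assumptions for illustration; they are not the paper's algorithm or the 3D-HEVC reference software.

```python
import itertools

import numpy as np


def boundary_points(n):
    """(row, col) coordinates of the border pixels of an n x n block, each listed once."""
    top = [(0, c) for c in range(n)]
    right = [(r, n - 1) for r in range(1, n)]
    bottom = [(n - 1, c) for c in range(n - 2, -1, -1)]
    left = [(r, 0) for r in range(n - 2, 0, -1)]
    return top + right + bottom + left


def wedge_mask(n, p0, p1):
    """Boolean mask of pixels strictly on one side of the line through p0 and p1."""
    rows, cols = np.mgrid[0:n, 0:n]
    (r0, c0), (r1, c1) = p0, p1
    side = (c1 - c0) * (rows - r0) - (r1 - r0) * (cols - c0)  # sign of 2D cross product
    return side > 0


def wedge_sse(block, mask):
    """Squared error when each wedge region is approximated by its mean value."""
    sse = 0.0
    for region in (block[mask], block[~mask]):
        if region.size:
            sse += float(((region - region.mean()) ** 2).sum())
    return sse


def best_wedgelet(block, candidates):
    """Try every pair of candidate endpoints; return (error, (p0, p1)) for the best line."""
    n = block.shape[0]
    best = (np.inf, None)
    for p0, p1 in itertools.combinations(candidates, 2):
        mask = wedge_mask(n, p0, p1)
        if mask.all() or not mask.any():
            continue  # degenerate line: every pixel on the same side
        err = wedge_sse(block, mask)
        if err < best[0]:
            best = (err, (p0, p1))
    return best


def edge_candidates(block, k=4):
    """Hypothetical edge-guided pruning: keep the k border pixels with the largest gradient."""
    grad_r, grad_c = np.gradient(block.astype(float))
    magnitude = np.hypot(grad_r, grad_c)
    return sorted(boundary_points(block.shape[0]), key=lambda p: magnitude[p], reverse=True)[:k]


if __name__ == "__main__":
    block = np.full((8, 8), 10.0)
    block[:, 4:] = 50.0  # vertical depth discontinuity
    print(best_wedgelet(block, boundary_points(8)))
    print(best_wedgelet(block, edge_candidates(block)))
```

    For an 8 x 8 block the border has 28 pixels, so the exhaustive search evaluates 378 candidate lines, while four edge-guided candidates yield only 6.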

    Efficient B-spline wavelets based dictionary for depth coding and view rendering

    Video representations that support depth-based view synthesis, such as multiview plus depth, have become widespread, raising interest in efficient depth map coding tools. In this paper, we propose an innovative sparse decomposition over a wavelet-based dictionary specially designed for the piecewise planar nature of depth signals. We also evaluate the performance of the proposed dictionary for depth map coding, paying special attention to the impact of depth coding errors on the resulting synthesized images. The results show that the proposed scheme can considerably improve the perceived quality of synthesized images.
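
    The abstract names neither the sparse solver nor the dictionary construction. As a hedged illustration only, the sketch below uses orthogonal matching pursuit (a standard greedy sparse solver, swapped in here) over a hypothetical 1D dictionary of constant, ramp, and linear B-spline (hat) atoms at several scales, since piecewise linear 1D signals are the analogue of piecewise planar depth. It is not the authors' B-spline wavelet dictionary, and real depth maps would need 2D atoms.

```python
import numpy as np


def hat(n, center, half_width):
    """Linear B-spline (hat) atom: peak 1 at `center`, support of +/- `half_width` samples."""
    t = np.arange(n)
    return np.clip(1.0 - np.abs(t - center) / half_width, 0.0, None)


def build_dictionary(n, scales=(2, 4, 8)):
    """Hypothetical 1D dictionary: constant, ramp, and hat atoms (unit-norm columns)."""
    atoms = [np.ones(n), np.arange(n, dtype=float)]
    for h in scales:
        atoms.extend(hat(n, c, h) for c in range(0, n, h))
    D = np.stack(atoms, axis=1)
    return D / np.linalg.norm(D, axis=0)


def omp(D, y, n_nonzero, tol=1e-10):
    """Orthogonal matching pursuit: greedily pick atoms, refit all coefficients by least squares."""
    y = np.asarray(y, dtype=float)
    residual = y.copy()
    support, coef = [], np.zeros(0)
    for _ in range(n_nonzero):
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break  # residual is numerically orthogonal to every atom
        support.append(k)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
        if np.linalg.norm(residual) < tol:
            break
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x


if __name__ == "__main__":
    n = 32
    D = build_dictionary(n)
    t = np.arange(n, dtype=float)
    depth_row = np.where(t < 16, 2.0 + 0.1 * t, 6.0 - 0.05 * t)  # piecewise linear profile
    x = omp(D, depth_row, n_nonzero=6)
    print(np.count_nonzero(x), np.linalg.norm(depth_row - D @ x))
```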

    Geometric Deep Neural Network using Rigid and Non-Rigid Transformations for Human Action Recognition

    International audience

    An unsupervised 3D mesh segmentation based on HMRF-EM algorithm

    We propose a new 3D mesh segmentation method based on the HMRF-EM framework. The clustering relies on the curvature attribute and takes into account the spatial information encoded by the mutual influence of neighboring mesh elements. A region-growing process is then carried out to extract connected regions, followed by a merging procedure that preserves only meaningful regions. Experiments conducted on different meshes are encouraging and show that the proposed method gives satisfactory results compared with classical statistical methods such as the k-means and EM algorithms.
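
    The clustering stage can be sketched as follows, assuming one curvature value per mesh element, an adjacency list, Gaussian class likelihoods, and a Potts prior with weight `beta`: labels are updated by iterated conditional modes (ICM), and class means and spreads are re-estimated from posterior-weighted statistics. This follows the common HMRF-EM recipe rather than the paper's exact formulation; the parameter names and defaults are illustrative, and the region-growing and merging steps are not covered.

```python
import numpy as np


def hmrf_em(y, neighbors, k=2, beta=1.0, n_em=10, n_icm=5, min_sigma=1e-3):
    """Cluster per-element curvature values `y` into k labels with a Potts-prior HMRF.

    y:         (n,) curvature attribute per mesh element (vertex or face)
    neighbors: list where neighbors[i] holds the indices adjacent to element i
    Returns (labels, mu, sigma).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    # Initialization: class means at evenly spaced quantiles, shared spread.
    mu = np.quantile(y, (np.arange(k) + 0.5) / k)
    sigma = np.full(k, max(y.std(), min_sigma))
    labels = np.argmin(np.abs(y[:, None] - mu[None, :]), axis=1)

    def energies(i):
        # Gaussian negative log-likelihood (up to a constant) + beta * (# disagreeing neighbors).
        nb = labels[neighbors[i]]
        disagree = np.array([np.count_nonzero(nb != l) for l in range(k)])
        return (y[i] - mu) ** 2 / (2 * sigma ** 2) + np.log(sigma) + beta * disagree

    for _ in range(n_em):
        # MAP labels via iterated conditional modes (sequential, in-place updates).
        for _ in range(n_icm):
            for i in range(n):
                labels[i] = int(np.argmin(energies(i)))
        # E-step: per-element class posteriors given the current neighbor labels.
        e = np.array([energies(i) for i in range(n)])
        p = np.exp(-(e - e.min(axis=1, keepdims=True)))
        p /= p.sum(axis=1, keepdims=True)
        # M-step: posterior-weighted Gaussian parameters, with a floor on sigma.
        w = p.sum(axis=0) + 1e-12
        mu = (p * y[:, None]).sum(axis=0) / w
        sigma = np.sqrt((p * (y[:, None] - mu) ** 2).sum(axis=0) / w)
        sigma = np.maximum(sigma, min_sigma)
    return labels, mu, sigma


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    curv = np.concatenate([rng.normal(0.0, 0.05, 10), rng.normal(1.0, 0.05, 10)])
    chain = [[j for j in (i - 1, i + 1) if 0 <= j < 20] for i in range(20)]  # toy adjacency
    labels, mu, sigma = hmrf_em(curv, chain, k=2, beta=1.0)
    print(labels)
```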
