25 research outputs found
ConViViT -- A Deep Neural Network Combining Convolutions and Factorized Self-Attention for Human Activity Recognition
The Transformer architecture has gained significant popularity in computer
vision tasks due to its capacity to generalize and capture long-range
dependencies. This characteristic makes it well-suited for generating
spatiotemporal tokens from videos. On the other hand, convolutions serve as the
fundamental backbone for processing images and videos, as they efficiently
aggregate information within small local neighborhoods to create spatial tokens
that describe the spatial dimension of a video. While both CNN-based
architectures and pure transformer architectures are extensively studied and
utilized by researchers, the effective combination of these two backbones has
not received comparable attention in the field of activity recognition. In this
research, we propose a novel approach that leverages the strengths of both CNNs
and Transformers in a hybrid architecture for performing activity recognition
using RGB videos. Specifically, we suggest employing a CNN network to enhance
the video representation by generating a 128-channel video that effectively
separates the human performing the activity from the background. Subsequently,
the output of the CNN module is fed into a transformer to extract
spatiotemporal tokens, which are then used for classification purposes. Our
architecture achieves new state-of-the-art results of 90.05%, 99.6%, and 95.09%
on HMDB51, UCF101, and ETRI-Activity3D, respectively.
Geometric Deep Learning on Skeleton Sequences for 2D/3D Action Recognition
3D Geodesic Shape Spectrum Descriptor for object retrieval
In this paper, we address the problem of 3D object retrieval based on 3D shape descriptors. The proposed approach builds a new descriptor that is intrinsically invariant to geometric transformations and robust to topology changes and remeshing. The 3D Shape Spectrum Descriptor (3D SSD), adopted in the MPEG-7 standard (Zaharia, 2001), is computed on an intrinsic neighborhood of each interest point on the outer surface of the shape. The neighborhood around each interest point is obtained by intersecting geodesic level curves with radial lines. The level curves correspond to the points at equal geodesic distances from the interest point. Experiments carried out on the SHREC’09 and SHREC’11 datasets show the properties of the proposed descriptor and validate the approach against other descriptors proposed in the literature (Lian, 2009; 2011).
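The geodesic level curves described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that geodesic distance is approximated by shortest paths along mesh edges (Dijkstra), and that a level curve is approximated by a band of vertices whose distance rounds to the same multiple of a step. The function names `geodesic_distances` and `level_curves` and the `step` parameter are hypothetical.

```python
import heapq
import math


def geodesic_distances(vertices, edges, source):
    """Approximate geodesic distances from `source` with Dijkstra over mesh edges.

    Assumption: edge-path length stands in for the true surface geodesic.
    """
    adjacency = {i: [] for i in range(len(vertices))}
    for a, b in edges:
        weight = math.dist(vertices[a], vertices[b])  # Euclidean edge length
        adjacency[a].append((b, weight))
        adjacency[b].append((a, weight))

    dist = [math.inf] * len(vertices)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, weight in adjacency[u]:
            candidate = d + weight
            if candidate < dist[v]:
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist


def level_curves(dist, step):
    """Group vertex indices into bands of (quantized) equal geodesic distance."""
    bands = {}
    for index, d in enumerate(dist):
        if math.isfinite(d):
            bands.setdefault(round(d / step), []).append(index)
    return bands


# Toy mesh: a flat 3 x 3 grid of vertices, interest point at the center (index 4).
vertices = [(float(x), float(y), 0.0) for y in range(3) for x in range(3)]
edges = [(i, i + 1) for i in range(9) if i % 3 < 2]
edges += [(i, i + 3) for i in range(6)]

dist = geodesic_distances(vertices, edges, source=4)
bands = level_curves(dist, step=1.0)
print(bands)
```

On this toy grid the center forms band 0, its four edge neighbors form band 1, and the four corners form band 2. A real descriptor would compute the shape spectrum on such neighborhoods; that step is not sketched here.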
Edge-aware wedgelet estimation for depth maps compression
In recent years, Multi-view Video plus Depth (MVD) compression has received much attention because of its relevance to free-viewpoint applications. Efficient compression, causing the least distortion without an excessive increase in rate or complexity, is essential, particularly for depth maps. Depth maps can be compressed efficiently by the 3D extension of High Efficiency Video Coding (3D-HEVC), which uses wedgelets. These functions lead to significant rate-distortion tradeoffs. However, they incur very high computational complexity because of the exhaustive search used to estimate the wedgelet subdivision line. In this paper, we propose a rapid localization of this line using an edge detection approach. Experimental results show that the proposed approach substantially reduces encoding delay while providing better depth map and synthesized view quality than the exhaustive search approach.
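The contrast between exhaustive and edge-guided wedgelet search can be sketched as follows. This is an illustrative toy, not the paper's algorithm or the 3D-HEVC reference procedure: it assumes a wedgelet splits a square block with a straight line between two border pixels, that each side is approximated by its mean, and that "edge detection" is a simple threshold on jumps between neighboring border pixels. All function names and the `threshold` parameter are hypothetical.

```python
from itertools import combinations


def border_points(n):
    """Border pixel coordinates (row, col) of an n x n block, in clockwise order."""
    top = [(0, c) for c in range(n)]
    right = [(r, n - 1) for r in range(1, n)]
    bottom = [(n - 1, c) for c in range(n - 2, -1, -1)]
    left = [(r, 0) for r in range(n - 2, 0, -1)]
    return top + right + bottom + left


def wedgelet_cost(block, p, q):
    """Squared error when each side of the line p->q is replaced by its mean."""
    (r0, c0), (r1, c1) = p, q
    sides = ([], [])
    for r, row in enumerate(block):
        for c, value in enumerate(row):
            # The sign of the cross product gives the side of the line.
            cross = (r1 - r0) * (c - c0) - (c1 - c0) * (r - r0)
            sides[1 if cross > 0 else 0].append(value)
    cost = 0.0
    for values in sides:
        if values:
            mean = sum(values) / len(values)
            cost += sum((v - mean) ** 2 for v in values)
    return cost


def best_wedgelet(block, candidates):
    """Return ((cost, p, q), pairs_tested); the first element is None if no pair exists."""
    best, tested = None, 0
    for p, q in combinations(candidates, 2):
        tested += 1
        cost = wedgelet_cost(block, p, q)
        if best is None or cost < best[0]:
            best = (cost, p, q)
    return best, tested


def edge_border_points(block, threshold):
    """Border pixels where the depth jumps relative to the next border pixel."""
    points = border_points(len(block))
    kept = []
    for i, (r, c) in enumerate(points):
        nr, nc = points[(i + 1) % len(points)]
        if abs(block[r][c] - block[nr][nc]) > threshold:
            kept.extend([(r, c), (nr, nc)])
    return list(dict.fromkeys(kept))  # de-duplicate, keep order


# Toy 8 x 8 depth block with a vertical edge between columns 3 and 4.
block = [[10] * 4 + [200] * 4 for _ in range(8)]
full_best, full_tested = best_wedgelet(block, border_points(8))
fast_best, fast_tested = best_wedgelet(block, edge_border_points(block, threshold=50))
print(full_best[0], full_tested)
print(fast_best[0], fast_tested)
```

On this toy block both searches reach zero error, but the exhaustive search tests 378 border-point pairs while the edge-guided one tests 6. A block with no detected border edge yields no candidates, so a practical scheme would need a fallback for that case.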
Efficient B-spline wavelets based dictionary for depth coding and view rendering
Video representations that support view synthesis based on depth maps, such as multiview plus depth, have become
widespread, raising interest in efficient depth map coding tools. In this paper, we propose an innovative sparse decomposition over a
wavelet-based dictionary specially designed for the piecewise-planar nature of the depth signal. We also evaluate the performance
of the proposed dictionary for depth map coding, paying special attention to the impact of depth coding errors on the
resulting synthesized images. The results demonstrate the relevance of the proposed scheme, which considerably improves the
perceived quality of synthesized images.
Geometric Deep Neural Network using Rigid and Non-Rigid Transformations for Human Action Recognition
An unsupervised 3D mesh segmentation based on HMRF-EM algorithm
We propose a new 3D mesh segmentation method based on the HMRF-EM framework. The clustering method
relies on the curvature attribute and takes into account the spatial information encoded by the mutual influences of neighboring
mesh elements. A region-growing process is then carried out to extract connected regions, followed by
a merging procedure whose purpose is to preserve only meaningful regions. Experiments conducted
on different meshes are encouraging and show that the proposed method gives satisfactory results compared
with classical statistical methods such as the k-means and EM algorithms.
Representations, Analysis and Recognition of Shape and Motion from Imaging Data