Three-stream 3D/1D CNN for fine-grained action classification and segmentation in table tennis
This paper proposes a method for fusing modalities extracted from video through a three-stream network with spatio-temporal and temporal convolutions for fine-grained action classification in sport. It is applied to the TTStroke-21 dataset, which consists of untrimmed videos of table tennis games. The goal is to detect and classify table tennis strokes in the videos, the first step of a broader scheme aimed at giving players feedback to improve their performance. The three modalities are raw RGB data, the computed optical flow, and the estimated pose of the player. The network consists of three branches with attention blocks. Features are fused at the last stage of the network using bilinear layers. Compared to previous approaches, the use of three modalities allows faster convergence and better performance on both tasks: classification of strokes with known temporal boundaries, and joint segmentation and classification. The pose is also further investigated in order to offer richer feedback to the athletes.
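The late fusion with bilinear layers described above can be sketched as follows. This is a minimal pure-Python illustration, assuming each branch has already produced a fixed-length feature vector, and assuming a hypothetical cascade in which RGB and flow features are fused first and the result is then fused with pose; the abstract does not say how the three streams are paired. Each bilinear layer computes z_k = x1ᵀ W_k x2 + b_k, the same form as PyTorch's `torch.nn.Bilinear`. The function names and toy weights are illustrative, not the paper's.

```python
from typing import List

Vector = List[float]
# Weight tensor of shape [out_features][in1_features][in2_features].
Weight = List[List[List[float]]]


def bilinear(x1: Vector, x2: Vector, weight: Weight, bias: Vector) -> Vector:
    """Bilinear layer: z[k] = x1^T W[k] x2 + b[k] for every output unit k."""
    out = []
    for w_k, b_k in zip(weight, bias):
        acc = b_k
        for i, x1_i in enumerate(x1):
            for j, x2_j in enumerate(x2):
                acc += x1_i * w_k[i][j] * x2_j
        out.append(acc)
    return out


def fuse_three_streams(rgb: Vector, flow: Vector, pose: Vector,
                       w_rgb_flow: Weight, b_rgb_flow: Vector,
                       w_pose: Weight, b_pose: Vector) -> Vector:
    """Hypothetical cascade: fuse RGB with flow, then fuse the result with pose."""
    rgb_flow = bilinear(rgb, flow, w_rgb_flow, b_rgb_flow)
    return bilinear(rgb_flow, pose, w_pose, b_pose)


# Toy example: an identity weight turns the first fusion into a dot product.
rgb, flow, pose = [1.0, 2.0], [3.0, 4.0], [2.0]
fused = fuse_three_streams(rgb, flow, pose,
                           [[[1.0, 0.0], [0.0, 1.0]]], [0.0],
                           [[[1.0]]], [0.5])
print(fused)  # [22.5]
```

In a trained network the weights would be learned and the fused vector passed to a classification layer; a practical implementation would use a tensor library rather than nested loops.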
3D Convolutional Networks for Action Recognition: Application to Sport Gesture Recognition
3D convolutional networks are a good means to perform tasks such as video
segmentation into coherent spatio-temporal chunks and classification of these
chunks with regard to a target taxonomy. In this chapter we are interested in the
classification of continuous video takes with repeatable actions, such as
table tennis strokes. Filmed in an unconstrained, markerless, ecological environment,
these videos are challenging from both the segmentation and the classification
points of view. 3D convnets are an efficient tool for solving these problems
with window-based approaches.
Comment: Multi-faceted Deep Learning, 202
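A window-based approach of the kind mentioned above can be sketched as follows, under stated assumptions: a fixed-length window slides over the frame sequence, a classifier labels each window (the hypothetical `classify` callable stands in for a trained 3D convnet), every frame takes the majority label of the windows covering it, and runs of equal labels become segments. The window length, stride, voting rule, and the `BACKGROUND` label are illustrative choices, not details taken from the chapter.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

BACKGROUND = "none"  # hypothetical label for frames with no stroke


def sliding_windows(num_frames: int, length: int, stride: int) -> List[Tuple[int, int]]:
    """Return [start, end) frame ranges of fixed-length windows."""
    return [(s, s + length) for s in range(0, num_frames - length + 1, stride)]


def segment(frames: Sequence, classify: Callable[[Sequence], str],
            length: int, stride: int) -> List[Tuple[int, int, str]]:
    """Label each window, vote per frame, then merge equal-label runs into segments."""
    votes = [Counter() for _ in frames]
    for start, end in sliding_windows(len(frames), length, stride):
        label = classify(frames[start:end])
        for t in range(start, end):
            votes[t][label] += 1
    # Majority vote per frame; ties go to the label counted first,
    # and frames covered by no window fall back to BACKGROUND.
    per_frame = [v.most_common(1)[0][0] if v else BACKGROUND for v in votes]
    segments = []
    seg_start = 0
    for t in range(1, len(per_frame) + 1):
        if t == len(per_frame) or per_frame[t] != per_frame[seg_start]:
            segments.append((seg_start, t, per_frame[seg_start]))
            seg_start = t
    return segments


# Toy stand-in for a 3D convnet: each "frame" is a motion score, and a window
# is called a stroke when at least two of its frames show motion.
def toy_classify(window: Sequence[float]) -> str:
    return "stroke" if sum(1 for x in window if x > 0) >= 2 else BACKGROUND


frames = [0, 0, 0, 9, 9, 9, 9, 0, 0, 0]
print(segment(frames, toy_classify, length=3, stride=1))
# [(0, 3, 'none'), (3, 7, 'stroke'), (7, 10, 'none')]
```

Overlapping windows (stride smaller than length) give each frame several votes, which smooths segment boundaries; a larger stride is cheaper but yields coarser boundaries.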