Indirect Match Highlights Detection with Deep Convolutional Neural Networks
Highlights in a sports video are usually defined as actions that stimulate
excitement or attract the attention of the audience. Considerable effort is spent on
designing techniques that automatically find highlights, in order to
automate the otherwise manual editing process. Most state-of-the-art
approaches try to solve the problem by training a classifier on
information extracted from the TV-like framing of players on the game
pitch, learning to detect game actions that are labeled by human observers
according to their perception of a highlight. Obviously, this labeling is a long and
expensive process. In this paper, we reverse the paradigm: instead of looking at
the gameplay, inferring what could be exciting for the audience, we directly
analyze the audience behavior, which we assume is triggered by events happening
during the game. We apply a deep 3D Convolutional Neural Network (3D-CNN) to
extract visual features from cropped video recordings of the supporters that
are attending the event. Outputs of the crops belonging to the same frame are
then accumulated to produce a value indicating the Highlight Likelihood (HL)
which is then used to discriminate between positive (i.e. when a highlight
occurs) and negative samples (i.e. standard play or time-outs). Experimental
results on a public dataset of ice-hockey matches demonstrate the effectiveness
of our method and promote further research in this new exciting direction.
Comment: "Social Signal Processing and Beyond" workshop, in conjunction with
ICIAP 201
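
The accumulation step described in the abstract above can be illustrated with a minimal sketch. The abstract does not say how crop outputs are accumulated or how the decision is made, so the mean, the 0.5 threshold, and the function names `highlight_likelihood` and `classify_frames` are assumptions for illustration only; the per-crop scores stand in for 3D-CNN outputs that are not modeled here.

```python
from statistics import mean


def highlight_likelihood(crop_scores):
    """Combine the per-crop scores of one frame into a single HL value.

    Assumption: the accumulation is a plain mean; the abstract only says
    the crop outputs are "accumulated".
    """
    if not crop_scores:
        raise ValueError("a frame needs at least one crop score")
    return mean(crop_scores)


def classify_frames(frames, threshold=0.5):
    """Label each frame as highlight (True) or not (False).

    `frames` is a list of lists of per-crop scores in [0, 1].
    The threshold value is hypothetical.
    """
    return [highlight_likelihood(scores) >= threshold for scores in frames]


# Hypothetical scores: an excited crowd, then a calm one.
frames = [
    [0.9, 0.8, 0.7],
    [0.1, 0.2, 0.3],
]
labels = classify_frames(frames)
```

Averaging makes the HL value independent of how many audience crops a frame contains; a sum or a learned pooling would be equally plausible readings of "accumulated".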
A Neural Multi-sequence Alignment TeCHnique (NeuMATCH)
The alignment of heterogeneous sequential data (video to text) is an
important and challenging problem. Standard techniques for this task, including
Dynamic Time Warping (DTW) and Conditional Random Fields (CRFs), suffer from
inherent drawbacks. Mainly, the Markov assumption implies that, given the
immediate past, future alignment decisions are independent of further history.
The separation between similarity computation and alignment decision also
prevents end-to-end training. In this paper, we propose an end-to-end neural
architecture where alignment actions are implemented as moving data between
stacks of Long Short-Term Memory (LSTM) blocks. This flexible architecture
supports a large variety of alignment tasks, including one-to-one, one-to-many,
skipping unmatched elements, and (with extensions) non-monotonic alignment.
Extensive experiments on semi-synthetic and real datasets show that our
algorithm outperforms state-of-the-art baselines.
Comment: Accepted at CVPR 2018 (Spotlight). arXiv file includes the paper and
the supplemental materia
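
For context on the baseline the abstract above criticizes, here is a minimal sketch of classic Dynamic Time Warping. It is not the NeuMATCH architecture. The function name `dtw_distance` and the absolute-difference cost are illustrative assumptions. Note how each cell depends only on its three neighboring cells, and how the similarity function `dist` is fixed separately from the alignment recursion; these are the two drawbacks the abstract names.

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Return the DTW alignment cost between sequences a and b.

    `dist` is a hand-picked similarity function (an assumption here);
    it is computed independently of the alignment decisions.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            # Only the immediate predecessors matter (Markov assumption).
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] also matches an earlier element of b
                cost[i][j - 1],      # b[j-1] also matches an earlier element of a
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


# One-to-many alignment: the repeated 2 in the second sequence is absorbed.
example_cost = dtw_distance([1, 2, 3], [1, 2, 2, 3])
```

Because the recursion is a fixed minimum over local costs, nothing in it can be trained jointly with the feature extractor, which is the gap an end-to-end model is meant to close.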