Synchronization of passes in event and spatiotemporal soccer data
Most soccer analysis studies investigate specific scenarios through computational techniques that examine either spatiotemporal position data (movement of players and the ball on the pitch) or event data (relating to significant situations during a match). Yet only a few applications perform a joint analysis of both data sources, despite the advantages of such an approach. One possible reason is a non-systematic error in the event data that causes a temporal misalignment of the two sources. To address this problem, we propose a solution that combines the SwiftEvent online algorithm (Gensler and Sick in Pattern Anal Appl 21:543–562, 2018) with a subsequent refinement step that corrects pass timestamps by exploiting the statistical properties of passes in the position data. We evaluate the proposed algorithm on ground-truth pass labels of four top-flight soccer matches from the 2014/15 season. Results show that the percentage of passes within half a second of ground truth increases from 14% to 70%, while our algorithm also detects localization errors (noise) in the position data. A comparison with other models shows that our algorithm is superior to baseline models and comparable to a deep learning pass detection method, while requiring significantly less data. Hence, our proposed lightweight framework offers a viable solution that enables groups with limited access to (recent) data sources to effectively synchronize passes in the event and position data.
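The timestamp-refinement idea can be illustrated with a toy sketch: shift a raw pass timestamp to the nearest sharp change in ball speed in the tracking data. The exact statistical criterion the authors use may differ; `refine_pass_timestamp`, the window size, and the acceleration-peak heuristic here are illustrative assumptions.

```python
import numpy as np

def refine_pass_timestamp(ball_xy, fps, t_event, window_s=2.0):
    """Shift a raw event timestamp to the nearest ball-acceleration peak.

    ball_xy : (N, 2) array of ball positions per tracking frame
    fps     : frames per second of the tracking data
    t_event : raw event timestamp in seconds (possibly misaligned)
    """
    # per-frame ball speed and absolute speed change (a crude acceleration)
    speed = np.linalg.norm(np.diff(ball_xy, axis=0), axis=1) * fps
    accel = np.abs(np.diff(speed)) * fps
    i = int(round(t_event * fps))
    half = int(window_s * fps)
    lo, hi = max(0, i - half), min(len(accel), i + half)
    j = lo + int(np.argmax(accel[lo:hi]))  # frame of sharpest speed change
    return j / fps
```

With a synthetic ball that is stationary and then suddenly starts moving, the refined timestamp snaps to the moment of the kick rather than the raw (misaligned) event time.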
BodyNet: Volumetric Inference of 3D Human Body Shapes
Human shape estimation is an important task for video editing, animation, and
the fashion industry. Predicting 3D human body shape from natural images, however,
is highly challenging due to factors such as variation in human bodies,
clothing and viewpoint. Prior methods addressing this problem typically attempt
to fit parametric body models with certain priors on pose and shape. In this
work we argue for an alternative representation and propose BodyNet, a neural
network for direct inference of volumetric body shape from a single image.
BodyNet is an end-to-end trainable network that benefits from (i) a volumetric
3D loss, (ii) a multi-view re-projection loss, and (iii) intermediate
supervision of 2D pose, 2D body part segmentation, and 3D pose. Each of them
results in performance improvement as demonstrated by our experiments. To
evaluate the method, we fit the SMPL model to our network output and show
state-of-the-art results on the SURREAL and Unite the People datasets,
outperforming recent approaches. Besides achieving state-of-the-art
performance, our method also enables volumetric body-part segmentation.
Comment: Appears in: European Conference on Computer Vision 2018 (ECCV 2018), 27 pages
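The volumetric 3D loss can be pictured as a per-voxel binary cross-entropy between a predicted occupancy grid and a ground-truth voxelization of the body. This is a minimal sketch of that idea, not BodyNet's exact loss; `voxel_bce_loss` and the grid shape are assumptions.

```python
import numpy as np

def voxel_bce_loss(pred_logits, gt_occupancy):
    """Per-voxel binary cross-entropy between a predicted occupancy grid
    (raw logits, e.g. shape 32x32x32) and a {0,1} ground-truth voxelization."""
    p = 1.0 / (1.0 + np.exp(-pred_logits))  # sigmoid -> occupancy probability
    eps = 1e-7
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(gt_occupancy * np.log(p) + (1 - gt_occupancy) * np.log(1 - p))
```

Logits that agree with the ground-truth voxels drive the loss toward zero, while uninformative (all-zero) logits sit at log 2 per voxel.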
Learning Features by Watching Objects Move
This paper presents a novel yet intuitive approach to unsupervised feature
learning. Inspired by the human visual system, we explore whether low-level
motion-based grouping cues can be used to learn an effective visual
representation. Specifically, we use unsupervised motion-based segmentation on
videos to obtain segments, which we use as 'pseudo ground truth' to train a
convolutional network to segment objects from a single frame. Given the
extensive evidence that motion plays a key role in the development of the human
visual system, we hope that this straightforward approach to unsupervised
learning will be more effective than cleverly designed 'pretext' tasks studied
in the literature. Indeed, our extensive experiments show that this is the
case. When used for transfer learning on object detection, our representation
significantly outperforms previous unsupervised approaches across multiple
settings, especially when training data for the target task is scarce.
Comment: CVPR 2017
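A crude stand-in for the motion-based pseudo ground truth is simple frame differencing: pixels that change strongly between consecutive frames are marked as "object". The paper's actual unsupervised motion segmentation is far more sophisticated; `motion_pseudo_mask` and the threshold are illustrative assumptions.

```python
import numpy as np

def motion_pseudo_mask(frame_a, frame_b, thresh=15.0):
    """Crude motion-based pseudo ground truth: mark pixels whose intensity
    changes strongly between two consecutive frames as pseudo-foreground."""
    diff = np.abs(frame_b.astype(np.float64) - frame_a.astype(np.float64))
    return (diff > thresh).astype(np.uint8)  # 1 = moving (pseudo-object)
```

Such masks could then serve as segmentation targets for training a convolutional network on single frames, in the spirit of the approach described above.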
Machine Learning Methods for Activity Detection in Wearable Sensor Data Streams
Wearable wireless sensors have the potential for transformative impact on the fields of health and behavioral science. Recent advances in wearable sensor technology have made it possible to simultaneously collect multiple streams of physiological and context data from individuals in natural environments; however, extracting reliable high-level inferences from these raw data streams remains a key data analysis challenge. In this dissertation, we address three challenges that arise when trying to perform activity detection from wearable sensor streams. First, we address the challenge of learning from small amounts of noisy data by proposing a class of conditional random field models for activity detection. We apply this model class to three different activity detection problems, improving performance in all three when compared with standard independent and structured models. Second, we address the challenge of inferring activities from long input sequences by evaluating strategies for pruning the inference dynamic programs used in structured prediction models. We apply these strategies to the proposed structured activity detection models, resulting in inference speedups ranging from 66x to 257x with little to no decrease in predictive performance. Finally, we address the challenge of learning from imprecise annotations by proposing a weak supervision framework for learning discrete-time detection models from imprecise continuous-time observations. We apply this framework to both independent and structured models and demonstrate improved performance over weak supervision baselines.
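Pruning an inference dynamic program can be sketched as threshold (beam) pruning of Viterbi decoding: at each time step, states whose score falls too far below the current best are dropped, trading exactness for speed. The function name, the beam width, and the exact pruning rule are illustrative assumptions, not the dissertation's specific strategies.

```python
import numpy as np

def beam_viterbi(log_emit, log_trans, beam=5.0):
    """Viterbi decoding with threshold (beam) pruning.

    log_emit : (T, S) per-step log-likelihoods
    log_trans: (S, S) log transition scores
    Returns the (approximately) best state sequence.
    """
    T, S = log_emit.shape
    score = log_emit[0].copy()
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        alive = score >= score.max() - beam  # pruning mask over states
        cand = np.where(alive[:, None], score[:, None] + log_trans, -np.inf)
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With a wide beam this reduces to exact Viterbi; a tight beam discards most states per step, which is where the large speedups in long sensor sequences would come from.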
Weakly-Supervised Alignment of Video With Text
Suppose that we are given a set of videos, along with natural language
descriptions in the form of multiple sentences (e.g., manual annotations, movie
scripts, sport summaries etc.), and that these sentences appear in the same
temporal order as their visual counterparts. We propose in this paper a method
for aligning the two modalities, i.e., automatically providing a time stamp for
every sentence. Given vectorial features for both video and text, we propose to
cast this task as a temporal assignment problem, with an implicit linear
mapping between the two feature modalities. We formulate this problem as an
integer quadratic program, and solve its continuous convex relaxation using an
efficient conditional gradient algorithm. Several rounding procedures are
proposed to construct the final integer solution. After demonstrating
significant improvements over the state of the art on the related task of
aligning video with symbolic labels [7], we evaluate our method on a
challenging dataset of videos with associated textual descriptions [36], using
both bag-of-words and continuous representations for text.
Comment: ICCV 2015 - IEEE International Conference on Computer Vision, Dec 2015, Santiago, Chile
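As a rough illustration of the temporal assignment, one can substitute a simple monotone dynamic program for the paper's integer quadratic program and conditional-gradient relaxation: assign each sentence to one video interval so that assignments are in temporal order and total similarity is maximal. `monotone_align` and the similarity-matrix input are assumptions for this sketch.

```python
import numpy as np

def monotone_align(sim):
    """Assign each sentence one interval index so that indices are
    non-decreasing and total similarity is maximal (a DP stand-in for
    the paper's integer quadratic program).

    sim : (n_sentences, n_intervals) similarity matrix
    """
    n, m = sim.shape
    dp = np.full((n, m), -np.inf)
    dp[0] = sim[0]
    for i in range(1, n):
        best_prefix = np.maximum.accumulate(dp[i - 1])  # best over j' <= j
        dp[i] = best_prefix + sim[i]
    # backtrack the non-decreasing assignment
    j = int(dp[-1].argmax())
    out = [j]
    for i in range(n - 1, 0, -1):
        j = int(dp[i - 1][: j + 1].argmax())
        out.append(j)
    return out[::-1]
```

Each sentence's assigned interval index then yields its time stamp; the relaxation-plus-rounding machinery in the paper handles the harder implicit-feature-mapping case that this sketch ignores.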
Cell Segmentation and Tracking using CNN-Based Distance Predictions and a Graph-Based Matching Strategy
The accurate segmentation and tracking of cells in microscopy image sequences
is an important task in biomedical research, e.g., for studying the development
of tissues, organs or entire organisms. However, the segmentation of touching
cells in images with a low signal-to-noise ratio is still a challenging
problem. In this paper, we present a method for the segmentation of touching
cells in microscopy images. By using a novel representation of cell borders,
inspired by distance maps, our method is able to use not only touching cells
but also close cells in the training process. Furthermore, this representation
is notably robust to annotation errors and shows promising results for the
segmentation of microscopy images containing cell types that are
underrepresented in, or absent from, the training data. For the prediction of the
proposed neighbor distances, an adapted U-Net convolutional neural network
(CNN) with two decoder paths is used. In addition, we adapt a graph-based cell
tracking algorithm to evaluate our proposed method on the task of cell
tracking. The adapted tracking algorithm includes a movement estimation in the
cost function to re-link tracks with missing segmentation masks over a short
sequence of frames. Our combined tracking-by-detection method has proven its
potential in the IEEE ISBI 2020 Cell Tracking Challenge
(http://celltrackingchallenge.net/), where, as team KIT-Sch-GE, we achieved
multiple top-three rankings, including two top performances, using a single
segmentation model for the diverse data sets.
Comment: 25 pages, 14 figures, methods of the team KIT-Sch-GE for the IEEE ISBI 2020 Cell Tracking Challenge
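A simplified sketch of a distance-based border representation: for each labeled cell, compute the distance of its pixels to the cell's own border and to the nearest other cell. This is not the authors' exact neighbor-distance definition; `cell_and_neighbor_distances` is a hypothetical helper.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def cell_and_neighbor_distances(labels):
    """For each labeled cell, compute (a) the Euclidean distance of its
    pixels to the cell border and (b) their distance to the nearest other
    cell -- a simplified variant of a neighbor-distance representation."""
    cell_dist = np.zeros(labels.shape)
    neighbor_dist = np.zeros(labels.shape)
    for lab in np.unique(labels):
        if lab == 0:  # skip background
            continue
        mask = labels == lab
        cell_dist[mask] = distance_transform_edt(mask)[mask]
        others = (labels > 0) & ~mask
        # distance from this cell's pixels to the nearest other cell
        neighbor_dist[mask] = distance_transform_edt(~others)[mask]
    return cell_dist, neighbor_dist
```

Maps like these can serve as regression targets for a two-decoder U-Net: small neighbor distances flag pixels near touching or close cells, which is exactly where border decisions are hardest.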