2,031 research outputs found
Automated Top View Registration of Broadcast Football Videos
In this paper, we propose a novel method to register football broadcast video
frames on the static top view model of the playing surface. The proposed method
is fully automatic in contrast to the current state of the art which requires
manual initialization of point correspondences between the image and the static
model. Automatic registration using existing approaches has been difficult due
to the lack of sufficient point correspondences. We investigate an alternate
approach exploiting the edge information from the line markings on the field.
We formulate the registration problem as a nearest neighbour search over a
synthetically generated dictionary of edge map and homography pairs. The
synthetic dictionary generation allows us to exhaustively cover a wide variety
of camera angles and positions and reduce this problem to a minimal per-frame
edge map matching procedure. We show that the per-frame results can be improved
in videos using an optimization framework for temporal camera stabilization. We
demonstrate the efficacy of our approach by presenting extensive results on a
dataset collected from matches of football World Cup 2014
Accurate Long-Term Multiple People Tracking Using Video and Body-Worn IMUs
Most modern approaches for video-based multiple people tracking rely on human appearance to exploit similarities between person detections. Consequently, tracking accuracy degrades if this kind of information is not discriminative or if people change apparel. In contrast, we present a method to fuse video information with additional motion signals from body-worn inertial measurement units (IMUs). In particular, we propose a neural network to relate person detections with IMU orientations, and formulate a graph labeling problem to obtain a tracking solution that is globally consistent with the video and inertial recordings. The fusion of visual and inertial cues provides several advantages. The association of detection boxes in the video and IMU devices is based on motion, which is independent of a person's outward appearance. Furthermore, inertial sensors provide motion information irrespective of visual occlusions. Hence, once detections in the video are associated with an IMU device, intermediate positions can be reconstructed from corresponding inertial sensor data, which would be unstable using video only. Since no dataset exists for this new setting, we release a dataset of challenging tracking sequences, containing video and IMU recordings together with ground-truth annotations. We evaluate our approach on our new dataset, achieving an average IDF1 score of 91.2%. The proposed method is applicable to any situation that allows one to equip people with inertial sensors. © 1992-2012 IEEE
Tracking interacting targets in multi-modal sensors
PhDObject tracking is one of the fundamental tasks in various applications such as surveillance,
sports, video conferencing and activity recognition. Factors such as occlusions,
illumination changes and limited field of observance of the sensor make tracking a challenging
task. To overcome these challenges the focus of this thesis is on using multiple
modalities such as audio and video for multi-target, multi-modal tracking. Particularly,
this thesis presents contributions to four related research topics, namely, pre-processing of
input signals to reduce noise, multi-modal tracking, simultaneous detection and tracking,
and interaction recognition.
To improve the performance of detection algorithms, especially in the presence
of noise, this thesis investigate filtering of the input data through spatio-temporal feature
analysis as well as through frequency band analysis. The pre-processed data from multiple
modalities is then fused within Particle filtering (PF). To further minimise the discrepancy
between the real and the estimated positions, we propose a strategy that associates the
hypotheses and the measurements with a real target, using a Weighted Probabilistic Data
Association (WPDA). Since the filtering involved in the detection process reduces the
available information and is inapplicable on low signal-to-noise ratio data, we investigate
simultaneous detection and tracking approaches and propose a multi-target track-beforedetect
Particle filtering (MT-TBD-PF). The proposed MT-TBD-PF algorithm bypasses
the detection step and performs tracking in the raw signal. Finally, we apply the proposed
multi-modal tracking to recognise interactions between targets in regions within, as well
as outside the cameras’ fields of view.
The efficiency of the proposed approaches are demonstrated on large uni-modal,
multi-modal and multi-sensor scenarios from real world detections, tracking and event
recognition datasets and through participation in evaluation campaigns
Ball Trajectory Inference from Multi-Agent Sports Contexts Using Set Transformer and Hierarchical Bi-LSTM
As artificial intelligence spreads out to numerous fields, the application of
AI to sports analytics is also in the spotlight. However, one of the major
challenges is the difficulty of automated acquisition of continuous movement
data during sports matches. In particular, it is a conundrum to reliably track
a tiny ball on a wide soccer pitch with obstacles such as occlusion and
imitations. Tackling the problem, this paper proposes an inference framework of
ball trajectory from player trajectories as a cost-efficient alternative to
ball tracking. We combine Set Transformers to get permutation-invariant and
equivariant representations of the multi-agent contexts with a hierarchical
architecture that intermediately predicts the player ball possession to support
the final trajectory inference. Also, we introduce the reality loss term and
postprocessing to secure the estimated trajectories to be physically realistic.
The experimental results show that our model provides natural and accurate
trajectories as well as admissible player ball possession at the same time.
Lastly, we suggest several practical applications of our framework including
missing trajectory imputation, semi-automated pass annotation, automated
zoom-in for match broadcasting, and calculating possession-wise running
performance metrics
Feature based dynamic intra-video indexing
A thesis submitted in partial fulfillment for the degree of Doctor of PhilosophyWith the advent of digital imagery and its wide spread application in all vistas of life, it has become an important component in the world of communication. Video content ranging from broadcast news, sports, personal videos, surveillance, movies and entertainment and similar domains is increasing exponentially in quantity and it is becoming a challenge to retrieve content of interest from the corpora. This has led to an increased interest amongst the researchers to investigate concepts of video structure analysis, feature extraction, content annotation, tagging, video indexing, querying and retrieval to fulfil the requirements. However, most of the previous work is confined within specific domain and constrained by the quality, processing and storage capabilities. This thesis presents a novel framework agglomerating the established approaches from feature extraction to browsing in one system of content based video retrieval. The proposed framework significantly fills the gap identified while satisfying the imposed constraints of processing, storage, quality and retrieval times. The output entails a framework, methodology and prototype application to allow the user to efficiently and effectively retrieved content of interest such as age, gender and activity by specifying the relevant query. Experiments have shown plausible results with an average precision and recall of 0.91 and 0.92 respectively for face detection using Haar wavelets based approach. Precision of age ranges from 0.82 to 0.91 and recall from 0.78 to 0.84. The recognition of gender gives better precision with males (0.89) compared to females while recall gives a higher value with females (0.92). Activity of the subject has been detected using Hough transform and classified using Hiddell Markov Model. A comprehensive dataset to support similar studies has also been developed as part of the research process. A Graphical User Interface (GUI) providing a friendly and intuitive interface has been integrated into the developed system to facilitate the retrieval process. The comparison results of the intraclass correlation coefficient (ICC) shows that the performance of the system closely resembles with that of the human annotator. The performance has been optimised for time and error rate
- …