Dynamic NeRFs for Soccer Scenes
The long-standing problem of novel view synthesis has many applications,
notably in sports broadcasting. Photorealistic novel view synthesis of soccer
actions, in particular, is of enormous interest to the broadcast industry. Yet
only a few industrial solutions have been proposed, and even fewer that achieve
near-broadcast quality of the synthetic replays. Except for their setup of
multiple static cameras around the playfield, the best proprietary systems
disclose close to no information about their inner workings. Leveraging
multiple static cameras for such a task indeed presents a challenge rarely
tackled in the literature, for a lack of public datasets: the reconstruction of
a large-scale, mostly static environment, with small, fast-moving elements.
Recently, the emergence of neural radiance fields has induced stunning progress
in many novel view synthesis applications, leveraging deep learning principles
to produce photorealistic results in the most challenging settings. In this
work, we investigate the feasibility of basing a solution to the task on
dynamic NeRFs, i.e., neural models purposed to reconstruct general dynamic
content. We compose synthetic soccer environments and conduct multiple
experiments using them, identifying key components that help reconstruct soccer
scenes with dynamic NeRFs. We show that, although this approach cannot fully
meet the quality requirements for the target application, it suggests promising
avenues toward a cost-efficient, automatic solution. We also make our
dataset and code publicly available, with the goal of encouraging further efforts
from the research community on the task of novel view synthesis for dynamic
soccer scenes. For code, data, and video results, please see
https://soccernerfs.isach.be.
Comment: Accepted at the 6th International ACM Workshop on Multimedia Content
Analysis in Sports. 8 pages, 9 figures. Project page:
https://soccernerfs.isach.b
A Universal Protocol to Benchmark Camera Calibration for Sports
Peer reviewed
Camera calibration is a crucial component in the realm of sports analytics, as it serves as the foundation to extract 3D information out of the broadcast images. Despite the significance of camera calibration research in sports analytics, progress is impeded by outdated benchmarking criteria. Indeed, the annotation data and evaluation metrics provided by most currently available benchmarks strongly favor and incite the development of sports field registration methods, i.e., methods estimating homographies that map the sports field plane to the image plane. However, such homography-based methods are doomed to overlook the broader capabilities of camera calibration in bridging the 3D world to the image. In particular, real-world non-planar sports field elements (such as goals, corner flags, baskets, ...) and image distortion caused by broadcast camera lenses are out of the scope of sports field registration methods. To overcome these limitations, we designed a new benchmarking protocol, named ProCC, based on two principles: (1) the protocol should be agnostic to the camera model chosen for a camera calibration method, and (2) the protocol should fairly evaluate camera calibration methods using the reprojection of arbitrary yet accurately known 3D objects. Indirectly, we also provide insights into the metric used in SoccerNet-calibration, which solely relies on image annotation data of viewed 3D objects as ground truth, thus implementing our protocol. With experiments on the World Cup 2014, CARWC, and SoccerNet datasets, we show that our benchmarking protocol provides fairer evaluations of camera calibration methods. By defining our requirements for proper benchmarking, we hope to pave the way for a new stage in camera calibration for sports applications with high accuracy standards.
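To illustrate why a homography alone cannot account for non-planar field elements, the following Python sketch compares a full pinhole projection with the field-plane homography derived from the same camera. It is not the ProCC implementation: the camera pose, focal length, principal point, axis convention, point coordinates, and all function names are hypothetical values chosen for illustration, and lens distortion is ignored.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def look_at_origin(center):
    """Rotation R and translation t of a camera at `center` looking at the world origin.

    Assumed convention: world z points up; camera x is right, y is down, z is forward.
    """
    forward = normalize(tuple(-c for c in center))
    right = normalize(cross(forward, (0.0, 0.0, 1.0)))
    down = cross(forward, right)
    R = [right, down, forward]
    t = [-sum(R[i][j] * center[j] for j in range(3)) for i in range(3)]
    return R, t

def project_pinhole(point, focal, cx, cy, R, t):
    """Project a 3D world point with an ideal pinhole camera (no distortion)."""
    cam = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    return (focal * cam[0] / cam[2] + cx, focal * cam[1] / cam[2] + cy)

def field_homography(focal, cx, cy, R, t):
    """H = K [r1 r2 t], which maps field-plane points (x, y, z=0) to pixels."""
    K = [[focal, 0.0, cx], [0.0, focal, cy], [0.0, 0.0, 1.0]]
    M = [[R[i][0], R[i][1], t[i]] for i in range(3)]
    return [[sum(K[i][k] * M[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply_homography(H, xy):
    """Map a 2D field-plane point to the image through H."""
    x, y = xy
    u, v, w = (H[i][0] * x + H[i][1] * y + H[i][2] for i in range(3))
    return (u / w, v / w)

def pixel_error(p, q):
    """Euclidean reprojection error in pixels."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Hypothetical camera: 20 m high, 60 m behind the origin, 1920x1080 image.
FOCAL, CX, CY = 1000.0, 960.0, 540.0
R, t = look_at_origin((0.0, -60.0, 20.0))
H = field_homography(FOCAL, CX, CY, R, t)

ground_point = (10.0, 5.0, 0.0)   # lies on the field plane
raised_point = (0.0, 0.0, 2.44)   # at soccer crossbar height, off the plane

err_ground = pixel_error(project_pinhole(ground_point, FOCAL, CX, CY, R, t),
                         apply_homography(H, ground_point[:2]))
err_raised = pixel_error(project_pinhole(raised_point, FOCAL, CX, CY, R, t),
                         apply_homography(H, raised_point[:2]))
print(f"on-plane error:  {err_ground:.6f} px")
print(f"off-plane error: {err_raised:.2f} px")
```

In this toy setup the homography reproduces the pinhole projection for the on-plane point, while the raised point is mapped to the image of the ground point beneath it, which produces a non-negligible reprojection error.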
Computer vision systems for automatic analysis of face and eye images in specific applications of interpretation of facial expressions
This thesis is about the computer vision-based automation of specific tasks of face perception, for specific applications where they are essential. These tasks, and the applications in which they are automated, deal with the interpretation of facial expressions.
Our first application of interest is the automatic recognition of sign language, as carried out via a chain of automatic systems that extract visual communication cues from the image of a signer, transcribe these visual cues to an intermediary semantic notation, and translate this semantic notation to a comprehensible text in a spoken language. For use within the visual cue extraction part of such a system chain, we propose a computer vision system that automatically extracts facial communication cues from the image of a signer, based on a pre-existing facial landmark point tracking method and its various robust refinements. With this system, our contribution notably lies in the fruitful use of this tracking method and its refinements within a sign language recognition system chain. We consider the facial communication cues extracted by our system as facial expressions with a specific interpretation useful to this application.
Our second application of interest is the objective assessment of visual pursuit in patients with a disorder of consciousness. In clinical practice, this delicate assessment is done by a clinician who manually moves a handheld mirror in front of the patient's face while simultaneously estimating the patient's ability to track this visual stimulus. This clinical setup is appropriate, but the assessment outcome was shown to be sensitive to the clinician's subjectivity. For use with a head-mounted device, we propose a computer vision system that attaches itself to the clinical procedure without disrupting it, and automatically estimates, in an objective way, the patient's ability to perform visual pursuit. Our system, combined with the use of a head-mounted device, therefore takes the form of an assisting technology for the clinician. It is based on the tracking of the patient's pupil and the mirror moved by the clinician, and the comparison of the obtained trajectories. All methods used within our system are simple yet specific instantiations of general methods, for the objective assessment of visual pursuit. We consider the visual pursuit ability extracted by our system as a facial expression with a specific interpretation useful to this application.
To some extent, our third application of interest is the general-purpose automatic recognition of facial expression codes in a muscle-based taxonomic coding system. We do not actually provide any new computer vision system for this application. Instead, we consider a supervised classification problem relevant to this application, and we empirically compare the performance of two general classification approaches for solving this problem, namely hierarchical classification and standard classification ("flat" classification, in this comparative context). We also compare these approaches for solving a classification problem relevant to 3D shape recognition, as well as artificial classification problems we generate in a simulation framework of our design. Our contribution lies in the general theoretical conclusions we reach from our empirical study of hierarchical vs. flat classification, which are of interest for properly using hierarchical classification in vision-based recognition problems, for example for an application of facial expression recognition.
Tests of a new drowsiness characterization and monitoring system based on ocular parameters
Drowsiness is the intermediate state between wakefulness and sleep. It is characterized by impairments of performance, which can be very dangerous in many activities and can lead to catastrophic accidents in transportation or in industry. There is thus an obvious need for systems that are able to continuously, objectively, and automatically estimate the level of drowsiness of a person performing a task. We have developed such a system, which is based on the physiological state of a person, and, more specifically, on the values of ocular parameters extracted from images of the eye (photooculography), and which produces a numerical level of drowsiness. In order to test our system, we compared the level of drowsiness determined by our system to two references: (1) the level of drowsiness obtained by analyzing polysomnographic signals; and (2) the performance of individuals in the accomplishment of a task. We carried out an experiment in which 24 participants were asked to perform several Psychomotor Vigilance Tests in different sleep conditions. The results show that the output of our system is well correlated with both references. We also determined the best drowsiness level threshold in order to warn individuals before they reach dangerous situations. Our system thus has significant potential for reliably quantifying the level of drowsiness of individuals accomplishing a task and, ultimately, for preventing drowsiness-related accidents.
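The kind of analysis described above (correlating a numerical drowsiness level with a reference, then choosing a warning threshold) can be sketched in Python as follows. This is only an illustration under stated assumptions: the abstract does not say which correlation measure or threshold criterion was used, so Pearson correlation and Youden's J statistic are swapped in here, and the data values and function names are invented for the example and are not results from the study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def best_threshold(levels, impaired):
    """Threshold on `levels` maximizing Youden's J = sensitivity + specificity - 1.

    A sample is flagged as drowsy when its level is >= the threshold.
    Assumes both classes (impaired and not impaired) are present.
    """
    best_t, best_j = None, -1.0
    for t in sorted(set(levels)):
        tp = sum(1 for l, d in zip(levels, impaired) if l >= t and d)
        fn = sum(1 for l, d in zip(levels, impaired) if l < t and d)
        tn = sum(1 for l, d in zip(levels, impaired) if l < t and not d)
        fp = sum(1 for l, d in zip(levels, impaired) if l >= t and not d)
        j = tp / (tp + fn) + tn / (tn + fp) - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

# Invented toy data (NOT from the study): system output, reaction time in ms,
# and a hypothetical label marking impaired performance.
levels = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
reaction_ms = [250.0, 260.0, 300.0, 340.0, 420.0, 500.0]
impaired = [False, False, False, True, True, True]

r = pearson(levels, reaction_ms)
threshold, youden_j = best_threshold(levels, impaired)
print(f"correlation with reference: {r:.3f}")
print(f"chosen threshold: {threshold} (J = {youden_j:.2f})")
```

Other criteria, such as a fixed false-alarm rate, would be equally plausible choices for the threshold; Youden's J is used only because it is simple to state.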