Towards Structured Analysis of Broadcast Badminton Videos
Sports video data is recorded for nearly every major tournament but remains
archived and inaccessible to large scale data mining and analytics. It can only
be viewed sequentially or manually tagged with higher-level labels which is
time consuming and prone to errors. In this work, we propose an end-to-end
framework for automatic attribute tagging and analysis of sports videos. We use
commonly available broadcast videos of matches and, unlike previous approaches,
do not rely on special camera setups or additional sensors.
Our focus is on Badminton as the sport of interest. We propose a method to
analyze a large corpus of badminton broadcast videos by segmenting the points
played, tracking and recognizing the players in each point and annotating their
respective badminton strokes. We evaluate the performance on 10 Olympic matches
with 20 players and achieve 95.44% point segmentation accuracy, a 97.38% player
detection score (mAP@0.5), 97.98% player identification accuracy, and a stroke
segmentation edit score of 80.48%. We further show that the automatically
annotated videos alone can enable gameplay analysis and inference by
computing understandable metrics such as a player's reaction time, speed, and
footwork around the court.
Comment: 9 page
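The abstract above reports a stroke segmentation edit score without defining it. The following is a minimal sketch, assuming the definition commonly used in action segmentation: per-frame labels are collapsed into an ordered list of segments, and the score is 100 times one minus the normalized Levenshtein distance between the predicted and ground-truth segment lists. The function names and the stroke labels are hypothetical and are not taken from the paper.

```python
from itertools import groupby


def collapse(frame_labels):
    """Collapse per-frame labels into the ordered sequence of segment labels."""
    return [label for label, _ in groupby(frame_labels)]


def levenshtein(a, b):
    """Edit distance between two label sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_score(pred_frames, true_frames):
    """Segmental edit score in [0, 100]; assumed definition, see lead-in."""
    pred, true = collapse(pred_frames), collapse(true_frames)
    longest = max(len(pred), len(true))
    if longest == 0:
        return 100.0
    return 100.0 * (1.0 - levenshtein(pred, true) / longest)


# Hypothetical stroke labels, for illustration only.
truth = ["serve"] * 3 + ["smash"] * 2 + ["lob"] * 2
guess = ["serve"] * 4 + ["lob"] * 3
score = edit_score(guess, truth)
```

Because the score compares segment order rather than frame counts, it penalizes missed or spurious strokes but not small shifts in segment boundaries.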
A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community
In recent years, deep learning (DL), a re-branding of neural networks (NNs),
has risen to the top in numerous areas, namely computer vision (CV), speech
recognition, natural language processing, etc. Whereas remote sensing (RS)
possesses a number of unique challenges, primarily related to sensors and
applications, inevitably RS draws from many of the same theories as CV; e.g.,
statistics, fusion, and machine learning, to name a few. This means that the RS
community should be aware of, if not at the leading edge of, advancements
like DL. Herein, we provide the most comprehensive survey of state-of-the-art
RS DL research. We also review recent new developments in the DL field that can
be used in DL for RS. Namely, we focus on theories, tools and challenges for
the RS community. Specifically, we focus on unsolved challenges and
opportunities as they relate to (i) inadequate data sets, (ii)
human-understandable solutions for modelling physical phenomena, (iii) Big
Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and
learning algorithms for spectral, spatial and temporal data, (vi) transfer
learning, (vii) an improved theoretical understanding of DL systems, (viii)
high barriers to entry, and (ix) training and optimizing the DL.
Comment: 64 pages, 411 references. To appear in Journal of Applied Remote
Sensin
Object detection for KRSBI robot soccer using PeleeNet on omnidirectional camera
Kontes Robot Sepak Bola Indonesia (KRSBI) is an annual event in which contestants pit their robot designs and engineering against each other in robot soccer. Each contestant tries to win the match by scoring goals against the opponent. To score, the robot needs to find the ball, locate the goal, and then kick the ball toward the goal. We employed an omnidirectional vision camera as the visual sensor through which the robot perceives object information. We calibrated the streaming images from the camera to remove the mirror distortion. Furthermore, we deployed PeleeNet as our deep learning model for object detection and fine-tuned it on a dataset generated from our own image collection. Our experimental results showed that PeleeNet has potential as an object detection architecture for deep learning on mobile platforms in KRSBI, combining memory efficiency, speed and accuracy.
Online Visual Robot Tracking and Identification using Deep LSTM Networks
Collaborative robots working on a common task are necessary for many
applications. One of the challenges for achieving collaboration in a team of
robots is mutual tracking and identification. We present a novel pipeline for
online vision-based detection, tracking and identification of robots with a
known and identical appearance. Our method runs in real time on the limited
hardware of the observer robot. Unlike previous works addressing robot tracking
and identification, we use a data-driven approach based on recurrent neural
networks to learn relations between sequential inputs and outputs. We formulate
the data association problem as multiple classification problems. A deep LSTM
network was trained on a simulated dataset and fine-tuned on a small set of real
data. Experiments on two challenging datasets, one synthetic and one real,
which include long-term occlusions, show promising results.
Comment: IEEE/RSJ International Conference on Intelligent Robots and Systems
(IROS), Vancouver, Canada, 2017. IROS RoboCup Best Paper Awar
Size and Shape Determination of Riprap and Large-sized Aggregates Using Field Imaging
Riprap rock and large-sized aggregates are extensively used in transportation, geotechnical, and hydraulic engineering applications. Traditional methods for assessing riprap categories based on particle weight may involve subjective visual inspection and time-consuming manual measurements. Aggregate imaging and segmentation techniques can efficiently characterize riprap particles for their size and morphological/shape properties to estimate particle weights. Particle size and morphological/shape characterization ensure the reliable and sustainable use of all aggregate skeleton materials at quarry production lines and construction sites. Aggregate imaging systems developed to date for size and shape characterization, however, have primarily focused on measurement of separated or non-overlapping aggregate particles. This research study presents an innovative approach for automated segmentation and morphological analyses of stockpile aggregate images based on deep-learning techniques. As a project outcome, a portable, deployable, and affordable field-imaging system is envisioned to estimate volumes of individual riprap rocks for field evaluation. A state-of-the-art object detection and segmentation framework is used to train an image-segmentation kernel from manually labeled 2D riprap images in order to facilitate automatic and user-independent segmentation of stockpile aggregate images. The segmentation results show good agreement with ground-truth validation, which entailed comparing the manual labeling to the automatically segmented images. A significant improvement to the efficiency of size and morphological analyses conducted on densely stacked and overlapping particle images is achieved. The algorithms are integrated into a software application with a user-friendly Graphical User Interface (GUI) for ease of operation. 
Based on the findings of this study, this stockpile aggregate image analysis program promises to become an efficient and innovative application for field-scale and in-place evaluations of aggregate materials. The innovative imaging-based system is envisioned to provide convenient, reliable, and sustainable solutions for the on-site quality assurance/quality control (QA/QC) tasks related to riprap rock and large-sized aggregate material characterization and classification.
IDOT-R27-182Ope
Over speed detection using Artificial Intelligence
Over-speeding is one of the most common traffic violations. Around 41 million people are issued speeding tickets each year in the USA, i.e., about one every second. Existing approaches to detecting over-speeding are not scalable and require manual effort. In this project, I use computer vision and artificial intelligence to detect over-speeding and report the violation to a law enforcement officer. The best results were obtained when predictions were made using YOLOv3.
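The abstract above does not say how speed is derived from the detections. As a hedged illustration only, the sketch below assumes that a detector and tracker already supply a vehicle's position in metres on the road plane (for example after a camera calibration step that is not shown), and estimates speed from the distance covered between two frames. The function names, the track format, and the numbers are hypothetical.

```python
import math


def speed_kmh(p0, p1, frames_elapsed, fps):
    """Average speed in km/h between two ground-plane points given in metres."""
    if frames_elapsed <= 0 or fps <= 0:
        raise ValueError("frames_elapsed and fps must be positive")
    distance_m = math.dist(p0, p1)       # straight-line distance in metres
    seconds = frames_elapsed / fps       # elapsed time between the two frames
    return distance_m / seconds * 3.6    # m/s to km/h


def is_over_speed(track, fps, limit_kmh):
    """track: list of (frame_index, (x_m, y_m)) for one vehicle, in frame order."""
    (f0, p0), (f1, p1) = track[0], track[-1]
    return speed_kmh(p0, p1, f1 - f0, fps) > limit_kmh


# Hypothetical track: 50 m covered in 60 frames at 30 fps, i.e. 25 m/s.
track = [(0, (0.0, 0.0)), (60, (30.0, 40.0))]
estimated = speed_kmh(track[0][1], track[-1][1], 60, 30)
flagged = is_over_speed(track, fps=30, limit_kmh=80)
```

Using only the first and last track points gives an average speed; a real system would likely smooth over many frames to reduce the effect of detection jitter.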
Application-aware optimization of Artificial Intelligence for deployment on resource constrained devices
Artificial intelligence (AI) is changing people's everyday life. AI techniques such as Deep Neural Networks (DNN) rely on heavy computational models, which are in principle designed to be executed on powerful HW platforms, such as desktop or server environments. However, the increasing need to apply such solutions in people's everyday life has encouraged research into methods that allow their deployment on embedded, portable and stand-alone devices, such as mobile phones, which have relatively low memory and computational resources. Such methods target both the development of lightweight AI algorithms and their acceleration through dedicated HW.
This thesis focuses on the development of lightweight AI solutions, with attention to deep neural networks, to facilitate their deployment on resource-constrained devices. Focusing on the computer vision field, we show how, by combining the self-learning ability of deep neural networks with application-specific knowledge in the form of feature engineering, it is possible to dramatically reduce the total memory and computational burden, thus allowing deployment on edge devices. The proposed approach aims to be complementary to existing application-independent network compression solutions. In this work three main DNN optimization goals have been considered: increasing speed and accuracy, allowing training at the edge, and allowing execution on a microcontroller. For each of these we deployed the resulting algorithm to the target embedded device and measured its performance.
Single-Shot Multi-Person 3D Pose Estimation From Monocular RGB
We propose a new single-shot method for multi-person 3D pose estimation in
general scenes from a monocular RGB camera. Our approach uses novel
occlusion-robust pose-maps (ORPM) which enable full body pose inference even
under strong partial occlusions by other people and objects in the scene. ORPM
outputs a fixed number of maps which encode the 3D joint locations of all
people in the scene. Body part associations allow us to infer 3D pose for an
arbitrary number of people without explicit bounding box prediction. To train
our approach we introduce MuCo-3DHP, the first large scale training data set
showing real images of sophisticated multi-person interactions and occlusions.
We synthesize a large corpus of multi-person images by compositing images of
individual people (with ground truth from multi-view performance capture). We
evaluate our method on our new challenging 3D annotated multi-person test set
MuPoTs-3D where we achieve state-of-the-art performance. To further stimulate
research in multi-person 3D pose estimation, we will make our new datasets and
associated code publicly available for research purposes.
Comment: International Conference on 3D Vision (3DV), 201