39 research outputs found
A Cyber-Physical System for integrated remote control and protection of smart grid critical infrastructures
This work proposes a Cyber-Physical System (CPS) for protecting Smart Electric Grid Critical Infrastructures (CI) using video surveillance while remotely monitoring them. Due to the critical nature of the Smart Grid, it is necessary to guarantee an adequate level of safety, security and reliability. Thus, this CPS is back-boned by a Time-Sensitive Networking (TSN) solution providing concurrent support for smart video surveillance and Smart Grid control over a single communication infrastructure. To this end, TSN delivers the high-bandwidth communication needed for video surveillance together with the deterministic Quality of Service (QoS), latency and bandwidth guarantees required by time-critical Smart Grid control. On the one hand, the CPS utilizes High-availability Seamless Redundancy (HSR) in the control subsystem via Remote Terminal Units (RTUs), guaranteeing seamless failover against failures in the Smart Grid. On the other hand, the smart video surveillance subsystem applies machine learning to monitor secured perimeters and detect people around the Smart Grid CI. Moreover, it can also directly interoperate with RTUs via the MODBUS protocol to send alarms in the event of, e.g., an intrusion. The work evaluates the accuracy and performance of the detection using common metrics in the surveillance field. An integrated monitoring dashboard has also been developed in which all CPS information is available in real time. This work was partially supported by the EU Project FitOptiVis [3] through the ECSEL Joint Undertaking under GA n. 783162, a Spanish National grant funded by MINECO through APCIN PCI2018-093184, and partially by the Research Network RED2018-102511-
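The abstract notes that the surveillance subsystem can raise alarms on the RTUs over MODBUS. As a minimal, hypothetical sketch of what such an alarm could look like at the wire level (the unit id and coil address below are invented for illustration; the paper does not specify them), here is a MODBUS RTU Write Single Coil request with its CRC-16:

```python
def crc16_modbus(data: bytes) -> int:
    """CRC-16/MODBUS (polynomial 0xA001, initial value 0xFFFF)."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc

def build_alarm_frame(unit_id: int, coil_addr: int, on: bool) -> bytes:
    """Build a MODBUS RTU Write Single Coil (function 0x05) request.

    A frame like this could signal an intrusion alarm to an RTU; the
    unit id and coil address are illustrative, not from the paper.
    """
    value = 0xFF00 if on else 0x0000  # 0xFF00 = coil ON per the spec
    pdu = bytes([unit_id, 0x05,
                 (coil_addr >> 8) & 0xFF, coil_addr & 0xFF,
                 (value >> 8) & 0xFF, value & 0xFF])
    crc = crc16_modbus(pdu)
    # MODBUS transmits the CRC low byte first.
    return pdu + bytes([crc & 0xFF, (crc >> 8) & 0xFF])

frame = build_alarm_frame(unit_id=1, coil_addr=0x0010, on=True)
```

A real deployment would more likely drive an existing MODBUS client stack over TCP or serial rather than hand-built frames; the sketch only illustrates the framing the alarm message would carry.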
Vision-based Fight Detection from Surveillance Cameras
Vision-based action recognition is one of the most challenging research
topics of computer vision and pattern recognition. A specific application of
it, namely detecting fights from surveillance cameras in public areas,
prisons, etc., is desirable so that these violent incidents can be brought
under control quickly. This paper addresses this research problem and
explores LSTM-based approaches to solve it. An attention layer is also
utilized. In addition, a new dataset is collected, consisting of fight
scenes from surveillance camera videos available on YouTube; this dataset is
made publicly available. From the extensive experiments conducted on the
Hockey Fight, Peliculas, and newly collected fight datasets, it is observed
that the proposed approach, which integrates the Xception model, a Bi-LSTM,
and attention, improves the state-of-the-art accuracy for fight scene
classification.
Comment: 6 pages, 5 figures, 4 tables, International Conference on Image
Processing Theory, Tools and Applications, IPTA 201
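The attention step in such a pipeline can be illustrated independently of the Xception and Bi-LSTM stages: given per-frame feature vectors, a scoring vector weights the frames, and the clip descriptor is their softmax-weighted average. The sketch below is a dependency-free illustration of that pooling, not the paper's implementation (in the actual model the scoring vector would be learned, not fixed):

```python
import math

def attention_pool(frame_features, score_vec):
    """Softmax-attention pooling of T per-frame feature vectors (each of
    length D) into a single clip descriptor. Returns (pooled, weights)."""
    scores = [sum(f * w for f, w in zip(feat, score_vec))
              for feat in frame_features]
    m = max(scores)                          # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frame_features[0])
    pooled = [sum(weights[t] * frame_features[t][d]
                  for t in range(len(frame_features)))
              for d in range(dim)]
    return pooled, weights

# Three 2-d frame features; the third frame scores highest and dominates.
feats = [[1.0, 0.0], [0.0, 1.0], [3.0, 3.0]]
pooled, weights = attention_pool(feats, [1.0, 1.0])
```

The weights form a probability distribution over frames, so frames the scorer deems irrelevant (e.g. no motion) contribute little to the clip-level classification.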
ViPR: Visual-Odometry-aided Pose Regression for 6DoF Camera Localization
Visual Odometry (VO) accumulates a positional drift in long-term robot
navigation tasks. Although Convolutional Neural Networks (CNNs) improve VO in
various aspects, VO still suffers from moving obstacles, discontinuous
observation of features, and poor textures or visual information. While recent
approaches estimate a 6DoF pose either directly from (a series of) images or by
merging depth maps with optical flow (OF), research that combines absolute pose
regression with OF is limited. We propose ViPR, a novel modular architecture
for long-term 6DoF VO that leverages temporal information and synergies between
absolute pose estimates (from PoseNet-like modules) and relative pose estimates
(from FlowNet-based modules) by combining both through recurrent layers.
Experiments on known datasets and on our own Industry dataset show that our
modular design outperforms the state of the art in long-term navigation
tasks.
Comment: Conf. on Computer Vision and Pattern Recognition (CVPR): Joint
Workshop on Long-Term Visual Localization, Visual Odometry and Geometric and
Learning-based SLAM 202
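ViPR learns the fusion of absolute and relative pose estimates with recurrent layers. A much simpler, hypothetical stand-in conveys why combining the two streams helps: integrated relative poses drift, absolute estimates do not, and blending them bounds the drift. The blend weight alpha below is invented for illustration and is not from the paper:

```python
def fuse_trajectory(rel_deltas, abs_poses, alpha=0.9):
    """Blend dead-reckoned relative motion (drift-prone) with absolute
    pose fixes (drift-free but possibly noisy). alpha is an invented
    blend weight; ViPR instead learns this fusion with recurrent layers."""
    fused = [abs_poses[0]]
    for delta, abs_p in zip(rel_deltas, abs_poses[1:]):
        predicted = fused[-1] + delta                  # odometry step
        fused.append(alpha * predicted + (1 - alpha) * abs_p)
    return fused

# Biased odometry (true step is 1.0) against exact absolute fixes:
# pure integration of rel ends at 11.0, while the truth is 10.0.
rel = [1.1] * 10
abs_fixes = [float(i) for i in range(11)]
fused = fuse_trajectory(rel, abs_fixes)
```

Even this fixed-gain filter keeps the final error below the raw odometry drift; a learned recurrent fusion can additionally adapt the weighting to scene conditions such as poor texture or moving obstacles.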
So you think you can track?
This work introduces a multi-camera tracking dataset consisting of 234 hours
of video data recorded concurrently from 234 overlapping HD cameras covering a
4.2 mile stretch of 8-10 lane interstate highway near Nashville, TN. The video
is recorded during a period of high traffic density with 500+ objects typically
visible within the scene and typical object longevities of 3-15 minutes. GPS
trajectories from 270 vehicle passes through the scene are manually corrected
in the video data to provide a set of ground-truth trajectories for
recall-oriented tracking metrics, and object detections are provided for each
camera in the scene (159 million total before cross-camera fusion). Initial
benchmarking of tracking-by-detection algorithms is performed against the GPS
trajectories, and a best HOTA of only 9.5% is obtained (best recall 75.9% at
IOU 0.1, 47.9 average IDs per ground-truth object), indicating that the
benchmarked trackers do not perform sufficiently well over the long temporal
and spatial extents required for traffic scene understanding.
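The recall figure quoted above (75.9% at IOU 0.1) rests on standard box matching. Below is a minimal sketch of IoU and a simplified recall-at-threshold computation; unlike a proper HOTA evaluation, this greedy version allows one prediction to match several ground truths and ignores identity:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def recall_at_iou(gt_boxes, pred_boxes, thresh=0.1):
    """Fraction of ground-truth boxes matched by some prediction at
    IoU >= thresh (a simplification of one-to-one tracking evaluation)."""
    matched = sum(1 for g in gt_boxes
                  if any(iou(g, p) >= thresh for p in pred_boxes))
    return matched / len(gt_boxes) if gt_boxes else 0.0
```

The low threshold of 0.1 used in the benchmark is forgiving on localization, which makes the low HOTA score all the more telling: detections are found, but identities are not maintained over the long camera chain.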