Minimum Delay Object Detection From Video
We consider the problem of detecting objects, as they come into view, from
videos in an online fashion. We provide the first real-time solution that is
guaranteed to minimize the delay, i.e., the time between when the object comes
into view and the declared detection time, subject to acceptable levels of
detection accuracy. The method leverages modern CNN-based object detectors
that operate on a single frame, aggregating their detection results over
frames to provide reliable detection, at a rate specified by the user, with
guaranteed minimal delay. To do this, we formulate the problem as a Quickest Detection
problem, which provides the aforementioned guarantees. We derive our algorithms
from this theory. We show in experiments that, with an overhead of just 50
fps, we can increase the number of correct detections and decrease the overall
computational cost compared to running a modern single-frame detector.
Comment: ICCV 201
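The Quickest Detection idea behind this paper can be sketched as a CUSUM-style recursion that aggregates per-frame detector confidences until a threshold is crossed. The Bernoulli observation model, the probabilities `p_absent`/`p_present`, and the threshold below are illustrative assumptions, not the paper's exact algorithm:

```python
import math

def cusum_quickest_detection(scores, p_absent=0.2, p_present=0.8, threshold=5.0):
    """Aggregate per-frame detector confidences with a CUSUM recursion.

    scores: per-frame probabilities that the object is visible, as reported
            by a single-frame detector (an assumed observation model).
    Declares detection at the first frame where the running log-likelihood
    ratio statistic exceeds `threshold`; returns the 0-based frame index,
    or None if no detection is declared.
    """
    stat = 0.0
    for t, s in enumerate(scores):
        s = min(max(s, 1e-6), 1 - 1e-6)
        # Log-likelihood ratio of "object present" vs "object absent"
        # under a simple Bernoulli model (an illustrative assumption).
        llr = s * math.log(p_present / p_absent) + \
              (1 - s) * math.log((1 - p_present) / (1 - p_absent))
        # CUSUM update: clamp at zero so pre-change noise does not accumulate.
        stat = max(0.0, stat + llr)
        if stat >= threshold:
            return t
    return None

# Noisy confidences: the object comes into view around frame 5;
# detection is declared a few frames later, once evidence accumulates.
frames = [0.1, 0.2, 0.1, 0.3, 0.1, 0.9, 0.85, 0.95, 0.9, 0.92]
print(cusum_quickest_detection(frames))  # → 9
```

Raising `threshold` lowers the false alarm rate at the cost of a longer detection delay, which is the trade-off the paper's guarantees are about.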
Quickest Moving Object Detection
We present a general framework and method for simultaneous detection and
segmentation of an object that moves (or comes into view of the camera) at
some unknown time in a video. The method is an online approach
based on motion segmentation, and it operates under dynamic backgrounds caused
by a moving camera or moving nuisances. The goal of the method is to detect and
segment the object as soon as it moves. Due to stochastic variability in the
video and unreliability of the motion signal, several frames are needed to
reliably detect the object. The method is designed to detect and segment with
minimum delay subject to a constraint on the false alarm rate. The method is
derived as a problem of Quickest Change Detection. Experiments on a dataset
show the effectiveness of our method in minimizing detection delay subject to
false alarm constraints.
Decentralized Smart Surveillance through Microservices Platform
Connected societies require reliable measures to assure the safety, privacy,
and security of their members. Public safety technology has made fundamental
improvements since the first generation of surveillance cameras was
introduced; it aims to reduce the role of human observer agents so that no
abnormality goes unnoticed. While the edge computing paradigm promises
solutions to address the shortcomings of cloud computing, e.g., the extra
communication delay and network security issues, it also introduces new
challenges. One of the main concerns is whether the limited computing power at
the edge can meet the demands of on-site dynamic data processing. In this
paper, a Lightweight IoT
(Internet of Things) based Smart Public Safety (LISPS) framework is proposed on
top of microservices architecture. As a computing hierarchy at the edge, the
LISPS system possesses high flexibility in the design process, loose coupling
to add new services or update existing functions without interrupting the
normal operations, and efficient power balancing. A real-world public safety
monitoring scenario is selected to verify the effectiveness of LISPS, which
detects and tracks human objects and identifies suspicious activities. The
experimental results demonstrate the feasibility of the approach.
Comment: 2019 SPIE Defense + Commercial Sensin
Cloud Chaser: Real Time Deep Learning Computer Vision on Low Computing Power Devices
Internet of Things (IoT) devices, mobile phones, and robotic systems are often
denied the power of deep learning algorithms due to their limited computing
power. However, to provide time-critical services such as emergency response,
home assistance, and surveillance, these devices often need real-time analysis
of their camera data. This paper strives to offer a viable approach to
integrate high-performance deep learning-based computer vision algorithms with
low-resource and low-power devices by leveraging the computing power of the
cloud. By offloading the computation work to the cloud, no dedicated hardware
is needed to enable deep neural networks on existing low computing power
devices. A Raspberry Pi based robot, Cloud Chaser, is built to demonstrate the
power of using cloud computing to perform real-time vision tasks. Furthermore,
to reduce latency and improve real-time performance, compression algorithms are
proposed and evaluated for streaming real-time video frames to the cloud.
Comment: Accepted to The 11th International Conference on Machine Vision (ICMV
2018). Project site: https://zhengyiluo.github.io/projects/cloudchaser
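The compress-then-stream step can be sketched with a minimal wire format: compress each raw frame before sending it to the cloud, and decompress on arrival. The header layout and the use of zlib here are assumptions for illustration; the paper proposes and evaluates its own compression algorithms for camera frames:

```python
import struct
import zlib

def pack_frame(frame_bytes, frame_id):
    """Compress a raw frame and prepend a small header before streaming.

    The header (frame id + payload length, big-endian) is an illustrative
    wire format, not Cloud Chaser's actual protocol.
    """
    payload = zlib.compress(frame_bytes, level=6)
    return struct.pack("!II", frame_id, len(payload)) + payload

def unpack_frame(packet):
    """Inverse of pack_frame: recover the frame id and raw bytes."""
    frame_id, length = struct.unpack("!II", packet[:8])
    return frame_id, zlib.decompress(packet[8:8 + length])

# Synthetic, highly repetitive frame data compresses well; real camera
# frames would be fed to an image codec instead.
raw = bytes(range(256)) * 64
packet = pack_frame(raw, 7)
fid, restored = unpack_frame(packet)
```

On a real deployment the packed bytes would go over a TCP socket to the cloud inference server; smaller payloads directly reduce the upload latency that dominates round-trip time on low-power devices.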
Fast Semantic Segmentation on Video Using Block Motion-Based Feature Interpolation
Convolutional networks optimized for accuracy on challenging, dense
prediction tasks are prohibitively slow to run on each frame in a video. The
spatial similarity of nearby video frames, however, suggests opportunity to
reuse computation. Existing work has explored basic feature reuse and feature
warping based on optical flow, but has encountered limits to the speedup
attainable with these techniques. In this paper, we present a new, two-part
approach to accelerating inference on video. First, we propose a fast feature
propagation technique that utilizes the block motion vectors present in
compressed video (e.g. H.264 codecs) to cheaply propagate features from frame
to frame. Second, we develop a novel feature estimation scheme, termed feature
interpolation, that fuses features propagated from enclosing keyframes to
render accurate feature estimates, even at sparse keyframe frequencies. We
evaluate our system on the Cityscapes and CamVid datasets, comparing to both a
frame-by-frame baseline and related work. We find that we are able to
substantially accelerate segmentation on video, achieving near real-time frame
rates (20.1 frames per second) on large images (960 x 720 pixels), while
maintaining competitive accuracy. This represents an improvement of almost 6x
over the single-frame baseline and 2.5x over the fastest prior work.
Comment: 14 page
Inter-BMV: Interpolation with Block Motion Vectors for Fast Semantic Segmentation on Video
Models optimized for accuracy on single images are often prohibitively slow
to run on each frame in a video. Recent work exploits the use of optical flow
to warp image features forward from select keyframes, as a means to conserve
computation on video. This approach, however, achieves only limited speedup,
even when optimized, due to the accuracy degradation introduced by repeated
forward warping, and the inference cost of optical flow estimation. To address
these problems, we propose a new scheme that propagates features using the
block motion vectors (BMV) present in compressed video (e.g. H.264 codecs),
instead of optical flow, and bi-directionally warps and fuses features from
enclosing keyframes to capture scene context on each video frame. Our
technique, interpolation-BMV, enables us to accurately estimate the features of
intermediate frames, while keeping inference costs low. We evaluate our system
on the CamVid and Cityscapes datasets, comparing to both a strong single-frame
baseline and related work. We find that we are able to substantially accelerate
segmentation on video, achieving near real-time frame rates (20+ frames per
second) on large images (e.g. 960 x 720 pixels), while maintaining competitive
accuracy. This represents an improvement of almost 6x over the single-frame
baseline and 2.5x over the fastest prior work.
Comment: 12 pages. arXiv admin note: substantial text overlap with
arXiv:1803.0774
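The propagation-and-interpolation scheme shared by these two papers can be sketched in NumPy: shift keyframe features block-by-block along decoded motion vectors, then fuse the forward and backward estimates for an intermediate frame. The block size, the nearest-block warping, and the linear fusion weights are simplifying assumptions; the papers operate on CNN feature maps with the codec's actual motion fields:

```python
import numpy as np

def warp_features(feat, mv, block=2):
    """Shift a feature map block-by-block using motion vectors.

    feat: (H, W, C) feature map from a keyframe.
    mv:   (H//block, W//block, 2) integer (dy, dx) vector per block,
          as decoded from the compressed bitstream (assumed given).
    Each destination block is copied from its source location
    (y0 - dy, x0 - dx), clipped to the image bounds.
    """
    H, W, _ = feat.shape
    out = np.zeros_like(feat)
    for by in range(H // block):
        for bx in range(W // block):
            dy, dx = mv[by, bx]
            y0, x0 = by * block, bx * block
            sy = int(np.clip(y0 - dy, 0, H - block))
            sx = int(np.clip(x0 - dx, 0, W - block))
            out[y0:y0 + block, x0:x0 + block] = \
                feat[sy:sy + block, sx:sx + block]
    return out

def interpolate_features(f_prev, f_next, t, length):
    """Fuse features warped from the two enclosing keyframes.

    t / length is the intermediate frame's position between keyframes;
    the linear weighting is an illustrative fusion rule.
    """
    a = 1.0 - t / length
    return a * f_prev + (1.0 - a) * f_next

feat = np.arange(16, dtype=float).reshape(4, 4, 1)
zero_mv = np.zeros((2, 2, 2), dtype=int)   # no motion: warp is identity
same = warp_features(feat, zero_mv)
```

Because motion vectors come for free from the decoder, the per-frame cost is just this cheap shuffle plus a weighted sum, instead of optical flow estimation.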
CARRADA Dataset: Camera and Automotive Radar with Range-Angle-Doppler Annotations
High quality perception is essential for autonomous driving (AD) systems. To
reach the accuracy and robustness that are required by such systems, several
types of sensors must be combined. Currently, mostly cameras and laser scanners
(lidar) are deployed to build a representation of the world around the vehicle.
While radar sensors have been used for a long time in the automotive industry,
they are still under-used for AD despite their appealing characteristics
(notably, their ability to measure the relative speed of obstacles and to
operate even in adverse weather conditions). To a large extent, this situation
is due to the relative lack of automotive datasets with real radar signals that
are both raw and annotated. In this work, we introduce CARRADA, a dataset of
synchronized camera and radar recordings with range-angle-Doppler annotations.
We also present a semi-automatic annotation approach, which was used to
annotate the dataset, and a radar semantic segmentation baseline, which we
evaluate on several metrics. Both our code and dataset are available online.
Comment: 8 pages, 5 figures. Accepted at ICPR 2020. Erratum: results in Table
III have been updated since the ICPR proceedings; models are selected using
the PP metric instead of the previously used PR metri
eBPF-based Content and Computation-aware Communication for Real-time Edge Computing
By placing computation resources within a one-hop wireless topology, the
recent edge computing paradigm is a key enabler of real-time Internet of Things
(IoT) applications. In the context of IoT scenarios where the same information
from a sensor is used by multiple applications at different locations, the data
stream needs to be replicated. However, the transportation of parallel streams
might not be feasible due to limitations in the capacity of the network
transporting the data. To address this issue, a content and computation-aware
communication control framework is proposed based on the Software Defined
Network (SDN) paradigm. The framework supports multi-streaming using the
extended Berkeley Packet Filter (eBPF), where the traffic flow and packet
replication for each specific computation process is controlled by a program
running inside an in-kernel Virtual Machine (VM). The proposed framework is
instantiated to address a case-study scenario where video streams from multiple
cameras are transmitted to the edge processor for real-time analysis. Numerical
results demonstrate the advantage of the proposed framework in terms of
programmability, network bandwidth and system resource savings.
Comment: This article has been accepted for publication in the IEEE
International Conference on Computer Communications (INFOCOM Workshops), 201
ReXCam: Resource-Efficient, Cross-Camera Video Analytics at Scale
Enterprises are increasingly deploying large camera networks for video
analytics. Many target applications entail a common problem template: searching
for and tracking an object or activity of interest (e.g. a speeding vehicle, a
break-in) through a large camera network in live video. Such cross-camera
analytics is compute and data intensive, with cost growing with the number of
cameras and time. To address this cost challenge, we present ReXCam, a new
system for efficient cross-camera video analytics. ReXCam exploits spatial and
temporal locality in the dynamics of real camera networks to guide its
inference-time search for a query identity. In an offline profiling phase,
ReXCam builds a cross-camera correlation model that encodes the locality
observed in historical traffic patterns. At inference time, ReXCam applies this
model to filter frames that are not spatially and temporally correlated with
the query identity's current position. In cases of occasional missed
detections, ReXCam performs a fast-replay search on recently filtered video
frames, enabling graceful recovery. Together, these techniques allow ReXCam
to reduce compute workload by 8.3x on an 8-camera dataset, and by 23x - 38x on
a simulated 130-camera dataset. ReXCam has been implemented and deployed on a
testbed of 5 AWS DeepLens cameras.
Comment: 15 page
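ReXCam's offline/online split can be sketched as follows: profile historical tracks into per-camera transition windows, then at query time search only the cameras whose windows match the elapsed time. The track format and the min/max-gap model below are illustrative assumptions, not ReXCam's actual correlation model:

```python
from collections import defaultdict

def build_correlation_model(historical_tracks):
    """Offline profiling: learn camera-to-camera transition windows.

    historical_tracks: one list of (camera_id, timestamp) observations per
    identity, in time order (an assumed format for illustration).
    Returns {camera: {next_camera: (min_gap, max_gap)}} encoding the
    spatial and temporal locality seen in historical traffic.
    """
    model = defaultdict(dict)
    for track in historical_tracks:
        for (c1, t1), (c2, t2) in zip(track, track[1:]):
            gap = t2 - t1
            lo, hi = model[c1].get(c2, (gap, gap))
            model[c1][c2] = (min(lo, gap), max(hi, gap))
    return model

def candidate_cameras(model, last_camera, elapsed):
    """Inference-time filter: given the query identity's last known camera
    and the time elapsed since, return only the cameras whose learned
    transition window is consistent; all other frames are skipped."""
    return {cam for cam, (lo, hi) in model[last_camera].items()
            if lo <= elapsed <= hi}

# Historical identities moved 0→1 in 5-7s, and 0→2 in about 30s.
tracks = [[(0, 0), (1, 5)], [(0, 0), (1, 7)], [(0, 0), (2, 30)]]
model = build_correlation_model(tracks)
```

The fast-replay fallback then re-runs detection on the frames this filter skipped whenever the identity fails to reappear in any candidate camera.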
Cyclostationary Statistical Models and Algorithms for Anomaly Detection Using Multi-Modal Data
A framework is proposed to detect anomalies in multi-modal data. A deep
neural network-based object detector is employed to extract counts of objects
and sub-events from the data. A cyclostationary model is proposed to model
regular patterns of behavior in the count sequences. The anomaly detection
problem is formulated as a problem of detecting deviations from learned
cyclostationary behavior. Sequential algorithms are proposed to detect
anomalies using the proposed model. The proposed algorithms are shown to be
asymptotically efficient in a well-defined sense. The developed algorithms are
applied to multi-modal data consisting of CCTV imagery and social media posts
to detect a 5K run in New York City.
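The cyclostationary model and sequential test can be sketched as: fit a per-phase mean and deviation over one cycle of count data (e.g. hourly counts with a daily period), then accumulate standardized deviations online and flag when the running score crosses a threshold. The scoring rule and threshold are illustrative stand-ins for the paper's asymptotically efficient tests:

```python
import statistics

def fit_cyclostationary(counts, period):
    """Learn a periodic baseline: per-phase (mean, std) of the counts.

    counts: historical object/sub-event counts from the detector;
    period: cycle length in samples (e.g. 24 for hourly counts, daily cycle).
    A zero std is replaced by 1.0 to keep scores finite.
    """
    phases = [[] for _ in range(period)]
    for t, c in enumerate(counts):
        phases[t % period].append(c)
    return [(statistics.mean(p), statistics.pstdev(p) or 1.0) for p in phases]

def detect_anomaly(counts, model, period, threshold=4.0):
    """Sequentially accumulate standardized deviations from the learned
    cycle; return the first index where the running score crosses
    `threshold`, or None. The -1.0 drift term keeps in-model noise from
    accumulating (an illustrative choice)."""
    score = 0.0
    for t, c in enumerate(counts):
        mu, sigma = model[t % period]
        score = max(0.0, score + abs(c - mu) / sigma - 1.0)
        if score >= threshold:
            return t
    return None

# Regular daily pattern of counts, then a burst (e.g. a large crowd event).
train = [2, 5, 9, 4] * 3
model = fit_cyclostationary(train, 4)
observed = [2, 5, 9, 4, 2, 5, 40, 4]
```

On the multi-modal data in the paper, separate count sequences (CCTV person counts, social media post counts) would each be scored this way, with an alarm raised when their deviations agree.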