Eagle: End-to-end Deep Reinforcement Learning based Autonomous Control of PTZ Cameras
Existing approaches for autonomous control of pan-tilt-zoom (PTZ) cameras use
multiple stages where object detection and localization are performed
separately from the control of the PTZ mechanisms. These approaches require
manual labels and suffer from performance bottlenecks due to error propagation
across the multi-stage flow of information. The large size of object detection
neural networks also makes prior solutions infeasible for real-time deployment
on resource-constrained devices. We present an end-to-end deep reinforcement
learning (RL) solution called Eagle, which trains a neural network policy that
takes images directly as input to control the PTZ camera. Training
reinforcement learning policies in the real world is cumbersome due to labeling effort,
runtime environment stochasticity, and fragile experimental setups. We
introduce a photo-realistic simulation framework for training and evaluation of
PTZ camera control policies. Eagle achieves superior camera control performance
by keeping the object of interest close to the center of captured images at
high resolution, and achieves up to 17% longer tracking duration than the
state of the art. Eagle policies are lightweight (90x fewer parameters than
YOLOv5s) and can run on embedded camera platforms such as the Raspberry Pi (33 FPS)
and Jetson Nano (38 FPS), facilitating real-time PTZ tracking in
resource-constrained environments. With domain randomization, Eagle policies
trained in our simulator can be transferred directly to real-world scenarios.
Comment: 20 pages, IoTD
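The abstract scores a policy by how well it keeps the object near the image center at high resolution. As a minimal sketch only, the following hypothetical per-frame reward combines a centering term (driven by pan/tilt) with a size term (driven by zoom). The names `BBox` and `tracking_reward`, and the parameters `target_area_frac` and `size_weight`, are illustrative assumptions, not Eagle's actual reward.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class BBox:
    """Axis-aligned box in pixel coordinates (hypothetical helper type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def tracking_reward(box: Optional[BBox], frame_w: float, frame_h: float,
                    target_area_frac: float = 0.1, size_weight: float = 0.5) -> float:
    """Hypothetical reward: high when the object is centered and near a target size.

    Returns 0.0 when the object is lost (box is None).
    """
    if box is None:
        return 0.0
    cx = (box.x_min + box.x_max) / 2.0
    cy = (box.y_min + box.y_max) / 2.0
    # Offset from the image center, normalized so each axis lies in [-1, 1].
    dx = (cx - frame_w / 2.0) / (frame_w / 2.0)
    dy = (cy - frame_h / 2.0) / (frame_h / 2.0)
    # 1.0 at the center, falling to 0.0 at a corner (normalized distance sqrt(2)).
    center_term = 1.0 - min(1.0, math.hypot(dx, dy) / math.sqrt(2.0))
    # Fraction of the frame covered by the box, compared with the desired fraction.
    area_frac = (box.x_max - box.x_min) * (box.y_max - box.y_min) / (frame_w * frame_h)
    size_term = 1.0 - min(1.0, abs(area_frac - target_area_frac) / target_area_frac)
    return (1.0 - size_weight) * center_term + size_weight * size_term


# A centered box covering 10% of a 200x100 frame gets the maximum reward.
print(tracking_reward(BBox(75, 30, 125, 70), 200, 100))
```

Bounding each term to [0, 1] keeps the reward scale stable across frame sizes; how the real system weights centering against zoom is not stated in the abstract.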
3D cephalometric landmark detection by multiple stage deep reinforcement learning
The lengthy time needed for manual landmarking has delayed the widespread adoption of three-dimensional (3D) cephalometry. Here we propose an automatic 3D cephalometric annotation system based on multi-stage deep reinforcement learning (DRL) and volume-rendered imaging. The system considers the geometrical characteristics of landmarks and simulates the sequential decision process underlying the landmarking patterns of human professionals. It mainly consists of constructing an appropriate two-dimensional cutaway or 3D model view, then applying single-stage DRL with gradient-based boundary estimation or multi-stage DRL to determine the 3D coordinates of target landmarks. The system shows detection accuracy and stability sufficient for direct clinical application, with a low detection error and low inter-individual variation (1.96 ± 0.78 mm). Moreover, our system requires no additional segmentation or 3D mesh-object construction steps for landmark detection. We believe these features will enable fast-track cephalometric analysis and planning, and we expect the system to achieve greater accuracy as larger CT datasets become available for training and testing.
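The reported 1.96 ± 0.78 mm figure summarizes landmark detection error. A minimal sketch of how such a summary might be computed is shown below. It assumes that each error is the Euclidean distance in millimetres between a predicted and a reference 3D landmark, and that ± denotes the sample standard deviation. The abstract defines neither, and the function names are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple

Point3D = Tuple[float, float, float]


def landmark_errors(pred: Sequence[Point3D], truth: Sequence[Point3D]) -> list:
    """Euclidean distance (assumed mm) between each predicted and reference landmark."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must contain the same number of landmarks")
    return [math.dist(p, t) for p, t in zip(pred, truth)]


def format_mean_sd(errors: Sequence[float]) -> str:
    """Summarize errors as 'mean ± sample SD mm' (requires at least two errors)."""
    mean = statistics.mean(errors)
    sd = statistics.stdev(errors)
    return f"{mean:.2f} ± {sd:.2f} mm"


errs = landmark_errors([(3, 0, 0), (0, 4, 0), (0, 0, 5)], [(0, 0, 0)] * 3)
print(errs)                  # [3.0, 4.0, 5.0]
print(format_mean_sd(errs))  # 4.00 ± 1.00 mm
```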
Deformable Object Tracking with Gated Fusion
The tracking-by-detection framework has received growing attention through its
integration with convolutional neural networks (CNNs). Existing
tracking-by-detection based methods, however, fail to track objects with severe
appearance variations. This is because the traditional convolutional operation
is performed on fixed grids, and thus may not be able to find the correct
response while the object is changing pose or under varying environmental
conditions. In this paper, we propose a deformable convolution layer to enrich
the target appearance representations in the tracking-by-detection framework.
We aim to capture the target appearance variations via deformable convolution,
which adaptively enhances its original features. In addition, we also propose a
gated fusion scheme to control how the variations captured by the deformable
convolution affect the original appearance. The enriched feature representation
through deformable convolution facilitates the discrimination of the CNN
classifier on the target object and background. Extensive experiments on the
standard benchmarks show that the proposed tracker performs favorably against
state-of-the-art methods
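To make the two ideas concrete, here is a minimal single-channel sketch in plain Python. A deformable 3x3 convolution samples the input at learned fractional offsets from the regular grid using bilinear interpolation. A gated fusion then adds the deformable response to the original feature, scaled by a sigmoid gate. The additive form `original + g * deformed` is an assumption based on the abstract's wording ("adaptively enhances its original features"), not necessarily the paper's exact formulation. All function names are hypothetical, and the offsets and gate logits would be learned in a real network.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def bilinear(img: Grid, y: float, x: float) -> float:
    """Sample img at fractional (y, x) by bilinear interpolation; zero outside."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    val = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                val += weight * img[yy][xx]
    return val


def deform_conv_at(img: Grid, kernel: Grid, cy: int, cx: int,
                   offsets: Sequence[Tuple[float, float]]) -> float:
    """3x3 deformable convolution (cross-correlation) output at (cy, cx).

    offsets[k] = (dy, dx) shifts the k-th kernel tap (row-major order).
    With all offsets zero this reduces to an ordinary 3x3 convolution.
    """
    out, k = 0.0, 0
    for i in (-1, 0, 1):
        for j in (-1, 0, 1):
            dy, dx = offsets[k]
            out += kernel[i + 1][j + 1] * bilinear(img, cy + i + dy, cx + j + dx)
            k += 1
    return out


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gated_fusion(original: Sequence[float], deformed: Sequence[float],
                 gate_logits: Sequence[float]) -> List[float]:
    """Per-channel gate g in (0, 1) controls how much deformable response is added."""
    return [o + sigmoid(a) * d for o, d, a in zip(original, deformed, gate_logits)]


img = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
center_only = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
zero = [(0.0, 0.0)] * 9
shifted = list(zero)
shifted[4] = (0.0, 0.5)  # move the center tap half a pixel to the right
print(deform_conv_at(img, center_only, 1, 1, zero))     # 4.0
print(deform_conv_at(img, center_only, 1, 1, shifted))  # 4.5
print(gated_fusion([1.0, 2.0], [2.0, 4.0], [0.0, 0.0]))  # [2.0, 4.0]
```

Because the sampling positions are fractional, bilinear interpolation keeps the output differentiable with respect to the offsets. This is what allows offsets to be learned by backpropagation in deformable convolution.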