High efficiency compression for object detection
Image and video compression has traditionally been tailored to human vision.
However, modern applications such as visual analytics and surveillance rely on
computers seeing and analyzing the images before (or instead of) humans. For
these applications, it is important to adjust compression to computer vision.
In this paper we present a bit allocation and rate control strategy that is
tailored to object detection. Using the initial convolutional layers of a
state-of-the-art object detector, we create an importance map that can guide
bit allocation to areas that are important for object detection. The proposed
method enables bit rate savings of 7% or more compared to default HEVC, at the
equivalent object detection rate.
Comment: The paper is published in IEEE ICASSP 18
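As a rough illustration of how an importance map can steer bit allocation, the sketch below maps per-block importance values to quantization parameter (QP) offsets. It is not the paper's method: the linear mapping, the `max_offset` parameter, and the function names are assumptions made for this example. Only the 0 to 51 QP range is taken from HEVC (8-bit video).

```python
def qp_offsets(importance, max_offset=6):
    """Map per-block importance in [0, 1] to integer QP offsets.

    Assumed linear mapping (not from the paper): importance 1.0 gives
    -max_offset (finer quantization, more bits), importance 0.0 gives
    +max_offset (coarser quantization, fewer bits).
    """
    return [[round(max_offset * (1.0 - 2.0 * v)) for v in row]
            for row in importance]


def clamp_qp(base_qp, offset, lo=0, hi=51):
    """Apply an offset to a base QP and clamp to the HEVC range 0..51."""
    return max(lo, min(hi, base_qp + offset))


if __name__ == "__main__":
    # Hypothetical 2x3 importance map, one value per coding block.
    imp = [[0.0, 0.5, 1.0],
           [0.25, 0.75, 1.0]]
    for row in qp_offsets(imp):
        print([clamp_qp(32, off) for off in row])
```

Blocks the detector considers important receive a lower QP and therefore more bits, while background blocks are quantized more coarsely.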
YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design
The rapid development and wide utilization of object detection techniques
have drawn attention to both the accuracy and speed of object detectors. However,
the current state-of-the-art object detection works are either
accuracy-oriented using a large model but leading to high latency or
speed-oriented using a lightweight model but sacrificing accuracy. In this
work, we propose the YOLObile framework, which enables real-time object detection
on mobile devices via compression-compilation co-design. A novel block-punched pruning
scheme is proposed for any kernel size. To improve computational efficiency on
mobile devices, a GPU-CPU collaborative scheme is adopted along with advanced
compiler-assisted optimizations. Experimental results indicate that our pruning
scheme achieves a 14x compression rate of YOLOv4 with 49.0 mAP. Under our
YOLObile framework, we achieve 17 FPS inference speed using GPU on Samsung
Galaxy S20. By incorporating our proposed GPU-CPU collaborative scheme, the
inference speed is increased to 19.1 FPS, a 5x speedup over the original
YOLOv4. Source code is at:
https://github.com/nightsnack/YOLObile
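The abstract does not spell out how block-punched pruning works, so the following is only a sketch under one assumed reading: within a block of filters and channels, the same kernel positions are zeroed everywhere, chosen by the summed squared magnitude at each position. The data layout, the scoring rule, and the names `punch_block` and `compression_rate` are hypothetical and are not taken from the YOLObile source code.

```python
def punch_block(block, keep):
    """Return a pruned copy of one block of convolution weights.

    block: list of filters; each filter is a list of channels; each channel
           is a flat list of kernel weights (k*k values).
    keep:  number of kernel positions to retain across the whole block.
    """
    n_pos = len(block[0][0])
    # Score each kernel position by squared magnitude summed over the block.
    scores = [0.0] * n_pos
    for filt in block:
        for chan in filt:
            for p, w in enumerate(chan):
                scores[p] += w * w
    # Keep the highest-scoring positions; ties go to the lower index.
    kept = set(sorted(range(n_pos), key=lambda p: (-scores[p], p))[:keep])
    return [[[w if p in kept else 0.0 for p, w in enumerate(chan)]
             for chan in filt] for filt in block]


def compression_rate(block):
    """Total weight count divided by the number of nonzero weights."""
    flat = [w for filt in block for chan in filt for w in chan]
    return len(flat) / sum(1 for w in flat if w != 0.0)


if __name__ == "__main__":
    # Two filters, one channel each, 2x2 kernels flattened to four values.
    block = [[[1.0, 0.1, -2.0, 0.2]],
             [[0.5, 0.0, 3.0, -0.1]]]
    pruned = punch_block(block, keep=2)
    print(pruned, compression_rate(pruned))
```

Because every filter in the block shares the same zero pattern, the surviving weights stay regular enough for a compiler to generate dense code for them, which is the kind of structure a compression-compilation co-design could exploit.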
Spiking Neural Network for Ultra-low-latency and High-accurate Object Detection
Spiking Neural Networks (SNNs) have garnered widespread interest for their
energy efficiency and brain-inspired event-driven properties. While recent
methods like Spiking-YOLO have expanded the SNNs to more challenging object
detection tasks, they often suffer from high latency and low detection
accuracy, making them difficult to deploy on latency sensitive mobile
platforms. Furthermore, methods that convert Artificial Neural Networks
(ANNs) to SNNs struggle to preserve the complete structure of the ANNs,
resulting in poor feature representation and high conversion errors. To address
these challenges, we propose two methods: timesteps compression and
spike-time-dependent integrated (STDI) coding. The former reduces the timesteps
required in ANN-SNN conversion by compressing information, while the latter
sets a time-varying threshold to expand the information holding capacity. We
also present an SNN-based ultra-low-latency and high-accuracy object detection
model (SUHD) that achieves state-of-the-art performance on nontrivial datasets
like PASCAL VOC and MS COCO, with roughly 750x fewer timesteps and a 30%
mean average precision (mAP) improvement compared to Spiking-YOLO on the MS
COCO dataset. To the best of our knowledge, SUHD is the deepest spike-based
object detection model to date that completes lossless conversion in an
ultra-low number of timesteps.
Comment: 14 pages, 10 figures
Increasing Compression Ratio of Low Complexity Compressive Sensing Video Encoder with Application-Aware Configurable Mechanism
With the development of embedded video acquisition nodes and wireless video
surveillance systems, traditional video coding methods can no longer meet the
demands for low computational complexity and low power consumption.
Therefore, a low-complexity compressive sensing video encoder framework with
an application-aware configurable mechanism is proposed in this paper, where novel
encoding methods are exploited based on the practical purposes of the real
applications to reduce the coding complexity effectively and improve the
compression ratio (CR). Moreover, the group of processing (GOP) size and the
measurement matrix size can be configured on the encoder side according to the
post-analysis requirements of an example application, object tracking, to
maximize the encoder's CR. Simulations show that the proposed
encoder framework can achieve a CR of 60X while the tracking success rate
(SR) remains above 90%.
Comment: 5 pages with 6 figures and 1 table, conference
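To make the link between measurement matrix size and compression ratio concrete, here is a minimal compressive sensing sketch. It assumes a random +/-1 measurement matrix and defines CR as the number of input samples divided by the number of measurements. Neither choice is confirmed by the abstract, and the function names are hypothetical.

```python
import random


def measurement_matrix(m, n, seed=0):
    """Random +/-1 matrix with m rows (measurements) and n columns (samples)."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m)]


def measure(phi, x):
    """Compute the measurement vector y = phi * x for a flattened block x."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


def compression_ratio(n, m):
    """Assumed definition: input samples per transmitted measurement."""
    return n / m


if __name__ == "__main__":
    n, m = 240, 4                      # hypothetical block and matrix sizes
    phi = measurement_matrix(m, n)
    x = [float(i % 16) for i in range(n)]
    y = measure(phi, x)
    print(len(y), compression_ratio(n, m))
```

Under this definition, shrinking the measurement matrix from `n` rows toward a handful of rows is what raises the CR. The encoder only performs additions and subtractions, which is where the low complexity comes from.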
Deep Learning with Passive Optical Nonlinear Mapping
Deep learning has fundamentally transformed artificial intelligence, but the
ever-increasing complexity in deep learning models calls for specialized
hardware accelerators. Optical accelerators can potentially offer enhanced
performance, scalability, and energy efficiency. However, achieving nonlinear
mapping, a critical component of neural networks, remains challenging
optically. Here, we introduce a design that leverages multiple scattering in a
reverberating cavity to passively induce optical nonlinear random mapping,
without the need for additional laser power. A key advantage emerging from our
work is that we show we can perform optical data compression, facilitated by
multiple scattering in the cavity, to efficiently compress and retain vital
information while also decreasing data dimensionality. This allows rapid
optical information processing and generation of low dimensional mixtures of
highly nonlinear features. These are particularly useful for applications
demanding high-speed analysis and responses such as in edge computing devices.
Utilizing rapid optical information processing capabilities, our optical
platforms could potentially offer more efficient and real-time processing
solutions for a broad range of applications. We demonstrate the efficacy of our
design in improving computational performance across tasks, including
classification, image reconstruction, key-point detection, and object
detection, all achieved through optical data compression combined with a
digital decoder. Notably, we observed high performance, at an extreme
compression ratio, for real-time pedestrian detection. Our findings pave the
way for novel algorithms and architectural designs for optical computing.
Comment: 16 pages, 7 figures
Visual Importance-Biased Image Synthesis Animation
Present ray tracing algorithms are computationally intensive, requiring hours of computing time for complex scenes. Our previous work has dealt with the development of an overall approach to the application of visual attention to progressive and adaptive ray-tracing techniques. The approach facilitates large computational savings by modulating the supersampling rates in an image by the visual importance of the region being rendered. This paper extends the approach by incorporating temporal changes into the models and techniques developed, as it is expected that further efficiency savings can be reaped for animated scenes. Applications for this approach include entertainment, visualisation and simulation.
Training a Binary Weight Object Detector by Knowledge Transfer for Autonomous Driving
Autonomous driving imposes strict requirements on model size and energy
efficiency, in order to enable the embedded system to achieve real-time
on-board object detection. Recent deep convolutional neural network based
object detectors have achieved state-of-the-art accuracy. However, such models
are trained with numerous parameters, and their high computational costs and
large storage requirements prohibit deployment on systems with limited memory
and computation resources. Low-precision neural networks are a popular technique for
reducing computation requirements and memory footprint. Among them, the binary
weight neural network (BWN) is the extreme case, which quantizes floating-point
weights to just 1 bit. BWNs are difficult to train and suffer from accuracy
degradation due to the extremely low-bit representation. To address this problem,
we propose a knowledge transfer (KT) method to aid the training of BWN using a
full-precision teacher network. We built DarkNet- and MobileNet-based binary
weight YOLO-v2 detectors and conducted experiments on the KITTI benchmark for car,
pedestrian and cyclist detection. The experimental results show that the
proposed method maintains high detection accuracy while reducing the model size
of DarkNet-YOLO from 257 MB to 8.8 MB and MobileNet-YOLO from 193 MB to 7.9 MB.
Comment: Accepted by ICRA 201
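For readers unfamiliar with binary weights, the sketch below shows one common way to quantize a weight vector to 1 bit per weight: keep the sign of each weight plus a single scaling factor equal to the mean absolute value. The abstract does not say which binarization rule the authors use, so this scheme and the function names are assumptions for illustration.

```python
def binarize(weights):
    """Approximate weights as alpha * sign(w).

    Returns (alpha, signs), where alpha is the mean absolute weight and
    signs holds +1 or -1 per weight (zero is mapped to +1).
    """
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs


def dequantize(alpha, signs):
    """Rebuild the approximate full-precision weights."""
    return [alpha * s for s in signs]


if __name__ == "__main__":
    w = [0.5, -1.5, 1.0, -1.0]          # hypothetical filter weights
    alpha, signs = binarize(w)
    print(alpha, signs, dequantize(alpha, signs))
```

Assuming 32-bit floats in the original model, storing one sign bit per weight caps the size reduction at 32x. The reported drop from 257 MB to 8.8 MB is roughly 29x, a little under that ceiling.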