Detection of Small Targets in Sea Clutter Based on RepVGG and Continuous Wavelet Transform
Constructing a high-performance target detector against a sea-clutter
background remains necessary and important. In this work, we propose a
RepVGGA0-CWT detector, where RepVGG is a residual network that achieves high
detection accuracy. Unlike traditional residual networks, RepVGG maintains
an acceptable computation speed. To balance accuracy and
speed, RepVGGA0 is selected from among the RepVGG variants. Also,
continuous wavelet transform (CWT) is employed to extract the radar echoes'
time-frequency feature effectively. In the tests, other networks (ResNet50,
ResNet18 and AlexNet) and feature extraction methods (short-time Fourier
transform (STFT), CWT) are combined to build detectors for comparison. The
results on different datasets show that the RepVGGA0-CWT detector outperforms
those detectors in terms of a low, controllable false alarm rate, high
training speed, high inference speed and low memory usage. The RepVGGA0-CWT
detector is hardware-friendly and, owing to its high inference speed, can be
applied in real-time detection scenarios.
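As a rough illustration of how a CWT turns a one-dimensional echo into a time-frequency map, the sketch below computes wavelet magnitudes by direct summation. It assumes a complex Morlet wavelet with centre frequency `w0 = 6` and unit sample spacing; the abstract does not state which wavelet or scales the authors use, so `morlet`, `cwt_magnitude` and all parameter values are hypothetical.

```python
import cmath
import math


def morlet(t, w0=6.0):
    """Complex Morlet wavelet at time t (small admissibility correction omitted)."""
    return math.pi ** -0.25 * cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def cwt_magnitude(signal, scales, w0=6.0):
    """Return |CWT| as a list of rows: one row per scale, one column per sample.

    Assumes unit sample spacing. This is a direct O(n * support) summation,
    written for clarity rather than speed.
    """
    n = len(signal)
    out = []
    for s in scales:
        half = int(5 * s)  # the Gaussian envelope is negligible beyond ~5 scale units
        row = []
        for b in range(n):
            acc = 0j
            for k in range(max(0, b - half), min(n, b + half + 1)):
                acc += signal[k] * morlet((k - b) / s, w0).conjugate()
            row.append(abs(acc) / math.sqrt(s))  # 1/sqrt(s) energy normalisation
        out.append(row)
    return out
```

For a pure sinusoid of frequency `f` cycles per sample, the response is expected to be strongest near the scale `w0 / (2 * pi * f)`. A detector of the kind described above would feed such a magnitude map to the network as an image-like input.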
Feature Distilled Tracking
Feature extraction and representation is one of the most important components of fast, accurate, and robust visual tracking. Very deep convolutional neural networks (CNNs) provide effective tools for feature extraction with good generalization ability. However, extracting features with very deep CNN models requires high-performance hardware because of their high computational complexity, which hinders their use in real-time applications. To alleviate this problem, we aim to obtain small, fast-to-execute shallow models for visual tracking through model compression. Specifically, we propose a small feature distilled network (FDN) for tracking that imitates the intermediate representations of a much deeper network. The FDN extracts rich visual features faster than the original deeper network. To speed it up further, we introduce a shift-and-stitch method that reduces the number of arithmetic operations while preserving the spatial resolution of the distilled feature maps. Finally, a scale-adaptive discriminative correlation filter is learned on the distilled features to handle scale variation of the target. Comprehensive experimental results on object tracking benchmark datasets show that the proposed approach achieves a 5x speed-up with performance competitive with state-of-the-art deep trackers.
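One common way to make a small network imitate the intermediate representations of a deeper one is to penalise the distance between their feature maps. The sketch below assumes a mean-squared-error objective over flattened feature vectors; the abstract does not state the loss the authors use, so `feature_mimic_loss` is a hypothetical illustration only.

```python
def feature_mimic_loss(student, teacher):
    """Mean squared error between flattened student and teacher features.

    Assumption: both networks emit features of the same length; in practice
    an adapter layer is often needed to match shapes.
    """
    if len(student) != len(teacher):
        raise ValueError("student and teacher features must have the same length")
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)
```

Minimising this quantity over the student's weights, with the teacher held fixed, pushes the shallow model toward the deeper model's features.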
Towards lightweight convolutional neural networks for object detection
We propose a model with a larger spatial size of feature maps and evaluate it
on the object detection task. To choose the best feature extraction
network for our model, we compare several popular lightweight networks. We
then conduct a set of experiments with channel reduction algorithms to
accelerate execution. Our vehicle detection models are accurate and fast,
and are therefore suited to embedded visual applications. With only 1.5 GFLOPs,
our best model gives 93.39 AP on the validation subset of the challenging
DETRAC dataset. The smallest of our models is the first to achieve real-time
inference speed on CPU, with a reasonable accuracy drop to 91.43 AP.
Comment: Submitted to the International Workshop on Traffic and Street
Surveillance for Safety and Security (IWT4S) in conjunction with the 14th
IEEE International Conference on Advanced Video and Signal based Surveillance
(AVSS 2017)
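The channel reduction experiments mentioned above rest on how convolution cost scales with channel counts. A rough sketch, assuming a standard dense 2D convolution and counting multiply-accumulate operations (MACs); the layer shape in the usage lines is hypothetical and not taken from the paper, and FLOP-counting conventions differ (some count one MAC as two FLOPs).

```python
def conv2d_macs(h_out, w_out, c_in, c_out, kernel):
    """Multiply-accumulates for one dense 2D convolution (bias ignored)."""
    return h_out * w_out * c_in * c_out * kernel * kernel


# Hypothetical 3x3 layer on a 56x56 output map.
full = conv2d_macs(56, 56, 64, 64, 3)
# Halving both input and output channels cuts the cost by about a factor of four.
reduced = conv2d_macs(56, 56, 32, 32, 3)
print(full, reduced, full / reduced)
```

Because cost grows with the product of input and output channels, pruning channels across consecutive layers gives a roughly quadratic saving, which is why channel reduction is an effective way to accelerate a detector.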
Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution
Convolutional neural networks have recently demonstrated high-quality
reconstruction for single-image super-resolution. In this paper, we propose the
Laplacian Pyramid Super-Resolution Network (LapSRN) to progressively
reconstruct the sub-band residuals of high-resolution images. At each pyramid
level, our model takes coarse-resolution feature maps as input, predicts the
high-frequency residuals, and uses transposed convolutions for upsampling to
the finer level. Our method does not require the bicubic interpolation as the
pre-processing step and thus dramatically reduces the computational complexity.
We train the proposed LapSRN with deep supervision using a robust Charbonnier
loss function and achieve high-quality reconstruction. Furthermore, our network
generates multi-scale predictions in one feed-forward pass through the
progressive reconstruction, thereby facilitating resource-aware applications.
Extensive quantitative and qualitative evaluations on benchmark datasets show
that the proposed algorithm performs favorably against the state-of-the-art
methods in terms of speed and accuracy.
Comment: This work is accepted in CVPR 2017. The code and datasets are
available on http://vllab.ucmerced.edu/wlai24/LapSRN
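The Charbonnier loss named in the abstract is a smooth approximation of the L1 penalty, usually written as sqrt(x^2 + eps^2). A minimal sketch, assuming a mean over elements and `eps = 1e-3`; both choices are assumptions here, and `charbonnier_loss` is an illustrative name rather than the authors' code.

```python
import math


def charbonnier_loss(pred, target, eps=1e-3):
    """Mean Charbonnier penalty sqrt((p - t)^2 + eps^2) over paired values."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    total = sum(math.sqrt((p - t) ** 2 + eps ** 2) for p, t in zip(pred, target))
    return total / len(pred)
```

For residuals much larger than `eps` the penalty behaves like the absolute error, which makes it less sensitive to outliers than a squared error, while it stays differentiable at zero.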