Two images comparison with invariance to illumination properties
ICIP 19 - IEEE International Conference on Image Processing, Taipei, Taiwan, 22/09/2019 - 25/09/2019

We propose a new way of performing pixel-by-pixel comparison between two images, taking advantage of interesting invariance properties with respect to illumination conditions and camera settings. Moreover, we show that the proposed operator is relatively robust to strong noise on one of the compared images. The new operator can be used for background subtraction, which inherits its invariance properties. The useful properties of the proposed operator are illustrated in the experiments.
SSSDET: Simple Short and Shallow Network for Resource Efficient Vehicle Detection in Aerial Scenes
Detection of small-sized targets is of paramount importance in many aerial
vision-based applications. The commonly deployed low cost unmanned aerial
vehicles (UAVs) for aerial scene analysis are highly resource constrained in
nature. In this paper, we propose a simple, short and shallow network (SSSDet) to
robustly detect and classify small-sized vehicles in aerial scenes. The
proposed SSSDet is up to 4x faster, requires 4.4x fewer FLOPs, has 30x fewer
parameters, requires 31x less memory, and provides better accuracy in
comparison to existing state-of-the-art detectors. Thus, it is more suitable
for hardware implementation in real-time applications. We also created a new
airborne image dataset (ABD) by annotating 1396 new objects in 79 aerial images
for our experiments. The effectiveness of the proposed method is validated on
the existing VEDAI, DLR-3K, DOTA and Combined dataset. The SSSDet outperforms
state-of-the-art detectors in terms of accuracy, speed, compute and memory
efficiency.

Comment: International Conference on Image Processing (ICIP) 2019, Taipei,
Taiwan
Fusion-Based Versatile Video Coding Intra Prediction Algorithm with Template Matching and Linear Prediction
The new generation video coding standard Versatile Video Coding (VVC) has adopted many novel technologies to improve compression performance, and consequently, remarkable results have been achieved. In practical applications, a lower bitrate would reduce the burden on the sensors and improve their performance. Hence, to further enhance the intra compression performance of VVC, we propose a fusion-based intra prediction algorithm in this paper. Specifically, to better predict areas with similar texture information, we propose a fusion-based adaptive template matching method, which directly takes the error between the reference and objective templates into account. Furthermore, to better utilize the correlation between reference pixels and the pixels to be predicted, we propose a fusion-based linear prediction method, which can compensate for the deficiency of single linear prediction. We implemented our algorithm on top of the VVC Test Model (VTM) 9.1. When compared with VVC, our proposed fusion-based algorithm saves 0.89%, 0.84%, and 0.90% bitrate on average for the Y, Cb, and Cr components, respectively. In addition, when compared with some other existing works, our algorithm shows superior performance in bitrate savings.
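The fusion idea in this abstract, combining several template-matching candidates weighted by their template error, can be sketched as follows. This is a minimal illustrative sketch of error-weighted template-matching fusion in general; the search range, L-shaped template, inverse-error weighting, and block sizes are assumptions for illustration, not the paper's exact design:

```python
import numpy as np

def fused_template_matching_prediction(recon, block_top_left, block_size,
                                       tmpl_thickness=2, k=3):
    """Predict a block by fusing the k best template-matching candidates,
    weighted inversely by their template SAD error (illustrative sketch)."""
    y0, x0 = block_top_left
    bs, t = block_size, tmpl_thickness

    # L-shaped template: rows above and columns left of a block position.
    def template_at(y, x):
        top = recon[y - t:y, x - t:x + bs]   # top strip (incl. corner)
        left = recon[y:y + bs, x - t:x]      # left strip
        return np.concatenate([top.ravel(), left.ravel()])

    target_tmpl = template_at(y0, x0)
    candidates = []
    # Search the previously reconstructed area above the current block.
    for y in range(t, y0 - bs + 1):
        for x in range(t, recon.shape[1] - bs + 1):
            err = np.abs(template_at(y, x) - target_tmpl).sum()
            candidates.append((err, recon[y:y + bs, x:x + bs]))
    best = sorted(candidates, key=lambda c: c[0])[:k]

    # Fusion: weight each candidate inversely by its template error.
    weights = np.array([1.0 / (1.0 + e) for e, _ in best])
    weights /= weights.sum()
    return sum(w * blk for w, (_, blk) in zip(weights, best))
```

Because the prediction is a convex combination of reconstructed blocks, candidates whose templates match the target template closely dominate the fused result.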
BiRA-Net: Bilinear Attention Net for Diabetic Retinopathy Grading
Diabetic retinopathy (DR) is a common retinal disease that leads to
blindness. For diagnosis purposes, DR image grading aims to provide automatic
DR grade classification, which is not addressed by conventional binary DR
image classification methods. Small objects in the eye images,
like lesions and microaneurysms, are essential to DR grading in medical
imaging, but they could easily be influenced by other objects. To address these
challenges, we propose a new deep learning architecture, called BiRA-Net, which
combines the attention model for feature extraction and bilinear model for
fine-grained classification. Furthermore, considering the distance between
different DR grades, we propose a new loss function,
called grading loss, which leads to improved training convergence of the
proposed approach. Experimental results are provided to demonstrate the
superior performance of the proposed approach.

Comment: Accepted at ICIP 2019
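The bilinear model mentioned in the abstract typically refers to bilinear pooling, where two feature maps are combined by an outer product at every spatial location and averaged over space. A minimal sketch of this general technique (with the common signed-square-root and L2 normalization), not BiRA-Net's exact implementation:

```python
import numpy as np

def bilinear_pool(feat_a, feat_b):
    """Bilinear pooling of two feature maps shaped (C1, H, W) and (C2, H, W):
    average of per-location outer products, then signed sqrt and L2
    normalization. Illustrative sketch of the general technique."""
    c1, h, w = feat_a.shape
    c2 = feat_b.shape[0]
    a = feat_a.reshape(c1, h * w)
    b = feat_b.reshape(c2, h * w)
    biln = (a @ b.T) / (h * w)                 # (C1, C2) pooled outer product
    vec = biln.ravel()
    vec = np.sign(vec) * np.sqrt(np.abs(vec))  # signed square root
    return vec / (np.linalg.norm(vec) + 1e-12) # L2 normalization
```

The pooled C1 x C2 matrix captures pairwise channel interactions, which is what makes bilinear models effective for fine-grained classification; in an attention-augmented network, feat_a and feat_b would be attention-weighted feature maps.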
Shallow Triple Stream Three-dimensional CNN (STSTNet) for Micro-expression Recognition
In recent years, the state of the art in facial micro-expression recognition
has been significantly advanced by deep neural networks. The robustness of
deep learning has yielded promising performance beyond that of traditional
handcrafted approaches. Most works in the literature emphasize increasing the
depth of networks and employing highly complex objective functions to learn
more features. In this paper, we design a Shallow Triple Stream
Three-dimensional CNN (STSTNet) that is computationally light whilst capable of
extracting discriminative high level features and details of micro-expressions.
The network learns from three optical flow features (i.e., optical strain,
horizontal and vertical optical flow fields) computed based on the onset and
apex frames of each video. Our experimental results demonstrate the
effectiveness of the proposed STSTNet, which obtained an unweighted average
recall rate of 0.7605 and unweighted F1-score of 0.7353 on the composite
database consisting of 442 samples from the SMIC, CASME II and SAMM databases.

Comment: 5 pages, 1 figure. Accepted and published in IEEE FG 2019
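Of the three input features named above, optical strain is the least standard: it is derived from the horizontal and vertical flow fields via the symmetric strain tensor eps = 0.5 * (grad(F) + grad(F)^T). A minimal sketch of one common formulation of the strain magnitude (the paper's exact implementation may differ):

```python
import numpy as np

def optical_strain_magnitude(u, v):
    """Optical strain magnitude from horizontal (u) and vertical (v) optical
    flow fields. Uses the symmetric strain tensor of the flow; a common
    formulation, assumed here for illustration."""
    du_dy, du_dx = np.gradient(u)   # gradients of horizontal flow (rows=y)
    dv_dy, dv_dx = np.gradient(v)   # gradients of vertical flow
    eps_xx = du_dx                  # normal strain components
    eps_yy = dv_dy
    eps_xy = 0.5 * (du_dy + dv_dx)  # shear strain component
    return np.sqrt(eps_xx**2 + eps_yy**2 + 2.0 * eps_xy**2)
```

Here the strain map highlights regions of subtle facial deformation between the onset and apex frames, which is why it complements the raw flow fields as a network input.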