4,877 research outputs found
RGB-T salient object detection via fusing multi-level CNN features
RGB-induced salient object detection has recently witnessed substantial progress, which is attributed to the superior feature learning capability of deep convolutional neural networks (CNNs). However, such detections suffer from challenging scenarios characterized by cluttered backgrounds, low-light conditions and variations in illumination. Instead of improving RGB based saliency detection, this paper takes advantage of the complementary benefits of RGB and thermal infrared images. Specifically, we propose a novel end-to-end network for multi-modal salient object detection, which turns the challenge of RGB-T saliency detection to a CNN feature fusion problem. To this end, a backbone network (e.g., VGG-16) is first adopted to extract the coarse features from each RGB or thermal infrared image individually, and then several adjacent-depth feature combination (ADFC) modules are designed to extract multi-level refined features for each single-modal input image, considering that features captured at different depths differ in semantic information and visual details. Subsequently, a multi-branch group fusion (MGF) module is employed to capture the cross-modal features by fusing those features from ADFC modules for a RGB-T image pair at each level. Finally, a joint attention guided bi-directional message passing (JABMP) module undertakes the task of saliency prediction via integrating the multi-level fused features from MGF modules. Experimental results on several public RGB-T salient object detection datasets demonstrate the superiorities of our proposed algorithm over the state-of-the-art approaches, especially under challenging conditions, such as poor illumination, complex background and low contrast
Guided Curriculum Model Adaptation and Uncertainty-Aware Evaluation for Semantic Nighttime Image Segmentation
Most progress in semantic segmentation reports on daytime images taken under
favorable illumination conditions. We instead address the problem of semantic
segmentation of nighttime images and improve the state-of-the-art, by adapting
daytime models to nighttime without using nighttime annotations. Moreover, we
design a new evaluation framework to address the substantial uncertainty of
semantics in nighttime images. Our central contributions are: 1) a curriculum
framework to gradually adapt semantic segmentation models from day to night via
labeled synthetic images and unlabeled real images, both for progressively
darker times of day, which exploits cross-time-of-day correspondences for the
real images to guide the inference of their labels; 2) a novel
uncertainty-aware annotation and evaluation framework and metric for semantic
segmentation, designed for adverse conditions and including image regions
beyond human recognition capability in the evaluation in a principled fashion;
3) the Dark Zurich dataset, which comprises 2416 unlabeled nighttime and 2920
unlabeled twilight images with correspondences to their daytime counterparts
plus a set of 151 nighttime images with fine pixel-level annotations created
with our protocol, which serves as a first benchmark to perform our novel
evaluation. Experiments show that our guided curriculum adaptation
significantly outperforms state-of-the-art methods on real nighttime sets both
for standard metrics and our uncertainty-aware metric. Furthermore, our
uncertainty-aware evaluation reveals that selective invalidation of predictions
can lead to better results on data with ambiguous content such as our nighttime
benchmark and profit safety-oriented applications which involve invalid inputs.Comment: ICCV 2019 camera-read
Low-light Pedestrian Detection in Visible and Infrared Image Feeds: Issues and Challenges
Pedestrian detection has become a cornerstone for several high-level tasks,
including autonomous driving, intelligent transportation, and traffic
surveillance. There are several works focussed on pedestrian detection using
visible images, mainly in the daytime. However, this task is very intriguing
when the environmental conditions change to poor lighting or nighttime.
Recently, new ideas have been spurred to use alternative sources, such as Far
InfraRed (FIR) temperature sensor feeds for detecting pedestrians in low-light
conditions. This study comprehensively reviews recent developments in low-light
pedestrian detection approaches. It systematically categorizes and analyses
various algorithms from region-based to non-region-based and graph-based
learning methodologies by highlighting their methodologies, implementation
issues, and challenges. It also outlines the key benchmark datasets that can be
used for research and development of advanced pedestrian detection algorithms,
particularly in low-light situation
An Expressive Deep Model for Human Action Parsing from A Single Image
This paper aims at one newly raising task in vision and multimedia research:
recognizing human actions from still images. Its main challenges lie in the
large variations in human poses and appearances, as well as the lack of
temporal motion information. Addressing these problems, we propose to develop
an expressive deep model to naturally integrate human layout and surrounding
contexts for higher level action understanding from still images. In
particular, a Deep Belief Net is trained to fuse information from different
noisy sources such as body part detection and object detection. To bridge the
semantic gap, we used manually labeled data to greatly improve the
effectiveness and efficiency of the pre-training and fine-tuning stages of the
DBN training. The resulting framework is shown to be robust to sometimes
unreliable inputs (e.g., imprecise detections of human parts and objects), and
outperforms the state-of-the-art approaches.Comment: 6 pages, 8 figures, ICME 201
Interactively Test Driving an Object Detector: Estimating Performance on Unlabeled Data
In this paper, we study the problem of `test-driving' a detector, i.e.
allowing a human user to get a quick sense of how well the detector generalizes
to their specific requirement. To this end, we present the first system that
estimates detector performance interactively without extensive ground truthing
using a human in the loop. We approach this as a problem of estimating
proportions and show that it is possible to make accurate inferences on the
proportion of classes or groups within a large data collection by observing
only of samples from the data. In estimating the false detections (for
precision), the samples are chosen carefully such that the overall
characteristics of the data collection are preserved. Next, inspired by its use
in estimating disease propagation we apply pooled testing approaches to
estimate missed detections (for recall) from the dataset. The estimates thus
obtained are close to the ones obtained using ground truth, thus reducing the
need for extensive labeling which is expensive and time consuming.Comment: Published at Winter Conference on Applications of Computer Vision,
201
- …