35,056 research outputs found
Latent Variable Algorithms for Multimodal Learning and Sensor Fusion
Multimodal learning has lacked principled ways of combining information
from different modalities and of learning a low-dimensional manifold of meaningful
representations. We study multimodal learning and sensor fusion from a latent
variable perspective. We first present a regularized recurrent attention filter
for sensor fusion. This algorithm can dynamically combine information from
different types of sensors in a sequential decision-making task. Each sensor is
paired with a modular neural network to maximize the utility of its own
information. A gating network dynamically generates a set of mixing weights
for the outputs of the sensor networks by balancing the utility of all
sensors' information. We design a co-learning mechanism to encourage
co-adaptation and independent learning of each sensor at the same time, and
propose a regularization-based co-learning method. In the second part, we focus
on recovering the manifold of latent representations. We propose a co-learning
approach using a probabilistic graphical model that imposes a structural prior
on the generative model, the multimodal variational RNN (MVRNN), and derive a
variational lower bound for its objective function. In the third part, we
extend the Siamese structure to sensor fusion for robust acoustic event
detection. We perform experiments to investigate the extracted latent
representations; further work will be carried out in the coming months. Our experiments
show that the recurrent attention filter can dynamically combine different
sensor inputs according to the information carried in the inputs. We believe the
MVRNN can identify latent representations that are useful for many downstream
tasks, such as speech synthesis, activity recognition, and control and planning.
Both algorithms are general frameworks that can be applied to other tasks
where different types of sensors are jointly used for decision making.
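A minimal sketch (PyTorch) of the gating mechanism described above: each sensor has its own modular network, and a gating network emits softmax mixing weights over the per-sensor outputs. All layer sizes and names are illustrative assumptions, not the thesis's actual architecture.

```python
import torch
import torch.nn as nn

class GatedSensorFusion(nn.Module):
    def __init__(self, sensor_dims, hidden_dim, out_dim):
        super().__init__()
        # One modular network per sensor, maximizing use of its own input.
        self.sensor_nets = nn.ModuleList(
            nn.Sequential(nn.Linear(d, hidden_dim), nn.ReLU(),
                          nn.Linear(hidden_dim, out_dim))
            for d in sensor_dims)
        # Gating network sees all raw inputs, emits one weight per sensor.
        self.gate = nn.Linear(sum(sensor_dims), len(sensor_dims))

    def forward(self, inputs):  # inputs: list of (batch, d_i) tensors
        outs = torch.stack(
            [net(x) for net, x in zip(self.sensor_nets, inputs)],
            dim=1)                                   # (batch, n_sensors, out)
        w = torch.softmax(self.gate(torch.cat(inputs, dim=-1)), dim=-1)
        return (w.unsqueeze(-1) * outs).sum(dim=1)   # weighted mixture

# Hypothetical usage with two sensors of dimension 64 and 32:
model = GatedSensorFusion([64, 32], hidden_dim=128, out_dim=16)
fused = model([torch.randn(8, 64), torch.randn(8, 32)])  # (8, 16)
```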
Robust Depth Estimation from Auto Bracketed Images
As demand for advanced photographic applications on hand-held devices grows,
these devices increasingly require high-quality depth capture. However, under
low-light conditions, most devices still suffer from low imaging quality and
inaccurate depth acquisition. To address the problem, we present a robust depth
estimation method from a short burst shot with varied intensity (i.e., Auto
Bracketing) or strong noise (i.e., High ISO). We introduce a geometric
transformation between flow and depth tailored for burst images, enabling our
learning-based multi-view stereo matching to be performed effectively. We then
describe our depth estimation pipeline that incorporates the geometric
transformation into our residual-flow network. It allows our framework to
produce an accurate depth map even with a bracketed image sequence. We
demonstrate that our method outperforms state-of-the-art methods for various
datasets captured by a smartphone and a DSLR camera. Moreover, we show that the
estimated depth is applicable for image quality enhancement and photographic
editing.
Comment: To appear in CVPR 2018. 9 pages.
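As background for the flow-to-depth relation that a geometric transformation like the one above builds on, here is a toy NumPy illustration of the textbook case: under purely translational camera motion with known focal length and baseline, depth is inversely proportional to flow magnitude. The function and numbers are hypothetical and far simpler than the paper's learned pipeline.

```python
import numpy as np

def depth_from_flow(flow, f=1200.0, baseline=0.01, eps=1e-6):
    """flow: (H, W, 2) optical flow in pixels between two burst frames.
    Assumes pure lateral translation: Z = f * b / disparity."""
    magnitude = np.linalg.norm(flow, axis=-1)   # per-pixel displacement
    return f * baseline / (magnitude + eps)     # depth in baseline units

flow = np.random.uniform(0.5, 5.0, size=(480, 640, 2))  # fake flow field
depth = depth_from_flow(flow)                            # (480, 640) map
```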
Review of Visual Saliency Detection with Comprehensive Information
Visual saliency detection models simulate the human visual system to perceive
the scene and have been widely used in many vision tasks. With the development
of acquisition technology, more comprehensive information, such as depth cues,
inter-image correspondences, or temporal relationships, has become available, extending
image saliency detection to RGBD saliency detection, co-saliency detection, or
video saliency detection. RGBD saliency detection models focus on extracting
salient regions from RGBD images by incorporating depth information.
Co-saliency detection models introduce an inter-image correspondence
constraint to discover the common salient object in an image group. The goal of
video saliency detection is to locate motion-related salient objects
in video sequences, considering motion cues and spatiotemporal
constraints jointly. In this paper, we review different types of saliency
detection algorithms, summarize the important issues of the existing methods,
and discuss open problems and future work. Moreover, the evaluation
datasets and quantitative measurements are briefly introduced, and an
experimental analysis and discussion are conducted to provide a holistic
overview of different saliency detection methods.
Comment: 18 pages, 11 figures, 7 tables. Accepted by IEEE Transactions on Circuits and Systems for Video Technology 2018. https://rmcong.github.io
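For concreteness, here are two of the quantitative measurements commonly used in this literature, mean absolute error (MAE) and the F-measure, sketched in NumPy. The adaptive threshold and beta^2 = 0.3 follow common practice in saliency papers, not necessarily the exact protocol of every surveyed method.

```python
import numpy as np

def mae(sal, gt):
    """Mean absolute error between saliency map and binary ground truth."""
    return np.abs(sal.astype(float) - gt.astype(float)).mean()

def f_measure(sal, gt, beta2=0.3):
    """Weighted harmonic mean of precision and recall at an adaptive threshold."""
    pred = sal >= 2 * sal.mean()              # common adaptive threshold
    tp = np.logical_and(pred, gt).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    return ((1 + beta2) * precision * recall /
            max(beta2 * precision + recall, 1e-8))
```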
Intensity Video Guided 4D Fusion for Improved Highly Dynamic 3D Reconstruction
The availability of high-speed 3D video sensors has greatly facilitated 3D
shape acquisition of dynamic and deformable objects, but high frame rate 3D
reconstruction is always degraded by spatial noise and temporal fluctuations.
This paper presents a simple yet powerful intensity video guided multi-frame 4D
fusion pipeline. Temporal tracking of intensity image points (of moving and
deforming objects) allows registration of the corresponding 3D data points,
whose 3D noise and fluctuations are then reduced by spatio-temporal multi-frame
4D fusion. We conducted simulated noise tests and real experiments on four 3D
objects using a 1000 fps 3D video sensor. The results demonstrate that the
proposed algorithm is effective at reducing 3D noise and is robust against
intensity noise. It outperforms existing algorithms with good scalability on
both stationary and dynamic objects.
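A bare-bones NumPy sketch of the fusion step this pipeline relies on: once intensity tracking has registered the same surface point across frames, its 3D noise can be suppressed by a temporal (optionally weighted) average. Registration is assumed already done; only the multi-frame fusion is shown, and the weighting scheme is an assumption.

```python
import numpy as np

def fuse_tracked_points(tracks, weights=None):
    """tracks: (n_frames, n_points, 3) registered 3D positions of the same
    surface points across frames; returns a fused (n_points, 3) cloud."""
    if weights is None:
        weights = np.ones(tracks.shape[0])    # plain temporal average
    w = weights / weights.sum()
    return np.tensordot(w, tracks, axes=1)    # weighted mean over frames
```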
Real-world Underwater Enhancement: Challenges, Benchmarks, and Solutions
Underwater image enhancement is an important low-level vision task with
many applications, and numerous algorithms have been proposed for it in recent years.
These algorithms, developed under various assumptions, demonstrate success in
different respects, as measured on different data sets and with different metrics. In this work,
we set up an undersea image capturing system and construct a large-scale
Real-world Underwater Image Enhancement (RUIE) data set divided into three
subsets. The three subsets target three challenging aspects of enhancement,
i.e., image visibility quality, color casts, and higher-level
detection/classification, respectively. We conduct extensive and systematic
experiments on RUIE to evaluate the effectiveness and limitations of various
algorithms to enhance visibility and correct color casts on images with
hierarchical categories of degradation. Moreover, underwater image enhancement
in practice usually serves as a preprocessing step for mid-level and high-level
vision tasks. We thus adopt object detection performance on enhanced
images as a new, task-specific evaluation criterion. The findings from
these evaluations not only confirm what is commonly believed, but also suggest
promising solutions and new directions for visibility enhancement, color
correction, and object detection on real-world underwater images.
Comment: arXiv admin note: text overlap with arXiv:1712.04143 by other authors.
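Color cast is one of the three degradation aspects RUIE targets. As a classical point of reference (not one of the algorithms evaluated in the paper), the gray-world baseline removes a uniform cast by rescaling each channel so its mean matches the global mean:

```python
import numpy as np

def gray_world(img):
    """img: (H, W, 3) float image in [0, 1] with a color cast.
    Classical gray-world assumption: the average scene color is gray."""
    channel_means = img.reshape(-1, 3).mean(axis=0)
    gain = channel_means.mean() / (channel_means + 1e-8)
    return np.clip(img * gain, 0.0, 1.0)
```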
CRDN: Cascaded Residual Dense Networks for Dynamic MR Imaging with Edge-enhanced Loss Constraint
Dynamic magnetic resonance (MR) imaging has generated great research
interest, as it can provide both spatial and temporal information for clinical
diagnosis. However, slow imaging speed or long scanning time is still one of
the challenges for dynamic MR imaging. Most existing methods reconstruct
dynamic MR images from incomplete k-space data under the guidance of compressed
sensing (CS) or low-rank theory, and suffer from long iterative
reconstruction times. Recently, deep learning has shown great potential in
accelerating dynamic MR. Our previous work proposed a dynamic MR imaging method
with both k-space and spatial prior knowledge integrated via multi-supervised
network training. Nevertheless, a certain degree of smoothing remained in
the reconstructed images at high acceleration factors. In this work, we propose
cascaded residual dense networks for dynamic MR imaging with an edge-enhanced loss
constraint, dubbed CRDN. Specifically, the cascaded residual dense networks
fully exploit the hierarchical features from all the convolutional layers with
both local and global feature fusion. We further utilize the total variation
(TV) loss function, which has edge-enhancing properties, for training the
networks.
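The TV loss mentioned above has a standard anisotropic form; a minimal PyTorch version is below. How CRDN weights it against the other training losses is not specified here.

```python
import torch

def tv_loss(x):
    """x: (batch, channels, H, W) reconstructed images.
    Anisotropic total variation: mean absolute spatial gradient."""
    dh = (x[..., 1:, :] - x[..., :-1, :]).abs().mean()  # vertical gradients
    dw = (x[..., :, 1:] - x[..., :, :-1]).abs().mean()  # horizontal gradients
    return dh + dw
```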
A fully dense and globally consistent 3D map reconstruction approach for GI tract to enhance therapeutic relevance of the endoscopic capsule robot
In the gastrointestinal (GI) tract endoscopy field, ingestible wireless
capsule endoscopy is emerging as a novel, minimally invasive diagnostic
technology for inspection of the GI tract and diagnosis of a wide range of
diseases and pathologies. Since the development of this technology, medical
device companies and many research groups have made substantial progress in
converting passive capsule endoscopes to robotic active capsule endoscopes with
most of the functionality of current active flexible endoscopes. However,
robotic capsule endoscopy still has some challenges. In particular, the use of
such devices to generate a precise three-dimensional (3D) mapping of the entire
inner organ remains an unsolved problem. Such global 3D maps of inner organs
would help doctors to detect the location and size of diseased areas more
accurately and intuitively, thus permitting more reliable diagnoses. To our
knowledge, this paper presents the first complete pipeline for 3D
visual map reconstruction of the stomach. The proposed pipeline is modular and
includes a preprocessing module, an image registration module, and a final
shape-from-shading-based 3D reconstruction module; the 3D map is primarily
generated by a combination of image stitching and shape-from-shading
techniques, and is updated in a frame-by-frame iterative fashion via capsule
motion inside the stomach. A comprehensive quantitative analysis of the
proposed 3D reconstruction method is performed using an esophagogastroduodenoscopy
simulator, three different endoscopic cameras, and a 3D optical
scanner.
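Shape-from-shading, which drives the final reconstruction module, typically rests on a Lambertian image-formation model: intensity equals albedo times the clamped dot product of surface normal and light direction. The NumPy sketch below renders that forward model; inverting it to recover normals from intensities is the actual shape-from-shading step and is considerably more involved.

```python
import numpy as np

def lambertian_shading(normals, light_dir, albedo=1.0):
    """normals: (H, W, 3) unit surface normals; light_dir: (3,) unit vector.
    Returns the rendered intensity image I = albedo * max(0, n . l)."""
    return albedo * np.clip(normals @ light_dir, 0.0, None)
```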
Robust Real-Time Multi-View Eye Tracking
Despite significant advances in improving the gaze tracking accuracy under
controlled conditions, the tracking robustness under real-world conditions,
such as large head pose and movements, use of eyeglasses, illumination and eye
type variations, remains a major challenge in eye tracking. In this paper, we
revisit this challenge and introduce a real-time multi-camera eye tracking
framework to improve the tracking robustness. First, differently from previous
work, we design a multi-view tracking setup that allows for acquiring multiple
eye appearances simultaneously. Leveraging multi-view appearances enables
more reliable detection of gaze features under challenging conditions, particularly
when they are obstructed in a conventional single-view appearance due to large
head movements or eyewear effects. The features extracted on various
appearances are then used for estimating multiple gaze outputs. Second, we
propose to combine estimated gaze outputs through an adaptive fusion mechanism
to compute the user's overall point of regard. The proposed mechanism first
determines the estimation reliability of each gaze output according to the user's
momentary head pose and predicted gazing behavior, and then performs a
reliability-based weighted fusion. We demonstrate the efficacy of our framework
with extensive simulations and user experiments on a collected dataset
featuring 20 subjects. Our results show that in comparison with
state-of-the-art eye trackers, the proposed framework provides not only a
significant enhancement in accuracy but also notable robustness. Our
prototype system runs at 30 frames per second (fps) and achieves 1-degree
accuracy under challenging experimental scenarios, which makes it suitable for
applications demanding high accuracy and robustness.
Comment: Organisational changes in the main manuscript and supplementary information. Results unchanged. Main manuscript: 14 pages, 15 figures. Supplementary: 2 tables, 1 figure. Under review for an IEEE Transactions publication.
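A minimal NumPy sketch of reliability-based weighted fusion as described above: each camera view contributes a gaze estimate and a reliability score, and the overall point of regard is their normalized weighted sum. How the reliabilities are derived from head pose and gazing behavior is the paper's contribution and is not reproduced here.

```python
import numpy as np

def fuse_gaze(estimates, reliabilities):
    """estimates: (n_views, 2) per-camera gaze points on the screen;
    reliabilities: (n_views,) non-negative scores, one per view."""
    w = np.asarray(reliabilities, dtype=float)
    w = w / (w.sum() + 1e-8)                  # normalize to sum to 1
    return w @ np.asarray(estimates)          # (2,) fused point of regard

# Hypothetical usage with three camera views:
por = fuse_gaze([[0.42, 0.31], [0.45, 0.28], [0.40, 0.35]], [0.9, 0.6, 0.2])
```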
Multi-Channel CNN-based Object Detection for Enhanced Situation Awareness
Object detection is critical for automatic military operations. However, the
performance of current object detection algorithms falls short of the
requirements of military scenarios. This is mainly because objects
are hard to detect due to their indistinguishable appearance and the dramatic changes
in object size determined by the distance to the detection sensors.
Recent advances in deep learning have achieved promising results in many
challenging tasks. The state-of-the-art in object detection is represented by
convolutional neural networks (CNNs), such as the Fast R-CNN algorithm. These
CNN-based methods improve the detection performance significantly on several
public generic object detection datasets. However, their performance on
detecting small or indistinguishable objects in visible-spectrum images
is still insufficient. In this study, we propose a novel detection algorithm
for military objects by fusing multi-channel CNNs. We combine spatial, temporal,
and thermal information by generating a three-channel image, which is then
fused into CNN feature maps in an unsupervised manner. The backbone of our object
detection framework is taken from the Fast R-CNN algorithm, and we utilize a
cross-domain transfer learning technique to fine-tune the CNN model on
generated multi-channel images. In the experiments, we validated the proposed
method with the images from SENSIAC (Military Sensing Information Analysis
Centre) database and compared it with the state-of-the-art. The experimental
results demonstrated the effectiveness of the proposed method in terms of both accuracy
and computational efficiency.
Comment: Published at the Sensors & Electronics Technology (SET) panel Symposium SET-241 at the 9th NATO Military Sensing Symposium.
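One plausible reading of the three-channel construction, sketched in NumPy: stack a grayscale spatial frame, a temporal cue (frame difference), and a registered thermal frame into a single 3-channel image that an RGB-input CNN can consume. The exact channel definitions are assumptions, not necessarily the paper's.

```python
import numpy as np

def make_three_channel(gray_t, gray_t_prev, thermal):
    """All inputs: (H, W) float arrays in [0, 1], spatially registered.
    Returns an (H, W, 3) image: spatial, temporal, and thermal channels."""
    temporal = np.abs(gray_t - gray_t_prev)   # crude motion channel
    return np.stack([gray_t, temporal, thermal], axis=-1)
```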
Towards Real-Time Advancement of Underwater Visual Quality with GAN
Low visual quality has kept underwater robotic vision out of a wide range
of applications. Although several algorithms have been developed, real-time,
adaptive methods remain lacking for real-world tasks. In this paper, we address
this difficulty based on generative adversarial networks (GAN), and propose a
GAN-based restoration scheme (GAN-RS). In particular, we develop a multi-branch
discriminator including an adversarial branch and a critic branch for the
purpose of simultaneously preserving image content and removing underwater
noise. In addition to adversarial learning, a novel dark channel prior loss
also encourages the generator to produce visually realistic output. More specifically, an
underwater index is investigated to describe underwater properties, and a loss
function based on the underwater index is designed to train the critic branch
for underwater noise suppression. Through extensive comparisons on visual
quality and feature restoration, we confirm the superiority of the proposed
approach. Consequently, the GAN-RS can adaptively improve underwater visual
quality in real time and induce an overall superior restoration performance.
Finally, a real-world experiment is conducted on the seabed for grasping marine
products, and the results are quite promising. The source code is publicly
available at https://github.com/SeanChenxy/GAN_RS
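The dark channel prior loss builds on the well-known dark channel of He et al.: the per-pixel minimum over color channels and a local patch. A minimal PyTorch version is below; the patch size and the exact way GAN-RS turns the dark channel into a penalty are assumptions.

```python
import torch
import torch.nn.functional as F

def dark_channel(img, patch=15):
    """img: (batch, 3, H, W) in [0, 1]; returns (batch, 1, H, W).
    Min over color channels, then min over a local patch
    (implemented as a negated max-pool)."""
    min_rgb = img.min(dim=1, keepdim=True).values
    return -F.max_pool2d(-min_rgb, patch, stride=1, padding=patch // 2)

def dark_channel_loss(generated):
    # Haze-free images tend to have a dark channel near zero,
    # so penalizing its mean pushes the generator toward clear output.
    return dark_channel(generated).mean()
```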