Deep Eyes: Binocular Depth-from-Focus on Focal Stack Pairs
The human visual system relies on both binocular stereo cues and monocular
focus cues to achieve effective 3D perception. In computer vision, the two
problems are traditionally solved in separate tracks. In this paper, we present
a unified learning-based technique that simultaneously uses both types of cues
for depth inference. Specifically, we use a pair of focal stacks as input to
emulate human perception. We first construct a comprehensive focal stack
training dataset synthesized by depth-guided light field rendering. We then
build three individual networks: a Focus-Net to extract depth from a single
focal stack, an EDoF-Net to obtain the extended depth-of-field (EDoF) image from
the focal stack, and a Stereo-Net to conduct stereo matching. We show how to
integrate them into a unified BDfF-Net to obtain high-quality depth maps.
Comprehensive experiments show that our approach outperforms the
state-of-the-art in both accuracy and speed and effectively emulates the human
visual system.
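The monocular focus cue that Focus-Net learns can be illustrated with a classical depth-from-focus baseline: for every pixel, choose the focal slice in the stack where a local sharpness measure peaks. This is a minimal hand-crafted sketch, not the paper's learned network; the Laplacian focus measure and the function name are illustrative choices.

```python
import numpy as np

def depth_from_focus(stack, depths):
    """Classical depth-from-focus baseline: for each pixel, pick the
    focal slice with the highest local sharpness (Laplacian energy).

    stack:  (S, H, W) array of grayscale images focused at S depths.
    depths: (S,) focus distance associated with each slice.
    Returns an (H, W) depth map.
    """
    sharpness = np.empty_like(stack)
    for s, img in enumerate(stack):
        # Discrete 4-neighbor Laplacian as a simple focus measure
        # (np.roll wraps at the borders, which is fine for a sketch).
        lap = (-4 * img
               + np.roll(img, 1, 0) + np.roll(img, -1, 0)
               + np.roll(img, 1, 1) + np.roll(img, -1, 1))
        sharpness[s] = lap ** 2
    best = sharpness.argmax(axis=0)   # index of the sharpest slice per pixel
    return depths[best]
```

A learned Focus-Net replaces the fixed sharpness measure and winner-take-all selection with features and regression trained on the synthesized focal-stack dataset.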
Receiver Architectures for MIMO-OFDM Based on a Combined VMP-SP Algorithm
Iterative information processing, either based on heuristics or analytical
frameworks, has been shown to be a very powerful tool for the design of
efficient, yet feasible, wireless receiver architectures. Within this context,
algorithms performing message-passing on a probabilistic graph, such as the
sum-product (SP) and variational message passing (VMP) algorithms, have become
increasingly popular.
In this contribution, we apply a combined VMP-SP message-passing technique to
the design of receivers for MIMO-OFDM systems. The message-passing equations of
the combined scheme can be obtained from the equations of the stationary points
of a constrained region-based free energy approximation. When applied to a
MIMO-OFDM probabilistic model, we obtain a generic receiver architecture
performing iterative channel weight and noise precision estimation,
equalization and data decoding. We show that this generic scheme can be
particularized to a variety of different receiver structures, ranging from
high-performance iterative structures to low-complexity receivers. This allows
for a flexible design of the signal processing, specifically tailored to the
requirements of each application. The numerical assessment of our
solutions, based on Monte Carlo simulations, corroborates the high performance
of the proposed algorithms and their superiority over heuristic approaches.
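The iterative channel-weight and noise-precision estimation can be illustrated on a toy scalar model rather than the full MIMO-OFDM factor graph: mean-field variational updates alternate between a Gaussian factor for the channel weight and a Gamma factor for the noise precision. This is a minimal sketch under simplifying assumptions (scalar channel, known pilots, vague priors); the function name and priors are illustrative, not the paper's receiver.

```python
import numpy as np

def vmp_channel_noise(x, y, iters=50):
    """Toy mean-field VMP for the scalar model y = h*x + w,
    w ~ N(0, 1/lam), with Gaussian prior h ~ N(0, 1) and a
    vague Gamma prior on the noise precision lam.

    Alternates the variational updates for q(h) (Gaussian) and
    q(lam) (Gamma); returns (posterior mean of h, E[lam]).
    """
    n = len(y)
    a0, b0 = 1e-3, 1e-3          # vague Gamma prior on lam
    e_lam = 1.0                  # initial guess for E[lam]
    for _ in range(iters):
        # q(h) update: Gaussian, combining prior and likelihood terms.
        prec_h = 1.0 + e_lam * np.sum(x * x)
        mu_h = e_lam * np.sum(x * y) / prec_h
        # q(lam) update: Gamma(a, b) from the *expected* squared
        # residual under q(h), including the posterior variance 1/prec_h.
        a = a0 + n / 2.0
        resid = (np.sum(y * y) - 2 * mu_h * np.sum(x * y)
                 + (mu_h ** 2 + 1.0 / prec_h) * np.sum(x * x))
        b = b0 + 0.5 * resid
        e_lam = a / b
    return mu_h, e_lam
```

In the receiver of the abstract, the same alternation runs over vectors of channel weights, equalized symbols, and decoder messages on the MIMO-OFDM probabilistic graph.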
GCoNet+: A Stronger Group Collaborative Co-Salient Object Detector
In this paper, we present a novel end-to-end group collaborative learning
network, termed GCoNet+, which can effectively and efficiently (250 fps)
identify co-salient objects in natural scenes. The proposed GCoNet+ achieves
the new state-of-the-art performance for co-salient object detection (CoSOD)
through mining consensus representations based on the following two essential
criteria: 1) intra-group compactness to better formulate the consistency among
co-salient objects by capturing their inherent shared attributes using our
novel group affinity module (GAM); 2) inter-group separability to effectively
suppress the influence of noisy objects on the output by introducing our new
group collaborating module (GCM) conditioned on the inconsistent consensus. To
further improve the accuracy, we design a series of simple yet effective
components as follows: i) a recurrent auxiliary classification module (RACM)
promoting the model learning at the semantic level; ii) a confidence
enhancement module (CEM) helping the model to improve the quality of the final
predictions; and iii) a group-based symmetric triplet (GST) loss guiding the
model to learn more discriminative features. Extensive experiments on three
challenging benchmarks, i.e., CoCA, CoSOD3k, and CoSal2015, demonstrate that
our GCoNet+ outperforms 12 existing cutting-edge models. Code has been
released at https://github.com/ZhengPeng7/GCoNet_plus.
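The intra-group consensus idea behind the group affinity module can be sketched with a much simpler stand-in: pool and normalize each image's features, average them into a group consensus descriptor, and score every spatial location against it by cosine similarity. This is a hedged simplification, not the paper's GAM; the function name and shapes are illustrative.

```python
import numpy as np

def group_affinity(feats):
    """Simplified group-consensus attention: given per-image feature
    maps feats of shape (N, C, H, W) for one image group, build a
    consensus descriptor from globally pooled, L2-normalized features
    and score every spatial location against it.

    Returns (N, H, W) affinity maps in [-1, 1]; locations whose
    features align with the group consensus score near 1.
    """
    n, c, h, w = feats.shape
    pooled = feats.mean(axis=(2, 3))                       # (N, C)
    pooled /= np.linalg.norm(pooled, axis=1, keepdims=True) + 1e-8
    consensus = pooled.mean(axis=0)                        # (C,)
    consensus /= np.linalg.norm(consensus) + 1e-8
    # Cosine similarity between each location's feature and the consensus.
    fl = feats.reshape(n, c, h * w)
    norms = np.linalg.norm(fl, axis=1) + 1e-8              # (N, H*W)
    aff = np.einsum('c,nck->nk', consensus, fl) / norms
    return aff.reshape(n, h, w)
```

In GCoNet+ the consensus is learned jointly with the backbone, and the inter-group GCM additionally pushes features of *different* groups apart; the sketch only captures the intra-group affinity direction.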
Transformer Transforms Salient Object Detection and Camouflaged Object Detection
Transformer networks are particularly good at modeling long-range
dependencies within long sequences. In this paper, we investigate
applying transformer networks to salient object detection (SOD). We adopt
the dense transformer backbone for fully supervised RGB image based SOD, RGB-D
image pair based SOD, and weakly supervised SOD within a unified framework
based on the observation that the transformer backbone can provide accurate
structure modeling, which makes it powerful in learning from weak labels with
less structure information. Further, we find that the vision transformer
architectures do not offer direct spatial supervision, instead encoding
position as a feature. Therefore, we investigate the contributions of two
strategies to provide stronger spatial supervision through the transformer
layers within our unified framework, namely deep supervision and
difficulty-aware learning. We find that deep supervision can propagate gradients
back into the higher-level features, thus leading to uniform activation within
the same semantic object. Difficulty-aware learning, on the other hand, is capable of
identifying the hard pixels for effective hard negative mining. We also
visualize features of the conventional and transformer backbones before and
after fine-tuning them for SOD, and find that the transformer backbone encodes
more accurate object structure information and more distinct semantic
information within the lower- and higher-level features, respectively. We also apply our
model to camouflaged object detection (COD) and make observations similar to
those on the above three SOD tasks. Extensive experimental results on various SOD and
COD tasks illustrate that transformer networks can transform SOD and COD,
leading to new benchmarks for each related task. The source code and
experimental results are available via our project page:
https://github.com/fupiao1998/TrasformerSOD
Comment: Technical report, 18 pages, 22 figures.
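The deep-supervision strategy described above amounts to attaching a loss to every decoder stage rather than only the final prediction, so gradients reach the higher-level features directly. A minimal sketch, assuming per-stage saliency probabilities and average-pooled ground truth (the function name and pooling choice are illustrative, not the paper's exact losses):

```python
import numpy as np

def deep_supervision_loss(side_outputs, gt, eps=1e-7):
    """Sum a binary cross-entropy loss over every decoder stage.

    side_outputs: list of (H_i, W_i) predicted saliency probabilities,
                  one per decoder stage, each dividing the gt resolution.
    gt:           (H, W) binary ground-truth mask at full resolution.
    """
    total = 0.0
    for pred in side_outputs:
        # Downsample gt to this stage's resolution by average pooling.
        fh = gt.shape[0] // pred.shape[0]
        fw = gt.shape[1] // pred.shape[1]
        g = gt.reshape(pred.shape[0], fh, pred.shape[1], fw).mean(axis=(1, 3))
        p = np.clip(pred, eps, 1 - eps)
        total += -np.mean(g * np.log(p) + (1 - g) * np.log(1 - p))
    return total
```

During training, every stage receives its own gradient signal; difficulty-aware learning would further reweight the hard pixels inside each per-stage loss.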
Imputation estimators for unnormalized models with missing data
Many statistical models are specified as unnormalized densities for which
calculation of the normalization constant is intractable. We propose
estimation methods for such unnormalized models with missing data. The key
concept is to combine imputation techniques with estimators for unnormalized
models including noise contrastive estimation and score matching. In addition,
we derive asymptotic distributions of the proposed estimators and construct
confidence intervals. Simulation results with truncated Gaussian graphical
models and the application to real data of wind direction reveal that the
proposed methods effectively enable statistical inference with unnormalized
models from missing data.
Comment: To appear (AISTATS 2020).
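The key combination of imputation with an estimator for unnormalized models can be illustrated in the simplest possible setting: for the unnormalized density p(x) ∝ exp(-lam * x²/2), score matching has the closed-form minimizer lam = 1/mean(x²), and missing entries can be filled with their expected contribution under the current estimate before re-estimating. This scalar sketch with expected-value imputation is a stand-in for the paper's methods (which handle truncated Gaussian graphical models and noise contrastive estimation); the function names are illustrative.

```python
import numpy as np

def score_matching_gaussian(x):
    """Closed-form score matching for p(x) proportional to
    exp(-lam * x**2 / 2): the objective
    J(lam) = E[0.5 * lam**2 * x**2 - lam]
    is minimized at lam_hat = 1 / mean(x**2)."""
    return 1.0 / np.mean(x ** 2)

def imputed_score_matching(x_obs, n_missing, iters=30):
    """Toy imputation + score matching loop for the same model:
    fill each missing entry's squared value with its expectation
    E[x^2] = 1/lam under the current estimate, then re-run the
    closed-form score matching update on the completed data."""
    s_obs = np.sum(x_obs ** 2)
    lam = 1.0
    for _ in range(iters):
        s_full = s_obs + n_missing / lam          # expected completed sum
        lam = (len(x_obs) + n_missing) / s_full   # score matching update
    return lam
```

For i.i.d. scalar data the loop converges to the observed-data estimator, as it should; the interesting cases in the paper are multivariate models where observed coordinates genuinely inform the imputation of missing ones.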