Chromatic Learning for Sparse Datasets
Learning over sparse, high-dimensional data frequently necessitates the use
of specialized methods such as the hashing trick. In this work, we design a
highly scalable alternative approach that leverages the low degree of feature
co-occurrences present in many practical settings. This approach, which we call
Chromatic Learning (CL), obtains a low-dimensional dense feature representation
by performing graph coloring over the co-occurrence graph of features---an
approach previously used as a runtime performance optimization for GBDT
training. This color-based dense representation can be combined with additional
dense categorical encoding approaches, e.g., submodular feature compression, to
further reduce dimensionality. CL exhibits linear parallelizability and
consumes memory linear in the size of the co-occurrence graph. By leveraging
the structural properties of the co-occurrence graph, CL can compress sparse
datasets, such as KDD Cup 2012, that contain over 50M features down to 1024,
using an order of magnitude fewer features than frequency-based truncation and
the hashing trick while maintaining the same test error for linear models. This
compression further enables the use of deep networks in this wide, sparse
setting, where CL similarly has favorable performance compared to existing
baselines for budgeted input dimension.
Comment: 15 pages, 8 figures, under review
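To make the coloring step concrete, below is a minimal sketch of the idea (not the authors' implementation): greedily color the feature co-occurrence graph, then give each example one dense slot per color class. The helper names (`cooccurrence_graph`, `greedy_color`, `densify`) and the toy data are illustrative assumptions.

```python
import numpy as np
from collections import defaultdict

def cooccurrence_graph(rows):
    """Adjacency sets: two features are neighbors if they co-occur in a row."""
    adj = defaultdict(set)
    for feats in rows:
        for f in feats:
            adj[f].update(g for g in feats if g != f)
    return adj

def greedy_color(adj):
    """Assign each feature the smallest color unused by its neighbors."""
    color = {}
    for f in sorted(adj, key=lambda f: -len(adj[f])):  # high-degree first
        used = {color[g] for g in adj[f] if g in color}
        c = 0
        while c in used:
            c += 1
        color[f] = c
    return color

def densify(rows, color, n_colors):
    """Map each sparse row to a dense vector with one slot per color class."""
    out = np.zeros((len(rows), n_colors), dtype=np.int64)
    for i, feats in enumerate(rows):
        for f in feats:
            out[i, color[f]] = f + 1  # store the feature id (0 = absent)
    return out

rows = [{0, 1}, {1, 2}, {2, 3}, {0, 3}]        # toy sparse dataset
color = greedy_color(cooccurrence_graph(rows))
dense = densify(rows, color, max(color.values()) + 1)
print(color, dense, sep="\n")
```

Because co-occurring features receive distinct colors, features that appear in the same example never collide in a slot, unlike with the hashing trick.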
A Single Model Explains both Visual and Auditory Precortical Coding
Precortical neural systems encode information collected by the senses, but
the driving principles of the encoding used have remained a subject of debate.
We present a model of retinal coding that is based on three constraints:
information preservation, minimization of the neural wiring, and response
equalization. The resulting novel version of sparse principal components
analysis successfully captures a number of known characteristics of the retinal
coding system, such as center-surround receptive fields, color opponency
channels, and spatiotemporal responses that correspond to magnocellular and
parvocellular pathways. Furthermore, when trained on auditory data, the same
model learns receptive fields well fit by gammatone filters, commonly used to
model precortical auditory coding. This suggests that efficient coding may be a
unifying principle of precortical encoding across modalities.
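For orientation, here is how the family of methods looks with off-the-shelf tools; the paper proposes its own variant with wiring-cost and response-equalization constraints that standard sparse PCA lacks, so this sketch only illustrates the sparse-components starting point. The patch data is synthetic.

```python
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(0)
patches = rng.standard_normal((500, 64))     # stand-in for 8x8 image patches
patches -= patches.mean(axis=0)              # center, as PCA assumes

spca = SparsePCA(n_components=16, alpha=1.0, random_state=0)
spca.fit(patches)
filters = spca.components_.reshape(16, 8, 8)  # candidate "receptive fields"
print(filters.shape)
```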
Deep Optics for Monocular Depth Estimation and 3D Object Detection
Depth estimation and 3D object detection are critical for scene understanding
but remain challenging to perform with a single image due to the loss of 3D
information during image capture. Recent models using deep neural networks have
improved monocular depth estimation performance, but there is still difficulty
in predicting absolute depth and generalizing outside a standard dataset. Here
we introduce the paradigm of deep optics, i.e. end-to-end design of optics and
image processing, to the monocular depth estimation problem, using coded
defocus blur as an additional depth cue to be decoded by a neural network. We
evaluate several optical coding strategies along with an end-to-end
optimization scheme for depth estimation on three datasets, including NYU Depth
v2 and KITTI. We find an optimized freeform lens design yields the best
results, but chromatic aberration from a singlet lens offers significantly
improved performance as well. We build a physical prototype and validate that
chromatic aberrations improve depth estimation on real-world results. In
addition, we train object detection networks on the KITTI dataset and show that
the lens optimized for depth estimation also results in improved 3D object
detection performance.
Comment: 10 pages, 5 figures
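As a rough illustration of the paradigm, the toy layer below makes blur depend on depth in a differentiable way, so gradients can flow from a depth loss back into "optical" parameters. The actual work optimizes physically modeled PSFs and lens parameters end to end; `defocus_blur`, `z_focus`, and `k` here are invented stand-ins.

```python
import torch
import torch.nn.functional as F

def defocus_blur(img, depth, z_focus=2.0, k=3.0):
    """img: (1,C,H,W); depth: (1,1,H,W) in meters. Blur strength grows with
    |1/z - 1/z_focus|, approximated by blending sharp and blurred images."""
    blurred = F.avg_pool2d(img, 7, stride=1, padding=3)   # crude wide blur
    amount = torch.tanh(k * (1.0 / depth - 1.0 / z_focus).abs())
    return amount * blurred + (1 - amount) * img

img = torch.rand(1, 3, 64, 64, requires_grad=True)
depth = 1.0 + 3.0 * torch.rand(1, 1, 64, 64)
coded = defocus_blur(img, depth)
coded.mean().backward()    # gradients flow through the "optics"
print(coded.shape, img.grad is not None)
```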
Structural Residual Learning for Single Image Rain Removal
To alleviate the adverse effect of rain streaks in image processing tasks,
CNN-based single image rain removal methods have been recently proposed.
However, the performance of these deep learning methods largely depends on the
range of rain shapes covered by the pre-collected pairs of rainy and clean
training images. This makes them prone to overfitting the training samples and
unable to generalize well to practical rainy images with complex and diverse
rain streaks. To address this generalization issue, this study proposes a new
network architecture that enforces the output residual of the network to
possess intrinsic rain structures. This structural residual setting guarantees
that the rain layer extracted by the network complies with prior knowledge of
general rain streaks, constraining the extracted rain to plausible shapes in
both the training and prediction stages. This general regularization naturally
improves both training accuracy and generalization to unseen rain
configurations at test time. This superiority is comprehensively
substantiated, both visually and quantitatively, by experiments on synthetic
and real datasets in comparison with current state-of-the-art methods.
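One simple way to realize such a constraint, sketched below under assumptions of ours (not the paper's exact architecture): parameterize the residual as a predicted map convolved with a small bank of learned streak-like kernels, so the residual can only express rain-shaped patterns.

```python
import torch
import torch.nn as nn

class StructuralResidualDerainer(nn.Module):
    def __init__(self, n_kernels=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, n_kernels, 3, padding=1), nn.ReLU(),  # rain map
        )
        # Learned streak dictionary: rain layer = rain map * streak kernels
        self.streaks = nn.Conv2d(n_kernels, 3, 9, padding=4, bias=False)

    def forward(self, rainy):
        rain_layer = self.streaks(self.encoder(rainy))  # structured residual
        return rainy - rain_layer, rain_layer

model = StructuralResidualDerainer()
clean, rain = model(torch.rand(1, 3, 64, 64))
print(clean.shape, rain.shape)
```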
4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks
In many robotics and VR/AR applications, 3D-videos are readily-available
sources of input (a continuous sequence of depth images, or LIDAR scans).
However, those 3D-videos are processed frame-by-frame either through 2D
convnets or 3D perception algorithms. In this work, we propose 4-dimensional
convolutional neural networks for spatio-temporal perception that can directly
process such 3D-videos using high-dimensional convolutions. For this, we adopt
sparse tensors and propose the generalized sparse convolution that encompasses
all discrete convolutions. To implement the generalized sparse convolution, we
create an open-source auto-differentiation library for sparse tensors that
provides extensive functions for high-dimensional convolutional neural
networks. We create 4D spatio-temporal convolutional neural networks using the
library and validate them on various 3D semantic segmentation benchmarks and
proposed 4D datasets for 3D-video perception. To overcome challenges in the 4D
space, we propose the hybrid kernel, a special case of the generalized sparse
convolution, and the trilateral-stationary conditional random field that
enforces spatio-temporal consistency in the 7D space-time-chroma space.
Experimentally, we show that convolutional neural networks with only
generalized 3D sparse convolutions can outperform 2D or 2D-3D hybrid methods by
a large margin. Also, we show that on 3D-videos, 4D spatio-temporal
convolutional neural networks are robust to noise, outperform 3D convolutional
neural networks, and are faster than their 3D counterparts in some cases.
Comment: CVPR'19
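To illustrate the mechanism, here is a toy generalized sparse convolution over 4D coordinates using a coordinate hash table. This is a from-scratch sketch, not the API of the authors' released library, and the offset set (varying only in x and t) only loosely echoes the hybrid-kernel idea.

```python
import numpy as np

def sparse_conv(coords, feats, weights, offsets):
    """coords: (N,4) int sites; feats: (N,Cin); weights: (K,Cin,Cout);
    offsets: (K,4) integer kernel offsets. Output lives on the same sites."""
    table = {tuple(c): i for i, c in enumerate(coords)}   # coordinate hash
    out = np.zeros((len(coords), weights.shape[2]))
    for i, c in enumerate(coords):
        for k, off in enumerate(offsets):
            j = table.get(tuple(c + off))
            if j is not None:                 # skip absent (zero) neighbors
                out[i] += feats[j] @ weights[k]
    return out

rng = np.random.default_rng(0)
coords = np.unique(rng.integers(0, 8, size=(100, 4)), axis=0)  # 4D sites
feats = rng.standard_normal((len(coords), 6))
offsets = np.array([[dx, 0, 0, dt] for dx in (-1, 0, 1) for dt in (-1, 0, 1)])
weights = rng.standard_normal((len(offsets), 6, 16))
print(sparse_conv(coords, feats, weights, offsets).shape)      # (N, 16)
```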
Learning Representations for Automatic Colorization
We develop a fully automatic image colorization system. Our approach
leverages recent advances in deep networks, exploiting both low-level and
semantic representations. As many scene elements naturally appear according to
multimodal color distributions, we train our model to predict per-pixel color
histograms. This intermediate output can be used to automatically generate a
color image, or further manipulated prior to image formation. On both fully and
partially automatic colorization tasks, we outperform existing methods. We also
explore colorization as a vehicle for self-supervised visual representation
learning.
Comment: ECCV 2016 (Project page:
http://people.cs.uchicago.edu/~larsson/colorization/)
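One simple way to decode such per-pixel histograms into an image is to take the expectation over bin centers, sketched below; the bin layout and the random stand-in for the network output are assumptions, and the paper also discusses other summaries of the predicted distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, B = 32, 32, 16
bin_centers = np.linspace(0.0, 1.0, B)     # assumed hue-bin centers

logits = rng.standard_normal((H, W, B))    # stand-in for network output
hist = np.exp(logits)
hist /= hist.sum(axis=-1, keepdims=True)   # per-pixel softmax -> histogram
hue = hist @ bin_centers                   # expectation over bins
print(hue.shape)                           # (32, 32): one hue per pixel
```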
Recurrent Squeeze-and-Excitation Context Aggregation Net for Single Image Deraining
Rain streaks can severely degrade visibility, causing many current
computer vision algorithms to fail, so it is necessary to remove the rain
from images. We propose a novel deep network architecture based on deep
convolutional and recurrent neural networks for single image deraining. As
contextual information is very important for rain removal, we first adopt a
dilated convolutional neural network to acquire a large receptive field, and
further modify the network to better fit the rain removal task. In heavy rain,
rain streaks have various directions and shapes, which can be regarded as the
accumulation of multiple rain streak layers. We assign different alpha-values
to various rain streak layers according to the intensity and transparency by
incorporating the squeeze-and-excitation block. Since rain streak layers
overlap with each other, it is not easy to remove the rain in a single stage,
so we further decompose the rain removal into multiple stages. A recurrent
neural network is incorporated to preserve useful information from previous stages
and benefit the rain removal in later stages. We conduct extensive experiments
on both synthetic and real-world datasets. Our proposed method outperforms the
state-of-the-art approaches under all evaluation metrics. Codes and
supplementary material are available at our project webpage:
https://xialipku.github.io/RESCAN .
Comment: Accepted by ECCV
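For reference, here is a minimal squeeze-and-excitation block of the kind described for reweighting rain streak layers channel-wise; this is a sketch rather than the authors' code, and `reduction` and the shapes are illustrative.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                   # x: (N, C, H, W)
        s = x.mean(dim=(2, 3))              # squeeze: global average pool
        alpha = self.fc(s)                  # excitation: per-channel weights
        return x * alpha[:, :, None, None]  # reweight rain streak layers

x = torch.rand(2, 16, 32, 32)
print(SEBlock(16)(x).shape)
```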
Learning Invariant Color Features for Person Re-Identification
Matching people across multiple camera views, known as person
re-identification, is a challenging problem due to changes in visual
appearance caused by varying lighting conditions: the perceived color of a
subject differs with illumination. Previous works
use color as it is or address these challenges by designing color spaces
focusing on a specific cue. In this paper, we propose a data driven approach
for learning color patterns from pixels sampled from images across two camera
views. The intuition behind this work is that, even though pixel values of the
same color differ across views, they should be encoded with the same values.
We model color feature generation as a learning problem by jointly
learning a linear transformation and a dictionary to encode pixel values. We
also analyze different photometric invariant color spaces. Using color as the
only cue, we compare our approach with all the photometric invariant color
spaces and show superior performance over all of them. Combining with other
learned low-level and high-level features, we obtain promising results on the
ViPER, Person Re-ID 2011, and CAVIAR4REID datasets.
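As a rough illustration of the encoding side only: off-the-shelf dictionary learning encodes pixel values as sparse codes over learned atoms. The paper learns the linear transformation and dictionary jointly, which `DictionaryLearning` below does not do, and the synthetic two-view pixels are stand-ins.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
pixels_view_a = rng.random((300, 3))         # sampled RGB pixels, camera A
pixels_view_b = pixels_view_a * 0.8 + 0.1    # same scene, different lighting

dico = DictionaryLearning(n_components=12, alpha=0.5, random_state=0)
codes_a = dico.fit_transform(pixels_view_a)
codes_b = dico.transform(pixels_view_b)      # encode view B with same atoms
print(codes_a.shape, codes_b.shape)          # (300, 12) each
```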
Spatial Attentive Single-Image Deraining with a High Quality Real Rain Dataset
Removing rain streaks from a single image has been drawing considerable
attention as rain streaks can severely degrade the image quality and affect the
performance of existing outdoor vision tasks. While recent CNN-based derainers
have reported promising performance, deraining remains an open problem for two
reasons. First, existing synthesized rain datasets have only limited realism,
in terms of modeling real rain characteristics such as rain shape, direction
and intensity. Second, there are no public benchmarks for quantitative
comparisons on real rain images, which makes the current evaluation less
objective. The core challenge is that real-world rain/clean image pairs cannot
be captured at the same time. In this paper, we address the single image rain
removal problem in two ways. First, we propose a semi-automatic method that
incorporates temporal priors and human supervision to generate a high-quality
clean image from each input sequence of real rain images. Using this method, we
construct a large-scale dataset of rain/rain-free image pairs
that covers a wide range of natural rain scenes. Second, to better cover the
stochastic distribution of real rain streaks, we propose a novel SPatial
Attentive Network (SPANet) to remove rain streaks in a local-to-global manner.
Extensive experiments demonstrate that our network performs favorably against
the state-of-the-art deraining methods.
Comment: Accepted by CVPR'19. Project page:
https://stevewongv.github.io/derain-project.htm
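A minimal sketch of the spatial-attention idea, under simplifications of ours (the actual SPANet blocks are considerably more elaborate): predict a per-pixel rain mask and gate the removal residual with it.

```python
import torch
import torch.nn as nn

class SpatialAttentiveRemoval(nn.Module):
    def __init__(self):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),  # (N,1,H,W) mask
        )
        self.removal = nn.Conv2d(3, 3, 3, padding=1)

    def forward(self, rainy):
        a = self.attn(rainy)                 # where the rain is
        residual = self.removal(rainy) * a   # remove only attended regions
        return rainy - residual, a

clean, attention = SpatialAttentiveRemoval()(torch.rand(1, 3, 64, 64))
print(clean.shape, attention.shape)
```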
Modeling Bottom-Up and Top-Down Attention with a Neurodynamic Model of V1
Previous studies suggested that lateral interactions of V1 cells are
responsible, among other visual effects, for bottom-up visual attention
(alternatively named visual salience or saliency). Our objective is to mimic
these connections with a neurodynamic network of firing-rate neurons in order
to predict visual attention. Early visual subcortical processes (i.e. retinal
and thalamic) are functionally simulated. An implementation of the cortical
magnification function is included to define the retinotopic projections
towards V1, processing neuronal activity for each distinct view during scene
observation. Novel computational definitions of top-down inhibition (in terms
of inhibition of return and selection mechanisms) are also proposed to predict
attention in Free-Viewing and Visual Search tasks. Results show that our model
outperforms other biologically-inspired models of saliency prediction while
predicting visual saccade sequences with the same model. We also show how
temporal and spatial characteristics of inhibition of return can improve
prediction of saccades, as well as how distinct search strategies (in terms of
feature-selective or category-specific inhibition) can predict attention in
distinct image contexts.
Comment: 27 pages, 19 figures
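For a flavor of the modeling style, here is a generic Euler-integrated firing-rate network; the paper's connectivity, subcortical front end, and top-down inhibition terms are far more detailed, and `W`, `tau`, and the input below are toy assumptions.

```python
import numpy as np

def simulate(W, inp, steps=200, dt=0.1, tau=1.0):
    """dr/dt = (-r + relu(W @ r + inp)) / tau; r settles to a salience map."""
    r = np.zeros(len(inp))
    for _ in range(steps):
        r += dt / tau * (-r + np.maximum(W @ r + inp, 0.0))
    return r

rng = np.random.default_rng(0)
n = 50
W = 0.1 * rng.standard_normal((n, n))   # lateral connectivity (toy)
np.fill_diagonal(W, 0.0)                # no self-excitation
inp = rng.random(n)                     # feedforward retinal/thalamic drive
print(simulate(W, inp).round(2))
```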