Confidence Propagation through CNNs for Guided Sparse Depth Regression
Generally, convolutional neural networks (CNNs) process data on a regular
grid, e.g. data generated by ordinary cameras. Designing CNNs for sparse and
irregularly spaced input data is still an open research problem with numerous
applications in autonomous driving, robotics, and surveillance. In this paper,
we propose an algebraically-constrained normalized convolution layer for CNNs
with highly sparse input, which requires fewer network parameters than related
work. We propose novel strategies for determining the
confidence from the convolution operation and propagating it to consecutive
layers. We also propose an objective function that simultaneously minimizes the
data error while maximizing the output confidence. To integrate structural
information, we also investigate fusion strategies to combine depth and RGB
information in our normalized convolution network framework. In addition, we
introduce the use of output confidence as auxiliary information to improve
the results. The capabilities of our normalized convolution network framework
are demonstrated for the problem of scene depth completion. Comprehensive
experiments are performed on the KITTI-Depth and the NYU-Depth-v2 datasets. The
results clearly demonstrate that the proposed approach achieves superior
performance while requiring only about 1-5% of the parameters of
state-of-the-art methods.
Comment: 14 pages, 14 figures
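As a rough illustration of the normalized-convolution idea this abstract describes, the sketch below applies a single normalized convolution step to a sparse depth map and propagates a confidence map alongside the data. The function name, the fixed box kernel, and the confidence-propagation rule (an applicability-weighted mean of the input confidence) are illustrative assumptions, not the paper's learned layer.

```python
import numpy as np

def normalized_conv2d(data, conf, kernel, eps=1e-8):
    """One normalized-convolution step for sparse input.

    data:   2D array with valid values at sparse locations
    conf:   2D confidence map (1 where data is valid, 0 elsewhere)
    kernel: 2D non-negative applicability filter (fixed here; learned in practice)
    Returns a densified data map and a propagated confidence map.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    d = np.pad(data * conf, ((ph, ph), (pw, pw)))   # zero out invalid entries
    c = np.pad(conf, ((ph, ph), (pw, pw)))
    out = np.zeros(data.shape, dtype=float)
    out_conf = np.zeros(data.shape, dtype=float)
    for i in range(data.shape[0]):
        for j in range(data.shape[1]):
            num = np.sum(kernel * d[i:i + kh, j:j + kw])
            den = np.sum(kernel * c[i:i + kh, j:j + kw])
            out[i, j] = num / (den + eps)           # confidence-weighted average
            # propagate confidence as the applicability-weighted input confidence
            out_conf[i, j] = den / kernel.sum()
    return out, out_conf
```

Stacking such layers spreads valid measurements outward while the propagated confidence records how well each output is supported, which is what allows a confidence-maximizing objective of the kind the abstract mentions.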
Proceedings of the second "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST'14)
The implicit objective of the biennial "international Traveling Workshop on
Interactions between Sparse models and Technology" (iTWIST) is to foster
collaboration between international scientific teams by disseminating ideas
through both specific oral/poster presentations and free discussions. For its
second edition, the iTWIST workshop took place in the medieval and picturesque
town of Namur in Belgium, from Wednesday August 27th till Friday August 29th,
2014. The workshop was conveniently located in "The Arsenal" building within
walking distance of both hotels and the town center. iTWIST'14 gathered about
70 international participants and featured 9 invited talks, 10 oral
presentations, and 14 posters on the following themes, all related to the
theory, application and generalization of the "sparsity paradigm":
Sparsity-driven data sensing and processing; Union of low dimensional
subspaces; Beyond linear and convex inverse problem; Matrix/manifold/graph
sensing/processing; Blind inverse problems and dictionary learning; Sparsity
and computational neuroscience; Information theory, geometry and randomness;
Complexity/accuracy tradeoffs in numerical methods; Sparsity? What's next?;
Sparse machine learning and inference.
Comment: 69 pages, 24 extended abstracts; iTWIST'14 website:
http://sites.google.com/site/itwist1
Compressive sensing for 3D microwave imaging systems
Compressed sensing (CS) image reconstruction techniques are developed and experimentally implemented for wideband microwave synthetic aperture radar (SAR) imaging systems with applications to nondestructive testing and evaluation. These techniques significantly reduce the number of spatial measurement points and, consequently, the acquisition time by sampling at a level lower than the Nyquist-Shannon rate. Benefiting from a reduced number of samples, this work successfully implemented two scanning procedures: the nonuniform raster and the optimum path.
Three CS reconstruction approaches are also proposed for the wideband microwave SAR-based imaging systems. The first approach reconstructs a full set of raw data from undersampled measurements via L1-norm optimization and then applies 3D forward SAR to the reconstructed raw data. The second approach employs forward SAR and reverse SAR (R-SAR) transforms in each L1-norm optimization iteration, reconstructing images directly. This dissertation proposes a simple, elegant truncation repair method to combat truncation error, a critical obstacle to the convergence of the iterative CS algorithm. The third proposed CS reconstruction algorithm is adaptive basis selection (ABS) compressed sensing. Rather than using a fixed sparsifying basis, the proposed ABS method adaptively selects the best basis from a set of bases in each iteration of the L1-norm optimization, according to a decision metric derived from the sparsity of the image and the coherence between the measurement and sparsifying matrices. The results of several experiments indicate that the proposed algorithms recover 2D and 3D SAR images with only 20% of the spatial points and reduce the acquisition time by up to 66% relative to conventional methods while maintaining or improving the quality of the SAR images --Abstract, page iv
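The L1-norm optimization at the heart of these reconstruction approaches can be sketched with a generic iterative soft-thresholding algorithm (ISTA) for sparse recovery from undersampled linear measurements. This is a minimal illustration of CS recovery, not the dissertation's SAR-specific implementation; the matrix sizes, regularization weight, and iteration count are arbitrary assumptions.

```python
import numpy as np

def ista(A, y, lam=0.01, n_iter=2000):
    """Solve min_x 0.5*||Ax - y||^2 + lam*||x||_1 by iterative soft-thresholding."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = A.T @ (A @ x - y)              # gradient of the data-fit term
        z = x - g / L                      # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x

# Recover a 4-sparse signal from 50 random measurements of a length-100 vector
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100)) / np.sqrt(50)
x0 = np.zeros(100)
x0[[5, 20, 60, 90]] = [1.0, -1.0, 2.0, 0.5]
x_hat = ista(A, A @ x0)
```

The first reconstruction approach described above corresponds to running such a solver on the undersampled raw data before applying the forward SAR transform; the second folds the transform pair into each iteration.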
Adapting Computer Vision Models To Limitations On Input Dimensionality And Model Complexity
In distributed systems where visual sensors communicate with remote predictive models, data traffic is limited by the capacity of the communication channels, and hardware limits how much collected data can be processed prior to transmission. We study novel methods of adapting visual inference to limitations on complexity and data availability at test time, wherever such limitations exist. The contributions detailed in this thesis consider both task-specific and task-generic approaches to reducing the data requirement for inference, and we evaluate the proposed methods on a wide range of computer vision tasks. This thesis makes four distinct contributions: (i) We investigate multi-class action classification via two-stream convolutional neural networks that directly ingest information extracted from compressed video bitstreams. We show that selective access to macroblock motion-vector information provides a good low-dimensional approximation of the underlying optical flow in visual sequences. (ii) We devise a bitstream cropping method by which AVC/H.264 and H.265 bitstreams are reduced to the minimum set of elements necessary for optical flow extraction, while maintaining compliance with the codec standards. We additionally study the effect of codec rate-quality control on the sparsity and noise incurred in optical flow derived from the resulting bitstreams, and do so for multiple coding standards. (iii) We demonstrate variability in the amount of data required for action classification, and leverage this to reduce the dimensionality of input volumes by inferring the temporal extent required for accurate classification prior to processing by learnable machines. (iv) We extend the Mixtures-of-Experts (MoE) paradigm to adapt the data cost of inference for any set of constituent experts.
We postulate that the minimum acceptable data cost of inference varies across input-space partitions, and consider mixtures where each expert is designed to meet a different set of constraints on input dimensionality. To take advantage of the flexibility of such mixtures in processing different input representations and modalities, we train biased gating functions such that experts requiring less information to make their inferences are favoured over others. Finally, we note that our proposed data-utility optimization solutions include a learnable component that considers specified priorities on the amount of information to be used prior to inference, and can be realized for any combination of tasks, modalities, and constraints on available data.
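A cost-biased gating function of the kind described above can be sketched as a softmax whose logits are penalized by each expert's data cost, so that cheaper experts win when the unbiased scores are close. The function name, the linear cost penalty, and the `beta` trade-off parameter are illustrative assumptions, not the thesis's trained gating network.

```python
import numpy as np

def biased_gate(logits, costs, beta=1.0):
    """Cost-biased gating weights over a set of experts.

    logits: unbiased per-expert scores from the gating network
    costs:  per-expert data cost of inference (e.g. input dimensionality)
    beta:   strength of the bias toward cheaper experts
    """
    z = np.asarray(logits, dtype=float) - beta * np.asarray(costs, dtype=float)
    z = z - z.max()                 # numerically stable softmax
    p = np.exp(z)
    return p / p.sum()
```

With `beta = 0` this reduces to an ordinary softmax gate; increasing `beta` shifts probability mass toward experts that need less input data, which is one simple way to realize the stated preference for low-cost experts.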