Impatient DNNs - Deep Neural Networks with Dynamic Time Budgets
We propose Impatient Deep Neural Networks (DNNs) which deal with dynamic time
budgets during application. They allow for individual budgets given a priori
for each test example and for anytime prediction, i.e., a possible interruption
at multiple stages during inference while still providing output estimates. Our
approach can therefore tackle the computational costs and energy demands of
DNNs in an adaptive manner, a property essential for real-time applications.
Our Impatient DNNs are based on a new general framework of learning dynamic
budget predictors using risk minimization, which can be applied to current DNN
architectures by adding early prediction and additional loss layers. A key
aspect of our method is that all of the intermediate predictors are learned
jointly. In experiments, we evaluate our approach for different budget
distributions, architectures, and datasets. Our results show a significant gain
in expected accuracy compared to common baselines. Comment: British Machine Vision Conference (BMVC) 201
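The inference side of anytime prediction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each early-prediction layer becomes available at a known cumulative cost, and the names `anytime_prediction` and `expected_accuracy`, the cost values, and the accuracy values are all hypothetical. The joint training of the intermediate predictors is not shown.

```python
from bisect import bisect_right


def anytime_prediction(stage_costs, stage_outputs, budget):
    """Return the output of the deepest stage that finishes within `budget`.

    stage_costs: ascending cumulative costs (e.g. milliseconds) at which
        each early predictor's output becomes available (assumed known).
    stage_outputs: the output available at each stage.
    Returns None when the budget is smaller than the first stage's cost.
    """
    k = bisect_right(stage_costs, budget)  # number of stages that fit
    return stage_outputs[k - 1] if k > 0 else None


def expected_accuracy(stage_costs, stage_accuracies, budget_probs):
    """Expected accuracy under a discrete budget distribution.

    budget_probs: dict mapping a budget value to its probability.
    A budget too small for any stage contributes zero accuracy.
    """
    total = 0.0
    for budget, prob in budget_probs.items():
        acc = anytime_prediction(stage_costs, stage_accuracies, budget)
        total += prob * (acc if acc is not None else 0.0)
    return total


# Hypothetical numbers, for illustration only.
costs = [2, 5, 9]
accuracies = [0.5, 0.7, 0.9]
budgets = {1: 0.25, 5: 0.5, 10: 0.25}
print(expected_accuracy(costs, accuracies, budgets))
```

Under these assumptions, changing the budget distribution changes which stages matter most, which is the quantity a budget-aware training objective would weight.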
Anytime Stereo Image Depth Estimation on Mobile Devices
Many applications of stereo depth estimation in robotics require the
generation of accurate disparity maps in real time under significant
computational constraints. Current state-of-the-art algorithms force a choice
between either generating accurate mappings at a slow pace, or quickly
generating inaccurate ones, and additionally these methods typically require
far too many parameters to be usable on power- or memory-constrained devices.
Motivated by these shortcomings, we propose a novel approach for disparity
prediction in the anytime setting. In contrast to prior work, our end-to-end
learned approach can trade off computation and accuracy at inference time.
Depth estimation is performed in stages, during which the model can be queried
at any time to output its current best estimate. Our final model can process
1242×375 resolution images within a range of 10-35 FPS on an NVIDIA
Jetson TX2 module with only marginal increases in error -- using two orders of
magnitude fewer parameters than the most competitive baseline. The source code
is available at https://github.com/mileyan/AnyNet. Comment: Accepted by ICRA201
Improving 6D Pose Estimation of Objects in Clutter via Physics-aware Monte Carlo Tree Search
This work proposes a process for efficiently searching over combinations of
individual object 6D pose hypotheses in cluttered scenes, especially in cases
involving occlusions and objects resting on each other. The initial set of
candidate object poses is generated from state-of-the-art object detection and
global point cloud registration techniques. The best-scored pose per object
according to these techniques may not be accurate due to overlaps and occlusions.
Nevertheless, experimental indications provided in this work show that object
poses with lower ranks may be closer to the real poses than ones with high
ranks according to registration techniques. This motivates a global
optimization process for improving these poses by taking into account
scene-level physical interactions between objects. It also implies that the
Cartesian product of candidate poses for interacting objects must be searched
so as to identify the best scene-level hypothesis. To perform the search
efficiently, the candidate poses for each object are clustered so as to reduce
their number but still keep a sufficient diversity. Then, searching over the
combinations of candidate object poses is performed through a Monte Carlo Tree
Search (MCTS) process that uses the similarity between the observed depth image
of the scene and a rendering of the scene given the hypothesized pose as a
score that guides the search procedure. MCTS handles in a principled way the
tradeoff between fine-tuning the most promising poses and exploring new ones,
by using the Upper Confidence Bound (UCB) technique. Experimental results
indicate that this process is able to quickly identify in cluttered scenes
physically-consistent object poses that are significantly closer to ground
truth compared to poses found by point cloud registration methods. Comment: 8 pages, 4 figure
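The exploration/exploitation trade-off mentioned above can be sketched with the standard UCB1 rule. This is a generic illustration, not the paper's code: the function names, the `(total_reward, visits)` representation of a child node, and the default exploration constant are assumptions, and the rendering-based similarity score that would supply the rewards is not modeled.

```python
import math


def ucb1_score(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 value of a child: mean reward plus an exploration bonus."""
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    mean = total_reward / visits
    bonus = c * math.sqrt(math.log(parent_visits) / visits)
    return mean + bonus


def select_child(children, c=math.sqrt(2)):
    """Pick the index of the child with the highest UCB1 score.

    children: list of (total_reward, visits) tuples, one per candidate
        pose hypothesis (hypothetical representation).
    """
    parent_visits = sum(visits for _, visits in children)
    scores = [ucb1_score(r, v, parent_visits, c) for r, v in children]
    return max(range(len(children)), key=scores.__getitem__)


# A well-explored strong candidate versus a barely explored weaker one.
children = [(90.0, 100), (0.5, 1)]
print(select_child(children))        # exploration bonus favours index 1
print(select_child(children, c=0))   # pure exploitation favours index 0
```

Setting `c` to zero reduces the rule to greedy selection of the best mean score, while larger values push the search toward rarely visited pose combinations.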
Tracking of motor vehicles from aerial video imagery using the OT-MACH correlation filter
Accurately tracking moving targets in a complex scene involving moving cameras, occlusions and targets embedded in noise is a very active research area in computer vision. In this paper, an optimal trade-off maximum correlation height (OT-MACH) filter has been designed and implemented as a robust tracker. The algorithm allows selection of different objects as a target, based on the operator’s requirements. The user interface is designed to allow the selection of a different target for tracking at any time. The filter is updated at a frequency selected by the user, which makes the filter more resistant to progressive changes in the object’s orientation and scale. The tracker has been tested on both colour visible-band and infra-red-band video sequences acquired from the air by the Sussex County police helicopter. Initial testing has demonstrated the ability of the filter to maintain a stable track on vehicles despite changes of scale, orientation and lighting, and the ability to re-acquire the track after short losses due to the vehicle passing behind occlusions.
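For orientation, the OT-MACH filter is commonly written in the frequency domain in the form below. This is the general textbook form, given here as an assumption; the abstract does not state which variant or parameter values the authors used.

```latex
h = \frac{m_x^{*}}{\alpha C + \beta D_x + \gamma S_x}
% m_x : mean of the training image vectors (frequency domain), * = complex conjugate
% C   : diagonal power spectral density of the additive input noise
% D_x : diagonal average power spectral density of the training images
% S_x : similarity matrix of the training images
% alpha, beta, gamma : non-negative trade-off parameters
```

Roughly, increasing the weight on C favours noise tolerance, the weight on D_x favours sharp correlation peaks, and the weight on S_x favours tolerance to distortion across the training views.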
Smartphone picture organization: a hierarchical approach
We live in a society where the large majority of the population has a camera-equipped smartphone. In addition, hard drives and cloud storage are getting cheaper and cheaper, leading to a tremendous growth in stored personal photos. Unlike photo collections captured by a digital camera, which typically are pre-processed by the user who organizes them into event-related folders, smartphone pictures are automatically stored in the cloud. As a consequence, photo collections captured by a smartphone are highly unstructured and, because smartphones are ubiquitous, they present a larger variability compared to pictures captured by a digital camera. To address the need to organize large smartphone photo collections automatically, we propose here a new methodology for hierarchical photo organization into topics and topic-related categories. Our approach successfully estimates latent topics in the pictures by applying probabilistic Latent Semantic Analysis, and automatically assigns a name to each topic by relying on a lexical database. Topic-related categories are then estimated by using a set of topic-specific Convolutional Neural Networks. To validate our approach, we assemble and make public a large dataset of more than 8,000 smartphone pictures from 40 persons. Experimental results demonstrate major user satisfaction with respect to state-of-the-art solutions in terms of organization. Peer Reviewed. Preprin
Robustness of 3D Deep Learning in an Adversarial Setting
Understanding the spatial arrangement and nature of real-world objects is of
paramount importance to many complex engineering tasks, including autonomous
navigation. Deep learning has revolutionized state-of-the-art performance for
tasks in 3D environments; however, relatively little is known about the
robustness of these approaches in an adversarial setting. The lack of
comprehensive analysis makes it difficult to justify deployment of 3D deep
learning models in real-world, safety-critical applications. In this work, we
develop an algorithm for analysis of pointwise robustness of neural networks
that operate on 3D data. We show that current approaches presented for
understanding the resilience of state-of-the-art models vastly overestimate
their robustness. We then use our algorithm to evaluate an array of
state-of-the-art models in order to demonstrate their vulnerability to
occlusion attacks. We show that, in the worst case, these networks can be
reduced to 0% classification accuracy after the occlusion of at most 6.5% of
the occupied input space. Comment: 10 pages, 8 figures, 1 tabl