Deep neural network for traffic sign recognition systems: An analysis of spatial transformers and stochastic optimisation methods
This paper presents a Deep Learning approach for traffic sign recognition systems. Several classification experiments are conducted on publicly available traffic sign datasets from Germany and Belgium using a Deep Neural Network that comprises Convolutional layers and Spatial Transformer Networks. These trials are designed to measure the impact of several factors, with the end goal of designing a Convolutional Neural Network that improves on the state of the art in traffic sign classification. First, different adaptive and non-adaptive stochastic gradient descent optimisation algorithms such as SGD, SGD-Nesterov, RMSprop and Adam are evaluated. Subsequently, multiple combinations of Spatial Transformer Networks placed at distinct positions within the main neural network are analysed. The proposed Convolutional Neural Network achieves a recognition accuracy of 99.71% on the German Traffic Sign Recognition Benchmark, outperforming previous state-of-the-art methods while also being more efficient in terms of memory requirements.
Ministerio de Economía y Competitividad TIN2017-82113-C2-1-R. Ministerio de Economía y Competitividad TIN2013-46801-C4-1-
Recognizing point clouds using conditional random fields
Detecting objects in cluttered scenes is a necessary step for many robotic tasks and facilitates the interaction of the robot with its environment. Because of the availability of efficient 3D sensing devices such as the Kinect, methods for the recognition of objects in 3D point clouds have gained importance in recent years. In this paper, we propose a new supervised learning approach for the recognition of objects from 3D point clouds using Conditional Random Fields, a type of discriminative, undirected probabilistic graphical model. The various features and contextual relations of the objects are described by the potential functions in the graph. Our method allows for learning and inference from unorganized point clouds of arbitrary sizes and shows a significant benefit in terms of computational speed during prediction when compared to a state-of-the-art approach based on constrained optimization.
Peer reviewed. Postprint (author’s final draft)
Constrained Deep Networks: Lagrangian Optimization via Log-Barrier Extensions
This study investigates the optimization aspects of imposing hard inequality
constraints on the outputs of CNNs. In the context of deep networks,
constraints are commonly handled with penalties for their simplicity, and
despite their well-known limitations. Lagrangian-dual optimization has been
largely avoided, except for a few recent works, mainly due to the computational
complexity and stability/convergence issues caused by alternating explicit dual
updates/projections and stochastic optimization. Several studies showed that,
surprisingly for deep CNNs, the theoretical and practical advantages of
Lagrangian optimization over penalties do not materialize in practice. We
propose log-barrier extensions, which approximate Lagrangian optimization of
constrained-CNN problems with a sequence of unconstrained losses. Unlike
standard interior-point and log-barrier methods, our formulation does not need
an initial feasible solution. Furthermore, we provide a new technical result,
which shows that the proposed extensions yield an upper bound on the duality
gap. This generalizes the duality-gap result of standard log-barriers, yielding
sub-optimality certificates for feasible solutions. While sub-optimality is not
guaranteed for non-convex problems, our result shows that log-barrier
extensions are a principled way to approximate Lagrangian optimization for
constrained CNNs via implicit dual variables. We report comprehensive weakly
supervised segmentation experiments, with various constraints, showing that our
formulation substantially outperforms the existing constrained-CNN methods
in terms of accuracy, constraint satisfaction and training stability, more
so when dealing with a large number of constraints.
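As a rough illustration of how a log-barrier can be extended so that no initial feasible solution is needed, the sketch below keeps the standard barrier -(1/t) log(-z) for strictly satisfied constraints (z well below 0) and switches to a linear continuation near and beyond the boundary. The piecewise form, the switch point at z = -1/t^2, and the function name `log_barrier_extension` are assumptions made for this example; the abstract itself gives no formula.

```python
import math


def log_barrier_extension(z: float, t: float) -> float:
    """Smooth penalty for an inequality constraint written as z <= 0.

    Assumed piecewise form (not taken from the abstract):
      z <= -1/t^2 : standard log-barrier, -(1/t) * log(-z)
      otherwise   : linear continuation with slope t, chosen so that the
                    value and the first derivative match at z = -1/t^2
    Unlike the standard barrier, the result is finite for infeasible z > 0.
    """
    if t <= 0.0:
        raise ValueError("t must be positive")
    threshold = -1.0 / (t * t)
    if z <= threshold:
        return -math.log(-z) / t
    return t * z - math.log(1.0 / (t * t)) / t + 1.0 / t


if __name__ == "__main__":
    # Larger t makes the penalty sharper around the constraint boundary.
    for t in (1.0, 5.0, 25.0):
        values = [round(log_barrier_extension(z, t), 4) for z in (-1.0, -0.01, 0.5)]
        print(t, values)
```

In a training loop one would typically sum this penalty over all constraint values and increase `t` over time, which is one way to obtain the sequence of unconstrained losses the abstract mentions.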
Crowd-ML: A Privacy-Preserving Learning Framework for a Crowd of Smart Devices
Smart devices with built-in sensors, computational capabilities, and network
connectivity have become increasingly pervasive. Crowds of smart devices
offer opportunities to collectively sense and perform computing tasks at an
unprecedented scale. This paper presents Crowd-ML, a privacy-preserving machine
learning framework for a crowd of smart devices, which can solve a wide range
of learning problems for crowdsensing data with differential privacy
guarantees. Crowd-ML endows a crowdsensing system with an ability to learn
classifiers or predictors online from crowdsensing data privately with minimal
computational overheads on devices and servers, suitable for a practical and
large-scale employment of the framework. We analyze the performance and the
scalability of Crowd-ML, and implement the system with off-the-shelf
smartphones as a proof of concept. We demonstrate the advantages of Crowd-ML
with real and simulated experiments under various conditions.
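The abstract does not say how the differential privacy guarantee is obtained. One common approach, shown here purely as an assumption and not as Crowd-ML's actual mechanism, is for each device to clip its local gradient and add Laplace noise before sending it to the server. The names `clip_l1`, `laplace_noise` and `privatize_gradient` are hypothetical.

```python
import random


def clip_l1(grad, bound):
    """Rescale grad so that its L1 norm is at most `bound`."""
    norm = sum(abs(g) for g in grad)
    if norm <= bound or norm == 0.0:
        return list(grad)
    factor = bound / norm
    return [g * factor for g in grad]


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_gradient(grad, bound, epsilon, rng):
    """Laplace mechanism on a clipped gradient (illustrative sketch).

    Assumption: swapping one data sample changes the clipped gradient by
    at most 2 * bound in L1 norm, so per-coordinate noise of scale
    2 * bound / epsilon gives epsilon-differential privacy for one release.
    """
    if epsilon <= 0.0:
        raise ValueError("epsilon must be positive")
    clipped = clip_l1(grad, bound)
    scale = 2.0 * bound / epsilon
    return [g + laplace_noise(scale, rng) for g in clipped]


if __name__ == "__main__":
    rng = random.Random(0)
    # A device would send this noisy vector instead of the raw gradient.
    print(privatize_gradient([3.0, -1.0], bound=2.0, epsilon=1.0, rng=rng))
```

A smaller `epsilon` means stronger privacy but noisier updates, so the server generally needs more device contributions to reach the same accuracy.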
BPGrad: Towards Global Optimality in Deep Learning via Branch and Pruning
Understanding global optimality in deep learning (DL) has been attracting
more and more attention recently. Conventional DL solvers, however, have not
been designed to seek such global optimality. In this paper
we propose a novel approximation algorithm, BPGrad, towards optimizing deep
models globally via branch and pruning. Our BPGrad algorithm is based on the
assumption of Lipschitz continuity in DL, and as a result it can adaptively
determine the step size for the current gradient given the history of previous
updates, wherein theoretically no smaller steps can achieve the global
optimality. We prove that, by repeating this branch-and-pruning procedure, we
can locate the global optimality within a finite number of iterations. Empirically, an
efficient solver based on BPGrad for DL is proposed as well, and it outperforms
conventional DL solvers such as Adagrad, Adadelta, RMSProp, and Adam in the
tasks of object recognition, detection, and segmentation.
Learning Analysis-by-Synthesis for 6D Pose Estimation in RGB-D Images
Analysis-by-synthesis has been a successful approach for many tasks in
computer vision, such as 6D pose estimation of an object in an RGB-D image,
which is the topic of this work. The idea is to compare the observation with
the output of a forward process, such as a rendered image of the object of
interest in a particular pose. Due to occlusion or complicated sensor noise, it
can be difficult to perform this comparison in a meaningful way. We propose an
approach that "learns to compare", while taking these difficulties into
account. This is done by describing the posterior density of a particular
object pose with a convolutional neural network (CNN) that compares an observed
and rendered image. The network is trained with the maximum likelihood
paradigm. We observe empirically that the CNN does not specialize to the
geometry or appearance of specific objects, and it can be used with objects of
vastly different shapes and appearances, and in different backgrounds. Compared
to the state of the art, we demonstrate a significant improvement on two different
datasets which include a total of eleven objects, cluttered background, and
heavy occlusion.
Comment: 16 pages, 8 figures