Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation
This paper proposes a new hybrid architecture that consists of a deep
Convolutional Network and a Markov Random Field. We show how this architecture
is successfully applied to the challenging problem of articulated human pose
estimation in monocular images. The architecture can exploit structural domain
constraints such as geometric relationships between body joint locations. We
show that joint training of these two model paradigms improves performance and
allows us to significantly outperform existing state-of-the-art techniques.
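As a rough illustration of how such geometric constraints between joints can be enforced, the sketch below runs exact max-product (Viterbi) inference over a toy kinematic chain. The unary score maps and the distance-based pairwise prior are hand-made stand-ins for the paper's ConvNet outputs and learned spatial model; all names and numbers are illustrative.

```python
import math

# Toy stand-in for CNN unary score maps: unary[joint][location].
# In the paper these come from a ConvNet; here they are hand-made.
N_LOC = 5
unary = {
    "shoulder": [0.1, 0.9, 0.2, 0.1, 0.0],
    "elbow":    [0.0, 0.2, 0.8, 0.3, 0.1],
    "wrist":    [0.1, 0.1, 0.3, 0.7, 0.2],
}
CHAIN = ["shoulder", "elbow", "wrist"]

def pairwise(li, lj, sigma=1.5):
    # Geometric prior: kinematically connected joints prefer nearby locations.
    return math.exp(-((li - lj) ** 2) / (2 * sigma ** 2))

def map_inference(unary, chain):
    # Max-product (Viterbi) over the kinematic chain: exact MAP for a tree MRF.
    msg = [0.0] * N_LOC
    back = []
    for parent, child in zip(chain, chain[1:]):
        scores = [unary[parent][i] + msg[i] for i in range(N_LOC)]
        new_msg, ptr = [], []
        for j in range(N_LOC):
            best_i = max(range(N_LOC),
                         key=lambda i: scores[i] + math.log(pairwise(i, j)))
            new_msg.append(scores[best_i] + math.log(pairwise(best_i, j)))
            ptr.append(best_i)
        msg, back = new_msg, back + [ptr]
    # Decode: pick the best final location, then follow back-pointers.
    last = max(range(N_LOC), key=lambda j: unary[chain[-1]][j] + msg[j])
    pose = [last]
    for ptr in reversed(back):
        pose.append(ptr[pose[-1]])
    return list(reversed(pose))

print(map_inference(unary, CHAIN))  # → [1, 2, 3]
```

The spatial term nudges each joint toward locations consistent with its neighbour, which is the kind of structural domain constraint the joint training exploits.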
Gravity optimised particle filter for hand tracking
This paper presents a gravity optimised particle filter (GOPF) where the magnitude of the gravitational force for every particle is proportional to its weight. GOPF attracts nearby particles and replicates new particles as if moving the particles towards the peak of the likelihood distribution, improving the sampling efficiency. GOPF is incorporated into a technique for hand feature tracking. A fast approach to hand feature detection and labelling using convexity defects is also presented. Experimental results show that GOPF outperforms the standard particle filter and its variants, as well as the state-of-the-art CamShift-guided particle filter, using a significantly reduced number of particles.
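The abstract does not give the exact gravitational update, so the sketch below uses an assumed simplified form in which every particle drifts toward the weight-weighted mean of the population; the one-dimensional likelihood, the strength parameter, and all numbers are illustrative only.

```python
import math
import random

def likelihood(x, peak=2.0):
    # Toy single-peak likelihood; stands in for an image-based observation model.
    return math.exp(-(x - peak) ** 2)

def gravity_step(particles, weights, strength=0.5):
    # Assumed simplification of the GOPF update: each particle is attracted
    # with a force proportional to the other particles' weights. Here the net
    # pull collapses to a drift toward the weight-weighted mean, which moves
    # the population toward the peak of the likelihood distribution.
    wmean = sum(p * w for p, w in zip(particles, weights)) / sum(weights)
    return [x + strength * (wmean - x) for x in particles]

random.seed(0)
particles = [random.uniform(-5.0, 5.0) for _ in range(200)]
for _ in range(10):
    weights = [likelihood(x) for x in particles]
    particles = gravity_step(particles, weights)

mean = sum(particles) / len(particles)
print(round(mean, 1))  # close to the likelihood peak at 2.0
```

The point of the attraction step is sampling efficiency: the population concentrates near high-likelihood regions, so far fewer particles are needed than with blind resampling.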
Automatic differentiation in machine learning: a survey
Derivatives, mostly in the form of gradients and Hessians, are ubiquitous in
machine learning. Automatic differentiation (AD), also called algorithmic
differentiation or simply "autodiff", is a family of techniques similar to but
more general than backpropagation for efficiently and accurately evaluating
derivatives of numeric functions expressed as computer programs. AD is a small
but established field with applications in areas including computational fluid
dynamics, atmospheric sciences, and engineering design optimization. Until very
recently, the fields of machine learning and AD have largely been unaware of
each other and, in some cases, have independently discovered each other's
results. Despite its relevance, general-purpose AD has been missing from the
machine learning toolbox, a situation slowly changing with its ongoing adoption
under the names "dynamic computational graphs" and "differentiable
programming". We survey the intersection of AD and machine learning, cover
applications where AD has direct relevance, and address the main implementation
techniques. By precisely defining the main differentiation techniques and their
interrelationships, we aim to bring clarity to the usage of the terms
"autodiff", "automatic differentiation", and "symbolic differentiation" as
these are encountered more and more in machine learning settings.
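A minimal sketch of one of the techniques the survey defines, forward-mode AD implemented with dual numbers via operator overloading. All names here are illustrative, not taken from any particular AD library.

```python
class Dual:
    """Forward-mode AD: a dual number carries a value and its derivative."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule applied mechanically per operation: this is what makes
        # AD exact (unlike numeric differencing) without building a symbolic
        # expression (unlike symbolic differentiation).
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def derivative(f, x):
    # Seed the input with dot = 1 and read the derivative off the output.
    return f(Dual(x, 1.0)).dot

f = lambda x: 3 * x * x + 2 * x + 1     # f'(x) = 6x + 2
print(derivative(f, 4.0))                # → 26.0
```

Because the derivative is propagated through the program's actual operations, the same mechanism works for loops and branches, which is why AD is "more general than backpropagation" in the sense the abstract describes.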
Enabling More Accurate and Efficient Structured Prediction
Machine learning practitioners often face a fundamental trade-off between expressiveness and computation time: on average, more accurate, expressive models tend to be more computationally intensive both at training and test time. While this trade-off is always applicable, it is acutely present in the setting of structured prediction, where the joint prediction of multiple output variables often creates two primary, inter-related bottlenecks: inference and feature computation time.
In this thesis, we address this trade-off at test time by presenting frameworks that enable more accurate and efficient structured prediction by addressing each of the bottlenecks specifically. First, we develop a framework based on a cascade of models, where the goal is to control test-time complexity even as features are added that increase inference time (even exponentially). We call this framework Structured Prediction Cascades (SPC); we develop SPC in the context of exact inference and then extend the framework to handle the approximate case. Next, we develop a framework for the setting where the feature computation is explicitly the bottleneck, in which we learn to selectively evaluate features within an instance of the model. This second framework is referred to as Dynamic Structured Model Selection (DMS), and is once again developed for a simpler, restricted model before being extended to handle a much more complex setting. For both cases, we evaluate our methods on several benchmark datasets, and we find that it is possible to dramatically improve the efficiency and accuracy of structured prediction.
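A minimal sketch of the cascade idea on a toy chain model, pruning states whose max-marginal falls below a threshold interpolated between the best score and the mean max-marginal. This is a simplified reading of the SPC pruning rule, and all data is illustrative.

```python
def max_marginals(unary, pairwise):
    # Max-marginal of state s at position t: the score of the best full
    # sequence constrained to pass through (t, s), via forward/backward
    # max-products over the chain.
    T, S = len(unary), len(unary[0])
    fwd = [[0.0] * S for _ in range(T)]
    bwd = [[0.0] * S for _ in range(T)]
    fwd[0] = list(unary[0])
    for t in range(1, T):
        for s in range(S):
            fwd[t][s] = unary[t][s] + max(fwd[t - 1][p] + pairwise[p][s]
                                          for p in range(S))
    for t in range(T - 2, -1, -1):
        for s in range(S):
            bwd[t][s] = max(pairwise[s][n] + unary[t + 1][n] + bwd[t + 1][n]
                            for n in range(S))
    return [[fwd[t][s] + bwd[t][s] for s in range(S)] for t in range(T)]

def cascade_filter(unary, pairwise, alpha=0.5):
    # Keep state (t, s) only if its max-marginal clears a threshold between
    # the mean max-marginal (alpha = 0, mild pruning) and the best score
    # (alpha = 1, aggressive). States on the MAP path score exactly `best`,
    # so the correct output is never filtered out.
    mm = max_marginals(unary, pairwise)
    best = max(max(row) for row in mm)
    mean = sum(sum(row) for row in mm) / (len(mm) * len(mm[0]))
    thresh = alpha * best + (1 - alpha) * mean
    return [[s for s in range(len(row)) if row[s] >= thresh] for row in mm]

unary = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
pairwise = [[0.0] * 3 for _ in range(3)]
print(cascade_filter(unary, pairwise))  # → [[0], [1], [2]]
```

A later, more expensive model then needs to score only the surviving states, which is how the cascade controls test-time complexity as richer features are added.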
Theseus: A Library for Differentiable Nonlinear Optimization
We present Theseus, an efficient application-agnostic open source library for
differentiable nonlinear least squares (DNLS) optimization built on PyTorch,
providing a common framework for end-to-end structured learning in robotics and
vision. Existing DNLS implementations are application specific and do not
always incorporate many ingredients important for efficiency. Theseus is
application-agnostic, as we illustrate with several example applications that
are built using the same underlying differentiable components, such as
second-order optimizers, standard cost functions, and Lie groups. For
efficiency, Theseus incorporates support for sparse solvers, automatic
vectorization, batching, GPU acceleration, and gradient computation with
implicit differentiation and direct loss minimization. We conduct an extensive
performance evaluation in a set of applications, demonstrating significant
efficiency gains and better scalability when these features are incorporated.
Project page: https://sites.google.com/view/theseus-a
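To illustrate the kind of inner solve a DNLS library wraps as a differentiable layer, here is a standalone Gauss-Newton loop on a one-parameter least-squares problem. This is a generic sketch under assumed synthetic data, not Theseus' actual API, and it omits the PyTorch plumbing and the implicit-differentiation backward pass.

```python
import math

def gauss_newton(xs, ys, theta=0.0, iters=50):
    # Nonlinear least squares for the model y ≈ exp(theta * x): minimize
    # 0.5 * sum_i r_i^2 with residuals r_i = exp(theta * x_i) - y_i.
    for _ in range(iters):
        r = [math.exp(theta * x) - y for x, y in zip(xs, ys)]   # residuals
        J = [x * math.exp(theta * x) for x in xs]               # d r_i / d theta
        # Gauss-Newton step: delta = -(J^T r) / (J^T J)
        theta -= (sum(j * ri for j, ri in zip(J, r))
                  / sum(j * j for j in J))
    return theta

xs = [0.5, 1.0, 1.5, 2.0]
ys = [math.exp(0.7 * x) for x in xs]     # synthetic data, true theta = 0.7
theta = gauss_newton(xs, ys, theta=0.3)
print(round(theta, 4))                    # → 0.7
```

In an end-to-end structured learning setup, gradients of the outer (learning) loss with respect to the data or cost weights would flow through this inner solve, which is where the sparse solvers, batching, and implicit differentiation the abstract lists become important.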
Blending Learning and Inference in Structured Prediction
In this paper we derive an efficient algorithm to learn the parameters of
structured predictors in general graphical models. This algorithm blends the
learning and inference tasks, which results in a significant speedup over
traditional approaches, such as conditional random fields and structured
support vector machines. For this purpose we utilize the structures of the
predictors to describe a low dimensional structured prediction task which
encourages local consistencies within the different structures while learning
the parameters of the model. Convexity of the learning task provides the means
to enforce the consistencies between the different parts. The
inference-learning blending algorithm that we propose is guaranteed to converge
to the optimum of the low dimensional primal and dual programs. Unlike many of
the existing approaches, the inference-learning blending allows us to
efficiently learn high-order graphical models over regions of any size and
with a very large number of parameters. We demonstrate the effectiveness of our approach,
while presenting state-of-the-art results in stereo estimation, semantic
segmentation, shape reconstruction, and indoor scene understanding.
Generating 3D product design models in real-time using hand motion and gesture
This thesis was submitted for the degree of Master of Philosophy and awarded by Brunel University. Three-dimensional product design models are widely used in conceptual design and in the early stages of prototyping during the design process. A product design specification often demands a substantial number of 3D models to be constructed within a short period of time. Current methods begin with designers sketching product concepts in 2D using pencil and paper; these sketches are then translated into 3D models by a designer with CAD expertise, using a 3D modelling software package such as Pro Engineer, SolidWorks, or AutoCAD. Several novel methods have been used to incorporate hand motion as a way of interacting with computers. There are three main types of technology available to capture motion data, capable of translating human motion into numeric data which can be read by a computer system: first, glove-based hand gesture systems such as the "Cyberglove", generally used to capture hand gesture and joint-angle information; second, full-body motion capture systems, both optical and non-optical; and finally, vision-based gesture recognition systems which capture full degree-of-freedom (DOF) hand motion estimation. There has yet to be a method using any of the above-mentioned input devices to rapidly produce 3D product design models in real time using hand motion and gestures. In this research, a novel method is presented, using a motion capture system to capture hand gestures and motion in real time, to recreate 3D curves and surfaces which can be translated into 3D product design models. The main aim of this research is to develop a hand motion and gesture-based rapid 3D product modelling method, allowing designers to interactively sketch out 3D concepts in real time using a virtual workspace.
A database of hand signs was built for both architectural hand signs (preliminary study) and product design hand signs. A marker-set model with a total of eight markers (five on the left hand and three on the right hand/marker pen) was designed and used to capture hand gestures with an optical motion capture system. A preliminary testing session was successfully completed to determine whether the motion capture system would be suitable for a real-time application, by effectively modelling a train station in an offline state using hand motion and gesture. An OpenGL software application was programmed using C++ and the Microsoft Foundation Classes, which was used to communicate and pass information about captured motion from the EVaRT system to the user.
Efficient Human Pose Estimation with Image-dependent Interactions
Human pose estimation from 2D images is one of the most challenging
and computationally-demanding problems in computer vision. Standard
models such as Pictorial Structures consider interactions between
kinematically connected joints or limbs, leading to inference cost
that is quadratic in the number of pixels. As a result, researchers
and practitioners have restricted themselves to simple models which
only measure the quality of limb-pair possibilities by their 2D
geometric plausibility.
In this talk, we propose novel methods which allow for efficient
inference in richer models with data-dependent interactions. First, we
introduce structured prediction cascades, a structured analog of
binary cascaded classifiers, which learn to focus computational effort
where it is needed, filtering out many states cheaply while ensuring
the correct output is unfiltered. Second, we propose a way to
decompose models of human pose with cyclic dependencies into a
collection of tree models, and provide novel methods to impose model
agreement. Finally, we develop a local linear approach that learns
bases centered around modes in the training data, giving us
image-dependent local models which are fast and accurate.
These techniques allow for sparse and efficient inference on the order
of minutes or seconds per image. As a result, we can afford to model
pairwise interaction potentials much more richly with data-dependent
features such as contour continuity, segmentation alignment, color
consistency, optical flow and multiple modes. We show empirically that
these richer models are worthwhile, obtaining significantly more
accurate pose estimation on popular datasets.
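As a toy illustration of the mode-centred local models, the sketch below partitions one-dimensional training data into modes, fits one least-squares line per mode, and selects a model per input by feature proximity. The clustering, data, and model form are all simplified stand-ins for the image-dependent local models described above.

```python
def fit_modes(feats, poses, n_modes=2):
    # Toy stand-in for mode-centred local models: partition the training set
    # by pose value (real systems cluster in pose space), then fit one
    # least-squares line pose ≈ a * feat + b per mode.
    pairs = sorted(zip(poses, feats))
    k = len(pairs) // n_modes
    models = []
    for m in range(n_modes):
        chunk = pairs[m * k:(m + 1) * k]
        ys = [p for p, _ in chunk]
        xs = [f for _, f in chunk]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
        b = my - a * mx
        models.append((mx, a, b))   # (feature centre, slope, intercept)
    return models

def predict(models, feat):
    # Image-dependent selection: choose the mode whose feature centre is
    # nearest to this input, then apply its local linear map.
    mx, a, b = min(models, key=lambda m: abs(m[0] - feat))
    return a * feat + b

# Two regimes: pose grows at slope 1 for small features, slope 2 for large.
feats = [0, 1, 2, 3, 10, 11, 12, 13]
poses = [0, 1, 2, 3, 20, 22, 24, 26]
models = fit_modes(feats, poses)
print(predict(models, 12))  # → 24.0
```

Each local model is cheap to evaluate, and the selection step is what makes the overall predictor image-dependent while keeping inference fast.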