Pose-invariant, model-based object recognition, using linear combination of views and Bayesian statistics
This thesis presents an in-depth study on the problem of object recognition, and in particular the detection
of 3-D objects in 2-D intensity images which may be viewed from a variety of angles. A solution to this
problem remains elusive to this day, since it involves dealing with variations in geometry, photometry
and viewing angle, noise, occlusions and incomplete data. This work restricts its scope to a particular
kind of extrinsic variation: variation of the image due to changes in the viewpoint from which the object
is seen.
A technique is proposed and developed to address this problem, which falls into the category of
view-based approaches: methods in which an object is represented as a collection of a small
number of 2-D views, as opposed to a full 3-D model. This technique is based on the
theoretical observation that the geometry of the set of possible images of an object undergoing 3-D rigid
transformations and scaling may, under most imaging conditions, be represented by a linear combination
of a small number of 2-D views of that object. It is therefore possible to synthesise a novel image of an
object given at least two existing and dissimilar views of the object, and a set of linear coefficients that
determine how these views are to be combined in order to synthesise the new image.
The method works in conjunction with a powerful optimization algorithm to search for and recover the
optimal linear combination coefficients that will synthesise a novel image as similar as possible
to the target scene view. If the similarity between the synthesised and target images is above some
threshold, then an object is determined to be present in the scene, and its location and pose are defined,
in part, by the coefficients. The key benefit of this technique is that, because it works directly
with pixel values, it avoids the need for problematic low-level feature extraction and solution of the
correspondence problem. As a result, a linear combination of views (LCV) model is easy to construct
and use, since it requires only a small number of stored 2-D views of the object in question and the
selection of a few landmark points on the object, a process easily carried out during the offline,
model-building stage. In addition, the method is general enough to be applied across a variety of
recognition problems and different types of objects.
The development and application of the method are explored initially on two-dimensional
problems, and then by extending the same principles to 3-D. Additionally, the method is evaluated on
synthetic and real-image datasets containing variations in the objects' identity and pose. Possible
future extensions, incorporating a foreground/background model and lighting variations of the pixels,
are also examined.
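The linear-combination-of-views idea described above can be sketched numerically: given landmark coordinates in two basis views, the landmarks of a novel view are modelled as a linear (here affine) combination of the basis-view coordinates, and the coefficients can be recovered by least squares. This is a minimal illustrative sketch under that simplified parameterisation, not the thesis's actual formulation, and the function names are hypothetical.

```python
import numpy as np

def fit_lcv(view1, view2, target):
    """Fit linear-combination-of-views coefficients.
    view1, view2, target: (N, 2) landmark coordinates in each view.
    Each target coordinate is modelled as an affine combination of the
    basis-view coordinates: [x1, y1, x2, y2, 1] @ coeffs."""
    N = view1.shape[0]
    A = np.hstack([view1, view2, np.ones((N, 1))])       # (N, 5) design matrix
    coeffs, *_ = np.linalg.lstsq(A, target, rcond=None)  # (5, 2): one column per output coordinate
    return coeffs

def synthesize(view1, view2, coeffs):
    """Predict the landmark positions of the novel view from the two basis views."""
    N = view1.shape[0]
    A = np.hstack([view1, view2, np.ones((N, 1))])
    return A @ coeffs
```

In a recognition setting, the recovered coefficients would be searched for by the optimization algorithm and the synthesized view compared against the scene image.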
Unsupervised learning of object landmarks by factorized spatial embeddings
Automatically learning the structure of object categories remains an
important open problem in computer vision. In this paper, we propose a novel
unsupervised approach that can discover and learn landmarks in object
categories, thus characterizing their structure. Our approach is based on
factorizing image deformations, as induced by a viewpoint change or an object
deformation, by learning a deep neural network that detects landmarks
consistently with such visual effects. Furthermore, we show that the learned
landmarks establish meaningful correspondences between different object
instances in a category without having to impose this requirement explicitly.
We assess the method qualitatively on a variety of object types, natural and
man-made. We also show that our unsupervised landmarks are highly predictive of
manually-annotated landmarks in face benchmark datasets, and can be used to
regress these with a high degree of accuracy. Comment: To be published in ICCV 201
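The constraint behind this approach — that a landmark detector should commute with image deformations, i.e. detecting on a warped image gives the warped landmarks — can be illustrated with a toy detector and a simple translation. This is a minimal sketch of the equivariance idea only, not the paper's deep-network implementation.

```python
import numpy as np

def detect_landmark(heatmap):
    """Toy landmark 'detector': the (row, col) location of the maximum response."""
    idx = np.argmax(heatmap)
    return np.array(np.unravel_index(idx, heatmap.shape))

def equivariance_error(heatmap, shift):
    """How far the detector is from commuting with a translation g:
    || detect(g(I)) - g(detect(I)) ||, with g a (dy, dx) circular shift."""
    shifted = np.roll(heatmap, shift, axis=(0, 1))
    return np.linalg.norm(detect_landmark(shifted) - (detect_landmark(heatmap) + shift))
```

In the paper this consistency is imposed as a training loss over many deformations, which is what makes the discovered landmarks correspond across object instances.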
Unsupervised Learning of Depth and Ego-Motion from Video
We present an unsupervised learning framework for the task of monocular depth
and camera motion estimation from unstructured video sequences. We achieve this
by simultaneously training depth and camera pose estimation networks using the
task of view synthesis as the supervisory signal. The networks are thus coupled
via the view synthesis objective during training, but can be applied
independently at test time. Empirical evaluation on the KITTI dataset
demonstrates the effectiveness of our approach: 1) monocular depth performing
comparably with supervised methods that use either ground-truth pose or depth
for training, and 2) pose estimation performing favorably with established SLAM
systems under comparable input settings. Comment: Accepted to CVPR 2017. Project webpage:
https://people.eecs.berkeley.edu/~tinghuiz/projects/SfMLearner
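The view-synthesis supervisory signal rests on a standard reprojection step: each target pixel is back-projected using the predicted depth, rigidly transformed by the predicted relative pose, and projected into the source view, where a photometric error is measured. Below is a minimal numpy sketch of that geometry, using nearest-neighbour sampling rather than the differentiable bilinear sampling a trainable network would need; the function names are illustrative.

```python
import numpy as np

def reproject(depth, K, R, t):
    """Map each target pixel to its location in the source view.
    depth: (H, W) target-view depth; K: 3x3 intrinsics; R, t: target-to-source pose.
    Returns (H, W, 2) source-pixel coordinates (x, y)."""
    H, W = depth.shape
    ys, xs = np.mgrid[0:H, 0:W]
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T  # 3 x HW homogeneous pixels
    cam = np.linalg.inv(K) @ pix * depth.reshape(1, -1)   # back-project to 3-D camera coordinates
    cam_s = R @ cam + t.reshape(3, 1)                     # rigid transform into the source frame
    proj = K @ cam_s                                      # project into the source image
    return (proj[:2] / proj[2:]).T.reshape(H, W, 2)

def photometric_loss(target, source, uv):
    """Mean L1 error between the target and the source sampled (nearest-neighbour) at uv."""
    H, W = target.shape
    u = np.clip(np.round(uv[..., 0]).astype(int), 0, W - 1)
    v = np.clip(np.round(uv[..., 1]).astype(int), 0, H - 1)
    return np.abs(target - source[v, u]).mean()
```

Minimising this loss jointly over the depth and pose networks is what couples them during training, even though each can then be run independently at test time.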
Biometric Authentication System on Mobile Personal Devices
We propose a secure, robust, and low-cost biometric authentication system on the mobile personal device for the personal network. The system consists of the following five key modules: 1) face detection; 2) face registration; 3) illumination normalization; 4) face verification; and 5) information fusion. For the complicated face authentication task on devices with limited resources, the emphasis is largely on the reliability and applicability of the system. Both theoretical and practical considerations are taken into account. The final system is able to achieve an equal error rate of 2% under challenging testing protocols. The low hardware and software cost makes the system well adaptable to a large range of security applications.
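An equal error rate such as the 2% quoted above is the operating point where the false-accept rate (impostors scoring above threshold) equals the false-reject rate (genuine users scoring below it). A small sketch of how it can be estimated from genuine and impostor similarity scores — illustrative only, not the paper's evaluation code:

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Sweep candidate thresholds over the observed scores and return the
    (EER, threshold) where false-accept and false-reject rates are closest."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([(impostor >= t).mean() for t in thresholds])  # false accepts
    frr = np.array([(genuine < t).mean() for t in thresholds])    # false rejects
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0, thresholds[i]
```

Reporting the EER gives a single threshold-free summary of verification accuracy, which is why it is the headline figure for systems like this one.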
The Data Big Bang and the Expanding Digital Universe: High-Dimensional, Complex and Massive Data Sets in an Inflationary Epoch
Recent and forthcoming advances in instrumentation, and giant new surveys,
are creating astronomical data sets that are not amenable to the methods of
analysis familiar to astronomers. Traditional methods are often inadequate not
merely because of the size in bytes of the data sets, but also because of the
complexity of modern data sets. Mathematical limitations of familiar algorithms
and techniques in dealing with such data sets create a critical need for new
paradigms for the representation, analysis and scientific visualization (as
opposed to illustrative visualization) of heterogeneous, multiresolution data
across application domains. Some of the problems presented by the new data sets
have been addressed by other disciplines such as applied mathematics,
statistics and machine learning and have been utilized by other sciences such
as space-based geosciences. Unfortunately, valuable results pertaining to these
problems are mostly to be found only in publications outside of astronomy. Here
we offer brief overviews of a number of concepts, techniques and developments,
some "old" and some new. These are generally unknown to most of the
astronomical community, but are vital to the analysis and visualization of
complex datasets and images. In order for astronomers to take advantage of the
richness and complexity of the new era of data, and to be able to identify,
adopt, and apply new solutions, the astronomical community needs a certain
degree of awareness and understanding of the new concepts. One of the goals of
this paper is to help bridge the gap between applied mathematics, artificial
intelligence and computer science on the one side and astronomy on the other. Comment: 24 pages, 8 Figures, 1 Table. Accepted for publication in "Advances in
Astronomy", special issue "Robotic Astronomy".
A visual category filter for Google images
We extend the constellation model to include heterogeneous parts which may represent either the appearance or the geometry of a region of the object. The parts and their spatial configuration are learnt simultaneously and automatically, without supervision, from cluttered images.
We describe how this model can be employed for ranking the output of an image search engine when searching for object categories. It is shown that visual consistencies in the output images can be identified, and then used to rank the images according to their closeness to the visual object category.
Although the proportion of good images may be small, the algorithm is designed to be robust and is capable of learning in either a totally unsupervised manner, or with a very limited amount of supervision.
We demonstrate the method on image sets returned by Google's image search for a number of object categories including bottles, camels, cars, horses, tigers and zebras.
Geometric and photometric affine invariant image registration
This thesis aims to present a solution to the correspondence problem for the registration
of wide-baseline images taken from uncalibrated cameras. We propose an affine
invariant descriptor that combines the geometry and photometry of the scene to find
correspondences between both views. The geometric affine invariant component of the
descriptor is based on the affine arc-length metric, whereas the photometry is analysed
by invariant colour moments. A graph structure represents the spatial distribution of the
primitive features; i.e. nodes correspond to detected high-curvature points, whereas arcs
represent connectivities by extracted contours. After matching, we refine the search for
correspondences by using a maximum likelihood robust algorithm. We have evaluated
the system over synthetic and real data. The method is, however, susceptible to the propagation of errors
introduced by approximations in the system.
BAE Systems; Selex Sensors and Airborne System