Bridging Between Computer and Robot Vision Through Data Augmentation: A Case Study on Object Recognition
Despite the impressive progress brought by deep networks in visual object recognition, robot vision is still far from being a solved problem. The most successful convolutional architectures are developed starting from ImageNet, a large-scale collection of images of object categories downloaded from the Web. Images of this kind are very different from the situated and embodied visual experience of robots deployed in unconstrained settings. To reduce the gap between these two visual experiences, this paper proposes a simple yet effective data augmentation layer that zooms in on the object of interest and simulates the object detection outcome of a robot vision system. The layer, which can be used with any convolutional deep architecture, increases object recognition performance by up to 7% in experiments performed over three different benchmark databases. An implementation of our robot data augmentation layer has been made publicly available.
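As an illustration of what "zooming in on the object and simulating a detector outcome" could look like, here is a minimal sketch in plain Python. It is not the authors' layer: the function names `zoom_crop_box` and `crop`, the scale and shift ranges, and the toy image are all assumptions made for the example. The idea shown is only that a known object box is enlarged and shifted at random, imitating an imprecise detector, before the crop is passed on for recognition.

```python
import random

def zoom_crop_box(box, image_size, rng, min_scale=1.0, max_scale=1.5, max_shift=0.1):
    """Return a crop window (left, top, right, bottom) around an object box.

    box        : (left, top, right, bottom) of the object, in pixels.
    image_size : (width, height) of the full image.
    The window is the box enlarged by a random factor and shifted by a random
    fraction of the box size, then clipped to the image borders.
    All parameter names and default ranges are hypothetical.
    """
    left, top, right, bottom = box
    width, height = image_size
    box_w, box_h = right - left, bottom - top
    scale = rng.uniform(min_scale, max_scale)
    # Perturb the box centre to imitate an imprecise detection.
    cx = (left + right) / 2 + rng.uniform(-max_shift, max_shift) * box_w
    cy = (top + bottom) / 2 + rng.uniform(-max_shift, max_shift) * box_h
    half_w, half_h = scale * box_w / 2, scale * box_h / 2
    return (
        max(0, int(round(cx - half_w))),
        max(0, int(round(cy - half_h))),
        min(width, int(round(cx + half_w))),
        min(height, int(round(cy + half_h))),
    )

def crop(image, window):
    """Cut the window out of an image stored as a list of rows."""
    left, top, right, bottom = window
    return [row[left:right] for row in image[top:bottom]]

rng = random.Random(0)
image = [[x + 10 * y for x in range(10)] for y in range(8)]  # 10x8 toy image
window = zoom_crop_box((3, 2, 7, 6), (10, 8), rng)
patch = crop(image, window)
```

In a training pipeline, a crop like `patch` would be resized to the network input size; drawing a new window for every sample is what makes this an augmentation rather than a fixed preprocessing step.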
A strategy for the visual recognition of objects in an industrial environment.
This thesis is concerned with the problem of recognizing industrial
objects rapidly and flexibly. The system design is based on a
general strategy that consists of a generalized local feature detector,
an extended learning algorithm and the use of unique structure of
the objects. Thus, the system is not designed to be limited to the
industrial environment.
The generalized local feature detector uses the gradient image of
the scene to provide a feature description that is insensitive to a
range of imaging conditions such as object position, and overall light
intensity. The feature detector is based on a representative point
algorithm which is able to reduce the data content of the image
without restricting the allowed object geometry. Thus, a major advantage
of the local feature detector is its ability to describe and
represent complex object structure. The reliance on local features
also allows the system to recognize partially visible objects.
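The abstract does not say how the gradient image yields insensitivity to overall light intensity, so the following is only a sketch of one common reason it can. A constant brightness offset cancels in the pixel differences, and a global brightness factor cancels once the gradient magnitude is divided by its peak. The central-difference operator, the peak normalisation, and the names `gradient_magnitude` and `normalised` are assumptions for illustration, not the thesis's detector.

```python
import math

def gradient_magnitude(image):
    """Central-difference gradient magnitude on interior pixels of a 2D list."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (image[y][x + 1] - image[y][x - 1]) / 2
            gy = (image[y + 1][x] - image[y - 1][x]) / 2
            out[y][x] = math.hypot(gx, gy)
    return out

def normalised(grad):
    """Divide by the peak value so a global intensity factor cancels out."""
    peak = max(max(row) for row in grad)
    if peak == 0:
        return grad
    return [[v / peak for v in row] for row in grad]

# Toy scene: a bright square on a dark background.
image = [[100 if 2 <= x <= 4 and 2 <= y <= 4 else 10 for x in range(7)]
         for y in range(7)]
dimmed = [[0.5 * v for v in row] for row in image]    # half the light level
shifted = [[v + 20 for v in row] for row in image]    # constant offset

full = normalised(gradient_magnitude(image))
half = normalised(gradient_magnitude(dimmed))
max_diff = max(abs(a - b) for ra, rb in zip(full, half) for a, b in zip(ra, rb))
offset_same = gradient_magnitude(shifted) == gradient_magnitude(image)
```

Here `max_diff` stays at rounding-error level and `offset_same` is true, which is the sense in which a gradient-based description can ignore the absolute light level.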
The task of the learning algorithm is to observe the feature
description generated by the feature detector in order to select
features that are reliable over the range of imaging conditions of
interest. Once a set of reliable features is found for each object,
the system finds unique relational structure which is later used to
recognize the objects. Unique structure is a set of descriptions of
unique subparts of the objects of interest. The present implementation
is limited to the use of unique local structure. The recognition
routine uses these unique descriptions to recognize objects in new
images. An important feature of this strategy is the transference of
a large amount of processing required for graph matching from the
recognition stage to the learning stage, which allows the recognition
routine to execute rapidly.
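The split between a slow learning stage and a fast recognition stage can be sketched as follows. This is a deliberate simplification: features are reduced to plain labels, whereas the thesis works with relational structure and graph matching, and the names `reliable_features`, `unique_features`, `recognise`, the 0.8 threshold, and the toy objects are all hypothetical.

```python
from collections import Counter

def reliable_features(observations, min_fraction=0.8):
    """Features detected in at least min_fraction of the training views."""
    counts = Counter(f for view in observations for f in set(view))
    needed = min_fraction * len(observations)
    return {f for f, n in counts.items() if n >= needed}

def unique_features(reliable_by_object):
    """Per object, keep the reliable features that no other object shares."""
    unique = {}
    for name, feats in reliable_by_object.items():
        others = set().union(
            *(f for n, f in reliable_by_object.items() if n != name)
        )
        unique[name] = feats - others
    return unique

def recognise(detected, unique):
    """Objects with at least one unique feature among the detected ones."""
    return sorted(n for n, feats in unique.items() if feats & set(detected))

# Learning stage: several views per object under varying conditions.
training = {
    "bracket": [{"corner_a", "hole", "notch"}, {"corner_a", "hole"},
                {"corner_a", "hole", "glint"}],
    "gear": [{"tooth", "hole"}, {"tooth", "hole"}, {"tooth", "hole", "glint"}],
}
reliable = {name: reliable_features(views) for name, views in training.items()}
unique = unique_features(reliable)

# Recognition stage: a partially visible gear plus a spurious highlight.
found = recognise({"tooth", "glint"}, unique)
```

All cross-object comparison happens once in `unique_features`; at run time `recognise` only intersects small sets, which mirrors the stated goal of moving expensive matching out of the recognition stage. Because a single unique feature is enough, a partially visible object can still be found.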
The test results show that the system is able to function with a
significant level of insensitivity to operating conditions. The system
tolerates violations of its 3 main assumptions (constant scale, constant
lighting, and 2D images), displaying a degree of graceful degradation
when the operating conditions degrade. For example, for one
set of test objects, the recognition threshold was reached when the
absolute light level was reduced by 70%-80%, or the object scale was
reduced by 30%-40%, or the object was tilted away from the learned 2D
plane by 30°-40°. This demonstrates a very important feature of the
learning strategy: It shows that the generalizations made by the system
are not only valid within the domain of the sampled set of images,
but extend outside this domain. The test results also show that the
recognition routine is able to execute rapidly, requiring 10ms-500ms
(on a PDP11/24 minicomputer) in the special case when ideal operating
conditions are guaranteed. (Note: This does not include pre-processing
time). This thesis describes the strategy, the architecture and the
implementation of the vision system in detail, and gives detailed test
results. A proposal for extending the system to scale-independent 3D
object recognition is also given.
Data comparison schemes for Pattern Recognition in Digital Images using Fractals
Pattern recognition in digital images is a common problem with application in
remote sensing, electron microscopy, medical imaging, seismic imaging and
astrophysics for example. Although this subject has been researched for over
twenty years there is still no general solution which can be compared with the
human cognitive system in which a pattern can be recognised subject to
arbitrary orientation and scale.
The application of Artificial Neural Networks can in principle provide a very
general solution, provided that suitable training schemes are implemented.
However, this approach raises some major issues in practice. First, the CPU
time required to train an ANN for a grey level or colour image can be very
large especially if the object has a complex structure with no clear geometrical
features such as those that arise in remote sensing applications. Secondly,
the core and file space memory required to represent large images and
their associated data tasks leads to a number of problems in which the use of
virtual memory is paramount.
The primary goal of this research has been to assess methods of image data
compression for pattern recognition using a range of different compression
methods. In particular, this research has resulted in the design and
implementation of a new algorithm for general pattern recognition based on
the use of fractal image compression.
This approach has for the first time allowed the pattern recognition problem to
be solved in a way that is invariant to rotation and scale. It allows both ANNs
and correlation to be used subject to appropriate pre- and post-processing
techniques for digital image processing, an aspect for which a dedicated
programmer's workbench has been developed using X-Designer.
Grounding semantics in robots for Visual Question Answering
In this thesis I describe an operational implementation of an object detection and description system that is incorporated into an end-to-end Visual Question Answering system, and I evaluate it on two visual question answering datasets for compositional language and elementary visual reasoning.