
    Bridging Between Computer and Robot Vision Through Data Augmentation: A Case Study on Object Recognition

    Despite the impressive progress brought by deep networks in visual object recognition, robot vision is still far from being a solved problem. The most successful convolutional architectures are developed starting from ImageNet, a large-scale collection of images of object categories downloaded from the Web. Images of this kind are very different from the situated and embodied visual experience of robots deployed in unconstrained settings. To reduce the gap between these two visual experiences, this paper proposes a simple yet effective data augmentation layer that zooms on the object of interest and simulates the object detection outcome of a robot vision system. The layer, which can be used with any convolutional deep architecture, yields an increase in object recognition performance of up to 7% in experiments performed over three different benchmark databases. An implementation of our robot data augmentation layer has been made publicly available.
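The abstract does not spell out how the layer works, so the following is only a minimal sketch of the general idea, assuming the augmentation amounts to cropping a randomly shifted and rescaled window around a known object bounding box. The function name `zoom_on_object` and the parameters `max_shift` and `max_scale` are hypothetical and are not taken from the paper's released implementation.

```python
import numpy as np

def zoom_on_object(image, box, rng, max_shift=0.1, max_scale=0.2):
    """Crop a jittered window around box = (x0, y0, x1, y1), in pixels.

    Hypothetical illustration: the jitter stands in for the imperfect
    localisation a robot's object detector might produce.
    """
    h, w = image.shape[:2]
    x0, y0, x1, y1 = box
    bw, bh = x1 - x0, y1 - y0

    # Shift the box centre by a random fraction of the box size.
    cx = (x0 + x1) / 2 + rng.uniform(-max_shift, max_shift) * bw
    cy = (y0 + y1) / 2 + rng.uniform(-max_shift, max_shift) * bh

    # Grow or shrink the box by a random factor.
    scale = 1 + rng.uniform(-max_scale, max_scale)
    half_w, half_h = scale * bw / 2, scale * bh / 2

    # Clip the window to the image and keep at least one pixel per axis.
    left = int(np.clip(round(cx - half_w), 0, w - 1))
    top = int(np.clip(round(cy - half_h), 0, h - 1))
    right = int(np.clip(round(cx + half_w), left + 1, w))
    bottom = int(np.clip(round(cy + half_h), top + 1, h))
    return image[top:bottom, left:right]

# Usage: one augmented crop from a synthetic 100x120 RGB image.
rng = np.random.default_rng(0)
image = np.arange(100 * 120 * 3).reshape(100, 120, 3)
crop = zoom_on_object(image, (30, 20, 90, 80), rng)
```

In a training pipeline the crop would then be resized to the network's input resolution; that step is omitted here to keep the sketch dependency-free.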

    A strategy for the visual recognition of objects in an industrial environment.

    This thesis is concerned with the problem of recognizing industrial objects rapidly and flexibly. The system design is based on a general strategy that consists of a generalized local feature detector, an extended learning algorithm and the use of unique structure of the objects. Thus, the system is not designed to be limited to the industrial environment. The generalized local feature detector uses the gradient image of the scene to provide a feature description that is insensitive to a range of imaging conditions such as object position, and overall light intensity. The feature detector is based on a representative point algorithm which is able to reduce the data content of the image without restricting the allowed object geometry. Thus, a major advantage of the local feature detector is its ability to describe and represent complex object structure. The reliance on local features also allows the system to recognize partially visible objects. The task of the learning algorithm is to observe the feature description generated by the feature detector in order to select features that are reliable over the range of imaging conditions of interest. Once a set of reliable features is found for each object, the system finds unique relational structure which is later used to recognize the objects. Unique structure is a set of descriptions of unique subparts of the objects of interest. The present implementation is limited to the use of unique local structure. The recognition routine uses these unique descriptions to recognize objects in new images. An important feature of this strategy is the transference of a large amount of processing required for graph matching from the recognition stage to the learning stage, which allows the recognition routine to execute rapidly. 
The test results show that the system is able to function with a significant level of insensitivity to operating conditions. The system shows insensitivity to its 3 main assumptions (constant scale, constant lighting, and 2D images), displaying a degree of graceful degradation when the operating conditions degrade. For example, for one set of test objects, the recognition threshold was reached when the absolute light level was reduced by 70%-80%, or the object scale was reduced by 30%-40%, or the object was tilted away from the learned 2D plane by 30°-40°. This demonstrates a very important feature of the learning strategy: the generalizations made by the system are not only valid within the domain of the sampled set of images, but extend outside this domain. The test results also show that the recognition routine is able to execute rapidly, requiring 10ms-500ms (on a PDP11/24 minicomputer) in the special case when ideal operating conditions are guaranteed. (Note: This does not include pre-processing time.) This thesis describes the strategy, the architecture and the implementation of the vision system in detail, and gives detailed test results. A proposal for extending the system to scale-independent 3D object recognition is also given.
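As a general illustration of why a gradient-based description can tolerate changes in overall light intensity, the sketch below checks two standard properties on a synthetic image: adding a constant offset leaves the gradient unchanged, and multiplying the intensity by a positive constant leaves the gradient direction unchanged. This is an idealised example with no sensor noise or quantisation, and it is not the thesis's feature detector, whose details are not given in the abstract. The helper name `gradient_direction` is hypothetical.

```python
import numpy as np

def gradient_direction(image):
    """Return unit gradient vectors (gx, gy) at every pixel.

    Assumes the gradient is non-zero everywhere, which holds for the
    random test image used below.
    """
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    return np.stack([gx / magnitude, gy / magnitude])

rng = np.random.default_rng(0)
scene = rng.uniform(50, 200, size=(32, 32))

dimmed = 0.3 * scene   # overall light level reduced by 70%
shifted = scene + 25   # constant brightness offset

# Scaling the intensity rescales gradient magnitudes but not directions.
same_direction = np.allclose(gradient_direction(scene),
                             gradient_direction(dimmed))

# A constant offset does not change the gradient at all.
same_gradient = np.allclose(np.gradient(scene), np.gradient(shifted))
```

With a real camera, noise and quantisation grow in relative terms as the light level falls, which is consistent with the recognition threshold eventually being reached in the reported tests.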

    Data comparison schemes for Pattern Recognition in Digital Images using Fractals

    Pattern recognition in digital images is a common problem with applications in, for example, remote sensing, electron microscopy, medical imaging, seismic imaging and astrophysics. Although this subject has been researched for over twenty years, there is still no general solution comparable to the human cognitive system, in which a pattern can be recognised subject to arbitrary orientation and scale. The application of Artificial Neural Networks (ANNs) can in principle provide a very general solution, provided suitable training schemes are implemented. However, this approach raises some major issues in practice. First, the CPU time required to train an ANN for a grey level or colour image can be very large, especially if the object has a complex structure with no clear geometrical features, such as those that arise in remote sensing applications. Secondly, both the core and file space memory required to represent large images and their associated data tasks lead to a number of problems in which the use of virtual memory is paramount. The primary goal of this research has been to assess methods of image data compression for pattern recognition using a range of different compression methods. In particular, this research has resulted in the design and implementation of a new algorithm for general pattern recognition based on the use of fractal image compression. This approach has for the first time allowed the pattern recognition problem to be solved in a way that is invariant to rotation and scale. It allows both ANNs and correlation to be used subject to appropriate pre- and post-processing techniques for digital image processing, an aspect for which a dedicated programmer's work bench has been developed using X-Designer.

    Grounding semantics in robots for Visual Question Answering

    In this thesis I describe an operational implementation of an object detection and description system that is incorporated into an end-to-end Visual Question Answering system, and I evaluate it on two visual question answering datasets for compositional language and elementary visual reasoning.