Control issues in high level vision.

Abstract

Vision entails complex processes to sense, interpret and reason about the external world. In a dynamic environment these processes must be regulated by flexible and reliable control mechanisms. This thesis is concerned with aspects of control in high-level vision, a research area that has only recently received adequate attention. Classification criteria such as scope of application, knowledge representation, control structure and communication have been chosen as a basis for comparing existing vision systems.

Control problems have recently become a topic of great interest as a result of the active vision paradigm. Its proponents suggest that robust solutions to vision problems arise when sensing and analysis are controlled (i.e. purposively adjusted) to exploit both the data and the available knowledge (temporal context). The work reported in this thesis follows the basic tenets of active vision and addresses the control of sensor gaze, scene interpretation and visual strategy monitoring.

Control of the visual sensor is an important aspect of active vision. A vision system must be able to establish its orientation with respect to a partially known environment and must have control strategies for selecting targets to view. In this thesis, algorithms have been implemented for establishing the pose of the vision system relative to prestored environment landmarks and for directing gaze to points defined by objects in an established scene model. Particular emphasis has been placed on accounting for and propagating estimation errors arising from both measured image data and inaccurate stored scene knowledge. To minimise the effect of such errors, a hierarchical scene model has been adopted in which contextually related objects are grouped together.
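The abstract does not give the gaze-control or error-propagation equations. The following Python sketch is therefore only an illustration under stated assumptions, not the thesis's algorithm. It assumes a sensor pose given as a position plus a heading angle. It places a target object at a stored offset from an observed local landmark, adding the two variances as for independent errors. It then computes pan and tilt angles towards the target, together with a first-order (linearised) variance for the pan angle. The names `ScenePoint`, `SensorPose`, `locate_relative` and `gaze_to_point`, and the example coordinates, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ScenePoint:
    """Hypothetical scene point (metres) with independent x/y variances (m^2)."""
    x: float
    y: float
    z: float
    var_x: float = 0.0
    var_y: float = 0.0


@dataclass
class SensorPose:
    """Hypothetical sensor pose: position (m), heading (rad), heading variance (rad^2)."""
    x: float
    y: float
    z: float
    heading: float
    var_heading: float = 0.0


def locate_relative(landmark: ScenePoint, dx: float, dy: float, dz: float,
                    var_offset: float) -> ScenePoint:
    """Place an object at a stored offset from an observed local landmark.

    Assuming independent errors, the variance of the landmark estimate
    (from image data) and that of the stored offset (scene knowledge) add.
    """
    return ScenePoint(landmark.x + dx, landmark.y + dy, landmark.z + dz,
                      landmark.var_x + var_offset, landmark.var_y + var_offset)


def gaze_to_point(sensor: SensorPose, target: ScenePoint) -> tuple[float, float, float]:
    """Return (pan, tilt, var_pan) for directing the sensor at `target`.

    var_pan is a first-order estimate that ignores sensor-position error.
    """
    dx, dy, dz = target.x - sensor.x, target.y - sensor.y, target.z - sensor.z
    r2 = dx * dx + dy * dy  # squared horizontal range
    if r2 == 0.0:
        raise ValueError("pan is undefined for a target directly above or below the sensor")
    raw = math.atan2(dy, dx) - sensor.heading
    pan = math.atan2(math.sin(raw), math.cos(raw))  # wrap into [-pi, pi]
    tilt = math.atan2(dz, math.sqrt(r2))
    # Partial derivatives of pan w.r.t. (target.x, target.y, heading): (-dy/r2, dx/r2, -1).
    var_pan = (dy * dy * target.var_x + dx * dx * target.var_y) / (r2 * r2) + sensor.var_heading
    return pan, tilt, var_pan


if __name__ == "__main__":
    door = ScenePoint(4.0, 2.0, 1.5, var_x=0.005, var_y=0.005)  # hypothetical observed landmark
    handle = locate_relative(door, 0.0, 1.0, 0.0, var_offset=0.005)  # hypothetical stored offset
    pan, tilt, var_pan = gaze_to_point(SensorPose(0.0, 0.0, 1.5, 0.0, 1e-4), handle)
    print(f"pan={pan:.4f} rad, tilt={tilt:.4f} rad, sd(pan)={math.sqrt(var_pan):.4f} rad")
```

Linearised propagation of this kind is a common choice because it reduces to a few partial derivatives. Larger angular errors, or correlated errors between landmark and offset, would call for a full covariance treatment.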
Object positions are described relative to locally determined landmarks, which keeps errors within tolerable bounds.

The scene interpretation module takes image descriptions in terms of low-level features and produces a symbolic description of the scene in terms of known object classes and their attributes. The scene model is constructed incrementally by several knowledge sources, each independently controlled by a separate module. The scene interpreter is carefully structured and operates in a perception loop controlled by high-level commands from the system supervisor module. Each scene interpreter module is locally controlled and is instructed which visual task to perform, where to look in the scene and which subset of the data to use. Processing within a module takes the existing partial scene interpretation into account. These mechanisms embody the concepts of spatial focus of attention and exploitation of temporal context. Robust scene interpretation is achieved by integrating interpretations over time.

Visual strategy monitoring is handled at the system supervisor level. The supervisor takes a user-given task and decides on the best strategy for satisfying it. This may involve interrogating existing knowledge or initiating new data collection and analysis. When new analysis is required, the supervisor expresses the task as a set of achievable visual tasks, which are then encoded into a control word and passed to the scene interpreter. The supervisor's vocabulary includes tasks such as general scene exploration, finding a specific object, monitoring a specified object, and describing the attributes of single objects or the relationships between two or more objects.
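The format of the control word is not described here. As a hypothetical illustration, the Python sketch below packs three fields into a single integer and unpacks it again:

* one visual task from the supervisor's vocabulary;
* a region-of-interest index (where to look);
* a mask of low-level feature types (which subset of data to use).

The task codes, feature types, field widths and the one-task-per-word layout are all assumptions. Under this layout, a set of visual tasks would be a sequence of such words.

```python
from enum import IntEnum, IntFlag


class VisualTask(IntEnum):
    """Hypothetical task codes mirroring the supervisor vocabulary in the text."""
    EXPLORE_SCENE = 0
    FIND_OBJECT = 1
    MONITOR_OBJECT = 2
    DESCRIBE_ATTRIBUTES = 3
    DESCRIBE_RELATIONS = 4


class DataSubset(IntFlag):
    """Hypothetical low-level feature types a module may be told to use."""
    LINES = 1
    REGIONS = 2
    CORNERS = 4


# Hypothetical layout: bits 0-3 task, bits 4-11 region-of-interest id, bits 12-14 data subset.
TASK_BITS, ROI_BITS, DATA_BITS = 4, 8, 3
ROI_SHIFT = TASK_BITS
DATA_SHIFT = TASK_BITS + ROI_BITS


def encode(task: VisualTask, roi: int, data: DataSubset) -> int:
    """Pack one visual task, a region-of-interest index and a data subset into an int."""
    if not 0 <= roi < (1 << ROI_BITS):
        raise ValueError(f"roi must fit in {ROI_BITS} bits")
    return int(task) | (roi << ROI_SHIFT) | (int(data) << DATA_SHIFT)


def decode(word: int) -> tuple[VisualTask, int, DataSubset]:
    """Unpack a control word produced by encode()."""
    task = VisualTask(word & ((1 << TASK_BITS) - 1))
    roi = (word >> ROI_SHIFT) & ((1 << ROI_BITS) - 1)
    data = DataSubset((word >> DATA_SHIFT) & ((1 << DATA_BITS) - 1))
    return task, roi, data


if __name__ == "__main__":
    # Hypothetical request: find an object in region 12 using line and corner features.
    word = encode(VisualTask.FIND_OBJECT, 12, DataSubset.LINES | DataSubset.CORNERS)
    print(word, decode(word))
```

Packing the fields into fixed bit ranges keeps the supervisor-to-interpreter interface down to a single integer. The cost is fixed limits, such as 16 task codes and 256 regions in this layout.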
The supervisor must schedule sub-tasks so as to achieve a good solution to the given problem.

A considerable number of experiments using real and synthetic data, run with the current implementation (written in C and in the rule-based system CLIPS), demonstrate the advantages of the proposed approach.
