2,687 research outputs found
An attention model and its application in man-made scene interpretation
The ultimate aim of research into computer vision is designing a system which interprets
its surrounding environment in a similar way the human can do effortlessly. However, the
state of technology is far from achieving such a goal. In this thesis different components of
a computer vision system that are designed for the task of interpreting man-made scenes,
in particular images of buildings, are described. The flow of information in the proposed
system is bottom-up i.e., the image is first segmented into its meaningful components and
subsequently the regions are labelled using a contextual classifier.
Starting from simple observations concerning the human vision system and the gestalt laws
of human perception, like the law of “good (simple) shape” and “perceptual grouping”, a
blob detector is developed, that identifies components in a 2D image. These components
are convex regions of interest, with interest being defined as significant gradient magnitude
content. An eye tracking experiment is conducted, which shows that the regions identified
by the blob detector, correlate significantly with the regions which drive the attention of
viewers.
Having identified these blobs, it is postulated that a blob represents an object, linguistically
identified with its own semantic name. In other words, a blob may contain a window a
door or a chimney in a building. These regions are used to identify and segment higher
order structures in a building, like facade, window array and also environmental regions
like sky and ground.
Because of inconsistency in the unary features of buildings, a contextual learning algorithm
is used to classify the segmented regions. A model which learns spatial and topological
relationships between different objects from a set of hand-labelled data, is used. This
model utilises this information in a MRF to achieve consistent labellings of new scenes
- …