291 research outputs found
Invariance of visual operations at the level of receptive fields
Receptive field profiles registered by cell recordings have shown that
mammalian vision has developed receptive fields tuned to different sizes and
orientations in the image domain as well as to different image velocities in
space-time. This article presents a theoretical model by which families of
idealized receptive field profiles can be derived mathematically from a small
set of basic assumptions that correspond to structural properties of the
environment. The article also presents a theory for how basic invariance
properties to variations in scale, viewing direction and relative motion can be
obtained from the output of such receptive fields, using complementary
selection mechanisms that operate over the output of families of receptive
fields tuned to different parameters. Thereby, the theory shows how basic
invariance properties of a visual system can be obtained already at the level
of receptive fields, and we can explain the different shapes of receptive field
profiles found in biological vision from a requirement that the visual system
should be invariant to the natural types of image transformations that occur in
its environment.Comment: 40 pages, 17 figure
Affine Subspace Representation for Feature Description
This paper proposes a novel Affine Subspace Representation (ASR) descriptor
to deal with affine distortions induced by viewpoint changes. Unlike the
traditional local descriptors such as SIFT, ASR inherently encodes local
information of multi-view patches, making it robust to affine distortions while
maintaining a high discriminative ability. To this end, PCA is used to
represent affine-warped patches as PCA-patch vectors for its compactness and
efficiency. Then according to the subspace assumption, which implies that the
PCA-patch vectors of various affine-warped patches of the same keypoint can be
represented by a low-dimensional linear subspace, the ASR descriptor is
obtained by using a simple subspace-to-point mapping. Such a linear subspace
representation could accurately capture the underlying information of a
keypoint (local structure) under multiple views without sacrificing its
distinctiveness. To accelerate the computation of ASR descriptor, a fast
approximate algorithm is proposed by moving the most computational part (ie,
warp patch under various affine transformations) to an offline training stage.
Experimental results show that ASR is not only better than the state-of-the-art
descriptors under various image transformations, but also performs well without
a dedicated affine invariant detector when dealing with viewpoint changes.Comment: To Appear in the 2014 European Conference on Computer Visio
Covariance properties under natural image transformations for the generalized Gaussian derivative model for visual receptive fields
This paper presents a theory for how geometric image transformations can be
handled by a first layer of linear receptive fields, in terms of true
covariance properties, which, in turn, enable geometric invariance properties
at higher levels in the visual hierarchy. Specifically, we develop this theory
for a generalized Gaussian derivative model for visual receptive fields, which
is derived in an axiomatic manner from first principles, that reflect symmetry
properties of the environment, complemented by structural assumptions to
guarantee internally consistent treatment of image structures over multiple
spatio-temporal scales.
It is shown how the studied generalized Gaussian derivative model for visual
receptive fields obeys true covariance properties under spatial scaling
transformations, spatial affine transformations, Galilean transformations and
temporal scaling transformations, implying that a vision system, based on image
and video measurements in terms of the receptive fields according to this
model, can to first order of approximation handle the image and video
deformations between multiple views of objects delimited by smooth surfaces, as
well as between multiple views of spatio-temporal events, under varying
relative motions between the objects and events in the world and the observer.
We conclude by describing implications of the presented theory for biological
vision, regarding connections between the variabilities of the shapes of
biological visual receptive fields and the variabilities of spatial and
spatio-temporal image structures under natural image transformations.Comment: 38 pages, 14 figure
Object Tracking
Object tracking consists in estimation of trajectory of moving objects in the sequence of images. Automation of the computer object tracking is a difficult task. Dynamics of multiple parameters changes representing features and motion of the objects, and temporary partial or full occlusion of the tracked objects have to be considered. This monograph presents the development of object tracking algorithms, methods and systems. Both, state of the art of object tracking methods and also the new trends in research are described in this book. Fourteen chapters are split into two sections. Section 1 presents new theoretical ideas whereas Section 2 presents real-life applications. Despite the variety of topics contained in this monograph it constitutes a consisted knowledge in the field of computer object tracking. The intention of editor was to follow up the very quick progress in the developing of methods as well as extension of the application
Evaluation of Local Feature Detectors for the Comparison of Thermal and Visual Low Altitude Aerial Images
Local features are key regions of an image suitable for applications such as image matching, and fusion. Detection of targets under varying atmospheric conditions, via aerial images is a typical defence application where multi spectral correlation is essential. Focuses on local features for the comparison of thermal and visual aerial images in this study. The state of the art differential and intensity comparison based features are evaluated over the dataset. An improved affine invariant feature is proposed with a new saliency measure. The performances of the existing and the proposed features are measured with a ground truth transformation estimated for each of the image pairs. Among the state of the art local features, Speeded Up Robust Feature exhibited the highest average repeatability of 57 per cent. The proposed detector produces features with average repeatability of 64 per cent. Future works include design of techniques for retrieval of corresponding regions
Registration and categorization of camera captured documents
Camera captured document image analysis concerns with processing of documents captured with hand-held sensors, smart phones, or other capturing devices using advanced image processing, computer vision, pattern recognition, and machine learning techniques. As there is no constrained capturing in the real world, the captured documents suffer from illumination variation, viewpoint variation, highly variable scale/resolution, background clutter, occlusion, and non-rigid deformations e.g., folds and crumples. Document registration is a problem where the image of a template document whose layout is known is registered with a test document image. Literature in camera captured document mosaicing addressed the registration of captured documents with the assumption of considerable amount of single chunk overlapping content. These methods cannot be directly applied to registration of forms, bills, and other commercial documents where the fixed content is distributed into tiny portions across the document. On the other hand, most of the existing document image registration methods work with scanned documents under affine transformation. Literature in document image retrieval addressed categorization of documents based on text, figures, etc.
However, the scalability of existing document categorization methodologies based on logo identification is very limited. This dissertation focuses on two problems (i) registration of captured documents where the overlapping content is distributed into tiny portions across the documents and (ii) categorization of captured documents into predefined logo classes that scale to large datasets using local invariant features. A novel methodology is proposed for the registration of user defined Regions Of Interest (ROI) using corresponding local features from their neighborhood. The methodology enhances prior approaches in point pattern based registration, like RANdom SAmple Consensus (RANSAC) and Thin Plate Spline-Robust Point Matching (TPS-RPM), to enable registration of cell phone and camera captured documents under non-rigid transformations. Three novel aspects are embedded into the methodology: (i) histogram based uniformly transformed correspondence estimation, (ii) clustering of points located near the ROI to select only close by regions for matching, and (iii) validation of the registration in RANSAC and TPS-RPM algorithms. Experimental results on a dataset of 480 images captured using iPhone 3GS and Logitech webcam Pro 9000 have shown an average registration accuracy of 92.75% using Scale Invariant Feature Transform (SIFT).
Robust local features for logo identification are determined empirically by comparisons among SIFT, Speeded-Up Robust Features (SURF), Hessian-Affine, Harris-Affine, and Maximally Stable Extremal Regions (MSER). Two different matching methods are presented for categorization: matching all features extracted from the query document as a single set and a segment-wise matching of query document features using segmentation achieved by grouping area under intersecting dense local affine covariant regions. The later approach not only gives an approximate location of predicted logo classes in the query document but also helps to increase the prediction accuracies. In order to facilitate scalability to large data sets, inverted indexing of logo class features has been incorporated in both approaches. Experimental results on a dataset of real camera captured documents have shown a peak 13.25% increase in the Fâmeasure accuracy using the later approach as compared to the former
- âŠ