
    Robust automatic target tracking based on a Bayesian ego-motion compensation framework for airborne FLIR imagery

    Automatic target tracking in airborne FLIR imagery remains a challenge because of camera ego-motion. This phenomenon distorts the spatio-temporal correlation of the video sequence, which dramatically reduces tracking performance. Several works address this problem with ego-motion compensation strategies that compensate the camera motion deterministically, assuming a specific model of geometric transformation. However, in real sequences no single geometric transformation can accurately describe the camera ego-motion for the whole sequence, and as a consequence the performance of the tracking stage can decrease significantly, or the tracker can fail altogether. The optimum transformation for each pair of consecutive frames depends on the relative depth of the elements that compose the scene and on their degree of texturization. In this work, a novel Particle Filter framework is proposed to efficiently manage several hypotheses of geometric transformation: Euclidean, affine, and projective. Each type of transformation is used to compute candidate locations of the object in the current frame, and each candidate is then evaluated by the measurement model of the Particle Filter using appearance information. This approach adapts to different camera ego-motion conditions and thus performs the tracking satisfactorily. The proposed strategy has been tested on the AMCOM FLIR dataset, showing high efficiency in tracking different types of targets under real working conditions.
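
The multiple-hypothesis idea can be sketched compactly. The following Python fragment is an illustrative toy, not the paper's implementation: the particle propagation, the per-type noise scales, and the NCC measurement model are all assumptions standing in for the paper's transformation-specific prediction and appearance models.

```python
import numpy as np

rng = np.random.default_rng(0)
TRANSFORM_TYPES = ("euclidean", "affine", "projective")

def propagate(loc, ttype, rng):
    """Predict a candidate object location under one transformation
    hypothesis. Real code would warp by an estimated Euclidean/affine/
    projective transform; here each type just gets its own noise scale."""
    jitter = {"euclidean": 1.0, "affine": 2.0, "projective": 3.0}[ttype]
    return loc + rng.normal(0.0, jitter, size=2)

def appearance_score(frame, template, loc):
    """Placeholder measurement model: normalized cross-correlation
    between the template and the patch at the candidate location."""
    h, w = template.shape
    y, x = int(round(loc[0])), int(round(loc[1]))
    patch = frame[y:y + h, x:x + w]
    if patch.shape != template.shape:      # candidate fell off the frame
        return 1e-6
    p, t = patch - patch.mean(), template - template.mean()
    ncc = np.dot(p.ravel(), t.ravel()) / (np.linalg.norm(p) * np.linalg.norm(t) + 1e-9)
    return max(float(ncc), 1e-6)

def pf_step(frame, template, particles, rng):
    """One predict-update cycle: each particle samples a transformation-
    type hypothesis, moves accordingly, and is weighted by appearance."""
    moved = [propagate(loc, rng.choice(TRANSFORM_TYPES), rng) for loc in particles]
    weights = np.array([appearance_score(frame, template, loc) for loc in moved])
    weights /= weights.sum()
    idx = rng.choice(len(moved), size=len(moved), p=weights)   # resample
    return [moved[i] for i in idx], weights

frame = rng.normal(size=(120, 160))
template = frame[40:56, 60:76].copy()              # pretend target appearance
particles = [np.array([40.0, 60.0]) + rng.normal(0, 2, 2) for _ in range(100)]
particles, w = pf_step(frame, template, particles, rng)
print(len(particles), float(w.max()))
```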

    The computational magic of the ventral stream

    I argue that the sample complexity of (biological, feedforward) object recognition is mostly due to geometric image transformations and conjecture that a main goal of the ventral stream – V1, V2, V4 and IT – is to learn-and-discount image transformations.

In the first part of the paper I describe a class of simple and biologically plausible memory-based modules that learn transformations from unsupervised visual experience. The main theorems show that these modules provide (for every object) a signature which is invariant to local affine transformations and approximately invariant to other transformations. I also prove that, in a broad class of hierarchical architectures, signatures remain invariant from layer to layer. The identification of these memory-based modules with complex (and simple) cells in visual areas leads to a theory of invariant recognition for the ventral stream.
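
A minimal sketch of the memory-based module idea, under simplifying assumptions not in the paper: the group is 1-D cyclic translation, the stored template and input are random vectors, and the pooling is a histogram of dot products. Invariance then holds exactly, because shifting the input only permutes the set of dot products with the template's orbit.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 32
template = rng.normal(size=d)     # one stored template t_k
image = rng.normal(size=d)        # input signal I

def signature(x, t):
    """Pool the dot products of x with every cyclic shift of the template.
    The histogram over the (cyclic translation) group orbit is a signature
    invariant to cyclic shifts of x: shifting x only permutes the dots."""
    dots = [np.dot(x, np.roll(t, g)) for g in range(len(t))]
    hist, _ = np.histogram(dots, bins=10, range=(-12, 12))
    return hist

shifted = np.roll(image, 7)       # a transformed version of the input
assert np.array_equal(signature(image, template),
                      signature(shifted, template))
print("signature unchanged under translation")
```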

In the second part, I outline a theory of hierarchical architectures that can learn invariance to transformations. I show that the memory complexity of learning affine transformations is drastically reduced in a hierarchical architecture that factorizes transformations in terms of the subgroup of translations and the subgroups of rotations and scalings. I then show how translations are automatically selected as the only learnable transformations during development by enforcing small apertures – e.g., small receptive fields – in the first layer.
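
A back-of-envelope count illustrates the storage argument; the grid sizes below are invented for illustration, not taken from the paper.

```python
# T translations, R rotations, S scalings (grid sizes invented for
# illustration): memorizing every affine combination in one layer vs.
# factorizing into translations (layer 1) and rotations/scalings (layer 2).
T, R, S = 1024, 64, 16
flat = T * R * S            # 1,048,576 stored templates
factorized = T + R * S      # 2,048 stored templates
print(flat, factorized)
```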

In the third part I show that the transformations represented in each area can be optimized in terms of storage and robustness, and that this optimization determines the tuning of the neurons in the area, largely independently (under normal conditions) of the statistics of natural images. I describe a model of learning that can be proved to have this property, linking in an elegant way the spectral properties of the signatures with the tuning of receptive fields in different areas. A surprising implication of these theoretical results is that the computational goals and some of the tuning properties of cells in the ventral stream may follow from symmetry properties (in the sense of physics) of the visual world, through a process of unsupervised correlational learning based on Hebbian synapses. In particular, simple and complex cells do not directly care about oriented bars: their tuning is a side effect of their role in translation invariance. Across the whole ventral stream, the preferred features reported for neurons in different areas are only a symptom of the invariances computed and represented.
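
The spectral link can be illustrated with a toy computation, under assumptions of my own choosing: inputs are all cyclic translates of a random template, so their covariance is circulant and its eigenvectors are Fourier modes. Since Oja-type Hebbian learning converges to the top eigenvector of the input covariance, the learned tuning is oscillatory regardless of the particular template, echoing the claim that tuning follows from the symmetry of the inputs rather than from image statistics.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 64
t = rng.normal(size=d)                             # arbitrary template

X = np.stack([np.roll(t, g) for g in range(d)])    # orbit under translation
C = X.T @ X / d                                    # circulant covariance
eigvals, eigvecs = np.linalg.eigh(C)
top = eigvecs[:, -1]                               # what Oja's rule learns

# A circulant matrix is diagonalized by the DFT, so the top eigenvector is
# a pure sinusoid: its spectrum concentrates on a single frequency bin.
spectrum = np.abs(np.fft.rfft(top))
print("dominant frequency bin:", int(np.argmax(spectrum)))
print("energy in that bin:", float(spectrum.max()**2 / (spectrum**2).sum()))
```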

The results of each of the three parts stand on their own. Together, this theory-in-fieri makes several broad predictions, some of which are:

-invariance to small transformations in early areas (e.g., translations in V1) may underlie the stability of visual perception (suggested by Stu Geman);

-each cell’s tuning properties are shaped by visual experience of image transformations during developmental and adult plasticity;

-simple cells are likely to be the same population as complex cells, arising from different convergence of the Hebbian learning rule. The inputs to complex cells are dendritic branches with simple-cell properties;

-class-specific transformations are learned and represented at the top of the ventral stream hierarchy; thus class-specific modules such as faces, places and possibly body areas should exist in IT;

-the types of transformations that are learned from visual experience depend on the size of the receptive fields and thus on the area (layer in the models) – assuming that the size increases with layers;

-the mix of transformations learned in each area influences the tuning properties of the cells: oriented bars in V1+V2, radial and spiral patterns in V4, up to class-specific tuning in AIT (e.g., face-tuned cells);

-features must be discriminative and invariant: invariance to transformations is the primary determinant of the tuning of cortical neurons, rather than the statistics of natural images.

The theory is broadly consistent with the current version of HMAX. It explains HMAX and extends it in terms of unsupervised learning, a broader class of transformation invariances, and higher-level modules. The goal of this paper is to sketch a comprehensive theory with little regard for mathematical niceties. If the theory turns out to be useful, there will be scope for deep mathematics, ranging from group representation tools to wavelet theory to the dynamics of learning.
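
For reference, the HMAX motif the theory builds on can be sketched in a few lines: simple (S1) cells as oriented filters, complex (C1) cells as local max pooling, which is the step that buys tolerance to small translations. The filter and pooling parameters below are arbitrary illustrative choices, not HMAX's published ones.

```python
import numpy as np
from scipy.ndimage import maximum_filter
from scipy.signal import convolve2d

def gabor(size=11, theta=0.0, lam=6.0, sigma=3.0):
    """Oriented Gabor patch used as an S1 (simple cell) filter."""
    r = np.arange(size) - size // 2
    X, Y = np.meshgrid(r, r)
    xr = X * np.cos(theta) + Y * np.sin(theta)
    return np.exp(-(X**2 + Y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam)

def s1_c1(image, thetas=(0, np.pi/4, np.pi/2, 3*np.pi/4), pool=8):
    """S1: convolve with oriented filters. C1: local max pooling and
    subsampling, which yields tolerance to small translations."""
    s1 = [np.abs(convolve2d(image, gabor(theta=th), mode="same")) for th in thetas]
    return [maximum_filter(r, size=pool)[::pool, ::pool] for r in s1]

img = np.random.default_rng(3).normal(size=(64, 64))
print([m.shape for m in s1_c1(img)])   # four 8x8 orientation maps
```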

    Object Tracking from Unstabilized Platforms by Particle Filtering with Embedded Camera Ego Motion

    Visual tracking with moving cameras is a challenging task. The global motion induced by the moving camera shifts the target object outside the search area expected from the object dynamics. The typical approach is to use a registration algorithm to compensate the camera motion. However, in situations involving several moving objects and backgrounds strongly affected by the aperture problem, image registration quality may be very low, dramatically decreasing tracking performance. In this work, a novel approach is proposed to successfully tackle tracking with moving cameras in complex situations involving several independently moving objects. The key idea is to compute several hypotheses for the camera motion instead of deterministically estimating only one. These hypotheses are combined with the object dynamics in a Particle Filter framework to predict the most probable object locations. Each hypothetical object location is then evaluated by the measurement model using a spatiogram, a region descriptor based on color and spatial distributions. Experimental results show that the proposed strategy accurately tracks an object in complex situations affected by strong ego-motion.
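
A sketch of the spatiogram descriptor may help; the binning, the covariance regularization, and the Gaussian spatial-agreement weighting below are plausible choices rather than the paper's exact formulation (and the patch is assumed grayscale in [0, 1) for brevity, whereas the paper uses color).

```python
import numpy as np

def spatiogram(patch, bins=8):
    """Second-order spatiogram: per intensity bin, the pixel fraction plus
    the spatial mean and covariance of the pixels falling in that bin."""
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    idx = np.minimum((patch.ravel() * bins).astype(int), bins - 1)
    out = []
    for b in range(bins):
        pts = coords[idx == b]
        if len(pts) < 2:                       # degenerate bin: flat stats
            out.append((len(pts) / idx.size, np.zeros(2), np.eye(2)))
        else:
            out.append((len(pts) / idx.size, pts.mean(axis=0),
                        np.cov(pts.T) + 1e-6 * np.eye(2)))
    return out

def similarity(sg1, sg2):
    """Histogram-style similarity, weighted per bin by the agreement of
    the spatial distributions (a Gaussian kernel on the mean difference)."""
    total = 0.0
    for (n1, m1, C1), (n2, m2, C2) in zip(sg1, sg2):
        d = m1 - m2
        psi = np.exp(-0.5 * d @ np.linalg.solve(C1 + C2, d))
        total += np.sqrt(n1 * n2) * psi
    return total

patch = np.random.default_rng(4).random((24, 24))
print(round(similarity(spatiogram(patch), spatiogram(patch)), 3))  # 1.0
```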

    The Computational Magic of the Ventral Stream: Towards a Theory

    I conjecture that the sample complexity of object recognition is mostly due to geometric image transformations and that a main goal of the ventral stream – V1, V2, V4 and IT – is to learn-and-discount image transformations. The most surprising implication of the theory emerging from these assumptions is that the computational goals and detailed properties of cells in the ventral stream follow from symmetry properties of the visual world through a process of unsupervised correlational learning.

From the assumption of a hierarchy of areas with receptive fields of increasing size, the theory predicts that the size of the receptive fields determines which transformations are learned during development and then factored out during normal processing; that the transformation represented in each area determines the tuning of the neurons in the area, independently of the statistics of natural images; and that class-specific transformations are learned and represented at the top of the ventral stream hierarchy.

Some of the main predictions of this theory-in-fieri are:
1. the types of transformations that are learned from visual experience depend on the aperture size (measured in terms of wavelength) and thus on the area (layer in the models) – assuming that the aperture size increases with layers;
2. the mix of transformations learned determines the properties of the receptive fields – oriented bars in V1+V2, radial and spiral patterns in V4, up to class-specific tuning in AIT (e.g., face-tuned cells);
3. invariance to small translations in V1 may underlie the stability of visual perception;
4. class-specific modules – such as faces, places and possibly body areas – should exist in IT to process images of specific object classes (a toy illustration follows below).
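
Prediction 4 rests on the idea that a class-specific transformation can be learned from example pairs and then discounted for novel class members. A deliberately linear toy version, with synthetic feature vectors standing in for IT-like representations (everything here is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 16, 200

# X: feature vectors of class exemplars; Y: the same exemplars after one
# class-specific transformation (e.g., a viewpoint change). The map is
# learned from pairs alone and then generalizes to novel class members.
A_true = rng.normal(size=(d, d)) / np.sqrt(d)
X = rng.normal(size=(d, n))
Y = A_true @ X

A_hat = Y @ np.linalg.pinv(X)          # least-squares estimate of the map
novel = rng.normal(size=d)             # unseen object of the same class
print(np.allclose(A_hat @ novel, A_true @ novel, atol=1e-6))   # True
```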

    Invariance of visual operations at the level of receptive fields

    Receptive field profiles registered by cell recordings have shown that mammalian vision has developed receptive fields tuned to different sizes and orientations in the image domain, as well as to different image velocities in space-time. This article presents a theoretical model by which families of idealized receptive field profiles can be derived mathematically from a small set of basic assumptions that correspond to structural properties of the environment. The article also presents a theory for how basic invariance properties to variations in scale, viewing direction and relative motion can be obtained from the output of such receptive fields, using complementary selection mechanisms that operate over the output of families of receptive fields tuned to different parameters. Thereby, the theory shows how basic invariance properties of a visual system can be obtained already at the level of receptive fields, and we can explain the different shapes of receptive field profiles found in biological vision from a requirement that the visual system should be invariant to the natural types of image transformations that occur in its environment.
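
One of the selection mechanisms described, scale selection, can be sketched concretely: compute scale-normalized responses over a family of receptive field sizes and pick the maximizing scale, which then covaries with image rescalings. The sketch below uses the scale-normalized Laplacian of Gaussian; the sigma grid and the test blob are illustrative choices of mine.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def scale_selection(image, sigmas=(1, 2, 4, 8, 16)):
    """Scale-normalized Laplacian responses over a family of receptive
    field sizes; at each point, select the sigma with maximal magnitude."""
    stack = np.stack([s**2 * gaussian_laplace(image, s) for s in sigmas])
    return np.take(sigmas, np.argmax(np.abs(stack), axis=0))

# For a blob of radius r = 10, the selected sigma should be the grid value
# nearest r / sqrt(2) ~ 7.1, i.e. 8 here.
ys, xs = np.mgrid[0:128, 0:128]
blob = ((ys - 64)**2 + (xs - 64)**2 < 10**2).astype(float)
print(scale_selection(blob)[64, 64])
```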

    Graph matching with a dual-step EM algorithm

    This paper describes a new approach to matching geometric structure in 2D point-sets. The novel feature is to unify the tasks of estimating transformation geometry and identifying point-correspondence matches. Unification is realized by constructing a mixture model over the bipartite graph representing the correspondence match and by effecting optimization using the EM algorithm. According to our EM framework, the probabilities of structural correspondence gate contributions to the expected likelihood function used to estimate maximum-likelihood transformation parameters. These gating probabilities measure the consistency of the matched neighborhoods in the graphs. The recovery of transformational geometry and hard correspondence matches are interleaved, realized by applying coupled update operations to the expected log-likelihood function; in this way, the two processes bootstrap one another, providing a means of rejecting structural outliers. We evaluate the technique on two real-world problems. The first involves matching different perspective views of 3.5-inch floppy discs. The second is furnished by matching a digital map against aerial images that are subject to severe barrel distortion due to a line-scan sampling process. We complement these experiments with a sensitivity study based on synthetic data.
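
Stripped of the structural gating over graph neighbourhoods that the paper adds on top, the geometric core of the dual-step scheme reduces to an EM loop of the following shape (a sketch with a Gaussian residual model; the noise level and iteration count are illustrative):

```python
import numpy as np

def em_affine_match(X, Y, iters=30, sigma2=0.1):
    """E-step: soft correspondence probabilities under a Gaussian residual
    model. M-step: affine parameters by least squares against the
    responsibility-weighted targets. Structural gating is omitted here."""
    Xh = np.hstack([X, np.ones((len(X), 1))])       # homogeneous coordinates
    A = np.vstack([np.eye(2), np.zeros((1, 2))])    # 3x2 affine, init identity
    for _ in range(iters):
        r = (Xh @ A)[:, None, :] - Y[None, :, :]    # residuals to all Y points
        logp = -(r ** 2).sum(-1) / (2.0 * sigma2)
        P = np.exp(logp - logp.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)           # rows sum to one
        # With normalized rows, the weighted M-step reduces to plain least
        # squares against the expected correspondences P @ Y.
        A = np.linalg.lstsq(Xh, P @ Y, rcond=None)[0]
    return A, P

# Toy check: recover a known affine map between noiseless point sets.
rng = np.random.default_rng(5)
X = rng.normal(size=(40, 2))
A_true = np.array([[1.1, 0.2], [-0.1, 0.9], [0.5, -0.3]])
Y = np.hstack([X, np.ones((40, 1))]) @ A_true
A_est, _ = em_affine_match(X, Y)
print(np.abs(A_est - A_true).max())                 # should be near zero
```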

    Geometric and photometric affine invariant image registration

    This thesis aims to present a solution to the correspondence problem for the registration of wide-baseline images taken from uncalibrated cameras. We propose an affine invariant descriptor that combines the geometry and photometry of the scene to find correspondences between both views. The geometric affine invariant component of the descriptor is based on the affine arc-length metric, whereas the photometry is analysed by invariant colour moments. A graph structure represents the spatial distribution of the primitive features; i.e., nodes correspond to detected high-curvature points, whereas arcs represent connectivities by extracted contours. After matching, we refine the search for correspondences by using a maximum-likelihood robust algorithm. We have evaluated the system over synthetic and real data. The method is susceptible to the propagation of errors introduced by approximations in the system.
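
The geometric component rests on the affine arc-length, which can be computed directly from a sampled contour. A small numerical check (with an invented test contour) of its defining property: it is preserved by area-preserving affine maps, and rescales only by det(A)^(1/3) in general.

```python
import numpy as np

def affine_arclength(x, y, t):
    """Equi-affine arc length: integral of (x'y'' - x''y')^(1/3) dt."""
    x1, y1 = np.gradient(x, t), np.gradient(y, t)
    x2, y2 = np.gradient(x1, t), np.gradient(y1, t)
    f = np.cbrt(x1 * y2 - x2 * y1)
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t)))  # trapezoid rule

t = np.linspace(0.0, 2.0 * np.pi, 2000)
x, y = 2.0 * np.cos(t), np.sin(t)                  # test contour: an ellipse

A = np.array([[1.3, 0.4],
              [0.2, (1.0 + 0.4 * 0.2) / 1.3]])     # det(A) = 1 by construction
xa, ya = A[0, 0] * x + A[0, 1] * y, A[1, 0] * x + A[1, 1] * y

s0 = affine_arclength(x, y, t)                     # 2*pi*2^(1/3) ~ 7.916
s1 = affine_arclength(xa, ya, t)
print(np.isclose(s0, s1, rtol=1e-3))               # preserved when det(A) = 1
```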