Learning Mid-Level Representations for Visual Recognition
The objective of this thesis is to enhance visual recognition of objects and scenes
through the development of novel mid-level representations and accompanying learning
algorithms. In particular, this work focuses on category-level recognition, which
is still a very challenging and largely unsolved task. One crucial component of visual
recognition systems is the representation of objects and scenes. Depending on
the representation, suitable learning strategies need to be developed that make it possible
to learn new categories automatically from training data. The aim of this thesis
is therefore to extend low-level representations with mid-level representations and to
develop suitable learning mechanisms.
A popular class of mid-level representations is higher-order statistics such as
self-similarity and co-occurrence statistics. While these descriptors satisfy the
demand for higher-level object representations, they also exhibit very large and
ever-increasing dimensionality. In this thesis a new object representation, based on
curvature self-similarity, is suggested that goes beyond the currently popular
approximation of objects by straight lines. However, like all descriptors based on
second-order statistics, it exhibits high dimensionality. Although this improves
discriminability, the high dimensionality becomes a critical issue because of the curse
of dimensionality and the resulting lack of generalization ability. Given only a limited
amount of training data, even sophisticated learning algorithms such as the popular
kernel methods are not able to suppress noisy or superfluous dimensions of such
high-dimensional data. Consequently, there is a natural need for feature selection when
using present-day informative features and, in particular, curvature self-similarity.
We therefore suggest an embedded feature selection method for support vector machines
that reduces complexity and improves the generalization capability of object models.
The proposed curvature self-similarity representation, together with the embedded
feature selection, is successfully integrated into a widely used state-of-the-art
object detection framework.
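The general idea of embedded feature selection can be illustrated with an L1-penalized linear SVM, where the sparsity-inducing penalty performs selection during training itself. This is a minimal sketch of the concept, not the thesis's own method; the synthetic data and all parameter values are assumptions.

```python
# Sketch: embedded feature selection via an L1-penalized linear SVM.
# The L1 penalty drives most weights to exactly zero during training,
# so selection happens inside the learning step rather than beforehand.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# high-dimensional data with only a few informative dimensions (assumed setup)
X, y = make_classification(n_samples=200, n_features=500,
                           n_informative=10, random_state=0)

clf = LinearSVC(C=0.1, penalty="l1", dual=False, max_iter=5000).fit(X, y)
selected = np.flatnonzero(clf.coef_)  # indices of surviving dimensions
print(len(selected), "of", X.shape[1], "dimensions kept")
```

The superfluous dimensions receive exactly zero weight, so the learned object model is both simpler and less prone to overfitting on limited training data.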
The influence of higher-order statistics on category-level object recognition is further
investigated by learning co-occurrences between foreground and background in order to
reduce the number of false detections. While the suggested curvature self-similarity
descriptor improves the model by describing the foreground in more detail, higher-order
statistics are now shown to be suitable for explicitly modeling the background as well.
This is of particular use for the popular chamfer matching technique, since it is prone
to accidental matches in dense clutter. As clutter only interferes with the foreground
model contour, we learn where to place the background contours with respect to the
foreground object boundary. The co-occurrence of background contours is integrated
into a max-margin framework. The suggested approach thus combines the advantage of
accurately detecting object parts via chamfer matching with the robustness of max-margin
learning.
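Chamfer matching itself scores a template placement by the average distance from each template edge pixel to the nearest image edge pixel, read off a precomputed distance transform. A minimal sketch, with tiny illustrative binary images as assumptions:

```python
# Sketch of chamfer matching: precompute a distance transform of the scene's
# edge map once, then score any template placement by averaging the transform
# values under the template's edge pixels. Toy images are assumptions.
import numpy as np
from scipy.ndimage import distance_transform_edt

image_edges = np.zeros((20, 20), dtype=bool)
image_edges[5, 5:15] = True                   # a horizontal edge in the scene
dist = distance_transform_edt(~image_edges)   # distance to the nearest edge pixel

template = [(0, c) for c in range(10)]        # template: a 10-pixel horizontal line

def chamfer_score(offset_r, offset_c):
    """Mean distance from template edge pixels to the nearest image edge."""
    return np.mean([dist[offset_r + r, offset_c + c] for r, c in template])

print(chamfer_score(5, 5))   # exact overlap -> 0.0
print(chamfer_score(8, 5))   # three rows below the edge -> 3.0
```

Because the score depends only on proximity to *any* edge, dense clutter produces many accidental low scores, which is exactly the failure mode the learned background co-occurrence model addresses.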
While chamfer matching is a very efficient technique for object detection, parts are
detected based only on a simple distance measure. In contrast, mid-level parts and
patches are explicitly trained to distinguish true positives in the foreground from false
positives in the background. Because mid-level patches and parts are independent of one
another, it is possible to train a large number of instance-specific part classifiers.
This contrasts with the currently most powerful discriminative approaches, which are
typically feasible only for a small number of parts because they model the spatial
dependencies between them. Due to their number, we cannot directly train a powerful
classifier to combine all parts. Instead, parts are randomly grouped into fewer,
overlapping compositions that are trained using a maximum-margin approach. In contrast
to the common rationale of compositional approaches, we do not aim for semantically
meaningful ensembles. Rather, we seek randomized compositions that are discriminative
and generalize over all instances of a category. All compositions are finally combined
by a non-linear decision function, completing the hierarchy of discriminative classifiers.
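The structure of this hierarchy can be sketched as follows: many part-classifier scores are randomly grouped into overlapping subsets, each subset is combined linearly (standing in for the max-margin-trained composition), and the composition scores feed a final non-linear decision. All sizes, weights, and the choice of non-linearity here are illustrative assumptions, not the trained models from the thesis.

```python
# Sketch of randomized compositions over independent part classifiers.
# Weights are random stand-ins for what max-margin training would learn.
import numpy as np

rng = np.random.default_rng(0)
n_parts, n_compositions, parts_per_comp = 200, 20, 15

part_scores = rng.normal(size=n_parts)            # responses of part classifiers
groups = [rng.choice(n_parts, parts_per_comp, replace=False)
          for _ in range(n_compositions)]         # random, overlapping groups
comp_w = rng.normal(size=(n_compositions, parts_per_comp))  # per-composition weights

# each composition linearly combines its subset of part scores
comp_scores = np.array([w @ part_scores[g] for w, g in zip(comp_w, groups)])

# a simple non-linear decision function over all composition scores
decision = np.tanh(comp_scores).sum()
print(comp_scores.shape, float(decision))
```

The point of the random grouping is scalability: each composition sees only a small subset of parts, so training stays feasible even when the number of part classifiers is large.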
In summary, this thesis improves visual recognition of objects and scenes by
developing novel mid-level representations on top of different kinds of low-level
representations. Furthermore, it investigates the development of suitable learning
algorithms to deal with the new challenges arising from the novel object
representations presented in this work.
FPGA-based Anomalous trajectory detection using SOFM
A system for automatically classifying the trajectory of a moving object in a scene as usual or suspicious is presented. The system uses an unsupervised neural network (a Self-Organising Feature Map, SOFM) fully implemented on a reconfigurable hardware architecture (a Field Programmable Gate Array, FPGA) to cluster trajectories acquired over a period, in order to detect novel ones. First-order motion information, including first-order moving-average smoothing, is generated from the 2D image coordinates (trajectories). The classification is dynamic and achieved in real time, using the SOFM together with a probabilistic model. Experimental results show less than 15% classification error, demonstrating the robustness of our approach compared with others in the literature, as well as the speed-up obtained by using an off-the-shelf FPGA prototyping board rather than a conventional microprocessor.
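The clustering step can be sketched with a minimal software SOFM in plain NumPy: nodes compete for each input, and the winner and its map neighbours move toward the sample. A trajectory whose distance to its nearest node exceeds a learned threshold would then be flagged as novel. The data, map size, and learning schedule below are illustrative assumptions and have nothing to do with the paper's FPGA design.

```python
# Minimal self-organising feature map (SOFM) sketch: a 1-D map of nodes is
# fitted to 2-D feature vectors standing in for "usual" trajectory features.
import numpy as np

rng = np.random.default_rng(0)
# two well-separated clusters of normal-behaviour features (assumed data)
data = np.vstack([rng.normal(0.0, 0.1, (100, 2)),
                  rng.normal(3.0, 0.1, (100, 2))])

n_nodes = 4
weights = rng.normal(1.5, 0.5, (n_nodes, 2))        # 1-D map of 4 nodes

for t in range(500):
    x = data[rng.integers(len(data))]
    bmu = np.argmin(np.linalg.norm(weights - x, axis=1))  # best-matching unit
    lr = 0.5 * np.exp(-t / 200)                     # decaying learning rate
    sigma = 1.0 * np.exp(-t / 200)                  # shrinking neighbourhood
    influence = np.exp(-((np.arange(n_nodes) - bmu) ** 2) / (2 * sigma ** 2))
    weights += lr * influence[:, None] * (x - weights)

# quantisation error: distance of each sample to its nearest node; a novelty
# threshold would be set from these training-time distances
dists = np.min(np.linalg.norm(data[:, None] - weights[None], axis=2), axis=1)
print("mean quantisation error:", dists.mean())
```

A test trajectory far from every node (large quantisation error) is the "novel" case the system flags as suspicious.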
Robust Temporally Coherent Laplacian Protrusion Segmentation of 3D Articulated Bodies
In motion analysis and understanding it is important to be able to fit a
suitable model or structure to the temporal series of observed data, in order
to describe motion patterns in a compact way, and to discriminate between them.
In an unsupervised context, i.e., no prior model of the moving object(s) is
available, such a structure has to be learned from the data in a bottom-up
fashion. In recent times, volumetric approaches in which the motion is captured
from a number of cameras and a voxel-set representation of the body is built
from the camera views, have gained ground due to attractive features such as
inherent view-invariance and robustness to occlusions. Automatic, unsupervised
segmentation of moving bodies along entire sequences, in a temporally-coherent
and robust way, has the potential to provide a means of constructing a
bottom-up model of the moving body, and track motion cues that may be later
exploited for motion classification. Spectral methods such as locally linear
embedding (LLE) can be useful in this context, as they preserve "protrusions",
i.e., high-curvature regions of the 3D volume, of articulated shapes, while
improving their separation in a lower dimensional space, making them in this
way easier to cluster. In this paper we therefore propose a spectral approach
to unsupervised and temporally-coherent body-protrusion segmentation along time
sequences. Volumetric shapes are clustered in an embedding space, clusters are
propagated in time to ensure coherence, and merged or split to accommodate
changes in the body's topology. Experiments on both synthetic and real
sequences of dense voxel-set data are shown. This supports the ability of the
proposed method to cluster body-parts consistently over time in a totally
unsupervised fashion, its robustness to sampling density and shape quality, and
its potential for bottom-up model constructionComment: 31 pages, 26 figure
Data-Driven Shape Analysis and Processing
Data-driven methods play an increasingly important role in discovering
geometric, structural, and semantic relationships between 3D shapes in
collections, and applying this analysis to support intelligent modeling,
editing, and visualization of geometric data. In contrast to traditional
approaches, a key feature of data-driven approaches is that they aggregate
information from a collection of shapes to improve the analysis and processing
of individual shapes. In addition, they are able to learn models that reason
about properties and relationships of shapes without relying on hard-coded
rules or explicitly programmed instructions. We provide an overview of the main
concepts and components of these techniques, and discuss their application to
shape classification, segmentation, matching, reconstruction, modeling and
exploration, as well as scene analysis and synthesis, through reviewing the
literature and relating the existing works with both qualitative and numerical
comparisons. We conclude our report with ideas that can inspire future research
in data-driven shape analysis and processing.

Comment: 10 pages, 19 figures
Change blindness: eradication of gestalt strategies
Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task in which there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval, and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al., 2003, Vision Research 43, 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this we changed the spatial positions of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This suggests two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) it gives further weight to the argument that objects may be stored in and retrieved from a pre-attentional store during this task.