Learning Mid-Level Representations for Visual Recognition
The objective of this thesis is to enhance visual recognition for objects and scenes
through the development of novel mid-level representations and associated learning
algorithms. In particular, this work focuses on category-level recognition, which
is still a very challenging and largely unsolved task. One crucial component in visual
recognition systems is the representation of objects and scenes. However, depending on
the representation, suitable learning strategies need to be developed that make it possible
to learn new categories automatically from training data. Therefore, the aim of this thesis
is to extend low-level representations by mid-level representations and to develop suitable
learning mechanisms.
A popular kind of mid-level representation is higher-order statistics such as
self-similarity and co-occurrence statistics. While these descriptors satisfy the
demand for higher-level object representations, they also exhibit very large and ever
increasing dimensionality. In this thesis a new object representation, based on curvature
self-similarity, is suggested that goes beyond the currently popular approximation of
objects using straight lines. However, like all descriptors using second-order statistics,
it also exhibits a high dimensionality. Although it improves discriminability, the high
dimensionality becomes a critical issue due to the lack of generalization ability and the
curse of dimensionality. Given only a limited amount of training data, even sophisticated
learning algorithms such as the popular kernel methods are not able to suppress noisy or
superfluous dimensions of such high-dimensional data. Consequently, there is a natural
need for feature selection when using present-day informative features and, particularly,
curvature self-similarity. We therefore suggest an embedded feature selection method for
support vector machines that reduces complexity and improves generalization capability
of object models. The proposed curvature self-similarity representation is successfully
integrated together with the embedded feature selection in a widely used state-of-the-art
object detection framework.
The influence of higher-order statistics on category-level object recognition is further
investigated by learning co-occurrences between foreground and background to reduce
the number of false detections. While the suggested curvature self-similarity descriptor
improves the model by describing the foreground in more detail, higher-order
statistics are now shown to be also suitable for explicitly modeling the background.
This is of particular use for the popular chamfer matching technique, since it is prone
to accidental matches in dense clutter. As clutter only interferes with the foreground
model contour, we learn where to place the background contours with respect to the
foreground object boundary. The co-occurrence of background contours is integrated
into a max-margin framework. Thus the suggested approach combines the advantages of
accurately detecting object parts via chamfer matching and the robustness of max-margin
learning.
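As a rough illustration of the chamfer matching step mentioned above, the following sketch scores a template contour against image edge points by the mean distance from each template point to its nearest edge point, and slides the template over candidate offsets. It is a minimal brute-force version, not the thesis implementation: the function names, the toy contour, and the clutter points are all hypothetical, and the learned background co-occurrence term is not modeled.

```python
import math

def chamfer_distance(template, edges):
    """Mean distance from each template point to its nearest edge point."""
    if not template or not edges:
        raise ValueError("both point sets must be non-empty")
    total = 0.0
    for tx, ty in template:
        total += min(math.hypot(tx - ex, ty - ey) for ex, ey in edges)
    return total / len(template)

def best_placement(template, edges, offsets):
    """Slide the template over candidate offsets; return (lowest cost, offset)."""
    scored = []
    for dx, dy in offsets:
        shifted = [(x + dx, y + dy) for x, y in template]
        scored.append((chamfer_distance(shifted, edges), (dx, dy)))
    return min(scored)

# Toy data (hypothetical): an L-shaped contour placed at offset (5, 3),
# plus a few isolated clutter edge points.
template = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
edges = [(x + 5, y + 3) for x, y in template] + [(0, 9), (9, 0), (7, 8)]
offsets = [(dx, dy) for dx in range(8) for dy in range(8)]
cost, offset = best_placement(template, edges, offsets)
```

Because the score depends only on nearest-edge distances, a densely cluttered region yields a low cost for almost any placement, which is the failure mode the background model is meant to counter.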
While chamfer matching is a very efficient technique for object detection, parts are only
detected based on a simple distance measure. Contrary to that, mid-level parts and
patches are explicitly trained to distinguish true positives in the foreground from false
positives in the background. Due to the independence of mid-level patches and parts it
is possible to train a large number of instance-specific part classifiers. This contrasts
with the currently most powerful discriminative approaches, which are typically only feasible
for a small number of parts, as they model the spatial dependencies between
them. Because of the large number of parts, we cannot directly train a powerful classifier to combine
all parts. Instead, parts are randomly grouped into fewer, overlapping compositions that
are trained using a maximum-margin approach. In contrast to the common rationale of
compositional approaches, we do not aim for semantically meaningful ensembles. Rather
we seek randomized compositions that are discriminative and generalize over all instances
of a category. Compositions are all combined by a non-linear decision function, which
completes the powerful hierarchy of discriminative classifiers.
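The random grouping of parts into fewer, overlapping compositions can be sketched as follows. This only illustrates the grouping step, under assumed values: the number of parts, the number and size of compositions, the seed, and the placeholder part scores are hypothetical, and the max-margin training of each composition and the final non-linear combination are not shown.

```python
import random

def random_compositions(num_parts, num_compositions, size, seed=0):
    """Randomly group part indices into fewer, overlapping compositions."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(num_parts), size))
            for _ in range(num_compositions)]

def composition_features(part_scores, compositions):
    """Collect, for each composition, the responses of its member parts."""
    return [[part_scores[i] for i in members] for members in compositions]

# Hypothetical sizes: 1000 part classifiers grouped into 50 compositions of 40.
compositions = random_compositions(num_parts=1000, num_compositions=50, size=40)
part_scores = [0.001 * i for i in range(1000)]  # placeholder classifier responses
features = composition_features(part_scores, compositions)
```

Each feature vector in `features` would then be the input to one composition classifier. Since 50 compositions of 40 parts give 2000 memberships over 1000 parts, at least one part must appear in more than one composition, so the groups overlap.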
In summary, this thesis improves visual recognition of objects and scenes by
developing novel mid-level representations on top of different kinds of low-level
representations. Furthermore, it investigates the development of suitable learning
algorithms to deal with the new challenges that arise from the novel object
representations presented in this work.
Temporal Attention-Gated Model for Robust Sequence Classification
Typical techniques for sequence classification are designed for
well-segmented sequences which have been edited to remove noisy or irrelevant
parts. Therefore, such methods cannot easily be applied to the noisy sequences
expected in real-world applications. In this paper, we present the Temporal
Attention-Gated Model (TAGM) which integrates ideas from attention models and
gated recurrent networks to better deal with noisy or unsegmented sequences.
Specifically, we extend the concept of an attention model to measure the relevance
of each observation (time step) of a sequence. We then use a novel gated
recurrent network to learn the hidden representation for the final prediction.
An important advantage of our approach is interpretability since the temporal
attention weights provide a meaningful value for the salience of each time step
in the sequence. We demonstrate the merits of our TAGM approach, both for
prediction accuracy and interpretability, on three different tasks: spoken
digit recognition, text-based sentiment analysis and visual event recognition. Comment: Accepted by CVPR 201
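One way to picture how a per-time-step attention weight could gate a recurrent update is sketched below. The abstract does not give the equations, so everything here is an assumption: the scalar hidden state, the tanh candidate, the weights `w_h` and `w_x`, and the convex-combination gate. The attention weights are set by hand, whereas in the model they would be learned.

```python
import math

def attention_gated_step(h_prev, x, a, w_h=0.5, w_x=1.0):
    """One recurrent step; the attention weight a in [0, 1] gates the update.

    Assumed form: h_t = (1 - a) * h_prev + a * tanh(w_h * h_prev + w_x * x).
    """
    candidate = math.tanh(w_h * h_prev + w_x * x)
    return (1.0 - a) * h_prev + a * candidate

def run_sequence(xs, attention, h0=0.0):
    """Fold a sequence of observations into a final hidden state."""
    h = h0
    for x, a in zip(xs, attention):
        h = attention_gated_step(h, x, a)
    return h

# A clean two-step sequence, and the same sequence with a noisy observation
# (9.9) inserted whose attention weight is 0.
clean = run_sequence([0.2, 0.7], [1.0, 1.0])
noisy = run_sequence([0.2, 9.9, 0.7], [1.0, 0.0, 1.0])
```

With an attention weight of 0 the hidden state passes through unchanged, so `noisy` equals `clean`. The same weights can be read off directly as the salience of each time step.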
PartCom: Part Composition Learning for 3D Open-Set Recognition
3D recognition is the foundation of 3D deep learning in many emerging fields,
such as autonomous driving and robotics. Existing 3D methods mainly focus on the
recognition of a fixed set of known classes and neglect possible unknown
classes during testing. These unknown classes may cause serious accidents in
safety-critical applications, e.g. autonomous driving. In this work, we make a
first attempt to address 3D open-set recognition (OSR) so that a classifier can
recognize known classes as well as be aware of unknown classes. We analyze
open-set risks in the 3D domain and point out the overconfidence and
under-representation problems that make existing methods perform poorly on the
3D OSR task. To resolve the above problems, we propose a novel part prototype-based
OSR method named PartCom. We use part prototypes to represent a 3D shape as a
part composition, since a part composition can represent the overall structure
of a shape and can help distinguish different known classes and unknown ones.
Then we formulate two constraints on part prototypes to ensure their
effectiveness. To reduce open-set risks further, we devise a PUFS module to
synthesize unknown features as representatives of unknown samples by mixing up
part composite features of different classes. We conduct experiments on three
kinds of 3D OSR tasks based on both a CAD shape dataset and a scan shape dataset.
Extensive experiments show that our method is powerful in classifying known
classes and unknown ones and can attain much better results than SOTA baselines
on all 3D OSR tasks. The project will be released.
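The idea of synthesizing unknown features by mixing features of different known classes can be illustrated with a plain convex combination. This is a generic mixup-style sketch, not the PUFS module itself, whose details the abstract does not give. The feature vectors, class names, and the mixing coefficient `lam` are hypothetical.

```python
def mix_features(feat_a, feat_b, lam):
    """Convex combination of two feature vectors from different known classes."""
    if len(feat_a) != len(feat_b):
        raise ValueError("feature vectors must have the same length")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * a + (1.0 - lam) * b for a, b in zip(feat_a, feat_b)]

# Hypothetical part-composition features for two known classes.
chair = [1.0, 0.0, 0.5, 0.0]
lamp = [0.0, 1.0, 0.0, 0.5]

# The mixture matches neither class and could stand in for an unknown sample.
unknown = mix_features(chair, lamp, lam=0.5)
```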
Privacy-Preserving Face Recognition Using Random Frequency Components
The ubiquitous use of face recognition has sparked increasing privacy
concerns, as unauthorized access to sensitive face images could compromise the
information of individuals. This paper presents an in-depth study of
protecting the visual information of face images and guarding against recovery.
Drawing on the perceptual disparity between humans and models, we propose to
conceal visual information by pruning human-perceivable low-frequency
components. For impeding recovery, we first elucidate the seeming paradox
between reducing model-exploitable information and retaining high recognition
accuracy. Based on recent theoretical insights and our observations of model
attention, we propose a solution to the dilemma, by advocating for the training
and inference of recognition models on randomly selected frequency components.
We distill our findings into a novel privacy-preserving face recognition
method, PartialFace. Extensive experiments demonstrate that PartialFace
effectively balances privacy protection goals and recognition accuracy. Code is
available at: https://github.com/Tencent/TFace. Comment: ICCV 202
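The two ideas in this abstract, pruning low-frequency components and keeping a randomly selected subset of the remaining ones, can be sketched on a single image block. This is only an illustration under assumptions and not the PartialFace pipeline: the use of a block DCT, the low-to-high ordering by `u + v`, the counts `num_pruned` and `num_kept`, the seed, and the toy block are all hypothetical.

```python
import math
import random

def dct2(block):
    """Unnormalized 2D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def basis(k, i):
        return math.cos(math.pi * (i + 0.5) * k / n)

    return [[sum(block[i][j] * basis(u, i) * basis(v, j)
                 for i in range(n) for j in range(n))
             for v in range(n)]
            for u in range(n)]

def select_components(coeffs, num_pruned, num_kept, seed):
    """Prune the lowest-frequency components, then keep a random subset."""
    n = len(coeffs)
    # Assumed ordering: frequency index pairs sorted from low to high by u + v.
    order = sorted(((u, v) for u in range(n) for v in range(n)),
                   key=lambda uv: (uv[0] + uv[1], uv))
    candidates = order[num_pruned:]
    chosen = random.Random(seed).sample(candidates, num_kept)
    return {uv: coeffs[uv[0]][uv[1]] for uv in chosen}

# Toy 8x8 block with hypothetical pixel values.
block = [[float((3 * i + 5 * j) % 7) for j in range(8)] for i in range(8)]
kept = select_components(dct2(block), num_pruned=10, num_kept=9, seed=42)
```

Here `kept` maps the selected frequency indices to their coefficients. A recognition model would see only these components, and a different seed would select a different subset.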