Memory Based Online Learning of Deep Representations from Video Streams
We present a novel online unsupervised method for face identity learning from
video streams. The method exploits deep face descriptors together with a memory
based learning mechanism that takes advantage of the temporal coherence of
visual data. Specifically, we introduce a discriminative feature matching
solution based on Reverse Nearest Neighbour and a feature forgetting strategy
that detects redundant features and discards them appropriately as time
progresses. It is shown that the proposed learning procedure is asymptotically
stable and can be effectively used in relevant applications like multiple face
identification and tracking from unconstrained video streams. Experimental
results show that, compared with offline approaches exploiting future
information, the proposed method achieves comparable performance in multiple
face tracking and better performance in face identification. Code will be
publicly available.
Comment: arXiv admin note: text overlap with arXiv:1708.0361
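The abstract does not spell out how Reverse Nearest Neighbour (RNN) matching works. The following is a minimal sketch, not the authors' implementation. It assumes plain Euclidean distance over toy 2-D descriptors and a hypothetical majority-vote rule for assigning identities. Each memory feature points to the query it is closest to, and a query is matched only if at least one memory feature points to it.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rnn_match(memory, labels, queries):
    # Bichromatic reverse nearest neighbours: every memory feature "points to"
    # the query it is closest to; RNN(q) is the set of memory features pointing to q.
    rnn = {q: [] for q in range(len(queries))}
    for m_idx, m in enumerate(memory):
        nearest_q = min(range(len(queries)), key=lambda q: euclidean(m, queries[q]))
        rnn[nearest_q].append(m_idx)

    # Hypothetical assignment rule (an assumption, not from the paper): a query
    # takes the majority identity among its reverse nearest neighbours, or None
    # (unmatched / possibly a new identity) if no memory feature points to it.
    result = []
    for q in range(len(queries)):
        if rnn[q]:
            result.append(Counter(labels[i] for i in rnn[q]).most_common(1)[0][0])
        else:
            result.append(None)
    return result


memory = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]  # stored face descriptors (toy)
labels = ["A", "A", "B"]                        # identity of each stored descriptor
queries = [(0.05, 0.0), (5.1, 5.0), (10.0, -10.0)]
print(rnn_match(memory, labels, queries))  # → ['A', 'B', None]
```

Because each memory feature casts at most one vote, a query far from every stored identity (the third one here) receives no reverse neighbours and stays unmatched. This is one way such a scheme can remain discriminative when several queries compete for the same memory.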
Defensive Few-shot Adversarial Learning
The robustness of deep learning models against adversarial attacks has
received increasing attention in recent years. However, both deep learning and
adversarial training rely on the availability of a large amount of labeled data
and usually do not generalize well to new, unseen classes when only a few
training samples are accessible. To address this problem, we explicitly
introduce a new, challenging problem: how to learn a robust deep model with
limited training samples per class, which we call defensive few-shot learning.
Simply employing existing adversarial training techniques from the
literature cannot solve this problem. This is because few-shot learning needs
to learn transferable knowledge from disjoint auxiliary data, and thus it is
invalid to assume sample-level distribution consistency between the
training and test sets, as existing adversarial training techniques commonly
do. In this paper, instead of assuming such distribution consistency,
we propose to make this assumption at a task-level in the episodic training
paradigm in order to better transfer the defense knowledge. Furthermore, inside
each task, we design a task-conditioned distribution constraint to narrow the
distribution gap between clean and adversarial examples at a sample-level.
These give rise to a novel mechanism called multi-level distribution based
adversarial training (MDAT) for learning transferable adversarial defense. In
addition, a unified score is introduced to evaluate
different defense methods under the same principle. Extensive experiments
demonstrate that MDAT is more effective and robust than existing alternatives
in the few-shot case.
Comment: 10 pages
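The episodic training paradigm mentioned above builds each training step from a small N-way K-shot task rather than from individual samples. The following is a minimal sketch of task sampling. It assumes a dataset stored as a dict from class label to samples, and the function name `sample_episode` is hypothetical. MDAT's adversarial components (the perturbations and the task-conditioned distribution constraint) are not shown.

```python
import random


def sample_episode(dataset, n_way, k_shot, q_queries, rng=random):
    """Build one N-way K-shot task from a dict {class_label: [samples]}."""
    # Pick N distinct classes for this task.
    classes = rng.sample(sorted(dataset), n_way)
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        # Draw K support and Q query samples without replacement,
        # so the support and query sets of a class never overlap.
        items = rng.sample(dataset[cls], k_shot + q_queries)
        # Classes are relabelled 0..N-1 inside every episode.
        support += [(x, episode_label) for x in items[:k_shot]]
        query += [(x, episode_label) for x in items[k_shot:]]
    return support, query


data = {c: [f"{c}{i}" for i in range(10)] for c in "ABCDEFG"}
support, query = sample_episode(data, n_way=5, k_shot=1, q_queries=3, rng=random.Random(0))
print(len(support), len(query))  # → 5 15
```

Relabelling classes per episode is what lets a model trained on auxiliary classes be applied to disjoint novel classes. Roughly, the task-level assumption in the abstract is that training and test episodes are drawn from the same distribution of tasks, even though their classes differ.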
Sparse Spatial Transformers for Few-Shot Learning
Learning from limited data is a challenging task since the scarcity of data
leads to poor generalization of the trained model. The classical global
pooled representation is likely to lose useful local information. Recently,
many few-shot learning methods address this challenge by using deep descriptors
and learning a pixel-level metric. However, using deep descriptors as feature
representations may lose the contextual information of the image. Moreover, most of
these methods deal with each class in the support set independently, which
cannot sufficiently utilize discriminative information and task-specific
embeddings. In this paper, we propose a novel Transformer based neural network
architecture called Sparse Spatial Transformers (SSFormers), which can find
task-relevant features and suppress task-irrelevant features. Specifically, we
first divide each input image into several image patches of different sizes to
obtain dense local features. These features retain contextual information while
expressing local information. Then, a sparse spatial transformer layer is
proposed to find spatial correspondence between the query image and the entire
support set to select task-relevant image patches and suppress task-irrelevant
image patches. Finally, we propose to use an image patch matching module for
calculating the distance between dense local representations, thereby determining
which category the query image belongs to in the support set. Extensive
experiments on popular few-shot learning benchmarks show that our method
achieves state-of-the-art performance.
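A simple way to turn dense local features into a class decision is an image-to-class patch match. Each query patch keeps only its best-matching support patch within a class, and the class with the highest summed similarity wins. The following is a sketch that assumes cosine similarity on toy 2-D patch descriptors, similar in spirit to the image-to-class measure of DN4. It is not necessarily SSFormers' patch matching module, and it omits the sparse spatial transformer that selects task-relevant patches. The function names are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def patch_match_score(query_patches, support_patches):
    # For every query patch keep only its best-matching support patch,
    # then sum these best similarities over all query patches.
    return sum(max(cosine(q, s) for s in support_patches) for q in query_patches)


def classify(query_patches, support_by_class):
    # Score each class by pooling the patches of all its support images.
    scores = {c: patch_match_score(query_patches, patches)
              for c, patches in support_by_class.items()}
    return max(scores, key=scores.get), scores


support = {"cat": [(1.0, 0.0), (0.9, 0.1)], "dog": [(0.0, 1.0), (0.1, 0.9)]}
query = [(0.95, 0.05), (1.0, 0.0)]
label, scores = classify(query, support)
print(label)  # → cat
```

Taking the maximum per query patch lets a few well-matched local regions dominate the score. A single global vector would instead average them away, which is the information loss the abstract attributes to global pooling.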
Collaborative Appearance-Based Place Recognition and Improving Place Recognition Using Detection of Dynamic Objects
This dissertation makes contributions to the problem of Long-Term Appearance-Based Place Recognition. We present a framework for place recognition in a collaborative scheme and a method to reduce the impact of dynamic objects on place representations. We demonstrate our findings using a state-of-the-art place recognition approach.
We begin in Part I by describing the general problem of place recognition and its importance in applications where accurate localization is crucial. We discuss feature detection and description and also explain the functioning of several place recognition frameworks.
In Part II, we present a novel framework for collaboration between agents from a pure appearance-based place recognition perspective. Using this framework, multiple agents can efficiently share partial or complete knowledge about places and benefit from their teamwork. This collaborative framework allows agents with limited storage and memory capacity to become useful in environment exploration tasks (for instance, by enabling remote recognition); includes procedures to manage an agent’s memory load and to distribute knowledge of places across agents; allows the reuse of knowledge from one agent to another; and increases the tolerance for failure of individual agents. Part II also defines metrics that allow us to measure the performance of a system that uses the collaborative framework.
Finally, in Part III, we present an innovative method to improve the recognition of places in environments densely populated by dynamic objects. We demonstrate that we can improve the recognition performance in these environments by incorporating high-level information from dynamic objects. Tests conducted using a synthetic dataset show the benefits of our approach. The proposed method allows the system to significantly improve the recognition performance in the photo-realistic dataset while reducing storage requirements, resulting in up to 23.7 percent less storage space than the state-of-the-art approach that we have extended; smaller representations also reduced the time required to match places. In Part III, we also formulate the concept of a valid place representation and determine the quality of the observation based on dynamic objects present in the agent’s view.
Of course, recognition systems that are sensitive to dynamic objects incur additional computational costs to recognize those objects. We show that this additional cost is outweighed by the benefits of incorporating dynamic object detection into the place recognition pipeline. Our findings can be used in many applications, including navigation, e.g., assisting visually impaired individuals with navigating indoors, or autonomous vehicles.
Robust Saliency-Aware Distillation for Few-shot Fine-grained Visual Recognition
Recognizing novel sub-categories with scarce samples is an essential and
challenging research topic in computer vision. Existing literature addresses
this challenge by employing local-based representation approaches, which may
not sufficiently facilitate meaningful object-specific semantic understanding,
leading to a reliance on apparent background correlations. Moreover, they
primarily rely on high-dimensional local descriptors to construct complex
embedding spaces, potentially limiting generalization. To address the above
challenges, this article proposes a novel model called RSaG for few-shot
fine-grained visual recognition. RSaG introduces additional saliency-aware
supervision via saliency detection to guide the model toward focusing on the
intrinsic discriminative regions. Specifically, RSaG utilizes the saliency
detection model to emphasize the critical regions of each sub-category,
providing additional object-specific information for fine-grained prediction.
RSaG transfers such information with two symmetric branches in a mutual
learning paradigm. Furthermore, RSaG exploits inter-regional relationships to
enhance the informativeness of the representation and subsequently summarize
the highlighted details into contextual embeddings to facilitate the effective
transfer, enabling quick generalization to novel sub-categories. The proposed
approach is empirically evaluated on three widely used benchmarks,
demonstrating its superior performance.
Comment: Under Review
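The abstract does not specify how the two symmetric branches learn from each other. The following sketch assumes the deep-mutual-learning formulation: each branch minimizes its own cross-entropy plus a KL term that pulls it toward the other branch's prediction. The function names and this exact loss are assumptions, not RSaG's published objective. In a real framework, the partner's prediction would usually be treated as a fixed target with no gradient, which plain Python floats cannot express.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q):
    # KL(p || q) for discrete distributions, assuming q > 0 wherever p > 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mutual_learning_loss(logits_a, logits_b, label):
    pa, pb = softmax(logits_a), softmax(logits_b)
    ce_a = -math.log(pa[label])  # branch A supervised by the ground truth
    ce_b = -math.log(pb[label])  # branch B supervised by the ground truth
    # Each branch mimics the other: A is pulled toward B via KL(pb || pa),
    # B toward A via KL(pa || pb), so the mimicry term is symmetric.
    return ce_a + ce_b + kl(pb, pa) + kl(pa, pb)


# e.g. branch A on the raw image, branch B on a saliency-guided view (hypothetical)
print(mutual_learning_loss([2.0, 0.5, -1.0], [1.5, 1.0, -0.5], 0))
```

When both branches already agree, the KL terms vanish and only the supervised cross-entropies remain. Disagreement between the branches adds a penalty that pushes the object-specific cues captured by one branch into the other.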