
    Memory Based Online Learning of Deep Representations from Video Streams

    We present a novel online unsupervised method for face identity learning from video streams. The method exploits deep face descriptors together with a memory-based learning mechanism that takes advantage of the temporal coherence of visual data. Specifically, we introduce a discriminative feature matching solution based on Reverse Nearest Neighbour and a feature forgetting strategy that detects redundant features and discards them appropriately as time progresses. We show that the proposed learning procedure is asymptotically stable and can be effectively used in relevant applications such as multiple face identification and tracking from unconstrained video streams. Experimental results show that the proposed method achieves results comparable to offline approaches that exploit future information in multiple face tracking, and better performance in face identification. Code will be publicly available.
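
    To make the matching-and-forgetting mechanism concrete, here is a minimal Python sketch of a feature memory that labels queries by Reverse Nearest Neighbour and prunes redundant descriptors over time. The descriptor dimensionality, the redundancy threshold, the majority-vote labelling, and the FeatureMemory interface are illustrative assumptions, not the paper's actual design.

    import numpy as np

    class FeatureMemory:
        def __init__(self, dim=128, redundancy_thresh=0.05):
            self.features = np.empty((0, dim))  # stored deep face descriptors
            self.ids = []                       # identity label per descriptor
            self.redundancy_thresh = redundancy_thresh

        def match_renn(self, queries):
            """Reverse NN: assign each *memory* item to its nearest query,
            then label each query by majority vote of the memory items
            that elected it (None if no item elected it)."""
            if len(self.ids) == 0:
                return [None] * len(queries)
            d = np.linalg.norm(self.features[:, None] - queries[None], axis=2)
            owner = d.argmin(axis=1)  # nearest query for each memory item
            labels = []
            for q in range(len(queries)):
                voters = [self.ids[m] for m in np.where(owner == q)[0]]
                labels.append(max(set(voters), key=voters.count) if voters else None)
            return labels

        def add(self, feature, identity):
            self.features = np.vstack([self.features, feature[None]])
            self.ids.append(identity)

        def forget_redundant(self):
            """Discard a stored feature when another feature of the same
            identity lies closer than the redundancy threshold."""
            keep = np.ones(len(self.ids), dtype=bool)
            for i in range(len(self.ids)):
                for j in range(i + 1, len(self.ids)):
                    if (keep[i] and keep[j] and self.ids[i] == self.ids[j]
                            and np.linalg.norm(self.features[i] - self.features[j])
                                < self.redundancy_thresh):
                        keep[j] = False
            self.features = self.features[keep]
            self.ids = [l for l, k in zip(self.ids, keep) if k]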

    Defensive Few-shot Adversarial Learning

    The robustness of deep learning models against adversarial attacks has received increasing attention in recent years. However, both deep learning and adversarial training rely on the availability of a large amount of labeled data and usually do not generalize well to new, unseen classes when only a few training samples are accessible. To address this problem, we explicitly introduce a new and challenging problem -- how to learn a robust deep model with limited training samples per class -- called defensive few-shot learning in this paper. Simply employing the existing adversarial training techniques in the literature cannot solve this problem, because few-shot learning needs to learn transferable knowledge from disjoint auxiliary data, so it is invalid to assume the sample-level distribution consistency between the training and test sets that existing adversarial training techniques commonly rely on. In this paper, instead of assuming such a distribution consistency, we propose to make this assumption at the task level in the episodic training paradigm in order to better transfer the defense knowledge. Furthermore, inside each task, we design a task-conditioned distribution constraint to narrow the distribution gap between clean and adversarial examples at the sample level. Together, these give rise to a novel mechanism called multi-level distribution based adversarial training (MDAT) for learning transferable adversarial defense. In addition, a unified $\mathcal{F}_{\beta}$ score is introduced to evaluate different defense methods under the same principle. Extensive experiments demonstrate that MDAT achieves higher effectiveness and robustness over existing alternatives in the few-shot case.
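
    The following PyTorch sketch illustrates the general shape of episodic adversarial training with a per-task constraint that pulls clean and adversarial embeddings together, as the abstract describes. A prototypical-network-style episode, the FGSM attack, the MSE alignment term, and the weight lam are hypothetical stand-ins; the paper's MDAT losses may differ.

    import torch
    import torch.nn.functional as F

    def fgsm(embed, proto, x, y, eps=8 / 255):
        """One-step attack against a nearest-prototype classifier."""
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(-torch.cdist(embed(x), proto), y)
        grad, = torch.autograd.grad(loss, x)
        return (x + eps * grad.sign()).clamp(0, 1).detach()

    def episode_step(embed, opt, support_x, support_y, query_x, query_y, n_way, lam=0.1):
        """One episodic update: adversarial queries are generated inside the
        task, and their embeddings are aligned with the clean ones
        (a sample-level, task-conditioned constraint)."""
        z_s = embed(support_x)
        proto = torch.stack([z_s[support_y == c].mean(0) for c in range(n_way)])
        adv_x = fgsm(embed, proto.detach(), query_x, query_y)
        z_clean, z_adv = embed(query_x), embed(adv_x)
        logits_clean = -torch.cdist(z_clean, proto)  # nearest-prototype logits
        logits_adv = -torch.cdist(z_adv, proto)
        loss = (F.cross_entropy(logits_clean, query_y)
                + F.cross_entropy(logits_adv, query_y)
                + lam * F.mse_loss(z_adv, z_clean))  # clean/adversarial alignment
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()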

    Sparse Spatial Transformers for Few-Shot Learning

    Learning from limited data is challenging because the scarcity of data leads to poor generalization of the trained model, and the classical globally pooled representation is likely to lose useful local information. Recently, many few-shot learning methods address this challenge by using deep descriptors and learning a pixel-level metric. However, using deep descriptors as feature representations may lose the contextual information of the image, and most of these methods deal with each class in the support set independently, which cannot sufficiently utilize discriminative information and task-specific embeddings. In this paper, we propose a novel Transformer-based neural network architecture called Sparse Spatial Transformers (SSFormers), which can find task-relevant features and suppress task-irrelevant ones. Specifically, we first divide each input image into several image patches of different sizes to obtain dense local features; these features retain contextual information while expressing local information. Then, a sparse spatial transformer layer is proposed to find spatial correspondence between the query image and the entire support set, selecting task-relevant image patches and suppressing task-irrelevant ones. Finally, we propose an image patch matching module that calculates the distance between dense local representations to determine which category in the support set the query image belongs to. Extensive experiments on popular few-shot learning benchmarks show that our method achieves state-of-the-art performance.
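
    A minimal PyTorch sketch of the matching idea follows: query patches attend over all support-set patches, the attention is sparsified with a top-k cut-off so that only task-relevant correspondences survive, and class scores are aggregated from what remains. The cosine similarity, the top-k sparsification, and the classify_query signature are illustrative assumptions, not the SSFormers architecture itself.

    import torch
    import torch.nn.functional as F

    def classify_query(query_feats, support_feats, support_labels, n_way, topk=5):
        """query_feats: (P, d) dense local features of the query image.
        support_feats: (S, P, d) patch features of the S support images.
        support_labels: (S,) class index of each support image."""
        q = F.normalize(query_feats, dim=-1)
        s = F.normalize(support_feats.flatten(0, 1), dim=-1)  # (S*P, d)
        sim = q @ s.t()                                       # (P, S*P)
        # sparse spatial correspondence: keep each query patch's top-k matches
        vals, idx = sim.topk(topk, dim=1)
        patch_owner = support_labels.repeat_interleave(support_feats.size(1))
        match_class = patch_owner[idx]                        # (P, topk)
        scores = torch.stack([vals[match_class == c].sum() for c in range(n_way)])
        return scores.argmax().item()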

    Collaborative Appearance-Based Place Recognition and Improving Place Recognition Using Detection of Dynamic Objects

    This dissertation makes contributions to the problem of Long-Term Appearance-Based Place Recognition. We present a framework for place recognition in a collaborative scheme and a method to reduce the impact of dynamic objects on place representations. We demonstrate our findings using a state-of-the-art place recognition approach. We begin in Part I by describing the general problem of place recognition and its importance in applications where accurate localization is crucial. We discuss feature detection and description and also explain the functioning of several place recognition frameworks. In Part II, we present a novel framework for collaboration between agents from a pure appearance-based place recognition perspective. Using this framework, multiple agents can efficiently share partial or complete knowledge about places and benefit from their teamwork. This collaborative framework allows agents with limited storage and memory capacity to become useful in environment exploration tasks (for instance, by enabling remote recognition); includes procedures to manage an agent’s memory load and distributes knowledge of places across agents; allows the reuse of knowledge from one agent to another; and increases the tolerance for failure of individual agents. Part II also defines metrics that allow us to measure the performance of a system that uses the collaborative framework. Finally, in Part III, we present an innovative method to improve the recognition of places in environments densely populated by dynamic objects. We demonstrate that we can improve the recognition performance in these environments by incorporating high-level information from dynamic objects. Tests conducted using a synthetic dataset show the benefits of our approach. The proposed method allows the system to significantly improve the recognition performance in the photo-realistic dataset while reducing storage requirements, resulting in up to 23.7 percent less storage space than the state-of-the-art approach that we have extended; smaller representations also reduced the time required to match places. In Part III, we also formulate the concept of a valid place representation and determine the quality of the observation based on dynamic objects present in the agent’s view. Of course, recognition systems that are sensitive to dynamic objects incur additional computational costs to recognize those objects. We show that this additional cost is outweighed by the benefits of incorporating dynamic object detection in the place recognition pipeline. Our findings can be used in many navigation applications, e.g. assisting visually impaired individuals with indoor navigation, or autonomous vehicles.
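
    As a rough illustration of the Part III idea, the Python sketch below discards local features that fall inside detected dynamic-object regions before the place representation is built, and applies a crude validity test to the observation. The keypoint and bounding-box formats, the min_ratio threshold, and both function names are assumptions for illustration, not the dissertation's exact criteria.

    def filter_dynamic(keypoints, descriptors, dynamic_boxes):
        """Keep only features outside every dynamic-object bounding box.
        keypoints: list of (x, y); dynamic_boxes: list of (x0, y0, x1, y1)."""
        kept_kp, kept_desc = [], []
        for (x, y), desc in zip(keypoints, descriptors):
            inside = any(x0 <= x <= x1 and y0 <= y <= y1
                         for x0, y0, x1, y1 in dynamic_boxes)
            if not inside:  # static features are kept; dynamic ones are dropped
                kept_kp.append((x, y))
                kept_desc.append(desc)
        return kept_kp, kept_desc

    def is_valid_place(n_kept, n_total, min_ratio=0.4):
        """An observation dominated by dynamic objects retains too few
        static features to be a valid place representation."""
        return n_total > 0 and n_kept / n_total >= min_ratio

    Dropping the masked descriptors is also what shrinks the stored representation, which is consistent with the storage savings reported above.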

    Robust Saliency-Aware Distillation for Few-shot Fine-grained Visual Recognition

    Recognizing novel sub-categories with scarce samples is an essential and challenging research topic in computer vision. Existing literature addresses this challenge by employing local-based representation approaches, which may not sufficiently facilitate meaningful object-specific semantic understanding, leading to a reliance on apparent background correlations. Moreover, such approaches primarily rely on high-dimensional local descriptors to construct a complex embedding space, potentially limiting generalization. To address these challenges, this article proposes a novel model called RSaG for few-shot fine-grained visual recognition. RSaG introduces additional saliency-aware supervision via saliency detection to guide the model toward focusing on the intrinsic discriminative regions. Specifically, RSaG utilizes a saliency detection model to emphasize the critical regions of each sub-category, providing additional object-specific information for fine-grained prediction, and transfers this information between two symmetric branches in a mutual learning paradigm. Furthermore, RSaG exploits inter-regional relationships to enhance the informativeness of the representation and subsequently summarizes the highlighted details into contextual embeddings to facilitate effective transfer, enabling quick generalization to novel sub-categories. The proposed approach is empirically evaluated on three widely used benchmarks, demonstrating its superior performance.
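
    The sketch below shows one plausible reading of the two-branch mutual learning scheme in PyTorch: one branch sees the raw image, the other sees the image modulated by a saliency map, and each branch distils from the other's softened predictions. The symmetric KL formulation, the temperature tau, and the mutual_learning_loss signature are assumptions inferred from the abstract, not RSaG's published losses.

    import torch
    import torch.nn.functional as F

    def mutual_learning_loss(branch_a, branch_b, images, saliency, labels, tau=4.0):
        """saliency: (B, 1, H, W) maps in [0, 1] from a saliency detector."""
        logits_a = branch_a(images)             # raw-image branch
        logits_b = branch_b(images * saliency)  # saliency-highlighted branch
        ce = F.cross_entropy(logits_a, labels) + F.cross_entropy(logits_b, labels)
        # symmetric KL so each branch transfers knowledge to the other
        kl_ab = F.kl_div(F.log_softmax(logits_a / tau, dim=1),
                         F.softmax(logits_b / tau, dim=1).detach(),
                         reduction="batchmean")
        kl_ba = F.kl_div(F.log_softmax(logits_b / tau, dim=1),
                         F.softmax(logits_a / tau, dim=1).detach(),
                         reduction="batchmean")
        return ce + (tau ** 2) * (kl_ab + kl_ba)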