8 research outputs found

    Attribute-Guided Network for Cross-Modal Zero-Shot Hashing

    Zero-shot hashing (ZSH) aims at learning a hashing model that is trained only on instances from seen categories but can generalize well to those of unseen categories. Typically, this is achieved by utilizing a semantic embedding space to transfer knowledge from the seen domain to the unseen domain. Existing efforts mainly focus on single-modal retrieval tasks, especially image-based image retrieval (IBIR). However, as a highlighted research topic in the field of hashing, cross-modal retrieval is more common in real-world applications. To address the cross-modal ZSH (CMZSH) retrieval task, we propose a novel attribute-guided network (AgNet), which can perform not only IBIR but also text-based image retrieval (TBIR). In particular, AgNet aligns data from different modalities into a semantically rich attribute space, which bridges the gap caused by modality heterogeneity and the zero-shot setting. We also design an effective strategy that exploits the attributes to guide the generation of hash codes for image and text within the same network. Extensive experimental results on three benchmark data sets (AwA, SUN, and ImageNet) demonstrate the superiority of AgNet on both cross-modal and single-modal zero-shot image retrieval tasks.
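As a rough illustration of hash-based cross-modal retrieval in general (not AgNet itself), the sketch below binarizes embeddings by sign and ranks database items by Hamming distance. The embeddings, the sign-threshold rule, and the function names `binarize`, `hamming`, and `retrieve` are assumptions made for this example; the abstract does not specify how AgNet produces or compares its codes.

```python
def binarize(embedding):
    """Map a real-valued embedding to a binary code by sign thresholding (assumed rule)."""
    return [1 if x > 0 else 0 for x in embedding]


def hamming(a, b):
    """Count the positions where two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def retrieve(query_code, database_codes, top_k=2):
    """Return indices of the top_k database codes closest to the query in Hamming distance."""
    ranked = sorted(
        range(len(database_codes)),
        key=lambda i: hamming(query_code, database_codes[i]),
    )
    return ranked[:top_k]


# Hypothetical embeddings, imagined as already projected into a shared attribute space.
image_embeddings = [
    [0.9, -0.2, 0.4, -0.7],
    [-0.5, 0.8, -0.1, 0.3],
    [0.7, -0.6, -0.3, -0.9],
]
text_query = [0.8, -0.1, 0.5, -0.4]

database = [binarize(e) for e in image_embeddings]
result = retrieve(binarize(text_query), database)
print(result)
```

Because both modalities are compared only through their binary codes, a text query can rank images without the two inputs ever being compared directly.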

    Extending Cross-Modal Retrieval with Interactive Learning to Improve Image Retrieval Performance in Forensics

    Nowadays, one of the critical challenges in forensics is analyzing the enormous amounts of unstructured digital evidence, such as images. Often, unstructured digital evidence contains precious information for forensic investigations. Therefore, a retrieval system that can effectively identify forensically relevant images is paramount. In this work, we explored the effectiveness of interactive learning in improving image retrieval performance in the forensic domain by proposing Excalibur, a zero-shot cross-modal image retrieval system extended with interactive learning. Excalibur was evaluated using both simulations and a user study. The simulations reveal that interactive learning is highly effective in improving retrieval performance in the forensic domain. Furthermore, user study participants could effectively leverage the power of interactive learning. Finally, they considered Excalibur effective and straightforward to use and expressed interest in using it in their daily practice. Comment: Submitted to the AAAI22 conference

    A few-shot learning method for tobacco abnormality identification

    Tobacco is a valuable crop, but its disease identification is rarely addressed in existing work. In this work, we use few-shot learning (FSL) to identify abnormalities in tobacco. FSL is a solution for the data deficiency that has been an obstacle to using deep learning. However, weak feature representation caused by limited data is still a challenging issue in FSL. Weak feature representation leads to weak generalization and difficulties in cross-domain settings. In this work, we propose a feature representation enhancement network (FREN) that enhances the feature representation through instance embedding and task adaptation. For instance embedding, global max pooling and global average pooling are used together to add more features, and Gaussian-like calibration is used to normalize the feature distribution. For task adaptation, self-attention is adopted for task contextualization. Given the absence of publicly available data on tobacco, we created a tobacco leaf abnormality dataset (TLA), which includes 16 categories, two settings, and 1,430 images in total. In experiments, we first use PlantVillage, the benchmark dataset for plant disease identification, to validate the superiority of FREN. Subsequently, we use the proposed method and TLA to analyze and discuss the abnormality identification of tobacco. For the multi-symptom diseases that always have low accuracy, we propose dividing the samples into subcategories by symptom. For the 10 categories of tomato in PlantVillage, the accuracy reaches 66.04% in 5-way, 1-shot tasks. For the two settings of the tobacco leaf abnormality dataset, the accuracies are 45.5% and 56.5%. With the multi-symptom solution, the best accuracy rises to 60.7% in 16-way, 1-shot tasks and reaches 81.8% in 16-way, 10-shot tasks. The results show that our method improves the performance greatly by enhancing feature representation, especially for tasks that contain categories with high similarity. The reduced sensitivity to data when crossing domains also indicates that FREN has strong generalization ability.
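One possible reading of "global max pooling and global average pooling are used together" is that the two pooled vectors are concatenated into a single instance embedding. The sketch below shows that reading in plain Python. The concatenation, the toy feature map, and the function names are assumptions for this example; the abstract does not say how FREN combines the two poolings, and the Gaussian-like calibration step is not shown.

```python
def global_max_pool(feature_map):
    """feature_map is a list of channels, each an H x W nested list; return one max per channel."""
    return [max(max(row) for row in channel) for channel in feature_map]


def global_avg_pool(feature_map):
    """Return the mean activation of each channel."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def pooled_embedding(feature_map):
    """Concatenate max-pooled and average-pooled vectors (assumed combination rule)."""
    return global_max_pool(feature_map) + global_avg_pool(feature_map)


# Hypothetical 2-channel, 2 x 2 feature map from a backbone network.
feature_map = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, -1.0], [5.0, 0.0]],
]
embedding = pooled_embedding(feature_map)
print(embedding)
```

Under this reading the embedding has twice as many dimensions as the feature map has channels, which is one way the two poolings could "add more features".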

    Improving Generalization via Attribute Selection on Out-of-the-Box Data

    Zero-shot learning (ZSL) aims to recognize unseen objects (test classes) given some other seen objects (training classes) by sharing attribute information between different objects. Attributes are manually annotated for objects and treated equally in recent ZSL tasks. However, some inferior attributes with poor predictability or poor discriminability may have negative impacts on the ZSL system performance. This letter first derives a generalization error bound for ZSL tasks. Our theoretical analysis verifies that selecting the subset of key attributes can improve the generalization performance of the original ZSL model, which uses all the attributes. Unfortunately, previous attribute selection methods have been conducted based on the seen data, and their selected attributes generalize poorly to the unseen data, which is unavailable in the training stage of ZSL tasks. Inspired by learning from pseudo-relevance feedback, this letter introduces out-of-the-box data (pseudo-data generated by an attribute-guided generative model) to mimic the unseen data. We then present an iterative attribute selection (IAS) strategy that iteratively selects key attributes based on the out-of-the-box data. Since the distribution of the generated out-of-the-box data is similar to that of the test data, the key attributes selected by IAS can be effectively generalized to test data. Extensive experiments demonstrate that IAS can significantly improve existing attribute-based ZSL methods and achieve state-of-the-art performance.

    Methods for data-related problems in person re-ID

    In recent years, the ever-increasing need for public security has drawn wide attention to person re-ID. State-of-the-art techniques have achieved impressive results on academic datasets, which are nearly saturated. However, when it comes to deploying a re-ID system in a practical surveillance scenario, several challenges arise. 1) Full person views are often unavailable, and missing body parts make the comparison very challenging due to significant misalignment of the views. 2) Low diversity in training data introduces bias in re-ID systems. 3) The available data might come from different modalities, e.g., text and images. This thesis proposes Partial Matching Net (PMN), which detects body joints, aligns partial views, and hallucinates the missing parts based on the information present in the frame and a learned model of a person. The aligned and reconstructed views are then combined into a joint representation and used for matching images. The thesis also investigates different types of bias that typically occur in re-ID scenarios when the similarity between two persons is due to the same pose, body part, or camera view, rather than to the ID-related cues. To mitigate these effects, it proposes a general approach named the Bias-Control (BC) framework, with two training streams that leverage adversarial and multitask learning to reduce bias-related features. Finally, the thesis investigates a novel mechanism for matching data across visual and text modalities. It proposes a framework Text (TAVD) with two complementary modules: Text attribute feature aggregation (TA), which aggregates multiple semantic attributes in a bimodal space for globally matching text descriptions with images, and Visual feature decomposition (VD), which performs feature embedding for locally matching image regions with text attributes. The results and comparison to the state of the art on different benchmarks show that the proposed solutions are effective strategies for person re-ID. Open Access