
    Multi-Label Image Classification via Knowledge Distillation from Weakly-Supervised Detection

    Multi-label image classification is a fundamental but challenging task for general visual understanding. Existing methods have found that region-level cues (e.g., features from RoIs) can facilitate multi-label classification. However, such methods usually require laborious object-level annotations (i.e., object labels and bounding boxes) to learn effective object-level visual features. In this paper, we propose a novel and efficient deep framework that boosts multi-label classification by distilling knowledge from a weakly-supervised detection task, without bounding-box annotations. Specifically, given the image-level annotations, (1) we first develop a weakly-supervised detection (WSD) model, and then (2) construct an end-to-end multi-label image classification framework augmented by a knowledge distillation module in which the WSD model guides the classification model through its class-level predictions for the whole image and its object-level visual features for object RoIs. The WSD model serves as the teacher and the classification model as the student. After this cross-task knowledge distillation, the performance of the classification model improves significantly while efficiency is maintained, since the WSD model can be safely discarded at test time. Extensive experiments on two large-scale datasets (MS-COCO and NUS-WIDE) show that our framework outperforms state-of-the-art methods in both accuracy and efficiency. Comment: accepted by ACM Multimedia 2018, 9 pages, 4 figures, 5 tables
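    A minimal PyTorch sketch of the cross-task distillation idea described above (function and parameter names are illustrative, not the authors' code): a frozen WSD teacher supplies class-level logits, and the student classifier is trained on a combination of the ground-truth multi-label loss and a soft-target distillation term.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      temperature=2.0, alpha=0.5):
    """Combine the ground-truth multi-label loss with a soft-target
    distillation term from the (frozen) WSD teacher.

    A sketch under assumed losses; the paper's exact formulation may differ.
    """
    # Supervised multi-label loss against the image-level annotations.
    hard_loss = F.binary_cross_entropy_with_logits(student_logits, targets)

    # Soften both predictions and match them; sigmoids (not softmax) because
    # multi-label outputs are independent per class.
    soft_student = torch.sigmoid(student_logits / temperature)
    soft_teacher = torch.sigmoid(teacher_logits.detach() / temperature)
    soft_loss = F.mse_loss(soft_student, soft_teacher)

    return alpha * hard_loss + (1.0 - alpha) * soft_loss
```

    Only the student runs at inference, which is why the teacher can be discarded in the test phase as the abstract notes.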

    End-to-End Supervised Multilabel Contrastive Learning

    Multilabel representation learning is recognized as a challenging problem, owing both to label dependencies between object categories and to data-related issues such as the inherent imbalance of positive/negative samples. Recent advances address these challenges from model-centric and data-centric viewpoints. In model-centric approaches, label correlation is captured by an external model design (e.g., a graph CNN) that incorporates an inductive bias for training. However, these approaches lack an end-to-end training framework, leading to high computational complexity. Data-centric approaches, by contrast, account for the realistic nature of the dataset to improve classification but ignore the label dependencies. In this paper, we propose a new end-to-end training framework -- dubbed KMCL (Kernel-based Multilabel Contrastive Learning) -- to address the shortcomings of both model- and data-centric designs. KMCL first transforms the embedded features into a mixture of exponential kernels in a Gaussian RKHS. It then optimizes an objective comprising (a) a reconstruction loss to reconstruct the kernel representation, (b) an asymmetric classification loss to address the inherent imbalance problem, and (c) a contrastive loss to capture label correlation. KMCL models the uncertainty of the feature encoder while maintaining a low computational footprint. Extensive experiments on image classification tasks showcase consistent improvements of KMCL over state-of-the-art methods. A PyTorch implementation is provided at \url{https://github.com/mahdihosseini/KMCL}.
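    To make the kernel-plus-contrastive combination concrete, here is a hedged PyTorch sketch of one plausible component: a Gaussian (RBF) kernel similarity standing in for the exponential-kernel RKHS mapping, feeding a label-aware contrastive loss. All names are assumptions; the official loss lives in the linked repository.

```python
import torch

def gaussian_kernel_sim(z, sigma=1.0):
    """Pairwise Gaussian (RBF) kernel similarities in feature space,
    a stand-in for the paper's exponential-kernel RKHS mapping."""
    sq_dists = torch.cdist(z, z, p=2).pow(2)
    return torch.exp(-sq_dists / (2.0 * sigma ** 2))

def label_aware_contrastive_loss(z, labels, sigma=1.0, eps=1e-8):
    """Pull together samples sharing at least one label, using kernel
    similarity as the affinity. A sketch, not the official KMCL loss.

    z:      (N, D) embedded features
    labels: (N, C) multi-hot label matrix
    """
    sim = gaussian_kernel_sim(z, sigma)                       # (N, N)
    labels = labels.float()
    pos_mask = (labels @ labels.t() > 0).float()              # shared-label pairs
    pos_mask.fill_diagonal_(0.0)                              # exclude self-pairs

    # Similarity mass on positives versus all other samples.
    sim = sim - torch.diag(torch.diag(sim))                   # zero the diagonal
    pos = (sim * pos_mask).sum(dim=1)
    total = sim.sum(dim=1) + eps
    return -torch.log((pos + eps) / total).mean()
```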

    Multimodal sequential fashion attribute prediction

    We address multimodal product attribute prediction for fashion items based on product images and titles. The product attributes, such as type, sub-type, cut, or fit, form a chain, with previous attribute values constraining the values of subsequent attributes. We propose to address this task with a sequential prediction model that learns to capture the dependencies between the different attribute values in the chain. Our experiments on three product datasets show that the sequential model outperforms two non-sequential baselines on all of them. Compared to the other models, the sequential model is also better able to generate attribute chains not seen during training. We also measure the contributions of the image and textual inputs and show that, while text-only models always outperform image-only models, only the multimodal sequential model combining both image and text improves over the text-only model on all experimental datasets.
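    One way such a sequential predictor could look in PyTorch is sketched below: a GRU decodes the attribute chain step by step, conditioned on a fused image+title embedding, with the previous attribute value fed back in. The class name, sizes, and fusion scheme are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class AttributeChainDecoder(nn.Module):
    """Sketch of a sequential attribute predictor: a GRU decodes the
    attribute chain (type -> sub-type -> cut -> ...) one step at a time,
    conditioned on a fused image+title embedding."""

    def __init__(self, fused_dim, attr_vocab, hidden=256, emb=128):
        super().__init__()
        self.embed = nn.Embedding(attr_vocab, emb)
        self.init_h = nn.Linear(fused_dim, hidden)
        self.gru = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, attr_vocab)

    def forward(self, fused, attr_seq):
        # Initialize the decoder state from the multimodal representation,
        # then teacher-force the previous attribute value at each step.
        h0 = torch.tanh(self.init_h(fused)).unsqueeze(0)   # (1, B, H)
        x = self.embed(attr_seq)                           # (B, T, E)
        out, _ = self.gru(x, h0)
        return self.out(out)                               # (B, T, V) logits
```

    Feeding the previous attribute back into the decoder is what lets earlier values in the chain constrain later ones, matching the dependency structure the abstract describes.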

    Graph Networks for Multi-Label Image Recognition

    Providing machines with a robust understanding of multiple objects in a scene has a myriad of applications in the physical world. This research addresses multi-label image recognition with a deep learning approach. In most multi-label image recognition datasets, a single image contains multiple objects and a single label appears many times throughout the dataset. It is therefore inefficient to classify each object in isolation; instead, it is important to infer the inter-dependencies between the labels. To extract a latent representation of the pixels in an image, this work uses a convolutional approach, evaluating three different image feature extraction networks. To learn the label inter-dependencies, this work proposes a graph convolutional network, in contrast to previous approaches such as probabilistic graphical models or recurrent neural networks. In the graph neural network approach, the image labels are first encoded into word embeddings, which serve as nodes on a graph; the correlations between these nodes are then learned with graph neural networks. We investigate how to create the adjacency matrix without manually computing the label correlations in the respective datasets. The proposed approach is evaluated on the widely used PASCAL VOC, MS-COCO, and NUS-WIDE multi-label image recognition datasets. The main evaluation metrics are mean average precision and overall F1 score, used to show that the learned adjacency matrix for labels, combined with visual attention over image features, achieves performance similar to manually computing the label adjacency matrix.
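    A minimal PyTorch sketch of this pattern, assuming a two-layer GCN over label word embeddings whose outputs act as per-class classifiers on pooled image features, with a learnable adjacency matrix in place of hand-computed co-occurrence statistics. Shapes and names are illustrative, not the thesis's exact model.

```python
import torch
import torch.nn as nn

class LabelGCNClassifier(nn.Module):
    """Sketch of a graph-based multi-label recognizer: a GCN runs over
    label word embeddings, and its outputs serve as per-class classifiers
    applied to the image features."""

    def __init__(self, num_labels, word_dim=300, feat_dim=2048, hidden=1024):
        super().__init__()
        self.w1 = nn.Linear(word_dim, hidden, bias=False)
        self.w2 = nn.Linear(hidden, feat_dim, bias=False)
        # Learnable adjacency avoids manually computing label co-occurrences.
        self.adj = nn.Parameter(torch.eye(num_labels))

    def forward(self, img_feats, label_embs):
        # img_feats:  (B, feat_dim)  pooled CNN features
        # label_embs: (C, word_dim)  label word embeddings (graph nodes)
        a = torch.softmax(self.adj, dim=1)          # row-normalized adjacency
        h = torch.relu(a @ self.w1(label_embs))     # GCN layer 1
        w = a @ self.w2(h)                          # GCN layer 2 -> (C, feat_dim)
        return img_feats @ w.t()                    # (B, C) label logits
```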