Multi-layered Semantic Representation Network for Multi-label Image Classification
Multi-label image classification (MLIC) is a fundamental and practical task,
which aims to assign multiple possible labels to an image. In recent years,
many deep convolutional neural network (CNN) based approaches have been
proposed which model label correlations to discover semantics of labels and
learn semantic representations of images. This paper advances this research
direction by improving both the modeling of label correlations and the learning
of semantic representations. On the one hand, besides the local semantics of
each label, we propose to further explore global semantics shared by multiple
labels. On the other hand, existing approaches mainly learn the semantic
representations at the last convolutional layer of a CNN. However, it has been
noted that the representations at different layers of a CNN capture different
levels or scales of features and have different discriminative abilities. We
thus propose to learn semantic representations at multiple convolutional
layers. To this end, this paper designs a Multi-layered Semantic Representation
Network (MSRN) which discovers both local and global semantics of labels
through modeling label correlations and utilizes the label semantics to guide
the semantic representations learning at multiple layers through an attention
mechanism. Extensive experiments on four benchmark datasets including VOC 2007,
COCO, NUS-WIDE, and Apparel show a competitive performance of the proposed MSRN
against state-of-the-art models.
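The label-semantics-guided attention described above can be sketched as follows. This is a minimal illustration, not MSRN's exact architecture: the shapes, the shared embedding dimension, and the dot-product scoring are all illustrative assumptions.

```python
import numpy as np

def label_guided_attention(feats, label_emb):
    """Pool a per-label semantic vector from one layer's spatial features,
    in the spirit of MSRN's label-guided attention.

    feats: (N, D) flattened spatial features of one convolutional layer
    label_emb: (C, D) one embedding per label
    """
    scores = label_emb @ feats.T                  # (C, N) label-location affinity
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)       # softmax over spatial locations
    return attn @ feats                           # (C, D) per-label representation

# Applying the same pooling at several convolutional layers yields
# multi-layered semantic representations, one set per layer:
rng = np.random.default_rng(0)
layer_feats = [rng.normal(size=(49, 64)), rng.normal(size=(196, 64))]
label_emb = rng.normal(size=(5, 64))
reps = [label_guided_attention(f, label_emb) for f in layer_feats]
```

Each entry of `reps` holds one semantic vector per label for that layer; a classifier head could then combine them across layers.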
Discriminator-free Unsupervised Domain Adaptation for Multi-label Image Classification
In this paper, a discriminator-free adversarial Unsupervised Domain
Adaptation (UDA) method for Multi-Label Image Classification (MLIC), referred
to as DDA-MLIC, is proposed. Over the last two years, some attempts have been
made to introduce adversarial UDA methods in the context of MLIC. However,
these methods, which rely on an additional discriminator subnet, present two
shortcomings. First, the learning of domain-invariant features may harm their
task-specific discriminative power, since the classification and discrimination
tasks are decoupled. Moreover, the use of an additional discriminator usually
induces an increase of the network size. Herein, we propose to overcome these
issues by introducing a novel adversarial critic that is directly deduced from
the task-specific classifier. Specifically, a two-component Gaussian Mixture
Model (GMM) is fitted on the source and target predictions, separating them
into two clusters and yielding a Gaussian distribution for each component.
These Gaussian distributions are then used to formulate an adversarial loss
based on the Fréchet distance. The proposed
method is evaluated on three multi-label image datasets. The obtained results
demonstrate that DDA-MLIC outperforms existing state-of-the-art methods while
requiring fewer parameters.
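The core mechanism, fitting a two-component GMM to the predictions and comparing the resulting Gaussians with a Fréchet distance, can be sketched as follows. The tiny EM loop and the toy bimodal "prediction" data are illustrative stand-ins; the actual method operates on the task classifier's outputs.

```python
import numpy as np

def fit_gmm2(x, iters=200):
    """Minimal EM fit of a two-component 1-D Gaussian Mixture Model.
    Returns per-component means and standard deviations."""
    mu = np.array([x.min(), x.max()], dtype=float)
    var = np.full(2, x.var() + 1e-6)
    pi = np.full(2, 0.5)
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        p = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        r = p / p.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return mu, np.sqrt(var)

def frechet_1d(m1, s1, m2, s2):
    """Fréchet (2-Wasserstein) distance between two 1-D Gaussians:
    sqrt((m1 - m2)^2 + (s1 - s2)^2)."""
    return np.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)

# Toy "source" and "target" prediction scores, each bimodal:
rng = np.random.default_rng(1)
src = np.concatenate([rng.normal(0.1, 0.05, 500), rng.normal(0.9, 0.05, 500)])
tgt = np.concatenate([rng.normal(0.2, 0.05, 500), rng.normal(0.8, 0.05, 500)])
mu_s, sd_s = fit_gmm2(src)
mu_t, sd_t = fit_gmm2(tgt)
# An adversarial loss could sum the per-component Fréchet distances:
loss = frechet_1d(mu_s[0], sd_s[0], mu_t[0], sd_t[0]) + \
       frechet_1d(mu_s[1], sd_s[1], mu_t[1], sd_t[1])
```

Because the critic is derived from the classifier's own predictions, no separate discriminator subnet (and hence no extra parameters) is needed.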
Graph Attention Transformer Network for Multi-Label Image Classification
Multi-label classification aims to recognize multiple objects or attributes
from images. However, it is challenging to construct proper label graphs that
effectively characterize inter-label correlations or dependencies. Current
methods often use the co-occurrence probability of labels based on the training
set as the adjacency matrix to model this correlation, which is greatly limited
by the dataset and affects the model's generalization ability. In this paper,
we propose a Graph Attention Transformer Network (GATN), a general framework
for multi-label image classification that can effectively mine complex
inter-label relationships. First, we use the cosine similarity based on the
label word embedding as the initial correlation matrix, which can represent
rich semantic information. Subsequently, we design the graph attention
transformer layer to transfer this adjacency matrix to adapt to the current
domain. Extensive experiments demonstrate that the proposed method achieves
state-of-the-art performance on three datasets.
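The initial correlation matrix described above can be sketched as follows. The random embeddings are placeholders; GATN derives them from pretrained word vectors for the label names.

```python
import numpy as np

def cosine_adjacency(label_emb, eps=1e-12):
    """Initial label-correlation matrix in the spirit of GATN: pairwise
    cosine similarity of label word embeddings, instead of a training-set
    co-occurrence matrix.

    label_emb: (C, D) one word embedding per label
    """
    norms = np.linalg.norm(label_emb, axis=1, keepdims=True)
    unit = label_emb / np.maximum(norms, eps)   # L2-normalize each embedding
    return unit @ unit.T                        # (C, C), entries in [-1, 1]

rng = np.random.default_rng(2)
A = cosine_adjacency(rng.normal(size=(4, 300)))
```

The graph attention transformer layers then adapt this dataset-independent initialization to the target domain, rather than being tied to training-set co-occurrence statistics.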
Estimator: An Effective and Scalable Framework for Transportation Mode Classification over Trajectories
Transportation mode classification, the process of predicting the class
labels of moving objects' transportation modes, has been widely applied to a
variety of real world applications, such as traffic management, urban
computing, and behavior study. However, existing studies of transportation mode
classification typically extract the explicit features of trajectory data but
fail to capture the implicit features that affect the classification
performance. In addition, most existing studies prefer to apply RNN-based
models to embed trajectories, which are only suitable for classifying
small-scale data. To tackle the above challenges, we propose an effective and
scalable framework for transportation mode classification over GPS
trajectories, abbreviated Estimator. Estimator is established on a developed
CNN-TCN architecture, which is capable of leveraging the spatial and temporal
hidden features of trajectories to achieve high effectiveness and efficiency.
Estimator partitions the entire traffic space into disjoint spatial regions
according to traffic conditions, which enhances the scalability significantly
and thus enables parallel transportation classification. Extensive experiments
using eight public real-life datasets offer evidence that Estimator i) achieves
superior model effectiveness (i.e., 99% accuracy and 0.98 F1-score),
substantially outperforming the state-of-the-art; ii) exhibits prominent model
efficiency, obtaining 7-40x speedups over state-of-the-art learning-based
methods; and iii) shows high model scalability and robustness that enable
large-scale classification analytics.
Comment: 12 pages, 8 figures
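The disjoint spatial partitioning that enables parallel classification can be sketched as follows. A uniform grid is an illustrative simplification: Estimator partitions space according to traffic conditions, but any disjoint partition supports the same parallelism. The origin and cell size below are assumed example values.

```python
from collections import defaultdict

def region_id(lat, lon, lat0=39.0, lon0=116.0, cell_deg=0.05):
    """Map a GPS point to a disjoint grid cell identified by integer
    (row, col) indices relative to an assumed origin."""
    return (int((lat - lat0) // cell_deg), int((lon - lon0) // cell_deg))

def partition_trajectories(trajs):
    """Group trajectories by the region of their first point, so each
    region's batch can be classified by an independent worker."""
    regions = defaultdict(list)
    for traj in trajs:
        lat, lon = traj[0]
        regions[region_id(lat, lon)].append(traj)
    return regions

trajs = [
    [(39.912, 116.412), (39.913, 116.413)],  # starts in one cell
    [(39.913, 116.412)],                     # starts in the same cell
    [(40.100, 116.700)],                     # starts in a different cell
]
groups = partition_trajectories(trajs)
```

Each group can then be fed to its own CNN-TCN classifier instance, which is what makes the classification embarrassingly parallel across regions.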