Weakly Supervised Semantic Segmentation by Knowledge Graph Inference
Currently, existing efforts in Weakly Supervised Semantic Segmentation (WSSS)
based on Convolutional Neural Networks (CNNs) have predominantly focused on
enhancing the multi-label classification network stage, with limited attention
given to the equally important downstream segmentation network. Furthermore,
CNN-based local convolutions lack the ability to model the extensive
inter-category dependencies. Therefore, this paper introduces a graph
reasoning-based approach to enhance WSSS. The aim is to improve WSSS
holistically by simultaneously enhancing both the multi-label classification
and segmentation network stages. In the multi-label classification network
stage, external knowledge is integrated, coupled with Graph Convolutional Networks (GCNs), to globally
reason about inter-class dependencies. This encourages the network to uncover
features in non-salient regions of images, thereby refining the completeness of
generated pseudo-labels. In the segmentation network stage, the proposed
Graph Reasoning Mapping (GRM) module is employed to leverage knowledge obtained
from textual databases, facilitating contextual reasoning for class
representation within image regions. This GRM module enhances feature
representation in high-level semantics of the segmentation network's local
convolutions, while dynamically learning semantic coherence for individual
samples. Using solely image-level supervision, we have achieved
state-of-the-art performance in WSSS on the PASCAL VOC 2012 and MS-COCO
datasets. Extensive experimentation on both the multi-label classification and
segmentation network stages underscores the effectiveness of the proposed graph
reasoning approach for advancing WSSS.
Graph Networks for Multi-Label Image Recognition
Providing machines with a robust visualization of multiple objects in a scene has a myriad of applications in the physical world. This research addresses the task of multi-label image recognition using a deep learning approach. In most multi-label image recognition datasets, there are multiple objects within a single image, and a single label can appear many times throughout the dataset. It is therefore not efficient to classify each object in isolation; rather, it is important to infer the inter-dependencies between the labels. To extract a latent representation of the pixels in an image, this work uses a convolutional network approach, evaluating three different image feature extraction networks. To learn the label inter-dependencies, this work proposes a graph convolutional network approach, as compared to previous approaches such as probabilistic graphs or recurrent neural networks. In the graph neural network approach, the image labels are first encoded into word embeddings, which serve as nodes on a graph. The correlations between these nodes are learned using graph neural networks. We investigate how to create the adjacency matrix without manually calculating the label correlations in the respective datasets. The proposed approach is evaluated on the widely used PASCAL VOC, MSCOCO, and NUS-WIDE multi-label image recognition datasets. The main evaluation metrics are mean average precision and overall F1 score, which show that the learned adjacency matrix method for labels, together with the addition of visual attention for image features, achieves performance similar to manually calculating the label adjacency matrix.
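The label-graph idea described above can be sketched in a few lines: label word embeddings act as graph nodes, a learnable adjacency matrix replaces hand-computed co-occurrence statistics, and stacked graph convolutions map the embeddings to per-label classifiers that score a CNN image feature. This is a minimal NumPy illustration under assumed dimensions and a random-initialized adjacency, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

num_labels, embed_dim, feat_dim = 20, 300, 2048

# Label word embeddings serve as graph nodes (GloVe vectors in practice).
H = rng.standard_normal((num_labels, embed_dim))

# Learned adjacency: initialized here at random, but trained end-to-end
# instead of being manually computed from dataset label co-occurrence counts.
A = rng.random((num_labels, num_labels))

def normalize_adjacency(A):
    """Symmetrically normalize with self-loops: D^-1/2 (A + I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_layer(A_norm, H, W):
    """One graph convolution: propagate node features, apply weights + ReLU."""
    return np.maximum(A_norm @ H @ W, 0.0)

A_norm = normalize_adjacency(A)
W1 = rng.standard_normal((embed_dim, 512)) * 0.01
W2 = rng.standard_normal((512, feat_dim)) * 0.01

# Two stacked layers map label embeddings to per-label classifier weights.
classifiers = gcn_layer(A_norm, gcn_layer(A_norm, H, W1), W2)

# Image feature from a CNN backbone; multi-label scores are dot products.
img_feat = rng.standard_normal(feat_dim)
scores = classifiers @ img_feat  # one score per label
print(scores.shape)  # (20,)
```

The key design point is that the adjacency matrix is a trainable parameter, so the label correlations are learned jointly with the recognition objective rather than fixed in advance.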
Multi-Label Continual Learning using Augmented Graph Convolutional Network
Multi-Label Continual Learning (MLCL) builds a class-incremental framework in
a sequential multi-label image recognition data stream. The critical challenges
of MLCL are the construction of label relationships on past-missing and
future-missing partial labels of training data and the catastrophic forgetting
on old classes, resulting in poor generalization. To solve the problems, the
study proposes an Augmented Graph Convolutional Network (AGCN++) that can
construct the cross-task label relationships in MLCL and mitigate catastrophic
forgetting. First, we build an Augmented Correlation Matrix (ACM) across all
seen classes, where the intra-task relationships derive from the hard label
statistics. In contrast, the inter-task relationships leverage hard and soft
labels from data and a constructed expert network. Then, we propose a novel
partial label encoder (PLE) for MLCL, which can extract dynamic class
representation for each partial label image as graph nodes and help generate
soft labels to create a more convincing ACM and suppress forgetting. Last, to
suppress the forgetting of label dependencies across old tasks, we propose a
relationship-preserving constraint to construct label relationships. The
inter-class topology can be augmented automatically, which also yields
effective class representations. The proposed method is evaluated using two
multi-label image benchmarks. The experimental results show that the proposed
method is effective for MLCL image recognition and can build convincing
correlations across tasks even if the labels of previous tasks are missing.
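The intra-task part of the correlation-matrix construction described above (relationships derived from hard label statistics) can be illustrated with conditional co-occurrence probabilities estimated from multi-hot label vectors. The threshold value and normalization below are illustrative assumptions, not the paper's exact recipe:

```python
import numpy as np

# Multi-hot label matrix for the current task: rows = images, cols = classes.
labels = np.array([
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 0],
], dtype=float)

# Co-occurrence counts: M[i, j] = number of images containing both i and j.
M = labels.T @ labels

# Conditional probability P(j | i): co-occurrence divided by class frequency.
freq = np.diag(M).copy()
P = M / np.maximum(freq[:, None], 1.0)

# Binarize with a threshold (assumed value) to suppress noisy co-occurrences.
tau = 0.4
A_intra = (P >= tau).astype(float)
np.fill_diagonal(A_intra, 1.0)
print(A_intra)
```

The inter-task blocks of the augmented matrix would instead be filled from soft labels predicted by the expert network, since the hard labels of past tasks are missing in the continual setting.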
Learning and Interpreting Multi-Multi-Instance Learning Networks
We introduce an extension of the multi-instance learning problem where
examples are organized as nested bags of instances (e.g., a document could be
represented as a bag of sentences, which in turn are bags of words). This
framework can be useful in various scenarios, such as text and image
classification, but also supervised learning over graphs. As a further
advantage, multi-multi instance learning enables a particular way of
interpreting predictions and the decision function. Our approach is based on a
special neural network layer, called bag-layer, whose units aggregate bags of
inputs of arbitrary size. We prove theoretically that the associated class of
functions contains all Boolean functions over sets of sets of instances and we
provide empirical evidence that functions of this kind can actually be learned
on semi-synthetic datasets. We finally present experiments on text
classification, on citation graphs, and social graph data, which show that our
model obtains competitive results with respect to accuracy when compared to
other approaches such as convolutional networks on graphs, while at the same
time it supports a general approach to interpret the learnt model, as well as
explain individual predictions.
Comment: JML
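The bag-layer described above can be sketched as a permutation-invariant aggregation: each instance in a bag is transformed with shared weights, the bag is reduced with an element-wise max, and the same operation is applied again one level up for bags of bags. The weight shapes and the choice of max aggregation below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def bag_layer(bag, W, b):
    """Transform each instance with shared weights, then max-pool over the bag.

    Accepts a bag of arbitrary size and returns a fixed-size vector, so the
    output is invariant to instance order and bag cardinality.
    """
    X = np.stack(bag)               # (bag_size, in_dim)
    H = np.maximum(X @ W + b, 0.0)  # shared ReLU transform per instance
    return H.max(axis=0)            # aggregate: element-wise max

in_dim, hid_dim, top_dim = 8, 16, 4
W1 = rng.standard_normal((in_dim, hid_dim))
b1 = np.zeros(hid_dim)
W2 = rng.standard_normal((hid_dim, top_dim))
b2 = np.zeros(top_dim)

# A "document" as a bag of sentences, each sentence a bag of word vectors.
document = [
    [rng.standard_normal(in_dim) for _ in range(5)],  # 5-word sentence
    [rng.standard_normal(in_dim) for _ in range(3)],  # 3-word sentence
]

# Inner bag-layer encodes each sentence; outer bag-layer encodes the document.
sentence_vecs = [bag_layer(sentence, W1, b1) for sentence in document]
doc_vec = bag_layer(sentence_vecs, W2, b2)
print(doc_vec.shape)  # (4,)
```

Because the max aggregation ignores ordering, reshuffling the sentences (or the words within a sentence) leaves the document representation unchanged, which is what makes the nested-bag formulation well defined.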
Pedestrian Attribute Recognition: A Survey
Recognizing pedestrian attributes is an important task in the computer vision
community because it plays an important role in video surveillance. Many
algorithms have been proposed to handle this task. The goal of this paper is to
review existing works using traditional methods or based on deep learning
networks. Firstly, we introduce the background of pedestrian attributes
recognition (PAR, for short), including the fundamental concepts of pedestrian
attributes and corresponding challenges. Secondly, we introduce existing
benchmarks, including popular datasets and evaluation criteria. Thirdly, we
analyse the concept of multi-task learning and multi-label learning, and also
explain the relations between these two learning algorithms and pedestrian
attribute recognition. We also review some popular network architectures which
have been widely applied in the deep learning community. Fourthly, we analyse
popular solutions for this task, such as attributes group, part-based,
\emph{etc}. Fifthly, we show some applications that take pedestrian
attributes into consideration and achieve better performance. Finally, we
summarize this paper and give several possible research directions for
pedestrian attributes recognition. The project page of this paper can be found
from the following website:
\url{https://sites.google.com/view/ahu-pedestrianattributes/}.
Comment: Check our project page for the high-resolution version of this survey:
https://sites.google.com/view/ahu-pedestrianattributes