5,483 research outputs found
Exploiting Web Images for Weakly Supervised Object Detection
In recent years, the performance of object detection has advanced
significantly with the evolving deep convolutional neural networks. However,
the state-of-the-art object detection methods still rely on accurate bounding
box annotations that require extensive human labelling. Object detection
without bounding box annotations, i.e., weakly supervised detection, still lags
far behind. As weakly supervised detection uses only image-level
labels and does not require the ground truth of bounding box location and label
of each object in an image, it is generally very difficult to distill knowledge
of the actual appearances of objects. Inspired by curriculum learning, this
paper proposes an easy-to-hard knowledge transfer scheme that incorporates easy
web images to provide prior knowledge of object appearance as a good starting
point. While exploiting large-scale free web imagery, we introduce a
sophisticated yet labour-free method to construct a web dataset with good diversity
in object appearance. After that, semantic relevance and distribution relevance
are introduced and utilized in the proposed curriculum training scheme. Our
end-to-end learning with the constructed web data achieves remarkable
improvement across most object classes, especially for the classes that are
often considered hard in other works.
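The easy-to-hard idea above can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's pipeline: the `difficulty` scoring function (here, fewer objects means easier, as with simple web images) and the sample fields are hypothetical.

```python
def difficulty(sample):
    """Hypothetical difficulty score: images with fewer objects are 'easier'."""
    return sample["num_objects"]

def curriculum_batches(samples, batch_size):
    """Yield training batches in easy-to-hard order, as in curriculum learning."""
    ordered = sorted(samples, key=difficulty)
    for i in range(0, len(ordered), batch_size):
        yield ordered[i:i + batch_size]

samples = [{"id": "web_03", "num_objects": 1},
           {"id": "voc_11", "num_objects": 5},
           {"id": "web_07", "num_objects": 2}]
batches = list(curriculum_batches(samples, batch_size=2))
# The first batch holds the easiest (fewest-object) images.
```

In the paper's setting, the easy examples would be the mined web images that provide prior knowledge of object appearance, and harder multi-object target images would follow.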
Transitive Invariance for Self-supervised Visual Representation Learning
Learning visual representations with self-supervised learning has become
popular in computer vision. The idea is to design auxiliary tasks where labels
are free to obtain. Most of these tasks end up providing data to learn specific
kinds of invariance useful for recognition. In this paper, we propose to
exploit different self-supervised approaches to learn representations invariant
to (i) inter-instance variations (two objects in the same class should have
similar features) and (ii) intra-instance variations (viewpoint, pose,
deformations, illumination, etc). Instead of combining two approaches with
multi-task learning, we argue for organizing and reasoning about the data with multiple
variations. Specifically, we propose to generate a graph with millions of
objects mined from hundreds of thousands of videos. The objects are connected
by two types of edges which correspond to two types of invariance: "different
instances but a similar viewpoint and category" and "different viewpoints of
the same instance". By applying simple transitivity on the graph with these
edges, we can obtain pairs of images exhibiting richer visual invariance. We
use this data to train a Triplet-Siamese network with VGG16 as the base
architecture and apply the learned representations to different recognition
tasks. For object detection, we achieve 63.2% mAP on PASCAL VOC 2007 using Fast
R-CNN (compared to 67.3% with ImageNet pre-training). For the challenging COCO
dataset, our method is surprisingly close (23.5%) to the ImageNet-supervised
counterpart (24.4%) using the Faster R-CNN framework. We also show that our
network can perform significantly better than the ImageNet network in the
surface normal estimation task.
Comment: ICCV 2017
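The transitivity step described above can be illustrated with plain set operations. This is a hedged sketch of the idea only: node names are made up, and the real system operates on a graph with millions of mined objects.

```python
# Two hypothetical edge types from the abstract:
#   intra: different viewpoints of the same instance
#   inter: different instances with a similar viewpoint and category
intra = {("carA_v1", "carA_v2")}
inter = {("carA_v1", "carB_v1")}

def transitive_pairs(intra_edges, inter_edges):
    """Compose the two edge types: if a node appears in both an intra-instance
    edge and an inter-instance edge, the remaining endpoints form a new pair
    exhibiting richer invariance (different instance AND different viewpoint)."""
    pairs = set()
    for a, b in intra_edges:
        for c, d in inter_edges:
            if a == c:
                pairs.add((b, d))
            if b == c:
                pairs.add((a, d))
    return pairs

new_pairs = transitive_pairs(intra, inter)
# carA_v2 -- carA_v1 -- carB_v1  =>  (carA_v2, carB_v1)
```

Pairs produced this way can then serve as positives when training a triplet network such as the Triplet-Siamese model in the paper.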
Curriculum Learning of Visual Attribute Clusters for Multi-Task Classification
Visual attributes, from simple objects (e.g., backpacks, hats) to
soft-biometrics (e.g., gender, height, clothing) have proven to be a powerful
representational approach for many applications such as image description and
human identification. In this paper, we introduce a novel method to combine the
advantages of both multi-task and curriculum learning in a visual attribute
classification framework. Individual tasks are grouped after performing
hierarchical clustering based on their correlation. The clusters of tasks are
learned in a curriculum learning setup by transferring knowledge between
clusters. The learning process within each cluster is performed in a multi-task
classification setup. By leveraging the acquired knowledge, we speed up the
process and improve performance. We demonstrate the effectiveness of our method
via ablation studies and a detailed analysis of the covariates, on a variety of
publicly available datasets of humans standing with their full-body visible.
Extensive experimentation has proven that the proposed approach boosts the
performance by 4% to 10%.
Comment: Published in Pattern Recognition
A Survey on Multi-Task Learning
Multi-Task Learning (MTL) is a learning paradigm in machine learning and its
aim is to leverage useful information contained in multiple related tasks to
help improve the generalization performance of all the tasks. In this paper, we
give a survey for MTL. First, we classify different MTL algorithms into several
categories, including feature learning approach, low-rank approach, task
clustering approach, task relation learning approach, and decomposition
approach, and then discuss the characteristics of each approach. In order to
improve the performance of learning tasks further, MTL can be combined with
other learning paradigms including semi-supervised learning, active learning,
unsupervised learning, reinforcement learning, multi-view learning and
graphical models. When the number of tasks is large or the data dimensionality
is high, batch MTL models struggle to handle this situation, so online,
parallel, and distributed MTL models, as well as dimensionality reduction and
feature hashing, are reviewed to reveal their computational and storage
advantages. Many real-world applications use MTL to boost their performance and
we review representative works. Finally, we present theoretical analyses and
discuss several future directions for MTL.
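One of the surveyed families, the low-rank approach, couples tasks by encouraging the stacked per-task weight matrix to have low rank. A minimal sketch of the standard proximal step for a trace-norm penalty (singular-value soft-thresholding) is below; the matrix and shrinkage parameter are hypothetical.

```python
import numpy as np

def trace_norm_prox(W, tau):
    """Proximal operator of the trace norm: soft-threshold the singular
    values of the stacked task-weight matrix W (tasks x features)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

# Hypothetical weights for 3 tasks over 2 features.
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
W_low = trace_norm_prox(W, tau=0.5)
# The shrunk matrix has a strictly smaller trace norm, pushing the
# tasks toward a shared low-rank structure.
```

In a full algorithm this step alternates with gradient steps on the per-task losses, which is what ties the tasks together.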
Patterns for Learning with Side Information
Supervised, semi-supervised, and unsupervised learning estimate a function
given input/output samples. Generalization of the learned function to unseen
data can be improved by incorporating side information into learning. Side
information is data that comes from neither the input space nor the output
space of the function but contains useful information for learning it. In this
paper we show that learning with side information subsumes a variety of related
approaches, e.g. multi-task learning, multi-view learning and learning using
privileged information. Our main contributions are (i) a new perspective that
connects these previously isolated approaches, (ii) insights about how these
methods incorporate different types of prior knowledge, and hence implement
different patterns, (iii) facilitating the application of these methods in
novel tasks, as well as (iv) a systematic experimental evaluation of these
patterns in two supervised learning tasks.
Comment: The first two authors contributed equally to this work
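One common pattern the paper subsumes, side information as an auxiliary training target (the multi-task pattern), can be sketched with a toy combined loss. All values and the weighting are hypothetical; the point is only that the side information s enters the loss without being part of the input or output at test time.

```python
def combined_loss(y_pred, y, s_pred, s, weight=0.5):
    """Main-task squared error plus a weighted squared error on the side
    information s, which is available only during training."""
    main = (y_pred - y) ** 2
    aux = (s_pred - s) ** 2
    return main + weight * aux

# Perfect main prediction, wrong auxiliary prediction: only the side-info
# term contributes, nudging the shared representation during training.
loss = combined_loss(y_pred=1.0, y=1.0, s_pred=2.0, s=0.0)
```

Other patterns from the paper (multi-view learning, learning using privileged information) would plug side information into the objective in different places, but with the same "neither input nor output" role.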
Combining Similarity Features and Deep Representation Learning for Stance Detection in the Context of Checking Fake News
Fake news is nowadays an issue of pressing concern, given its recent rise
as a potential threat to high-quality journalism and well-informed public
discourse. The Fake News Challenge (FNC-1) was organized in 2017 to encourage
the development of machine learning-based classification systems for stance
detection (i.e., for identifying whether a particular news article agrees,
disagrees, discusses, or is unrelated to a particular news headline), thus
helping in the detection and analysis of possible instances of fake news. This
article presents a new approach to tackle this stance detection problem, based
on the combination of string similarity features with a deep neural
architecture that leverages ideas previously advanced in the context of
learning efficient text representations, document classification, and natural
language inference. Specifically, we use bi-directional Recurrent Neural
Networks, together with max-pooling over the temporal/sequential dimension and
neural attention, for representing (i) the headline, (ii) the first two
sentences of the news article, and (iii) the entire news article. These
representations are then combined/compared, complemented with similarity
features inspired by other FNC-1 approaches, and passed to a final layer that
predicts the stance of the article towards the headline. We also explore the
use of external sources of information, specifically large datasets of sentence
pairs originally proposed for training and evaluating natural language
inference methods, in order to pre-train specific components of the neural
network architecture (e.g., the RNNs used for encoding sentences). The obtained
results attest to the effectiveness of the proposed ideas and show that our
model, particularly when considering pre-training and the combination of neural
representations together with similarity features, slightly outperforms the
previous state-of-the-art.
Comment: Accepted for publication in the special issue of the ACM Journal of
Data and Information Quality (ACM JDIQ) on Combating Digital Misinformation
and Disinformation
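The string-similarity features that complement the neural representations can be as simple as token overlap between headline and body. The sketch below shows one such FNC-1-style feature (Jaccard similarity); the actual feature set in the paper is richer, and the example texts are made up.

```python
def jaccard(a, b):
    """Jaccard similarity between the token sets of two strings:
    |intersection| / |union| of lowercase tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

headline = "Pope endorses candidate"
body = "The pope never endorses any candidate in elections"
feat = jaccard(headline, body)  # high overlap suggests a related pair
```

Features like this are concatenated with the RNN-derived representations of the headline and article before the final stance-prediction layer.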
An Overview of Multi-Task Learning in Deep Neural Networks
Multi-task learning (MTL) has led to successes in many applications of
machine learning, from natural language processing and speech recognition to
computer vision and drug discovery. This article aims to give a general
overview of MTL, particularly in deep neural networks. It introduces the two
most common methods for MTL in Deep Learning, gives an overview of the
literature, and discusses recent advances. In particular, it seeks to help ML
practitioners apply MTL by shedding light on how MTL works and providing
guidelines for choosing appropriate auxiliary tasks.
Comment: 14 pages, 8 figures
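The more common of the two deep-MTL schemes the overview covers, hard parameter sharing, amounts to a shared trunk feeding task-specific heads. A minimal forward-pass sketch in NumPy follows; layer sizes, weights, and task names are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared trunk: 4 input features -> 8 hidden units, used by every task.
W_shared = rng.normal(size=(4, 8))

# Task-specific heads on top of the shared representation.
heads = {"task_a": rng.normal(size=(8, 3)),   # e.g. a 3-class task
         "task_b": rng.normal(size=(8, 1))}   # e.g. a scalar regression task

def forward(x):
    """Hard parameter sharing: one shared ReLU layer, per-task output heads."""
    h = np.maximum(x @ W_shared, 0.0)
    return {name: h @ W for name, W in heads.items()}

out = forward(rng.normal(size=(2, 4)))  # a batch of 2 examples
```

Soft parameter sharing, the other common scheme, would instead give each task its own trunk and regularize the trunks toward each other.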
Visual Relationship Detection using Scene Graphs: A Survey
Understanding a scene by decoding the visual relationships depicted in an
image has been a long-studied problem. While recent advances in deep
learning and the use of deep neural networks have achieved near-human
accuracy on many tasks, a considerable gap still exists between human- and
machine-level performance when it comes to various visual relationship
detection tasks. Developing on earlier tasks like object recognition,
segmentation and captioning which focused on a relatively coarser image
understanding, newer tasks have been introduced recently to deal with a finer
level of image understanding. A Scene Graph is one such technique to better
represent a scene and the various relationships present in it. With its wide
range of applications in tasks like Visual Question Answering,
Semantic Image Retrieval, Image Generation, among many others, it has proved to
be a useful tool for deeper and better visual relationship understanding. In
this paper, we present a detailed survey on the various techniques for scene
graph generation, their efficacy in representing visual relationships, and how
they have been used to solve various downstream tasks. We also attempt to
analyze the directions in which the field might advance. Being
one of the first papers to give a detailed survey on this topic, we also hope
to give a succinct introduction to scene graphs, and guide practitioners while
developing approaches for their applications.
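At its core, a scene graph is a set of (subject, predicate, object) relationship triplets over detected objects. A minimal sketch of the data structure and a lookup over it is below; the example scene is invented for illustration.

```python
# A tiny scene graph as relationship triplets: (subject, predicate, object).
triplets = [("man", "riding", "horse"),
            ("man", "wearing", "hat"),
            ("horse", "on", "grass")]

def relations_of(graph, subject):
    """All (predicate, object) pairs in which `subject` is the subject node."""
    return [(p, o) for s, p, o in graph if s == subject]

relations_of(triplets, "man")
```

Downstream tasks such as Visual Question Answering or Semantic Image Retrieval can then query this structure instead of raw pixels, which is what makes the representation useful.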
Decoding Brain Representations by Multimodal Learning of Neural Activity and Visual Features
This work presents a novel method of exploring human brain-visual
representations, with a view towards replicating these processes in machines.
The core idea is to learn plausible computational and biological
representations by correlating human neural activity and natural images. Thus,
we first propose a model, EEG-ChannelNet, to learn a brain manifold for EEG
classification. After verifying that visual information can be extracted from
EEG data, we introduce a multimodal approach that uses deep image and EEG
encoders, trained in a Siamese configuration, for learning a joint manifold
that maximizes a compatibility measure between visual features and brain
representations. We then carry out image classification and saliency detection
on the learned manifold. Performance analyses show that our approach
satisfactorily decodes visual information from neural signals. This, in turn,
can be used to effectively supervise the training of deep learning models, as
demonstrated by the high performance of image classification and saliency
detection on out-of-training classes. The obtained results show that the
learned brain-visual features lead to improved performance and simultaneously
bring deep models more in line with cognitive neuroscience work related to
visual perception and attention.
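The compatibility measure maximized on the joint manifold can be sketched once both encoders have mapped their inputs into a shared space. The paper's encoders are deep networks; here the embeddings are hypothetical vectors, and cosine similarity stands in as one simple choice of compatibility.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity: a simple compatibility score between two
    embeddings in the shared (joint) space."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

img_emb = np.array([1.0, 0.0, 1.0])   # hypothetical image-encoder output
eeg_emb = np.array([1.0, 0.1, 0.9])   # hypothetical EEG-encoder output
score = cosine(img_emb, eeg_emb)      # close to 1 for a matching pair
```

Siamese training would push matching image/EEG pairs toward high compatibility and mismatched pairs toward low compatibility.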
Recent Advances in Open Set Recognition: A Survey
In real-world recognition/classification tasks, limited by various objective
factors, it is usually difficult to collect training samples to exhaust all
classes when training a recognizer or classifier. A more realistic scenario is
open set recognition (OSR), where incomplete knowledge of the world exists at
training time, and unknown classes can be submitted to an algorithm during
testing, requiring the classifiers to not only accurately classify the seen
classes, but also effectively deal with the unseen ones. This paper provides a
comprehensive survey of existing open set recognition techniques covering
various aspects ranging from related definitions, representations of models,
datasets, evaluation criteria, and algorithm comparisons. Furthermore, we
briefly analyze the relationships between OSR and its related tasks including
zero-shot, one-shot (few-shot) recognition/learning techniques, classification
with reject option, and so forth. Additionally, we also overview the open world
recognition which can be seen as a natural extension of OSR. Importantly, we
highlight the limitations of existing approaches and point out some promising
subsequent research directions in this field.
Comment: Accepted by IEEE TPAMI
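A baseline open-set decision rule, simpler than most techniques the survey covers, accepts the classifier's top class only when its confidence clears a threshold and otherwise rejects the input as unknown. The class names, scores, and threshold below are hypothetical.

```python
def open_set_predict(probs, classes, threshold=0.5):
    """Thresholded prediction: return the top seen class if its probability
    is confident enough, otherwise reject the input as 'unknown'."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return classes[best] if probs[best] >= threshold else "unknown"

classes = ["cat", "dog", "car"]
open_set_predict([0.7, 0.2, 0.1], classes)     # confident seen class
open_set_predict([0.34, 0.33, 0.33], classes)  # low confidence: rejected
```

Classification with a reject option, discussed in the survey as a related task, generalizes exactly this idea; full OSR methods additionally model the open space beyond the seen classes.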