Open Logo Detection Challenge
Existing logo detection benchmarks consider artificial deployment scenarios
by assuming that large training data with fine-grained bounding box annotations
for each class are available for model training. Such assumptions are often
invalid in realistic logo detection scenarios where new logo classes arrive progressively and need to be detected with little or no budget for exhaustively labelling fine-grained training data for every new class. Existing
benchmarks are thus unable to evaluate the true performance of a logo detection
method in realistic and open deployments. In this work, we introduce a more
realistic and challenging logo detection setting, called Open Logo Detection.
Specifically, this new setting assumes fine-grained labelling only on a small
proportion of logo classes whilst the remaining classes have no labelled
training data to simulate the open deployment. We further create an open logo
detection benchmark, called OpenLogo, to promote the investigation of this new
challenge. OpenLogo contains 27,083 images from 352 logo classes, built by
aggregating/refining 7 existing datasets and establishing an open logo
detection evaluation protocol. To address this challenge, we propose a Context
Adversarial Learning (CAL) approach to synthesising training data with coherent
logo instance appearance against diverse background context for enabling more
effective optimisation of contemporary deep learning detection models.
Experiments show the performance advantage of CAL over existing
state-of-the-art alternative methods on the more realistic and challenging
OpenLogo benchmark.
Comment: Accepted by BMVC 2018. The QMUL-OpenLogo benchmark is publicly available at: qmul-openlogo.github.io
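The abstract leaves CAL's architecture unspecified. Purely as a hedged illustration of the underlying adversarial-synthesis idea, the toy sketch below pastes logo templates onto background crops and lets a context discriminator push a blending parameter toward composites that look coherent in context. The network, the compositing scheme, and every name here are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextDiscriminator(nn.Module):
    """Judges whether a logo composite looks coherent in its context."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1))
    def forward(self, x):
        return self.net(x)

def composite(background, logo, alpha):
    """Alpha-blend a logo patch into a fixed corner of a background."""
    out = background.clone()
    h, w = logo.shape[-2:]
    out[..., :h, :w] = alpha * logo + (1 - alpha) * out[..., :h, :w]
    return out

disc = ContextDiscriminator()
alpha = torch.tensor(0.5, requires_grad=True)   # learnable blending knob
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
opt_g = torch.optim.Adam([alpha], lr=1e-3)

real = torch.rand(8, 3, 64, 64)   # stand-in for real logo-in-context images
bg = torch.rand(8, 3, 64, 64)     # stand-in for background context crops
logo = torch.rand(8, 3, 24, 24)   # stand-in for logo templates
ones, zeros = torch.ones(8, 1), torch.zeros(8, 1)

# Discriminator step: tell real images from synthesised composites.
fake = composite(bg, logo, alpha.clamp(0, 1))
loss_d = F.binary_cross_entropy_with_logits(disc(real), ones) + \
         F.binary_cross_entropy_with_logits(disc(fake.detach()), zeros)
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Synthesis step: adjust blending so composites fool the discriminator.
loss_g = F.binary_cross_entropy_with_logits(
    disc(composite(bg, logo, alpha.clamp(0, 1))), ones)
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```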
A Robust Local Binary Similarity Pattern for Foreground Object Detection
Accurate and fast extraction of foreground objects is a critical problem in video surveillance, underpinning object tracking and recognition. Although many foreground object detection methods have been proposed in recent years, the task remains difficult due to illumination variations and dynamic backgrounds. In this paper, we propose a robust foreground object detection method with two main contributions. First, we propose a robust texture
operator named Robust Local Binary Similarity Pattern (RLBSP), which shows
strong robustness to illumination variations and dynamic backgrounds. Second, a combination of color and texture features is used to characterize pixel representations; the two feature types complement each other, making full use of their respective advantages. Comprehensive experiments on the CDnet 2012 dataset
demonstrate that the proposed method performs favorably against
state-of-the-art methods.
Comment: 2 pages
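The abstract does not define RLBSP itself, but it builds on the Local Binary Similarity Pattern family, where each neighbour is encoded by its similarity to the centre pixel rather than by a raw greater/less comparison. A minimal sketch of such an LBSP-style descriptor, with an assumed relative threshold, might look like this:

```python
import numpy as np

def lbsp_descriptor(patch, rel_threshold=0.3):
    """Minimal LBSP-style binary string for a 3x3 grayscale patch:
    each neighbour is marked 1 if its intensity is similar to the
    centre pixel, within a threshold relative to the centre value."""
    center = patch[1, 1]
    neighbours = np.delete(patch.flatten(), 4)  # the 8 surrounding pixels
    similar = np.abs(neighbours - center) <= rel_threshold * center
    return similar.astype(np.uint8)

patch = np.array([[120, 118, 200],
                  [119, 121, 121],
                  [ 60, 122, 125]], dtype=np.float64)
print(lbsp_descriptor(patch))  # [1 1 0 1 1 0 1 1]
```

The relative (rather than absolute) threshold is what gives this family its robustness to global illumination changes; the exact robustness modifications that distinguish RLBSP are not given in the abstract.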
Incorporating External Knowledge to Answer Open-Domain Visual Questions with Dynamic Memory Networks
Visual Question Answering (VQA) has attracted much attention since it offers
insight into the relationships between the multi-modal analysis of images and
natural language. Most of the current algorithms are incapable of answering
open-domain questions that require reasoning beyond the image contents. To address this issue, we propose a novel framework that endows the model with the capability to answer more complex questions by leveraging massive
external knowledge with dynamic memory networks. Specifically, the questions
along with the corresponding images trigger a process to retrieve the relevant
information in external knowledge bases, which are embedded into a continuous
vector space by preserving the entity-relation structures. Afterwards, we
employ dynamic memory networks to attend to the large body of facts in the
knowledge graph and images, and then perform reasoning over these facts to
generate corresponding answers. Extensive experiments demonstrate that our
model not only achieves the state-of-the-art performance in the visual question
answering task, but can also answer open-domain questions effectively by
leveraging the external knowledge.
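The exact architecture is not given in the abstract; as a hedged illustration, the sketch below implements one simplified "hop" of a dynamic-memory-style update: retrieved facts are scored against the question and the current memory, and the attention-weighted fact summary refreshes the memory. All dimensions, names, and the update rule are assumptions.

```python
import torch
import torch.nn.functional as F

def memory_update(facts, question, memory, W):
    """One simplified hop of a dynamic-memory-style update.

    facts:    (num_facts, d) embedded KB facts / image regions
    question: (d,) question embedding
    memory:   (d,) current episodic memory
    W:        (d, 3 * d) projection for the memory refresh
    """
    scores = facts @ question + facts @ memory   # relevance per fact
    attn = F.softmax(scores, dim=0)
    episode = attn @ facts                       # attention-weighted summary
    return torch.tanh(W @ torch.cat([memory, episode, question]))

d, n = 16, 5
facts = torch.randn(n, d)
q, m = torch.randn(d), torch.zeros(d)
W = torch.randn(d, 3 * d)
for _ in range(2):                               # two memory hops
    m = memory_update(facts, q, m, W)
print(m.shape)  # torch.Size([16])
```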
Boosting Generative Models by Leveraging Cascaded Meta-Models
Deep generative models are effective methods of modeling data. However, it is
not easy for a single generative model to faithfully capture the distributions
of complex data such as images. In this paper, we propose an approach for
boosting generative models, which cascades meta-models together to produce a
stronger model. Any hidden variable meta-model (e.g., RBM and VAE) which
supports likelihood evaluation can be leveraged. We derive a decomposable
variational lower bound of the boosted model, which allows each meta-model to
be trained separately and greedily. Moreover, our framework can be extended to
semi-supervised boosting, where the boosted model learns a joint distribution
of data and labels. Finally, we combine our boosting framework with the
multiplicative boosting framework, which further improves the learning power of
generative models.
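The abstract does not spell out the bound, but a decomposable lower bound of this kind can be sketched for a two-model cascade, where the second meta-model $p_2$ acts as a learned prior over the first model's latent variable $h$, so that the boosted model is $p(x) = \int p_1(x \mid h)\, p_2(h)\, dh$:

\[
\log p(x) \;\ge\; \mathbb{E}_{q_1(h \mid x)}\big[\log p_1(x \mid h)\big]
\;+\; \mathbb{E}_{q_1(h \mid x)}\big[\log p_2(h)\big]
\;+\; \mathcal{H}\big[q_1(h \mid x)\big].
\]

Since $p_2$ enters only through the middle term, it can be trained separately on samples $h \sim q_1(h \mid x)$ with $p_1$ frozen, matching the greedy, stage-wise training the abstract describes; the paper's exact bound may differ.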
Scalable Deep Learning Logo Detection
Existing logo detection methods usually consider a small number of logo
classes and limited images per class under the strong assumption that tedious object bounding box annotations are available, and are therefore not scalable to real-world
dynamic applications. In this work, we tackle these challenges by exploring the
webly data learning principle without the need for exhaustive manual labelling.
Specifically, we propose a novel incremental learning approach, called Scalable
Logo Self-co-Learning (SL^2), capable of automatically self-discovering
informative training images from noisy web data for progressively improving
model capability in a cross-model co-learning manner. Moreover, we introduce a
very large (2,190,757 images of 194 logo classes) logo dataset "WebLogo-2M" by
an automatic web data collection and processing method. Extensive comparative
evaluations demonstrate the superiority of the proposed SL^2 method over the
state-of-the-art strongly and weakly supervised detection models and
contemporary webly data learning approaches.
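The abstract gives only the high-level recipe; as a hedged sketch of cross-model co-learning, the toy code below has two stand-in models alternately mine confident pseudo-labelled web images for each other, so that each model trains on examples self-discovered by its peer and their errors are less likely to reinforce one another. The confidence threshold, the interfaces, and ToyModel are illustrative assumptions, not the SL^2 implementation.

```python
import random

class ToyModel:
    """Stand-in classifier with a trivial fit/predict interface."""
    def __init__(self, name):
        self.name, self.seen = name, []
    def predict(self, image):
        return "logo", random.random()    # (pseudo-label, confidence)
    def fit(self, pairs):
        self.seen.extend(pairs)           # pretend to train on new pairs

def mine_confident(model, pool, threshold=0.9):
    """Keep (image, pseudo_label) pairs the model is confident about."""
    out = []
    for image in pool:
        label, conf = model.predict(image)
        if conf >= threshold:
            out.append((image, label))
    return out

def co_learning_round(model_a, model_b, web_pool):
    # Each model trains on examples discovered by its peer.
    model_b.fit(mine_confident(model_a, web_pool))
    model_a.fit(mine_confident(model_b, web_pool))

a, b = ToyModel("a"), ToyModel("b")
co_learning_round(a, b, web_pool=list(range(100)))
print(len(a.seen), len(b.seen))
```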
Towards Interpretable Deep Neural Networks by Leveraging Adversarial Examples
Deep neural networks (DNNs) have demonstrated impressive performance on a
wide array of tasks, but they are usually considered opaque since their internal structure and learned parameters are not interpretable. In this paper, we
re-examine the internal representations of DNNs using adversarial images, which
are generated by an ensemble-optimization algorithm. We find that: (1) the
neurons in DNNs do not truly detect semantic objects/parts, but respond to
objects/parts only as recurrent discriminative patches; (2) deep visual
representations are not robust distributed codes of visual concepts because the
representations of adversarial images are largely not consistent with those of
real images, although they have similar visual appearance, both of which are
different from previous findings. To further improve the interpretability of
DNNs, we propose an adversarial training scheme with a consistent loss such
that the neurons are endowed with human-interpretable concepts. The induced
interpretable representations enable us to trace eventual outcomes back to
influential neurons. Therefore, human users can know how the models make
predictions, as well as when and why they make errors.
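The abstract names an adversarial training scheme with a consistent loss but gives no formula; one plausible reading, sketched below, adds a term that ties the internal representations of clean and adversarial inputs together. The tiny network, the FGSM-style perturbation, and the weight lam are assumptions, not the paper's recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
        self.head = nn.Linear(64, 10)
    def forward(self, x):
        h = self.features(x)          # internal representation
        return self.head(h), h

net = TinyNet()
opt = torch.optim.SGD(net.parameters(), lr=0.1)
x, y = torch.randn(16, 32), torch.randint(0, 10, (16,))

# FGSM-style adversarial example (assumed perturbation scheme).
x_req = x.clone().requires_grad_(True)
logits, _ = net(x_req)
F.cross_entropy(logits, y).backward()
x_adv = (x + 0.1 * x_req.grad.sign()).detach()

logits_clean, h_clean = net(x)
logits_adv, h_adv = net(x_adv)
lam = 1.0
loss = F.cross_entropy(logits_clean, y) \
     + F.cross_entropy(logits_adv, y) \
     + lam * F.mse_loss(h_adv, h_clean)  # consistency between representations
opt.zero_grad(); loss.backward(); opt.step()
```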
Towards Interpretable Deep Neural Networks by Leveraging Adversarial Examples
Sometimes it is not enough for a DNN to produce an outcome. For example, in
applications such as healthcare, users need to understand the rationale of the
decisions. Therefore, it is imperative to develop algorithms to learn models
with good interpretability (Doshi-Velez 2017). An important factor that leads
to the lack of interpretability of DNNs is the ambiguity of neurons, where a
neuron may fire for various unrelated concepts. This work aims to increase the
interpretability of DNNs on the whole image space by reducing the ambiguity of
neurons. In this paper, we make the following contributions:
1) We propose a metric to evaluate the consistency level of neurons in a
network quantitatively.
2) We find that the learned features of neurons are ambiguous by leveraging
adversarial examples.
3) We propose to improve the consistency of neurons on the adversarial example subset via an adversarial training algorithm with a consistent loss.
Comment: In AAAI-19 Workshop on Network Interpretability for Deep Learning
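Contribution 1 proposes a quantitative consistency metric without defining it in the abstract; a hedged stand-in is sketched below: score a neuron by how concentrated the concept labels of its top-activating inputs are. The top-k selection and the entropy normalisation are assumptions, not the paper's metric.

```python
import numpy as np

def neuron_consistency(activations, concepts, top_k=50):
    """Toy consistency score for one neuron: take the concept labels of
    its top-k most activating inputs and measure how concentrated they
    are (1 = all share one concept, ~0 = uniform over top_k concepts).

    activations: (n,) neuron activation per input
    concepts:    (n,) integer concept label per input
    """
    top = np.argsort(activations)[-top_k:]
    counts = np.bincount(concepts[top])
    p = counts[counts > 0] / top_k
    entropy = -(p * np.log(p)).sum()
    return 1.0 - entropy / np.log(top_k)

acts = np.random.rand(1000)
labels = np.random.randint(0, 20, size=1000)
print(neuron_consistency(acts, labels))
```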
Evading Defenses to Transferable Adversarial Examples by Translation-Invariant Attacks
Deep neural networks are vulnerable to adversarial examples, which can
mislead classifiers by adding imperceptible perturbations. An intriguing
property of adversarial examples is their good transferability, making
black-box attacks feasible in real-world applications. Due to the threat of
adversarial attacks, many methods have been proposed to improve model robustness.
Several state-of-the-art defenses are shown to be robust against transferable
adversarial examples. In this paper, we propose a translation-invariant attack
method to generate more transferable adversarial examples against the defense
models. By optimizing a perturbation over an ensemble of translated images, the
generated adversarial example is less sensitive to the white-box model being
attacked and has better transferability. To improve the efficiency of attacks,
we further show that our method can be implemented by convolving the gradient
at the untranslated image with a pre-defined kernel. Our method is generally
applicable to any gradient-based attack method. Extensive experiments on the
ImageNet dataset validate the effectiveness of the proposed method. Our best
attack fools eight state-of-the-art defenses at an 82% success rate on average
based only on the transferability, demonstrating the insecurity of the current
defense techniques.
Comment: CVPR 2019 (Oral)
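The gradient-convolution trick the abstract describes can be sketched directly: smooth the input gradient with a pre-defined kernel before taking the sign step, which approximates optimising the perturbation over an ensemble of translated images. Below is a minimal single-step (FGSM-style) version; the Gaussian kernel parameters, step size, and stand-in classifier are chosen purely for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gaussian_kernel(size=15, sigma=3.0):
    """Pre-defined smoothing kernel; size and sigma are illustrative."""
    ax = torch.arange(size, dtype=torch.float32) - size // 2
    g = torch.exp(-ax ** 2 / (2 * sigma ** 2))
    k = torch.outer(g, g)
    return k / k.sum()

def ti_fgsm_step(model, x, y, kernel, eps=16 / 255):
    """One translation-invariant FGSM step: convolve the gradient at the
    untranslated image with the kernel, then take the sign."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    c = x.shape[1]
    weight = kernel.repeat(c, 1, 1, 1)    # same kernel for every channel
    grad = F.conv2d(grad, weight, padding=kernel.shape[-1] // 2, groups=c)
    return (x + eps * grad.sign()).detach()

# Toy usage with a stand-in classifier.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x, y = torch.rand(2, 3, 32, 32), torch.tensor([1, 7])
x_adv = ti_fgsm_step(model, x, y, gaussian_kernel())
```

In an iterative attack the accumulated perturbation would additionally be projected back into the eps-ball after each step; the single step above only shows the kernel trick itself.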
Improving Interpretability of Deep Neural Networks with Semantic Information
Interpretability of deep neural networks (DNNs) is essential since it enables
users to understand the overall strengths and weaknesses of the models, conveys an understanding of how the models will behave in the future, and indicates how to diagnose and correct potential problems. However, it is challenging to reason
about what a DNN actually does due to its opaque or black-box nature. To
address this issue, we propose a novel technique to improve the
interpretability of DNNs by leveraging the rich semantic information embedded
in human descriptions. By concentrating on the video captioning task, we first
extract a set of semantically meaningful topics from the human descriptions
that cover a wide range of visual concepts, and integrate them into the model
with an interpretive loss. We then propose a prediction difference maximization
algorithm to interpret the learned features of each neuron. Experimental
results demonstrate its effectiveness in video captioning using the
interpretable features, which can also be transferred to video action
recognition. By clearly understanding the learned features, users can easily
revise false predictions via a human-in-the-loop procedure.
Comment: To appear in CVPR 2017
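The interpretive loss is not specified in the abstract; one hedged reading, sketched below, attaches an auxiliary head that must recover description-derived topics from the same hidden features used for captioning, nudging neurons toward semantically meaningful directions. The layer sizes, the multi-label topic target, and the weight lam are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

hidden_dim, num_topics, vocab = 64, 20, 100
encoder = nn.Linear(vocab, hidden_dim)          # stand-in video encoder
caption_head = nn.Linear(hidden_dim, vocab)     # stand-in captioning head
topic_head = nn.Linear(hidden_dim, num_topics)  # interpretive head

x = torch.rand(8, vocab)                        # stand-in video features
next_word = torch.randint(0, vocab, (8,))       # stand-in caption target
topics = torch.randint(0, 2, (8, num_topics)).float()  # description topics

h = torch.relu(encoder(x))
lam = 0.5
loss = F.cross_entropy(caption_head(h), next_word) \
     + lam * F.binary_cross_entropy_with_logits(topic_head(h), topics)
loss.backward()
```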
Joint Image-Text News Topic Detection and Tracking with And-Or Graph Representation
In this paper, we aim to develop a method for automatically detecting and
tracking topics in broadcast news. We present a hierarchical And-Or graph (AOG)
to jointly represent the latent structure of both texts and visuals. The AOG
embeds a context sensitive grammar that can describe the hierarchical
composition of news topics by semantic elements about people involved, related
places and what happened, and model contextual relationships between elements
in the hierarchy. We detect news topics through a cluster sampling process
which groups stories about closely related events. Swendsen-Wang Cuts (SWC), an
effective cluster sampling algorithm, is adopted for traversing the solution
space and obtaining optimal clustering solutions by maximizing a Bayesian
posterior probability. Topics are tracked to deal with the continuously updated
news streams. We generate topic trajectories to show how topics emerge, evolve
and disappear over time. The experimental results show that our method can
explicitly describe the textual and visual data in news videos and produce
meaningful topic trajectories. Our method achieves superior performance
compared to state-of-the-art methods on both a public dataset Reuters-21578 and
a self-collected dataset named UCLA Broadcast News Dataset.
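For reference, in the standard SWC formulation (Barbu and Zhu), a proposal that detaches a connected component $V_0$ from the current clustering state $A$ and reassigns it to form state $B$ is accepted with probability

\[
\alpha(A \to B) = \min\!\left(1,\;
\frac{\prod_{e \in \mathcal{C}(V_0,\, B \setminus V_0)} (1 - q_e)}
     {\prod_{e \in \mathcal{C}(V_0,\, A \setminus V_0)} (1 - q_e)}
\cdot \frac{p(B \mid I)}{p(A \mid I)}\right),
\]

where $\mathcal{C}(\cdot, \cdot)$ denotes the edges cut by the move, $q_e$ the edge turn-on probabilities, and $p(\cdot \mid I)$ the Bayesian posterior over clusterings being maximised. How this work instantiates $q_e$ and the posterior over the joint image-text AOG is not given in the abstract.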