GazeDPM: Early Integration of Gaze Information in Deformable Part Models
An increasing number of works explore collaborative human-computer systems in
which human gaze is used to enhance computer vision systems. For object
detection, these efforts have so far been restricted to late integration
approaches, which have inherent limitations, such as increased precision
without an increase in recall. We propose an early integration approach in a
deformable part model,
which constitutes a joint formulation over gaze and visual data. We show that
our GazeDPM method improves over the state-of-the-art DPM baseline by 4% and a
recent method for gaze-supported object detection by 3% on the public POET
dataset. Our approach additionally provides introspection of the learnt models,
can reveal salient image structures, and allows us to investigate the interplay
between gaze attracting and repelling areas, the importance of view-specific
models, as well as viewers' personal biases in gaze patterns. We finally study
important practical aspects of our approach, such as the impact of using
saliency maps instead of real fixations, the impact of the number of fixations,
as well as robustness to gaze estimation errors.
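The late-integration baseline that GazeDPM improves upon can be illustrated with a minimal sketch: detector scores are reweighted after the fact by the fixation density inside each box. The function names and the Gaussian-density construction below are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def fixation_density(fixations, shape, sigma=20.0):
    """Hypothetical helper: build a smoothed fixation-density map by
    placing an isotropic Gaussian at each (x, y) fixation point."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    density = np.zeros(shape, dtype=float)
    for fx, fy in fixations:
        density += np.exp(-((xs - fx) ** 2 + (ys - fy) ** 2) / (2 * sigma ** 2))
    if density.max() > 0:
        density /= density.max()
    return density

def rescore(boxes, scores, density):
    """Late integration: multiply each detector score by the mean
    fixation density inside its bounding box (x1, y1, x2, y2)."""
    new_scores = []
    for (x1, y1, x2, y2), s in zip(boxes, scores):
        patch = density[y1:y2, x1:x2]
        new_scores.append(s * (patch.mean() if patch.size else 0.0))
    return np.array(new_scores)
```

Note how this scheme can only suppress boxes the gaze does not support (raising precision); it cannot recover detections the visual model missed (no gain in recall), which is the limitation the early-integration GazeDPM formulation addresses.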
Deep Multimodal Image-Repurposing Detection
Nefarious actors on social media and other platforms often spread rumors and
falsehoods through images whose metadata (e.g., captions) have been modified to
provide visual substantiation of the rumor/falsehood. This type of modification
is referred to as image repurposing, in which often an unmanipulated image is
published along with incorrect or manipulated metadata to serve the actor's
ulterior motives. We present the Multimodal Entity Image Repurposing (MEIR)
dataset, which is substantially more challenging than datasets previously
available for research into image repurposing detection. The
new dataset includes location, person, and organization manipulations on
real-world data sourced from Flickr. We also present a novel, end-to-end, deep
multimodal learning model for assessing the integrity of an image by combining
information extracted from the image with related information from a knowledge
base. The proposed method is compared against state-of-the-art techniques on
existing datasets as well as MEIR, where it outperforms existing methods across
the board, with AUC improvement of up to 0.23.
Comment: To be published at ACM Multimedia 2018 (oral)
Improving Small Object Proposals for Company Logo Detection
Many modern approaches for object detection are two-staged pipelines. The
first stage identifies regions of interest which are then classified in the
second stage. Faster R-CNN is such an approach for object detection which
combines both stages into a single pipeline. In this paper we apply Faster
R-CNN to the task of company logo detection. Motivated by its weak performance
on small object instances, we examine in detail both the proposal and the
classification stage with respect to a wide range of object sizes. We
investigate the influence of feature map resolution on the performance of those
stages.
Based on theoretical considerations, we introduce an improved scheme for
generating anchor proposals and propose a modification to Faster R-CNN which
leverages higher-resolution feature maps for small objects. We evaluate our
approach on the FlickrLogos dataset, improving RPN performance from 0.52 to
0.71 (MABO) and detection performance from 0.52 to 0.67 (mAP).
Comment: 8 pages, ICMR 201
A new 2D static hand gesture colour image dataset for ASL gestures
It usually takes a fusion of image processing and machine learning algorithms in order to
build a fully-functioning computer vision system for hand gesture recognition. Fortunately,
the complexity of developing such a system could be alleviated by treating the system as a
collection of multiple sub-systems working together, in such a way that they can be dealt
with in isolation. Machine learning needs to be fed thousands of exemplars (e.g. images,
features) to automatically establish recognisable patterns for all possible classes (e.g.
hand gestures) that apply to the problem domain. A good number of exemplars helps, but
it is also important to note that the efficacy of these exemplars depends on the variability
of illumination conditions, hand postures, angles of rotation, scaling and on the number of
volunteers from whom the hand gesture images were taken. These exemplars are usually
subjected to image processing first, to reduce the presence of noise and extract the important
features from the images. These features serve as inputs to the machine learning system.
Different sub-systems are integrated together to form a complete computer vision system for
gesture recognition. The main contribution of this work is on the production of the exemplars.
We discuss how a dataset of standard American Sign Language (ASL) hand gestures, containing
2425 images from 5 individuals with variations in lighting conditions and hand postures, is
generated with the aid of image processing techniques. A minor contribution is given in
the form of a specific feature extraction method called moment invariants, for which the
computation method and the values are furnished with the dataset.
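The moment-invariant features the abstract mentions are, in their classical form, Hu's seven invariants, which are unchanged under translation, scaling and rotation of the image. The following is a minimal NumPy sketch of that standard computation, not the authors' exact implementation:

```python
import numpy as np

def hu_moments(img):
    """Compute Hu's seven moment invariants of a 2D grayscale image:
    raw moments -> central moments (translation-invariant) ->
    scale-normalised moments -> seven rotation-invariant combinations."""
    img = np.asarray(img, dtype=float)
    ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    m00 = img.sum()
    xc, yc = (xs * img).sum() / m00, (ys * img).sum() / m00

    def mu(p, q):                      # central moment mu_pq
        return ((xs - xc) ** p * (ys - yc) ** q * img).sum()

    def eta(p, q):                     # scale-normalised central moment
        return mu(p, q) / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    return np.array([
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        (n30 + n12) ** 2 + (n21 + n03) ** 2,
        (n30 - 3 * n12) * (n30 + n12) * ((n30 + n12) ** 2 - 3 * (n21 + n03) ** 2)
        + (3 * n21 - n03) * (n21 + n03) * (3 * (n30 + n12) ** 2 - (n21 + n03) ** 2),
        (n20 - n02) * ((n30 + n12) ** 2 - (n21 + n03) ** 2)
        + 4 * n11 * (n30 + n12) * (n21 + n03),
        (3 * n21 - n03) * (n30 + n12) * ((n30 + n12) ** 2 - 3 * (n21 + n03) ** 2)
        - (n30 - 3 * n12) * (n21 + n03) * (3 * (n30 + n12) ** 2 - (n21 + n03) ** 2),
    ])
```

The invariance properties are exactly what make these features attractive for the dataset's stated goal of robustness to hand posture, rotation and scale variation.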
Open Source Software for Automatic Detection of Cone Photoreceptors in Adaptive Optics Ophthalmoscopy Using Convolutional Neural Networks
Imaging with an adaptive optics scanning light ophthalmoscope (AOSLO) enables direct visualization of the cone photoreceptor mosaic in the living human retina. Quantitative analysis of AOSLO images typically requires manual grading, which is time-consuming and subjective; thus, automated algorithms are highly desirable. Previously developed automated methods are often reliant on ad hoc rules that may not be transferable between different imaging modalities or retinal locations. In this work, we present a convolutional neural network (CNN) based method for cone detection that learns features of interest directly from training data. This cone-identifying algorithm was trained and validated on separate data sets of confocal and split detector AOSLO images, with results showing performance that closely mimics the gold standard manual process. Further, without any need for algorithmic modifications for a specific AOSLO imaging system, our fully automated multi-modality CNN-based cone detection method achieved results comparable to previous automatic cone segmentation methods that utilized ad hoc rules for different applications. We have made free open-source software for the proposed method and the corresponding training and testing datasets available online
Security Evaluation of Support Vector Machines in Adversarial Environments
Support Vector Machines (SVMs) are among the most popular classification
techniques adopted in security applications like malware detection, intrusion
detection, and spam filtering. However, if SVMs are to be incorporated in
real-world security systems, they must be able to cope with attack patterns
that can either mislead the learning algorithm (poisoning), evade detection
(evasion), or gain information about their internal parameters (privacy
breaches). The main contributions of this chapter are twofold. First, we
introduce a formal general framework for the empirical evaluation of the
security of machine-learning systems. Second, according to our framework, we
demonstrate the feasibility of evasion, poisoning and privacy attacks against
SVMs in real-world security problems. For each attack technique, we evaluate
its impact and discuss whether (and how) it can be countered through an
adversary-aware design of SVMs. Our experiments are easily reproducible thanks
to open-source code that we have made available, together with all the employed
datasets, in a public repository.
Comment: 47 pages, 9 figures; chapter accepted into the book 'Support Vector Machine Applications'
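The geometry behind the evasion attacks evaluated in the chapter is easiest to see in the linear case, where the minimal-perturbation evasion point has a closed form: project the sample onto the decision hyperplane and step just past it. This is only an illustrative sketch of that geometry, not the chapter's attack algorithm (which also covers gradient-based evasion of nonlinear SVMs):

```python
import numpy as np

def min_norm_evasion(w, b, x, overshoot=1.001):
    """Minimal L2 evasion against a linear decision function
    f(x) = w.x + b: move x along -w by f(x)/||w||^2, scaled slightly
    past the boundary so the attacked sample flips class."""
    f = np.dot(w, x) + b
    return x - (f / np.dot(w, w)) * w * overshoot
```

The required perturbation norm, |f(x)| / ||w||, is exactly the sample's distance to the boundary, which is why the chapter's adversary-aware designs aim to keep malicious samples far from the decision surface.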