"'Who are you?' - Learning person specific classifiers from video"
We investigate the problem of automatically labelling faces of characters in TV or movie material with their names, using only weak supervision from automatically aligned subtitle and script text. Our previous work (Everingham et al. [8]) demonstrated promising results on the task, but the coverage of the method (proportion of video labelled) and its generalization were limited by a restriction to frontal faces and nearest neighbour classification.
In this paper we build on that method, extending the coverage greatly by the detection and recognition of characters in profile views. In addition, we make the following contributions: (i) seamless tracking, integration and recognition of profile and frontal detections, and (ii) a character-specific multiple kernel classifier which learns the features that best discriminate between the characters.
We report results on seven episodes of the TV series "Buffy the Vampire Slayer", demonstrating significantly increased coverage and performance with respect to previous methods on this material.
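As a rough illustration of the character-specific multiple kernel idea, the sketch below combines several precomputed feature kernels with per-character weights and trains a precomputed-kernel SVM. The weight search, feature handling and use of scikit-learn are assumptions for illustration, not the authors' implementation.

# Minimal sketch (not the authors' code): combine per-feature Gram matrices with
# character-specific weights and train a precomputed-kernel SVM.
import numpy as np
from sklearn.svm import SVC

def combined_kernel(kernels, weights):
    """Weighted sum of precomputed Gram matrices, one per feature type."""
    return sum(w * K for w, K in zip(weights, kernels))

def fit_character_classifier(train_kernels, y_train, candidate_weights):
    """Pick kernel weights for one character by a toy grid search."""
    best, best_acc = None, -1.0
    for weights in candidate_weights:
        K = combined_kernel(train_kernels, weights)
        clf = SVC(kernel="precomputed").fit(K, y_train)
        acc = clf.score(K, y_train)  # in practice, score on a held-out split
        if acc > best_acc:
            best, best_acc = (weights, clf), acc
    return best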
Automatic Detection of Vague Words and Sentences in Privacy Policies
Website privacy policies represent the single most important source of
information for users to gauge how their personal data are collected, used and
shared by companies. However, privacy policies are often vague and people
struggle to understand the content. Their opaqueness poses a significant
challenge to both users and policy regulators. In this paper, we seek to
identify vague content in privacy policies. We construct the first corpus of
human-annotated vague words and sentences and present empirical studies on
automatic vagueness detection. In particular, we investigate context-aware and
context-agnostic models for predicting vague words, and explore
auxiliary-classifier generative adversarial networks for characterizing
sentence vagueness. Our experimental results demonstrate the effectiveness of
proposed approaches. Finally, we provide suggestions for resolving vagueness
and improving the usability of privacy policies.
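For a concrete sense of the word-level task, here is a minimal sketch contrasting a context-agnostic and a context-aware vague-word predictor. The feature templates, window size and scikit-learn pipeline are illustrative assumptions, not the models used in the paper.

# Minimal sketch: word-level vagueness prediction with and without context.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def agnostic_features(tokens, i):
    # Context-agnostic: look at the word alone.
    return {"word": tokens[i].lower()}

def context_features(tokens, i, window=2):
    # Context-aware (crudely): add the neighbouring words as features.
    feats = {"word": tokens[i].lower()}
    for off in range(-window, window + 1):
        if off != 0 and 0 <= i + off < len(tokens):
            feats[f"ctx{off}"] = tokens[i + off].lower()
    return feats

def train(samples, featurizer):
    # samples: list of (tokens, index, is_vague) triples (hypothetical format)
    X = [featurizer(toks, i) for toks, i, _ in samples]
    y = [label for _, _, label in samples]
    model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
    return model.fit(X, y)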
DALES: Automated Tool for Detection, Annotation, Labelling and Segmentation of Multiple Objects in Multi-Camera Video Streams
In this paper, we propose a new software tool called DALES to extract semantic information
from multi-view videos based on the analysis of their visual content. Our system is fully automatic
and is well suited for multi-camera environments. Once the multi-view video sequences are
loaded into DALES, our software performs the detection, counting, and segmentation of the visual
objects evolving in the provided video streams. Then, these objects of interest are processed
in order to be labelled, and the related frames are thus annotated with the corresponding semantic
content. Moreover, a textual script is automatically generated with the video annotations.
The DALES system shows excellent performance in terms of accuracy and computational speed and
is robustly designed to ensure view synchronization.
Tracking Dengue Epidemics using Twitter Content Classification and Topic Modelling
Detecting and preventing outbreaks of mosquito-borne diseases such as Dengue
and Zika in Brazil and other tropical regions has long been a priority for
governments in affected areas. Streaming social media content, such as Twitter,
is increasingly being used for health vigilance applications such as flu
detection. However, previous work has not addressed the complexity of drastic
seasonal changes on Twitter content across multiple epidemic outbreaks. In
order to address this gap, this paper contrasts two complementary approaches to
detecting Twitter content that is relevant for Dengue outbreak detection,
namely supervised classification and unsupervised clustering using topic
modelling. Each approach has benefits and shortcomings. Our classifier achieves
a prediction accuracy of about 80% based on a small training set of about
1,000 instances, but the need for manual annotation makes it hard to track
seasonal changes in the nature of the epidemics, such as the emergence of new
types of virus in certain geographical locations. In contrast, LDA-based topic
modelling scales well, generating cohesive and well-separated clusters from
larger samples. While clusters can be easily re-generated following changes in
epidemics, however, this approach makes it hard to clearly segregate relevant
tweets into well-defined clusters.
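For the unsupervised side of this comparison, the sketch below shows a typical LDA topic-modelling pass over tweet text. The preprocessing, vocabulary size and number of topics are illustrative assumptions rather than the paper's settings.

# Minimal sketch: LDA topic modelling over a collection of tweets.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def topic_model(tweets, n_topics=10):
    vec = CountVectorizer(stop_words="english", max_features=5000)
    counts = vec.fit_transform(tweets)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    doc_topics = lda.fit_transform(counts)              # per-tweet topic mixture
    terms = vec.get_feature_names_out()
    top_words = [
        [terms[j] for j in comp.argsort()[-10:][::-1]]  # top 10 words per topic
        for comp in lda.components_
    ]
    return doc_topics, top_words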
Automatic annotation for weakly supervised learning of detectors
Object detection in images and action detection in videos are among the most widely studied
computer vision problems, with applications in consumer photography, surveillance, and automatic
media tagging. Typically, these standard detectors are fully supervised, that is they require
a large body of training data where the locations of the objects/actions in images/videos have
been manually annotated. With the emergence of digital media, and the rise of high-speed internet,
raw images and video are available for little to no cost. However, the manual annotation
of object and action locations remains tedious, slow, and expensive. As a result there has been
a great interest in training detectors with weak supervision where only the presence or absence
of object/action in image/video is needed, not the location. This thesis presents approaches for
weakly supervised learning of object/action detectors with a focus on automatically annotating
object and action locations in images/videos using only binary weak labels indicating the presence
or absence of object/action in images/videos.
First, a framework for weakly supervised learning of object detectors in images is presented.
In the proposed approach, a variation of multiple instance learning (MIL) technique for automatically
annotating object locations in weakly labelled data is presented which, unlike existing
approaches, uses inter-class and intra-class cue fusion to obtain the initial annotation. The initial
annotation is then used to start an iterative process in which standard object detectors are used to
refine the location annotation. Finally, to ensure that the iterative training of detectors does not drift
from the object of interest, a scheme for detecting model drift is also presented. Furthermore,
unlike most other methods, our weakly supervised approach is evaluated on data without manual
pose (object orientation) annotation.
Second, an analysis of the initial annotation of objects, using inter-class and intra-class cues,
is carried out. From the analysis, a new method based on negative mining (NegMine) is presented
for the initial annotation of both object and action data. The NegMine based approach is a
much simpler formulation using only inter-class measure and requires no complex combinatorial
optimisation, but can still meet or outperform existing approaches, including the previously presented
inter-intra class cue fusion approach. Furthermore, NegMine can be fused with existing
approaches to boost their performance.
Finally, the thesis will take a step back and look at the use of generic object detectors as prior
knowledge in weakly supervised learning of object detectors. These generic object detectors are
typically based on sampling saliency maps that indicate if a pixel belongs to the background
or foreground. A new approach to generating saliency maps is presented that, unlike existing
approaches, looks beyond the current image of interest and into images similar to the current
image. We show that our generic object proposal method can be used by itself to annotate the
weakly labelled object data with surprisingly high accuracy.
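The iterative refine-and-retrain procedure with a drift check can be summarised by the following sketch. All function names and the IoU-based drift criterion are placeholders supplied by the caller; none of them are components of the thesis itself.

# Minimal sketch of the refine-and-retrain loop described above.
def weakly_supervised_training(images, boxes, train_detector, detect, mean_iou,
                               n_iters=5, drift_tol=0.5):
    """Iteratively retrain a detector from weak annotations, stopping on drift.

    `boxes` is an initial, noisy localisation (e.g. from cue fusion or NegMine);
    `train_detector`, `detect` and `mean_iou` are caller-supplied callables.
    """
    detector = None
    for _ in range(n_iters):
        detector = train_detector(images, boxes)    # standard supervised training
        new_boxes = detect(detector, images)        # re-localise objects in each image
        if mean_iou(boxes, new_boxes) < drift_tol:  # model-drift check: stop if annotations diverge
            break
        boxes = new_boxes
    return detector, boxes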
Distant Vehicle Detection Using Radar and Vision
For autonomous vehicles to be able to operate successfully they need to be
aware of other vehicles with sufficient time to make safe, stable plans. Given
the possible closing speeds between two vehicles, this necessitates the ability
to accurately detect distant vehicles. Many current image-based object
detectors using convolutional neural networks exhibit excellent performance on
existing datasets such as KITTI. However, the performance of these networks
falls when detecting small (distant) objects. We demonstrate that incorporating
radar data can boost performance in these difficult situations. We also
introduce an efficient automated method for training data generation using
cameras of different focal lengths.
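As a hedged illustration of how radar returns can be associated with image-based detections, the sketch below projects radar points into the image plane given camera intrinsics and a radar-to-camera transform; both are assumed known, and this is not claimed to be the paper's actual fusion method.

# Minimal sketch: project radar points into the image plane for association with detections.
import numpy as np

def project_radar_to_image(radar_points_xyz, K, T):
    """radar_points_xyz: (N, 3) points in the radar frame; K: (3, 3) intrinsics; T: (4, 4) radar-to-camera transform."""
    pts = np.hstack([radar_points_xyz, np.ones((len(radar_points_xyz), 1))])  # homogeneous coordinates
    cam = (T @ pts.T)[:3]              # radar frame -> camera frame
    in_front = cam[2] > 0              # keep points in front of the camera
    uvw = K @ cam[:, in_front]
    return (uvw[:2] / uvw[2]).T        # pixel coordinates, shape (M, 2)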