Long-term face tracking in the wild using deep learning
This paper investigates long-term face tracking of a specific person given
his/her face image in a single frame as a query in a video stream. By
taking advantage of deep learning models pre-trained on big data, a novel
system is developed for accurate video face tracking in unconstrained
environments depicting various people and objects moving in and out of the
frame. In the proposed system, we present a detection-verification-tracking
method (dubbed as 'DVT') which accomplishes the long-term face tracking task
through the collaboration of face detection, face verification, and
(short-term) face tracking. An offline trained detector based on cascaded
convolutional neural networks localizes all faces that appear in the frames, and
an offline trained face verifier based on deep convolutional neural networks
and similarity metric learning decides if any face or which face corresponds to
the queried person. An online trained tracker follows the face from frame to
frame. When validated on a sitcom episode and a TV show, the DVT method
outperforms tracking-learning-detection (TLD) and face-TLD in terms of recall
and precision. The proposed system is also tested on many other types of videos
and shows very promising results.
Comment: KDD Workshop on Large-scale Deep Learning for Data Mining, August
2016, San Francisco, CA, US
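The abstract above describes detection, verification, and short-term tracking working together, but it does not spell out the control flow. The following is a minimal sketch of one plausible arrangement, not the authors' implementation: the function name `dvt_track`, the callables `detect`, `verify`, and `track`, and the `threshold` parameter are all hypothetical placeholders for the real models.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height); an assumed box format


def dvt_track(
    frames: Iterable[object],
    detect: Callable[[object], List[Box]],           # hypothetical face detector
    verify: Callable[[object, Box], float],          # hypothetical verifier: similarity to the query
    track: Callable[[object, Box], Optional[Box]],   # hypothetical short-term tracker; None on loss
    threshold: float = 0.5,                          # assumed acceptance threshold
) -> List[Optional[Box]]:
    """Return one box (or None) per frame for the queried person."""
    results: List[Optional[Box]] = []
    current: Optional[Box] = None
    for frame in frames:
        # Follow the face from the previous frame while the tracker holds on.
        if current is not None:
            current = track(frame, current)
        # When there is no track, detect all faces and keep the best verified one.
        if current is None:
            best_score, best_box = threshold, None
            for box in detect(frame):
                score = verify(frame, box)
                if score >= best_score:
                    best_score, best_box = score, box
            current = best_box
        results.append(current)
    return results
```

In this sketch the detector and verifier run only when the tracker has no target, which keeps the per-frame cost low; a real system might also re-verify periodically to catch tracker drift.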
Applying Social Media Intelligence for Predicting and Identifying On-line Radicalization and Civil Unrest Oriented Threats
Research shows that various social media platforms on the Internet such as
Twitter, Tumblr (micro-blogging websites), Facebook (a popular social
networking website), YouTube (largest video sharing and hosting website), Blogs
and discussion forums are being misused by extremist groups for spreading their
beliefs and ideologies, promoting radicalization, recruiting members and
creating online virtual communities sharing a common agenda. Popular
microblogging websites such as Twitter are being used as a real-time platform
for information sharing and communication during planning and mobilization of
civil unrest related events. Applying social media intelligence for predicting
and identifying online radicalization and civil unrest oriented threats is an
area that has attracted several researchers' attention over the past 10 years.
Several algorithms, techniques and tools have been proposed in
existing literature to counter and combat cyber-extremism and to predict
protest related events well in advance. In this paper, we conduct a literature
review of all these existing techniques and do a comprehensive analysis to
understand the state of the art, trends and research gaps. We present a one class
classification approach to collect scholarly articles targeting the topics and
subtopics of our research scope. We perform characterization, classification
and an in-depth meta-analysis of about 100 conference and journal
papers to gain a better understanding of existing literature.
Comment: 18 pages, 16 figures, 4 tables. This paper is a comprehensive and
detailed literature survey to understand current state-of-the-art of Online
Social Media Intelligence to counter and combat ISI related threat
A Survey on Content-Aware Video Analysis for Sports
Sports data analysis is becoming increasingly large-scale, diversified, and
shared, but difficulty persists in rapidly accessing the most crucial
information. Previous surveys have focused on the methodologies of sports video
analysis from the spatiotemporal viewpoint instead of a content-based
viewpoint, and few of these studies have considered semantics. This study
develops a deeper interpretation of content-aware sports video analysis by
examining the insight offered by research into the structure of content under
different scenarios. On the basis of this insight, we provide an overview of
the themes particularly relevant to the research on content-aware systems for
broadcast sports. Specifically, we focus on the video content analysis
techniques applied in sportscasts over the past decade from the perspectives of
fundamentals and general review, a content hierarchical model, and trends and
challenges. Content-aware analysis methods are discussed with respect to
object-, event-, and context-oriented groups. In each group, the gap between
sensation and content excitement must be bridged using proper strategies. In
this regard, a content-aware approach is required to determine user demands.
Finally, the paper summarizes the future trends and challenges for sports video
analysis. We believe that our findings can advance the field of research on
content-aware video analysis for broadcast sports.
Comment: Accepted for publication in IEEE Transactions on Circuits and Systems
for Video Technology (TCSVT)
Cascaded Pyramid Mining Network for Weakly Supervised Temporal Action Localization
Weakly supervised temporal action localization, which aims at temporally
locating action instances in untrimmed videos using only video-level class
labels during training, is an important yet challenging problem in video
analysis. Many current methods adopt the "localization by classification"
framework: first classify the video, then locate the temporal regions that
contribute most to the result. However, this framework fails to locate
entire action instances and gives little consideration to the local context. In
this paper, we present a novel architecture called Cascaded Pyramid Mining
Network (CPMN) to address these issues using two effective modules. First, to
discover the entire temporal interval of a specific action, we design a two-stage
cascaded module with the proposed Online Adversarial Erasing (OAE) mechanism, where
new and complementary regions are mined through feeding the erased feature maps
of discovered regions back to the system. Second, to exploit hierarchical
contextual information in videos and reduce missing detections, we design a
pyramid module which produces a scale-invariant attention map through combining
the feature maps from different levels. Finally, we aggregate the results of the two
modules to perform action localization by locating high-score areas in the
temporal Class Activation Sequence (CAS). Extensive experiments conducted on
the THUMOS14 and ActivityNet-1.3 datasets demonstrate the effectiveness of our
method.
Comment: Accepted at ACCV 201
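The last step named in the abstract above, locating high-score areas in a temporal Class Activation Sequence, can be illustrated with simple thresholding. This is a minimal sketch under assumptions, not the CPMN procedure: the function name `segments_from_cas`, the fixed `threshold`, and the use of per-snippet scores for a single class are all illustrative choices.

```python
from typing import List, Sequence, Tuple


def segments_from_cas(scores: Sequence[float], threshold: float) -> List[Tuple[int, int]]:
    """Return (start, end) snippet indices, end exclusive, where scores >= threshold."""
    segments: List[Tuple[int, int]] = []
    start = None  # index where the current high-score run began, if any
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i                      # a new run begins
        elif score < threshold and start is not None:
            segments.append((start, i))    # the run ended at the previous snippet
            start = None
    if start is not None:
        segments.append((start, len(scores)))  # run extends to the end of the video
    return segments
```

For example, `segments_from_cas([0.1, 0.8, 0.9, 0.2, 0.7], 0.5)` yields `[(1, 3), (4, 5)]`. Converting snippet indices to seconds would require the snippet stride, which the abstract does not give.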
A Framework for Picture Extraction on Search Engine Improved and Meaningful Result
Searching is an important tool for information gathering. When information is in
the form of a picture, it plays a major role in enabling quick action and is easy to
memorize. Humans tend to retain pictures better than text. The
complexity and variety of queries can cause variation in the results,
letting users learn something new or leaving them confused. This paper
presents the development of a framework that focuses on resource
identification for users, so that they get faster access to accurate and
concise results in time, and analyzes the changes that become evident as the
scenario shifts from text to picture retrieval. This paper also provides a
glimpse of how to get accurate picture information in advanced and extended
search technology frameworks. New challenges and design techniques for
picture retrieval systems are also suggested in this paper.
Comment: 5 pages, 1 figure
Learnable PINs: Cross-Modal Embeddings for Person Identity
We propose and investigate an identity sensitive joint embedding of face and
voice. Such an embedding enables cross-modal retrieval from voice to face and
from face to voice. We make the following four contributions: first, we show
that the embedding can be learnt from videos of talking faces, without
requiring any identity labels, using a form of cross-modal self-supervision;
second, we develop a curriculum learning schedule for hard negative mining
targeted to this task, that is essential for learning to proceed successfully;
third, we demonstrate and evaluate cross-modal retrieval for identities unseen
and unheard during training over a number of scenarios and establish a
benchmark for this novel task; finally, we show an application of using the
joint embedding for automatically retrieving and labelling characters in TV
dramas.
Comment: To appear in ECCV 201
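Once faces and voices share one embedding space, as the abstract above describes, cross-modal retrieval reduces to nearest-neighbour ranking. The sketch below assumes the embeddings are already computed and uses cosine similarity as the ranking score; the similarity measure and the names `cosine` and `retrieve` are assumptions for illustration, not details taken from the paper.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(query: Sequence[float], gallery: Dict[str, Sequence[float]]) -> List[str]:
    """Rank gallery items (e.g. face embeddings) by similarity to a query (e.g. a voice embedding)."""
    return sorted(gallery, key=lambda name: cosine(query, gallery[name]), reverse=True)
```

With a voice embedding `[1.0, 0.0]` and face embeddings `{"a": [0.9, 0.1], "b": [0.0, 1.0], "c": [-1.0, 0.0]}`, `retrieve` ranks the faces as `["a", "b", "c"]`. The same function works in the face-to-voice direction by swapping the roles of query and gallery.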
Large-Scale Object Discovery and Detector Adaptation from Unlabeled Video
We explore object discovery and detector adaptation based on unlabeled video
sequences captured from a mobile platform. We propose a fully automatic
approach for object mining from video which builds upon a generic object
tracking approach. By applying this method to three large video datasets from
autonomous driving and mobile robotics scenarios, we demonstrate its robustness
and generality. Based on the object mining results, we propose a novel approach
for unsupervised object discovery by appearance-based clustering. We show that
this approach successfully discovers interesting objects relevant to driving
scenarios. In addition, we perform self-supervised detector adaptation in order
to improve detection performance on the KITTI dataset for existing categories.
Our approach has direct relevance for enabling large-scale object learning for
autonomous driving.
Comment: CVPR'18 submission
Active Mining of Parallel Video Streams
The practicality of a video surveillance system is adversely limited by the
number of queries that can be placed on human resources and their vigilance in
response. To transcend this limitation, a major effort under way is to include
software that (fully or at least semi) automatically mines video footage,
reducing the burden imposed on the system. Herein, we propose a semi-supervised
incremental learning framework for evolving visual streams in order to develop
a robust and flexible track classification system. Our proposed method learns
from consecutive batches by updating an ensemble each time. It tries to
strike a balance between the performance of the system and the amount of data that
needs to be labelled. As no restriction is considered, the system can address
many practical problems in an evolving multi-camera scenario, such as concept
drift, class evolution and varying lengths of video streams, which have not been
addressed before. Experiments were performed on synthetic as well as real-world
visual data in non-stationary environments, showing high accuracy with fairly
little human collaboration.
Memory Warps for Learning Long-Term Online Video Representations
This paper proposes a novel memory-based online video representation that is
efficient, accurate and predictive. This is in contrast to prior works that
often rely on computationally heavy 3D convolutions, ignore actual motion when
aligning features over time, or operate in an off-line mode to utilize future
frames. In particular, our memory (i) holds the feature representation, (ii) is
spatially warped over time to compensate for observer and scene motions, (iii)
can carry long-term information, and (iv) enables predicting feature
representations in future frames. By exploring a variant that operates at
multiple temporal scales, we efficiently learn across even longer time
horizons. We apply our online framework to object detection in videos,
obtaining a large 2.3 times speed-up and losing only 0.9% mAP on the ImageNet-VID
dataset, compared to prior works that even use future frames. Finally, we
demonstrate the predictive property of our representation in two novel
detection setups, where features are propagated over time to (i) significantly
enhance a real-time detector by more than 10% mAP in a multi-threaded online
setup and to (ii) anticipate objects in future frames.
Crowd-Powered Data Mining
Many data mining tasks, such as sentiment analysis and image classification,
cannot be completely addressed by automated processes. Crowdsourcing
is an effective way to harness the human cognitive ability to process these
machine-hard tasks. Thanks to public crowdsourcing platforms, e.g., Amazon
Mechanical Turk and CrowdFlower, we can easily involve hundreds of thousands
of ordinary workers (i.e., the crowd) to address these machine-hard tasks. In
this tutorial, we will survey and synthesize a wide spectrum of existing
studies on crowd-powered data mining. We first give an overview of
crowdsourcing, and then summarize the fundamental techniques, including quality
control, cost control, and latency control, which must be considered in
crowdsourced data mining. Next we review crowd-powered data mining operations,
including classification, clustering, pattern mining, machine learning using
the crowd (including deep learning, transfer learning and semi-supervised
learning) and knowledge discovery. Finally, we outline the emerging challenges
in crowdsourced data mining.
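The abstract above lists quality control among the fundamental crowdsourcing techniques without naming a specific method. One common and simple approach is to assign each task to several workers and aggregate by majority vote; the sketch below illustrates that idea only. The function name `majority_vote` and the input layout are assumptions, and the tutorial itself may cover different or more refined schemes.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple


def majority_vote(answers: Dict[str, List[Hashable]]) -> Dict[str, Tuple[Hashable, float]]:
    """Map each task id to (winning label, fraction of workers who agreed).

    Tasks with no answers are skipped. Ties go to the label that was
    submitted first, since Counter preserves insertion order.
    """
    result: Dict[str, Tuple[Hashable, float]] = {}
    for task, labels in answers.items():
        if not labels:
            continue
        label, count = Counter(labels).most_common(1)[0]
        result[task] = (label, count / len(labels))
    return result
```

The agreement fraction gives a rough confidence signal: tasks with low agreement could be sent to additional workers, which ties quality control to the cost control that the abstract also mentions.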