3 research outputs found
A Unified Multi-Faceted Video Summarization System
This paper addresses automatic summarization and search in visual data
comprising videos, live streams and image collections in a unified manner.
In particular, we propose a framework for multi-faceted summarization which
extracts key-frames (image summaries), skims (video summaries) and entity
summaries (summarization at the level of entities like objects, scenes, humans
and faces in the video). The user can either view these as extractive
summarization or query-focused summarization. Our approach first pre-processes
the video or image collection once, to extract all important visual features,
following which we provide an interactive mechanism to the user to summarize
the video based on their choice. We investigate several diversity, coverage and
representation models for all these problems, and argue the utility of these
different models depending on the application. While most of the prior work
on submodular summarization approaches has focused on combining several models
and learning weighted mixtures, we focus on the explainability of the different
diversity, coverage and representation models and their scalability. Most
importantly, we also show that we can summarize hours of video data in a few
seconds, and our system allows the user to generate summaries of various
lengths and types interactively on the fly.
Comment: 18 pages, 11 Figures
Vis-DSS: An Open-Source toolkit for Visual Data Selection and Summarization
With increasing amounts of visual data being created in the form of videos
and images, visual data selection and summarization are becoming increasingly
important problems. We present Vis-DSS, an open-source toolkit for Visual Data
Selection and Summarization. Vis-DSS implements a framework of models for
summarization and data subset selection using submodular functions, which are
becoming increasingly popular today for these problems. We present several
classes of models, capturing notions of diversity, coverage, representation and
importance, along with optimization/inference and learning algorithms. Vis-DSS
is the first open-source toolkit for several data selection and summarization
tasks including Image Collection Summarization, Video Summarization, Training
Data selection for Classification and Diversified Active Learning. We
demonstrate state-of-the-art performance on all these tasks, and also show how
we can scale to large problems. Vis-DSS allows applications to be built on it
easily and can also serve as a general skeleton that can be extended to several
use cases, including video and image sharing platforms for creating GIFs, image
montage creation, or as a component of surveillance systems. We demonstrate
this by providing a graphical user interface (GUI) desktop app built on the Qt
framework. Vis-DSS is available at
https://github.com/rishabhk108/vis-dss
Deployment of Customized Deep Learning based Video Analytics On Surveillance Cameras
This paper demonstrates the effectiveness of our customized deep learning
based video analytics system in various applications focused on security,
safety, customer analytics and process compliance. We describe our video
analytics system comprising Search, Summarize, Statistics and real-time
alerting, and outline its building blocks. These building blocks include object
detection, tracking, face detection and recognition, and human and face
sub-attribute analytics. In each case, we demonstrate how custom models trained
using data from the deployment scenarios provide considerably higher
accuracy than off-the-shelf models. Towards this end, we describe our data
processing and model training pipeline, which can train and fine-tune models
from videos with a quick turnaround time. Finally, since most of these models
are deployed on-site, it is important to have resource-constrained models which
do not require GPUs. We demonstrate how we custom train resource-constrained
models and deploy them on embedded devices without significant loss in
accuracy. To our knowledge, this is the first work which provides a
comprehensive evaluation of different deep learning models on various
real-world customer deployment scenarios of surveillance video analytics. By
sharing our implementation details and the experience gained from deploying
customized deep learning models for various customers, we hope that customized
deep learning based video analytics will be widely incorporated in commercial
products around the world.
Comment: Added Equal Contribution footnote