Strategies for Searching Video Content with Text Queries or Video Examples
The large number of user-generated videos uploaded to the Internet
every day has given rise to many commercial video search engines, which mainly rely on
text metadata for search. However, metadata is often lacking for user-generated
videos, so current search engines cannot find these videos.
Content-based video retrieval (CBVR) tackles this metadata-scarcity
problem by directly analyzing the visual and audio streams of each video. CBVR
encompasses multiple research topics, including low-level feature design,
feature fusion, semantic detector training, and video search/reranking. We
present novel strategies in these topics to enhance CBVR in both accuracy and
speed under different query inputs, including pure textual queries and query by
video examples. Our proposed strategies have been incorporated into our
submission for the TRECVID 2014 Multimedia Event Detection evaluation, where
our system outperformed other submissions for both text queries and video
example queries, demonstrating the effectiveness of our proposed
approaches.
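The abstract lists feature fusion and video search/reranking among the CBVR topics but does not describe a fusion rule. Below is a minimal sketch that assumes simple weighted late fusion, a common baseline that is not necessarily the authors' strategy. Each feature's detection scores are min-max normalized, the normalized scores are averaged with per-feature weights, and videos are ranked by the fused score. The function names `late_fuse` and `rank_videos`, the example scores, and the weights are all hypothetical.

```python
from typing import Dict, List


def min_max(scores: List[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant score list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def late_fuse(feature_scores: Dict[str, List[float]],
              weights: Dict[str, float]) -> List[float]:
    """Weighted average of normalized per-feature scores, one value per video."""
    total = sum(weights[name] for name in feature_scores)
    n_videos = len(next(iter(feature_scores.values())))
    fused = [0.0] * n_videos
    for name, scores in feature_scores.items():
        for i, s in enumerate(min_max(scores)):
            fused[i] += weights[name] * s / total
    return fused


def rank_videos(fused: List[float]) -> List[int]:
    """Video indices sorted from most to least relevant."""
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)


# Hypothetical scores from a visual and an audio detector for three videos.
scores = {"visual": [0.2, 0.9, 0.5], "audio": [1.0, 3.0, 2.0]}
fused = late_fuse(scores, {"visual": 2.0, "audio": 1.0})
print(rank_videos(fused))  # [1, 2, 0]
```

Normalizing before averaging keeps a feature with a large raw score range from dominating the fused ranking. The weights are the part a real system would tune, for example per event on held-out data.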
Watch and Learn: Semi-Supervised Learning of Object Detectors from Videos
We present a semi-supervised approach that localizes multiple unknown object
instances in long videos. We start with a handful of labeled boxes and
iteratively learn and label hundreds of thousands of object instances. We
propose criteria for reliable object detection and tracking for constraining
the semi-supervised learning process and minimizing semantic drift. Our
approach does not assume exhaustive labeling of each object instance in any
single frame, or any explicit annotation of negative data. Working in such a
generic setting allows us to tackle multiple object instances in video, many of
which are static. In contrast, existing approaches either do not consider
multiple object instances per video, or rely heavily on the motion of the
objects present. The experiments demonstrate the effectiveness of our approach
by evaluating the automatically labeled data on a variety of metrics, such as
quality, coverage (recall), diversity, and relevance to training an object
detector.
Comment: To appear in CVPR 201
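The abstract describes a loop that starts from a handful of labeled boxes and then alternates between learning and labeling more instances. That loop can be illustrated with generic self-training. The sketch below is not the paper's method:

- It replaces the object detector and tracker with a toy nearest-centroid classifier on 2-D points.
- It replaces the paper's detection and tracking reliability criteria with a hypothetical `margin` test. A pseudo-label is accepted only when the nearest class clearly beats the runner-up. This is the kind of constraint used to limit semantic drift.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def centroid(points: List[Point]) -> Point:
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def dist2(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def self_train(labeled: List[Tuple[Point, int]], unlabeled: List[Point],
               margin: float, rounds: int = 5) -> List[Tuple[Point, int]]:
    """Grow the labeled set by repeatedly pseudo-labeling confident points."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        # Retrain: one centroid per class from everything labeled so far.
        classes = sorted({y for _, y in labeled})
        cents: Dict[int, Point] = {
            c: centroid([p for p, y in labeled if y == c]) for c in classes}
        accepted, remaining = [], []
        for p in pool:
            ranked = sorted((dist2(p, cents[c]), c) for c in classes)
            # Hypothetical reliability check: nearest class must win by `margin`.
            if len(ranked) > 1 and ranked[1][0] - ranked[0][0] >= margin:
                accepted.append((p, ranked[0][1]))
            else:
                remaining.append(p)
        if not accepted:  # nothing confident left; stop rather than drift
            break
        labeled.extend(accepted)
        pool = remaining
    return labeled


seed = [((0.0, 0.0), 0), ((10.0, 0.0), 1)]
result = self_train(seed, [(1.0, 0.0), (9.0, 0.0), (5.0, 0.0)], margin=10.0)
print(len(result))  # 4: the ambiguous point (5, 0) stays unlabeled
```

Rejecting ambiguous points rather than forcing a label trades coverage for precision. Errors that enter the labeled set would otherwise shift the centroids in every later round.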
Spatio-Temporal Multimedia Big Data Analytics Using Deep Neural Networks
With the proliferation of online services and mobile technologies, the world has entered a multimedia big data era, in which the high diversity of multimedia data, together with the huge volume of social data, brings new opportunities and challenges. Multimedia data, consisting of audio, text, images, and video, has grown tremendously. With such an increase in the amount of multimedia data, the main question is how to analyze this high volume and variety of data efficiently and effectively. A large body of research has targeted different aspects of big data analytics in the multimedia area, such as the capture, storage, indexing, mining, and retrieval of multimedia big data. However, little research provides a comprehensive framework for multimedia big data analytics and management.
To address the major challenges in this area, a new framework based on deep neural networks is proposed for multimedia semantic concept detection, with a focus on spatio-temporal information analysis and rare event detection. The proposed framework can discover patterns and knowledge in multimedia data using both static deep data representations and temporal semantics. In particular, it is designed to handle data with skewed distributions. The framework includes the following components: (1) a synthetic data generation component based on simulation and adversarial networks for data augmentation and deep learning training, (2) an automatic sampling model to overcome the imbalanced-data issue in multimedia data, (3) a deep representation learning model leveraging novel deep learning techniques to generate the most discriminative static features from multimedia data, (4) an automatic hyper-parameter learning component for faster training and convergence of the learning models, (5) a spatio-temporal deep learning model to analyze dynamic features from multimedia data, and finally (6) a multimodal deep learning fusion model to integrate different data modalities. The whole framework has been evaluated on various large-scale multimedia datasets, including a newly collected disaster-events video dataset and other public datasets.
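Component (2) is described only as an "automatic sampling model". For reference only, and explicitly not that model, the simplest way to counter a skewed class distribution is random oversampling. Minority-class samples are drawn with replacement until every class matches the majority count. The helper `random_oversample`, its fixed `seed`, and the toy labels are hypothetical.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_oversample(samples: Sequence[T], labels: Sequence[Hashable],
                      seed: int = 0) -> Tuple[List[T], List[Hashable]]:
    """Duplicate minority-class samples (with replacement) up to the majority count."""
    rng = random.Random(seed)  # fixed seed for reproducible resampling
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


# Skewed toy data: 8 "normal" clips vs 2 "disaster" clips.
xs, ys = random_oversample(list(range(10)), ["normal"] * 8 + ["disaster"] * 2)
print(Counter(ys)["disaster"])  # 8
```

Plain duplication adds no new information and can encourage overfitting to the few rare examples. That limitation is one motivation for the learned sampling and synthetic-data components the framework proposes.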
Behavior and event detection for annotation and surveillance
Visual surveillance and activity analysis form an active research field of computer vision, and many different algorithms have been developed for this purpose. To obtain more robust systems, it is desirable to integrate these algorithms. To this end, the paper presents results in automatic event detection in surveillance videos, together with a distributed application framework that supports these methods. Results are presented in motion analysis for static and moving cameras, automatic fight detection, shadow segmentation, discovery of unusual motion patterns, and indexing and retrieval. These applications run in real time and are suitable for real-life use.
Real-Time Idling Vehicles Detection using Combined Audio-Visual Deep Learning
Combustion vehicle emissions contribute to poor air quality and release
greenhouse gases into the atmosphere, and vehicle pollution has been associated
with numerous adverse health effects. Roadways with extensive waiting and/or
passenger drop-off, such as school and hospital drop-off zones, can have a
high incidence and density of idling vehicles, which can produce micro-climates
of increased vehicle pollution. Detecting idling vehicles can therefore help in
monitoring and responding to unnecessary idling, and the detection can be
integrated into real-time or offline systems to address the resulting pollution.
In this paper we present a real-time, dynamic vehicle idling detection
algorithm, on which the proposed idle notification relies. The method uses a
multi-sensor, audio-visual, machine-learning workflow to detect idling vehicles
under three conditions: moving, static with the engine on, and static with the
engine off. A visual vehicle motion detector is built in the first stage, and
then a contrastive-learning-based latent space is trained to classify
static vehicle engine sound. We test our system in real time at a hospital
drop-off point in Salt Lake City, where we collected and annotated an in-situ
dataset that includes vehicles of varying models and types. The
experiments show that the method can detect engines switching on or off
instantly and achieves an average precision (AP) of 71.02 for idle detection
and 91.06 for engine-off detection.
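The idling results are reported as average precision (AP). The abstract does not say which AP variant was used; detection benchmarks often add box matching and interpolation. The following sketch therefore assumes the plain ranked-retrieval definition: sort predictions by confidence, then average the precision at each rank where a true positive appears. The function name and the example confidences are hypothetical. A reported value such as 71.02 corresponds to 0.7102 on this function's 0-to-1 scale.

```python
from typing import List


def average_precision(scores: List[float], labels: List[int]) -> float:
    """Non-interpolated AP: mean precision at the rank of each true positive."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / n_pos


# Hypothetical detector confidences; label 1 = vehicle truly idling.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(round(ap, 4))  # 0.8333
```

In the example, the two idling vehicles are retrieved at ranks 1 and 3, giving precisions of 1 and 2/3. Their mean is about 0.833.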