Semi-automatic Data Annotation System for Multi-Target Multi-Camera Vehicle Tracking
Multi-target multi-camera tracking (MTMCT) plays an important role in
intelligent video analysis, surveillance video retrieval, and other application
scenarios. Deep-learning-based MTMCT is now the mainstream approach and has
achieved substantial improvements in tracking accuracy and efficiency. However,
our investigation shows that the lack of datasets focused on real-world
application scenarios limits further improvement of current learning-based
MTMCT models. Specifically, learning-based MTMCT models trained on common
datasets usually cannot achieve satisfactory results in real-world application
scenarios. Motivated by this, this paper presents a
semi-automatic data annotation system to facilitate the real-world MTMCT
dataset establishment. The proposed system first employs a deep-learning-based
single-camera trajectory generation method to automatically extract
trajectories from surveillance videos. Subsequently, the system provides a
recommendation list in the following manual cross-camera trajectory matching
process. The recommendation list is generated based on side information,
including camera location, timestamp relation, and background scene. In the
experimental stage, extensive results further demonstrate the efficiency of the
proposed system.
Comment: 9 pages, 10 figures
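The cross-camera recommendation step described above could be sketched roughly as follows. This is a minimal illustration, not the paper's actual method: the `Trajectory` fields, the `TRAVEL_TIME` table (hypothetical travel-time bounds standing in for camera location and timestamp relations), and the scoring function are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Trajectory:
    camera_id: int
    start_ts: float  # seconds since epoch
    end_ts: float

# Hypothetical side information: plausible travel-time windows (seconds)
# between camera pairs, standing in for camera-location knowledge.
TRAVEL_TIME = {
    (1, 2): (30.0, 120.0),
    (2, 3): (20.0, 90.0),
}

def candidate_score(query: Trajectory, cand: Trajectory) -> float:
    """Score a cross-camera match candidate from the timestamp gap and
    camera-pair constraints; higher is better, 0.0 means ruled out."""
    key = (query.camera_id, cand.camera_id)
    if key not in TRAVEL_TIME:
        return 0.0
    lo, hi = TRAVEL_TIME[key]
    gap = cand.start_ts - query.end_ts
    if gap < lo or gap > hi:
        return 0.0  # physically implausible transition
    mid = (lo + hi) / 2
    # Peak score when the gap sits in the middle of the plausible window.
    return 1.0 - abs(gap - mid) / (hi - lo)

def recommend(query, candidates, top_k=3):
    """Build the recommendation list shown to the human annotator."""
    ranked = sorted(candidates,
                    key=lambda c: candidate_score(query, c), reverse=True)
    return [c for c in ranked if candidate_score(query, c) > 0.0][:top_k]
```

An annotator would then confirm or reject the top-ranked candidates rather than searching all trajectories manually, which is the point of the recommendation list.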
Learning Multimodal Latent Attributes
The rapid development of social media sharing has created a huge demand for automatic media classification and annotation techniques. Attribute learning has emerged as a promising paradigm for bridging the semantic gap and addressing data sparsity via transferring attribute knowledge in object recognition and relatively simple action classification. In this paper, we address the task of attribute learning for understanding multimedia data with sparse and incomplete labels. In particular, we focus on videos of social group activities, which are particularly challenging and topical examples of this task because of their multi-modal content and their complex and unstructured nature relative to the density of annotations. To solve this problem, we (1) introduce the concept of a semi-latent attribute space, expressing user-defined and latent attributes in a unified framework, and (2) propose a novel scalable probabilistic topic model for learning multi-modal semi-latent attributes, which dramatically reduces the requirement for an exhaustive, accurate attribute ontology and expensive annotation effort. We show that our framework is able to exploit latent attributes to outperform contemporary approaches on a variety of realistic multimedia sparse-data learning tasks, including multi-task learning, learning with label noise, N-shot transfer learning and, importantly, zero-shot learning.
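The zero-shot use of an attribute space mentioned above can be illustrated with a toy sketch: an unseen video is assigned to the class whose attribute signature lies nearest to the predicted attribute vector. The class names, the split into user-defined and latent dimensions, and all numeric values below are invented for illustration and are not taken from the paper.

```python
import math

# Toy class signatures over a 5-dimensional attribute space:
# columns 0-2 play the role of user-defined attributes,
# columns 3-4 the role of latent attributes (all values illustrative).
CLASS_SIGNATURES = {
    "wedding": [1.0, 0.0, 1.0, 0.2, 0.7],
    "parade":  [0.0, 1.0, 1.0, 0.9, 0.1],
}

def zero_shot_classify(attr_pred):
    """Label an unseen video by nearest attribute signature
    (Euclidean distance); no training examples of the class needed."""
    return min(CLASS_SIGNATURES,
               key=lambda c: math.dist(CLASS_SIGNATURES[c], attr_pred))
```

The appeal of the semi-latent formulation is that the latent dimensions are learned rather than manually specified, so the hand-built part of each signature can stay small.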
The IMMED Project: Wearable Video Monitoring of People with Age Dementia
In this paper, we describe a new application for multimedia indexing, using a system that monitors the instrumental activities of daily living to assess the cognitive decline caused by dementia. The system is composed of a wearable camera device designed to capture audio and video data of the instrumental activities of a patient, which is leveraged with multimedia indexing techniques in order to allow medical specialists to analyze several-hour-long observation shots efficiently.