
    Heteroskedastic Geospatial Tracking with Distributed Camera Networks

    Visual object tracking has seen significant progress in recent years. However, the vast majority of this work focuses on tracking objects within the image plane of a single camera and ignores the uncertainty associated with predicted object locations. In this work, we focus on the geospatial object tracking problem using data from a distributed camera network. The goal is to predict an object's track in geospatial coordinates, along with uncertainty over the object's location, while respecting communication constraints that prohibit centralizing raw image data. We introduce a novel single-object geospatial tracking data set that includes high-accuracy ground-truth object locations and video data from a network of four cameras. We present a modeling framework for addressing this task, including a novel backbone model, and explore how uncertainty calibration and fine-tuning through a differentiable tracker affect performance.
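
    The abstract does not specify the model, but the core idea of predicting a location together with its uncertainty is commonly realized with a heteroskedastic regression head. Below is a minimal, illustrative PyTorch sketch (not the paper's architecture): the head maps a per-frame feature vector to a 2D geospatial mean plus a per-coordinate variance, trained with the Gaussian negative log-likelihood. The feature dimension and layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class HeteroskedasticHead(nn.Module):
    """Predicts a 2D position and a per-coordinate variance (illustrative)."""
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU())
        self.mean = nn.Linear(128, 2)      # (x, y) in geospatial coordinates
        self.log_var = nn.Linear(128, 2)   # log-variance per coordinate

    def forward(self, feats):
        h = self.trunk(feats)
        # exp() keeps the predicted variance strictly positive
        return self.mean(h), self.log_var(h).exp()

model = HeteroskedasticHead()
loss_fn = nn.GaussianNLLLoss()
feats = torch.randn(8, 256)      # dummy per-frame features (assumed input)
target = torch.randn(8, 2)       # ground-truth geospatial positions
mean, var = model(feats)
loss = loss_fn(mean, target, var)  # penalizes both error and miscalibration
loss.backward()
```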

    Visual on-line learning in distributed camera networks

    Automatic detection of persons is an important application in visual surveillance. In general, state-of-the-art systems have two main disadvantages. First, a general detector usually has to be learned that is applicable to a wide range of scenes; the training is therefore time-consuming and requires a huge amount of labeled data. Second, the data is usually processed centrally, which leads to heavy network traffic. The goal of this paper is to overcome these problems with a person detection system based on distributed smart cameras (DSCs). Assuming a large number of cameras with partly overlapping views, the main idea is to reduce the model complexity of the detector by training a specific detector for each camera. These detectors are initialized by a pre-trained classifier that is then adapted to a specific camera by co-training. In particular, for co-training we apply an on-line learning method (i.e., boosting for feature selection), where the information exchange is realized by mapping the overlapping views onto each other using a homography. We thus obtain a compact scene-dependent representation, which allows the classifiers to be trained and evaluated on an embedded device. Moreover, since the information transfer is reduced to exchanging positions, the required network traffic is minimal. The power of the approach is demonstrated in various experiments on different publicly available data sets. In fact, we show that on-line learning and applying DSCs can benefit from each other. Index Terms — visual on-line learning, object detection, multi-camera networks
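
    The abstract's label-exchange step, mapping detected positions between overlapping views through a homography, can be sketched in a few lines of NumPy. The matrix below is illustrative; in practice the homography is estimated from point correspondences between the two views (e.g., with cv2.findHomography).

```python
import numpy as np

def map_detections(points_a: np.ndarray, H: np.ndarray) -> np.ndarray:
    """Apply a 3x3 homography H to Nx2 pixel coordinates from camera A."""
    pts_h = np.hstack([points_a, np.ones((len(points_a), 1))])  # homogeneous
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]  # back to inhomogeneous pixels

# Assumed plane-induced homography from camera A's view to camera B's view
H_ab = np.array([[1.02, 0.05, -30.0],
                 [0.01, 0.98,  12.0],
                 [1e-5, 2e-5,   1.0]])
dets_a = np.array([[320.0, 410.0], [118.0, 395.0]])  # foot points in camera A
print(map_detections(dets_a, H_ab))  # co-training labels for camera B
```

    Since only these few coordinates cross the network, rather than image data, the traffic stays minimal, which is exactly the property the paper exploits.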

    Fusion techniques for activity recognition using multi-camera networks

    Real-time automatic activity recognition is an important area of research in the field of Computer Vision, with plenty of applications in surveillance, gaming, entertainment, and automobile safety. Because of advances in wireless networks and camera technologies, distributed camera networks are becoming more prominent. Distributed camera networks offer complementary views of scenes and hence are better suited for real-time surveillance applications; they are robust to camera failures and incomplete fields of view. In a camera network, fusing information from multiple cameras is an important problem, especially when one doesn't have knowledge of the subject's orientation with respect to the camera and when the arrangement of cameras is not symmetric. The objective of this dissertation is to design an information fusion technique for camera networks and to apply it in the context of surveillance and safety applications (in coal mines). (Abstract shortened by ProQuest.)
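
    The shortened abstract does not state the fusion rule, but a common baseline for this problem is score-level fusion: each camera emits a posterior over activity classes and the network combines them with per-camera weights reflecting view quality. The sketch below is illustrative only; the dissertation's actual technique may differ.

```python
import numpy as np

def fuse_scores(scores: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weighted score-level fusion.
    scores: (num_cameras, num_classes); weights: (num_cameras,)."""
    fused = (weights[:, None] * scores).sum(axis=0)
    return fused / fused.sum()  # renormalize to a probability distribution

cam_scores = np.array([[0.7, 0.2, 0.1],   # camera 1: frontal view
                       [0.4, 0.5, 0.1],   # camera 2: oblique view
                       [0.3, 0.3, 0.4]])  # camera 3: partially occluded
cam_weights = np.array([0.5, 0.3, 0.2])   # assumed reliability weights
print(fuse_scores(cam_scores, cam_weights).argmax())  # fused activity label
```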

    Attention Mechanism for Recognition in Computer Vision

    It has been proven that humans do not focus their attention on an entire scene at once when they perform a recognition task. Instead, they pay attention to the most important parts of the scene to extract the most discriminative information. Inspired by this observation, this dissertation studies the importance of the attention mechanism in recognition tasks in computer vision by designing novel attention-based models. Specifically, four scenarios are investigated that represent the most important aspects of the attention mechanism. First, an attention-based model is designed to reduce the visual features' dimensionality by selectively processing only a small subset of the data. We study this aspect of the attention mechanism in a framework based on object recognition in distributed camera networks. Second, an attention-based image retrieval system (i.e., person re-identification) is proposed which learns to focus on the most discriminative regions of a person's image and to process those regions with higher computational power using a deep convolutional neural network. Furthermore, we show how visualizing the attention maps can make deep neural networks more interpretable: by visualizing the attention maps we can observe the regions of the input image that the neural network relies on in order to make a decision. Third, a model is proposed for estimating the importance of the objects in a scene given a task. More specifically, the proposed model estimates the importance of the road users that a driver (or an autonomous vehicle) should pay attention to in a driving scenario in order to navigate safely; in this scenario, the attention estimate is the final output of the model. Fourth, an attention-based module and a new loss function are proposed for a meta-learning-based few-shot learning system, in order to incorporate the context of the task into the feature representations of the samples and increase few-shot recognition accuracy. In this dissertation, we showed that attention can be multi-faceted and studied the attention mechanism from the perspectives of feature selection, reducing computational cost, interpretable deep learning models, task-driven importance estimation, and context incorporation. Through the study of these four scenarios, we further advanced the field where "attention is all you need".
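
    As a concrete illustration of the visualizable attention maps the abstract describes (not the dissertation's specific architecture), here is a minimal spatial-attention module in PyTorch: a 1x1 convolution scores each location of a CNN feature map, a softmax turns the scores into an attention map, and features are pooled under that map. The returned map can be upsampled and overlaid on the input image for interpretability. The channel count is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttention(nn.Module):
    """Scores each spatial location and pools features under the attention map."""
    def __init__(self, channels: int = 512):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feats):                      # feats: (B, C, H, W)
        b, c, h, w = feats.shape
        # Softmax over all H*W locations yields a normalized attention map
        attn = F.softmax(self.score(feats).view(b, -1), dim=1).view(b, 1, h, w)
        pooled = (attn * feats).sum(dim=(2, 3))    # (B, C) attended descriptor
        return pooled, attn                        # attn is the visualizable map

feats = torch.randn(2, 512, 14, 14)                # dummy backbone features
pooled, attn_map = SpatialAttention()(feats)
print(pooled.shape, attn_map.shape)                # (2, 512) and (2, 1, 14, 14)
```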

    General Dynamic Scene Reconstruction from Multiple View Video

    This paper introduces a general approach to dynamic scene reconstruction from multiple moving cameras without prior knowledge or limiting constraints on the scene structure, appearance, or illumination. Existing techniques for dynamic scene reconstruction from multiple wide-baseline camera views primarily focus on accurate reconstruction in controlled environments, where the cameras are fixed and calibrated and the background is known. These approaches are not robust for general dynamic scenes captured with sparse moving cameras. Previous approaches to outdoor dynamic scene reconstruction assume prior knowledge of the static background's appearance and structure. The primary contributions of this paper are twofold: an automatic method for initial coarse dynamic scene segmentation and reconstruction without prior knowledge of background appearance or structure; and a general, robust approach for joint segmentation refinement and dense reconstruction of dynamic scenes from multiple wide-baseline static or moving cameras. Evaluation is performed on a variety of indoor and outdoor scenes with cluttered backgrounds and multiple dynamic non-rigid objects such as people. Comparison with state-of-the-art approaches demonstrates improved accuracy in both multiple-view segmentation and dense reconstruction. The proposed approach also eliminates the requirement for prior knowledge of scene structure and appearance.
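
    The abstract does not detail the reconstruction machinery, but multi-view dense reconstruction ultimately rests on recovering 3D points from their projections in calibrated views. As background, here is a sketch of standard linear (DLT) triangulation from two views; the projection matrices are illustrative, and the paper's pipeline additionally handles segmentation and moving cameras.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """DLT triangulation: P1, P2 are 3x4 projection matrices; x1, x2 pixels."""
    # Each view contributes two linear constraints on the homogeneous point X
    A = np.stack([x1[0] * P1[2] - P1[0],
                  x1[1] * P1[2] - P1[1],
                  x2[0] * P2[2] - P2[0],
                  x2[1] * P2[2] - P2[1]])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                        # null vector of A (least-squares sense)
    return X[:3] / X[3]               # inhomogeneous 3D point

P1 = np.hstack([np.eye(3), np.zeros((3, 1))])                # reference camera
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])  # translated camera
X_true = np.array([0.2, 0.1, 5.0, 1.0])
x1 = (P1 @ X_true)[:2] / (P1 @ X_true)[2]
x2 = (P2 @ X_true)[:2] / (P2 @ X_true)[2]
print(triangulate(P1, P2, x1, x2))    # recovers approximately [0.2, 0.1, 5.0]
```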