
    Efficient multi-view multi-target tracking using a distributed camera network

    In this paper, we propose a multi-target tracking method using a distributed camera network, which effectively handles the occlusion and reidentification problems by combining deep learning with distributed information fusion. The targets are first detected using a fast object detection method based on deep learning. We then combine deep visual feature information and spatial trajectory information in the Hungarian algorithm for robust target association. The deep visual features are extracted from a convolutional neural network that is pre-trained on a large-scale person reidentification dataset. The spatial trajectories of multiple targets in our framework are derived from a multiple-view information fusion method, which employs an information-weighted consensus filter for fusion and tracking. In addition, we propose an efficient track processing method for ID assignment using multiple-view information. Experiments on public datasets show that the proposed method handles the occlusion and reidentification problems robustly and achieves superior performance compared to state-of-the-art methods.
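The association step described above can be sketched as a single cost matrix that mixes an appearance term with a spatial term. This is a minimal illustration under stated assumptions, not the authors' implementation: the weight `w_app`, the gating distance `max_dist`, the cosine and Euclidean distances, and the function names are all hypothetical, and a brute-force search over permutations stands in for the Hungarian algorithm (it returns the same optimal assignment, but only scales to a handful of targets). It also assumes there are at least as many detections as tracks.

```python
import math
from itertools import permutations


def cosine_distance(a, b):
    """1 - cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def spatial_distance(p, q, max_dist):
    """Euclidean distance between 2-D points, clipped and scaled to [0, 1]."""
    return min(math.hypot(p[0] - q[0], p[1] - q[1]) / max_dist, 1.0)


def associate(tracks, detections, w_app=0.5, max_dist=10.0):
    """Match each track to one detection by minimising the total combined cost.

    tracks, detections: lists of (feature_vector, (x, y)) pairs.
    Returns a list of (track_index, detection_index) pairs.
    Assumes len(detections) >= len(tracks).
    """
    # Combined cost: weighted sum of appearance and spatial dissimilarity.
    cost = [
        [
            w_app * cosine_distance(t_feat, d_feat)
            + (1.0 - w_app) * spatial_distance(t_pos, d_pos, max_dist)
            for d_feat, d_pos in detections
        ]
        for t_feat, t_pos in tracks
    ]
    n = len(tracks)
    # Exhaustive search over assignments (stand-in for the Hungarian algorithm).
    best = min(
        permutations(range(len(detections)), n),
        key=lambda perm: sum(cost[i][perm[i]] for i in range(n)),
    )
    return list(enumerate(best))


tracks = [([1.0, 0.0], (0.0, 0.0)), ([0.0, 1.0], (5.0, 5.0))]
detections = [([0.0, 1.0], (5.2, 4.9)), ([1.0, 0.0], (0.1, 0.2))]
matches = associate(tracks, detections)
```

In this toy input the detections arrive in swapped order, so the first track is matched to the second detection and vice versa. For realistic target counts, a polynomial-time solver would replace the permutation search.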

    Particle Filter for Targets Tracking with Motion Model

    Real-time, robust tracking of multiple non-rigid objects is a challenging task in computer vision research. In recent years, particle filters based on stochastic sampling have been widely used to describe the complicated target features of image sequences. In this paper, non-parametric density estimation and particle filter techniques are employed to model the background and track the object. The color feature and motion model of the target are extracted and used as key features in the tracking step, in order to adapt to multiple variations in the scene, such as background clutter, changes in object scale, and partial overlap of different targets. The paper also presents experimental results on the robustness and effectiveness of the proposed method in a number of outdoor and indoor visual surveillance scenes.
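To make the predict, weight, and resample cycle of a particle filter with a motion model concrete, here is a minimal one-dimensional sketch. It is an illustration under assumptions rather than the paper's method: the constant-velocity motion model, the noise levels, and the function name `particle_filter_step` are hypothetical, and a Gaussian likelihood on a position measurement stands in for the color-feature likelihood the abstract mentions.

```python
import math
import random


def particle_filter_step(particles, measurement, rng,
                         dt=1.0, motion_noise=0.5, meas_noise=1.0):
    """One bootstrap particle filter update for a 1-D target.

    particles: list of (position, velocity) hypotheses.
    measurement: observed position for the current frame.
    Returns (resampled_particles, position_estimate).
    """
    # Predict: propagate each particle with a constant-velocity motion model.
    predicted = [
        (p + v * dt + rng.gauss(0.0, motion_noise),
         v + rng.gauss(0.0, 0.1 * motion_noise))
        for p, v in particles
    ]
    # Weight: Gaussian likelihood of the measurement given each particle.
    weights = [
        math.exp(-0.5 * ((measurement - p) / meas_noise) ** 2)
        for p, _ in predicted
    ]
    if sum(weights) == 0.0:
        # Every likelihood underflowed; fall back to uniform weights.
        weights = [1.0] * len(predicted)
    # Resample: draw particles in proportion to their weights.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    estimate = sum(p for p, _ in resampled) / len(resampled)
    return resampled, estimate


rng = random.Random(0)
particles = [(rng.uniform(-5.0, 5.0), rng.uniform(0.0, 2.0)) for _ in range(500)]
estimate = None
for t in range(1, 21):
    # Simulated target moving one unit per frame (noise-free measurement).
    particles, estimate = particle_filter_step(particles, float(t), rng)
```

After 20 frames the estimate should sit close to the simulated target position of 20. Tracking several targets would need one such filter per target, or a joint state, plus a likelihood built from image features.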

    Online Multi-Object Tracking Using CNN-based Single Object Tracker with Spatial-Temporal Attention Mechanism

    In this paper, we propose a CNN-based framework for online MOT. This framework exploits the strengths of single object trackers in adapting appearance models and searching for the target in the next frame. Naively applying a single object tracker to MOT is computationally inefficient and produces drift caused by occlusion. Our framework achieves computational efficiency by sharing features and using ROI-Pooling to obtain individual features for each target. Online-learned, target-specific CNN layers adapt the appearance model for each target. In the framework, we introduce a spatial-temporal attention mechanism (STAM) to handle the drift caused by occlusion and interaction among targets. The visibility map of the target is learned and used to infer the spatial attention map, which is then applied to weight the features. In addition, the occlusion status can be estimated from the visibility map; it controls the online updating process via a weighted loss on training samples with different occlusion statuses in different frames, and can be regarded as a temporal attention mechanism. The proposed algorithm achieves 34.3% and 46.0% MOTA on the challenging MOT15 and MOT16 benchmark datasets, respectively. Comment: Accepted at International Conference on Computer Vision (ICCV) 201

    Multiple Fish Tracking via Viterbi Data Association for Low-Frame-Rate Underwater Camera Systems

    Non-extractive fish abundance estimation with the aid of visual analysis has drawn increasing attention. Low frame rates and variable illumination in the underwater environment, however, make conventional tracking methods unreliable. In this paper, a robust multiple fish tracking system for low-frame-rate underwater stereo cameras is proposed. Using the result of fish segmentation, a computationally efficient block-matching method is applied to perform stereo matching. A multiple-feature matching cost function provides a simple but effective metric for finding the temporal match of each target. Built upon reliable stereo matching, a multiple-target tracking algorithm based on Viterbi data association is developed to overcome the poor motion continuity of targets. Experimental results show that accurate tracking of live fish underwater is achieved with stereo cameras.
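The idea behind Viterbi data association can be sketched for a single target: each frame offers several candidate detections, and dynamic programming picks the sequence of candidates with the lowest accumulated matching cost. The sketch below rests on assumptions that are not in the abstract: the function name `viterbi_track` is hypothetical, every frame is assumed to contain at least one candidate, and squared Euclidean distance stands in for the multiple-feature matching cost.

```python
def viterbi_track(frames, transition_cost):
    """Pick one candidate per frame so the summed transition cost is minimal.

    frames: list of non-empty lists of candidate detections.
    transition_cost: function (previous_candidate, current_candidate) -> float.
    Returns the list of chosen candidate indices, one per frame.
    """
    # best[j]: minimal cost of any path ending at candidate j of the current frame.
    best = [0.0] * len(frames[0])
    back = []  # back[t - 1][j]: best predecessor index for candidate j of frame t
    for t in range(1, len(frames)):
        new_best, pointers = [], []
        for cur in frames[t]:
            costs = [
                best[i] + transition_cost(prev, cur)
                for i, prev in enumerate(frames[t - 1])
            ]
            i_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[i_min])
            pointers.append(i_min)
        best = new_best
        back.append(pointers)
    # Backtrack from the cheapest final candidate.
    j = min(range(len(best)), key=best.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return path


def squared_distance(p, q):
    """Placeholder matching cost between two 2-D positions."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


frames = [
    [(0, 0), (10, 10)],
    [(9, 11), (1, 0)],
    [(2, 1), (8, 12)],
]
path = viterbi_track(frames, squared_distance)
```

Here the chosen path follows the candidate that starts at the origin through all three frames. Because the cost is accumulated over the whole sequence, one large jump between frames does not automatically break the track, which is what makes this kind of association attractive at low frame rates. Handling several targets would require additional logic, for example extracting paths one at a time and removing their detections.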

    SALSA: A Novel Dataset for Multimodal Group Behavior Analysis

    Studying free-standing conversational groups (FCGs) in unstructured social settings (e.g., a cocktail party) is gratifying due to the wealth of information available at the group (mining social networks) and individual (recognizing native behavioral and personality traits) levels. However, analyzing social scenes involving FCGs is also highly challenging, because crowdedness and extreme occlusions make it difficult to extract behavioral cues such as target locations, speaking activity, and head/body pose. To this end, we propose SALSA, a novel dataset facilitating multimodal and Synergetic sociAL Scene Analysis, and make two main contributions to research on automated social interaction analysis: (1) SALSA records social interactions among 18 participants in a natural, indoor environment for over 60 minutes, under poster presentation and cocktail party contexts that present difficulties in the form of low-resolution images, lighting variations, numerous occlusions, reverberations, and interfering sound sources; (2) to alleviate these problems, we facilitate multimodal analysis by recording the social interplay using four static surveillance cameras and sociometric badges worn by each participant, comprising microphone, accelerometer, Bluetooth, and infrared sensors. In addition to raw data, we also provide annotations of individuals' personality as well as their position, head and body orientation, and F-formation information over the entire event duration. Through extensive experiments with state-of-the-art approaches, we show (a) the limitations of current methods and (b) how the recorded multiple cues synergetically aid automatic analysis of social interactions. SALSA is available at http://tev.fbk.eu/salsa. Comment: 14 pages, 11 figures