574 research outputs found

    Unsupervised Moving Object Segmentation using Background Subtraction and Optimal Adversarial Noise Sample Search

    Moving Objects Segmentation (MOS) is a fundamental task in many computer vision applications, such as human activity analysis, visual object tracking, content-based video search, traffic monitoring, surveillance, and security. MOS is challenging under abrupt illumination variations, dynamic backgrounds, camouflage, and scenes with bootstrapping. To address these challenges, we propose a MOS algorithm that exploits multiple adversarial regularizations, including conventional as well as least squares losses. More specifically, our model is trained on scene background images with a cross-entropy loss, a least squares adversarial loss, and an ℓ1 loss in image space, which work jointly to learn the dynamic background changes. During testing, the proposed method generates the background of each test image by searching for optimal noise samples through joint minimization of an ℓ1 loss in image space, an ℓ1 loss in feature space, and the discriminator least squares loss. These loss functions force the generator to synthesize dynamic backgrounds similar to the test sequences, which, upon subtraction, yield the moving object segmentation. Experimental evaluations on five benchmark datasets show excellent performance of the proposed algorithm compared with twenty-one existing state-of-the-art methods.
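
    The test-time noise search described in this abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and none of it comes from the paper: the two-parameter `generator`, the `features` extractor, the `discriminator` score, the loss weights, and the use of simple random-search hill climbing in place of whatever optimizer the authors use. It only shows the general shape of the idea: find a noise vector whose generated background minimizes the three-term loss, then subtract that background from the test image.

```python
import random


def l1(a, b):
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def generator(z):
    """Hypothetical stand-in for a trained generator: 2-D noise -> 8-pixel background."""
    return [z[0]] * 4 + [z[1]] * 4


def features(img):
    """Hypothetical feature extractor: mean of each half of the image."""
    half = len(img) // 2
    return [sum(img[:half]) / half, sum(img[half:]) / (len(img) - half)]


def discriminator(img):
    """Hypothetical discriminator score in [0, 1]; 1 means 'looks like a background'."""
    return max(0.0, 1.0 - l1(img, [0.5] * len(img)))


def joint_loss(z, test_img, w_feat=0.25, w_adv=0.1):
    """l1 image loss + l1 feature loss + least-squares adversarial term (assumed weights)."""
    bg = generator(z)
    image_term = l1(bg, test_img)
    feature_term = l1(features(bg), features(test_img))
    adv_term = (discriminator(bg) - 1.0) ** 2
    return image_term + w_feat * feature_term + w_adv * adv_term


def search_noise(test_img, steps=2000, sigma=0.05, seed=0):
    """Random-search hill climbing over z (a swapped-in, gradient-free optimizer)."""
    rng = random.Random(seed)
    z = [0.0, 0.0]
    best = joint_loss(z, test_img)
    for _ in range(steps):
        candidate = [v + rng.gauss(0.0, sigma) for v in z]
        loss = joint_loss(candidate, test_img)
        if loss < best:  # keep only perturbations that lower the joint loss
            z, best = candidate, loss
    return z, best


def segment(test_img, z, threshold=0.2):
    """Background subtraction: mark pixels that differ strongly from the generated background."""
    bg = generator(z)
    return [1 if abs(p - b) > threshold else 0 for p, b in zip(test_img, bg)]


# Toy frame: two flat background regions plus one bright "moving object" pixel at the end.
frame = [0.4, 0.4, 0.4, 0.4, 0.6, 0.6, 0.6, 1.0]
z_opt, final_loss = search_noise(frame)
mask = segment(frame, z_opt)
print(z_opt, final_loss, mask)
```

    In this toy setting the ℓ1 image term behaves like a median fit, so a single outlying pixel does not pull the generated background toward the object. This is one intuition for why the subtraction isolates it.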

    Deep Siamese Networks toward Robust Visual Tracking

    Recently, Siamese neural networks have been widely used in visual object tracking to leverage the template matching mechanism. A Siamese network architecture contains two parallel streams that estimate the similarity between two inputs and can learn their discriminative features. Various deep Siamese-based tracking frameworks have been proposed to estimate the similarity between the target and the search region. In this chapter, we group deep Siamese networks into three categories by the position of the merging layers: late merge, intermediate merge, and early merge architectures. In the late merge architecture, the inputs are processed as two separate streams and merged at the end of the network. In the intermediate merge architecture, the inputs are initially processed separately and merged at an intermediate stage, well before the final layer. In the early merge architecture, the inputs are combined at the start of the network, and a single convolutional neural network processes the unified data stream. We evaluate the performance of deep Siamese trackers in various tracking challenges according to their merge architecture and their output, such as similarity score, response map, or bounding box. This chapter gives an overview of recent developments in deep Siamese trackers and provides insights for new developments in the tracking field.
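
    The three merge positions named in this abstract can be contrasted with a minimal sketch. The one-layer `embed` branch, the weight matrices, and the scalar `head` are hypothetical toy stand-ins rather than any tracker from the chapter. The point is only where the two inputs meet: at the output (late), after a separate first stage (intermediate), or before any processing (early).

```python
def embed(x, weights):
    """Toy branch: one linear layer followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def late_merge(template, search, weights):
    """Two streams with shared weights, merged only at the end by a similarity score."""
    return dot(embed(template, weights), embed(search, weights))


def intermediate_merge(template, search, weights, head):
    """Separate first stage, then the streams are concatenated and processed jointly."""
    merged = embed(template, weights) + embed(search, weights)
    return dot(head, merged)


def early_merge(template, search, joint_weights, head):
    """Inputs are concatenated first; a single network sees the unified stream."""
    hidden = embed(template + search, joint_weights)
    return dot(head, hidden)


# Hypothetical 2-D inputs and weights, chosen only to make the data flow visible.
identity = [[1.0, 0.0], [0.0, 1.0]]
template = [1.0, 2.0]
search = [1.0, 2.0]
print(late_merge(template, search, identity))
print(intermediate_merge(template, search, identity, [1.0, 1.0, -1.0, -1.0]))
print(early_merge(template, search, [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0]], [1.0, 1.0]))
```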

    High-Quality Face Caricature via Style Translation

    Caricature is an exaggerated form of artistic portraiture that accentuates unique yet subtle characteristics of human faces. Recently, advancements in deep end-to-end techniques have yielded encouraging outcomes in capturing both style and elevated exaggerations in creating face caricatures. Most of these approaches tend to produce cartoon-like results that could be more practical for real-world applications. In this study, we propose a high-quality, unpaired face caricature method that is appropriate for real-world use and builds on computer vision techniques and GAN models. We attain the exaggeration of facial features and the stylization of appearance through a two-step process: face caricature generation and face caricature projection. The face caricature generation step creates new caricature face datasets from real images and trains a generative model using the real and newly created caricature datasets. The face caricature projection step employs an encoder, trained with real and caricature faces, together with the pretrained generator to project real and caricature faces. We perform an incremental facial exaggeration from the real image to the caricature faces using the latent space of the encoder and generator. Our projection preserves the facial identity, attributes, and expressions of the input image. It also accounts for facial occlusions, such as reading glasses or sunglasses, to enhance the robustness of our model. Furthermore, we conducted a comprehensive comparison of our approach with various state-of-the-art face caricature methods, highlighting our process's distinctiveness and exceptional realism. Comment: 14 pages, 21 figures
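
    One common way to move incrementally between two points in a latent space is linear interpolation, sketched below. This is an assumption for illustration: the abstract does not say how the exaggeration is stepped, and `latent_path`, `w_real`, and `w_caricature` are hypothetical names. In a real pipeline each intermediate code would be passed to the generator to render one stage of the exaggeration.

```python
def latent_path(w_real, w_caricature, steps):
    """Return `steps` latent codes moving linearly from w_real to w_caricature."""
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")
    path = []
    for i in range(steps):
        alpha = i / (steps - 1)  # 0.0 at the real face, 1.0 at the full caricature
        path.append([(1 - alpha) * r + alpha * c for r, c in zip(w_real, w_caricature)])
    return path


# Hypothetical 2-D latent codes; real latent codes would be much higher dimensional.
for code in latent_path([0.0, 2.0], [1.0, 0.0], 3):
    print(code)
```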