
    Hashing Neural Video Decomposition with Multiplicative Residuals in Space-Time

    We present a video decomposition method that facilitates layer-based editing of videos with spatiotemporally varying lighting and motion effects. Our neural model decomposes an input video into multiple layered representations, each comprising a 2D texture map, a mask for the original video, and a multiplicative residual characterizing the spatiotemporal variations in lighting conditions. A single edit on a texture map propagates to the corresponding locations across the entire video while preserving the consistency of the remaining content. Our method efficiently learns the layer-based neural representation of a 1080p video in 25 s per frame via coordinate hashing and allows real-time rendering of the edited result at 71 fps on a single GPU. Qualitatively, we run our method on various videos to show its effectiveness in generating high-quality editing effects. Quantitatively, we propose to adopt feature-tracking evaluation metrics for objectively assessing the consistency of video editing. Project page: https://lightbulb12294.github.io/hashing-nvd
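    To make the compositing model concrete, the following is a minimal numpy sketch of how a frame could be recomposed from per-layer outputs; the function name, argument shapes, and nearest-neighbour lookup are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def composite_frame(textures, uvs, masks, residuals):
    """Recompose one frame from per-layer outputs (hypothetical shapes).

    textures : list of (Ht, Wt, 3) editable texture maps
    uvs      : list of (H, W, 2) texture coordinates in [0, 1]
    masks    : list of (H, W) soft layer masks (summing to ~1 per pixel)
    residuals: list of (H, W, 3) multiplicative lighting residuals
    """
    H, W = masks[0].shape
    frame = np.zeros((H, W, 3))
    for tex, uv, mask, res in zip(textures, uvs, masks, residuals):
        Ht, Wt = tex.shape[:2]
        # Nearest-neighbour texture lookup (a real model would interpolate).
        ys = np.clip((uv[..., 1] * (Ht - 1)).round().astype(int), 0, Ht - 1)
        xs = np.clip((uv[..., 0] * (Wt - 1)).round().astype(int), 0, Wt - 1)
        sampled = tex[ys, xs]  # (H, W, 3)
        # Lighting enters multiplicatively, so an edit painted onto the
        # texture map inherits the original per-pixel shading when rendered.
        frame += mask[..., None] * sampled * res
    return frame
```

    Because every frame is rendered from the shared texture maps, editing a texture once is sufficient for the edit to appear consistently throughout the video.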

    Temporal Super-Resolution of Multi-View Shapes

    We consider the problem of temporal super-resolution of shapes, using multiple observations of the same deformed model. Without loss of generality, we focus on the medium-scale multi-camera scenario, i.e., dynamic scenes that may contain several subjects. This context favors the use of color cameras but requires a reconstruction method that is robust to photometric inconsistencies. To this end, we propose a new approach, dedicated to this medium-scale context, that uses adapted descriptors and voting schemes. The method is extended along the temporal dimension so as to improve the reconstruction at each time instant by exploiting the redundancy of information over time. To do so, reliable photometric information is accumulated over time using deformation fields combined with a region-growing strategy. We demonstrate the improvement in reconstruction quality brought by our approach on synthetic multi-camera sequences.
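    The temporal accumulation step can be pictured with a short sketch: per-frame surface samples are warped into a reference frame by a deformation field, and only photometrically reliable ones are kept. Everything below is a hypothetical interface (deform_to_ref, the confidence threshold) standing in for the paper's descriptors, votes, and region growing.

```python
import numpy as np

def accumulate_samples(frame_points, frame_colors, confidences,
                       deform_to_ref, tau=0.7):
    """Accumulate reliable color samples from all frames in a reference frame.

    frame_points : list over time of (N_t, 3) surface points
    frame_colors : list over time of (N_t, 3) observed colors
    confidences  : list over time of (N_t,) photometric reliability scores
    deform_to_ref: callable (points, t) -> points warped into the reference
    """
    acc_pts, acc_cols = [], []
    for t, (pts, cols, conf) in enumerate(
            zip(frame_points, frame_colors, confidences)):
        keep = conf > tau                            # a plain threshold here;
        acc_pts.append(deform_to_ref(pts[keep], t))  # the paper grows regions
        acc_cols.append(cols[keep])
    return np.concatenate(acc_pts), np.concatenate(acc_cols)
```

    The accumulated, temporally redundant samples are then what densifies the reconstruction at each time instant.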

    Evaluation of face recognition algorithms under noise

    One of the major applications of computer vision and image processing is face recognition, where a computerized algorithm automatically identifies a person's face from a large image dataset or even from a live video. This thesis addresses facial recognition, a topic that has been widely studied due to its importance in many applications in both civilian and military domains. The application of face recognition systems has expanded from security purposes to social networking sites, managing fraud, and improving user experience. Numerous algorithms have been designed to perform face recognition with good accuracy. The problem is challenging due to the dynamic nature of the human face and the different poses that it can take. Regardless of the algorithm, facial recognition accuracy can be heavily affected by the presence of noise. This thesis presents a comparison of traditional and deep learning face recognition algorithms in the presence of noise. For this purpose, Gaussian and salt-and-pepper noise are applied to face images drawn from the ORL dataset. Recognition is performed using each of the following eight algorithms: principal component analysis (PCA), two-dimensional PCA (2D-PCA), linear discriminant analysis (LDA), independent component analysis (ICA), discrete cosine transform (DCT), support vector machine (SVM), convolutional neural network (CNN), and AlexNet. The ORL dataset was used in the experiments to calculate the evaluation accuracy of each investigated algorithm. Each algorithm is evaluated with two experiments: in the first, only one image per person is used for training, whereas in the second, five images per person are used for training. The traditional algorithms are implemented in MATLAB and the deep learning approaches in Python. The results show that the best traditional performance was obtained using the DCT algorithm with 92% dominant eigenvalues, at 95.25% accuracy, whereas for deep learning the best performance was obtained using a CNN, with an accuracy of 97.95%, which makes it the best choice under noisy conditions.
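    The two noise models are standard and easy to reproduce; here is a minimal numpy sketch (the noise levels are illustrative defaults, not the thesis's exact settings).

```python
import numpy as np

def add_gaussian_noise(img, sigma=10.0, rng=None):
    """Additive zero-mean Gaussian noise on a uint8 image."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = img.astype(float) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def add_salt_and_pepper(img, amount=0.05, rng=None):
    """Set a fraction `amount` of pixels to 0 (pepper) or 255 (salt)."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = img.copy()
    u = rng.random(img.shape)
    noisy[u < amount / 2] = 0          # pepper
    noisy[u > 1 - amount / 2] = 255    # salt
    return noisy
```

    A comparison like the one described above can then sweep sigma and amount over a range of values and measure each algorithm's accuracy on the degraded images.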

    GAN-Based Differential Private Image Privacy Protection Framework for the Internet of Multimedia Things.

    With the development of the Internet of Multimedia Things (IoMT), an increasing amount of image data is collected by multimedia devices such as smartphones, cameras, and drones. These images are widely used across the fields of the IoMT, which presents substantial challenges for privacy preservation. In this paper, we propose a new image privacy protection framework to protect the sensitive personal information contained in images collected by IoMT devices. We use deep neural network techniques to identify the privacy-sensitive content in images, and then protect it with synthetic content generated by generative adversarial networks (GANs) with differential privacy (DP). Our experimental results show that the proposed framework can effectively protect users' privacy while maintaining image utility.
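    The abstract names GANs plus DP but not the exact mechanism; the usual route is DP-SGD, where each per-example gradient of the generator or discriminator is clipped and Gaussian noise is added. The sketch below shows only that clip-and-noise step and is an assumption about the recipe, not the paper's stated method.

```python
import numpy as np

def dp_sanitize(per_sample_grads, clip_norm=1.0, noise_multiplier=1.1,
                rng=None):
    """DP-SGD-style gradient sanitization: clip each per-example gradient
    to `clip_norm`, sum, and add Gaussian noise scaled by the sensitivity."""
    rng = np.random.default_rng() if rng is None else rng
    clipped = []
    for g in per_sample_grads:               # each g: flat np.ndarray
        scale = min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
        clipped.append(g * scale)
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, total.shape)
    return (total + noise) / len(per_sample_grads)
```

    Because the clipped sum has sensitivity clip_norm, the Gaussian noise yields an (epsilon, delta) guarantee, and by the post-processing property it carries over to anything the trained generator outputs.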

    Learning Photometric Consistency for Multi-View Shape Reconstruction

    With the rise of augmented and virtual reality, estimating accurate shapes from multi-view RGB images is becoming an important task in computer vision. The dominant strategy in recent years relies on depth map estimation followed by depth fusion, as depth maps prove efficient at recovering local surface details. Motivated by the recent success of convolutional neural networks, we take this strategy a step further and present a novel solution for depth map estimation that sweeps a volume along rays projected from a camera and infers the surface presence probability at each point seen by an arbitrary number of cameras. A strong motivation behind this work is to study the ability of learning-based features to outperform traditional 2D features when estimating depth from multi-view cues, especially in real-life dynamic scenes containing multiple moving subjects with complex surface details, scenarios where previous image-based MVS methods fail to recover accurate detail. Our results demonstrate this ability, showing that a CNN trained on a standard static dataset can help recover surface details on dynamic scenes that are not visible to traditional 2D-feature-based methods. In addition, our evaluation includes a comparison to existing reconstruction pipelines on the standard evaluation dataset used to train our network, showing that our solution performs on par with or better than these approaches.

    The rise of virtual and augmented reality technologies comes with a growing need for content suited to these technologies and their visualization methods. In particular, the ability to produce real-world content viewable in 3D is becoming essential. In this article we consider the problem of reconstructing dynamic 3D scenes from color images, and we are particularly interested in whether convolutional neural networks can effectively improve this reconstruction process. The most recent multi-view reconstruction methods estimate per-view depth maps and then fuse these maps into an implicit 3D form. A key step of these methods lies in the estimation of the depth maps, traditionally performed by searching for multi-view correspondences using photoconsistency criteria. We propose here to learn this photoconsistency function from examples, instead of defining it through the correlation of photometric descriptors as most current methods do. The intuition is that the correlation of image descriptors is inherently constrained and limited, and that deep networks have the capacity to learn broader configurations. Our results on real data demonstrate that this is the case: trained on a standard static dataset, convolutional networks allow us to recover details of a moving shape that classical image descriptors cannot extract. Comparative evaluations on these standard data are, moreover, favorable to the proposed method.
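    The sweeping scheme can be made concrete with a small sketch: 3D hypotheses are sampled along a ray from the reference camera, each hypothesis is projected into every view, and a learned function scores surface presence. The camera interface and score_fn below are hypothetical; score_fn stands in for the trained CNN.

```python
import numpy as np

def project(P, X):
    """Project 3D points X (N, 3) with a 3x4 camera matrix P into pixels."""
    Xh = np.hstack([X, np.ones((len(X), 1))])
    x = (P @ Xh.T).T
    return x[:, :2] / x[:, 2:3]

def sweep_ray(origin, direction, depths, cameras, score_fn):
    """Return the depth along one reference ray with the highest learned
    surface-presence probability. `cameras` is a list of 3x4 matrices;
    `score_fn` maps the per-view projections of a hypothesis to a score."""
    probs = []
    for d in depths:
        X = origin + d * direction                 # 3D hypothesis on the ray
        views = [project(P, X[None]) for P in cameras]
        probs.append(score_fn(views))              # learned photoconsistency
    return depths[int(np.argmax(probs))]
```

    In the full method, this per-ray maximum becomes a depth map per view, and the depth maps are then fused into an implicit 3D form.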

    Anomaly activity classification in the grocery stores

    Nowadays, because of the growing number of robberies in shopping malls and grocery stores, automated camera applications are vital for detecting anomalous actions. These events usually happen quickly and unexpectedly; a robust system that can classify anomalies in real time with minimal false alarms is therefore required. Given this need, the main objective of this project is to classify anomalies that may happen in grocery stores. This objective is pursued under two assumptions: a single fixed camera in the store, and the presence of at least one person in the camera view. The actions of the human upper body are used to determine the anomalies, with an articulated motion model as the basis of the anomaly classification design. In the design, the process starts with feature extraction, followed by target model establishment, tracking, and action classification. Features such as color and image gradient build the template used as the target model. The models of the different upper-body parts are then tracked across consecutive frames by a tracking method based on the sum of squared differences (SSD) combined with a Kalman filter as the predictor. The spatio-temporal information, namely the limb trajectories obtained by the tracking stage, is passed to the proposed classification stage. For classification, three scenarios are studied: attacking the cash machine, attacking the cashier, and making the store messy. In implementing these scenarios, several events were introduced: basic (static) events, which correspond to static objects in the scene; spatial events, which are actions that depend on the coordinates of body parts; and spatio-temporal events, in which actions are tracked across consecutive frames. Finally, if one of the scenarios occurs, an anomalous action is detected. The results show the robustness of the proposed methods, which achieve a minimum false positive error of 7% for the cash machine attack and a minimum false negative error of 19% for the cashier attack scenario.
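    The tracking stage pairs SSD template matching with a Kalman predictor; a minimal sketch of that combination follows (the window size, noise covariances, and constant-velocity state are illustrative assumptions).

```python
import numpy as np

class ConstantVelocityKF:
    """Minimal 2D constant-velocity Kalman filter for a body-part position."""
    def __init__(self, q=1.0, r=4.0):
        self.x = np.zeros(4)                       # state: [y, x, vy, vx]
        self.P = np.eye(4) * 100.0
        self.F = np.eye(4); self.F[0, 2] = self.F[1, 3] = 1.0
        self.H = np.eye(2, 4)                      # observe position only
        self.Q, self.R = np.eye(4) * q, np.eye(2) * r
    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                          # predicted (y, x)
    def update(self, z):
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (np.asarray(z, float) - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P

def ssd_match(frame, template, center, search=15):
    """Best template position (minimum sum of squared differences) inside a
    search window around the Kalman-predicted center."""
    th, tw = template.shape
    (cy, cx), best, best_pos = center, np.inf, center
    for y in range(max(cy - search, 0), cy + search + 1):
        for x in range(max(cx - search, 0), cx + search + 1):
            patch = frame[y:y + th, x:x + tw]
            if patch.shape != template.shape:      # window ran off the frame
                continue
            ssd = np.sum((patch.astype(float) - template) ** 2)
            if ssd < best:
                best, best_pos = ssd, (y, x)
    return best_pos
```

    Per frame, the filter's predict() centers the SSD search, the best match is fed back through update(), and the resulting limb trajectories supply the spatio-temporal events consumed by the scenario classifiers.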