10 research outputs found

    Fully-Coupled Two-Stream Spatiotemporal Networks for Extremely Low Resolution Action Recognition

    Full text link
    A major emerging challenge is how to protect people's privacy as cameras and computer vision are increasingly integrated into our daily lives, including in smart devices inside homes. A potential solution is to capture and record just the minimum amount of information needed to perform a task of interest. In this paper, we propose a fully-coupled two-stream spatiotemporal architecture for reliable human action recognition on extremely low resolution (e.g., 12x16 pixel) videos. We provide an efficient method to extract spatial and temporal features and to aggregate them into a robust feature representation for an entire action video sequence. We also consider how to incorporate high resolution videos during training in order to build better low resolution action recognition models. We evaluate on two publicly-available datasets, showing significant improvements over the state-of-the-art. Comment: 9 pages, 5 figures, published in WACV 2018
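
    The abstract describes a two-stream design (a spatial RGB stream and a temporal optical-flow stream) whose per-frame features are coupled and then aggregated into a single video-level representation. Below is a minimal PyTorch sketch of that general idea; the layer sizes, the concatenation-based coupling, the mean-pooling aggregation, and the TwoStreamLowResNet name are illustrative assumptions rather than the paper's exact architecture.

```python
# Minimal sketch, assuming PyTorch; module sizes and fusion choices are placeholders.
import torch
import torch.nn as nn

class TwoStreamLowResNet(nn.Module):
    """Two-stream model for extremely low resolution (e.g., 12x16) clips:
    the spatial stream sees RGB frames, the temporal stream sees stacked
    optical flow, and per-frame features are coupled and pooled over time."""
    def __init__(self, num_classes, flow_stack=10, feat_dim=128):
        super().__init__()
        def small_cnn(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, feat_dim), nn.ReLU())
        self.spatial = small_cnn(3)                 # one RGB frame
        self.temporal = small_cnn(2 * flow_stack)   # stacked x/y flow fields
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, rgb, flow):
        # rgb: (B, T, 3, 12, 16); flow: (B, T, 2*flow_stack, 12, 16)
        B, T = rgb.shape[:2]
        fs = self.spatial(rgb.flatten(0, 1)).view(B, T, -1)
        ft = self.temporal(flow.flatten(0, 1)).view(B, T, -1)
        # Couple the streams per frame, then aggregate over the whole clip.
        video_feat = torch.cat([fs, ft], dim=-1).mean(dim=1)
        return self.classifier(video_feat)

model = TwoStreamLowResNet(num_classes=11)
logits = model(torch.randn(2, 8, 3, 12, 16), torch.randn(2, 8, 20, 12, 16))
print(logits.shape)  # torch.Size([2, 11])
```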

    Scale Invariant Privacy Preserving Video via Wavelet Decomposition

    Full text link
    Video surveillance has become ubiquitous in the modern world. Mobile devices, surveillance cameras, and IoT devices can all record video that can violate our privacy. One proposed solution for this is privacy-preserving video, which removes identifying information from the video as it is produced. Several algorithms for this have been proposed, but all of them suffer from scale issues: in order to sufficiently anonymize near-camera objects, distant objects become unidentifiable. In this paper, we propose a scale-invariant method based on wavelet decomposition.
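
    The truncated abstract indicates that the anonymization operates in the wavelet domain so that identifying detail is suppressed consistently across object scales. The sketch below illustrates that general idea with PyWavelets; the choice of wavelet, decomposition depth, and which detail subbands to zero out are assumptions for illustration, not the paper's algorithm.

```python
# Rough illustration, assuming PyWavelets (pywt); all parameters are placeholders.
import numpy as np
import pywt

def anonymize_frame(gray, wavelet="haar", levels=3, drop_finest=2):
    """Suppress the finest-scale wavelet detail of a grayscale frame so that
    fine, identifying structure is removed while coarse content is kept."""
    coeffs = pywt.wavedec2(gray.astype(float), wavelet, level=levels)
    # coeffs = [cA_n, (cH_n, cV_n, cD_n), ..., (cH_1, cV_1, cD_1)];
    # the last tuples hold the finest-scale (most identifying) detail.
    for i in range(1, len(coeffs)):
        if i > len(coeffs) - 1 - drop_finest:
            coeffs[i] = tuple(np.zeros_like(band) for band in coeffs[i])
    return pywt.waverec2(coeffs, wavelet)

frame = np.random.randint(0, 256, (240, 320), dtype=np.uint8)  # stand-in frame
print(anonymize_frame(frame).shape)  # (240, 320)
```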

    Adversarial Learning of Privacy-Preserving and Task-Oriented Representations

    Full text link
    Data privacy has emerged as an important issue as data-driven deep learning has become an essential component of modern machine learning systems. For instance, machine learning systems face a potential privacy risk from model inversion attacks, whose goal is to reconstruct the input data from the latent representation of deep networks. Our work aims at learning a privacy-preserving and task-oriented representation that defends against such model inversion attacks. Specifically, we propose an adversarial reconstruction learning framework that prevents the latent representations from being decoded into the original input data. By simulating the expected behavior of the adversary, our framework is realized by minimizing the negative pixel reconstruction loss or the negative feature reconstruction (i.e., perceptual distance) loss. We validate the proposed method on face attribute prediction, showing that it protects visual privacy with only a small decrease in utility performance. In addition, we show the utility-privacy trade-off for different choices of the hyperparameter on the negative perceptual distance loss during training, allowing service providers to determine the right level of privacy protection for a given utility performance. Moreover, we provide an extensive study with different selections of features, tasks, and data to further analyze their influence on privacy protection.
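
    The adversarial reconstruction objective above amounts to training a decoder to invert the latent representation while training the encoder and task head against the task loss plus a negative reconstruction loss, so the representation stays useful but resists inversion. The sketch below shows one plausible alternating update in PyTorch; the toy modules, the lambda_priv weight, and the update schedule are assumptions, not the authors' implementation.

```python
# Illustrative sketch, assuming PyTorch; modules, shapes, and lambda_priv are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_step(encoder, task_head, decoder, enc_opt, dec_opt, x, y, lambda_priv=1.0):
    # 1) Adversary step: fit the decoder to reconstruct the input from the latent.
    with torch.no_grad():
        z = encoder(x)
    dec_opt.zero_grad()
    rec_loss = F.mse_loss(decoder(z), x)
    rec_loss.backward()
    dec_opt.step()

    # 2) Encoder step: keep task accuracy while *maximizing* the adversary's
    #    reconstruction error (the negative pixel reconstruction loss).
    enc_opt.zero_grad()
    z = encoder(x)
    task_loss = F.cross_entropy(task_head(z), y)
    privacy_loss = -F.mse_loss(decoder(z), x)
    (task_loss + lambda_priv * privacy_loss).backward()
    enc_opt.step()
    return task_loss.item(), rec_loss.item()

# Tiny stand-in modules for a 32x32 face-attribute toy setting (all hypothetical).
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU())
task_head = nn.Linear(128, 2)
decoder = nn.Sequential(nn.Linear(128, 3 * 32 * 32), nn.Unflatten(1, (3, 32, 32)))
enc_opt = torch.optim.Adam([*encoder.parameters(), *task_head.parameters()], lr=1e-3)
dec_opt = torch.optim.Adam(decoder.parameters(), lr=1e-3)
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 2, (8,))
print(train_step(encoder, task_head, decoder, enc_opt, dec_opt, x, y))
```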

    A review of privacy-preserving human and human activity recognition

    Get PDF

    Learning Human Action Recognition Representations Without Real Humans

    Full text link
    Pre-training on massive video datasets has become essential to achieve high action recognition performance on smaller downstream datasets. However, most large-scale video datasets contain images of people and hence are accompanied by issues related to privacy, ethics, and data protection, often preventing them from being publicly shared for reproducible research. Existing work has attempted to alleviate these problems by blurring faces, downsampling videos, or training on synthetic data. On the other hand, analysis of the transferability of privacy-preserving pre-trained models to downstream tasks has been limited. In this work, we study this problem by first asking the question: can we pre-train models for human action recognition with data that does not include real humans? To this end, we present, for the first time, a benchmark that leverages real-world videos with humans removed and synthetic data containing virtual humans to pre-train a model. We then evaluate the transferability of the representation learned on this data to a diverse set of downstream action recognition benchmarks. Furthermore, we propose a novel pre-training strategy, called Privacy-Preserving MAE-Align, to effectively combine synthetic data and human-removed real data. Our approach outperforms previous baselines by up to 5% and closes the performance gap between human and no-human action recognition representations on downstream tasks, for both linear probing and fine-tuning. Our benchmark, code, and models are available at https://github.com/howardzh01/PPMA. Comment: 19 pages, 7 figures, 2023 NeurIPS Datasets and Benchmarks Track
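
    The linked repository is the authoritative reference for Privacy-Preserving MAE-Align; purely as a rough illustration of combining the two data sources, the toy loop below pairs a masked-reconstruction objective on human-removed real clips with a supervised action-classification objective on synthetic clips in a single pre-training step. Every module, shape, masking choice, and loss weight here is a placeholder assumption, not the authors' recipe.

```python
# Toy illustration, assuming PyTorch; NOT the released PPMA implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

feat_dim, num_classes = 64, 10
# Hypothetical stand-ins: per-frame embedder, temporal context layer, heads.
embed = nn.Sequential(nn.Flatten(2), nn.Linear(3 * 16 * 16, feat_dim))
context = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=4, batch_first=True)
recon_head = nn.Linear(feat_dim, 3 * 16 * 16)
cls_head = nn.Linear(feat_dim, num_classes)
opt = torch.optim.Adam([*embed.parameters(), *context.parameters(),
                        *recon_head.parameters(), *cls_head.parameters()], lr=1e-3)

def pretrain_step(real_clips, synth_clips, synth_labels, mask_ratio=0.75):
    """One joint step: masked-frame reconstruction on human-removed real video
    plus supervised classification on synthetic (virtual-human) video."""
    opt.zero_grad()
    # Mask-and-reconstruct term (a simplified stand-in for MAE-style pre-training):
    # hide most frame tokens and predict them from the visible context.
    B, T = real_clips.shape[:2]
    mask = torch.rand(B, T) < mask_ratio
    tokens = embed(real_clips).masked_fill(mask[..., None], 0.0)
    ctx = context(tokens)                         # visible frames inform masked slots
    mae_loss = F.mse_loss(recon_head(ctx)[mask], real_clips.flatten(2)[mask])
    # Supervised term: mean-pool frame features of synthetic clips and classify.
    logits = cls_head(context(embed(synth_clips)).mean(dim=1))
    sup_loss = F.cross_entropy(logits, synth_labels)
    (mae_loss + sup_loss).backward()
    opt.step()
    return mae_loss.item(), sup_loss.item()

real = torch.randn(4, 8, 3, 16, 16)      # human-removed real clips (B, T, C, H, W)
synth = torch.randn(4, 8, 3, 16, 16)     # synthetic clips with virtual humans
labels = torch.randint(0, num_classes, (4,))
print(pretrain_step(real, synth, labels))
```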

    Towards Generalizable Deep Image Matting: Decomposition, Interaction, and Merging

    Get PDF
    Image matting refers to extracting precise alpha mattes from images and plays a critical role in many downstream applications. Despite extensive attention, key challenges persist and motivate the research presented in this thesis. One major challenge is the reliance on auxiliary inputs in previous methods, which hinders real-time practicality. To address this, we introduce fully automatic image matting by decomposing the task into high-level semantic segmentation and low-level details matting. We then incorporate plug-in modules to enhance the interaction between the sub-tasks through feature integration. Furthermore, we propose an attention-based mechanism to guide the matting process through collaboration merging. Another challenge lies in limited matting datasets, resulting in reliance on composite images and inferior performance on images in the wild. In response, we propose a composition route to mitigate these discrepancies, resulting in remarkable generalization ability. Additionally, we construct numerous large datasets of high-quality real-world images with manually labeled alpha mattes, providing a solid foundation for training and evaluation. Moreover, our research uncovers new observations that warrant further investigation. First, we systematically analyze and address privacy issues that have been neglected in previous portrait matting research. Second, we explore the adaptation of automatic matting methods to non-salient or transparent categories beyond salient ones. Furthermore, we incorporate the language modality to achieve a more controllable matting process, enabling specific target selection at low cost. To validate our studies, we conduct extensive experiments and provide all code and datasets through the link (https://github.com/JizhiziLi/). We believe that the analyses, methods, and datasets presented in this thesis will offer valuable insights for future research in the field of image matting.
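
    The decomposition-interaction-merging idea above can be pictured as a shared encoder feeding a high-level semantic branch and a low-level detail branch, whose outputs are merged so that the detail prediction is trusted only inside the predicted transition region. The sketch below is a simplified stand-in under those assumptions; the actual networks, interaction modules, and attention-based merging in the thesis are considerably more elaborate.

```python
# Simplified sketch, assuming PyTorch; branch designs and the merge rule are placeholders.
import torch
import torch.nn as nn

class TwoBranchMatting(nn.Module):
    """Shared encoder, a semantic branch predicting fg/bg/transition, a detail
    branch predicting fine alpha, and a merge that uses the detail branch only
    in the transition region."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        self.semantic = nn.Conv2d(32, 3, 3, padding=1)  # fg / bg / transition logits
        self.detail = nn.Conv2d(32, 1, 3, padding=1)    # fine-grained alpha logits

    def forward(self, image):
        feat = self.encoder(image)
        seg = self.semantic(feat).softmax(dim=1)        # (B, 3, H, W)
        alpha_detail = self.detail(feat).sigmoid()      # (B, 1, H, W)
        fg, transition = seg[:, 0:1], seg[:, 2:3]
        # Collaboration-style merge: coarse foreground probability outside the
        # transition band, detailed alpha inside it.
        alpha = fg * (1 - transition) + alpha_detail * transition
        return alpha.clamp(0, 1), seg

model = TwoBranchMatting()
alpha, seg = model(torch.randn(1, 3, 64, 64))
print(alpha.shape)  # torch.Size([1, 1, 64, 64])
```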