Reversible de-identification for lossless image compression using reversible watermarking
De-identification is a process that can be used to ensure privacy by concealing the identity of individuals captured by video surveillance systems. One important challenge is to make the obfuscation process reversible so that the original image/video can be recovered by persons in possession of the right security credentials. This work presents a novel reversible de-identification method that can be used in conjunction with any obfuscation process. The residual information needed to reverse the obfuscation process is compressed, authenticated, encrypted, and embedded within the obfuscated image using a two-level reversible watermarking scheme. The proposed method ensures an overall single-pass embedding capacity of 1.25 bpp: 99.8% of the images considered required less than 0.8 bpp, and none required more than 1.1 bpp. Experimental results further demonstrate that the proposed method recovered and authenticated all images considered.
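The reversible-embedding idea can be illustrated with Tian's classic difference-expansion scheme, a common building block for reversible watermarking. This is a generic single-pair sketch, not the paper's two-level scheme, and overflow handling is omitted:

```python
def embed_bit(a, b, bit):
    """Embed one payload bit into a pixel pair (a, b) by difference expansion.

    The pair average is preserved and the difference is expanded to carry
    the bit, so the original pair is exactly recoverable. Checks for values
    leaving the valid range [0, 255] are omitted for brevity.
    """
    l = (a + b) // 2          # pair average, preserved by the transform
    h = a - b                 # pair difference
    h2 = 2 * h + bit          # expanded difference carries the payload bit
    return l + (h2 + 1) // 2, l - h2 // 2

def extract_bit(a2, b2):
    """Recover the embedded bit and the original pixel pair losslessly."""
    l = (a2 + b2) // 2
    h2 = a2 - b2
    bit = h2 % 2              # the payload bit is the parity of h2
    h = h2 // 2               # undo the expansion (floor division handles negatives)
    return bit, l + (h + 1) // 2, l - h // 2
```

For example, embedding bit 1 into the pair (100, 97) yields (102, 95), and extraction returns both the bit and the exact original pair, which is what makes the obfuscation residual fully recoverable.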
Constructing Hierarchical Image-tags Bimodal Representations for Word Tags Alternative Choice
This paper describes our solution to the ICML multi-modal learning challenge. The solution constructs three-level representations in three consecutive stages and chooses correct tag words with a data-specific strategy. First, we use standard methods to obtain level-1 representations: each image is represented using MPEG-7 and gist descriptors together with additional features released by the contest organizers, and the corresponding word tags are represented by a bag-of-words model over a dictionary of 4,000 words. Second, we learn level-2 representations using two stacked RBMs for each modality. Third, we propose a bimodal auto-encoder that learns the similarities/dissimilarities between image-tag pairs as level-3 representations. Finally, during the test phase, based on an observation about the dataset, we adopt a data-specific strategy for choosing the correct tag words, which substantially improves overall performance. Our final average accuracy on the private test set is 100%, ranking first in the challenge.
Comment: 6 pages, 1 figure, Presented at the Workshop on Representation Learning, ICML 201
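The level-1 tag representation can be sketched as a binary bag-of-words vector over a fixed dictionary. The tags and dictionary below are illustrative only; the paper's dictionary has 4,000 words:

```python
def bag_of_words(tags, dictionary):
    """Binary bag-of-words vector: 1 where a dictionary word appears in tags.

    Tags outside the dictionary are silently dropped, as with any
    fixed-vocabulary encoding.
    """
    index = {w: i for i, w in enumerate(dictionary)}   # word -> position
    vec = [0] * len(dictionary)
    for t in tags:
        if t in index:
            vec[index[t]] = 1
    return vec
```

Such sparse binary vectors are a natural input to the stacked RBMs used for the level-2 representations.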
Exploring Disentanglement with Multilingual and Monolingual VQ-VAE
This work examines the content and usefulness of disentangled phone and
speaker representations from two separately trained VQ-VAE systems: one trained
on multilingual data and another trained on monolingual data. We explore the
multi- and monolingual models using four small proof-of-concept tasks:
copy-synthesis, voice transformation, linguistic code-switching, and
content-based privacy masking. From these tasks, we reflect on how disentangled
phone and speaker representations can be used to manipulate speech in a
meaningful way. Our experiments demonstrate that the VQ representations are
suitable for these tasks, including creating new voices by mixing speaker
representations together. We also present our novel technique to conceal the
content of targeted words within an utterance by manipulating phone VQ codes,
while retaining speaker identity and intelligibility of surrounding words.
Finally, we discuss recommendations for further increasing the viability of
disentangled representations.
Comment: Accepted to Speech Synthesis Workshop 2021 (SSW11)
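The content-masking idea, overwriting the phone VQ codes of a target word while leaving the speaker representation untouched, can be sketched as follows. Function and argument names are assumptions for illustration, not the authors' API:

```python
def mask_phone_codes(phone_codes, start, end, filler_codes):
    """Conceal a target word by overwriting its per-frame phone VQ indices.

    phone_codes  : list of per-frame VQ codebook indices for the utterance
    start, end   : frame span of the word to conceal (end exclusive)
    filler_codes : replacement indices (e.g. codes of a neutral phone),
                   cycled to cover the whole span
    The speaker codes are not touched, so voice identity and the
    intelligibility of surrounding frames are preserved.
    """
    masked = list(phone_codes)
    for i in range(start, end):
        masked[i] = filler_codes[(i - start) % len(filler_codes)]
    return masked
```

The masked code sequence is then passed to the decoder in place of the original, synthesizing speech in which only the targeted span is unintelligible.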
Information Hiding in Images Using Steganography Techniques
Innovations in technology and fast Internet access make it easy and economical to distribute information around the world, but they have also made people worry about their privacy and their work. Steganography is a technique that prevents unauthorized users from accessing important data. Steganography and digital watermarking provide methods by which users can hide and mix their information within other information, making it difficult for attackers to recognize. In this paper, we review steganography and digital watermarking techniques in both the spatial and frequency domains. We also describe the types of host documents, with a focus on images.
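The simplest spatial-domain technique covered by such surveys is least-significant-bit (LSB) embedding, where each secret bit replaces the lowest bit of one cover pixel. A minimal sketch on a flat list of pixel values:

```python
def embed_lsb(pixels, bits):
    """Hide a bit sequence in the least significant bits of pixel values.

    Changing only the lowest bit alters each pixel by at most 1, which is
    visually imperceptible in typical 8-bit images. Assumes
    len(bits) <= len(pixels).
    """
    out = list(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit   # clear the low bit, then set it
    return out

def extract_lsb(pixels, n):
    """Read back the first n hidden bits."""
    return [p & 1 for p in pixels[:n]]
```

Frequency-domain methods instead embed in transform coefficients (e.g. DCT), trading capacity for robustness to compression; the review contrasts both families.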
Toward Fine-grained Facial Expression Manipulation
Facial expression manipulation aims at editing facial expression with a given
condition. Previous methods edit an input image under the guidance of a
discrete emotion label or absolute condition (e.g., facial action units) to
possess the desired expression. However, these methods either suffer from
changing condition-irrelevant regions or are inefficient for fine-grained
editing. In this study, we take both objectives into consideration and propose a novel method. First, we replace the continuous absolute condition with a relative condition, specifically relative action units. With relative action units, the generator learns to transform only the regions of interest, which are specified by non-zero-valued relative AUs. Second, our generator is built on U-Net and strengthened by a Multi-Scale Feature Fusion (MSF) mechanism for high-quality expression editing. Extensive quantitative and qualitative evaluations demonstrate the improvements of our proposed approach over state-of-the-art expression editing methods.
Code is available at \url{https://github.com/junleen/Expression-manipulator}
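The relative condition can be sketched as the element-wise difference between target and source AU intensities; zero entries mark regions the generator should leave unchanged. This is a minimal illustration of the conditioning signal, not the authors' code:

```python
def relative_aus(source_aus, target_aus):
    """Relative action units: target intensity minus source intensity.

    Non-zero entries identify the AUs (and hence facial regions) that the
    generator must edit; zeros tell it to leave the corresponding region
    untouched, which is what prevents changes to condition-irrelevant areas.
    """
    return [t - s for s, t in zip(source_aus, target_aus)]
```

Feeding differences rather than absolute AU vectors means an all-zero condition corresponds to the identity mapping, a convenient anchor for training.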
Learning a self-supervised tone mapping operator via feature contrast masking loss
High Dynamic Range (HDR) content is becoming ubiquitous due to the rapid development of capture technologies. Nevertheless, the dynamic range of common display devices is still limited, so tone mapping (TM) remains a key challenge for image visualization. Recent work has demonstrated that neural networks can achieve remarkable performance in this task compared to traditional methods; however, the quality of the results of these learning-based methods is limited by the training data. Most existing works use as a training set a curated selection of best-performing results from existing traditional tone mapping operators (often guided by a quality metric), so the quality of newly generated results is fundamentally limited by the performance of such operators, and might be further limited by the pool of HDR content used for training. In this work we propose a learning-based self-supervised tone mapping operator that is trained at test time specifically for each HDR image and does not need any data labeling. The key novelty of our approach is a carefully designed loss function built upon fundamental knowledge of contrast perception that allows for directly comparing the content in the HDR and tone-mapped images. We achieve this goal by reformulating classic VGG feature maps into feature contrast maps that normalize local feature differences by their average magnitude in a local neighborhood, allowing our loss to account for contrast masking effects. We perform extensive ablation studies and parameter exploration and demonstrate that our solution outperforms existing approaches with a single set of fixed parameters, as confirmed by both objective and subjective metrics.
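The feature-contrast normalization can be sketched as dividing local feature differences by the average feature magnitude in a neighborhood. This is a rough NumPy sketch under assumed definitions (box-filter neighborhood, single-channel map); the paper operates on VGG feature maps:

```python
import numpy as np

def local_mean(x, k):
    """k x k box-filter local mean with edge padding (naive loop for clarity)."""
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    out = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = xp[i:i + k, j:j + k].mean()
    return out

def feature_contrast_map(feat, k=3, eps=1e-6):
    """Sketch of a feature contrast map.

    Local feature differences (feature minus its local mean) are normalized
    by the average feature magnitude in the same neighborhood, so the loss
    responds to relative contrast rather than absolute feature values,
    mimicking contrast masking.
    """
    mu = local_mean(feat, k)
    return (feat - mu) / (local_mean(np.abs(feat), k) + eps)
```

On a constant region the contrast map is zero, so flat areas contribute nothing to the loss, while edges and textures dominate, which is the intended masking behavior.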