Generative Watermarking Against Unauthorized Subject-Driven Image Synthesis
Large text-to-image models have shown remarkable performance in synthesizing
high-quality images. In particular, the subject-driven model makes it possible
to personalize the image synthesis for a specific subject, e.g., a human face
or an artistic style, by fine-tuning the generic text-to-image model with a few
images from that subject. Nevertheless, misuse of subject-driven image
synthesis may infringe the rights of subject owners. For example, malicious
users may use subject-driven synthesis to mimic specific artistic styles or to
create fake facial images without authorization. To protect subject owners
against such misuse, recent attempts have commonly relied on adversarial
examples to indiscriminately disrupt subject-driven image synthesis. However,
this essentially prevents any benign use of subject-driven synthesis based on
protected images.
In this paper, we take a different angle and aim at protection without
sacrificing the utility of protected images for general synthesis purposes.
Specifically, we propose GenWatermark, a novel watermark system based on
jointly learning a watermark generator and a detector. In particular, to help
the watermark survive the subject-driven synthesis, we incorporate the
synthesis process in learning GenWatermark by fine-tuning the detector with
synthesized images for a specific subject. This operation is shown to largely
improve the watermark detection accuracy and also ensure the uniqueness of the
watermark for each individual subject. Extensive experiments validate the
effectiveness of GenWatermark, especially in practical scenarios with unknown
models and text prompts (74% Acc.), as well as partial data watermarking (80%
Acc. for 1/4 watermarking). We also demonstrate the robustness of GenWatermark
to two potential countermeasures that substantially degrade the synthesis
quality.
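The core mechanism described above, a generator that produces a subject-specific watermark and a detector that verifies it, can be illustrated with a deliberately simplified, non-learned sketch: an additive pseudo-random pattern keyed to the subject, detected by normalized correlation. All function names here are hypothetical; the actual GenWatermark system learns both components jointly and fine-tunes the detector on synthesized images.

```python
import random

def make_watermark(subject_key, size, amplitude=2.0):
    # Pseudo-random, subject-specific pattern (an illustrative stand-in
    # for the learned watermark generator; not the paper's model).
    rng = random.Random(subject_key)
    return [amplitude * (2.0 * rng.random() - 1.0) for _ in range(size)]

def embed(image, wm):
    # Additive embedding: pixel-wise sum of image and watermark.
    return [p + w for p, w in zip(image, wm)]

def detect(image, wm):
    # Normalized correlation score; the paper instead uses a learned
    # detector fine-tuned on subject-specific synthesized images.
    num = sum(p * w for p, w in zip(image, wm))
    den = (sum(w * w for w in wm) ** 0.5) * (sum(p * p for p in image) ** 0.5)
    return num / den if den else 0.0
```

Because the pattern is keyed per subject, a watermark made for one subject key correlates poorly with images marked for another, which mirrors the per-subject uniqueness the paper reports.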
The Stable Signature: Rooting Watermarks in Latent Diffusion Models
Generative image modeling enables a wide range of applications but raises
ethical concerns about responsible deployment. This paper introduces an active
strategy combining image watermarking and Latent Diffusion Models. The goal is
for all generated images to conceal an invisible watermark allowing for future
detection and/or identification. The method quickly fine-tunes the latent
decoder of the image generator, conditioned on a binary signature. A
pre-trained watermark extractor recovers the hidden signature from any
generated image and a statistical test then determines whether it comes from
the generative model. We evaluate the invisibility and robustness of the
watermarks on a variety of generation tasks, showing that Stable Signature
works even after the images are modified. For instance, it detects the origin
of an image generated from a text prompt, then cropped to keep of the
content, with + accuracy at a false positive rate below 10.Comment: Website at https://pierrefdz.github.io/publications/stablesignatur
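The statistical test mentioned above can be sketched as a binomial test on the number of recovered bits that match the embedded signature: under the null hypothesis that the image is not from the model, each recovered bit matches by chance with probability 1/2. This is a minimal sketch; the function names and the 48-bit signature length in the test are illustrative assumptions, not the paper's exact implementation.

```python
from math import comb

def p_value(matches, n):
    # Probability that purely random bits match at least `matches` of the
    # n signature bits: P[Binomial(n, 1/2) >= matches].
    return sum(comb(n, k) for k in range(matches, n + 1)) / 2 ** n

def is_generated(recovered, signature, fpr=1e-6):
    # Flag the image as model-generated when the match count is too
    # unlikely to be chance at the chosen false-positive rate.
    m = sum(r == s for r, s in zip(recovered, signature))
    return p_value(m, len(signature)) < fpr
```

Setting the decision threshold via the p-value is what gives a controllable false positive rate: lowering `fpr` trades detection sensitivity for fewer false alarms on unmarked images.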
Information embedding and retrieval in 3D printed objects
Deep learning and convolutional neural networks have become the main tools of computer vision. These techniques are good at using supervised learning to learn complex representations from data. In particular, in limited settings, image recognition models now perform better than the human baseline. However, computer vision aims to build machines that can see, which requires models to extract more valuable information from images and videos than recognition alone. It is generally much more challenging to transfer these deep learning models from recognition to other problems in computer vision.
This thesis presents end-to-end deep learning architectures for a new computer vision task: watermark retrieval from 3D printed objects. As this is a new area, there is no state of the art on challenging benchmarks. Hence, we first define the problems and introduce a traditional approach, the Local Binary Pattern method, to set a baseline for further study. Our neural networks are straightforward yet effective, outperforming the traditional approach, and they generalize well. However, because our research field is new, we face not only various unpredictable parameters but also limited, low-quality training data.
To address this, we make two observations: (i) we do not need to learn everything from scratch, since we already know a lot about image segmentation; and (ii) we cannot learn everything from data alone, so our models should be aware of which key features they should learn. This thesis explores these ideas and more. We show how to use end-to-end deep learning models to retrieve watermark bumps and handle covariates from only a few training images. We then introduce ideas from synthetic image data and domain randomization to augment the training data and to understand the various covariates that may affect retrieval of real-world 3D watermark bumps. We also show how illumination in synthetic image data affects, and can even improve, retrieval accuracy in real-world recognition applications.
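The Local Binary Pattern baseline referred to above is, in its textbook form, straightforward to sketch: each pixel is described by thresholding its eight neighbours against the centre value and packing the results into a byte, and an image is summarized by the histogram of those codes. This is the generic LBP operator, not necessarily the thesis's exact variant.

```python
def lbp_code(image, y, x):
    # 8-neighbour Local Binary Pattern: threshold each neighbour against
    # the centre pixel and pack the eight results into one byte.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        if image[y + dy][x + dx] >= image[y][x]:
            code |= 1 << bit
    return code

def lbp_histogram(image):
    # 256-bin histogram of LBP codes over all interior pixels,
    # usable as a simple texture descriptor for matching.
    hist = [0] * 256
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            hist[lbp_code(image, y, x)] += 1
    return hist
```

Comparing two such histograms (e.g. by chi-squared distance) gives a simple retrieval score, which is the kind of hand-crafted baseline the learned networks are measured against.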
State of the Art on Neural Rendering
Efficient rendering of photo-realistic virtual worlds is a long-standing effort of computer graphics. Modern graphics techniques have succeeded in synthesizing photo-realistic images from hand-crafted scene representations. However, the automatic generation of shape, materials, lighting, and other aspects of scenes remains a challenging problem that, if solved, would make photo-realistic computer graphics more widely accessible. Concurrently, progress in computer vision and machine learning has given rise to a new approach to image synthesis and editing, namely deep generative models. Neural rendering is a new and rapidly emerging field that combines generative machine learning techniques with physical knowledge from computer graphics, e.g., by the integration of differentiable rendering into network training. With a plethora of applications in computer graphics and vision, neural rendering is poised to become a new area in the graphics community, yet no survey of this emerging field exists. This state-of-the-art report summarizes the recent trends and applications of neural rendering. We focus on approaches that combine classic computer graphics techniques with deep generative models to obtain controllable and photo-realistic outputs. Starting with an overview of the underlying computer graphics and machine learning concepts, we discuss critical aspects of neural rendering approaches. This state-of-the-art report is focused on the many important use cases for the described algorithms such as novel view synthesis, semantic photo manipulation, facial and body reenactment, relighting, free-viewpoint video, and the creation of photo-realistic avatars for virtual and augmented reality telepresence. Finally, we conclude with a discussion of the social implications of such technology and investigate open research problems.
Privacy Intelligence: A Survey on Image Sharing on Online Social Networks
Image sharing on online social networks (OSNs) has become an indispensable
part of daily social activities, but it has also led to an increased risk of
privacy invasion. The recent image leaks from popular OSN services and the
abuse of personal photos using advanced algorithms (e.g. DeepFake) have
prompted the public to rethink individual privacy needs when sharing images on
OSNs. However, OSN image sharing itself is relatively complicated, and systems
currently in place to manage privacy in practice are labor-intensive yet fail
to provide personalized, accurate and flexible privacy protection. As a result,
a more intelligent environment for privacy-friendly OSN image sharing is in
demand. To fill the gap, we contribute a systematic survey of 'privacy
intelligence' solutions that target modern privacy issues related to OSN image
sharing. Specifically, we present a high-level analysis framework based on the
entire lifecycle of OSN image sharing to address the various privacy issues and
solutions facing this interdisciplinary field. The framework is divided into
three main stages: local management, online management and social experience.
At each stage, we identify typical sharing-related user behaviors, the privacy
issues generated by those behaviors, and review representative intelligent
solutions. The resulting analysis describes an intelligent privacy-enhancing
chain for closed-loop privacy management. We also discuss the challenges and
future directions existing at each stage, as well as in publicly available
datasets.
Comment: 32 pages, 9 figures. Under review.
FACTIFY3M: A Benchmark for Multimodal Fact Verification with Explainability through 5W Question-Answering
Combating disinformation is one of the burning societal crises -- about 67%
of the American population believes that disinformation produces a lot of
uncertainty, and 10% of them knowingly propagate disinformation. Evidence shows
that disinformation can manipulate democratic processes and public opinion,
causing disruption in the share market, panic and anxiety in society, and even
death during crises. Therefore, disinformation should be identified promptly
and, if possible, mitigated. With approximately 3.2 billion images and 720,000
hours of video shared online daily on social media platforms, scalable
detection of multimodal disinformation requires efficient fact verification.
Despite progress in automatic text-based fact verification (e.g., FEVER, LIAR),
the research community lacks substantial effort in multimodal fact
verification. To address this gap, we introduce FACTIFY 3M, a dataset of 3
million samples that pushes the boundaries of the domain of fact verification
via a multimodal fake news dataset, in addition to offering explainability
through the concept of 5W question-answering. Salient features of the dataset
include: (i) textual claims, (ii) ChatGPT-generated paraphrased claims, (iii)
associated images, (iv) stable diffusion-generated additional images (i.e.,
visual paraphrases), (v) pixel-level image heatmap to foster image-text
explainability of the claim, (vi) 5W QA pairs, and (vii) adversarial fake news
stories.
Comment: arXiv admin note: text overlap with arXiv:2305.0432
Visual Content Privacy Protection: A Survey
Vision is the most important sense for people, and it is also one of the main
ways of cognition. As a result, people tend to utilize visual content to
capture and share their life experiences, which greatly facilitates the
transfer of information. Meanwhile, it also increases the risk of privacy
violations, e.g., an image or video can reveal different kinds of
privacy-sensitive information. Researchers have been working continuously to
develop targeted privacy protection solutions, and there are several surveys to
summarize them from certain perspectives. However, these surveys are either
problem-driven, scenario-specific, or technology-specific, making it difficult
for them to summarize the existing solutions in a macroscopic way. In this
survey, we propose a framework that encompasses various concerns and solutions
for visual privacy and allows for a comprehensive, macroscopic understanding of
privacy concerns. It is based on the observation that privacy concerns have
corresponding adversaries, and it divides privacy protection into three
categories: protection against computer vision (CV) adversaries, against human
vision (HV) adversaries, and against combined CV & HV adversaries. For each
category, we analyze the
characteristics of the main approaches to privacy protection, and then
systematically review representative solutions. Open challenges and future
directions for visual privacy protection are also discussed.
Comment: 24 pages, 13 figures.