Interpreting CLIP's Image Representation via Text-Based Decomposition
We investigate the CLIP image encoder by analyzing how individual model
components affect the final representation. We decompose the image
representation as a sum across individual image patches, model layers, and
attention heads, and use CLIP's text representation to interpret the summands.
Interpreting the attention heads, we characterize each head's role by
automatically finding text representations that span its output space, which
reveals property-specific roles for many heads (e.g. location or shape). Next,
interpreting the image patches, we uncover an emergent spatial localization
within CLIP. Finally, we use this understanding to remove spurious features
from CLIP and to create a strong zero-shot image segmenter. Our results
indicate that a scalable understanding of transformer models is attainable and
can be used to repair and improve models.
Comment: Project page and code:
https://yossigandelsman.github.io/clip_decomposition
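To make the decomposition concrete, the following is a minimal, hypothetical sketch (not the authors' code) of scoring per-attention-head contributions against text embeddings; the tensors `head_contributions` and `text_embeddings` are stand-ins for quantities that would be extracted from a real CLIP model.

```python
# Illustrative sketch only: the final image representation is treated as a sum of
# per-attention-head contributions, and each head is matched to candidate text
# descriptions. All tensors here are random placeholders, not real CLIP activations.
import torch

d = 512                      # embedding dimension (CLIP ViT-B/32 uses 512)
n_layers, n_heads = 12, 12   # hypothetical layer/head counts

# Hypothetical per-head contributions to the image embedding: [layers, heads, d].
head_contributions = torch.randn(n_layers, n_heads, d)

# The decomposition: the image representation is (approximately) the sum of the
# per-head terms; lower-level terms are omitted in this toy example.
image_embedding = head_contributions.sum(dim=(0, 1))

# Hypothetical text embeddings for candidate property descriptions.
descriptions = ["a photo taken outdoors", "a round object", "text in the image"]
text_embeddings = torch.randn(len(descriptions), d)
text_embeddings = text_embeddings / text_embeddings.norm(dim=-1, keepdim=True)

# Score each head's contribution against each description; heads whose outputs
# align strongly with a small set of texts suggest a property-specific role.
scores = torch.einsum("lhd,td->lht", head_contributions, text_embeddings)
best_text_per_head = scores.argmax(dim=-1)      # [layers, heads]

# Scores for the full (summed) representation, for comparison.
full_scores = image_embedding @ text_embeddings.T
print(best_text_per_head.shape, full_scores)
```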
Synthesizing Moving People with 3D Control
In this paper, we present a diffusion model-based framework for animating
people from a single image, given a target 3D motion sequence. Our approach
has two core components: a) learning priors about invisible parts of the human
body and clothing, and b) rendering novel body poses with proper clothing and
texture. For the first part, we learn an in-filling diffusion model to
hallucinate unseen parts of a person given a single image. We train this model
in texture map space, which makes it more sample-efficient since it is
invariant to pose and viewpoint. Second, we develop a diffusion-based rendering
pipeline, which is controlled by 3D human poses. This produces realistic
renderings of novel poses of the person, including clothing, hair, and
plausible in-filling of unseen regions. This disentangled approach allows our
method to generate a sequence of images that is faithful to the target 3D
motion and visually faithful to the input image. In addition, the 3D control
allows rendering the person along various synthetic camera trajectories. Our
experiments show that, compared to prior methods, our approach is more robust
at generating prolonged motions and varied, challenging, and complex poses.
Please check our website for more details:
https://boyiliee.github.io/3DHM.github.io/
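A rough structural sketch of the two-stage design described above follows; every name in it (`to_texture_map`, `texture_infiller`, `pose_renderer`) is a hypothetical placeholder, not the authors' released API.

```python
# Structural sketch of the two-stage pipeline; the callables passed in are
# hypothetical stand-ins for the in-filling and rendering diffusion models.
def animate_person(image, target_poses, to_texture_map, texture_infiller, pose_renderer):
    """Animate a person from a single image along a target 3D pose sequence."""
    # Stage 1: hallucinate unseen parts in texture-map space. Working in this
    # pose- and viewpoint-invariant space is what makes in-filling sample-efficient.
    partial_texture = to_texture_map(image)
    full_texture = texture_infiller.sample(partial_texture)

    # Stage 2: render each target 3D pose with the completed texture. The 3D pose
    # is the control signal, so synthetic camera trajectories can be rendered the
    # same way by varying the pose/camera inputs.
    frames = [
        pose_renderer.sample(texture=full_texture, pose=pose, reference=image)
        for pose in target_poses
    ]
    return frames
```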
Idempotent Generative Network
We propose a new approach for generative modeling based on training a neural
network to be idempotent. An idempotent operator is one that can be applied
sequentially without changing the result beyond the initial application, namely
$f(f(z)) = f(z)$. The proposed model $f$ is trained to map a source distribution
(e.g., Gaussian noise) to a target distribution (e.g., realistic images) using
the following objectives: (1) Instances from the target distribution should map
to themselves, namely $f(x) = x$. We define the target manifold as the set of all
instances that $f$ maps to themselves. (2) Instances from the source
distribution should map onto the defined target manifold. This is achieved by
optimizing the idempotence term $f(f(z)) = f(z)$, which encourages the range of
$f(z)$ to be on the target manifold. Under ideal assumptions such a process
provably converges to the target distribution. This strategy results in a model
capable of generating an output in one step, maintaining a consistent latent
space, while also allowing sequential applications for refinement.
Additionally, we find that by processing inputs from both target and source
distributions, the model adeptly projects corrupted or modified data back to
the target manifold. This work is a first step towards a "global projector"
that enables projecting any input into a target data distribution.
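The two stated objectives can be read as a pair of training losses. Below is a minimal, assumption-laden sketch of that reading; it is not the authors' full training procedure, which involves additional regularization and gradient-handling details.

```python
# Sketch of the two losses implied by the abstract: reconstruction on real data
# (f(x) = x) and idempotence on mapped noise (f(f(z)) = f(z)).
import torch
import torch.nn.functional as F

def ign_losses(f, x_real, z_noise):
    # (1) Instances from the target distribution should map to themselves: f(x) = x.
    rec_loss = F.l1_loss(f(x_real), x_real)

    # (2) Mapped noise should land on the target manifold, i.e. on the set of
    # fixed points of f: encourage f(f(z)) = f(z).
    fz = f(z_noise)
    idem_loss = F.l1_loss(f(fz), fz)
    return rec_loss, idem_loss

# Usage: f can be any image-to-image network with matching input/output shapes.
f = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)   # toy stand-in for a generator
x_real = torch.randn(4, 3, 32, 32)
z_noise = torch.randn(4, 3, 32, 32)
rec, idem = ign_losses(f, x_real, z_noise)
(rec + idem).backward()
```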
MyStyle: A Personalized Generative Prior
We introduce MyStyle, a personalized deep generative prior trained with a few
shots of an individual. MyStyle allows one to reconstruct, enhance, and edit images
of a specific person, such that the output is faithful to the person's key
facial characteristics. Given a small reference set of portrait images of a
person (~100), we tune the weights of a pretrained StyleGAN face generator to
form a local, low-dimensional, personalized manifold in the latent space. We
show that this manifold constitutes a personalized region that spans latent
codes associated with diverse portrait images of the individual. Moreover, we
demonstrate that we obtain a personalized generative prior, and propose a
unified approach to apply it to various ill-posed image enhancement problems,
such as inpainting and super-resolution, as well as semantic editing. Using the
personalized generative prior we obtain outputs that exhibit high-fidelity to
the input images and are also faithful to the key facial characteristics of the
individual in the reference set. We demonstrate our method with fair-use images
of numerous widely recognizable individuals for whom we have prior knowledge,
enabling a qualitative evaluation of the expected outcome. We evaluate our
approach against few-shot baselines and show that our personalized prior,
quantitatively and qualitatively, outperforms state-of-the-art alternatives.
Comment: Project webpage: https://mystyle-personalized-prior.github.io/,
Video: https://youtu.be/QvOdQR3tlO
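The personalization step can be pictured as tuning a pretrained generator on the small reference set. The sketch below is a simplified, hypothetical reading of that step; `generator`, `invert`, and `reference_images` are placeholders, and the actual method additionally constrains where the reference images are anchored in latent space.

```python
# Simplified sketch: tune the weights of a pretrained face generator so that
# latent codes obtained by inversion reproduce the reference portraits, shaping
# a local, personalized region of the latent space. Not the authors' exact code.
import torch

def personalize(generator, invert, reference_images, steps=1000, lr=1e-4):
    # Anchor each reference portrait at a latent code via inversion (done once).
    anchors = [invert(generator, img) for img in reference_images]

    opt = torch.optim.Adam(generator.parameters(), lr=lr)
    for _ in range(steps):
        loss = 0.0
        for w, img in zip(anchors, reference_images):
            # Tune the generator so the anchored codes reproduce the person's images.
            loss = loss + torch.nn.functional.l1_loss(generator(w), img)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return generator
```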