12 research outputs found
OReX: Object Reconstruction from Planar Cross-sections Using Neural Fields
Reconstructing 3D shapes from planar cross-sections is a challenge inspired
by downstream applications like medical imaging and geographic informatics. The
input is an in/out indicator function fully defined on a sparse collection of
planes in space, and the output is an interpolation of the indicator function
to the entire volume. Previous works addressing this sparse and ill-posed
problem either produce low-quality results, or rely on additional priors such
as target topology, appearance information, or input normal directions. In this
paper, we present OReX, a method for 3D shape reconstruction from slices alone,
featuring a Neural Field as the interpolation prior. A simple neural network is
trained on the input planes to receive a 3D coordinate and return an
inside/outside estimate for the query point. This prior is powerful in inducing
smoothness and self-similarities. The main challenge for this approach is
high-frequency details, as the neural prior is overly smoothing. To alleviate
this, we offer an iterative estimation architecture and a hierarchical input
sampling scheme that encourage coarse-to-fine training, allowing the network to focus on
high frequencies at later stages. In addition, we identify and analyze a common
ripple-like effect stemming from the mesh extraction step. We mitigate it by
regularizing the spatial gradients of the indicator function around input
in/out boundaries, cutting the problem at the root.
Through extensive qualitative and quantitative experimentation, we
demonstrate our method is robust, accurate, and scales well with the size of
the input. We report state-of-the-art results compared to previous approaches
and recent potential solutions, and demonstrate the benefit of our individual
contributions through analysis and ablation studies.
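The boundary regularization described above can be illustrated with a toy sketch. This is not the authors' code: the indicator function, its parameters, and the finite-difference gradient estimate are all illustrative assumptions, standing in for a trained neural field whose spatial gradients are penalized near the in/out boundary.

```python
import math

def indicator(x):
    # Hypothetical stand-in for a trained neural field: a smooth in/out
    # indicator that is ~1 inside |x| < 1 and ~0 outside.
    return 1.0 / (1.0 + math.exp(20.0 * (abs(x) - 1.0)))

def boundary_gradient_penalty(xs, eps=1e-3, band=0.4):
    # Penalize the squared spatial gradient of the indicator, but only for
    # samples near the 0.5 level set (the in/out boundary), in the spirit
    # of the regularization described in the abstract.
    vals = []
    for x in xs:
        f = indicator(x)
        if abs(f - 0.5) < band:  # sample lies near the boundary
            # central finite-difference estimate of df/dx
            g = (indicator(x + eps) - indicator(x - eps)) / (2 * eps)
            vals.append(g * g)
    return sum(vals) / len(vals) if vals else 0.0

xs = [i * 0.01 - 2.0 for i in range(401)]   # samples on [-2, 2]
penalty = boundary_gradient_penalty(xs)
```

Minimizing such a term during training flattens the indicator's gradient around boundaries, which is what suppresses the ripple-like artifacts in the extracted mesh.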
Human Motion Diffusion as a Generative Prior
Recent work has demonstrated the significant potential of denoising diffusion
models for generating human motion, including text-to-motion capabilities.
However, these methods are restricted by the paucity of annotated motion data,
a focus on single-person motions, and a lack of detailed control. In this
paper, we introduce three forms of composition based on diffusion priors:
sequential, parallel, and model composition. Using sequential composition, we
tackle the challenge of long sequence generation. We introduce DoubleTake, an
inference-time method with which we generate long animations consisting of
sequences of prompted intervals and their transitions, using a prior trained
only for short clips. Using parallel composition, we show promising steps
toward two-person generation. Beginning with two fixed priors as well as a few
two-person training examples, we learn a slim communication block, ComMDM, to
coordinate interaction between the two resulting motions. Lastly, using model
composition, we first train individual priors to complete motions that realize
a prescribed motion for a given joint. We then introduce DiffusionBlending, an
interpolation mechanism to effectively blend several such models to enable
flexible and efficient fine-grained joint and trajectory-level control and
editing. We evaluate the composition methods using an off-the-shelf motion
diffusion model, and further compare the results to dedicated models trained
for these specific tasks.
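The idea behind sequential composition, stitching a long animation out of short prompted intervals and blended transitions, can be sketched in miniature. This is an assumed toy interface, not the paper's DoubleTake implementation: frames are plain floats and the transition is a simple linear ramp over an overlap region.

```python
def blend_clips(clips, overlap):
    """Stitch short 'motion clips' (lists of float frames) into one long
    sequence by linearly blending `overlap` frames at each seam."""
    out = list(clips[0])
    for clip in clips[1:]:
        tail = out[-overlap:]            # final frames of the sequence so far
        head = clip[:overlap]            # opening frames of the next clip
        for i in range(overlap):
            w = (i + 1) / (overlap + 1)  # ramp from previous clip to next
            out[-overlap + i] = (1 - w) * tail[i] + w * head[i]
        out.extend(clip[overlap:])
    return out

a = [0.0] * 6          # toy "clip" holding pose 0
b = [1.0] * 6          # toy "clip" holding pose 1
long_seq = blend_clips([a, b], overlap=2)
```

The result transitions smoothly from the first clip's poses to the second's across the overlap, which is the behavior a short-clip prior alone cannot produce.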
An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
Text-to-image models offer unprecedented freedom to guide creation through
natural language. Yet, it is unclear how such freedom can be exercised to
generate images of specific unique concepts, modify their appearance, or
compose them in new roles and novel scenes. In other words, we ask: how can we
use language-guided models to turn our cat into a painting, or imagine a new
product based on our favorite toy? Here we present a simple approach that
allows such creative freedom. Using only 3-5 images of a user-provided concept,
like an object or a style, we learn to represent it through new "words" in the
embedding space of a frozen text-to-image model. These "words" can be composed
into natural language sentences, guiding personalized creation in an intuitive
way. Notably, we find evidence that a single word embedding is sufficient for
capturing unique and varied concepts. We compare our approach to a wide range
of baselines, and demonstrate that it can more faithfully portray the concepts
across a range of applications and tasks.
Our code, data and new words will be available at:
https://textual-inversion.github.io
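The core optimization, learning a single new "word" embedding while the generative model stays frozen, can be sketched with a toy surrogate. Everything here is an illustrative assumption: a fixed linear map stands in for the frozen text-to-image model, and a reconstruction target stands in for the user's 3-5 concept images.

```python
FROZEN_W = [0.5, -0.25, 2.0]   # stand-in for the frozen model's weights

def model_out(embedding):
    # "Frozen" model: a fixed linear map of the word embedding.
    return sum(w * e for w, e in zip(FROZEN_W, embedding))

def learn_embedding(target, steps=500, lr=0.05):
    e = [0.0, 0.0, 0.0]        # the ONLY trainable parameters
    for _ in range(steps):
        err = model_out(e) - target        # reconstruction error
        # gradient of 0.5 * err**2 w.r.t. each embedding coordinate;
        # the model weights themselves are never updated
        grad = [err * w for w in FROZEN_W]
        e = [ei - lr * g for ei, g in zip(e, grad)]
    return e

new_word = learn_embedding(target=3.0)
```

Because only the embedding is optimized, the learned "word" can later be dropped into ordinary prompts, exactly the property the abstract highlights.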
Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models
Text-to-image (T2I) personalization allows users to guide the creative image
generation process by combining their own visual concepts in natural language
prompts. Recently, encoder-based techniques have emerged as a new effective
approach for T2I personalization, reducing the need for multiple images and
long training times. However, most existing encoders are limited to a
single-class domain, which hinders their ability to handle diverse concepts. In
this work, we propose a domain-agnostic method that does not require any
specialized dataset or prior information about the personalized concepts. We
introduce a novel contrastive-based regularization technique to maintain high
fidelity to the target concept characteristics while keeping the predicted
embeddings close to editable regions of the latent space, by pushing the
predicted tokens toward their nearest existing CLIP tokens. Our experimental
results demonstrate the effectiveness of our approach and show how the learned
tokens are more semantic than tokens predicted by unregularized models. This
leads to a better representation that achieves state-of-the-art performance
while being more flexible than previous methods.
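The regularization idea, keeping a predicted token embedding close to its nearest existing token, can be sketched as follows. The vocabulary, embeddings, and L2 distance here are made-up assumptions for illustration; the actual method operates on CLIP token embeddings with a contrastive objective.

```python
VOCAB = {                       # hypothetical frozen token embeddings
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.3],
    "car": [-1.0, 0.5],
}

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_token(pred):
    # Find the existing vocabulary token closest to the predicted embedding.
    return min(VOCAB, key=lambda t: sq_dist(pred, VOCAB[t]))

def regularization(pred, weight=0.1):
    # Penalty pulling the predicted embedding toward its nearest existing
    # token, keeping it in an editable region of the embedding space.
    anchor = VOCAB[nearest_token(pred)]
    return weight * sq_dist(pred, anchor)

pred = [0.95, 0.1]              # a predicted token embedding
token = nearest_token(pred)
reg = regularization(pred)
```

Adding such a term to the encoder's loss trades a little reconstruction fidelity for embeddings that behave like real words, which is why the learned tokens remain composable in prompts.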
State of the Art on Diffusion Models for Visual Computing
The field of visual computing is rapidly advancing due to the emergence of
generative artificial intelligence (AI), which unlocks unprecedented
capabilities for the generation, editing, and reconstruction of images, videos,
and 3D scenes. In these domains, diffusion models are the generative AI
architecture of choice. Within the last year alone, the literature on
diffusion-based tools and applications has seen exponential growth and relevant
papers are published across the computer graphics, computer vision, and AI
communities with new works appearing daily on arXiv. This rapid growth of the
field makes it difficult to keep up with all recent developments. The goal of
this state-of-the-art report (STAR) is to introduce the basic mathematical
concepts of diffusion models, implementation details and design choices of the
popular Stable Diffusion model, and to give an overview of important aspects of these
generative AI tools, including personalization, conditioning, inversion, among
others. Moreover, we give a comprehensive overview of the rapidly growing
literature on diffusion-based generation and editing, categorized by the type
of generated medium, including 2D images, videos, 3D objects, locomotion, and
4D scenes. Finally, we discuss available datasets, metrics, open challenges,
and social implications. This STAR provides an intuitive starting point to
explore this exciting topic for researchers, artists, and practitioners alike.