DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models
We discover that pretrained diffusion models, even though trained mainly on
images, show impressive power in guiding sketch synthesis. In this paper, we
present DiffSketcher, an innovative algorithm that creates vectorized free-hand
sketches using natural language input. DiffSketcher is developed based on a
pre-trained text-to-image diffusion model. It performs the task by directly
optimizing a set of Bézier curves with an extended version of the score
distillation sampling (SDS) loss, which allows us to use a raster-level
diffusion model as a prior for optimizing a parametric vectorized sketch
generator. Furthermore, we explore attention maps embedded in the diffusion
model for effective stroke initialization to speed up the generation process.
The generated sketches demonstrate multiple levels of abstraction while
maintaining recognizability, underlying structure, and essential visual details
of the subject drawn. Our experiments show that DiffSketcher achieves higher
quality than prior work. Comment: 14 pages, 8 figures. Update: improved experiment analysis, fixed
typos, and fixed an image error.
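The abstract states that DiffSketcher represents a sketch as a set of Bézier curves and uses diffusion attention maps for stroke initialization, but not how the maps are used. The following is a minimal sketch, assuming each stroke's first control point is drawn with probability proportional to an attention map and the remaining three control points are placed by a short random walk. The helpers `sample_control_points` and `cubic_bezier`, the random-walk placement, and the `radius` parameter are hypothetical; the SDS-based optimization itself is not shown.

```python
import numpy as np

def sample_control_points(attn, n_strokes, radius=0.05, rng=None):
    """Hypothetical attention-guided init: draw each stroke's first control point
    from pixel centers with probability proportional to `attn`, then place the
    remaining three control points by a short random walk from it."""
    rng = np.random.default_rng(rng)
    h, w = attn.shape
    p = attn.ravel() / attn.sum()                      # attention as a distribution
    idx = rng.choice(h * w, size=n_strokes, replace=False, p=p)
    ys, xs = np.unravel_index(idx, (h, w))
    start = np.stack([(xs + 0.5) / w, (ys + 0.5) / h], axis=1)   # (S, 2) in [0, 1]
    walk = np.cumsum(rng.normal(scale=radius, size=(n_strokes, 3, 2)), axis=1)
    ctrl = np.concatenate([start[:, None, :], start[:, None, :] + walk], axis=1)
    return np.clip(ctrl, 0.0, 1.0)                     # (S, 4, 2) control points

def cubic_bezier(ctrl, n=32):
    """Evaluate B(t) = (1-t)^3 P0 + 3(1-t)^2 t P1 + 3(1-t) t^2 P2 + t^3 P3
    at n evenly spaced t for every stroke; returns (S, n, 2)."""
    t = np.linspace(0.0, 1.0, n)[None, :, None]        # (1, n, 1)
    p0, p1, p2, p3 = (ctrl[:, i][:, None, :] for i in range(4))  # each (S, 1, 2)
    mt = 1.0 - t
    return mt**3 * p0 + 3 * mt**2 * t * p1 + 3 * mt * t**2 * p2 + t**3 * p3
```

Because the curve is a smooth function of its control points, a rasterized version of these curves can receive gradients from an image-space loss, which is what makes a raster diffusion prior usable for a vector sketch.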
DeepSketchHair: Deep Sketch-based 3D Hair Modeling
We present DeepSketchHair, a deep-learning-based tool for interactive modeling of
3D hair from 2D sketches. Given a 3D bust model as reference, our sketching
system takes as input a user-drawn sketch (consisting of a hair contour and a few
strokes indicating the hair growing direction within a hair region), and
automatically generates a 3D hair model, which matches the input sketch both
globally and locally. The key enablers of our system are two carefully designed
neural networks, namely, S2ONet, which converts an input sketch to a dense 2D
hair orientation field; and O2VNet, which maps the 2D orientation field to a 3D
vector field. Our system also supports hair editing with additional sketches in
new views. This is enabled by another deep neural network, V2VNet, which
updates the 3D vector field with respect to the new sketches. All three
networks are trained with synthetic data generated from a 3D hairstyle
database. We demonstrate the effectiveness and expressiveness of our tool using
a variety of hairstyles and also compare our method with prior art.
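To make concrete what a "dense 2D hair orientation field" is, here is a non-learned stand-in for the role S2ONet plays: every pixel receives the unit tangent of the nearest stroke segment (using segment midpoints as a proxy for distance). This is only an illustration of the data structure, not the paper's network; `orientation_field` and the nearest-midpoint rule are hypothetical.

```python
import numpy as np

def orientation_field(strokes, shape):
    """Hypothetical stand-in for S2ONet: assign each pixel the unit tangent of
    the stroke segment whose midpoint is closest. Returns (h, w, 2) as (dx, dy)."""
    h, w = shape
    mids, tans = [], []
    for stroke in strokes:
        p = np.asarray(stroke, dtype=float)          # (K, 2) points as (x, y)
        seg = np.diff(p, axis=0)                     # (K-1, 2) segment vectors
        length = np.linalg.norm(seg, axis=1)
        ok = length > 0                              # skip zero-length segments
        mids.append(((p[:-1] + p[1:]) / 2.0)[ok])
        tans.append(seg[ok] / length[ok, None])      # unit tangents, drawing direction
    mids = np.concatenate(mids)                      # (M, 2)
    tans = np.concatenate(tans)                      # (M, 2)
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys], axis=-1).reshape(-1, 1, 2).astype(float)  # (h*w, 1, 2)
    d2 = ((pix - mids[None, :, :]) ** 2).sum(axis=-1)                   # (h*w, M)
    return tans[d2.argmin(axis=1)].reshape(h, w, 2)  # field[y, x] = orientation
```

A learned network can do far better than this nearest-stroke rule (for example, respecting the hair contour), but the output format, one 2D direction per pixel, is the same kind of field that O2VNet would then lift to 3D.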
Controllable Neural Synthesis for Natural Images and Vector Art
Neural image synthesis approaches have become increasingly popular in recent years due to their ability to generate photorealistic images for applications such as digital entertainment, mixed reality, synthetic dataset creation, and computer art. Despite this progress, current approaches lack two important aspects: (a) they often fail to capture long-range interactions in the image and, as a result, fail to generate scenes with complex dependencies between their different objects or parts; (b) they often ignore the underlying 3D geometry of the shape or scene in the image and, as a result, frequently lose coherency and detail.
My thesis proposes novel solutions to both problems. First, I propose a neural transformer architecture that captures long-range interactions and context for image synthesis at high resolutions. It can synthesize phenomena such as reflections of landscapes in water, or flora consistent with the rest of the landscape, which previous ConvNet-based and other transformer-based approaches could not generate reliably. The key idea of the architecture is to sparsify the transformer's attention matrix at high resolutions, guided by dense attention extracted at a lower image resolution. I present qualitative and quantitative results, along with user studies, demonstrating the effectiveness of the method and its superiority over the state of the art. Second, I propose a method that generates artistic images guided by input 3D shapes. In contrast to previous methods, using a geometric representation of the 3D shape enables the synthesis of more precise stylized drawings with fewer artifacts. My method outputs the synthesized images in a vector representation, enabling richer downstream analysis or editing in interactive applications.
I also show that the method produces substantially better results than existing image-based methods, both in predicting artists' drawings and in user evaluations.
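The key idea, sparsifying high-resolution attention using dense attention computed at a lower resolution, can be illustrated on a 1D token sequence. This is a minimal NumPy sketch, assuming average pooling to build low-resolution tokens and a per-block top-k rule to choose which key blocks each query block keeps; neither detail is given in the abstract, and `guided_sparse_attention`, `block`, and `topk` are hypothetical names. For clarity it materializes the full N x N mask, which a practical implementation would avoid by gathering only the selected blocks.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)          # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def guided_sparse_attention(q, k, v, block, topk):
    """q, k, v: (N, d) high-resolution tokens with N divisible by `block`.
    Returns the attention output (N, d) and the (nb, nb) block mask used."""
    n, d = q.shape
    nb = n // block
    # 1. Low resolution: average-pool tokens into nb blocks, attend densely.
    ql = q.reshape(nb, block, d).mean(axis=1)
    kl = k.reshape(nb, block, d).mean(axis=1)
    low = softmax(ql @ kl.T / np.sqrt(d))            # (nb, nb) dense attention
    # 2. For each query block, keep the top-k key blocks by low-res weight.
    keep = np.argsort(-low, axis=1)[:, :topk]        # (nb, topk)
    mask = np.zeros((nb, nb), dtype=bool)
    np.put_along_axis(mask, keep, True, axis=1)
    # 3. Expand the block mask to token resolution and attend sparsely.
    full = np.repeat(np.repeat(mask, block, axis=0), block, axis=1)  # (N, N)
    scores = np.where(full, q @ k.T / np.sqrt(d), -np.inf)
    return softmax(scores) @ v, mask
```

With `topk` equal to the number of blocks, every block is kept and the result matches ordinary dense attention, which is a convenient sanity check; smaller `topk` trades fidelity for cost.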
Image-Based Approaches to Hair Modeling
Hair is an important characteristic of virtual characters; therefore, modeling plausible facial hair and hairstyles is an essential step in the generation of computer-generated (CG) avatars. However, the inherent geometric complexity of hair, together with the huge number of filaments on an average human head, makes modeling hairstyles a very challenging task. To date, this is commonly a manual process that requires artistic skill or highly specialized and costly acquisition software. In this work we present an image-based approach to model facial hair (beard and eyebrows) and (head) hairstyles. Since facial hair is usually much shorter than average head hair, two different methods are presented, each adapted to the characteristics of the hair to be modeled. Facial hair is modeled using data extracted from facial texture images, and missing information is inferred by means of a database-driven prior model. Our hairstyle reconstruction technique employs images of the hair to be modeled taken with a thermal camera. The major advantage of our thermal image-based method over conventional image-based techniques lies in the fact that during data capture the hairstyle is "lit from the inside": the thermal camera captures heat emitted by the head and actively re-emitted by the hair filaments almost isotropically. Following this approach, we can avoid several issues of conventional image-based techniques, such as shadowing or anisotropy in reflectance. The presented technique requires minimal user interaction and a simple acquisition setup. Several challenging examples demonstrate the potential of the proposed approach.
Sketch2CAD: Sequential CAD Modeling by Sketching in Context
We present a sketch-based CAD modeling system, where users create objects incrementally by sketching the desired shape edits, which our system automatically translates to CAD operations. Our approach is motivated by the close similarities between the steps industrial designers follow to draw 3D shapes and the operations CAD modeling systems offer to create similar shapes. To overcome the strong ambiguity in parsing 2D sketches, we observe that in a sketching sequence, each step makes sense and can be interpreted in the context of what has been drawn before. In our system, this context corresponds to a partial CAD model, inferred in the previous steps, which we feed along with the input sketch to a deep neural network in charge of interpreting how the model should be modified by that sketch. Our deep network architecture then recognizes the intended CAD operation and segments the sketch accordingly, such that a subsequent optimization estimates the parameters of the operation that best fit the segmented sketch strokes. Since no datasets of paired sketching and CAD modeling sequences exist, we train our system by generating synthetic sequences of CAD operations that we render as line drawings. We present a proof-of-concept realization of our algorithm supporting four frequently used CAD operations. Using our system, participants are able to quickly model a large and diverse set of objects, demonstrating that Sketch2CAD is an alternative way of interacting with current CAD modeling systems.
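The abstract does not say how the parameter-fitting optimization is formulated. As an illustration only, suppose the recognized operation is an extrusion-like offset, its projected 2D direction u is known, and each sample of the segmented stroke has already been matched to a point on the base profile it offsets. Then the offset distance d has a closed-form least-squares solution. All of these assumptions, and the name `fit_offset_distance`, are hypothetical.

```python
import numpy as np

def fit_offset_distance(stroke_pts, profile_pts, direction):
    """Least-squares estimate of d in  stroke_i ≈ profile_i + d * direction.
    Setting the derivative of sum_i ||r_i - d u||^2 to zero gives
    d = sum_i (r_i . u) / (n * u . u), where r_i = stroke_i - profile_i."""
    s = np.asarray(stroke_pts, dtype=float)
    b = np.asarray(profile_pts, dtype=float)
    u = np.asarray(direction, dtype=float)
    r = s - b                                        # per-point offsets, (n, 2)
    return float((r @ u).sum() / (len(r) * (u @ u)))
```

Averaging over all matched points makes the estimate robust to small wobbles in a hand-drawn stroke, which is the main reason to fit rather than read the parameter off a single point.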
Image editing and interaction tools for visual expression
Digital photography is becoming extremely common in our daily life. However, images are difficult to edit and interact with. From a user's perspective, it is important to be able to interact freely with the images on a smartphone or iPad. In this thesis we develop several image editing and interaction systems with this idea in mind. We aim to create visual models with precomputed internal structures such that interaction is readily supported. We demonstrate that such interactive models, driven by a user's hand, can deliver powerful visual expressiveness and make static pixel arrays much more fun to play with.
The first system harnesses the editing power of vector graphics. We convert raster images into a vector representation using Loop subdivision surfaces. An image is represented by a multi-resolution, feature-preserving sparse control mesh, with which image editing can be done at a semantic level. A user can easily put a smile on a face image, or adjust the level of scene abstractness through a simple slider. The second system allows one to insert an object from one image into a new scene. The key is to correct the shading on the object so that it is consistent with the scene. Unlike traditional approaches, we use a simple shape to capture gross shading effects and a set of shading detail images to account for visual complexities. The high-frequency nature of these detail images allows a moderate range of interactive composition effects without causing alarming visual artifacts. The third system operates on video clips instead of a single image. We proposed a fully automated algorithm to creat
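For readers unfamiliar with Loop subdivision, the sketch below implements its two standard local rules for interior vertices: the edge (odd) rule and Loop's original vertex (even) rule with weight beta(n). Treating each control-mesh vertex as a vector of position plus color channels is an assumption about how the thesis attaches image data to the mesh, and the function names are hypothetical; boundary and feature-preserving rules are not shown.

```python
import math
import numpy as np

def loop_beta(n):
    """Loop's original weight for an interior vertex of valence n (1/16 when n == 6)."""
    c = 3.0 / 8.0 + 0.25 * math.cos(2.0 * math.pi / n)
    return (5.0 / 8.0 - c * c) / n

def loop_edge_point(v0, v1, opp0, opp1):
    """New (odd) vertex on interior edge v0-v1; opp0 and opp1 are the vertices
    opposite that edge in its two adjacent triangles. Inputs may carry any
    attributes, e.g. (x, y, r, g, b) under the assumption stated above."""
    v0, v1, opp0, opp1 = (np.asarray(p, dtype=float) for p in (v0, v1, opp0, opp1))
    return 3.0 / 8.0 * (v0 + v1) + 1.0 / 8.0 * (opp0 + opp1)

def loop_vertex_point(v, neighbors):
    """Repositioned (even) vertex: (1 - n * beta) * v + beta * sum(neighbors)."""
    nbrs = np.asarray(neighbors, dtype=float)
    n = len(nbrs)
    beta = loop_beta(n)
    return (1.0 - n * beta) * np.asarray(v, dtype=float) + beta * nbrs.sum(axis=0)
```

Both rules are affine combinations (weights sum to 1), so a region of constant color stays constant under subdivision, which is what makes a sparse control mesh a reasonable carrier for smooth image content.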
Analysis and Synthesis of Interactive Video Sprites
In this thesis, we explore how video, an extremely compelling medium that is traditionally consumed passively, can be transformed into interactive experiences, and what is preventing content creators from using it for this purpose. Film captures extremely rich and dynamic information but, due to the sheer amount of data and the drastic change in content appearance over time, it is problematic to work with. Content creators are willing to invest time and effort to design and capture video, so why not to manipulate and interact with it? We hypothesize that people can help and be helped by automatic video processing and synthesis algorithms when they are given the right tools. Computer games are a very popular interactive medium in which players engage with dynamic content in compelling and intuitive ways. The first contribution of this thesis is an in-depth exploration of the modes of interaction that enable game-like video experiences. Through active discussions with game developers, we identify both how to assist content creators and how players can dynamically interact with their creations. We present concepts, explore algorithms, and design tools that together enable interactive video experiences. Our findings on processing videos and interacting with filmed content come together in this thesis' second major contribution. We present a new medium of expression where video elements can be looped, merged, and triggered interactively. Static-camera videos are converted into loopable sequences that can be controlled in real time in response to simple end-user requests. We present novel algorithms and interactive tools that enable our new medium of expression. Our human-in-the-loop system gives the user progressively more creative control over the video content as they invest more effort, and artists help us evaluate it.
Monocular, static-camera videos are a good fit for looping algorithms, but these algorithms have been limited to two-dimensional applications because pixels are reshuffled in space and time on the image plane. The final contribution of this thesis breaks through this barrier by allowing users to interact with filmed objects in three dimensions. Our novel object tracking algorithm extends existing 2D bounding box trackers with 3D information, such as a well-fitting bounding volume, which in turn enables a new breed of interactive video experiences. The filmed content becomes a three-dimensional playground, as users are free to move the virtual camera or the tracked objects and see them from novel viewpoints.
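Converting a static-camera clip into a loopable sequence requires finding a point where playback can jump back to an earlier frame without a visible seam. The sketch below uses the classic video-textures heuristic (choose the frame pair with the smallest pixel distance) as a stand-in; it is not the thesis's algorithm, and `best_loop` and `min_len` are hypothetical names.

```python
import numpy as np

def best_loop(frames, min_len):
    """Pick (start, end) with end - start >= min_len minimizing ||frame[end] - frame[start]||^2.
    Playing start .. end-1 and then jumping back to start hides the seam, because
    frame `end` (which would have come next) looks like frame `start`."""
    f = np.asarray(frames, dtype=float).reshape(len(frames), -1)
    n = len(f)
    if min_len >= n:
        raise ValueError("min_len must be smaller than the number of frames")
    sq = (f ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * f @ f.T   # d2[e, s] = ||f_e - f_s||^2
    idx = np.arange(n)
    d2 = np.where(idx[:, None] - idx[None, :] >= min_len, d2, np.inf)
    end, start = np.unravel_index(np.argmin(d2), d2.shape)
    return int(start), int(end)
```

A real system would also compare short windows of frames (to match motion, not only appearance) and work per region rather than on whole frames, which is closer to what "looping, merging, and triggering" individual video elements requires.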
Communication of Digital Material Appearance Based on Human Perception
In daily life, we encounter digital materials and interact with them in numerous situations, for instance when we play computer games, watch a movie, see a billboard in a metro station, or buy new clothes online. While some of these virtual materials are given by computational models that describe the appearance of a particular surface based on its material and the illumination conditions, others are presented as simple digital photographs of real materials, as is usually the case for material samples from online retail stores. Computer-generated materials offer significant advantages over plain images, as they allow realistic experiences in virtual scenarios, cooperative product design, advertising in the prototype phase, or exhibition of furniture and wearables in specific environments. However, even though exceptional material reproduction quality has been achieved in computer graphics, current technology is still far from highly accurate photorealistic virtual reproduction across the wide range of existing material categories; for this reason, many material catalogs still use pictures or even physical material samples to illustrate their collections. An important reason for this gap between digital and real material appearance is that the connections between physical material characteristics and the visual quality perceived by humans are far from well understood. Our investigations intend to shed some light on this question.
Concretely, we explore the ability of state-of-the-art digital material models to communicate physical and subjective material qualities, observing that part of the tactile/haptic information (e.g., thickness, hardness) is missing due to the geometric abstractions intrinsic to the model. Consequently, in order to account for the information lost during the digitization process, we investigate the interplay between different sensing modalities (vision and hearing) and discover that particular sound cues, in combination with visual information, facilitate the estimation of such tactile material qualities. One of the shortcomings when studying material appearance is the lack of perceptually derived metrics able to answer questions like "are materials A and B more similar than C and D?", which arise in many computer graphics applications. In the absence of such metrics, our studies compare different appearance models in terms of how well they depict/transmit a collection of meaningful perceptual qualities. To address this problem, we introduce a methodology to compute the perceived pairwise similarity between textures from material samples that makes use of patch-based texture synthesis algorithms and is inspired by the notion of just-noticeable differences. Our technique overcomes some of the issues posed by previous texture similarity collection methods and produces meaningful distances between samples. In summary, with the contents presented in this thesis we delve deeply into how humans perceive digital and real materials through different senses, acquire a better understanding of texture similarity by developing a perceptually based metric, and provide a groundwork for further investigations in the perception of digital materials.
Three-Dimensional Modeling and Animation of Facial Expressions
Facial expression and animation are important aspects of 3D environments featuring human characters. These animations are used in many kinds of applications, and there have been many efforts to increase their realism. Three aspects still stimulate active research: detailed, subtle facial expressions; the process of rigging a face; and the transfer of an expression from one person to another. This dissertation focuses on these three aspects.
A system for freely designing and creating detailed, dynamic, and animated facial expressions is developed. The presented pattern functions produce detailed and animated facial expressions. The system produces realistic results with fast performance and allows users to manipulate expressions directly and see the results immediately.
Two new methods for generating real-time, vivid, animated tears have been developed and implemented. One method generates a teardrop that continually changes its shape as the tear drips down the face. The other generates a shedding tear, a kind of tear that seamlessly connects with the skin as it flows along the surface of the face but remains an individual object. Both methods broaden the range of effects available in CG and increase the realism of facial expressions.
A new method that automatically places the bones in facial/head models, speeding up the rigging of a human face, is also developed. To accomplish this, the vertices that describe the face/head, as well as the relationships between the parts of the face/head, are grouped. The average distance between pairs of vertices is used to place the head bones. To place the bones in the face at multiple densities, the mean value of the vertices in each group is computed. The time saved with this method is significant.
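Of the rigging steps described, only "the mean value of the vertices in a group" is concrete enough to sketch. Assuming the vertex groups have already been formed, the minimal version below places one face bone at each group's centroid. The group names and `place_face_bones` are hypothetical, and the head-bone rule based on pairwise vertex distances is not reproduced here because the abstract does not specify how those distances are turned into bone positions.

```python
import numpy as np

def place_face_bones(vertices, groups):
    """Place one bone at the mean position of each vertex group.
    `groups` maps a (hypothetical) bone name to the indices of its vertices."""
    v = np.asarray(vertices, dtype=float)
    return {name: v[np.asarray(idx, dtype=int)].mean(axis=0)
            for name, idx in groups.items()}
```

Denser bone placement in a region then amounts to splitting that region into more, smaller groups before taking centroids.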
A novel method to produce realistic expressions and animations by transferring an existing expression to a new facial model is developed. The approach is to transform the source model into the target model, which then has the same topology as the source model. The displacement vectors are then calculated, and each vertex in the source model is mapped to the target model. The spatial relationships of each mapped vertex are constrained.
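The transfer is described only at a high level. Assuming the fitting step has already produced a copy of the source mesh deformed to the target's shape (so vertex i corresponds to vertex i), the displacement step reduces to adding scaled per-vertex expression offsets. The bounding-box scale factor and `transfer_expression` are hypothetical, and the spatial-relationship constraints mentioned in the abstract are not modeled.

```python
import numpy as np

def bbox_diag(v):
    """Length of the axis-aligned bounding-box diagonal of a vertex array."""
    v = np.asarray(v, dtype=float)
    return float(np.linalg.norm(v.max(axis=0) - v.min(axis=0)))

def transfer_expression(src_neutral, src_expr, tgt_fitted, scale=None):
    """Add the source expression's per-vertex displacements to the fitted target.
    If `scale` is None, displacements are scaled by the ratio of bounding-box
    diagonals (a hypothetical choice to account for different face sizes)."""
    src_neutral = np.asarray(src_neutral, dtype=float)
    disp = np.asarray(src_expr, dtype=float) - src_neutral   # displacement vectors
    tgt = np.asarray(tgt_fitted, dtype=float)
    if scale is None:
        scale = bbox_diag(tgt) / bbox_diag(src_neutral)
    return tgt + scale * disp
```

A single global scale is the simplest option; per-region scaling or constraining neighboring displacements, as the abstract hints, would preserve local features better on faces with very different proportions.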