Mosaic: Designing Online Creative Communities for Sharing Works-in-Progress
Online creative communities allow creators to share their work with a large
audience, maximizing opportunities to showcase their work and connect with fans
and peers. However, sharing in-progress work can be technically and socially
challenging in environments designed for sharing completed pieces. We propose
an online creative community where sharing process, rather than showcasing
outcomes, is the main method of sharing creative work. Based on this, we
present Mosaic---an online community where illustrators share work-in-progress
snapshots showing how an artwork was completed from start to finish. In an
online deployment and observational study, artists used Mosaic as a vehicle for
reflecting on how they can improve their own creative process, developed a
social norm of detailed feedback, and became less apprehensive of sharing early
versions of artwork. Through Mosaic, we argue that communities oriented around
sharing creative process can create a collaborative environment that is
beneficial for creative growth.
Searching the Visual Style and Structure of D3 Visualizations
We present a search engine for D3 visualizations that allows queries based on
their visual style and underlying structure. To build the engine we crawl a
collection of 7860 D3 visualizations from the Web and deconstruct each one to
recover its data, its data-encoding marks and the encodings describing how the
data is mapped to visual attributes of the marks. We also extract axes and
other non-data-encoding attributes of marks (e.g., typeface, background color).
Our search engine indexes this style and structure information as well as
metadata about the webpage containing the chart. We show how visualization
developers can search the collection to find visualizations that exhibit
specific design characteristics and thereby explore the space of possible
designs. We also demonstrate how researchers can use the search engine to
identify commonly used visual design patterns and we perform such a demographic
design analysis across our collection of D3 charts. A user study reveals that
visualization developers found our style and structure based search engine to
be significantly more useful and satisfying for finding different designs of D3
charts than a baseline search engine that only allows keyword search over the
webpage containing a chart.
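The style-and-structure index described above can be caricatured in a few lines of Python. This is an illustrative sketch only: the real system deconstructs live D3 pages to recover data, marks, and encodings, whereas the feature names and chart records below are hypothetical.

```python
# Hypothetical toy index of recovered chart features. In the paper, these
# features come from deconstructing crawled D3 visualizations; here they
# are hand-written stand-ins.
charts = [
    {"id": "c1", "mark": "rect",   "background": "#ffffff", "typeface": "Helvetica"},
    {"id": "c2", "mark": "circle", "background": "#000000", "typeface": "Georgia"},
    {"id": "c3", "mark": "rect",   "background": "#000000", "typeface": "Helvetica"},
]

def search(index, **query):
    """Return ids of charts whose indexed features match every query key."""
    return [c["id"] for c in index
            if all(c.get(k) == v for k, v in query.items())]

# e.g. "bar-chart-like marks on a dark background"
matches = search(charts, mark="rect", background="#000000")
```

Querying on structural features (mark type) and stylistic features (background color, typeface) rather than page keywords is what lets such an engine surface charts with a specific visual design.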
Adding Conditional Control to Text-to-Image Diffusion Models
We present ControlNet, a neural network architecture to add spatial
conditioning controls to large, pretrained text-to-image diffusion models.
ControlNet locks the production-ready large diffusion models, and reuses their
deep and robust encoding layers pretrained with billions of images as a strong
backbone to learn a diverse set of conditional controls. The neural
architecture is connected with "zero convolutions" (zero-initialized
convolution layers) that progressively grow the parameters from zero and ensure
that no harmful noise can affect the finetuning. We test various conditioning
controls (e.g., edges, depth, segmentation, human pose) with Stable
Diffusion, using single or multiple conditions, with or without prompts. We
show that the training of ControlNets is robust with small (<50k) and large
(>1M) datasets. Extensive results show that ControlNet may facilitate wider
applications to control image diffusion models.
Comment: Code and supplementary material: https://github.com/lllyasviel/ControlNe
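The zero-convolution idea can be illustrated with a minimal sketch in plain Python. This is a hypothetical toy version operating on single feature vectors; the actual ControlNet uses zero-initialized convolution layers inside a deep network. The point it demonstrates: a branch whose weights start at exactly zero contributes nothing at initialization, so the frozen backbone's output is untouched before finetuning begins.

```python
# Toy "zero convolution": a 1x1 convolution whose weights and bias
# start at exactly zero, so the control branch is initially a no-op.

def zero_conv_init(n_in, n_out):
    """Weights and bias initialized to exactly zero."""
    return [[0.0] * n_in for _ in range(n_out)], [0.0] * n_out

def conv1x1(x, weights, bias):
    """Apply a 1x1 convolution to a single feature vector."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def controlled_block(backbone_out, control_feat, weights, bias):
    """Frozen backbone output plus the zero-conv'd control branch."""
    delta = conv1x1(control_feat, weights, bias)
    return [y + d for y, d in zip(backbone_out, delta)]

w, b = zero_conv_init(n_in=3, n_out=3)
# At initialization the control branch adds exactly zero, regardless of
# the conditioning features, so pretrained behavior is preserved.
y = controlled_block([0.5, -1.0, 2.0], [9.0, 9.0, 9.0], w, b)
```

As training updates the weights away from zero, the control signal grows in gradually, which is why the paper describes the parameters as "progressively" growing from zero.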
Efficient Shadows from Sampled Environment Maps
This paper addresses the problem of efficiently calculating shadows from environment maps. Since accurate rendering of shadows from environment maps requires hundreds of lights, the expensive computation is determining visibility from each pixel to each light direction, such as by ray-tracing. We show that coherence in both spatial and angular domains can be used to reduce the number of shadow rays that need to be traced. Specifically, we use a coarse-to-fine evaluation of the image, predicting visibility by reusing visibility calculations from four nearby pixels that have already been evaluated. This simple method allows us to explicitly mark regions of uncertainty in the prediction. By only tracing rays in these and neighboring directions, we are able to reduce the number of shadow rays traced by up to a factor of 20 while maintaining error rates below 0.01%. For many scenes, our algorithm can add shadowing from hundreds of lights at twice the cost of rendering without shadows.
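The coherence idea above can be sketched as follows. This is a simplified illustration, not the authors' renderer: `trace_ray` stands in for an actual ray tracer, and each "neighbor" is just a per-light visibility bitmask from an already-evaluated pixel.

```python
# Sketch: predict a pixel's visibility to each light from four nearby
# evaluated pixels, and trace a shadow ray only for lights where the
# neighbors disagree (the "uncertain" directions).

def predict_visibility(neighbors):
    """neighbors: four visibility lists (one bool per light).
    Returns (prediction, uncertain) per light."""
    n_lights = len(neighbors[0])
    prediction, uncertain = [], []
    for i in range(n_lights):
        votes = [nb[i] for nb in neighbors]
        agree = all(votes) or not any(votes)
        prediction.append(votes.count(True) >= 2)  # majority vote
        uncertain.append(not agree)
    return prediction, uncertain

def shade_pixel(neighbors, trace_ray):
    """Reuse neighbor visibility; call trace_ray(light) only when uncertain."""
    prediction, uncertain = predict_visibility(neighbors)
    return [trace_ray(i) if u else p
            for i, (p, u) in enumerate(zip(prediction, uncertain))]
```

When all four neighbors agree on a light direction, the prediction is trusted and no ray is traced; shadow rays are spent only on the uncertain directions, which is the source of the reported ray-count savings.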
Tree-Structured Shading Decomposition
We study inferring a tree-structured representation from a single image for
object shading. Prior work typically uses the parametric or measured
representation to model shading, which is neither interpretable nor easily
editable. We propose using the shade tree representation, which combines basic
shading nodes and compositing methods to factorize object surface shading. The
shade tree representation enables novice users who are unfamiliar with the
physical shading process to edit object shading in an efficient and intuitive
manner. A main challenge in inferring the shade tree is that the inference
problem involves both the discrete tree structure and the continuous parameters
of the tree nodes. We propose a hybrid approach to address this issue. We
introduce an auto-regressive inference model to generate a rough estimation of
the tree structure and node parameters, and then we fine-tune the inferred
shade tree through an optimization algorithm. We show experiments on synthetic
images, captured reflectance, real images, and non-realistic vector drawings,
allowing downstream applications such as material editing, vectorized shading,
and relighting. Project website: https://chen-geng.com/inv-shade-trees
Comment: Accepted at ICCV 2023.
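A toy version of the shade tree representation helps make the abstract concrete. This is a hedged illustration: the node kinds and compositing operators below are hypothetical simplifications of the paper's shading nodes, reduced to scalars for clarity.

```python
# Toy shade tree: leaves are basic shading components, interior nodes
# composite their children. Editing one node changes the final shading
# in an interpretable way.

def evaluate(node):
    """Recursively evaluate a shade tree to a scalar shading value."""
    kind = node["kind"]
    if kind == "const":                 # leaf, e.g. an albedo or highlight value
        return node["value"]
    children = [evaluate(c) for c in node["children"]]
    if kind == "multiply":              # modulate, e.g. albedo * diffuse
        out = 1.0
        for c in children:
            out *= c
        return out
    if kind == "add":                   # composite, e.g. adding a specular term
        return sum(children)
    raise ValueError(f"unknown node kind: {kind}")

# albedo * diffuse + highlight, editable node-by-node
tree = {"kind": "add", "children": [
    {"kind": "multiply", "children": [
        {"kind": "const", "value": 0.8},   # albedo
        {"kind": "const", "value": 0.5},   # diffuse shading
    ]},
    {"kind": "const", "value": 0.1},       # specular highlight
]}
```

The inference problem the paper tackles is the inverse of `evaluate`: given only the final shaded image, recover both the discrete tree structure and the continuous leaf values.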
Bridging the Gulf of Envisioning: Cognitive Design Challenges in LLM Interfaces
Large language models (LLMs) exhibit dynamic capabilities and appear to
comprehend complex and ambiguous natural language prompts. However, calibrating
LLM interactions is challenging for interface designers and end-users alike. A
central issue is our limited grasp of how human cognitive processes begin with
a goal and form intentions for executing actions, a blindspot even in
established interaction models such as Norman's gulfs of execution and
evaluation. To address this gap, we theorize how end-users 'envision'
translating their goals into clear intentions and craft prompts to obtain the
desired LLM response. We define a process of Envisioning by highlighting three
misalignments: (1) whether LLMs can accomplish the task, (2) how to
instruct the LLM to do the task, and (3) how to evaluate the success of the
LLM's output in meeting the goal. Finally, we make recommendations to narrow
the envisioning gulf in human-LLM interactions.
De-emphasis of distracting image regions using texture power maps
We present a post-processing technique that selectively reduces the salience of distracting regions in an image. Computational models of attention predict that texture variation influences bottom-up attention mechanisms. Our method reduces the spatial variation of texture using power maps, high-order features describing local frequency content in an image. Modification of power maps results in effective regional de-emphasis. We validate our results quantitatively via a human subject search experiment and qualitatively with eye tracking data.
Singapore-MIT Alliance (SMA)
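The de-emphasis idea can be caricatured in one dimension. This is a hypothetical sketch under strong simplifications: the paper modifies 2D power maps of local frequency content, whereas the code below treats local deviation from a windowed mean as a stand-in for texture "power" and attenuates it inside a marked region.

```python
# Toy 1D de-emphasis: separate a signal into a smooth local mean and a
# local detail ("texture") component, then scale the detail down only
# inside the distractor region.

def local_mean(signal, i, radius=1):
    """Mean of the samples within `radius` of index i."""
    lo, hi = max(0, i - radius), min(len(signal), i + radius + 1)
    window = signal[lo:hi]
    return sum(window) / len(window)

def de_emphasize(signal, region, strength=0.5):
    """Attenuate the high-frequency component inside `region` (a set of indices)."""
    out = []
    for i, x in enumerate(signal):
        m = local_mean(signal, i)
        texture = x - m                      # local detail component
        keep = strength if i in region else 1.0
        out.append(m + keep * texture)
    return out
```

Outside the marked region the signal is reconstructed exactly (mean plus full detail); inside it, the detail is damped, which lowers texture contrast and, per the attention models cited in the abstract, the region's bottom-up salience.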