ID.8: Co-Creating Visual Stories with Generative AI
Storytelling is an integral part of human culture and significantly impacts
cognitive and socio-emotional development and connection. Despite the
importance of interactive visual storytelling, the process of creating such
content requires specialized skills and is labor-intensive. This paper
introduces ID.8, an open-source system designed for the co-creation of visual
stories with generative AI. We focus on enabling an inclusive storytelling
experience by simplifying the content creation process and allowing for
customization. Our user evaluation confirms a generally positive user
experience in domains such as enjoyment and exploration, while highlighting
areas for improvement, particularly in immersiveness, alignment, and
partnership between the user and the AI system. Overall, our findings indicate
promising possibilities for empowering people to create visual stories with
generative AI. This work contributes a novel content authoring system, ID.8,
and insights into the challenges and potential of using generative AI for
multimedia content creation.
Large-scale Text-to-Image Generation Models for Visual Artists' Creative Works
Large-scale Text-to-image Generation Models (LTGMs) (e.g., DALL-E),
self-supervised deep learning models trained on a huge dataset, have
demonstrated the capacity for generating high-quality open-domain images from
multi-modal input. Although they can produce anthropomorphized versions of objects and animals, combine unrelated concepts in plausible ways, and generate variations of any user-provided image, we observed that this rapid technological advancement has left many visual artists unsure how to leverage LTGMs more actively in their creative works. Our goal in this work is to understand how
visual artists would adopt LTGMs to support their creative works. To this end,
we conducted an interview study as well as a systematic literature review of 72
system/application papers for a thorough examination. A total of 28 visual artists covering 35 distinct visual art domains acknowledged LTGMs' versatile roles and high usability in supporting creative work: automating the creation process (i.e., automation), expanding their ideas (i.e., exploration), and facilitating or mediating communication (i.e., mediation). We conclude by providing four design guidelines that future researchers can refer to when building intelligent user interfaces with LTGMs.
Comment: 15 pages, 3 figures
A Portrait of Emotion: Empowering Self-Expression through AI-Generated Art
We investigated the potential and limitations of generative artificial
intelligence (AI) in reflecting the authors' cognitive processes through
creative expression. The focus is on the AI-generated artwork's ability to
understand human intent (alignment) and visually represent emotions based on
criteria such as creativity, aesthetics, novelty, amusement, and depth. Results
show a preference for images based on the descriptions of the authors' emotions
over the main events. We also found that images that overrepresent specific
elements or stereotypes negatively impact AI alignment. Our findings suggest
that AI could facilitate creativity and the self-expression of emotions. Our
research framework with generative AIs can help design AI-based interventions
in related fields (e.g., mental health education, therapy, and counseling).
Comment: Accepted at CogSci 2023
CLIP-CLOP: CLIP-Guided Collage and Photomontage
The unabated mystique of large-scale neural networks, such as the CLIP dual
image-and-text encoder, popularized automatically generated art. Increasingly sophisticated generators have enhanced the artworks' realism and visual appearance, and creative prompt engineering has enabled stylistic expression.
Guided by an artist-in-the-loop ideal, we design a gradient-based generator to
produce collages. It requires the human artist to curate libraries of image
patches and to describe (with prompts) the whole image composition, with the
option to manually adjust the patches' positions during generation, thereby
allowing humans to reclaim some control of the process and achieve greater
creative freedom. We explore the aesthetic potentials of high-resolution
collages, and provide an open-source Google Colab as an artistic tool.
Comment: 5 pages, 7 figures, published at the International Conference on Computational Creativity (ICCC) 2022 as Short Paper: Demo
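The core loop this abstract describes, optimizing patch placements against a CLIP score, can be sketched roughly as follows. This is a minimal illustration assuming OpenAI's clip package and a stand-in patch library; the actual CLIP-CLOP generator (with its curated patch libraries and manual adjustment hooks) is more elaborate.

```python
import torch
import torch.nn.functional as F
import clip  # pip install git+https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model = model.float()  # avoid fp16/fp32 mismatches during backprop

SIZE = 224  # ViT-B/32 input resolution
prompt = clip.tokenize(["a collage of a coastal town at sunset"]).to(device)
with torch.no_grad():
    text_feat = F.normalize(model.encode_text(prompt), dim=-1)

# Stand-in for the artist-curated patch library: RGBA tensors padded to the
# canvas size, content centered, transparent elsewhere.
patches = torch.rand(8, 4, SIZE, SIZE, device=device)
offsets = torch.zeros(8, 2, device=device, requires_grad=True)  # learnable x/y

CLIP_MEAN = torch.tensor([0.48145466, 0.4578275, 0.40821073],
                         device=device).view(3, 1, 1)
CLIP_STD = torch.tensor([0.26862954, 0.26130258, 0.27577711],
                        device=device).view(3, 1, 1)

def render(patches, offsets):
    """Differentiably alpha-composite each patch at its learnable offset."""
    canvas = torch.ones(3, SIZE, SIZE, device=device)  # white background
    for patch, (tx, ty) in zip(patches, offsets):
        one, zero = torch.ones_like(tx), torch.zeros_like(tx)
        theta = torch.stack([  # 2x3 affine matrix: pure translation
            torch.stack([one, zero, tx]),
            torch.stack([zero, one, ty]),
        ]).unsqueeze(0)
        grid = F.affine_grid(theta, (1, 4, SIZE, SIZE), align_corners=False)
        moved = F.grid_sample(patch.unsqueeze(0), grid, align_corners=False)[0]
        rgb, alpha = moved[:3], moved[3:4]
        canvas = alpha * rgb + (1 - alpha) * canvas
    return canvas

opt = torch.optim.Adam([offsets], lr=0.05)
for step in range(200):
    image = ((render(patches, offsets) - CLIP_MEAN) / CLIP_STD).unsqueeze(0)
    img_feat = F.normalize(model.encode_image(image), dim=-1)
    loss = -(img_feat * text_feat).sum()  # maximize image-text similarity
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Gradients flow only to the patch offsets, which is what lets a human pause the loop, nudge a patch by hand, and resume, as the artist-in-the-loop ideal above suggests.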
ReelFramer: Co-creating News Reels on Social Media with Generative AI
Short videos on social media are a prime way many young people find and
consume content. News outlets would like to reach audiences through news reels,
but currently struggle to translate traditional journalistic formats into the
short, entertaining videos that match the style of the platform. There are many
ways to frame a reel-style narrative around a news story, and selecting one is
a challenge. Different news stories call for different framings, and require a
different trade-off between entertainment and information. We present a system
called ReelFramer that uses text and image generation to help journalists
explore multiple narrative framings for a story, then generate scripts,
character boards, and storyboards they can edit and iterate on. A user study with five graduate students in journalism-related fields found that the system greatly eased the burden of transforming a written story into a reel, and that exploring framings to find the right one was a rewarding process.
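As a rough illustration of the pipeline this abstract outlines (explore framings, then generate a script and storyboard), consider the sketch below. The prompts, stage order, and callables are assumptions for illustration, not ReelFramer's actual prompts or API.

```python
# A hedged sketch of a framing -> script -> storyboard pipeline. `llm` and
# `image_model` are generic callables (prompt in, text/image out).
def explore_framings(llm, story: str, n: int = 3) -> list[str]:
    """Propose several reel-style narrative framings for one news story."""
    reply = llm(
        f"Suggest {n} narrative framings for a short social-media video "
        f"about this news story, one per line:\n{story}"
    )
    return [line.strip() for line in reply.splitlines() if line.strip()][:n]

def write_script(llm, story: str, framing: str) -> str:
    """Draft a reel script under the chosen framing, balancing
    entertainment and information."""
    return llm(
        f"Write a 30-second reel script for the story below using the "
        f"framing {framing!r}:\n{story}"
    )

def storyboard(image_model, script: str) -> list:
    """One generated frame per scene line; journalists edit and iterate."""
    scenes = [line for line in script.splitlines() if line.strip()]
    return [image_model(f"storyboard frame: {scene}") for scene in scenes]
```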
Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing
Creating music is iterative, requiring varied methods at each stage. However,
existing AI music systems fall short in orchestrating multiple subsystems for
diverse needs. To address this gap, we introduce Loop Copilot, a novel system
that enables users to generate and iteratively refine music through an
interactive, multi-round dialogue interface. The system uses a large language
model to interpret user intentions and select appropriate AI models for task
execution. Each backend model is specialized for a specific task, and their
outputs are aggregated to meet the user's requirements. To ensure musical
coherence, essential attributes are maintained in a centralized table. We
evaluate the effectiveness of the proposed system through semi-structured
interviews and questionnaires, highlighting its utility not only in
facilitating music creation but also its potential for broader applications.
Comment: Source code and demo video are available at https://sites.google.com/view/loop-copilot
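The orchestration pattern described above, an LLM routing each dialogue turn to a specialized backend while a central table keeps musical attributes coherent, might look roughly like this. Tool names, the routing prompt, and the attribute keys are illustrative assumptions, not Loop Copilot's actual interface.

```python
from dataclasses import dataclass, field

@dataclass
class SessionState:
    # Centralized attribute table: preserved across rounds for coherence
    attributes: dict = field(default_factory=lambda: {
        "key": "C major", "bpm": 120, "instrumentation": ["piano"]})
    audio: bytes = b""  # latest rendered loop

def generate_music(state: SessionState, request: str) -> SessionState:
    # Stand-in for a text-to-music backend
    state.audio = b"<generated loop>"
    return state

def change_tempo(state: SessionState, request: str) -> SessionState:
    # Stand-in for a tempo-editing backend; real parsing is the LLM's job
    state.attributes["bpm"] += 10
    return state

TOOLS = {"generate": generate_music, "tempo": change_tempo}

def handle_turn(llm, state: SessionState, request: str) -> SessionState:
    """One dialogue round: the LLM picks a backend, which updates the state."""
    prompt = (f"Pick one tool from {sorted(TOOLS)} for the request "
              f"{request!r}. Reply with the tool name only.")
    tool = llm(prompt).strip()
    return TOOLS.get(tool, generate_music)(state, request)

# Usage with a stand-in LLM that always routes to the tempo tool:
state = handle_turn(lambda p: "tempo", SessionState(), "make it a bit faster")
print(state.attributes["bpm"])  # 130
```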
Controllable music performance synthesis via hierarchical modelling
Musical expression requires control of both what notes are played, and how they are performed.
Conventional audio synthesizers provide detailed expressive controls, but at the cost of realism.
Black-box neural audio synthesis and concatenative samplers can produce realistic audio, but have few mechanisms for control.
In this work, we introduce MIDI-DDSP, a hierarchical model of musical instruments that enables both realistic neural audio synthesis and detailed user control.
Starting from interpretable Differentiable Digital Signal Processing (DDSP) synthesis parameters, we infer musical notes and high-level properties of their expressive performance (such as timbre, vibrato, dynamics, and articulation).
This creates a 3-level hierarchy (notes, performance, synthesis) that affords individuals the option to intervene at each level, or utilize trained priors (performance given notes, synthesis given performance) for creative assistance. Through quantitative experiments and listening tests, we demonstrate that this hierarchy can reconstruct high-fidelity audio, accurately predict performance attributes for a note sequence, independently manipulate the attributes of a given performance, and as a complete system, generate realistic audio from a novel note sequence.
By utilizing an interpretable hierarchy with multiple levels of granularity, MIDI-DDSP opens the door to assistive tools that empower individuals across a diverse range of musical experience.
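The three-level interface described above can be sketched schematically as below. The dataclasses and stand-in priors are illustrative only; the actual MIDI-DDSP models are neural networks operating on frame-level DDSP controls.

```python
from dataclasses import dataclass

@dataclass
class Note:
    pitch: int       # MIDI pitch number
    start: float     # onset time in seconds
    duration: float  # seconds

@dataclass
class Expression:
    # High-level per-note performance attributes (a simplified subset)
    volume: float
    vibrato: float
    articulation: float

def performance_prior(notes: list) -> list:
    """Stand-in for the learned prior p(performance | notes)."""
    return [Expression(volume=0.7, vibrato=0.2, articulation=0.5)
            for _ in notes]

def synthesis_prior(notes: list, expression: list) -> dict:
    """Stand-in for p(synthesis | performance): maps expression to
    frame-level DDSP controls that a DDSP synthesizer renders to audio."""
    return {"f0_hz": [], "amplitudes": [], "harmonic_distribution": [],
            "noise_magnitudes": []}

# Users can intervene at any level of the hierarchy:
notes = [Note(60, 0.0, 0.5), Note(64, 0.5, 0.5)]  # edit notes directly, or...
expression = performance_prior(notes)
expression[1].vibrato = 0.9  # ...override one note's expression, or...
controls = synthesis_prior(notes, expression)  # ...tweak low-level controls
```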
"It Felt Like Having a Second Mind": Investigating Human-AI Co-creativity in Prewriting with Large Language Models
Prewriting is the process of discovering and developing ideas before a first
draft, which requires divergent thinking and often involves unstructured strategies such as diagramming, outlining, and free-writing. Although large
language models (LLMs) have been demonstrated to be useful for a variety of
tasks including creative writing, little is known about how users would
collaborate with LLMs to support prewriting. The preferred collaborative role
and initiative of LLMs during such a creativity process is also unclear. To
investigate human-LLM collaboration patterns and dynamics during prewriting, we
conducted a three-session qualitative study with 15 participants in two
creative tasks: story writing and slogan writing. The findings indicated that
during collaborative prewriting, there appears to be a three-stage iterative
Human-AI Co-creativity process that includes Ideation, Illumination, and
Implementation stages. This collaborative process places the human in a dominant role, with mixed and shifting levels of initiative between humans and LLMs. This research also reports on collaboration
breakdowns that occur during this process, user perceptions of using existing
LLMs during Human-AI Co-creativity, and discusses design implications to
support this co-creativity process.
Comment: Under review at CSCW after a Major Revision