74 research outputs found
Latent Auto-recursive Composition Engine: A Generative System for Creative Expression in Human-AI Collaboration
This thesis investigates the shifting boundaries of art in the era of Generative AI, critically examining the essence of art and the legitimacy of AI-generated works. Despite significant advancements in the quality and accessibility of art through generative AI, such creations frequently encounter skepticism regarding their status as authentic art. To address this skepticism, the study explores the role of creative agency in various generative AI workflows and introduces an "artist-in-the-loop" system tailored for image generation models like Stable Diffusion. This system aims to deepen the artist's engagement and understanding of the creative process. Additionally, a novel tool, the Latent Auto-recursive Composition Engine (LACE), which integrates Photoshop and ControlNet with Stable Diffusion, is introduced to improve transparency and control. This approach not only broadens the scope of computational creativity but also enhances artists' ownership of AI-generated art, bridging the divide between AI-driven and traditional human artistry in the digital landscape.
Real-time image generation for interactive art : developing an artwork platform with ComfyUI and TouchDesigner
This thesis presents the development of an adaptable and reusable artwork platform model. This platform, called P2KTool, is targeted towards artistic exploration of generative AI tools with a specific focus on real-time interactivity. The platform model was designed as a project template for off-the-shelf, consumer-grade PC hardware and the software tools ComfyUI and TouchDesigner. These tools and the relevant image generation technology are evaluated in the context of integrating them into a real-time application.
“ANTIMAGIA,” a proof-of-concept interactive installation utilising this platform, is presented with a technical description of the physical setup of the piece, as well as an overview of its central theme: the perception of artifice and artificiality. Conceptual motivations and stylistic decisions regarding the presentation of generated imagery, as well as the role of interactivity and participation, are discussed.
Finally, audience reactions to the exhibited installation are assessed together with insights gained in the process of working with generative AI models. Markedly positive and humorous impressions, as well as general curiosity about the technical implementation of the installation, suggest that audiences are receptive to the interactive art form presented. In view of these findings, a proposal for the direction of continued development of P2KTool, future installations, and other potential applications is given.
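Integrating ComfyUI into a real-time application of this kind typically means driving it over its local HTTP API, which accepts workflows exported in "API format" (a JSON dict of node-id to node definition) on the `/prompt` endpoint. The sketch below is a minimal illustration of that handshake, not P2KTool itself; the two-node workflow fragment and checkpoint filename are hypothetical.

```python
import json
import urllib.request
import uuid

# ComfyUI serves an HTTP API (default port 8188); a workflow exported in
# "API format" is a dict of node-id -> {class_type, inputs}. The two-node
# graph below is a hypothetical fragment, not the P2KTool workflow.
COMFYUI_URL = "http://127.0.0.1:8188/prompt"  # assumed default address

def build_payload(workflow: dict, client_id: str) -> bytes:
    """Wrap an API-format workflow in the JSON body that /prompt expects."""
    return json.dumps({"prompt": workflow, "client_id": client_id}).encode()

def queue_prompt(workflow: dict) -> None:
    """POST the workflow to a locally running ComfyUI instance."""
    body = build_payload(workflow, uuid.uuid4().hex)
    req = urllib.request.Request(
        COMFYUI_URL, data=body, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req)  # the response carries the queued prompt_id

workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "sd15.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "a lantern in fog", "clip": ["1", 1]}},
}
payload = json.loads(build_payload(workflow, "demo"))
print(sorted(payload))  # → ['client_id', 'prompt']
```

A front end like TouchDesigner would call `queue_prompt` each time interaction parameters change and listen on ComfyUI's WebSocket for the resulting images.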
Generative AI for 2D Character Animation
In this pilot project, we teamed up with artists to develop new workflows for
2D animation while producing a short educational cartoon. We identified several
workflows to streamline the animation process, bringing the artists' vision to
the screen more effectively.
Comment: Project page: https://genai-2d-character-animation.github.io
Leveraging prompt engineering to generate enhanced facial image morphs: An analysis of generation and attack potential
Facial Recognition Systems (FRS) are integral to security, yet face significant threats from morphing attacks, where a single image is verifiable for multiple individuals. The development and rigorous evaluation of Morphing Attack Detection (MAD) systems are often hampered by the limited visual quality and realism of existing morphed image datasets, which are frequently low-resolution. This thesis addresses this critical deficiency by developing, implementing, and critically analyzing a novel multi-stage image enhancement pipeline. The pipeline leverages advanced generative models and sophisticated prompt engineering strategies to systematically improve the visual quality of existing low-resolution facial morphs and subsequently assesses the impact of these enhancements on their attack potential against contemporary FRS.
The enhancement pipeline, constructed within the ComfyUI framework, systematically processes baseline 256x256 pixel MorDiff images. Phase 1 focuses on foundational enhancements, including high-resolution upscaling (utilizing FLUX.1-dev with ControlNet), meticulous facial feature refinement (employing SAM and YOLOv8), and comprehensive post-processing. Phase 2 explores advanced realism generation through two distinct sub-pipelines: PuLID for high-fidelity, identity-preserving contextual portraits (guided by Florence-2 enriched prompts) and ACE++ for sophisticated contextual inpainting. The efficacy of each enhancement stage is rigorously evaluated using Fréchet Inception Distance (FID) for visual realism, ElasticFace similarity scores for FRS-perceived identity, and Morph Attack Potential (MAP) analysis to quantify attack success.
The findings reveal a compelling, and often counter-intuitive, "paradox of enhanced realism." While foundational enhancements, particularly face detailing, improved FID scores relative to initial upscaling, they progressively decreased FRS-perceived identity signals and MAP scores. More strikingly, the advanced realism techniques (PuLID and ACE++), despite aiming for heightened perceptual realism, resulted in significantly worse FID scores (when compared against the original bonafide dataset) and led to a near-total annihilation of morph attack potential, with MAP scores approaching zero, contrary to the expectation that enhanced realism would yield stronger attacks. This indicates that extensive visual enhancements, especially those introducing new contexts or styles via prompt engineering, can fundamentally alter biometric cues in ways that render morphs ineffective against current FRS.
This research contributes a systematic methodology for enhancing morphed images, provides novel insights into the nuanced role of prompt engineering in this domain, and presents strong empirical evidence challenging the assumption that increased visual realism directly translates to greater morphing attack strength. The observed reduction in FRS recognizability with advanced techniques also tentatively suggests an emergent potential for such pipelines in privacy-enhancing image transformations. These findings have significant implications for the future development of both morph generation techniques and more robust MAD systems.
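The Fréchet Inception Distance used for the visual-realism evaluation compares two Gaussians fitted to feature statistics: FID = ||μ₁ − μ₂||² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^(1/2)). The sketch below is a simplified illustration assuming diagonal covariances (so the matrix square root becomes elementwise), not the thesis's actual Inception-feature computation.

```python
import math

def fid_diagonal(mu1, var1, mu2, var2):
    """Frechet distance between two Gaussians with diagonal covariances.

    Simplification of FID = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2(S1 S2)^(1/2)):
    with diagonal covariances the matrix square root is elementwise, so
    the trace term reduces to sum((sqrt(v1) - sqrt(v2))^2).
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    cov_term = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(var1, var2))
    return mean_term + cov_term

# Identical distributions score 0; the score grows with mean/variance gaps.
print(fid_diagonal([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0]))  # → 0.0
print(fid_diagonal([0.0, 0.0], [1.0, 1.0], [1.0, 1.0], [4.0, 4.0]))  # → 4.0
```

Lower is better; this is why "significantly worse FID scores" above means the enhanced images drifted further from the bona fide dataset's feature distribution.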
Speculative Co-design With AI: An Artist-Friendly Prototype for Non-Human Avatar Creation
This technical demonstration responds to the growing importance of virtual identity. It introduces an artist-friendly, no-code prototype designed specifically to transparently integrate speculative AI processes into artistic co-design methods focused on virtual identity exploration. Artists and designers often struggle to effectively integrate AI into their creative practice because packaged commercial AI tools rarely accommodate their unique visual language and generally obscure creative control through black-box interfaces. These tools typically lack transparency for advanced users and are inconvenient when managing diverse creative inputs, especially in participatory co-design contexts involving multiple contributors. Addressing these challenges, this demonstration showcases a prototype developed using ComfyUI, a visual, node-based interface for building generative AI workflows without coding, combined with custom-trained LoRA models to ensure visual consistency and personal artistic expression in participant-driven avatar generation. Grounded in literature emphasising AI's potential to foster imaginative engagement through transparency and creative flexibility, this transparent AI tool supports co-design activities, encouraging a wider community of creative practitioners to confidently experiment with speculative AI-driven co-design.
Prompting Techniques for Screenplay Generation with Large Language Models
Storytelling has always held central cultural and social importance throughout human history. Today, it is a fundamental skill in many domains that extend far beyond the entertainment industry. The ability to craft a cohesive, detailed, novel, and attention-grabbing narrative is a challenging task that can be considered a proxy for intelligence. It is no surprise, then, that there have been various approaches to building storytelling systems that leverage components of Artificial Intelligence and Machine Learning. This thesis will present a systematic review of these past efforts, focusing on a specific type of storytelling format: film screenwriting, with a particular emphasis on systems based on Large Language Models.
The second part of this work will introduce several new approaches to the screenwriting task, either currently under study or entirely novel within the research landscape. Additionally, this thesis will present "Kinotron," an extension for the popular open-source AI software ComfyUI, featuring a node-based system designed specifically for film script generation using Large Language Models.
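A node-based system for script generation can be pictured as a small dependency graph of prompt-producing nodes, each one formatting a prompt from its parents' outputs before passing the result downstream. The sketch below is a hypothetical illustration of that pattern; the `Node` class and the stubbed `call_llm` function are inventions for this example, not Kinotron's actual API.

```python
# Hypothetical sketch of a node-based LLM pipeline for screenwriting.
# `call_llm` is a stand-in for a real model call; here it just echoes
# the prompt so the wiring itself can be demonstrated.

def call_llm(prompt: str) -> str:
    return f"<llm output for: {prompt}>"

class Node:
    def __init__(self, name, template, parents=()):
        self.name = name          # node label in the graph
        self.template = template  # prompt template with {slots}
        self.parents = parents    # upstream nodes feeding this one

    def run(self, cache):
        # Evaluate parents first (simple recursive topological order),
        # then fill this node's template with their outputs.
        if self.name in cache:
            return cache[self.name]
        inputs = {p.name: p.run(cache) for p in self.parents}
        cache[self.name] = call_llm(self.template.format(**inputs))
        return cache[self.name]

logline = Node("logline", "Write a one-line premise about a lighthouse keeper.")
beats = Node("beats", "Expand into story beats: {logline}", parents=(logline,))
scene = Node("scene", "Write scene 1 from these beats: {beats}", parents=(beats,))

result = scene.run({})
print(result.startswith("<llm output for: Write scene 1"))  # → True
```

The cache doubles as a memo table, so a node shared by several downstream branches (a character sheet feeding every scene, say) is only generated once.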
Generative media: Sign, metaphor, and experience
This article explores the vivid field of generative media, focusing on the production and semiotic analysis of texts. It uses the broad definition of “text,” which encompasses written, visual, and interactive forms, and examines how generative media redefines the roles of content creators and tools. Utilizing Roman Jakobson’s communication model, the article highlights the dynamic decision-making process in text production, whether by human or AI. The paper offers a historical review which traces generative media from early computer art in the 1960s, through the advent of digital design tools in the 1980s and 1990s, to contemporary AI techniques like GANs and diffusion models. It identifies the key properties of generative media: synthetic, dynamic, digital, combinatorial, and agentic. The discussion also addresses the shift from unnoticed AI assistance in early tools to the inadvertently AI-generated content in today’s media landscape. The last part of the paper categorizes generative media interfaces into three types: conversational, web UI, and visual programming interfaces, analyzing their semiotic implications. It suggests that understanding these tools requires recognizing their layered technical components and the evolving user experience from magic-like simplicity to complex customization. In conclusion, the article advocates for a multidisciplinary, process-oriented approach to fully grasp the cultural and communicative transformations driven by generative media, emphasizing the importance of transparency and user agency in AI interactions.
Usability of a generative AI application and the development of user competence
The goal of this research-based thesis was to identify the usage challenges that users encounter, particularly with generative AI models that produce media, and to map out solution models for these challenges. The study was carried out for the commissioning company Kimara.ai, and its purpose was to provide reliable and up-to-date information to support service development and user interface design, with particular attention to the challenges faced by new users.
The theoretical background of the thesis was divided into two main chapters. The first chapter examined the fundamentals of generative AI, the operating principles of the ComfyUI system, and the business opportunities offered by generative AI. To form an overall picture of the research topic, the second chapter addressed the principles of website usability and business requirements.
The research method was a narrative literature review, conducted in early 2025. The research material was collected systematically from digital databases such as Google Scholar, Haaga-Helia's Finna, and field-specific databases. The search process focused on the usage challenges of generative AI as well as on usability and user interface design. The material was analyzed using thematic classification and concept mapping, which made it possible to structure the topics and perceive them as larger wholes. The study was limited to the end user's experience of the service, excluding accessibility, visual aesthetics, and technical implementation from consideration.
The literature review identified five key usage challenges: the complexity of the user interface, difficulties in formulating text prompts, the unpredictability of generative AI, challenges in developing the user's mental model, and the demands placed on technical expertise. Solution models identified for these challenges were the application of user-centered design principles, the development of interactive learning mechanisms, and increasing the user's control over the generation process.
The results of the study can be utilized in different phases of Kimara.ai's development, such as requirements specification, user interface design, and the creation of testing criteria. Concrete development proposals included developing consistent terminology and user interface elements, support mechanisms for formulating text prompts, visualization of the AI's generation stages, and personalization of the user interface for different user groups.
Jigsaw: Supporting Designers to Prototype Multimodal Applications by Chaining AI Foundation Models
Recent advancements in AI foundation models have made it possible for them to
be utilized off-the-shelf for creative tasks, including ideating design
concepts or generating visual prototypes. However, integrating these models
into the creative process can be challenging as they often exist as standalone
applications tailored to specific tasks. To address this challenge, we
introduce Jigsaw, a prototype system that employs puzzle pieces as metaphors to
represent foundation models. Jigsaw allows designers to combine different
foundation model capabilities across various modalities by assembling
compatible puzzle pieces. To inform the design of Jigsaw, we interviewed ten
designers and distilled design goals. In a user study, we showed that Jigsaw
enhanced designers' understanding of available foundation model capabilities,
provided guidance on combining capabilities across different modalities and
tasks, and served as a canvas to support design exploration, prototyping, and
documentation.
Comment: https://jigsaw.t
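The puzzle-piece metaphor amounts to type-checking chains of models by modality: two pieces fit only when one's output modality matches the next one's input modality. A minimal sketch of that compatibility rule follows; the piece names and modalities are illustrative, not Jigsaw's actual catalogue.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Piece:
    """A foundation-model capability rendered as a puzzle piece."""
    name: str
    input_modality: str   # what the model consumes
    output_modality: str  # what the model produces

def fits(a: Piece, b: Piece) -> bool:
    """Two pieces are compatible when a's output feeds b's input."""
    return a.output_modality == b.input_modality

def valid_chain(pieces: list) -> bool:
    """A chain is assemblable when every adjacent pair of pieces fits."""
    return all(fits(x, y) for x, y in zip(pieces, pieces[1:]))

ideate = Piece("idea generator", "text", "text")
t2i = Piece("text-to-image model", "text", "image")
caption = Piece("image captioner", "image", "text")

print(valid_chain([ideate, t2i, caption]))  # → True
print(valid_chain([t2i, t2i]))              # → False
```

Encoding compatibility in the pieces' shapes this way lets the interface rule out invalid cross-modal chains before any model is ever invoked.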
