StrokeGAN: Reducing Mode Collapse in Chinese Font Generation via Stroke Encoding
The generation of stylish Chinese fonts is an important problem in many applications. Most existing generation methods are based on deep generative models, particularly generative adversarial network (GAN) based models. However, these deep generative models may suffer from mode collapse, which significantly degrades the diversity and quality of the generated results. In this paper, we introduce a one-bit stroke encoding to capture the key mode information of Chinese characters and then incorporate it into CycleGAN, a popular deep generative model for Chinese font generation. As a result, we propose an efficient method called StrokeGAN, mainly motivated by the observation that the stroke encoding contains a large amount of mode information about Chinese characters. To reconstruct the one-bit stroke encoding of the associated generated characters, we introduce a stroke-encoding reconstruction loss imposed on the discriminator. Equipped with this one-bit stroke encoding and stroke-encoding reconstruction loss, CycleGAN's mode collapse can be significantly alleviated, with improved preservation of strokes and diversity of the generated characters. The effectiveness of StrokeGAN is demonstrated on a series of generation tasks over nine datasets with different fonts. The numerical results show that StrokeGAN generally outperforms state-of-the-art methods in terms of content and recognition accuracy, as well as stroke error, and also generates more realistic characters.
Comment: 10 pages; our code and data are available at:
https://github.com/JinshanZeng/StrokeGA
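A minimal sketch of the two ingredients named in the abstract, under stated assumptions: each bit records only whether a stroke type occurs in a character (not how often), and the discriminator's stroke-encoding head outputs one probability per bit, scored with binary cross-entropy against the encoding of the source character. The stroke inventory `STROKE_TYPES`, the helper names, and the choice of binary cross-entropy are illustrative assumptions, not the paper's definitions.

```python
import math
from typing import Iterable, List, Sequence

# Hypothetical stroke inventory for illustration only; the paper's actual
# stroke categories (and how many there are) are not reproduced here.
STROKE_TYPES = ["heng", "shu", "pie", "na", "dian", "ti", "zhe", "gou"]


def stroke_encoding(strokes: Iterable[str],
                    vocab: Sequence[str] = STROKE_TYPES) -> List[int]:
    """One bit per stroke type: 1 if the character contains that stroke at all."""
    present = set(strokes)
    unknown = present - set(vocab)
    if unknown:
        raise ValueError(f"unknown stroke types: {sorted(unknown)}")
    return [1 if s in present else 0 for s in vocab]


def stroke_reconstruction_loss(target: Sequence[int],
                               predicted: Sequence[float],
                               eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between the target encoding and the
    discriminator's per-bit probabilities (assumed to be sigmoid outputs)."""
    if len(target) != len(predicted):
        raise ValueError("target and predicted must have the same length")
    total = 0.0
    for t, p in zip(target, predicted):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(target)
```

For example, `stroke_encoding(["heng", "shu", "pie", "na"])` gives `[1, 1, 1, 1, 0, 0, 0, 0]`; repeated strokes set the same bit once, which is what makes the code "one-bit" per stroke type.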
A Comprehensive Survey of Natural Language Generation Advances from the Perspective of Digital Deception
In recent years, there has been substantial growth in the capabilities of systems designed to generate text that mimics the fluency and coherence of human language. This has prompted considerable research into the potential uses of these natural language generators (NLG) for a wide range of tasks. The increasing ability of powerful text generators to mimic human writing convincingly raises the potential for deception and other forms of dangerous misuse. As these systems improve, and it becomes ever harder to distinguish between human-written and machine-generated text, malicious actors could leverage powerful NLG systems for a wide variety of ends, including the creation of fake news and misinformation, the generation of fake online product reviews, or the use of chatbots to convince users to divulge private information. In this paper, we provide an overview of the NLG field by identifying and examining 119 survey-like papers focused on NLG research. From these papers, we outline a high-level taxonomy of the central concepts that constitute NLG, including the methods used to develop generalised NLG systems, the means by which these systems are evaluated, and the popular NLG tasks and subtasks that exist. In turn, we provide an overview and discussion of each of these items with respect to current research, and examine the potential roles of NLG in deception and in detection systems that counteract these threats. Moreover, we discuss the broader challenges of NLG, including the risks of bias often exhibited by existing text generation systems. This work offers a broad overview of the field of NLG with respect to its potential for misuse, aiming to provide a high-level understanding of this rapidly developing area of research
Automatic Image Captioning with Style
This thesis connects two core topics in machine learning, vision
and language. The problem of choice is image caption generation:
automatically constructing natural language descriptions of image
content. Previous research into image caption generation has
focused on generating purely descriptive captions; I focus on
generating visually relevant captions with a distinct linguistic
style. Captions with style have the potential to ease
communication and add a new layer of personalisation.
First, I consider naming variations in image captions, and
propose a method for predicting context-dependent names that
takes into account visual and linguistic information. This method
makes use of a large-scale image caption dataset, which I also
use to explore and report naming conventions
for hundreds of animal classes. Next, I propose the SentiCap
model, which relies on recent advances in artificial neural
networks to generate visually relevant image captions with
positive or negative sentiment. To balance descriptiveness and
sentiment, the SentiCap model dynamically switches between two
recurrent neural networks, one tuned for descriptive words and
one for sentiment words. As the first published model for
generating captions with sentiment, SentiCap has influenced a
number of subsequent works. I then investigate the sub-task of
modelling styled sentences without images. The specific task
chosen is sentence simplification: rewriting news article
sentences to make them easier to understand.
For sentence simplification, I design a neural sequence-to-sequence
model that can work with limited training data, using novel
adaptations for word copying and sharing word embeddings. Finally,
I present SemStyle, a system for generating visually relevant
image captions in the style of an arbitrary text corpus. A shared
term space allows a neural network for vision and content planning
to communicate with a network for styled language generation.
SemStyle achieves competitive results in human and automatic
evaluations of descriptiveness and style.
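The word-copying adaptation mentioned above is not specified in this abstract. As a hedged illustration, the sketch below uses the standard pointer-generator mixture as a stand-in: a generation probability `p_gen` blends the decoder's vocabulary distribution with the attention weights over source tokens, so rare words such as names can be copied straight from the input sentence. All names here are hypothetical, and the thesis's own adaptations may differ.

```python
from typing import Dict, Mapping, Sequence


def copy_mixture(p_vocab: Mapping[str, float],
                 attention: Sequence[float],
                 source_tokens: Sequence[str],
                 p_gen: float) -> Dict[str, float]:
    """Pointer-generator style output distribution (illustrative stand-in):
    p(w) = p_gen * p_vocab(w) + (1 - p_gen) * sum of attention on source copies of w."""
    if len(attention) != len(source_tokens):
        raise ValueError("one attention weight per source token is required")
    if not 0.0 <= p_gen <= 1.0:
        raise ValueError("p_gen must lie in [0, 1]")
    # Generation part: scale the vocabulary distribution.
    mixed = {w: p_gen * p for w, p in p_vocab.items()}
    # Copy part: route attention mass to the tokens that appear in the source,
    # including tokens that are absent from the output vocabulary.
    for weight, token in zip(attention, source_tokens):
        mixed[token] = mixed.get(token, 0.0) + (1.0 - p_gen) * weight
    return mixed
```

Because both parts are proper distributions, the mixture still sums to one, and an out-of-vocabulary source word such as `"Zanzibar"` receives probability only through the copy term.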
As a whole, this thesis presents two complete systems for styled
caption generation that are the first of their kind and demonstrate,
for the first time, that automatic style transfer for image
captions is achievable. Contributions also include novel ideas
for object naming and sentence simplification. This thesis opens
up inquiries into highly personalised image captions; large scale
visually grounded concept naming; and more generally, styled text
generation with content control
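The SentiCap switch between a descriptive and a sentiment network, described above, can be pictured, under the assumption that it acts as a soft gate, as a convex combination of the two networks' next-word distributions. In this sketch the gate value `gamma_sent` is passed in directly, whereas a real model would predict it at every time step from the recurrent states; the recurrent networks themselves are omitted and the function name is hypothetical.

```python
from typing import Dict, Mapping


def switch_mixture(p_desc: Mapping[str, float],
                   p_sent: Mapping[str, float],
                   gamma_sent: float) -> Dict[str, float]:
    """Soft switch between two next-word distributions:
    p(w) = (1 - gamma) * p_desc(w) + gamma * p_sent(w)."""
    if not 0.0 <= gamma_sent <= 1.0:
        raise ValueError("gamma_sent must lie in [0, 1]")
    vocab = set(p_desc) | set(p_sent)
    return {
        w: (1.0 - gamma_sent) * p_desc.get(w, 0.0) + gamma_sent * p_sent.get(w, 0.0)
        for w in vocab
    }
```

A small gate keeps the caption mostly descriptive; raising it shifts probability mass toward sentiment words such as "happy", which is one way to read the balance between descriptiveness and sentiment.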
Deep Visual Instruments: Realtime Continuous, Meaningful Human Control over Deep Neural Networks for Creative Expression
In this thesis, we investigate Deep Learning models as an artistic medium for new modes of performative, creative expression. We call these Deep Visual Instruments: realtime interactive generative systems that exploit and leverage the capabilities of state-of-the-art Deep Neural Networks (DNN), while allowing Meaningful Human Control, in a Realtime Continuous manner. We characterise Meaningful Human Control in terms of intent, predictability, and accountability; and Realtime Continuous Control with regard to its capacity for performative interaction with immediate feedback, enhancing goal-less exploration. The capability of DNNs that we seek to exploit and leverage in this manner is their ability to learn hierarchical representations that model highly complex, real-world data such as images. Thinking of DNNs as tools that extract useful information from massive amounts of Big Data, we investigate ways to navigate and explore what useful information a DNN has learnt, and how such a model can be used meaningfully in the production of artistic and creative works, in a performative, expressive manner. We present five studies that approach this from different but complementary angles: a collaborative, generative sketching application using MCTS and discriminative CNNs; a system to gesturally conduct the realtime generation of text in different styles using an ensemble of LSTM RNNs; a performative tool that allows hyperparameters to be manipulated in realtime while a Convolutional VAE trains on a live camera feed; live video-processing software that allows for digital puppetry and augmented drawing; and a method that allows for long-form storytelling within a generative model's latent space with meaningful control over the narrative.
We frame our research using the realtime, performative expression of musical instruments as a metaphor: we think of these systems not as used by a user, but as played by a performer
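One common way to move through a generative model's latent space, and a plausible building block for the latent-space storytelling study, is to interpolate along a path through a sequence of keyframe latent vectors. The sketch below uses spherical linear interpolation (slerp) between consecutive keyframes; the keyframe scheme and the function names are assumptions for illustration, and decoding each vector with a generator is left out.

```python
import math
from typing import List, Sequence


def slerp(z0: Sequence[float], z1: Sequence[float], t: float) -> List[float]:
    """Spherical linear interpolation between two latent vectors, t in [0, 1]."""
    n0 = math.sqrt(sum(a * a for a in z0))
    n1 = math.sqrt(sum(b * b for b in z1))
    if n0 == 0.0 or n1 == 0.0:
        raise ValueError("latent vectors must be non-zero")
    cos_omega = sum(a * b for a, b in zip(z0, z1)) / (n0 * n1)
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))
    s = math.sin(omega)
    if s < 1e-6:
        # Nearly parallel or antipodal vectors: fall back to linear interpolation.
        return [(1.0 - t) * a + t * b for a, b in zip(z0, z1)]
    w0 = math.sin((1.0 - t) * omega) / s
    w1 = math.sin(t * omega) / s
    return [w0 * a + w1 * b for a, b in zip(z0, z1)]


def latent_path(keyframes: Sequence[Sequence[float]],
                steps_per_segment: int) -> List[List[float]]:
    """Hypothetical keyframe traversal: slerp between each consecutive pair."""
    if len(keyframes) < 2 or steps_per_segment < 1:
        raise ValueError("need at least two keyframes and one step per segment")
    path: List[List[float]] = []
    for z0, z1 in zip(keyframes, keyframes[1:]):
        for i in range(steps_per_segment):
            path.append(slerp(z0, z1, i / steps_per_segment))
    path.append(list(keyframes[-1]))  # end exactly on the final keyframe
    return path
```

Slerp is often preferred over straight-line interpolation for Gaussian latent spaces because it keeps intermediate vectors at a typical norm; a performer could edit the keyframes to steer the "narrative" while the path fills in the transitions.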
AutoGraff: towards a computational understanding of graffiti writing and related art forms.
The aim of this thesis is to develop a system that generates letters and pictures with a style that is immediately recognizable as graffiti art or calligraphy. The proposed system can be used similarly to, and in tight integration with, conventional computer-aided geometric design tools. It can also be used to generate synthetic graffiti content for urban environments in games and movies, and to guide robotic or fabrication systems that materialise its output with physical drawing media. The thesis is divided into two main parts. The first part describes a set of stroke primitives: building blocks that can be combined to generate different designs that resemble graffiti or calligraphy. These primitives mimic the process typically used to design graffiti letters and exploit well-known principles of motor control to model the way an artist moves when incrementally tracing stylised letter forms. The second part demonstrates how these stroke primitives can be automatically recovered from input geometry defined in vector form, such as digitised traces of a user's writing or the glyph outlines of a font. This procedure converts the input geometry into a seed that can be transformed into a variety of calligraphic and graffiti stylisations, which depend on parametric variations of the strokes
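As a concrete instance of a motor-control principle, the minimum-jerk model of Flash and Hogan gives a smooth point-to-point movement whose normalised position follows s(t) = 10t^3 - 15t^4 + 6t^5 for normalised time t in [0, 1]. The sketch chains such movements through via-points to trace a simple stroke. Whether AutoGraff uses this particular model is not stated here; it is an illustrative stand-in, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def minimum_jerk(p0: Sequence[float], p1: Sequence[float], n: int) -> List[Point]:
    """Sample n points of a minimum-jerk movement from p0 to p1."""
    if n < 2:
        raise ValueError("n must be at least 2")
    points: List[Point] = []
    for i in range(n):
        t = i / (n - 1)  # normalised time in [0, 1]
        s = 10 * t**3 - 15 * t**4 + 6 * t**5  # smooth 0 -> 1 profile
        points.append(tuple(a + (b - a) * s for a, b in zip(p0, p1)))
    return points


def compose_stroke(via_points: Sequence[Sequence[float]],
                   n_per_segment: int) -> List[Point]:
    """Hypothetical primitive: chain minimum-jerk segments through via-points."""
    if len(via_points) < 2:
        raise ValueError("need at least two via-points")
    stroke: List[Point] = [tuple(float(c) for c in via_points[0])]
    for p0, p1 in zip(via_points, via_points[1:]):
        # Skip each segment's first sample, which duplicates the previous end point.
        stroke.extend(minimum_jerk(p0, p1, n_per_segment)[1:])
    return stroke
```

This simple chaining makes the pen come to rest at every via-point; smoother primitives overlap consecutive movements in time instead, which is closer to how fluent writing looks.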
Facial Micro- and Macro-Expression Spotting and Generation Methods
Facial micro-expression (ME) recognition requires the face-movement interval as input, but computational methods for spotting MEs still underperform. This is due to the lack of large-scale long-video datasets, and ME generation methods are still in their infancy. This thesis presents methods to address these data-deficiency issues and introduces a new method for spotting macro- and micro-expressions simultaneously.
This thesis introduces SAMM Long Videos (SAMM-LV), which contains 147 annotated long videos, and develops a baseline method to facilitate the ME Grand Challenge 2020. Further, reference-guided style transfer with StarGANv2 is applied to SAMM-LV to generate a synthetic dataset, SAMM-SYNTH. The quality of SAMM-SYNTH is evaluated using facial action units detected by OpenFace. Quantitative measurement shows high correlations between the original and synthetic data for two Action Units (AU12 and AU6).
For facial expression spotting, a two-stream 3D Convolutional Neural Network with temporally oriented frame skips is proposed that can spot micro- and macro-expressions simultaneously. This method achieves state-of-the-art performance on SAMM-LV and is competitive on CAS(ME)²; it was used as the baseline for the ME Grand Challenge 2021. The F1-score improves to 0.1036 when trained with composite data consisting of SAMM-LV and SAMM-SYNTH. On the unseen ME Grand Challenge 2022 evaluation dataset, it achieves an F1-score of 0.1531.
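The F1-scores quoted above come from interval-level spotting evaluation. A minimal sketch, assuming the matching rule commonly used in the ME Grand Challenges: a predicted interval counts as a true positive when its intersection-over-union with a not-yet-matched ground-truth interval is at least 0.5, and F1 = 2TP / (2TP + FP + FN). The greedy matching and the inclusive frame indices are simplifying assumptions; the official scripts may differ in detail, for example by pooling counts over all videos.

```python
from typing import Optional, Sequence, Set, Tuple

Interval = Tuple[int, int]  # (onset frame, offset frame), both inclusive


def interval_iou(a: Interval, b: Interval) -> float:
    """Intersection-over-union of two inclusive frame intervals."""
    (s1, e1), (s2, e2) = a, b
    inter = max(0, min(e1, e2) - max(s1, s2) + 1)
    union = (e1 - s1 + 1) + (e2 - s2 + 1) - inter
    return inter / union if union > 0 else 0.0


def spotting_f1(predicted: Sequence[Interval],
                ground_truth: Sequence[Interval],
                threshold: float = 0.5) -> float:
    """Greedy one-to-one matching at an IoU threshold, then F1."""
    matched: Set[int] = set()
    tp = 0
    for pred in predicted:
        best_j: Optional[int] = None
        best_iou = 0.0
        for j, gt in enumerate(ground_truth):
            if j in matched:
                continue  # each ground-truth interval can be matched once
            iou = interval_iou(pred, gt)
            if iou >= threshold and iou > best_iou:
                best_j, best_iou = j, iou
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    fp = len(predicted) - tp
    fn = len(ground_truth) - tp
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

With one correct and one spurious prediction against two ground-truth intervals, TP = 1, FP = 1 and FN = 1, giving F1 = 0.5; low absolute F1 values like those reported reflect how hard it is to hit short micro-expression intervals within long videos.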
Finally, a new sequence generation method is proposed to explore the capability of deep learning networks. It generates spontaneous facial expressions using only two input sequences, without any labels. SSIM and NIQE were used for image quality analysis, and the generated data achieved an SSIM of 0.87 and a NIQE of 23.14. Visualising the movements with optical flow values and absolute frame differences demonstrates the method's potential for generating subtle MEs. For realism evaluation, the generated videos were rated using two facial expression recognition networks
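Absolute frame differences, mentioned above as one way to visualise subtle movement, are simple to compute. This sketch assumes grayscale frames stored as equally sized 2-D lists and reports the mean absolute difference between each pair of consecutive frames; the function name is hypothetical, and the thesis's exact visualisation is not reproduced.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[float]]  # grayscale frame as rows of pixel values


def mean_abs_frame_differences(frames: Sequence[Frame]) -> List[float]:
    """Mean |frame[k+1] - frame[k]| for each consecutive pair of frames."""
    diffs: List[float] = []
    for prev, curr in zip(frames, frames[1:]):
        if len(prev) != len(curr) or any(
            len(r0) != len(r1) for r0, r1 in zip(prev, curr)
        ):
            raise ValueError("frames must have the same shape")
        total = 0.0
        count = 0
        for row_prev, row_curr in zip(prev, curr):
            for a, b in zip(row_prev, row_curr):
                total += abs(b - a)
                count += 1
        diffs.append(total / count if count else 0.0)
    return diffs
```

Plotting these values over time gives a crude motion curve: a subtle expression shows up as a small bump, whereas a static sequence stays near zero.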
Artificial Intelligence - Intelligent Art? Human-Machine Interaction and Creative Practice
As algorithmic data processing increasingly pervades everyday life, it is also making its way into the worlds of art, literature and music. In doing so, it shifts notions of creativity and evokes non-anthropocentric perspectives on artistic practice. This volume brings together contributions from the fields of cultural studies, literary studies, musicology and sound studies as well as media studies, sociology of technology, and beyond, presenting a truly interdisciplinary, state-of-the-art picture of the transformation of creative practice brought about by various forms of AI
- …