Artificial Intelligence in the Creative Industries: A Review
This paper reviews the current state of the art in Artificial Intelligence
(AI) technologies and applications in the context of the creative industries. A
brief background of AI, and specifically Machine Learning (ML) algorithms, is
provided including Convolutional Neural Network (CNNs), Generative Adversarial
Networks (GANs), Recurrent Neural Networks (RNNs) and Deep Reinforcement
Learning (DRL). We categorise creative applications into five groups related to
how AI technologies are used: i) content creation, ii) information analysis,
iii) content enhancement and post production workflows, iv) information
extraction and enhancement, and v) data compression. We critically examine the
successes and limitations of this rapidly advancing technology in each of these
areas. We further differentiate between the use of AI as a creative tool and
its potential as a creator in its own right. We foresee that, in the near
future, machine learning-based AI will be adopted widely as a tool or
collaborative assistant for creativity. In contrast, we observe that the
successes of machine learning in domains with fewer constraints, where AI is
the `creator', remain modest. The potential of AI (or its developers) to win
awards for its original creations in competition with human creatives is also
limited, based on contemporary technologies. We therefore conclude that, in the
context of creative industries, maximum benefit from AI will be derived where
its focus is human centric -- where it is designed to augment, rather than
replace, human creativity.
A Survey on ChatGPT: AI-Generated Contents, Challenges, and Solutions
With the widespread use of large artificial intelligence (AI) models such as
ChatGPT, AI-generated content (AIGC) has garnered increasing attention and is
leading a paradigm shift in content creation and knowledge representation. AIGC
uses generative large AI algorithms to assist or replace humans in creating
massive, high-quality, and human-like content at a faster pace and lower cost,
based on user-provided prompts. Despite the recent significant progress in
AIGC, security, privacy, ethical, and legal challenges still need to be
addressed. This paper presents an in-depth survey of working principles,
security and privacy threats, state-of-the-art solutions, and future challenges
of the AIGC paradigm. Specifically, we first explore the enabling technologies,
general architecture of AIGC, and discuss its working modes and key
characteristics. Then, we investigate the taxonomy of security and privacy
threats to AIGC and highlight the ethical and societal implications of GPT and
AIGC technologies. Furthermore, we review the state-of-the-art AIGC
watermarking approaches for regulatable AIGC paradigms regarding the AIGC model
and its produced content. Finally, we identify future challenges and open
research directions related to AIGC.
Comment: 20 pages, 6 figures, 4 tables
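As a toy illustration of one common family of text-watermarking schemes surveyed in this area, the sketch below biases detection toward a pseudo-random "green list" of tokens keyed on context, then scores text by its green-token fraction. This is entirely illustrative and not any specific paper's algorithm; all names are assumptions.

```python
import hashlib

# Toy sketch of green-list text watermarking: a generator that favours
# "green" tokens leaves a statistical trace, and a detector counts how
# many tokens fall in the green list. Purely illustrative.

def is_green(prev_token: str, token: str) -> bool:
    """Pseudo-randomly assign roughly half of all tokens to the green
    list, keyed on the previous token so the split varies with context."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] % 2 == 0

def green_fraction(tokens):
    """Fraction of tokens in the green list; watermarked text should
    score well above the ~0.5 expected by chance."""
    hits = sum(is_green(prev, tok) for prev, tok in zip(tokens, tokens[1:]))
    return hits / max(len(tokens) - 1, 1)
```

A real scheme would bias the language model's sampling toward green tokens at generation time; the detector above only needs the hash key, not the model.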
Generative models for music using transformer architectures
This thesis focuses on the growth and impact of Transformer architectures, which are mainly used for Natural Language Processing tasks, applied here to audio generation. We think that music, with its notes, chords, and volumes, is a language. You could think of the symbolic representation of music as a human language.
A brief history of sound synthesis, which gives the basic foundation for modern AI-generated music models, is provided. The most recent work in AI-generated audio is carefully studied, and instances of AI-generated music are described in many contexts. Deep learning models and their applications to real-world issues are among the key subjects covered.
The main areas of interest are transformer-based audio generation, including the training procedure, encoding and decoding techniques, and post-processing stages. Transformers have several key advantages, including long-term consistency and the ability to create minute-long audio compositions.
Numerous studies on the various representations of music are reviewed, including how neural network and deep learning techniques can be applied to symbolic melodies, musical arrangements, style transfer, and sound production.
This thesis largely focuses on transformer models, but it also recognises the importance of other AI-based generative models, including GANs.
Overall, this thesis advances generative models for music composition and provides a complete understanding of transformer design. It shows the possibilities of AI-generated sound synthesis by emphasising the most recent developments.
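The symbolic-music-as-language idea described above can be made concrete with a minimal sketch: note events are mapped to discrete tokens and a vocabulary, exactly as an NLP tokenizer prepares text for a transformer. All names and the token scheme below are illustrative, not taken from the thesis.

```python
# Minimal sketch: representing symbolic music as a token sequence,
# analogous to words in a language. All names are illustrative.

def tokenize_melody(notes):
    """Map (pitch, duration) note events to discrete tokens.

    notes: list of (midi_pitch, duration_in_16ths) tuples.
    Returns a list of string tokens, e.g. 'NOTE_60', 'DUR_4'.
    """
    tokens = []
    for pitch, duration in notes:
        tokens.append(f"NOTE_{pitch}")    # which note sounds
        tokens.append(f"DUR_{duration}")  # how long it lasts
    return tokens

def build_vocab(token_seqs):
    """Assign each distinct token an integer id, as an NLP tokenizer would."""
    vocab = {}
    for seq in token_seqs:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab

# A short C-major phrase: C4, E4, G4 as quarter notes (4 sixteenths each).
melody = [(60, 4), (64, 4), (67, 4)]
tokens = tokenize_melody(melody)
vocab = build_vocab([tokens])
ids = [vocab[t] for t in tokens]  # integer sequence a transformer consumes
```

The resulting integer sequence can be fed to any standard decoder-only transformer for next-token prediction, which is the usual training objective in this setting.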
Long future frame prediction using optical flow informed deep neural networks for enhancement of robotic teleoperation in high latency environments
High latency in teleoperation has a significant negative impact on operator performance. While deep learning has revolutionized many domains recently, it has not previously been applied to teleoperation enhancement. We propose a novel approach to predict video frames deep into the future using neural networks informed by synthetically generated optical flow information. This can be employed in teleoperated robotic systems that rely on video feeds for operator situational awareness. We have used the image-to-image translation technique as a basis for the prediction of future frames. The Pix2Pix conditional generative adversarial network (cGAN) has been selected as a base network. Optical flow components reflecting real-time control inputs are added to the standard RGB channels of the input image. We have experimented with three data sets of 20,000 input images each that were generated using our custom-designed teleoperation simulator with a 500-ms delay added between the input and target frames. Structural Similarity Index Measures (SSIMs) of 0.60 and Multi-SSIMs of 0.68 were achieved when training the cGAN with three-channel RGB image data. With the five-channel input data (incorporating optical flow) these values improved to 0.67 and 0.74, respectively. Applying Fleiss' kappa gave a score of 0.40 for three-channel RGB data, and 0.55 for five-channel optical flow-added data. We are confident the predicted synthetic frames are of sufficient quality and reliability to be presented to teleoperators as a video feed that will enhance teleoperation. To the best of our knowledge, we are the first to attempt to reduce the impacts of latency through future frame prediction using deep neural networks.
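The five-channel input described in the abstract (RGB plus two optical-flow components) amounts to channel stacking before the generator's first convolution. A minimal sketch, with shapes and names assumed for illustration:

```python
import numpy as np

# Sketch of the five-channel input described in the abstract:
# standard RGB frames augmented with the two optical-flow components
# (horizontal and vertical). Shapes and names are illustrative.

H, W = 256, 256
rgb = np.zeros((H, W, 3), dtype=np.float32)   # camera frame
flow_u = np.zeros((H, W), dtype=np.float32)   # horizontal flow component
flow_v = np.zeros((H, W), dtype=np.float32)   # vertical flow component

# Stack the flow components onto the RGB channels -> (H, W, 5) tensor.
# A Pix2Pix-style cGAN generator can consume this after widening its
# first convolution from 3 to 5 input channels.
five_channel = np.dstack([rgb, flow_u, flow_v])
```

Only the first layer of the generator needs modification; the rest of the Pix2Pix architecture is unchanged by the extra input channels.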
How to Make an Image More Memorable? A Deep Style Transfer Approach
Recent works have shown that it is possible to automatically predict
intrinsic image properties like memorability. In this paper, we take a step
forward addressing the question: "Can we make an image more memorable?".
Methods for automatically increasing image memorability would have an impact in
many application fields like education, gaming or advertising. Our work is
inspired by the popular editing-by-applying-filters paradigm adopted in photo
editing applications, like Instagram and Prisma. In this context, the problem
of increasing image memorability maps to that of retrieving "memorabilizing"
filters or style "seeds". Still, users generally have to go through most of the
available filters before finding the desired solution, thus turning the editing
process into a resource and time consuming task. In this work, we show that it
is possible to automatically retrieve the best style seeds for a given image,
thus remarkably reducing the number of human attempts needed to find a good
match. Our approach leverages recent advances in the field of image
synthesis and adopts a deep architecture for generating a memorable picture
from a given input image and a style seed. Importantly, to automatically select
the best style, a novel learning-based solution, also relying on deep models, is
proposed. Our experimental evaluation, conducted on publicly available
benchmarks, demonstrates the effectiveness of the proposed approach for
generating memorable images through automatic style seed selection.
Comment: Accepted at ACM ICMR 201
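The automatic style-seed selection described above reduces to scoring each candidate seed with a learned predictor and retrieving the top scorer. A hypothetical sketch, where the toy scorer stands in for the paper's deep model and every name is an assumption:

```python
# Hypothetical sketch of automatic style-seed selection: a learned
# scorer predicts how much each style seed would improve an image's
# memorability, and the top-scoring seed is retrieved. The stand-in
# scorer below replaces the deep model used in the paper.

def select_best_seed(image, seeds, score_fn):
    """Return the seed whose predicted memorability gain is highest."""
    return max(seeds, key=lambda seed: score_fn(image, seed))

# Toy candidate seeds with made-up precomputed gains for one image.
seeds = [{"name": "vivid", "gain": 0.3},
         {"name": "noir", "gain": 0.1},
         {"name": "pastel", "gain": 0.2}]

best = select_best_seed("photo.jpg", seeds,
                        lambda img, s: s["gain"])
```

In the actual pipeline the score function would run a forward pass of the deep scorer per (image, seed) pair, so users evaluate a handful of top-ranked seeds instead of browsing every filter.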
Facial re-enactment, speech synthesis and the rise of the Deepfake
Emergent technologies in the fields of audio speech synthesis and video facial manipulation have the potential to drastically impact our societal patterns of multimedia consumption. At a time when social media and internet culture is plagued by misinformation, propaganda and “fake news”, their latent misuse represents a possible looming threat to fragile systems of information sharing and social democratic discourse. It has thus become increasingly recognised in both academic and mainstream journalism that the ramifications of these tools must be examined to determine what they are and how their widespread availability can be managed.
This research project seeks to examine four emerging software programs – Face2Face, FakeApp, Adobe VoCo and Lyrebird – that are designed to facilitate the synthesis of speech and the manipulation of facial features in videos. I will explore their positive industry applications and the potentially negative consequences of their release into the public domain. Consideration will be directed to how such consequences and risks can be ameliorated through detection, regulation and education. A final analysis of these three competing threads will then attempt to address whether the practical and commercial applications of these technologies are outweighed by the inherent unethical or illegal uses they engender, and if so, what we can do in response.