Exposing the Functionalities of Neurons for Gated Recurrent Unit Based Sequence-to-Sequence Model
The goal of this paper is to report certain scientific discoveries about a
Seq2Seq model. Analyzing the behavior of RNN-based models at the neuron level
is considered more challenging than analyzing DNN or CNN models because of
their inherently recursive mechanism. This paper provides a neuron-level
analysis to explain why a vanilla GRU-based Seq2Seq model without attention
can achieve token positioning. We found four types of neurons: storing,
counting, triggering, and outputting neurons, and further uncover the
mechanism by which these neurons work together to produce the right token in
the right position.
Comment: 9 pages (excluding references), 10 figures
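The neuron-level analysis the abstract describes can be illustrated with a toy probe: run a small GRU over a sequence, record each hidden unit's trajectory, and look for units whose activation tracks the token position (in the spirit of the paper's "counting" neurons). Everything below (the random weights, the probing data, and the correlation criterion) is an illustrative assumption, not the authors' actual procedure.

```python
import numpy as np

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU update (biases omitted for brevity)."""
    z = 1 / (1 + np.exp(-(Wz @ x + Uz @ h)))   # update gate
    r = 1 / (1 + np.exp(-(Wr @ x + Ur @ h)))   # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))   # candidate state
    return (1 - z) * h + z * h_tilde

rng = np.random.default_rng(0)
d_in, d_h, T = 4, 8, 12
# Six weight matrices: Wz, Uz, Wr, Ur, Wh, Uh (random, for illustration).
weights = [rng.normal(scale=0.5, size=s)
           for s in [(d_h, d_in), (d_h, d_h)] * 3]

# Record each hidden unit's trajectory over a random input sequence.
h = np.zeros(d_h)
traj = []
for t in range(T):
    h = gru_step(rng.normal(size=d_in), h, *weights)
    traj.append(h.copy())
traj = np.asarray(traj)          # shape (T, d_h)

# A crude "counting neuron" probe: correlate each unit with the time step.
steps = np.arange(T)
corr = [abs(np.corrcoef(steps, traj[:, j])[0, 1]) for j in range(d_h)]
print("unit most correlated with position:", int(np.argmax(corr)))
```

A trained model would replace the random weights, and the same trajectory-inspection idea extends to spotting storing, triggering, and outputting behavior.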
SongRewriter: A Chinese Song Rewriting System with Controllable Content and Rhyme Scheme
Although lyrics generation has achieved significant progress in recent years,
it has limited practical applications because the generated lyrics cannot be
performed without composing compatible melodies. In this work, we bridge this
practical gap by proposing a song rewriting system which rewrites the lyrics of
an existing song such that the generated lyrics are compatible with the rhythm
of the existing melody and thus singable. In particular, we propose
SongRewriter, a controllable Chinese lyric generation and editing system which
assists users without prior knowledge of melody composition. The system is
trained by a randomized multi-level masking strategy which produces a unified
model for generating entirely new lyrics or editing a few fragments. To improve
the controllability of the generation process, we further incorporate a keyword
prompt to control the lexical choices of the content and propose novel decoding
constraints and a vowel modeling task to enable flexible end and internal rhyme
schemes. While prior rhyming metrics are mainly designed for rap lyrics, we
propose three novel rhyming evaluation metrics for song lyrics. Both automatic
and human evaluations show that the proposed model outperforms state-of-the-art
models in both content and rhyme quality. Our code and models, implemented with
the MindSpore Lite tool, will be made available.
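The randomized multi-level masking described in the abstract can be sketched as follows. The three granularities used here (whole line, contiguous span, single token) and the sampling probabilities are hypothetical choices for illustration, not SongRewriter's published configuration.

```python
import random

MASK = "<mask>"

def multilevel_mask(lines, rng, p_line=0.3, p_span=0.3):
    """Randomly mask lyrics at one of three granularities per line:
    the whole line, a contiguous span, or a single token.
    Illustrative sketch only; probabilities and levels are assumptions."""
    out = []
    for line in lines:
        tokens = line.split()
        roll = rng.random()
        if roll < p_line:                        # level 1: whole line
            out.append(MASK)
        elif roll < p_line + p_span:             # level 2: contiguous span
            i = rng.randrange(len(tokens))
            j = rng.randrange(i, len(tokens)) + 1
            out.append(" ".join(tokens[:i] + [MASK] + tokens[j:]))
        else:                                    # level 3: single token
            k = rng.randrange(len(tokens))
            tokens[k] = MASK
            out.append(" ".join(tokens))
    return out

rng = random.Random(7)
song = ["shining stars above the river", "soft winds carry the song home"]
masked = multilevel_mask(song, rng)
```

Training a single model to reconstruct the original lyrics from such inputs is what lets one network handle both full generation (everything masked) and fragment editing (only a few spans masked).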
Sparks of Large Audio Models: A Survey and Outlook
This survey paper provides a comprehensive overview of the recent
advancements and challenges in applying large language models to the field of
audio signal processing. Audio processing, with its diverse signal
representations and a wide range of sources--from human voices to musical
instruments and environmental sounds--poses challenges distinct from those
found in traditional Natural Language Processing scenarios. Nevertheless,
\textit{Large Audio Models}, epitomized by transformer-based architectures,
have shown marked efficacy in this sphere. By leveraging massive amounts of
data, these models have demonstrated prowess in a variety of audio tasks,
spanning Automatic Speech Recognition, Text-To-Speech, and Music Generation,
among others. Notably, these Foundational Audio Models, such as SeamlessM4T,
have recently begun to act as universal translators, supporting multiple
speech tasks for up to 100 languages without
any reliance on separate task-specific systems. This paper presents an in-depth
analysis of state-of-the-art methodologies regarding \textit{Foundational Large
Audio Models}, their performance benchmarks, and their applicability to
real-world scenarios. We also highlight current limitations and provide
insights into potential future research directions in the realm of
\textit{Large Audio Models} with the intent to spark further discussion,
thereby fostering innovation in the next generation of audio-processing
systems. Furthermore, to cope with the rapid development in this area, we will
consistently update the relevant repository with relevant recent articles and
their open-source implementations at
https://github.com/EmulationAI/awesome-large-audio-models.
Comment: work in progress, Repo URL:
https://github.com/EmulationAI/awesome-large-audio-model
A systematic review of artificial intelligence-based music generation: Scope, applications, and future trends
Currently available reviews of artificial intelligence-based music generation do not cover a wide range of publications and are usually centered on comparing very specific topics across a very limited range of solutions. The best surveys available in the field are the bibliography sections of some papers and books, which lack a systematic approach and limit their scope to handpicked examples. In this work, we analyze the scope and trends of research on artificial intelligence-based music generation by performing a systematic review of the available publications in the field using the PRISMA methodology. Furthermore, we discuss the possible implementations and accessibility of a set of currently available AI solutions as aids to musical composition. Our research shows how publications are distributed globally according to many characteristics, which provides a clear picture of the state of this technology.
Through our research it becomes clear that the interest of both musicians and computer scientists in AI-based automatic music generation has increased significantly in the last few years, with increasing participation of major companies in the field, whose works we analyze. We discuss several generation architectures from both a technical and a musical point of view, and we highlight various areas where further research is needed.