Exposing the Functionalities of Neurons for Gated Recurrent Unit Based Sequence-to-Sequence Model
The goal of this paper is to report certain scientific discoveries about a
Seq2Seq model. Analyzing the behavior of RNN-based models at the neuron level
is considered more challenging than analyzing DNN or CNN models because of
their inherently recursive mechanism. This paper provides a neuron-level
analysis to explain why a vanilla GRU-based Seq2Seq model without attention
can achieve token positioning. We found four types of neurons: storing,
counting, triggering, and outputting neurons, and further uncover the
mechanism by which these neurons work together to produce the right token in
the right position.
Comment: 9 pages (excluding references), 10 figures
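The neuron-level analysis the abstract describes can be illustrated with a toy probe: run a small GRU over a sequence, record each hidden unit's trajectory, and look for units whose activation tracks the token position (in the spirit of the paper's "counting" neurons). Everything below (the random weights, the probing data, and the correlation criterion) is an illustrative assumption, not the authors' actual procedure.

```python
import numpy as np

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU update (biases omitted for brevity)."""
    z = 1 / (1 + np.exp(-(Wz @ x + Uz @ h)))   # update gate
    r = 1 / (1 + np.exp(-(Wr @ x + Ur @ h)))   # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))   # candidate state
    return (1 - z) * h + z * h_tilde

rng = np.random.default_rng(0)
d_in, d_h, T = 4, 8, 12
# Six weight matrices: Wz, Uz, Wr, Ur, Wh, Uh (random, for illustration).
weights = [rng.normal(scale=0.5, size=s)
           for s in [(d_h, d_in), (d_h, d_h)] * 3]

# Record each hidden unit's trajectory over a random input sequence.
h = np.zeros(d_h)
traj = []
for t in range(T):
    h = gru_step(rng.normal(size=d_in), h, *weights)
    traj.append(h.copy())
traj = np.asarray(traj)          # shape (T, d_h)

# A crude "counting neuron" probe: correlate each unit with the time step.
steps = np.arange(T)
corr = [abs(np.corrcoef(steps, traj[:, j])[0, 1]) for j in range(d_h)]
print("unit most correlated with position:", int(np.argmax(corr)))
```

A trained model would replace the random weights, and the same trajectory-inspection idea extends to spotting storing, triggering, and outputting behavior.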
SongRewriter: A Chinese Song Rewriting System with Controllable Content and Rhyme Scheme
Although lyrics generation has achieved significant progress in recent years,
it has limited practical applications because the generated lyrics cannot be
performed without composing compatible melodies. In this work, we bridge this
practical gap by proposing a song rewriting system which rewrites the lyrics of
an existing song such that the generated lyrics are compatible with the rhythm
of the existing melody and thus singable. In particular, we propose
SongRewriter, a controllable Chinese lyric generation and editing system which
assists users without prior knowledge of melody composition. The system is
trained by a randomized multi-level masking strategy which produces a unified
model for generating entirely new lyrics or editing a few fragments. To improve
the controllability of the generation process, we further incorporate a keyword
prompt to control the lexical choices of the content and propose novel decoding
constraints and a vowel modeling task to enable flexible end and internal rhyme
schemes. While prior rhyming metrics are mainly designed for rap lyrics, we
propose three novel rhyming evaluation metrics for song lyrics. Both automatic
and human evaluations show that the proposed model outperforms state-of-the-art
models in both content and rhyme quality. Our code and models, implemented with
the MindSpore Lite tool, will be made available.
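The randomized multi-level masking described in the abstract can be sketched as follows. The three granularities used here (whole line, contiguous span, single token) and the sampling probabilities are hypothetical choices for illustration, not SongRewriter's published configuration.

```python
import random

MASK = "<mask>"

def multilevel_mask(lines, rng, p_line=0.3, p_span=0.3):
    """Randomly mask lyrics at one of three granularities per line:
    the whole line, a contiguous span, or a single token.
    Illustrative sketch only; probabilities and levels are assumptions."""
    out = []
    for line in lines:
        tokens = line.split()
        roll = rng.random()
        if roll < p_line:                        # level 1: whole line
            out.append(MASK)
        elif roll < p_line + p_span:             # level 2: contiguous span
            i = rng.randrange(len(tokens))
            j = rng.randrange(i, len(tokens)) + 1
            out.append(" ".join(tokens[:i] + [MASK] + tokens[j:]))
        else:                                    # level 3: single token
            k = rng.randrange(len(tokens))
            tokens[k] = MASK
            out.append(" ".join(tokens))
    return out

rng = random.Random(7)
song = ["shining stars above the river", "soft winds carry the song home"]
masked = multilevel_mask(song, rng)
```

Training a single model to reconstruct the original lyrics from such inputs is what lets one network handle both full generation (everything masked) and fragment editing (only a few spans masked).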
Sparks of Large Audio Models: A Survey and Outlook
This survey paper provides a comprehensive overview of the recent
advancements and challenges in applying large language models to the field of
audio signal processing. Audio processing, with its diverse signal
representations and a wide range of sources--from human voices to musical
instruments and environmental sounds--poses challenges distinct from those
found in traditional Natural Language Processing scenarios. Nevertheless,
\textit{Large Audio Models}, epitomized by transformer-based architectures,
have shown marked efficacy in this sphere. By leveraging massive amounts of
data, these models have demonstrated prowess in a variety of audio tasks,
spanning Automatic Speech Recognition, Text-To-Speech, and Music Generation,
among others. Notably, these Foundational Audio Models, such as SeamlessM4T,
have recently begun to act as universal translators, supporting multiple
speech tasks for up to 100 languages without
any reliance on separate task-specific systems. This paper presents an in-depth
analysis of state-of-the-art methodologies regarding \textit{Foundational Large
Audio Models}, their performance benchmarks, and their applicability to
real-world scenarios. We also highlight current limitations and provide
insights into potential future research directions in the realm of
\textit{Large Audio Models} with the intent to spark further discussion,
thereby fostering innovation in the next generation of audio-processing
systems. Furthermore, to cope with the rapid development in this area, we will
consistently update the relevant repository with relevant recent articles and
their open-source implementations at
https://github.com/EmulationAI/awesome-large-audio-models.
Comment: work in progress, Repo URL:
https://github.com/EmulationAI/awesome-large-audio-model
A systematic review of artificial intelligence-based music generation: Scope, applications, and future trends
Currently available reviews of artificial intelligence-based music generation do not cover a wide range of publications and are usually centered on comparing very specific topics across a very limited range of solutions. The best surveys available in the field are the bibliography sections of some papers and books, which lack a systematic approach and limit their scope to handpicked examples. In this work, we analyze the scope and trends of research on artificial intelligence-based music generation by performing a systematic review of the available publications in the field using the PRISMA methodology. Furthermore, we discuss the possible implementations and accessibility of a set of currently available AI solutions as aids to musical composition. Our research shows how publications are distributed globally according to many characteristics, which provides a clear picture of the state of this technology.
Through our research it becomes clear that the interest of both musicians and computer scientists in AI-based automatic music generation has increased significantly in the last few years, with increasing participation of major companies in the field, whose works we analyze. We discuss several generation architectures from both a technical and a musical point of view, and we highlight various areas where further research is needed.