26 research outputs found

    Exposing the Functionalities of Neurons for Gated Recurrent Unit Based Sequence-to-Sequence Model

    Full text link
    The goal of this paper is to report certain scientific discoveries about a Seq2Seq model. Analyzing the behavior of RNN-based models at the neuron level is considered more challenging than analyzing DNN or CNN models because of their recurrent nature. This paper provides a neuron-level analysis that explains why a vanilla GRU-based Seq2Seq model without attention can achieve token positioning. We identify four types of neurons (storing, counting, triggering, and outputting) and further uncover the mechanism by which these neurons work together to produce the right token in the right position. Comment: 9 pages (excluding references), 10 figures
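
    The following is a minimal, illustrative sketch (not the paper's code) of how such a neuron-level probe might look: it runs a toy GRU over a sequence in PyTorch and flags hidden units whose activations correlate with position, a crude proxy for the "counting" neurons the abstract mentions. All sizes and the correlation heuristic are assumptions for illustration.

    # Illustrative probe of per-neuron GRU hidden-state activations (assumes PyTorch).
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    vocab_size, emb_dim, hidden_dim, seq_len = 20, 16, 32, 12

    embed = nn.Embedding(vocab_size, emb_dim)
    gru = nn.GRU(emb_dim, hidden_dim, batch_first=True)

    tokens = torch.randint(0, vocab_size, (1, seq_len))   # one toy input sequence
    with torch.no_grad():
        outputs, _ = gru(embed(tokens))                   # (1, seq_len, hidden_dim)
    activations = outputs.squeeze(0)                      # (seq_len, hidden_dim)

    # A counting-like neuron should change near-monotonically with position; use the
    # correlation between each neuron's activation and the time-step index as a proxy.
    positions = torch.arange(seq_len, dtype=torch.float32)
    pos_centered = positions - positions.mean()
    act_centered = activations - activations.mean(dim=0, keepdim=True)
    corr = (act_centered * pos_centered.unsqueeze(1)).sum(0) / (
        act_centered.norm(dim=0) * pos_centered.norm() + 1e-8
    )

    top = corr.abs().topk(5)
    for idx, c in zip(top.indices.tolist(), corr[top.indices].tolist()):
        print(f"neuron {idx:3d}: position correlation {c:+.3f}")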

    SongRewriter: A Chinese Song Rewriting System with Controllable Content and Rhyme Scheme

    Full text link
    Although lyrics generation has achieved significant progress in recent years, it has limited practical applications because the generated lyrics cannot be performed without composing compatible melodies. In this work, we bridge this practical gap by proposing a song rewriting system that rewrites the lyrics of an existing song so that the generated lyrics are compatible with the rhythm of the existing melody and are thus singable. In particular, we propose SongRewriter, a controllable Chinese lyric generation and editing system that assists users without prior knowledge of melody composition. The system is trained with a randomized multi-level masking strategy, which produces a unified model for generating entirely new lyrics or editing a few fragments. To improve the controllability of the generation process, we further incorporate a keyword prompt to control the lexical choices of the content and propose novel decoding constraints and a vowel modeling task to enable flexible end-rhyme and internal-rhyme schemes. While prior rhyming metrics are mainly designed for rap lyrics, we propose three novel rhyming evaluation metrics for song lyrics. Both automatic and human evaluations show that the proposed model outperforms state-of-the-art models in both content and rhyming quality. Our code and models, implemented with the MindSpore Lite tool, will be made available.
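
    As a rough illustration of the kind of randomized multi-level masking the abstract describes, the sketch below builds infilling training pairs by randomly masking whole lines, contiguous spans, or nothing at all, so a single model can learn both full generation and fragment editing. The mask token, probabilities, and span lengths are assumptions, not the SongRewriter implementation.

    # Minimal sketch of multi-level masking for a unified generate/edit lyric model.
    import random

    MASK = "[MASK]"

    def mask_lyrics(lines, p_line=0.3, p_span=0.3):
        """Return (masked_lines, targets) for a seq2seq-style infilling example."""
        masked, targets = [], []
        for line in lines:
            r = random.random()
            if r < p_line:                        # level 1: mask an entire line
                masked.append(MASK)
                targets.append(line)
            elif r < p_line + p_span:             # level 2: mask a contiguous span
                chars = list(line)
                start = random.randrange(len(chars))
                end = min(len(chars), start + random.randint(1, 3))
                targets.append("".join(chars[start:end]))
                masked.append("".join(chars[:start]) + MASK + "".join(chars[end:]))
            else:                                 # level 3: keep the line unchanged
                masked.append(line)
        return masked, targets

    lines = ["月光洒在旧窗台", "我把思念写成海"]
    print(mask_lyrics(lines))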

    Sparks of Large Audio Models: A Survey and Outlook

    Full text link
    This survey paper provides a comprehensive overview of the recent advancements and challenges in applying large language models to the field of audio signal processing. Audio processing, with its diverse signal representations and wide range of sources, from human voices to musical instruments and environmental sounds, poses challenges distinct from those found in traditional Natural Language Processing scenarios. Nevertheless, Large Audio Models, epitomized by transformer-based architectures, have shown marked efficacy in this sphere. By leveraging massive amounts of data, these models have demonstrated prowess in a variety of audio tasks, spanning from Automatic Speech Recognition and Text-to-Speech to Music Generation, among others. Notably, foundational audio models such as SeamlessM4T have recently started to act as universal translators, supporting multiple speech tasks for up to 100 languages without relying on separate task-specific systems. This paper presents an in-depth analysis of state-of-the-art methodologies for Foundational Large Audio Models, their performance benchmarks, and their applicability to real-world scenarios. We also highlight current limitations and provide insights into potential future research directions for Large Audio Models, with the intent to spark further discussion and thereby foster innovation in the next generation of audio-processing systems. Furthermore, to keep pace with the rapid development in this area, we will continually update a repository of relevant recent articles and their open-source implementations at https://github.com/EmulationAI/awesome-large-audio-models. Comment: work in progress; Repo URL: https://github.com/EmulationAI/awesome-large-audio-models

    A systematic review of artificial intelligence-based music generation: Scope, applications, and future trends

    Get PDF
    Currently available reviews in the area of artificial intelligence-based music generation do not cover a wide range of publications and are usually centered on comparing very specific topics across a very limited range of solutions. The best surveys available in the field are the bibliography sections of some papers and books, which lack a systematic approach and limit their scope to handpicked examples. In this work, we analyze the scope and trends of research on artificial intelligence-based music generation by performing a systematic review of the available publications in the field using the PRISMA methodology. Furthermore, we discuss the possible implementations and accessibility of a set of currently available AI solutions as aids to musical composition. Our research shows how publications are distributed globally according to many characteristics, which provides a clear picture of the state of this technology. Through our research it becomes clear that the interest of both musicians and computer scientists in AI-based automatic music generation has increased significantly in the last few years, with increasing participation of major companies in the field, whose works we analyze. We discuss several generation architectures, from both a technical and a musical point of view, and we highlight various areas where further research is needed.