66 research outputs found

    Acoustic Feature Identification to Recognize Rag Present in Borgit

    In Indian classical music, raga recognition is a crucial undertaking. The Borgit, a traditional devotional song form of Assam, poses particular difficulties for automatic rag recognition because of its distinctive sound qualities. In this research, we investigate acoustic feature identification methods to build a reliable rag recognition system for Borgit performances. Each Borgit is set to a rag, and each rag has a unique melodic character. Experiments were carried out on audio samples of rags and on a number of Borgits sung in those rags; three commonly used rags, and several Borgits composed in them, were considered. The acoustic features considered here are the FFT (Fast Fourier Transform), ZCR (zero-crossing rate), the mean and standard deviation of the pitch contour, and RMS (root mean square) energy. Evaluation and analysis show that FFT and ZCR are two noteworthy acoustic features that help identify the rag present in a Borgit. Finally, K-means clustering applied to the FFT and ZCR values of the Borgits grouped the recordings correctly according to the rags present. This research validates FFT and ZCR as the most reliable acoustic parameters for rag identification in Borgit; the roles of the standard deviation of the pitch contour and of the RMS values in rag identification were also examined.
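    The feature set and clustering step described above can be illustrated with a short sketch. The snippet below is not the authors' code; the file names, sampling rate, and pitch range are assumptions. It computes ZCR, RMS, pitch-contour statistics, and a single FFT-based spectral summary per recording with librosa, then clusters the recordings with K-means on the FFT and ZCR values.

```python
import numpy as np
import librosa
from sklearn.cluster import KMeans

def extract_features(path):
    y, sr = librosa.load(path, sr=22050, mono=True)
    zcr = librosa.feature.zero_crossing_rate(y).mean()                # average zero-crossing rate
    rms = librosa.feature.rms(y=y).mean()                             # average RMS energy
    f0, _, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                            fmax=librosa.note_to_hz("C6"), sr=sr)     # pitch contour
    f0 = f0[~np.isnan(f0)]                                            # keep voiced frames only
    pitch_mean, pitch_std = (f0.mean(), f0.std()) if f0.size else (0.0, 0.0)
    spectrum = np.abs(np.fft.rfft(y))                                 # FFT magnitude spectrum
    freqs = np.fft.rfftfreq(len(y), d=1.0 / sr)
    fft_centroid = (freqs * spectrum).sum() / (spectrum.sum() + 1e-9) # scalar FFT summary
    return [fft_centroid, zcr, pitch_mean, pitch_std, rms]

# hypothetical recordings of Borgits sung in three different rags
paths = ["borgit_rag1.wav", "borgit_rag2.wav", "borgit_rag3.wav"]
X = np.array([extract_features(p) for p in paths])
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X[:, :2])  # FFT and ZCR only
print(dict(zip(paths, clusters.tolist())))
```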

    Thaat Classification Using Recurrent Neural Networks with Long Short-Term Memory and Support Vector Machine

    This research paper introduces a groundbreaking method for music classification, emphasizing thaats rather than the conventional raga-centric approach. A comprehensive range of audio features, including amplitude envelope, RMSE, STFT, spectral centroid, MFCC, spectral bandwidth, and zero-crossing rate, is meticulously used to capture thaats' distinct characteristics in Indian classical music. Importantly, the study predicts emotional responses linked with the identified thaats. The dataset encompasses a diverse collection of musical compositions, each representing unique thaats. Three classifier models - RNN-LSTM, SVM, and HMM - undergo thorough training and testing to evaluate their classification performance. Initial findings showcase promising accuracies, with the RNN-LSTM model achieving 85% and SVM performing at 78%. These results highlight the effectiveness of this innovative approach in accurately categorizing music based on thaats and predicting associated emotional responses, providing a fresh perspective on the analysis of Indian classical music.
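    As a rough illustration of the feature pipeline and the SVM baseline (not the paper's implementation; the file names, thaat labels, and SVM hyperparameters are placeholders), the sketch below extracts several of the listed features with librosa and fits a support vector classifier.

```python
import numpy as np
import librosa
from sklearn.svm import SVC

def thaat_features(path):
    y, sr = librosa.load(path, sr=22050, mono=True)
    return np.concatenate([
        librosa.feature.rms(y=y).mean(axis=1),                        # RMS energy
        librosa.feature.spectral_centroid(y=y, sr=sr).mean(axis=1),   # spectral centroid
        librosa.feature.spectral_bandwidth(y=y, sr=sr).mean(axis=1),  # spectral bandwidth
        librosa.feature.zero_crossing_rate(y).mean(axis=1),           # zero-crossing rate
        librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1),     # 13 MFCC means
    ])

# hypothetical audio clips and their thaat labels
paths  = ["bhairav_01.wav", "kafi_01.wav", "kalyan_01.wav"]
labels = ["Bhairav", "Kafi", "Kalyan"]

X = np.array([thaat_features(p) for p in paths])
clf = SVC(kernel="rbf", C=10.0).fit(X, labels)
print(clf.predict(X[:1]))     # predicted thaat for the first clip
```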

    Drum Synthesis and Rhythmic Transformation with Adversarial Autoencoders

    Creative rhythmic transformations of musical audio refer to automated methods for manipulation of temporally-relevant sounds in time. This paper presents a method for joint synthesis and rhythm transformation of drum sounds through the use of adversarial autoencoders (AAE). Users may navigate both the timbre and rhythm of drum patterns in audio recordings through expressive control over a low-dimensional latent space. The model is based on an AAE with Gaussian mixture latent distributions that introduce rhythmic pattern conditioning to represent a wide variety of drum performances. The AAE is trained on a dataset of bar-length segments of percussion recordings, along with their clustered rhythmic pattern labels. The decoder is conditioned during adversarial training for mixing of data-driven rhythmic and timbral properties. The system is trained with over 500,000 bars from 5,418 tracks in popular datasets covering various musical genres. In an evaluation using real percussion recordings, the reconstruction accuracy and latent space interpolation between drum performances are investigated for audio generation conditioned by target rhythmic patterns.
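    A simplified sketch of the conditioning and adversarial training described above is given below. It is an assumption-laden toy version, not the paper's architecture: the dimensions are tiny, bar-length drum segments are replaced by random vectors, and the Gaussian mixture prior is implemented as one randomly placed component per rhythmic pattern cluster.

```python
import torch
import torch.nn as nn

LATENT, N_PATTERNS, FRAMES, BATCH = 8, 4, 64, 16   # toy sizes

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(FRAMES, 128), nn.ReLU(), nn.Linear(128, LATENT))
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT + N_PATTERNS, 128), nn.ReLU(),
                                 nn.Linear(128, FRAMES))
    def forward(self, z, pattern_onehot):            # conditioning: concat pattern label to latent
        return self.net(torch.cat([z, pattern_onehot], dim=1))

class Discriminator(nn.Module):                      # judges whether z came from the prior
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT, 64), nn.ReLU(), nn.Linear(64, 1))
    def forward(self, z):
        return self.net(z)

enc, dec, disc = Encoder(), Decoder(), Discriminator()
opt_ae   = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
opt_disc = torch.optim.Adam(disc.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
component_means = 3.0 * torch.randn(N_PATTERNS, LATENT)   # one Gaussian component per pattern

x = torch.rand(BATCH, FRAMES)                             # stand-in for bar-length drum features
labels = torch.randint(0, N_PATTERNS, (BATCH,))           # clustered rhythmic pattern labels
onehot = nn.functional.one_hot(labels, N_PATTERNS).float()

# 1) reconstruction step
recon = dec(enc(x), onehot)
loss_rec = nn.functional.mse_loss(recon, x)
opt_ae.zero_grad(); loss_rec.backward(); opt_ae.step()

# 2) discriminator step: separate prior samples from encoder outputs
z_real = component_means[labels] + torch.randn(BATCH, LATENT)
loss_d = bce(disc(z_real), torch.ones(BATCH, 1)) + bce(disc(enc(x).detach()), torch.zeros(BATCH, 1))
opt_disc.zero_grad(); loss_d.backward(); opt_disc.step()

# 3) generator step: push the encoder's latent codes toward the Gaussian mixture prior
loss_g = bce(disc(enc(x)), torch.ones(BATCH, 1))
opt_ae.zero_grad(); loss_g.backward(); opt_ae.step()
print(float(loss_rec), float(loss_d), float(loss_g))
```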

    Automated Rhythmic Transformation of Drum Recordings

    Within the creative industries, music information retrieval techniques are now being applied in a variety of music creation and production applications. Audio artists incorporate techniques from music informatics and machine learning (e.g., beat and metre detection) for generative content creation and manipulation systems within the music production setting. Here musicians, desiring a certain sound or aesthetic influenced by the style of artists they admire, may change or replace the rhythmic pattern and sound characteristics (i.e., timbre) of drums in their recordings with those from an idealised recording (e.g., in processes of redrumming and mashup creation). Automated transformation systems for rhythm and timbre can be powerful tools for music producers, allowing them to quickly and easily adjust the different elements of a drum recording to fit the overall style of a song.

    The aim of this thesis is to develop systems for automated transformation of rhythmic patterns of drum recordings using a subset of deep learning techniques called deep generative models (DGMs) for neural audio synthesis. DGMs such as autoencoders and generative adversarial networks have been shown to be effective for transforming musical signals in a variety of genres as well as for learning the underlying structure of datasets for generation of new audio examples. To this end, modular deep learning-based systems are presented in this thesis, with evaluations that measure the extent of the rhythmic modifications generated by different modes of transformation, including audio style transfer, drum translation and latent space manipulation. The evaluation results underscore both the strengths and constraints of DGMs for transformation of rhythmic patterns as well as for neural synthesis of drum sounds across a variety of musical genres.

    New audio style transfer (AST) functions were specifically designed for mashup-oriented drum recording transformation. The designed loss objectives lowered the computational demands of the AST algorithm and offered rhythmic transformation capabilities that adhere to the larger rhythmic structure of the input, generating music that is both creative and realistic. To extend the transformation possibilities of DGMs, systems based on adversarial autoencoders (AAEs) were proposed for drum translation and continuous rhythmic transformation of bar-length patterns. The evaluations, which investigated the lower-dimensional representations of the latent space of the proposed system based on AAEs with a Gaussian mixture prior (AAE-GM), highlighted the importance of the structure of the disentangled latent distributions of AAE-GM. Furthermore, the proposed system demonstrated improved performance, as evidenced by higher reconstruction metrics, when compared to traditional autoencoder models. This implies that the system can more accurately recreate complex drum sounds, ensuring that the produced rhythmic transformation maintains the richness of the source material. For music producers, this means heightened fidelity in drum synthesis and the potential for more expressive and varied drum tracks, enhancing creativity in music production.

    This work also enhances neural drum synthesis by introducing a new, diverse dataset of kick, snare, and hi-hat drum samples, along with multiple drum loop datasets for model training and evaluation. Overall, the work in this thesis raises the profile of the field and will hopefully attract more attention and resources to the area, helping to drive future research and development of neural rhythmic transformation systems.
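    The abstract mentions newly designed audio style transfer (AST) loss objectives without stating them. Purely as an illustration of the kind of content/style objective such systems typically combine, the sketch below pairs a content term with a Gram-matrix style term over spectrogram-derived feature maps; the feature maps here are random stand-ins, and none of this is taken from the thesis itself.

```python
import torch

def gram(features):                                  # features: (channels, time) feature map
    return features @ features.t() / features.shape[1]

def ast_loss(gen_feats, content_feats, style_feats, style_weight=1.0):
    content_term = torch.nn.functional.mse_loss(gen_feats, content_feats)
    style_term = torch.nn.functional.mse_loss(gram(gen_feats), gram(style_feats))
    return content_term + style_weight * style_term

# toy usage: random matrices standing in for spectrogram-derived feature maps
g, c, s = (torch.rand(32, 128) for _ in range(3))
print(float(ast_loss(g, c, s)))
```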

    A Survey of AI Music Generation Tools and Models

    In this work, we provide a comprehensive survey of AI music generation tools, including both research projects and commercialized applications. To conduct our analysis, we classified music generation approaches into three categories: parameter-based, text-based, and visual-based. Our survey highlights the diverse possibilities and functional features of these tools, which cater to a wide range of users, from regular listeners to professional musicians. We observed that each tool has its own set of advantages and limitations. As a result, we have compiled a comprehensive list of these factors that should be considered during the tool selection process. Moreover, our survey offers critical insights into the underlying mechanisms and challenges of AI music generation.

    Learning feature hierarchies for musical audio signals


    Identification of expressive descriptors for style extraction in music analysis using linear and nonlinear models

    The formalization of expressive performance is still considered a relevant problem because of the complexity of music. Expressive performance is an important aspect of music, given the different conventions, such as genres or styles, that a performance may develop over time. Modelling the relationship between musical expression and the structural aspects of acoustic information requires a minimal probabilistic and statistical basis to ensure the robustness, validation, and reproducibility of computational applications; a cohesive relationship between, and justification of, the results is therefore necessary. This thesis is grounded in the theory and applications of discriminative and generative models within the machine learning framework, and in the relationship of systematic procedures to concepts from musicology, using signal processing and data mining techniques. The results were validated through statistical tests and non-parametric experimentation, implementing a set of metrics that measure acoustic and temporal aspects of audio files in order to train a discriminative model and to improve the synthesis process of a deep neural model. In addition, the implemented model opens opportunities for the application of systematic procedures, the automation of transcription into musical notation, ear training for music students, and improved deployment of deep neural networks on CPUs instead of GPUs, owing to the advantages of convolutional networks for processing audio files as vectors or matrices containing a sequence of notes. (Master's thesis, Magíster en Ingeniería Electrónica.)

    Audio Content-Based Music Retrieval

    The rapidly growing corpus of digital audio material requires novel retrieval strategies for exploring large music collections. Traditional retrieval strategies rely on metadata that describe the actual audio content in words. In the case that such textual descriptions are not available, one requires content-based retrieval strategies which only utilize the raw audio material. In this contribution, we discuss content-based retrieval strategies that follow the query-by-example paradigm: given an audio query, the task is to retrieve all documents that are somehow similar or related to the query from a music collection. Such strategies can be loosely classified according to their "specificity", which refers to the degree of similarity between the query and the database documents. Here, high specificity refers to a strict notion of similarity, whereas low specificity to a rather vague one. Furthermore, we introduce a second classification principle based on "granularity", where one distinguishes between fragment-level and document-level retrieval. Using a classification scheme based on specificity and granularity, we identify various classes of retrieval scenarios, which comprise "audio identification", "audio matching", and "version identification". For these three important classes, we give an overview of representative state-of-the-art approaches, which also illustrate the sometimes subtle but crucial differences between the retrieval scenarios. Finally, we give an outlook on a user-oriented retrieval system, which combines the various retrieval strategies in a unified framework.
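    The query-by-example idea at the fragment-level, mid-specificity end of this spectrum ("audio matching") can be sketched with chroma features: compare a short chroma query against every position of a database document and report the best-matching offset. The snippet below is a generic illustration rather than any system from the article, and the file names are hypothetical.

```python
import numpy as np
import librosa

def chroma(path, hop=2048):
    y, sr = librosa.load(path, sr=22050, mono=True)
    C = librosa.feature.chroma_stft(y=y, sr=sr, hop_length=hop)
    return C / (np.linalg.norm(C, axis=0, keepdims=True) + 1e-9)       # normalise each frame

query = chroma("query_fragment.wav")     # short excerpt used as the query (fragment level)
doc   = chroma("database_track.wav")     # one document from the music collection

q_len = query.shape[1]
costs = [np.mean(1.0 - np.sum(query * doc[:, i:i + q_len], axis=0))    # 1 - cosine similarity
         for i in range(doc.shape[1] - q_len + 1)]
best = int(np.argmin(costs))
print(f"best match at frame {best}, cost {costs[best]:.3f}")
```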

    CONEqNet: convolutional music equalizer network

    The process of parametric equalization of musical pieces seeks to highlight their qualities by cutting and/or boosting certain frequencies. In this work, we present a neural model capable of equalizing a song according to the musical genre that is being played at a given moment. Two aspects are taken into account: (1) the equalization should adapt throughout the song rather than remain fixed for the whole song; and (2) songs do not always belong to a single musical genre and may contain touches of several. The neural model designed in this work, called CONEqNet (convolutional music equalizer network), takes these aspects into account and is able to adapt to the changes that occur throughout a song, with the possibility of mixing nuances of different musical genres. For the training of this model, the well-known GTZAN dataset, which provides 1,000 song fragments of 30 seconds each, divided into 10 genres, was used. The paper presents proofs of concept of the neural model's performance. This work was funded by the private research project of Company BQ, the public research projects of the Spanish Ministry of Science and Innovation PID2020-118249RB-C22 and PDC2021-121567-C22 - AEI/10.13039/501100011033, and the Madrid Government (Comunidad de Madrid, Spain) under the Multiannual Agreement with UC3M in the line of Excellence of University Professors (EPUC3M17) and in the context of the V PRICIT (Regional Programme of Research and Technological Innovation).
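    The abstract does not spell out CONEqNet's architecture; as a hedged sketch of the general idea, the toy model below takes a short mel-spectrogram window and predicts one gain per EQ band, so that the predicted equalization can vary as the genre content changes across the song. All sizes and the +/- 12 dB gain range are assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

N_MELS, FRAMES, N_BANDS = 64, 128, 10            # toy sizes; one gain per EQ band

class GenreAwareEQ(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, N_BANDS)        # one gain (in dB) per frequency band

    def forward(self, mel):                       # mel: (batch, 1, N_MELS, FRAMES)
        h = self.conv(mel).flatten(1)
        return torch.tanh(self.head(h)) * 12.0    # constrain gains to roughly +/- 12 dB

model = GenreAwareEQ()
window = torch.rand(1, 1, N_MELS, FRAMES)         # stand-in for one mel-spectrogram window
print(model(window))                              # predicted per-band gains for this window
```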

    Generative models for music using transformer architectures

    This thesis focuses on the growth and impact of Transformer architectures, which are mainly used for natural language processing tasks, applied here to audio generation. We regard music, with its notes, chords, and dynamics, as a language: the symbolic representation of music can be thought of as analogous to human language. A brief history of sound synthesis is given, providing the basic foundation for modern AI music generation models. The most recent work in AI-generated audio is studied carefully, and instances of AI-generated music are discussed in several contexts. Deep learning models and their applications to real-world problems are among the key subjects covered. The main areas of interest include transformer-based audio generation, covering the training procedure, encoding and decoding techniques, and post-processing stages. Transformers offer several key advantages, including long-term consistency and the ability to create minute-long audio compositions. Numerous studies on the various representations of music are discussed, including how neural network and deep learning techniques can be applied to symbolic melodies, musical arrangements, style transfer, and sound production. This thesis largely focuses on transformer models, but it also recognises the importance of other AI-based generative models, including GANs. Overall, this thesis advances generative models for music composition, provides a thorough understanding of transformer design, and demonstrates the possibilities of AI-generated sound synthesis by emphasising the most recent developments.
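    As a minimal sketch of transformer-based symbolic music generation (not the thesis model; vocabulary size, dimensions, and the token scheme are placeholders), the snippet below implements a small decoder-style transformer that predicts the next symbolic music token autoregressively using a causal mask.

```python
import torch
import torch.nn as nn

VOCAB, D_MODEL, MAX_LEN = 512, 256, 256              # placeholder vocabulary and model sizes

class MusicTransformerLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok = nn.Embedding(VOCAB, D_MODEL)
        self.pos = nn.Embedding(MAX_LEN, D_MODEL)
        layer = nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=4,
                                           dim_feedforward=512, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=4)
        self.out = nn.Linear(D_MODEL, VOCAB)

    def forward(self, tokens):                        # tokens: (batch, seq) of symbolic event ids
        seq = tokens.shape[1]
        pos = torch.arange(seq, device=tokens.device)
        x = self.tok(tokens) + self.pos(pos)
        causal = torch.triu(torch.full((seq, seq), float("-inf"),
                                       device=tokens.device), diagonal=1)
        return self.out(self.blocks(x, mask=causal))  # next-token logits

model = MusicTransformerLM()
batch = torch.randint(0, VOCAB, (2, 32))              # toy symbolic token sequences
logits = model(batch)
loss = nn.functional.cross_entropy(logits[:, :-1].reshape(-1, VOCAB), batch[:, 1:].reshape(-1))
print(logits.shape, float(loss))
```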