6 research outputs found

    Harmonic Nature of Maddalam : - A Study

    Get PDF
     The sound samples of different strokes of maddalam are analysed using MIR toolbox. The frequency spectrum, attack and decay parameters are studied. The reasons for the harmonic nature of maddalam are identified

    On the Distributional Representation of Ragas: Experiments with Allied Raga Pairs

    Get PDF
    Raga grammar provides a theoretical framework that supports creativity and flexibility in improvisation while carefully maintaining the distinctiveness of each raga in the ears of a listener. A computational model for raga grammar can serve as a powerful tool to characterize grammaticality in performance. Like in other forms of tonal music, a distributional representation capturing tonal hierarchy has been found to be useful in characterizing a raga’s distinctiveness in performance. In the continuous-pitch melodic tradition, several choices arise for the defining attributes of a histogram representation of pitches. These can be resolved by referring to one of the main functions of the representation, namely to embody the raga grammar and therefore the technical boundary of a raga in performance. Based on the analyses of a representative dataset of audio performances in allied ragas by eminent Hindustani vocalists, we propose a computational representation of distributional information, and further apply it to obtain insights about how this aspect of raga distinctiveness is manifested in practice over different time scales by very creative performers

    Exploring deep learning based methods for information retrieval in Indian classical music

    Get PDF
    A vital aspect of Indian Classical Music (ICM) is Raga, which serves as a melodic framework for compositions and improvisations alike. Raga Recognition is an important music information retrieval task in ICM as it can aid numerous downstream applications ranging from music recommendations to organizing huge music collections. In this work, we propose a deep learning based approach to Raga recognition. Our approach employs efficient pre-possessing and learns temporal sequences in music data using Long Short Term Memory based Recurrent Neural Networks (LSTM-RNN). We train and test the network on smaller sequences sampled from the original audio while the final inference is performed on the audio as a whole. Our method achieves an accuracy of 88.1% and 97 % during inference on the Comp Music Carnatic dataset and its 10 Raga subset respectively making it the state of-the-art for the Raga recognition task. Our approach also enables sequence ranking which aids us in retrieving melodic patterns from a given music data base that are closely related to the presented query sequence

    Culturally sensitive strategies for automatic music prediction

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2012.Cataloged from PDF version of thesis.Includes bibliographical references (p. 103-112).Music has been shown to form an essential part of the human experience-every known society engages in music. However, as universal as it may be, music has evolved into a variety of genres, peculiar to particular cultures. In fact people acquire musical skill, understanding, and appreciation specific to the music they have been exposed to. This process of enculturation builds mental structures that form the cognitive basis for musical expectation. In this thesis I argue that in order for machines to perform musical tasks like humans do, in particular to predict music, they need to be subjected to a similar enculturation process by design. This work is grounded in an information theoretic framework that takes cultural context into account. I introduce a measure of musical entropy to analyze the predictability of musical events as a function of prior musical exposure. Then I discuss computational models for music representation that are informed by genre-specific containers for musical elements like notes. Finally I propose a software framework for automatic music prediction. The system extracts a lexicon of melodic, or timbral, and rhythmic primitives from audio, and generates a hierarchical grammar to represent the structure of a particular musical form. To improve prediction accuracy, context can be switched with cultural plug-ins that are designed for specific musical instruments and genres. In listening experiments involving music synthesis a culture-specific design fares significantly better than a culture-agnostic one. Hence my findings support the importance of computational enculturation for automatic music prediction. Furthermore I suggest that in order to sustain and cultivate the diversity of musical traditions around the world it is indispensable that we design culturally sensitive music technology.by Mihir Sarkar.Ph.D

    Sparse and Low-rank Modeling for Automatic Speech Recognition

    Get PDF
    This thesis deals with exploiting the low-dimensional multi-subspace structure of speech towards the goal of improving acoustic modeling for automatic speech recognition (ASR). Leveraging the parsimonious hierarchical nature of speech, we hypothesize that whenever a speech signal is measured in a high-dimensional feature space, the true class information is embedded in low-dimensional subspaces whereas noise is scattered as random high-dimensional erroneous estimations in the features. In this context, the contribution of this thesis is twofold: (i) identify sparse and low-rank modeling approaches as excellent tools for extracting the class-specific low-dimensional subspaces in speech features, and (ii) employ these tools under novel ASR frameworks to enrich the acoustic information present in the speech features towards the goal of improving ASR. Techniques developed in this thesis focus on deep neural network (DNN) based posterior features which, under the sparse and low-rank modeling approaches, unveil the underlying class-specific low-dimensional subspaces very elegantly. In this thesis, we tackle ASR tasks of varying difficulty, ranging from isolated word recognition (IWR) and connected digit recognition (CDR) to large-vocabulary continuous speech recognition (LVCSR). For IWR and CDR, we propose a novel \textit{Compressive Sensing} (CS) perspective towards ASR. Here exemplar-based speech recognition is posed as a problem of recovering sparse high-dimensional word representations from compressed low-dimensional phonetic representations. In the context of LVCSR, this thesis argues that albeit their power in representation learning, DNN based acoustic models still have room for improvement in exploiting the \textit{union of low-dimensional subspaces} structure of speech data. Therefore, this thesis proposes to enhance DNN posteriors by projecting them onto the manifolds of the underlying classes using principal component analysis (PCA) or compressive sensing based dictionaries. Projected posteriors are shown to be more accurate training targets for learning better acoustic models, resulting in improved ASR performance. The proposed approach is evaluated on both close-talk and far-field conditions, confirming the importance of sparse and low-rank modeling of speech in building a robust ASR framework. Finally, the conclusions of this thesis are further consolidated by an information theoretic analysis approach which explicitly quantifies the contribution of proposed techniques in improving ASR
    corecore