29 research outputs found

    A Lyrics-matching QBH System for Interactive Environments

    (Abstract to follow)

    Audio-Based Retrieval of Musical Score Data

    Given an audio query, such as a polyphonic musical piece, this thesis addresses the problem of retrieving matching (similar) musical score data from a collection of musical scores. There are different techniques for measuring similarity between musical pieces, such as metadata-based similarity, collaborative filtering, and content-based similarity. In this thesis, we use the information in the digital music itself to measure similarity, a technique known as content-based similarity. First, we extract chroma features to represent musical segments. Chroma features capture both melodic and harmonic information and are robust to timbre variation. Tempo variation between performances of the same song may make them appear dissimilar. To address this issue, we extract beat sequences and combine them with chroma features to obtain beat-synchronous chroma features. Next, we use the Dynamic Time Warping (DTW) algorithm. This algorithm first computes the DTW matrix between two feature sequences and then calculates the cost of traversing the matrix from its starting point to its end point. The lower the cost, the more similar the musical segments are. The performance of DTW is improved by choosing suitable path constraints and path weights. Then, we implement the locality-sensitive hashing (LSH) algorithm, which first indexes the data and then searches for a similar item. The processing time of LSH is shorter than that of DTW. For a smaller fragment of query audio, say 30 seconds, LSH outperformed DTW. The performance of LSH depends on the number of hash tables, the number of projections per table, and the width of the projection. Both algorithms were applied to two types of data sets: RWC (where audio and MIDI are from the same source) and TUT (where audio and MIDI are from different sources). The contribution of this thesis is twofold. First, we propose a suitable feature representation of a musical segment for melodic similarity. Second, we apply two different similarity measure algorithms and enhance their performance. This thesis work also includes the development of a mobile application capable of recording audio from its surroundings and displaying its acoustic features in real time
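
    The DTW step described in this abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `dtw_cost`, the Euclidean local distance, and the unconstrained, unweighted step pattern are all assumptions made for illustration, whereas the thesis reports tuning path constraints and path weights.

```python
import math


def dtw_cost(a, b):
    """Minimum cumulative alignment cost between two feature sequences.

    Each sequence is a list of equal-length feature vectors (for example,
    one chroma vector per beat). Hypothetical helper, not from the thesis.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    # D[i][j] holds the cost of the cheapest path aligning a[:i] with b[:j].
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # assumed local distance
            # Unconstrained step pattern: insertion, deletion, or match.
            D[i][j] = local + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])

    # Cost of traversing the matrix from its start to its end point.
    return D[n][m]
```

    A lower returned value means the two segments are more similar. A sequence compared with a time-stretched copy of itself, such as `dtw_cost([[0.0], [1.0], [2.0]], [[0.0], [1.0], [1.0], [2.0]])`, yields 0.0 because the warping absorbs the repeated frame.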

    Drawbacks and Proposed Solutions for Real-time Processing on Existing State-of-the-art Locality Sensitive Hashing Techniques

    Nearest-neighbor query processing is a fundamental operation for many image retrieval applications. Often, images are stored and represented by high-dimensional vectors that are generated by feature-extraction algorithms. Since tree-based index structures have been shown to be ineffective for high-dimensional processing due to the well-known "Curse of Dimensionality", approximate nearest neighbor techniques are used for faster query processing. Locality Sensitive Hashing (LSH) is a very popular and efficient approximate nearest neighbor technique that is known for its sublinear query processing complexity and theoretical guarantees. Nowadays, several diverse application domains require the capacity to store and process high-dimensional data in real time. Existing LSH techniques are not suitable for handling real-time data and queries. In this paper, we discuss the challenges and drawbacks of existing LSH techniques for processing real-time high-dimensional image data. Additionally, through experimental analysis, we propose improvements to existing state-of-the-art LSH techniques for efficient processing of high-dimensional image data. Comment: Accepted and presented at the 5th International Conference on Signal and Image Processing (SIGI-2019), Dubai, UA
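
    To make the idea of LSH as an approximate nearest neighbor index concrete, here is a minimal sketch of one common variant, random-projection hashing for Euclidean distance. It is not the method proposed in the paper: the class name `EuclideanLSH`, its parameter names, and the default values are assumptions chosen for illustration.

```python
import math
import random


class EuclideanLSH:
    """Toy random-projection LSH index (illustrative, names are hypothetical)."""

    def __init__(self, dim, num_tables=4, num_projections=3, width=4.0, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.tables = []
        for _ in range(num_tables):
            # Each projection is a random Gaussian direction plus a random offset.
            projections = [
                ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
                for _ in range(num_projections)
            ]
            self.tables.append((projections, {}))

    def _key(self, projections, v):
        # Quantize each projection into a bin of size `width`; the tuple of
        # bin indices is the bucket key for this table.
        return tuple(
            math.floor((sum(x * y for x, y in zip(a, v)) + b) / self.width)
            for a, b in projections
        )

    def add(self, item_id, v):
        for projections, buckets in self.tables:
            buckets.setdefault(self._key(projections, v), []).append((item_id, v))

    def query(self, v):
        """Return the id of the closest colliding item, or None if no collision."""
        candidates = {}
        for projections, buckets in self.tables:
            for item_id, u in buckets.get(self._key(projections, v), []):
                candidates[item_id] = u
        if not candidates:
            return None
        # Exact distances are computed only for the colliding candidates.
        return min(candidates, key=lambda i: math.dist(candidates[i], v))
```

    Only items that share a bucket with the query in at least one table are compared exactly, which is what avoids a full scan. More tables raise the chance of finding a true neighbor, while more projections per table or a smaller width make buckets more selective.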

    GPU Acceleration of Melody Accurate Matching in Query-by-Humming

    As melody databases grow, query-by-humming systems face a trade-off between response speed and retrieval accuracy. Melody accurate matching is the key factor limiting response speed. In this paper, we present a GPU acceleration method for melody accurate matching that improves response speed without reducing retrieval accuracy. The method uses two parallel strategies (intra-task parallelism and inter-task parallelism) to achieve the acceleration. The efficiency of our method is validated through extensive experiments. Evaluation results show that our single-GPU implementation achieves a 20x to 40x speedup compared with the execution time of a typical general-purpose CPU

    Sparse and Nonnegative Factorizations For Music Understanding

    In this dissertation, we propose methods for sparse and nonnegative factorization that are specifically suited for analyzing musical signals. First, we discuss two constraints that aid factorization of musical signals: harmonic and co-occurrence constraints. We propose a novel dictionary learning method that imposes harmonic constraints upon the atoms of the learned dictionary while allowing the dictionary size to grow appropriately during the learning procedure. When there is significant spectral-temporal overlap among the musical sources, our method outperforms popular existing matrix factorization methods as measured by the recall and precision of learned dictionary atoms. We also propose co-occurrence constraints -- three simple and convenient multiplicative update rules for nonnegative matrix factorization (NMF) that enforce dependence among atoms. Using examples in music transcription, we demonstrate the ability of these updates to represent each musical note with multiple atoms and cluster the atoms for source separation purposes. Second, we study how spectral and temporal information extracted by nonnegative factorizations can improve upon musical instrument recognition. Musical instrument recognition in melodic signals is difficult, especially for classification systems that rely entirely upon spectral information instead of temporal information. Here, we propose a simple and effective method of combining spectral and temporal information for instrument recognition. While existing classification methods use traditional features such as statistical moments, we extract novel features from spectral and temporal atoms generated by NMF using a biologically motivated multiresolution gamma filterbank. Unlike other methods that require thresholds, safeguards, and hierarchies, the proposed spectral-temporal method requires only simple filtering and a flat classifier. 
Finally, we study how to perform sparse factorization when a large dictionary of musical atoms is already known. Sparse coding methods such as matching pursuit (MP) have been applied to problems in music information retrieval such as transcription and source separation with moderate success. However, when the set of dictionary atoms is large, identification of the best match in the dictionary with the residual is slow -- linear in the size of the dictionary. Here, we propose a variant called approximate matching pursuit (AMP) that is faster than MP while maintaining scalability and accuracy. Unlike MP, AMP uses an approximate nearest-neighbor (ANN) algorithm to find the closest match in a dictionary in sublinear time. One such ANN algorithm, locality-sensitive hashing (LSH), is a probabilistic hash algorithm that places similar, yet not identical, observations into the same bin. While the accuracy of AMP is comparable to similar MP methods, the computational complexity is reduced. Also, by using LSH, this method scales easily; the dictionary can be expanded without reorganizing any data structures
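
    The linear-time matching step mentioned in this abstract can be seen in a minimal sketch of plain matching pursuit (MP). This is standard MP, not the dissertation's AMP: the function name `matching_pursuit` and the assumption that atoms are unit-norm lists of floats are illustrative choices. The marked inner loop is the part that AMP, as described above, would replace with an approximate nearest-neighbor lookup such as LSH.

```python
def matching_pursuit(signal, dictionary, num_iters):
    """Greedy sparse decomposition of `signal` over unit-norm `dictionary` atoms.

    Returns (coeffs, residual), where coeffs maps atom index to its weight.
    Hypothetical helper for illustration only.
    """
    residual = list(signal)
    coeffs = {}
    for _ in range(num_iters):
        best_k, best_c = None, 0.0
        # Linear scan over all atoms: cost grows with dictionary size.
        # An approximate nearest-neighbor search would stand in for this loop.
        for k, atom in enumerate(dictionary):
            c = sum(r * a for r, a in zip(residual, atom))
            if abs(c) > abs(best_c):
                best_k, best_c = k, c
        if best_k is None:
            break  # residual is orthogonal to every atom
        coeffs[best_k] = coeffs.get(best_k, 0.0) + best_c
        # Subtract the selected atom's contribution from the residual.
        residual = [r - best_c * a for r, a in zip(residual, dictionary[best_k])]
    return coeffs, residual
```

    With an orthonormal dictionary such as `[[1.0, 0.0], [0.0, 1.0]]` and the signal `[3.0, -2.0]`, two iterations recover the coefficients 3.0 and -2.0 and leave a zero residual.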

    Query by Humming

    This TFG explores different methods to retrieve song information from a hummed query. In this thesis, a Query by Singing/Humming (QbSH) system has been developed. A QbSH system tries to retrieve information about a song given a melody recorded by the user. The system compares human queries with melodies extracted from audio files. A pitch extraction algorithm has been used to obtain the melodies for both the queries and the database songs. The preprocessing of the signals turned out to be crucial and has been studied in depth. The matching step uses Dynamic Time Warping, which computes a distance between two signals while absorbing tempo variations. Several databases have been built to assess the system. Finally, a complete graphical user interface has been programmed to allow the user to analyze the system step by step. In the end, this thesis gives a thorough account of the creation of the system, which obtains competitive results and provides a solid basis for further development

    Query by Humming (Android app)

    Query by Humming/Singing is the technology for retrieving information about a song (title, artist, etc.) from a small sung or hummed excerpt. This TFG develops and integrates the technology required to create an application. In this thesis, a Query by Singing/Humming (QbSH) system has been developed. A QbSH system tries to retrieve information about a song given a melody recorded by the user. It has been developed as a client/server system, where the client is an Android application (programmed in Java) and the server runs on a Unix system and is written in C++. The system compares a melody recorded by the user with other melodies previously recorded by other users and tagged with song information by the system administrator. A pitch extraction algorithm is applied to extract the melody from the query recordings, followed by a processing algorithm that enhances the signal and prepares it for matching. In the matching step, Dynamic Time Warping (DTW) has been applied, which computes a distance between two signals while absorbing tempo variations. As a result, this thesis covers the full experience of audio processing, systems administration, communications, and programming

    Machine Annotation of Traditional Irish Dance Music

    The work presented in this thesis is validated in experiments using 130 real-world field recordings of traditional music from sessions, classes, concerts and commercial recordings. Test audio includes solo and ensemble playing on a variety of instruments recorded in real-world settings such as noisy public sessions. Results are reported using standard measures from the field of information retrieval (IR), including accuracy, error, precision and recall, and the system is compared to alternative approaches for CBMIR common in the literature