
    On the Application of Generic Summarization Algorithms to Music

    Several generic summarization algorithms have been developed and successfully applied in fields such as text and speech summarization. In this paper, we review these algorithms and apply them to music. To evaluate the summaries, we adopt an extrinsic approach: we compare a Fado genre classifier's performance on truncated contiguous clips against its performance on the summaries extracted with those algorithms, on two different datasets. We show that Maximal Marginal Relevance (MMR), LexRank, and Latent Semantic Analysis (LSA) all improve classification performance on both test datasets. (Comment: 12 pages, 1 table; submitted to IEEE Signal Processing Letters.)
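
    As an illustration of the kind of algorithm involved, below is a minimal sketch of greedy Maximal Marginal Relevance selection over fixed-length music segments. The use of the song's mean feature vector as the relevance "query", the cosine similarity, and all names are assumptions made for the sketch, not the paper's actual pipeline.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def mmr_summary(segments, budget, lam=0.7):
    """Greedy MMR: pick segments that are relevant yet non-redundant.

    segments: (n, d) array, one feature vector per fixed-length clip.
    budget:   number of segments to keep in the summary.
    lam:      relevance/diversity trade-off (lambda in the MMR criterion).
    """
    centroid = segments.mean(axis=0)   # assumed stand-in for a relevance query
    selected, remaining = [], list(range(len(segments)))
    while remaining and len(selected) < budget:
        def mmr_score(i):
            relevance = cosine(segments[i], centroid)
            redundancy = max((cosine(segments[i], segments[j]) for j in selected),
                             default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return sorted(selected)            # indices of the clips forming the summary
```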

    Using Generic Summarization to Improve Music Information Retrieval Tasks

    To satisfy processing-time constraints, many MIR tasks process only a segment of the whole music signal. This practice can degrade performance, since the information most important to the task may not lie in the processed segments. In this paper, we leverage generic summarization algorithms, previously applied to text and speech summarization, to summarize items in music datasets. These algorithms build summaries that are both concise and diverse by selecting appropriate segments from the input signal, which makes them good candidates for summarizing music as well. We evaluate the summarization process on binary and multiclass music genre classification tasks by comparing the performance obtained on summarized datasets against the performance obtained on continuous segments (the traditional way of addressing the time constraints mentioned above) and on full songs from the same original dataset. We show that GRASSHOPPER, LexRank, LSA, MMR, and a Support Sets-based Centrality model improve classification performance when compared to selected 30-second baselines. We also show that summarized datasets lead to classification performance whose difference from using full songs is not statistically significant. Furthermore, we argue for the advantages of sharing summarized datasets for future MIR research. (Comment: 24 pages, 10 tables; submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing.)
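
    Of the algorithms listed, LexRank is the most graph-oriented. The sketch below shows a LexRank-style centrality computation over a precomputed segment-similarity matrix; the threshold, damping factor, and the idea of using cosine similarity over per-segment feature statistics are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def lexrank_segments(sim, threshold=0.3, damping=0.85, iters=100):
    """Rank music segments by graph centrality, LexRank-style.

    sim: (n, n) pairwise segment-similarity matrix (e.g. cosine over
         per-segment MFCC statistics). Returns one stationary score per
         segment; the top-scoring segments form the summary.
    """
    adj = (sim >= threshold).astype(float)   # binarize, as in classic LexRank
    np.fill_diagonal(adj, 0.0)               # no self-links
    row_sums = adj.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0             # guard isolated nodes
    transition = adj / row_sums               # row-stochastic random walk
    n = len(sim)
    scores = np.full(n, 1.0 / n)
    for _ in range(iters):                    # power iteration with damping
        scores = (1 - damping) / n + damping * transition.T @ scores
    return scores
```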

    Detection of speech signal in strong ship-radiated noise based on spectrum entropy

    When the frequency spectrum distributions of several successive frames are compared, the spectrum of speech frames changes more from frame to frame than that of ship-radiated noise. The aim of this work is to propose a novel speech detection algorithm for strong ship-radiated noise, since inaccurate sentence boundaries are a major cause of errors in automatic speech recognition against a strong noise background. Based on that characteristic, a new feature, the repeating pattern of frequency spectrum trend (RPFST), is calculated from spectrum entropy. First, speech is detected coarsely, with a precision of 1 s, by computing the RPFST feature. Then the detection precision is refined to 20 ms, the frame length, by frame shifting. Benchmarked on a large measured dataset, the algorithm achieves a detection accuracy of 92 %. The experimental results show the feasibility of the algorithm for all kinds of speech and ship-radiated noise.
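
    The paper's exact RPFST definition is not reproduced here; the sketch below only shows the spectrum-entropy building block it rests on, with an assumed 16 kHz sample rate so that a 20 ms frame is 320 samples.

```python
import numpy as np

def spectral_entropy(frame, n_fft=512, eps=1e-12):
    """Shannon entropy of a frame's normalized power spectrum.

    Speech frames change their spectral distribution from frame to frame far
    more than stationary ship-radiated noise does, which is the cue the
    detector exploits.
    """
    spectrum = np.abs(np.fft.rfft(frame, n=n_fft)) ** 2
    p = spectrum / (spectrum.sum() + eps)   # normalize to a probability distribution
    return float(-(p * np.log2(p + eps)).sum())

def entropy_track(signal, frame_len=320, hop=320):
    """Entropy per 20 ms frame (frame_len = 0.02 * 16000 samples, assumed)."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.array([spectral_entropy(f) for f in frames])
```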

    Inpainting of long audio segments with similarity graphs

    We present a novel method for the compensation of long-duration data loss in audio signals, in particular music. The concealment of such signal defects is based on a graph that encodes signal structure in terms of time-persistent spectral similarity. A suitable candidate segment for substituting the lost content is proposed by an intuitive optimization scheme and smoothly inserted into the gap, i.e., the lost or distorted signal region. Extensive listening tests show that the proposed algorithm provides highly promising results when applied to a variety of real-world music signals.
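
    A deliberately crude sketch of the overall idea follows: candidate positions are scored by how well the audio around them matches the intact audio around the gap (raw dot products here stand in for the paper's time-persistent spectral-similarity graph), and the winner is crossfaded into place. It assumes the gap is not at the very start or end of the signal; all names and parameters are illustrative.

```python
import numpy as np

def fill_gap(signal, gap_start, gap_len, context=2048, fade=256, hop=512):
    """Patch a lost region with the best-matching segment found elsewhere."""
    left_ctx = signal[gap_start - context:gap_start]
    right_ctx = signal[gap_start + gap_len:gap_start + gap_len + context]

    best_pos, best_score = None, -np.inf
    for pos in range(context + fade, len(signal) - gap_len - context - fade, hop):
        if abs(pos - gap_start) < gap_len + 2 * context:
            continue                          # skip the damaged region itself
        score = (np.dot(left_ctx, signal[pos - context:pos]) +
                 np.dot(right_ctx, signal[pos + gap_len:pos + gap_len + context]))
        if score > best_score:
            best_pos, best_score = pos, score

    # take a patch slightly longer than the gap and crossfade at both seams
    patch = signal[best_pos - fade:best_pos + gap_len + fade]
    ramp = np.linspace(0.0, 1.0, fade)
    out = signal.copy()
    head = slice(gap_start - fade, gap_start)
    out[head] = (1 - ramp) * out[head] + ramp * patch[:fade]
    out[gap_start:gap_start + gap_len] = patch[fade:fade + gap_len]
    tail = slice(gap_start + gap_len, gap_start + gap_len + fade)
    out[tail] = (1 - ramp) * patch[fade + gap_len:] + ramp * out[tail]
    return out
```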

    Music shapelets for fast cover song recognition

    A cover song is a new performance or recording of a previously recorded song by an artist other than the original one. The automatic identification of cover songs is useful for a wide range of tasks, from fans looking for new versions of their favorite songs to organizations involved in licensing copyrighted songs. This is a difficult task, given that a cover may differ from the original song in key, timbre, tempo, structure, arrangement, and even the language of the vocals. Cover song identification has attracted some attention recently. However, most state-of-the-art approaches are based on similarity search, which involves a large number of similarity computations to retrieve potential cover versions for a query recording. In this paper, we adapt the idea of time series shapelets to content-based music retrieval. Our proposal adds a training phase that finds small excerpts of feature vectors that best describe each song. We demonstrate that such small segments can identify cover songs with higher identification rates, more than one order of magnitude faster than methods that use features describing the whole song. (Funding: FAPESP grants #2011/17698-5, #2013/26151-5, and 2015/07628-0; CNPq grants 446330/2014-0 and 303083/2013-1.)
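
    The matching step that makes this fast is easy to sketch: a shapelet is a short excerpt of feature vectors, and its distance to a song is the smallest distance to any equal-length window of that song's feature sequence. The sketch below is an assumed minimal variant; the paper's training phase, features, and tie-breaking are not reproduced.

```python
import numpy as np

def shapelet_distance(shapelet, features):
    """Smallest Euclidean distance between a shapelet and any equal-length
    window of a song's feature-vector sequence.

    shapelet: (m, d) excerpt learned for one song in the training phase.
    features: (n, d) frame-level features of a query recording, n >= m.
    """
    m = len(shapelet)
    return min(np.linalg.norm(features[i:i + m] - shapelet)
               for i in range(len(features) - m + 1))

def identify_cover(query_features, shapelets_by_song):
    """Return the song whose learned shapelet lies closest to the query."""
    return min(shapelets_by_song,
               key=lambda song: shapelet_distance(shapelets_by_song[song],
                                                  query_features))
```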

    Fusion of Multimodal Information in Music Content Analysis

    Music is often processed through its acoustic realization. This is restrictive in the sense that music is clearly a highly multimodal concept, where various types of heterogeneous information can be associated with a given piece of music (a musical score, musicians' gestures, lyrics, user-generated metadata, etc.). This has recently led researchers to apprehend music through its various facets, giving rise to "multimodal music analysis" studies. This article gives a synthetic overview of methods that have been successfully employed in multimodal signal analysis. In particular, their use in music content processing is discussed in more detail through five case studies that highlight different multimodal integration techniques. The case studies include an example of cross-modal correlation for music video analysis, an audiovisual drum transcription system, a description of the concept of informed source separation, a discussion of multimodal dance-scene analysis, and an example of user-interactive music analysis. In light of these case studies, some perspectives on multimodality in music processing are finally suggested.
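
    As a concrete (if deliberately simple) example of one integration technique such surveys cover, late fusion combines per-modality classifier scores after each modality is processed separately. The sketch below is an assumed weighted-average variant, not any specific system from the article's case studies.

```python
import numpy as np

def late_fusion(modality_scores, weights=None):
    """Combine per-modality class scores by a weighted average (late fusion).

    modality_scores: dict mapping a modality name to an (n_classes,) score
    vector, e.g. {"audio": ..., "video": ..., "lyrics": ...}. Weights would
    typically be tuned on validation data; uniform weights by default.
    """
    names = sorted(modality_scores)
    stacked = np.stack([modality_scores[m] for m in names])  # (n_modalities, n_classes)
    if weights is None:
        weights = np.full(len(names), 1.0 / len(names))
    return weights @ stacked   # fused score per class; argmax gives the decision
```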

    Clustering by compression

    We present a new method for clustering based on compression. The method does not use subject-specific features or background knowledge, and works as follows. First, we determine a universal similarity distance, the normalized compression distance (NCD), computed from the lengths of compressed data files (singly and in pairwise concatenation). Second, we apply a hierarchical clustering method. The NCD is universal in that it is not restricted to a specific application area and works across application-area boundaries. A theoretical precursor, the normalized information distance, co-developed by one of the authors, is provably optimal but relies on the non-computable notion of Kolmogorov complexity. We propose precise notions of similarity metric and normal compressor, and show that the NCD based on a normal compressor is a similarity metric that approximates universality. To extract a hierarchy of clusters from the distance matrix, we determine a dendrogram (binary tree) by a new quartet method and a fast heuristic to implement it. The method is implemented and available as public software, and is robust under the choice of different compressors. To substantiate our claims of universality and robustness, we report evidence of successful application in areas as diverse as genomics, virology, languages, literature, music, handwritten digits, astronomy, and combinations of objects from completely different domains, using statistical, dictionary, and block-sorting compressors. In genomics, we present new evidence for major questions in mammalian evolution, based on whole-mitochondrial genomic analysis: the Eutherian orders and the Marsupionta hypothesis against the Theria hypothesis. (Comment: LaTeX, 27 pages, 20 figures.)
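
    The NCD itself is compact enough to sketch directly: NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)), where C(s) is the compressed length of s. The snippet below uses zlib as a stand-in compressor; the paper evaluates statistical, dictionary, and block-sorting compressors, so zlib is merely one admissible choice.

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance with zlib as the compressor."""
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

# Similar objects compress well together, so their NCD is lower:
a = b"abcdefgh" * 200
b = b"abcdefgh" * 200 + b"!"
c = bytes(range(256)) * 7
assert ncd(a, b) < ncd(a, c)
```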

    Onde é que eu já ouvi isto? (Where have I heard this before?)

    Get PDF
    Master's thesis in Informatics Engineering, presented to the Universidade de Lisboa through the Faculdade de Ciências, 2012. Over the last ten years, technological advances in audio compression and computer networks have prompted a huge increase in the availability and sharing of digital music. The main purpose of this project is to develop a prototype with which the similarity between pieces of audio can be measured exclusively from the audio content itself, that is, from its most basic properties and characteristics. The prototype analyzes the inherent characteristics of each piece of audio and uses the data from that analysis to compare music, regardless of any metadata that may exist. The basis for this comparison is a fingerprint of the audio itself, which aims to generate a signature that identifies a piece of audio. This signature transforms the audio signal into a sequence of vectors, each a set of spectral features: Zero-Crossings, Spectral Centroid, Rolloff, Flux, and Mel-Frequency Cepstral Coefficients (MFCC) of the audio signal. More specifically, the audio signal is converted into a sequence of symbols that correspond to the characteristics of a piece of audio. This "fingerprint" not only identifies a piece of music but also provides information about its musical characteristics. Using this prototype, it is possible to select films based on the similarity between pieces of audio: the user can be shown a series of films whose audio is similar to a query the user has chosen, making it possible to search a database of video documents using only pieces of audio. The work is part of one of the tasks of the VIRUS project (Video Information Retrieval Using Subtitles), funded by FCT, for which the techniques were largely already developed.
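
    A rough sketch of such a signature follows, using librosa to extract the features named above. The sample rate, frame parameters, and the whole-signature cosine comparison are assumptions for illustration, and the thesis's symbolization step is omitted.

```python
import numpy as np
import librosa

def fingerprint(path, n_mfcc=13):
    """Frame-level feature sequence used as an audio 'signature'.

    Stacks the features named in the thesis: zero-crossing rate, spectral
    centroid, rolloff, flux, and MFCCs, one column per analysis frame.
    """
    y, sr = librosa.load(path, sr=22050)
    zcr = librosa.feature.zero_crossing_rate(y)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    # spectral flux computed by hand from the STFT magnitudes
    S = np.abs(librosa.stft(y))
    flux = np.sqrt((np.diff(S, axis=1) ** 2).sum(axis=0))
    flux = np.concatenate([[0.0], flux])[np.newaxis, :]   # align frame counts
    return np.vstack([zcr, centroid, rolloff, flux, mfcc])  # (4 + n_mfcc, frames)

def similarity(fp_a, fp_b):
    """Crude similarity between two signatures: cosine of their mean vectors."""
    a, b = fp_a.mean(axis=1), fp_b.mean(axis=1)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```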