26 research outputs found

    Toward the Scientific Evaluation of Music Information Retrieval Systems

    Get PDF
    This paper outlines the findings-to-date of a project to assist in the efforts being made to establish a TREC-like evaluation paradigm within the Music Information Retrieval (MIR) research community. The findings and recommendations are based upon expert opinion garnered from members of the Information Retrieval (IR), Music Digital Library (MDL) and MIR communities with regard to the construction and implementation of scientifically valid evaluation frameworks. Proposed recommendations include the creation of data-rich query records that are both grounded in real-world requirements and neutral with respect to retrieval technique(s) being examined; adoption, and subsequent validation, of a “reasonable person” approach to “relevance” assessment; and, the development of a secure, yet accessible, research environment that allows researchers to remotely access the large-scale testbed collection

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Get PDF
    Based on the information provided by European projects and national initiatives related to multimedia search as well as domains experts that participated in the CHORUS Think-thanks and workshops, this document reports on the state of the art related to multimedia content search from, a technical, and socio-economic perspective. The technical perspective includes an up to date view on content based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark inititiatives to measure the performance of multimedia search engines. From a socio-economic perspective we inventorize the impact and legal consequences of these technical advances and point out future directions of research

    Design and evaluation of dynamic feature-based segmentation on music

    Get PDF
    viii, 94 leaves : ill. ; 29 cmSegmentation is an indispensable step in the field of Music Information Retrieval (MIR). Segmentation refers to the splitting of a music piece into significant sections. Classically there has been a great deal of attention focused on various issues of segmentation, such as: perceptual segmentation vs. computational segmentation, segmentation evaluations, segmentation algorithms, etc. In this thesis, we conduct a series of perceptual experiments which challenge several of the traditional assumptions with respect to segmentation. Identifying some deficiencies in the current segmentation evaluation methods, we present a novel standardized evaluation approach which considers segmentation as a supportive step towards feature extraction in the MIR process. Furthermore, we propose a simple but effective segmentation algorithm and evaluate it utilizing our evaluation approach

    The MIREX Grand Challenge: A Framework of Holistic User-Experience Evaluation in Music Information Retrieval

    Get PDF
    Music Information Retrieval (MIR) evaluation has traditionally focused on system‐centered approaches where components of MIR systems are evaluated against predefined data sets and golden answers (i.e., ground truth). There are two major limitations of such system‐centered evaluation approaches: (a) The evaluation focuses on subtasks in music information retrieval, but not on entire systems and (b) users and their interactions with MIR systems are largely excluded. This article describes the first implementation of a holistic user‐experience evaluation in MIR, the MIREX Grand Challenge, where complete MIR systems are evaluated, with user experience being the single overarching goal. It is the first time that complete MIR systems have been evaluated with end users in a realistic scenario. We present the design of the evaluation task, the evaluation criteria and a novel evaluation interface, and the data‐collection platform. This is followed by an analysis of the results, reflection on the experience and lessons learned, and plans for future directions

    DLI-2: Creating the Digital Music Library: Final Report to the National Science Foundation

    Get PDF
    Indiana University’s Variations2 Digital Music Library project focused on three chief areas of research and development: system architecture, including content representation and metadata standards; component-based application architecture; and network services. We tested and evaluated commercial technologies, primarily for multimedia and storage management; developed custom software solutions for the needs of the music library community; integrated commercial and custom software products; and tested and evaluated prototype systems for music instruction and library services, locally at Indiana University, and at a number of satellite sites, in the U.S. and overseas. This document is the project's final report to the National Science Foundation.This work was sponsored by the National Science Foundation under award no. 9909068, as part of the DLI-2 initiative

    Music information retrieval: conceptuel framework, annotation and user behaviour

    Get PDF
    Understanding music is a process both based on and influenced by the knowledge and experience of the listener. Although content-based music retrieval has been given increasing attention in recent years, much of the research still focuses on bottom-up retrieval techniques. In order to make a music information retrieval system appealing and useful to the user, more effort should be spent on constructing systems that both operate directly on the encoding of the physical energy of music and are flexible with respect to users’ experiences. This thesis is based on a user-centred approach, taking into account the mutual relationship between music as an acoustic phenomenon and as an expressive phenomenon. The issues it addresses are: the lack of a conceptual framework, the shortage of annotated musical audio databases, the lack of understanding of the behaviour of system users and shortage of user-dependent knowledge with respect to high-level features of music. In the theoretical part of this thesis, a conceptual framework for content-based music information retrieval is defined. The proposed conceptual framework - the first of its kind - is conceived as a coordinating structure between the automatic description of low-level music content, and the description of high-level content by the system users. A general framework for the manual annotation of musical audio is outlined as well. A new methodology for the manual annotation of musical audio is introduced and tested in case studies. The results from these studies show that manually annotated music files can be of great help in the development of accurate analysis tools for music information retrieval. Empirical investigation is the foundation on which the aforementioned theoretical framework is built. Two elaborate studies involving different experimental issues are presented. In the first study, elements of signification related to spontaneous user behaviour are clarified. In the second study, a global profile of music information retrieval system users is given and their description of high-level content is discussed. This study has uncovered relationships between the users’ demographical background and their perception of expressive and structural features of music. Such a multi-level approach is exceptional as it included a large sample of the population of real users of interactive music systems. Tests have shown that the findings of this study are representative of the targeted population. Finally, the multi-purpose material provided by the theoretical background and the results from empirical investigations are put into practice in three music information retrieval applications: a prototype of a user interface based on a taxonomy, an annotated database of experimental findings and a prototype semantic user recommender system. Results are presented and discussed for all methods used. They show that, if reliably generated, the use of knowledge on users can significantly improve the quality of music content analysis. This thesis demonstrates that an informed knowledge of human approaches to music information retrieval provides valuable insights, which may be of particular assistance in the development of user-friendly, content-based access to digital music collections

    Evaluation of Synthesised Sound Effects

    Get PDF
    PhDThe current fi eld of sound synthesis research presents a range of methods and approaches for synthesising a given sound. Sounds are synthesised to facilitate interaction or control of a sound, to enable sound searching through parametric control of a sound or to allow for the creation of an arti ficial nonexistent sound. In all of these cases, the ability of a synthesis technique to reproduce a desired sound is integral. This thesis uses an audio feature representation of audio to produce a sonically inspired taxonomy, based entirely on the sonic content of sound, which enables a user to search through a large set of sounds without the need for understanding of context. This provides an approach for using audio features to compare similarity between different audio effect samples in a sound effects library. This thesis then develops approaches for evaluation of synthesised sound effects. A large scale methodic subjective evaluation of synthesised sound effects is performed, evaluating a range of different synthesis methods in a range of different sound classes or sonic contexts. It is then identi fied that there are cases where synthesised sound effects can be considered as realistic as a recorded sample. An objective evaluation approach is then presented. Audio feature vectors are used to measure the relative objective similarities between two samples, and this is correlated with a perceptual evaluation of sound similarity. These objective measures are then compared based on the perceptual evaluations. Both evaluation approaches are then demonstrated in a case study of aeroacoustic sound effects, where these subjective and objective evaluation techniques are demonstrated for a speci fic case. There is no single best approach to synthesising sound effects. More consistent and rigorous evaluation methodologies will lead to a better understanding as to the advantages and disadvantages of each method. The outcome of this research suggests that further consistent perceptual and objective evaluation within the sound effect synthesis community will lead to a better understanding as to the successes and failings of existing work and thus facilitate an enhancement of current sound synthesis technologies.This work was supported by the EPSRC grant EP/M506394/1

    Statistical distribution of common audio features : encounters in a heavy-tailed universe

    Get PDF
    In the last few years some Music Information Retrieval (MIR) researchers have spotted important drawbacks in applying standard successful-in-monophonic algorithms to polyphonic music classification and similarity assessment. Noticeably, these so called “Bag-of-Frames” (BoF) algorithms share a common set of assumptions. These assumptions are substantiated in the belief that the numerical descriptions extracted from short-time audio excerpts (or frames) are enough to capture relevant information for the task at hand, that these frame-based audio descriptors are time independent, and that descriptor frames are well described by Gaussian statistics. Thus, if we want to improve current BoF algorithms we could: i) improve current audio descriptors, ii) include temporal information within algorithms working with polyphonic music, and iii) study and characterize the real statistical properties of these frame-based audio descriptors. From a literature review, we have detected that many works focus on the first two improvements, but surprisingly, there is a lack of research in the third one. Therefore, in this thesis we analyze and characterize the statistical distribution of common audio descriptors of timbre, tonal and loudness information. Contrary to what is usually assumed, our work shows that the studied descriptors are heavy-tailed distributed and thus, they do not belong to a Gaussian universe. This new knowledge led us to propose new algorithms that show improvements over the BoF approach in current MIR tasks such as genre classification, instrument detection, and automatic tagging of music. Furthermore, we also address new MIR tasks such as measuring the temporal evolution of Western popular music. Finally, we highlight some promising paths for future audio-content MIR research that will inhabit a heavy-tailed universe.En el campo de la extracción de información musical o Music Information Retrieval (MIR), los algoritmos llamados Bag-of-Frames (BoF) han sido aplicados con éxito en la clasificación y evaluación de similitud de señales de audio monofónicas. Por otra parte, investigaciones recientes han señalado problemas importantes a la hora de aplicar dichos algoritmos a señales de música polifónica. Estos algoritmos suponen que las descripciones numéricas extraídas de los fragmentos de audio de corta duración (o frames ) son capaces de capturar la información necesaria para la realización de las tareas planteadas, que el orden temporal de estos fragmentos de audio es irrelevante y que las descripciones extraídas de los segmentos de audio pueden ser correctamente descritas usando estadísticas Gaussianas. Por lo tanto, si se pretende mejorar los algoritmos BoF actuales se podría intentar: i) mejorar los descriptores de audio, ii) incluir información temporal en los algoritmos que trabajan con música polifónica y iii) estudiar y caracterizar las propiedades estadísticas reales de los descriptores de audio. La bibliografía actual sobre el tema refleja la existencia de un número considerable de trabajos centrados en las dos primeras opciones de mejora, pero sorprendentemente, hay una carencia de trabajos de investigación focalizados en la tercera opción. Por lo tanto, esta tesis se centra en el análisis y caracterización de la distribución estadística de descriptores de audio comúnmente utilizados para representar información tímbrica, tonal y de volumen. Al contrario de lo que se asume habitualmente, nuestro trabajo muestra que los descriptores de audio estudiados se distribuyen de acuerdo a una distribución de “cola pesada” y por lo tanto no pertenecen a un universo Gaussiano. Este descubrimiento nos permite proponer nuevos algoritmos que evidencian mejoras importantes sobre los algoritmos BoF actualmente utilizados en diversas tareas de MIR tales como clasificación de género, detección de instrumentos musicales y etiquetado automático de música. También nos permite proponer nuevas tareas tales como la medición de la evolución temporal de la música popular occidental. Finalmente, presentamos algunas prometedoras líneas de investigación para tareas de MIR ubicadas, a partir de ahora, en un universo de “cola pesada”.En l’àmbit de la extracció de la informació musical o Music Information Retrieval (MIR), els algorismes anomenats Bag-of-Frames (BoF) han estat aplicats amb èxit en la classificació i avaluació de similitud entre senyals monofòniques. D’altra banda, investigacions recents han assenyalat importants inconvenients a l’hora d’aplicar aquests mateixos algorismes en senyals de música polifònica. Aquests algorismes BoF suposen que les descripcions numèriques extretes dels fragments d’àudio de curta durada (frames) son suficients per capturar la informació rellevant per als algorismes, que els descriptors basats en els fragments son independents del temps i que l’estadística Gaussiana descriu correctament aquests descriptors. Per a millorar els algorismes BoF actuals doncs, es poden i) millorar els descriptors, ii) incorporar informació temporal dins els algorismes que treballen amb música polifònica i iii) estudiar i caracteritzar les propietats estadístiques reals d’aquests descriptors basats en fragments d’àudio. Sorprenentment, de la revisió bibliogràfica es desprèn que la majoria d’investigacions s’han centrat en els dos primers punts de millora mentre que hi ha una mancança quant a la recerca en l’àmbit del tercer punt. És per això que en aquesta tesi, s’analitza i caracteritza la distribució estadística dels descriptors més comuns de timbre, to i volum. El nostre treball mostra que contràriament al què s’assumeix, els descriptors no pertanyen a l’univers Gaussià sinó que es distribueixen segons una distribució de “cua pesada”. Aquest descobriment ens permet proposar nous algorismes que evidencien millores importants sobre els algorismes BoF utilitzats actualment en diferents tasques com la classificació del gènere, la detecció d’instruments musicals i l’etiquetatge automàtic de música. Ens permet també proposar noves tasques com la mesura de l’evolució temporal de la música popular occidental. Finalment, presentem algunes prometedores línies d’investigació per a tasques de MIR ubicades a partir d’ara en un univers de “cua pesada”
    corecore