Panako: a scalable acoustic fingerprinting system handling time-scale and pitch modification
In this paper, a scalable, granular acoustic fingerprinting system robust against time and pitch scale modification is presented. The aim of acoustic fingerprinting is to identify identical, or recognize similar, audio fragments in a large set using condensed representations of audio signals, i.e. fingerprints. A robust fingerprinting system generates similar fingerprints for perceptually similar audio signals. The new system presented here handles a variety of distortions well. It is designed to be robust against pitch shifting, time stretching and tempo changes, while remaining scalable. After a query, the system returns the start time in the reference audio, and the amount of pitch shift and tempo change that has been applied. The design of the system that offers this unique combination of features is the main contribution of this research. The fingerprint itself consists of a combination of key points in a Constant-Q spectrogram. The system is evaluated on commodity hardware using a freely available reference database with fingerprints of over 30,000 songs. The results show that the system responds quickly and reliably to queries, while handling time and pitch scale modifications of up to ten percent.
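To illustrate why key points in a log-frequency (Constant-Q) representation lend themselves to this kind of robustness, here is a minimal sketch. It is not the hashing scheme of the paper: the function `triplet_hash`, the choice of three key points, the 36 bins per octave and the quantization of the time ratio are all assumptions made for illustration. The sketch relies only on two facts: a pitch shift multiplies every frequency by the same factor, so log-frequency differences between key points are unchanged, and a time stretch multiplies every time stamp by the same factor, so ratios of time differences are unchanged.

```python
import math


def triplet_hash(points, bins_per_octave=36, ratio_steps=100):
    """Hypothetical hash of three (time_s, freq_hz) key points.

    Illustration only, not the fingerprint used by the system above.
    The hash keeps quantities that survive uniform pitch and time scaling.
    """
    (t1, f1), (t2, f2), (t3, f3) = sorted(points)
    if t3 <= t1:
        raise ValueError("key points must span a non-zero time interval")

    # Log-frequency distances in bins: unchanged when all frequencies
    # are multiplied by the same pitch-shift factor.
    df12 = round(bins_per_octave * math.log2(f2 / f1))
    df23 = round(bins_per_octave * math.log2(f3 / f2))

    # Relative position of the middle point: unchanged when all time
    # stamps are multiplied by the same time-stretch factor.
    ratio = round(ratio_steps * (t2 - t1) / (t3 - t1))

    return (df12, df23, ratio)


# Reference key points and the same points after a 10 % time stretch
# combined with a pitch shift of one semitone.
reference = [(0.0, 440.0), (0.6, 660.0), (2.0, 880.0)]
semitone = 2 ** (1 / 12)
modified = [(t * 1.1, f * semitone) for t, f in reference]

same_hash = triplet_hash(reference) == triplet_hash(modified)
```

Because the hash discards the absolute time and frequency scale, a matching step would still have to compare the raw key points of query and reference to recover the start time and the applied pitch and tempo factors that the abstract says are returned.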
Music Synchronization, Audio Matching, Pattern Detection, and User Interfaces for a Digital Music Library System
Over the last two decades, growing efforts to digitize our cultural heritage could be observed. Most of these digitization initiatives pursue either one or both of the following goals: to conserve the documents - especially those threatened by decay - and to provide remote access on a grand scale. For music documents these trends are observable as well, and by now several digital music libraries are in existence. An important characteristic of these music libraries is an inherent multimodality resulting from the large variety of available digital music representations, such as scanned score, symbolic score, audio recordings, and videos. In addition, for each piece of music there exists not only one document of each type, but many. Considering and exploiting this multimodality and multiplicity, the DFG-funded digital library initiative PROBADO MUSIC aimed at developing a novel user-friendly interface for content-based retrieval, document access, navigation, and browsing in large music collections. The implementation of such a front end requires the multimodal linking and indexing of the music documents during preprocessing. As the considered music collections can be very large, the automated or at least semi-automated calculation of these structures is advisable. The field of music information retrieval (MIR) is particularly concerned with the development of suitable procedures, and it was the goal of PROBADO MUSIC to include existing and newly developed MIR techniques to realize the envisioned digital music library system. In this context, the present thesis discusses the following three MIR tasks: music synchronization, audio matching, and pattern detection. We are going to identify particular issues in these fields and provide algorithmic solutions as well as prototypical implementations. In music synchronization, for each position in one representation of a piece of music the corresponding position in another representation is calculated.
This thesis focuses on the task of aligning scanned score pages of orchestral music with audio recordings. Here, a previously unconsidered piece of information is the textual specification of transposing instruments provided in the score. Our evaluations show that the neglect of such information can result in a measurable loss of synchronization accuracy. Therefore, we propose an OCR-based approach for detecting and interpreting the transposition information in orchestral scores. For a given audio snippet, audio matching methods automatically calculate all musically similar excerpts within a collection of audio recordings. In this context, subsequence dynamic time warping (SSDTW) is a well-established approach as it allows for local and global tempo variations between the query and the retrieved matches. Moving to real-life digital music libraries with larger audio collections, however, the quadratic runtime of SSDTW results in untenable response times. To improve on the response time, this thesis introduces a novel index-based approach to SSDTW-based audio matching. We combine the idea of inverted file lists introduced by Kurth and Müller (Efficient index-based audio matching, 2008) with the shingling techniques often used in the audio identification scenario. In pattern detection, all repeating patterns within one piece of music are determined. Usually, pattern detection operates on symbolic score documents and is often used in the context of computer-aided motivic analysis. Envisioned as a new feature of the PROBADO MUSIC system, this thesis proposes a string-based approach to pattern detection and a novel interactive front end for result visualization and analysis.
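The subsequence dynamic time warping mentioned in the abstract above can be sketched as follows. This is a textbook-style illustration on one-dimensional sequences, not the index-based method of the thesis; the function name `subsequence_dtw`, the absolute-difference cost and the three step directions are assumptions chosen for brevity (real audio matching would compare feature vectors such as chroma frames). The nested loops over query and reference positions are where the quadratic runtime noted in the abstract comes from.

```python
import math


def subsequence_dtw(query, reference):
    """Find the best match of `query` inside `reference`.

    Returns (cost, start, end), where reference[start:end + 1] is the
    matched segment. Illustrative sketch with an absolute-difference cost.
    """
    n, m = len(query), len(reference)
    if n == 0 or m == 0:
        raise ValueError("query and reference must be non-empty")

    # cost[i][j]: cheapest alignment of query[:i + 1] with a reference
    # segment that ends at position j.
    cost = [[math.inf] * m for _ in range(n)]
    # start[i][j]: reference position where that cheapest alignment begins.
    start = [[0] * m for _ in range(n)]

    # The match may begin anywhere in the reference at no extra cost.
    for j in range(m):
        cost[0][j] = abs(query[0] - reference[j])
        start[0][j] = j

    for i in range(1, n):
        cost[i][0] = cost[i - 1][0] + abs(query[i] - reference[0])
        start[i][0] = start[i - 1][0]
        for j in range(1, m):
            # Three allowed steps: diagonal, query advances, reference advances.
            prev_cost, prev_start = min(
                (cost[i - 1][j - 1], start[i - 1][j - 1]),
                (cost[i - 1][j], start[i - 1][j]),
                (cost[i][j - 1], start[i][j - 1]),
            )
            cost[i][j] = prev_cost + abs(query[i] - reference[j])
            start[i][j] = prev_start

    # The match may also end anywhere: take the cheapest entry in the last row.
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return cost[n - 1][end], start[n - 1][end], end


reference = [0, 0, 1, 2, 3, 0, 0]
exact = subsequence_dtw([1, 2, 3], reference)
# A query played at half tempo still aligns with the same segment.
slow = subsequence_dtw([1, 1, 2, 2, 3, 3], reference)
```

The free start in the first row and the free end in the last row are what distinguish the subsequence variant from classical DTW, which forces both sequences to be aligned from beginning to end.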
Generic Subsequence Matching Framework: Modularity, Flexibility, Efficiency
Subsequence matching appears to be an ideal approach for solving many
problems related to the fields of data mining and similarity retrieval. It has
been shown that almost any data class (audio, image, biometrics, signals) is or
can be represented by some kind of time series or string of symbols, which can
be seen as an input for various subsequence matching approaches. The variety of
data types, specific tasks and their partial or full solutions is so wide that
the choice, implementation and parametrization of a suitable solution for a
given task might be complicated and time-consuming; a potentially fruitful
combination of fragments from different research areas may be neither obvious
nor easy to realize. The leading authors in this field also mention the
implementation bias that makes a proper comparison of competing approaches
difficult. Therefore, we present a new generic Subsequence Matching Framework
(SMF) that tries to overcome the aforementioned problems with a uniform
framework that simplifies and speeds up the design, development and evaluation
of subsequence matching systems. We identify several relatively separate
subtasks that are solved differently across the literature, and SMF makes it
possible to combine them in a straightforward manner, achieving new quality and
efficiency. This framework can be used in many application domains and its
components can be reused effectively. Its strictly modular architecture and
openness also allow efficient solutions from different fields to be
incorporated, for instance efficient metric-based indexes. This is an extended
version of a paper published at DEXA 2012.
The Variable Markov Oracle: Algorithms for Human Gesture Applications
This article introduces the Variable Markov Oracle (VMO) data structure for multivariate time series indexing. VMO can identify repetitive fragments and find sequential similarities between observations. VMO can also be viewed as a combination of online clustering algorithms with variable-order Markov constraints. The authors use VMO for gesture query-by-content and gesture following. A probabilistic interpretation of the VMO query-matching algorithm is proposed to find an analogy to the inference problem in a hidden Markov model (HMM). This probabilistic interpretation extends VMO to be not only a data structure but also a model for time series. Query-by-content experiments were conducted on a gesture database that was recorded using a Kinect 3D camera, showing state-of-the-art performance. The results of the query-by-content experiments are compared to previous work using HMMs and dynamic time warping. Gesture following is described in the context of an interactive dance environment that aims to integrate human movements with computer-generated graphics to create an augmented reality performance.
Audio Classification from Time-Frequency Texture
Time-frequency representations of audio signals often resemble texture
images. This paper derives a simple audio classification algorithm based on
treating sound spectrograms as texture images. The algorithm is inspired by an
earlier visual classification scheme particularly efficient at classifying
textures. While solely based on time-frequency texture features, the algorithm
achieves surprisingly good performance in musical instrument classification
experiments.
CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines
Based on the information provided by European projects and national initiatives related to multimedia search, as well as domain experts who participated in the CHORUS Think-Tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and a socio-economic perspective.
The technical perspective includes an up-to-date view on content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives to measure the performance of multimedia search engines.
From a socio-economic perspective, we take inventory of the impact and legal consequences of these technical advances and point out future directions of research.