Generic Subsequence Matching Framework: Modularity, Flexibility, Efficiency
Subsequence matching has emerged as an ideal approach for solving many
problems related to the fields of data mining and similarity retrieval. It has
been shown that almost any data class (audio, image, biometrics, signals) is or
can be represented by some kind of time series or string of symbols, which can
be seen as an input for various subsequence matching approaches. The variety of
data types, specific tasks and their partial or full solutions is so wide that
the choice, implementation and parametrization of a suitable solution for a
given task might be complicated and time-consuming; a potentially fruitful
combination of fragments from different research areas may be neither obvious
nor easy to realize. Leading authors in this field also point out an
implementation bias that makes a proper comparison of competing approaches
difficult. Therefore we present a new generic Subsequence Matching Framework
(SMF) that tries to overcome the aforementioned problems by a uniform frame
that simplifies and speeds up the design, development and evaluation of
subsequence matching related systems. We identify several relatively independent
subtasks that are solved differently across the literature, and SMF makes it
straightforward to combine them, achieving new quality and efficiency. This framework
can be used in many application domains and its components can be reused
effectively. Its strictly modular architecture and openness also enable the
integration of efficient solutions from different fields, for instance
efficient metric-based indexes. This is an extended version of a paper
published at DEXA 2012.
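To make the idea of separable subtasks concrete, here is a minimal Python sketch of a brute-force subsequence matcher in which window extraction and the distance function are independent, swappable components. The function names (`sliding_windows`, `euclidean`, `best_matches`) and this particular decomposition are hypothetical illustrations, not SMF's actual API.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical pluggable component: any function mapping two equal-length
# sequences to a non-negative distance.
Distance = Callable[[Sequence[float], Sequence[float]], float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def sliding_windows(series: Sequence[float], width: int) -> List[Tuple[int, Sequence[float]]]:
    """Segmentation component: every contiguous window of the given width."""
    return [(i, series[i:i + width]) for i in range(len(series) - width + 1)]


def best_matches(series: Sequence[float], query: Sequence[float],
                 distance: Distance = euclidean, k: int = 1) -> List[Tuple[int, float]]:
    """Return the k (offset, distance) pairs whose windows are closest to the query.

    This scans every window; in a modular design an index (for example a
    metric index) could replace this linear scan without touching the
    segmentation or distance components.
    """
    scored = [(i, distance(w, query)) for i, w in sliding_windows(series, len(query))]
    scored.sort(key=lambda pair: pair[1])  # stable sort keeps earlier offsets first on ties
    return scored[:k]


if __name__ == "__main__":
    series = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3]
    print(best_matches(series, [1, 2, 3], k=2))  # → [(1, 0.0), (7, 0.0)]
```

Because the distance is passed in as a parameter, swapping Euclidean distance for, say, a warping-based distance changes one argument rather than the search loop.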
Literary review of content-based music recognition paradigms
During the last few decades, a need for novel retrieval strategies for large audio databases emerged as millions of digital audio documents became accessible to everyone through the Internet. It became essential that users could search for songs they had no prior information about, using only the content of the audio as a query. In practice, this means that when a user hears an unknown song coming out of the radio and wants more information about it, he or she can simply record a sample of the song with a mobile device and send it to a music recognition application as a query. The query results are then presented on the screen with all the necessary metadata, such as the song name and artist. Retrieval systems are expected to perform quickly and accurately against large databases that may contain millions of songs, which poses many challenges for researchers.
This thesis is a literature review of several audio retrieval paradigms that allow querying for songs using only their audio content, such as audio fingerprinting. It also addresses the typical problems and challenges of audio retrieval and compares how each of these proposed paradigms performs in these challenging scenarios.
Query by Humming (Android app)
Query by Humming/Singing is the technology of retrieving information about a song (title, artist, etc.) from a short sung or hummed excerpt. This TFG (final degree project) was to develop and integrate the technology required to create such an application. In this thesis, a Query by Singing/Humming (QbSH) system has been developed. A QbSH system tries to retrieve information about a song given a melody recorded by the user. It has been developed as a client/server system, where the client is an Android application (programmed in Java) and the server runs on a Unix system and is written in C++. The system compares a melody recorded by the user with other melodies previously recorded by other users and tagged with song information by the system administrator. A pitch extraction algorithm is applied to extract the melody from the query recordings, followed by a processing step that enhances the signal and prepares it for matching. In the matching step, Dynamic Time Warping (DTW) is applied; it computes a distance between two signals while absorbing tempo variations. As a result, this thesis covers the full experience of audio processing, systems administration, communications and programming.
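The matching step relies on DTW to absorb tempo differences between a hummed query and a stored melody. Below is a minimal Python sketch of classic DTW, assuming the inputs are already-extracted pitch sequences (here, hypothetical MIDI note numbers) and assuming an absolute-difference local cost; the thesis does not specify these details, so this is an illustration rather than its implementation.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance.

    Local cost is |x - y| (an assumption). Each cell takes the cheapest of
    three predecessors, so one sequence may "stretch" over several frames
    of the other, which is what absorbs tempo variations.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # advance a, repeat b[j-1]
                                     cost[i - 1][j - 1],  # advance both
                                     cost[i][j - 1])      # advance b, repeat a[i-1]
    return cost[n][m]


if __name__ == "__main__":
    reference = [60, 62, 64, 65]                # hypothetical stored melody
    sung_slower = [60, 60, 62, 62, 64, 64, 65, 65]  # same notes, half the tempo
    print(dtw_distance(reference, sung_slower))  # → 0.0
```

A frame-by-frame Euclidean comparison is not even defined for these two sequences of different lengths, whereas DTW aligns each reference note with two query frames at zero cost.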
Sequential decision making in artificial musical intelligence
Over the past 60 years, artificial intelligence has grown from a largely academic field of research to a ubiquitous array of tools and approaches used in everyday technology. Despite its many recent successes and growing prevalence, certain meaningful facets of computational intelligence have not been as thoroughly explored. Such additional facets cover a wide array of complex mental tasks which humans carry out easily, yet are difficult for computers to mimic. A prime example of a domain in which human intelligence thrives, but machine understanding is still fairly limited, is music. Over the last decade, many researchers have applied computational tools to carry out tasks such as genre identification, music summarization, music database querying, and melodic segmentation. While these are all useful algorithmic solutions, we are still a long way from constructing complete music agents, able to mimic (at least partially) the complexity with which humans approach music. One key aspect which hasn't been sufficiently studied is that of sequential decision making in musical intelligence. This thesis strives to answer the following question: Can a sequential decision making perspective guide us in the creation of better music agents, and social agents in general? And if so, how? More specifically, this thesis focuses on two aspects of musical intelligence: music recommendation and human-agent (and more generally agent-agent) interaction in the context of music. The key contributions of this thesis are the design of better music playlist recommendation algorithms; the design of algorithms for tracking user preferences over time; new approaches for modeling people's behavior in situations that involve music; and the design of agents capable of meaningful interaction with humans and other agents in a setting where music plays a role (either directly or indirectly).
Though motivated primarily by music-related tasks, and focusing largely on people's musical preferences, this thesis also establishes that insights from music-specific case studies can be applicable in other concrete social domains, such as different types of content recommendation. Showing the generality of insights from musical data in other contexts serves as evidence for the utility of music domains as testbeds for the development of general artificial intelligence techniques. Ultimately, this thesis demonstrates the overall usefulness of taking a sequential decision making approach in settings previously unexplored from this perspective.
Handwritten Text Generation from Visual Archetypes
Generating synthetic images of handwritten text in a writer-specific style is
a challenging task, especially in the case of unseen styles and new words, and
even more so when the latter contain characters that are rarely encountered
during training. While emulating a writer's style has been recently addressed
by generative models, the generalization towards rare characters has been
disregarded. In this work, we devise a Transformer-based model for Few-Shot
styled handwritten text generation and focus on obtaining a robust and
informative representation of both the text and the style. In particular, we
propose a novel representation of the textual content as a sequence of dense
vectors obtained from images of symbols written as standard GNU Unifont glyphs,
which can be considered their visual archetypes. This strategy is more suitable
for generating characters that, despite having been seen rarely during
training, possibly share visual details with the frequently observed ones. As
for the style, we obtain a robust representation of unseen writers' calligraphy
by exploiting specific pre-training on a large synthetic dataset. Quantitative
and qualitative results demonstrate the effectiveness of our proposal in
generating words in unseen styles and with rare characters more faithfully than
existing approaches relying on independent one-hot encodings of the characters.
Comment: Accepted at CVPR202
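The intuition behind glyph-based "visual archetypes" can be shown with a toy comparison: independent one-hot encodings make every pair of distinct characters equally unrelated, whereas vectors derived from glyph images share components when the glyphs share strokes. The sketch below uses hypothetical 3x3 bitmaps as stand-ins for GNU Unifont glyphs and simply flattens them; it is not the paper's embedding procedure, only an illustration of why shape-derived vectors can relate a rare character to frequent ones.

```python
import math

# Hypothetical 3x3 bitmaps standing in for GNU Unifont glyphs
# (real Unifont glyphs are 16 pixels tall). 1 = ink, 0 = background.
GLYPHS = {
    "l": [0, 1, 0,
          0, 1, 0,
          0, 1, 0],
    "i": [0, 1, 0,
          0, 0, 0,
          0, 1, 0],
    "-": [0, 0, 0,
          1, 1, 1,
          0, 0, 0],
}
ALPHABET = sorted(GLYPHS)


def one_hot(ch):
    """Independent one-hot encoding: one dimension per character."""
    return [1.0 if c == ch else 0.0 for c in ALPHABET]


def glyph_vector(ch):
    """Shape-derived encoding: the flattened glyph bitmap."""
    return [float(p) for p in GLYPHS[ch]]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


if __name__ == "__main__":
    print(cosine(one_hot("l"), one_hot("i")))            # → 0.0
    print(cosine(glyph_vector("l"), glyph_vector("i")))  # about 0.816 (shared vertical strokes)
    print(cosine(glyph_vector("i"), glyph_vector("-")))  # → 0.0
```

Under one-hot encoding, "l" and "i" are orthogonal, while their bitmaps overlap, so a model seeing "i" rarely could still borrow what it learned from the visually similar "l".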
Multimodal Music Information Retrieval: From Content Analysis to Multimodal Fusion
Ph.D. (Doctor of Philosophy)
Development of a Music Distribution System with a Humming-Based Search Function
Music retrieval systems are extremely useful for collecting digital music data from
online music distribution sites. In particular, there is a great need to develop effective
techniques for content-based music retrieval systems that can retrieve music from a humming
query. The main issue in this research is how to determine the similarity between the music
features extracted from music data. To calculate this similarity, some conventional
methods use the Euclidean distance or DP (dynamic programming) matching, but these methods
have great difficulty coping with the vagueness of humming queries. In this paper, we propose
a new similar-music retrieval method for humming queries that uses the Earth Mover's Distance
(EMD) as the distance measure. Computing the EMD is based on a solution to the transportation
problem, and the EMD has been applied as a distance measure in similar-image retrieval systems.
In addition, because the worst-case time complexity of the EMD is exponential in the number of
notes, we also propose an improved method that reduces the number of notes in the music feature.
Experimental results show that the proposed method can improve the retrieval
precision of conventional systems.
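The EMD is, in general, the optimal cost of a transportation problem. A useful special case, shown here as a minimal Python sketch, is two histograms over the same equally spaced one-dimensional bins with equal total mass and ground distance |i - j|: the optimal transport cost then equals the sum of absolute differences of the cumulative histograms. The pitch histograms below are hypothetical, and the paper's actual note features and ground distance are not described here, so this only illustrates the measure itself.

```python
from itertools import accumulate


def emd_1d(p, q):
    """EMD between two histograms over the same equally spaced 1-D bins.

    Assumes equal total mass and ground distance |i - j| between bins i and j.
    In this special case the transportation problem has a closed-form optimum:
    the sum of absolute differences between the cumulative histograms.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("histograms must have equal total mass")
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


if __name__ == "__main__":
    # Hypothetical note counts per pitch bin (one bin per semitone).
    reference = [0, 2, 1, 1]
    hummed = [1, 2, 1, 0]  # one note moved from bin 3 down to bin 0
    print(emd_1d(reference, hummed))  # → 3 (one unit of mass moved 3 bins)
```

Unlike a bin-by-bin comparison, the EMD charges less for a note that lands in a nearby bin than for one that lands far away, which is one way to tolerate imprecise humming. This 1-D case runs in linear time; general multi-dimensional features still require a transportation-problem solver.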
CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines
Based on the information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS think tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and a socio-economic perspective.
The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives that measure the performance of multimedia search engines.
From a socio-economic perspective, we take stock of the impact and legal consequences of these technical advances and point out future directions of research.