
    Discovering distorted repeating patterns in polyphonic music through longest increasing subsequences

    We study the problem of identifying repetitions under transposition and time-warp invariances in polyphonic symbolic music. Using a novel onset-time-pair representation, we reduce the repeating-pattern discovery problem to instances of the classical problem of finding longest increasing subsequences. The resulting algorithm works in O(n² log n) time, where n is the number of notes in a musical work. We also study windowed variants of the problem, where onset-time differences between notes are restricted, and show that they can also be solved in O(n² log n) time using the algorithm.
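    The reduction above leans on the classical longest-increasing-subsequence routine. Below is a minimal sketch of that standard subroutine (patience sorting with binary search); it assumes integer inputs and is not the paper's full O(n² log n) pattern-discovery algorithm.

```python
# Classical O(k log k) longest-increasing-subsequence sketch (patience sorting).
# Only the standard subroutine the abstract builds on, applied to a hypothetical
# list of integers; not the paper's complete repeating-pattern algorithm.
from bisect import bisect_left

def longest_increasing_subsequence_length(values: list[int]) -> int:
    tails = []  # tails[i] = smallest possible tail of an increasing subsequence of length i + 1
    for v in values:
        pos = bisect_left(tails, v)  # strictly increasing: replace the first tail >= v
        if pos == len(tails):
            tails.append(v)
        else:
            tails[pos] = v
    return len(tails)

# Example: the longest increasing subsequence of [3, 1, 4, 1, 5, 9, 2, 6] has length 4.
print(longest_increasing_subsequence_length([3, 1, 4, 1, 5, 9, 2, 6]))
```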

    Transposition and time-scale invariant geometric music retrieval


    Cover song identification using compression-based distance measures

    Measuring similarity in music data is a problem with various potential applications. In recent years, the task known as cover song identification has gained widespread attention. In cover song identification, the purpose is to determine whether a piece of music is a different rendition of a previous version of the composition. The task is quite trivial for a human listener but highly challenging for a computer. This research approaches the problem from an information-theoretic starting point. Assuming that cover versions share musical information with the original performance, we strive to measure the degree of this common information as the amount of computational resources needed to turn one version into another. Using a similarity measure known as the normalized compression distance, we approximate the non-computable Kolmogorov complexity by the length of an object when compressed with a real-world data compression algorithm. If two pieces of music share musical information, we should be able to compress one using a model learned from the other. In order to use compression-based similarity measuring, the meaningful musical information needs to be extracted from the raw audio signal data. The most commonly used representation for this task is the chromagram: a sequence of real-valued vectors describing the temporal tonal content of the piece of music. Measuring the similarity between two chromagrams effectively with a data compression algorithm requires further processing to extract relevant features and to find a more suitable discrete representation for them. Here, the challenge is to process the data without losing the distinguishing characteristics of the music. In this research, we study the difficult nature of cover song identification and search for an effective compression-based system for the task. Harmonic and melodic features, different representations for them, commonly used data compression algorithms, and several other variables of the problem are addressed thoroughly. The research seeks to shed light on how different choices in the scheme contribute to the performance of the system. Additional attention is paid to combining different features, with several combination strategies studied. Extensive empirical evaluation of the identification system has been performed using large sets of real-world music data. Evaluations show that compression-based similarity measuring performs relatively well but fails to achieve the accuracy of the existing solution that measures similarity using common subsequences. The best compression-based results are obtained by a combination of distances based on two harmonic representations obtained from chromagrams using hidden Markov model chord estimation, and an octave-folded version of the extracted salient melody representation. The most distinct reason for the shortcoming in compression performance is the scarce amount of data available for a single piece of music; this was partially overcome by internal data duplication. As a whole, the process is solid and provides a practical foundation for an information-theoretic approach to cover song identification.

    Cover songs are musical performances that are new interpretations, by a different artist, of the version made by the song's original performer. Sometimes cover versions can be very similar to the originals; sometimes the versions share only nominal similarities. For human listeners, identifying a cover song is usually easy if the original performance is familiar. Automatic, algorithm-based cover song identification, however, is a considerably harder problem, and no fully satisfactory solutions have yet been presented. A solution to the problem would have several potential research and commercial applications, such as automatic plagiarism detection. The dissertation approaches automatic cover song identification from an information-theoretic starting point. The research investigates whether the tonal similarity of songs can be measured in such a way that different performances can be established as interpretations of essentially the same composition. Similarity is measured using a metric based on data compression algorithms, for which the compositionally most distinctive features of a piece of music must be extracted and represented. The research is carried out on a large corpus of popular music in audio format. The dissertation works through several stages of the research problem, starting from the parameters of processing the signal data, proceeding to how the representation extracted from the signal can be converted into string form so that the result still describes the essential musical characteristics of the piece, and how the resulting string data can be further processed to improve identification. In addition, the dissertation studies how various musical differences between versions (tempo, key, arrangements) affect identification and how the impact of these differences on the measurement can be minimized. The suitability of the most common data compression algorithms as a measurement method for the problem is also examined. Furthermore, the research shows how several different representations extracted from the same piece can be combined to achieve better identification accuracy. Finally, the dissertation presents a compression-based system for cover song identification, discusses its main strengths and weaknesses, and assesses what makes automatic cover song identification as challenging a problem as it is.
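    The normalized compression distance used above has a standard closed form, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(·) is the length of the compressed argument. A minimal sketch with zlib as a stand-in compressor follows; the thesis evaluates several real compressors and carefully designed string representations, so this illustrates the measure only, not the thesis pipeline.

```python
import zlib

def compressed_length(data: bytes) -> int:
    # C(x): length of x after compression with an off-the-shelf compressor.
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    # Normalized compression distance: approximates the (non-computable)
    # information distance by replacing Kolmogorov complexity with the
    # length produced by a real compressor.
    cx, cy, cxy = compressed_length(x), compressed_length(y), compressed_length(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Hypothetical usage on two chord-symbol strings extracted from chromagrams.
original = b"C G Am F C G Am F C G Am F"
cover = b"D A Bm G D A Bm G D A Bm G"
print(ncd(original, cover))
```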

    Dissimilarity-based learning for complex data

    Mokbel B. Dissimilarity-based learning for complex data. Bielefeld: Universität Bielefeld; 2016.

    Rapid advances of information technology have entailed an ever-increasing amount of digital data, which raises the demand for powerful data mining and machine learning tools. Due to modern methods for gathering, preprocessing, and storing information, the collected data become more and more complex: a simple vectorial representation and comparison in terms of the Euclidean distance is often no longer appropriate to capture relevant aspects of the data. Instead, problem-adapted similarity or dissimilarity measures refer directly to the given encoding scheme, allowing information constituents to be treated in a relational manner. This thesis addresses several challenges of complex data sets and their representation in the context of machine learning. The goal is to investigate possible remedies and propose corresponding improvements to established methods, accompanied by examples from various application domains. The main scientific contributions are the following: (I) Many well-established machine learning techniques are restricted to vectorial input data only. Therefore, we propose the extension of two popular prototype-based clustering and classification algorithms to non-negative symmetric dissimilarity matrices. (II) Some dissimilarity measures incorporate a fine-grained parameterization, which makes it possible to configure the comparison scheme with respect to the given data and the problem at hand. However, finding adequate parameters can be hard or even impossible for human users, due to the intricate effects of parameter changes and the lack of detailed prior knowledge. Therefore, we propose to integrate a metric learning scheme into a dissimilarity-based classifier, which can automatically adapt the parameters of a sequence alignment measure according to the given classification task. (III) Dimensionality reduction techniques are a valuable instrument for making complex data sets accessible: they can provide an approximate low-dimensional embedding of the given data set and, as a special case, a planar map to visualize the data's neighborhood structure. To assess the reliability of such an embedding, we propose the extension of a well-known quality measure to enable a fine-grained, tractable quantitative analysis, which can be integrated into a visualization. This tool can also help to compare different dissimilarity measures (and parameter settings) if ground truth is not available. (IV) All techniques are demonstrated on real-world examples from a variety of application domains, including bioinformatics, motion capturing, music, and education.
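    As a rough illustration of contribution (I), the sketch below classifies test items directly from a precomputed dissimilarity matrix by assigning them to the nearest class medoid. This toy nearest-medoid rule is an illustrative assumption only; it is much simpler than the relational prototype-based methods developed in the thesis.

```python
import numpy as np

def nearest_medoid_classify(D: np.ndarray, labels: np.ndarray, test_rows: np.ndarray) -> np.ndarray:
    """Toy prototype-based classifier working directly on a dissimilarity matrix.

    D         -- (n_train, n_train) symmetric, non-negative dissimilarities between training items
    labels    -- (n_train,) class labels of the training items
    test_rows -- (n_test, n_train) dissimilarities from each test item to every training item
    One medoid per class is the training item with the smallest summed dissimilarity
    to its class mates; each test item takes the label of its closest medoid.
    """
    medoids = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        within = D[np.ix_(idx, idx)].sum(axis=1)
        medoids.append((c, idx[np.argmin(within)]))
    classes = np.array([c for c, _ in medoids])
    medoid_idx = np.array([m for _, m in medoids])
    return classes[np.argmin(test_rows[:, medoid_idx], axis=1)]
```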

    Getting aligned on representational alignment

    Biological and artificial information processing systems form representations that they can use to categorize, reason, plan, navigate, and make decisions. How can we measure the extent to which the representations formed by these diverse systems agree? Do similarities in representations then translate into similar behavior? How can a system's representations be modified to better match those of another system? These questions pertaining to the study of representational alignment are at the heart of some of the most active research areas in cognitive science, neuroscience, and machine learning. For example, cognitive scientists measure the representational alignment of multiple individuals to identify shared cognitive priors, neuroscientists align fMRI responses from multiple individuals into a shared representational space for group-level analyses, and ML researchers distill knowledge from teacher models into student models by increasing their alignment. Unfortunately, there is limited knowledge transfer between research communities interested in representational alignment, so progress in one field often ends up being rediscovered independently in another. Thus, greater cross-field communication would be advantageous. To improve communication between these fields, we propose a unifying framework that can serve as a common language between researchers studying representational alignment. We survey the literature from all three fields and demonstrate how prior work fits into this framework. Finally, we lay out open problems in representational alignment where progress can benefit all three of these fields. We hope that our work can catalyze cross-disciplinary collaboration and accelerate progress for all communities studying and developing information processing systems. We note that this is a working paper and encourage readers to reach out with their suggestions for future revisions.
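    One widely used representational-alignment measure is linear centered kernel alignment (CKA), sketched below with NumPy. It is offered only as an example of the kind of metric such studies compare, not as the unifying framework the paper proposes.

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear centered kernel alignment between two representation matrices.

    X, Y -- (n_stimuli, d1) and (n_stimuli, d2) responses of two systems to the
    same stimuli. Returns a value in [0, 1]; 1 means the representations match
    up to an orthogonal transformation and isotropic scaling.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    return float(cross / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")))

# Hypothetical usage: compare layer activations of two models on the same 100 inputs.
rng = np.random.default_rng(0)
a, b = rng.normal(size=(100, 64)), rng.normal(size=(100, 32))
print(linear_cka(a, b))
```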

    Self-supervised learning for transferable representations

    Machine learning has undeniably achieved remarkable advances thanks to large labelled datasets and supervised learning. However, this progress is constrained by the labour-intensive annotation process. It is not feasible to generate extensive labelled datasets for every problem we aim to address. Consequently, there has been a notable shift in recent times toward approaches that solely leverage raw data. Among these, self-supervised learning has emerged as a particularly powerful approach, offering scalability to massive datasets and showcasing considerable potential for effective knowledge transfer. This thesis investigates self-supervised representation learning with a strong focus on computer vision applications. We provide a comprehensive survey of self-supervised methods across various modalities, introducing a taxonomy that categorises them into four distinct families while also highlighting practical considerations for real-world implementation. Our focus thenceforth is on the computer vision modality, where we perform a comprehensive benchmark evaluation of state-of-the-art self-supervised models against many diverse downstream transfer tasks. Our findings reveal that self-supervised models often outperform supervised learning across a spectrum of tasks, albeit with correlations weakening as tasks transition beyond classification, particularly for datasets with distribution shifts. Digging deeper, we investigate the influence of data augmentation on the transferability of contrastive learners, uncovering a trade-off between spatial and appearance-based invariances that generalise to real-world transformations. This begins to explain the differing empirical performances achieved by self-supervised learners on different downstream tasks, and it showcases the advantages of specialised representations produced with tailored augmentation. Finally, we introduce a novel self-supervised pre-training algorithm for object detection, aligning pre-training with downstream architecture and objectives, leading to reduced localisation errors and improved label efficiency. In conclusion, this thesis contributes a comprehensive understanding of self-supervised representation learning and its role in enabling effective transfer across computer vision tasks.
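    Many of the benchmarked contrastive learners optimise an NT-Xent (InfoNCE) objective over two augmented views of each image. A minimal PyTorch sketch of that loss is given below as an assumed example; the thesis covers several families of self-supervised methods beyond this one.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """Contrastive (NT-Xent / InfoNCE) loss over two augmented views of the same batch.

    z1, z2 -- (batch, dim) embeddings of two augmentations of the same images.
    Positives are the matching rows across views; all other rows act as negatives.
    """
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2B, dim) unit-norm embeddings
    sim = z @ z.t() / temperature                         # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))                      # never contrast an item with itself
    batch = z1.size(0)
    targets = torch.cat([torch.arange(batch, 2 * batch), torch.arange(batch)])
    return F.cross_entropy(sim, targets)
```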

    A Critical Look at the Music Classification Experiment Pipeline: Using Interventions to Detect and Account for Confounding Effects

    This dissertation focuses on the problem of confounding in the design and analysis of music classification experiments. Classification experiments dominate evaluation of music content analysis systems and methods, but achieving high performance on such experiments does not guarantee systems properly address the intended problem. The research presented here proposes and illustrates modifications to the conventional experimental pipeline, which aim at improving the understanding of the evaluated systems and methods, facilitating valid conclusions on their suitability for the target problem. Firstly, multiple analyses are conducted to determine which cues scattering-based systems use to predict the annotations of the GTZAN music genre collection. In-depth system analysis informs empirical approaches that alter the experimental pipeline. In particular, deflation manipulations and targeted interventions on the partitioning strategy, the learning algorithm and the frequency content of the data reveal that systems using scattering-based features exploit faults in GTZAN and previously unknown information at inaudible frequencies. Secondly, the use of interventions on the experimental pipeline is extended and systematised to a procedure for characterising effects of confounding information in the results of classification experiments. Regulated bootstrap, a novel resampling strategy, is proposed to address challenges associated with interventions dealing with partitioning. The procedure is demonstrated on GTZAN, analysing the effect of artist replication and infrasonic information on performance measurements using a wide range of system construction methods. Finally, mathematical models relating measurements from classification experiments and potentially contributing factors are proposed and discussed. Such models enable decomposing measurements into contributions of interest, which may differ depending on the goals of the study, including those from pipeline interventions. The adequacy for classification experiments of some conventional assumptions underlying such models is also examined. The reported research highlights the need for evaluation procedures that go beyond performance maximisation. Accounting for the effects of confounding information using procedures grounded on the principles of experimental design promises to facilitate the development of systems that generalise beyond the restricted experimental settings.
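    The simplest intervention on the partitioning strategy is an artist-filtered split, which keeps all tracks by the same artist on one side of the train/test divide. The sketch below, assuming scikit-learn and hypothetical feature/label/artist arrays, contrasts such a split with a conventional random split; it is not the regulated bootstrap procedure proposed in the dissertation.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit, train_test_split

def artist_filtered_split(features, labels, artists, test_size=0.2, seed=0):
    """Partitioning intervention: keep every artist's tracks on one side of the split.

    Comparing accuracy under this split with accuracy under an ordinary random split
    indicates how much of the measurement is due to artist replication rather than
    to the concept the experiment claims to evaluate.
    """
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(splitter.split(features, labels, groups=artists))
    return train_idx, test_idx

def random_split(labels, test_size=0.2, seed=0):
    # Conventional baseline split that ignores artist identity.
    return train_test_split(np.arange(len(labels)), test_size=test_size, random_state=seed)
```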

    Knowledge Patterns for the Web: extraction, transformation and reuse

    This thesis investigates methods and software architectures for discovering the typical and frequently occurring structures used to organize knowledge on the Web. We identify these structures as Knowledge Patterns (KPs). KP discovery needs to address two main research problems: the heterogeneity of sources, formats, and semantics on the Web (i.e., the knowledge soup problem) and the difficulty of drawing a relevant boundary around data so as to capture the meaningful knowledge with respect to a certain context (i.e., the knowledge boundary problem). Hence, we introduce two methods that provide different solutions to these two problems by tackling KP discovery from two different perspectives: (i) the transformation of KP-like artifacts into KPs formalized as OWL2 ontologies; (ii) the bottom-up extraction of KPs by analyzing how data are organized in Linked Data. The two methods address the knowledge soup and boundary problems in different ways. The first method is based on a purely syntactic transformation step of the original source to RDF, followed by a refactoring step whose aim is to add semantics to the RDF by selecting meaningful RDF triples. The second method draws boundaries around RDF data in Linked Data by analyzing type paths. A type path is a possible route through an RDF graph that takes into account the types associated with the nodes of the path. We then present K~ore, a software architecture conceived as the basis for developing KP discovery systems and designed according to two software architectural styles, i.e., Component-based and REST. Finally, we provide an example of KP reuse based on Aemoo, an exploratory search tool which exploits KPs for entity summarization.
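    To make the notion of a type path concrete, the sketch below counts, for a toy list of RDF triples, the (subject type, property, object type) combinations occurring in the data. The triple format, the rdf:type shorthand, and the owl:Thing fallback are illustrative assumptions; the thesis extracts type paths over full Linked Data datasets rather than this simplified single-step version.

```python
from collections import Counter, defaultdict

RDF_TYPE = "rdf:type"

def type_paths(triples):
    """Count single-step type paths (subject type, property, object type) in a triple set."""
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)

    paths = Counter()
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        # Untyped nodes fall back to owl:Thing for illustration purposes.
        for st in types.get(s, {"owl:Thing"}):
            for ot in types.get(o, {"owl:Thing"}):
                paths[(st, p, ot)] += 1
    return paths

# Hypothetical toy data.
triples = [
    ("ex:Turing", RDF_TYPE, "dbo:Scientist"),
    ("ex:Cambridge", RDF_TYPE, "dbo:University"),
    ("ex:Turing", "dbo:almaMater", "ex:Cambridge"),
]
print(type_paths(triples))
```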