440 research outputs found

    Sequential Complexity as a Descriptor for Musical Similarity

    We propose string compressibility as a descriptor of temporal structure in audio, for the purpose of determining musical similarity. Our descriptors are based on computing track-wise compression rates of quantised audio features, using multiple temporal resolutions and quantisation granularities. To verify that our descriptors capture musically relevant information, we incorporate our descriptors into similarity rating prediction and song year prediction tasks. We base our evaluation on a dataset of 15500 track excerpts of Western popular music, for which we obtain 7800 web-sourced pairwise similarity ratings. To assess the agreement among similarity ratings, we perform an evaluation under controlled conditions, obtaining a rank correlation of 0.33 between intersected sets of ratings. Combined with bag-of-features descriptors, we obtain performance gains of 31.1% and 10.9% for similarity rating prediction and song year prediction. For both tasks, analysis of selected descriptors reveals that representing features at multiple time scales benefits prediction accuracy. Comment: 13 pages, 9 figures, 8 tables. Accepted versio
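The compression-rate idea in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the equal-width quantiser, the use of `zlib` as the compressor, the subsampling step, and the names `quantise`, `compression_rate` and `multiscale_descriptor` are all assumptions chosen for illustration.

```python
import math
import random
import zlib


def quantise(values, levels):
    """Map each value to one of `levels` equal-width bins (assumed quantiser)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant input
    return bytes(min(int((v - lo) / span * levels), levels - 1) for v in values)


def compression_rate(values, levels=8):
    """Compressed size divided by raw size of the quantised symbol string."""
    symbols = quantise(values, levels)
    return len(zlib.compress(symbols, 9)) / len(symbols)


def multiscale_descriptor(values, levels=(4, 8, 16), hops=(1, 2, 4)):
    """Compression rates over several granularities and temporal resolutions.

    Subsampling with values[::hop] is a stand-in for whatever temporal
    resolutions the paper uses.
    """
    return {(q, h): compression_rate(values[::h], q) for q in levels for h in hops}


# A strictly periodic feature track versus an unstructured one.
periodic = [math.sin(2 * math.pi * i / 16) for i in range(2048)]
rng = random.Random(0)
noisy = [rng.random() for _ in range(2048)]

print(compression_rate(periodic), compression_rate(noisy))
```

A lower rate indicates more repetitive temporal structure, so the periodic track compresses far better than the noisy one.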

    Predicting performance difficulty from piano sheet music images

    Estimating the performance difficulty of a musical score is crucial in music education for adequately designing the learning curriculum of the students. Although the Music Information Retrieval community has recently shown interest in this task, existing approaches mainly use machine-readable scores, leaving the broader case of sheet music images unaddressed. Building on previous work involving sheet music images, we use a mid-level representation, the bootleg score, which describes notehead positions relative to staff lines, coupled with a transformer model. This architecture is adapted to our task by introducing an encoding scheme that reduces the encoded sequence length to one-eighth of the original size. In terms of evaluation, we consider five datasets (more than 7500 scores with up to 9 difficulty levels), two of them compiled specifically for this work. The results obtained when pretraining the scheme on the IMSLP corpus and fine-tuning it on the considered datasets support the proposal's validity, with the best-performing model achieving a balanced accuracy of 40.34% and a mean square error of 1.33. Finally, we provide access to our code, data, and models for transparency and reproducibility.

    A WiFi RSSI Ranking Fingerprint Positioning System and Its Application to Indoor Activities of Daily Living Recognition

    WiFi RSSI (Received Signal Strength Indicators) seem to be the basis of the most widely used method for Indoor Positioning Systems (IPS), driven by the growth of deployed WiFi Access Points (AP), especially within urban areas. However, there are still several challenges to be tackled: its accuracy is often 2-3 m, it is prone to interference and attenuation effects, and the diversity of Radio Frequency (RF) receivers, e.g., smartphones, affects its accuracy. RSSI fingerprinting can be used to mitigate interference and attenuation effects. In this paper, we present a novel, more accurate, RSSI ranking-based method that consists of three parts. First, an AP selection based on a Genetic Algorithm (GA) is applied to reduce the positioning computational cost and increase the positioning accuracy. Second, the Kendall Tau Correlation Coefficient (KTCC) and a Convolutional Neural Network (CNN) are applied to extract the ranking features for estimating locations. Third, an Extended Kalman filter (EKF) is used to smooth the estimated sequential locations before Multi-Dimensional Dynamic Time Warping (MD-DTW) is used to match similar trajectories or paths representing ADLs from different or the same users that vary in time and space. In order to leverage and evaluate our IPS, we also used it to recognise Activities of Daily Living (ADL) in an office-like environment. It was able to achieve an average positioning accuracy of 1.42 m and a 79.5% recognition accuracy for 9 location-driven activities.
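The ranking step can be sketched in isolation. The snippet below covers only a Kendall tau comparison of AP rankings followed by a nearest-fingerprint lookup. The GA selection, CNN, EKF and MD-DTW stages are not shown, and the lookup rule, the function names, and the RSSI values are assumptions made up for illustration.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall tau-a: (concordant pairs - discordant pairs) / total pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    score = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        score += (s > 0) - (s < 0)  # +1 concordant, -1 discordant, 0 tie
    n = len(x)
    return score / (n * (n - 1) / 2)


def locate(sample, fingerprints):
    """Return the fingerprint whose AP ranking best matches the sample."""
    return max(fingerprints, key=lambda name: kendall_tau(sample, fingerprints[name]))


# Hypothetical RSSI readings (dBm) for four access points at two locations.
fingerprints = {
    "desk": [-40, -55, -70, -80],
    "kitchen": [-75, -60, -45, -82],
}
# A different receiver reports weaker absolute values but the same AP ordering.
sample = [-48, -60, -79, -90]

print(locate(sample, fingerprints))
```

Because only the order of the APs matters, a constant offset between receivers leaves the correlation unchanged, which is the motivation for ranking-based fingerprints.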

    Cyber Security

    This open access book constitutes the refereed proceedings of the 17th International Annual Conference on Cyber Security, CNCERT 2021, held in Beijing, China, in July 2021. The 14 papers presented were carefully reviewed and selected from 51 submissions. The papers are organized according to the following topical sections: data security; privacy protection; anomaly detection; traffic analysis; social network security; vulnerability detection; text classification.

    Association mapping in tetraploid potato

    The results of a four year project within the Centre for BioSystems Genomics (www.cbsg.nl), entitled “Association mapping and family genotyping in potato” are described in this thesis. This project was intended to investigate whether a recently emerged methodology, association mapping, could provide the means to improve potato breeding efficiency. In an attempt to answer this research question a set of potato cultivars representative for the commercial potato germplasm was selected. In total 240 cultivars and progenitor clones were chosen. In a later stage this set was expanded with 190 recent breeds contributed by five participating breeding companies which resulted in a total of 430 genotypes. In a pilot experiment, the results of which are reported in Chapter 2, a subset of 220 of the abovementioned 240 cultivars and progenitor clones was used. Phenotypic data was retrieved through contributions of the participating breeding companies and represented summary statistics of recent observations for a number of traits across years and locations, calculated following company specific procedures. With AFLP marker data, in the form of normalised log-transformed band intensities, obtained from five well-known primer combinations, the extent of linkage disequilibrium (LD), using the r2 statistic, was estimated. Population structure within the set of 220 cultivars was analysed by deploying a clustering approach. No apparent, nor statistically supported population structure was revealed and the LD seemed to decay below the threshold of 0.1 at a genetic distance of about 3cM with this set of marker data. Furthermore, marker-trait associations were investigated by fitting single marker regression models for phenotypic traits on marker band intensities with and without correction for population structure. 
Population structure correction was performed in a straightforward way by incorporating a design matrix into the model assuming that each breeding company represented a different breeding germplasm pool. The potential of association mapping in tetraploid potato has been demonstrated in this pilot experiment, because existing phenotypic data, a modest number of AFLP markers, and a relatively straightforward statistical analysis allowed identification of interesting associations for a number of agro-morphological and quality traits. These promising results encouraged us to engage into an encompassing genome-wide association mapping study in potato. Two association mapping panels were compiled. One panel comprising 205 genotypes, all of which were also present in the set used for the pilot experiment, and another panel containing in total 299 genotypes including the entire set of 190 recent breeds together with a series of standard cultivars, about 100 of which are in common with the first panel. Phenotypic data for the association panel with 205 genotypes were obtained in a field trial performed in 2006 in Wageningen at two locations with two replicates. We will refer to this set as the “2006 field trial”. Phenotypic data for the other panel with 299 genotypes was contributed by the five participating breeding companies and consisted of multi-year-multi-location data obtained during generations of clonal selection. The 2006 data were nicely balanced, because the trial was designed in that way. The historical breeding dataset was highly unbalanced. Analysis of these two differing phenotypic datasets was performed to deliver insight in variance components for the genotypic main effects and the genotype by environment interaction (GEI), besides estimated genotype main effects across environments. Both phenotypic datasets were analysed separately within a mixed model framework including terms for GEI. 
In Chapter 3 we describe both phenotypic datasets by comparing variance components, heritabilities (=repeatabilities), intra-dataset relationships and inter-dataset relationships. Broader aspects related to phenotypic datasets and their analysis are discussed as well. To retrieve information about hidden population structure and genetic relatedness, and to estimate the extent of LD in potato germplasm, we used marker information generated with 41 AFLP primer combinations and 53 microsatellite loci on a collection of 430 genotypes. These 430 genotypes contain all genotypes present in the two association mapping panels introduced before plus a few extra genotypes to increase potato germplasm coverage. Two methods were used: a Bayesian approach and a distance-based clustering approach. Chapter 4 describes the results of this exercise. Both strategies revealed a weak level of structure in our material. Groups were detected which complied with criteria such as their intended market segment, as well as groups differing in their year of first registration on a national list. Linkage disequilibrium, using the r2 statistic, appeared to decay below the threshold of 0.1 across linkage groups at a genetic distance of about 5cM on average. The results described in Chapter 4 are promising for association mapping research in potato. The odds are reasonable that useful marker-trait associations can be detected and that the potential mapping resolution will suffice for detection of QTL in an association mapping context. In Chapter 5 a comprehensive genome-wide association mapping study is presented. The adjusted genotypic means obtained from two association mapping panels as a result of phenotypic analysis performed in Chapter 3 were combined with marker information in two association mapping models. Marker information consisted of normalised log-transformed band intensities of 41 AFLP primer combinations and allele dosage information from 53 microsatellites. 
A baseline model without correction for population structure and a more advanced model with correction for population structure and genetic relatedness were applied. Population structure and genetic relatedness were estimated using available marker information. Interesting QTL could be identified for 19 agro-morphological and quality traits. The observed QTL partly confirm previous studies, e.g. for tuber shape and frying colour, but new QTL have also been detected, e.g. for after baking darkening and enzymatic browning. In the final chapter, the general discussion, results of preceding chapters are evaluated and their implications for research as well as breeding are discussed.
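As a rough illustration of the r2 statistic used for linkage disequilibrium in this abstract, the sketch below computes the squared Pearson correlation between two markers' band-intensity vectors. Treating r2 this way for intensity data is an assumption about the thesis's procedure, and the function name and the example values are hypothetical.

```python
def r_squared(a, b):
    """Squared Pearson correlation between two equally long marker vectors."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long vectors of length >= 2")
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a marker with no variation carries no LD information")
    return cov * cov / (var_a * var_b)


# Hypothetical band intensities for two markers scored on four genotypes.
marker_1 = [1.0, 2.0, 3.0, 4.0]
marker_2 = [2.0, 4.0, 6.0, 8.0]    # perfectly associated with marker_1
marker_3 = [1.0, -1.0, -1.0, 1.0]  # uncorrelated with marker_1

print(r_squared(marker_1, marker_2), r_squared(marker_1, marker_3))
```

Plotting such pairwise values against the genetic distance between markers gives the kind of decay curve from which a threshold such as 0.1 is read off.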

    Multimedia Protection using Content and Embedded Fingerprints

    Improved digital connectivity has made the Internet an important medium for multimedia distribution and consumption in recent years. At the same time, this increased proliferation of multimedia has raised significant challenges in secure multimedia distribution and intellectual property protection. This dissertation examines two complementary aspects of the multimedia protection problem that utilize content fingerprints and embedded collusion-resistant fingerprints. The first aspect considered is the automated identification of multimedia using content fingerprints, which is emerging as an important tool for detecting copyright violations on user generated content websites. A content fingerprint is a compact identifier that captures robust and distinctive properties of multimedia content, which can be used for uniquely identifying the multimedia object. In this dissertation, we describe a modular framework for theoretical modeling and analysis of content fingerprinting techniques. Based on this framework, we analyze the impact of distortions in the features on the corresponding fingerprints and also consider the problem of designing a suitable quantizer for encoding the features in order to improve the identification accuracy. The interaction between the fingerprint designer and a malicious adversary seeking to evade detection is studied under a game-theoretic framework and optimal strategies for both parties are derived. We then focus on analyzing and understanding the matching process at the fingerprint level. Models for fingerprints with different types of correlations are developed and the identification accuracy under each model is examined. Through this analysis we obtain useful guidelines for designing practical systems and also uncover connections to other areas of research. A complementary problem considered in this dissertation concerns tracing the users responsible for unauthorized redistribution of multimedia. 
Collusion-resistant fingerprints, which are signals that uniquely identify the recipient, are proactively embedded in the multimedia before redistribution and can be used for identifying the malicious users. We study the problem of designing collusion resistant fingerprints for embedding in compressed multimedia. Our study indicates that directly adapting traditional fingerprinting techniques to this new setting of compressed multimedia results in low collusion resistance. To withstand attacks, we propose an anti-collusion dithering technique for embedding fingerprints that significantly improves the collusion resistance compared to traditional fingerprints

    Robust short clip representation and fast search through large video collections

    Master's (Master of Engineering)

    Detection and management of redundancy for information retrieval

    The growth of the web, authoring software, and electronic publishing has led to the emergence of a new type of document collection that is decentralised, amorphous, dynamic, and anarchic. In such collections, redundancy is a significant issue. Documents can spread and propagate across such collections without any control or moderation. Redundancy can interfere with the information retrieval process, leading to decreased user amenity in accessing information from these collections, and thus must be effectively managed. The precise definition of redundancy varies with the application. We restrict ourselves to documents that are co-derivative: those that share a common heritage, and hence contain passages of common text. We explore document fingerprinting, a well-known technique for the detection of co-derivative document pairs. Our new lossless fingerprinting algorithm improves the effectiveness of a range of document fingerprinting approaches. We empirically show that our algorithm can be highly effective at discovering co-derivative document pairs in large collections. We study the occurrence and management of redundancy in a range of application domains. On the web, we find that document fingerprinting is able to identify widespread redundancy, and that this redundancy has a significant detrimental effect on the quality of search results. Based on user studies, we suggest that redundancy is most appropriately managed as a postprocessing step on the ranked list and explain how and why this should be done. In the genomic area of sequence homology search, we explain why the existing techniques for redundancy discovery are increasingly inefficient, and present a critique of the current approaches to redundancy management. We show how document fingerprinting with a modified version of our algorithm provides significant efficiency improvements, and propose a new approach to redundancy management based on wildcards. 
We demonstrate that our scheme provides the benefits of existing techniques but does not have their deficiencies. Redundancy in distributed information retrieval systems - where different parts of the collection are searched by autonomous servers - cannot be effectively managed using traditional fingerprinting techniques. We thus propose a new data structure, the grainy hash vector, for redundancy detection and management in this environment. We show in preliminary tests that the grainy hash vector is able to accurately detect a good proportion of redundant document pairs while maintaining low resource usage
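Document fingerprinting for co-derivative detection can be sketched generically as hashing overlapping word n-grams and comparing the resulting sets. This is not the thesis's lossless algorithm or its grainy hash vector: the 4-word shingles, the truncated SHA-1 hashes, the Jaccard measure, and the function names are assumptions for illustration.

```python
import hashlib


def fingerprint(text, n=4):
    """Set of hashes of all overlapping n-word shingles in the text."""
    words = text.lower().split()
    grams = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return {hashlib.sha1(g.encode("utf-8")).hexdigest()[:16] for g in grams}


def resemblance(a, b, n=4):
    """Jaccard overlap of two fingerprints; 0.0 if either text is too short."""
    fa, fb = fingerprint(a, n), fingerprint(b, n)
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)


original = "the quick brown fox jumps over the lazy dog near the river bank"
derived = "yesterday " + original  # shares every passage of the original
unrelated = "completely different words appear in this other short text here"

print(resemblance(original, derived), resemblance(original, unrelated))
```

Keeping every shingle hash, as here, costs more space than schemes that retain only a subset, which is the kind of trade-off fingerprint selection strategies address.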

    Supporting lay users in privacy decisions when sharing sensitive data

    Privacy is becoming increasingly important to users, especially after the recent privacy scandals in social networks. Although most users claim to value privacy, they behave quite differently online: they leave most privacy settings of the online services they use, such as social networks or location-sharing services, untouched and do not adapt them to their privacy requirements. In this thesis, I present an approach to solving this problem that rests on two pillars. The first part of the thesis focuses on assisting users in choosing their privacy settings, by using machine learning to derive the optimal set of privacy settings for the user. In contrast to other work, our approach uses context factors as well as individual factors to provide a personalized set of privacy settings. The second part consists of a set of intelligent user interfaces that assist users throughout the complete privacy journey: from defining friend groups that allow targeted information sharing, for example in social networks; through user interfaces for selecting information recipients, finding possible errors or unusual settings, and refining them; up to mechanisms for gathering in-situ feedback on privacy incidents as they occur, and investigating how to use this feedback to improve a user's privacy in the future. Our studies have shown that including individual factors when tailoring the privacy settings significantly increases the correctness of the predicted privacy settings, whereas the user interfaces have been shown to significantly decrease the number of errors, in particular unwanted disclosures.