
    Identifying Cover Songs Using Information-Theoretic Measures of Similarity

    Get PDF
    This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/. This paper investigates methods for quantifying similarity between audio signals, specifically for the task of cover song detection. We consider an information-theoretic approach, where we compute pairwise measures of predictability between time series. We compare discrete-valued approaches operating on quantized audio features with continuous-valued approaches. In the discrete case, we propose a method for computing the normalized compression distance which accounts for correlation between time series. In the continuous case, we propose computing information-based measures of similarity as statistics of the prediction error between time series. We evaluate our methods on two cover song identification tasks, using a dataset comprised of 300 Jazz standards and using the Million Song Dataset. For both datasets, we observe that continuous-valued approaches outperform discrete-valued approaches. We consider approaches to estimating the normalized compression distance (NCD) based on string compression and prediction, and observe that our proposed normalized compression distance with alignment (NCDA) improves average performance over NCD for sequential compression algorithms. Finally, we demonstrate that continuous-valued distances may be combined to improve performance with respect to baseline approaches. Using a large-scale filter-and-refine approach, we demonstrate state-of-the-art performance for cover song identification using the Million Song Dataset. The work of P. Foster was supported by an Engineering and Physical Sciences Research Council Doctoral Training Account studentship.
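
    As an informal illustration of the standard NCD formula referred to above, the sketch below computes it over toy quantized symbol sequences using zlib as the compressor. It does not implement the paper's proposed NCDA variant or its audio feature extraction; the symbol sequences are placeholders.

```python
# Minimal sketch of the normalized compression distance (NCD) between two
# quantized symbol sequences, using zlib as a generic string compressor.
import zlib

def compressed_size(data: bytes) -> int:
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

if __name__ == "__main__":
    # Toy sequences of quantized chroma-like symbols (values 0-11).
    a = bytes([0, 4, 7, 0, 4, 7, 2, 5, 9] * 20)
    b = bytes([0, 4, 7, 0, 4, 7, 2, 5, 9] * 18 + [1, 3, 8] * 6)
    c = bytes([11, 2, 6, 9, 1, 5] * 30)
    print("NCD(a, b) =", round(ncd(a, b), 3))  # related sequences: lower
    print("NCD(a, c) =", round(ncd(a, c), 3))  # unrelated sequences: higher
```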

    Implementation of similarity measures for event sequences in myCBR

    Get PDF
    The computation of similarities between event sequences is important for many fields, because many activities follow a sequential order: for instance, an industrial plant that triggers different types of alarms as a result of detected event sequences, or the treatment sequence that a patient receives while hospitalized. With appropriate tools and techniques for computing the similarity between two event sequences, we may be able to detect patterns or regularities in event data and thus make predictions or recommendations based on detected similar sequences. The present work describes the implementation of two event-sequence similarity measures in myCBR, with the purpose of creating a similarity measurement approach for complex domains that make use of event sequences. In addition, an initial experiment is performed to study whether the proposed measures and measurement approach are able to predict future situations based on similar event sequences.
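
    For readers unfamiliar with event-sequence similarity, the sketch below shows one common example of such a measure, a normalized edit distance over event labels turned into a similarity score. It is illustrative only and is not taken from the myCBR implementation described above.

```python
# A minimal sketch of a normalized Levenshtein (edit) distance over event
# labels, mapped to a similarity score in [0, 1].
def edit_distance(a, b):
    """Classic dynamic-programming edit distance over event labels."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[m][n]

def sequence_similarity(a, b):
    """Map edit distance to a similarity in [0, 1]."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))

if __name__ == "__main__":
    # Hypothetical alarm sequences from an industrial plant.
    alarms_1 = ["temp_high", "pressure_drop", "valve_open", "shutdown"]
    alarms_2 = ["temp_high", "valve_open", "shutdown"]
    print(sequence_similarity(alarms_1, alarms_2))  # 0.75
```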

    A temporal precedence based clustering method for gene expression microarray data

    Get PDF
    Background: Time-course microarray experiments can produce useful data which can help in understanding the underlying dynamics of the system. Clustering is an important stage in microarray data analysis where the data are grouped according to certain characteristics. The majority of clustering techniques are based on distance or visual similarity measures, which may not be suitable for clustering temporal microarray data where the sequential nature of time is important. We present a Granger causality based technique to cluster temporal microarray gene expression data, which measures the interdependence between two time series by statistically testing whether one time series can be used to forecast the other. Results: A gene-association matrix is constructed by testing temporal relationships between pairs of genes using the Granger causality test. The association matrix is further analyzed using a graph-theoretic technique to detect highly connected components representing interesting biological modules. We test our approach on synthesized datasets and real biological datasets obtained for Arabidopsis thaliana. We show the effectiveness of our approach by analyzing the results using the existing biological literature. We also report interesting structural properties of the association network commonly desired in any biological system. Conclusions: Our experiments on synthesized and real microarray datasets show that our approach produces encouraging results. The method is simple in implementation and is statistically traceable at each step. The method can produce sets of functionally related genes which can be further used for reverse-engineering of gene circuits.
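
    A rough sketch of the pairwise Granger-causality step described above, assuming statsmodels is available. The synthetic expression profiles and the 0.05 significance threshold are placeholders, not the paper's data or settings.

```python
# Sketch of the pairwise Granger-causality test: check whether gene_x helps
# forecast gene_y, and record an edge in the gene-association matrix when the
# test is significant.
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
n_timepoints = 40
gene_x = rng.normal(size=n_timepoints).cumsum()
gene_y = np.roll(gene_x, 2) + rng.normal(scale=0.1, size=n_timepoints)  # follows gene_x

# grangercausalitytests expects a 2-column array; it tests whether the second
# column Granger-causes the first.
data = np.column_stack([gene_y, gene_x])
results = grangercausalitytests(data, maxlag=3, verbose=False)

# Take the smallest F-test p-value across the tested lags.
p_value = min(res[0]["ssr_ftest"][1] for res in results.values())
edge = p_value < 0.05  # illustrative threshold
print(f"gene_x -> gene_y: p = {p_value:.4f}, edge in association matrix: {edge}")
```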

    Radiometric normalization of temporal images combining automatic detection of pseudo-invariant features from the distance and similarity spectral measures, density scatterplot analysis, and robust regression

    Get PDF
    Radiometric precision is difficult to maintain in orbital images due to several factors (atmospheric conditions, Earth-sun distance, detector calibration, illumination, and viewing angles). These unwanted effects must be removed for radiometric consistency among temporal images, leaving only land-leaving radiances, for optimum change detection. A variety of relative radiometric correction techniques have been developed for the correction or rectification of images of the same area through the use of reference targets whose reflectance does not change significantly with time, i.e., pseudo-invariant features (PIFs). This paper proposes a new technique for radiometric normalization which uses three sequential methods for an accurate PIF selection: spectral measures of temporal data (spectral distance and similarity), density scatterplot analysis (ridge method), and robust regression. The spectral measures used are the spectral angle (Spectral Angle Mapper, SAM), spectral correlation (Spectral Correlation Mapper, SCM), and Euclidean distance. The spectral measures between the spectra at times t1 and t2 are calculated for each pixel. After classification using threshold values, it is possible to define points with the same spectral behavior, including PIFs. The distance and similarity measures are complementary and can be calculated together. The ridge method uses a density plot, generated from images acquired on different dates, for the selection of PIFs. In a density plot, the invariant pixels together form a high-density ridge, while variant pixels (clouds and land cover changes) are spread out with low density, facilitating their exclusion. Finally, the selected PIFs are subjected to a robust regression (M-estimate) between pairs of temporal bands for the detection and elimination of outliers, and to obtain the optimal linear equation for a given set of target points. The robust regression is insensitive to outliers, i.e., observations that appear to deviate strongly from the rest of the data, which in our case are change areas. The sequential methods enable the selection, by different attributes, of a number of invariant targets over the brightness range of the images.
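
    The sketch below illustrates the per-pixel spectral measures named above (spectral angle and Euclidean distance) for screening a pixel as a PIF candidate. The band values and thresholds are illustrative assumptions, and the ridge and robust-regression stages are not shown.

```python
# Minimal sketch of per-pixel spectral measures between the spectrum of a pixel
# at time t1 and at time t2, used to screen pseudo-invariant feature candidates.
import numpy as np

def spectral_angle(s1, s2):
    """SAM: angle (radians) between two spectra treated as vectors."""
    cos_theta = np.dot(s1, s2) / (np.linalg.norm(s1) * np.linalg.norm(s2))
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))

def euclidean_distance(s1, s2):
    return np.linalg.norm(s1 - s2)

# Toy 6-band reflectance spectra for one pixel at two dates.
pixel_t1 = np.array([0.12, 0.15, 0.18, 0.35, 0.30, 0.22])
pixel_t2 = np.array([0.13, 0.16, 0.19, 0.36, 0.31, 0.23])  # near-invariant pixel

angle = spectral_angle(pixel_t1, pixel_t2)
dist = euclidean_distance(pixel_t1, pixel_t2)

# A pixel passing both (illustrative) thresholds is kept as a PIF candidate.
is_candidate = angle < 0.05 and dist < 0.1
print(f"SAM = {angle:.4f} rad, distance = {dist:.4f}, PIF candidate: {is_candidate}")
```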

    XML Schema Clustering with Semantic and Hierarchical Similarity Measures

    Get PDF
    With the growing popularity of XML as a data representation language, collections of XML data have grown explosively in number. Methods are required to manage them and to discover useful information within them for improved document handling. We present a schema clustering process that organises heterogeneous XML schemas into various groups. The methodology considers not only the linguistic content and the context of the elements but also their hierarchical structural similarity. We support our findings with experiments and analysis.
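
    As a loose illustration of combining a linguistic (element-name) similarity with a hierarchical (path) similarity, the sketch below uses simple token and ancestor-path overlap with an equal weighting. These choices are assumptions made for illustration, not the paper's actual measures.

```python
# Sketch of an element-level similarity combining name overlap and path overlap.
def name_similarity(name_a: str, name_b: str) -> float:
    """Jaccard overlap of lower-cased name tokens split on underscores."""
    tokens_a = set(name_a.lower().split("_"))
    tokens_b = set(name_b.lower().split("_"))
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

def path_similarity(path_a: list, path_b: list) -> float:
    """Fraction of ancestor steps shared from the root downwards."""
    shared = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        shared += 1
    return shared / max(len(path_a), len(path_b))

def element_similarity(elem_a, elem_b, w_name=0.5, w_path=0.5):
    return (w_name * name_similarity(elem_a["name"], elem_b["name"])
            + w_path * path_similarity(elem_a["path"], elem_b["path"]))

if __name__ == "__main__":
    # Hypothetical elements from two purchase-order schemas.
    e1 = {"name": "ship_address", "path": ["purchase_order", "customer"]}
    e2 = {"name": "shipping_address", "path": ["purchase_order", "buyer"]}
    print(round(element_similarity(e1, e2), 3))
```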

    Sequential Complexity as a Descriptor for Musical Similarity

    Get PDF
    We propose string compressibility as a descriptor of temporal structure in audio, for the purpose of determining musical similarity. Our descriptors are based on computing track-wise compression rates of quantised audio features, using multiple temporal resolutions and quantisation granularities. To verify that our descriptors capture musically relevant information, we incorporate them into similarity rating prediction and song year prediction tasks. We base our evaluation on a dataset of 15,500 track excerpts of Western popular music, for which we obtain 7,800 web-sourced pairwise similarity ratings. To assess the agreement among similarity ratings, we perform an evaluation under controlled conditions, obtaining a rank correlation of 0.33 between intersected sets of ratings. Combined with bag-of-features descriptors, we obtain performance gains of 31.1% and 10.9% for similarity rating prediction and song year prediction, respectively. For both tasks, analysis of selected descriptors reveals that representing features at multiple time scales benefits prediction accuracy. Comment: 13 pages, 9 figures, 8 tables. Accepted version.
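
    A minimal sketch of the general idea of a compression-rate descriptor: compress a quantised feature sequence at several temporal resolutions and record the compression rate for each. The quantisation and downsampling choices below are placeholders rather than the paper's pipeline, and zlib stands in for a generic string compressor.

```python
# Compression-rate descriptor over a quantised 1-D feature at multiple resolutions.
import zlib
import numpy as np

def compression_rate(symbols: np.ndarray) -> float:
    """Compressed size divided by raw size; lower means more redundancy/structure."""
    raw = symbols.astype(np.uint8).tobytes()
    return len(zlib.compress(raw, 9)) / len(raw)

def multi_resolution_descriptor(feature: np.ndarray, factors=(1, 2, 4, 8), n_levels=8):
    """Quantise a feature into n_levels bins, then compute compression rates
    after downsampling by each temporal factor."""
    edges = np.quantile(feature, np.linspace(0, 1, n_levels + 1)[1:-1])
    symbols = np.digitize(feature, edges)
    return [compression_rate(symbols[::f]) for f in factors]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # A toy "audio feature" with repeating structure plus noise.
    feature = np.tile(np.sin(np.linspace(0, 2 * np.pi, 50)), 40)
    feature += 0.1 * rng.normal(size=feature.size)
    print([round(r, 3) for r in multi_resolution_descriptor(feature)])
```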

    Dynamic change-point detection using similarity networks

    Full text link
    From a sequence of similarity networks, with edges representing certain similarity measures between nodes, we are interested in detecting a change-point which changes the statistical property of the networks. After the change, there is a subset of anomalous nodes which compare dissimilarly with the normal nodes. We study a simple sequential change detection procedure based on node-wise average similarity measures and study its theoretical properties. Simulation and real-data examples demonstrate that such a simple stopping procedure has reasonably good performance. We further discuss faulty sensor isolation (estimating the anomalous nodes) using community detection. Comment: appeared in Asilomar Conference 201
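
    The sketch below shows one possible sequential stopping rule driven by node-wise average similarity, here a per-node CUSUM with assumed baseline, drift, and threshold values. It illustrates the flavour of such a procedure rather than the specific statistic analysed in the paper.

```python
# Sequential change detection on a stream of similarity matrices: each node's
# statistic is its mean similarity to all other nodes; an alarm is raised when
# any node's CUSUM of (baseline - statistic) exceeds a threshold.
import numpy as np

def detect_change(similarity_matrices, baseline, drift=0.02, threshold=0.5):
    """similarity_matrices: iterable of (n x n) symmetric similarity matrices over time."""
    cusum = None
    for t, s in enumerate(similarity_matrices):
        n = s.shape[0]
        node_avg = (s.sum(axis=1) - np.diag(s)) / (n - 1)  # mean similarity to others
        if cusum is None:
            cusum = np.zeros(n)
        cusum = np.maximum(0.0, cusum + (baseline - node_avg) - drift)
        if np.any(cusum > threshold):
            anomalous = np.flatnonzero(cusum > threshold)  # candidate faulty sensors
            return t, anomalous
    return None, np.array([], dtype=int)

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    n, change_point = 10, 30
    mats = []
    for t in range(60):
        s = 0.8 + 0.05 * rng.normal(size=(n, n))
        if t >= change_point:          # nodes 0-2 become dissimilar after the change
            s[:3, :] -= 0.3
            s[:, :3] -= 0.3
        mats.append(np.clip((s + s.T) / 2, 0.0, 1.0))
    print(detect_change(mats, baseline=0.8))  # detection time and anomalous nodes
```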

    Feature-based time-series analysis

    Full text link
    This work presents an introduction to feature-based time-series analysis. The time series as a data type is first described, along with an overview of the interdisciplinary time-series analysis literature. I then summarize the range of feature-based representations for time series that have been developed to aid interpretable insights into time-series structure. Particular emphasis is given to emerging research that facilitates wide comparison of feature-based representations, allowing us to understand the properties of a time-series dataset that make it suited to a particular feature-based representation or analysis algorithm. The future of time-series analysis is likely to embrace approaches that exploit machine learning methods to partially automate human learning, aiding understanding of the complex dynamical patterns in the time series we measure from the world. Comment: 28 pages, 9 figures.
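
    To make the notion of a feature-based representation concrete, the sketch below maps a time series to a small vector of interpretable summary statistics. The particular features chosen are common examples rather than any specific toolbox's feature set.

```python
# Map a time series to a small, interpretable feature vector.
import numpy as np

def lag1_autocorrelation(x):
    """Lag-1 autocorrelation of a mean-centred series."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

def longest_run_above_mean(x):
    """Length of the longest consecutive run of values above the series mean."""
    best = current = 0
    for above in (x > x.mean()):
        current = current + 1 if above else 0
        best = max(best, current)
    return best

def feature_vector(x):
    x = np.asarray(x, dtype=float)
    diffs = np.diff(x)
    return {
        "mean": float(x.mean()),
        "std": float(x.std()),
        "lag1_autocorr": lag1_autocorrelation(x),
        "mean_abs_change": float(np.abs(diffs).mean()),
        "longest_run_above_mean": longest_run_above_mean(x),
    }

if __name__ == "__main__":
    t = np.linspace(0, 10, 500)
    noisy_sine = np.sin(2 * np.pi * t) + 0.2 * np.random.default_rng(3).normal(size=t.size)
    print(feature_vector(noisy_sine))
```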