155,012 research outputs found
Identifying Cover Songs Using Information-Theoretic Measures of Similarity
This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/

This paper investigates methods for quantifying similarity between audio signals, specifically for the task of cover song detection. We consider an information-theoretic approach, where we compute pairwise measures of predictability between time series. We compare discrete-valued approaches operating on quantized audio features, to continuous-valued approaches. In the discrete case, we propose a method for computing the normalized compression distance, where we account for correlation between time series. In the continuous case, we propose to compute information-based measures of similarity as statistics of the prediction error between time series. We evaluate our methods on two cover song identification tasks using a data set comprised of 300 Jazz standards and using the Million Song Dataset. For both datasets, we observe that continuous-valued approaches outperform discrete-valued approaches. We consider approaches to estimating the normalized compression distance (NCD) based on string compression and prediction, where we observe that our proposed normalized compression distance with alignment (NCDA) improves average performance over NCD, for sequential compression algorithms. Finally, we demonstrate that continuous-valued distances may be combined to improve performance with respect to baseline approaches. Using a large-scale filter-and-refine approach, we demonstrate state-of-the-art performance for cover song identification using the Million Song Dataset.

The work of P. Foster was supported by an Engineering and Physical Sciences Research Council Doctoral Training Account studentship.
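The normalized compression distance referenced in the abstract has a standard definition; the following is a minimal illustrative sketch (using zlib as a stand-in compressor, not the paper's NCDA variant or its implementation):

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is compressed length. Near 0: similar; near 1: dissimilar."""
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

# Toy quantized feature sequences rendered as byte strings.
a = b"abcabcabcabcabcabc" * 10
b_ = b"abcabcabcabcabcabc" * 10
c = b"qzwxecrvtbynumikol" * 10

assert ncd(a, b_) < ncd(a, c)  # shared structure compresses well jointly
```

The intuition: if two sequences share structure, concatenating them adds little to the compressed size of either alone.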
Implementation of similarity measures for event sequences in myCBR
The computation of similarities between event sequences is important in many fields, because many activities follow a sequential order: for instance, an industrial plant that triggers different types of alarms due to detected event sequences, or the treatment sequence that a patient receives while hospitalized. With the appropriate tools and techniques to compute the similarity between two event sequences, we may be able to detect patterns or regularities in event data and thus perform predictions or recommendations based on detected similar sequences. The present work describes the implementation of two event sequence similarity measures in myCBR, with the purpose of creating a similarity measurement approach for complex domains that employ event sequences. In addition, an initial experiment is performed to study whether the proposed measures and measurement approach can predict future situations based on similar event sequences.
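The abstract does not name the two measures implemented; a common event-sequence measure that CBR-style frameworks map to a [0, 1] similarity is the Levenshtein edit distance. A minimal sketch (function names and the alarm labels are illustrative, not myCBR's API):

```python
def edit_distance(s, t):
    """Levenshtein distance between two event sequences (lists of labels)."""
    prev = list(range(len(t) + 1))
    for i in range(1, len(s) + 1):
        curr = [i] + [0] * len(t)
        for j in range(1, len(t) + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution / match
        prev = curr
    return prev[len(t)]

def sequence_similarity(s, t):
    """Map the distance to [0, 1], as CBR similarity functions require."""
    return 1.0 - edit_distance(s, t) / max(len(s), len(t), 1)

# Illustrative alarm sequences from an industrial plant.
alarms_a = ["pressure_high", "valve_open", "pressure_ok"]
alarms_b = ["pressure_high", "valve_open", "pressure_low"]
assert edit_distance(alarms_a, alarms_b) == 1
```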
A temporal precedence based clustering method for gene expression microarray data
Background: Time-course microarray experiments can produce useful data which can help in understanding the underlying dynamics of the system. Clustering is an important stage in microarray data analysis, where the data are grouped together according to certain characteristics. The majority of clustering techniques are based on distance or visual similarity measures, which may not be suitable for clustering temporal microarray data, where the sequential nature of time is important. We present a Granger-causality-based technique to cluster temporal microarray gene expression data, which measures the interdependence between two time series by statistically testing whether one time series can be used to forecast the other.
Results: A gene-association matrix is constructed by testing temporal relationships between pairs of genes using the Granger causality test. The association matrix is further analyzed using a graph-theoretic technique to detect highly connected components representing interesting biological modules. We test our approach on synthesized datasets and real biological datasets obtained for Arabidopsis thaliana. We show the effectiveness of our approach by analyzing the results using the existing biological literature. We also report interesting structural properties of the association network commonly desired in any biological system.
Conclusions: Our experiments on synthesized and real microarray datasets show that our approach produces encouraging results. The method is simple in implementation and is statistically traceable at each step. The method can produce sets of functionally related genes which can be further used for reverse-engineering of gene circuits.
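The pairwise Granger test behind the gene-association matrix can be sketched as a nested-model F-test; this is a simplified stand-in for the authors' pipeline, with synthetic data, lag, and coupling strength chosen purely for illustration:

```python
import numpy as np

def granger_f(y, x, lag=1):
    """F-statistic for 'x Granger-causes y': compare an AR model of y
    against the same model augmented with lagged values of x."""
    n = len(y)
    Y = y[lag:]
    ones = np.ones(n - lag)
    y_lags = [y[lag - k - 1:n - k - 1] for k in range(lag)]
    x_lags = [x[lag - k - 1:n - k - 1] for k in range(lag)]
    Xr = np.column_stack([ones] + y_lags)           # restricted design
    Xf = np.column_stack([ones] + y_lags + x_lags)  # full design

    def rss(X):
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        return float(np.sum((Y - X @ beta) ** 2))

    rss_r, rss_f = rss(Xr), rss(Xf)
    df2 = n - lag - Xf.shape[1]
    return ((rss_r - rss_f) / lag) / (rss_f / df2)

# Synthetic pair: x drives y at lag 1, so the x -> y edge should be detected.
rng = np.random.default_rng(0)
n = 300
x = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.8 * x[t - 1] + 0.1 * rng.standard_normal()

assert granger_f(y, x) > 100.0            # strong evidence for x -> y
assert granger_f(x, y) < granger_f(y, x)  # and not for the reverse
```

Running this test over all gene pairs and thresholding the resulting statistics yields the association matrix that the graph-theoretic step then analyses.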
Radiometric normalization of temporal images combining automatic detection of pseudo-invariant features from the distance and similarity spectral measures, density scatterplot analysis, and robust regression
Radiometric precision is difficult to maintain in orbital images due to several factors (atmospheric conditions, Earth-sun distance, detector calibration, illumination, and viewing angles). These unwanted effects must be removed for radiometric consistency among temporal images, leaving only land-leaving radiances, for optimum change detection. A variety of relative radiometric correction techniques have been developed for the correction or rectification of images of the same area, through use of reference targets whose reflectance does not change significantly with time, i.e., pseudo-invariant features (PIFs). This paper proposes a new technique for radiometric normalization, which uses three sequential methods for an accurate PIF selection: spectral measures of temporal data (spectral distance and similarity), density scatter plot analysis (ridge method), and robust regression. The spectral measures used are the spectral angle (Spectral Angle Mapper, SAM), spectral correlation (Spectral Correlation Mapper, SCM), and Euclidean distance. The spectral measures between the spectra at times t1 and t2 are calculated for each pixel. After classification using threshold values, it is possible to define points with the same spectral behavior, including PIFs. The distance and similarity measures are complementary and can be calculated together. The ridge method uses a density plot generated from images acquired on different dates for the selection of PIFs. In a density plot, the invariant pixels together form a high-density ridge, while variant pixels (clouds and land cover changes) are spread out, having low density, facilitating their exclusion. Finally, the selected PIFs are subjected to a robust regression (M-estimate) between pairs of temporal bands for the detection and elimination of outliers, and to obtain the optimal linear equation for a given set of target points. The robust regression is insensitive to outliers, i.e., observations that appear to deviate strongly from the rest of the data in which they occur, as in our case, change areas. The sequential methods enable one to select, by different attributes, a number of invariant targets over the brightness range of the images.
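Two of the per-pixel spectral measures named above (SAM and Euclidean distance) can be sketched directly; the spectra and band count below are illustrative only:

```python
import numpy as np

def spectral_angle(s1, s2):
    """Spectral angle (SAM) in radians; insensitive to a multiplicative
    gain between dates, so pure illumination changes score near zero."""
    cos = np.dot(s1, s2) / (np.linalg.norm(s1) * np.linalg.norm(s2))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def euclidean(s1, s2):
    """Per-pixel Euclidean distance between the two dates' spectra."""
    return float(np.linalg.norm(s1 - s2))

# Toy 4-band pixel spectra at times t1 and t2.
invariant_t1 = np.array([0.10, 0.20, 0.30, 0.40])
invariant_t2 = 1.2 * invariant_t1                # gain-only change
changed_t2 = np.array([0.40, 0.30, 0.20, 0.10])  # land-cover change

assert spectral_angle(invariant_t1, invariant_t2) < 1e-6
assert spectral_angle(invariant_t1, changed_t2) > 0.5
```

This illustrates why the measures are complementary: the gain-only pixel has SAM near zero but nonzero Euclidean distance, so thresholding both separates illumination effects from genuine change.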
XML Schema Clustering with Semantic and Hierarchical Similarity Measures
With the growing popularity of XML as the data representation language, collections of XML data have exploded in number. Methods are required to manage them and discover useful information from them for improved document handling. We present a schema clustering process that organises heterogeneous XML schemas into various groups. The methodology considers not only the linguistic content and context of the elements but also their hierarchical structural similarity. We support our findings with experiments and analysis.
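As a rough illustration of combining linguistic and hierarchical similarity for schema elements (the two functions below are simple stand-ins, not the paper's matchers):

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Linguistic similarity between element names (a simple string-based
    stand-in for thesaurus-backed matching)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def path_similarity(p1, p2):
    """Hierarchical similarity: shared root-to-element path prefix,
    normalised by the longer path."""
    common = 0
    for a, b in zip(p1, p2):
        if a != b:
            break
        common += 1
    return common / max(len(p1), len(p2))

# Element 'name' under two heterogeneous book schemas.
assert name_similarity("authorName", "author_name") > 0.8
assert path_similarity(["book", "author", "name"],
                       ["book", "writer", "name"]) == 1 / 3
```

A weighted combination of scores like these, computed over all element pairs, gives the pairwise schema similarity that a clustering algorithm can then group on.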
Sequential Complexity as a Descriptor for Musical Similarity
We propose string compressibility as a descriptor of temporal structure in
audio, for the purpose of determining musical similarity. Our descriptors are
based on computing track-wise compression rates of quantised audio features,
using multiple temporal resolutions and quantisation granularities. To verify
that our descriptors capture musically relevant information, we incorporate our
descriptors into similarity rating prediction and song year prediction tasks.
We base our evaluation on a dataset of 15500 track excerpts of Western popular
music, for which we obtain 7800 web-sourced pairwise similarity ratings. To
assess the agreement among similarity ratings, we perform an evaluation under
controlled conditions, obtaining a rank correlation of 0.33 between intersected
sets of ratings. Combined with bag-of-features descriptors, we obtain
performance gains of 31.1% and 10.9% for similarity rating prediction and song
year prediction. For both tasks, analysis of selected descriptors reveals that
representing features at multiple time scales benefits prediction accuracy.

Comment: 13 pages, 9 figures, 8 tables. Accepted version.
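The track-wise compression-rate descriptor can be sketched as follows. This shows only the quantisation-granularity axis (not the multiple temporal resolutions also used), and zlib stands in for whatever compressors the paper evaluates:

```python
import zlib
import numpy as np

def compression_rate(features: np.ndarray, n_levels: int) -> float:
    """Compression rate (compressed size / raw size) of a feature sequence
    quantised to n_levels bins; lower means more temporal redundancy."""
    lo, hi = float(features.min()), float(features.max())
    q = np.floor((features - lo) / (hi - lo + 1e-12) * n_levels).astype(np.uint8)
    raw = q.tobytes()
    return len(zlib.compress(raw)) / len(raw)

rng = np.random.default_rng(1)
repetitive = np.tile(rng.standard_normal(50), 40)  # strong temporal structure
random_seq = rng.standard_normal(2000)             # no temporal structure

# One descriptor value per quantisation granularity.
desc_rep = [compression_rate(repetitive, k) for k in (4, 16, 64)]
desc_rnd = [compression_rate(random_seq, k) for k in (4, 16, 64)]
assert all(r < s for r, s in zip(desc_rep, desc_rnd))
```

A track with repeating structure compresses far better than an unstructured one at every granularity, which is what makes the rate usable as a similarity descriptor.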
Dynamic change-point detection using similarity networks
From a sequence of similarity networks, with edges representing certain
similarity measures between nodes, we are interested in detecting a
change-point at which the statistical properties of the networks change. After
the change, a subset of anomalous nodes emerges which compares dissimilarly
with the normal nodes. We study a simple sequential change detection procedure
based on node-wise average similarity measures, and analyse its theoretical
properties.
Simulation and real-data examples demonstrate that such a simple stopping
procedure has reasonably good performance. We further discuss faulty sensor
isolation (estimating anomalous nodes) using community detection.

Comment: appeared in Asilomar Conference 201
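A node-wise average-similarity stopping rule of the kind described can be sketched as follows (the threshold and the injected anomaly are illustrative, not the paper's procedure):

```python
import numpy as np

def node_avg_similarity(W: np.ndarray) -> np.ndarray:
    """Average similarity of each node to every other node in one snapshot."""
    n = W.shape[0]
    return (W.sum(axis=1) - np.diag(W)) / (n - 1)

def detect_change(snapshots, baseline_mean, threshold):
    """Stop at the first snapshot where any node's average similarity has
    dropped more than `threshold` below the pre-change baseline."""
    for t, W in enumerate(snapshots):
        drop = baseline_mean - node_avg_similarity(W)
        if np.any(drop > threshold):
            return t, np.where(drop > threshold)[0]
    return None, np.array([], dtype=int)

# Synthetic sequence: node 0 becomes anomalous after time tau.
rng = np.random.default_rng(2)
n, T, tau = 10, 30, 15
snapshots = []
for t in range(T):
    W = 0.8 + 0.05 * rng.standard_normal((n, n))
    W = (W + W.T) / 2
    if t >= tau:
        W[0, :] -= 0.5  # node 0 now compares dissimilarly with the rest
        W[:, 0] -= 0.5
    snapshots.append(W)

t_hat, anomalous = detect_change(snapshots, baseline_mean=0.8, threshold=0.3)
assert t_hat == tau and 0 in anomalous
```

The detected node set is what the community-detection step would then refine for faulty sensor isolation.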
Feature-based time-series analysis
This work presents an introduction to feature-based time-series analysis. The
time series as a data type is first described, along with an overview of the
interdisciplinary time-series analysis literature. I then summarize the range
of feature-based representations for time series that have been developed to
aid interpretable insights into time-series structure. Particular emphasis is
given to emerging research that facilitates wide comparison of feature-based
representations that allow us to understand the properties of a time-series
dataset that make it suited to a particular feature-based representation or
analysis algorithm. The future of time-series analysis is likely to embrace
approaches that exploit machine learning methods to partially automate human
learning to aid understanding of the complex dynamical patterns in the time
series we measure from the world.

Comment: 28 pages, 9 figures.
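A feature-based representation in the sense described reduces each series to a vector of interpretable summary statistics; a minimal sketch with a handful of common features (the selection here is illustrative):

```python
import numpy as np

def basic_features(x: np.ndarray) -> dict:
    """A tiny interpretable feature vector: distributional and temporal
    summary statistics of the kind catalogued in feature-based analysis."""
    xc = x - x.mean()
    acf1 = float(np.dot(xc[:-1], xc[1:]) / np.dot(xc, xc))  # lag-1 autocorr.
    return {
        "mean": float(x.mean()),
        "std": float(x.std()),
        "acf1": acf1,
        "above_mean_frac": float(np.mean(x > x.mean())),
    }

rng = np.random.default_rng(3)
noise = rng.standard_normal(1000)            # no temporal structure
walk = np.cumsum(rng.standard_normal(1000))  # strongly autocorrelated

assert abs(basic_features(noise)["acf1"]) < 0.2
assert basic_features(walk)["acf1"] > 0.9
```

Feature vectors like these make temporal structure comparable across heterogeneous datasets, which is the basis for the wide comparisons the article emphasises.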