189,926 research outputs found
Identifying Cover Songs Using Information-Theoretic Measures of Similarity
This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/

This paper investigates methods for quantifying similarity between audio signals, specifically for the task of cover song detection. We consider an information-theoretic approach, where we compute pairwise measures of predictability between time series. We compare discrete-valued approaches operating on quantized audio features with continuous-valued approaches. In the discrete case, we propose a method for computing the normalized compression distance that accounts for correlation between time series. In the continuous case, we propose to compute information-based measures of similarity as statistics of the prediction error between time series. We evaluate our methods on two cover song identification tasks, using a data set comprising 300 Jazz standards and using the Million Song Dataset. For both data sets, we observe that continuous-valued approaches outperform discrete-valued approaches. We consider approaches to estimating the normalized compression distance (NCD) based on string compression and prediction, and observe that our proposed normalized compression distance with alignment (NCDA) improves average performance over NCD for sequential compression algorithms. Finally, we demonstrate that continuous-valued distances may be combined to improve performance with respect to baseline approaches. Using a large-scale filter-and-refine approach, we demonstrate state-of-the-art performance for cover song identification using the Million Song Dataset.

The work of P. Foster was supported by an Engineering and Physical Sciences Research Council Doctoral Training Account studentship.
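The normalized compression distance mentioned in the abstract has a standard definition: NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)), where C(·) is the compressed length under some compressor. A minimal sketch using `zlib` as the compressor; the paper's specific sequential compressors and its proposed alignment step (NCDA) are not reproduced here:

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance, using zlib compressed lengths
    as a stand-in for C(.). Lower values mean more similar inputs."""
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Because a self-similar input compresses jointly almost as well as alone, `ncd(a, a)` is close to 0, while unrelated inputs score close to 1.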
Implementation of similarity measures for event sequences in myCBR
Computing the similarities between event sequences is important in many fields, because many activities follow a sequential order: for instance, an industrial plant that triggers different types of alarms due to detected event sequences, or the treatment sequence that a patient receives while hospitalized. With appropriate tools and techniques for computing the similarity between two event sequences, we may be able to detect patterns or regularities in event data, and thus perform predictions or recommendations based on detected similar sequences. The present work describes the implementation of two event sequence similarity measures in myCBR, with the purpose of creating a similarity measurement approach for complex domains that employ event sequences. In addition, an initial experiment is performed to study whether the proposed measures and measurement approach are able to predict future situations based on similar event sequences.
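The abstract does not specify which two similarity measures were implemented in myCBR; as an illustration only, a common baseline for event sequences is a normalized edit (Levenshtein) distance over event labels:

```python
def edit_distance(s, t):
    """Classic Levenshtein dynamic program over two event sequences
    (lists of event labels)."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def sequence_similarity(s, t):
    """Turn the edit distance into a similarity in [0, 1]."""
    if not s and not t:
        return 1.0
    return 1.0 - edit_distance(s, t) / max(len(s), len(t))
```

For example, the alarm sequences `["overheat", "shutdown"]` and `["overheat", "restart", "shutdown"]` differ by one insertion, giving a similarity of 2/3.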
Dynamic change-point detection using similarity networks
From a sequence of similarity networks, with edges representing certain similarity measures between nodes, we are interested in detecting a change-point at which the statistical properties of the networks change. After the change, a subset of anomalous nodes compares dissimilarly with the normal nodes. We study a simple sequential change detection procedure based on node-wise average similarity measures, and study its theoretical properties. Simulation and real-data examples demonstrate that such a simple stopping procedure has reasonably good performance. We further discuss faulty sensor isolation (estimating the anomalous nodes) using community detection.

Comment: appeared in Asilomar Conference 201
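The stopping rule described above can be sketched as a CUSUM-style procedure on the node-wise average similarity; `mu0` (the pre-change similarity level) and `threshold` are illustrative parameters, and this is a sketch of the general idea rather than the paper's exact statistic:

```python
import numpy as np

def average_similarity(W):
    """Node-wise average similarity: mean edge weight per node,
    excluding the self-similarity on the diagonal."""
    n = W.shape[0]
    return (W.sum(axis=1) - np.diag(W)) / (n - 1)

def detect_change(networks, mu0, threshold):
    """CUSUM-style stopping rule: declare a change at the first time
    the accumulated drop of the mean node similarity below the
    pre-change level mu0 exceeds the threshold."""
    s = 0.0
    for t, W in enumerate(networks):
        drop = mu0 - average_similarity(W).mean()
        s = max(0.0, s + drop)  # accumulate only sustained drops
        if s > threshold:
            return t  # declared change time
    return None  # no change detected
```

When a subset of nodes turns anomalous, their rows in the similarity matrix shrink, the mean node similarity drops, and the cumulative statistic crosses the threshold shortly after the change.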
XML Schema Clustering with Semantic and Hierarchical Similarity Measures
With the growing popularity of XML as a data representation language, collections of XML data have exploded in number. Methods are required to manage them and to discover useful information from them for improved document handling. We present a schema clustering process that organises heterogeneous XML schemas into groups. The methodology considers not only the linguistic similarity and the context of the elements but also the hierarchical structural similarity. We support our findings with experiments and analysis.
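A combined element similarity of the kind described, weighting a linguistic comparison of element names against a hierarchical comparison of their ancestor paths, might be sketched as follows; the weight and both component measures are illustrative stand-ins, not the paper's actual measures:

```python
import difflib

def element_similarity(name_a, path_a, name_b, path_b, w=0.5):
    """Blend of linguistic and hierarchical similarity for two schema
    elements. path_a/path_b are lists of ancestor element names.
    The weight w and both measures are illustrative choices."""
    # linguistic part: string similarity between element names
    ling = difflib.SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
    # hierarchical part: Jaccard overlap of the ancestor paths
    sa, sb = set(path_a), set(path_b)
    struct = len(sa & sb) / len(sa | sb) if sa | sb else 1.0
    return w * ling + (1 - w) * struct
```

Pairwise scores of this kind can then feed any standard clustering algorithm to group heterogeneous schemas.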
On mining complex sequential data by means of FCA and pattern structures
Nowadays data sets are available in very complex and heterogeneous ways.
Mining of such data collections is essential to support many real-world
applications ranging from healthcare to marketing. In this work, we focus on
the analysis of "complex" sequential data by means of interesting sequential
patterns. We approach the problem using the elegant mathematical framework of
Formal Concept Analysis (FCA) and its extension based on "pattern structures".
Pattern structures are used for mining complex data (such as sequences or
graphs) and are based on a subsumption operation, which in our case is defined
with respect to the partial order on sequences. We show how pattern structures
along with projections (i.e., a data reduction of sequential structures), are
able to enumerate more meaningful patterns and increase the computing
efficiency of the approach. Finally, we show the applicability of the presented
method for discovering and analyzing interesting patient patterns from a French
healthcare data set on cancer. The quantitative and qualitative results (with
annotations and analysis from a physician) are reported in this use case which
is the main motivation for this work.
Keywords: data mining; formal concept analysis; pattern structures;
projections; sequences; sequential data.

Comment: An accepted publication in the International Journal of General Systems. The paper was created in the wake of the Conference on Concept Lattices and Their Applications (CLA'2013). 27 pages, 9 figures, 3 tables
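One natural choice for the partial order on sequences mentioned above is the subsequence relation: a pattern is more general than any sequence that contains it as a subsequence. A minimal check of that relation, assuming sequences of atomic items rather than the itemset-sequences often used with pattern structures:

```python
def subsumes(pattern, sequence):
    """True if `pattern` occurs in `sequence` as a (not necessarily
    contiguous) subsequence, i.e. pattern is more general under the
    subsequence partial order."""
    it = iter(sequence)
    # each `item in it` consumes the iterator up to the first match,
    # so the pattern items must appear in order
    return all(item in it for item in pattern)
```

Under this order, the meet in a pattern structure corresponds to the common subsequences shared by two objects, which projections then prune to keep the enumeration tractable.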
