
    Identification of Cover Songs Using Information-Theoretic Measures of Similarity

    13 pages, 5 figures, 4 tables. v3: Accepted version

    Image Characterization and Classification by Physical Complexity

    We present a method for estimating the complexity of an image based on Bennett's concept of logical depth. Bennett identified logical depth as the appropriate measure of organized complexity, and hence as being better suited to the evaluation of the complexity of objects in the physical world. Its use results in a different, and in some sense finer, characterization than is obtained by applying the concept of Kolmogorov complexity alone. We use this measure to classify images by their information content. The method provides a means for classifying and evaluating the complexity of objects by way of their visual representations. To the authors' knowledge, the method and application inspired by the concept of logical depth presented herein are being proposed and implemented for the first time. Comment: 30 pages, 21 figures.
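    The abstract builds on Bennett's logical depth, which is uncomputable in general; a common practical proxy pairs the compressed size of an object (a Kolmogorov-complexity stand-in) with the time needed to reproduce it from its compressed form. The sketch below illustrates that proxy with zlib on raw byte buffers; the compressor choice, the timing loop and the toy inputs are assumptions for illustration, not the paper's exact estimation procedure.

```python
import os
import time
import zlib

def compression_length(data: bytes, level: int = 9) -> int:
    """Approximate information content by the compressed size in bytes."""
    return len(zlib.compress(data, level))

def decompression_time(data: bytes, level: int = 9, repeats: int = 50) -> float:
    """Approximate logical depth by the time needed to reproduce the object
    from its compressed form (averaged over several runs to reduce noise)."""
    blob = zlib.compress(data, level)
    start = time.perf_counter()
    for _ in range(repeats):
        zlib.decompress(blob)
    return (time.perf_counter() - start) / repeats

if __name__ == "__main__":
    # Illustrative inputs: a trivially ordered buffer vs. random noise.
    flat = bytes(128 for _ in range(100_000))   # low complexity, shallow
    noise = os.urandom(100_000)                 # high complexity, still shallow
    for name, img in [("flat", flat), ("noise", noise)]:
        print(name, compression_length(img), f"{decompression_time(img):.6f}s")
```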

    Scalable Model-Based Management of Correlated Dimensional Time Series in ModelarDB+

    To monitor critical infrastructure, high-quality sensors sampled at a high frequency are increasingly used. However, as they produce huge amounts of data, only simple aggregates are stored. This removes outliers and fluctuations that could indicate problems. As a remedy, we present a model-based approach for managing time series with dimensions that exploits correlation in and among time series. Specifically, we propose compressing groups of correlated time series using an extensible set of model types within a user-defined error bound (possibly zero). We name this new category of model-based compression methods for time series Multi-Model Group Compression (MMGC). We present GOLEMM, the first MMGC method, and extend model types to compress time series groups. We propose primitives for users to effectively define groups for differently sized data sets, and, based on these, an automated grouping method using only the time series dimensions. We propose algorithms for executing simple and multi-dimensional aggregate queries on models. Lastly, we implement our methods in the Time Series Management System (TSMS) ModelarDB (ModelarDB+). Our evaluation shows that, compared to widely used formats, ModelarDB+ provides up to 13.7 times faster ingestion due to high compression, 113 times better compression due to the adaptivity of GOLEMM, 630 times faster aggregates by using models, and close to linear scalability. It is also extensible and supports online query processing. Comment: 12 pages, 28 figures, and 1 table.
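    As a rough illustration of the "models within a user-defined error bound" idea, the sketch below greedily represents runs of sensor readings by a single constant value whose error never exceeds the bound. The function names and the greedy segmentation are illustrative placeholders, not the GOLEMM algorithm or ModelarDB+'s actual model types.

```python
from typing import List, Tuple

def compress_constant(values: List[float], error_bound: float) -> List[Tuple[int, float]]:
    """Greedy model-based compression: represent each run of points by one
    constant value, as long as every point stays within +/- error_bound of it.
    Returns (segment_length, representative_value) pairs."""
    segments = []
    i = 0
    while i < len(values):
        lo = hi = values[i]
        j = i + 1
        while j < len(values):
            new_lo, new_hi = min(lo, values[j]), max(hi, values[j])
            if new_hi - new_lo > 2 * error_bound:  # midpoint cannot cover all points
                break
            lo, hi = new_lo, new_hi
            j += 1
        segments.append((j - i, (lo + hi) / 2))
        i = j
    return segments

def decompress(segments: List[Tuple[int, float]]) -> List[float]:
    """Reconstruct an approximate series from the stored model parameters."""
    return [v for length, v in segments for _ in range(length)]

readings = [20.0, 20.1, 19.9, 20.2, 25.0, 25.1, 24.9]
model = compress_constant(readings, error_bound=0.2)
print(model)             # two segments covering 4 and 3 points respectively
print(decompress(model)) # every value within 0.2 of the original
```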

    Artificial Sequences and Complexity Measures

    In this paper we exploit concepts of information theory to address the fundamental problem of identifying and defining the most suitable tools to extract, in an automatic and agnostic way, information from a generic string of characters. In particular, we introduce a class of methods which rely crucially on data compression techniques to define a measure of remoteness, or distance, between pairs of sequences of characters (e.g. texts) based on their relative information content. We also discuss in detail how specific features of data compression techniques can be used to introduce the notions of the dictionary of a given sequence and of an Artificial Text, and we show how these new tools can be used for information extraction purposes. We point out the versatility and generality of our method, which applies to any kind of corpus of character strings independently of the type of coding behind it. As a case study we consider linguistically motivated problems, presenting results for automatic language recognition, authorship attribution and self-consistent classification. Comment: Revised version, with major changes, of the previous "Data Compression approach to Information Extraction and Classification" by A. Baronchelli and V. Loreto. 15 pages; 5 figures.
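    The family of compression-based distances described here can be illustrated with the closely related normalized compression distance (NCD), shown below with zlib. The paper's own remoteness measure is built on relative entropy estimated via LZ77-style compressors, so this is a neighbouring technique rather than the authors' exact definition, and the example strings are made up.

```python
import zlib

def C(x: bytes) -> int:
    """Compressed length as a stand-in for algorithmic information content."""
    return len(zlib.compress(x, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: small for related strings,
    closer to 1 for unrelated ones."""
    cx, cy, cxy = C(x), C(y), C(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

english_a = b"the quick brown fox jumps over the lazy dog " * 20
english_b = b"the lazy dog sleeps while the quick brown fox jumps " * 20
italian   = b"la volpe veloce salta sopra il cane pigro " * 20
print(ncd(english_a, english_b))  # shared vocabulary: lower distance
print(ncd(english_a, italian))    # different language: higher distance
```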

    Sequential Complexity as a Descriptor for Musical Similarity

    We propose string compressibility as a descriptor of temporal structure in audio, for the purpose of determining musical similarity. Our descriptors are based on computing track-wise compression rates of quantised audio features, using multiple temporal resolutions and quantisation granularities. To verify that our descriptors capture musically relevant information, we incorporate our descriptors into similarity rating prediction and song year prediction tasks. We base our evaluation on a dataset of 15,500 track excerpts of Western popular music, for which we obtain 7,800 web-sourced pairwise similarity ratings. To assess the agreement among similarity ratings, we perform an evaluation under controlled conditions, obtaining a rank correlation of 0.33 between intersected sets of ratings. Combined with bag-of-features descriptors, we obtain performance gains of 31.1% and 10.9% for similarity rating prediction and song year prediction, respectively. For both tasks, analysis of selected descriptors reveals that representing features at multiple time scales benefits prediction accuracy. Comment: 13 pages, 9 figures, 8 tables. Accepted version.
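    A minimal sketch of the core descriptor, under the assumption that a feature track has already been reduced to one value per frame: quantise into a small symbol alphabet, subsample to several temporal resolutions, and record the zlib compression rate for each (resolution, granularity) pair. The function names, the uniform quantiser and the chosen resolutions are illustrative, not the paper's exact pipeline.

```python
import math
import random
import zlib
from typing import List, Sequence

def quantise(features: Sequence[float], levels: int) -> bytes:
    """Map continuous feature values onto a small symbol alphabet."""
    lo, hi = min(features), max(features)
    width = (hi - lo) / levels or 1.0
    return bytes(min(int((v - lo) / width), levels - 1) for v in features)

def downsample(symbols: bytes, factor: int) -> bytes:
    """Coarser temporal resolution: keep every factor-th symbol."""
    return symbols[::factor]

def compression_rate(symbols: bytes) -> float:
    """Compressed size relative to raw size; lower means more temporal structure."""
    return len(zlib.compress(symbols, 9)) / max(len(symbols), 1)

def descriptors(features: Sequence[float],
                resolutions=(1, 2, 4),
                granularities=(4, 8, 16)) -> List[float]:
    """Track-wise descriptor vector: one compression rate per
    (temporal resolution, quantisation granularity) pair."""
    return [compression_rate(downsample(quantise(features, q), r))
            for r in resolutions for q in granularities]

# A strongly periodic "feature track" compresses far better than noise.
periodic = [math.sin(t / 5.0) for t in range(2000)]
noisy = [random.random() for _ in range(2000)]
print(descriptors(periodic))
print(descriptors(noisy))
```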

    The Extended Edit Distance Metric

    Similarity search is an important problem in information retrieval. Such similarity is based on a distance measure. Symbolic representation of time series has recently attracted many researchers, since it reduces the dimensionality of these high-dimensional data objects. We propose a new distance metric that is applied to symbolic data objects and test it on time series databases in a classification task. We compare it to other distances that are well known in the literature for symbolic data objects. We also prove, mathematically, that our distance is a metric. Comment: Technical report.
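    For context, the sketch below shows the classical Levenshtein edit distance that symbolic time series metrics are typically compared against; the paper's Extended Edit Distance modifies this baseline and is not reproduced here, so treat this purely as the baseline it extends.

```python
def edit_distance(a: str, b: str) -> int:
    """Classical Levenshtein distance via dynamic programming: the minimum
    number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution / match
        prev = curr
    return prev[-1]

# Symbolic (e.g. SAX-style) time series representations are just strings,
# so the same machinery applies directly.
print(edit_distance("abbcca", "abccba"))  # 2
```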

    Studies on the bit rate requirements for an HDTV format with 1920 × 1080 pixel resolution, progressive scanning at 50 Hz frame rate, targeting large flat panel displays

    This paper considers the potential for an HDTV delivery format with 1920 × 1080 pixels, progressive scanning and 50 frames per second in broadcast applications. The paper discusses the difficulties in characterizing the display to be assumed for reception. It elaborates on the required bit rate of the 1080p/50 format when critical content is coded with H.264/MPEG-4 AVC (Part 10) and subjectively viewed on a large flat panel display with 1920 × 1080 pixel resolution. The paper describes the initial subjective quality evaluations that have been made under these conditions. The results of these initial tests suggest that the required bit rate for a 1080p/50 HDTV signal in emission could be kept equal to or lower than that of second-generation HDTV formats, while achieving equal or better image quality.
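    For a sense of scale, the uncompressed data rate of a 1080p/50 signal can be computed directly; the sketch below assumes 8-bit 4:2:0 sampling (1.5 samples per pixel on average), which is an assumption about the source format rather than a figure taken from the paper.

```python
# Uncompressed data rate of a 1080p/50 signal, assuming 8-bit 4:2:0 sampling
# (1.5 samples per pixel on average). Purely illustrative arithmetic.
width, height, fps = 1920, 1080, 50
bits_per_pixel = 8 * 1.5
raw_bits_per_second = width * height * fps * bits_per_pixel
print(f"{raw_bits_per_second / 1e9:.2f} Gbit/s uncompressed")  # ~1.24 Gbit/s

# Any emission bit rate in the tens of Mbit/s therefore implies a
# compression ratio on the order of 100:1 for the H.264/AVC encoder.
```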

    A Codebook Generation Algorithm for Document Image Compression

    Pattern-matching-based document-compression systems (e.g. for faxing) rely on finding a small set of patterns that can be used to represent all of the ink in the document. Finding an optimal set of patterns is NP-hard; previous compression schemes have resorted to heuristics. This paper describes an extension of the cross-entropy approach, used previously for measuring pattern similarity, to this problem. This approach reduces the problem to a k-medians problem, for which the paper gives a new algorithm with a provably good performance guarantee. In comparison to previous heuristics (First Fit, with and without generalized Lloyd's/k-means postprocessing steps), the new algorithm generates a better codebook, resulting in an overall improvement in compression performance of almost 17%.
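    The reduction to k-medians can be illustrated with a generic alternating scheme: assign each extracted pattern to its nearest codebook entry, then re-pick each entry as the cluster member minimising total intra-cluster distance. The sketch below uses a Hamming distance on toy bit-string "patterns" as a stand-in for the paper's cross-entropy similarity, and it is a plain Lloyd-style heuristic, not the provably good algorithm the paper contributes.

```python
import random
from typing import Callable, List, Sequence

def k_medians(items: Sequence[str], k: int,
              dist: Callable[[str, str], float],
              iterations: int = 20, seed: int = 0) -> List[str]:
    """Alternate assignment and median update, always choosing medians from
    the data itself (as a codebook must). Generic heuristic sketch."""
    rng = random.Random(seed)
    medians = rng.sample(list(items), k)
    for _ in range(iterations):
        clusters = {m: [] for m in medians}
        for x in items:
            clusters[min(medians, key=lambda m: dist(x, m))].append(x)
        medians = [min(members, key=lambda c: sum(dist(c, x) for x in members))
                   for members in clusters.values() if members]
    return medians

# Bitmap-like "patterns" as bit strings; Hamming distance stands in for
# the cross-entropy dissimilarity used for scanned character shapes.
hamming = lambda a, b: sum(ca != cb for ca, cb in zip(a, b))
patterns = ["111000", "111001", "110000",   # one shape family
            "000111", "000110", "001111"]   # another shape family
print(k_medians(patterns, k=2, dist=hamming))
```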

    Calculating the state parameter in crushable sands

    The state parameter (ψ) measures the distance from the current state to the critical state line (CSL) in the compression plane. The existence of a correlation between both the peak angle of shearing resistance (φ′) and peak dilatancy and ψ is central to many constitutive models used to predict granular soil behaviour. These correlations do not explicitly consider particle crushing. Crushing-induced evolution of the particle size distribution influences the CSL position, and recent research supports the use of a critical state plane (CSP) to account for changes in grading. This contribution evaluates whether the CSP can be used to calculate ψ and thus enable prediction of the peak angle of shearing resistance and peak dilatancy where crushing takes place. The data considered were generated from a validated DEM model of Fontainebleau sand that considers particle crushing. It is shown that where ψ is calculated by considering the CSL of the original uncrushed material there can be a significant error in predicting the material response. Where the CSP is used there is a significant improvement in our ability to predict behaviour, whether the CSP is accurately determined using a large number of tests or approximated using crushing yield envelopes. It is shown that the state parameter calculated using the previously available definition can give a false sense of security when assessing the liquefaction potential of potentially crushable soils. The contribution also highlights the stress-path dependency of the relationship between φ′ and ψ, whichever approach is used to determine ψ.
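    For reference, the conventional calculation the abstract starts from is ψ = e − e_cs(p′), with the critical state line often taken as semi-logarithmic in mean effective stress. The sketch below uses that form with placeholder parameters; the values, the CSL shape and the omitted grading-dependent CSP correction are assumptions for illustration, not the calibrated Fontainebleau sand model from the paper.

```python
import math

def e_csl(p_prime: float, gamma: float, lam: float, p_ref: float = 1.0) -> float:
    """Critical state void ratio at mean effective stress p' (kPa), assuming a
    semi-logarithmic CSL: e_cs = gamma - lam * ln(p'/p_ref)."""
    return gamma - lam * math.log(p_prime / p_ref)

def state_parameter(e: float, p_prime: float, gamma: float, lam: float) -> float:
    """psi = e - e_cs(p'): positive -> looser than critical (contractive),
    negative -> denser than critical (dilative)."""
    return e - e_csl(p_prime, gamma, lam)

# Placeholder CSL parameters (not the calibrated Fontainebleau values).
GAMMA, LAMBDA = 0.95, 0.04
for e, p in [(0.62, 100.0), (0.85, 100.0)]:
    psi = state_parameter(e, p, GAMMA, LAMBDA)
    print(f"e={e:.2f}, p'={p:.0f} kPa -> psi={psi:+.3f}")
```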