Search CORE

20,210 research outputs found

Exact Mean Computation in Dynamic Time Warping Spaces

Author: Brill Markus
Fluschnik Till
Froese Vincent
Jain Brijnesh
Niedermeier Rolf
Schultz David
Publication venue
Publication date: 31/05/2018
Field of study

Dynamic time warping constitutes a major tool for analyzing time series. In particular, computing a mean series of a given sample of series in dynamic time warping spaces (by minimizing the Fr\'echet function) is a challenging computational problem, so far solved by several heuristic and inexact strategies. We spot some inaccuracies in the literature on exact mean computation in dynamic time warping spaces. Our contributions comprise an exact dynamic program computing a mean (useful for benchmarking and evaluating known heuristics). Based on this dynamic program, we empirically study properties like uniqueness and length of a mean. Moreover, experimental evaluations reveal substantial deficits of state-of-the-art heuristics in terms of their output quality. We also give an exact polynomial-time algorithm for the special case of binary time series

arXiv.org e-Print Archive

Crossref

Warwick Research Archives Portal Repository

Searching by approximate personal-name matching

Author: Camps Pare Rafael
Daude Ventura Jordi
Publication venue
Publication date: 01/01/2003
Field of study

We discuss the design, building and evaluation of a method to access theinformation of a person, using his name as a search key, even if it has deformations. We present a similarity function, the DEA function, based on the probabilities of the edit operations accordingly to the involved letters and their position, and using a variable threshold. The efficacy of DEA is quantitatively evaluated, without human relevance judgments, very superior to the efficacy of known methods. A very efficient approximate search technique for the DEA function is also presented based on a compacted trie-tree structure.Postprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

On palimpsests in neural memory: an information theory viewpoint

Author: Goyal Vivek K.
Kusuma Julius
Varshney Lav R.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/12/2016
Field of study

The finite capacity of neural memory and the reconsolidation phenomenon suggest it is important to be able to update stored information as in a palimpsest, where new information overwrites old information. Moreover, changing information in memory is metabolically costly. In this paper, we suggest that information-theoretic approaches may inform the fundamental limits in constructing such a memory system. In particular, we define malleable coding, that considers not only representation length but also ease of representation update, thereby encouraging some form of recycling to convert an old codeword into a new one. Malleability cost is the difficulty of synchronizing compressed versions, and malleable codes are of particular interest when representing information and modifying the representation are both expensive. We examine the tradeoff between compression efficiency and malleability cost, under a malleability metric defined with respect to a string edit distance. This introduces a metric topology to the compressed domain. We characterize the exact set of achievable rates and malleability as the solution of a subgraph isomorphism problem. This is all done within the optimization approach to biology framework.Accepted manuscrip

Boston University Institutional Repository (OpenBU)

Time series classification with ensembles of elastic distance measures

Author: A Stefan
Anthony Bagnall
H Deng
J Demšar
J Lin
J Lin
J Rodriguez
J Tanner
Jason Lines
L Breiman
M Baydogan
M Hall
PF Marteau
T Górecki
Y Jeong
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/05/2015
Field of study

Several alternative distance measures for comparing time series have recently been proposed and evaluated on time series classification (TSC) problems. These include variants of dynamic time warping (DTW), such as weighted and derivative DTW, and edit distance-based measures, including longest common subsequence, edit distance with real penalty, time warp with edit, and move–split–merge. These measures have the common characteristic that they operate in the time domain and compensate for potential localised misalignment through some elastic adjustment. Our aim is to experimentally test two hypotheses related to these distance measures. Firstly, we test whether there is any significant difference in accuracy for TSC problems between nearest neighbour classifiers using these distance measures. Secondly, we test whether combining these elastic distance measures through simple ensemble schemes gives significantly better accuracy. We test these hypotheses by carrying out one of the largest experimental studies ever conducted into time series classification. Our first key finding is that there is no significant difference between the elastic distance measures in terms of classification accuracy on our data sets. Our second finding, and the major contribution of this work, is to define an ensemble classifier that significantly outperforms the individual classifiers. We also demonstrate that the ensemble is more accurate than approaches not based in the time domain. Nearly all TSC papers in the data mining literature cite DTW (with warping window set through cross validation) as the benchmark for comparison. We believe that our ensemble is the first ever classifier to significantly outperform DTW and as such raises the bar for future work in this area

Crossref

University of East Anglia digital repository

If the Current Clique Algorithms are Optimal, so is Valiant's Parser

Author: Abboud Amir
Backurs Arturs
Williams Virginia Vassilevska
Publication venue
Publication date: 05/11/2015
Field of study

The CFG recognition problem is: given a context-free grammar

\mathcal{G}

and a string

w

of length

n

, decide if

w

can be obtained from

\mathcal{G}

. This is the most basic parsing question and is a core computer science problem. Valiant's parser from 1975 solves the problem in

O(n^{\omega})

time, where

\omega<2.373

is the matrix multiplication exponent. Dozens of parsing algorithms have been proposed over the years, yet Valiant's upper bound remains unbeaten. The best combinatorial algorithms have mildly subcubic

O(n^3/\log^3{n})

complexity. Lee (JACM'01) provided evidence that fast matrix multiplication is needed for CFG parsing, and that very efficient and practical algorithms might be hard or even impossible to obtain. Lee showed that any algorithm for a more general parsing problem with running time

O(|\mathcal{G}|\cdot n^{3-\varepsilon})

can be converted into a surprising subcubic algorithm for Boolean Matrix Multiplication. Unfortunately, Lee's hardness result required that the grammar size be

|\mathcal{G}|=\Omega(n^6)

. Nothing was known for the more relevant case of constant size grammars. In this work, we prove that any improvement on Valiant's algorithm, even for constant size grammars, either in terms of runtime or by avoiding the inefficiencies of fast matrix multiplication, would imply a breakthrough algorithm for the

k

-Clique problem: given a graph on

n

nodes, decide if there are

k

that form a clique. Besides classifying the complexity of a fundamental problem, our reduction has led us to similar lower bounds for more modern and well-studied cubic time problems for which faster algorithms are highly desirable in practice: RNA Folding, a central problem in computational biology, and Dyck Language Edit Distance, answering an open question of Saha (FOCS'14)

arXiv.org e-Print Archive

Crossref

Metrics for Graph Comparison: A Practitioner's Guide

Author: Meyer Francois G.
Wills Peter
Publication venue
Publication date: 16/12/2019
Field of study

Comparison of graph structure is a ubiquitous task in data analysis and machine learning, with diverse applications in fields such as neuroscience, cyber security, social network analysis, and bioinformatics, among others. Discovery and comparison of structures such as modular communities, rich clubs, hubs, and trees in data in these fields yields insight into the generative mechanisms and functional properties of the graph. Often, two graphs are compared via a pairwise distance measure, with a small distance indicating structural similarity and vice versa. Common choices include spectral distances (also known as

\lambda

distances) and distances based on node affinities. However, there has of yet been no comparative study of the efficacy of these distance measures in discerning between common graph topologies and different structural scales. In this work, we compare commonly used graph metrics and distance measures, and demonstrate their ability to discern between common topological features found in both random graph models and empirical datasets. We put forward a multi-scale picture of graph structure, in which the effect of global and local structure upon the distance measures is considered. We make recommendations on the applicability of different distance measures to empirical graph data problem based on this multi-scale view. Finally, we introduce the Python library NetComp which implements the graph distances used in this work

arXiv.org e-Print Archive