2,798 research outputs found

KV-match: A Subsequence Matching Approach Supporting Normalization and Time Warping [Extended Version]

Author: Pan Ningting
Wang Chen
Wang Jianmin
Wang Peng
Wang Wei
Wu Jiaye
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 09/09/2018

The volume of time series data has exploded due to the popularity of new applications, such as data center management and IoT. Subsequence matching is a fundamental task in mining time series data. Existing index-based approaches consider only raw subsequence matching (RSM) and do not support subsequence normalization. UCR Suite can handle the normalized subsequence matching (NSM) problem, but it needs to scan the full time series. In this paper, we propose a novel problem, named the constrained normalized subsequence matching (cNSM) problem, which adds some constraints to the NSM problem. The cNSM problem provides a knob to flexibly control the degree of offset shifting and amplitude scaling, which enables users to build an index to process queries. We propose a new index structure, KV-index, and a matching algorithm, KV-match. With a single index, our approach can support both the RSM and cNSM problems under either ED or DTW distance. KV-index is a key-value structure, which can be easily implemented on local files or HBase tables. To support queries of arbitrary length, we extend KV-match to KV-match$_{DP}$, which utilizes multiple varied-length indexes to process the query. We conduct extensive experiments on synthetic and real-world datasets. The results verify the effectiveness and efficiency of our approach. Comment: 13 pages
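To make the cNSM idea in the abstract above concrete, here is a minimal brute-force sketch in Python. It is not KV-match and uses no index: it scans every window, which is exactly the full-scan cost the abstract says an index avoids. The function names and the parameters `alpha` (bound on amplitude scaling), `beta` (bound on offset shifting), and `epsilon` (distance threshold) are assumptions chosen for illustration, and only the ED case is shown.

```python
import math

def mean_std(xs):
    """Return the mean and population standard deviation of xs."""
    m = sum(xs) / len(xs)
    var = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, math.sqrt(var)

def constrained_normalized_matches(series, query, epsilon, alpha, beta):
    """Full scan for windows whose z-normalized form is within `epsilon`
    (Euclidean distance) of the z-normalized query, subject to two
    illustrative constraints (assumed here, not taken from the paper):
      * |mean(window) - mean(query)| <= beta        (offset shifting)
      * 1/alpha <= std(window)/std(query) <= alpha  (amplitude scaling)
    Returns a list of (start_index, distance) pairs."""
    m = len(query)
    q_mean, q_std = mean_std(query)
    if q_std == 0:
        return []  # a constant query cannot be z-normalized
    q_norm = [(x - q_mean) / q_std for x in query]
    hits = []
    for i in range(len(series) - m + 1):
        window = series[i:i + m]
        w_mean, w_std = mean_std(window)
        if w_std == 0:
            continue  # skip constant windows
        if abs(w_mean - q_mean) > beta:
            continue  # offset shifted too far
        ratio = w_std / q_std
        if ratio < 1.0 / alpha or ratio > alpha:
            continue  # amplitude scaled too much
        w_norm = [(x - w_mean) / w_std for x in window]
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(w_norm, q_norm)))
        if dist <= epsilon:
            hits.append((i, dist))
    return hits
```

With very loose `alpha` and `beta` the function behaves like plain NSM and accepts any window with the same shape as the query; tightening them rejects windows that match only after a large shift or rescale.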

arXiv.org e-Print Archive

Crossref

Distributed PCP Theorems for Hardness of Approximation in P

Author: Abboud Amir
Rubinstein Aviad
Williams Ryan
Publication date: 01/01/1952

We present a new distributed model of probabilistically checkable proofs (PCP). A satisfying assignment $x \in \{0,1\}^n$ to a CNF formula $\varphi$ is shared between two parties, where Alice knows $x_1, \dots, x_{n/2}$, Bob knows $x_{n/2+1}, \dots, x_n$, and both parties know $\varphi$. The goal is to have Alice and Bob jointly write a PCP that $x$ satisfies $\varphi$, while exchanging little or no information. Unfortunately, this model as-is does not allow for nontrivial query complexity. Instead, we focus on a non-deterministic variant, where the players are helped by Merlin, a third party who knows all of $x$. Using our framework, we obtain, for the first time, PCP-like reductions from the Strong Exponential Time Hypothesis (SETH) to approximation problems in P. In particular, under SETH we show that there are no truly-subquadratic approximation algorithms for Bichromatic Maximum Inner Product over {0,1}-vectors, Bichromatic LCS Closest Pair over permutations, Approximate Regular Expression Matching, and Diameter in Product Metric. All our inapproximability factors are nearly-tight. In particular, for the first two problems we obtain nearly-polynomial factors of $2^{(\log n)^{1-o(1)}}$; only $(1+o(1))$-factor lower bounds (under SETH) were known before.
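For orientation, the Bichromatic Maximum Inner Product problem named in the abstract above has a straightforward exact algorithm that compares every pair, one vector from each set. The Python sketch below shows that quadratic baseline; the function name and the input format (two lists of equal-length 0/1 tuples) are assumptions made for illustration and do not come from the paper.

```python
def bichromatic_max_inner_product(A, B):
    """Exact quadratic-time baseline: try every pair (a, b) with a in A
    and b in B, and return (best_value, index_in_A, index_in_B).
    Vectors are sequences of 0/1 entries of equal length."""
    best = (-1, None, None)
    for i, a in enumerate(A):
        for j, b in enumerate(B):
            # Over {0,1}-vectors the inner product counts shared 1-positions.
            ip = sum(x & y for x, y in zip(a, b))
            if ip > best[0]:
                best = (ip, i, j)
    return best
```

For two sets of $n$ vectors this performs $n^2$ pair comparisons. The abstract's result says that, under SETH, no truly-subquadratic algorithm can even approximate the answer within the stated factors.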

arXiv.org e-Print Archive

Biblioteca Virtual del Patrimonio Bibliográfico (Virtual Library of Bibliographical Heritage)

Crossref

A generalized matrix profile framework with support for contextual series analysis

Author: De Paepe Dieter
De Turck Filip
Janssens Olivier
Ongenae Femke
Steenwinckel Bram
Van Hoecke Sofie
Vanden Hautte Sander
Publication venue: 'Elsevier BV'
Publication date: 01/01/2020

The Matrix Profile is a state-of-the-art time series analysis technique that can be used for motif discovery, anomaly detection, segmentation, and other tasks in various domains such as healthcare, robotics, and audio. Where recent techniques use the Matrix Profile as a preprocessing or modeling step, we believe there is unexplored potential in generalizing the approach. We derived a framework that focuses on the implicit distance matrix calculation. We present this framework as the Series Distance Matrix (SDM). In this framework, distance measures (SDM-generators) and distance processors (SDM-consumers) can be freely combined, allowing for more flexibility and easier experimentation. In SDM, the Matrix Profile is but one specific configuration. We also introduce the Contextual Matrix Profile (CMP) as a new SDM-consumer capable of discovering repeating patterns. The CMP provides intuitive visualizations for data analysis and can find anomalies that are not discords. We demonstrate this using two real-world cases. The CMP is the first of a wide variety of new techniques for series analysis that fit within SDM and can complement the Matrix Profile.
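The split between distance measures and distance processors described in the abstract above can be illustrated with a small Python sketch. This is a naive, explicit-matrix version written for clarity; it is not the authors' implementation or API. All function names and the `exclusion` parameter (which skips trivial matches between overlapping subsequences) are assumptions.

```python
import math

def znorm(xs):
    """Z-normalize a sequence; constant sequences map to all zeros."""
    m = sum(xs) / len(xs)
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))
    return [0.0] * len(xs) if s == 0 else [(x - m) / s for x in xs]

def euclidean(a, b):
    """Plain Euclidean distance between equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def znorm_euclidean(a, b):
    """Euclidean distance after z-normalizing both sequences."""
    return euclidean(znorm(a), znorm(b))

def distance_matrix(series, m, dist):
    """'Generator' role: distances between all length-m subsequences,
    using whichever distance function `dist` is plugged in."""
    subs = [series[i:i + m] for i in range(len(series) - m + 1)]
    return [[dist(a, b) for b in subs] for a in subs]

def matrix_profile(dmat, exclusion):
    """'Consumer' role: for each subsequence, the distance to its nearest
    neighbour, ignoring neighbours within `exclusion` positions."""
    profile = []
    for i, row in enumerate(dmat):
        candidates = [d for j, d in enumerate(row) if abs(i - j) > exclusion]
        profile.append(min(candidates) if candidates else math.inf)
    return profile
```

Swapping `euclidean` for `znorm_euclidean`, or replacing `matrix_profile` with a different reduction over the same matrix, changes one component without touching the other, which is the kind of free combination the abstract describes. Practical implementations avoid materializing the full matrix.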

Ghent University Academic Bibliography

De Novo Assembly of Nucleotide Sequences in a Compressed Feature Space

Author: Robertson David L.
Tapinos Avraam
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/08/2017

Sequencing technologies allow for an in-depth analysis of biological species, but the size of the generated datasets introduces a number of analytical challenges. Recently, we demonstrated the application of numerical sequence representations and data transformations for the alignment of short reads to a reference genome. Here, we expand our approach for de novo assembly of short reads. Our results demonstrate that highly compressed data can encapsulate the signal sufficiently to accurately assemble reads to big contigs or complete genomes.

Crossref

Enlighten

Event-based personal retrieval

Author: E. Tulving
J.D. Bovey
N. Adams
W.M. Newman
Publication venue: 'SAGE Publications'
Publication date: 01/10/1996

People who work in a research, academic or business environment often have personal information collections which are large enough to need retrieval aids. A major difference between personal information retrieval and normal document retrieval is that the items to be retrieved are often associated with events in the searcher's life and can be retrieved by their relationship to other events as well as by content. This paper outlines some of the background to event-based retrieval and then describes a prototype graphical event-based retrieval system.

Crossref

Kent Academic Repository

Synchronization Strings: Codes for Insertions and Deletions Approaching the Singleton Bound

Author: Braverman Mark
Haeupler Bernhard
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 03/04/2017

We introduce synchronization strings as a novel way of efficiently dealing with synchronization errors, i.e., insertions and deletions. Synchronization errors are strictly more general and much harder to deal with than commonly considered half-errors, i.e., symbol corruptions and erasures. For every $\epsilon > 0$, synchronization strings allow one to index a sequence with an $\epsilon^{-O(1)}$ size alphabet such that one can efficiently transform $k$ synchronization errors into $(1+\epsilon)k$ half-errors. This powerful new technique has many applications. In this paper, we focus on designing insdel codes, i.e., error correcting block codes (ECCs) for insertion deletion channels. While ECCs for both half-errors and synchronization errors have been intensely studied, the latter has largely resisted progress. Indeed, it took until 1999 for the first insdel codes with constant rate, constant distance, and constant alphabet size to be constructed by Schulman and Zuckerman. Insdel codes for asymptotically large or small noise rates were given in 2016 by Guruswami et al. but these codes are still polynomially far from the optimal rate-distance tradeoff. This makes the understanding of insdel codes up to this work equivalent to what was known for regular ECCs after Forney introduced concatenated codes in his doctoral thesis 50 years ago. A direct application of our synchronization strings based indexing method gives a simple black-box construction which transforms any ECC into an equally efficient insdel code with a slightly larger alphabet size. This instantly transfers much of the highly developed understanding for regular ECCs over large constant alphabets into the realm of insdel codes. Most notably, we obtain efficient insdel codes which get arbitrarily close to the optimal rate-distance tradeoff given by the Singleton bound for the complete noise spectrum.

arXiv.org e-Print Archive

Crossref