142 research outputs found

    Finding Maximal Exact Matches in Graphs


    Which Regular Expression Patterns are Hard to Match?

    Regular expressions constitute a fundamental notion in formal language theory and are frequently used in computer science to define search patterns. A classic algorithm for regular expression matching constructs and simulates a non-deterministic finite automaton corresponding to the expression, resulting in an O(mn) running time (where m is the length of the pattern and n is the length of the text). This running time can be improved slightly (by a polylogarithmic factor), but no significantly faster solutions are known. At the same time, much faster algorithms exist for various special cases of regular expressions, including dictionary matching, wildcard matching, subset matching, and the word break problem. In this paper, we show that the complexity of regular expression matching can be characterized based on its depth (when interpreted as a formula). Our results hold for expressions involving concatenation, OR, Kleene star and Kleene plus. For regular expressions of depth two (involving any combination of the above operators), we show the following dichotomy: matching and membership testing can be solved in near-linear time, except for "concatenations of stars", which cannot be solved in strongly sub-quadratic time assuming the Strong Exponential Time Hypothesis (SETH). For regular expressions of depth three the picture is more complex. Nevertheless, we show that all problems can either be solved in strongly sub-quadratic time, or cannot be solved in strongly sub-quadratic time assuming SETH. An intriguing special case of membership testing involves regular expressions of the form "a star of an OR of concatenations", e.g., [a|ab|bc]*. This corresponds to the so-called word break problem, for which a dynamic programming algorithm with a runtime of (roughly) O(n√m) is known. We show that the latter bound is not tight and improve the runtime to O(nm^{0.44...}).
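    For context, here is a minimal sketch of the textbook word-break dynamic program mentioned above (the simple O(nm)-style baseline, not the paper's improved bound); the dictionary and test string are made-up examples.

```python
def word_break(text, dictionary):
    """Return True if `text` can be segmented into dictionary words.

    Classic dynamic program: reachable[i] is True when text[:i] can be
    written as a concatenation of dictionary entries. This is the plain
    baseline; the paper above studies faster algorithms for this problem.
    """
    words = set(dictionary)
    n = len(text)
    reachable = [False] * (n + 1)
    reachable[0] = True  # the empty prefix is trivially segmentable
    for i in range(1, n + 1):
        for j in range(i):
            if reachable[j] and text[j:i] in words:
                reachable[i] = True
                break
    return reachable[n]

# Example matching the abstract's pattern [a|ab|bc]*
print(word_break("aabbc", ["a", "ab", "bc"]))  # True: "a" + "ab" + "bc"
```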

    Making the Dynamic Time Warping Distance Warping-Invariant

    The literature postulates that the dynamic time warping (dtw) distance can cope with temporal variations, but stores and processes time series as if the dtw-distance could not cope with such variations. To address this inconsistency, we first show that the dtw-distance is not warping-invariant. The lack of warping-invariance contributes to the inconsistency mentioned above and to strange behavior. To eliminate these peculiarities, we convert the dtw-distance into a warping-invariant semi-metric, called the time-warp-invariant (twi) distance. Empirical results suggest that the error rates of the twi and dtw nearest-neighbor classifiers are practically equivalent in a Bayesian sense. However, the twi-distance requires less storage and computation time than the dtw-distance for a broad range of problems. These results challenge the current practice of applying the dtw-distance in nearest-neighbor classification and suggest the proposed twi-distance as a more efficient and consistent option.
    Comment: arXiv admin note: substantial text overlap with arXiv:1808.0996
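    As background for the comparison above, a minimal sketch of the standard dtw-distance dynamic program (squared-difference local cost); the warping-invariant twi-distance proposed in the paper is not reproduced here.

```python
import math

def dtw_distance(x, y):
    """Standard dynamic-time-warping distance between two sequences.

    cost[i][j] holds the minimal accumulated cost of aligning x[:i]
    with y[:j]; each step may advance in x, in y, or in both.
    This is the classic dtw-distance the paper argues is not
    warping-invariant, not the proposed twi-distance.
    """
    n, m = len(x), len(y)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # stretch x
                                 cost[i][j - 1],      # stretch y
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

# Two series that differ only by a temporal stretch
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```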

    Dynamic and Internal Longest Common Substring

    Given two strings S and T, each of length at most n, the longest common substring (LCS) problem is to find a longest substring common to S and T. This is a classical problem in computer science with an O(n)-time solution. In the fully dynamic setting, edit operations are allowed in either of the two strings, and the problem is to find an LCS after each edit. We present the first solution to the fully dynamic LCS problem requiring sublinear time in n per edit operation. In particular, we show how to find an LCS after each edit operation in Õ(n^{2/3}) time, after Õ(n)-time and space preprocessing. This line of research has been recently initiated in a somewhat restricted dynamic variant by Amir et al. [SPIRE 2017]. More specifically, the authors presented an Õ(n)-sized data structure that returns an LCS of the two strings after a single edit operation (that is reverted afterwards) in Õ(1) time. At CPM 2018, three papers (Abedin et al., Funakoshi et al., and Urabe et al.) studied analogously restricted dynamic variants of problems on strings; specifically, computing the longest palindrome and the Lyndon factorization of a string after a single edit operation. We develop dynamic sublinear-time algorithms for both of these problems as well. We also consider internal LCS queries, that is, queries in which we are to return an LCS of a pair of substrings of S and T. We show that answering such queries is hard in general and propose efficient data structures for several restricted cases.
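    For reference, a quadratic-time dynamic-programming sketch of the static LCS problem described above; the O(n) suffix-tree solution and the paper's dynamic and internal data structures are more involved and are not shown.

```python
def longest_common_substring(S, T):
    """Return a longest substring common to S and T.

    ends_here[i][j] is the length of the longest common suffix of
    S[:i] and T[:j]; the maximum over all (i, j) identifies an LCS.
    Quadratic-time baseline, used here only to illustrate the problem
    statement; the paper's results concern faster and dynamic variants.
    """
    n, m = len(S), len(T)
    ends_here = [[0] * (m + 1) for _ in range(n + 1)]
    best_len, best_end = 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if S[i - 1] == T[j - 1]:
                ends_here[i][j] = ends_here[i - 1][j - 1] + 1
                if ends_here[i][j] > best_len:
                    best_len, best_end = ends_here[i][j], i
    return S[best_end - best_len:best_end]

print(longest_common_substring("dynamic", "academia"))  # "mi"
```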

    They are Not Equally Reliable: Semantic Event Search Using Differentiated Concept Classifiers

    Complex event detection on unconstrained Internet videos has seen much progress in recent years. However, state-of-the-art performance degrades dramatically when the number of positive training exemplars falls short. Since label acquisition is costly, laborious, and time-consuming, there is a real need to consider the much more challenging semantic event search problem, where no example video is given. In this paper, we present a state-of-the-art event search system without any example videos. Relying on the key observation that events (e.g., dog show) are usually compositions of multiple mid-level concepts (e.g., 'dog,' 'theater,' and 'dog jumping'), we first train a skip-gram model to measure the relevance of each concept to the event of interest. The relevant concept classifiers then cast votes on the test videos, but their reliability, due to the lack of labeled training videos, has been largely unaddressed. We propose to combine the concept classifiers based on a principled estimate of their accuracy on the unlabeled test videos. A novel warping technique is proposed to improve the performance, and an efficient, highly scalable algorithm is provided to quickly solve the resulting optimization. We conduct extensive experiments on the latest TRECVID MEDTest 2014, MEDTest 2013 and CCV datasets, and achieve state-of-the-art performance.
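    Loosely illustrating the voting step described above, a schematic sketch in which hypothetical per-concept detection scores are combined with relevance weights (here plain cosine similarities over made-up embedding vectors); the paper's skip-gram training, accuracy estimation, and warping technique are not reproduced, and all names and values below are invented for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def event_score(event_vec, concept_vecs, concept_scores):
    """Weight each concept classifier's score on a video by how relevant
    the concept is to the event query, then sum the weighted votes.

    `event_vec` and `concept_vecs` stand in for embeddings a skip-gram
    model would provide; `concept_scores` are hypothetical classifier
    outputs on one test video. This is only a schematic combination rule,
    not the accuracy-based weighting proposed in the paper.
    """
    total = 0.0
    for name, vec in concept_vecs.items():
        relevance = max(cosine(event_vec, vec), 0.0)  # ignore unrelated concepts
        total += relevance * concept_scores.get(name, 0.0)
    return total

# Made-up embeddings and scores for the "dog show" example in the abstract
event_vec = [0.9, 0.1, 0.3]
concept_vecs = {"dog": [0.8, 0.2, 0.1], "theater": [0.1, 0.9, 0.4], "dog jumping": [0.7, 0.0, 0.5]}
concept_scores = {"dog": 0.92, "theater": 0.35, "dog jumping": 0.61}
print(event_score(event_vec, concept_vecs, concept_scores))
```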