
    Recovery from Non-Decomposable Distance Oracles

    A line of work has looked at the problem of recovering an input from distance queries. In this setting, there is an unknown sequence $s \in \{0,1\}^{\leq n}$, and one chooses a set of queries $y \in \{0,1\}^{\mathcal{O}(n)}$ and receives $d(s,y)$ for a distance function $d$. The goal is to make as few queries as possible to recover $s$. Although this problem is well-studied for decomposable distances, i.e., distances of the form $d(s,y) = \sum_{i=1}^n f(s_i, y_i)$ for some function $f$, which includes the important cases of Hamming distance, $\ell_p$-norms, and $M$-estimators, to the best of our knowledge this problem has not been studied for non-decomposable distances, for which there are important special cases such as edit distance, dynamic time warping (DTW), Fréchet distance, earth mover's distance, and so on. We initiate the study and develop a general framework for such distances. Interestingly, for some distances such as DTW or Fréchet, exact recovery of the sequence $s$ is provably impossible, and so we show that allowing the characters in $y$ to be drawn from a slightly larger alphabet makes recovery possible. In a number of cases we obtain optimal or near-optimal query complexity. We also study the role of adaptivity for a number of different distance functions. One motivation for understanding non-adaptivity is that the query sequence can be fixed and the distances of the input to the queries provide a non-linear embedding of the input, which can be used in downstream applications involving, e.g., neural networks for natural language processing. Comment: This work was presented at the 14th Innovations in Theoretical Computer Science conference (ITCS 2023) and accepted for publication in the journal IEEE Transactions on Information Theory.
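To make the query model concrete, here is a minimal sketch of recovery for the simplest decomposable case named in the abstract, Hamming distance. It is not the paper's method. It assumes the secret has a known, fixed length n (the abstract allows length at most n), and the names `hamming` and `recover_from_hamming` are hypothetical. The query set is fixed in advance, so it is non-adaptive, but it uses n + 1 queries and is not meant to be query-optimal.

```python
def hamming(s, y):
    """Hamming distance between two equal-length 0/1 sequences."""
    return sum(a != b for a, b in zip(s, y))


def recover_from_hamming(oracle, n):
    """Recover a hidden 0/1 sequence of known length n from distance queries.

    `oracle(y)` is assumed to return d(s, y) for the hidden sequence s.
    Uses n + 1 queries: the all-zeros sequence plus each unit vector.
    """
    zero = [0] * n
    base = oracle(zero)  # equals the number of ones in s
    recovered = []
    for i in range(n):
        y = zero.copy()
        y[i] = 1
        # Setting position i to 1 lowers the distance by 1 if s_i = 1
        # and raises it by 1 if s_i = 0.
        recovered.append(1 if oracle(y) < base else 0)
    return recovered


if __name__ == "__main__":
    secret = [1, 0, 1, 1, 0]
    guess = recover_from_hamming(lambda y: hamming(secret, y), len(secret))
    print(guess == secret)
```

The same trick relies on the distance splitting into per-position terms, which is exactly what fails for non-decomposable distances such as edit distance or DTW.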

    Fine-grained complexity and algorithm engineering of geometric similarity measures

    Point sets and sequences are fundamental geometric objects that arise in many applications, in particular those that process movement data, geometric shapes, and similar data. A crucial task on these objects is to measure their similarity. This thesis therefore presents results on algorithms, complexity lower bounds, and algorithm engineering for the most important point set and sequence similarity measures, such as the Fréchet distance, the Fréchet distance under translation, and the Hausdorff distance under translation. As an extension of the mere computation of similarity, the approximate near neighbor problem for the continuous Fréchet distance on time series is also considered, and matching upper and lower bounds are shown.
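For readers unfamiliar with the similarity measures named above, the following is a minimal sketch of the discrete Fréchet distance between two point sequences, computed with the standard quadratic-time dynamic program. It is an illustration only: the thesis concerns the continuous Fréchet distance and variants under translation, which this sketch does not implement, and the function name `discrete_frechet` is hypothetical.

```python
from math import dist


def discrete_frechet(p, q):
    """Discrete Fréchet distance between non-empty point sequences p and q.

    dp[i][j] holds the smallest achievable maximum pairwise distance over
    all monotone couplings of p[0..i] with q[0..j].
    """
    n, m = len(p), len(q)
    dp = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = dist(p[i], q[j])
            if i == 0 and j == 0:
                dp[i][j] = d
            elif i == 0:
                dp[i][j] = max(dp[0][j - 1], d)
            elif j == 0:
                dp[i][j] = max(dp[i - 1][0], d)
            else:
                # Advance along p, along q, or along both.
                best_prev = min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
                dp[i][j] = max(best_prev, d)
    return dp[n - 1][m - 1]


if __name__ == "__main__":
    a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    b = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
    print(discrete_frechet(a, b))
```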

    Sparse Graph Learning from Spatiotemporal Time Series

    Outstanding achievements of graph neural networks for spatiotemporal time series analysis show that relational constraints introduce an effective inductive bias into neural forecasting architectures. Often, however, the relational information characterizing the underlying data-generating process is unavailable, and the practitioner is left with the problem of inferring from data which relational graph to use in the subsequent processing stages. We propose novel, principled, yet practical, probabilistic score-based methods that learn the relational dependencies as distributions over graphs while maximizing end-to-end task performance. The proposed graph learning framework is based on consolidated variance reduction techniques for Monte Carlo score-based gradient estimation, is theoretically grounded, and, as we show, effective in practice. In this paper, we focus on the time series forecasting problem and show that, by tailoring the gradient estimators to the graph learning problem, we are able to achieve state-of-the-art performance while controlling the sparsity of the learned graph and the computational scalability. We empirically assess the effectiveness of the proposed method on synthetic and real-world benchmarks, showing that the proposed solution can be used as a stand-alone graph identification procedure as well as a graph learning component of an end-to-end forecasting architecture. Comment: updated and extended version.