
85 research outputs found

Clustering on Sliding Windows in Polylogarithmic Space

Author: Braverman Vladimir
Lang Harry
Levin Keith
Monemizadeh Morteza
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 35th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2015)
Publication date: 01/01/2015

In PODS 2003, Babcock, Datar, Motwani and O'Callaghan gave the first streaming solution for the k-median problem on sliding windows using O((k/\tau^4) W^{2\tau} \log^2 W) space, with a O(2^{O(1/\tau)}) approximation factor, where W is the window size and \tau \in (0,1/2) is a user-specified parameter. They left as an open question whether it is possible to improve this to polylogarithmic space. Despite much progress on clustering and sliding windows, this question has remained open for more than a decade. In this paper, we partially answer the main open question posed by Babcock, Datar, Motwani and O'Callaghan. We present an algorithm yielding an exponential improvement in space compared to the previous result given in Babcock et al. In particular, we give the first polylogarithmic space (\alpha,\beta)-approximation for metric k-median clustering in the sliding window model, where \alpha and \beta are constants, under the assumption, also made by Babcock et al., that the optimal k-median cost on any given window is bounded by a polynomial in the window size. We justify this assumption by showing that when the cost is exponential in the window size, no sublinear space approximation is possible. Our main technical contribution is a simple but elegant extension of smooth functions as introduced by Braverman and Ostrovsky, which allows us to apply well-known techniques for solving problems in the sliding window model to functions that are not smooth, such as the k-median cost.

CiteSeerX

Dagstuhl Research Online Publication Server
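
The k-median cost on a sliding window, which the abstract above treats as the target function, can be illustrated with a minimal sketch. This is not the paper's polylogarithmic-space algorithm: it stores the whole window (linear space) and only evaluates the objective for a given set of centers. The names `kmedian_cost` and `window_cost`, and the use of Euclidean distance, are assumptions made for illustration.

```python
from collections import deque
from math import dist


def kmedian_cost(points, centers):
    """k-median cost: sum over points of the distance to the nearest center."""
    return sum(min(dist(p, c) for c in centers) for p in points)


def window_cost(stream, window_size, centers):
    """Cost of `centers` on the `window_size` most recent points of `stream`.

    Naive baseline: keeps every point of the window in memory.
    """
    window = deque(maxlen=window_size)  # old points fall out automatically
    for point in stream:
        window.append(point)
    return kmedian_cost(window, centers)
```

For example, `window_cost([(0, 0), (1, 0), (2, 0), (10, 0)], 3, [(1, 0), (10, 0)])` ignores the expired point `(0, 0)` and sums the distances of the three remaining points to their nearest center.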

An experimental evaluation of sliding-window algorithms for k-means clustering

Author: Mallick Satyaki
Publication date: 30/09/2021

Pure OAI Repository

Improved Algorithms for Time Decay Streams

Author: Braverman Vladimir
Lang Harry
Ullah Enayat
Zhou Samson
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019)
Publication date: 01/01/2019

In the time-decay model for data streams, elements of an underlying data set arrive sequentially with the recently arrived elements being more important. A common approach for handling large data sets is to maintain a coreset, a succinct summary of the processed data that allows approximate recovery of a predetermined query. We provide a general framework that takes any offline-coreset and gives a time-decay coreset for polynomial time decay functions. We also consider the exponential time decay model for k-median clustering, where we provide a constant factor approximation algorithm that utilizes the online facility location algorithm. Our algorithm stores O(k log(h Delta)+h) points where h is the half-life of the decay function and Delta is the aspect ratio of the dataset. Our techniques extend to k-means clustering and M-estimators as well

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server
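
As a rough illustration of the exponential time-decay model mentioned in the abstract above, the sketch below weights each point by its age using a half-life h. The weight formula 2^(-age/h) is the usual meaning of "half-life" and is an assumption here, as are the function names; the code evaluates a decayed k-median cost directly and is not the coreset or facility-location algorithm the paper describes.

```python
from math import dist


def decay_weight(age, half_life):
    """Weight of an element of the given age; halves every `half_life` time units."""
    return 0.5 ** (age / half_life)


def decayed_kmedian_cost(stream, centers, now, half_life):
    """Time-decayed k-median cost.

    `stream` is an iterable of (arrival_time, point) pairs; each point
    contributes its distance to the nearest center, scaled by its decay weight.
    """
    return sum(
        decay_weight(now - t, half_life) * min(dist(p, c) for c in centers)
        for t, p in stream
    )
```

With a half-life of 10, a point that arrived 10 time units ago counts half as much as one arriving now.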

Near Optimal Linear Algebra in the Online and Sliding Window Models

Author: Braverman Vladimir
Drineas Petros
Musco Cameron
Musco Christopher
Upadhyay Jalaj
Woodruff David P.
Zhou Samson
Publication date: 19/04/2020

We initiate the study of numerical linear algebra in the sliding window model, where only the most recent W updates in a stream form the underlying data set. We first introduce a unified row-sampling based framework that gives randomized algorithms for spectral approximation, low-rank approximation/projection-cost preservation, and \ell_1-subspace embeddings in the sliding window model, which often use nearly optimal space and achieve nearly input sparsity runtime. Our algorithms are based on "reverse online" versions of offline sampling distributions such as (ridge) leverage scores, \ell_1 sensitivities, and Lewis weights to quantify both the importance and the recency of a row. Our row-sampling framework rather surprisingly implies connections to the well-studied online model; our structural results also give the first sample optimal (up to lower order terms) online algorithm for low-rank approximation/projection-cost preservation. Using this powerful primitive, we give online algorithms for column/row subset selection and principal component analysis that resolve the main open question of Bhaskara et al. (FOCS 2019). We also give the first online algorithm for \ell_1-subspace embeddings. We further formalize the connection between the online model and the sliding window model by introducing an additional unified framework for deterministic algorithms using a merge and reduce paradigm and the concept of online coresets. Our sampling based algorithms in the row-arrival online model yield online coresets, giving deterministic algorithms for spectral approximation, low-rank approximation/projection-cost preservation, and \ell_1-subspace embeddings in the sliding window model that use nearly optimal space.

arXiv.org e-Print Archive

Fast filtering and animation of large dynamic networks

Author: Aiello Luca Maria
Grabowicz Przemyslaw A.
Menczer Filippo
Publication venue: Springer Science and Business Media LLC
Publication date: 25/10/2014

Detecting and visualizing the most relevant changes in an evolving network is an open challenge in several domains. We present a fast algorithm that filters subsets of the strongest nodes and edges representing an evolving weighted graph and visualizes it by either creating a movie or streaming it to an interactive network visualization tool. The algorithm is an approximation of the exponential sliding time-window that scales linearly with the number of interactions. We compare the algorithm against rectangular and exponential sliding time-window methods. Our network filtering algorithm: i) captures persistent trends in the structure of dynamic weighted networks, ii) smooths transitions between the snapshots of a dynamic network, and iii) uses limited memory and processor time. The algorithm is publicly available as open-source software. Comment: 6 figures, 2 tables

arXiv.org e-Print Archive

Springer - Publisher Connector

MPG.PuRe
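
The exponential sliding time-window that the abstract above uses as a reference point can be sketched as follows. This is the plain exponential window, not the authors' approximation or their released software; the class name `DecayingEdgeWeights`, the time constant `tau`, and the decay formula exp(-dt/tau) are assumptions chosen for illustration. Each interaction triggers one constant-time update, so the total work grows linearly with the number of interactions.

```python
from math import exp


class DecayingEdgeWeights:
    """Edge weights under an exponential time window with time constant `tau`."""

    def __init__(self, tau):
        self.tau = tau
        self.state = {}  # edge -> (weight at last update, time of last update)

    def add(self, edge, t, amount=1.0):
        """Record an interaction on `edge` at time `t` (times must not decrease)."""
        w, last = self.state.get(edge, (0.0, t))
        # Decay the stored weight to time t, then add the new interaction.
        self.state[edge] = (w * exp(-(t - last) / self.tau) + amount, t)

    def weight(self, edge, t):
        """Decayed weight of `edge` as seen at time `t`."""
        w, last = self.state.get(edge, (0.0, t))
        return w * exp(-(t - last) / self.tau)

    def strongest(self, t, n):
        """The `n` edges with the largest decayed weight at time `t`."""
        return sorted(self.state, key=lambda e: self.weight(e, t), reverse=True)[:n]
```

Filtering a snapshot then amounts to calling `strongest(t, n)` at each frame time and drawing only the returned edges.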

On optimally partitioning a text to improve its compression

Author: Ferragina Paolo
Nitto Igor
Venturini Rossano
Publication date: 01/01/2009

In this paper we investigate the problem of partitioning an input string T in such a way that compressing its parts individually via a base-compressor C gets a compressed output that is shorter than applying C over the entire T at once. This problem was introduced in the context of table compression, and then further elaborated and extended to strings and trees. Unfortunately, the literature offers poor solutions: namely, we know either a cubic-time algorithm for computing the optimal partition based on dynamic programming, or a few heuristics that do not guarantee any bounds on the efficacy of their computed partition, or algorithms that are efficient but work in some specific scenarios (such as the Burrows-Wheeler Transform) and achieve compression performance that might be worse than the optimal partitioning by a \Omega(\sqrt{\log n}) factor. Therefore, computing the optimal solution efficiently is still open. In this paper we provide the first algorithm which is guaranteed to compute in O(n \log_{1+\epsilon} n) time a partition of T whose compressed output is guaranteed to be no more than (1+\epsilon)-worse than the optimal one, where \epsilon may be any positive constant.

arXiv.org e-Print Archive

CiteSeerX

Archivio della Ricerca - Università di Pisa
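
To make the partitioning problem in the abstract above concrete, the sketch below measures the compressed size of a string under a given partition, using `zlib` as a stand-in for the base-compressor C (an assumption; the paper does not fix a particular compressor here). It only evaluates a partition supplied by the caller and does not implement the paper's O(n \log_{1+\epsilon} n) algorithm. The function names are hypothetical.

```python
import zlib


def compressed_size(data):
    """Size in bytes of `data` after compression with zlib at level 9."""
    return len(zlib.compress(data, 9))


def partitioned_size(text, cuts):
    """Total compressed size when `text` is split at the sorted offsets in `cuts`
    and each part is compressed on its own."""
    bounds = [0, *cuts, len(text)]
    return sum(compressed_size(text[a:b]) for a, b in zip(bounds, bounds[1:]))
```

Comparing `partitioned_size(text, cuts)` with `compressed_size(text)` shows whether a candidate partition helps for a given input; whether it does depends on the data and on the compressor.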

Diameter and k-Center in Sliding Windows

Author: Cohen-Addad Vincent
Schwiegelshohn Chris
Sohler Christian
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016)
Publication date: 01/01/2016

In this paper we develop streaming algorithms for the diameter problem and the k-center clustering problem in the sliding window model. In this model we are interested in maintaining a solution for the N most recent points of the stream. In the diameter problem we would like to maintain two points whose distance approximates the diameter of the point set in the window. Our algorithm computes a (3 + epsilon)-approximation and uses O(1/epsilon*ln(alpha)) memory cells, where alpha is the ratio of the largest and smallest distance and is assumed to be known in advance. We also prove that under reasonable assumptions obtaining a (3 - epsilon)-approximation requires Omega(N^{1/3}) space. For the k-center problem, where the goal is to find k centers that minimize the maximum distance of a point to its nearest center, we obtain a (6 + epsilon)-approximation using O(k/epsilon*ln(alpha)) memory cells and a (4 + epsilon)-approximation for the special case k = 2. We also prove that any algorithm for the 2-center problem that achieves an approximation ratio of less than 4 requires Omega(N^{1/3}) space.

Dagstuhl Research Online Publication Server

Archivio della ricerca- Università di Roma La Sapienza
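
The two objectives named in the abstract above, the diameter of the window and the k-center cost, can be written down directly as brute-force computations. The sketch assumes the window's points are already available as a list and uses Euclidean distance; it is an exact, quadratic-time reference for the definitions and not the memory-bounded streaming algorithm from the paper. The function names are hypothetical.

```python
from itertools import combinations
from math import dist


def diameter(points):
    """Largest pairwise distance among `points` (0.0 for fewer than two points)."""
    return max((dist(p, q) for p, q in combinations(points, 2)), default=0.0)


def kcenter_cost(points, centers):
    """k-center cost: the maximum distance from any point to its nearest center."""
    return max(min(dist(p, c) for c in centers) for p in points)
```

An approximation algorithm in the sense of the abstract would return a value within the stated factor of these exact quantities.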