1,503 research outputs found
Prioritized Metric Structures and Embedding
Metric data structures (distance oracles, distance labeling schemes, routing
schemes) and low-distortion embeddings provide a powerful algorithmic
methodology, which has been successfully applied for approximation algorithms
\cite{llr}, online algorithms \cite{BBMN11}, distributed algorithms
\cite{KKMPT12} and for computing sparsifiers \cite{ST04}. However, this
methodology appears to have a limitation: the worst-case performance inherently
depends on the cardinality of the metric, and one could not specify in advance
which vertices/points should enjoy a better service (i.e., stretch/distortion,
label size/dimension) than that given by the worst-case guarantee.
In this paper we alleviate this limitation by devising a suit of {\em
prioritized} metric data structures and embeddings. We show that given a
priority ranking of the graph vertices (respectively,
metric points) one can devise a metric data structure (respectively, embedding)
in which the stretch (resp., distortion) incurred by any pair containing a
vertex will depend on the rank of the vertex. We also show that other
important parameters, such as the label size and (in some sense) the dimension,
may depend only on . In some of our metric data structures (resp.,
embeddings) we achieve both prioritized stretch (resp., distortion) and label
size (resp., dimension) {\em simultaneously}. The worst-case performance of our
metric data structures and embeddings is typically asymptotically no worse than
of their non-prioritized counterparts.Comment: To appear at STOC 201
Algorithms for Constructing Overlay Networks For Live Streaming
We present a polynomial time approximation algorithm for constructing an
overlay multicast network for streaming live media events over the Internet.
The class of overlay networks constructed by our algorithm include networks
used by Akamai Technologies to deliver live media events to a global audience
with high fidelity. We construct networks consisting of three stages of nodes.
The nodes in the first stage are the entry points that act as sources for the
live streams. Each source forwards each of its streams to one or more nodes in
the second stage that are called reflectors. A reflector can split an incoming
stream into multiple identical outgoing streams, which are then sent on to
nodes in the third and final stage that act as sinks and are located in edge
networks near end-users. As the packets in a stream travel from one stage to
the next, some of them may be lost. A sink combines the packets from multiple
instances of the same stream (by reordering packets and discarding duplicates)
to form a single instance of the stream with minimal loss. Our primary
contribution is an algorithm that constructs an overlay network that provably
satisfies capacity and reliability constraints to within a constant factor of
optimal, and minimizes cost to within a logarithmic factor of optimal. Further
in the common case where only the transmission costs are minimized, we show
that our algorithm produces a solution that has cost within a factor of 2 of
optimal. We also implement our algorithm and evaluate it on realistic traces
derived from Akamai's live streaming network. Our empirical results show that
our algorithm can be used to efficiently construct large-scale overlay networks
in practice with near-optimal cost
Fast and Deterministic Approximations for k-Cut
In an undirected graph, a k-cut is a set of edges whose removal breaks the graph into at least k connected components. The minimum weight k-cut can be computed in n^O(k) time, but when k is treated as part of the input, computing the minimum weight k-cut is NP-Hard [Goldschmidt and Hochbaum, 1994]. For poly(m,n,k)-time algorithms, the best possible approximation factor is essentially 2 under the small set expansion hypothesis [Manurangsi, 2017]. Saran and Vazirani [1995] showed that a (2 - 2/k)-approximately minimum weight k-cut can be computed via O(k) minimum cuts, which implies a O~(km) randomized running time via the nearly linear time randomized min-cut algorithm of Karger [2000]. Nagamochi and Kamidoi [2007] showed that a (2 - 2/k)-approximately minimum weight k-cut can be computed deterministically in O(mn + n^2 log n) time. These results prompt two basic questions. The first concerns the role of randomization. Is there a deterministic algorithm for 2-approximate k-cuts matching the randomized running time of O~(km)? The second question qualitatively compares minimum cut to 2-approximate minimum k-cut. Can 2-approximate k-cuts be computed as fast as the minimum cut - in O~(m) randomized time?
We give a deterministic approximation algorithm that computes (2 + eps)-minimum k-cuts in O(m log^3 n / eps^2) time, via a (1 + eps)-approximation for an LP relaxation of k-cut
The Secure Link Prediction Problem
Link Prediction is an important and well-studied problem for social networks.
Given a snapshot of a graph, the link prediction problem predicts which new
interactions between members are most likely to occur in the near future. As
networks grow in size, data owners are forced to store the data in remote cloud
servers which reveals sensitive information about the network. The graphs are
therefore stored in encrypted form.
We study the link prediction problem on encrypted graphs. To the best of our
knowledge, this secure link prediction problem has not been studied before. We
use the number of common neighbors for prediction. We present three algorithms
for the secure link prediction problem. We design prototypes of the schemes and
formally prove their security. We execute our algorithms in real-life datasets.Comment: This has been accepted for publication in Advances in Mathematics of
Communications (AMC) journa
Improved Outlier Robust Seeding for k-means
The -means is a popular clustering objective, although it is inherently
non-robust and sensitive to outliers. Its popular seeding or initialization
called -means++ uses sampling and comes with a provable
approximation guarantee \cite{AV2007}. However, in the presence of adversarial
noise or outliers, sampling is more likely to pick centers from distant
outliers instead of inlier clusters, and therefore its approximation guarantees
\textit{w.r.t.} -means solution on inliers, does not hold.
Assuming that the outliers constitute a constant fraction of the given data,
we propose a simple variant in the sampling distribution, which makes it
robust to the outliers. Our algorithm runs in time, outputs
clusters, discards marginally more points than the optimal number of outliers,
and comes with a provable approximation guarantee.
Our algorithm can also be modified to output exactly clusters instead of
clusters, while keeping its running time linear in and . This is
an improvement over previous results for robust -means based on LP
relaxation and rounding \cite{Charikar}, \cite{KrishnaswamyLS18} and
\textit{robust -means++} \cite{DeshpandeKP20}. Our empirical results show
the advantage of our algorithm over -means++~\cite{AV2007}, uniform random
seeding, greedy sampling for means~\cite{tkmeanspp}, and robust
-means++~\cite{DeshpandeKP20}, on standard real-world and synthetic data
sets used in previous work. Our proposal is easily amenable to scalable,
faster, parallel implementations of -means++ \cite{Bahmani,BachemL017} and
is of independent interest for coreset constructions in the presence of
outliers \cite{feldman2007ptas,langberg2010universal,feldman2011unified}
- …