Fast Hierarchical Clustering and Other Applications of Dynamic Closest Pairs
We develop data structures for dynamic closest pair problems with arbitrary
distance functions that do not necessarily come from any geometric structure
on the objects. Based on a technique previously used by the author for
Euclidean closest pairs, we show how to insert and delete objects from an
n-object set, maintaining the closest pair, in O(n log^2 n) time per update and
O(n) space. With quadratic space, we can instead use a quadtree-like structure
to achieve an optimal time bound, O(n) per update. We apply these data
structures to hierarchical clustering, greedy matching, and TSP heuristics, and
discuss other potential applications in machine learning, Groebner bases, and
local improvement algorithms for partition and placement problems. Experiments
show our new methods to be faster in practice than previously used heuristics.
Comment: 20 pages, 9 figures. A preliminary version of this paper appeared at
the 9th ACM-SIAM Symp. on Discrete Algorithms, San Francisco, 1998, pp.
619-628. For source code and experimental results, see
http://www.ics.uci.edu/~eppstein/projects/pairs
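The abstract does not spell out its data structures, so the Python sketch below is only a point of reference. It is a simple nearest-neighbor-pointer baseline, not the paper's O(n log^2 n) or quadtree-like structure: each object remembers its nearest other object under an arbitrary user-supplied distance, insertion costs O(n) distance evaluations, and a deletion can cost O(n^2) in the worst case because every object that pointed at the deleted one must rescan. The class `NeighborPairs` and function `single_linkage` are hypothetical names. The sketch then drives single-linkage agglomerative clustering, one of the applications the abstract lists, by repeatedly merging the closest pair.

```python
import math
from typing import Callable, Dict, Hashable, Optional, Tuple


class NeighborPairs:
    """Closest pair of a dynamic set under an arbitrary distance function.

    Baseline sketch (not the paper's structure): every object stores its
    nearest other object. Insert: O(n) distance calls. Delete: each object
    whose neighbor was removed rescans, so O(n^2) calls in the worst case.
    """

    def __init__(self, dist: Callable[[Hashable, Hashable], float]):
        self.dist = dist
        # object -> (distance to nearest other object, that object)
        self.nbr: Dict[Hashable, Tuple[float, Optional[Hashable]]] = {}

    def __len__(self) -> int:
        return len(self.nbr)

    def _rescan(self, x: Hashable) -> None:
        # Brute-force recomputation of x's nearest neighbor.
        best: Tuple[float, Optional[Hashable]] = (math.inf, None)
        for y in self.nbr:
            if y != x:
                d = self.dist(x, y)
                if d < best[0]:
                    best = (d, y)
        self.nbr[x] = best

    def insert(self, x: Hashable) -> None:
        self.nbr[x] = (math.inf, None)
        for y in self.nbr:
            if y == x:
                continue
            d = self.dist(x, y)
            if d < self.nbr[x][0]:
                self.nbr[x] = (d, y)
            if d < self.nbr[y][0]:  # x may be closer to y than y's old neighbor
                self.nbr[y] = (d, x)

    def delete(self, x: Hashable) -> None:
        del self.nbr[x]
        # Only objects whose stored neighbor was x can have become stale.
        for y, (_, z) in list(self.nbr.items()):
            if z == x:
                self._rescan(y)

    def closest_pair(self) -> Optional[Tuple[float, Hashable, Hashable]]:
        if len(self.nbr) < 2:
            return None
        x = min(self.nbr, key=lambda k: self.nbr[k][0])
        d, y = self.nbr[x]
        return d, x, y


def single_linkage(points):
    """Agglomerative clustering: repeatedly merge the closest pair of clusters."""
    def cluster_dist(a, b):
        return min(abs(p - q) for p in a for q in b)

    pairs = NeighborPairs(cluster_dist)
    for p in points:
        pairs.insert(frozenset([p]))
    heights = []  # merge distances, in merge order
    while len(pairs) > 1:
        d, a, b = pairs.closest_pair()
        pairs.delete(a)
        pairs.delete(b)
        pairs.insert(a | b)
        heights.append(d)
    return heights


print(single_linkage([0, 1, 5, 6, 20]))  # [1, 1, 4, 14]
```

The distance callable is the only thing the structure assumes about its objects, which mirrors the paper's setting of distances with no underlying geometry; the cost of each deletion is exactly what the paper's structures are designed to bound.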
Cycle factors and renewal theory
For which values of $k$ does a uniformly chosen $3$-regular graph $G$ on $n$
vertices typically contain $n/k$ vertex-disjoint $k$-cycles (a $k$-cycle
factor)? To date, this has been answered for $k=n$ and for $k \ll \log n$; the
former, the Hamiltonicity problem, was finally answered in the affirmative by
Robinson and Wormald in 1992, while the answer in the latter case is negative
since with high probability most vertices do not lie on $k$-cycles.
Here we settle the problem completely: the threshold for a $k$-cycle factor
in $G$ as above is $\kappa_0 \log_2 n$ with $\kappa_0 = [1-\frac{1}{2}\log_2 3]^{-1} \approx 4.82$.
Precisely, we prove a 2-point concentration result: if $k \geq \kappa_0 \log_2(2n/e)$
divides $n$, then $G$ contains a $k$-cycle factor w.h.p., whereas if
$k < \kappa_0 \log_2(2n/e) - \frac{\log^2 n}{n}$, then w.h.p. it
does not. As a byproduct, we confirm the "Comb Conjecture," an old problem
concerning the embedding of certain spanning trees in the random graph
$G(n,p)$.
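To make the threshold concrete, the short Python sketch below evaluates the constant $\kappa_0 = [1-\frac{1}{2}\log_2 3]^{-1}$ and the bound $\kappa_0 \log_2(2n/e)$, then returns the smallest cycle length $k$ at or above that bound that divides $n$. The function names are hypothetical, and the output only evaluates the formula: the theorem is a w.h.p. statement as $n \to \infty$, not a guarantee about any particular finite graph.

```python
import math

# kappa_0 = [1 - (1/2) * log2(3)]^(-1), the constant in the threshold kappa_0 * log2(n).
KAPPA0 = 1.0 / (1.0 - 0.5 * math.log2(3))


def cycle_factor_bound(n: int) -> float:
    """The bound kappa_0 * log2(2n/e) from the 2-point concentration result."""
    return KAPPA0 * math.log2(2 * n / math.e)


def smallest_cycle_length(n: int) -> int:
    """Smallest k >= kappa_0 * log2(2n/e) that divides n."""
    k = math.ceil(cycle_factor_bound(n))
    if k > n:
        raise ValueError("n too small: the bound exceeds n")
    while n % k != 0:  # terminates at k = n at the latest
        k += 1
    return k


print(f"kappa_0 ~ {KAPPA0:.3f}")     # kappa_0 ~ 4.819
print(smallest_cycle_length(10**6))  # 100
```

For $n = 10^6$ the bound is about 93.9, and 100 is the first integer from 94 upward that divides $10^6$.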
The proof follows the small subgraph conditioning framework, but the
associated second moment analysis here is far more delicate than in any earlier
use of this method and involves several novel features, among them a sharp
estimate for tail probabilities in renewal processes without replacement which
may be of independent interest.
Comment: 45 pages
I Know Why You Went to the Clinic: Risks and Realization of HTTPS Traffic Analysis
Revelations of large scale electronic surveillance and data mining by
governments and corporations have fueled increased adoption of HTTPS. We
present a traffic analysis attack against over 6000 webpages spanning the HTTPS
deployments of 10 widely used, industry-leading websites in areas such as
healthcare, finance, legal services and streaming video. Our attack identifies
individual pages in the same website with 89% accuracy, exposing personal
details including medical conditions, financial and legal affairs and sexual
orientation. We examine evaluation methodology and reveal accuracy variations
as large as 18% caused by assumptions affecting caching and cookies. We present
a novel defense reducing attack accuracy to 27% with a 9% traffic increase, and
demonstrate significantly increased effectiveness of prior defenses in our
evaluation context, inclusive of enabled caching, user-specific cookies and
pages within the same website.
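The abstract does not describe the attack's features or the new defense. Purely to illustrate the general idea of traffic analysis, the Python sketch below treats each page load as the multiset of encrypted object sizes an observer sees, matches an observed trace to the most similar known fingerprint, and then shows how padding every size up to a fixed bucket can make two pages indistinguishable. The page paths, byte sizes, 1024-byte bucket and multiset Jaccard similarity are all hypothetical choices; this is not the paper's attack or defense and does not reproduce its reported accuracy or overhead figures.

```python
from collections import Counter


def pad(size: int, bucket: int) -> int:
    """Round size up to the next multiple of bucket (bucket=1 means no padding)."""
    return -(-size // bucket) * bucket


def fingerprint(sizes, bucket=1):
    """Multiset of (possibly padded) object sizes observed for one page load."""
    return Counter(pad(s, bucket) for s in sizes)


def similarity(a: Counter, b: Counter) -> float:
    """Multiset Jaccard similarity: |a & b| / |a | b|."""
    union = sum((a | b).values())
    return sum((a & b).values()) / union if union else 1.0


def identify(observed, known, bucket=1):
    """Return the best-matching known page for an observed trace, plus all scores."""
    obs = fingerprint(observed, bucket)
    scores = {page: similarity(obs, fingerprint(sizes, bucket))
              for page, sizes in known.items()}
    return max(scores, key=scores.get), scores


def overhead(sizes, bucket):
    """Fractional traffic increase caused by padding."""
    return sum(pad(s, bucket) for s in sizes) / sum(sizes) - 1


# Hypothetical pages of one site and their object sizes in bytes.
known = {
    "/clinic/asthma": [1200, 5300, 870],
    "/clinic/diabetes": [1200, 5410, 910],
}
trace = [1200, 5410, 910]  # observed load of /clinic/diabetes

print(identify(trace, known)[0])               # /clinic/diabetes
print(identify(trace, known, bucket=1024)[1])  # both pages score 1.0
print(round(overhead(trace, 1024), 3))         # 0.226
```

Padding trades bandwidth for privacy: in this toy case it hides which page was loaded at the cost of roughly 23% more bytes for that trace, which is the kind of accuracy-versus-overhead trade-off the abstract quantifies for its own defense.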