
    Fast Hierarchical Clustering and Other Applications of Dynamic Closest Pairs

    We develop data structures for dynamic closest pair problems with arbitrary distance functions that do not necessarily come from any geometric structure on the objects. Based on a technique previously used by the author for Euclidean closest pairs, we show how to insert and delete objects from an n-object set, maintaining the closest pair, in O(n log^2 n) time per update and O(n) space. With quadratic space, we can instead use a quadtree-like structure to achieve an optimal time bound, O(n) per update. We apply these data structures to hierarchical clustering, greedy matching, and TSP heuristics, and discuss other potential applications in machine learning, Groebner bases, and local improvement algorithms for partition and placement problems. Experiments show our new methods to be faster in practice than previously used heuristics. Comment: 20 pages, 9 figures. A preliminary version of this paper appeared at the 9th ACM-SIAM Symp. on Discrete Algorithms, San Francisco, 1998, pp. 619-628. For source code and experimental results, see http://www.ics.uci.edu/~eppstein/projects/pairs
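To make the problem setting concrete, here is a minimal sketch of a simple baseline for dynamic closest pairs under an arbitrary distance function. It is not the data structure developed in the paper: it only caches each object's nearest neighbor, so an insertion costs a linear number of distance evaluations while a deletion can degrade to quadratic in the worst case. The class name `NaiveClosestPair` and its methods are hypothetical and chosen for illustration.

```python
class NaiveClosestPair:
    """Baseline sketch: every object caches its current nearest neighbor.

    `dist` may be any function of two objects; no geometry is assumed.
    Objects must be hashable and distinct.
    """

    def __init__(self, dist):
        self.dist = dist
        self.points = set()
        self.nearest = {}  # object -> (distance, neighbor), or None if alone

    def _recompute(self, p):
        # Scan all other objects to find p's nearest neighbor.
        candidates = [(self.dist(p, q), q) for q in self.points if q != p]
        self.nearest[p] = min(candidates, key=lambda c: c[0]) if candidates else None

    def insert(self, p):
        self.points.add(p)
        self._recompute(p)
        # The new object may be closer to q than q's cached neighbor.
        for q in self.points:
            if q == p:
                continue
            d = self.dist(q, p)
            cur = self.nearest[q]
            if cur is None or d < cur[0]:
                self.nearest[q] = (d, p)

    def delete(self, p):
        self.points.remove(p)
        del self.nearest[p]
        # Only objects whose cached neighbor was p need a fresh scan.
        for q in self.points:
            cur = self.nearest[q]
            if cur is not None and cur[1] == p:
                self._recompute(q)

    def closest_pair(self):
        # Returns (distance, a, b), or None if fewer than two objects exist.
        best = None
        for p, cur in self.nearest.items():
            if cur is not None and (best is None or cur[0] < best[0]):
                best = (cur[0], p, cur[1])
        return best
```

A hierarchical clustering loop would repeatedly call `closest_pair()`, delete the two returned clusters, and insert their merged cluster, which is why the per-update cost dominates the running time.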

    Cycle factors and renewal theory

    For which values of $k$ does a uniformly chosen $3$-regular graph $G$ on $n$ vertices typically contain $n/k$ vertex-disjoint $k$-cycles (a $k$-cycle factor)? To date, this has been answered for $k=n$ and for $k \ll \log n$; the former, the Hamiltonicity problem, was finally answered in the affirmative by Robinson and Wormald in 1992, while the answer in the latter case is negative since with high probability most vertices do not lie on $k$-cycles. Here we settle the problem completely: the threshold for a $k$-cycle factor in $G$ as above is $\kappa_0 \log_2 n$ with $\kappa_0=[1-\frac12\log_2 3]^{-1}\approx 4.82$. Precisely, we prove a 2-point concentration result: if $k \geq \kappa_0 \log_2(2n/e)$ divides $n$ then $G$ contains a $k$-cycle factor w.h.p., whereas if $k<\kappa_0\log_2(2n/e)-\frac{\log^2 n}n$ then w.h.p. it does not. As a byproduct, we confirm the "Comb Conjecture," an old problem concerning the embedding of certain spanning trees in the random graph $G(n,p)$. The proof follows the small subgraph conditioning framework, but the associated second moment analysis here is far more delicate than in any earlier use of this method and involves several novel features, among them a sharp estimate for tail probabilities in renewal processes without replacement which may be of independent interest. Comment: 45 page
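As a quick numerical check of the constants quoted in the abstract, the following sketch evaluates $\kappa_0$ and the upper threshold $\kappa_0 \log_2(2n/e)$ for a given $n$. The function name `cycle_factor_threshold` is hypothetical, and the vanishing correction term $\frac{\log^2 n}{n}$ from the lower bound is left out.

```python
import math

# kappa_0 = [1 - (1/2) * log2(3)]^(-1), as stated in the abstract.
KAPPA_0 = 1.0 / (1.0 - 0.5 * math.log2(3))


def cycle_factor_threshold(n):
    """Value of kappa_0 * log2(2n/e) for an n-vertex 3-regular graph."""
    return KAPPA_0 * math.log2(2 * n / math.e)


if __name__ == "__main__":
    print(f"kappa_0 = {KAPPA_0:.4f}")
    for n in (10**3, 10**6, 10**9):
        print(n, round(cycle_factor_threshold(n), 2))
```

The threshold grows only logarithmically, so even for very large $n$ the critical cycle length stays small relative to $n$.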

    I Know Why You Went to the Clinic: Risks and Realization of HTTPS Traffic Analysis

    Revelations of large-scale electronic surveillance and data mining by governments and corporations have fueled increased adoption of HTTPS. We present a traffic analysis attack against over 6000 webpages spanning the HTTPS deployments of 10 widely used, industry-leading websites in areas such as healthcare, finance, legal services and streaming video. Our attack identifies individual pages in the same website with 89% accuracy, exposing personal details including medical conditions, financial and legal affairs and sexual orientation. We examine evaluation methodology and reveal accuracy variations as large as 18% caused by assumptions affecting caching and cookies. We present a novel defense reducing attack accuracy to 27% with a 9% traffic increase, and demonstrate significantly increased effectiveness of prior defenses in our evaluation context, inclusive of enabled caching, user-specific cookies and pages within the same website.