    K-nearest Neighbor Search by Random Projection Forests

    K-nearest neighbor (kNN) search has wide applications in many areas, including data mining, machine learning, statistics and many applied domains. Inspired by the success of ensemble methods and the flexibility of tree-based methodology, we propose random projection forests (rpForests) for kNN search. rpForests finds kNNs by aggregating results from an ensemble of random projection trees, each constructed recursively through a series of carefully chosen random projections. rpForests achieves remarkable accuracy, in terms of fast decay both in the missing rate of kNNs and in the discrepancy of the kNN distances. rpForests has a very low computational complexity. The ensemble nature of rpForests makes it easy to run in parallel on multicore or clustered computers; the running time is expected to be nearly inversely proportional to the number of cores or machines. We give theoretical insights by showing that the probability that neighboring points are separated by the ensemble of random projection trees decays exponentially as the ensemble size increases. Our theory can be used to refine the choice of random projections in the growth of trees, and experiments show that the effect is remarkable. Comment: 15 pages, 4 figures, 2018 IEEE Big Data Conference
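The ensemble idea in this abstract can be illustrated with a minimal sketch. It is not the authors' rpForests implementation: the median split, the Gaussian random directions, and the names `build_forest`, `knn_query`, `leaf_size` and `n_trees` are all assumptions made for illustration, and the paper's "carefully chosen" projections are not reproduced here.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def build_tree(points, idx, leaf_size, rng):
    """Recursively split point indices along a random direction (assumed median split)."""
    if len(idx) <= leaf_size:
        return idx  # leaf: a plain list of point indices
    direction = [rng.gauss(0.0, 1.0) for _ in range(len(points[0]))]
    proj = {i: dot(direction, points[i]) for i in idx}
    ordered = sorted(idx, key=proj.get)
    mid = len(ordered) // 2
    split = proj[ordered[mid]]
    return (direction, split,
            build_tree(points, ordered[:mid], leaf_size, rng),
            build_tree(points, ordered[mid:], leaf_size, rng))

def build_forest(points, n_trees=10, leaf_size=8, seed=0):
    """Grow independent trees; each tree could be built on a separate core."""
    rng = random.Random(seed)
    all_idx = list(range(len(points)))
    return [build_tree(points, all_idx, leaf_size, rng) for _ in range(n_trees)]

def leaf_for(tree, query):
    """Route the query down one tree to the leaf it falls into."""
    while isinstance(tree, tuple):
        direction, split, left, right = tree
        tree = left if dot(direction, query) < split else right
    return tree

def knn_query(points, forest, query, k):
    """Aggregate leaf candidates over all trees, then rank them by exact distance."""
    candidates = set()
    for tree in forest:
        candidates.update(leaf_for(tree, query))
    def sqdist(i):
        return sum((a - b) ** 2 for a, b in zip(points[i], query))
    return sorted(candidates, key=sqdist)[:k]
```

A single tree may place two true neighbors in different leaves; taking the union of leaves over many trees is what reduces the chance of missing a neighbor, at the cost of more candidates to rank.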

    Fast Distributed Approximation for Max-Cut

    Finding a maximum cut is a fundamental task in many computational settings. Surprisingly, it has been insufficiently studied in the classic distributed settings, where vertices communicate by synchronously sending messages to their neighbors according to the underlying graph, known as the $\mathcal{LOCAL}$ or $\mathcal{CONGEST}$ models. We amend this by obtaining almost optimal algorithms for Max-Cut on a wide class of graphs in these models. In particular, for any $\epsilon > 0$, we develop randomized approximation algorithms achieving a ratio of $(1-\epsilon)$ to the optimum for Max-Cut on bipartite graphs in the $\mathcal{CONGEST}$ model, and on general graphs in the $\mathcal{LOCAL}$ model. We further present efficient deterministic algorithms, including a $1/3$-approximation for Max-Dicut in our models, thus improving the best known (randomized) ratio of $1/4$. Our algorithms make non-trivial use of the greedy approach of Buchbinder et al. (SIAM Journal on Computing, 2015) for maximizing an unconstrained (non-monotone) submodular function, which may be of independent interest.
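As background for the greedy approach cited in this abstract, here is a minimal sequential sketch of the deterministic double greedy of Buchbinder et al., applied to the directed cut function. It is a centralized illustration only, not the distributed algorithm of the paper; the function names and the unweighted arc-list representation are assumptions.

```python
def dicut_value(arcs, S):
    """Number of arcs leaving S (a non-negative submodular function of S)."""
    return sum(1 for u, v in arcs if u in S and v not in S)

def double_greedy_dicut(n, arcs):
    """Deterministic double greedy: grow X from empty, shrink Y from the full set."""
    X, Y = set(), set(range(n))
    for u in range(n):
        gain_add = dicut_value(arcs, X | {u}) - dicut_value(arcs, X)
        gain_remove = dicut_value(arcs, Y - {u}) - dicut_value(arcs, Y)
        if gain_add >= gain_remove:
            X.add(u)        # keep u on the source side
        else:
            Y.discard(u)    # commit u to the sink side
    return X                # X == Y once every vertex has been decided

arcs = [(0, 1), (1, 2), (2, 0), (0, 3), (3, 1)]
S = double_greedy_dicut(4, arcs)
```

For a non-negative submodular function, this deterministic variant is known to return a set whose value is at least one third of the optimum, which matches the $1/3$ ratio quoted for Max-Dicut.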

    Privacy-Friendly Collaboration for Cyber Threat Mitigation

    Sharing of security data across organizational boundaries has often been advocated as a promising way to enhance cyber threat mitigation. However, collaborative security faces a number of important challenges, including privacy, trust, and liability concerns with the potential disclosure of sensitive data. In this paper, we focus on data sharing for predictive blacklisting, i.e., forecasting attack sources based on past attack information. We propose a novel privacy-enhanced data sharing approach in which organizations estimate collaboration benefits without disclosing their datasets, organize into coalitions of allied organizations, and securely share data within these coalitions. We study how different partner selection strategies affect prediction accuracy by experimenting on a real-world dataset of 2 billion IP addresses and observe up to a 105% prediction improvement. Comment: This paper has been withdrawn as it has been superseded by arXiv:1502.0533

    Unions of Onions: Preprocessing Imprecise Points for Fast Onion Decomposition

    Let $\mathcal{D}$ be a set of $n$ pairwise disjoint unit disks in the plane. We describe how to build a data structure for $\mathcal{D}$ so that for any point set $P$ containing exactly one point from each disk, we can quickly find the onion decomposition (convex layers) of $P$. Our data structure can be built in $O(n \log n)$ time and has linear size. Given $P$, we can find its onion decomposition in $O(n \log k)$ time, where $k$ is the number of layers. We also provide a matching lower bound. Our solution is based on a recursive space decomposition, combined with a fast algorithm to compute the union of two disjoint onions. Comment: 10 pages, 5 figures; a preliminary version appeared at WADS 201
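To make the term "onion decomposition" concrete, the sketch below peels convex hulls one layer at a time. This is the naive baseline, not the preprocessing data structure described in the abstract, and it does not achieve the $O(n \log k)$ query bound; the function names are assumptions, and points are assumed to be in general position (no three collinear on a layer boundary).

```python
def cross(o, a, b):
    """Twice the signed area of triangle o, a, b; positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def onion_layers(points):
    """Repeatedly remove the current convex hull; each hull is one layer."""
    remaining = set(points)
    layers = []
    while remaining:
        hull = convex_hull(list(remaining))
        layers.append(hull)
        remaining -= set(hull)
    return layers

pts = [(0, 0), (4, 0), (4, 4), (0, 4), (1, 1), (3, 1), (2, 3), (2, 2)]
layers = onion_layers(pts)
# Here the layers are the outer square, the inner triangle, and the single point (2, 2).
```

Each peel costs $O(m \log m)$ for the $m$ points still remaining, so the total cost grows with the number of layers; avoiding that repeated work is what the preprocessing in the abstract is for.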