86,644 research outputs found
K-nearest Neighbor Search by Random Projection Forests
K-nearest neighbor (kNN) search has wide applications in many areas,
including data mining, machine learning, statistics and many applied domains.
Inspired by the success of ensemble methods and the flexibility of tree-based
methodology, we propose random projection forests (rpForests), for kNN search.
rpForests finds kNNs by aggregating results from an ensemble of random
projection trees with each constructed recursively through a series of
carefully chosen random projections. rpForests achieves a remarkable accuracy
in terms of fast decay in the missing rate of kNNs and that of discrepancy in
the kNN distances. rpForests has a very low computational complexity. The
ensemble nature of rpForests makes it easily run in parallel on multicore or
clustered computers; the running time is expected to be nearly inversely
proportional to the number of cores or machines. We give theoretical insights
by showing the exponential decay of the probability that neighboring points
would be separated by ensemble random projection trees when the ensemble size
increases. Our theory can be used to refine the choice of random projections in
the growth of trees, and experiments show that the effect is remarkable.Comment: 15 pages, 4 figures, 2018 IEEE Big Data Conferenc
Fast Distributed Approximation for Max-Cut
Finding a maximum cut is a fundamental task in many computational settings.
Surprisingly, it has been insufficiently studied in the classic distributed
settings, where vertices communicate by synchronously sending messages to their
neighbors according to the underlying graph, known as the or
models. We amend this by obtaining almost optimal
algorithms for Max-Cut on a wide class of graphs in these models. In
particular, for any , we develop randomized approximation
algorithms achieving a ratio of to the optimum for Max-Cut on
bipartite graphs in the model, and on general graphs in the
model.
We further present efficient deterministic algorithms, including a
-approximation for Max-Dicut in our models, thus improving the best known
(randomized) ratio of . Our algorithms make non-trivial use of the greedy
approach of Buchbinder et al. (SIAM Journal on Computing, 2015) for maximizing
an unconstrained (non-monotone) submodular function, which may be of
independent interest
Privacy-Friendly Collaboration for Cyber Threat Mitigation
Sharing of security data across organizational boundaries has often been
advocated as a promising way to enhance cyber threat mitigation. However,
collaborative security faces a number of important challenges, including
privacy, trust, and liability concerns with the potential disclosure of
sensitive data. In this paper, we focus on data sharing for predictive
blacklisting, i.e., forecasting attack sources based on past attack
information. We propose a novel privacy-enhanced data sharing approach in which
organizations estimate collaboration benefits without disclosing their
datasets, organize into coalitions of allied organizations, and securely share
data within these coalitions. We study how different partner selection
strategies affect prediction accuracy by experimenting on a real-world dataset
of 2 billion IP addresses and observe up to a 105% prediction improvement.Comment: This paper has been withdrawn as it has been superseded by
arXiv:1502.0533
Unions of Onions: Preprocessing Imprecise Points for Fast Onion Decomposition
Let be a set of pairwise disjoint unit disks in the plane.
We describe how to build a data structure for so that for any
point set containing exactly one point from each disk, we can quickly find
the onion decomposition (convex layers) of .
Our data structure can be built in time and has linear size.
Given , we can find its onion decomposition in time, where
is the number of layers. We also provide a matching lower bound. Our solution
is based on a recursive space decomposition, combined with a fast algorithm to
compute the union of two disjoint onionComment: 10 pages, 5 figures; a preliminary version appeared at WADS 201
- …