1,979 research outputs found
Hashing as Tie-Aware Learning to Rank
Hashing, or learning binary embeddings of data, is frequently used in nearest
neighbor retrieval. In this paper, we develop learning to rank formulations for
hashing, aimed at directly optimizing ranking-based evaluation metrics such as
Average Precision (AP) and Normalized Discounted Cumulative Gain (NDCG). We
first observe that the integer-valued Hamming distance often leads to tied
rankings, and propose to use tie-aware versions of AP and NDCG to evaluate
hashing for retrieval. Then, to optimize tie-aware ranking metrics, we derive
their continuous relaxations, and perform gradient-based optimization with deep
neural networks. Our results establish the new state-of-the-art for image
retrieval by Hamming ranking in common benchmarks.Comment: 15 pages, 3 figures. IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), 201
Linear-Size Approximations to the Vietoris-Rips Filtration
The Vietoris-Rips filtration is a versatile tool in topological data
analysis. It is a sequence of simplicial complexes built on a metric space to
add topological structure to an otherwise disconnected set of points. It is
widely used because it encodes useful information about the topology of the
underlying metric space. This information is often extracted from its so-called
persistence diagram. Unfortunately, this filtration is often too large to
construct in full. We show how to construct an O(n)-size filtered simplicial
complex on an -point metric space such that its persistence diagram is a
good approximation to that of the Vietoris-Rips filtration. This new filtration
can be constructed in time. The constant factors in both the size
and the running time depend only on the doubling dimension of the metric space
and the desired tightness of the approximation. For the first time, this makes
it computationally tractable to approximate the persistence diagram of the
Vietoris-Rips filtration across all scales for large data sets.
We describe two different sparse filtrations. The first is a zigzag
filtration that removes points as the scale increases. The second is a
(non-zigzag) filtration that yields the same persistence diagram. Both methods
are based on a hierarchical net-tree and yield the same guarantees
Near Optimal Parallel Algorithms for Dynamic DFS in Undirected Graphs
Depth first search (DFS) tree is a fundamental data structure for solving
graph problems. The classical algorithm [SiComp74] for building a DFS tree
requires time for a given graph having vertices and edges.
Recently, Baswana et al. [SODA16] presented a simple algorithm for updating DFS
tree of an undirected graph after an edge/vertex update in time.
However, their algorithm is strictly sequential. We present an algorithm
achieving similar bounds, that can be adopted easily to the parallel
environment.
In the parallel model, a DFS tree can be computed from scratch using
processors in expected time [SiComp90] on an EREW PRAM, whereas
the best deterministic algorithm takes time
[SiComp90,JAlg93] on a CRCW PRAM. Our algorithm can be used to develop optimal
(upto polylog n factors deterministic algorithms for maintaining fully dynamic
DFS and fault tolerant DFS, of an undirected graph.
1- Parallel Fully Dynamic DFS:
Given an arbitrary online sequence of vertex/edge updates, we can maintain a
DFS tree of an undirected graph in time per update using
processors on an EREW PRAM.
2- Parallel Fault tolerant DFS:
An undirected graph can be preprocessed to build a data structure of size
O(m) such that for a set of updates (where is constant) in the graph,
the updated DFS tree can be computed in time using
processors on an EREW PRAM.
Moreover, our fully dynamic DFS algorithm provides, in a seamless manner,
nearly optimal (upto polylog n factors) algorithms for maintaining a DFS tree
in semi-streaming model and a restricted distributed model. These are the first
parallel, semi-streaming and distributed algorithms for maintaining a DFS tree
in the dynamic setting.Comment: Accepted to appear in SPAA'17, 32 Pages, 5 Figure
Approximate Bregman near neighbors in sublinear time: beyond the triangle inequality
pre-printBregman divergences are important distance measures that are used extensively in data-driven applications such as computer vision, text mining, and speech processing, and are a key focus of interest in machine learning. Answering nearest neighbor (NN) queries under these measures is very important in these applications and has been the subject of extensive study, but is problematic because these distance measures lack metric properties like symmetry and the triangle inequality. In this paper, we present the first provably approximate nearest-neighbor (ANN) algorithms. These process queries in O(logn) time for Bregman divergences in fixed dimensional spaces. We also obtain polylogn bounds for a more abstract class of distance measures (containing Bregman divergences) which satisfy certain structural properties . Both of these bounds apply to both the regular asymmetric Bregman divergences as well as their symmetrized versions. To do so, we develop two geometric properties vital to our analysis: a reverse triangle inequality (RTI) and a relaxed triangle inequality called m-defectiveness where m is a domain-dependent parameter. Bregman divergences satisfy the RTI but not m-defectiveness. However, we show that the square root of a Bregman divergence does satisfy m-defectiveness. This allows us to then utilize both properties in an efficient search data structure that follows the general two-stage paradigm of a ring-tree decomposition followed by a quad tree search used in previous near-neighbor algorithms for Euclidean space and spaces of bounded doubling dimension. Our first algorithm resolves a query for a d-dimensional (1+e)-ANN in O ( logne )O(d) time and O (nlogd-1 n) space and holds for generic m-defective distance measures satisfying a RTI. Our second algorithm is more specific in analysis to the Bregman divergences and uses a further structural constant, the maximum ratio of second derivatives over each dimension of our domain (c0). This allows us to locate a (1+e)-ANN in O(logn) time and O(n) space, where there is a further (c0)d factor in the big-Oh for the query time
Using Incremental Many-to-One Queries to Build a Fast and Tight Heuristic for A* in Road Networks
We study exact, efficient, and practical algorithms for route planning applications in large road networks. On the one hand, such algorithms should be able to answer shortest path queries within milliseconds. On the other hand, routing applications often require integrating the current traffic situation, planning ahead with predictions for future traffic, respecting forbidden turns, and many other features depending on the specific application. Therefore, such algorithms must be flexible and able to support a variety of problem variants. In this work, we revisit the A* algorithm to build a simple, extensible, and unified algorithmic framework applicable to many route planning problems. A* has been previously used for routing in road networks. However, its performance was not competitive because no sufficiently fast and tight distance estimation function was available. We present a novel, efficient, and accurate A* heuristic using Contraction Hierarchies, another popular speedup technique. The core of our heuristic is a new Contraction Hierarchies query algorithm called Lazy RPHAST, which can efficiently compute shortest distances from many incrementally provided sources toward a common target. Additionally, we describe A* optimizations to accelerate the processing of low-degree vertices, which are typical in road networks, and present a new pruning criterion for symmetrical bidirectional A*. An extensive experimental study confirms the practicality of our approach for many applications
- …