An Efficient Representation for Filtrations of Simplicial Complexes
A filtration over a simplicial complex K is an ordering of the simplices of K
such that all prefixes in the ordering are subcomplexes of K. Filtrations
are at the core of Persistent Homology, a major tool in Topological Data
Analysis. In order to represent the filtration of a simplicial complex, the
entire filtration can be appended to any data structure that explicitly stores
all the simplices of the complex such as the Hasse diagram or the recently
introduced Simplex Tree [Algorithmica '14]. However, with the popularity of
various computational methods that need to handle simplicial complexes, and
with the rapidly increasing size of the complexes, the task of finding a
compact data structure that can still support efficient queries is of great
interest.
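The definition above can be made concrete with a small sketch. The function name `is_filtration` and the representation of a simplex as a tuple of vertex labels are assumptions made for illustration; this is a naive validity check, not the data structure proposed in the paper.

```python
from itertools import combinations


def is_filtration(ordering):
    """Return True if every prefix of `ordering` is a simplicial complex.

    Each simplex is an iterable of vertex labels (hypothetical encoding).
    A prefix is closed under taking faces exactly when each simplex is
    preceded by all of its codimension-1 faces; lower-dimensional faces
    then follow by induction on the earlier simplices.
    """
    seen = set()
    for simplex in ordering:
        s = frozenset(simplex)
        if not s or s in seen:
            return False  # empty or repeated simplex
        for face in combinations(s, len(s) - 1):
            # a vertex has only the empty face, which we skip
            if face and frozenset(face) not in seen:
                return False
        seen.add(s)
    return True
```

For example, listing the two endpoints of an edge before the edge itself passes the check, while listing the edge first does not.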
In this paper, we propose a new data structure called the Critical Simplex
Diagram (CSD) which is a variant of the Simplex Array List (SAL) [Algorithmica
'17]. Our data structure allows one to store in a compact way the filtration of
a simplicial complex, and allows for the efficient implementation of a large
range of basic operations. Moreover, we prove that our data structure is
essentially optimal with respect to the requisite storage space. Finally, we
show that the CSD representation admits fast construction algorithms for Flag
complexes and relaxed Delaunay complexes.
Comment: A preliminary version appeared in SODA 201
Exact Computation of a Manifold Metric, via Lipschitz Embeddings and Shortest Paths on a Graph
Data-sensitive metrics adapt distances locally based on the density of data
points with the goal of aligning distances and some notion of similarity. In
this paper, we give the first exact algorithm for computing a data-sensitive
metric called the nearest neighbor metric. In fact, we prove the surprising
result that a previously published approximation algorithm is exact.
The nearest neighbor metric can be viewed as a special case of a
density-based distance used in machine learning, or it can be seen as an
example of a manifold metric. Previous computational research on such metrics
despaired of computing exact distances on account of the apparent difficulty of
minimizing over all continuous paths between a pair of points. We leverage the
exact computation of the nearest neighbor metric to compute sparse spanners and
persistent homology. We also explore the behavior of the metric built from
point sets drawn from an underlying distribution and consider the more general
case of inputs that are finite collections of path-connected compact sets.
The main results connect several classical theories such as the conformal
change of Riemannian metrics, the theory of positive definite functions of
Schoenberg, and screw function theory of Schoenberg and Von Neumann. We develop
novel proof techniques based on the combination of screw functions and
Lipschitz extensions that may be of independent interest.
Comment: 15 page
Structural Analysis: Shape Information via Points-To Computation
This paper introduces a new hybrid memory analysis, Structural Analysis,
which combines an expressive shape analysis style abstract domain with
efficient and simple points-to style transfer functions. Using data from
empirical studies on the runtime heap structures and the programmatic idioms
used in modern object-oriented languages, we construct a heap analysis with the
following characteristics: (1) it can express a rich set of structural, shape,
and sharing properties that are not provided by a classic points-to analysis
and that are useful for optimization and error detection applications; (2) it
uses efficient, weakly-updating, set-based transfer functions, which enable the
analysis to be more robust and scalable than a shape analysis; and (3) it can be
used as the basis for a scalable interprocedural analysis that produces precise
results in practice.
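To illustrate what a weakly-updating, set-based transfer function looks like in general, here is a minimal sketch of a field store `x.field = y` over points-to sets. The names `store_weak`, `var_pts`, and `heap_pts` and the dictionary encoding are hypothetical; this shows the generic points-to idea only, not the abstract domain or transfer functions defined in the paper.

```python
def store_weak(var_pts, heap_pts, x, field, y):
    """Apply `x.field = y` using a weak update.

    var_pts:  dict mapping variable name -> set of abstract objects
    heap_pts: dict mapping (abstract object, field) -> set of abstract objects

    A weak update only adds targets: the old contents of the field are
    kept, because one abstract object may stand for many concrete ones.
    Returns a new heap map and leaves the inputs unmodified.
    """
    new_heap = {key: set(targets) for key, targets in heap_pts.items()}
    for obj in var_pts.get(x, set()):
        new_heap.setdefault((obj, field), set()).update(var_pts.get(y, set()))
    return new_heap
```

A strong update would instead replace the old target set, which is more precise but only sound when the abstract object is known to represent a single concrete object.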
The analysis has been implemented for .NET bytecode, and using this
implementation we evaluate both the runtime cost and the precision of the
results on a number of well-known benchmarks and real-world programs. Our
experimental evaluations show that the domain defined in this paper is capable
of precisely expressing the majority of the connectivity, shape, and sharing
properties that occur in practice and, despite the use of weak updates, the
static analysis is able to precisely approximate the ideal results. The
analysis is capable of analyzing large real-world programs (over 30K bytecodes)
in less than 65 seconds and using less than 130MB of memory. In summary, this
work presents a new type of memory analysis that advances the state of the art
with respect to expressive power, precision, and scalability and represents a
new area of study on the relationships between and combination of concepts from
shape and points-to analyses.
FLASH: Randomized Algorithms Accelerated over CPU-GPU for Ultra-High Dimensional Similarity Search
We present FLASH (Fast LSH Algorithm for Similarity search accelerated with
HPC), a similarity search
system for ultra-high dimensional datasets on a single machine, that does not
require similarity computations and is tailored for high-performance computing
platforms. By leveraging an LSH-style randomized indexing procedure and
combining it with several principled techniques, such as reservoir sampling,
recent advances in one-pass minwise hashing, and count-based estimations, we
reduce the computational and parallelization costs of similarity search, while
retaining sound theoretical guarantees.
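Of the techniques named above, reservoir sampling is the easiest to show in isolation. The sketch below is the textbook single-pass version (often called Algorithm R); the function name and parameters are assumptions for illustration, and how FLASH actually applies the technique is not described in this abstract.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of at most k items from a stream.

    The stream is read once and only k items are held in memory.
    `rng` is an optional random.Random instance for reproducibility.
    """
    if rng is None:
        rng = random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # fill the reservoir with the first k items
            reservoir.append(item)
        else:
            # item i replaces a random slot with probability k / (i + 1)
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

Because the memory footprint is fixed at k regardless of stream length, the same idea can cap the size of any container that is filled incrementally.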
We evaluate FLASH on several real, high-dimensional datasets from different
domains, including text, malicious URLs, click-through prediction, and social
networks. Our experiments shed new light on the difficulties associated
with datasets having several million dimensions. Current state-of-the-art
implementations either fail on the presented scale or are orders of magnitude
slower than FLASH. FLASH is capable of computing an approximate k-NN graph,
from scratch, over the full webspam dataset (1.3 billion nonzeros) in less than
10 seconds. Computing a full k-NN graph in less than 10 seconds on the webspam
dataset using brute force will require at least 20 teraflops. We
provide CPU and GPU implementations of FLASH for replicability of our results.
The Topology ToolKit
This system paper presents the Topology ToolKit (TTK), a software platform
designed for topological data analysis in scientific visualization. TTK
provides a unified, generic, efficient, and robust implementation of key
algorithms for the topological analysis of scalar data, including: critical
points, integral lines, persistence diagrams, persistence curves, merge trees,
contour trees, Morse-Smale complexes, fiber surfaces, continuous scatterplots,
Jacobi sets, Reeb spaces, and more. TTK is easily accessible to end users due
to a tight integration with ParaView. It is also easily accessible to
developers through a variety of bindings (Python, VTK/C++) for fast prototyping
or through direct, dependence-free, C++, to ease integration into pre-existing
complex systems. While developing TTK, we faced several algorithmic and
software engineering challenges, which we document in this paper. In
particular, we present an algorithm for the construction of a discrete gradient
that complies with the critical points extracted in the piecewise-linear setting.
This algorithm guarantees a combinatorial consistency across the topological
abstractions supported by TTK, and importantly, a unified implementation of
topological data simplification for multi-scale exploration and analysis. We
also present a cached triangulation data structure that supports time-efficient
and generic traversals, self-adjusts its memory usage on demand
for input simplicial meshes, and implicitly emulates a triangulation for
regular grids with no memory overhead. Finally, we describe an original
software architecture that guarantees memory-efficient and direct access to
TTK features, while still giving researchers powerful and easy bindings
and extensions. TTK is open source (BSD license) and its code, online
documentation and video tutorials are available on TTK's website