Computational and Theoretical Issues of Multiparameter Persistent Homology for Data Analysis
The basic goal of topological data analysis is to apply topology-based descriptors
to understand and describe the shape of data. In this context, homology is one of
the most relevant topological descriptors, well-appreciated for its discrete nature,
computability and dimension independence. A further development is provided
by persistent homology, which allows one to track homological features along a one-parameter
increasing sequence of spaces. Multiparameter persistent homology, also
called multipersistent homology, is an extension of the theory of persistent homology
motivated by the need to analyze data naturally described by several parameters,
such as vector-valued functions. Multipersistent homology presents several issues in
terms of feasibility of computations over real-sized data and theoretical challenges
in the evaluation of possible descriptors. The focus of this thesis is on the interplay
between persistent homology theory and discrete Morse theory. Discrete Morse
theory provides methods for reducing the computational cost of homology and persistent
homology by considering the discrete Morse complex generated by the discrete
Morse gradient in place of the original complex. The work of this thesis addresses
the problem of computing multipersistent homology, to make this tool usable in real
application domains. This requires both computational optimizations towards the
applications to real-world data, and theoretical insights for finding and interpreting
suitable descriptors. Our computational contribution is a new Morse-inspired,
fully discrete preprocessing algorithm. We demonstrate the feasibility
of our preprocessing on real datasets, and evaluate the impact of the proposed
algorithm as a preprocessing for computing multipersistent homology. A theoretical
contribution of this thesis is a new notion of optimality for such
a preprocessing in the multiparameter context. We show that the proposed notion
generalizes an already known optimality notion from the one-parameter case. Under
this definition, we show that the algorithm we propose as a preprocessing is optimal
in low-dimensional domains. In the last part of the thesis, we consider preliminary
applications of the proposed algorithm in the context of topology-based multivariate
visualization by tracking critical features generated by a discrete gradient field compatible
with the multiple scalar fields under study. We discuss (dis)similarities of such
critical features with the state-of-the-art techniques in topology-based multivariate
data visualization.
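To make the persistence pairing concrete, the following is a minimal, illustrative sketch of 0-dimensional persistent homology on a graph with a sublevel-set filtration, using a union-find sweep and the elder rule. It is not the thesis's algorithm (which operates on discrete Morse complexes in the multiparameter setting); it assumes distinct vertex values and only degree-0 features.

```python
# Minimal sketch: 0-dimensional persistent homology of a scalar function on a
# graph, via a union-find sweep over the sublevel-set filtration.
# Assumptions: distinct vertex values; illustrates the elder rule only.

def persistence_pairs(values, edges):
    """values: one scalar per vertex; edges: (u, v) index pairs.
    Returns sorted (birth, death) pairs for connected components."""
    order = sorted(range(len(values)), key=lambda v: values[v])
    parent, birth = {}, {}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    pairs = []
    # An edge enters the filtration when its larger endpoint does.
    sorted_edges = sorted(edges, key=lambda e: max(values[e[0]], values[e[1]]))
    ei = 0
    for v in order:
        parent[v] = v
        birth[v] = values[v]
        while ei < len(sorted_edges):
            u, w = sorted_edges[ei]
            level = max(values[u], values[w])
            if level > values[v]:
                break
            ru, rw = find(u), find(w)
            if ru != rw:
                # Elder rule: the younger component dies at this level.
                young, old = (ru, rw) if birth[ru] > birth[rw] else (rw, ru)
                pairs.append((birth[young], level))
                parent[young] = old
            ei += 1
    # Components that survive the whole sweep never die.
    for r in {find(v) for v in parent}:
        pairs.append((birth[r], float("inf")))
    return sorted(pairs)
```

On a path graph with values [0, 2, 1, 3], the sweep produces one essential pair born at 0 and finite pairs for the short-lived components.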
Jacobi Fiber Surfaces for Bivariate Reeb Space Computation
This paper presents an efficient algorithm for the computation of the Reeb space of an input bivariate piecewise linear scalar function f defined on a tetrahedral mesh. By extending and generalizing algorithmic concepts from the univariate case to the bivariate one, we report the first practical, output-sensitive algorithm for the exact computation of such a Reeb space. The algorithm starts by identifying the Jacobi set of f, the bivariate analog of critical points in the univariate case. Next, the Reeb space is computed by segmenting the input mesh along the new notion of Jacobi Fiber Surfaces, the bivariate analog of critical contours in the univariate case. We additionally present a simplification heuristic that enables the progressive coarsening of the Reeb space. Our algorithm is simple to implement and most of its computations can be trivially parallelized. We report performance numbers demonstrating orders of magnitude speedups over previous approaches, enabling for the first time the tractable computation of bivariate Reeb spaces in practice. Moreover, unlike range-based quantization approaches (such as the Joint Contour Net), our algorithm is parameter-free. We demonstrate the utility of our approach by using the Reeb space as a semi-automatic segmentation tool for bivariate data. In particular, we introduce continuous scatterplot peeling, a technique which enables the reduction of clutter in the continuous scatterplot by interactively selecting the features of the Reeb space to project. We provide a VTK-based C++ implementation of our algorithm that can be used for reproduction purposes or for the development of new Reeb-space-based visualization techniques.
The Topology ToolKit
This system paper presents the Topology ToolKit (TTK), a software platform
designed for topological data analysis in scientific visualization. TTK
provides a unified, generic, efficient, and robust implementation of key
algorithms for the topological analysis of scalar data, including: critical
points, integral lines, persistence diagrams, persistence curves, merge trees,
contour trees, Morse-Smale complexes, fiber surfaces, continuous scatterplots,
Jacobi sets, Reeb spaces, and more. TTK is easily accessible to end users due
to a tight integration with ParaView. It is also easily accessible to
developers through a variety of bindings (Python, VTK/C++) for fast prototyping
or through direct, dependency-free C++, to ease integration into pre-existing
complex systems. While developing TTK, we faced several algorithmic and
software engineering challenges, which we document in this paper. In
particular, we present an algorithm for the construction of a discrete gradient
that complies with the critical points extracted in the piecewise-linear setting.
This algorithm guarantees a combinatorial consistency across the topological
abstractions supported by TTK, and importantly, a unified implementation of
topological data simplification for multi-scale exploration and analysis. We
also present a cached triangulation data structure that supports time-efficient
and generic traversals, self-adjusts its memory usage on demand for input
simplicial meshes, and implicitly emulates a triangulation for
regular grids with no memory overhead. Finally, we describe an original
software architecture, which guarantees memory efficient and direct accesses to
TTK features, while still allowing researchers powerful and easy bindings
and extensions. TTK is open source (BSD license) and its code, online
documentation and video tutorials are available on TTK's website.
A Topological Distance between Multi-fields based on Multi-Dimensional Persistence Diagrams
The problem of computing topological distance between two scalar fields based
on Reeb graphs or contour trees has been studied and applied successfully to
various problems in topological shape matching, data analysis, and
visualization. However, generalizing such results for computing distance
measures between two multi-fields based on their Reeb spaces is still in its
infancy. To this end, in this paper we propose a technique to compute
an effective distance measure between two multi-fields by computing a novel
\emph{multi-dimensional persistence diagram} (MDPD) corresponding to each of
the (quantized) Reeb spaces. First, we construct a multi-dimensional Reeb graph
(MDRG), which is a hierarchical decomposition of the Reeb space into a
collection of Reeb graphs. The MDPD corresponding to each MDRG is then computed
based on the persistence diagrams of the component Reeb graphs of the MDRG. Our
distance measure extends the Wasserstein distance between two persistence
diagrams of Reeb graphs to MDPDs of MDRGs. We prove that the proposed measure
is a pseudo-metric and satisfies a stability property. The effectiveness of the
proposed distance measure has been demonstrated on (i) the shape retrieval
contest data (SHREC) and (ii) Pt-CO bond detection data from computational
chemistry. Experimental results show that the proposed distance measure based
on the Reeb spaces has more discriminating power in clustering the shapes and
detecting the formation of a stable Pt-CO bond as compared to the similar
measures between Reeb graphs. (Accepted in IEEE Transactions on Visualization
and Computer Graphics.)
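The distance above extends the Wasserstein distance between persistence diagrams. As background, here is a brute-force sketch of the standard 1-Wasserstein distance between two small diagrams, where unmatched points may be sent to the diagonal. Real implementations solve an assignment problem; this exponential enumeration is for illustration only.

```python
# Sketch: 1-Wasserstein distance between two small persistence diagrams.
# Brute force over matchings -- exponential in diagram size, illustrative only.
from itertools import permutations

def diag(p):
    # Project a point onto the diagonal (its zero-persistence counterpart).
    m = (p[0] + p[1]) / 2.0
    return (m, m)

def cost(p, q):
    # L-infinity ground distance, as is standard for persistence diagrams.
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def wasserstein1(D1, D2):
    """Each point is matched either to a point of the other diagram or to
    the diagonal; return the minimum total matching cost."""
    A = list(D1) + [diag(q) for q in D2]
    B = list(D2) + [diag(p) for p in D1]
    best = float("inf")
    for perm in permutations(range(len(B))):
        total = sum(cost(A[i], B[j]) for i, j in enumerate(perm))
        best = min(best, total)
    return best
```

For diagrams with more than a handful of points, one would instead use an optimal-assignment solver (e.g. a Hungarian-algorithm variant) on the same augmented cost matrix.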
Hypersweeps, Convective Clouds and Reeb Spaces
Isosurfaces are one of the most prominent tools in scientific data visualisation. An isosurface is a surface that defines the boundary of a feature of interest in space for a given threshold. This is integral in analysing data from the physical sciences, which observe and simulate three- or four-dimensional phenomena. However, it is time-consuming and impractical to discover surfaces of interest by manually selecting different thresholds.
The systematic way to discover significant isosurfaces in data is with a topological data structure called the contour tree. The contour tree encodes the connectivity and shape of each isosurface at all possible thresholds. The first part of this work has been devoted to developing algorithms that use the contour tree to discover significant features in data using high performance computing systems. Those algorithms provided a clear speedup over previous methods and were used to visualise physical plasma simulations.
A major limitation of isosurfaces and contour trees is that they are only applicable when a single property is associated with data points. However scientific data sets often take multiple properties into account. A recent breakthrough generalised isosurfaces to fiber surfaces. Fiber surfaces define the boundary of a feature where the threshold is defined in terms of multiple parameters, instead of just one. In this work we used fiber surfaces together with isosurfaces and the contour tree to create a novel application that helps atmosphere scientists visualise convective cloud formation. Using this application, they were able to, for the first time, visualise the physical properties of certain structures that trigger cloud formation.
Contour trees can also be generalised to handle multiple parameters. The natural extension of the contour tree is called the Reeb space and it comes from the pure mathematical field of fiber topology. The Reeb space is not yet fully understood mathematically and algorithms for computing it have significant practical limitations. A key difficulty is that while the contour tree is a traditional one-dimensional data structure made up of points and lines between them, the Reeb space is far more complex. The Reeb space is made up of two-dimensional sheets, attached to each other in intricate ways. The last part of this work focuses on understanding the structure of Reeb spaces and the rules that are followed when sheets are combined. This theory builds towards developing robust combinatorial algorithms to compute and use Reeb spaces for practical data analysis.
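The sweep at the heart of contour-tree algorithms can be illustrated in miniature: add vertices in decreasing function value and track connected components of the superlevel set with a union-find. This toy version (assuming distinct values, and tracking only component counts rather than the full tree) shows the mechanism the abstracts above scale up to HPC settings.

```python
# Toy sweep: count connected components of superlevel sets {v : f(v) >= t}
# on a graph, for t decreasing through the vertex values.
# Assumes distinct values; real contour-tree algorithms also record merges.

def superlevel_component_counts(values, edges):
    """Returns a list of (threshold, component count) pairs, one per vertex,
    in decreasing threshold order."""
    parent = {}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    adj = {}
    for u, w in edges:
        adj.setdefault(u, []).append(w)
        adj.setdefault(w, []).append(u)

    counts, n_comp = [], 0
    for v in sorted(range(len(values)), key=lambda v: -values[v]):
        parent[v] = v
        n_comp += 1
        for w in adj.get(v, []):
            # Merge with neighbours already above the current threshold.
            if w in parent and find(w) != find(v):
                parent[find(w)] = find(v)
                n_comp -= 1
        counts.append((values[v], n_comp))
    return counts
```

On a path with values [3, 1, 2], the superlevel set splits into two components at threshold 2 and rejoins at threshold 1, which is exactly the branching a contour tree would record.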
Multivariate Topology Simplification
Topological simplification of scalar and vector fields is well-established as an effective method for analysing and visualising complex data sets. For multivariate (alternatively, multi-field) data, topological analysis requires simultaneous advances both mathematically and computationally. We propose a robust multivariate topology simplification method based on "lip"-pruning from the Reeb space. Mathematically, we show that the projection of the Jacobi set of multivariate data into the Reeb space produces a Jacobi structure that separates the Reeb space into simple components. We also show that the dual graph of these components gives rise to a Reeb skeleton that has properties similar to the scalar contour tree and Reeb graph, for topologically simple domains. We then introduce a range measure to give a scaling-invariant total ordering of the components or features that can be used for simplification. Computationally, we show how to compute Jacobi structure, Reeb skeleton, range and geometric measures in the Joint Contour Net (an approximation of the Reeb space) and that these can be used for visualisation similar to the contour tree or Reeb graph.
Pattern search for the visualization of scalar, vector, and line fields
The main topic of this thesis is pattern search in data sets for the purpose of visual data analysis. By giving a reference pattern, pattern search aims to discover similar occurrences in a data set with invariance to translation, rotation and scaling. To address this problem, we developed algorithms dealing with different types of data: scalar fields, vector fields, and line fields.
For scalar fields, we use the SIFT algorithm (Scale-Invariant Feature Transform) to find a sparse sampling of prominent features in the data with invariance to translation, rotation, and scaling. Then, the user can define a pattern as a set of SIFT features by, e.g., brushing a region of interest. Finally, we locate and rank matching patterns in the entire data set. Due to the sparsity and accuracy of SIFT features, we achieve fast and memory-saving pattern query in large-scale scalar fields.
For vector fields, we propose a hashing strategy in scale space to accelerate the convolution-based pattern query. We encode the local flow behavior in scale space using a sequence of hierarchical base descriptors, which are pre-computed and hashed into a number of hash tables. This ensures a fast fetching of similar occurrences in the flow and requires only a constant number of table lookups.
For line fields, we present a stream line segmentation algorithm to split long stream lines into globally-consistent segments, which provides similar segmentations for similar flow structures. It gives the benefit of isolating a pattern from long and dense stream lines, so that our patterns can be defined sparsely and have a significant extent, i.e., they are integration-based and not local. This allows for a greater flexibility in defining features of interest. For user-defined patterns of curve segments, our algorithm finds similar ones that are invariant to similarity transformations.
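As a small illustration of similarity-invariant matching of curve segments (not the thesis's actual segmentation algorithm), turning-angle descriptors of a polyline are unchanged by translation, rotation, and uniform scaling of the same vertex sequence; arc-length resampling, omitted here, would additionally make descriptors comparable across different samplings.

```python
# Sketch: similarity-invariant comparison of polyline segments via
# turning angles. Illustrative stand-in for the curve descriptors above.
import math

def turning_angles(points):
    """Signed turning angle at each interior vertex of a 2D polyline.
    Invariant to translation, rotation, and uniform scaling."""
    angles = []
    for i in range(1, len(points) - 1):
        ax, ay = points[i][0] - points[i-1][0], points[i][1] - points[i-1][1]
        bx, by = points[i+1][0] - points[i][0], points[i+1][1] - points[i][1]
        angles.append(math.atan2(ax * by - ay * bx, ax * bx + ay * by))
    return angles

def segment_distance(p, q):
    """L2 distance between turning-angle descriptors of two polylines
    with the same number of vertices."""
    a, b = turning_angles(p), turning_angles(q)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

An L-shaped segment and a rotated, translated, and scaled copy of it yield identical descriptors, so their distance is zero.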
Additionally, we present a method for shape recovery from multiple views. This semi-automatic method fits a template mesh to high-resolution normal data. In contrast to existing 3D reconstruction approaches, we accelerate the data acquisition time by omitting the structured-light scanning step of obtaining low-frequency 3D information.