Search CORE

6,109 research outputs found

Effective data reduction algorithm for topological data analysis

Author: Choi Seonmi
Oh Jinseok
Park Jeong Rye
Yang Seung Yeop
Yun Hongdae
Publication venue
Publication date: 23/06/2023
Field of study

One of the most interesting tools that have recently entered the data science toolbox is topological data analysis (TDA). With the explosion of available data sizes and dimensions, identifying and extracting the underlying structure of a given dataset is a fundamental challenge in data science, and TDA provides a methodology for analyzing the shape of a dataset using tools and prospects from algebraic topology. However, the computational complexity makes it quickly infeasible to process large datasets, especially those with high dimensions. Here, we introduce a preprocessing strategy called the Characteristic Lattice Algorithm (CLA), which allows users to reduce the size of a given dataset as desired while maintaining geometric and topological features in order to make the computation of TDA feasible or to shorten its computation time. In addition, we derive a stability theorem and an upper bound of the barcode errors for CLA based on the bottleneck distance.Comment: 13 pages, 10 figures, 2 table

arXiv.org e-Print Archive

Topologically faithful image segmentation via induced matching of persistence barcodes

Author: Bauer Ulrich
Menze Bjoern
Paetzold Johannes C.
Shit Suprosanna
Stucki Nico
Publication venue
Publication date: 28/11/2022
Field of study

Image segmentation is a largely researched field where neural networks find vast applications in many facets of technology. Some of the most popular approaches to train segmentation networks employ loss functions optimizing pixel-overlap, an objective that is insufficient for many segmentation tasks. In recent years, their limitations fueled a growing interest in topology-aware methods, which aim to recover the correct topology of the segmented structures. However, so far, none of the existing approaches achieve a spatially correct matching between the topological features of ground truth and prediction. In this work, we propose the first topologically and feature-wise accurate metric and loss function for supervised image segmentation, which we term Betti matching. We show how induced matchings guarantee the spatially correct matching between barcodes in a segmentation setting. Furthermore, we propose an efficient algorithm to compute the Betti matching of images. We show that the Betti matching error is an interpretable metric to evaluate the topological correctness of segmentations, which is more sensitive than the well-established Betti number error. Moreover, the differentiability of the Betti matching loss enables its use as a loss function. It improves the topological performance of segmentation networks across six diverse datasets while preserving the volumetric performance. Our code is available in https://github.com/nstucki/Betti-matching

arXiv.org e-Print Archive

パターン照合問題に対する高速なアルゴリズム

Author: Diptarama Hendrian
Publication venue
Publication date: 27/03/2018
Field of study

Tohoku University篠原歩課

Tohoku University Repository (TOUR) / 東北大学機関リポジトリ

A Geometric Perspective on Sparse Filtrations

Author: Cavanna Nicholas J.
Jahanseir Mahmoodreza
Sheehy Donald R.
Publication venue
Publication date: 11/06/2015
Field of study

We present a geometric perspective on sparse filtrations used in topological data analysis. This new perspective leads to much simpler proofs, while also being more general, applying equally to Rips filtrations and Cech filtrations for any convex metric. We also give an algorithm for finding the simplices in such a filtration and prove that the vertex removal can be implemented as a sequence of elementary edge collapses

arXiv.org e-Print Archive

CiteSeerX

Computing multiparameter persistent homology through a discrete Morse-based approach

Author: De Floriani L.
Iuricich F.
Landi C.
Scaramuccia S.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2020
Field of study

Persistent homology allows for tracking topological features, like loops, holes and their higher-dimensional analogues, along a single-parameter family of nested shapes. Computing descriptors for complex data characterized by multiple parameters is becoming a major challenging task in several applications, including physics, chemistry, medicine, and geography. Multiparameter persistent homology generalizes persistent homology to allow for the exploration and analysis of shapes endowed with multiple filtering functions. Still, computational constraints prevent multiparameter persistent homology to be a feasible tool for analyzing large size data sets. We consider discrete Morse theory as a strategy to reduce the computation of multiparameter persistent homology by working on a reduced dataset. We propose a new preprocessing algorithm, well suited for parallel and distributed implementations, and we provide the first evaluation of the impact of multiparameter persistent homology on computations

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

An Encoding for Order-Preserving Matching

Author: Gagie Travis
Manzini Giovanni
Venturini Rossano
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 25th Annual European Symposium on Algorithms (ESA 2017)
Publication date: 01/01/2017
Field of study

Encoding data structures store enough information to answer the queries they are meant to support but not enough to recover their underlying datasets. In this paper we give the first encoding data structure for the challenging problem of order-preserving pattern matching. This problem was introduced only a few years ago but has already attracted significant attention because of its applications in data analysis. Two strings are said to be an order-preserving match if the relative order of their characters is the same: e.g., (4, 1, 3, 2) and (10, 3, 7, 5) are an order-preserving match. We show how, given a string S[1..n] over an arbitrary alphabet of size sigma and a constant c >=1, we can build an O(n log log n)-bit encoding such that later, given a pattern P[1..m] with m >= log^c n, we can return the number of order-preserving occurrences of P in S in O(m) time. Within the same time bound we can also return the starting position of some order-preserving match for P in S (if such a match exists). We prove that our space bound is within a constant factor of optimal if log(sigma) = Omega(log log n); our query time is optimal if log(sigma) = Omega(log n). Our space bound contrasts with the Omega(n log n) bits needed in the worst case to store S itself, an index for order-preserving pattern matching with no restrictions on the pattern length, or an index for standard pattern matching even with restrictions on the pattern length. Moreover, we can build our encoding knowing only how each character compares to O(log^c n) neighbouring characters

arXiv.org e-Print Archive

Archivio della Ricerca - Università di Pisa

Dagstuhl Research Online Publication Server

Archivio Istituzionale della Ricerca- Università del Piemonte Orientale

4D Seismic History Matching Incorporating Unsupervised Learning

Author: Etienam Clement
Publication venue
Publication date: 01/01/2019
Field of study

The work discussed and presented in this paper focuses on the history matching of reservoirs by integrating 4D seismic data into the inversion process using machine learning techniques. A new integrated scheme for the reconstruction of petrophysical properties with a modified Ensemble Smoother with Multiple Data Assimilation (ES-MDA) in a synthetic reservoir is proposed. The permeability field inside the reservoir is parametrised with an unsupervised learning approach, namely K-means with Singular Value Decomposition (K-SVD). This is combined with the Orthogonal Matching Pursuit (OMP) technique which is very typical for sparsity promoting regularisation schemes. Moreover, seismic attributes, in particular, acoustic impedance, are parametrised with the Discrete Cosine Transform (DCT). This novel combination of techniques from machine learning, sparsity regularisation, seismic imaging and history matching aims to address the ill-posedness of the inversion of historical production data efficiently using ES-MDA. In the numerical experiments provided, I demonstrate that these sparse representations of the petrophysical properties and the seismic attributes enables to obtain better production data matches to the true production data and to quantify the propagating waterfront better compared to more traditional methods that do not use comparable parametrisation techniques

arXiv.org e-Print Archive

Crossref

The University of Manchester - Institutional Repository