74 research outputs found

    Persistent Homology Tools for Image Analysis

    Topological Data Analysis (TDA) is a new field of mathematics that has emerged rapidly since the first decade of the century from various works of algebraic topology and geometry. The goal of TDA and its main tool, persistent homology (PH), is to provide topological insight into complex and high-dimensional datasets. We take this premise on board to gain more topological insight from digital image analysis and to quantify tiny low-level distortions that are undetectable except possibly by highly trained persons. Such image distortion could be caused intentionally (e.g. by morphing and steganography) or arise naturally in scan images of abnormal human tissue or organs as a result of the onset of cancer or other diseases. The main objective of this thesis is to design new image analysis tools based on persistent homological invariants representing simplicial complexes on sets of pixel landmarks over a sequence of distance resolutions. We start by proposing innovative automatic techniques to select image pixel landmarks to build a variety of simplicial topologies from a single image. The effectiveness of each image landmark selection is demonstrated by testing on different image tampering problems such as morphed face detection, steganalysis and breast tumour detection. Vietoris-Rips simplicial complexes are constructed from the image landmarks at an increasing distance threshold, and topological (homological) features are computed at each threshold and summarized in a form known as persistent barcodes. We vectorise the space of persistent barcodes using a technique known as persistent binning, whose strength we demonstrate for various image analysis purposes. Different machine learning approaches are adopted to develop automatic detection of tiny texture distortion in many image analysis applications. The homological invariants used in this thesis are the 0- and 1-dimensional Betti numbers. 
We developed an innovative approach to design persistent homology (PH) based algorithms for automatic detection of the above-described types of image distortion. In particular, we developed the first PH-detector of morphing attacks on passport face biometric images. We shall demonstrate significant accuracy of 2 such morph detection algorithms with 4 types of automatically extracted image landmarks: Local Binary Patterns (LBP), 8-neighbour super-pixels (8NSP), Radial-LBP (R-LBP) and centre-symmetric LBP (CS-LBP). Using any of these techniques yields several persistent barcodes that summarise persistent topological features and help gain insight into complex hidden structures not amenable to other image analysis methods. We shall also demonstrate significant success of a similarly developed PH-based universal steganalysis tool capable of detecting secret messages hidden inside digital images. We also argue through a pilot study that building PH records from digital images can differentiate malignant breast tumours from benign ones using digital mammographic images. The research presented in this thesis creates new opportunities to build real applications based on TDA and demonstrates many research challenges in a variety of image processing/analysis tasks. For example, we describe a TDA-based exemplar image inpainting technique (TEBI), superior to the existing exemplar algorithm, for the reconstruction of missing image regions.
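
The 0-dimensional part of the pipeline described in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `betti0_curve`, the toy landmark coordinates and the choice of thresholds are all hypothetical, and the sketch assumes the convention that two landmarks are joined once their Euclidean distance is at most the threshold. Reading off the Betti-0 count at a fixed list of thresholds is one plausible way to obtain a fixed-length feature vector; the exact definition of persistent binning used in the thesis may differ.

```python
import math
from itertools import combinations


def betti0_curve(points, thresholds):
    """Count connected components (Betti-0) of the Vietoris-Rips complex
    at each threshold, returned in increasing threshold order.

    Assumption: an edge joins two points when their Euclidean distance
    is <= the threshold.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    components = n
    curve = []
    k = 0
    for t in sorted(thresholds):
        # Add every edge that appears at or below this threshold.
        while k < len(edges) and edges[k][0] <= t:
            _, i, j = edges[k]
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                components -= 1
            k += 1
        curve.append(components)
    return curve


# Hypothetical landmarks: two well-separated clusters of pixels.
landmarks = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
features = betti0_curve(landmarks, [0.5, 1.0, 2.0, 20.0])
print(features)  # [5, 2, 2, 1]
```

The resulting list could be passed to any standard classifier as a feature vector. Computing 1-dimensional Betti numbers requires a full persistent homology computation and is left out of this sketch.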

    Multi-site, Multi-domain Airway Tree Modeling (ATM'22): A Public Benchmark for Pulmonary Airway Segmentation

    Open international challenges are becoming the de facto standard for assessing computer vision and image analysis algorithms. In recent years, new methods have extended the reach of pulmonary airway segmentation closer to the limit of image resolution. Since the EXACT'09 pulmonary airway segmentation challenge, limited effort has been directed to the quantitative comparison of newly emerged algorithms, driven by the maturity of deep learning based approaches and the clinical drive for resolving finer details of distal airways for early intervention in pulmonary diseases. Thus far, public annotated datasets are extremely limited, hindering the development of data-driven methods and detailed performance evaluation of new algorithms. To provide a benchmark for the medical imaging community, we organized the Multi-site, Multi-domain Airway Tree Modeling (ATM'22) challenge, which was held as an official challenge event during the MICCAI 2022 conference. ATM'22 provides large-scale CT scans with detailed pulmonary airway annotation, including 500 CT scans (300 for training, 50 for validation, and 150 for testing). The dataset was collected from different sites and further includes a portion of noisy COVID-19 CTs with ground-glass opacity and consolidation. Twenty-three teams participated in the entire phase of the challenge, and the algorithms of the top ten teams are reviewed in this paper. Quantitative and qualitative results revealed that deep learning models embedded with topological continuity enhancement achieved superior performance in general. The ATM'22 challenge follows an open-call design: the training data and the gold standard evaluation are available upon successful registration via its homepage. Comment: 32 pages, 16 figures. Homepage: https://atm22.grand-challenge.org/. Submitte

    Topological Data Analysis of Weight Spaces in Convolutional Neural Networks

    Convolutional Neural Networks (CNNs) have become one of the most commonly used tools for performing image classification. Unfortunately, as with most machine learning algorithms, CNNs suffer from a lack of interpretability. CNNs are trained by using a training data set and a loss function to tune a set of parameters known as the layer weights. This tuning process is based on the classical method of gradient descent, but it relies on a strong stochastic component, which makes the weight behavior during training difficult to understand. However, since CNNs are governed largely by the weights that make up each of the layers, if one can gain an understanding of the space in which these weights lie, then much can be learned about the structure of the CNN and how it calculates its output. Topological Data Analysis (TDA) is a recent addition to the field of data science, which uses ideas from geometry and algebraic topology to create a novel methodology for analyzing high-dimensional datasets. Specifically, TDA offers a mathematically rigorous method for studying the structure of CNN weight spaces. In this thesis, we use TDA to study the weights of a binary classification CNN model trained on a large dataset known as Dogs vs Cats. Our analysis reveals that, during training, the 3x3 convolutional filter weights of the CNN model in question exhibit non-trivial homological properties. Namely, persistent 1-cycles occur within the first homology groups. This structure is similar to the structure that is found in 3x3 high-variance image patches of natural images, demonstrating that a CNN built on this data set learns features of the ambient structure of the image data. This demonstrates the validity of the CNN and, along with work done by Carlsson and Gabrielsson, furthers the hypothesis that convolutional layer weights arising from training a CNN on natural image data lie on a space with non-trivial geometry, in particular a non-empty first homology group

    Piecewise Linear Manifold Clustering

    This work studies the application of topological analysis to non-linear manifold clustering. A novel method that exploits the clustering structure of the data generates a topological representation of the point dataset. An analysis of the topological construction under different simulated conditions is performed to explore the capabilities and limitations of the method, and demonstrates statistically significant improvements in performance. Furthermore, we introduce a new information-theoretical validation measure for clustering that exploits geometrical properties of clusters to estimate clustering compressibility, for evaluating the clustering goodness-of-fit without any prior information about true class assignments. We show how the new validation measure, when used as a regularization criterion, allows the creation of clusters that are more informative. A final contribution is a new metaclustering technique that allows the creation of a model-based clustering beyond point and linear shaped structures. Driven by topological structure and our information-theoretical criteria, this technique provides a structured view of the data at a new level of comprehension and interpretation. Improvements of our clustering approach are demonstrated on a variety of synthetic and real datasets, including image and climatological data

    Advances of Machine Learning in Materials Science: Ideas and Techniques

    In this big data era, the use of large datasets in conjunction with machine learning (ML) has become increasingly popular in both industry and academia. In recent times, the field of materials science has also been undergoing a big data revolution, with large databases and repositories appearing everywhere. Traditionally, materials science is a trial-and-error field, on both the computational and experimental sides. With the advent of machine learning-based techniques, there has been a paradigm shift: materials can now be screened quickly using ML models and even generated based on materials with similar properties; ML has also quietly infiltrated many sub-disciplines of materials science. However, ML remains relatively new to the field and is expanding quickly. There is a plethora of readily available big data architectures and an abundance of ML models and software; the call to integrate all these elements in a comprehensive research procedure is becoming an important direction of materials science research. In this review, we attempt to provide an introduction to and reference on ML for materials scientists, covering as many of the commonly used methods and applications as possible, and discussing the future possibilities. Comment: 80 pages; 22 figures. To be published in Frontiers of Physics, 18, xxxxx, (2023

    New Directions for Contact Integrators

    Contact integrators are a family of geometric numerical schemes which guarantee the conservation of the contact structure. In this work we review the construction of both the variational and Hamiltonian versions of these methods. We illustrate some of the advantages of geometric integration in the dissipative setting by focusing on models inspired by recent studies in celestial mechanics and cosmology. Comment: To appear as Chapter 24 in GSI 2021, Springer LNCS 1282

    Q(sqrt(-3))-Integral Points on a Mordell Curve

    We use an extension of quadratic Chabauty to number fields, recently developed by the author with Balakrishnan, Besser and Müller, combined with a sieving technique, to determine the integral points over Q(√−3) on the Mordell curve y^2 = x^3 − 4
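
For orientation only, the curve equation can be explored with a brute-force search over ordinary integers. This is a toy sketch, not the quadratic Chabauty or sieving method of the paper: it covers only points with coordinates in Z inside a bounded range, not the integral points over Q(√−3), and the function name `integer_points` and the search bound are hypothetical.

```python
from math import isqrt


def integer_points(bound):
    """Return all (x, y) in Z^2 with |x| <= bound and y^2 = x^3 - 4."""
    points = []
    for x in range(-bound, bound + 1):
        rhs = x**3 - 4
        if rhs < 0:
            continue  # y^2 cannot be negative
        y = isqrt(rhs)
        if y * y == rhs:
            points.append((x, y))
            if y != 0:
                points.append((x, -y))
    return points


print(integer_points(1000))  # [(2, 2), (2, -2), (5, 11), (5, -11)]
```

A bounded search of this kind can only find points, never prove that the list is complete, which is why methods such as the one in the abstract are needed.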

    Computational and Theoretical Issues of Multiparameter Persistent Homology for Data Analysis

    The basic goal of topological data analysis is to apply topology-based descriptors to understand and describe the shape of data. In this context, homology is one of the most relevant topological descriptors, well-appreciated for its discrete nature, computability and dimension independence. A further development is provided by persistent homology, which allows one to track homological features along a one-parameter increasing sequence of spaces. Multiparameter persistent homology, also called multipersistent homology, is an extension of the theory of persistent homology motivated by the need to analyze data naturally described by several parameters, such as vector-valued functions. Multipersistent homology presents several issues in terms of the feasibility of computations over real-sized data and theoretical challenges in the evaluation of possible descriptors. The focus of this thesis is on the interplay between persistent homology theory and discrete Morse theory. Discrete Morse theory provides methods for reducing the computational cost of homology and persistent homology by considering the discrete Morse complex generated by the discrete Morse gradient in place of the original complex. The work of this thesis addresses the problem of computing multipersistent homology, to make such a tool usable in real application domains. This requires both computational optimizations for application to real-world data and theoretical insights for finding and interpreting suitable descriptors. Our computational contribution consists in proposing a new Morse-inspired and fully discrete preprocessing algorithm. We show the feasibility of our preprocessing over real datasets, and evaluate the impact of the proposed algorithm as a preprocessing step for computing multipersistent homology. A theoretical contribution of this thesis consists in proposing a new notion of optimality for such a preprocessing in the multiparameter context. 
We show that the proposed notion generalizes an already known optimality notion from the one-parameter case. Under this definition, we show that the algorithm we propose as a preprocessing step is optimal in low-dimensional domains. In the last part of the thesis, we consider preliminary applications of the proposed algorithm in the context of topology-based multivariate visualization by tracking critical features generated by a discrete gradient field compatible with the multiple scalar fields under study. We discuss the (dis)similarities of such critical features with state-of-the-art techniques in topology-based multivariate data visualization.