74 research outputs found
Persistent Homology Tools for Image Analysis
Topological Data Analysis (TDA) is a new field of mathematics that has emerged rapidly since the first decade of the century from various works in algebraic topology and
geometry. The goal of TDA and its main tool, persistent homology (PH), is to provide topological insight into complex and high-dimensional datasets. We take this
premise on board to gain more topological insight from digital image analysis and to quantify tiny low-level distortions that are undetectable except possibly by highly trained persons. Such image distortion could be caused intentionally (e.g. by morphing and steganography) or could arise naturally in scan images of abnormal human tissue or organs as a result of the onset of cancer or other diseases.
The main objective of this thesis is to design new image analysis tools based on persistent homological invariants representing simplicial complexes on sets of pixel landmarks over a sequence of distance resolutions. We start by proposing innovative automatic techniques to select image pixel landmarks to build a variety of
simplicial topologies from a single image. The effectiveness of each image landmark selection technique is demonstrated by testing on different image tampering problems such as morphed face detection, steganalysis and breast tumour detection.
Vietoris-Rips simplicial complexes are constructed on the image landmarks at an increasing distance threshold, and topological (homological) features are computed at each threshold and summarised in a form known as persistent barcodes. We vectorise the space of persistent barcodes using a technique known as persistent binning, and demonstrate its strength for various image analysis purposes. Different machine learning approaches are adopted to develop automatic detection of tiny
texture distortion in many image analysis applications. The homological invariants used in this thesis are the 0- and 1-dimensional Betti numbers. We developed an innovative approach to design persistent homology (PH) based
algorithms for automatic detection of the above-described types of image distortion. In particular, we developed the first PH-detector of morphing attacks on passport face biometric images. We shall demonstrate significant accuracy of two such morph detection algorithms with four types of automatically extracted image landmarks: Local Binary Patterns (LBP), 8-neighbour super-pixels (8NSP), Radial-LBP (R-LBP) and centre-symmetric LBP (CS-LBP). Using any of these techniques yields several persistent barcodes that summarise persistent topological features and help gain insight into complex hidden structures not amenable to other image analysis methods. We shall also demonstrate significant success of a similarly developed PH-based universal steganalysis tool capable of detecting secret messages hidden inside digital images. We also argue through a pilot study that building PH records from digital images can differentiate malignant breast tumours from benign tumours using digital mammographic images. The research presented in this thesis creates new opportunities to build real applications based on TDA and demonstrates many research challenges in a variety of image processing/analysis tasks. For example, we describe a TDA-based exemplar image inpainting technique (TEBI), superior to the existing exemplar algorithm, for the reconstruction of missing image regions.
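The pipeline described in this abstract (landmarks, Vietoris-Rips filtration, barcode, vectorisation) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the landmark coordinates are made up, only the 0-dimensional barcode is computed (through a union-find pass over sorted pairwise distances), and `bin_barcode` assumes that "persistent binning" means counting the bars alive at each chosen threshold, which may differ from the author's exact definition.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """0-dimensional Vietoris-Rips barcode via a Kruskal-style union-find."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Find the component representative, with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    bars = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, d))  # one connected component dies at distance d
    bars.append((0.0, math.inf))  # the last component never dies
    return bars


def bin_barcode(bars, thresholds):
    """Count the bars alive at each threshold t (birth <= t < death)."""
    return [sum(1 for b, d in bars if b <= t < d) for t in thresholds]


# Hypothetical pixel landmarks: two well-separated groups of points.
landmarks = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
bars = h0_barcode(landmarks)
vector = bin_barcode(bars, [0.5, 1.5, 20.0])
```

The resulting `vector` is a fixed-length feature (here, the Betti-0 count at three thresholds) of the kind that could be passed to a standard classifier. A 1-dimensional barcode would require a full persistent homology library and is left out of this sketch.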
Multi-site, Multi-domain Airway Tree Modeling (ATM'22): A Public Benchmark for Pulmonary Airway Segmentation
Open international challenges are becoming the de facto standard for
assessing computer vision and image analysis algorithms. In recent years, new
methods have extended the reach of pulmonary airway segmentation closer
to the limit of image resolution. Since the EXACT'09 pulmonary airway segmentation challenge,
limited effort has been directed to quantitative comparison of newly emerged
algorithms driven by the maturity of deep learning based approaches and
clinical drive for resolving finer details of distal airways for early
intervention of pulmonary diseases. Thus far, public annotated datasets are
extremely limited, hindering the development of data-driven methods and
detailed performance evaluation of new algorithms. To provide a benchmark for
the medical imaging community, we organized the Multi-site, Multi-domain Airway
Tree Modeling (ATM'22), which was held as an official challenge event during
the MICCAI 2022 conference. ATM'22 provides large-scale CT scans with detailed
pulmonary airway annotation, including 500 CT scans (300 for training, 50 for
validation, and 150 for testing). The dataset was collected from different
sites and it further included a portion of noisy COVID-19 CTs with ground-glass
opacity and consolidation. Twenty-three teams participated in the entire phase
of the challenge and the algorithms for the top ten teams are reviewed in this
paper. Quantitative and qualitative results revealed that deep learning models
embedded with the topological continuity enhancement achieved superior
performance in general. The ATM'22 challenge has an open-call design; the
training data and the gold standard evaluation are available upon successful
registration via its homepage.
Comment: 32 pages, 16 figures. Homepage: https://atm22.grand-challenge.org/.
Submitte
Topological Data Analysis of Weight Spaces in Convolutional Neural Networks
Convolutional Neural Networks (CNNs) have become one of the most commonly used tools for performing image classification. Unfortunately, as with most machine learning algorithms, CNNs suffer from a lack of interpretability. CNNs are trained by using a training data set and a loss function to tune a set of parameters known as the layer weights. This tuning process is based on the classical method of gradient descent, but it relies on a strong stochastic component, which makes the weight behavior during training difficult to understand. However, since CNNs are governed largely by the weights that make up each of the layers, if one can gain an understanding of the space in which these weights lie, then much can be learned about the structure of the CNN and how it calculates its output. Topological Data Analysis (TDA) is a recent addition to the field of data science, which uses ideas from geometry and algebraic topology to create a novel methodology for analyzing high-dimensional datasets. Specifically, TDA offers a mathematically rigorous method for studying the structure of CNN weight spaces. In this thesis, we use TDA to study the weights of a binary classification CNN model trained on a large dataset known as Dogs vs Cats. Our analysis reveals that, during training, the 3x3 convolutional filter weights of the CNN model in question exhibit non-trivial homological properties. Namely, persistent 1-cycles occur within the first homology groups. This structure is similar to the structure that is found in 3x3 high-variance image patches of natural images, demonstrating that a CNN built on this data set learns features of the ambient structure of the image data. This demonstrates the validity of the CNN and, along with work done by Carlsson and Gabrielsson, furthers the hypothesis that convolutional layer weights arising from training a CNN on natural image data lie on a space with non-trivial geometry, in particular a non-empty first homology group
Piecewise Linear Manifold Clustering
This work studies the application of topological analysis to non-linear manifold clustering. A novel method that exploits the clustering structure of the data makes it possible to generate a topological representation of the point dataset. An analysis of the topological construction under different simulated conditions is performed to explore the capabilities and limitations of the method, and demonstrates statistically significant improvements in performance. Furthermore, we introduce a new information-theoretical validation measure for clustering that exploits geometrical properties of clusters to estimate clustering compressibility, for evaluation of the clustering goodness-of-fit without any prior information about true class assignments. We show how the new validation measure, when used as a regularization criterion, allows the creation of clusters that are more informative. A final contribution is a new metaclustering technique that makes it possible to create a model-based clustering beyond point and linear shaped structures. Driven by topological structure and our information-theoretical criteria, this technique provides a structured view of the data at a new level of comprehensiveness and interpretability. Improvements of our clustering approach are demonstrated on a variety of synthetic and real datasets, including image and climatological data.
Advances of Machine Learning in Materials Science: Ideas and Techniques
In this big data era, the use of large datasets in conjunction with machine
learning (ML) has become increasingly popular in both industry and academia. In
recent times, the field of materials science has also been undergoing a big data
revolution, with large databases and repositories appearing everywhere.
Traditionally, materials science is a trial-and-error field, in both the
computational and experimental departments. With the advent of machine
learning-based techniques, there has been a paradigm shift: materials can now
be screened quickly using ML models and even generated based on materials with
similar properties; ML has also quietly infiltrated many sub-disciplines of
materials science. However, ML remains relatively new to the field and is
expanding quickly. There is a plethora of readily available big data
architectures and an abundance of ML models and software; the call to integrate
all these elements in a comprehensive research procedure is becoming an
important direction of materials science research. In this review, we attempt to
provide an introduction and reference of ML to materials scientists, covering
as much as possible the commonly used methods and applications, and discussing
the future possibilities.
Comment: 80 pages; 22 figures. To be published in Frontiers of Physics, 18,
xxxxx, (2023
New Directions for Contact Integrators
Contact integrators are a family of geometric numerical schemes which
guarantee the conservation of the contact structure. In this work we review the
construction of both the variational and Hamiltonian versions of these methods.
We illustrate some of the advantages of geometric integration in the
dissipative setting by focusing on models inspired by recent studies in
celestial mechanics and cosmology.
Comment: To appear as Chapter 24 in GSI 2021, Springer LNCS 1282
Q(sqrt(-3))-Integral Points on a Mordell Curve
We use an extension of quadratic Chabauty to number fields, recently developed by the author with Balakrishnan, Besser and Müller, combined with a sieving technique, to determine the integral points over Q(√−3) on the Mordell curve y^2 = x^3 − 4.
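As a small illustration of what an integral point on this curve is, the Python sketch below searches by brute force for ordinary integer solutions of y^2 = x^3 − 4 within a bound. This is not quadratic Chabauty and does not cover the larger ring of integers of Q(√−3) treated in the abstract; the function name `integer_points` and the search bound are illustrative assumptions.

```python
from math import isqrt


def integer_points(bound):
    """Return integer pairs (x, y) with y^2 = x^3 - 4 and |x| <= bound."""
    points = []
    for x in range(-bound, bound + 1):
        rhs = x**3 - 4
        if rhs < 0:
            continue  # a negative right-hand side cannot be a square
        y = isqrt(rhs)
        if y * y == rhs:
            points.append((x, y))
            if y != 0:
                points.append((x, -y))
    return points


found = integer_points(100)
```

A finite search of this kind can only exhibit points; showing that the list is complete over a number field is the part that needs the methods named in the abstract.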
Computational and Theoretical Issues of Multiparameter Persistent Homology for Data Analysis
The basic goal of topological data analysis is to apply topology-based descriptors
to understand and describe the shape of data. In this context, homology is one of
the most relevant topological descriptors, well-appreciated for its discrete nature,
computability and dimension independence. A further development is provided
by persistent homology, which allows one to track homological features along a one-parameter
increasing sequence of spaces. Multiparameter persistent homology, also
called multipersistent homology, is an extension of the theory of persistent homology
motivated by the need of analyzing data naturally described by several parameters,
such as vector-valued functions. Multipersistent homology presents several issues in
terms of feasibility of computations over real-sized data and theoretical challenges
in the evaluation of possible descriptors. The focus of this thesis is on the interplay
between persistent homology theory and discrete Morse theory. Discrete Morse
theory provides methods for reducing the computational cost of homology and persistent
homology by considering the discrete Morse complex generated by the discrete
Morse gradient in place of the original complex. The work of this thesis addresses
the problem of computing multipersistent homology, to make such a tool usable in real
application domains. This requires both computational optimizations towards the
applications to real-world data, and theoretical insights for finding and interpreting
suitable descriptors. Our computational contribution consists in proposing a new
Morse-inspired and fully discrete preprocessing algorithm. We show the feasibility
of our preprocessing over real datasets, and evaluate the impact of the proposed
algorithm as a preprocessing for computing multipersistent homology. A theoretical
contribution of this thesis consists in proposing a new notion of optimality for such
a preprocessing in the multiparameter context. We show that the proposed notion
generalizes an already known optimality notion from the one-parameter case. Under
this definition, we show that the algorithm we propose as a preprocessing is optimal
in low dimensional domains. In the last part of the thesis, we consider preliminary
applications of the proposed algorithm in the context of topology-based multivariate
visualization by tracking critical features generated by a discrete gradient field compatible
with the multiple scalar fields under study. We discuss (dis)similarities of such
critical features with the state-of-the-art techniques in topology-based multivariate
data visualization
- …