    Visualising Basins of Attraction for the Cross-Entropy and the Squared Error Neural Network Loss Functions

    Quantification of the stationary points and the associated basins of attraction of neural network loss surfaces is an important step towards a better understanding of neural network loss surfaces at large. This work proposes a novel method to visualise basins of attraction together with the associated stationary points via gradient-based random sampling. The proposed technique is used to perform an empirical study of the loss surfaces generated by two different error metrics: quadratic loss and entropic loss. The empirical observations confirm the theoretical hypothesis regarding the nature of neural network attraction basins. Entropic loss is shown to exhibit stronger gradients and fewer stationary points than quadratic loss, indicating that entropic loss has a more searchable landscape. Quadratic loss is shown to be more resilient to overfitting than entropic loss. Both losses are shown to exhibit local minima, but the number of local minima is shown to decrease with an increase in dimensionality. Thus, the proposed visualisation technique successfully captures the local minima properties exhibited by the neural network loss surfaces, and can be used for the purpose of fitness landscape analysis of neural networks.

    Fitness Landscape Analysis of Feed-Forward Neural Networks

    Neural network training is a highly non-convex optimisation problem with poorly understood properties. Due to the inherent high dimensionality, neural network search spaces cannot be intuitively visualised, thus other means to establish search space properties have to be employed. Fitness landscape analysis encompasses a selection of techniques designed to estimate the properties of a search landscape associated with an optimisation problem. Applied to neural network training, fitness landscape analysis can be used to establish a link between the properties of the error landscape and various neural network hyperparameters. This study applies fitness landscape analysis to investigate the influence of the search space boundaries, regularisation parameters, loss functions, activation functions, and feed-forward neural network architectures on the properties of the resulting error landscape. A novel gradient-based sampling technique is proposed, together with a novel method to quantify and visualise stationary points and the associated basins of attraction in neural network error landscapes.

    Mixed Order Hyper-Networks for Function Approximation and Optimisation

    Many systems take inputs, which can be measured and sometimes controlled, and produce outputs, which can also be measured and which depend on the inputs. Taking numerous measurements from such systems produces data, which may be used either to model the system with the goal of predicting the output associated with a given input (function approximation, or regression) or to find the input settings required to produce a desired output (optimisation, or search). Approximating or optimising a function is central to the field of computational intelligence. There are many existing methods for performing regression and optimisation based on samples of data, but they all have limitations. Multilayer perceptrons (MLPs) are universal approximators, but they suffer from the black box problem, which means their structure and the function they implement are opaque to the user. They also suffer from a propensity to become trapped in local minima or large plateaux in the error function during learning. A regression method with a structure that allows models to be compared, human knowledge to be extracted, optimisation searches to be guided and model complexity to be controlled is desirable. This thesis presents such a method in the form of a single framework for both regression and optimisation: the mixed order hyper network (MOHN). A MOHN implements a function f: {-1,1}^n -> R to arbitrary precision. The structure of a MOHN makes explicit the ways in which input variables interact to determine the function output, which allows human insights and complexity control that are very difficult in neural networks with hidden units. The explicit structure representation also allows efficient algorithms for searching for an input pattern that leads to a desired output. A number of learning rules for estimating the weights based on a sample of data are presented, along with a heuristic method for choosing which connections to include in a model. Several methods for searching a MOHN for inputs that lead to a desired output are compared. Experiments compare a MOHN to an MLP on regression tasks. The MOHN is found to achieve a comparable level of accuracy to an MLP but suffers less from local minima in the error function and shows less variance across multiple training trials. It is also easier to interpret and to combine into an ensemble. The trade-off between the fit of a model to its training data and that to an independent set of test data is shown to be easier to control in a MOHN than in an MLP. A MOHN is also compared to a number of existing optimisation methods, including those using estimation of distribution algorithms, genetic algorithms and simulated annealing. The MOHN is able to find optimal solutions in far fewer function evaluations than these methods on tasks selected from the literature.
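
The abstract above states that a MOHN implements a function f: {-1,1}^n -> R with an explicit interaction structure, but it does not give the functional form. As a rough illustration only, the sketch below assumes a weighted sum of products of input variables (a Walsh-style expansion), which is known to represent any function on {-1,1}^n exactly. The names `walsh_weights`, `evaluate` and `target` are hypothetical, and the exhaustive weight computation is a stand-in for the thesis's learning rules, which the abstract does not describe.

```python
from itertools import combinations, product
from math import prod


def walsh_weights(f, n):
    """Return weights w_S such that f(x) = sum_S w_S * prod_{i in S} x_i.

    Assumption: exhaustive enumeration of all 2^n inputs, feasible only
    for small n. This is not the learning rule used in the thesis.
    """
    inputs = list(product((-1, 1), repeat=n))
    weights = {}
    for order in range(n + 1):
        for subset in combinations(range(n), order):
            # Correlate f with the product of the variables in this subset.
            total = sum(f(x) * prod(x[i] for i in subset) for x in inputs)
            weights[subset] = total / len(inputs)
    return weights


def evaluate(weights, x):
    """Evaluate the weighted sum of products at input x."""
    return sum(w * prod(x[i] for i in subset) for subset, w in weights.items())


def target(x):
    # Hypothetical target: one first-order term, one second-order
    # interaction, and a constant.
    return 2.0 * x[0] - 1.5 * x[1] * x[2] + 0.5


weights = walsh_weights(target, 3)
# Keep only the connections with non-zero weight; these expose which
# variables interact.
active = {s: w for s, w in weights.items() if abs(w) > 1e-12}
print(active)
```

In this toy case the non-zero weights recover the constant, the x0 term and the x1*x2 interaction directly, which is the kind of readable structure the abstract contrasts with hidden-unit networks.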

    Physics-based Machine Learning Approaches to Complex Systems and Climate Analysis

    Complex systems such as the Earth's climate are composed of many constituents that are interlinked through an intricate coupling structure. For the analysis of such systems it therefore seems natural to bring together methods from network theory, dynamical systems theory and machine learning. By combining different concepts from these fields, three novel approaches for the study of complex systems are considered throughout this thesis. In the first part, a novel complex network construction method is introduced that is able to identify the most important wind paths of the South American Monsoon system. Aside from the importance of cross-equatorial flows, this analysis points to the impact that Rossby wave trains have on both the precipitation and the low-level circulation. This connection is then further explored by showing that the precipitation is phase coherent with the Rossby waves. As such, the first part of this thesis demonstrates how complex networks can be used to identify spatiotemporal variability patterns within large amounts of data, which are then further analysed with methods from nonlinear dynamics. Most complex systems exhibit a large number of possible asymptotic states. To investigate and track such states, Monte Carlo Basin Bifurcation analysis (MCBB), a novel numerical method, is introduced in the second part. Situated between the classical analysis with macroscopic order parameters and a more thorough, detailed bifurcation analysis, MCBB combines random sampling with clustering methods to identify and characterise the different asymptotic states and their basins of attraction. Forecasts of complex systems are the next logical step. When making them, it is not always straightforward how prior knowledge can be integrated into data-driven methods. One possibility is to use Neural Partial Differential Equations. In the last part of the thesis, it is demonstrated how high-dimensional spatiotemporally chaotic systems can be modelled and predicted with such an approach.
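
The idea of combining random sampling with clustering to find asymptotic states and their basins can be illustrated on a toy problem. The sketch below is not the MCBB method itself: it assumes a one-dimensional bistable system dx/dt = x - x^3, a fixed-step Euler integrator, and a crude gap-based grouping of final states in place of a real clustering algorithm. All names and parameter values are hypothetical.

```python
import random


def integrate(x0, steps=3000, dt=0.01):
    """Euler-integrate dx/dt = x - x**3 from x0 and return the final state."""
    x = x0
    for _ in range(steps):
        x += dt * (x - x ** 3)
    return x


def basin_fractions(n_samples=300, low=-2.0, high=2.0, tol=1e-3, seed=0):
    """Estimate asymptotic states and the share of samples reaching each.

    Returns a list of (cluster_centre, fraction_of_samples) pairs.
    """
    rng = random.Random(seed)
    finals = sorted(integrate(rng.uniform(low, high)) for _ in range(n_samples))
    # Crude 1-D clustering: start a new cluster whenever the gap to the
    # previous final state exceeds tol.
    clusters = []
    for x in finals:
        if clusters and abs(x - clusters[-1][-1]) <= tol:
            clusters[-1].append(x)
        else:
            clusters.append([x])
    return [(sum(c) / len(c), len(c) / n_samples) for c in clusters]


result = basin_fractions()
for centre, fraction in result:
    print(f"state near {centre:+.3f}: basin fraction {fraction:.2f}")
```

Each fraction is a Monte Carlo estimate of the relative basin size within the sampled interval. For this symmetric toy system the two basins should be of roughly equal size.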

    Statistical and deep learning methods for geoscience problems

    Machine learning is the new frontier for technology development in geosciences and has developed extremely fast in the past decade. With the increased compute power provided by distributed computing and Graphics Processing Units (GPUs), and their exploitation by machine learning (ML) frameworks such as Keras, PyTorch, and TensorFlow, ML algorithms can now solve complex scientific problems. Although powerful, ML algorithms need to be applied to suitable problems that are conditioned for optimal results. For this reason, applying ML algorithms requires a deep understanding not only of the problem but also of the algorithm’s capabilities. In this dissertation, I show that simple statistical techniques can often outperform ML-based models if applied correctly, and I show the success of deep learning in addressing two difficult problems. In the first application, I use deep learning to auto-detect leaks in a carbon capture project using pressure field data acquired from the DOE Cranfield site in Mississippi. I use the history of pressure, rates, and cumulative injection volumes to detect leaks as pressure anomalies. In the second application, I use a different deep learning workflow to forecast high-energy electrons in Earth’s outer radiation belt using in situ measurements of different space weather parameters such as solar wind density and pressure. I focus on predicting electron fluxes of 2 MeV and higher energy, and introduce an ensemble of deep learning models to further improve the results compared with a single deep learning architecture. I also show an example where a carefully constructed statistical approach, guided by the human interpreter, outperforms deep learning algorithms implemented by others. Here, the goal is to correlate multiple well logs across a survey area in order not only to map the thickness but also to characterize the behavior of stacked gamma ray parasequence sets. Using tools including maximum likelihood estimation (MLE) and dynamic time warping (DTW) provides a means of generating quantitative maps of upward fining and upward coarsening across the oil field. The ultimate goal is to link such extensive well control with the spectral attribute signature of 3D seismic data volumes to provide detailed maps of the depositional history, as well as insight into the lateral and vertical variation of mineralogy important to the effective completion of shale resource plays.
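
The abstract names dynamic time warping as one of the tools for correlating well logs. For readers unfamiliar with it, here is a minimal sketch of the textbook DTW distance between two one-dimensional sequences. It is not the dissertation's workflow: the absolute-difference cost, the function name `dtw_distance` and the sample log values are all assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance with an absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] also matches the previous b sample
                cost[i][j - 1],      # b[j-1] also matches the previous a sample
                cost[i - 1][j - 1],  # advance in both sequences
            )
    return cost[n][m]


# Hypothetical gamma ray readings from two wells; the second log repeats
# one sample, as if the same layer were thicker there.
log_a = [1.0, 2.0, 3.0]
log_b = [1.0, 2.0, 2.0, 3.0]
print(dtw_distance(log_a, log_b))
```

Because DTW allows one sample to match several samples in the other sequence, two logs that record the same succession with different layer thicknesses can still align at low cost, which is what makes it a natural fit for well-to-well correlation.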

    Benign Overfitting in Multiclass Classification: All Roads Lead to Interpolation

    The growing literature on "benign overfitting" in overparameterized models has been mostly restricted to regression or binary classification settings; however, most success stories of modern machine learning have been recorded in multiclass settings. Motivated by this discrepancy, we study benign overfitting in multiclass linear classification. Specifically, we consider the following popular training algorithms on separable data: (i) empirical risk minimization (ERM) with cross-entropy loss, which converges to the multiclass support vector machine (SVM) solution; (ii) ERM with least-squares loss, which converges to the min-norm interpolating (MNI) solution; and (iii) the one-vs-all SVM classifier. First, we provide a simple sufficient condition under which all three algorithms lead to classifiers that interpolate the training data and have equal accuracy. When the data is generated from Gaussian mixtures or a multinomial logistic model, this condition holds under high enough effective overparameterization. Second, we derive novel error bounds on the accuracy of the MNI classifier, thereby showing that all three training algorithms lead to benign overfitting under sufficient overparameterization. Ultimately, our analysis shows that good generalization is possible for SVM solutions beyond the realm in which typical margin-based bounds apply.

    Understanding the Problem Structure of Optimisation Problems in Water Resources

    Optimisation algorithms are widely used in water resources to identify the optimal solutions for problems with multiple possible solutions. Many studies in this field focus on the development and application of advanced optimisation algorithms, making significant contributions to improving optimisation performance. However, the performance of optimisation algorithms also depends on the features of the problems being solved, so selecting appropriate algorithms for the problems at hand is also key to successful optimisation. Although a number of metrics have been developed to assess these features, they have not been applied to problems in the water resources field. The primary reason for this is that the computational cost associated with the calculation of many of these metrics increases significantly with problem size, making them unsuitable for problems in water resources. Consequently, there is a lack of knowledge about the features of problems in the water resources field. This PhD thesis aims to understand the features of problems in water resources, and the process can be split into two stages. The first stage is to identify metrics that can be applied at an affordable computational cost. This is addressed in the first content chapter (Paper 1). The second stage is to apply the metrics identified in the first stage to understand the features of problems in the water resources field, including the calibration of artificial neural network models (Paper 2) and conceptual rainfall runoff models (Paper 3). This includes understanding the optimisation difficulty of these problems according to their features, and how their features change with their problem structure and the types of problems to which they are applied. 
In the first paper, the computational cost of fitness landscape metrics (exploratory landscape analysis (ELA) metrics) used in computer science is tested, and metrics that are suitable for application to water resources problems are identified. Each metric used to understand the features of problems requires a given number of samples, which usually increases with problem size (dimensionality). Consequently, metrics whose required sample size grows sharply with problem size are not suitable for real-world water resources problems. To identify ELA metrics that have low dependence on problem size, 110 metrics in total are tested on a range of benchmark functions and a number of environmental modelling problems, and 28 are found to be applicable to complex problems without a significant increase in computational cost. This finding provides a new approach to better understand the problem structure of optimisation problems in water resources and has the potential to provide guidance in optimisation algorithm selection for problems in the water resources field. In the second paper, the metrics identified in the first paper as having low dependence on problem size are applied to Artificial Neural Network (ANN) model calibration problems. ANN models for different environmental problems with different numbers of inputs and hidden nodes are used in the test. The environmental problems considered include Kentucky River Catchment Rainfall‐Runoff Data (USA), Murray River Salinity Data (Australia), Myponga Water Distribution System Chlorine Data (Australia), and South Australian Surface Water Turbidity Data (Australia). It is demonstrated that ELA metrics can be used successfully to characterize the features of the error surfaces of ANN models, thereby helping to explain the reasons for an increase or decrease in calibration difficulty, and in doing so, shedding new light on findings in the existing literature. 
Results show that the error surfaces of ANNs with relatively simple structures have a more well-defined overall shape and fewer local optima, while the error surfaces of ANNs with more complex structures are flatter and have many distributed, deep local optima. Consequently, ANNs with simpler structures can be calibrated successfully using gradient-based methods, such as the back-propagation algorithm, whereas ANNs with more complex structures are best calibrated using a hybrid approach combining metaheuristics, such as genetic algorithms, with gradient-based methods. In the third paper, the ELA metrics identified in the first paper as having low dependence on problem size are applied to Conceptual Rainfall Runoff (CRR) model calibration problems. CRRs with different model types, error functions, catchment conditions and data lengths are tested to identify how these factors affect the features of the problem structure, which are related to the difficulty of model calibration and parameter identification. It is suggested that ELA metrics can be used to quantify key features of the error surfaces of CRR models, including their roughness and flatness, as well as their degree of optima dispersion. This enables key error surface features to be compared for CRR models with different combinations of attributes (e.g. model structure, catchment climate conditions, error metrics and calibration data lengths and composition) in a consistent, efficient and easily communicable fashion. Results from the application of these metrics to the error surfaces of 420 CRR models with different combinations of the above attributes indicate that differences in model structure result in differences in surface roughness and relative optima dispersion. Additionally, increasing catchment wetness increases the relative roughness of error surfaces and decreases optima dispersion. 
This suggests that model structure and catchment climate conditions can be key factors affecting calibration difficulty, efficiency and parameter uniqueness. The experiments conducted in this study also encourage further tests on additional CRR models and catchments to identify general patterns between calibration performance, model structure and catchment characteristics.

    Understanding Optimisation Processes with Biologically-Inspired Visualisations

    Evolutionary algorithms (EAs) constitute a branch of artificial intelligence used to evolve solutions to the optimisation problems that abound in industry and research. EAs often generate many solutions, and visualisation has been a primary strategy for displaying EA solutions, given that visualisation is a well-evaluated, multi-domain medium for comprehending extensive data. The endeavour of visualising solutions is fraught with challenges resulting from high-dimensional phenomena and the large number of solutions to display. Recently, scholars have produced methods to mitigate some of these known issues when illustrating solutions. However, one key consideration is that displaying only the final subset of solutions (rather than the whole population) discards most of the information about the search, giving inadequate insight into the black-box EA. There is a clear knowledge gap and a need for methods which can visualise the whole population of solutions from an optimiser and overcome the high-dimensionality and scaling issues to make the EA search process interpretable. Furthermore, the evolutionary computing community has called for explainability in evolutionary computing, which could take the form of visualisations, to support EA comprehension much as explainable artificial intelligence has supported artificial intelligence. In this thesis, we report novel visualisation methods that can be used to visualise large and high-dimensional optimiser populations with the aim of creating greater interpretability during a search. We consider the nascent intersection of visualisation and explainability in evolutionary computing. 
The potential high informativeness of a visualisation method from an early chapter of this work forms an effective platform to develop an explainability visualisation method, namely the population dynamics plot, to attempt to inject explainability into the inner workings of the search process. We further support the visualisation of populations using machine learning to construct models which can capture the characteristics of an EA search and develop intelligent visualisations which use artificial intelligence to potentially enhance and support visualisation for a more informative search process. The methods developed in this thesis are evaluated both quantitatively and qualitatively. We use multi-feature benchmark problems to show the method’s ability to reveal specific problem characteristics such as disconnected fronts, local optima and bias, as well as potentially creating a better understanding of the problem landscape and optimiser search for evaluating and comparing algorithm performance (we show the visualisation method to be more insightful than conventional metrics like hypervolume alone). One of the most insightful methods developed in this thesis can produce a visualisation requiring less than 1% of the time and memory necessary to produce a visualisation of the same objective space solutions using existing methods. This allows for greater scalability and the use in short compile time applications such as online visualisations. 
Building on an existing visualisation method in this thesis, we then develop an explainability method and apply it to a real-world problem. The evaluation shows the method to be highly effective at explaining the search via solutions in the objective spaces, solution lineage and solution variation operators, making it possible to compactly comprehend, evaluate and communicate the search of an optimiser. We note, however, that the explainability properties are only evaluated against the author’s ability and could be evaluated further in future work with a usability study. The work is then supported by the development of intelligent visualisation models that may allow one to predict solutions in optima (importantly local optima) in unseen problems by using a machine learning model. The results are effective, with some models able to predict and visualise solution optima with a balanced F1 accuracy metric of 96%. The results of this thesis provide a suite of visualisations which aim to provide greater informativeness of the search and greater scalability than the existing literature. The work develops one of the first explainability methods aiming to create greater insight into the search space, solution lineage and reproductive operators. The work applies machine learning to potentially enhance EA understanding via visualisation. These models could also be used for a number of applications outside visualisation. Ultimately, the work provides novel methods for all EA stakeholders which aim to support understanding, evaluation and communication of EA processes with visualisation.
