
    Multi-epoch machine learning for galaxy formation

    In this thesis I utilise a range of machine learning techniques in conjunction with hydrodynamical cosmological simulations. In Chapter 2 I present a novel machine learning method for predicting the baryonic properties of dark matter-only subhalos taken from N-body simulations. The model is built using a tree-based algorithm and incorporates subhalo properties over a wide range of redshifts as its input features. I train the model using a hydrodynamical simulation, which enables it to predict black hole mass, gas mass, magnitudes, star formation rate, stellar mass, and metallicity. This new model surpasses the performance of previous models. Furthermore, I explore the predictive power of each input property by examining feature importance scores from the tree-based model. By applying the method to the LEGACY N-body simulation I generate a large-volume mock catalog of the quasar population at z=3. By comparing this mock catalog with observations, I demonstrate that the IllustrisTNG subgrid model for black holes does not accurately capture the growth of the most massive objects. In Chapter 3 I apply my method to investigate the evolution of galaxy properties in different simulations, and in various environments within a single simulation. By comparing the Illustris, EAGLE, and TNG simulations I show that subgrid model physics plays a more significant role than the choice of hydrodynamics method. Using the CAMELS simulation suite I consider the impact of cosmological and astrophysical parameters on the buildup of stellar mass within the TNG and SIMBA models. In the final chapter I apply a combination of neural networks and symbolic regression methods to construct a semi-analytic model which reproduces the galaxy population from a cosmological simulation. The neural network-based approach is capable of producing a more accurate population than a previous method based on binning in halo mass. The equations resulting from symbolic regression are found to be a good approximation of the neural network.
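
    As a sketch of the kind of pipeline described above, the snippet below trains a gradient-boosted tree regressor on subhalo properties drawn from several epochs and reads off feature importances after fitting. The feature names, the toy data, and the choice of scikit-learn's GradientBoostingRegressor are illustrative assumptions rather than the model actually used in the thesis.

    # Minimal sketch of a multi-epoch tree-based emulator (illustrative only).
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    # Toy stand-in for dark matter-only subhalo properties at several redshifts:
    # halo mass and maximum circular velocity at z = 0, 1, 2 (hypothetical names).
    feature_names = ["Mhalo_z0", "Vmax_z0", "Mhalo_z1",
                     "Vmax_z1", "Mhalo_z2", "Vmax_z2"]
    X = rng.lognormal(mean=0.0, sigma=1.0, size=(5000, len(feature_names)))
    # Toy target standing in for one baryonic property (e.g. log stellar mass).
    y = 0.7 * np.log10(X[:, 0]) + 0.2 * np.log10(X[:, 2]) + rng.normal(0.0, 0.1, 5000)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = GradientBoostingRegressor(n_estimators=300, max_depth=4, random_state=0)
    model.fit(X_train, y_train)
    print("held-out R^2:", model.score(X_test, y_test))

    # Feature importance scores indicate which epochs/properties drive the prediction.
    for name, score in sorted(zip(feature_names, model.feature_importances_),
                              key=lambda t: -t[1]):
        print(f"{name}: {score:.3f}")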

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    LIPIcs, Volume 261, ICALP 2023, Complete Volume

    LIPIcs, Volume 261, ICALP 2023, Complete Volume

    Adaptive swarm optimisation assisted surrogate model for pipeline leak detection and characterisation.

    Pipelines are often subject to leakage due to ageing, corrosion and weld defects. It is difficult to avoid pipeline leakage as the sources of leaks are diverse. Various pipeline leakage detection methods, including fibre optic sensing, pressure point analysis and numerical modelling, have been proposed during the last decades. One major issue with these methods is distinguishing the leak signal without giving false alarms. Considering that the data obtained by these traditional methods are digital in nature, machine learning models have been adopted to improve the accuracy of pipeline leakage detection. However, most of these methods rely on a large training dataset for accurate model training, and it is difficult to obtain such experimental data. Some of the reasons include the huge cost of an experimental setup for data collection covering all possible scenarios, poor accessibility to remote pipelines, and labour-intensive experiments. Moreover, datasets constructed from data acquired in laboratory or field tests are usually imbalanced, as leakage data samples are generated from artificial leaks. Computational fluid dynamics (CFD) offers the benefit of detailed and accurate pipeline leakage modelling, which may be difficult to obtain experimentally or with the aid of an analytical approach. However, CFD simulation is typically time-consuming and computationally expensive, limiting its applicability in real-time settings. In order to alleviate the high computational cost of CFD modelling, this study proposed a novel data sampling optimisation algorithm, called the Adaptive Particle Swarm Optimisation Assisted Surrogate Model (PSOASM), to systematically select scenarios for simulation in an adaptive and optimised manner. The algorithm was designed to place a new sample in poorly sampled regions of the parameter space of parametrised leakage scenarios, which uniform sampling methods may easily miss. This was achieved using two criteria: the population density of the training dataset and the model prediction fitness value. The model prediction fitness value was used to enhance the global exploration capability of the surrogate model, while the population density of training data samples improved the local accuracy of the surrogate model. The proposed PSOASM was compared with four conventional sequential sampling approaches and tested on six benchmark functions commonly used in the literature. Different machine learning algorithms were explored with the developed model, and the effect of the initial sample size on surrogate model performance was evaluated. Next, pipeline leakage detection analysis, with particular emphasis on a multiphase flow system, was investigated in order to identify the flow field parameters that provide pertinent indicators for pipeline leakage detection and characterisation. Plausible leak scenarios which may occur in the field were simulated for the gas-liquid pipeline using a three-dimensional RANS CFD model. The perturbation of the pertinent flow field indicators for different leak scenarios is reported, which is expected to improve the understanding of multiphase flow behaviour induced by leaks. The results of the simulations were validated against the latest experimental and numerical data reported in the literature. The proposed surrogate model was later applied to pipeline leak detection and characterisation.
    The CFD modelling results showed that fluid flow parameters are pertinent indicators in pipeline leak detection. It was observed that upstream pipeline pressure could serve as a critical indicator for detecting leakage, even if the leak size is small. In contrast, the downstream flow rate is the dominant leakage indicator if flow rate monitoring is chosen for leak detection. The results also reveal that when two leaks of different sizes co-occur in a single pipe, detecting the small leak becomes difficult if its size is below 25% of the large leak size. However, in the event of a double leak with equal dimensions, the leak closer to the upstream end of the pipe is easier to detect. The results from all the analyses demonstrate the PSOASM algorithm's superiority over the well-known sequential sampling schemes employed for evaluation. The test results show that the PSOASM algorithm can be applied to pipeline leak detection with limited training datasets and provides a general framework for improving computational efficiency using adaptive surrogate modelling in various real-life applications.
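
    The adaptive sampling idea can be illustrated with a much-simplified sketch, which is not the actual PSOASM implementation: new simulation points are chosen where the existing design is sparse (density criterion) and where the surrogate appears least reliable, here proxied by the disagreement between two regressors (fitness criterion). The toy objective, candidate pool, and weighting of the two criteria are assumptions for illustration only.

    # Simplified adaptive, surrogate-assisted sampling loop (not PSOASM itself).
    import numpy as np
    from scipy.spatial.distance import cdist
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.gaussian_process import GaussianProcessRegressor

    def expensive_simulation(x):
        # Hypothetical stand-in for a CFD run on a parametrised leak scenario.
        return np.sin(3.0 * x[0]) + 0.5 * np.cos(5.0 * x[1])

    rng = np.random.default_rng(1)
    X = rng.uniform(0.0, 1.0, size=(10, 2))              # initial design
    y = np.array([expensive_simulation(x) for x in X])   # initial responses

    for _ in range(20):                                   # sampling budget
        rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
        gp = GaussianProcessRegressor().fit(X, y)

        candidates = rng.uniform(0.0, 1.0, size=(500, 2))
        density = cdist(candidates, X).min(axis=1)        # distance to nearest sample
        fitness = np.abs(rf.predict(candidates) - gp.predict(candidates))

        # Combine the two criteria; equal weighting is an arbitrary choice here.
        score = density / density.max() + fitness / (fitness.max() + 1e-12)
        x_new = candidates[np.argmax(score)]

        X = np.vstack([X, x_new])
        y = np.append(y, expensive_simulation(x_new))

    print("final design size:", len(X))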

    50 Years of quantum chromodynamics – Introduction and Review


    Evaluating data linkage algorithms with perfect synthetic ground truth

    Data linkage algorithms join datasets by identifying commonalities between them. The ability to evaluate the efficacy of different algorithms is a challenging problem that is often overlooked. If incorrect links are made or links are missed by a linkage algorithm, then conclusions based on its linkage may be unfounded. Evaluating linkage quality is particularly challenging in domains where datasets are large and the number of links is low; example domains include historical population data, bibliographic data, and administrative data. In these domains the evaluation of linkage quality is not well understood. A common approach to evaluating linkage quality is the use of metrics, most commonly precision, recall, and F-measure. These metrics indicate how often links are missed or false links are made. To calculate a metric, datasets are used where the true links and non-links are known. The linkage algorithm attempts to link the datasets and the constructed set of links is compared with the set of true links. In these domains we can rarely have confidence that the evaluation datasets contain all the true links and that no false links have been included. If such errors exist in the evaluation datasets, the calculated metrics may not truly reflect the performance of the linkage algorithm, which presents issues when making comparisons between linkage algorithms. To rigorously evaluate the efficacy of linkage algorithms, it is necessary to objectively measure an algorithm's linkage quality across a range of different configuration parameters and datasets. These datasets must be of appropriate scale and have ground truth which denotes all true links and non-links. Evaluating algorithms using shared, standardised datasets enables objective comparisons between linkage algorithms; to facilitate objective linkage evaluation, such a set of standardised datasets needs to be shared and widely adopted. This thesis establishes an approach for the construction of synthetic datasets that can be used to evaluate linkage algorithms. It addresses the following research questions:
    • What are appropriate approaches to the evaluation of linkage algorithms?
    • Is it feasible to synthesise realistic evaluation data?
    • Is synthetic evaluation data with perfect ground truth useful for evaluation?
    • How should synthesised data be statistically validated for correctness?
    • How should sets of synthesised data be used to evaluate linkage?
    • How can the evaluation of linkage algorithms be effectively communicated?
    This thesis makes a number of contributions, most notably a framework for the comprehensive evaluation of data linkage algorithms, thus significantly improving the comparability of linkage algorithms, especially in domains lacking evaluation data. The thesis demonstrates these techniques within the population reconstruction domain. Integral to the evaluation framework, approaches to the synthesis and statistical validation of evaluation datasets have been investigated, resulting in a simulation model able to create many characteristically varied, large-scale datasets.
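
    The metrics mentioned above are straightforward to compute once a predicted link set and a ground-truth link set are available; a minimal sketch follows, with made-up record pairs standing in for real evaluation data.

    # Linkage-quality metrics over sets of record-pair links (illustrative data).
    def linkage_metrics(predicted_links, true_links):
        """Precision, recall and F-measure for a set of predicted record pairs."""
        tp = len(predicted_links & true_links)   # correctly found links
        fp = len(predicted_links - true_links)   # false links made
        fn = len(true_links - predicted_links)   # true links missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_measure = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
        return precision, recall, f_measure

    # Hypothetical example: links are unordered pairs of record identifiers.
    true_links = {frozenset(p) for p in [("a1", "b7"), ("a2", "b3"), ("a5", "b9")]}
    predicted = {frozenset(p) for p in [("a1", "b7"), ("a2", "b4"), ("a5", "b9")]}
    print(linkage_metrics(predicted, true_links))  # approximately (0.667, 0.667, 0.667)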

    Development of a surrogate model of an amine scrubbing digital twin using machine learning methods

    Advancements in the process industry require building more complex simulations and performing computationally intensive operations such as optimization. To overcome the numerical limits of conventional process simulations, a surrogate model is a viable strategy. In this work, a surrogate model of an industrial amine scrubbing digital twin has been developed. The surrogate model has been built on a process simulation created in Aspen HYSYS and validated as a digital twin against real process data collected during steady-state operation. The surrogate relies on an accurate Design of Experiments procedure. In this case, the Latin Hypercube method has been chosen and several nested domains have been defined in ranges around the nominal steady-state operating condition. Several machine learning models have been trained using cross-validation, and the most accurate has been selected to predict each target. The resulting surrogate model showed satisfactory performance, given the data available.
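
    A minimal sketch of this workflow, assuming invented variable ranges and a toy response in place of the Aspen HYSYS digital twin: a Latin Hypercube design is generated over an assumed operating window, and several regressors are compared by cross-validation before the most accurate is selected for a given target.

    # Latin Hypercube design plus cross-validated model selection (illustrative).
    import numpy as np
    from scipy.stats import qmc
    from sklearn.model_selection import cross_val_score
    from sklearn.linear_model import Ridge
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.gaussian_process import GaussianProcessRegressor

    # Design over 3 hypothetical inputs (e.g. flue-gas flow, lean-amine flow,
    # reboiler duty), scaled to an assumed +/-20% window around nominal values.
    sampler = qmc.LatinHypercube(d=3, seed=0)
    X = qmc.scale(sampler.random(n=200), [0.8, 0.8, 0.8], [1.2, 1.2, 1.2])

    # Toy stand-in for one target (e.g. CO2 capture rate) from the digital twin.
    y = 1.5 * X[:, 0] - 0.8 * X[:, 1] + 0.3 * X[:, 2] ** 2

    candidates = {
        "ridge": Ridge(),
        "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
        "gaussian_process": GaussianProcessRegressor(),
    }
    scores = {name: cross_val_score(model, X, y, cv=5).mean()
              for name, model in candidates.items()}
    best = max(scores, key=scores.get)
    print(scores, "-> selected:", best)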

    LIPIcs, Volume 274, ESA 2023, Complete Volume

    LIPIcs, Volume 274, ESA 2023, Complete Volume

    Machine Learning-Based Data and Model Driven Bayesian Uncertainty Quantification of Inverse Problems for Suspended Non-structural System

    Inverse problems involve extracting the internal structure of a physical system from noisy measurement data. In many fields, Bayesian inference is used to address the ill-conditioned nature of the inverse problem by incorporating prior information through an initial distribution. In the nonparametric Bayesian framework, surrogate models such as Gaussian processes or deep neural networks are used as flexible and effective probabilistic modeling tools to overcome the curse of dimensionality and reduce computational costs. In practical systems and computer models, uncertainties can be addressed through parameter calibration, sensitivity analysis, and uncertainty quantification, leading to improved reliability and robustness of decision and control strategies based on simulation or prediction results. However, preventing overfitting in the surrogate model and incorporating reasonable prior knowledge of the embedded physics and models remains a challenge. Suspended Nonstructural Systems (SNS) pose a significant challenge in the inverse problem, and research on their seismic performance and mechanical models, particularly regarding the inverse problem and uncertainty quantification, is still lacking. To address this, the author conducts full-scale shaking table dynamic experiments, monotonic and cyclic tests, and simulations of different types of SNS to investigate their mechanical behaviors. To quantify the uncertainty of the inverse problem, the author proposes a new framework that adopts machine learning-based, data- and model-driven stochastic Gaussian process model calibration, quantifying the uncertainty via a new black-box variational inference that accounts for a geometric complexity measure, the Minimum Description Length (MDL), through Bayesian inference. The framework is validated on the SNS and yields optimal generalizability and computational scalability.
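
    The surrogate-based calibration idea can be reduced to a toy sketch: a Gaussian process emulates an assumed forward model, and the posterior over a single parameter is evaluated on a grid against one noisy synthetic measurement. This stands in for, but does not reproduce, the black-box variational inference and MDL-based framework described above; the forward model, prior, and noise level are all invented.

    # Toy surrogate-based Bayesian calibration for a one-parameter inverse problem.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    def forward_model(theta):
        # Hypothetical monotonic response curve (stands in for the real simulator).
        return theta + 0.2 * np.sin(3.0 * theta)

    rng = np.random.default_rng(0)

    # Train the GP surrogate on a handful of forward-model evaluations.
    theta_train = np.linspace(0.0, 3.0, 15).reshape(-1, 1)
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-6)
    gp.fit(theta_train, forward_model(theta_train).ravel())

    # A noisy "measurement" generated at a known true parameter value.
    theta_true, sigma = 1.2, 0.05
    y_obs = forward_model(theta_true) + rng.normal(0.0, sigma)

    # Grid posterior: flat prior on [0, 3], Gaussian likelihood evaluated through
    # the surrogate instead of the (expensive) forward model itself.
    theta_grid = np.linspace(0.0, 3.0, 500)
    y_pred = gp.predict(theta_grid.reshape(-1, 1))
    log_post = -0.5 * ((y_obs - y_pred) / sigma) ** 2
    post = np.exp(log_post - log_post.max())
    dtheta = theta_grid[1] - theta_grid[0]
    post /= post.sum() * dtheta  # normalise the posterior density on the grid

    print("true theta:", theta_true)
    print("posterior mean:", (theta_grid * post).sum() * dtheta)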

    Ultra-fast screening of stress-sensitive (naturally fractured) reservoirs using flow diagnostics

    Quantifying the impact of poro-mechanics on reservoir performance is critical to the sustainable management of subsurface reservoirs containing hydrocarbons, groundwater, or geothermal heat, or being targeted for geological storage of fluids (e.g., CO2 or H2). On the other hand, accounting for poro-mechanical effects in full-field reservoir simulation studies and uncertainty quantification workflows for complex reservoir models is challenging, mainly because exploring and capturing the full range of geological and mechanical uncertainties requires a large number of numerical simulations and is hence computationally intensive. As a result, the integration of poro-mechanical effects in full-field reservoir simulation studies is still limited, and such effects are often ignored in reservoir engineering workflows, which may lead to inadequate reservoir performance forecasts. This thesis hence develops an alternative approach that couples hydrodynamics, computed using existing flow diagnostics simulations for single- and dual-porosity models, with poro-mechanics to screen the impact of coupled poro-mechanical processes on reservoir performance. Due to the steady-state nature of the calculations and the effective coupling strategy proposed, these calculations remain computationally efficient while providing first-order approximations of the interplay between poro-mechanics and hydrodynamics, as we demonstrate through a series of case studies. This thesis also introduces a new uncertainty quantification workflow using the proposed poro-mechanically informed flow diagnostics and proxy models. These computationally efficient calculations allow us to quickly screen poro-mechanical effects and assess a broader range of geological, petrophysical, and mechanical uncertainties in order to rank, compare, and cluster a large ensemble of models and select representative candidates for more detailed full-physics coupled reservoir simulations.
    James Watt Scholarship
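
    The ranking-and-clustering step can be illustrated with a small sketch: given a cheap proxy response per ensemble member (synthetic curves below, rather than real flow-diagnostics output), the ensemble is clustered and the member nearest each cluster centre is kept as a representative for full-physics simulation. The cluster count and proxy definition are illustrative assumptions.

    # Cluster an ensemble by proxy responses and pick representative members.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    n_models, n_steps = 200, 50

    # Synthetic proxy responses (e.g. a sweep-efficiency proxy vs. injected volume).
    t = np.linspace(0.0, 1.0, n_steps)
    rates = rng.uniform(2.0, 8.0, size=(n_models, 1))
    proxy = 1.0 - np.exp(-rates * t)  # one curve per ensemble member

    k = 5  # number of representative models to carry forward
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(proxy)

    representatives = []
    for c in range(k):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(proxy[members] - km.cluster_centers_[c], axis=1)
        representatives.append(members[np.argmin(dists)])

    print("representative model indices:", sorted(representatives))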