LIPIcs, Volume 251, ITCS 2023, Complete Volume
Machine learning applications in search algorithms for gravitational waves from compact binary mergers
Gravitational waves from compact binary mergers are now routinely observed by Earth-bound detectors. These observations enable exciting new science, as they have opened a new window to the Universe.
However, extracting gravitational-wave signals from the noisy detector data is a challenging problem. The most sensitive search algorithms for compact binary mergers use matched filtering, an algorithm that compares the data with a set of expected template signals. As detectors are upgraded and more sophisticated signal models become available, the number of required templates will increase, which can make some sources computationally prohibitive to search for. The computational cost is of particular concern when low-latency alerts should be issued to maximize the time for electromagnetic follow-up observations. One potential solution to reduce computational requirements that has started to be explored in the last decade is machine learning. However, different proposed deep learning searches target varying parameter spaces and use metrics that are not always comparable to existing literature. Consequently, a clear picture of the capabilities of machine learning searches has been sorely missing.
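The core matched-filtering operation described above can be sketched in a few lines. The toy example below (a hypothetical sinusoidal template in white Gaussian noise; real pipelines whiten the data by the detector noise spectrum and correlate against banks of relativistic waveform templates) recovers an injected signal's time as the peak of the correlation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a short sinusoidal "template" injected into white Gaussian noise.
n, m = 4096, 256
template = np.sin(2 * np.pi * 0.05 * np.arange(m)) * np.hanning(m)
template /= np.linalg.norm(template)                 # unit-norm template

data = rng.normal(size=n)
inject_at = 1500
data[inject_at:inject_at + m] += 5.0 * np.sqrt(m) * template   # loud injection

# Matched filter: correlate the data against the template at every offset.
# For white noise, this normalized cross-correlation is the SNR time series;
# real searches do the equivalent operation in the frequency domain,
# weighting by the inverse noise power spectral density.
snr = np.correlate(data, template, mode="valid")
peak = int(np.argmax(np.abs(snr)))
print(peak)   # at (or within a sample or two of) inject_at
```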
In this thesis, we closely examine the sensitivity of various deep learning gravitational-wave search algorithms and introduce new methods to detect signals from binary black hole and binary neutron star mergers at previously untested statistical confidence levels. By using the sensitive distance as our core metric, we allow for a direct comparison of our algorithms to state-of-the-art search pipelines. As part of this thesis, we organized a global mock data challenge to create a benchmark for machine learning search algorithms targeting compact binaries. This way, the tools developed in this thesis are made available to the greater community by publishing them as open source software.
Our studies show that, depending on the parameter space, deep learning gravitational-wave search algorithms are already competitive with current production search pipelines. We also find that strategies developed for traditional searches can be effectively adapted to their machine learning counterparts. In regions where matched filtering becomes computationally expensive, however, available deep learning algorithms are also limited in their capability. We find reduced sensitivity to long-duration signals, in contrast to the excellent results for short-duration binary black hole signals.
Analytical validation of innovative magneto-inertial outcomes: a controlled environment study.
Machine learning for the sustainable energy transition: a data-driven perspective along the value chain from manufacturing to energy conversion
According to the IPCC special report Global Warming of 1.5 °C, climate action is not only necessary but more urgent than ever. The world is witnessing rising sea levels, heat waves, flooding, droughts, and desertification, resulting in the loss of lives and damage to livelihoods, especially in countries of the Global South. To mitigate climate change and comply with the Paris Agreement, it is of the utmost importance to reduce greenhouse gas emissions from the most emitting sector, namely the energy sector. To this end, large-scale penetration of renewable energy systems into the energy market is crucial for the energy transition toward a sustainable future, replacing fossil fuels and improving access to energy with socio-economic benefits. With the advent of Industry 4.0, Internet of Things technologies have been increasingly applied to the energy sector, introducing the concepts of the smart grid and, more generally, the Internet of Energy. These paradigms are steering the energy sector towards more efficient, reliable, flexible, resilient, safe, and sustainable solutions with huge potential environmental and social benefits. To realize these concepts, new information technologies are required, and among the most promising possibilities are Artificial Intelligence and Machine Learning, which in many countries have already revolutionized the energy industry. This thesis presents different Machine Learning algorithms and methods for the implementation of new strategies to make renewable energy systems more efficient and reliable. It presents various learning algorithms, highlighting their advantages and limits, and evaluating their application to different tasks in the energy context. In addition, different techniques are presented for the preprocessing and cleaning of time series, nowadays collected by sensor networks mounted on every renewable energy system.
With the possibility to install large numbers of sensors that collect vast amounts of time series, it is vital to detect and remove irrelevant, redundant, or noisy features and alleviate the curse of dimensionality, thus improving the interpretability of predictive models, speeding up their learning process, and enhancing their generalization properties. Therefore, this thesis discusses the importance of dimensionality reduction in sensor networks mounted on renewable energy systems and, to this end, presents two novel unsupervised algorithms. The first approach maps time series into the network domain through visibility graphs and uses a community detection algorithm to identify clusters of similar time series and select representative parameters. This method can group both homogeneous and heterogeneous physical parameters, even when related to different functional areas of a system. The second approach proposes the Combined Predictive Power Score, a feature selection method with a multivariate formulation that explores multiple expanding subsets of variables and identifies the combination of features with the highest predictive power over specified target variables. This method includes a selection algorithm for the optimal combination of variables that converges to the smallest set of predictors with the highest predictive power. Once this combination is identified, the most relevant parameters in a sensor network can be selected to perform dimensionality reduction. Data-driven methods open the possibility to support strategic decision-making, resulting in a reduction of Operation & Maintenance costs, machine faults, repair stops, and spare parts inventory size. Therefore, this thesis presents two approaches in the context of predictive maintenance to improve the lifetime and efficiency of equipment, based on anomaly detection algorithms.
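As an illustration of the forward-selection idea behind methods like the Combined Predictive Power Score, here is a generic greedy sketch; the in-sample R² score, toy data, and stopping tolerance are illustrative assumptions, not the thesis's actual algorithm:

```python
import numpy as np

def r2_score_lin(X, y):
    # In-sample R^2 of an ordinary least-squares fit (illustrative score only;
    # a predictive power score would use out-of-sample evaluation).
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()

def greedy_forward_selection(X, y, max_features=3, tol=1e-3):
    """Grow the feature set one variable at a time, keeping the addition that
    most improves the score, and stop once the gain becomes negligible."""
    selected, best_score = [], -np.inf
    while len(selected) < max_features:
        gains = {j: r2_score_lin(X[:, selected + [j]], y)
                 for j in range(X.shape[1]) if j not in selected}
        j_best = max(gains, key=gains.get)
        if selected and gains[j_best] - best_score < tol:
            break
        selected.append(j_best)
        best_score = gains[j_best]
    return selected, best_score

# Toy sensor data: the target depends on columns 0 and 2 only.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + 0.1 * rng.normal(size=500)
sel, score = greedy_forward_selection(X, y)
print(sorted(sel))   # the two informative columns should be recovered
```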
The first approach proposes an anomaly detection model based on Principal Component Analysis that is robust to false alarms, can isolate anomalous conditions, and can anticipate equipment failures. The second approach has at its core a neural architecture, namely a Graph Convolutional Autoencoder, which models the sensor network as a dynamical functional graph by simultaneously considering the information content of individual sensor measurements (graph node features) and the nonlinear correlations existing between all pairs of sensors (graph edges). The proposed neural architecture can capture hidden anomalies even when the turbine continues to deliver the power requested by the grid and can anticipate equipment failures. Since the model is unsupervised and completely data-driven, this approach can be applied to any wind turbine equipped with a SCADA system. When it comes to renewable energies, the unschedulable uncertainty due to their intermittent nature represents an obstacle to the reliability and stability of energy grids, especially when dealing with large-scale integration. Nevertheless, these challenges can be alleviated if the natural sources or the power output of renewable energy systems can be forecasted accurately, allowing power system operators to plan optimal power management strategies to balance the dispatch between intermittent power generations and the load demand. To this end, this thesis proposes a multi-modal spatio-temporal neural network for multi-horizon wind power forecasting. In particular, the model combines high-resolution Numerical Weather Prediction forecast maps with turbine-level SCADA data and explores how meteorological variables on different spatial scales together with the turbines' internal operating conditions impact wind power forecasts. The world is undergoing a third energy transition with the main goal to tackle global climate change through decarbonization of the energy supply and consumption patterns. 
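The first idea, flagging anomalies by their reconstruction error under a PCA model of healthy operation, can be sketched as follows; the subspace dimension, the empirical 99% control limit, and the simulated sensor data are assumptions for illustration only, not the thesis's actual model:

```python
import numpy as np

def fit_pca_detector(X_train, n_components=2, quantile=0.99):
    """Fit PCA on healthy-operation data; anomalies are then flagged by a
    large reconstruction error (squared prediction error, SPE)."""
    mu = X_train.mean(axis=0)
    Xc = X_train - mu
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                       # retained loading directions
    spe_train = ((Xc - Xc @ P @ P.T) ** 2).sum(axis=1)
    threshold = np.quantile(spe_train, quantile)  # simple empirical control limit
    return mu, P, threshold

def spe(X, mu, P):
    Xc = X - mu
    return ((Xc - Xc @ P @ P.T) ** 2).sum(axis=1)

# Simulated sensors: healthy data lives near a 2-D subspace of a 6-D space.
rng = np.random.default_rng(2)
latent = rng.normal(size=(1000, 2))
A = rng.normal(size=(2, 6))
X_train = latent @ A + 0.05 * rng.normal(size=(1000, 6))

mu, P, thr = fit_pca_detector(X_train)
healthy = latent[:5] @ A + 0.05 * rng.normal(size=(5, 6))
faulty = healthy + np.array([0.0, 0.0, 2.0, 0.0, 0.0, 0.0])  # offset fault on one sensor
# Healthy errors sit near the limit's scale; faulted ones lie far above it.
print(spe(healthy, mu, P).mean(), thr, spe(faulty, mu, P).min())
```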
This is not only possible thanks to global cooperation and agreements between parties, advancements in power generation systems, and Internet of Things and Artificial Intelligence technologies, but also necessary to prevent the severe and irreversible consequences of climate change that are threatening life on the planet as we know it. This thesis is intended as a reference for researchers who want to contribute to the sustainable energy transition and are approaching the field of Artificial Intelligence in the context of renewable energy systems.
Analysis and forecasting of asset quality, risk management and financial stability for the Greek banking system
The increase in non-performing loans (NPLs) during the financial crisis of 2008, which subsequently evolved into a fiscal crisis, as well as the risk of a medium-term increase due to the COVID-19 pandemic, has called into question the robustness of many banks and the financial stability of the whole sector. As far as the banking sector is concerned, the management of non-performing loans represents the most significant challenge, as their stock reached unprecedented levels and the deterioration in asset quality was widespread. Addressing the problem of non-performing loans with the assistance of credit risk modeling is important from both a micro- and a macro-prudential perspective, since it would not only improve the financial soundness and the capital adequacy of the banking sector, but also free up funds to be directed to other, more productive sectors of the economy.
This Thesis extends earlier research by employing a short-term monitoring system with the aim of forecasting “failures”, i.e. NPL creation. Such a monitoring system allows the risk of a “failure” to change over time, measuring the likelihood of “failure” given the survival time and a set of explanatory variables. The application of Cox proportional hazards models and survival trees to forecast NPLs can be usefully employed in the Greek corporate sectors.
The research aim of this thesis spans two domains. The first aim is the investigation of the determinants that contribute to NPL formation. Two GAMLSS models are tested: a linear GAMLSS model and a nonlinear semi-parametric GAMLSS model that includes smoothing functions to capture potential nonlinear relationships between the explanatory variables and model the parameters favorably. The explanatory variables of the models consist of credit risk variables, macroeconomic variables, bank-specific variables, and supervisory and market variables, while the response variable is the non-performing loans.
The second aim is to determine whether Cox proportional hazards models and survival tree models can forecast NPLs of loans provided to specific corporate sectors in Greece, using the most granular data set of corporate borrowers available. By evaluating a series of Cox models, a short-term monitoring system has been created with the aim of forecasting “failures”, i.e. NPL creation. The Cox proportional hazards regression models incorporate time-to-event information, with a timeline described by the survival function, indicating the probability that a loan becomes an NPL by time t. The time period runs from the origination of the loan until the “death” of the loan, i.e. its termination, incorporating an “in between” observation point. The event occurs when the loan first becomes “infected”, i.e. becomes an NPL. Regarding survival trees, the data set was divided into multiple subsets, which are easier to model separately and hence yield improved overall performance. Such models can then be combined with different machine learning techniques. Predictors (or covariates) are defined as the sectors of the Greek economy, and the model is fitted both for the whole sample and for the sample of early terminated loans.
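The partial likelihood at the heart of the Cox model can be written down compactly. The sketch below, with a single hypothetical "risky sector" covariate and no censoring, shows the risk-set comparison made at each failure time; the simulated portfolio and grid of coefficient values are illustrative assumptions:

```python
import numpy as np

def cox_partial_log_likelihood(beta, times, events, x):
    """Cox partial log-likelihood for one covariate: at each observed failure
    (NPL) time, the failing loan's risk score exp(beta*x) is compared against
    the scores of every loan still in the risk set."""
    eta = beta * x
    ll = 0.0
    for i in np.where(events == 1)[0]:
        at_risk = times >= times[i]            # loans still performing at t_i
        ll += eta[i] - np.log(np.exp(eta[at_risk]).sum())
    return ll

# Toy portfolio: loans in a hypothetical "risky sector" (x = 1) become NPLs
# at exp(1) times the baseline hazard, i.e. the true beta is 1.
rng = np.random.default_rng(4)
x = rng.integers(0, 2, size=300).astype(float)
times = rng.exponential(scale=np.exp(-1.0 * x))
events = np.ones(300, dtype=int)               # no censoring, for simplicity

lls = {b: cox_partial_log_likelihood(b, times, events, x) for b in (-1.0, 0.0, 1.0)}
print(max(lls, key=lls.get))   # the true effect size should score best
```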
The Thesis is organized as follows: Chapter 1 addresses the role of banks in financial intermediation, the evolution of credit risk, and some issues regarding the Greek banking sector. Chapter 2 constitutes a literature review of research focused on improving the predictive performance of different credit risk assessment methods. Chapter 3 outlines the competitive conditions in the banking sector to examine whether the increase in concentration has affected the competitive conditions in the Greek banking system. Chapter 4 addresses the funding and liquidity conditions in the Greek banking sector. Chapter 5 contains the selection of the aggregate sample and the results and analysis of the GAMLSS models used for determining NPLs. Chapter 6 provides an introduction to the granular database on Large Exposures, which is used for deriving the panel sample of corporate borrowers on which forecasting and prediction models are employed. Chapter 7 contains the application of Cox models and decision trees, the estimation procedure, parameters, model fit, estimation results, and empirical findings. Chapter 8 provides an evaluation of the applicability of the models as well as the implications for further research. Finally, a conclusion summarizes my contribution to the research community and my recommendations to the banking industry.
A Modified EM Algorithm for Shrinkage Estimation in Multivariate Hidden Markov Models
Hidden Markov models are used in a wide range of applications due to their construction that
renders them mathematically tractable and allows for the use of efficient computational techniques. There are methods for the estimation of the model’s parameters, such as the EM algorithm, but also for the estimation of the hidden states of the underlying Markov chain, such as the Viterbi algorithm.
In applications where the dimension of the data is comparable to the sample size, the sample covariance matrix is known to be ill-conditioned, which directly affects the maximisation step (M-step) of the EM algorithm, where its inverse is involved in the computations. This problem can be amplified if there are rarely visited states, resulting in a small sample size for the estimation of the corresponding parameters. Therefore, the direct implementation of these methods can prove troublesome, as many computational problems might occur in the estimation of the covariance matrix and its inverse, further affecting the estimation of the one-step transition probability matrix and the reconstruction of the hidden Markov chain.
In this paper, a modified version of the EM algorithm is studied, both theoretically and computationally, in order to obtain the shrinkage estimator of the covariance matrix during the maximisation step. This is achieved by maximising a penalised log-likelihood function, which is also used in the estimation step (E-step). A variant of this modified version, where the penalised log-likelihood function is only used in the maximisation step (M-step), is also studied computationally.
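A minimal sketch of the kind of shrinkage estimator targeted by the penalised M-step: a generic convex combination of the sample covariance with a scaled-identity target. The fixed shrinkage intensity and simulated data are illustrative assumptions; the paper derives the estimator from the penalised likelihood instead.

```python
import numpy as np

def shrinkage_covariance(X, rho):
    """Convex combination of the sample covariance with a scaled-identity
    target -- the kind of estimator a penalised M-step can be made to yield."""
    S = np.cov(X, rowvar=False)
    target = np.trace(S) / S.shape[0] * np.eye(S.shape[0])
    return (1 - rho) * S + rho * target

# Dimension comparable to the sample size: the sample covariance is
# invertible here but badly conditioned, and shrinkage tames it.
rng = np.random.default_rng(3)
p, n = 50, 60
X = rng.normal(size=(n, p))
S = np.cov(X, rowvar=False)
Sigma = shrinkage_covariance(X, rho=0.3)
print(np.linalg.cond(S), np.linalg.cond(Sigma))   # conditioning improves
```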
Aggregation Strategies for Distributed Gaussian Processes
Gaussian processes are robust and flexible non-parametric statistical models that make use of Bayes' theorem by assigning a Gaussian prior distribution to the unknown function. Despite their capability to provide high-accuracy predictions, they suffer from high computational costs. Various solutions have been proposed in the literature to deal with this computational complexity. The main idea is to reduce the training cost, which is cubic in the size of the training set.
A distributed Gaussian process is a divide-and-conquer approach that divides the entire training data set into several partitions and employs a local approximation scenario to train a Gaussian process at each data partition. An ensemble technique combines the local Gaussian experts to provide final aggregated predictions. Available baselines aggregate local predictions assuming perfect diversity between experts. However, this assumption is often violated in practice and leads to sub-optimal solutions.
This thesis deals with dependency issues between experts. Aggregation based on experts' interactions improves accuracy and can lead to statistically consistent results. Few works have considered modeling dependencies between experts. Despite their theoretical advantages, their prediction steps are costly and cubically depend on the number of experts. We benefit from the experts' interactions in both dependence and independence-based aggregations. In conventional aggregation methods that combine experts using a conditional independence assumption, we transform the available experts set into clusters of highly correlated experts using spectral clustering. The final aggregation uses these clusters instead of the original experts. It reduces the effect of the independence assumption in the ensemble technique. Moreover, we develop a novel aggregation method for dependent experts using the latent variable graphical model and define the target function as a latent variable in a connected undirected graph.
Besides, we propose two novel expert selection strategies in distributed learning. They improve the efficiency and accuracy of the prediction step by excluding weak experts in the ensemble method. The first is a static selection method that assigns a fixed set of experts to all new entry points in the prediction step using the Markov random field model. The second solution increases the flexibility of the selection step by converting it into a multi-label classification problem. It provides an entry-dependent selection model and assigns the most relevant experts to each data point.
We address all related theoretical and practical aspects of the proposed solutions. The findings present valuable insights for distributed learning models and advance the state-of-the-art in several directions. Indeed, the proposed solutions do not need restrictive assumptions and can be easily extended to non-Gaussian experts in distributed and federated learning.
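The independence-based baseline discussed in this abstract, a precision-weighted product-of-experts rule, can be stated in a few lines. This is a generic form of the baseline, not the thesis's dependency-aware aggregation; the expert predictions are made-up numbers:

```python
import numpy as np

def aggregate_experts(means, variances):
    """Precision-weighted product-of-experts aggregation: each local Gaussian
    expert contributes proportionally to its precision (inverse variance).
    This assumes the experts are perfectly diverse (conditionally independent)."""
    precisions = 1.0 / np.asarray(variances, dtype=float)
    var = 1.0 / precisions.sum()
    mean = var * (precisions * np.asarray(means, dtype=float)).sum()
    return mean, var

# Three hypothetical local experts predicting at the same test point.
m, v = aggregate_experts([1.0, 1.2, 0.8], [0.5, 0.25, 1.0])
print(m, v)   # the most confident expert (variance 0.25) dominates
```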
Sparse model-based clustering of three-way data via lasso-type penalties
Mixtures of matrix Gaussian distributions provide a probabilistic framework for clustering continuous matrix-variate data, which are becoming increasingly prevalent in various fields. Despite its widespread adoption and successful application, this approach suffers from over-parameterization issues, making it less suitable even for matrix-variate data of moderate size. To overcome this drawback, we introduce a sparse model-based clustering approach for three-way data. Our approach assumes that the matrix mixture parameters are sparse, with different degrees of sparsity across clusters, allowing parsimony to be induced in a flexible manner. Estimation of the model relies on the maximization of a penalized likelihood with specifically tailored group and graphical lasso penalties. These penalties enable the selection of the most informative features for clustering three-way data where variables are recorded over multiple occasions, and allow cluster-specific association structures to be captured. The proposed methodology is tested extensively on synthetic data, and its validity is demonstrated in application to time-dependent crime patterns in different US cities.
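The sparsity in lasso-type penalized estimation comes from proximal (thresholding) operators. Generic sketches of the two operators behind group and graphical lasso penalties follow; these are standard textbook forms, separate from the paper's actual penalized-EM updates:

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of the l1 (lasso) penalty: shrink every entry towards
    zero and set small entries exactly to zero -- this induces sparsity."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def group_soft_threshold(x, lam):
    """Group-lasso proximal operator: shrink the norm of a whole block of
    parameters, zeroing the entire group when its norm is below the penalty."""
    norm = np.linalg.norm(x)
    return np.zeros_like(x) if norm <= lam else (1.0 - lam / norm) * x

print(soft_threshold(np.array([2.0, -0.3, 0.7]), 0.5))     # middle entry zeroed
print(group_soft_threshold(np.array([3.0, 4.0]), 1.0))     # block shrunk, kept
print(group_soft_threshold(np.array([0.3, 0.4]), 1.0))     # whole block zeroed
```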
Scalable Learning of Bayesian Networks Using Feedback Arc Set-Based Heuristics
Bayesian networks form an important class of probabilistic graphical models. They consist of a structure (a directed acyclic graph) expressing conditional independencies among random variables, as well as parameters (local probability distributions). As such, Bayesian networks are generative models encoding joint probability distributions in a compact form.
The main difficulty in learning a Bayesian network comes from the structure itself, owing to the combinatorial nature of the acyclicity property; it is well known and does not come as a surprise that the structure learning problem is NP-hard in general. Exact algorithms solving this problem exist: dynamic programming and integer linear programming are prime contenders when one seeks to recover the structure of small-to-medium sized Bayesian networks from data. On the other hand, heuristics such as hill climbing variants are commonly used when attempting to approximately learn the structure of larger networks with thousands of variables, although these heuristics typically lack theoretical guarantees and their performance in practice may become unreliable when dealing with large scale learning.
This thesis is concerned with the development of scalable methods tackling the Bayesian network structure learning problem, while attempting to maintain a level of theoretical control. This was achieved via the use of related combinatorial problems, namely the maximum acyclic subgraph problem and its dual, the minimum feedback arc set problem. Although these problems are NP-hard themselves, they exhibit significantly better tractability in practice. This thesis explores ways to map Bayesian network structure learning into maximum acyclic subgraph instances and extract approximate solutions for the former problem, based on the solutions obtained for the latter.
Our research suggests that although increased scalability can be achieved this way, maintaining theoretical understanding based on this approach is much more challenging. Furthermore, we found that learning the structure of Bayesian networks based on maximum acyclic subgraph/minimum feedback arc set may not be the go-to method in general, but we identified a setting - linear structural equation models - in which we could experimentally validate the benefits of this approach, leading to fast and scalable structure recovery with the ability to learn complex structures in a competitive way compared to state-of-the-art baselines.
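As an illustration of the combinatorial core, here is a hypothetical greedy ordering heuristic for the maximum acyclic subgraph; edges pointing backwards in the ordering form a feedback arc set. The score function and the toy weighted digraph are illustrative, and the thesis's actual mapping from structure-learning scores is more involved:

```python
def greedy_acyclic_ordering(n, weights):
    """Greedy heuristic for the maximum acyclic subgraph: repeatedly place the
    vertex whose remaining outgoing weight most exceeds its incoming weight.
    Edges pointing backwards in the final ordering form a feedback arc set."""
    remaining = set(range(n))
    order = []
    while remaining:
        best = max(remaining,
                   key=lambda v: sum(weights.get((v, u), 0.0) for u in remaining)
                              - sum(weights.get((u, v), 0.0) for u in remaining))
        order.append(best)
        remaining.remove(best)
    return order

# Toy digraph with the cycle 0 -> 1 -> 2 -> 0; the edge (2, 0) is the weakest.
w = {(0, 1): 3.0, (1, 2): 2.0, (2, 0): 1.0}
order = greedy_acyclic_ordering(3, w)
pos = {v: i for i, v in enumerate(order)}
fas = [e for e in w if pos[e[0]] > pos[e[1]]]
print(order, fas)   # only the weakest edge needs to be removed
```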
Computational modelling and optimal control of interacting particle systems: connecting dynamic density functional theory and PDE-constrained optimization
Processes that can be described by systems of interacting particles are ubiquitous in nature, society, and industry, ranging from animal flocking, the spread of diseases, and formation of opinions to nano-filtration, brewing, and printing. In real-world applications it is often relevant to not only model a process of interest, but to also optimize it in order to achieve a desired outcome with minimal resources, such as time, money, or energy.
Mathematically, the dynamics of interacting particle systems can be described using Dynamic Density Functional Theory (DDFT). The resulting models are nonlinear, nonlocal partial differential equations (PDEs) that include convolution integral terms. Such terms also enter the naturally arising no-flux boundary conditions. Due to the nonlocal, nonlinear nature of such problems they are challenging both to analyse and solve numerically.
In order to optimize processes that are modelled by PDEs, one can apply tools from PDE-constrained optimization. The aim here is to drive a quantity of interest towards a target state by varying a control variable. This is constrained by a PDE describing the process of interest, in which the control enters as a model parameter. Such problems can be tackled by deriving and solving the (first-order) optimality system, which couples the PDE model with a second PDE and an algebraic equation. Solving such a system numerically is challenging, since large matrices arise in its discretization, for which efficient solution strategies have to be found. Most work in PDE-constrained optimization addresses problems in which the control is applied linearly, and which are constrained by local, often linear PDEs, since introducing nonlinearity significantly increases the complexity in both the analysis and numerical solution of the optimization problem.
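For a simple tracking-type problem with a linear (Poisson) state equation and distributed control, an illustrative stand-in for the DDFT setting rather than the models used in this thesis, the first-order optimality system takes the familiar three-equation form:

```latex
% Illustrative problem:
%   min_{y,u}  1/2 ||y - \hat y||^2_{L^2(\Omega)} + (beta/2) ||u||^2_{L^2(\Omega)}
%   s.t.       -\nabla^2 y = u  in \Omega   (plus boundary conditions)
%
% Its first-order optimality (KKT) system:
\begin{aligned}
  -\nabla^2 y &= u          &&\text{(state equation)}\\
  -\nabla^2 p &= \hat y - y &&\text{(adjoint equation, driven by the misfit)}\\
  \beta u - p &= 0          &&\text{(gradient equation linking control and adjoint)}
\end{aligned}
```

In the nonlocal DDFT setting, the state and adjoint equations additionally carry convolution terms and couple nonlocally in space and time, which is what makes the system above substantially harder there.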
However, in order to optimize real-world processes described by nonlinear, nonlocal DDFT models, one has to develop an optimal control framework for such models. The aim is to drive the particles to some desired distribution by applying the control either linearly, through a particle source, or bilinearly, through an advective field. The optimization process is constrained by the DDFT model that describes how the particles move under the influence of advection, diffusion, external forces, and particle–particle interactions. In order to tackle this, the (first-order) optimality system is derived, which, since it involves nonlinear (integro-)PDEs that are coupled nonlocally in space and time, is significantly harder than in the standard case. Novel numerical methods are developed, effectively combining pseudospectral methods and iterative solvers, to efficiently and accurately solve such a system.
In the next step this framework is extended so that it can capture and optimize industrially relevant processes, such as brewing and nano-filtration. In order to do so, extensions to both the DDFT model and the numerical method are made. Firstly, since industrial processes often involve tubes, funnels, channels, or tanks of various shapes, the PDE model itself, as well as the optimization problem, needs to be solved on complicated domains. This is achieved by developing a novel spectral element approach that is compatible with both the PDE solver and the optimal control framework. Secondly, many industrial processes, such as nano-filtration, involve more than one type of particle. Therefore, the DDFT model is extended to describe multiple particle species. Finally, depending on the application of interest, additional physical effects need to be included in the model. In this thesis, to model sedimentation processes in brewing, the model is modified to capture volume exclusion effects.