Invertible Kernel PCA with Random Fourier Features
Kernel principal component analysis (kPCA) is a widely studied method to
construct a low-dimensional data representation after a nonlinear
transformation. The prevailing method to reconstruct the original input signal
from kPCA -- an important task for denoising -- requires us to solve a
supervised learning problem. In this paper, we present an alternative method
where the reconstruction follows naturally from the compression step. We first
approximate the kernel with random Fourier features. Then, we exploit the fact
that the nonlinear transformation is invertible in a certain subdomain. Hence,
the name invertible kernel PCA (ikPCA). We experiment with different
data modalities and show that ikPCA performs similarly to kPCA with supervised
reconstruction on denoising tasks, making it a strong alternative.
Comment: This work has been submitted to the IEEE for possible publication.
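The random Fourier feature approximation that the paper builds on can be sketched in a few lines. This is a generic illustration for an RBF kernel, not the authors' code; the function name, bandwidth, and feature count are illustrative choices.

```python
import numpy as np

def rff_features(X, n_features=256, sigma=1.0, seed=0):
    """Map X (n, d) to random Fourier features whose inner products
    approximate the RBF kernel exp(-||x - y||^2 / (2 sigma^2))."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(0.0, 1.0 / sigma, size=(d, n_features))  # w ~ N(0, sigma^-2 I)
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)      # random phases
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# The Gram matrix of the features approximates the exact RBF kernel:
X = np.random.default_rng(1).normal(size=(50, 3))
Z = rff_features(X, n_features=4096, sigma=1.0)
K_approx = Z @ Z.T
K_exact = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / 2.0)
err = np.abs(K_approx - K_exact).max()
```

The approximation error shrinks at roughly O(1/sqrt(n_features)), which is why a few thousand features already give a close match.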
ECG-Based Electrolyte Prediction: Evaluating Regression and Probabilistic Methods
Objective: Imbalances of the electrolyte concentration levels in the body can
lead to catastrophic consequences, but accurate and accessible measurements
could improve patient outcomes. While blood tests provide accurate
measurements, they are invasive and the laboratory analysis can be slow or
inaccessible. In contrast, an electrocardiogram (ECG) is a widely adopted tool
which is quick and simple to acquire. However, the problem of estimating
continuous electrolyte concentrations directly from ECGs is not well-studied.
We therefore investigate if regression methods can be used for accurate
ECG-based prediction of electrolyte concentrations. Methods: We explore the use
of deep neural networks (DNNs) for this task. We analyze the regression
performance across four electrolytes, utilizing a novel dataset containing over
290000 ECGs. For improved understanding, we also study the full spectrum from
continuous predictions to binary classification of extreme concentration
levels. To enhance clinical usefulness, we finally extend to a probabilistic
regression approach and evaluate different uncertainty estimates. Results: We
find that the performance varies significantly between different electrolytes,
which is clinically justified in the interplay of electrolytes and their
manifestation in the ECG. We also compare the regression accuracy with that of
traditional machine learning models, demonstrating superior performance of
DNNs. Conclusion: Discretization can lead to good classification performance,
but does not help solve the original problem of predicting continuous
concentration levels. While probabilistic regression demonstrates potential
practical usefulness, the uncertainty estimates are not particularly
well-calibrated. Significance: Our study is a first step towards accurate and
reliable ECG-based prediction of electrolyte concentration levels.
Comment: Code and trained models are available at
https://github.com/philippvb/ecg-electrolyte-regressio
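The probabilistic regression approach described above amounts to predicting a distribution rather than a point estimate; a common concrete choice is a Gaussian whose mean and log-variance are both network outputs, trained by negative log-likelihood. A minimal sketch of that loss (the network itself is omitted; this is a generic formulation, not the paper's implementation):

```python
import numpy as np

def gaussian_nll(y, mu, log_var):
    """Per-sample negative log-likelihood of y under N(mu, exp(log_var)).
    Minimizing this jointly fits the prediction and its uncertainty."""
    var = np.exp(log_var)
    return 0.5 * (np.log(2.0 * np.pi) + log_var + (y - mu) ** 2 / var)

# A confident wrong prediction is penalized more than an uncertain one:
y = 5.0
confident = gaussian_nll(y, mu=0.0, log_var=np.log(0.1))
uncertain = gaussian_nll(y, mu=0.0, log_var=np.log(10.0))
```

The log-var term stops the model from claiming infinite uncertainty, while the scaled squared error stops it from claiming false confidence.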
On Deep Learning for Low-Dimensional Representations
In science and engineering, we are often concerned with creating mathematical models from data. These models are abstractions of observed real-world processes, where the goal is often to understand these processes or to use the models to predict future instances of the observed process. Natural processes often exhibit low-dimensional structures which we can embed into the model. In mechanistic models, we directly include this structure through mathematical equations, often inspired by physical constraints. In contrast, within machine learning, and particularly in deep learning, we often deal with high-dimensional data such as images and learn a model without imposing a low-dimensional structure. Instead, we learn some kind of representation that is useful for the task at hand. While representation learning arguably enables the power of deep neural networks, it is less clear how to understand real-world processes from these models, or whether we can benefit from including a low-dimensional structure in the model.

This dissertation studies learning from data with intrinsic low-dimensional structure and how to replicate this structure in machine learning models. While we put specific emphasis on deep neural networks, we also consider kernel machines in the context of Gaussian processes, as well as linear models, for example by studying the generalisation of models with an explicit low-dimensional structure. First, we argue that many real-world observations have an intrinsic low-dimensional structure; we can find evidence of this structure, for example, through low-rank approximations of many real-world data sets. We then face two open-ended research questions. First, we study the behaviour of machine learning models when they are trained on data with low-dimensional structures. Here we investigate fundamental aspects of learning low-dimensional representations and how well models with explicit low-dimensional structures perform.

Second, we focus on applications in the modelling of dynamical systems and in the medical domain. We investigate how we can benefit from low-dimensional representations for these applications and explore the potential of low-dimensional model structures for predictive tasks. Finally, we give a brief outlook on how to go beyond learning low-dimensional structures and identify the underlying mechanisms that generate the data, to better model and understand these processes.

This dissertation provides an overview of learning low-dimensional structures in machine learning models. It covers a wide range of topics, from representation learning, through the study of generalisation in overparameterized models, to applications with time series and medical data. However, each contribution opens up a range of questions to study in the future. Therefore, this dissertation serves as a starting point for further exploring the learning of low-dimensional structure and representations.
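The low-rank evidence mentioned in the abstract is typically demonstrated with a truncated singular value decomposition: if a data matrix is well approximated by a few singular components, it has intrinsic low-dimensional structure. A minimal, generic sketch (not taken from the dissertation):

```python
import numpy as np

def low_rank_approx(A, r):
    """Best rank-r approximation of A in Frobenius norm (Eckart-Young),
    obtained by keeping the r largest singular components."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

# A matrix that is exactly rank 2 is recovered perfectly at r = 2,
# while r = 1 leaves a residual:
rng = np.random.default_rng(0)
A = rng.normal(size=(40, 2)) @ rng.normal(size=(2, 30))
err2 = np.linalg.norm(A - low_rank_approx(A, 2))
err1 = np.linalg.norm(A - low_rank_approx(A, 1))
```

On real data sets one plots the singular value decay: a sharp drop after a few components is exactly the intrinsic low-dimensional structure the dissertation argues for.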
Tensor Network Kalman Filter for Large-Scale MIMO Systems: With Application to Adaptive Optics
For large-scale systems with tens of thousands of states and outputs, the computation in the conventional Kalman filter becomes so time-consuming that Kalman filtering in large-scale real-time applications is practically infeasible. A possible mathematical framework to lift the curse of dimensionality is to cast the problem in higher dimensions with the use of tensors and then decompose it. The tensor-train decomposition is chosen due to its computational advantages for systems with low tensor-train rank. Within this thesis, two main limitations of the existing tensor Kalman filter are solved. First, a method is developed based on tensor-train rank truncation of the covariances to increase the computational speed for more general systems. Second, a MIMO tensor Kalman filter is developed for a specific class of systems. The power of the developed methods is shown on the example of adaptive optics, which fits into the framework. A comparison with state-of-the-art large-scale estimation algorithms shows the computational advantage of the tensor Kalman filter at the cost of approximation errors.
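For reference, the conventional filter that the thesis accelerates consists of a predict and an update step; the dense covariance products below are the operations whose cost explodes with the state dimension and which the tensor-train variant replaces with low-rank factors. This is a textbook sketch, not the thesis code:

```python
import numpy as np

def kalman_step(x, P, A, C, Q, R, y):
    """One predict/update step of the conventional Kalman filter."""
    x_pred = A @ x                        # state prediction
    P_pred = A @ P @ A.T + Q              # covariance prediction (O(n^3))
    S = C @ P_pred @ C.T + R              # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ (y - C @ x_pred)
    P_new = (np.eye(len(x)) - K @ C) @ P_pred
    return x_new, P_new

# With a near-perfect sensor (tiny R), the update trusts the measurement:
n = 3
x, P = np.zeros(n), np.eye(n)
A_, C_, Q_, R_ = np.eye(n), np.eye(n), 0.01 * np.eye(n), 1e-9 * np.eye(n)
y = np.array([1.0, 2.0, 3.0])
x_new, P_new = kalman_step(x, P, A_, C_, Q_, R_, y)
```

With tens of thousands of states, P alone no longer fits in memory as a dense matrix, which motivates the tensor-train representation.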
Deep State Space Models for Nonlinear System Identification
Deep state space models (SSMs) are an actively researched model class for temporal modelling developed in the deep learning community, with a close connection to classic SSMs. Used as a black-box identification model, deep SSMs can describe a wide range of dynamics due to the flexibility of deep neural networks. Additionally, the probabilistic nature of the model class allows the uncertainty of the system to be modelled. In this work, a deep SSM class and its parameter learning algorithm are explained in an effort to extend the toolbox of nonlinear identification methods with a deep learning-based method. Six recent deep SSMs are evaluated in a first unified implementation on nonlinear system identification benchmarks.
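The underlying model class can be written as a state transition plus an observation map; deep SSMs parameterize both with neural networks and add noise terms to model uncertainty. A minimal rollout sketch under that generic formulation (the function names and the linear example are illustrative, not from the paper):

```python
import numpy as np

def simulate_ssm(f, g, x0, us, noise_std=0.0, seed=0):
    """Roll out a (possibly stochastic) state space model
        x_{t+1} = f(x_t, u_t) + w_t,   y_t = g(x_t).
    In a deep SSM, f and g would be neural networks."""
    rng = np.random.default_rng(seed)
    x, ys = np.asarray(x0, dtype=float), []
    for u in us:
        ys.append(g(x))
        x = f(x, u) + rng.normal(0.0, noise_std, size=x.shape)
    return np.array(ys)

# Noise-free linear example: x_{t+1} = 0.5 x_t + u_t, observed directly.
ys = simulate_ssm(lambda x, u: 0.5 * x + u, lambda x: x,
                  x0=[1.0], us=[0.0, 0.0, 1.0])
```

Parameter learning then amounts to fitting f and g so that simulated outputs match recorded input-output data, typically via maximum likelihood or a variational objective.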
Tensor network Kalman filter for LTI systems
An extension of the Tensor Network (TN) Kalman filter [2], [3] for large-scale LTI systems is presented in this paper. The TN Kalman filter can handle exponentially large state vectors without constructing them explicitly. In order to have efficient algebraic operations, a low TN rank is required. We exploit the possibility of approximating the covariance matrix as a TN with a low TN rank. This significantly reduces the computational complexity for general SISO and MIMO LTI systems with TN rank greater than one, while obtaining an accurate estimation. Improvements of this method in terms of computational complexity compared to the conventional Kalman filter are demonstrated in numerical simulations for large-scale systems.
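The core mechanism behind handling exponentially large state vectors is the tensor-train (TT) decomposition: a long vector is reshaped into a higher-order tensor and factored into a chain of small three-way cores via sequential SVDs. A minimal TT-SVD sketch (generic, not the paper's implementation; the rank-truncation tolerance eps is an illustrative parameter):

```python
import numpy as np

def tt_decompose(v, dims, eps=1e-12):
    """Decompose a length-prod(dims) vector into tensor-train cores.
    With low TT ranks, storage drops from prod(dims) to a sum of
    small core sizes."""
    cores, r = [], 1
    T = np.asarray(v, dtype=float)
    for d in dims[:-1]:
        T = T.reshape(r * d, -1)
        U, s, Vt = np.linalg.svd(T, full_matrices=False)
        rank = max(1, int(np.sum(s > eps * s[0])))   # truncate small modes
        cores.append(U[:, :rank].reshape(r, d, rank))
        T = s[:rank, None] * Vt[:rank]               # carry the remainder
        r = rank
    cores.append(T.reshape(r, dims[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the cores back into the full vector."""
    v = cores[0]
    for core in cores[1:]:
        v = np.tensordot(v, core, axes=([-1], [0]))
    return v.reshape(-1)

v = np.random.default_rng(0).normal(size=16)
cores = tt_decompose(v, dims=(2, 2, 2, 2))
```

Filtering in TT format then performs the Kalman recursions directly on the cores, which is where the low-rank requirement on the covariance comes from.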
Distributed training and scalability for the particle clustering method UCluster
In recent years, machine-learning methods have become increasingly important for the experiments at the Large Hadron Collider (LHC). They are utilised in everything from trigger systems to reconstruction and data analysis. The recent UCluster method is a general model providing unsupervised clustering of particle physics data, which can be easily modified to provide solutions for a variety of different decision problems. In the current paper, we improve on the UCluster method by adding the option of training the model in a scalable and distributed fashion, thereby extending its utility to learn from arbitrarily large data sets. UCluster combines a graph-based neural network called ABCnet with a clustering step, using a combined loss function in the training phase. The original code is publicly available in TensorFlow v1.14 and has previously been trained on a single GPU. It shows a clustering accuracy of 81% when applied to the problem of multi-class classification of simulated jet events. Our implementation adds the distributed training functionality by utilising the Horovod distributed training framework, which necessitated a migration of the code to TensorFlow v2. Together with using parquet files for splitting data up between different compute nodes, the distributed training makes the model scalable to any amount of input data, something that will be essential for use with real LHC data sets. We find that the model is well suited for distributed training, with the training time decreasing in direct relation to the number of GPUs used. However, a more exhaustive, and possibly distributed, hyper-parameter search is required in order to achieve the reported accuracy of the original UCluster method.
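The data-parallel pattern that Horovod implements can be illustrated without any distributed framework: each worker computes a gradient on its shard, gradients are averaged (Horovod does this with a ring-allreduce across GPUs), and every worker applies the identical update. A single-process simulation of that pattern, using least squares as a stand-in loss (all names here are illustrative):

```python
import numpy as np

def allreduce_mean(grads):
    """Average gradients across workers; stands in for the
    ring-allreduce that Horovod performs across GPUs."""
    return sum(grads) / len(grads)

def data_parallel_step(w, shards, lr=0.1):
    """One data-parallel SGD step: per-shard mean-squared-error
    gradients are averaged, then one shared update is applied."""
    grads = [2.0 * X.T @ (X @ w - y) / len(y) for X, y in shards]
    return w - lr * allreduce_mean(grads)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(8, 3)), rng.normal(size=8)
w0 = np.zeros(3)
# Two equal-sized shards give the same step as the full batch on one worker:
w_dist = data_parallel_step(w0, [(X[:4], y[:4]), (X[4:], y[4:])])
w_full = data_parallel_step(w0, [(X, y)])
```

Because the averaged update is mathematically identical to the single-worker one for equal shards, throughput scales with the number of workers while the optimization trajectory is preserved.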
Deep networks for system identification: A survey
Deep learning is a topic of considerable current interest. The availability of massive data collections and powerful software resources has led to an impressive amount of results in many application areas that reveal essential but hidden properties of the observations. System identification learns mathematical descriptions of dynamic systems from input-output data and can thus benefit from the advances of deep neural networks to enrich the possible range of models to choose from. For this reason, we provide a survey of deep learning from a system identification perspective. We cover a wide spectrum of topics to enable researchers to understand the methods, providing rigorous practical and theoretical insights into the benefits and challenges of using them. The main aim of the identified model is to predict new data from previous observations. This can be achieved with different deep learning-based modelling techniques and we discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks. Their parameters have to be estimated from past data to optimize the prediction performance. For this purpose, we discuss a specific set of first-order optimization tools that have emerged as efficient. The survey then draws connections to the well-studied area of kernel-based methods. They control the data fit by regularization terms that penalize models not in line with prior assumptions. We illustrate how to cast them in deep architectures to obtain deep kernel-based methods. The success of deep learning also resulted in surprising empirical observations, like the counter-intuitive behaviour of models with many parameters. We discuss the role of overparameterized models, including their connection to kernels, as well as implicit regularization mechanisms which affect generalization, specifically the interesting phenomena of benign overfitting and double-descent. 
Finally, we highlight numerical, computational and software aspects in the area with the help of applied examples.
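The kernel-based methods that the survey connects to deep learning control the data fit with a regularization term penalizing models not in line with prior assumptions. The canonical instance is kernel ridge regression; a minimal sketch with an RBF kernel (a generic illustration, not code from the survey):

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """RBF (Gaussian) kernel matrix between rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kernel_ridge(X, y, X_test, lam=1e-3, sigma=1.0):
    """Kernel ridge regression: the lam * I term is the regularizer
    encoding the smoothness prior; lam trades data fit for it."""
    K = rbf_kernel(X, X, sigma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return rbf_kernel(X_test, X, sigma) @ alpha

# Fit a smooth function; with small lam the training points are matched closely.
X = np.linspace(0.0, 2.0 * np.pi, 30)[:, None]
y = np.sin(X[:, 0])
y_hat = kernel_ridge(X, y, X, lam=1e-6)
```

Deep kernel-based methods, as discussed in the survey, replace the fixed RBF kernel with one parameterized by a neural network while keeping this regularized fitting structure.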
Development and validation of deep learning ECG-based prediction of myocardial infarction in emergency department patients
Myocardial infarction diagnosis is a common challenge in the emergency department. In managed settings, deep learning-based models and especially convolutional deep models have shown promise in electrocardiogram (ECG) classification, but there is a lack of high-performing models for the diagnosis of myocardial infarction in real-world scenarios. We aimed to train and validate a deep learning model using ECGs to predict myocardial infarction in real-world emergency department patients. We studied emergency department patients in the Stockholm region between 2007 and 2016 who had an ECG obtained because of their presenting complaint. We developed a deep neural network based on convolutional layers similar to a residual network. Inputs to the model were ECG tracing, age, and sex; and outputs were the probabilities of three mutually exclusive classes: non-ST-elevation myocardial infarction (NSTEMI), ST-elevation myocardial infarction (STEMI), and control status, as registered in the SWEDEHEART and other registries. We used an ensemble of five models. Among 492,226 ECGs in 214,250 patients, 5,416 were recorded with an NSTEMI, 1,818 a STEMI, and 485,207 without a myocardial infarction. In a random test set, our model could discriminate STEMIs/NSTEMIs from controls with a C-statistic of 0.991/0.832 and had a Brier score of 0.001/0.008. The model obtained a similar performance in a temporally separated test set of the study sample, and achieved a C-statistic of 0.985 and a Brier score of 0.002 in discriminating STEMIs from controls in an external test set. We developed and validated a deep learning model with excellent performance in discriminating between control, STEMI, and NSTEMI on the presenting ECG of a real-world sample of the important population of all-comers to the emergency department. Hence, deep learning models for ECG decision support could be valuable in the emergency department.
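The two metrics reported above are standard and easy to compute from predicted probabilities: the C-statistic is the probability that a randomly chosen case scores higher than a randomly chosen control (identical to the ROC AUC), and the Brier score is the mean squared difference between predicted probability and outcome. A minimal sketch of both:

```python
import numpy as np

def c_statistic(pos_scores, neg_scores):
    """Probability that a random positive outranks a random negative,
    counting ties as one half; identical to the ROC AUC."""
    diff = np.asarray(pos_scores)[:, None] - np.asarray(neg_scores)[None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

def brier_score(p, y):
    """Mean squared difference between predicted probability and outcome."""
    return float(np.mean((np.asarray(p, dtype=float) - np.asarray(y)) ** 2))

auc = c_statistic([0.9, 0.8, 0.4], [0.1, 0.4, 0.2])
```

Note that the C-statistic measures discrimination only, while the Brier score also rewards calibration; reporting both, as the study does, gives a fuller picture of clinical usefulness.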