571 research outputs found

    Accelerating Generalized Linear Models by Trading off Computation for Uncertainty

    Full text link
    Bayesian Generalized Linear Models (GLMs) define a flexible probabilistic framework to model categorical, ordinal and continuous data, and are widely used in practice. However, exact inference in GLMs is prohibitively expensive for large datasets, thus requiring approximations in practice. The resulting approximation error adversely impacts the reliability of the model and is not accounted for in the uncertainty of the prediction. In this work, we introduce a family of iterative methods that explicitly model this error. They are uniquely suited to parallel modern computing hardware, efficiently recycle computations, and compress information to reduce both the time and memory requirements for GLMs. As we demonstrate on a realistically large classification problem, our method significantly accelerates training by explicitly trading off reduced computation for increased uncertainty.
    Comment: Main text: 10 pages, 6 figures; Supplements: 13 pages, 2 figures
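The kind of approximation the abstract refers to can be illustrated with a standard baseline that is not the paper's method: a Laplace approximation to Bayesian logistic regression, one of the simplest GLMs. The Gaussian fit at the posterior mode is exactly the sort of approximation whose error goes unreported in the predictive uncertainty; all data and settings below are synthetic.

```python
import numpy as np

# Illustrative sketch, NOT the paper's method: Laplace approximation for
# Bayesian logistic regression (a GLM) with a standard normal prior.
rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = (rng.random(n) < 1 / (1 + np.exp(-X @ w_true))).astype(float)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Newton's method to the MAP estimate of the weights.
w = np.zeros(d)
for _ in range(25):
    p = sigmoid(X @ w)
    grad = X.T @ (y - p) - w                     # gradient of log posterior
    H = -(X.T * (p * (1 - p))) @ X - np.eye(d)   # Hessian of log posterior
    w -= np.linalg.solve(H, grad)

# The Laplace posterior is N(w, cov); its mismatch with the exact posterior
# is the kind of unmodelled approximation error the paper targets.
cov = np.linalg.inv(-H)
print(w)
```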

    Analog Photonics Computing for Information Processing, Inference and Optimisation

    Full text link
    This review presents an overview of the current state-of-the-art in photonics computing, which leverages photons, photons coupled with matter, and optics-related technologies for effective and efficient computational purposes. It covers the history and development of photonics computing and modern analogue computing platforms and architectures, focusing on optimization tasks and neural network implementations. The authors examine special-purpose optimizers, mathematical descriptions of photonics optimizers, and their various interconnections. Disparate applications are discussed, including direct encoding, logistics, finance, phase retrieval, machine learning, neural networks, probabilistic graphical models, and image processing, among many others. The main directions of technological advancement and associated challenges in photonics computing are explored, along with an assessment of its efficiency. Finally, the paper discusses prospects and the field of optical quantum computing, providing insights into the potential applications of this technology.
    Comment: Invited submission to Journal of Advanced Quantum Technologies; accepted version 5/06/202

    Optimal partitioning of directed acyclic graphs with dependent costs between clusters

    Full text link
    Many statistical inference contexts, including Bayesian Networks (BNs), Markov processes and Hidden Markov Models (HMMs), can be supported by partitioning (i.e. mapping) the underlying Directed Acyclic Graph (DAG) into clusters. However, optimal partitioning is challenging, especially in statistical inference, as the cost to be optimised depends both on the nodes within a cluster and on the mapping of clusters connected via parent and/or child nodes, which we call dependent clusters. We propose a novel algorithm called DCMAP for optimal cluster mapping with dependent clusters. Given an arbitrarily defined, positive cost function based on the DAG and cluster mappings, we show that DCMAP converges to find all optimal clusters, and returns near-optimal solutions along the way. Empirically, we find that the algorithm is time-efficient for a DBN model of a seagrass complex system using a computation cost function. For a 25- and 50-node DBN, the search space size was 9.91×10^9 and 1.51×10^21 possible cluster mappings, respectively, but near-optimal solutions with 88% and 72% similarity to the optimal solution were found at iterations 170 and 865, respectively. The first optimal solution was found at iteration 934 (95% CI 926, 971) and 2256 (95% CI 2150, 2271), with a cost that was 4% and 0.2% of the naive heuristic cost, respectively.
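As a toy illustration of dependent cluster costs (this is not the DCMAP algorithm, and the cost function is hypothetical), the sketch below brute-forces every mapping of a four-node DAG into two clusters, scoring each mapping by within-cluster size plus a penalty for edges crossing cluster boundaries. The k^n mappings are what make brute force infeasible at 25 or 50 nodes.

```python
from itertools import product

# Toy sketch, NOT DCMAP: exhaustive search over cluster mappings of a small
# DAG with a cost that depends on nodes within clusters AND on edges that
# connect different clusters (the "dependent cluster" coupling).
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]   # a 4-node diamond DAG
n_nodes, n_clusters = 4, 2

def cost(mapping):
    # hypothetical positive cost: quadratic within-cluster size term plus a
    # penalty for every edge whose endpoints land in different clusters
    sizes = [sum(1 for c in mapping if c == k) for k in range(n_clusters)]
    within = sum(s * s for s in sizes)
    crossing = sum(1 for u, v in edges if mapping[u] != mapping[v])
    return within + 3 * crossing

mappings = list(product(range(n_clusters), repeat=n_nodes))  # k**n of them
best = min(mappings, key=cost)
print(best, cost(best))
```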

    Contextual directed acyclic graphs

    Full text link
    Estimating the structure of directed acyclic graphs (DAGs) from observational data remains a significant challenge in machine learning. Most research in this area concentrates on learning a single DAG for the entire population. This paper considers an alternative setting where the graph structure varies across individuals based on available "contextual" features. We tackle this contextual DAG problem via a neural network that maps the contextual features to a DAG, represented as a weighted adjacency matrix. The neural network is equipped with a novel projection layer that ensures the output matrices are sparse and satisfy a recently developed characterization of acyclicity. We devise a scalable computational framework for learning contextual DAGs and provide a convergence guarantee and an analytical gradient for backpropagating through the projection layer. Our experiments suggest that the new approach can recover the true context-specific graph where existing approaches fail
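The projection layer mentioned above relies on a differentiable characterization of acyclicity. One well-known trace-based score of this kind (shown here for illustration; the paper's exact characterization may differ) vanishes precisely when the weighted adjacency matrix represents a DAG:

```python
import numpy as np

def acyclicity(W):
    # Trace-based acyclicity score: tr((I + W*W/d)^d) - d is zero iff the
    # weighted adjacency matrix W contains no directed cycles, because traces
    # of powers of the nonnegative matrix W*W count weighted closed walks.
    # (Illustrative; may differ from the paper's characterization.)
    d = W.shape[0]
    M = np.eye(d) + (W * W) / d
    return np.trace(np.linalg.matrix_power(M, d)) - d

dag = np.array([[0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0],
                [0.0, 0.0, 0.0]])      # 0 -> 1 -> 2, acyclic
cyc = np.array([[0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0],
                [1.0, 0.0, 0.0]])      # 0 -> 1 -> 2 -> 0, a cycle
print(acyclicity(dag), acyclicity(cyc))
```

A projection layer can drive this score to zero while enforcing sparsity, which is what guarantees the network's output is a valid DAG.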

    Insights on Learning Tractable Probabilistic Graphical Models

    Get PDF

    A scalable formulation of joint modelling for longitudinal and time to event data and its application on large electronic health record data of diabetes complications

    Get PDF
    INTRODUCTION: Clinical decision-making in the management of diabetes and other chronic diseases depends upon individualised risk predictions of progression of the disease or complications of disease. With sequential measurements of biomarkers, it should be possible to make dynamic predictions that are updated as new data arrive. Since the 1990s, methods have been developed to jointly model longitudinal measurements of biomarkers and time-to-event data, aiming to facilitate predictions in various fields. These methods offer a comprehensive approach to analysing both the longitudinal changes in biomarkers and the occurrence of events, allowing for a more integrated understanding of the underlying processes and improved predictive capabilities. The aim of this thesis is to investigate whether established methods for joint modelling are able to scale to large electronic health record datasets with multiple biomarkers measured asynchronously, and to evaluate the performance of a novel approach that overcomes the limitations of existing methods.
    METHODS: The epidemiological study design utilised in this research is a retrospective observational study. The data used for these analyses were obtained from a registry encompassing all individuals with type 1 diabetes (T1D) in Scotland, which is delivered by the Scottish Care Information - Diabetes Collaboration platform. The two outcomes studied were time to cardiovascular disease (CVD) and time to end-stage renal disease (ESRD) from T1D diagnosis. The longitudinal biomarkers examined in the study were glycosylated haemoglobin (HbA1c) and estimated glomerular filtration rate (eGFR). These biomarkers and endpoints were selected based on their prevalence in the T1D population and the established association between these biomarkers and the outcomes. As a state-of-the-art method for joint modelling, Brilleman’s stan_jm() function was evaluated. This is an implementation of a shared parameter joint model for longitudinal and time-to-event data in Stan, contributed to the rstanarm package. It was compared with a novel approach based on sequential Bayesian updating of a continuous-time state-space model for the biomarkers, with predictions generated by a Kalman filter algorithm using the ctsem package fed into a Poisson time-splitting regression model for the events. In contrast to the standard joint modelling approach, which can only fit a linear mixed model to the biomarkers, the ctsem package is able to fit a broader family of models that include terms for autoregressive drift and diffusion. As a baseline for comparison, a last-observation-carried-forward model was evaluated to predict time-to-event.
    RESULTS: The analyses were conducted using renal replacement therapy outcome data on 29764 individuals and cardiovascular disease outcome data on 29479 individuals in Scotland (as per the 2019 national registry extract). The CVD dataset was reduced to 24779 individuals with both HbA1c and eGFR measured on the same date; a limitation of the modelling function itself. The datasets include 799 events of renal replacement therapy (RRT) or death due to renal failure (6.71 years average follow-up) and 2274 CVD events (7.54 years average follow-up), respectively. The standard approach to joint modelling, which uses quadrature to integrate over the trajectories of the latent biomarker states and is implemented in rstanarm, was found to be too slow to use even with moderate-sized datasets, e.g. 17.5 hours for a subset of 2633 subjects, 35.9 hours for 5265 subjects, and more than 68 hours for 10532 subjects. The sequential Bayesian updating approach was much faster: it was able to analyse a dataset of 29121 individuals over 225598.3 person-years in 19 hours. Comparison of the fit of different longitudinal biomarker submodels showed that models that also included drift and diffusion terms fitted much better (AIC 51139 deviance units lower) than models that included only a linear mixed model slope term. Despite this, adding terms for diffusion and drift improved predictive performance only slightly for CVD (C-statistic 0.680 to 0.696 for 2112 individuals) and moderately for end-stage renal disease (C-statistic 0.88 to 0.91 for 2000 individuals). The predictive performance of joint modelling in these datasets was only slightly better than using last-observation-carried-forward in the Poisson regression model (C-statistic 0.819 over 8625 person-years).
    CONCLUSIONS: I have demonstrated that, unlike the standard approach to joint modelling implemented in rstanarm, the time-splitting joint modelling approach based on sequential Bayesian updating can scale to a large dataset and allows biomarker trajectories to be modelled with a wider family of models that fit better than simple linear mixed models. However, in this application, where the only biomarkers were HbA1c and eGFR and the outcomes were time-to-CVD and end-stage renal disease, the increment in the predictive performance of joint modelling compared with last-observation-carried-forward was slight. For other outcomes, where the ability to predict time-to-event depends upon modelling latent biomarker trajectories rather than just using the last observation carried forward, the advantages of joint modelling may be greater.
    This thesis proceeds as follows. The first two chapters serve as an introduction to the joint modelling of longitudinal and time-to-event data and its relation to other methods for clinical risk prediction. Briefly, this part explores the rationale for utilising such an approach to better manage chronic diseases such as T1D. The methodological chapters of this thesis describe the mathematical formulation of a multivariate shared-parameter joint model and introduce its application and performance on a subset of individuals with T1D and data pertaining to CVD and ESRD outcomes. Additionally, the mathematical formulation of an alternative time-splitting approach is demonstrated and compared to a conventional method for estimating longitudinal trajectories of clinical biomarkers used in risk prediction. Also, the key features of the pipeline required to implement this approach are outlined. The final chapters of the thesis present an applied example that demonstrates the estimation and evaluation of the alternative modelling approach and explores the types of inferences that can be obtained for a subset of individuals with T1D who might progress to ESRD. Finally, this thesis highlights the strengths and weaknesses of applying and scaling up more complex modelling approaches to facilitate dynamic risk prediction for precision medicine.
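A miniature of the comparison made in this thesis (illustrative only; the actual pipeline uses the ctsem package and a Poisson time-splitting model, not this code): a one-dimensional local-level state-space model, filtered with a Kalman filter, predicts a simulated biomarker's latent trajectory more accurately than carrying the last observation forward.

```python
import numpy as np

# Illustrative sketch, not the thesis pipeline: Kalman filtering of a
# local-level (random-walk) biomarker model vs. a LOCF baseline.
rng = np.random.default_rng(1)
T, q, r = 2000, 0.05, 1.0     # steps, process variance, measurement variance
x = 50.0 + np.cumsum(rng.normal(scale=np.sqrt(q), size=T))  # latent trajectory
y = x + rng.normal(scale=np.sqrt(r), size=T)                # noisy measurements

m, P = y[0], r                # filter mean and variance
kalman_pred = np.empty(T)
for t in range(T):
    kalman_pred[t] = m        # one-step-ahead prediction of x[t]
    P = P + q                 # predict: random walk adds process noise
    K = P / (P + r)           # Kalman gain
    m = m + K * (y[t] - m)    # update with the new measurement
    P = (1 - K) * P

locf_pred = np.concatenate([[y[0]], y[:-1]])  # carry last observation forward
kf_mse = np.mean((kalman_pred[1:] - x[1:]) ** 2)
locf_mse = np.mean((locf_pred[1:] - x[1:]) ** 2)
print(kf_mse, locf_mse)       # the filter averages away measurement noise
```

The filter pools information across visits, whereas LOCF inherits the full measurement noise of the single most recent observation.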

    Availability Analysis of Redundant and Replicated Cloud Services with Bayesian Networks

    Full text link
    Due to the growing complexity of modern data centers, failures are no longer uncommon. Therefore, fault tolerance mechanisms play a vital role in fulfilling availability requirements. Multiple availability models have been proposed to assess compute systems, among which Bayesian network models have gained popularity in industry and research due to their powerful modeling formalism. In particular, this work focuses on assessing the availability of redundant and replicated cloud computing services with Bayesian networks. So far, research on availability has focused on modeling either infrastructure or communication failures in Bayesian networks, but has not considered both simultaneously. This work addresses practical modeling challenges of assessing the availability of large-scale redundant and replicated services with Bayesian networks, including cascading and common-cause failures from the surrounding infrastructure and communication network. In order to ease the modeling task, this paper introduces a high-level modeling formalism to build such a Bayesian network automatically. Performance evaluations demonstrate the feasibility of the presented Bayesian network approach for assessing the availability of large-scale redundant and replicated services. This model is applicable not only in the domain of cloud computing but also to local and geo-distributed systems in general.
    Comment: 16 pages, 12 figures, journal
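The effect of a common-cause failure on a replicated service can be illustrated by enumerating a tiny Bayesian network (all availability figures below are assumed, not from the paper): two replicas behind a shared network switch, where the switch is a single point of failure.

```python
from itertools import product

# Toy sketch with assumed availability figures (not the paper's model):
# a service replicated on two hosts behind one shared network switch. The
# switch is a common-cause failure: when it is down, both replicas are
# unreachable regardless of host state.
p_switch = 0.999   # hypothetical availability of the shared switch
p_host = 0.99      # hypothetical availability of each host, given switch up

avail = 0.0
for sw, h1, h2 in product([0, 1], repeat=3):    # 1 = up, 0 = down
    p = p_switch if sw else 1 - p_switch
    p *= (p_host if h1 else 1 - p_host) if sw else (0.0 if h1 else 1.0)
    p *= (p_host if h2 else 1 - p_host) if sw else (0.0 if h2 else 1.0)
    if h1 or h2:                                # service up if any replica up
        avail += p

print(avail)   # below p_switch: redundancy cannot beat the common cause
```

Exhaustive enumeration works only for tiny networks; the paper's point is that large-scale models need an automated, higher-level construction.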

    Inductive Bias in Machine Learning

    Get PDF
    Inductive bias describes the preference for solutions that a machine learning algorithm holds before seeing any data. It is a necessary ingredient for the goal of machine learning, which is to generalize from a set of examples to unseen data points. Yet, the inductive bias of learning algorithms is often not specified explicitly in practice, which prevents a theoretical understanding and undermines trust in machine learning. This issue is most prominently visible in the contemporary case of deep learning, which is widely successful in applications but relies on many poorly understood techniques and heuristics. This thesis aims to uncover the hidden inductive biases of machine learning algorithms. In the first part of the thesis, we uncover the implicit inductive bias of NetGAN, a complex graph generative model with seemingly no prior preferences. We find that the root of its generalization properties does not lie in the GAN architecture but in an inconspicuous low-rank approximation. We then use this insight to strip NetGAN of all unnecessary parts, including the GAN, and obtain a highly simplified reformulation. Next, we present a generic algorithm that reverse-engineers hidden inductive bias in approximate Bayesian inference. While the inductive bias is completely described by the prior distribution in full Bayesian inference, real-world applications often resort to approximate techniques that can make uncontrollable errors. By reframing the problem in terms of incompatible conditional distributions, we arrive at a generic algorithm based on pseudo-Gibbs sampling that attributes the change in inductive bias to a change in the prior distribution. The last part of the thesis concerns a common inductive bias in causal learning, the assumption of independent causal mechanisms. Under this assumption, we consider estimators for confounding strength, which governs the generalization ability from the observational distribution to the underlying causal model. We show that an existing estimator is generally inconsistent and propose a consistent estimator based on tools from random matrix theory.
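The "inconspicuous low-rank approximation" credited with NetGAN's generalization can be sketched in isolation (illustrative only, not the thesis code): truncating the SVD of a noisy matrix to rank k yields, by the Eckart-Young theorem, the best rank-k fit and filters out most of the noise.

```python
import numpy as np

# Illustrative sketch of the low-rank idea (not the thesis code): the best
# rank-k approximation of a noisy matrix, via truncated SVD, recovers the
# underlying low-rank structure and discards most of the noise.
rng = np.random.default_rng(2)
low_rank = rng.random((30, 5)) @ rng.random((5, 30))    # true rank 5
noisy = low_rank + 0.01 * rng.normal(size=(30, 30))     # full rank after noise

U, s, Vt = np.linalg.svd(noisy)
k = 5
approx = (U[:, :k] * s[:k]) @ Vt[:k]    # Eckart-Young: best rank-k fit

err_to_noisy = np.linalg.norm(noisy - approx)
err_to_truth = np.linalg.norm(low_rank - approx)
print(err_to_noisy, err_to_truth)
```

Restricting a model to such a low-rank family is itself an inductive bias, which is the thesis's point: the preference was hidden in the approximation, not in the GAN.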


    Heckerthoughts

    Full text link
    This manuscript is a technical memoir about my work at Stanford and Microsoft Research. Included are fundamental concepts central to machine learning and artificial intelligence, applications of these concepts, and stories behind their creation.
