80 research outputs found

    Distributed Asynchronous Optimization with Unbounded Delays: How Slow Can You Go?

    One of the most widely used training methods for large-scale machine learning problems is distributed asynchronous stochastic gradient descent (DASGD). However, a key issue in its implementation is that of delays: when a "worker" node asynchronously contributes a gradient update to the "master", the global model parameter may have changed in the meantime, rendering this information stale. In massively parallel computing grids, these delays can quickly add up if a node is saturated, so the convergence of DASGD is uncertain under these conditions. Nevertheless, by using a judiciously chosen quasilinear step-size sequence, we show that it is possible to amortize these delays and achieve global convergence with probability 1, even under polynomially growing delays, thereby reaffirming the successful application of DASGD to large-scale optimization problems.
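The effect of stale gradients can be illustrated with a small simulation. The sketch below is not the paper's algorithm or analysis: it assumes a one-dimensional objective f(x) = x²/2, a delay that grows like the square root of the iteration count, and a step size of order 1/(n log n) as one reading of "quasilinear". The function name `delayed_sgd` and all parameter values are hypothetical.

```python
import math
import random


def delayed_sgd(steps=20000, delay_power=0.5, scale=1.0, noise=0.1, seed=0):
    """Run SGD on f(x) = x**2 / 2 using gradients computed at stale iterates."""
    rng = random.Random(seed)
    x = 5.0
    history = [x]  # history[k] is the iterate after k updates
    for n in range(1, steps + 1):
        # Staleness grows polynomially with n, capped by the available history.
        delay = min(n - 1, int(n ** delay_power))
        stale_x = history[n - 1 - delay]
        # Noisy gradient of f evaluated at the stale iterate.
        grad = stale_x + rng.gauss(0.0, noise)
        # Assumed quasilinear step size, of order 1 / (n log n).
        step = scale / ((n + 1) * math.log(n + 1))
        x -= step * grad
        history.append(x)
    return x


final_x = delayed_sgd()
print(abs(final_x))  # distance to the minimizer x* = 0 after all updates
```

Under these assumptions the total step mass applied during any delay window shrinks as n grows, so the stale iterate stays close to the current one and the iterates drift toward the minimizer, though slowly, because the step sizes sum only like log log n.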

    Techniques for comparing efficacy and cost-effectiveness of cancer therapies, and improved inference tools

    This thesis focuses on two separate topics, one lying at the intersection of health care and statistics, and the other arising from classical statistical inference. Chapters 2 through 4 address the first topic. They explore and improve techniques for comparing both efficacy and cost-effectiveness of cancer therapies. Chapter 5 focuses on the second topic. It proposes a new estimator for the number of binomial experiments when the success probability is unknown. Chapter 2 of my thesis establishes an overall ranking of efficacy of possible interventions in patients with advanced or metastatic melanoma within a Bayesian setting. Currently, chemotherapy is established as the standard of care for melanoma, but is often associated with poor responses and short survival. However, recent groundbreaking discoveries in tumor biology and immune surveillance have yielded effective molecularly targeted therapies and immune agents. These new treatments have changed the therapeutic scenario to a completely new reality of high response rates, prolonged disease control, and the possibility of a cure for some patients. These positive results have opened new avenues in the treatment of melanoma patients and, as expected, added layers of complexity to the management of those patients. We perform a network meta-analysis in a hierarchical Bayesian random-effects model to assess the role of immunotherapies and targeted therapies. We also evaluate the impact of immunotherapy biomarkers within a hierarchical Bayesian setting with a view to supporting and improving the therapeutic decision-making process. Chapter 3 indirectly evaluates the effectiveness of two treatments for advanced castration-resistant prostate cancer (CRPC). Prostate cancer is the most commonly diagnosed cancer. It eventually progresses to CRPC. CRPC is one of the leading causes of cancer-related deaths among men in developed countries.
Two novel androgen receptor pathway inhibitors, abiraterone acetate and enzalutamide, have recently become available. They have been developed with the aim of prolonging survival, minimizing complications, and maintaining or improving quality of life in patients with advanced or metastatic CRPC. However, these two treatment options have not been compared head to head in a prospective randomized trial. In order to choose the optimal treatment and the optimal sequencing of treatments, we perform two analyses. The first one is a comparative effectiveness study within a Bayesian hierarchical setting. The second one is a sequencing assessment of treatments in the context of exponential survival models, informed by Bayesian meta-analyses with between-study and within-study variance components. Chapter 4 proposes an improved methodology for conducting both meta-analysis and secondary data analyses based on randomized controlled trials. One of the deficiencies inherent to traditional methodology is the lack of individual patient-level data, which serve as a basic ingredient for secondary analyses. This shortcoming is handled by recovering the raw time-to-event data through the inverted Kaplan-Meier equations and simulations. The recovered survival distributions are then modeled within a Bayesian semi-parametric framework. We use a hierarchical Dirichlet Process to model discrete-time event probabilities across the timeline up to the last follow-up, and a truncated Weibull model to model the tail of the distribution. This approach avoids assumptions about the shape of the survival distributions up to the last follow-up time, allows incorporation of censored data, and accommodates study-to-study heterogeneity. The parametric nature of the Weibull model, on the other hand, is well suited to making inferences about the survival curve in the absence of data. Finally, patient-level disease trajectories are modeled using a Bayesian Markov model.
We demonstrate this methodology using simulations and a study on advanced non-small cell lung cancer. Finally, Chapter 5 presents a new approach to the binomial n problem, which concerns the estimation of the number of binomial experiments when the success probability is unknown. Real-life situations in which the problem arises include the estimation of the number of unreported crimes as well as the number of undetected software errors. Due to its inherent instability, the problem remains fundamentally difficult. Furthermore, neither of the two parameters of the binomial distribution is unbiasedly estimable when both are unknown. We present an efficient method of estimating the number of trials using a beta-binomial MLE approach. In the absence of replications, when inference about the parameter of interest is not possible, we present a Bayesian approach applied in the context of contingency tables. (Ph.D.)
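The instability of the binomial n problem mentioned above can be seen with a much simpler estimator than the one the thesis proposes. The sketch below uses the textbook method-of-moments estimate, obtained by solving mean = np and variance = np(1 - p) for n; it is not the beta-binomial MLE approach of Chapter 5. The function name `moment_estimate_n` and the five counts are illustrative assumptions, not data from the thesis.

```python
from statistics import fmean, pvariance


def moment_estimate_n(counts):
    """Method-of-moments estimate of the binomial n when p is also unknown."""
    mean = fmean(counts)
    var = pvariance(counts)  # population variance (divisor len(counts))
    if var >= mean:
        raise ValueError("sample variance >= mean: moment estimate undefined")
    # Solve mean = n*p and var = n*p*(1 - p) for n.
    return mean ** 2 / (mean - var)


counts = [16, 18, 22, 25, 27]      # illustrative data, not from the thesis
perturbed = [16, 18, 22, 25, 28]   # the same data with one count raised by one
print(round(moment_estimate_n(counts)))     # 102
print(round(moment_estimate_n(perturbed)))  # 195
```

Changing a single observation by one nearly doubles the estimate, because the denominator (mean minus variance) is small and sensitive to sampling noise. This is the kind of behavior that motivates more stable estimators.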

    Probabilistic Models for Exploring, Predicting, and Influencing Health Trajectories

    Over the past decade, healthcare systems around the world have transitioned from paper to electronic health records (EHRs). Most healthcare systems now host large, on-premise clusters that support an institution-wide network of computers deployed at the point of care. A stream of transactions passes through this network each minute, recording information about what medications a patient is receiving, what procedures they have had, and the results of hundreds of physical examinations and laboratory tests. There is increasing pressure to leverage these repositories of data as a means to improve patient outcomes, drive down costs, or both. To date, however, there is no clear answer on how best to do this. In this thesis, we study two important problems that can help to accomplish these goals: disease subtyping and disease trajectory prediction. In disease subtyping, the goal is to better understand complex, heterogeneous diseases by discovering patient populations with similar symptoms and disease expression. As we discover and refine subtypes, we can integrate them into clinical practice to improve management and can use them to motivate new hypothesis-driven research into the genetic and molecular underpinnings of the disease. In disease trajectory prediction, our goal is to forecast how severe a patient's disease will become in the future. Tools to make accurate forecasts have clear implications for clinical decision support, but they can also improve our process for validating new therapies through trial enrichment. We identify several characteristics of EHR data that make subtyping and disease trajectory prediction difficult. The key contribution of this thesis is a collection of novel probabilistic models that address these challenges and make it possible to successfully solve the subtyping and disease trajectory prediction problems using EHR data.