
    Towards the interpretation of time-varying regularization parameters in streaming penalized regression models

    High-dimensional, streaming datasets are ubiquitous in modern applications, ranging from finance and e-commerce to biomedical and neuroimaging data. As a result, many novel algorithms have been proposed to address the challenges posed by such datasets. In this work, we focus on ℓ1-regularized linear models in the context of (possibly non-stationary) streaming data. It has recently been noted that the choice of regularization parameter is fundamental in such models, and several methods have been proposed that iteratively tune this parameter in a time-varying manner, thereby allowing the underlying sparsity of the estimated models to vary. Moreover, in many applications inference on the regularization parameter may itself be of interest, since this parameter is related to the underlying sparsity of the model. In this work, however, we highlight, and provide extensive empirical evidence for, the ways in which various (often unrelated) statistical properties of the data can induce changes in the regularization parameter. In particular, through synthetic experiments we demonstrate that changes in the regularization parameter may be driven by changes in the true underlying sparsity, by changes in the signal-to-noise ratio, or even by model misspecification. The purpose of this letter is therefore to highlight and catalog the statistical properties that induce changes in the associated regularization parameter. We conclude by presenting two applications, one to financial data and one to neuroimaging data, where this discussion is relevant.
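The effect the abstract describes is easy to reproduce: refit an ℓ1 model on a sliding window of a non-stationary stream and watch the cross-validated penalty drift. The sketch below is illustrative only, not the paper's procedure; the data are synthetic and the tuning uses scikit-learn's `LassoCV`:

```python
# Illustrative sketch, not the paper's method: track the cross-validated
# L1 penalty over a sliding window of a non-stationary stream.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n_steps, window, p = 600, 200, 20

# True coefficients become sparser halfway through the stream.
beta_dense = np.zeros(p)
beta_dense[:10] = 1.0
beta_sparse = np.zeros(p)
beta_sparse[:2] = 1.0

X = rng.standard_normal((n_steps, p))
y = np.empty(n_steps)
y[:300] = X[:300] @ beta_dense + rng.standard_normal(300)
y[300:] = X[300:] @ beta_sparse + rng.standard_normal(300)

lambdas = []
for t in range(window, n_steps, 100):
    Xw, yw = X[t - window:t], y[t - window:t]
    lambdas.append(LassoCV(cv=5, random_state=0).fit(Xw, yw).alpha_)
# lambdas drifts as the underlying sparsity changes, even though the
# tuning procedure itself never changed
```

Replacing the sparsity change with a change in noise level produces a similar drift in the selected penalty, which is the paper's central caution about interpreting this parameter.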

    DANR: Discrepancy-aware Network Regularization

    Network regularization is an effective tool for incorporating structural prior knowledge to learn coherent models over networks, and has yielded provably accurate estimates in applications ranging from spatial economics to neuroimaging studies. Recently, there has been increasing interest in extending network regularization to the spatio-temporal case to accommodate the evolution of networks. However, in both the static and spatio-temporal cases, missing or corrupted edge weights can compromise the ability of network regularization to discover desired solutions. To address these gaps, we propose a novel approach, discrepancy-aware network regularization (DANR), that is robust to inadequate regularization and effectively captures model evolution and structural changes over spatio-temporal networks. We develop a distributed and scalable algorithm based on the alternating direction method of multipliers (ADMM) to solve the proposed problem with guaranteed convergence to globally optimal solutions. Experimental results on both synthetic and real-world networks demonstrate that our approach achieves improved performance on various tasks and enables interpretation of model changes in evolving networks.
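DANR's exact formulation is not reproduced here, but the generic network-regularization template it builds on, a quadratic fit plus an ℓ1 penalty on differences across edges, solved by ADMM, can be sketched as follows (function names and the chain-graph example are my own):

```python
# Generic ADMM for a network-regularized quadratic fit:
#   minimize 0.5 * ||x - y||^2 + lam * ||D x||_1
# where D is the incidence matrix of the graph (rows = edge differences).
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def network_reg_admm(y, D, lam, rho=1.0, iters=200):
    n, m = D.shape[1], D.shape[0]
    x, z, u = y.copy(), np.zeros(m), np.zeros(m)
    A = np.eye(n) + rho * D.T @ D                         # x-update system
    for _ in range(iters):
        x = np.linalg.solve(A, y + rho * D.T @ (z - u))   # quadratic step
        z = soft_threshold(D @ x + u, lam / rho)          # edge shrinkage
        u = u + D @ x - z                                 # dual update
    return x

# Chain graph over 6 nodes; the signal has one jump between nodes 3 and 4.
y = np.array([1.0, 1.1, 0.9, 5.0, 5.1, 4.9])
D = np.zeros((5, 6))
for i in range(5):
    D[i, i], D[i, i + 1] = 1.0, -1.0
x_hat = network_reg_admm(y, D, lam=1.0)
# x_hat is (near) piecewise constant: within-segment differences shrink to
# zero, while the large jump between the two segments is preserved
```

Each ADMM subproblem is separable across nodes or edges, which is what makes the distributed, scalable implementation described in the abstract possible.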

    Reducing Wait Time Prediction In Hospital Emergency Room: Lean Analysis Using a Random Forest Model

    Most patients visiting emergency departments face long waiting times due to overcrowding, a major concern across hospitals in the United States. Emergency Department (ED) overcrowding is a common phenomenon that creates problems for hospital management, such as increased patient dissatisfaction and a rise in the number of patients choosing to terminate their ED visit without being attended to by a medical professional. Patients who Leave Without Being Seen (LWBS) by doctors often cause a loss of revenue to hospitals, encouraging healthcare professionals to analyze ways to improve the operational efficiency and reduce the operational expenses of an emergency department. To keep patients informed of conditions in the emergency room, hospitals have recently started publishing wait times online. Posted wait times help patients choose the least overcrowded ED, benefiting patients with the shortest waiting time and allowing hospitals to allocate and plan resources appropriately. This requires an accurate and efficient method to model the waiting time experienced by patients visiting an emergency medical services unit. In this thesis, the author seeks to estimate the waiting time for low-acuity patients within an ED setting, using regularized regression methods such as Lasso, Ridge, Elastic Net, SCAD, and MCP, along with tree-based regression (Random Forest). To accurately capture the dynamic state of the emergency room, the queues of patients at various stages of the ED are used as candidate predictor variables, along with each patient's arrival time to account for diurnal variation. The best waiting-time prediction model is selected based on analysis of historical data from the hospital. The tree-based regression model predicts the wait time of low-acuity patients in the ED more accurately than the regularized regression, conventional rolling-average, and quantile regression methods. Finally, the most influential predictors of patient wait time are identified for the best-performing model.
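As a hedged illustration of the modeling setup described above (synthetic data, invented feature names, scikit-learn's `RandomForestRegressor`; not the thesis's actual dataset or queues):

```python
# Toy version of the wait-time setup: queue lengths plus arrival hour as
# features, tree-based regression on a synthetic wait-time target.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 2000
hour = rng.integers(0, 24, n)       # arrival hour (diurnal variation)
q_triage = rng.poisson(5, n)        # patients queued at triage
q_doctor = rng.poisson(8, n)        # patients waiting for a physician
wait = (10 + 4 * q_triage + 6 * q_doctor
        + 5 * np.sin(2 * np.pi * hour / 24) + rng.normal(0, 5, n))

X = np.column_stack([hour, q_triage, q_doctor])
X_tr, X_te, y_tr, y_te = train_test_split(X, wait, random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
mae = mean_absolute_error(y_te, rf.predict(X_te))
# rf.feature_importances_ flags the physician queue as the dominant driver,
# mirroring the thesis's identification of influential predictors
```

The feature-importance scores play the role of the thesis's "most influential predictors" analysis for the best-performing model.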

    Distributed Quantile Regression Analysis and a Group Variable Selection Method

    This dissertation develops novel methodologies for distributed quantile regression analysis for big data by utilizing a distributed optimization algorithm called the alternating direction method of multipliers (ADMM). Specifically, we first write the penalized quantile regression in a specific form that can be solved by the ADMM and propose numerical algorithms for solving the ADMM subproblems. This results in the distributed QR-ADMM algorithm. Then, to further reduce the computational time, we formulate the penalized quantile regression in another equivalent ADMM form in which all the subproblems have exact closed-form solutions, avoiding iterative numerical methods. This results in the single-loop QPADM algorithm, which further improves on the computational efficiency of QR-ADMM. Both QR-ADMM and QPADM enjoy flexible parallelization by enabling data splitting across both the sample space and the feature space, which makes them especially appealing when both the sample size n and the feature dimension p are large. Besides the QR-ADMM and QPADM algorithms for penalized quantile regression, we also develop a group variable selection method by approximating the Bayesian information criterion. Unlike existing penalization methods for feature selection, our proposed gMIC algorithm is free of parameter tuning and hence enjoys greater computational efficiency. Although the current version of gMIC focuses on the generalized linear model, it can be naturally extended to quantile regression for feature selection. We provide theoretical analysis for our proposed methods. Specifically, we conduct numerical convergence analysis for the QR-ADMM and QPADM algorithms, and provide asymptotic theory and the oracle property of feature selection for the gMIC method. All our methods are evaluated with simulation studies and real data analysis.
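Closed-form ADMM subproblems of the kind mentioned above hinge on the proximal operator of the quantile check loss, which is a shifted soft-thresholding map. A small sketch in my own notation (not the dissertation's code):

```python
# The quantile check loss and its closed-form proximal operator, the
# building block behind exact (single-loop) ADMM subproblem solutions.
import numpy as np

def check_loss(r, tau):
    # rho_tau(r) = r * (tau - 1{r < 0})
    return r * (tau - (r < 0))

def prox_check(v, tau, eta):
    # argmin_r  rho_tau(r) + (1 / (2 * eta)) * (r - v)^2
    # Derived by setting the (sub)gradient to zero on each branch:
    #   r > 0:  r = v - eta * tau        (valid when v >  eta * tau)
    #   r < 0:  r = v + eta * (1 - tau)  (valid when v < -eta * (1 - tau))
    #   otherwise r = 0
    return np.where(v > eta * tau, v - eta * tau,
                    np.where(v < -eta * (1 - tau), v + eta * (1 - tau), 0.0))
```

A brute-force grid minimization of the proximal objective matches this closed form, which is what lets a QPADM-style algorithm avoid inner iterative solvers entirely.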

    Sparse Modeling for Image and Vision Processing

    In recent years, a large amount of multi-disciplinary research has been conducted on sparse models and their applications. In statistics and machine learning, the sparsity principle is used to perform model selection, that is, automatically selecting a simple model among a large collection of them. In signal processing, sparse coding consists of representing data with linear combinations of a few dictionary elements. The corresponding tools have subsequently been widely adopted by several scientific communities, such as neuroscience, bioinformatics, and computer vision. The goal of this monograph is to offer a self-contained view of sparse modeling for visual recognition and image processing. More specifically, we focus on applications where the dictionary is learned and adapted to data, yielding a compact representation that has been successful in various contexts. Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics and Vision
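Sparse coding as defined above, representing a signal as a linear combination of a few dictionary atoms, can be illustrated with orthogonal matching pursuit via scikit-learn (synthetic dictionary and signal, not an example from the monograph):

```python
# Sparse coding illustration: recover a 3-atom representation of a signal
# with orthogonal matching pursuit (OMP) against a random dictionary.
import numpy as np
from sklearn.decomposition import sparse_encode

rng = np.random.default_rng(0)
n_atoms, dim = 128, 64
dictionary = rng.standard_normal((n_atoms, dim))
dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)

# Signal built from three atoms with distinct weights.
signal = (1.5 * dictionary[2] + 1.0 * dictionary[7]
          + 0.8 * dictionary[11])[None, :]

codes = sparse_encode(signal, dictionary, algorithm="omp", n_nonzero_coefs=3)
# codes has at most 3 nonzero entries yet reconstructs the signal closely
```

In the monograph's setting the dictionary is additionally learned from data rather than fixed, but the encoding step has this same shape.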

    Investigating the Effects of Network Dynamics on Quality of Delivery Prediction and Monitoring for Video Delivery Networks

    Video streaming over the Internet requires an optimized delivery system given advances in network architecture, for example, Software Defined Networks. Machine Learning (ML) models have been deployed in attempts to predict the quality of video streams. Some of these efforts have considered predicting Quality of Delivery (QoD) metrics of the video stream, in an effort to measure stream quality from the network perspective. In most cases, these models have either treated the ML algorithms as black boxes or failed to capture the network dynamics of the associated video streams. This PhD thesis investigates the effects of network dynamics on QoD prediction using ML techniques. The hypothesis investigated is that ML techniques that model the underlying network dynamics achieve accurate QoD and video-quality predictions and measurements. The results demonstrate that the proposed techniques offer performance gains over approaches that fail to consider network dynamics, and highlight that choosing the correct model, by modelling the dynamics of the network infrastructure, is crucial to the accuracy of the ML predictions. These results are significant because the improved performance is achieved at no additional computational or storage cost. These techniques can help network managers, data center operators, and video service providers take proactive and corrective actions for improved network efficiency and effectiveness.

    Distribution of probabilities of Financial Risk Meter (FRM)

    This paper studies a systemic risk indicator, the Financial Risk Meter (FRM), which is calculated based on quantile Lasso regression. The standard FRM index is the average of the daily penalization parameters across all selected financial institutions. This paper extends the standard FRM to numerous novel FRM candidates that can capture systemic risk and predict upcoming recessions. FRM candidates are defined using quantiles of the penalization parameters derived from the distribution of financial institutions' returns. The co-movement of FRM candidates and commonly used systemic risk measures is checked with the correlation coefficient, the Kolmogorov-Smirnov test statistic, and the Granger causality test. Furthermore, FRM candidates can predict the probability of economic recessions via binary regression models. Empirical experiments are conducted over two periods, the 2007 financial crisis and the COVID-19 pandemic, in two major financial markets, the American and European stock markets. The results show that FRM candidates are suitable systemic risk measures and recession predictors, since they capture increases in overall distress and market downturns, and they track, and can even outperform, popular systemic risk measures in both the American and European stock markets. Additionally, the recession probabilities estimated from FRM candidates are close to the actual recession indicators. In conclusion, FRM candidates can be regarded as feasible and robust systemic risk indicators.