Towards the interpretation of time-varying regularization parameters in streaming penalized regression models
High-dimensional, streaming datasets are ubiquitous in modern applications, ranging from finance and e-commerce to biomedical and neuroimaging data. As a result, many novel algorithms have been proposed to address the challenges posed by such datasets. In this work, we focus on the use of ℓ1 regularized linear models in the context of (possibly non-stationary) streaming data. Recently, it has been noted that the choice of the regularization parameter is fundamental in such models, and several methods have been proposed which iteratively tune this parameter in a time-varying manner, thereby allowing the underlying sparsity of estimated models to vary. Moreover, in many applications, inference on the regularization parameter may itself be of interest, as it is related to the underlying sparsity of the model. In this work, we highlight and provide extensive empirical evidence of how various (often unrelated) statistical properties of the data can lead to changes in the regularization parameter. In particular, through synthetic experiments, we demonstrate that changes in the regularization parameter may be driven by changes in the true underlying sparsity, the signal-to-noise ratio, or even model misspecification. The purpose of this letter is, therefore, to highlight and catalog the statistical properties that induce changes in the associated regularization parameter. We conclude with two applications, one to financial data and another to neuroimaging data, where this discussion is relevant.
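As a quick illustration of the abstract's point, the sketch below (synthetic data and parameter choices of our own, not the paper's experiments) holds the true sparsity fixed and varies only the noise level; the cross-validated lasso penalty shifts in response, showing that a change in the regularization parameter need not reflect a change in sparsity.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
beta = np.zeros(50)
beta[:5] = 1.0                       # fixed sparsity: 5 active features

lams = []
for noise_sd in (0.5, 2.0):          # two signal-to-noise regimes
    y = X @ beta + noise_sd * rng.standard_normal(200)
    lams.append(LassoCV(cv=5, random_state=0).fit(X, y).alpha_)
    print(f"noise sd {noise_sd}: selected lambda = {lams[-1]:.4f}")
```

The selected penalty grows with the noise level even though the number of active features never changes, which is exactly the kind of confound the letter catalogs.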
DANR: Discrepancy-aware Network Regularization
Network regularization is an effective tool for incorporating structural
prior knowledge to learn coherent models over networks, and has yielded
provably accurate estimates in applications ranging from spatial economics to
neuroimaging studies. Recently, there has been an increasing interest in
extending network regularization to the spatio-temporal case to accommodate the
evolution of networks. However, in both static and spatio-temporal cases,
missing or corrupted edge weights can compromise the ability of network
regularization to discover desired solutions. To address these gaps, we propose
a novel approach---{\it discrepancy-aware network regularization} (DANR)---that
is robust to inadequate regularizations and effectively captures model
evolution and structural changes over spatio-temporal networks. We develop a
distributed and scalable algorithm based on the alternating direction method of
multipliers (ADMM) to solve the proposed problem with guaranteed convergence to
global optimum solutions. Experimental results on both synthetic and real-world
networks demonstrate that our approach achieves improved performance on various
tasks, and enables interpretation of model changes in evolving networks.
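The ADMM template the abstract alludes to can be shown on a toy network-regularization problem. The sketch below is an illustration under our own assumptions, not the DANR algorithm itself: it denoises node values on a chain graph by penalizing differences across edges, and every ADMM subproblem has a closed-form solution.

```python
import numpy as np

def soft_threshold(v, k):
    """Elementwise soft-thresholding: prox of k * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)

def network_denoise_admm(a, D, lam, rho=1.0, iters=300):
    """ADMM for min_x 0.5*||x - a||^2 + lam*||D x||_1,
    where each row of D encodes one edge difference of the graph."""
    m, n = D.shape
    x, z, u = a.copy(), np.zeros(m), np.zeros(m)
    A = np.eye(n) + rho * D.T @ D          # factor once in a real solver
    for _ in range(iters):
        x = np.linalg.solve(A, a + rho * D.T @ (z - u))  # quadratic subproblem
        z = soft_threshold(D @ x + u, lam / rho)         # edge-penalty prox
        u += D @ x - z                                   # dual update
    return x

# Toy chain graph: edge i connects nodes i and i+1.
n = 20
D = np.eye(n - 1, n, k=1) - np.eye(n - 1, n)
signal = np.r_[np.zeros(10), np.ones(10)]        # piecewise-constant truth
a = signal + 0.1 * np.random.default_rng(1).standard_normal(n)
x = network_denoise_admm(a, D, lam=0.5)
```

The recovered `x` is near-constant within each segment of the chain, with the structural break between nodes 10 and 11 preserved; DANR's discrepancy-aware formulation generalizes this idea to spatio-temporal networks with unreliable edge weights.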
Reducing Wait Time Prediction In Hospital Emergency Room: Lean Analysis Using a Random Forest Model
Most patients visiting emergency departments face long waiting times due to overcrowding, a major concern across hospitals in the United States. Emergency Department (ED) overcrowding is a common phenomenon that creates problems for hospital management, such as increased patient dissatisfaction and a rise in the number of patients who terminate their ED visit without being attended to by a medical healthcare professional. Patients who Leave Without Being Seen (LWBS) represent lost revenue for hospitals, encouraging healthcare professionals to analyze ways to improve operational efficiency and reduce the operating expenses of an emergency department. To keep patients informed of conditions in the emergency room, hospitals have recently started publishing wait times online. Posted wait times help patients choose the least overcrowded ED, benefiting patients with the shortest waiting time and allowing hospitals to allocate and plan resources appropriately. This requires an accurate and efficient method to model the waiting time experienced by patients visiting an emergency medical services unit.
In this thesis, the author seeks to estimate the waiting time for low-acuity patients in an ED setting, using regularized regression methods such as Lasso, Ridge, Elastic Net, SCAD, and MCP, along with tree-based regression (random forest). To accurately capture the dynamic state of the emergency room, queues of patients at various stages of the ED are used as candidate predictor variables, along with each patient's arrival time to account for diurnal variation. The best waiting time prediction model is selected based on analysis of historical data from the hospital. The tree-based regression model predicts the wait time of low-acuity patients in the ED more accurately than the regularized regression, conventional rolling average, and quantile regression methods. Finally, the most influential predictors of patient wait time are identified for the best-performing model.
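A minimal synthetic comparison in the spirit of the thesis (the data, features, and functional form below are hypothetical, not the hospital's records) shows why a tree-based model can beat penalized linear regression when wait times depend nonlinearly on queue state:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
queue_len = rng.poisson(8, n)          # patients already in the queue
hour = rng.integers(0, 24, n)          # arrival hour (diurnal variation)
staff = rng.integers(2, 6, n)          # clinicians on shift
# Hypothetical nonlinear ground truth: waits blow up when queues are long
# and staffing is thin -- an interaction a linear model cannot capture.
wait = (10 + 4 * queue_len**1.5 / staff
        + 15 * np.sin(np.pi * hour / 24) + rng.normal(0, 5, n))
X = np.column_stack([queue_len, hour, staff])
Xtr, Xte, ytr, yte = train_test_split(X, wait, random_state=0)

mae = {}
for name, model in [("lasso", LassoCV(cv=5)),
                    ("forest", RandomForestRegressor(random_state=0))]:
    mae[name] = mean_absolute_error(yte, model.fit(Xtr, ytr).predict(Xte))
    print(f"{name}: MAE = {mae[name]:.1f} minutes")
```

On data generated this way the forest's test MAE is lower than the lasso's, mirroring the thesis finding that tree-based regression outperforms regularized linear methods for this task.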
Distributed Quantile Regression Analysis and a Group Variable Selection Method
This dissertation develops novel methodologies for distributed quantile regression analysis
for big data by utilizing a distributed optimization algorithm called the alternating direction
method of multipliers (ADMM). Specifically, we first write the penalized quantile regression
into a specific form that can be solved by the ADMM and propose numerical algorithms
for solving the ADMM subproblems. This results in the distributed QR-ADMM
algorithm. Then, to further reduce the computational time, we formulate the penalized
quantile regression into another equivalent ADMM form in which all the subproblems have
exact closed-form solutions and hence avoid iterative numerical methods. This results in the
single-loop QPADM algorithm that further improves on the computational efficiency of
QR-ADMM. Both QR-ADMM and QPADM enjoy flexible parallelization by enabling data
splitting across both sample space and feature space, which makes them especially appealing
when both the sample size n and the feature dimension p are large.
Besides the QR-ADMM and QPADM algorithms for penalized quantile regression, we
also develop a group variable selection method by approximating the Bayesian information
criterion. Unlike existing penalization methods for feature selection, our proposed gMIC
algorithm is free of parameter tuning and hence enjoys greater computational efficiency.
Although the current version of gMIC focuses on the generalized linear model, it can be
naturally extended to the quantile regression for feature selection.
We provide theoretical analysis for the proposed methods. Specifically, we conduct numerical
convergence analysis for the QR-ADMM and QPADM algorithms, and provide
asymptotic theory and the oracle property of feature selection for the gMIC method. All
our methods are evaluated with simulation studies and real data analysis.
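The flavor of closed-form ADMM updates for penalized quantile regression can be sketched as follows. This is a generic variable-splitting scheme written for illustration, not the dissertation's QPADM: the residual update is the exact prox of the check loss and the ℓ1 penalty reduces to soft-thresholding, so no inner iterative solver is needed.

```python
import numpy as np

def check_prox(v, tau, rho):
    """Closed-form prox of the quantile check loss rho_tau, scale rho:
    a shifted soft-threshold."""
    return np.where(v > tau / rho, v - tau / rho,
           np.where(v < -(1 - tau) / rho, v + (1 - tau) / rho, 0.0))

def soft(v, k):
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)

def qr_admm(X, y, tau=0.5, lam=0.1, rho=1.0, iters=500):
    """ADMM for min_b sum_i rho_tau(y_i - x_i'b) + lam*||b||_1,
    splitting r = y - X b and z = b so every subproblem is closed-form."""
    n, p = X.shape
    beta, z, r = np.zeros(p), np.zeros(p), y.copy()
    u, w = np.zeros(n), np.zeros(p)
    A = X.T @ X + np.eye(p)                  # factor once in practice
    for _ in range(iters):
        beta = np.linalg.solve(A, X.T @ (y - r + u) + z - w)  # least squares
        r = check_prox(y - X @ beta + u, tau, rho)            # check-loss prox
        z = soft(beta + w, lam / rho)                         # soft-threshold
        u += y - X @ beta - r                                 # dual updates
        w += beta - z
    return z  # sparse iterate

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 10))
beta_true = np.r_[2.0, -1.5, np.zeros(8)]
y = X @ beta_true + rng.standard_normal(300)
print(qr_admm(X, y, tau=0.5, lam=5.0).round(2))
```

On this toy problem the first two coefficients are recovered near their true values while the remaining ones are shrunk toward zero; QPADM achieves the same closed-form property with a formulation designed for distributed data splitting.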
Sparse Modeling for Image and Vision Processing
In recent years, a large amount of multi-disciplinary research has been
conducted on sparse models and their applications. In statistics and machine
learning, the sparsity principle is used to perform model selection---that is,
automatically selecting a simple model among a large collection of them. In
signal processing, sparse coding consists of representing data with linear
combinations of a few dictionary elements. Subsequently, the corresponding
tools have been widely adopted by several scientific communities such as
neuroscience, bioinformatics, or computer vision. The goal of this monograph is
to offer a self-contained view of sparse modeling for visual recognition and
image processing. More specifically, we focus on applications where the
dictionary is learned and adapted to data, yielding a compact representation
that has been successful in various contexts. Comment: 205 pages, to appear in
Foundations and Trends in Computer Graphics and Vision.
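The sparse-coding setting described above, representing each data point as a linear combination of a few learned dictionary atoms, can be sketched in a few lines with scikit-learn (synthetic data; all parameter choices below are illustrative):

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
atoms_true = rng.standard_normal((5, 20))                  # ground-truth atoms
codes_true = rng.standard_normal((100, 5)) * (rng.random((100, 5)) < 0.3)
X = codes_true @ atoms_true + 0.01 * rng.standard_normal((100, 20))

# Learn an (overcomplete) dictionary adapted to the data, then encode each
# sample sparsely against it.
dl = DictionaryLearning(n_components=8, transform_algorithm="lasso_lars",
                        transform_alpha=0.1, random_state=0)
codes = dl.fit_transform(X)                                # sparse codes
recon = codes @ dl.components_
print("avg nonzeros per sample:", float((codes != 0).sum(axis=1).mean()))
print("relative reconstruction error:",
      float(np.linalg.norm(recon - X) / np.linalg.norm(X)))
```

The codes use only a fraction of the 8 atoms per sample yet reconstruct the data closely, which is the compact, data-adapted representation the monograph surveys.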
Investigating the Effects of Network Dynamics on Quality of Delivery Prediction and Monitoring for Video Delivery Networks
Video streaming over the Internet requires an optimized delivery system, given advances in network architectures such as Software Defined Networks. Machine Learning (ML) models have been deployed in attempts to predict the quality of video streams. Some of these efforts predict Quality of Delivery (QoD) metrics in order to measure stream quality from the network perspective. In most cases, however, these models have either treated the ML algorithms as black boxes or failed to capture the network dynamics of the associated video streams.
This PhD thesis investigates the effects of network dynamics on QoD prediction using ML techniques. The hypothesis investigated is that ML techniques which model the underlying network dynamics achieve accurate QoD and video quality predictions and measurements. The results demonstrate that the proposed techniques offer performance gains over approaches that fail to consider network dynamics, and highlight that adopting the correct model, by modelling the dynamics of the network infrastructure, is crucial to the accuracy of the ML predictions. These results are significant because the improved performance is achieved at no additional computational or storage cost. These techniques can help network managers, data center operators, and video service providers take proactive and corrective actions for improved network efficiency and effectiveness.
Distribution of probabilities of Financial Risk Meter (FRM)
This paper studies a systemic risk indicator, the Financial Risk Meter (FRM), which is calculated based on quantile Lasso regression.
The standard FRM index is the average of daily penalization parameters for all selected financial institutions. This paper extends the standard FRM to numerous novel FRM candidates that could capture systemic risk and predict an upcoming recession. FRM candidates are defined using quantiles of penalization parameters derived from the distribution of financial institutions' returns. The co-movement of FRM candidates and commonly used systemic risk measures is checked with the correlation coefficient, the Kolmogorov-Smirnov test statistic, and the Granger causality test. Furthermore, FRM candidates are able to predict the probability of economic recessions via binary regression models. Empirical experiments are conducted over two periods, the financial crisis of 2007 and the COVID-19 pandemic, in two major financial markets, the American and European stock markets. The results show that FRM candidates are suitable systemic risk measures and recession predictors, since they capture the increase in overall distress and market downturn, and co-move with, and even outperform, popular systemic risk measures in both markets. Additionally, the recession probabilities estimated from FRM candidates are close to the actual recession indicators. In conclusion, FRM candidates can be regarded as feasible and robust systemic risk indicators.