The flare Package for High Dimensional Linear Regression and Precision Matrix Estimation in R
This paper describes an R package named flare, which implements a family of
new high dimensional regression methods (LAD Lasso, SQRT Lasso, Lasso,
and Dantzig selector) and their extensions to sparse precision matrix
estimation (TIGER and CLIME). These methods exploit different nonsmooth loss
functions to gain modeling flexibility, estimation robustness, and
insensitivity to tuning. The solver is based on the alternating direction
method of multipliers (ADMM). The package flare is coded in double precision C,
and called from R by a user-friendly interface. The memory usage is optimized
by using the sparse matrix output. The experiments show that flare is efficient
and can scale up to large problems.
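As a rough illustration of the ADMM approach the abstract mentions, here is a minimal NumPy sketch of ADMM for the plain Lasso. This is the generic textbook algorithm, not flare's double-precision C implementation, and it covers only the squared loss, not the LAD/SQRT variants:

```python
import numpy as np

def soft_threshold(x, t):
    # Proximal operator of t * ||.||_1, applied elementwise.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def admm_lasso(X, y, lam, rho=10.0, n_iter=500):
    """Minimize 0.5*||Xb - y||^2 + lam*||b||_1 by ADMM (generic sketch)."""
    n, p = X.shape
    # The beta-update solves a ridge system; factor it once up front.
    L = np.linalg.cholesky(X.T @ X + rho * np.eye(p))
    Xty = X.T @ y
    z = np.zeros(p)
    u = np.zeros(p)
    for _ in range(n_iter):
        b = np.linalg.solve(L.T, np.linalg.solve(L, Xty + rho * (z - u)))
        z = soft_threshold(b + u, lam / rho)   # sparse copy of the iterate
        u = u + b - z                          # scaled dual update
    return z

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
beta_true = np.zeros(20)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.1 * rng.standard_normal(100)
beta_hat = admm_lasso(X, y, lam=5.0)
```

Returning the `z` iterate (rather than `b`) gives the exactly sparse copy of the solution, which is what makes the sparse-matrix output mentioned above possible.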
Homotopy Parametric Simplex Method for Sparse Learning
High dimensional sparse learning has imposed a great computational challenge
to large scale data analysis. In this paper, we are interested in a broad class
of sparse learning approaches formulated as linear programs parametrized by a
{\em regularization factor}, and solve them by the parametric simplex method
(PSM). Our parametric simplex method offers significant advantages over other
competing methods: (1) PSM naturally obtains the complete solution path for all
values of the regularization parameter; (2) PSM provides a high precision dual
certificate stopping criterion; (3) PSM yields sparse solutions through very
few iterations, and the solution sparsity significantly reduces the
computational cost per iteration. Particularly, we demonstrate the superiority
of PSM over various sparse learning approaches, including Dantzig selector for
sparse linear regression, LAD-Lasso for sparse robust linear regression, CLIME
for sparse precision matrix estimation, sparse differential network estimation,
and sparse Linear Programming Discriminant (LPD) analysis. We then provide
sufficient conditions under which PSM always outputs sparse solutions such that
its computational performance can be significantly boosted. Thorough numerical
experiments are provided to demonstrate the outstanding performance of the PSM
method.
Comment: Accepted by NIPS 201
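To make the linear-program formulation concrete, here is a sketch of the Dantzig selector posed as an LP and solved with SciPy's off-the-shelf solver at a single regularization value. This is only the problem setup; it is not PSM, which would instead trace the entire solution path in the regularization factor:

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(X, y, lam):
    """Dantzig selector as a linear program:
        min ||b||_1  subject to  ||X'(y - Xb)||_inf <= lam.
    Solved with a generic LP solver at one lambda (not path-following)."""
    n, p = X.shape
    G = X.T @ X
    Xty = X.T @ y
    # Split b = u - v with u, v >= 0 so the l1 objective becomes linear.
    c = np.ones(2 * p)
    A_ub = np.vstack([np.hstack([G, -G]),    #  X'X b <= X'y + lam
                      np.hstack([-G, G])])   # -X'X b <= lam - X'y
    b_ub = np.concatenate([Xty + lam, lam - Xty])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, method="highs")
    return res.x[:p] - res.x[p:]

rng = np.random.default_rng(1)
X = rng.standard_normal((80, 15))
beta_true = np.zeros(15)
beta_true[:2] = [1.5, -1.0]
y = X @ beta_true + 0.1 * rng.standard_normal(80)
beta_hat = dantzig_selector(X, y, lam=4.0)
```

The point of PSM is that the optimal basis changes only a few times as lambda decreases, so pivoting along the path is far cheaper than re-solving an LP like this from scratch at each lambda.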
Sparse transition matrix estimation for high-dimensional and locally stationary vector autoregressive models
We consider the estimation of the transition matrix in the high-dimensional
time-varying vector autoregression (TV-VAR) models. Our model builds on a
general class of locally stationary VAR processes that evolve smoothly in time.
We propose a hybridized kernel smoothing and $\ell_1$-regularized method to
directly estimate the sequence of time-varying transition matrices. Under the
sparsity assumption on the transition matrix, we establish the rate of
convergence of the proposed estimator and show that the convergence rate
depends on the smoothness of the locally stationary VAR processes only through
the smoothness of the transition matrix function. In addition, for our
estimator followed by thresholding, we prove that the false positive rate (type
I error) and false negative rate (type II error) in the pattern recovery can
asymptotically vanish in the presence of weak signals without assuming the
minimum nonzero signal strength condition. Favorable finite sample performances
over the $\ell_1$-penalized least-squares estimator and the unstructured
maximum likelihood estimator are shown on simulated data. We also provide two
real examples on estimating the dependence structures on financial stock prices
and economic exchange rate datasets.
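The core idea, kernel weights in time combined with an $\ell_1$ penalty, can be sketched as follows for a single row of the transition matrix. The kernel choice, the proximal-gradient solver, and all names here are my assumptions for illustration, not the paper's exact estimator:

```python
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def tv_var_row(X_past, y_next, t0, h, lam, n_steps=500):
    """Estimate one row of a time-varying VAR(1) transition matrix at time
    t0 by kernel-weighted l1-penalized least squares (illustrative only)."""
    T, d = X_past.shape
    u = (np.arange(T) - t0) / (h * T)      # scaled distance to t0
    w = np.maximum(1.0 - u ** 2, 0.0)      # Epanechnikov-style weights
    w /= w.sum()
    # Lipschitz constant of the kernel-weighted quadratic loss.
    Lc = np.linalg.norm(X_past.T @ (X_past * w[:, None]), 2)
    b = np.zeros(d)
    for _ in range(n_steps):               # proximal gradient iterations
        grad = X_past.T @ (w * (X_past @ b - y_next))
        b = soft(b - grad / Lc, lam / Lc)
    return b

# Simulate a (here time-constant) sparse VAR(1) and fit its first row.
rng = np.random.default_rng(2)
d, T = 5, 300
A = 0.5 * np.eye(d)
A[0, 4] = 0.3
x = np.zeros((T + 1, d))
for t in range(T):
    x[t + 1] = A @ x[t] + rng.standard_normal(d)
row0 = tv_var_row(x[:-1], x[1:, 0], t0=150, h=0.3, lam=0.05)
```

Repeating this at each time point t0 yields the estimated sequence of transition matrices; the thresholding step described above would then be applied on top of these estimates.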
Estimation of Large Covariance and Precision Matrices from Temporally Dependent Observations
We consider the estimation of large covariance and precision matrices from
high-dimensional sub-Gaussian or heavier-tailed observations with slowly
decaying temporal dependence. The temporal dependence is allowed to be
long-range, with longer memory than is considered in the current
literature. We show that several commonly used methods for independent
observations can be applied to the temporally dependent data. In particular,
the rates of convergence are obtained for the generalized thresholding
estimation of covariance and correlation matrices, and for the constrained
$\ell_1$-minimization and the $\ell_1$-penalized likelihood estimation of the
precision matrix. Properties of sparsistency and sign-consistency are also
established. A gap-block cross-validation method is proposed for the tuning
parameter selection, which performs well in simulations. As a motivating
example, we study the brain functional connectivity using resting-state fMRI
time series data with long-range temporal dependence.
Comment: The result for the banding estimator of the covariance matrix is
given in version 2 of this article. See arXiv:1412.5059v
A study on tuning parameter selection for the high-dimensional lasso
High-dimensional predictive models, those with more measurements than
observations, require regularization to be well defined, perform well
empirically, and possess theoretical guarantees. The amount of regularization,
often determined by tuning parameters, is integral to achieving good
performance. One can choose the tuning parameter in a variety of ways, such as
through resampling methods or generalized information criteria. However, the
theory supporting many regularized procedures relies on an estimate for the
variance parameter, which is complicated in high dimensions. We develop a suite
of information criteria for choosing the tuning parameter in lasso regression
by leveraging the literature on high-dimensional variance estimation. We derive
intuition showing that existing information-theoretic approaches work poorly in
this setting. We compare our risk estimators to existing methods with an
extensive simulation and derive some theoretical justification. We find that
our new estimators perform well across a wide range of simulation conditions
and evaluation criteria.
Comment: 64 pages, 11 figures
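The role of the variance estimate can be seen in a generic BIC-type criterion. The sketch below hands in `sigma2_hat` as a plug-in; the paper's proposed criteria differ precisely in how that variance is estimated in high dimensions, and the simple ISTA solver here is only for illustration:

```python
import numpy as np

def ista_lasso(X, y, lam, n_steps=500):
    # Plain proximal-gradient (ISTA) lasso solver, illustration only.
    L = np.linalg.norm(X, 2) ** 2
    b = np.zeros(X.shape[1])
    for _ in range(n_steps):
        g = b - X.T @ (X @ b - y) / L
        b = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)
    return b

def bic_select(X, y, lams, sigma2_hat):
    """Choose lambda minimizing a BIC-type criterion
        RSS / sigma2_hat + df * log(n),   df = number of nonzeros.
    The plug-in sigma2_hat is the crux: in high dimensions it must come
    from a dedicated variance estimator, not the residuals of one fit."""
    n = len(y)
    best = None
    for lam in lams:
        b = ista_lasso(X, y, lam)
        crit = np.sum((y - X @ b) ** 2) / sigma2_hat \
               + np.count_nonzero(b) * np.log(n)
        if best is None or crit < best[0]:
            best = (crit, lam, b)
    return best[1], best[2]

rng = np.random.default_rng(4)
X = rng.standard_normal((100, 20))
beta_true = np.zeros(20)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.5 * rng.standard_normal(100)
lam_sel, b_sel = bic_select(X, y, [1.0, 5.0, 10.0, 30.0, 80.0],
                            sigma2_hat=0.25)  # oracle variance, demo only
```

With a badly biased `sigma2_hat`, the RSS term is mis-scaled relative to the `df * log(n)` penalty and the criterion systematically over- or under-selects, which is the failure mode the abstract attributes to existing information-theoretic approaches.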
Confidence intervals for high-dimensional Cox models
The purpose of this paper is to construct confidence intervals for the
regression coefficients in high-dimensional Cox proportional hazards regression
models where the number of covariates may be larger than the sample size. Our
debiased estimator construction is similar to those in Zhang and Zhang (2014)
and van de Geer et al. (2014), but the time-dependent covariates and censored
risk sets introduce considerable additional challenges. Our theoretical
results, which provide conditions under which our confidence intervals are
asymptotically valid, are supported by extensive numerical experiments.
Comment: 36 pages, 1 figure
Design of Experiments for Screening
The aim of this paper is to review methods of designing screening
experiments, ranging from designs originally developed for physical experiments
to those especially tailored to experiments on numerical models. The strengths
and weaknesses of the various designs for screening variables in numerical
models are discussed. First, classes of factorial designs for experiments to
estimate main effects and interactions through a linear statistical model are
described, specifically regular and nonregular fractional factorial designs,
supersaturated designs and systematic fractional replicate designs. Generic
issues of aliasing, bias and cancellation of factorial effects are discussed.
Second, group screening experiments are considered including factorial group
screening and sequential bifurcation. Third, random sampling plans are
discussed including Latin hypercube sampling and sampling plans to estimate
elementary effects. Fourth, a variety of modelling methods commonly employed
with screening designs are briefly described. Finally, a novel study
demonstrates six screening methods on two frequently-used exemplars, and their
performances are compared.
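Of the random sampling plans mentioned above, Latin hypercube sampling is the simplest to sketch: each axis of the unit cube is cut into n equal bins, and each bin along every axis contains exactly one point. A minimal NumPy version:

```python
import numpy as np

def latin_hypercube(n, d, rng):
    """Basic Latin hypercube sample of n points in [0,1]^d: one random bin
    permutation per dimension, with a uniform jitter inside each bin."""
    bins = np.column_stack([rng.permutation(n) for _ in range(d)])
    return (bins + rng.random((n, d))) / n

pts = latin_hypercube(10, 3, np.random.default_rng(5))
```

This is the unoptimized construction; screening practice often adds criteria such as maximin distance on top of the basic stratification.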
An Imputation-Consistency Algorithm for High-Dimensional Missing Data Problems and Beyond
Missing data are frequently encountered in high-dimensional problems, but
they are usually difficult to deal with using standard algorithms, such as the
expectation-maximization (EM) algorithm and its variants. To tackle this
difficulty, some problem-specific algorithms have been developed in the
literature, but a general algorithm is still lacking. This work fills the
gap: we propose a general algorithm for high-dimensional missing data problems.
The proposed algorithm works by iterating between an imputation step and a
consistency step. At the imputation step, the missing data are imputed
conditional on the observed data and the current estimate of parameters; and at
the consistency step, a consistent estimate is found for the minimizer of a
Kullback-Leibler divergence defined on the pseudo-complete data. For high
dimensional problems, the consistent estimate can be found under sparsity
constraints. The consistency of the averaged estimate for the true parameter
can be established under quite general conditions. The proposed algorithm is
illustrated using high-dimensional Gaussian graphical models, high-dimensional
variable selection, and a random coefficient model.
Comment: 30 pages, 1 figure
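The imputation/consistency alternation can be seen in a deliberately tiny example: estimating a sparse Gaussian mean vector from data with missing entries. The hard-thresholded column mean below is a didactic stand-in for a general sparsity-constrained consistent estimator, not the paper's method:

```python
import numpy as np

def ic_sparse_mean(Y, mask, tau, n_iter=20):
    """Toy imputation-consistency loop for a sparse Gaussian mean.
    I-step: fill missing entries with the current parameter estimate.
    C-step: hard-threshold the column means of the pseudo-complete data."""
    mu = np.zeros(Y.shape[1])
    for _ in range(n_iter):
        Z = np.where(mask, Y, mu)               # I-step: impute
        m = Z.mean(axis=0)                      # fit on pseudo-complete data
        mu = np.where(np.abs(m) > tau, m, 0.0)  # C-step: enforce sparsity
    return mu

rng = np.random.default_rng(6)
n, p = 200, 50
mu_true = np.zeros(p)
mu_true[:3] = [1.5, -1.2, 1.0]
Y = mu_true + rng.standard_normal((n, p))
mask = rng.random((n, p)) < 0.8                 # ~20% of entries missing
mu_hat = ic_sparse_mean(Y, mask, tau=0.5)
```

In the general framework, the I-step draws imputations conditional on the observed data and current parameters, and the C-step can be any sparsity-constrained estimator consistent for the Kullback-Leibler minimizer; the loop structure stays the same.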
Picasso: A Sparse Learning Library for High Dimensional Data Analysis in R and Python
We describe a new library named picasso, which implements a unified framework
of pathwise coordinate optimization for a variety of sparse learning problems
(e.g., sparse linear regression, sparse logistic regression, sparse Poisson
regression and scaled sparse linear regression) combined with efficient active
set selection strategies. In addition, the library allows users to choose
different sparsity-inducing regularizers, including the convex $\ell_1$, nonconvex MCP
and SCAD regularizers. The library is coded in C++ and has user-friendly R and
Python wrappers. Numerical experiments demonstrate that picasso can scale up to
large problems efficiently.
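The pathwise idea, solving from large to small regularization with warm starts, can be sketched with a plain coordinate-descent lasso. This shows the general scheme only; picasso's contribution is layering efficient active-set selection and the nonconvex MCP/SCAD regularizers on top:

```python
import numpy as np

def cd_lasso(X, y, lam, b0=None, n_sweeps=100):
    # One lasso fit by cyclic coordinate descent, warm-started at b0.
    n, p = X.shape
    b = np.zeros(p) if b0 is None else b0.copy()
    col_sq = (X ** 2).sum(axis=0)
    r = y - X @ b                      # running residual
    for _ in range(n_sweeps):
        for j in range(p):
            old = b[j]
            # Correlation of column j with the partial residual.
            rho = X[:, j] @ r + col_sq[j] * old
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            if b[j] != old:
                r -= X[:, j] * (b[j] - old)
    return b

def lasso_path(X, y, lams):
    """Pathwise optimization: sweep lambda from large to small,
    warm-starting each fit at the previous solution."""
    b, path = None, []
    for lam in sorted(lams, reverse=True):
        b = cd_lasso(X, y, lam, b0=b)
        path.append((lam, b.copy()))
    return path

rng = np.random.default_rng(7)
X = rng.standard_normal((100, 30))
beta_true = np.zeros(30)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.1 * rng.standard_normal(100)
path = lasso_path(X, y, [50.0, 20.0, 5.0])
```

Warm starts keep the active set small at every lambda, which is why each fit needs only a few cheap sweeps; this is the property the active-set strategies exploit.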
Machine learning in solar physics
The application of machine learning in solar physics has the potential to
greatly enhance our understanding of the complex processes that take place in
the atmosphere of the Sun. By using techniques such as deep learning, we are
now in the position to analyze large amounts of data from solar observations
and identify patterns and trends that may not have been apparent using
traditional methods. This can help us improve our understanding of explosive
events like solar flares, which can have a strong effect on the Earth
environment. Predicting hazardous events on Earth becomes crucial for our
technological society. Machine learning can also improve our understanding of
the inner workings of the Sun itself by allowing us to go deeper into the data
and to propose more complex models to explain them. Additionally, the use of
machine learning can help to automate the analysis of solar data, reducing the
need for manual labor and increasing the efficiency of research in this field.
Comment: 100 pages, 13 figures, 286 references, accepted for publication as a
Living Review in Solar Physics (LRSP)