Optimal Two-Step Prediction in Regression
High-dimensional prediction typically comprises two steps: variable selection
and subsequent least-squares refitting on the selected variables. However, the
standard variable selection procedures, such as the lasso, hinge on tuning
parameters that need to be calibrated. Cross-validation, the most popular
calibration scheme, is computationally costly and lacks finite sample
guarantees. In this paper, we introduce an alternative scheme, easy to
implement and both computationally and theoretically efficient.
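The two steps named in this abstract (lasso selection, then least-squares refitting on the selected variables) can be sketched as follows. This is a minimal illustration, not the paper's calibration scheme: the tuning parameter `lam` is simply fixed by hand, the lasso is solved with a plain proximal-gradient loop, and the function names and synthetic data are assumptions made for the example.

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize (1/2n)||y - Xb||^2 + lam * ||b||_1 by proximal gradient (ISTA)."""
    n, p = X.shape
    # Step size = 1 / Lipschitz constant of the gradient of the smooth part.
    step = n / (np.linalg.norm(X, 2) ** 2)
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        z = b - step * grad
        # Soft-thresholding sets small coordinates exactly to zero.
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return b

def two_step_predictor(X, y, lam):
    """Step 1: lasso variable selection. Step 2: least-squares refit on the support."""
    b_lasso = lasso_ista(X, y, lam)
    support = np.flatnonzero(b_lasso)
    b_refit = np.zeros(X.shape[1])
    if support.size:
        coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        b_refit[support] = coef
    return support, b_refit

# Synthetic data (hypothetical): two active predictors out of twenty.
rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[[0, 3]] = [2.0, -3.0]
y = X @ beta_true + 0.1 * rng.standard_normal(n)

support, b_refit = two_step_predictor(X, y, lam=0.5)
print("selected variables:", support)
print("refitted coefficients on support:", b_refit[support])
```

The refit removes the shrinkage bias that the lasso penalty introduces on the selected coefficients; how well it works depends on how `lam` is chosen, which is exactly the calibration problem the abstract addresses.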
Simultaneous Variable and Covariance Selection with the Multivariate Spike-and-Slab Lasso
We propose a Bayesian procedure for simultaneous variable and covariance
selection using continuous spike-and-slab priors in multivariate linear
regression models where q possibly correlated responses are regressed onto p
predictors. Rather than relying on a stochastic search through the
high-dimensional model space, we develop an ECM algorithm similar to the EMVS
procedure of Rockova & George (2014) targeting modal estimates of the matrix of
regression coefficients and residual precision matrix. Varying the scale of the
continuous spike densities facilitates dynamic posterior exploration and allows
us to filter out negligible regression coefficients and partial covariances
gradually. Our method is seen to substantially outperform regularization
competitors on simulated data. We demonstrate our method with a re-examination
of data from a recent observational study of the effect of playing high school
football on several later-life cognitive, psychological, and socio-economic
outcomes.
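To illustrate why varying the spike scale filters out negligible coefficients, the sketch below uses the univariate form commonly written for a spike-and-slab lasso prior: a mixture of two Laplace densities, a diffuse slab and a concentrated spike. It is an assumption-laden illustration only; the multivariate ECM procedure of the abstract is not reproduced, and the names `theta`, `lam_slab`, and `lam_spike` are hypothetical.

```python
import math

def slab_probability(beta, theta, lam_slab, lam_spike):
    """Conditional probability that beta came from the slab rather than the spike.

    Both components are Laplace densities (lam / 2) * exp(-lam * |beta|);
    theta is the prior slab weight. Written as 1 / (1 + ratio) so that it
    stays numerically stable when lam_spike > lam_slab.
    """
    ratio = ((1.0 - theta) / theta) * (lam_spike / lam_slab) \
        * math.exp(-(lam_spike - lam_slab) * abs(beta))
    return 1.0 / (1.0 + ratio)

def adaptive_penalty(beta, theta, lam_slab, lam_spike):
    """Coefficient-specific penalty: a slab/spike blend weighted by slab_probability."""
    p = slab_probability(beta, theta, lam_slab, lam_spike)
    return lam_slab * p + lam_spike * (1.0 - p)

# A sharper spike (larger lam_spike) penalizes near-zero coefficients more heavily,
# while large coefficients keep the light slab penalty.
for lam_spike in (5.0, 20.0, 100.0):
    small = adaptive_penalty(0.05, theta=0.5, lam_slab=1.0, lam_spike=lam_spike)
    large = adaptive_penalty(2.0, theta=0.5, lam_slab=1.0, lam_spike=lam_spike)
    print(f"lam_spike={lam_spike:6.1f}  penalty at 0.05: {small:7.3f}  penalty at 2.0: {large:.3f}")
```

Increasing `lam_spike` along a ladder of values is one way to read the "dynamic posterior exploration" described above: small effects are pushed to zero progressively while large effects are left nearly unpenalized.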
Group-Lasso on Splines for Spectrum Cartography
The unceasing demand for continuous situational awareness calls for
innovative and large-scale signal processing algorithms, complemented by
collaborative and adaptive sensing platforms to accomplish the objectives of
layered sensing and control. Towards this goal, the present paper develops a
spline-based approach to field estimation, which relies on a basis expansion
model of the field of interest. The model entails known bases, weighted by
generic functions estimated from the field's noisy samples. A novel field
estimator is developed based on a regularized variational least-squares (LS)
criterion that yields finitely-parameterized (function) estimates spanned by
thin-plate splines. Robustness considerations motivate well the adoption of an
overcomplete set of (possibly overlapping) basis functions, while a sparsifying
regularizer augmenting the LS cost endows the estimator with the ability to
select a few of these bases that "better" explain the data. This parsimonious
field representation becomes possible, because the sparsity-aware spline-based
method of this paper induces a group-Lasso estimator for the coefficients of
the thin-plate spline expansions per basis. A distributed algorithm is also
developed to obtain the group-Lasso estimator using a network of wireless
sensors, or, using multiple processors to balance the load of a single
computational unit. The novel spline-based approach is motivated by a spectrum
cartography application, in which a set of sensing cognitive radios collaborate
to estimate the distribution of RF power in space and frequency. Simulated
tests corroborate that the estimated power spectrum density atlas yields the
desired RF state awareness, since the maps reveal spatial locations where idle
frequency bands can be reused for transmission, even when fading and shadowing
effects are pronounced.
Comment: Submitted to IEEE Transactions on Signal Processin
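The basis-selection behaviour described in this abstract comes from the group-lasso penalty, which zeroes whole groups of coefficients at once. A minimal sketch of the block soft-thresholding step that underlies most group-lasso solvers is given below; it is not the distributed algorithm of the paper, and the grouping and threshold `tau` are assumptions chosen for illustration.

```python
import numpy as np

def group_soft_threshold(v, groups, tau):
    """Proximal operator of tau * sum_g ||v_g||_2 (block soft-thresholding).

    Each group is shrunk toward zero as a unit; a group whose Euclidean
    norm is at most tau is set to zero entirely.
    """
    v = np.asarray(v, dtype=float)
    out = np.zeros_like(v)
    for idx in groups:
        norm = np.linalg.norm(v[idx])
        if norm > tau:
            out[idx] = (1.0 - tau / norm) * v[idx]
    return out

# Two groups of coefficients, e.g. one group per candidate basis function.
v = np.array([3.0, 4.0, 0.1, -0.2])
groups = [[0, 1], [2, 3]]
shrunk = group_soft_threshold(v, groups, tau=1.0)
print(shrunk)  # first group is scaled by 0.8, second group is removed
```

Here the first group has norm 5 and survives with its direction unchanged, while the second group has norm below the threshold and is discarded, which is the mechanism by which only a few bases remain in the fitted expansion.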
A Visual Analytics System for Optimizing Communications in Massively Parallel Applications
Current and future supercomputers have tens of thousands of compute nodes interconnected with high-dimensional networks and complex network topologies for improved performance. Application developers are required to write scalable parallel programs in order to achieve high throughput on these machines. Application performance is largely determined by efficient inter-process communication. A common way to analyze and optimize performance is through profiling parallel codes to identify communication bottlenecks. However, understanding gigabytes of profile data is not a trivial task. In this paper, we present a visual analytics system for identifying the scalability bottlenecks and improving the communication efficiency of massively parallel applications. Visualization methods used in this system are designed to comprehend large-scale and varied communication patterns on thousands of nodes in complex networks such as the 5D torus and the dragonfly. We also present efficient rerouting and remapping algorithms that can be coupled with our interactive visual analytics design for performance optimization. We demonstrate the utility of our system with several case studies using three benchmark applications on two leading supercomputers. The mapping suggestion from our system led to a 38% improvement in hop-bytes for the MiniAMR application on 4,096 MPI processes. This research has been sponsored in part by the U.S. National Science Foundation through grant IIS-1320229, and the U.S. Department of Energy through grants DE-SC0012610 and DE-SC0014917. This research has been funded in part and used resources of the Argonne Leadership Computing Facility at Argonne National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under contract no. DE-AC02-06CH11357. This work was supported in part by the DOE Office of Science, ASCR, under award numbers 57L38, 57L32, 57L11, 57K50, and 508050.
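The hop-bytes figure quoted in this abstract can be made concrete with a small sketch. It assumes the usual definition (message size multiplied by the number of network hops, summed over all messages) and shortest-path routing on a torus with wraparound links; the abstract itself does not spell out either, and the rank-to-node mappings below are hypothetical.

```python
def torus_hops(a, b, dims):
    """Minimal hop count between node coordinates a and b on a torus.

    Each dimension wraps around, so the distance along a dimension of
    size d is the shorter of going forward or backward.
    """
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))

def hop_bytes(messages, mapping, dims):
    """Sum of bytes * hops over messages given as (src_rank, dst_rank, nbytes)."""
    return sum(
        nbytes * torus_hops(mapping[src], mapping[dst], dims)
        for src, dst, nbytes in messages
    )

dims = (4, 4)  # a small 2D torus stands in for the 5D torus
messages = [(0, 1, 1000), (1, 2, 500)]

# Two placements of the same three ranks onto torus nodes.
naive = {0: (0, 0), 1: (2, 2), 2: (0, 1)}
remapped = {0: (0, 0), 1: (0, 1), 2: (0, 2)}

before = hop_bytes(messages, naive, dims)
after = hop_bytes(messages, remapped, dims)
print(before, after)  # 5500 1500
```

Placing heavily communicating ranks on neighbouring nodes lowers the total, which is the kind of gain a remapping suggestion aims for.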
Examining Connections between Gendered Dimensions of Inequality and Deforestation in Nepal
The United Nations recognizes empowering women as a key component of achieving numerous development-related goals. Qualitative studies suggest that communities where men and women have equal levels of agency over resource allocation and land tenure sometimes experience decreases in forest degradation and deforestation, all else being equal. However, these patterns are spatially heterogeneous, as are patterns of gender inequality in terms of land tenure and agency. This paper uses data from the Demographic and Health Surveys (DHS) to quantify the relationship between gender inequality and ecosystem degradation using three linear regression models, Empirical Bayesian Kriging, and mapping the intersections between gender inequality and deforestation. Results from LASSO, Ordinary Least Squares, and Stepwise regression models show that there is no linear relationship between gender inequality and deforestation. Additionally, the distributions of gender inequality as it pertains to land tenure and deforestation are highly heterogeneous over space, indicating potential sociocultural and sociodemographic factors not captured in my data. Further work should focus on identifying ways to incorporate complex gender dynamics into environmental planning at multiple levels of forest governance.