    Optimal Two-Step Prediction in Regression

    High-dimensional prediction typically comprises two steps: variable selection and subsequent least-squares refitting on the selected variables. However, standard variable selection procedures, such as the lasso, hinge on tuning parameters that need to be calibrated. Cross-validation, the most popular calibration scheme, is computationally costly and lacks finite-sample guarantees. In this paper, we introduce an alternative scheme that is easy to implement and both computationally and theoretically efficient.
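    The two-step pipeline described above (lasso selection, then least-squares refitting on the selected variables) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not specify their calibration scheme, so the tuning parameter `lam` is simply hand-picked, the lasso is solved by plain cyclic coordinate descent without an intercept, and the helper names (`lasso_cd`, `lasso_then_ols`) and the simulated data are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    # Soft-thresholding operator: sign(z) * max(|z| - t, 0).
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=500):
    # Step 1 (selection): lasso by cyclic coordinate descent on
    # (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1, with no intercept.
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # (1/n) * ||x_j||^2 for each column
    r = y.astype(float).copy()          # residual y - X b (b starts at zero)
    for _ in range(n_iter):
        for j in range(p):
            # correlation of column j with the partial residual that excludes j
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * (new_bj - b[j])  # keep the residual in sync with b
            b[j] = new_bj
    return b


def lasso_then_ols(X, y, lam):
    # Step 2 (refit): ordinary least squares restricted to the lasso support,
    # which removes the lasso's shrinkage bias on the selected coefficients.
    support = np.flatnonzero(lasso_cd(X, y, lam) != 0)
    b_refit = np.zeros(X.shape[1])
    if support.size > 0:
        coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        b_refit[support] = coef
    return support, b_refit


# Hypothetical example: 3 active predictors out of 20; lam is hand-picked.
rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]
y = X @ beta + 0.1 * rng.standard_normal(n)
support, b_hat = lasso_then_ols(X, y, lam=0.1)
print("selected:", support)
print("refit coefficients:", np.round(b_hat[support], 2))
```

    The refit step matters because the lasso shrinks every selected coefficient toward zero by roughly `lam`; refitting by least squares on the support keeps the lasso's selection but drops that bias. How `lam` itself should be chosen is exactly the calibration question the paper addresses.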

    Simultaneous Variable and Covariance Selection with the Multivariate Spike-and-Slab Lasso

    We propose a Bayesian procedure for simultaneous variable and covariance selection using continuous spike-and-slab priors in multivariate linear regression models where q possibly correlated responses are regressed onto p predictors. Rather than relying on a stochastic search through the high-dimensional model space, we develop an ECM algorithm, similar to the EMVS procedure of Rockova & George (2014), that targets modal estimates of the matrix of regression coefficients and the residual precision matrix. Varying the scale of the continuous spike densities facilitates dynamic posterior exploration and allows us to gradually filter out negligible regression coefficients and partial covariances. Our method substantially outperforms regularization competitors on simulated data. We demonstrate our method with a re-examination of data from a recent observational study of the effect of playing high school football on several later-life cognitive, psychological, and socio-economic outcomes.
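    To see why varying the spike scale filters out small coefficients, consider a single coefficient under a two-component Laplace mixture prior, the "spike-and-slab lasso" form suggested by the title. This is a hedged sketch, not the authors' ECM algorithm: it omits the covariance part entirely, and the symbols `lam0` (spike rate), `lam1` (slab rate), and `theta` (prior slab weight), as well as the mixed penalty weight, are our assumptions about how such a prior is commonly parameterized.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def slab_probability(beta, lam0, lam1, theta):
    # Posterior probability that beta came from the slab under the prior
    # theta * Laplace(lam1) + (1 - theta) * Laplace(lam0), where the Laplace
    # density is (lam / 2) * exp(-lam * |beta|) and lam0 > lam1 (sharp spike).
    # Computed on the log-odds scale to avoid underflow for large lam0 * |beta|.
    log_odds = (math.log(theta / (1.0 - theta))
                + math.log(lam1 / lam0)
                + (lam0 - lam1) * abs(beta))
    return sigmoid(log_odds)


def adaptive_penalty(beta, lam0, lam1, theta):
    # Coefficient-specific lasso weight (an assumed form): close to lam1 (light)
    # when the slab is likely, close to lam0 (heavy) when the spike is likely.
    p = slab_probability(beta, lam0, lam1, theta)
    return lam1 * p + lam0 * (1.0 - p)


# Hypothetical values: a small and a large coefficient, two spike scales.
for lam0 in (20.0, 100.0):
    for beta in (0.01, 1.0):
        print(lam0, beta,
              round(slab_probability(beta, lam0, lam1=1.0, theta=0.5), 3),
              round(adaptive_penalty(beta, lam0, lam1=1.0, theta=0.5), 2))
```

    Increasing `lam0` makes the spike sharper, so a small coefficient's slab probability drops and its penalty approaches `lam0`, while a large coefficient stays in the slab with a light penalty. Sweeping `lam0` upward is one way to read the "gradual" filtering the abstract describes.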

    Group-Lasso on Splines for Spectrum Cartography

    The unceasing demand for continuous situational awareness calls for innovative and large-scale signal processing algorithms, complemented by collaborative and adaptive sensing platforms to accomplish the objectives of layered sensing and control. Towards this goal, the present paper develops a spline-based approach to field estimation, which relies on a basis expansion model of the field of interest. The model entails known bases, weighted by generic functions estimated from the field's noisy samples. A novel field estimator is developed based on a regularized variational least-squares (LS) criterion that yields finitely parameterized (function) estimates spanned by thin-plate splines. Robustness considerations motivate the adoption of an overcomplete set of (possibly overlapping) basis functions, while a sparsifying regularizer augmenting the LS cost endows the estimator with the ability to select the few bases that "better" explain the data. This parsimonious field representation is possible because the sparsity-aware spline-based method of this paper induces a group-Lasso estimator for the coefficients of the thin-plate spline expansions per basis. A distributed algorithm is also developed to obtain the group-Lasso estimator using a network of wireless sensors or multiple processors that balance the load of a single computational unit. The novel spline-based approach is motivated by a spectrum cartography application, in which a set of sensing cognitive radios collaborate to estimate the distribution of RF power in space and frequency. Simulated tests corroborate that the estimated power spectral density atlas yields the desired RF state awareness, since the maps reveal spatial locations where idle frequency bands can be reused for transmission, even when fading and shadowing effects are pronounced.

    Comment: Submitted to IEEE Transactions on Signal Processing
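    The key property of a group-Lasso estimator, selecting or discarding whole blocks of coefficients at once, can be sketched with proximal gradient descent and block soft-thresholding. This is a generic illustration, not the paper's spline estimator or its distributed algorithm: the random design matrix stands in for the thin-plate spline expansions (an assumption), each group of 3 columns plays the role of one basis, and `group_lasso`, `mu`, and the data are hypothetical.

```python
import numpy as np


def block_soft_threshold(v, t):
    # Proximal operator of t * ||v||_2: shrink the whole block toward zero,
    # or set it exactly to zero when its norm is at most t.
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v


def group_lasso(X, y, groups, mu, n_iter=2000):
    # Proximal gradient (ISTA) for 0.5 * ||y - X b||^2 + mu * sum_g ||b_g||_2.
    # groups: list of disjoint index arrays; indices in no group stay unpenalized.
    b = np.zeros(X.shape[1])
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the LS gradient
    for _ in range(n_iter):
        z = b - step * (X.T @ (X @ b - y))  # gradient step on the LS term
        b = z.copy()
        for g in groups:
            b[g] = block_soft_threshold(z[g], step * mu)  # prox step per group
    return b


# Hypothetical example: 4 "bases" with 3 coefficients each; only basis 0 is active.
rng = np.random.default_rng(1)
X = rng.standard_normal((60, 12))
groups = [np.arange(3 * k, 3 * k + 3) for k in range(4)]
b_true = np.zeros(12)
b_true[:3] = [1.0, -1.0, 0.5]
y = X @ b_true + 0.05 * rng.standard_normal(60)
b_hat = group_lasso(X, y, groups, mu=5.0)
active = [k for k, g in enumerate(groups) if np.linalg.norm(b_hat[g]) > 0]
print("active groups:", active)
```

    Because the penalty acts on each block's Euclidean norm rather than on individual entries, a basis whose coefficients do not reduce the LS cost enough is zeroed out as a unit, which is what yields the parsimonious "few bases" representation.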

    A Visual Analytics System for Optimizing Communications in Massively Parallel Applications

    Current and future supercomputers have tens of thousands of compute nodes interconnected with high-dimensional networks and complex network topologies for improved performance. Application developers are required to write scalable parallel programs in order to achieve high throughput on these machines. Application performance is largely determined by efficient inter-process communication. A common way to analyze and optimize performance is to profile parallel codes to identify communication bottlenecks. However, understanding gigabytes of profile data is not a trivial task. In this paper, we present a visual analytics system for identifying scalability bottlenecks and improving the communication efficiency of massively parallel applications. The visualization methods in this system are designed to make large-scale and varied communication patterns on thousands of nodes comprehensible in complex networks such as the 5D torus and the dragonfly. We also present efficient rerouting and remapping algorithms that can be coupled with our interactive visual analytics design for performance optimization. We demonstrate the utility of our system with several case studies using three benchmark applications on two leading supercomputers. The mapping suggested by our system led to a 38% improvement in hop-bytes for the MiniAMR application on 4,096 MPI processes.

    This research has been sponsored in part by the U.S. National Science Foundation through grant IIS-1320229, and the U.S. Department of Energy through grants DE-SC0012610 and DE-SC0014917. This research has been funded in part and used resources of the Argonne Leadership Computing Facility at Argonne National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under contract no. DE-AC02-06CH11357. This work was supported in part by the DOE Office of Science, ASCR, under award numbers 57L38, 57L32, 57L11, 57K50, and 508050.
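    Hop-bytes, the metric behind the reported 38% improvement, is the sum over all messages of the message size multiplied by the number of network hops it travels; a better rank-to-node mapping places heavily communicating ranks closer together. The sketch below computes it on a small torus, assuming minimal (shortest-path) routing with wraparound in every dimension. It does not model the dragonfly topology or the paper's remapping algorithm, and the 4x4 torus, ring pattern, message sizes, and both mappings are hypothetical.

```python
def torus_hops(a, b, dims):
    # Shortest-path hop count between coordinates a and b on a torus whose
    # size in each dimension is given by dims (every dimension wraps around).
    hops = 0
    for ai, bi, n in zip(a, b, dims):
        d = abs(ai - bi)
        hops += min(d, n - d)
    return hops


def hop_bytes(messages, mapping, dims):
    # messages: iterable of (src_rank, dst_rank, nbytes)
    # mapping: dict from MPI rank to its node's torus coordinates
    return sum(nbytes * torus_hops(mapping[src], mapping[dst], dims)
               for src, dst, nbytes in messages)


# Hypothetical 4x4 torus and a 4-rank ring exchanging 1000-byte messages.
dims = (4, 4)
messages = [(0, 1, 1000), (1, 2, 1000), (2, 3, 1000), (3, 0, 1000)]
row_mapping = {0: (0, 0), 1: (0, 1), 2: (0, 2), 3: (0, 3)}
scattered_mapping = {0: (0, 0), 1: (2, 2), 2: (0, 1), 3: (2, 3)}
print(hop_bytes(messages, row_mapping, dims))        # 4000
print(hop_bytes(messages, scattered_mapping, dims))  # 14000
```

    Placing the ring along one torus row makes every message a single hop (including the wraparound from rank 3 back to rank 0), whereas the scattered placement sends the same bytes over 3 to 4 hops each. A remapping tool searches for placements of the first kind.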

    Examining Connections between Gendered Dimensions of Inequality and Deforestation in Nepal

    The United Nations recognizes empowering women as a key component of achieving numerous development-related goals. Qualitative studies suggest that communities where men and women have equal levels of agency over resource allocation and land tenure sometimes experience decreases in forest degradation and deforestation, all else being equal. However, these patterns are spatially heterogeneous, as are patterns of gender inequality in terms of land tenure and agency. This paper uses data from the Demographic and Health Surveys (DHS) to quantify the relationship between gender inequality and ecosystem degradation using three linear regression models and Empirical Bayesian Kriging, and it maps the intersections between gender inequality and deforestation. Results from LASSO, Ordinary Least Squares, and Stepwise regression models show no linear relationship between gender inequality and deforestation. Additionally, the distributions of gender inequality as it pertains to land tenure and of deforestation are highly heterogeneous over space, indicating potential sociocultural and sociodemographic factors not captured in my data. Further work should focus on identifying ways to incorporate complex gender dynamics into environmental planning at multiple levels of forest governance.

    Data Analytics 2016: Proceedings of the Fifth International Conference on Data Analytics
