236,775 research outputs found

    Pyramid: Enhancing Selectivity in Big Data Protection with Count Featurization

    Protecting vast quantities of data poses a daunting challenge for the growing number of organizations that collect, stockpile, and monetize it. The ability to distinguish data that is actually needed from data collected "just in case" would help these organizations limit the latter's exposure to attack. A natural approach might be to monitor data use and retain only the working set of in-use data in accessible storage; unused data can be evicted to a highly protected store. However, many of today's big data applications rely on machine learning (ML) workloads that are periodically retrained by accessing, and thus exposing to attack, the entire data store. Training set minimization methods, such as count featurization, are often used to limit the data needed to train ML workloads, improving performance or scalability. We present Pyramid, a limited-exposure data management system that builds upon count featurization to enhance data protection. As such, Pyramid uniquely introduces both the idea and a proof of concept for leveraging training set minimization methods to instill rigor and selectivity into big data management. We integrated Pyramid into Spark Velox, a framework for ML-based targeting and personalization, evaluated it on three applications, and show that Pyramid approaches the accuracy of state-of-the-art models while training on less than 1% of the raw data.
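To illustrate the training-set-minimization idea the abstract builds on, the sketch below shows a minimal count featurizer in Python: each high-cardinality categorical value is replaced by per-label counts observed during training, so a model can learn from small count vectors instead of the raw identifiers. The class name and data are hypothetical, not Pyramid's actual implementation.

```python
from collections import defaultdict

class CountFeaturizer:
    """Minimal sketch of count featurization: replace a high-cardinality
    categorical value with the per-label counts seen in training data."""

    def __init__(self):
        # counts[value][label] -> number of training rows with that pair
        self.counts = defaultdict(lambda: defaultdict(int))

    def fit(self, values, labels):
        for v, y in zip(values, labels):
            self.counts[v][y] += 1
        return self

    def transform(self, values, label_space=(0, 1)):
        # Each raw value becomes a small fixed-size count vector,
        # so downstream models never touch the raw identifiers.
        return [[self.counts[v][y] for y in label_space] for v in values]

cf = CountFeaturizer().fit(["ad1", "ad1", "ad2"], [1, 0, 0])
print(cf.transform(["ad1", "ad2", "ad3"]))  # [[1, 1], [1, 0], [0, 0]]
```

In practice, count tables are usually built on held-out folds (or with noise added) to limit label leakage from a value's own row into its features.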

    Social Security and the Retirement and Savings Behavior of Low Income Households

    In this paper, we develop and estimate a model of retirement and savings incorporating limited borrowing, stochastic wage offers, health status and survival, social security benefits, Medicare and employer-provided health insurance coverage, and intentional bequests. The model is estimated on a sample of relatively poor households from the first three waves of the Health and Retirement Study (HRS), for whom we would expect social security income to be of particular importance. The estimated model is used to simulate the responses to several counterfactual experiments corresponding to changes in social security rules. These include changes in benefit levels, in the payroll tax, in the social security earnings tax, and in early and normal retirement ages.
    Keywords: Social Security; Retirement; Savings

    Blood eosinophils and inhaled corticosteroid/long-acting β-2 agonist efficacy in COPD

    Objective We performed a review of studies of fluticasone propionate (FP)/salmeterol (SAL) (combination inhaled corticosteroid (ICS)/long-acting β2-agonist (LABA)) in patients with COPD, which measured baseline (pretreatment) blood eosinophil levels, to test whether blood eosinophil levels ≥2% were associated with a greater reduction in exacerbation rates with ICS therapy. Methods Three studies of ≥1-year duration met the inclusion criteria. Moderate and severe exacerbation rates were analysed according to baseline blood eosinophil levels (<2% vs ≥2%). At baseline, 57–75% of patients had ≥2% blood eosinophils. Changes in FEV1 and St George's Respiratory Questionnaire (SGRQ) scores were compared by eosinophil level. Results For patients with ≥2% eosinophils, FP/SAL was associated with significant reductions in exacerbation rates versus tiotropium (INSPIRE: n=719, rate ratio (RR)=0.75, 95% CI 0.60 to 0.92, p=0.006) and versus placebo (TRISTAN: n=1049, RR=0.63, 95% CI 0.50 to 0.79, p<0.001). No significant difference was seen in the <2% eosinophil subgroup in either study (INSPIRE: n=550, RR=1.18, 95% CI 0.92 to 1.51, p=0.186; TRISTAN: n=354, RR=0.99, 95% CI 0.67 to 1.47, p=0.957). In SCO30002 (n=373), no significant effects were observed (FP or FP/SAL vs placebo). No relationship was observed in any study between eosinophil subgroup and treatment effect on FEV1 and SGRQ. Discussion Baseline blood eosinophil levels may represent an informative marker for exacerbation reduction with ICS/LABA in patients with COPD and a history of moderate/severe exacerbations.
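For readers unfamiliar with the rate ratio (RR) and confidence interval figures quoted above, the sketch below computes a rate ratio and a Wald-type 95% CI on the log scale from event counts and follow-up time. The event counts and person-years are illustrative, not data from INSPIRE, TRISTAN, or SCO30002.

```python
import math

def rate_ratio_ci(events_a, time_a, events_b, time_b, z=1.96):
    """Rate ratio of group A vs group B with a Wald 95% CI computed
    on the log scale. Inputs are event counts and person-time at risk."""
    rr = (events_a / time_a) / (events_b / time_b)
    se_log = math.sqrt(1 / events_a + 1 / events_b)  # SE of log(RR)
    lo = math.exp(math.log(rr) - z * se_log)
    hi = math.exp(math.log(rr) + z * se_log)
    return rr, lo, hi

# Hypothetical example: 90 exacerbations in 100 person-years vs
# 120 exacerbations in 100 person-years.
rr, lo, hi = rate_ratio_ci(90, 100.0, 120, 100.0)
print(f"RR={rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```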

    Landslide susceptibility mapping using multi-criteria evaluation techniques in Chittagong Metropolitan Area, Bangladesh

    Landslides are a common hazard in the highly urbanized hilly areas of Chittagong Metropolitan Area (CMA), Bangladesh. The main cause of the landslides is torrential rain over a short period of time. The area experiences several landslides each year, resulting in casualties, property damage, and economic loss. The primary objective of this research is therefore to produce landslide susceptibility maps for CMA so that appropriate landslide disaster risk reduction strategies can be developed. In this research, three different Geographic Information System-based Multi-Criteria Decision Analysis methods—the Analytic Hierarchy Process (AHP), Weighted Linear Combination (WLC), and Ordered Weighted Average (OWA)—were applied to scientifically assess the landslide-susceptible areas in CMA. Nine different thematic layers, or landslide causative factors, were considered. Then, seven different landslide susceptibility scenarios were generated based on the three weighted overlay techniques. Later, the performance of each method was validated using the area under the relative operating characteristic curve. The accuracies of the landslide susceptibility maps produced by the AHP, WLC_1, WLC_2, WLC_3, OWA_1, OWA_2, and OWA_3 methods were found to be 89.80%, 83.90%, 91.10%, 88.50%, 90.40%, 95.10%, and 87.10%, respectively. The verification results showed satisfactory agreement between the produced susceptibility maps and the existing data on the 20 historical landslide locations.
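The weighted overlay idea behind WLC can be sketched in a few lines: normalized causative-factor layers are combined as a weighted sum to give a susceptibility score per raster cell. The layer names, weights, and random data below are illustrative, not the study's actual nine factors.

```python
import numpy as np

# Minimal sketch of Weighted Linear Combination (WLC) over raster layers.
# Each layer is assumed pre-normalized to [0, 1]; weights sum to 1, so the
# resulting susceptibility score also lies in [0, 1].
rng = np.random.default_rng(0)
layers = {
    "slope":      rng.random((4, 4)),   # hypothetical causative factors
    "rainfall":   rng.random((4, 4)),
    "land_cover": rng.random((4, 4)),
}
weights = {"slope": 0.5, "rainfall": 0.3, "land_cover": 0.2}

# Cell-wise weighted sum across all layers.
susceptibility = sum(weights[name] * layer for name, layer in layers.items())
print(susceptibility.round(2))
```

OWA generalizes this by reordering each cell's factor values before weighting, which lets the analyst trade off risk-averse versus risk-taking aggregation.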

    Towards a Holistic Integration of Spreadsheets with Databases: A Scalable Storage Engine for Presentational Data Management

    Spreadsheet software is the tool of choice for interactive ad hoc data management, with adoption by billions of users. However, spreadsheets are not scalable, unlike database systems. On the other hand, database systems, while highly scalable, do not support interactivity as a first-class primitive. We are developing DataSpread to holistically integrate spreadsheets as a front-end interface with databases as a back-end datastore, providing scalability to spreadsheets and interactivity to databases, an integration we term presentational data management (PDM). In this paper, we take a first step towards this vision by developing a storage engine for PDM, studying how to flexibly represent spreadsheet data within a database and how to support and maintain access by position. We first conduct an extensive survey of spreadsheet use to motivate the functional requirements for a storage engine for PDM. We develop a natural set of mechanisms for flexibly representing spreadsheet data and demonstrate that identifying the optimal representation is NP-hard; however, we develop an efficient approach to identify the optimal representation from an important and intuitive subclass of representations. We extend these mechanisms with positional access methods that do not suffer from cascading update issues, leading to constant-time access and modification performance. We evaluate these representations on a workload of typical spreadsheets and spreadsheet operations, achieving up to a 20% reduction in storage and up to a 50% reduction in formula evaluation time.
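One common way to obtain positional access without cascading updates, in the spirit of what the abstract describes, is to key rows by sparse "fractional" positions: inserting a row mints a new key between its neighbors instead of renumbering every later row. The sketch below is an illustrative toy, not DataSpread's actual storage engine.

```python
from bisect import insort

class PositionalStore:
    """Toy positional row store: rows carry sparse fractional keys, so an
    insertion at position p touches one key rather than shifting all
    rows after p (no cascading update)."""

    def __init__(self):
        self.keys = []   # sorted fractional position keys
        self.rows = {}   # key -> row payload

    def insert_at(self, pos, row):
        # Mint a key strictly between the neighbors at position `pos`.
        left = self.keys[pos - 1] if pos > 0 else 0.0
        right = self.keys[pos] if pos < len(self.keys) else left + 2.0
        key = (left + right) / 2.0
        insort(self.keys, key)
        self.rows[key] = row

    def get(self, pos):
        return self.rows[self.keys[pos]]

s = PositionalStore()
s.insert_at(0, "a"); s.insert_at(1, "c"); s.insert_at(1, "b")
print([s.get(i) for i in range(3)])  # ['a', 'b', 'c']
```

Floating-point keys eventually exhaust precision under repeated insertion at the same spot; real systems use variable-length keys or hierarchical counters and rebalance occasionally, which preserves the amortized constant-time behavior.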

    Fast algorithms for solving H∞-norm minimization problems

    We propose an efficient computational approach to minimize the H∞-norm of a transfer-function matrix depending affinely on a set of free parameters. The minimization problem, formulated as a semi-infinite convex programming problem, is solved via a relaxation approach over a finite set of frequency values. In this way, a significant speed-up is achieved by avoiding the solution of the high-order LMIs that result from equivalently formulating the minimization problem as a high-dimensional semidefinite programming problem. Numerical results illustrate the superiority of the proposed approach over LMI-based techniques in solving zero-order H∞-norm approximation problems.
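The relaxation idea can be sketched concretely: approximate the H∞-norm by the peak gain over a finite frequency grid, then minimize that peak over the free parameters. The transfer functions and the one-dimensional grid search below are illustrative stand-ins for the paper's problem data and convex solver.

```python
import numpy as np

# Frequency-grid relaxation sketch: the H∞-norm (supremum of the gain over
# all frequencies) is replaced by the maximum over a finite grid, turning a
# semi-infinite problem into a finite min-max problem.
omegas = np.logspace(-2, 2, 400)   # frequency grid (rad/s)
s = 1j * omegas

def peak_gain(x):
    # G(s, x) = 10/(s+1) + x/(s+2): affine in the free parameter x
    G = 10.0 / (s + 1.0) + x / (s + 2.0)
    return float(np.abs(G).max())  # grid approximation of the H∞-norm

# A one-dimensional grid search stands in for a proper convex solver;
# the objective (a max of norms of affine functions of x) is convex in x.
xs = np.linspace(-30.0, 5.0, 3501)
best_x = min(xs, key=peak_gain)
print(f"x* = {best_x:.2f}, peak gain = {peak_gain(best_x):.3f}")
```

Because the grid only lower-bounds the true supremum, practical schemes iteratively add the frequency where the current solution peaks and re-solve until the bound is tight.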

    Re-evaluating the success of the EPA's 33/50 program: evidence from facility participation

    Using previously unavailable data, we examine facility participation in the 33/50 Program and its effect on aggregate and toxicity-weighted emissions between 1991 and 1995 for a sample of facilities whose parent firms committed to the Program. By focusing on individual facilities, we avoid the biases created by aggregating emissions across facilities. We find that while more polluting facilities within a firm were more likely to participate, even when we account for the toxicity of emissions, across firms there is no evidence of greater participation by facilities with higher emissions. Although emissions of the 33/50 chemicals fell over the years, we find that participation in the Program did not lead to the decline in the 33/50 releases generated by these facilities.
    Keywords: Toxic Release Inventory; program participation; program evaluation; GMM; dynamic panel