
    Pyramid: Enhancing Selectivity in Big Data Protection with Count Featurization

    Protecting vast quantities of data poses a daunting challenge for the growing number of organizations that collect, stockpile, and monetize it. The ability to distinguish data that is actually needed from data collected "just in case" would help these organizations to limit the latter's exposure to attack. A natural approach might be to monitor data use and retain only the working-set of in-use data in accessible storage; unused data can be evicted to a highly protected store. However, many of today's big data applications rely on machine learning (ML) workloads that are periodically retrained by accessing, and thus exposing to attack, the entire data store. Training set minimization methods, such as count featurization, are often used to limit the data needed to train ML workloads to improve performance or scalability. We present Pyramid, a limited-exposure data management system that builds upon count featurization to enhance data protection. As such, Pyramid uniquely introduces both the idea and proof-of-concept for leveraging training set minimization methods to instill rigor and selectivity into big data management. We integrated Pyramid into Spark Velox, a framework for ML-based targeting and personalization. We evaluate it on three applications and show that Pyramid approaches state-of-the-art models while training on less than 1% of the raw data.
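
    As a rough illustration of the underlying technique, the sketch below shows one common form of count featurization: replacing a high-cardinality categorical value with aggregate statistics (an occurrence count and a smoothed label rate) computed over historical data, so a model can be trained from compact count tables rather than the raw records. The class and field names are illustrative assumptions, not Pyramid's actual interface.

        # Minimal sketch of count featurization (illustrative names; not Pyramid's code).
        # Each high-cardinality categorical value is summarised by aggregate statistics
        # (occurrence count and a smoothed positive-label rate), so the downstream model
        # can be trained from these compact count tables instead of the raw records.
        from collections import defaultdict

        class CountFeaturizer:
            def __init__(self, smoothing=1.0):
                self.counts = defaultdict(int)      # value -> total occurrences
                self.positives = defaultdict(int)   # value -> occurrences with label 1
                self.smoothing = smoothing

            def update(self, value, label):
                # Fold one observation into the count tables (streaming-friendly).
                self.counts[value] += 1
                self.positives[value] += int(label == 1)

            def transform(self, value):
                # Map a raw categorical value to (count, smoothed positive rate).
                n = self.counts[value]
                rate = (self.positives[value] + self.smoothing) / (n + 2 * self.smoothing)
                return [n, rate]

        # Usage: build the tables from past (value, label) pairs, then featurize new events.
        cf = CountFeaturizer()
        for value, label in [("ad_42", 1), ("ad_42", 0), ("ad_7", 0)]:
            cf.update(value, label)
        print(cf.transform("ad_42"))   # [2, 0.5]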

    A flexible mandatory access control policy for XML databases

    A flexible mandatory access control (MAC) policy for XML databases is presented in this paper. The label type and label access policy can be defined according to the requirements of applications. In order to preserve the integrity of data in XML databases, a constraint between a read access rule and a write access rule in the label access policy is introduced. Rules for label assignment and propagation are proposed to alleviate the workload of label assignment. A solution for resolving conflicts of label assignments is also proposed. Finally, operations for implementing the MAC policy in an XML database are illustrated.
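
    As a loose sketch of how such a label access policy might look in code, the example below defines one possible read rule, a write rule constrained by the read rule (the integrity constraint), and a propagation rule for unlabelled XML elements. The concrete level-based rules are assumptions for illustration; the paper's label types and access policy are configurable per application.

        # Minimal sketch of a label-based MAC check for XML nodes. The level-based
        # rules below are illustrative assumptions, not the paper's definitions.
        from dataclasses import dataclass
        from typing import Optional

        @dataclass(frozen=True)
        class Label:
            level: int   # e.g. 0 = public ... 3 = most sensitive

        def can_read(subject: Label, node: Label) -> bool:
            # Example read rule: a subject may read nodes at or below its own level.
            return subject.level >= node.level

        def can_write(subject: Label, node: Label) -> bool:
            # Example write rule tied to the read rule (the integrity constraint):
            # a subject may only modify nodes it can read and that carry its own level,
            # so it cannot blindly overwrite data labelled above or below it.
            return can_read(subject, node) and subject.level == node.level

        def propagate(parent: Label, explicit: Optional[Label]) -> Label:
            # Example propagation rule: an element without an explicit label inherits
            # its parent's label, reducing the label-assignment workload.
            return explicit if explicit is not None else parent

        analyst = Label(level=2)
        record = propagate(Label(level=2), None)    # inherits the parent's label
        print(can_read(analyst, record), can_write(analyst, record))   # True True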

    Performance Characterization of In-Memory Data Analytics on a Modern Cloud Server

    Over the last decade, data analytics have rapidly progressed from traditional disk-based processing to modern in-memory processing. However, little effort has been devoted to enhancing performance at the micro-architecture level. This paper characterizes the performance of in-memory data analytics using the Apache Spark framework. We use a single-node NUMA machine and identify the bottlenecks hampering the scalability of workloads. We also quantify the inefficiencies at the micro-architecture level for various data analysis workloads. Through empirical evaluation, we show that Spark workloads do not scale linearly beyond twelve threads, due to work time inflation and thread-level load imbalance. Further, at the micro-architecture level, we observe memory-bound latency to be the major cause of work time inflation. Comment: Accepted to the 5th IEEE International Conference on Big Data and Cloud Computing (BDCloud 2015).
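
    The scaling metrics named above can be computed from per-thread busy times roughly as in the following sketch; the formulas are illustrative, not the paper's measurement harness.

        # Rough sketch of scaling metrics (illustrative formulas). Work time inflation
        # is the growth in total CPU work relative to a single-threaded run; load
        # imbalance compares the slowest thread to the average thread.
        def scaling_metrics(per_thread_busy_time, single_thread_time):
            n = len(per_thread_busy_time)
            total_work = sum(per_thread_busy_time)
            wall_clock = max(per_thread_busy_time)            # slowest thread bounds the run
            speedup = single_thread_time / wall_clock
            work_inflation = total_work / single_thread_time  # > 1.0 means inflated work
            imbalance = wall_clock / (total_work / n)         # 1.0 means perfectly balanced
            return speedup, work_inflation, imbalance

        # Hypothetical per-thread busy times (seconds) for a 12-thread run that took
        # 100 s when executed single-threaded.
        print(scaling_metrics([12.0, 9.0, 8.5, 8.8] + [9.2] * 8, 100.0))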

    Health and social care workforce planning and development – an overview

    Purpose - The purpose of this paper is to discuss the issues relating to getting the right health and social care staff with the right skills in the right place at the right time and at the right price.
    Design/methodology/approach - Key points arising from several master-classes with health and social care managers, supported by a literature review, generated remarkable insights into health and social care workforce planning and development (WP & D).
    Findings - Flawed methods and overwhelming data are major barriers to health and social care WP & D. Inefficient and ineffective WP & D policy and practice may therefore lead to inappropriate care teams, which in turn lead to sub-optimal and costly health and social care. Increasing health and social care demand and service re-design, as the population grows and ages and services move from hospital to community, mean that workforce planners face several challenges. The issues that drive and restrain their health and social care WP & D efforts are lucid and compelling, leaving planners in no doubt about what is expected if they are to succeed and health and social care is to develop. One main barrier they face is that, although WP & D definitions and models in the literature are logical, clear and effective, they are imperfect, so planners do not always have comprehensive tools or data to help them determine the ideal workforce. They face other barriers. First, WP & D can be fragmented and uni-disciplinary when modern health and social care is integrating. Second, recruitment and retention problems can easily stymie planners' best endeavours, because the people that services need (i.e. staff with the right skills), even if they exist, are not evenly distributed throughout the country.
    Practical implications - This paper underlines triangulated workforce demand and supply methods (described in the paper), which help planners to equalise workloads among disparate groups and isolated practitioners - an important job satisfaction and staff retention issue. Regular and systematic workforce reviews help planners to justify their staffing establishments; it seems vital, therefore, that they have robust methods and supporting data at their fingertips.

    Classification and reduction of pilot error

    Human error is a primary or contributing factor in about two-thirds of commercial aviation accidents worldwide. With the ultimate goal of reducing pilot-error accidents, this contract effort is aimed at understanding the factors underlying error events and reducing the probability of certain types of errors by modifying underlying factors such as flight deck design and procedures. A review of the literature relevant to error classification was conducted. Classification includes categorizing types of errors, the information-processing mechanisms and factors underlying them, and identifying factor-mechanism-error relationships. The classification scheme developed by Jens Rasmussen was adopted because it provided a comprehensive yet basic error classification shell, or structure, that could easily accommodate the addition of details on domain-specific factors. For these purposes, factors specific to the aviation environment were incorporated. Hypotheses concerning the relationships among a small number of underlying factors, information-processing mechanisms, and the error types identified in the classification scheme were formulated. ASRS data were reviewed and a simulation experiment was performed to evaluate and quantify the hypotheses.

    An Economic Analysis of Privacy Protection and Statistical Accuracy as Social Choices

    Statistical agencies face a dual mandate to publish accurate statistics while protecting respondent privacy. Increasing privacy protection requires decreased accuracy. Recognizing this as a resource allocation problem, we propose an economic solution: operate where the marginal cost of increasing privacy equals the marginal benefit. Our model of production, from computer science, assumes data are published using an efficient differentially private algorithm. Optimal choice weighs the demand for accurate statistics against the demand for privacy. Examples from U.S. statistical programs show how our framework can guide decision-making. Further progress requires a better understanding of willingness-to-pay for privacy and statistical accuracy.
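
    A toy numerical version of the marginal-cost-equals-marginal-benefit rule is sketched below; the functional forms for the benefit of accuracy and the cost of privacy loss are assumptions chosen only to make the optimum computable, not the paper's calibration.

        # Toy illustration of choosing a privacy-loss budget where the marginal benefit
        # of accuracy equals the marginal cost of privacy loss. The functional forms are
        # assumptions for illustration. For the Laplace mechanism, expected error scales
        # like 1/epsilon, so larger epsilon (weaker privacy) buys more accuracy.
        import numpy as np

        epsilons = np.linspace(0.05, 5.0, 500)
        accuracy_benefit = -1.0 / epsilons     # toy benefit of accuracy (less noise, more benefit)
        privacy_cost = 0.5 * epsilons ** 2     # toy social cost of privacy loss
        net_benefit = accuracy_benefit - privacy_cost

        # Optimum where d(benefit)/d(eps) = d(cost)/d(eps), i.e. 1/eps**2 = eps -> eps = 1.
        best = epsilons[np.argmax(net_benefit)]
        print(f"optimal epsilon ~ {best:.2f}")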

    Making a mess of academic work: experience, purpose and identity

    Within the policy discourse of academic work, teaching, research and administration are seen as discrete elements of practice. We explore the assumptions evident in this 'official story' and contrast it with the messy experience of academic work, drawing upon empirical studies and conceptualisations from our own research and from recent literature. We propose that purposive disciplinary practice across time and space is inextricably entangled with and fundamental to academic experience and identity; the fabrications of managerialism, such as the workload allocation form, fragment this experience and attempt to reclassify purposes and conceptualisations of academic work. Using actor-network theory as an analytical tool, we explore the gap between official and unofficial stories, attempting to reframe the relationship between discipline and its various manifestations in academic practice and suggesting a research agenda for investigating academic work.