114,906 research outputs found

    Scheduling of data-intensive workloads in a brokered virtualized environment

    Full text link
    Providing performance predictability guarantees is increasingly important in cloud platforms, especially for data-intensive applications, for which performance depends greatly on the available rates of data transfer between the various computing/storage hosts underlying the virtualized resources assigned to the application. With the increased prevalence of brokerage services in cloud platforms, there is a need for resource management solutions that consider the brokered nature of these workloads, as well as the special demands of their intra-dependent components. In this paper, we present an offline mechanism for scheduling batches of brokered data-intensive workloads, which can be extended to an online setting. The objective of the mechanism is to decide on a packing of the workloads in a batch that minimizes the broker's incurred costs, Moreover, considering the brokered nature of such workloads, we define a payment model that provides incentives to these workloads to be scheduled as part of a batch, which we analyze theoretically. Finally, we evaluate the proposed scheduling algorithm, and exemplify the fairness of the payment model in practical settings via trace-based experiments

    Prochlo: Strong Privacy for Analytics in the Crowd

    Full text link
    The large-scale monitoring of computer users' software activities has become commonplace, e.g., for application telemetry, error reporting, or demographic profiling. This paper describes a principled systems architecture---Encode, Shuffle, Analyze (ESA)---for performing such monitoring with high utility while also protecting user privacy. The ESA design, and its Prochlo implementation, are informed by our practical experiences with an existing, large deployment of privacy-preserving software monitoring. (cont.; see the paper

    Sam2bam: High-Performance Framework for NGS Data Preprocessing Tools

    Full text link
    This paper introduces a high-throughput software tool framework called {\it sam2bam} that enables users to significantly speedup pre-processing for next-generation sequencing data. The sam2bam is especially efficient on single-node multi-core large-memory systems. It can reduce the runtime of data pre-processing in marking duplicate reads on a single node system by 156-186x compared with de facto standard tools. The sam2bam consists of parallel software components that can fully utilize the multiple processors, available memory, high-bandwidth of storage, and hardware compression accelerators if available. The sam2bam provides file format conversion between well-known genome file formats, from SAM to BAM, as a basic feature. Additional features such as analyzing, filtering, and converting the input data are provided by {\it plug-in} tools, e.g., duplicate marking, which can be attached to sam2bam at runtime. We demonstrated that sam2bam could significantly reduce the runtime of NGS data pre-processing from about two hours to about one minute for a whole-exome data set on a 16-core single-node system using up to 130 GB of memory. The sam2bam could reduce the runtime for whole-genome sequencing data from about 20 hours to about nine minutes on the same system using up to 711 GB of memory

    How Algorithmic Confounding in Recommendation Systems Increases Homogeneity and Decreases Utility

    Full text link
    Recommendation systems are ubiquitous and impact many domains; they have the potential to influence product consumption, individuals' perceptions of the world, and life-altering decisions. These systems are often evaluated or trained with data from users already exposed to algorithmic recommendations; this creates a pernicious feedback loop. Using simulations, we demonstrate how using data confounded in this way homogenizes user behavior without increasing utility
    • …
    corecore