6,778 research outputs found

    PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems

    Full text link
    Machine Learning models are often composed of pipelines of transformations. While this design allows to efficiently execute single model components at training time, prediction serving has different requirements such as low latency, high throughput and graceful performance degradation under heavy load. Current prediction serving systems consider models as black boxes, whereby prediction-time-specific optimizations are ignored in favor of ease of deployment. In this paper, we present PRETZEL, a prediction serving system introducing a novel white box architecture enabling both end-to-end and multi-model optimizations. Using production-like model pipelines, our experiments show that PRETZEL is able to introduce performance improvements over different dimensions; compared to state-of-the-art approaches PRETZEL is on average able to reduce 99th percentile latency by 5.5x while reducing memory footprint by 25x, and increasing throughput by 4.7x.Comment: 16 pages, 14 figures, 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 201

    A MULTI-COMMODITY NETWORK FLOW APPROACH FOR SEQUENCING REFINED PRODUCTS IN PIPELINE SYSTEMS

    Get PDF
    In the oil industry, there is a special class of pipelines used for the transportation of refined products. The problem of sequencing the inputs to be pumped through this type of pipeline seeks to generate the optimal sequence of batches of products and their destination as well as the amount of product to be pumped such that the total operational cost of the system, or another operational objective, is optimized while satisfying the product demands according to the requirements set by the customers. This dissertation introduces a new modeling approach and proposes a solution methodology for this problem capable of dealing with the topology of all the scenarios reported in the literature so far. The system representation is based on a 1-0 multi commodity network flow formulation that models the dynamics of the system, including aspects such as conservation of product flow constraints at the depots, travel time of products from the refinery to their depot destination and what happens upstream and downstream the line whenever a product is being received at a given depot while another one is being injected into the line at the refinery. It is assumed that the products are already available at the refinery and their demand at each depot is deterministic and known beforehand. The model provides the sequence, the amounts, the destination and the trazability of the shipped batches of different products from their sources to their destinations during the entire horizon planning period while seeking the optimization of pumping and inventory holding costs satisfying the time window constraints. A survey for the available literature is presented. Given the problem structure, a decomposition based solution procedure is explored with the intention of exploiting the network structure using the network simplex method. A branch and bound algorithm that exploits the dynamics of the system assigning priorities for branching to a selected set of variables is proposed and its computational results for the solution, obtained via GAMS/CPLEX, of the formulation for random instances of the problem of different sizes are presented. Future research directions on this field are proposed

    Supply-based optimal scheduling of oil product pipelines

    Get PDF

    The "MIND" Scalable PIM Architecture

    Get PDF
    MIND (Memory, Intelligence, and Network Device) is an advanced parallel computer architecture for high performance computing and scalable embedded processing. It is a Processor-in-Memory (PIM) architecture integrating both DRAM bit cells and CMOS logic devices on the same silicon die. MIND is multicore with multiple memory/processor nodes on each chip and supports global shared memory across systems of MIND components. MIND is distinguished from other PIM architectures in that it incorporates mechanisms for efficient support of a global parallel execution model based on the semantics of message-driven multithreaded split-transaction processing. MIND is designed to operate either in conjunction with other conventional microprocessors or in standalone arrays of like devices. It also incorporates mechanisms for fault tolerance, real time execution, and active power management. This paper describes the major elements and operational methods of the MIND architecture
    • …
    corecore