1,270 research outputs found

    Efficient memory management in VOD disk array servers usingPer-Storage-Device buffering

    Get PDF
    We present a buffering technique that reduces video-on-demand server memory requirements in more than one order of magnitude. This technique, Per-Storage-Device Buffering (PSDB), is based on the allocation of a fixed number of buffers per storage device, as opposed to existing solutions based on per-stream buffering allocation. The combination of this technique with disk array servers is studied in detail, as well as the influence of Variable Bit Streams. We also present an interleaved data placement strategy, Constant Time Length Declustering, that results in optimal performance in the service of VBR streams. PSDB is evaluated by extensive simulation of a disk array server model that incorporates a simulation based admission test.This research was supported in part by the National R&D Program of Spain, Project Number TIC97-0438.Publicad

    Characterizing Deep-Learning I/O Workloads in TensorFlow

    Full text link
    The performance of Deep-Learning (DL) computing frameworks rely on the performance of data ingestion and checkpointing. In fact, during the training, a considerable high number of relatively small files are first loaded and pre-processed on CPUs and then moved to accelerator for computation. In addition, checkpointing and restart operations are carried out to allow DL computing frameworks to restart quickly from a checkpoint. Because of this, I/O affects the performance of DL applications. In this work, we characterize the I/O performance and scaling of TensorFlow, an open-source programming framework developed by Google and specifically designed for solving DL problems. To measure TensorFlow I/O performance, we first design a micro-benchmark to measure TensorFlow reads, and then use a TensorFlow mini-application based on AlexNet to measure the performance cost of I/O and checkpointing in TensorFlow. To improve the checkpointing performance, we design and implement a burst buffer. We find that increasing the number of threads increases TensorFlow bandwidth by a maximum of 2.3x and 7.8x on our benchmark environments. The use of the tensorFlow prefetcher results in a complete overlap of computation on accelerator and input pipeline on CPU eliminating the effective cost of I/O on the overall performance. The use of a burst buffer to checkpoint to a fast small capacity storage and copy asynchronously the checkpoints to a slower large capacity storage resulted in a performance improvement of 2.6x with respect to checkpointing directly to slower storage on our benchmark environment.Comment: Accepted for publication at pdsw-DISCS 201

    Multi-disk subsystem organizations for very large databases

    Get PDF
    This thesis investigates efficient mappings of very large databases with non-uniform access to its data. to a. multi-disk subsystem. Two algorithms are developed to distribute the database across multiple disks, possibly with replication, in order to minimize latency and maximize throughput. These algorithms are compared with respect to the amount of replication overhead incurred to achieve desired throughput. A simulator is developed to simulate these two mapping algorithms and investigate the efficiency of these two mappings

    VLSI single-chip (255,223) Reed-Solomon encoder with interleaver

    Get PDF
    The invention relates to a concatenated Reed-Solomon/convolutional encoding system consisting of a Reed-Solomon outer code and a convolutional inner code for downlink telemetry in space missions, and more particularly to a Reed-Solomon encoder with programmable interleaving of the information symbols and code correction symbols to combat error bursts in the Viterbi decoder

    Deterministic, Stash-Free Write-Only ORAM

    Get PDF
    Write-Only Oblivious RAM (WoORAM) protocols provide privacy by encrypting the contents of data and also hiding the pattern of write operations over that data. WoORAMs provide better privacy than plain encryption and better performance than more general ORAM schemes (which hide both writing and reading access patterns), and the write-oblivious setting has been applied to important applications of cloud storage synchronization and encrypted hidden volumes. In this paper, we introduce an entirely new technique for Write-Only ORAM, called DetWoORAM. Unlike previous solutions, DetWoORAM uses a deterministic, sequential writing pattern without the need for any "stashing" of blocks in local state when writes fail. Our protocol, while conceptually simple, provides substantial improvement over prior solutions, both asymptotically and experimentally. In particular, under typical settings the DetWoORAM writes only 2 blocks (sequentially) to backend memory for each block written to the device, which is optimal. We have implemented our solution using the BUSE (block device in user-space) module and tested DetWoORAM against both an encryption only baseline of dm-crypt and prior, randomized WoORAM solutions, measuring only a 3x-14x slowdown compared to an encryption-only baseline and around 6x-19x speedup compared to prior work

    Digital Complex Correlator for a C-band Polarimetry survey

    Full text link
    The international Galactic Emission Mapping project aims to map and characterize the polarization field of the Milky Way. In Portugal it will cartograph the C-band sky polarized emission of the Northern Hemisphere and provide templates for map calibration and foreground control of microwave space probes like ESA Planck Surveyor mission. The receiver system is equipped with a novel receiver with a full digital back-end using an Altera Field Programmable Gate Array, having a very favorable cost/performance relation. This new digital backend comprises a base-band complex cross-correlator outputting the four Stokes parameters of the incoming polarized radiation. In this document we describe the design and implementation of the complex correlator using COTS components and a processing FPGA, detailing the method applied in the several algorithm stages and suitable for large sky area surveys.Comment: 15 pages, 10 figures; submitted to Experimental Astronomy, Springe

    Development of a prototype multi-processing interactive software invocation system

    Get PDF
    The Interactive Software Invocation System (NASA-ISIS) was first transported to the M68000 microcomputer, and then rewritten in the programming language Path Pascal. Path Pascal is a significantly enhanced derivative of Pascal, allowing concurrent algorithms to be expressed using the simple and elegant concept of Path Expressions. The primary results of this contract was to verify the viability of Path Pascal as a system's development language. The NASA-ISIS implementation using Path Pascal is a prototype of a large, interactive system in Path Pascal. As such, it is an excellent demonstration of the feasibility of using Path Pascal to write even more extensive systems. It is hoped that future efforts will build upon this research and, ultimately, that a full Path Pascal/ISIS Operating System (PPIOS) might be developed

    On Error Decoding of Locally Repairable and Partial MDS Codes

    Full text link
    We consider error decoding of locally repairable codes (LRC) and partial MDS (PMDS) codes through interleaved decoding. For a specific class of LRCs we investigate the success probability of interleaved decoding. For PMDS codes we show that there is a wide range of parameters for which interleaved decoding can increase their decoding radius beyond the minimum distance with the probability of successful decoding approaching 11, when the code length goes to infinity
    corecore