Efficient memory management in VOD disk array servers using Per-Storage-Device buffering
We present a buffering technique that reduces video-on-demand server memory requirements by more than one order of magnitude. This technique, Per-Storage-Device Buffering (PSDB), is based on the allocation of a fixed number of buffers per storage device, as opposed to existing solutions based on per-stream buffer allocation. The combination of this technique with disk array servers is studied in detail, as well as the influence of Variable Bit Rate (VBR) streams. We also present an interleaved data placement strategy, Constant Time Length Declustering, that results in optimal performance in the service of VBR streams. PSDB is evaluated by extensive simulation of a disk array server model that incorporates a simulation-based admission test. This research was supported in part by the National R&D Program of Spain, Project Number TIC97-0438.
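The difference between the two allocation policies can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from the paper: the function names, buffer counts, buffer size, and server dimensions are all hypothetical values chosen only to show why memory that scales with the number of storage devices can be far smaller than memory that scales with the number of streams.

```python
def per_stream_memory(streams, buffers_per_stream, buffer_bytes):
    """Memory needed when every active stream owns its own buffers."""
    return streams * buffers_per_stream * buffer_bytes


def per_device_memory(devices, buffers_per_device, buffer_bytes):
    """Memory needed when a fixed number of buffers is attached to each disk."""
    return devices * buffers_per_device * buffer_bytes


# Hypothetical server: 1000 streams, 10 disks, 256 KiB buffers.
BUFFER_BYTES = 256 * 1024
stream_based = per_stream_memory(1000, 2, BUFFER_BYTES)   # assumed double buffering
device_based = per_device_memory(10, 4, BUFFER_BYTES)     # assumed 4 buffers per disk
ratio = stream_based // device_based
```

With these assumed figures the per-stream policy needs 50 times more memory. The actual savings reported in the abstract depend on the server model and workload studied there.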
Characterizing Deep-Learning I/O Workloads in TensorFlow
The performance of Deep-Learning (DL) computing frameworks relies on the
performance of data ingestion and checkpointing. In fact, during training,
a large number of relatively small files are first loaded and
pre-processed on CPUs and then moved to accelerators for computation. In
addition, checkpointing and restart operations are carried out to allow DL
computing frameworks to restart quickly from a checkpoint. Because of this, I/O
affects the performance of DL applications. In this work, we characterize the
I/O performance and scaling of TensorFlow, an open-source programming framework
developed by Google and specifically designed for solving DL problems. To
measure TensorFlow I/O performance, we first design a micro-benchmark to
measure TensorFlow reads, and then use a TensorFlow mini-application based on
AlexNet to measure the performance cost of I/O and checkpointing in TensorFlow.
To improve the checkpointing performance, we design and implement a burst
buffer. We find that increasing the number of threads increases TensorFlow
bandwidth by a maximum of 2.3x and 7.8x on our benchmark environments. The use
of the TensorFlow prefetcher results in a complete overlap of computation on
the accelerator and the input pipeline on the CPU, eliminating the effective cost of I/O on
the overall performance. The use of a burst buffer to checkpoint to fast,
small-capacity storage and asynchronously copy the checkpoints to slower,
large-capacity storage resulted in a performance improvement of 2.6x with
respect to checkpointing directly to slower storage on our benchmark
environment. Comment: Accepted for publication at pdsw-DISCS 201
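The burst-buffer idea described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the authors' implementation: the class name, method names, and directory layout are hypothetical, and two temporary directories stand in for the fast and slow storage tiers.

```python
import shutil
import tempfile
import threading
from pathlib import Path


class BurstBufferCheckpointer:
    """Write checkpoints to a fast tier, then drain them to a slow tier in the background."""

    def __init__(self, fast_dir, slow_dir):
        self.fast_dir = Path(fast_dir)
        self.slow_dir = Path(slow_dir)
        self._pending = []  # background copy threads that may still be running

    def save(self, name, payload):
        # Blocking part: only the write to the fast tier delays the caller.
        fast_path = self.fast_dir / name
        fast_path.write_bytes(payload)
        # Non-blocking part: copy to the slow tier on a worker thread.
        worker = threading.Thread(
            target=shutil.copy2, args=(fast_path, self.slow_dir / name)
        )
        worker.start()
        self._pending.append(worker)
        return fast_path

    def flush(self):
        # Wait until every checkpoint has reached the slow tier.
        for worker in self._pending:
            worker.join()
        self._pending.clear()


def demo():
    # Temporary directories stand in for the two storage tiers (assumption).
    with tempfile.TemporaryDirectory() as fast, tempfile.TemporaryDirectory() as slow:
        ckpt = BurstBufferCheckpointer(fast, slow)
        ckpt.save("step_100.ckpt", b"model-weights")
        ckpt.flush()
        return (Path(slow) / "step_100.ckpt").read_bytes()
```

The training loop would call `save` and continue immediately, calling `flush` only before shutdown. Any speedup depends on the relative bandwidth of the two tiers.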
Multi-disk subsystem organizations for very large databases
This thesis investigates efficient mappings of very large databases with non-uniform access to their data onto a multi-disk subsystem.
Two algorithms are developed to distribute the database across multiple disks, possibly with replication, in order to minimize latency and maximize throughput. These algorithms are compared with respect to the amount of replication overhead incurred to achieve desired throughput.
A simulator is developed to run these two mapping algorithms and investigate their efficiency.
VLSI single-chip (255,223) Reed-Solomon encoder with interleaver
The invention relates to a concatenated Reed-Solomon/convolutional encoding system consisting of a Reed-Solomon outer code and a convolutional inner code for downlink telemetry in space missions, and more particularly to a Reed-Solomon encoder with programmable interleaving of the information symbols and code correction symbols to combat error bursts in the Viterbi decoder.
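Why interleaving helps against error bursts can be shown with a small symbol-level sketch. This is an illustration of generic block interleaving, not the circuit described in the invention: the function names and the interleave depth of 4 are assumptions. The only coding fact used is that a (255,223) Reed-Solomon code has 32 check symbols and can therefore correct up to 16 symbol errors per codeword.

```python
def interleave(codewords):
    """Send symbol 0 of every codeword, then symbol 1 of every codeword, and so on."""
    length = len(codewords[0])
    return [cw[i] for i in range(length) for cw in codewords]


def deinterleave(stream, depth):
    """Recover the original codewords from an interleaved symbol stream."""
    return [stream[k::depth] for k in range(depth)]


def errors_per_codeword(burst_start, burst_len, depth):
    """Count how many symbols of a contiguous channel burst land in each codeword."""
    counts = [0] * depth
    for pos in range(burst_start, burst_start + burst_len):
        counts[pos % depth] += 1
    return counts


# Hypothetical setup: interleave depth 4, codewords of 255 labelled symbols.
DEPTH = 4
codewords = [[(k, i) for i in range(255)] for k in range(DEPTH)]
stream = interleave(codewords)
recovered = deinterleave(stream, DEPTH)

# A 64-symbol burst is split evenly, so each codeword sees 16 errors.
spread = errors_per_codeword(10, 64, DEPTH)
```

Without interleaving, the same 64-symbol burst would fall inside a single codeword and exceed its 16-symbol correction capability. With the assumed depth of 4, each codeword receives exactly 16 errors and remains correctable.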
Deterministic, Stash-Free Write-Only ORAM
Write-Only Oblivious RAM (WoORAM) protocols provide privacy by encrypting the
contents of data and also hiding the pattern of write operations over that
data. WoORAMs provide better privacy than plain encryption and better
performance than more general ORAM schemes (which hide both writing and reading
access patterns), and the write-oblivious setting has been applied to important
applications of cloud storage synchronization and encrypted hidden volumes. In
this paper, we introduce an entirely new technique for Write-Only ORAM, called
DetWoORAM. Unlike previous solutions, DetWoORAM uses a deterministic,
sequential writing pattern without the need for any "stashing" of blocks in
local state when writes fail. Our protocol, while conceptually simple, provides
substantial improvement over prior solutions, both asymptotically and
experimentally. In particular, under typical settings the DetWoORAM writes only
2 blocks (sequentially) to backend memory for each block written to the device,
which is optimal. We have implemented our solution using the BUSE (block device
in user-space) module and tested DetWoORAM against both an encryption-only
baseline of dm-crypt and prior, randomized WoORAM solutions, measuring only a
3x-14x slowdown compared to the encryption-only baseline and around a 6x-19x
speedup compared to prior work.
Digital Complex Correlator for a C-band Polarimetry survey
The international Galactic Emission Mapping project aims to map and
characterize the polarization field of the Milky Way. In Portugal it will
map the C-band polarized sky emission of the Northern Hemisphere and
provide templates for map calibration and foreground control of microwave space
probes such as the ESA Planck Surveyor mission. The system is equipped with a
novel receiver with a fully digital back-end using an Altera Field Programmable
Gate Array, offering a very favorable cost/performance ratio. This new digital
back-end comprises a base-band complex cross-correlator outputting the four
Stokes parameters of the incoming polarized radiation. In this document we
describe the design and implementation of the complex correlator using COTS
components and a processing FPGA, detailing the methods applied in the several
algorithm stages, which are suitable for large sky area surveys. Comment: 15 pages, 10 figures; submitted to Experimental Astronomy, Springe
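The relation between a complex cross-correlator and the four Stokes parameters can be sketched in software. This is a floating-point illustration, not the FPGA design described in the paper. It assumes the two inputs are complex base-band samples of orthogonal linear polarizations, which the abstract does not state, and the sign of V follows one common convention that differs between references. The function name is hypothetical.

```python
def stokes_from_samples(ex, ey):
    """Estimate (I, Q, U, V) from paired complex samples of two linear polarizations."""
    n = len(ex)
    # Auto-correlations: average power in each channel.
    xx = sum(abs(a) ** 2 for a in ex) / n
    yy = sum(abs(b) ** 2 for b in ey) / n
    # Complex cross-correlation between the two channels.
    xy = sum(a * b.conjugate() for a, b in zip(ex, ey)) / n
    i = xx + yy
    q = xx - yy
    u = 2 * xy.real
    v = -2 * xy.imag  # sign convention is an assumption
    return i, q, u, v


# Equal, in-phase channels: fully linearly polarized at 45 degrees (Q = 0, V = 0).
linear_45 = stokes_from_samples([1 + 0j] * 4, [1 + 0j] * 4)

# Channels in quadrature: fully circularly polarized (Q = 0, U = 0).
circular = stokes_from_samples([1 + 0j] * 4, [1j] * 4)
```

In both test signals the identity I² = Q² + U² + V² holds, as expected for fully polarized input.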
Development of a prototype multi-processing interactive software invocation system
The Interactive Software Invocation System (NASA-ISIS) was first transported to the M68000 microcomputer, and then rewritten in the programming language Path Pascal. Path Pascal is a significantly enhanced derivative of Pascal, allowing concurrent algorithms to be expressed using the simple and elegant concept of Path Expressions. The primary result of this contract was to verify the viability of Path Pascal as a systems development language. The NASA-ISIS implementation using Path Pascal is a prototype of a large, interactive system in Path Pascal. As such, it is an excellent demonstration of the feasibility of using Path Pascal to write even more extensive systems. It is hoped that future efforts will build upon this research and, ultimately, that a full Path Pascal/ISIS Operating System (PPIOS) might be developed.
On Error Decoding of Locally Repairable and Partial MDS Codes
We consider error decoding of locally repairable codes (LRC) and partial MDS
(PMDS) codes through interleaved decoding. For a specific class of LRCs we
investigate the success probability of interleaved decoding. For PMDS codes we
show that there is a wide range of parameters for which interleaved decoding
can increase their decoding radius beyond the minimum distance with the
probability of successful decoding approaching 1 as the code length goes
to infinity.