The Feasibility of Using Compression to Increase Memory System Performance
We investigate the feasibility of using instruction compression at some level in a multi-level memory hierarchy to increase memory system performance. For example, compressing at main memory means that main memory and the file system would contain compressed instructions, but upstream caches would see normal uncompressed instructions. Compression effectively increases the memory size and the block size, reducing the miss rate at the expense of increased access latency due to decompression delays. We present a simple compression scheme using the most frequently used symbols and evaluate it against several other compression schemes. On a SPARC processor, our scheme obtained a compression ratio of 150% for most programs. We analytically evaluate the impact of compression on the average memory access time for various memory systems and compression approaches. Our results show that the feasibility of using compression is sensitive to the miss rates and miss penalties at the point of compression and, to a lesser extent, to the amount of compression possible. For the high performance workstations of today, compression already shows promise; as miss penalties increase in the future, compression will only become more feasible.
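A minimal sketch of the kind of analytical evaluation described above, assuming the classic two-level average memory access time (AMAT) model; every parameter value (hit time, miss rate, miss penalty, decompression delay, miss-rate reduction) is an illustrative assumption rather than a figure from the paper:

```python
# Sketch of AMAT with and without compression at main memory.
# All numbers below are illustrative assumptions.

def amat(hit_time, miss_rate, miss_penalty):
    """Classic two-level AMAT model: hit time plus miss traffic cost (cycles)."""
    return hit_time + miss_rate * miss_penalty

# Hypothetical baseline: 1-cycle cache hit, 2% miss rate, 100-cycle miss penalty.
baseline = amat(hit_time=1.0, miss_rate=0.02, miss_penalty=100.0)

# With compression at main memory: the effectively larger memory and block size
# lower the miss rate (assumed 25% reduction here), but every miss pays an
# extra decompression delay (assumed 20 cycles).
compressed = amat(hit_time=1.0, miss_rate=0.02 * 0.75, miss_penalty=100.0 + 20.0)

print(f"baseline AMAT   : {baseline:.2f} cycles")
print(f"compressed AMAT : {compressed:.2f} cycles")
```

Under these assumed numbers compression pays off because the saved miss traffic outweighs the added decompression delay; with a lower miss rate or a smaller miss penalty at the point of compression the comparison can flip, which is exactly the sensitivity the abstract reports.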
GPUs as Storage System Accelerators
Massively multicore processors, such as Graphics Processing Units (GPUs),
provide, at a comparable price, a one order of magnitude higher peak
performance than traditional CPUs. This drop in the cost of computation, as any
order-of-magnitude drop in the cost per unit of performance for a class of
system components, triggers the opportunity to redesign systems and to explore
new ways to engineer them to recalibrate the cost-to-performance relation. This
project explores the feasibility of harnessing GPUs' computational power to
improve the performance, reliability, or security of distributed storage
systems. In this context, we present the design of a storage system prototype
that uses GPU offloading to accelerate a number of computationally intensive
primitives based on hashing, and introduce techniques to efficiently leverage
the processing power of GPUs. We evaluate the performance of this prototype
under two configurations: as a content addressable storage system that
facilitates online similarity detection between successive versions of the same
file and as a traditional system that uses hashing to preserve data integrity.
Further, we evaluate the impact of offloading to the GPU on competing
applications' performance. Our results show that this technique can bring
tangible performance gains without negatively impacting the performance of
concurrently running applications.

Comment: IEEE Transactions on Parallel and Distributed Systems, 201
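A CPU-side sketch of the block-fingerprinting idea behind content addressable storage and online similarity detection between successive versions of a file; the prototype described above offloads this hashing to the GPU, whereas here Python's hashlib stands in on the CPU, and the 64 KiB block size is an assumption for illustration only:

```python
# Fingerprint fixed-size blocks and count how many blocks of the new version
# are already present in the old one (and so need not be stored again).
# The paper offloads the hashing primitive to the GPU; hashlib is a CPU stand-in.
import hashlib

BLOCK_SIZE = 64 * 1024  # assumed block size

def block_hashes(data: bytes) -> list[str]:
    """Split data into fixed-size blocks and hash each block."""
    return [
        hashlib.sha1(data[i:i + BLOCK_SIZE]).hexdigest()
        for i in range(0, len(data), BLOCK_SIZE)
    ]

def shared_blocks(old: bytes, new: bytes) -> int:
    """Count blocks of the new version whose hash already exists in the old version."""
    known = set(block_hashes(old))
    return sum(1 for h in block_hashes(new) if h in known)
```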
Satellite on-board processing for earth resources data
Results of a survey of earth resources user applications and their data requirements, earth resources multispectral scanner sensor technology, and preprocessing algorithms for correcting the sensor outputs and for data bulk reduction are presented, along with a candidate data format. The computational requirements for implementing the data analysis algorithms are included, along with a review of computer architectures and organizations. Computer architectures capable of handling the algorithm computational requirements are suggested, and the environmental effects of an on-board processor are discussed. By relating performance parameters to the system requirements of each user application, the feasibility of on-board processing is determined for each user. A tradeoff analysis is performed to determine the sensitivity of the results to each of the system parameters. Significant results and conclusions are discussed, and recommendations are presented.
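A back-of-the-envelope sketch of the data bulk reduction tradeoff such a feasibility study weighs; all sensor and reduction parameters below are hypothetical placeholders, not values from the survey:

```python
# Compare the raw multispectral scanner output rate with the rate after an
# assumed on-board bulk reduction step. All figures are hypothetical.

pixels_per_second = 2_000_000   # assumed scanner pixel rate
bands = 7                       # assumed number of spectral bands
bits_per_sample = 8             # assumed radiometric resolution

raw_rate = pixels_per_second * bands * bits_per_sample  # bits/s off the sensor

# Assumed reduction: keep 4 of the 7 bands and compress the rest 2:1.
reduced_rate = raw_rate * (4 / 7) / 2

print(f"raw downlink rate    : {raw_rate / 1e6:.1f} Mbit/s")
print(f"reduced downlink rate: {reduced_rate / 1e6:.1f} Mbit/s")
```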
The On-Site Analysis of the Cherenkov Telescope Array
The Cherenkov Telescope Array (CTA) observatory will be one of the largest
ground-based very high-energy gamma-ray observatories. The On-Site Analysis
will be the first CTA scientific analysis of data acquired from the array of
telescopes, in both northern and southern sites. The On-Site Analysis will have
two pipelines: the Level-A pipeline (also known as Real-Time Analysis, RTA) and
the level-B one. The RTA performs data quality monitoring and must be able to
issue automated alerts on variable and transient astrophysical sources within
30 seconds from the last acquired Cherenkov event that contributes to the
alert, with a sensitivity not worse than the one achieved by the final pipeline
by more than a factor of 3. The Level-B Analysis has a better sensitivity (not
worse than the final one by more than a factor of 2) and its results should be
available within 10 hours of the acquisition of the data; for this reason
this analysis could be performed at the end of an observation or the next morning.
The latency (in particular for the RTA) and the sensitivity requirements are
challenging because of the large data rate, a few GByte/s. The remote
connection to the CTA candidate site with a rather limited network bandwidth
makes the issue of the exported data size extremely critical and prevents any
kind of processing in real-time of the data outside the site of the telescopes.
For these reasons the analysis will be performed on-site with infrastructures
co-located with the telescopes, with limited electrical power availability and
with a reduced possibility of human intervention. This means, for example, that
the on-site hardware infrastructure should have low power consumption. A
substantial effort towards the optimization of the high-throughput computing
service is envisioned to provide hardware and software solutions with high
throughput and low power consumption at a low cost.

Comment: In Proceedings of the 34th International Cosmic Ray Conference
(ICRC2015), The Hague, The Netherlands. All CTA contributions at
arXiv:1508.0589
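A rough sketch of why the data must be processed on-site: compare the nightly raw data volume at a few GByte/s against what a limited off-site link could export; the 3 GByte/s rate, 8-hour observing night, and 1 Gbit/s link are illustrative assumptions, not CTA requirements:

```python
# Nightly raw data volume versus export time over a limited remote link.
# All figures are illustrative assumptions.

data_rate_gbyte_s = 3.0   # assumed array data rate (GByte/s)
night_hours = 8.0         # assumed observation time per night
link_gbit_s = 1.0         # assumed off-site network bandwidth (Gbit/s)

night_volume_gbyte = data_rate_gbyte_s * 3600 * night_hours
export_hours = night_volume_gbyte * 8 / link_gbit_s / 3600

print(f"raw volume per night : {night_volume_gbyte:,.0f} GByte")
print(f"time to export it    : {export_hours:,.0f} hours over the remote link")
```

With these assumed numbers a single night of raw data would take roughly eight days to export, which is why both the Level-A and Level-B pipelines must run on infrastructure co-located with the telescopes.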