1,012 research outputs found
Recommended from our members
The Graduate Program in Latin American Studies at the University of Texas at Austin
Latin American Studie
Frameless Representation and Manipulation of Image Data
Most image sensors mimic film, integrating light during an exposure interval and then reading the latent image as a complete frame. In contrast, frameless image capture attempts to construct a continuous waveform for each sensel describing how the Ev (exposure value required at each pixel) changes over time. This allows great flexibility in computationally extracting frames after exposure. An overview of how this could be accomplished was presented at EI2014, with an emphasis on frameless sensor technology. In contrast, the current work centers on deriving frameless data from sequences of conventionally captured frames
Improving Cache Performance by Selective Cache Bypass
In traditional cache-based computers, all memory references are made through cache. However, a significant number of items which are referenced in a program are referenced so infrequently that other cache traffic is certain to “bump” these items from cache before they are referenced again. I n such cases, not only is there no benefit in placing the item in cache, but there is the additional overhead of “bumping” some other item out of cache to make room for this useless cache entry. Where a cache line is larger than a processor word, there is an additional penalty in loading the entire line from memory into cache, whereas the reference could have been satisfied with a single word fetch. Simulations have shown that these effects typically degrade cache-based system performance (average reference time) by 10% to 30%. This performance loss is due to cache pollution; by simply forcing “polluting” references to directly reference main memory — bypassing the cache — much of this performance can be regained. The technique proposed in this paper involves the use of new hardware, called a Bypass-Cache, which, under program control, will determine whether each reference should be through the cache or bypassing the cache and referencing main memory directly. Several inexpensive heuristics for the compiler to determine how to make each reference are given
Extending Static Synchronization Beyond SIMD and VLIW
A key advantage of SIMD (Single Instruction stream, Multiple Data stream) architectures is that synchronization is effected statically at compile-time, hence the execution-time cost of synchronization between “processes” is essentially zero. VLIW (Very Long Instruction Word) machines are successful in large part because they preserve this property while providing more flexibility in terms of what kinds of operations can be parallelized. In this paper, we propose a new kind of architecture —- the “static barrier MIMD” or SBM — which can be viewed as a further generalization of the parallel execution abilities of static synchronization machines. Barrier MIMDs are asynchronous Multiple Instruction stream Multiple Data stream architectures capable of parallel execution of loops, subprogram calls, and variable execution- time instructions; however, little or no run-time synchronization is needed. When a group of processors within a barrier MIMD has just encountered a barrier, any conceptual synchronizations between the processors are statically accomplished with zero cost — as in a SIMD or VLIW and using similar compiler technology. Unlike these machines, however, as execution continues the relative timing of processors may become less precisely knowable as a static, compile-time, quantity. Where this imprecision becomes too large, the compiler simply inserts a synchronization barrier to insure that timing imprecision at that point is zero, and again employs purely static, implicit, synchronization. Both the architecture and the supporting compiler technology are discussed in detail
Data Layout Optimization and Code Transformation for Paged Memory Systems
Supercomputers need not only to have fast functional units, but also to have rapid access to massive quantities of data. Virtual memory paging and physically distributed memory systems both attempt to provide this large data space, but performance of a computer system using either memory organization is highly dependent on the page reference pattern and the number of pages available locally. Despite this, surprisingly little work has been done toward using the compiler to optimize memory system performance. In this paper, we introduce compiler techniques which use a combination of data layout and code transformation to improve paging performance for compiled programs. These same techniques can also be applied manually to improve performance using existing compilers
Automatic Parallelization of Database Queries
Although automatic parallelization of conventional language programs is now widely accepted, relatively little emphasis has been placed on automatic parallelization of database query programs (sometimes referred to as “multiple queries” ). In this paper, we discuss the unique problems associated with automatic parallelization of database programs. From this discussion, we derive a complete approach to automatic parallelization of database programs. Beside integrating a number of existing techniques, our approach relies heavily on several new concepts, including the concepts of “algorithm-level” analysis and hybrid static/dynamic scheduling
Compiler-Driven Cache Policy (Known Reference String)
Increasing cache hit-ratios has proved to be instrumental in improving performance of cache-based computers. This is particularly true for computers which have a high cache-miss/cache-hit memory reference delay ratio. Although software policies are often used for main vs. secondary memory caching , the speed required for an implementation of a CPU vs. main memory cache policy has prompted only investigation of policies which can be implemented directly in hardware. Based on compile-time analysis, it is possible to predict program behavior, thereby increasing the hit-ratio beyond the capability of pure run-time (hardware) techniques. In this report, compiler-driven techniques for this kind of cache policy are described. The SCP Model (software cache policy model) provides an optimal cache prefetch and placement/replacement policy when given an arbitrary memory reference string. In addition to suggesting a simplified cache hardware model, the SCP Model can be applied to various cache organizations such as direct mapping, set associative, and full associative. Analytic results demonstrate significant improvements in cache performance. The current work discusses an optimal cache policy which applies where the string of references is known at compile time. However, this constraint can be relaxed to encompass reference strings which are known only statistically, i.e., reference strings in which data aliases make the target of some references ambiguous. Companion reports, currently in preparation, detail the extension of the SCP Model to incorporate aliases, code incorporating loops, and conditional branches
Algorithm Choice For Multiple-Query Evaluation
Traditional query optimization concentrates on the optimization of the execution of each individual query. More recently, it has been observed that by considering a sequence of multiple queries some additional high-level optimizations can be performed. Once these optimizations have been performed, each operation is translated into executable code. The fundamental insight in this paper is that significant improvements can be gained by careful choice of the algorithm to be used for each operation. This choice is not merely based on efficiency of algorithms for individual operations, but rather on the efficiency of the algorithm choices for the entire multiple-query evaluation. An efficient procedure for automatically optimizing these algorithm choices is given
- …