Computation of incompressible viscous flows through turbopump components
Flow through pump components, such as an inducer and an impeller, is efficiently simulated by solving the incompressible Navier-Stokes equations. The solution method is based on the pseudocompressibility approach and uses an implicit-upwind differencing scheme together with the Gauss-Seidel line relaxation method. The equations are solved in steadily rotating reference frames, and the centrifugal and Coriolis forces are added to the equation of motion. Current computations use a one-equation Baldwin-Barth turbulence model, which is derived from a simplified form of the standard k-epsilon model equations. The resulting computer code is applied to the flow analysis inside a generic rocket engine pump inducer, a fuel pump impeller, and the SSME high-pressure fuel turbopump impeller. Numerical results of inducer flow are compared with experimental measurements. In the fuel pump impeller, the effect of downstream boundary conditions is investigated. Flow analyses at 80 percent, 100 percent, and 120 percent of design conditions are presented.
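For orientation, the two ingredients named in this abstract can be written in their generic textbook form. This is only a sketch: the symbols (velocity u relative to the rotating frame, pressure p, density rho, kinematic viscosity nu, constant angular velocity Omega, position r, pseudo-time tau, pseudocompressibility parameter beta) are assumptions, and the paper's own nondimensionalization and turbulence terms are not given in the abstract.

\[
\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u}\cdot\nabla)\mathbf{u}
= -\frac{1}{\rho}\nabla p + \nu \nabla^{2}\mathbf{u}
- 2\,\boldsymbol{\Omega}\times\mathbf{u}
- \boldsymbol{\Omega}\times(\boldsymbol{\Omega}\times\mathbf{r})
\]

\[
\frac{\partial p}{\partial \tau} + \beta\,\nabla\cdot\mathbf{u} = 0
\]

The last two terms of the momentum equation are the Coriolis and centrifugal contributions of the rotating frame. The second equation replaces the divergence-free constraint with a pseudo-time evolution of pressure, so that the constraint is recovered when the pseudo-time iteration converges.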
Enlarging instruction streams
The stream fetch engine is a high-performance fetch architecture based on the concept of an instruction stream. We call a sequence of instructions from the target of a taken branch to the next taken branch, potentially containing multiple basic blocks, a stream. The long length of instruction streams makes it possible for the stream fetch engine to provide a high fetch bandwidth and to hide the branch predictor access latency, leading to performance results close to a trace cache at a lower implementation cost and complexity. Therefore, enlarging instruction streams is an excellent way to improve the stream fetch engine. In this paper, we present several hardware and software mechanisms focused on enlarging those streams that end at particular branch types. However, our results point out that focusing on particular branch types is not a good strategy due to Amdahl's law. Consequently, we propose the multiple-stream predictor, a novel mechanism that deals with all branch types by combining single streams into long virtual streams. This proposal tolerates the prediction table access latency without requiring the complexity caused by additional hardware mechanisms like prediction overriding. Moreover, it provides high-performance results which are comparable to state-of-the-art fetch architectures but with a simpler design that consumes less energy. Peer Reviewed. Postprint (published version)
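The stream definition in this abstract can be illustrated with a minimal sketch. The trace format (pairs of address and a taken-branch flag) and the function name `split_into_streams` are hypothetical, chosen only for illustration; they are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical trace entry: (instruction address, True if this instruction
# is a branch that was taken). Not-taken branches are flagged False.
TraceEntry = Tuple[int, bool]


def split_into_streams(trace: List[TraceEntry]) -> List[List[int]]:
    """Cut a dynamic instruction trace into streams.

    A stream runs from the target of a taken branch up to and including
    the next taken branch, so not-taken branches stay inside a stream.
    """
    streams: List[List[int]] = []
    current: List[int] = []
    for address, is_taken_branch in trace:
        current.append(address)
        if is_taken_branch:
            # A taken branch closes the current stream.
            streams.append(current)
            current = []
    if current:
        # Trailing instructions with no closing taken branch yet.
        streams.append(current)
    return streams


trace = [
    (0x100, False),
    (0x104, False),  # e.g. a not-taken branch: the stream continues
    (0x108, True),   # taken branch: the stream ends here
    (0x200, False),
    (0x204, True),
    (0x300, False),
]
streams = split_into_streams(trace)
```

Because only taken branches end a stream, a stream can span several basic blocks, which is why streams tend to be longer than basic blocks.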
A Case Study in Matching Service Descriptions to Implementations in an Existing System
A number of companies are trying to migrate large monolithic software systems
to Service Oriented Architectures. A common approach to do this is to first
identify and describe desired services (i.e., create a model), and then to
locate portions of code within the existing system that implement the described
services. In this paper we describe a detailed case study we undertook to match
a model to an open-source business application. We describe the systematic
methodology we used, the results of the exercise, as well as several
observations that throw light on the nature of this problem. We also suggest
and validate heuristics that are likely to be useful in partially automating
the process of matching service descriptions to implementations. Comment: 20 pages, 19 pdf figures
Pipeline Implementations of Neumann-Neumann and Dirichlet-Neumann Waveform Relaxation Methods
This paper is concerned with the reformulation of Neumann-Neumann Waveform
Relaxation (NNWR) methods and Dirichlet-Neumann Waveform Relaxation (DNWR)
methods, a family of parallel space-time approaches to solving time-dependent
PDEs. By changing the order of the operations, pipeline-parallel computation of
the waveform iterates is possible without changing the final solution. The
parallel efficiency and the increased communication cost of the pipeline
implementation are presented, along with weak scaling studies to show the
effectiveness of the pipeline NNWR and DNWR algorithms. Comment: 20 pages, 8 figures
Preemptive Thread Block Scheduling with Online Structural Runtime Prediction for Concurrent GPGPU Kernels
Recent NVIDIA Graphics Processing Units (GPUs) can execute multiple kernels
concurrently. On these GPUs, the thread block scheduler (TBS) uses the FIFO
policy to schedule their thread blocks. We show that FIFO leaves performance to
chance, resulting in significant loss of performance and fairness. To improve
performance and fairness, we propose use of the preemptive Shortest Remaining
Time First (SRTF) policy instead. Although SRTF requires an estimate of runtime
of GPU kernels, we show that such an estimate of the runtime can be easily
obtained using online profiling and exploiting a simple observation on GPU
kernels' grid structure. Specifically, we propose a novel Structural Runtime
Predictor. Using a simple Staircase model of GPU kernel execution, we show that
the runtime of a kernel can be predicted by profiling only the first few thread
blocks. We evaluate an online predictor based on this model on benchmarks from
ERCBench, and find that it can estimate the actual runtime reasonably well
after the execution of only a single thread block. Next, we design a thread
block scheduler that is both concurrent kernel-aware and uses this predictor.
We implement the SRTF policy and evaluate it on two-program workloads from
ERCBench. SRTF improves STP by 1.18x and ANTT by 2.25x over FIFO. When compared
to MPMax, a state-of-the-art resource allocation policy for concurrent kernels,
SRTF improves STP by 1.16x and ANTT by 1.3x. To improve fairness, we also
propose SRTF/Adaptive which controls resource usage of concurrently executing
kernels to maximize fairness. SRTF/Adaptive improves STP by 1.12x, ANTT by
2.23x and Fairness by 2.95x compared to FIFO. Overall, our implementation of
SRTF achieves system throughput to within 12.64% of Shortest Job First (SJF, an
oracle optimal scheduling policy), bridging 49% of the gap between FIFO and
SJF. Comment: 14 pages, full pre-review version of PACT 2014 poster
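The idea of predicting a kernel's runtime from its grid structure and then scheduling by shortest remaining time can be sketched as follows. This is an assumed, simplified reading of a staircase-style model (thread blocks run in waves of a fixed number of resident blocks, each wave taking about one block time); the abstract does not give the paper's actual formula, and every name and number here is hypothetical.

```python
import math
from typing import Dict, Tuple


def staircase_runtime(remaining_blocks: int, resident_blocks: int,
                      block_time: float) -> float:
    """Estimate remaining runtime under the assumed staircase model.

    Blocks execute in waves of `resident_blocks`; each wave is assumed
    to take `block_time`, measured by profiling an early thread block.
    """
    waves = math.ceil(remaining_blocks / resident_blocks)
    return waves * block_time


def pick_srtf(kernels: Dict[str, Tuple[int, int, float]]) -> str:
    """Return the kernel with the shortest predicted remaining time.

    `kernels` maps a kernel name to
    (remaining_blocks, resident_blocks, block_time).
    """
    return min(kernels, key=lambda name: staircase_runtime(*kernels[name]))


# Hypothetical two-kernel workload.
kernels = {
    "A": (100, 8, 2.0),  # 13 waves of 2.0 time units each
    "B": (16, 8, 5.0),   # 2 waves of 5.0 time units each
}
next_kernel = pick_srtf(kernels)
```

In this sketch kernel B is chosen even though its individual blocks are slower, because its predicted remaining time is smaller. A preemptive scheduler would re-evaluate this choice whenever a new kernel arrives or a prediction is updated.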