    Computation of incompressible viscous flows through turbopump components

    Flow through pump components, such as an inducer and an impeller, is efficiently simulated by solving the incompressible Navier-Stokes equations. The solution method is based on the pseudocompressibility approach and uses an implicit-upwind differencing scheme together with the Gauss-Seidel line relaxation method. The equations are solved in steadily rotating reference frames, and the centrifugal force and the Coriolis force are added to the equation of motion. Current computations use a one-equation Baldwin-Barth turbulence model, which is derived from a simplified form of the standard k-epsilon model equations. The resulting computer code is applied to the flow analysis inside a generic rocket engine pump inducer, a fuel pump impeller, and the SSME high-pressure fuel turbopump impeller. Numerical results of inducer flow are compared with experimental measurements. In the fuel pump impeller, the effect of downstream boundary conditions is investigated. Flow analyses at 80 percent, 100 percent, and 120 percent of design conditions are presented.
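
    As a rough sketch of what the pseudocompressibility approach in a rotating frame involves, the equations below use the standard artificial-compressibility form. The symbols are assumptions introduced here and do not appear in the abstract: pseudo-time \tau, pseudocompressibility parameter \beta, relative velocity \mathbf{u}, pressure p divided by the constant density, kinematic viscosity \nu, frame rotation vector \boldsymbol{\Omega}, and position vector \mathbf{r}. The turbulence-model contribution is omitted.

```latex
\begin{align}
\frac{\partial p}{\partial \tau} + \beta\, \nabla \cdot \mathbf{u} &= 0, \\
\frac{\partial \mathbf{u}}{\partial \tau} + (\mathbf{u} \cdot \nabla)\mathbf{u}
  &= -\nabla p + \nu \nabla^{2} \mathbf{u}
     - 2\, \boldsymbol{\Omega} \times \mathbf{u}
     - \boldsymbol{\Omega} \times (\boldsymbol{\Omega} \times \mathbf{r}).
\end{align}
```

    When the pseudo-time iteration converges, the \partial / \partial \tau terms vanish and the divergence-free condition \nabla \cdot \mathbf{u} = 0 is recovered. The last two terms are the Coriolis and centrifugal forces mentioned in the abstract.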

    Enlarging instruction streams

    The stream fetch engine is a high-performance fetch architecture based on the concept of an instruction stream. We call a sequence of instructions from the target of a taken branch to the next taken branch, potentially containing multiple basic blocks, a stream. The long length of instruction streams makes it possible for the stream fetch engine to provide a high fetch bandwidth and to hide the branch predictor access latency, leading to performance results close to a trace cache at a lower implementation cost and complexity. Therefore, enlarging instruction streams is an excellent way to improve the stream fetch engine. In this paper, we present several hardware and software mechanisms focused on enlarging those streams that end at particular branch types. However, our results show that focusing on particular branch types is not a good strategy due to Amdahl's law. Consequently, we propose the multiple-stream predictor, a novel mechanism that deals with all branch types by combining single streams into long virtual streams. This proposal tolerates the prediction table access latency without requiring the complexity caused by additional hardware mechanisms like prediction overriding. Moreover, it provides high-performance results which are comparable to state-of-the-art fetch architectures but with a simpler design that consumes less energy. Peer reviewed. Postprint (published version).
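
    The stream definition in the abstract can be illustrated with a minimal sketch that cuts a dynamic instruction trace at every taken branch. The trace format (address plus a taken-branch flag) and the function name `split_into_streams` are hypothetical and chosen only for illustration; this is not the paper's hardware mechanism.

```python
from typing import List, Tuple

# Hypothetical trace format: (instruction address, True if it is a taken branch).
Trace = List[Tuple[int, bool]]


def split_into_streams(trace: Trace) -> List[List[int]]:
    """Cut a dynamic instruction trace into streams.

    A stream starts at the target of a taken branch (or at the start of the
    trace) and ends at the next taken branch, inclusive. Not-taken branches
    do not end a stream, so one stream may span several basic blocks.
    """
    streams: List[List[int]] = []
    current: List[int] = []
    for address, is_taken_branch in trace:
        current.append(address)
        if is_taken_branch:
            streams.append(current)
            current = []
    if current:  # trailing instructions with no closing taken branch
        streams.append(current)
    return streams


trace = [
    (0x100, False),
    (0x104, False),  # e.g. a not-taken branch: the stream continues
    (0x108, True),   # taken branch ends the first stream
    (0x200, False),
    (0x204, True),   # taken branch ends the second stream
    (0x300, False),
]
streams = split_into_streams(trace)
print([len(s) for s in streams])
```

    Longer streams mean fewer predictions per fetched instruction, which is why the abstract treats stream length as the quantity to enlarge.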

    A Case Study in Matching Service Descriptions to Implementations in an Existing System

    A number of companies are trying to migrate large monolithic software systems to Service Oriented Architectures. A common approach is to first identify and describe desired services (i.e., create a model), and then to locate portions of code within the existing system that implement the described services. In this paper we describe a detailed case study we undertook to match a model to an open-source business application. We describe the systematic methodology we used, the results of the exercise, and several observations that shed light on the nature of this problem. We also suggest and validate heuristics that are likely to be useful in partially automating the process of matching service descriptions to implementations. Comment: 20 pages, 19 pdf figures

    Pipeline Implementations of Neumann-Neumann and Dirichlet-Neumann Waveform Relaxation Methods

    This paper is concerned with the reformulation of Neumann-Neumann Waveform Relaxation (NNWR) methods and Dirichlet-Neumann Waveform Relaxation (DNWR) methods, a family of parallel space-time approaches to solving time-dependent PDEs. By changing the order of the operations, pipeline-parallel computation of the waveform iterates is possible without changing the final solution. The parallel efficiency and the increased communication cost of the pipeline implementation are presented, along with weak scaling studies to show the effectiveness of the pipeline NNWR and DNWR algorithms. Comment: 20 pages, 8 figures

    Preemptive Thread Block Scheduling with Online Structural Runtime Prediction for Concurrent GPGPU Kernels

    Recent NVIDIA Graphics Processing Units (GPUs) can execute multiple kernels concurrently. On these GPUs, the thread block scheduler (TBS) uses the FIFO policy to schedule their thread blocks. We show that FIFO leaves performance to chance, resulting in significant loss of performance and fairness. To improve performance and fairness, we propose using the preemptive Shortest Remaining Time First (SRTF) policy instead. Although SRTF requires an estimate of the runtime of GPU kernels, we show that such an estimate can be easily obtained using online profiling and exploiting a simple observation on GPU kernels' grid structure. Specifically, we propose a novel Structural Runtime Predictor. Using a simple Staircase model of GPU kernel execution, we show that the runtime of a kernel can be predicted by profiling only the first few thread blocks. We evaluate an online predictor based on this model on benchmarks from ERCBench, and find that it can estimate the actual runtime reasonably well after the execution of only a single thread block. Next, we design a thread block scheduler that is both concurrent kernel-aware and uses this predictor. We implement the SRTF policy and evaluate it on two-program workloads from ERCBench. SRTF improves STP by 1.18x and ANTT by 2.25x over FIFO. When compared to MPMax, a state-of-the-art resource allocation policy for concurrent kernels, SRTF improves STP by 1.16x and ANTT by 1.3x. To improve fairness, we also propose SRTF/Adaptive, which controls resource usage of concurrently executing kernels to maximize fairness. SRTF/Adaptive improves STP by 1.12x, ANTT by 2.23x and Fairness by 2.95x compared to FIFO. Overall, our implementation of SRTF achieves system throughput to within 12.64% of Shortest Job First (SJF, an oracle optimal scheduling policy), bridging 49% of the gap between FIFO and SJF. Comment: 14 pages, full pre-review version of PACT 2014 poster
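
    The following is a minimal sketch of how a staircase-style runtime estimate could drive an SRTF choice between two kernels. The abstract does not give the model's formula, so the estimate used here (remaining thread blocks execute in waves of `resident_blocks`, each wave lasting about one profiled `block_time`) is an assumption, as are all names and numbers in the example.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Kernel:
    name: str
    total_blocks: int     # thread blocks in the kernel's grid
    done_blocks: int      # thread blocks that have already finished
    resident_blocks: int  # blocks assumed to run concurrently on the GPU
    block_time: float     # duration of one profiled thread block


def predicted_remaining(k: Kernel) -> float:
    """Assumed staircase estimate: remaining blocks run in waves of
    `resident_blocks`, and each wave takes about `block_time`."""
    remaining = k.total_blocks - k.done_blocks
    waves = math.ceil(remaining / k.resident_blocks)
    return waves * k.block_time


def srtf_pick(kernels: List[Kernel]) -> Kernel:
    """Shortest Remaining Time First: pick the kernel predicted to finish soonest."""
    return min(kernels, key=predicted_remaining)


kernels = [
    Kernel("A", total_blocks=1000, done_blocks=100, resident_blocks=60, block_time=2.0),
    Kernel("B", total_blocks=200, done_blocks=20, resident_blocks=30, block_time=3.0),
]
for k in kernels:
    print(k.name, predicted_remaining(k))
print("run next:", srtf_pick(kernels).name)
```

    Because the estimate depends only on the grid size and one measured block duration, it can be refreshed online as blocks complete, which matches the abstract's point that profiling the first few thread blocks is enough.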