21 research outputs found

    Discrete element modelling of under sleeper pads using a box test

    It has recently been reported that under sleeper pads (USPs) could improve ballasted rail track by decreasing sleeper settlement and reducing particle breakage. To find out what happens at the particle-pad interface, discrete element modelling (DEM) is used to provide micromechanical insight. The same positive effects of USPs are found in the DEM simulations. The evidence provided by DEM shows that applying a USP allows more particles to be in contact with the pad, causing these particles to transfer a larger lateral load to the adjacent ballast but a smaller vertical load beneath the sleeper. This could explain why the USP helps to reduce track settlement. In terms of particle breakage, it is found that most breakage occurs at the particle-sleeper interface and along the main contact force chains between particles under the sleeper. The use of USPs could effectively reduce the particle abrasion that occurs in both of these regions.

    Pricing Python Parallelism: A Dynamic Language Cost Model for Heterogeneous Platforms

    Execution times may be reduced by offloading parallel loop nests to a GPU. Auto-parallelizing compilers are common for static languages, often using a cost model to determine when the GPU execution speedup will outweigh the offload overheads. Scientific software is now increasingly written in dynamic languages and would benefit from compute accelerators. The ALPyNA framework analyses moderately complex Python loop nests and automatically JIT-compiles code for heterogeneous CPU and GPU architectures. We present the first analytical cost model for auto-parallelizing loop nests in a dynamic language on heterogeneous architectures. Predicting execution time in a language like Python is extremely challenging, since aspects like the element types, the size of the iteration space, and amenability to parallelization can only be determined at runtime. Hence the cost model must be both staged, to combine compile-time and run-time information, and lightweight, to minimize runtime overhead. GPU execution time prediction must account for factors like data transfer, block-structured execution, and starvation. We show that a comparatively simple, staged analytical model can accurately determine during execution when it is profitable to offload a loop nest. We evaluate our model on three heterogeneous platforms across 360 experiments with 12 loop-intensive Python benchmark programs. The results show small misprediction intervals and a mean slowdown of just 13.6%, relative to the optimal (oracular) offload strategy.
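    The staged decision the abstract describes can be illustrated with a minimal sketch: compile-time constants (launch overhead, per-iteration costs, bandwidth) are combined at runtime with the now-known iteration-space size and data volume to predict whether offloading pays off. All function names and parameter values below are hypothetical placeholders, not ALPyNA's actual model.

    ```python
    # Hypothetical staged offload cost model: the parameter dictionary stands in
    # for calibrated compile-time constants; n_iters and n_bytes are only known
    # at runtime, which is why the model must be staged.

    def predict_cpu_time(n_iters, cost_per_iter):
        """Runtime stage: CPU cost scales with the iteration count."""
        return n_iters * cost_per_iter

    def predict_gpu_time(n_iters, n_bytes, bandwidth, cost_per_iter, launch_overhead):
        """Runtime stage: GPU cost adds kernel launch overhead and
        host-to-device plus device-to-host data transfer."""
        transfer = 2 * n_bytes / bandwidth
        kernel = n_iters * cost_per_iter
        return launch_overhead + transfer + kernel

    def should_offload(n_iters, n_bytes, params):
        """Offload only when predicted GPU time beats predicted CPU time."""
        cpu = predict_cpu_time(n_iters, params["cpu_cost"])
        gpu = predict_gpu_time(n_iters, n_bytes, params["bandwidth"],
                               params["gpu_cost"], params["launch_overhead"])
        return gpu < cpu

    # Illustrative (made-up) constants: a large loop nest with modest data
    # amortises the offload overheads; a tiny one does not.
    params = {"cpu_cost": 1e-7, "gpu_cost": 1e-9,
              "bandwidth": 10e9, "launch_overhead": 1e-4}
    print(should_offload(10**8, 8 * 10**6, params))  # large nest: True
    print(should_offload(10**3, 8 * 10**3, params))  # small nest: False
    ```

    The same structure extends to the factors the abstract lists: block-structured execution and starvation would refine the kernel term, while element types (known only at runtime) would select among calibrated per-iteration costs.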

    Model-Driven Tile Size Selection for DOACROSS Loops on GPUs

    No full text
    DOALL loops are tiled to exploit DOALL parallelism and data locality on GPUs. In contrast, due to loop-carried dependences, DOACROSS loops must first be skewed to make tiling legal and to exploit wavefront parallelism across the tiles and within a tile. Thus tile size selection, which is performance-critical, becomes more complex for DOACROSS loops than for DOALL loops on GPUs. This paper presents a model-driven approach to automating this process. Validation using 1D, 2D and 3D SOR solvers shows that our framework can find tile sizes for these representative DOACROSS loops that achieve performance close to the best observed for a range of problem sizes tested.
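    The skewing the abstract refers to can be sketched on a 1D SOR-like recurrence. A point (t, i) depends on (t, i-1) and (t-1, i), so neither loop is a DOALL loop; grouping points by the wavefront w = t + i puts both dependences on wavefront w - 1, making every point on the same wavefront independent. This is a minimal illustrative sketch, not the paper's framework, and it shows the skewed traversal order rather than actual GPU tiling.

    ```python
    import numpy as np

    def sor_sequential(a):
        """Original DOACROSS nest: a[t, i] depends on a[t, i-1] and a[t-1, i]."""
        T, N = a.shape
        for t in range(1, T):
            for i in range(1, N):
                a[t, i] = 0.5 * (a[t, i - 1] + a[t - 1, i])
        return a

    def sor_wavefront(a):
        """Skewed traversal: visit points by wavefront w = t + i. Both
        dependences of (t, i) lie on wavefront w - 1, so all points on one
        wavefront are independent and could run in parallel (one GPU thread
        per point, or per tile after tiling the skewed space)."""
        T, N = a.shape
        for w in range(2, T + N - 1):
            for i in range(max(1, w - T + 1), min(N, w)):
                t = w - i
                a[t, i] = 0.5 * (a[t, i - 1] + a[t - 1, i])
        return a

    rng = np.random.default_rng(0)
    init = rng.random((6, 8))
    print(np.allclose(sor_sequential(init.copy()), sor_wavefront(init.copy())))  # True
    ```

    Tile size selection then trades off wavefront width (parallelism) against per-tile locality, which is why a model-driven choice matters.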

    A Throughput-Aware Analytical Performance Model for GPU Applications

    No full text