15 research outputs found

    Freeze-thaw Resistance of an Alluvial Soil Stabilized with EcoSand and Asbestos-free Fiber Powder

    Stabilization of poor soils subjected to large daily temperature variations requires careful selection of a suitable stabilizer. This study investigated the freeze-thaw resistance of an alluvial soil stabilized with EcoSand and asbestos-free fiber powder (AFP). The physical and mechanical properties of the soil were determined. The soil sample was stabilized with five variants of an equal mixture of EcoSand and AFP, in proportions of 2, 4, 6, 8 and 10%, together with 1% sodium silicate and 1% fly ash, by weight of the soil. Unconfined compressive strength (UCS) tests were conducted before and after three freeze-thaw cycles, with the sample kept at 0°C for 8 hours and then at 30°C for 8 hours in each cycle. It was found that 8% EcoSand + AFP with 1% sodium silicate and 1% fly ash provided the optimal increase in the freeze-thaw resistance of the soil. The use of a mixture of EcoSand and AFP as a soil stabilizer in regions of the world that experience large temperature variations has the potential to improve the resistance of such soils to freezing and thawing.
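    As a rough aid to following the mix design described above, the sketch below computes batch weights for one stabilizer variant. The function name and the 1 kg soil mass are illustrative assumptions, not from the study.

```python
def batch_weights(soil_mass_kg, stabilizer_pct):
    """Component masses for one mix variant.

    The EcoSand + AFP stabilizer is an equal (50/50) mixture dosed at
    `stabilizer_pct` percent by weight of soil, plus fixed 1% sodium
    silicate and 1% fly ash, following the mix design in the abstract.
    """
    stabilizer = soil_mass_kg * stabilizer_pct / 100.0
    return {
        "soil": soil_mass_kg,
        "EcoSand": stabilizer / 2.0,
        "AFP": stabilizer / 2.0,
        "sodium_silicate": soil_mass_kg * 0.01,
        "fly_ash": soil_mass_kg * 0.01,
    }

# Example: the optimal 8% variant for a 1 kg soil sample
weights = batch_weights(1.0, 8)
```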

    Tuning Strassen's Matrix Multiplication for Memory Efficiency

    Strassen's algorithm for matrix multiplication gains its lower arithmetic complexity at the expense of reduced locality of reference, which makes it challenging to implement the algorithm efficiently on a modern machine with a hierarchical memory system. We report on an implementation of this algorithm that uses several unconventional techniques to make the algorithm memory-friendly. First, the algorithm internally uses a non-standard array layout known as Morton order that is based on a quad-tree decomposition of the matrix. Second, we dynamically select the recursion truncation point to minimize padding without affecting the performance of the algorithm, which we can do by virtue of the cache behavior of the Morton ordering. Each technique is critical for performance, and their combination as done in our code multiplies their effectiveness. Performance comparisons of our implementation with that of competing implementations show that our implementation often outperforms th..
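    A minimal illustration of the Morton (Z-order) indexing the abstract refers to: interleaving the bits of the row and column indices stores the elements of each quadrant of the quad-tree decomposition contiguously. This is a generic sketch, not the paper's implementation, which pads and truncates recursion dynamically.

```python
def morton_index(row, col, bits=16):
    """Interleave the bits of (row, col) to get the Morton (Z-order)
    position of an element in a 2**bits x 2**bits matrix."""
    index = 0
    for b in range(bits):
        index |= ((row >> b) & 1) << (2 * b + 1)  # row bits in odd positions
        index |= ((col >> b) & 1) << (2 * b)      # col bits in even positions
    return index

# The four elements of the top-left 2x2 quadrant land in positions 0..3:
# (0,0)->0, (0,1)->1, (1,0)->2, (1,1)->3
```

    Because each recursive quadrant occupies one contiguous run of memory, the recursion of Strassen's algorithm walks the address space sequentially, which is what makes the layout cache-friendly.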

    Self-Tuned Congestion Control for Multiprocessor Networks

    Network performance in tightly-coupled multiprocessors typically degrades rapidly beyond network saturation. Consequently, designers must keep a network below its saturation point by reducing the load on the network. Congestion control via source throttling—a common technique to reduce the network load—prevents new packets from entering the network in the presence of congestion. Unfortunately, prior schemes to implement source throttling either lack vital global information about the network to make the correct decision (whether to throttle or not) or depend on specific network parameters, network topology, or communication patterns. This paper presents a global-knowledge-based, self-tuned congestion control technique that prevents saturation at high loads across different network configurations and communication patterns. Our design is composed of two key components. First, we use global information about a network to obtain a timely estimate of network congestion. We compare this estimate to a threshold value to determine when to throttle packet injection. The second component is a self-tuning mechanism that automatically determines appropriate threshold values based on throughput feedback. A combination of these two techniques provides high performance under heavy load, does not penalize performance under light load, and gracefully adapts to changes in communication patterns.
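    The two components described above can be sketched as follows. The congestion estimate (e.g., the fraction of full network buffers), the class name, and the hill-climbing step size are illustrative assumptions, not details from the paper.

```python
class SelfTunedThrottle:
    """Throttle packet injection when a global congestion estimate
    exceeds a threshold, and tune the threshold from throughput feedback."""

    def __init__(self, threshold=0.5, step=0.05):
        self.threshold = threshold   # congestion level that triggers throttling
        self.step = step             # signed adjustment applied each epoch
        self.prev_throughput = 0.0

    def allow_injection(self, congestion_estimate):
        """Gate new packets: block injection while the network looks congested."""
        return congestion_estimate < self.threshold

    def tune(self, throughput):
        """Hill-climb on throughput: if the last threshold change helped,
        keep moving in the same direction; otherwise reverse."""
        if throughput < self.prev_throughput:
            self.step = -self.step   # last move hurt throughput; reverse it
        self.threshold = min(1.0, max(0.0, self.threshold + self.step))
        self.prev_throughput = throughput
```

    The self-tuning step is what removes the dependence on hand-picked, workload-specific thresholds: the mechanism converges toward whatever threshold maximizes delivered throughput for the current traffic pattern.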

    Recursive Array Layouts and Fast Parallel Matrix Multiplication

    Matrix multiplication is an important kernel in linear algebra algorithms, and the performance of both serial and parallel implementations is highly dependent on the memory system behavior. Unfortunately, due to false sharing and cache conflicts, traditional column-major or row-major array layouts incur high variability in memory system performance as matrix size varies. This paper investigates the use of recursive array layouts for improving the performance of parallel recursive matrix multiplication algorithms. We extend previous work by Frens and Wise on recursive matrix multiplication to examine several recursive array layouts and three recursive algorithms: standard matrix multiplication, and the more complex algorithms of Strassen and Winograd. We show that while recursive array layouts significantly outperform traditional layouts (reducing execution times by a factor of 1.2--2.5) for the standard algorithm, they offer little improvement for Strassen's and Winograd's algorithms;..
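    For concreteness, here is a quadrant-recursive formulation of standard matrix multiplication in the spirit of the Frens and Wise work cited above. It is a generic sketch over Python lists; the power-of-two matrix size and the recursion cutoff are assumptions, and a real implementation would pair this recursion with a matching recursive array layout.

```python
def split(m):
    """Split a square matrix (list of lists) into four quadrants."""
    n = len(m) // 2
    return ([row[:n] for row in m[:n]], [row[n:] for row in m[:n]],
            [row[:n] for row in m[n:]], [row[n:] for row in m[n:]])

def add(a, b):
    """Element-wise matrix addition."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def rmm(a, b, cutoff=2):
    """Recursive matrix multiply for power-of-two sizes.
    At or below `cutoff`, fall back to the classic triple loop."""
    n = len(a)
    if n <= cutoff:
        return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    # C11 = A11*B11 + A12*B21, C12 = A11*B12 + A12*B22, etc.
    top = [r1 + r2 for r1, r2 in zip(add(rmm(a11, b11), rmm(a12, b21)),
                                     add(rmm(a11, b12), rmm(a12, b22)))]
    bot = [r1 + r2 for r1, r2 in zip(add(rmm(a21, b11), rmm(a22, b21)),
                                     add(rmm(a21, b12), rmm(a22, b22)))]
    return top + bot
```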

    BLAM: A High-Performance Routing Algorithm for Virtual Cut-Through Networks

    High performance, freedom from deadlocks, and freedom from livelocks are desirable properties of interconnection networks. Unfortunately, these can be conflicting goals, because networks may devote or under-utilize resources to avoid deadlocks and livelocks, and those resources could otherwise be used to improve performance. For example, a minimal adaptive routing algorithm may forgo some routing options to ensure livelock-freedom, but this hurts performance at high loads. In contrast, Chaotic routing achieves higher performance because it allows full routing flexibility, including misroutes (hops that take a packet farther from its destination), and it is deadlock-free. Unfortunately, Chaotic routing provides only probabilistic guarantees of livelock-freedom

    Recursive Array Layouts and Fast Matrix Multiplication

    The performance of both serial and parallel implementations of matrix multiplication is highly sensitive to memory system behavior. False sharing and cache conflicts cause traditional column-major or row-major array layouts to incur high variability in memory system performance as matrix size varies. This paper investigates the use of recursive array layouts to improve performance and reduce variability. Previous work on recursive matrix multiplication is extended to examine several recursive array layouts and three recursive algorithms: standard matrix multiplication, and the more complex algorithms of Strassen and Winograd. While recursive layouts significantly outperform traditional layouts (reducing execution times by a factor of 1.2--2.5) for the standard algorithm, they offer little improvement for Strassen's and Winograd's algorithms. For a purely sequential implementation, it is possible to reorder computation to conserve memory space and improve performance between ..

    Nonlinear Array Layouts for Hierarchical Memory Systems

    Programming languages that provide multidimensional arrays and a flat linear model of memory must implement a mapping between these two domains to order array elements in memory. This layout function is fixed at language definition time and constitutes an invisible, non-programmable array attribute. In reality, modern memory systems are architecturally hierarchical rather than flat, with substantial differences in performance among different levels of the hierarchy. This mismatch between the model and the true architecture of memory systems can result in low locality of reference and poor performance. Some of this loss in performance can be recovered by re-ordering computations using transformations such as loop tiling. We explore nonlinear array layout functions as an additional means of improving locality of reference. For a benchmark suite composed of dense matrix kernels, we show by timing and simulation that two specific layouts (4D and Morton) have low implementation costs (2--5%..
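    A nonlinear layout function like the 4D layout mentioned above can be sketched as a tile-major address map: elements are grouped into fixed-size tiles stored contiguously, and the tiles themselves are stored in row-major order. This is a generic sketch under stated assumptions (the tile size and the requirement that it divides the matrix dimension), not the paper's exact layout.

```python
def addr_4d(i, j, n, tile=4):
    """Offset of element (i, j) of an n x n matrix under a tiled
    ("4D") layout: tile x tile blocks stored contiguously, blocks in
    row-major order. Assumes `tile` divides `n`."""
    ti, tj = i // tile, j // tile   # which tile holds the element
    oi, oj = i % tile, j % tile     # offset within that tile
    tiles_per_row = n // tile
    return ((ti * tiles_per_row + tj) * tile + oi) * tile + oj
```

    Because a whole tile occupies one contiguous run of memory, tiled computations touch far fewer cache lines and memory pages than they would under a row-major layout, which is the locality benefit the abstract measures.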