15 research outputs found

    Freeze-thaw Resistance of an Alluvial Soil Stabilized with EcoSand and Asbestos-free Fiber Powder

    Stabilization of poor soils subjected to large daily temperature variations requires careful selection of a suitable stabilizer. This study investigated the freeze-thaw resistance of an alluvial soil stabilized with EcoSand and asbestos-free fiber powder (AFP). The physical and mechanical properties of the soil were determined. The soil sample was stabilized with five variants of an equal mixture of EcoSand and AFP, in proportions of 2, 4, 6, 8 and 10%, together with 1% sodium silicate and 1% fly ash, by weight of the soil. Unconfined compressive strength (UCS) tests were conducted before and after three freeze-thaw cycles, with the sample kept at 0°C for 8 hours and then at 30°C for 8 hours in each cycle. It was found that 8% EcoSand + AFP with 1% sodium silicate and 1% fly ash provided the optimal increase in the freeze-thaw resistance of the soil. The use of a mixture of EcoSand and AFP as a soil stabilizer in regions of the world that experience large temperature variations has the potential to improve the resistance of such soils to freezing and thawing.
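    As a rough aid to following the mix design described above, the sketch below computes batch weights for one stabilizer variant. The function name and the 1 kg soil mass are illustrative assumptions, not from the study.

```python
def batch_weights(soil_mass_kg, stabilizer_pct):
    """Component masses for one mix variant.

    The EcoSand + AFP stabilizer is an equal (50/50) mixture dosed at
    `stabilizer_pct` percent by weight of soil, plus fixed 1% sodium
    silicate and 1% fly ash, following the mix design in the abstract.
    """
    stabilizer = soil_mass_kg * stabilizer_pct / 100.0
    return {
        "soil": soil_mass_kg,
        "EcoSand": stabilizer / 2.0,
        "AFP": stabilizer / 2.0,
        "sodium_silicate": soil_mass_kg * 0.01,
        "fly_ash": soil_mass_kg * 0.01,
    }

# Example: the optimal 8% variant for a 1 kg soil sample
weights = batch_weights(1.0, 8)
```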

    Tuning Strassen's Matrix Multiplication for Memory Efficiency

    Strassen's algorithm for matrix multiplication gains its lower arithmetic complexity at the expense of reduced locality of reference, which makes it challenging to implement the algorithm efficiently on a modern machine with a hierarchical memory system. We report on an implementation of this algorithm that uses several unconventional techniques to make the algorithm memory-friendly. First, the algorithm internally uses a non-standard array layout known as Morton order that is based on a quad-tree decomposition of the matrix. Second, we dynamically select the recursion truncation point to minimize padding without affecting the performance of the algorithm, which we can do by virtue of the cache behavior of the Morton ordering. Each technique is critical for performance, and their combination as done in our code multiplies their effectiveness. Performance comparisons of our implementation with that of competing implementations show that our implementation often outperforms th..
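    A minimal illustration of the Morton (Z-order) indexing the abstract refers to: interleaving the bits of the row and column indices stores the elements of each quadrant of the quad-tree decomposition contiguously. This is a generic sketch, not the paper's implementation, which pads and truncates recursion dynamically.

```python
def morton_index(row, col, bits=16):
    """Interleave the bits of (row, col) to get the Morton (Z-order)
    position of an element in a 2**bits x 2**bits matrix."""
    index = 0
    for b in range(bits):
        index |= ((row >> b) & 1) << (2 * b + 1)  # row bits in odd positions
        index |= ((col >> b) & 1) << (2 * b)      # col bits in even positions
    return index

# The four elements of the top-left 2x2 quadrant land in positions 0..3:
# (0,0)->0, (0,1)->1, (1,0)->2, (1,1)->3
```

    Because each recursive quadrant occupies one contiguous run of memory, the recursion of Strassen's algorithm walks the address space sequentially, which is what makes the layout cache-friendly.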

    Self-Tuned Congestion Control for Multiprocessor Networks

    Network performance in tightly-coupled multiprocessors typically degrades rapidly beyond network saturation. Consequently, designers must keep a network below its saturation point by reducing the load on the network. Congestion control via source throttling—a common technique to reduce the network load—prevents new packets from entering the network in the presence of congestion. Unfortunately, prior schemes to implement source throttling either lack vital global information about the network to make the correct decision (whether to throttle or not) or depend on specific network parameters, network topology, or communication patterns. This paper presents a global-knowledge-based, self-tuned congestion control technique that prevents saturation at high loads across different network configurations and communication patterns. Our design is composed of two key components. First, we use global information about a network to obtain a timely estimate of network congestion. We compare this estimate to a threshold value to determine when to throttle packet injection. The second component is a self-tuning mechanism that automatically determines appropriate threshold values based on throughput feedback. A combination of these two techniques provides high performance under heavy load, does not penalize performance under light load, and gracefully adapts to changes in communication patterns.
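    The two components described above can be sketched as follows. The congestion estimate (e.g., the fraction of full network buffers), the class name, and the hill-climbing step size are illustrative assumptions, not details from the paper.

```python
class SelfTunedThrottle:
    """Throttle packet injection when a global congestion estimate
    exceeds a threshold, and tune the threshold from throughput feedback."""

    def __init__(self, threshold=0.5, step=0.05):
        self.threshold = threshold   # congestion level that triggers throttling
        self.step = step             # signed adjustment applied each epoch
        self.prev_throughput = 0.0

    def allow_injection(self, congestion_estimate):
        """Gate new packets: block injection while the network looks congested."""
        return congestion_estimate < self.threshold

    def tune(self, throughput):
        """Hill-climb on throughput: if the last threshold change helped,
        keep moving in the same direction; otherwise reverse."""
        if throughput < self.prev_throughput:
            self.step = -self.step   # last move hurt throughput; reverse it
        self.threshold = min(1.0, max(0.0, self.threshold + self.step))
        self.prev_throughput = throughput
```

    The self-tuning step is what removes the dependence on hand-picked, workload-specific thresholds: the mechanism converges toward whatever threshold maximizes delivered throughput for the current traffic pattern.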

    Recursive Array Layouts and Fast Parallel Matrix Multiplication

    Matrix multiplication is an important kernel in linear algebra algorithms, and the performance of both serial and parallel implementations is highly dependent on the memory system behavior. Unfortunately, due to false sharing and cache conflicts, traditional column-major or row-major array layouts incur high variability in memory system performance as matrix size varies. This paper investigates the use of recursive array layouts for improving the performance of parallel recursive matrix multiplication algorithms. We extend previous work by Frens and Wise on recursive matrix multiplication to examine several recursive array layouts and three recursive algorithms: standard matrix multiplication, and the more complex algorithms of Strassen and Winograd. We show that while recursive array layouts significantly outperform traditional layouts (reducing execution times by a factor of 1.2--2.5) for the standard algorithm, they offer little improvement for Strassen's and Winograd's algorithms;..
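    For concreteness, here is a quadrant-recursive formulation of standard matrix multiplication in the spirit of the Frens and Wise work cited above. It is a generic sketch over Python lists; the power-of-two matrix size and the recursion cutoff are assumptions, and a real implementation would pair this recursion with a matching recursive array layout.

```python
def split(m):
    """Split a square matrix (list of lists) into four quadrants."""
    n = len(m) // 2
    return ([row[:n] for row in m[:n]], [row[n:] for row in m[:n]],
            [row[:n] for row in m[n:]], [row[n:] for row in m[n:]])

def add(a, b):
    """Element-wise matrix addition."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def rmm(a, b, cutoff=2):
    """Recursive matrix multiply for power-of-two sizes.
    At or below `cutoff`, fall back to the classic triple loop."""
    n = len(a)
    if n <= cutoff:
        return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    # C11 = A11*B11 + A12*B21, C12 = A11*B12 + A12*B22, etc.
    top = [r1 + r2 for r1, r2 in zip(add(rmm(a11, b11), rmm(a12, b21)),
                                     add(rmm(a11, b12), rmm(a12, b22)))]
    bot = [r1 + r2 for r1, r2 in zip(add(rmm(a21, b11), rmm(a22, b21)),
                                     add(rmm(a21, b12), rmm(a22, b22)))]
    return top + bot
```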

    BLAM: A High-Performance Routing Algorithm for Virtual Cut-Through Networks

    High performance, freedom from deadlocks, and freedom from livelocks are desirable properties of interconnection networks. Unfortunately, these can be conflicting goals, because networks may devote or under-utilize resources to avoid deadlocks and livelocks, and those resources could otherwise be used to improve performance. For example, a minimal adaptive routing algorithm may forgo some routing options to ensure livelock-freedom, but this hurts performance at high loads. In contrast, Chaotic routing achieves higher performance because it allows full routing flexibility, including misroutes (hops that take a packet farther from its destination), and it is deadlock-free. Unfortunately, Chaotic routing provides only probabilistic guarantees of livelock-freedom

    Recursive Array Layouts and Fast Matrix Multiplication

    The performance of both serial and parallel implementations of matrix multiplication is highly sensitive to memory system behavior. False sharing and cache conflicts cause traditional column-major or row-major array layouts to incur high variability in memory system performance as matrix size varies. This paper investigates the use of recursive array layouts to improve performance and reduce variability. Previous work on recursive matrix multiplication is extended to examine several recursive array layouts and three recursive algorithms: standard matrix multiplication, and the more complex algorithms of Strassen and Winograd. While recursive layouts significantly outperform traditional layouts (reducing execution times by a factor of 1.2--2.5) for the standard algorithm, they offer little improvement for Strassen's and Winograd's algorithms. For a purely sequential implementation, it is possible to reorder computation to conserve memory space and improve performance between ..

    Nonlinear Array Layouts for Hierarchical Memory Systems

    Programming languages that provide multidimensional arrays and a flat linear model of memory must implement a mapping between these two domains to order array elements in memory. This layout function is fixed at language definition time and constitutes an invisible, non-programmable array attribute. In reality, modern memory systems are architecturally hierarchical rather than flat, with substantial differences in performance among different levels of the hierarchy. This mismatch between the model and the true architecture of memory systems can result in low locality of reference and poor performance. Some of this loss in performance can be recovered by re-ordering computations using transformations such as loop tiling. We explore nonlinear array layout functions as an additional means of improving locality of reference. For a benchmark suite composed of dense matrix kernels, we show by timing and simulation that two specific layouts (4D and Morton) have low implementation costs (2--5%..
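    A nonlinear layout function like the 4D layout mentioned above can be sketched as a tile-major address map: elements are grouped into fixed-size tiles stored contiguously, and the tiles themselves are stored in row-major order. This is a generic sketch under stated assumptions (the tile size and the requirement that it divides the matrix dimension), not the paper's exact layout.

```python
def addr_4d(i, j, n, tile=4):
    """Offset of element (i, j) of an n x n matrix under a tiled
    ("4D") layout: tile x tile blocks stored contiguously, blocks in
    row-major order. Assumes `tile` divides `n`."""
    ti, tj = i // tile, j // tile   # which tile holds the element
    oi, oj = i % tile, j % tile     # offset within that tile
    tiles_per_row = n // tile
    return ((ti * tiles_per_row + tj) * tile + oi) * tile + oj
```

    Because a whole tile occupies one contiguous run of memory, tiled computations touch far fewer cache lines and memory pages than they would under a row-major layout, which is the locality benefit the abstract measures.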