Recommended from our members
Minimization of memory and network contention for accessing arbitrary data patterns in SIMD systems
Non-uniform memory and network access is a major source of performance degradation in SIMD supercomputers. We investigate the problem of finding general XOR-schemes that minimize memory conflicts and network contention when accessing arrays with arbitrary data templates, defined by template bases. The XOR-matrix is defined so that each column corresponds to a distinct vector in the union of all template bases. The restriction of the XOR-matrix to a given template is formed by concatenating the columns that correspond to the template basis. We prove that a necessary and sufficient condition for conflict-free and network-contention-free access through the Baseline network is that certain sub-matrices of every template's restricted matrix be non-singular. A new characterization of the Baseline network and of XOR-matrices is proposed. Finding an XOR-matrix for accessing arbitrary templates is proved to be NP-complete. To minimize memory and network contention, a heuristic algorithm is proposed for finding XOR-matrices. The algorithm determines successive rows from the bottom up. Given the previous row, the algorithm determines (1) the constraints required by each template's restricted matrix and (2) the row solution, by solving a set of simultaneous equations. To avoid backtracking, a randomized approach is used. The time complexity of the heuristic is O(tpn^2), where t is the number of templates, 2^p is the number of processors, and n is the number of distinct vectors in the template bases. Evaluation shows that the proposed XOR-schemes significantly reduce memory and network contention compared to interleaving and to XOR-schemes optimized for a set of static reference templates.
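The restriction step described above can be illustrated with a small sketch. It assumes, following the abstract's definition, that the XOR-matrix has p rows (2^p processors and memories) and one column per distinct basis vector, and that a template with p basis vectors accesses all 2^p GF(2) combinations of them. Under that assumption the template's elements land in distinct memories exactly when the restricted p x p matrix is non-singular over GF(2). The helper names (`gf2_rank`, `restrict`, `memory_conflict_free`) and the toy matrices are hypothetical, and the check covers memory conflicts only, not the additional Baseline-network sub-matrix condition, whose exact sub-matrices the abstract does not specify.

```python
from typing import List, Sequence

Matrix = List[List[int]]  # rows of 0/1 entries, arithmetic over GF(2)


def gf2_rank(matrix: Matrix) -> int:
    """Rank over GF(2) by Gaussian elimination (XOR replaces subtraction)."""
    m = [row[:] for row in matrix]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a row at or below `rank` with a 1 in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Clear this column in every other row.
        for r in range(len(m)):
            if r != rank and m[r][col]:
                m[r] = [a ^ b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def restrict(xor_matrix: Matrix, template: Sequence[int]) -> Matrix:
    """Concatenate the columns that correspond to the template's basis vectors."""
    return [[row[c] for c in template] for row in xor_matrix]


def memory_conflict_free(xor_matrix: Matrix, template: Sequence[int]) -> bool:
    """True if the restricted p x p matrix is non-singular over GF(2).

    Hypothetical helper: checks memory conflicts only, not the
    Baseline-network sub-matrix condition.
    """
    p = len(xor_matrix)
    restricted = restrict(xor_matrix, template)
    return len(template) == p and gf2_rank(restricted) == p


# Toy instance (assumed): p = 2 (4 memories), n = 3 distinct basis vectors.
templates = {"A": [0, 1], "B": [1, 2], "C": [0, 2]}  # column indices per template
X = [[1, 0, 1],
     [0, 1, 1]]
Y = [[1, 0, 1],
     [0, 1, 0]]

for name, cols in templates.items():
    print(name, memory_conflict_free(X, cols), memory_conflict_free(Y, cols))
```

Here every pair of columns of X is linearly independent, so all three toy templates are conflict-free, while Y restricted to template C gives a rank-1 matrix, so two of C's four elements collide in each memory it reaches.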
A cost-effective heuristic storage for minimizing access time of arbitrary data templates
The serialization of memory accesses is a major limiting factor in high-performance SIMD computers. For these machines, the data templates accessed by a program can be detected by the compiler; therefore, designing conflict-free storage schemes may dramatically improve performance. The problem of finding storage schemes with minimum hardware requirements for accessing a set of arbitrary templates is proved to be NP-complete. To design cost-effective storage schemes, we introduce two parameters: the number of 1's in the storage matrix (affecting hardware complexity) and the access frequency of each template. Heuristics are proposed to find storage schemes with minimum hardware (Perfect Schemes), without enforcing a high degree of conflict reduction. Another heuristic is proposed to augment perfect storage schemes with minimal additional hardware in order to reduce the degree of conflict (Semi-Perfect Schemes). Experimental evaluation is carried out using Monte Carlo simulation, and the performance of the proposed heuristics is compared to solutions obtained by branch-and-bound search. Results show that perfect schemes may deviate by 20% on average from the optimum access time for 10 arbitrary templates and 16 memories. However, semi-perfect schemes dramatically reduce the degree of conflict compared to perfect schemes. The proposed heuristic storage outperforms row-major interleaving and row-column-diagonals storage. The time complexity of the proposed heuristics is O(p(t + n) + n^2 t), where t is the number of templates, 2^p is the number of processors, and n is the number of distinct vectors in the template bases.
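The two cost parameters can be made concrete with a toy evaluator. This is a sketch under stated assumptions, not the paper's heuristic. The storage matrix is taken to be a p x m matrix over GF(2) that maps an m-bit address to a p-bit module index. A template with k basis vectors is assumed to contain all 2^k combinations of them. The "degree of conflict" is assumed to be the number of template elements per module, which for a linear map of rank r is exactly 2^(k - r). Hardware cost is the count of 1's, and the access counts, candidate matrices, and helper names (`module_bits`, `conflict_degree`, `hardware_cost`, `weighted_access_time`) are all hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[int]]  # 0/1 entries, arithmetic over GF(2)


def gf2_rank(matrix: Matrix) -> int:
    """Rank over GF(2) by Gaussian elimination (XOR replaces subtraction)."""
    m = [row[:] for row in matrix]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                m[r] = [a ^ b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def module_bits(storage: Matrix, address: Sequence[int]) -> List[int]:
    """Module index of one address: storage @ address over GF(2)."""
    return [sum(s & a for s, a in zip(row, address)) % 2 for row in storage]


def conflict_degree(storage: Matrix, basis: Sequence[Sequence[int]]) -> int:
    """Template elements per reached module (assumed definition of conflict degree).

    The 2**k elements are all GF(2) combinations of the k basis vectors; a linear
    map of rank r sends exactly 2**(k - r) of them to each module it reaches.
    """
    images = [module_bits(storage, v) for v in basis]     # k images, p bits each
    restricted = [list(bits) for bits in zip(*images)]    # p x k restricted matrix
    return 2 ** (len(basis) - gf2_rank(restricted))


def hardware_cost(storage: Matrix) -> int:
    """Number of 1's in the storage matrix (proxy for XOR-gate hardware)."""
    return sum(map(sum, storage))


def weighted_access_time(storage: Matrix,
                         templates: List[Tuple[List[List[int]], int]]) -> int:
    """Memory cycles summed over templates, weighted by hypothetical access counts."""
    return sum(count * conflict_degree(storage, basis) for basis, count in templates)


# Toy instance (assumed): m = 3 address bits, p = 2 (4 memories).
e0, e1, e2 = [1, 0, 0], [0, 1, 0], [0, 0, 1]
templates = [([e0, e1], 5), ([e1, e2], 3), ([e0, e2], 2)]  # (basis, access count)
candidates = {
    "interleave": [[1, 0, 0], [0, 1, 0]],   # module = two low-order address bits
    "one extra 1": [[1, 0, 0], [0, 1, 1]],
    "two extra 1s": [[1, 0, 1], [0, 1, 1]],
}
for name, m in candidates.items():
    print(f"{name}: cost={hardware_cost(m)}, time={weighted_access_time(m, templates)}")
```

With these toy counts, the interleaved matrix uses two 1's and needs 15 weighted cycles. Adding one 1 brings this to 13, and adding two makes all three templates conflict-free at 10. This loosely mirrors the trade-off the abstract describes between minimum hardware and added hardware for conflict reduction.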