Simulation Of Multi-core Systems And Interconnections And Evaluation Of Fat-Mesh Networks

Author: Zhang Yu
Publication date: 28/01/2009

Simulators are essential in computer architecture research because they enable the exploration of new architectures and detailed performance evaluation without building costly physical hardware. Simulation is even more critical for studying future many-core architectures, as it provides the opportunity to assess computer systems that do not yet exist. In this thesis, a multiprocessor simulator is presented based on a cycle-accurate architecture simulator called SESC. The shared L2 cache system is extended into a distributed shared cache (DSC) with a directory-based cache coherency protocol. A mesh network module is extended and integrated into SESC to replace the bus for scalable inter-processor communication. While these efforts complete an extended multiprocessor simulation infrastructure, two interconnection enhancements are proposed and evaluated. The first is a novel non-uniform fat-mesh network structure, similar in idea to the fat-tree. This non-uniform mesh network takes advantage of the average traffic pattern, typically all-to-all in DSC, to dedicate additional links to connections with heavy traffic (e.g., near the center) and fewer links to those with lighter traffic (e.g., near the periphery). Two fat-mesh schemes are implemented based on different routing algorithms. Analytical fat-mesh models are constructed by presenting the expressions for the traffic requirements of personalized all-to-all traffic. Performance improvements over the uniform mesh are demonstrated in the results from the simulator. The second enhancement is a hybrid network consisting of one packet switching plane and multiple circuit switching planes. The circuit switching planes provide fast paths between neighbors with heavy communication traffic. A compiler technique that abstracts the symbolic expressions of benchmarks' communication patterns can be used to help facilitate the circuit establishment
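
The observation behind the fat-mesh, that all-to-all traffic loads links near the center of a mesh more heavily than links near the periphery, can be illustrated with a minimal sketch. It assumes a k x k mesh, one packet per ordered pair of distinct nodes, and dimension-order (XY) routing; the abstract does not say which routing algorithms the thesis uses, so XY routing and the function names `xy_route` and `all_to_all_link_load` are illustrative assumptions, not the thesis's model.

```python
from collections import Counter


def xy_route(src, dst):
    """Yield the directed links (node, next_node) on the XY path from src to dst.

    Assumption: dimension-order routing, X first and then Y.
    """
    x, y = src
    dst_x, dst_y = dst
    while x != dst_x:
        next_x = x + (1 if dst_x > x else -1)
        yield (x, y), (next_x, y)
        x = next_x
    while y != dst_y:
        next_y = y + (1 if dst_y > y else -1)
        yield (x, y), (x, next_y)
        y = next_y


def all_to_all_link_load(k):
    """Count packets crossing each directed link of a k x k mesh
    when every node sends one packet to every other node."""
    nodes = [(x, y) for x in range(k) for y in range(k)]
    load = Counter()
    for src in nodes:
        for dst in nodes:
            if src != dst:
                for link in xy_route(src, dst):
                    load[link] += 1
    return load


if __name__ == "__main__":
    load = all_to_all_link_load(4)
    print("edge link   (0,0)->(1,0):", load[((0, 0), (1, 0))])
    print("center link (1,0)->(2,0):", load[((1, 0), (2, 0))])
```

Under these assumptions, the link crossing the middle of a 4 x 4 mesh carries more packets than a link at its edge, which is the kind of imbalance a non-uniform allocation of links would target.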

D-Scholarship@Pitt

Time- and VLSI-optimal sorting on enhanced meshes

Author: Bhagavathi D.
Gurla H.
Olariu S.
Schwing James L.
Wilson L.
Zhang Jingyuan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/10/1998

Sorting is a fundamental problem with applications in all areas of computer science and engineering. In this work, we address the problem of sorting on mesh connected computers enhanced by endowing each row and each column with its own dedicated high-speed bus. This architecture, commonly referred to as a mesh with multiple broadcasting, is commercially available and has been adopted by the DAP family of multiprocessors. Somewhat surprisingly, the problem of sorting $m$ ($m \le n$) elements on a mesh with multiple broadcasting of size $\sqrt{n} \times \sqrt{n}$ has been studied, thus far, only in the sparse case, where $m \in \Theta(\sqrt{n})$, and in the dense case, where $m \in \Theta(n)$. Yet, many applications require using an existing platform of size $\sqrt{n} \times \sqrt{n}$ for sorting $m$ elements, with $\surd$

Crossref

ScholarWorks at Central Washington University

Time- and VLSI-optimal Sorting on Enhanced Meshes

Author: D. Bhagavathi
H. Gurla
J. L. Schwing
J. Zhang
L. Wilson
S. Olariu

Sorting is a fundamental problem with applications in all areas of computer science and engineering. In this work we address the problem of sorting on mesh connected computers enhanced by endowing each row and each column with its own dedicated high-speed bus. This architecture, commonly referred to as a mesh with multiple broadcasting, is commercially available and has been adopted by the DAP family of multiprocessors. Somewhat surprisingly, the problem of sorting $m$ ($m \le n$) elements on a mesh with multiple broadcasting of size $\sqrt{n} \times \sqrt{n}$ has been studied, thus far, only in the sparse case, where $m \in \Theta(\sqrt{n})$, and in the dense case, where $m \in \Theta(n)$. Yet, many applications require using an existing platform of size $\sqrt{n} \times \sqrt{n}$ for sorting $m$ elements, with $\sqrt{n} < m \le n$. Our main contribution is to present the first known adaptive time- and VLSI-optimal sorting algorithm for meshes with multiple broadcasting. Specifically, we show that for every choice of a constan..

CiteSeerX