3 research outputs found

    An efficient parallelization of a real scientific application

    Bibliography: leaves 137-145. In the past decade the cost of computing has come down considerably, making high-powered computing more easily affordable. As a result, many institutions and organisations now have networks of high-powered workstations. Such networks provide a large, generally untapped source of computing power that can be used for running large scientific applications which previously could only be run on supercomputers. This dissertation shows that a substantial improvement in performance can be achieved by the parallelization of a real scientific application for a heterogeneous network of Sun and Silicon Graphics workstations connected by an Ethernet network, but that this improvement is affected by a number of factors. These factors include communication delays, load balancing, and the number of slaves used. This dissertation shows that performance can be improved by sending more, shorter messages, and by overlapping communication with computation. Part of this thesis concerns the difficulties involved in the evaluation of parallel performance on a heterogeneous network. This dissertation shows that conventional measures such as speedup and efficiency are not appropriate for evaluating the performance of a heterogeneous system, and that linear speed gives a much more representative indication of the actual performance achieved. We also propose new concepts of perfect linear speed and linear efficiency, which help to evaluate the improvement in parallel performance on a heterogeneous system.

    Domain Specific Computing in Tightly-Coupled Heterogeneous Systems

    Over the past several decades, researchers and programmers across many disciplines have relied on Moore's law and Dennard scaling for increases in compute capability in modern processors. However, recent data suggest that the number of transistors per square inch on integrated circuits is losing pace with the projection of Moore's law due to the breakdown of Dennard scaling at smaller semiconductor process nodes. This has signaled the beginning of a new “golden age in computer architecture” in which the paradigm will shift from improving traditional processor performance for general tasks to architecting hardware that executes a class of applications in a high-performing manner. This shift will be paved, in part, by making compute systems more heterogeneous and investigating domain specific architectures. However, the notion of domain specific architectures raises many research questions. Specifically, what constitutes a domain? How does one architect hardware for a specific domain? In this dissertation, we present our work towards domain specific computing. We start by constructing a guiding definition for our target domain and then creating a benchmark suite of applications based on our domain definition. We then use quantitative metrics from the literature to characterize our domain in order to gain insights regarding what would be most beneficial in hardware targeted specifically for the domain. From the characterization, we learn that data movement is a particularly salient aspect of our domain. Motivated by this fact, we evaluate our target platform, the Intel HARPv2 CPU+FPGA system, for architecting domain specific hardware through a portability and performance evaluation. To guide the creation of domain specific hardware for this platform, we create a novel tool to quantify spatial and temporal locality. We apply this tool to our benchmark suite and use the generated outputs as features to an unsupervised clustering algorithm. We posit that the resulting clusters represent sub-domains within our originally specified domain; specifically, these clusters inform whether a kernel of computation should be designed as a widely vectorized or deeply pipelined compute unit. Using the lessons learned from the domain characterization and hardware platform evaluation, we outline our process of designing hardware for our domain, and empirically verify that our prediction regarding a wide or deep kernel implementation is correct.

    The development of computer science: a sociocultural perspective
