
    Data distribution and loop parallelization for shared-memory multiprocessors

    Shared-memory multiprocessor systems can achieve high performance when appropriate work parallelization and data distribution are performed. These two actions are not independent, so decisions must be taken in a unified way that minimizes both execution time and data-movement costs. The first goal is achieved by parallelizing loops (the main components suitable for parallel execution in scientific codes) and assigning work to processors with load balancing in mind. The second goal is achieved by placing data in the processors' cache memories so as to minimize both true and false sharing of cache lines. This paper describes the main features of our automatic parallelization and data distribution research tool, named PDDT, and shows the performance of the parallelization strategies it generates. The tool accepts programs written in Fortran77 and generates directives for shared-memory programming models (such as Power Fortran from SGI or Exemplar from Convex).

    Peer reviewed. Postprint (author's final draft).
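    As a minimal sketch of the kind of directive-annotated code PDDT targets (the program, loop body, and variable names here are hypothetical; C$DOACROSS with LOCAL and SHARE clauses is SGI Power Fortran syntax), a parallelized vector addition might look like this:

      PROGRAM VECADD
      INTEGER N
      PARAMETER (N = 1000)
      REAL A(N), B(N), C(N)
      INTEGER I
C     Initialize the input arrays (serial).
      DO 5 I = 1, N
         B(I) = REAL(I)
         C(I) = 2.0 * REAL(I)
    5 CONTINUE
C     C$DOACROSS asks the compiler to distribute the iterations of the
C     following DO loop across processors; LOCAL lists per-iteration
C     private variables, SHARE lists data shared by all processors.
C$DOACROSS LOCAL(I), SHARE(A, B, C)
      DO 10 I = 1, N
         A(I) = B(I) + C(I)
   10 CONTINUE
      PRINT *, 'A(1) =', A(1), ' A(N) =', A(N)
      END

    Under a block distribution of iterations, each processor writes a contiguous region of A, so false sharing of cache lines can occur only at chunk boundaries; this illustrates why the paper treats work assignment and data placement as a single, unified decision.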