16,242 research outputs found

    PinComm: characterizing intra-application communication for the many-core era

    Get PDF
    While the number of cores in both general purpose chip-multiprocessors (CMPs) and embedded Multi-Processor Systems-on-Chip (MPSoCs) keeps rising, on-chip communication becomes more and more important. In order to write efficient programs for these architectures, it is therefore necessary to have a good idea of the communication behavior of an application. We present a communication profiler that extracts this behavior from compiled, sequential C/C++ programs, and constructs a dynamic data-flow graph at the level of major functional blocks. It can also be used to view differences between program phases, such as different video frames, which allows both input- and phase-specific optimizations to be made. Finally, PinComm can visualize inter-thread communication in parallel programs, which can help in optimizing communication behavior and spotting communication-related performance bottlenecks

    Execution models for mapping programs onto distributed memory parallel computers

    Get PDF
    The problem of exploiting the parallelism available in a program to efficiently employ the resources of the target machine is addressed. The problem is discussed in the context of building a mapping compiler for a distributed memory parallel machine. The paper describes using execution models to drive the process of mapping a program in the most efficient way onto a particular machine. Through analysis of the execution models for several mapping techniques for one class of programs, we show that the selection of the best technique for a particular program instance can make a significant difference in performance. On the other hand, the results of benchmarks from an implementation of a mapping compiler show that our execution models are accurate enough to select the best mapping technique for a given program

    Scalable data abstractions for distributed parallel computations

    Get PDF
    The ability to express a program as a hierarchical composition of parts is an essential tool in managing the complexity of software and a key abstraction this provides is to separate the representation of data from the computation. Many current parallel programming models use a shared memory model to provide data abstraction but this doesn't scale well with large numbers of cores due to non-determinism and access latency. This paper proposes a simple programming model that allows scalable parallel programs to be expressed with distributed representations of data and it provides the programmer with the flexibility to employ shared or distributed styles of data-parallelism where applicable. It is capable of an efficient implementation, and with the provision of a small set of primitive capabilities in the hardware, it can be compiled to operate directly on the hardware, in the same way stack-based allocation operates for subroutines in sequential machines

    S+Net: extending functional coordination with extra-functional semantics

    Get PDF
    This technical report introduces S+Net, a compositional coordination language for streaming networks with extra-functional semantics. Compositionality simplifies the specification of complex parallel and distributed applications; extra-functional semantics allow the application designer to reason about and control resource usage, performance and fault handling. The key feature of S+Net is that functional and extra-functional semantics are defined orthogonally from each other. S+Net can be seen as a simultaneous simplification and extension of the existing coordination language S-Net, that gives control of extra-functional behavior to the S-Net programmer. S+Net can also be seen as a transitional research step between S-Net and AstraKahn, another coordination language currently being designed at the University of Hertfordshire. In contrast with AstraKahn which constitutes a re-design from the ground up, S+Net preserves the basic operational semantics of S-Net and thus provides an incremental introduction of extra-functional control in an existing language.Comment: 34 pages, 11 figures, 3 table
    corecore