    NEUCOMP2 - parallel neural network compiler

    A parallel neural network compiler (NEUCOMP2) for a shared-memory parallel machine has been implemented by introducing parallelism into NEUCOMP. The parallel routine detects the program loops in the sequential version generated by NEUCOMP, analyses their data dependences, and transforms the code into a parallel version. Experiments were carried out to study the performance of NEUCOMP2 programs for the backpropagation network. NEUCOMP2 was developed and run on the Sequent Balance 8000 computer system at the Parallel Algorithm Research Centre, U.K.
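
    A hedged sketch of the transformation the abstract describes: a loop whose iterations carry no cross-iteration data dependences can be rewritten as a parallel map. The names, the backpropagation-style workload, and the use of Python's concurrent.futures are illustrative assumptions; NEUCOMP2's generated code is not shown in the abstract.

```python
# Illustrative only: iteration i writes only grad[i] and reads shared
# data, so the loop has no loop-carried dependences and is safe to
# parallelise -- the kind of case the abstract attributes to NEUCOMP2.
from concurrent.futures import ProcessPoolExecutor

def neuron_gradient(i, inputs, error):
    # Gradient contribution of output neuron i (hypothetical workload).
    return [error[i] * x for x in inputs]

def parallel_gradients(inputs, error):
    n = len(error)
    with ProcessPoolExecutor() as pool:
        # Sequential form: for i in range(n): grad[i] = neuron_gradient(...)
        return list(pool.map(neuron_gradient, range(n),
                             [inputs] * n, [error] * n))

if __name__ == "__main__":
    print(parallel_gradients([1.0, -1.0], [0.5, 0.2, -0.3]))
```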

    Independent AND-parallel implementation of narrowing

    We present a parallel graph narrowing machine, which is used to implement a functional logic language on a shared-memory multiprocessor. It is an extension of an abstract machine for a purely functional language. The result is a programmed graph reduction machine that integrates the mechanisms of unification, backtracking, and independent AND-parallelism. In the machine, the subexpressions of an expression can run in parallel. In the case of backtracking, the structure of an expression is used to avoid the reevaluation of subexpressions as far as possible. Deterministic computations are detected; their results are retained and need not be reevaluated after backtracking.
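
    A minimal sketch of two mechanisms the abstract mentions, under the assumption that a Python analogue is acceptable: independent subexpressions are evaluated in parallel, and the results of deterministic computations are cached so they need not be recomputed after backtracking. The expression encoding and cost function are invented for illustration.

```python
# Illustrative analogue of independent AND-parallelism: subexpressions
# that share no variables are evaluated in parallel, and deterministic
# results are memoised so "backtracking" never recomputes them.
from concurrent.futures import ThreadPoolExecutor

_memo = {}  # results of deterministic computations survive backtracking

def narrow(expr):
    # Stand-in for narrowing a deterministic subexpression.
    if expr not in _memo:
        _memo[expr] = sum(ord(c) for c in expr)
    return _memo[expr]

def narrow_parallel(subexprs):
    # The subexpressions are independent, so they may run in parallel.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(narrow, subexprs))

print(narrow_parallel(["f(a)", "g(b)"]))  # both narrowed in parallel
print(narrow_parallel(["f(a)", "h(c)"]))  # "f(a)" reuses its cached result
```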

    Quantifying the benefits of SPECint distant parallelism in simultaneous multithreading architectures

    We exploit the existence of distant parallelism that future compilers could detect, and we characterise its performance under simultaneous multithreading architectures. By distant parallelism we mean parallelism that cannot be captured by the processor instruction window and that can produce threads suitable for parallel execution in a multithreaded processor. We show that distant parallelism can make wider-issue processors feasible by providing more instructions from the distant threads, thus better exploiting processor resources when speeding up single integer applications. We also investigate the necessity of out-of-order processors in the presence of multiple threads of the same program. It is important to notice at this point that the benefits described are totally orthogonal to any other architectural techniques targeting a single thread.
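
    A rough illustration of the notion of distant parallelism, with an invented workload: two independent loops separated by far more instructions than any instruction window can hold, which a compiler could nevertheless package as threads for a multithreaded processor. Python processes merely stand in for SMT hardware threads here.

```python
# Illustration only: early_loop and distant_loop would be separated by
# thousands of instructions in the source program, beyond any hardware
# instruction window, yet they are independent and can run as threads.
from concurrent.futures import ProcessPoolExecutor

def early_loop(n):
    return sum(i * i for i in range(n))

def distant_loop(n):
    # Imagined to sit far after early_loop in program order.
    return sum(3 * i for i in range(n))

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=2) as pool:
        a = pool.submit(early_loop, 10**6)    # "near" thread
        b = pool.submit(distant_loop, 10**6)  # "distant" thread
        print(a.result() + b.result())
```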

    Well-Structured Futures and Cache Locality

    In fork-join parallelism, a sequential program is split into a directed acyclic graph of tasks linked by directed dependency edges, and the tasks are executed, possibly in parallel, in an order consistent with their dependencies. A popular and effective way to extend fork-join parallelism is to allow threads to create futures. A thread creates a future to hold the results of a computation, which may or may not be executed in parallel. That result is returned when some thread touches that future, blocking if necessary until the result is ready. Recent research has shown that while futures can, of course, enhance parallelism in a structured way, they can have a deleterious effect on cache locality. In the worst case, futures can incur $\Omega(P T_\infty + t T_\infty)$ deviations, which implies $\Omega(C P T_\infty + C t T_\infty)$ additional cache misses, where $C$ is the number of cache lines, $P$ is the number of processors, $t$ is the number of touches, and $T_\infty$ is the computation span. Since cache locality has a large impact on software performance on modern multicores, this result is troubling. In this paper, however, we show that if futures are used in a simple, disciplined way, then the situation is much better: if each future is touched only once, either by the thread that created it, or by a thread to which the future has been passed from the thread that created it, then parallel executions with work stealing can incur at most $O(C P T_\infty^2)$ additional cache misses, a substantial improvement. This structured use of futures is characteristic of many (but not all) parallel applications.
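
    A small sketch of the single-touch discipline the abstract describes, assuming Python futures are an acceptable stand-in for the paper's work-stealing setting: each future's result is consumed exactly once, by the creating thread or by one thread it was handed to.

```python
# Disciplined use of a future: created once, touched exactly once by
# the creating thread. Touching the same future from many threads is
# the usage pattern the paper's worst-case locality bounds allow.
from concurrent.futures import ThreadPoolExecutor

def compute(x):
    return x * x

with ThreadPoolExecutor() as pool:
    fut = pool.submit(compute, 21)  # create the future
    print(fut.result())             # the single touch, blocking if needed
```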

    Automatic Parallelisation of Web Applications

    Small web applications have a tendency to get bigger. Yet despite the current popularity of web applications, little has been done to help programmers leverage the performance and scalability benefits that can result from the introduction of parallelism into a program. Accordingly, we present a technique for the automatic parallelisation of whole web applications, including persistent data storage mechanisms. We detail our prototype implementation of this technique, Ceth, and finally we establish the soundness of the process by which we extract coarse-grained parallelism from programs.

    Experimenting with independent and-parallel prolog using standard prolog

    This paper presents an approach to the study of parallel systems using sequential tools. Independent and-parallelism in Prolog is an example of a parallel processing paradigm in the framework of logic programming, and implementations like &-Prolog uncover the potential performance of parallel processing. But this potential can also be explored using only sequential systems. Since the spirit of this paper is to show how this can be done with a standard system, only standard Prolog is used in the implementations included. Such implementations include tests for parallelism in &-Prolog, a correctness-checking meta-interpreter for &-Prolog, and a simulator of parallel execution for &-Prolog.
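
    A toy sketch of the simulation idea, with an invented cost model: independent goals are executed one after another on a sequential system, while the simulated parallel time is taken as the maximum of the goals' individual times, as an and-parallel scheduler with enough workers would achieve. This is not the paper's Prolog code, which is not reproduced in the abstract.

```python
# Sequentially execute independent "goals" but report the parallel
# time an and-parallel execution would take (max of individual times).
import time

def run_goal(goal):
    start = time.perf_counter()
    goal()
    return time.perf_counter() - start

def simulate_and_parallel(goals):
    times = [run_goal(g) for g in goals]  # purely sequential execution
    return sum(times), max(times)         # sequential vs simulated parallel

seq, par = simulate_and_parallel([
    lambda: sum(range(200_000)),
    lambda: sum(range(300_000)),
])
print(f"sequential {seq:.4f}s, simulated parallel {par:.4f}s")
```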