
NEUCOMP2 - parallel neural network compiler

Author: Evans D. J.
Sulaiman Md. Nasir
Publication venue: Faculty of Computer Science and Information Technology, University of Malaya
Publication date: 01/12/1996

A parallel neural network compiler (NEUCOMP2) for a shared-memory parallel machine has been implemented by introducing parallelism into NEUCOMP. The parallel routine detects the program loops of the sequential version generated by NEUCOMP, analyses their data dependences, and transforms them into a parallel version. Experiments were carried out to study the performance of NEUCOMP2 programs for the backpropagation network. NEUCOMP2 was developed and run on the Sequent Balance 8000 computer system at the Parallel Algorithm Research Centre, U.K.
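The abstract does not say how NEUCOMP2 analyses data dependences. As a rough illustration only, the sketch below applies Bernstein's conditions (a standard textbook test, swapped in here as an assumption) to decide whether the iterations of a loop can run in parallel. The function name `iterations_independent` and the representation of memory locations as `(array, index)` pairs are hypothetical.

```python
def iterations_independent(reads, writes):
    """Return True if no two iterations conflict under Bernstein's conditions.

    reads[i] and writes[i] are the sets of locations that iteration i
    reads and writes. This is an illustrative test, not NEUCOMP2's method.
    """
    n = len(reads)
    for i in range(n):
        for j in range(i + 1, n):
            output_dep = writes[i] & writes[j]   # both write the same location
            flow_dep = writes[i] & reads[j]      # j reads what i writes
            anti_dep = reads[i] & writes[j]      # j overwrites what i reads
            if output_dep or flow_dep or anti_dep:
                return False
    return True


# Loop 1: a[i] = b[i] * 2 for i in 0..3 (iterations touch disjoint locations)
indep_reads = [{("b", i)} for i in range(4)]
indep_writes = [{("a", i)} for i in range(4)]

# Loop 2: a[i] = a[i-1] + 1 for i in 1..3 (each iteration reads the previous result)
dep_reads = [{("a", i - 1)} for i in range(1, 4)]
dep_writes = [{("a", i)} for i in range(1, 4)]

loop1_parallel = iterations_independent(indep_reads, indep_writes)
loop2_parallel = iterations_independent(dep_reads, dep_writes)
```

Under this test the first loop could be distributed across processors, while the second carries a dependence from one iteration to the next and would have to stay sequential.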

Universiti Putra Malaysia Institutional Repository

Independent AND-parallel implementation of narrowing

Author: Hermenegildo Manuel V.
Kuchen Herbert
Moreno Navarro Juan José
Publication venue: Facultad de Informática (UPM)
Publication date: 01/01/1992

We present a parallel graph narrowing machine, which is used to implement a functional logic language on a shared memory multiprocessor. It is an extension of an abstract machine for a purely functional language. The result is a programmed graph reduction machine which integrates the mechanisms of unification, backtracking, and independent and-parallelism. In the machine, the subexpressions of an expression can run in parallel. In the case of backtracking, the structure of an expression is used to avoid the reevaluation of subexpressions as far as possible. Deterministic computations are detected. Their results are maintained and need not be reevaluated after backtracking.

CiteSeerX

Archivo Digital UPM

Quantifying the benefits of SPECint distant parallelism in simultaneous multithreading architectures

Author: Ayguadé Parra Eduard
Krishnan Venkata
Martel Pérez Iván
Ortega Fernández Daniel
Valero Cortés Mateo
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/1999

We exploit the existence of distant parallelism that future compilers could detect and characterise its performance under simultaneous multithreading architectures. By distant parallelism we mean parallelism that cannot be captured by the processor instruction window and that can produce threads suitable for parallel execution in a multithreaded processor. We show that distant parallelism can make wider-issue processors feasible by providing more instructions from the distant threads, thus better exploiting the processor's resources when speeding up single integer applications. We also investigate the necessity of out-of-order processors in the presence of multiple threads of the same program. It is important to note that the benefits described are orthogonal to any other architectural techniques targeting a single thread. Peer reviewed. Postprint (published version).

UPCommons. Portal del coneixement obert de la UPC

Well-Structured Futures and Cache Locality

Author: Herlihy Maurice
Liu Zhiyu
Publication date: 16/08/2016

In fork-join parallelism, a sequential program is split into a directed acyclic graph of tasks linked by directed dependency edges, and the tasks are executed, possibly in parallel, in an order consistent with their dependencies. A popular and effective way to extend fork-join parallelism is to allow threads to create futures. A thread creates a future to hold the results of a computation, which may or may not be executed in parallel. That result is returned when some thread touches that future, blocking if necessary until the result is ready. Recent research has shown that while futures can, of course, enhance parallelism in a structured way, they can have a deleterious effect on cache locality. In the worst case, futures can incur $\Omega(P T_\infty + t T_\infty)$ deviations, which implies $\Omega(C P T_\infty + C t T_\infty)$ additional cache misses, where $C$ is the number of cache lines, $P$ is the number of processors, $t$ is the number of touches, and $T_\infty$ is the computation span. Since cache locality has a large impact on software performance on modern multicores, this result is troubling. In this paper, however, we show that if futures are used in a simple, disciplined way, then the situation is much better: if each future is touched only once, either by the thread that created it, or by a thread to which the future has been passed from the thread that created it, then parallel executions with work stealing can incur at most $O(C P T_\infty^2)$ additional cache misses, a substantial improvement. This structured use of futures is characteristic of many (but not all) parallel applications.
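The disciplined pattern described in this abstract, where each future is touched exactly once by the thread that created it, can be sketched with Python's standard `concurrent.futures` module. This is only an illustration of the usage pattern: Python's thread pool is not a work-stealing scheduler, so the cache-miss bounds above do not apply to it, and the helper names `sum_range` and `single_touch_sum` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_range(lo, hi):
    """Sequential work done by one task: the sum of the integers in [lo, hi)."""
    return sum(range(lo, hi))


def single_touch_sum(n, pool):
    """Sum 0..n-1 using one future that is touched exactly once by its creator."""
    mid = n // 2
    left = pool.submit(sum_range, 0, mid)  # create the future
    right = sum_range(mid, n)              # creator continues with its own work
    return left.result() + right           # the single touch; blocks if not ready


with ThreadPoolExecutor(max_workers=2) as pool:
    total = single_touch_sum(1000, pool)
```

The future `left` is never stored in shared state or read by a second thread, which is the restriction the abstract identifies as sufficient for the improved bound.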

arXiv.org e-Print Archive

CiteSeerX

Automatic Parallelisation of Web Applications

Author: Perrone Gian David
Streader David
Publication venue: Canterbury University
Publication date: 01/01/2008

Small web applications have a tendency to get bigger. Yet despite the current popularity of web applications, little has been done to help programmers leverage the performance and scalability benefits that can result from the introduction of parallelism into a program. Accordingly, we present a technique for the automatic parallelisation of whole web applications, including persistent data storage mechanisms. We detail our prototype implementation of this technique, Ceth, and finally we establish the soundness of the process by which we extract coarse-grained parallelism from programs.

Research Commons@Waikato

Experimenting with independent and-parallel prolog using standard prolog

Author: Carro Liñares Manuel
Hermenegildo Manuel V.
Publication venue: Facultad de Informática (UPM)
Publication date: 01/10/1991

This paper presents an approach to the study of parallel systems using sequential tools. Independent And-parallelism in Prolog is an example of a parallel processing paradigm in the framework of logic programming, and implementations like &-Prolog uncover the potential performance of parallel processing. But this potential can also be explored using only sequential systems. Since the aim of this paper is to show how this can be done with a standard system, only standard Prolog is used in the implementations included. Such implementations include tests for parallelism in And-Prolog, a correctness-checking meta-interpreter of &-Prolog, and a simulator of parallel execution for &-Prolog.
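The abstract mentions tests for parallelism without giving them. As a minimal sketch, assuming the usual notion of strict independence (two goals may run in and-parallel when they share no unbound variables), the check can be written as below. The paper itself uses standard Prolog. Python is used here purely for illustration, and the term encoding (tuples for compound terms, capitalised strings for variables) and the names `variables`, `ground`, and `indep` are hypothetical.

```python
def variables(term):
    """Collect the variable names occurring in a term.

    Assumed encoding: a compound term is a tuple (functor, arg1, ...),
    a variable is a string starting with an uppercase letter or '_',
    and anything else is a constant.
    """
    if isinstance(term, str):
        is_var = term[:1].isupper() or term.startswith("_")
        return {term} if is_var else set()
    if isinstance(term, tuple):
        found = set()
        for arg in term[1:]:  # skip the functor in position 0
            found |= variables(arg)
        return found
    return set()  # numbers and other constants contain no variables


def ground(term):
    """A term is ground when it contains no variables."""
    return not variables(term)


def indep(goal_a, goal_b):
    """Two goals are strictly independent when they share no variables."""
    return not (variables(goal_a) & variables(goal_b))


p = ("p", "X", "a")   # p(X, a)
q = ("q", "Y")        # q(Y)
r = ("r", "X")        # r(X)

p_q_parallel = indep(p, q)
p_r_parallel = indep(p, r)
```

Here `p(X, a)` and `q(Y)` pass the test, while `p(X, a)` and `r(X)` share `X` and would be run sequentially. The sketch inspects terms syntactically and does not model bindings made at run time.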

Archivo Digital UPM