An Inherently Parallel Large Grained Data Flow Environment
A parallel programming environment based on data flow is described. Programming in the environment involves using an interactive graphical editor that facilitates the construction of a program graph consisting of modules, ports, paths and triggers. Parallelism is inherent, since the presence of data allows many modules to execute concurrently. The graph is executed directly, without transformation to traditional representations. The environment supports programming at a very high level, as opposed to parallelism at the level of individual instructions.
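The data-driven firing rule described above can be sketched in a few lines of Python. This is a minimal sketch, not the environment's implementation: the `Module` class and `run_graph` function are hypothetical, a path is modelled simply by giving an output port and the downstream input port the same name, triggers are not modelled, and each module fires at most once. Modules that become ready in the same round could execute concurrently; here they are run in turn for simplicity.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Module:
    """Hypothetical dataflow module: it may fire once every input port holds data."""
    name: str
    inputs: list[str]          # input port names
    outputs: list[str]         # output port names
    fn: Callable[..., tuple]   # maps input values to a tuple of output values


def run_graph(modules, initial_tokens):
    """Execute the graph directly: any module whose input ports all hold data may fire."""
    tokens = dict(initial_tokens)  # port name -> value currently on that port
    fired = set()
    waves = []  # groups of modules that became ready together
    while True:
        ready = [m for m in modules
                 if m.name not in fired and all(p in tokens for p in m.inputs)]
        if not ready:
            return tokens, waves
        waves.append([m.name for m in ready])
        for m in ready:  # these could run concurrently; executed in turn here
            results = m.fn(*(tokens[p] for p in m.inputs))
            tokens.update(zip(m.outputs, results))
            fired.add(m.name)


graph = [
    Module("double", ["x"], ["a"], lambda x: (2 * x,)),
    Module("square", ["y"], ["b"], lambda y: (y * y,)),
    Module("add", ["a", "b"], ["out"], lambda a, b: (a + b,)),
]
tokens, waves = run_graph(graph, {"x": 3, "y": 4})
print(tokens["out"])  # 22
print(waves)          # [['double', 'square'], ['add']]
```

Both `double` and `square` have their data present from the start, so they form one concurrent wave; `add` can only fire once both of its input ports are filled.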
Aspects of parallel processing and control engineering
The concept of parallel processing is not a new one, but its application to control engineering tasks is a relatively recent development, made possible by contemporary hardware and software innovation. It has long been accepted that several processors/CPUs, if properly orchestrated, can be combined to form a powerful processing entity. What prevented this from being implemented in commercial systems was that a single microprocessor was adequate for most tasks, so the expense of a multiprocessor system was not justified. With the advent of high-demand systems, such as highly fault-tolerant flight controllers and fast robotic controllers, parallel processing became a viable option.
Nonetheless, the software interfacing of control laws onto parallel systems has remained something of an impasse. There are at present no software compilers that allow a programmer to specify a control law in pure mathematical terminology and then decompose it into a flow diagram of concurrent processes which may then be implemented on, say, a target Transputer system. There are several parallel programming languages with which a programmer can generate parallel processes but, generally, realising a control algorithm in parallel requires the programmer to have intimate knowledge of the algorithm. Efficiency therefore depends on the programmer's ability to recognise inherent parallelism. Some attempts are being made to create intelligent partitioning and scheduling compilers, but this usually imposes significant extra overhead on the multiprocessor system. In the absence of an automated technique, control algorithms must be decomposed by inspection.
The research presented in this thesis is founded on the application of both parallel and pipelining techniques to particular control strategies. Parallelism is treated objectively: a tailored terminology is created to define it mathematically, and related concepts such as bounded parallelism and algorithm speedup are consequently quantified numerically. A pipelined explicit Self Tuning Regulator (STR) controller is developed and tested on systems of different order. Using this terminology, the effectiveness of the parallel STR is evaluated and quantified numerically in terms of relevant performance indices.
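The kind of quantities involved can be illustrated with the classical textbook definitions of speedup (S = T_1 / T_p) and processor efficiency (E = S / p), applied to a linear pipeline. These standard definitions are an assumption and may differ from the tailored terminology developed in the thesis; the four stage times and the function names are hypothetical.

```python
def speedup(t_seq, t_par):
    """Classical speedup S = T_1 / T_p (not necessarily the thesis's own definition)."""
    return t_seq / t_par


def efficiency(t_seq, t_par, processors):
    """Processor efficiency E = S / p."""
    return speedup(t_seq, t_par) / processors


def pipeline_time(stage_times, n_samples):
    """Time to push n samples through a linear pipeline: the first sample traverses
    every stage, and each later sample emerges one slowest-stage period after the last."""
    return sum(stage_times) + (n_samples - 1) * max(stage_times)


# Hypothetical controller split into 4 stages (times in arbitrary units).
stages = [2.0, 3.0, 2.5, 2.5]
n = 100
t_seq = n * sum(stages)            # one processor runs every stage for every sample
t_pipe = pipeline_time(stages, n)  # one processor per stage
s = speedup(t_seq, t_pipe)
e = efficiency(t_seq, t_pipe, len(stages))
print(f"speedup {s:.2f}, efficiency {e:.2f}")  # speedup 3.26, efficiency 0.81
```

As the number of samples grows, the pipeline speedup approaches the ratio of total stage time to the slowest stage time (10/3 here) rather than the processor count, so an unbalanced stage caps what the extra processors can deliver.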
A parallel simulator is presented for the Puma 560 robotic manipulator. By exploiting parallelism and pipelinability in the robot model, a significant increase in execution speed is achieved over the sequential model. The use of Transputers is examined, and graphical results are obtained for several performance indices, including speedup, processor efficiency and bounded parallelism. Using the same analytical technique, a parallel computed-torque feedforward controller incorporating proportional-derivative feedback control for the Puma 560 manipulator is developed and appraised. The performance of a Transputer system in hosting the controller is analysed graphically and, as in the case of the parallel simulator, the more important performance indices are examined under both optimal conditions and conditions of varying hardware constraints.
Cherub: A hardware distributed single shared address space memory architecture
Increased computer throughput can be achieved through the use of parallel processing. The granularity of a parallel program is the average number of instructions performed by the tasks constituting it. Coarse-grained programs typically execute huge numbers of instructions per task (≈10^5). The tasks in fine-grained programs are typically short (≈10^3 instructions). In general, the finer the program grain, the greater the potential for exploiting parallelism. Amdahl's Law shows that, in the absence of overheads, the more of an algorithm's potential parallelism is realised, the faster it will run. The economical granularity of tasks is determined by the intertask communications overhead: break-even occurs when processing time is divided approximately equally between useful work and overhead.
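Amdahl's Law and the break-even point can be made concrete with a short sketch. The overhead model is an assumption for illustration: each task of `grain` instructions is charged a fixed `overhead` of communication instructions, so useful work takes exactly half the processor time when the two are equal. The function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Amdahl's Law without overheads: S = 1 / ((1 - f) + f / p)."""
    f = parallel_fraction
    return 1.0 / ((1.0 - f) + f / processors)


def useful_fraction(grain, overhead):
    """Share of time spent on useful work when a task of `grain` instructions
    incurs `overhead` instructions of communication (assumed fixed per task)."""
    return grain / (grain + overhead)


# Realising more of the potential parallelism on 200 processors raises the speedup.
for f in (0.90, 0.95, 0.99):
    print(f"f = {f:.2f}: speedup {amdahl_speedup(f, 200):.1f}")

# Break-even: a grain equal to the overhead leaves half the time for useful work.
print(useful_fraction(10**4, 10**4))  # 0.5
```

Under this model, tasks much finer than the per-task overhead spend most of their time communicating, which is why the communications overhead sets the economical grain size.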
The two common parallel programming paradigms are shared variable and message passing. Shared variable is, in general, the more natural of the two as it allows implicit communication between tasks. This encourages the programmer to make use of fine-grained tasks. The message passing paradigm requires explicit communication between tasks. This encourages the programmer to use coarser-grained tasks.
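The difference between the two paradigms can be sketched with Python threads summing a list in chunks. The function names are hypothetical, and CPython threads do not execute bytecode in parallel, so the sketch illustrates only the communication style: in the first version tasks communicate implicitly through a shared total, while in the second each task explicitly sends its partial result as a message.

```python
import queue
import threading


def shared_variable_sum(chunks):
    """Shared variable style: tasks update one total held in shared memory."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)
        with lock:  # implicit communication: the update is visible to every task
            total += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def message_passing_sum(chunks):
    """Message passing style: tasks share nothing and send results explicitly."""
    mailbox = queue.Queue()

    def worker(chunk):
        mailbox.put(sum(chunk))  # explicit send

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(mailbox.get() for _ in chunks)  # one explicit receive per task


data = list(range(100))
chunks = [data[i:i + 10] for i in range(0, len(data), 10)]
print(shared_variable_sum(chunks), message_passing_sum(chunks))
```

In the shared variable version a task could just as easily update the total after every element, which is the fine-grained style the paradigm encourages; in the message passing version each extra message is an explicit cost, which pushes towards fewer, larger tasks.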
Two kinds of parallel architecture have become established. The first is the multiprocessor, which is built around a shared bus giving broadcast communications and a shared memory. This is characterised by low communications overhead, but limited scalability. The second is the multicomputer, which is based on point-to-point communications with larger communications overhead, but good scalability. Quantitatively, the low overhead of the multiprocessor is well matched to fine-grain tasks and, hence, to supporting the shared variable paradigm, while the high overhead of the multicomputer matches it to coarse-grain parallelism and, hence, to the message passing paradigm.
Currently, there appears to be no middle ground in parallel computing; an architecture that can support both several hundred medium-grained (≈10^4 instructions) parallel tasks and the shared variable programming paradigm would be advantageous in many applications.
This thesis asserts that it is possible to implement a new computer architecture, Cherub, which has at least 200 processors and is able to support shared variable programming with an optimal task granularity of around 10^4 instructions. This can be achieved through the combination of a hardware-based distributed single shared address space and a wafer-scale communications network.
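One common way to spread a single address space across many nodes is block interleaving, where consecutive blocks of the global address space are assigned to nodes in turn. The sketch below assumes that scheme purely for illustration; it is not taken from the Cherub design, and `Location`, `locate`, `global_address` and the 64-word block size are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    node: int    # node whose local memory holds the word
    offset: int  # address within that node's local memory


def locate(address, nodes, block_size):
    """Global address -> (home node, local offset), interleaving blocks over nodes."""
    block, within = divmod(address, block_size)
    return Location(node=block % nodes,
                    offset=(block // nodes) * block_size + within)


def global_address(loc, nodes, block_size):
    """Inverse of locate: recover the global address from a Location."""
    local_block, within = divmod(loc.offset, block_size)
    return (local_block * nodes + loc.node) * block_size + within


NODES, BLOCK = 200, 64  # hypothetical: 200 processors, 64-word blocks
print(locate(64, NODES, BLOCK))             # Location(node=1, offset=0)
print(locate(NODES * BLOCK, NODES, BLOCK))  # Location(node=0, offset=64)
```

Every program sees one flat address space, and the mapping decides which node must service each access; a remote node would be reached through the communications network, which is what makes network performance central to such a design.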
To support the thesis, the dissertation first specifies a programmer's interface to Cherub which is simple enough to implement in hardware. It then designs algorithms which provide this interface, allowing the requirements of the underlying network to be estimated. Finally, a wafer-scale communications network is outlined, and simulations are used to demonstrate that it can provide the performance required to successfully implement Cherub.
Parallel processing: Dynamic load balancing in a sorting algorithm
Some sorting techniques attempt to balance the load by initially sampling the data to be sorted and distributing it according to pivots. Others redistribute partially sorted lists so that each processor stores an approximately equal number of keys and all processors take part in the merge process during execution. This thesis presents a new method that balances the load dynamically using a different approach: it seeks to distribute the work by means of an estimator that predicts the pending workload.
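The sampling approach mentioned first can be sketched as follows: sort a random sample, take evenly spaced sample elements as pivots, and route each key to the processor whose pivot interval contains it. The function names and parameters are hypothetical, and this shows the general sample-and-pivot idea rather than any specific published algorithm.

```python
import bisect
import random


def choose_pivots(data, processors, sample_size, rng):
    """Sort a random sample and take evenly spaced elements as the p - 1 pivots."""
    sample = sorted(rng.sample(data, sample_size))
    step = sample_size // processors
    return [sample[i * step] for i in range(1, processors)]


def distribute(data, pivots):
    """Send each key to the processor whose pivot interval contains it."""
    buckets = [[] for _ in range(len(pivots) + 1)]
    for key in data:
        buckets[bisect.bisect_right(pivots, key)].append(key)
    return buckets


rng = random.Random(42)
data = rng.sample(range(10_000), 1_000)  # distinct keys in random order
pivots = choose_pivots(data, processors=4, sample_size=100, rng=rng)
buckets = distribute(data, pivots)
print([len(b) for b in buckets])  # bucket sizes; ideally close to 250 each
```

Once each processor sorts its bucket, concatenating the buckets in order gives the sorted sequence; the balance of the load depends entirely on how representative the sample was, which is the weakness a dynamic method tries to address.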
The proposed method is a variant of Parallel Sorting by Merging, that is, a comparison-based technique. The blocks are sorted using Bubble Sort with a sentinel. In this case, the work to be done, in terms of comparisons and exchanges, is affected by the degree of disorder of the data. The evolution of the amount of work in each iteration of the algorithm was studied for different types of input sequence (n items with distinct values up to n, and random data with a normal distribution), and the work was observed to decrease with each iteration. This was used to obtain an estimate of the expected remaining work from a given iteration onward, which serves as the basis for correcting the load distribution.
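Bubble Sort with a sentinel is commonly implemented by remembering the position of the last exchange, since everything beyond it is already in place. The sketch below assumes that interpretation and records the comparisons and exchanges made on each pass, which makes the per-iteration decrease in work visible. The function name is hypothetical, and the thesis's estimator of remaining work is not reproduced here.

```python
def bubble_sort_with_sentinel(items):
    """Sort `items` in place and return the work per pass as (comparisons, exchanges).

    The sentinel is the index of the last exchange: elements after it are already
    in their final positions, so the next pass stops there."""
    work = []
    bound = len(items) - 1
    while bound > 0:
        last_swap = 0
        comparisons = exchanges = 0
        for i in range(bound):
            comparisons += 1
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                exchanges += 1
                last_swap = i
        work.append((comparisons, exchanges))
        bound = last_swap  # no exchange at all leaves bound = 0 and ends the sort
    return work


block = [5, 1, 4, 2, 8, 0]
per_pass = bubble_sort_with_sentinel(block)
print(block)
print(per_pass)
```

On already sorted input a single pass of n - 1 comparisons and no exchanges suffices, whereas disordered input needs more passes, each shorter than the last; this is why the work depends on the degree of disorder and can be tracked iteration by iteration.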
With this idea, the method…
Is reviewed by: http://sedici.unlp.edu.ar/handle/10915/9500
Facultad de Ciencias Exacta