
    Distributed Finite Element Analysis Using a Transputer Network

    The principal objective of this research effort was to demonstrate the extraordinarily cost-effective acceleration of finite element structural analysis problems using a transputer-based parallel processing network. This objective was accomplished in the form of a commercially viable parallel processing workstation. The workstation is a desktop-size, low-maintenance computing unit capable of supercomputer performance yet costing two orders of magnitude less. To achieve the principal research objective, a transputer-based structural analysis workstation termed XPFEM was implemented with linear static structural analysis capabilities resembling the commercially available NASTRAN. Finite element model files, generated using the on-line preprocessing module or external preprocessing packages, are downloaded to a network of 32 transputers for accelerated solution. The system currently executes at about one third of Cray X-MP24 speed, but additional acceleration appears likely. For the NASA-selected demonstration problem of a Space Shuttle main engine turbine blade model with about 1500 nodes and 4500 independent degrees of freedom, the Cray X-MP24 required 23.9 seconds to obtain a solution while the transputer network, operated from an IBM PC-AT compatible host computer, required 71.7 seconds. Consequently, the $80,000 transputer network demonstrated a cost-performance ratio about 60 times better than that of the $15,000,000 Cray X-MP24 system.
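
    A quick check of the quoted factor, using only the figures given above: the price ratio is $15,000,000 / $80,000 ≈ 187, the speed ratio is 23.9 s / 71.7 s ≈ 1/3, so the cost-performance advantage is roughly 187 × 1/3 ≈ 62, consistent with the stated factor of about 60.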

    Parallelisation of algorithms

    Most numerical software involves performing an extremely large volume of algebraic computations. This is both costly and time-consuming in terms of computer resources and, for large problems, supercomputer power is often required if results are to be obtained in a reasonable amount of time. One method whereby both the cost and the time can be reduced is to use the principle "Many hands make light work": allow several computers to operate simultaneously on the code, working towards a common goal, and hopefully obtaining the required results in a fraction of the time and cost normally used. This can be achieved by modifying the costly, time-consuming code, breaking it up into separate individual code segments which may be executed concurrently on different processors. This is termed parallelisation of code. This document describes communication between sequential processes, protocols, message routing and the parallelisation of algorithms. In particular, it deals with these aspects with reference to the Transputer as developed by INMOS, and includes two parallelisation examples, namely the parallelisation of code to study airflow and of code to determine the far-field patterns of antennas. The document also reports on practical experience with programming in parallel.
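
    The thesis itself used occam on INMOS transputers; purely as an illustration of the idea of breaking a computation into segments executed concurrently, the following hypothetical Go sketch evaluates a large dot product in independent segments (all identifiers are invented for the example):

        package main

        import (
            "fmt"
            "sync"
        )

        // dotSegment computes the partial dot product of x and y over the index range [lo, hi).
        func dotSegment(x, y []float64, lo, hi int) float64 {
            s := 0.0
            for i := lo; i < hi; i++ {
                s += x[i] * y[i]
            }
            return s
        }

        // parallelDot splits the index range into nseg segments, evaluates each
        // segment concurrently, and sums the partial results at the end.
        func parallelDot(x, y []float64, nseg int) float64 {
            partial := make([]float64, nseg)
            var wg sync.WaitGroup
            n := len(x)
            for k := 0; k < nseg; k++ {
                lo, hi := k*n/nseg, (k+1)*n/nseg
                wg.Add(1)
                go func(k, lo, hi int) {
                    defer wg.Done()
                    partial[k] = dotSegment(x, y, lo, hi)
                }(k, lo, hi)
            }
            wg.Wait()
            total := 0.0
            for _, p := range partial {
                total += p
            }
            return total
        }

        func main() {
            n := 1000000
            x := make([]float64, n)
            y := make([]float64, n)
            for i := range x {
                x[i], y[i] = 1.0, 2.0
            }
            fmt.Println(parallelDot(x, y, 8)) // expect 2e+06
        }

    On a single shared-memory machine the segments run as goroutines; on the transputer networks discussed in this document the same decomposition would map each segment onto a separate processor, with the partial results combined over the communication links.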

    A testbed for embedded systems

    Testing and debugging are often the most difficult phases of software development. This is especially true of embedded systems, which are usually concurrent, have real-time performance and correctness constraints, and execute in the field in an environment which may not permit internal scrutiny of the software's behaviour. Although good software engineering practices help, they will never eliminate the need for testing and debugging, because failings in the specification and design are often only discovered through testing, and understanding these failings, and how to correct them, comes from debugging. These observations suggest that embedded software should be designed in a way which makes testing and debugging easier, and that tools which support these activities are required. Due to the often hostile environment in which the finished embedded system will function, it is necessary to have a platform which allows the software to be developed and tested "in vitro". The Testbed system achieves these goals by providing dynamic modification and process migration facilities for use during development, as well as powerful monitoring and background debugging support. These facilities are built on a basic run-time harness supporting an event-driven programming model with a global communication mechanism. This programming model is well suited to the reactive nature of embedded systems. The main research contributions of this work are in the areas of finding deadlock-free, path-optimal routings for networks and of dynamic modification with automated conversion of data which may include pointers.
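
    The run-time harness is characterised only in outline above, so the following Go sketch is a loose, hypothetical illustration of an event-driven programming model with a global communication mechanism; the Bus, Event and handler names are inventions for this example, not the Testbed's actual interfaces:

        package main

        import (
            "fmt"
            "time"
        )

        // Event is a named message delivered through the global bus.
        type Event struct {
            Name string
            Data int
        }

        // Handler reacts to one delivered event.
        type Handler func(Event)

        // Bus is a minimal global communication mechanism: every event posted to it
        // is dispatched, in order, to all handlers registered for that event's name.
        type Bus struct {
            handlers map[string][]Handler
            queue    chan Event
        }

        func NewBus() *Bus {
            return &Bus{handlers: make(map[string][]Handler), queue: make(chan Event, 64)}
        }

        // Subscribe registers a handler for events with the given name.
        func (b *Bus) Subscribe(name string, h Handler) {
            b.handlers[name] = append(b.handlers[name], h)
        }

        // Post places an event on the global queue.
        func (b *Bus) Post(e Event) { b.queue <- e }

        // Run drains the queue, dispatching each event to its registered handlers.
        func (b *Bus) Run() {
            for e := range b.queue {
                for _, h := range b.handlers[e.Name] {
                    h(e)
                }
            }
        }

        func main() {
            bus := NewBus()
            bus.Subscribe("sensor.read", func(e Event) {
                fmt.Println("controller saw reading:", e.Data)
                if e.Data > 40 {
                    bus.Post(Event{Name: "alarm", Data: e.Data}) // queue is buffered, so posting from a handler is safe
                }
            })
            bus.Subscribe("alarm", func(e Event) { fmt.Println("alarm raised:", e.Data) })

            go bus.Run()
            bus.Post(Event{Name: "sensor.read", Data: 42})
            time.Sleep(100 * time.Millisecond) // let the demo events drain before exiting
        }

    Handlers react to named events posted to a shared bus, which matches the reactive style the abstract attributes to embedded systems; the Testbed described above layers monitoring, background debugging, dynamic modification and process migration on top of such a harness.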

    Performance Evaluation of Specialized Hardware for Fast Global Operations on Distributed Memory Multicomputers

    Workstation cluster multicomputers are increasingly being applied to solving scientific problems that require massive computing power. Parallel Virtual Machine (PVM) is a popular message-passing model used to program these clusters. One of the major performance-limiting factors for cluster multicomputers is their inefficiency in performing parallel program operations involving collective communications. These operations include synchronization, global reduction, broadcast/multicast operations and orderly access to shared global variables. Hall has demonstrated that a secondary network with a wide tree topology and centralized coordination processors (COP) could improve the performance of global operations on a variety of distributed architectures [Hall94a]. My hypothesis was that the efficiency of many PVM applications on workstation clusters could be significantly improved by utilizing a COP system for collective communication operations. To test my hypothesis, I interfaced the COP system with PVM. The interface software includes a virtual memory-mapped secondary network interface driver and a function library which allows the COP system to be used in place of PVM function calls in application programs. My implementation makes it possible to easily port any existing PVM application to perform fast global operations using the COP system. To evaluate the performance improvements of using a COP system, I measured the cost of various PVM global functions, derived the cost of the equivalent COP library global functions, and compared the results. To analyze the impact of global operations on the overall execution time of applications, I instrumented a complex molecular dynamics PVM application and performed measurements. The measurements were performed for a sample cluster size of 5 and for message sizes up to 16 kilobytes. The comparison of PVM and COP system global operation performance clearly demonstrates that the COP system can speed up a variety of global operations involving small-to-medium-sized messages by factors of 5 to 25. Analysis of the example application for a sample cluster size of 5 shows that the speedup provided by my global function libraries and the COP system reduces the overall execution time for this and similar applications by more than 1.5 times. Additionally, the performance improvement seen by applications increases as the cluster size increases, thus providing a scalable solution for performing global operations.
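
    The collective operations named above (synchronization, global reduction, broadcast) share a gather-then-distribute structure. The following Go sketch is only an illustration of that structure with a coordinator process, not the PVM or COP interface; all identifiers are invented for the example:

        package main

        import "fmt"

        // reduceAndBroadcast gathers one partial value from each of n workers,
        // sums them (the "global reduction"), and then sends the total back to
        // every worker (the "broadcast"), mimicking a centralized coordinator.
        func reduceAndBroadcast(n int, partials <-chan float64, results []chan float64) {
            total := 0.0
            for i := 0; i < n; i++ {
                total += <-partials // gather phase
            }
            for _, r := range results { // broadcast phase
                r <- total
            }
        }

        func main() {
            const n = 5 // sample cluster size, matching the experiments above
            partials := make(chan float64)
            results := make([]chan float64, n)
            done := make(chan struct{})

            for i := 0; i < n; i++ {
                results[i] = make(chan float64, 1)
                go func(id int) {
                    local := float64(id + 1) // each worker's local contribution
                    partials <- local        // send the partial value to the coordinator
                    global := <-results[id]  // receive the reduced global value
                    fmt.Printf("worker %d sees global sum %.0f\n", id, global)
                    done <- struct{}{}
                }(i)
            }

            go reduceAndBroadcast(n, partials, results)

            for i := 0; i < n; i++ {
                <-done
            }
        }

    The abstract attributes the measured speedups to moving this coordination onto dedicated coordination processors on a secondary network, rather than performing the gather and broadcast phases with point-to-point messages over the primary network.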

    A Method of Rendering CSG-Type Solids Using a Hybrid of Conventional Rendering Methods and Ray Tracing Techniques

    This thesis describes a fast, efficient and innovative algorithm for producing shaded, still images of complex objects built using constructive solid geometry (CSG) techniques. The algorithm uses a hybrid of conventional rendering methods and ray tracing techniques. A description of existing modelling and rendering methods is given in chapters 1, 2 and 3, with emphasis on the data structures and rendering techniques selected for incorporation in the hybrid method. Chapter 4 gives a general description of the hybrid method. This method processes data in the screen coordinate system and generates images in scan-line order. Scan lines are divided into spans (or segments) using the bounding rectangles of primitives calculated in screen coordinates. Conventional rendering methods and ray tracing techniques are used interchangeably along each scan-line, and the method used is determined by the number of primitives associated with a particular span. Conventional rendering methods are used when only one primitive is associated with a span; ray tracing techniques are used for hidden surface removal when two or more primitives are involved. In the latter case each pixel in the span is evaluated by accessing the polygon that is visible within each primitive associated with the span. The depth values (i.e. z-coordinates derived from the 3-dimensional definition) of the polygons involved are deduced for the pixel's position using linear interpolation. These values are used to determine the visible polygon. The CSG tree is accessed from the bottom upwards via an ordered index that enables the 'visible' primitives on any particular scan-line to be efficiently located. Within each primitive an ordered path through the data structure provides the polygons potentially visible on a particular scan-line. Lists of the active primitives and paths to potentially visible polygons are maintained throughout the rendering step and enable span coherence and scan-line coherence to be fully utilised. The results of tests with a range of typical objects and scenes are provided in chapter 5. These results show that the hybrid algorithm is significantly faster than full ray tracing algorithms.
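
    The per-span decision described above is the heart of the hybrid method. The following Go sketch is a simplified, hypothetical illustration of that decision and of resolving visibility from linearly interpolated depths; it ignores CSG boolean classification and shading, and none of the identifiers come from the thesis:

        package main

        import "fmt"

        // Primitive stands in for a CSG primitive whose candidate polygon has been
        // projected to the screen; its depth is interpolated linearly across a span.
        type Primitive struct {
            Name         string
            ZAtX0, ZAtX1 float64 // depth at the left and right ends of the span
        }

        // depthAt linearly interpolates the primitive's depth at pixel x in [x0, x1].
        func depthAt(p Primitive, x, x0, x1 int) float64 {
            t := float64(x-x0) / float64(x1-x0)
            return p.ZAtX0 + t*(p.ZAtX1-p.ZAtX0)
        }

        // shadeSpan chooses between the two strategies for one span on a scan-line.
        func shadeSpan(prims []Primitive, x0, x1 int) {
            if len(prims) == 1 {
                // One primitive only: conventional scan-line shading, no per-pixel tests.
                fmt.Printf("span [%d,%d]: shade %s conventionally\n", x0, x1, prims[0].Name)
                return
            }
            // Two or more primitives: resolve visibility per pixel by comparing
            // interpolated depths (the ray-tracing-style path of the hybrid method).
            for x := x0; x <= x1; x++ {
                best, bestZ := prims[0], depthAt(prims[0], x, x0, x1)
                for _, p := range prims[1:] {
                    if z := depthAt(p, x, x0, x1); z < bestZ {
                        best, bestZ = p, z
                    }
                }
                fmt.Printf("pixel %d: visible primitive %s (z=%.2f)\n", x, best.Name, bestZ)
            }
        }

        func main() {
            shadeSpan([]Primitive{{"cylinder", 5, 5}}, 0, 3)
            shadeSpan([]Primitive{{"cube", 4, 8}, {"sphere", 7, 3}}, 10, 13)
        }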

    Towards Solving the Dopamine G Protein Coupled Receptor Modelling Problem

    The overall aim of this work has been to furnish a model of the dopamine (DA) receptor D2. There are currently two sub-groups within the DA family of G protein coupled receptors (GPCRs): the D1 sub-group (which includes D1 and D5) and the D2 sub-group (which includes D2, D3 and D4). Organon (UK) Ltd. supplied a disk containing the PDB atomic co-ordinates of the integral membrane protein bacteriorhodopsin (bRh; Henderson et al., 1975 and 1990) to use as a template to model D2, the aim being to generate a model of D2 by simply mutating the side-residues of bRh. The assumption was that bRh had homology with members of the supergene class of GPCRs. However, using the GCG Wisconsin GAP algorithm (Devereux et al., 1984), no significant homology was detected between the primary structures of any member of the DA family of GPCRs and bRh. Nevertheless, given the original brief to carry out homology modelling using bRh as a template (see appendix 1), I felt obliged to carry out further alignments using a shuffling technique and a standard statistical test to check for significant structural homology. The results clearly showed that there is no significant structural homology, on the basis of sequence similarity, between bRh and any member of the DA family of GPCRs. Indeed, the statistical analysis clearly demonstrated that while there is significant structural homology between all catecholamine-binding GPCRs, there is no structural homology whatsoever between any catecholamine-binding GPCR and bRh. Hydropathy analysis is frequently used to identify the location of putative transmembrane segments (ptms), but it is difficult to predict the end positions of each ptms. To this end a novel alignment algorithm (DH Scan) was coded to exploit transparallel supercomputer technology, to provide a basis for identifying likely helix end points and to pinpoint areas of local homology between GPCRs. DH Scan clearly demonstrated characteristic transmembrane homology between the different DA GPCR subtypes. Two further homology algorithms were coded (IH Scan and RH Scan) which provided evidence of internal homology. In particular, IH Scan independently revealed a repeat region in the 3rd intracellular loop (iIII) of D4, and RH Scan revealed palindrome-like short stretches of amino acids which were found to be particularly well represented in predicted α-helices in each DA receptor subtype. In addition, the profile network prediction algorithm (PHD; Rost et al., 1994) predicted a short alpha-helix, at greater than 80% probability, at each end of the third intracellular loop and between the carboxy-terminal end of transmembrane VII and a conserved Cys residue in the fourth intracellular loop. Fourier analysis of catecholamine-binding GPCR primary structures in the form of a multiple-sequence file suggested that the consensus view that only those residues facing the protein interior are conserved is not entirely correct. In particular, transmembrane helices II and III do not exhibit the residue conservancy characteristic of an amphipathic helix. It is proposed that these two helices undergo a form of helix interface shear to assist agonist binding to an Asp residue on helix II. These data, in combination with information from a number of papers concerning the helix interface shear mechanism and molecular dynamics studies of proline-containing α-helices, suggested a physically plausible binding mechanism for agonists.
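
    Hydropathy analysis of the kind referred to above is conventionally performed with a sliding window over a hydropathy scale. The Go sketch below is a generic illustration using the standard Kyte-Doolittle values with a toy sequence, window length and cutoff chosen arbitrarily; it is not the analysis or the DH Scan algorithm from this thesis:

        package main

        import "fmt"

        // kd holds the standard Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
        var kd = map[byte]float64{
            'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5,
            'Q': -3.5, 'E': -3.5, 'G': -0.4, 'H': -3.2, 'I': 4.5,
            'L': 3.8, 'K': -3.9, 'M': 1.9, 'F': 2.8, 'P': -1.6,
            'S': -0.8, 'T': -0.7, 'W': -0.9, 'Y': -1.3, 'V': 4.2,
        }

        // hydropathy returns the mean hydropathy of a sliding window over seq.
        // High-scoring windows are candidate transmembrane segments.
        func hydropathy(seq string, window int) []float64 {
            out := make([]float64, 0, len(seq)-window+1)
            for i := 0; i+window <= len(seq); i++ {
                sum := 0.0
                for j := i; j < i+window; j++ {
                    sum += kd[seq[j]]
                }
                out = append(out, sum/float64(window))
            }
            return out
        }

        func main() {
            // Toy sequence: a hydrophobic stretch flanked by polar residues.
            seq := "DDRRQQLLIVVLLAIVVLLFFNNKKEE"
            for i, h := range hydropathy(seq, 7) {
                marker := ""
                if h > 1.6 { // an illustrative cutoff for putative transmembrane windows
                    marker = " <- candidate transmembrane window"
                }
                fmt.Printf("window starting at %2d: %+5.2f%s\n", i, h, marker)
            }
        }

    Runs of consecutive high-scoring windows mark the putative transmembrane segments; as noted above, the real difficulty lies in deciding exactly where each segment ends, which is what DH Scan was designed to help with.
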
    While it was evident that homology modelling could not be scientifically justified, the combinatorial approach to protein modelling might be successfully applied to the transmembrane region of the D2 receptor. The probable arrangement of helices in the transmembrane region of GPCRs (Baldwin, 1993), which was based on a careful analysis of a low-resolution projection map of rhodopsin (Gebhard et al., 1993), was used as a guide to model the transmembrane region of D2. The backbone torsion angles of a helix with a middle Pro residue (Sankararamakrishnan et al., 1991) were used to model transmembrane helix V. Dopamine was successfully docked to the putative binding pocket of D2. Using this model as a template, models of D3 and D4 were produced. A separate model of D1 was then produced and this in turn was used as a template to model D5.

    Farming out: a study

    Farming is one of several ways of arranging for a group of individuals to perform work simultaneously. Farming is attractive: it is a simple concept, and yet it allocates work dynamically, balancing the load automatically. This gives rise to potentially great efficiency; yet neither the range of applications that can be farmed efficiently nor the most effective implementation strategies had been classified. This research has investigated the types of application, design and implementation that farm efficiently on computer systems constructed from a network of communicating parallel processors. It shows that all applications can be farmed and identifies those concerns that dictate efficiency. For the first generation of transputer hardware, extensive experiments were performed using Occam, independently of any specific application. This study identified the boundary conditions that dictate which design parameters farm efficiently. These boundary conditions are expressed in a general form that is directly applicable to other architectures. The specific quantitative results are of direct use to others who wish to implement farms on this architecture. Because of farming's simplicity and potential for high efficiency, this work concludes that architects of parallel hardware should consider binding this paradigm into future systems so as to enable the dynamic allocation of processes to processors to take place automatically. As well as resulting in high levels of machine utilisation for all programs, this would also permanently remove the burden of allocation from the programmer.
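
    As a minimal illustration of the farming paradigm described above (a hypothetical Go sketch; the experiments reported here used Occam on first-generation transputer hardware), workers repeatedly take the next available task from a shared source, so faster or less heavily loaded workers automatically take on more of the work and the load balances itself:

        package main

        import (
            "fmt"
            "sync"
        )

        // farm distributes the tasks dynamically over nWorkers workers: each worker
        // pulls the next task as soon as it finishes the previous one, so the load
        // balances itself without any static assignment by the programmer.
        func farm(tasks []int, nWorkers int, work func(int) int) []int {
            in := make(chan int) // farmer -> workers: task indices, handed out on demand
            results := make([]int, len(tasks))
            var wg sync.WaitGroup

            for w := 0; w < nWorkers; w++ {
                wg.Add(1)
                go func() {
                    defer wg.Done()
                    for i := range in { // take whatever task is available next
                        results[i] = work(tasks[i])
                    }
                }()
            }

            for i := range tasks { // the farmer hands out work as workers become free
                in <- i
            }
            close(in)
            wg.Wait()
            return results
        }

        func main() {
            tasks := []int{1, 2, 3, 4, 5, 6, 7, 8}
            square := func(x int) int { return x * x } // stand-in for a real work packet
            fmt.Println(farm(tasks, 3, square))        // [1 4 9 16 25 36 49 64]
        }

    The farmer performs no static assignment at all; this demand-driven allocation is the property the study concludes architects should consider binding into future parallel systems.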