27 research outputs found

    Interconnection Networks Embeddings and Efficient Parallel Computations.

    To achieve greater performance, many processors cooperate to solve a single problem. These processors communicate via an interconnection network or a bus. The most essential function of the underlying interconnection network is the efficient exchange of messages between processes on different processors. Parallel machines based on the hypercube topology have gained great respect in parallel computation because of the topology's many attractive properties. Researchers have introduced many versions of the hypercube, mainly to enhance communication. The twisted hypercube is one of the most attractive of these versions. It preserves the important features of the hypercube and reduces its diameter by a factor of two. This dissertation investigates relations and transformations between various interconnection networks and the twisted hypercube and explores its efficiency in parallel computation. The capability of the twisted hypercube to simulate complete binary trees, complete quad trees, and rings is demonstrated and compared with that of the hypercube. Finally, the fault tolerance of the twisted hypercube is investigated. We present optimal algorithms to simulate rings in a faulty twisted hypercube environment and compare the results with the hypercube.

    Processor allocation strategies for modified hypercubes

    Parallel processing has been widely accepted as the future of high-speed computing. Among the various parallel architectures proposed or implemented, the hypercube has shown a lot of promise because of its powerful properties, such as regular topology, fault tolerance, low diameter, simple routing, and the ability to efficiently emulate other architectures. The major drawback of the hypercube network is that it cannot be expanded in practice, because the number of communication ports for each processor grows as the logarithm of the total number of processors in the system. Therefore, once a hypercube supercomputer of a certain dimensionality has been built, any future expansion can be accomplished only by replacing the VLSI chips. This is an undesirable feature, and much work has been in progress to eliminate this obstacle and provide a platform for easier expansion. Modified hypercubes (MHs) have been proposed as the building blocks of hypercube-based systems supporting incremental growth techniques without introducing extra resources for individual hypercubes. However, processor allocation on MHs proves to be a challenge because their topology deviates slightly from that of the standard hypercube network. This thesis addresses the issue of processor allocation on MHs and proposes various strategies that are based, partially or entirely, on table look-up approaches. A study of the various task allocation strategies for standard hypercubes is conducted and their suitability for MHs is evaluated. It is shown that the proposed strategies have a perfect subcube recognition ability and superior performance. Existing processor allocation strategies for pure hypercube networks are demonstrated to be ineffective for MHs, in light of their inability to recognize all available subcubes. A comparative analysis involving the buddy strategy and the new strategies is carried out using simulation results.
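
The subcube-recognition issue mentioned above can be illustrated on a standard hypercube. The following is a minimal sketch, assuming the classic buddy strategy (scan aligned blocks of 2^k consecutive node addresses) and a brute-force enumeration of all k-subcubes for comparison. It is not the table look-up strategies proposed in the thesis, and the function names `buddy_allocate` and `free_subcubes` are hypothetical.

```python
from itertools import combinations


def buddy_allocate(busy, k):
    """Buddy strategy: take the first aligned block of 2**k free nodes, or None."""
    size = 1 << k
    for start in range(0, len(busy), size):
        block = range(start, start + size)
        if not any(busy[i] for i in block):
            for i in block:
                busy[i] = True  # mark the block as allocated
            return list(block)
    return None


def free_subcubes(busy, n, k):
    """Exhaustively list every free k-dimensional subcube of an n-cube."""
    found = []
    for dims in combinations(range(n), k):      # which address bits vary
        mask = sum(1 << d for d in dims)
        for base in range(1 << n):
            if base & mask:                     # base must have varying bits = 0
                continue
            nodes = sorted(
                base | sum(((j >> i) & 1) << d for i, d in enumerate(dims))
                for j in range(1 << k)
            )
            if not any(busy[v] for v in nodes):
                found.append(nodes)
    return found


# 3-cube in which only nodes 0 (000) and 2 (010) are free.
busy = [False, True, False, True, True, True, True, True]
print(buddy_allocate(list(busy), 1))   # buddy blocks are {0,1},{2,3},{4,5},{6,7}
print(free_subcubes(busy, 3, 1))       # nodes 0 and 2 differ in one bit
```

In this example the buddy scan fails even though nodes 0 and 2 form a free 1-dimensional subcube, which is the kind of incomplete recognition the abstract refers to.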

    Parallel Computation on Hypercube-Like Machines.

    The hypercube interconnection network has been recognized as very suitable for a parallel computing architecture due to its attractive topological properties. Recently, several modified hypercubes have been proposed to improve the performance of the hypercube. This dissertation deals with two modified hypercubes, the X-hypercube and the Z-cube. The X-hypercube is a variant of the hypercube with the same amount of hardware but a diameter of only $\lceil (n+1)/2 \rceil$, compared with $n$ for a hypercube of dimension $n$. The Z-cube has only 75 percent of the edges of a hypercube with the same number of vertices, and the same diameter as the hypercube. In this dissertation, we investigate some topological properties and the effectiveness of the X-hypercube and the Z-cube in their combinatorial and computational aspects. We give optimal or nearly optimal data communication algorithms, including routing, broadcasting, and census functions, for the X-hypercube and the Z-cube. We also give optimal embedding algorithms between the X-hypercube and the hypercube. It is shown that the average distance between vertices in an X-hypercube is roughly 13/16 of that in a hypercube. This implies that an X-hypercube achieves better average communication performance than a hypercube. In addition, a set of fundamental SIMD algorithms for the X-hypercube is given. Our results indicate that the X-hypercube improves performance over the hypercube, though not by as much as the reduction in diameter, and that the Z-cube is a good alternative to the hypercube when VLSI implementation is of major concern.
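
To put the figures quoted above in context, here is a small sketch that computes the baseline quantities for an ordinary hypercube and sets them beside the values the abstract states for the two variants. Only the hypercube numbers are computed. The X-hypercube diameter and the Z-cube edge count are taken directly from the abstract's formulas, and the function name `hypercube_stats` is hypothetical.

```python
from math import ceil


def hypercube_stats(n):
    """Return (diameter, average distance, edge count) of the n-dimensional hypercube.

    The distance between two nodes is the Hamming distance of their labels, and
    the hypercube is vertex-symmetric, so measuring from node 0 is sufficient.
    The average is taken over all 2**n nodes, including node 0 itself.
    """
    nodes = 1 << n
    dists = [bin(v).count("1") for v in range(nodes)]
    edges = n * (1 << (n - 1))          # each node has n links, each counted once
    return max(dists), sum(dists) / nodes, edges


for n in (4, 6, 8):
    diameter, avg, edges = hypercube_stats(n)
    x_diameter = ceil((n + 1) / 2)      # X-hypercube diameter, as stated in the abstract
    z_edges = 0.75 * edges              # Z-cube edge count, as stated in the abstract
    print(n, diameter, x_diameter, avg, edges, z_edges)
```

For n = 8, for instance, the hypercube diameter is 8 while the stated X-hypercube diameter is 5.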

    Fast algorithm for real-time rings reconstruction

    The GAP project is dedicated to studying the application of GPUs in several contexts in which real-time response is important for decision making. The definition of real time depends on the application under study, with response times ranging from μs up to several hours for very computing-intensive tasks. At this conference we presented our work on low-level triggers [1] [2] and high-level triggers [3] in high-energy physics experiments, and specific applications for nuclear magnetic resonance (NMR) [4] [5] and cone-beam CT [6]. Apart from the study of dedicated solutions to decrease the latency due to data transport and preparation, the computing algorithms play an essential role in any GPU application. In this contribution, we show an original algorithm developed for trigger applications to accelerate ring reconstruction in RICH detectors when it is not possible to have seeds for reconstruction from external trackers.

    HPCCP/CAS Workshop Proceedings 1998

    This publication is a collection of extended abstracts of presentations given at the HPCCP/CAS (High Performance Computing and Communications Program/Computational Aerosciences Project) Workshop held on August 24-26, 1998, at NASA Ames Research Center, Moffett Field, California. The objective of the Workshop was to bring together the aerospace high performance computing community, consisting of airframe and propulsion companies, independent software vendors, university researchers, and government scientists and engineers. The Workshop was sponsored by the HPCCP Office at NASA Ames Research Center. The Workshop consisted of over 40 presentations, including an overview of NASA's High Performance Computing and Communications Program and the Computational Aerosciences Project; ten sessions of papers representative of the high performance computing research conducted within the Program by the aerospace industry, academia, NASA, and other government laboratories; two panel sessions; and a special presentation by Mr. James Bailey.

    The Fifth NASA Symposium on VLSI Design

    The fifth annual NASA Symposium on VLSI Design had 13 sessions including Radiation Effects, Architectures, Mixed Signal, Design Techniques, Fault Testing, Synthesis, Signal Processing, and other Featured Presentations. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The presentations share insights into next generation advances that will serve as a basis for future VLSI design.

    Bibliography of Lewis Research Center technical publications announced in 1985

    This compilation of abstracts describes and indexes the technical reporting that resulted from the scientific and engineering work performed and managed by the Lewis Research Center in 1985. All the publications were announced in the 1985 issues of STAR (Scientific and Technical Aerospace Reports) and/or IAA (International Aerospace Abstracts). Included are research reports, journal articles, conference presentations, patents and patent applications, and theses.

    Optimal Simulation of Linear Multiprocessor Architectures on Multiply-twisted Cube Using Generalized Gray Codes

    We consider the problem of simulating linear arrays and rings on the multiply twisted cube. We introduce a new concept, the reflected link label sequence, and use it to define a generalized Gray code (GGC). We show that GGCs can easily be used to identify Hamiltonian paths and cycles in the multiply twisted cube. We also give a method for embedding a ring with an arbitrary number of nodes into the multiply twisted cube.
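
As background for the Gray-code idea, the sketch below uses the standard binary reflected Gray code on an ordinary hypercube, where it yields a Hamiltonian cycle and hence a ring embedding. This is not the generalized Gray code defined in the paper, and the multiply twisted cube has different links. The function names are hypothetical.

```python
def gray_ring(n):
    """Visit all 2**n hypercube nodes in binary reflected Gray code order."""
    return [i ^ (i >> 1) for i in range(1 << n)]


def is_hypercube_ring(ring):
    """Check that consecutive nodes (including last -> first) differ in exactly one bit."""
    return all(
        bin(ring[i] ^ ring[(i + 1) % len(ring)]).count("1") == 1
        for i in range(len(ring))
    )


ring = gray_ring(3)
print(ring)                      # [0, 1, 3, 2, 6, 7, 5, 4]
print(is_hypercube_ring(ring))   # True
```

Because every step of the sequence flips a single address bit, each ring edge maps onto one hypercube link.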

    Domänen parallele Maschinen

    A computational model is introduced that abstracts and idealizes computers with access to fragment shaders. While the set of functions computable by this model remains the same, running times can be drastically reduced through parallelization compared with conventional models. Some of the algorithms designed for the model can be approximated using fragment shaders. With an automatic transcompilation scheme, fragment shader programs can be generated automatically from a description in a high-level language.

    Fast Volume Rendering and Deformation Algorithms

    Volume rendering is a technique for simultaneous visualization of surfaces and inner structures of objects. However, the huge number of volume primitives (voxels) in a volume leads to high computational cost. In this dissertation I developed two algorithms for the acceleration of volume rendering and volume deformation. The first algorithm accelerates the ray casting of volumes. Previous ray casting acceleration techniques such as space-leaping and early-ray-termination are only efficient when most voxels in a volume are either opaque or transparent. When many voxels are semi-transparent, the rendering time increases considerably. Our new algorithm improves the performance of ray casting of semi-transparently mapped volumes by exploiting the opacity coherency in object space, leading to a speedup factor between 1.90 and 3.49 in rendering semi-transparent volumes. The acceleration is realized with the help of pre-computed coherency distances. We developed an efficient algorithm to encode the coherency information, which requires less than 12 seconds for data sets with about 8 million voxels. The second algorithm is for volume deformation. Unlike traditional methods, our method incorporates the two stages of volume deformation, i.e. deformation and rendering, into a unified process. Instead of deforming each voxel to generate an intermediate deformed volume, the algorithm follows inversely deformed rays to generate the desired deformation. The calculations and memory for generating the intermediate volume are thus saved. The deformation continuity is achieved by adaptive ray division, which matches the amplitude of local deformation. We proposed approaches for shading and opacity adjustment which guarantee the visual plausibility of deformation results. We achieve an additional deformation speedup factor of 2.34 to 6.58 by incorporating early-ray-termination, space-leaping and the coherency acceleration technique in the new deformation algorithm.
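
The remark that early-ray-termination helps little with semi-transparent data can be seen in a minimal sketch of front-to-back compositing along one ray. This illustrates the conventional technique only, not the coherency-based algorithm from the dissertation. The function name `composite_ray`, the scalar colour model, and the 0.95 cutoff are assumptions made for illustration.

```python
def composite_ray(samples, opacity_cutoff=0.95):
    """Front-to-back compositing of (color, alpha) samples with early ray termination.

    Returns (accumulated color, accumulated opacity, number of samples processed).
    """
    color, alpha, used = 0.0, 0.0, 0
    for c, a in samples:
        color += (1.0 - alpha) * a * c   # contribution attenuated by what lies in front
        alpha += (1.0 - alpha) * a
        used += 1
        if alpha >= opacity_cutoff:      # the rest of the ray is effectively hidden
            break
    return color, alpha, used


opaque_ray = [(1.0, 1.0)] + [(0.5, 0.5)] * 9
semi_transparent_ray = [(1.0, 0.1)] * 10

print(composite_ray(opaque_ray)[2])            # stops after the first sample
print(composite_ray(semi_transparent_ray)[2])  # must process all 10 samples
```

With an opaque first sample the loop exits immediately, whereas ten samples of opacity 0.1 never reach the cutoff, so every sample is processed.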