15 research outputs found

    Processor allocation strategies for modified hypercubes

    Get PDF
    Parallel processing has been widely accepted to be the future in high speed computing. Among the various parallel architectures proposed/implemented, the hypercube has shown a lot of promise because of its poweful properties, like regular topology, fault tolerance, low diameter, simple routing, and ability to efficiently emulate other architectures. The major drawback of the hypercube network is that it can not be expanded in practice because the number of communication ports for each processor grows as the logarithm of the total number of processors in the system. Therefore, once a hypercube supercomputer of a certain dimensionality has been built, any future expansions can be accomplished only by replacing the VLSI chips. This is an undesirable feature and a lot of work has been under progress to eliminate this stymie, thus providing a platform for easier expansion. Modified hypercubes (MHs) have been proposed as the building blocks of hypercube-based systems supporting incremental growth techniques without introducing extra resources for individual hypercubes. However, processor allocation on MHs proves to be a challenge due to a slight deviation in their topology from that of the standard hypercube network. This thesis addresses the issue of processor allocation on MHs and proposes various strategies which are based, partially or entirely, on table look-up approaches. A study of the various task allocation strategies for standard hypercubes is conducted and their suitability for MHs is evaluated. It is shown that the proposed strategies have a perfect subcube recognition ability and a superior performance. Existing processor allocation strategies for pure hypercube networks are demonstrated to be ineffective for MHs, in the light of their inability to recognize all available subcubes. A comparative analysis that involves the buddy strategy and the new strategies is carried out using simulation results

    Fault-Tolerant Ring Embeddings in Hypercubes -- A Reconfigurable Approach

    Get PDF
    We investigate the problem of designing reconfigurable embedding schemes for a fixed hypercube (without redundant processors and links). The fundamental idea for these schemes is to embed a basic network on the hypercube without fully utilizing the nodes on the hypercube. The remaining nodes can be used as spares to reconfigure the embeddings in case of faults. The result of this research shows that by carefully embedding the application graphs, the topological properties of the embedding can be preserved under fault conditions, and reconfiguration can be carried out efficiently. In this dissertation, we choose the ring as the basic network of interest, and propose several schemes for the design of reconfigurable embeddings with the aim of minimizing reconfiguration cost and performance degradation. The cost is measured by the number of node-state changes or reconfiguration steps needed for processing of the reconfiguration, and the performance degradation is characterized as the dilation of the new embedding after reconfiguration. Compared to the existing schemes, our schemes surpass the existing ones in terms of applicability of schemes and reconfiguration cost needed for the resulting embeddings

    Hierarchical gate-array routing on a hypercube multiprocessor

    Full text link
    Gate-arrays are the most common design style for semicustom VLSI integrated circuits. An important part of the gate-array design process is the routing of wires between the logic elements, which is an extremely compute-intensive operation. This paper presents an algorithm for routing gate-arrays that uses a hypercube connected parallel processor to provide the necessary computation power. In order to make optimal use of the hypercube, the routing algorithm is organized so that interprocessor communication is kept to minimum. It occurs only during the global routing and crossing placement phases of the algorithm, which constitute less than 15% of the total running time of the algorithm. On the basis of the results of executing the algorithm on two gate-array benchmarks the case is made for using hypercube multiprocessors as accelerators for compute-intensive CAD operations.Peer Reviewedhttp://deepblue.lib.umich.edu/bitstream/2027.42/28650/1/0000466.pd

    Diamond-based models for scientific visualization

    Get PDF
    Hierarchical spatial decompositions are a basic modeling tool in a variety of application domains including scientific visualization, finite element analysis and shape modeling and analysis. A popular class of such approaches is based on the regular simplex bisection operator, which bisects simplices (e.g. line segments, triangles, tetrahedra) along the midpoint of a predetermined edge. Regular simplex bisection produces adaptive simplicial meshes of high geometric quality, while simplifying the extraction of crack-free, or conforming, approximations to the original dataset. Efficient multiresolution representations for such models have been achieved in 2D and 3D by clustering sets of simplices sharing the same bisection edge into structures called diamonds. In this thesis, we introduce several diamond-based approaches for scientific visualization. We first formalize the notion of diamonds in arbitrary dimensions in terms of two related simplicial decompositions of hypercubes. This enables us to enumerate the vertices, simplices, parents and children of a diamond. In particular, we identify the number of simplices involved in conforming updates to be factorial in the dimension and group these into a linear number of subclusters of simplices that are generated simultaneously. The latter form the basis for a compact pointerless representation for conforming meshes generated by regular simplex bisection and for efficiently navigating the topological connectivity of these meshes. Secondly, we introduce the supercube as a high-level primitive on such nested meshes based on the atomic units within the underlying triangulation grid. We propose the use of supercubes to associate information with coherent subsets of the full hierarchy and demonstrate the effectiveness of such a representation for modeling multiresolution terrain and volumetric datasets. Next, we introduce Isodiamond Hierarchies, a general framework for spatial access structures on a hierarchy of diamonds that exploits the implicit hierarchical and geometric relationships of the diamond model. We use an isodiamond hierarchy to encode irregular updates to a multiresolution isosurface or interval volume in terms of regular updates to diamonds. Finally, we consider nested hypercubic meshes, such as quadtrees, octrees and their higher dimensional analogues, through the lens of diamond hierarchies. This allows us to determine the relationships involved in generating balanced hypercubic meshes and to propose a compact pointerless representation of such meshes. We also provide a local diamond-based triangulation algorithm to generate high-quality conforming simplicial meshes

    Automatic Methods for Hiding Latency in Parallel and Distributed Computation

    Get PDF
    In this paper we describe methods for mitigating the degradation in performance caused by high latencies in parallel and distributed networks. For example, given any dataflow type of algorithm that runs in T steps on an n-node ring with unit link delays, we show how to run the algorithm in O(T) steps on any n-node bounded-degree connected network with average link delay O(1). This is a significant improvement over prior approaches to latency hiding, which require slowdowns proportional to the maximum link delay. In the case when the network has average link delay dave, our simulation runs in O(√daveT) steps using n/√dave processors, thereby preserving efficiency. We also show how to efficiently simulate an n × n array with unit link delays using slowdown Õ (d&frac23ave) on a two-dimensional array with average link delay dave. Last, we present results for the case in which large local databases are involved in the computation

    Scalable Algorithms for Parallel Tree-based Adaptive Mesh Refinement with General Element Types

    Get PDF
    In this thesis, we develop, discuss and implement algorithms for scalable parallel tree-based adaptive mesh refinement (AMR) using space-filling curves (SFCs). We create an AMR software that works independently of the used element type, such as for example lines, triangles, tetrahedra, quadrilaterals, hexahedra, and prisms. For triangular and tetrahedral elements (simplices) with red-refinement (1:4 in 2D, 1:8 in 3D), we develop a new SFC, the tetrahedral Morton space-filling curve (TM-SFC). Its construction is similar to the Morton index for quadrilaterals/hexa- hedra, as it is also based on bitwise interleaving the coordinates of a certain vertex of the simplex, the anchor node. Additionally, we interleave with a new piece of information, the so called type. For these simplices, we develop element local algorithms such as constructing the parent, children, or face-neighbors of a simplex, and show that most of them are constant-time operations independent of the refinement level. With SFC based partitioning it is possible that the mesh elements that are parti- tioned to one process do not form a face-connected domain. We prove the following upper bounds for the number of face-connected components of segments of the TM-SFC: With a maximum refine- ment level of L, the number of face-connected components is bounded by 2(L − 1) in 2D and 2L + 1 in 3D. Additionally, we perform a numerical investigation of the distribution of lengths of SFC segments. Furthermore, we develop a new approach to partition and repartition a coarse (input) mesh among the processes. Compared to previous methods it optimizes for fine mesh load-balance and reduces the parallel communication of coarse mesh data. We discuss the coarse mesh repartitioning algorithm and demonstrate that our method repartitions a coarse mesh of 371e9 trees on 917,504 processes (405,000 trees per process) on the Juqueen supercomputer in 1.2 seconds. We develop an AMR concept that works independently of the element type; achieving this independence by strictly distinguishing between functions that oper- ate on the whole mesh (high-level) and functions that locally operate on a single element or a small set of elements (low-level). We discuss a new approach to generate and manage ghost elements that fits into our element-type independent approach. We define and describe the necessary low-level algorithms. Our main idea is the computation of tree-to-tree face-neighbors of an element via the explicit construction of the element's face as a lower dimensional element. In order to optimize the runtime of this method we enhance the algorithm with a top-down search method from Isaac, Burstedde, Wilcox, and Ghattas, and demonstrate how it speeds up the computation by factors of 10 to 20 achieving runtimes comparable to state-of-the art implementations with fixed element types. With the ghost algorithm we build a straight-forward ripple version of the 2:1 balance algorithm. This is not an optimized version but it serves as a feasibility study for our element-type independent approach. We implement all algorithms that we develop in this thesis in the new AMR library t8code. Our modular approach allows us to reuse existing software, which we demonstrate by using the library p4est for quadrilateral and hexahedral elements. In a concurrent Bachelor's thesis by David Knapp (INS, Bonn) the necessary low-level algorithms for prisms were developed. With t8code we demonstrate that we can create, adapt, (re-)partition, and balance meshes, as well as create and manage a ghost layer. In various tests we show excellent strong and weak scaling behavior of our algorithms on up to 917,504 parallel processes on the Juqueen and Mira supercomputers using up to 858e9 mesh elements. We conclude this thesis by demonstrating how an application can be coupled with the AMR routines. We implement a finite volume based advection solver using t8code and show applications with triangular, quadrilateral, tetrahedral, and hexahedral elements, as well as 2D and 3D hybrid meshes, the latter consisting of tetrahedra, hexahedra, and prisms. Overall, we develop and demonstrate a new simplicial SFC and create a fast and scalable tree-based AMR software that offers a flexibility and generality that was previously not available

    [Research activities in applied mathematics, fluid mechanics, and computer science]

    Get PDF
    This report summarizes research conducted at the Institute for Computer Applications in Science and Engineering in applied mathematics, fluid mechanics, and computer science during the period April 1, 1995 through September 30, 1995

    [Activity of Institute for Computer Applications in Science and Engineering]

    Get PDF
    This report summarizes research conducted at the Institute for Computer Applications in Science and Engineering in applied mathematics, fluid mechanics, and computer science

    LIPIcs, Volume 258, SoCG 2023, Complete Volume

    Get PDF
    LIPIcs, Volume 258, SoCG 2023, Complete Volum
    corecore