404 research outputs found

    Time-Optimal Algorithms on Meshes With Multiple Broadcasting

    Get PDF
    The mesh-connected computer architecture has emerged as a natural choice for solving a large number of computational tasks in image processing, computational geometry, and computer vision. However, due to its large communication diameter, the mesh tends to be slow when it comes to handling data transfer operations over long distances. In an attempt to overcome this problem, mesh-connected computers have recently been augmented by the addition of various types of bus systems. One such system known as the mesh with multiple broadcasting involves enhancing the mesh architecture by the addition of row and column buses. The mesh with multiple broadcasting has proven to be feasible to implement in VLSI, and is used in the DAP family of computers. In recent years, efficient algorithms to solve a number of computational problems on meshes with multiple broadcasting have been proposed in the literature. The problems considered in this thesis are semigroup computations, sorting, multiple search, various convexity-related problems, and some tree problems. Based on the size of the input data for the problem under consideration, existing results can be broadly classified into sparse and dense. Specifically, for a given √n x √n mesh with multiple broadcasting, we refer to problems involving m∈O(nm \in O(\sqrt{n}) items as sparse, while the case £ O(n) will be referred to as dense. Finally, the case corresponding to 2 ≤ m ≤ n is be termed general. The motivation behind the current work is twofold. First, time-optimal solutions are proposed for the problems listed above. Secondly, an attempt is made to remove the artificial limitation of problems studied to sparse and dense cases. To establish the time-optimality of the algorithms presented in this work, we use some existing lower bound techniques along with new ones that we develop. We solve the semigroup computation problem for the general case and present a novel lower bound argument. We solve the multiple search problem in the general case and present some surprising applications to computational geometry. In the case of sorting, the general case is defined to be slightly different. For the specified range of the size of input, we present a time and VLSI-optimal algorithm. We also present time lower bound results and matching algorithms for a number of convexity related and tree problems in the sparse case

    A Computational Paradigm on Network-Based Models of Computation

    Get PDF
    The maturation of computer science has strengthened the need to consolidate isolated algorithms and techniques into general computational paradigms. The main goal of this dissertation is to provide a unifying framework which captures the essence of a number of problems in seemingly unrelated contexts in database design, pattern recognition, image processing, VLSI design, computer vision, and robot navigation. The main contribution of this work is to provide a computational paradigm which involves the unifying framework, referred to as the multiple Query problem, along with a generic solution to the Multiple Query problem. To demonstrate the applicability of the paradigm, a number of problems from different areas of computer science are solved by formulating them in this framework. Also, to show practical relevance, two fundamental problems were implemented in the C language using MPI. The code can be ported onto many commercially available parallel computers; in particular, the code was tested on an IBM-SP2 and on a network of workstations

    Time-Optimal Tree Computations on Sparse Meshes

    Get PDF
    The main goal of this work is to fathom the suitability of the mesh with multiple broadcasting architecture (MMB) for some tree-related computations. We view our contribution at two levels: on the one hand, we exhibit time lower bounds for a number of tree-related problems on the MMB. On the other hand, we show that these lower bounds are tight by exhibiting time-optimal tree algorithms on the MMB. Specifically, we show that the task of encoding and/or decoding n-node binary and ordered trees cannot be solved faster than Ω(log n) time even if the MMB has an infinite number of processors. We then go on to show that this lower bound is tight. We also show that the task of reconstructing n-node binary trees and ordered trees from their traversais can be performed in O(1) time on the same architecture. Our algorithms rely on novel time-optimal algorithms on sequences of parentheses that we also develop

    Visibility-Related Problems on Parallel Computational Models

    Get PDF
    Visibility-related problems find applications in seemingly unrelated and diverse fields such as computer graphics, scene analysis, robotics and VLSI design. While there are common threads running through these problems, most existing solutions do not exploit these commonalities. With this in mind, this thesis identifies these common threads and provides a unified approach to solve these problems and develops solutions that can be viewed as template algorithms for an abstract computational model. A template algorithm provides an architecture independent solution for a problem, from which solutions can be generated for diverse computational models. In particular, the template algorithms presented in this work lead to optimal solutions to various visibility-related problems on fine-grain mesh connected computers such as meshes with multiple broadcasting and reconfigurable meshes, and also on coarse-grain multicomputers. Visibility-related problems studied in this thesis can be broadly classified into Object Visibility and Triangulation problems. To demonstrate the practical relevance of these algorithms, two of the fundamental template algorithms identified as powerful tools in almost every algorithm designed in this work were implemented on an IBM-SP2. The code was developed in the C language, using MPI, and can easily be ported to many commercially available parallel computers

    Geometric modeling for computer aided design

    Get PDF
    The primary goal of this grant has been the design and implementation of software to be used in the conceptual design of aerospace vehicles particularly focused on the elements of geometric design, graphical user interfaces, and the interaction of the multitude of software typically used in this engineering environment. This has resulted in the development of several analysis packages and design studies. These include two major software systems currently used in the conceptual level design of aerospace vehicles. These tools are SMART, the Solid Modeling Aerospace Research Tool, and EASIE, the Environment for Software Integration and Execution. Additional software tools were designed and implemented to address the needs of the engineer working in the conceptual design environment. SMART provides conceptual designers with a rapid prototyping capability and several engineering analysis capabilities. In addition, SMART has a carefully engineered user interface that makes it easy to learn and use. Finally, a number of specialty characteristics have been built into SMART which allow it to be used efficiently as a front end geometry processor for other analysis packages. EASIE provides a set of interactive utilities that simplify the task of building and executing computer aided design systems consisting of diverse, stand-alone, analysis codes. Resulting in a streamlining of the exchange of data between programs reducing errors and improving the efficiency. EASIE provides both a methodology and a collection of software tools to ease the task of coordinating engineering design and analysis codes

    A Multilevel Approach to Topology-Aware Collective Operations in Computational Grids

    Full text link
    The efficient implementation of collective communiction operations has received much attention. Initial efforts produced "optimal" trees based on network communication models that assumed equal point-to-point latencies between any two processes. This assumption is violated in most practical settings, however, particularly in heterogeneous systems such as clusters of SMPs and wide-area "computational Grids," with the result that collective operations perform suboptimally. In response, more recent work has focused on creating topology-aware trees for collective operations that minimize communication across slower channels (e.g., a wide-area network). While these efforts have significant communication benefits, they all limit their view of the network to only two layers. We present a strategy based upon a multilayer view of the network. By creating multilevel topology-aware trees we take advantage of communication cost differences at every level in the network. We used this strategy to implement topology-aware versions of several MPI collective operations in MPICH-G2, the Globus Toolkit[tm]-enabled version of the popular MPICH implementation of the MPI standard. Using information about topology provided by MPICH-G2, we construct these multilevel topology-aware trees automatically during execution. We present results demonstrating the advantages of our multilevel approach by comparing it to the default (topology-unaware) implementation provided by MPICH and a topology-aware two-layer implementation.Comment: 16 pages, 8 figure

    Mesh Connected Computers With Multiple Fixed Buses: Packet Routing, Sorting and Selection

    Get PDF
    Mesh connected computers have become attractive models of computing because of their varied special features. In this paper we consider two variations of the mesh model: 1) a mesh with fixed buses, and 2) a mesh with reconfigurable buses. Both these models have been the subject matter of extensive previous research. We solve numerous important problems related to packet routing, sorting, and selection on these models. In particular, we provide lower bounds and very nearly matching upper bounds for the following problems on both these models: 1) Routing on a linear array; and 2) k-k routing, k-k sorting, and cut through routing on a 2D mesh for any k ≥ 12. We provide an improved algorithm for 1-1 routing and a matching sorting algorithm. In addition we present greedy algorithms for 1-1 routing, k-k routing, cut through routing, and k-k sorting that are better on average and supply matching lower bounds. We also show that sorting can be performed in logarithmic time on a mesh with fixed buses. As a consequence we present an optimal randomized selection algorithm. In addition we provide a selection algorithm for the mesh with reconfigurable buses whose time bound is significantly better than the existing ones. Our algorithms have considerably better time bounds than many existing best known algorithms

    High performance Python for direct numerical simulations of turbulent flows

    Full text link
    Direct Numerical Simulations (DNS) of the Navier Stokes equations is an invaluable research tool in fluid dynamics. Still, there are few publicly available research codes and, due to the heavy number crunching implied, available codes are usually written in low-level languages such as C/C++ or Fortran. In this paper we describe a pure scientific Python pseudo-spectral DNS code that nearly matches the performance of C++ for thousands of processors and billions of unknowns. We also describe a version optimized through Cython, that is found to match the speed of C++. The solvers are written from scratch in Python, both the mesh, the MPI domain decomposition, and the temporal integrators. The solvers have been verified and benchmarked on the Shaheen supercomputer at the KAUST supercomputing laboratory, and we are able to show very good scaling up to several thousand cores. A very important part of the implementation is the mesh decomposition (we implement both slab and pencil decompositions) and 3D parallel Fast Fourier Transforms (FFT). The mesh decomposition and FFT routines have been implemented in Python using serial FFT routines (either NumPy, pyFFTW or any other serial FFT module), NumPy array manipulations and with MPI communications handled by MPI for Python (mpi4py). We show how we are able to execute a 3D parallel FFT in Python for a slab mesh decomposition using 4 lines of compact Python code, for which the parallel performance on Shaheen is found to be slightly better than similar routines provided through the FFTW library. For a pencil mesh decomposition 7 lines of code is required to execute a transform
    • …
    corecore