1,096 research outputs found

    Time-Optimal Algorithms on Meshes With Multiple Broadcasting

    Get PDF
    The mesh-connected computer architecture has emerged as a natural choice for solving a large number of computational tasks in image processing, computational geometry, and computer vision. However, due to its large communication diameter, the mesh tends to be slow when it comes to handling data transfer operations over long distances. In an attempt to overcome this problem, mesh-connected computers have recently been augmented by the addition of various types of bus systems. One such system known as the mesh with multiple broadcasting involves enhancing the mesh architecture by the addition of row and column buses. The mesh with multiple broadcasting has proven to be feasible to implement in VLSI, and is used in the DAP family of computers. In recent years, efficient algorithms to solve a number of computational problems on meshes with multiple broadcasting have been proposed in the literature. The problems considered in this thesis are semigroup computations, sorting, multiple search, various convexity-related problems, and some tree problems. Based on the size of the input data for the problem under consideration, existing results can be broadly classified into sparse and dense. Specifically, for a given √n x √n mesh with multiple broadcasting, we refer to problems involving m∈O(nm \in O(\sqrt{n}) items as sparse, while the case £ O(n) will be referred to as dense. Finally, the case corresponding to 2 ≤ m ≤ n is be termed general. The motivation behind the current work is twofold. First, time-optimal solutions are proposed for the problems listed above. Secondly, an attempt is made to remove the artificial limitation of problems studied to sparse and dense cases. To establish the time-optimality of the algorithms presented in this work, we use some existing lower bound techniques along with new ones that we develop. We solve the semigroup computation problem for the general case and present a novel lower bound argument. We solve the multiple search problem in the general case and present some surprising applications to computational geometry. In the case of sorting, the general case is defined to be slightly different. For the specified range of the size of input, we present a time and VLSI-optimal algorithm. We also present time lower bound results and matching algorithms for a number of convexity related and tree problems in the sparse case

    A Computational Paradigm on Network-Based Models of Computation

    Get PDF
    The maturation of computer science has strengthened the need to consolidate isolated algorithms and techniques into general computational paradigms. The main goal of this dissertation is to provide a unifying framework which captures the essence of a number of problems in seemingly unrelated contexts in database design, pattern recognition, image processing, VLSI design, computer vision, and robot navigation. The main contribution of this work is to provide a computational paradigm which involves the unifying framework, referred to as the multiple Query problem, along with a generic solution to the Multiple Query problem. To demonstrate the applicability of the paradigm, a number of problems from different areas of computer science are solved by formulating them in this framework. Also, to show practical relevance, two fundamental problems were implemented in the C language using MPI. The code can be ported onto many commercially available parallel computers; in particular, the code was tested on an IBM-SP2 and on a network of workstations

    Communication aspects of parallel processing

    Get PDF
    Cover title.Includes bibliographical references.Supported in part by the Air Force Office of Scientific Research. AFOSR-88-0032Cüneyt Özveren

    Time-Optimal Tree Computations on Sparse Meshes

    Get PDF
    The main goal of this work is to fathom the suitability of the mesh with multiple broadcasting architecture (MMB) for some tree-related computations. We view our contribution at two levels: on the one hand, we exhibit time lower bounds for a number of tree-related problems on the MMB. On the other hand, we show that these lower bounds are tight by exhibiting time-optimal tree algorithms on the MMB. Specifically, we show that the task of encoding and/or decoding n-node binary and ordered trees cannot be solved faster than Ω(log n) time even if the MMB has an infinite number of processors. We then go on to show that this lower bound is tight. We also show that the task of reconstructing n-node binary trees and ordered trees from their traversais can be performed in O(1) time on the same architecture. Our algorithms rely on novel time-optimal algorithms on sequences of parentheses that we also develop

    An improved generalization of mesh-connected computers with multiple buses

    Get PDF
    ©2001 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.Mesh-connected computers (MCCs) are a class of important parallel architectures due to their simple and regular interconnections. However, their performances are restricted by their large diameters. Various augmenting mechanisms have been proposed to enhance the communication efficiency of MCCs. One major approach is to add nonconfigurable buses for improved broadcasting. A typical example is the mesh-connected computer with multiple buses (MMB). We propose a new class of generalized MMBs, the improved generalized MMBs (IMMBs). We compare IMMBs with MMBs and a class of previously proposed generalized MMBs (GMMBs). We show the power of IMMBs by considering semigroup and prefix computations. Specifically, as our main result we show that for any constant 0<&epsiv;<1, one can construct an N½×N½ square IMMB using which semigroup and prefix computations on N operands can be carried out in O(N&epsiv;) time, while maintaining O(1) broadcasting time. Compared with the previous best complexities O(N&frac18;) and O(N&frac116;) achieved on a rectangular MMB and GMMB, respectively, for the same computations, our results show that IMMBs are more powerful than MMBs and GMMBsYi Pen; Zheng, S.Q.; Keqin Li; Hong She

    Expansion of the nodal-adjoint method for simple and efficient computation of the 2d tomographic imaging jacobian matrix

    Get PDF
    This paper focuses on the construction of the Jacobian matrix required in tomographic reconstruction algorithms. In microwave tomography, computing the forward solutions during the iterative reconstruction process impacts the accuracy and computational efficiency. Towards this end, we have applied the discrete dipole approximation for the forward solutions with significant time savings. However, while we have discovered that the imaging problem configuration can dramatically impact the computation time required for the forward solver, it can be equally beneficial in constructing the Jacobian matrix calculated in iterative image reconstruction algorithms. Key to this implementation, we propose to use the same simulation grid for both the forward and imaging domain discretizations for the discrete dipole approximation solutions and report in detail the theoretical aspects for this localization. In this way, the computational cost of the nodal adjoint method decreases by several orders of magnitude. Our investigations show that this expansion is a significant enhancement compared to previous implementations and results in a rapid calculation of the Jacobian matrix with a high level of accuracy. The discrete dipole approximation and the newly efficient Jacobian matrices are effectively implemented to produce quantitative images of the simplified breast phantom from the microwave imaging system

    Visibility-Related Problems on Parallel Computational Models

    Get PDF
    Visibility-related problems find applications in seemingly unrelated and diverse fields such as computer graphics, scene analysis, robotics and VLSI design. While there are common threads running through these problems, most existing solutions do not exploit these commonalities. With this in mind, this thesis identifies these common threads and provides a unified approach to solve these problems and develops solutions that can be viewed as template algorithms for an abstract computational model. A template algorithm provides an architecture independent solution for a problem, from which solutions can be generated for diverse computational models. In particular, the template algorithms presented in this work lead to optimal solutions to various visibility-related problems on fine-grain mesh connected computers such as meshes with multiple broadcasting and reconfigurable meshes, and also on coarse-grain multicomputers. Visibility-related problems studied in this thesis can be broadly classified into Object Visibility and Triangulation problems. To demonstrate the practical relevance of these algorithms, two of the fundamental template algorithms identified as powerful tools in almost every algorithm designed in this work were implemented on an IBM-SP2. The code was developed in the C language, using MPI, and can easily be ported to many commercially available parallel computers

    High-performance computing for vision

    Get PDF
    Vision is a challenging application for high-performance computing (HPC). Many vision tasks have stringent latency and throughput requirements. Further, the vision process has a heterogeneous computational profile. Low-level vision consists of structured computations, with regular data dependencies. The subsequent, higher level operations consist of symbolic computations with irregular data dependencies. Over the years, many approaches to high-speed vision have been pursued. VLSI hardware solutions such as ASIC's and digital signal processors (DSP's) have provided good processing speeds on structured low-level vision tasks. Special purpose systems for vision have also been designed. Currently, there is growing interest in using general purpose parallel systems for vision problems. These systems offer advantages of higher performance, sofavare programmability, generality, and architectural flexibility over the earlier approaches. The choice of low-cost commercial-off-theshelf (COTS) components as building blocks for these systems leads to easy upgradability and increased system life. The main focus of the paper is on effectively using the COTSbased general purpose parallel computing platforms to realize high-speed implementations of vision tasks. Due to the successful use of the COTS-based systems in a variety of high performance applications, it is attractive to consider their use for vision applications as well. However, the irregular data dependencies in vision tasks lead to large communication overheads in the HPC systems. At the University of Southern California, our research efforts have been directed toward designing scalable parallel algorithms for vision tasks on the HPC systems. In our approach, we use the message passing programming model to develop portable code. Our algorithms are specified using C and MPI. In this paper, we summarize our efforts, and illustrate our approach using several example vision tasks. To facilitate the analysis and development of scalable algorithms, a realistic computational model of the parallel system must be used. Several such models have been proposed in the literature. We use the General-purpose Distributed Memory (GDM) model which is a simple but realistic model of state-of-theart parallel machines. Using the GDM model, generic algorithmic techniques such as data remapping, overlapping of communication with computation, message packing, asynchronous execution, and communication scheduling are developed. Using these techniques, we have developed scalable algorithms for many vision tasks. For instance, a scalable algorithm for linear approximation has been developed using the asynchronous execution technique. Using this algorithm, linear feature extraction can be performed in 0.065 s on a 64 node SP-2 for a 512 × 512 image. A serial implementation takes 3.45 s for the same task. Similarly, the communication scheduling and decomposition techniques lead to a scalable algorithm for the line grouping task. We believe that such an algorithmic approach can result in the development of scalable and portable solutions for vision tasks. © 1996 IEEE Publisher Item Identifier S 0018-9219(96)04992-4.published_or_final_versio
    • …
    corecore