    Algorithms for Mapping Parallel Processes onto Grid and Torus Architectures

    Static mapping is the assignment of parallel processes to the processing elements (PEs) of a parallel system, where the assignment does not change during the application's lifetime. In our scenario we model an application's computations and their dependencies by an application graph. This graph is first partitioned into (nearly) equally sized blocks. These blocks need to communicate at block boundaries. To assign the processes to PEs, our goal is to compute a communication-efficient bijective mapping between the blocks and the PEs. This approach of partitioning followed by bijective mapping has many degrees of freedom. Thus, users and developers of parallel applications need to know more about which choices work for which application graphs and which parallel architectures. To this end, we not only develop new mapping algorithms (derived from known greedy methods), but also perform extensive experiments involving different classes of application graphs (meshes and complex networks), architectures of parallel computers (grids and tori), and different partitioners and mapping algorithms. Surprisingly, the quality of the partitions, unless very poor, has little influence on the quality of the mapping. More importantly, one of our new mapping algorithms always yields the best results in terms of the quality measure maximum congestion when the application graphs are complex networks. When the application graphs are meshes, this mapping algorithm always leads in terms of both maximum congestion and maximum dilation, another common quality measure.
    Comment: Accepted at PDP-201
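
    As a concrete illustration of the two quality measures, the sketch below computes the maximum dilation and maximum congestion of a given bijective mapping, routing each application edge along one shortest path in the architecture graph. This is a minimal sketch using networkx, not the paper's implementation; the block graph, grid size, and mapping shown are illustrative assumptions.

        import networkx as nx
        from collections import Counter

        def mapping_quality(app_graph, arch_graph, mapping):
            """Return (max dilation, max congestion) for a bijective
            mapping of application blocks onto processing elements."""
            edge_load = Counter()
            max_dilation = 0
            for u, v in app_graph.edges():
                # Route each block-to-block message along one shortest path.
                path = nx.shortest_path(arch_graph, mapping[u], mapping[v])
                max_dilation = max(max_dilation, len(path) - 1)
                for a, b in zip(path, path[1:]):
                    edge_load[frozenset((a, b))] += 1  # load on each undirected link
            max_congestion = max(edge_load.values(), default=0)
            return max_dilation, max_congestion

        # Illustrative example: map a 4-cycle of blocks onto a 2x2 grid of PEs.
        blocks = nx.cycle_graph(4)
        grid = nx.grid_2d_graph(2, 2)
        mapping = dict(zip(blocks.nodes(), grid.nodes()))
        print(mapping_quality(blocks, grid, mapping))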

    Homogeneous and Heterogeneous Parallel Architectures in Real-Time Signal Processing and Control

    This paper presents an investigation into the real-time performance of homogeneous and heterogeneous parallel architectures in signal processing and control applications. Several algorithms of regular and irregular nature are considered. These are implemented on a number of uniprocessor and multi-processor parallel architectures. Hardware and software resources, the capabilities of the architectures, and the characteristics of the algorithms are considered for suitable matching between the algorithms and the architectures. The partitioning and mapping of the algorithms onto the architectures and multi-processor communication techniques are investigated. Finally, a comparison of the results of the implementations is made to establish the merits of designing and developing parallel architectures for real-time signal processing and control applications.

    Multilevel Algebraic Approach for Performance Analysis of Parallel Algorithms

    In order to solve a problem in parallel, we must undertake the fundamental step of splitting the computational tasks into parts, i.e., decomposing the problem to be solved. An arbitrary decomposition does not necessarily lead to a parallel algorithm with the highest performance. This topic is even more important when complex parallel algorithms must be developed for hybrid or heterogeneous architectures. We present an innovative approach which starts from a decomposition of the problem into parts (sub-problems). These parts are regarded as elements of an algebraic structure and are related to each other according to a suitably defined dependency relationship. The main outcome of this framework is the definition of a set of block matrices (dependency, decomposition, memory accesses, and execution) which highlight fundamental characteristics of the corresponding algorithm, such as its inherent parallelism and its sources of overhead. We provide a mathematical formulation of this approach, and we perform a feasibility analysis of the performance of a parallel algorithm in terms of its time complexity and scalability. We compare our results with the standard expressions for speedup, efficiency, overhead, and so on. Finally, we show how the multilevel structure of this framework eases the choice of the abstraction level (both for the problem decomposition and for the algorithm description) used to determine the granularity of the tasks within the performance analysis. This feature helps in understanding the mapping of parallel algorithms onto novel hybrid and heterogeneous architectures.
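
    For reference, the standard expressions the framework is compared against can be made concrete with a short worked example. The timings below are illustrative assumptions, not results from the paper.

        def metrics(t_serial, t_parallel, p):
            """Classical performance measures for p processors."""
            speedup = t_serial / t_parallel        # S = T1 / Tp
            efficiency = speedup / p               # E = S / p
            overhead = p * t_parallel - t_serial   # To = p * Tp - T1
            return speedup, efficiency, overhead

        # e.g. a computation taking 100 s serially and 30 s on 4 processors:
        s, e, o = metrics(100.0, 30.0, 4)
        print(f"speedup={s:.2f}, efficiency={e:.2f}, overhead={o:.0f} s")
        # speedup=3.33, efficiency=0.83, overhead=20 s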

    The Riemannian Geometry of Deep Generative Models

    Deep generative models learn a mapping from a low dimensional latent space to a high-dimensional data space. Under certain regularity conditions, these models parameterize nonlinear manifolds in the data space. In this paper, we investigate the Riemannian geometry of these generated manifolds. First, we develop efficient algorithms for computing geodesic curves, which provide an intrinsic notion of distance between points on the manifold. Second, we develop an algorithm for parallel translation of a tangent vector along a path on the manifold. We show how parallel translation can be used to generate analogies, i.e., to transport a change in one data point into a semantically similar change of another data point. Our experiments on real image data show that the manifolds learned by deep generative models, while nonlinear, are surprisingly close to zero curvature. The practical implication is that linear paths in the latent space closely approximate geodesics on the generated manifold. However, further investigation into this phenomenon is warranted, to identify if there are other architectures or datasets where curvature plays a more prominent role. We believe that exploring the Riemannian geometry of deep generative models, using the tools developed in this paper, will be an important step in understanding the high-dimensional, nonlinear spaces these models learn.
    Comment: 9 page
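
    The practical implication above, that linear latent paths closely approximate geodesics on a near-flat generated manifold, can be sketched as follows. The decode function is a hypothetical stand-in for a trained generator, not the paper's model.

        import numpy as np

        def latent_line_path(decode, z0, z1, n_steps=10):
            """Decode points along the straight latent segment from z0 to z1;
            on a near-flat generated manifold this approximates a geodesic."""
            ts = np.linspace(0.0, 1.0, n_steps)
            return [decode((1.0 - t) * z0 + t * z1) for t in ts]

        # Illustrative stand-in for a generator (a fixed nonlinear map,
        # not a trained model):
        decode = lambda z: np.tanh(z)
        path = latent_line_path(decode, np.zeros(2), np.ones(2))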

    The Honeycomb Architecture: Prototype Analysis and Design

    Due to the inherent potential of parallel processing, a lot of attention has focused on massively parallel computer architecture. To a large extent, the performance of a massively parallel architecture is a function of the flexibility of its communication network. The ability to configure the topology of the machine determines the ease with which problems are mapped onto the architecture. If the machine is sufficiently flexible, the architecture can be configured to match the natural structure of a wide range of problems. There are essentially four unique types of massively parallel architectures:
    1. Cellular Arrays
    2. Lattice Architectures [21, 30]
    3. Connection Architectures [19]
    4. Honeycomb Architectures [24]
    All four architectures are classified as SIMD. Each, however, offers a slightly different solution to the mapping problem. The first three approaches are characterized by easily distinguishable processor, communication, and memory components. In contrast, the Honeycomb architecture contains multipurpose processing/communication/memory cells. Each cell can function as either a simple CPU, a memory cell, or an element of a communication bus.
    The conventional approach to massive parallelism is the cellular array. It typically consists of an array of processing elements arranged in a mesh pattern with hard-wired connections between neighboring processors. Due to their fixed topology, cellular arrays impose severe limitations upon interprocessor communication. The lattice architecture is a somewhat more flexible approach to massive parallelism. It consists of a lattice of processing elements embedded in an array of simple switching elements. The switching elements form a programmable interconnection network. A lattice architecture can be configured in a number of different topologies, but it is still only a partial solution to the mapping problem. The connection architecture offers a comprehensive solution to the mapping problem. It consists of a cellular array integrated into a packet-switched communication network. The network provides transparent communication between all processing elements. Note that the communication network is physically abstracted from the processor array, allowing the processors to evolve independently of the network.
    The Honeycomb architecture offers a unique solution to the mapping problem. It consists of an array of identical processing/communication/memory cells. Each cell can function as either a processor cell, a communication cell, or a memory cell. Collections of Honeycomb cells can be grouped into multi-cell CPUs, multi-cell memories, or multi-cell CPU-memory systems. Multi-cell CPU-memory systems are hereafter referred to as processing clusters. The topology of the Honeycomb is determined at compilation time. During a preprocessing phase, the Honeycomb is adjusted to the desired topology. The Honeycomb cell is extremely simple, capable of only simple arithmetic and logic operations. The simplicity of the Honeycomb cell is the key to the Honeycomb concept.
    As indicated in [24], there are two main research avenues to pursue in furthering the Honeycomb concept:
    1. Analyzing the design of a uniform Honeycomb cell
    2. Mapping algorithms onto the Honeycomb architecture
    This technical report concentrates on the first issue. While alluded to throughout the report, the second issue is not addressed in any detail.
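
    The role-per-cell idea can be illustrated with a small configuration sketch. This is an assumed toy model of the preprocessing phase described above, not the report's design; the layout rule is invented purely for illustration.

        from enum import Enum

        class Role(Enum):
            CPU = "processor"
            MEMORY = "memory"
            COMM = "communication"

        def configure(width, height, role_of):
            """Assign one of the three roles to every Honeycomb cell;
            role_of maps a coordinate (x, y) to a Role."""
            return {(x, y): role_of(x, y)
                    for x in range(width) for y in range(height)}

        # Invented example topology: processing clusters (CPU and memory
        # columns) separated by columns acting as communication buses.
        def role_of(x, y):
            if x % 3 == 2:
                return Role.COMM
            return Role.CPU if x % 3 == 0 else Role.MEMORY

        cells = configure(6, 4, role_of)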

    Heterogeneous and Homogeneous Parallel Architectures for Real-Time Adaptive Active Vibration Control

    This paper presents an investigation into parallel processing techniques for real-time adaptive control of a flexible beam structure. Three different algorithms, namely simulation, control, and identification, are involved in the adaptive control algorithm. These are implemented on a number of computing platforms, including a homogeneous network of transputer nodes, a homogeneous network of digital signal processing (DSP) devices, heterogeneous architectures involving transputers, a reduced instruction set computer (RISC) superscalar processor and a DSP device, single DSP devices and transputer nodes, and several general-purpose sequential processors. The partitioning and mapping of the algorithms on the homogeneous and heterogeneous architectures is also explored. The inter-processor communication speed is investigated to establish the real-time performance aspects of the processors on the basis of the nature of the algorithms involved. A close investigation into the performance of several compilers is made and discussed within the context of real-time implementations. Finally, a comparison of the results of the implementations is made, on the basis of real-time communication performance, computation performance, and compiler performance, to establish the merits of designing parallel systems that incorporate fast processing techniques for real-time control applications.

    A survey of parallel algorithms for fractal image compression

    This paper presents a short survey of the key research work that has been undertaken in the application of parallel algorithms to fractal image compression. The interest in fractal image compression techniques stems from their ability to achieve high compression ratios whilst maintaining very high quality in the reconstructed image. The main drawback of this compression method is the very high computational cost associated with the encoding phase. Consequently, there has been significant interest in exploiting parallel computing architectures in order to speed up this phase, whilst still maintaining the advantageous features of the approach. This paper presents a brief introduction to fractal image compression, including the iterated function system theory upon which it is based, and then reviews the different techniques that have been, and can be, applied in order to parallelize the compression algorithm.
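
    The reason the encoding phase parallelizes so naturally is that the search it performs for each range block, over all candidate domain blocks, is independent of every other range block. The sketch below is a heavily simplified grayscale illustration (it omits domain-block decimation and the usual eight symmetry transforms) and is not drawn from any of the surveyed implementations.

        import numpy as np
        from multiprocessing import Pool

        def best_match(args):
            """For one range block, find the domain block minimizing the
            error under an affine fit: range ~ s * domain + o."""
            range_block, domains = args
            best = None
            for idx, d in enumerate(domains):
                # Least-squares fit of the contrast s and brightness o.
                s, o = np.polyfit(d.ravel(), range_block.ravel(), 1)
                err = float(np.sum((s * d + o - range_block) ** 2))
                if best is None or err < best[0]:
                    best = (err, idx, s, o)
            return best

        def encode(ranges, domains, workers=4):
            # Each range block's search is independent: farm them out.
            with Pool(workers) as pool:
                return pool.map(best_match, [(r, domains) for r in ranges])

        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            ranges = [rng.random((4, 4)) for _ in range(8)]
            domains = [rng.random((4, 4)) for _ in range(32)]
            print(encode(ranges, domains)[0])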