386 research outputs found

Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review

Author: Gujarathi Hemal S
McDonald-Maier Klaus D
Qadri Muhammad Yasir
Publication venue: 'Academy Publisher'
Publication date: 01/01/2009

Technological evolution has significantly increased the number of transistors for a given die area and raised switching speeds from a few MHz to the GHz range. This decline in feature size and boost in performance consequently demand a lower supply voltage and effective control of power dissipation in chips with millions of transistors. This has triggered a substantial amount of research into power reduction techniques for almost every aspect of the chip, particularly the processor cores it contains. This paper presents an overview of techniques for achieving power efficiency, mainly at the processor core level, but also visits related domains such as buses and memories. Various processor parameters and features, such as supply voltage, clock frequency, cache and pipelining, can be optimized to reduce the power consumption of the processor. This paper discusses various ways in which these parameters can be optimized. Emerging power-efficient processor architectures are also reviewed, and research activities are discussed, which should help the reader identify how these factors in a processor contribute to power consumption. Some of these concepts are already established, whereas others are still active research areas. © 2009 ACADEMY PUBLISHER

University of Essex Research Repository

CiteSeerX

Crossref

A constant time parallel algorithm for the triangularization of a sparse matrix using CD-PARBS

Author: Chaudhari N. S.
Fehr Elfriede
Wankar Rajeev
Publication date: 01/01/2000

An algorithm is presented for the triangularization of a matrix whose graph is a directed acyclic graph (DAG). One algorithm for obtaining this special form was given by Sargent and Westerberg. Their approach works well in practice but is sequential in nature and cannot be parallelised easily. In this work we present a parallel algorithm based on the following observation: if we find the transitive closure matrix of a directed acyclic graph, count the number of entries in each row, sort the rows in ascending order of these counts and rank them accordingly, we get a lower triangular matrix. We show that all these operations can be done in constant time using a 3-d CD-PARBS (Complete Directed PARBS). The same approach can be used for the block cases, producing the same relabelling as Tarjan’s algorithm, in constant time. To the best of our knowledge, it is the first approach to solve such problems using directed PARBS.
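The observation in this abstract (transitive closure, row counts, sort, rank) can be illustrated with a small sequential sketch. This is not the constant-time CD-PARBS algorithm from the paper; it is a plain Python illustration that assumes an off-diagonal nonzero entry `matrix[i][j]` stands for an edge from `i` to `j`, and all function names are hypothetical.

```python
def transitive_closure(adj):
    """Warshall's algorithm on a boolean adjacency matrix (sequential)."""
    n = len(adj)
    reach = [row[:] for row in adj]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


def triangularizing_order(matrix):
    """Return a vertex order that makes a DAG-structured matrix lower triangular.

    Assumption: an off-diagonal nonzero matrix[i][j] is an edge i -> j.
    """
    n = len(matrix)
    adj = [[i != j and matrix[i][j] != 0 for j in range(n)] for i in range(n)]
    reach = transitive_closure(adj)
    # Count the entries in each row of the closure (vertices reachable from i).
    counts = [sum(row) for row in reach]
    # Sort in ascending order of the counts; ties are broken by index.
    return sorted(range(n), key=lambda i: (counts[i], i))


def permute(matrix, order):
    """Apply the same relabelling to rows and columns."""
    return [[matrix[i][j] for j in order] for i in order]


def is_lower_triangular(matrix):
    n = len(matrix)
    return all(matrix[i][j] == 0 for i in range(n) for j in range(i + 1, n))


# Edges implied by the nonzero pattern: 0 -> 1, 2 -> 0, 2 -> 1.
A = [[1, 2, 0],
     [0, 3, 0],
     [4, 5, 6]]
order = triangularizing_order(A)
L = permute(A, order)
```

The sort works because an edge from `i` to `j` in a DAG means `i` reaches strictly more vertices than `j`, so `j` is ranked earlier and the entry lands below the diagonal.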

Institutional Repository of the Freie Universität Berlin

Visibility-Related Problems on Parallel Computational Models

Author: Gurla Himabindu
Publication venue: ODU Digital Commons
Publication date: 01/04/1996

Visibility-related problems find applications in seemingly unrelated and diverse fields such as computer graphics, scene analysis, robotics and VLSI design. While there are common threads running through these problems, most existing solutions do not exploit these commonalities. With this in mind, this thesis identifies these common threads, provides a unified approach to solving these problems, and develops solutions that can be viewed as template algorithms for an abstract computational model. A template algorithm provides an architecture-independent solution for a problem, from which solutions can be generated for diverse computational models. In particular, the template algorithms presented in this work lead to optimal solutions to various visibility-related problems on fine-grain mesh-connected computers, such as meshes with multiple broadcasting and reconfigurable meshes, and also on coarse-grain multicomputers. The visibility-related problems studied in this thesis can be broadly classified into Object Visibility and Triangulation problems. To demonstrate the practical relevance of these algorithms, two of the fundamental template algorithms, identified as powerful tools in almost every algorithm designed in this work, were implemented on an IBM-SP2. The code was developed in the C language, using MPI, and can easily be ported to many commercially available parallel computers.

Old Dominion University

Efficient parallel processing with optical interconnections

Author: Hai Lili
Publication venue: Digital Commons @ NJIT
Publication date: 31/05/1997

With the advances in VLSI technology, it is now possible to build chips which can each contain thousands of processors. The efficiency of such chips in executing parallel algorithms heavily depends on the interconnection topology of the processors. It is not possible to build a fully interconnected network of processors with constant fan-in/fan-out using electrical interconnections. Free space optics is a remedy to this limitation. Qualities exclusive to the optical medium are its ability to be directed for propagation in free space and the property that optical channels can cross in space without any interference. In this thesis, we present an electro-optical interconnected architecture named Optical Reconfigurable Mesh (ORM). It is based on an existing optical model of computation. There are two layers in the architecture. The processing layer is a reconfigurable mesh and the deflecting layer contains optical devices to deflect light beams. ORM provides three types of communication mechanisms. The first is for arbitrary planar connections among sets of locally connected processors using the reconfigurable mesh. The second is for arbitrary connections among N of the processors using the electrical buses on the processing layer and N^2 fixed passive deflecting units on the deflection layer. The third is for arbitrary connections among any of the N^2 processors using the N^2 mechanically reconfigurable deflectors in the deflection layer. The third type of communication mechanism is significantly slower than the other two. Therefore, it is desirable to avoid reconfiguring this type of communication during the execution of the algorithms. Instead, the optical reconfiguration can be done before the execution of each algorithm begins. Determining a right configuration that would be suitable for the entire configuration of a task execution is studied in this thesis. The basic data movements for each of the mechanisms are studied. Finally, to show the power of ORM, we use all three types of communication mechanisms in the first O(log N) time algorithm for finding the convex hulls of all figures in an N x N binary image presented in this thesis.

Digital Commons @ New Jersey Institute of Technology (NJIT)

Design and Analysis of Optical Interconnection Networks for Parallel Computation.

Author: Li Yueming
Publication venue: LSU Digital Commons
Publication date: 01/01/1997

In this doctoral research, we propose several novel protocols and topologies for the interconnection of massively parallel processors. These new technologies achieve considerable improvements in system performance and structure simplicity. Currently, synchronous protocols are used in optical TDM buses. The major disadvantage of a synchronous protocol is the waste of packet slots. To offset this inherent drawback of synchronous TDM, a pipelined asynchronous TDM optical bus is proposed. The simulation results show that the performance of the proposed bus is significantly better than that of known pipelined synchronous TDM optical buses. Practically, the computation power of the plain TDM protocol is limited. Various extensions must be added to the system. In this research, a new pipelined optical TDM bus for implementing a linear array parallel computer architecture is proposed. The switches on the receiving segment of the bus can be dynamically controlled, which makes the system highly reconfigurable. To build large and scalable systems, we need new network architectures that are suitable for optical interconnections. A new kind of reconfigurable bus called segmented bus is introduced to achieve a simpler structure and increased concurrency. We show that parallel architectures based on segmented buses are versatile by showing that they can simulate parallel communication patterns supported by a wide variety of networks with small slowdown factors. New kinds of interconnection networks, the hypernetworks, have been proposed recently. Compared with point-to-point networks, they allow for increased resource-sharing and communication bandwidth utilization, and they are especially suitable for optical interconnects. One way to derive a hypernetwork is by finding the dual of a point-to-point network. The hypercube Q_n, where n is the dimension, is a very popular point-to-point network. It is interesting to construct hypernetworks from the dual Q_n^* of the hypercube Q_n. In this research, the properties of Q_n^* are investigated and a set of fundamental data communication algorithms for Q_n^* are presented. The results indicate that the Q_n^* hypernetwork is a useful and promising interconnection structure for high-performance parallel and distributed computing systems.
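As a rough illustration of deriving a hypernetwork from the dual of the hypercube, the sketch below builds the dual under the usual hypergraph-duality assumption: every link of the hypercube becomes a node, and every hypercube vertex becomes a hyperlink joining the links incident to it. The thesis may define its construction differently; the function names here are hypothetical.

```python
def hypercube_edges(n):
    """List the links of the n-dimensional hypercube as (u, v) pairs with u < v."""
    edges = []
    for u in range(1 << n):
        for bit in range(n):
            v = u ^ (1 << bit)  # neighbours differ in exactly one bit
            if u < v:
                edges.append((u, v))
    return edges


def dual_hypernetwork(n):
    """Build the dual of the n-cube under the assumed duality.

    Returns (nodes, hyperlinks): nodes are the hypercube's links, and
    hyperlinks maps each hypercube vertex to the indices of its incident links.
    """
    nodes = hypercube_edges(n)
    hyperlinks = {u: [] for u in range(1 << n)}
    for idx, (u, v) in enumerate(nodes):
        hyperlinks[u].append(idx)
        hyperlinks[v].append(idx)
    return nodes, hyperlinks


nodes, hyperlinks = dual_hypernetwork(3)
```

Under this assumption the dual of the n-cube has n * 2^(n-1) nodes and 2^n hyperlinks, each hyperlink connects n nodes, and every node sits on exactly two hyperlinks.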

Louisiana State University

Center for Aeronautics and Space Information Sciences

Author: Flynn Michael J.

This report summarizes the research done during 1991/92 under the Center for Aeronautics and Space Information Science (CASIS) program. The topics covered are computer architecture, networking, and neural nets

NASA Technical Reports Server

Reconfigurable acceleration of genetic sequence alignment: A survey of two decades of efforts

Author: abelsson
arram
buhler
burrows
court
cret
draghicescu
ferragina
jacobi
li
li
lin
preußer
preußer
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 04/09/2017

Genetic sequence alignment has always been a computational challenge in bioinformatics. Depending on the problem size, software-based aligners can take multiple CPU-days to process the sequence data, creating a bottleneck in the bioinformatic analysis flow. Reconfigurable accelerators can achieve high performance for such computation by providing massive parallelism, but at the expense of programming flexibility, and thus have not been commensurately used by practitioners. Therefore, this paper aims to provide a thorough survey of the proposed accelerators by giving a qualitative categorization based on their algorithms and speedup. A comprehensive comparison between works is also presented to guide selection for biologists and to provide insight into future research directions for FPGA scientists.

Crossref

Spiral - Imperial College Digital Repository

An improved generalization of mesh-connected computers with multiple buses

Author: Li K.
Pan Y.
Shen H.
Zheng S.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2001

©2001 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

Mesh-connected computers (MCCs) are a class of important parallel architectures due to their simple and regular interconnections. However, their performances are restricted by their large diameters. Various augmenting mechanisms have been proposed to enhance the communication efficiency of MCCs. One major approach is to add nonconfigurable buses for improved broadcasting. A typical example is the mesh-connected computer with multiple buses (MMB). We propose a new class of generalized MMBs, the improved generalized MMBs (IMMBs). We compare IMMBs with MMBs and a class of previously proposed generalized MMBs (GMMBs). We show the power of IMMBs by considering semigroup and prefix computations. Specifically, as our main result we show that for any constant 0 < ε < 1, one can construct an N^(1/2) × N^(1/2) square IMMB using which semigroup and prefix computations on N operands can be carried out in O(N^ε) time, while maintaining O(1) broadcasting time. Compared with the previous best complexities O(N^(1/8)) and O(N^(1/16)) achieved on a rectangular MMB and GMMB, respectively, for the same computations, our results show that IMMBs are more powerful than MMBs and GMMBs.
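For readers unfamiliar with the two operations named in this abstract, here is a minimal sequential sketch of what semigroup and prefix computations produce for an associative operator. The blocked variant mimics, only loosely, how a mesh might combine local results per row and then across rows; it is a generic textbook scheme, not the IMMB algorithm, and all function names are hypothetical.

```python
from functools import reduce
from itertools import accumulate
import operator


def semigroup(values, op):
    """Combine all operands with an associative operator: a1 op a2 op ... op aN."""
    return reduce(op, values)


def prefix(values, op):
    """Return every running result: a1, a1 op a2, ..., a1 op ... op aN."""
    return list(accumulate(values, op))


def blocked_prefix(values, op, block):
    """Three-phase prefix computation over fixed-size blocks.

    Phase 1: prefixes inside each block (think: one mesh row each).
    Phase 2: prefixes over the block totals.
    Phase 3: fold the preceding total into every later block.
    """
    blocks = [values[i:i + block] for i in range(0, len(values), block)]
    local = [list(accumulate(b, op)) for b in blocks]
    totals = list(accumulate((b[-1] for b in local), op))
    out = list(local[0]) if local else []
    for k in range(1, len(local)):
        # The offset goes on the left so non-commutative operators still work.
        out.extend(op(totals[k - 1], x) for x in local[k])
    return out


data = [3, 1, 4, 1, 5, 9, 2, 6, 5]
sums = prefix(data, operator.add)
largest = semigroup(data, max)
blocked = blocked_prefix(data, operator.add, 3)
```

The blocked version relies only on associativity, so it gives the same result as the direct prefix for any block size, including for non-commutative operators such as string concatenation.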

Crossref

Adelaide Research & Scholarship

Adaptive AT2 optimal algorithms on reconfigurable meshes

Author: Murshed M. Manzur
Publication venue: Canberra: The Australian National University, Dept. of Computer Science, Faculty of Engineering and Information Technology
Publication date: 25/05/2022

The Australian National University