Search CORE

1,817 research outputs found

Mapping and Scheduling of Directed Acyclic Graphs on An FPFA Tile

Author: Guo Y.
Smit G.J.M.
Publication venue: STW Technology Foundation
Publication date: 01/01/2002
Field of study

An architecture for a hand-held multimedia device requires components that are energy-efficient, flexible, and provide high performance. In the CHAMELEON [4] project we develop a coarse grained reconfigurable device for DSP-like algorithms, the so-called Field Programmable Function Array (FPFA). The FPFA devices are reminiscent to FPGAs, but with a matrix of Processing Parts (PP) instead of CLBs. The design of the FPFA focuses on: (1) Keeping each PP small to maximize the number of PPs that can fit on a chip; (2) providing sufficient flexibility; (3) Low energy consumption; (4) Exploiting the maximum amount of parallelism; (5) A strong support tool for FPFA-based applications. The challenge in providing compiler support for the FPFA-based design stems from the flexibility of the FPFA structure. If we do not use the characteristics of the FPFA structure properly, the advantages of an FPFA may become its disadvantages. The GECKO1project focuses on this problem. In this paper, we present a mapping and scheduling scheme for applications running on one FPFA tile. Applications are written in C and C code is translated to a Directed Acyclic Graphs (DAG) [4]. This scheme can map a DAG directly onto the reconfigurable PPs of an FPFA tile. It tries to achieve low power consumption by exploiting locality of reference and high performance by exploiting maximum parallelism

University of Twente Research Information

Reconfigurable interconnects in DSM systems: a focus on context switch behavior

Author: Artundo I
Dambre Joni
Debaes C
Heirman Wim
Manjarres D
Thienpont H
Van Campenhout Jan
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2006
Field of study

Recent advances in the development of reconfigurable optical interconnect technologies allow for the fabrication of low cost and run-time adaptable interconnects in large distributed shared-memory (DSM) multiprocessor machines. This can allow the use of adaptable interconnection networks that alleviate the huge bottleneck present due to the gap between the processing speed and the memory access time over the network. In this paper we have studied the scheduling of tasks by the kernel of the operating system (OS) and its influence on communication between the processing nodes of the system, focusing on the traffic generated just after a context switch. We aim to use these results as a basis to propose a potential reconfiguration of the network that could provide a significant speedup

Crossref

Ghent University Academic Bibliography

Cycle Accurate Energy and Throughput Estimation for Data Cache

Author: McDonald-Maier KD
Qadri M
Publication venue: 'TECSI'
Publication date: 01/01/2009
Field of study

Resource optimization in energy constrained real-time adaptive embedded systems highly depends on accurate energy and throughput estimates of processor peripherals. Such applications require lightweight, accurate mathematical models to profile energy and timing requirements on the go. This paper presents enhanced mathematical models for data cache energy and throughput estimation. The energy and throughput models were found to be within 95% accuracy of per instruction energy model of a processor, and a full system simulator?s timing model respectively. Furthermore, the possible application of these models in various scenarios is discussed in this paper

University of Essex Research Repository

Design and resource management of reconfigurable multiprocessors for data-parallel applications

Author: Wang Xiaofang
Publication venue: Digital Commons @ NJIT
Publication date: 31/01/2006
Field of study

FPGA (Field-Programmable Gate Array)-based custom reconfigurable computing machines have established themselves as low-cost and low-risk alternatives to ASIC (Application-Specific Integrated Circuit) implementations and general-purpose microprocessors in accelerating a wide range of computation-intensive applications. Most often they are Application Specific Programmable Circuiits (ASPCs), which are developer programmable instead of user programmable. The major disadvantages of ASPCs are minimal programmability, and significant time and energy overheads caused by required hardware reconfiguration when the problem size outnumbers the available reconfigurable resources; these problems are expected to become more serious with increases in the FPGA chip size. On the other hand, dominant high-performance computing systems, such as PC clusters and SMPs (Symmetric Multiprocessors), suffer from high communication latencies and/or scalability problems. This research introduces low-cost, user-programmable and reconfigurable MultiProcessor-on-a-Programmable-Chip (MPoPC) systems for high-performance, low-cost computing. It also proposes a relevant resource management framework that deals with performance, power consumption and energy issues. These semi-customized systems reduce significantly runtime device reconfiguration by employing userprogrammable processing elements that are reusable for different tasks in large, complex applications. For the sake of illustration, two different types of MPoPCs with hardware FPUs (floating-point units) are designed and implemented for credible performance evaluation and modeling: the coarse-grain MIMD (Multiple-Instruction, Multiple-Data) CG-MPoPC machine based on a processor IP (Intellectual Property) core and the mixed-mode (MIMD, SIMD or M-SIMD) variant-grain HERA (HEterogeneous Reconfigurable Architecture) machine. In addition to alleviating the above difficulties, MPoPCs can offer several performance and energy advantages to our data-parallel applications when compared to ASPCs; they are simpler and more scalable, and have less verification time and cost. Various common computation-intensive benchmark algorithms, such as matrix-matrix multiplication (MMM) and LU factorization, are studied and their parallel solutions are shown for the two MPoPCs. The performance is evaluated with large sparse real-world matrices primarily from power engineering. We expect even further performance gains on MPoPCs in the near future by employing ever improving FPGAs. The innovative nature of this work has the potential to guide research in this arising field of high-performance, low-cost reconfigurable computing. The largest advantage of reconfigurable logic lies in its large degree of hardware customization and reconfiguration which allows reusing the resources to match the computation and communication needs of applications. Therefore, a major effort in the presented design methodology for mixed-mode MPoPCs, like HERA, is devoted to effective resource management. A two-phase approach is applied. A mixed-mode weighted Task Flow Graph (w-TFG) is first constructed for any given application, where tasks are classified according to their most appropriate computing mode (e.g., SIMD or MIMD). At compile time, an architecture is customized and synthesized for the TFG using an Integer Linear Programming (ILP) formulation and a parameterized hardware component library. Various run-time scheduling schemes with different performanceenergy objectives are proposed. A system-level energy model for HERA, which is based on low-level implementation data and run-time statistics, is proposed to guide performance-energy trade-off decisions. A parallel power flow analysis technique based on Newton\u27s method is proposed and employed to verify the methodology

Digital Commons @ New Jersey Institute of Technology (NJIT)

Real-time optical micro-manipulation using optimized holograms generated on the GPU

Author: Ahrenberg
Ashkin
Cheong
Curtis
Di Leonardo
Dufresne
Egri
Elsen
Grier
Leach
Lesem
Liesener
Liu
Masuda
Pleguezuelos
Preece
Preis
R. Di Leonardo
Reicherter
Reicherter
S. Bianchi
Shimobaba
Visscher
Publication venue: 'Elsevier BV'
Publication date: 01/01/2010
Field of study

Holographic optical tweezers allow the three dimensional, dynamic, multipoint manipulation of micron sized dielectric objects. Exploiting the massive parallel architecture of modern GPUs we can generate highly optimized holograms at video frame rate allowing the interactive micro-manipulation of 3D structures.Comment: 13 pages, 8 figure

arXiv.org e-Print Archive

Crossref

Archivio della ricerca- Università di Roma La Sapienza

Synchronization and fault-masking in redundant real-time systems

Author: Butler R. W.
Krishna C. M.
Shin K. G.
Publication venue
Publication date
Field of study

A real time computer may fail because of massive component failures or not responding quickly enough to satisfy real time requirements. An increase in redundancy - a conventional means of improving reliability - can improve the former but can - in some cases - degrade the latter considerably due to the overhead associated with redundancy management, namely the time delay resulting from synchronization and voting/interactive consistency techniques. The implications of synchronization and voting/interactive consistency algorithms in N-modular clusters on reliability are considered. All these studies were carried out in the context of real time applications. As a demonstrative example, we have analyzed results from experiments conducted at the NASA Airlab on the Software Implemented Fault Tolerance (SIFT) computer. This analysis has indeed indicated that in most real time applications, it is better to employ hardware synchronization instead of software synchronization and not allow reconfiguration

NASA Technical Reports Server

Computation Enhancement using Reconfigurable Computing

Author: Salim Gumaa Ayman Hassan
Publication venue: 'Whiting & Birch, Ltd.'
Publication date: 01/05/2015
Field of study

In light of the industry’s constant need for better computer performance, this project aims to choose and evaluate an approach for facing this issue. The targeted category of computers is single board computers (e.g. Raspberry Pi). The approach utilized for enhancing performance is the use of reconfigurable computing as to execute computationally expensive calculations on a runtime custom-tailored hardware. The objective of this project is the test of the potential this approach has for increasing computers performance through comparing a software implementation of an algorithm with an FPGA assisted implementation of the same algorithm. The platforms chosen for this project are the Rapsberry Pi and the Parallella P1602 board with its Zynq SoC for the software implementation and the FPGA assisted implementation in that order. The chosen algorithm is Fourier Fast Transform due to its part in many DSP applications and its suitability for the project objective. While the software solution worked successfully resulting in an asymptotic cost of O(N log N); the reconfigurable computing solution couldn’t be completed due to time constraints and lack of experience of the student. Future work should complete the experiment and add a multicore implementation of the same algorithm to add yet another class to the comparison

UTPedia

Characterization of real-time computers

Author: Krishna C. M.
Shin K. G.
Publication venue
Publication date
Field of study

A real-time system consists of a computer controller and controlled processes. Despite the synergistic relationship between these two components, they have been traditionally designed and analyzed independently of and separately from each other; namely, computer controllers by computer scientists/engineers and controlled processes by control scientists. As a remedy for this problem, in this report real-time computers are characterized by performance measures based on computer controller response time that are: (1) congruent to the real-time applications, (2) able to offer an objective comparison of rival computer systems, and (3) experimentally measurable/determinable. These measures, unlike others, provide the real-time computer controller with a natural link to controlled processes. In order to demonstrate their utility and power, these measures are first determined for example controlled processes on the basis of control performance functionals. They are then used for two important real-time multiprocessor design applications - the number-power tradeoff and fault-masking and synchronization

NASA Technical Reports Server

Self-Partial and Dynamic Reconfiguration Implementation for AES using FPGA

Author: Alaoui Ismaili Zine El Abidine
Moussa Ahmed
Publication venue: International Journal of Computer Science Issues, IJCSI
Publication date: 01/08/2009
Field of study

This paper addresses efficient hardware/software implementation approaches for the AES (Advanced Encryption Standard) algorithm and describes the design and performance testing algorithm for embedded system. Also, with the spread of reconfigurable hardware such as FPGAs (Field Programmable Gate Array) embedded cryptographic hardware became cost-effective. Nevertheless, it is worthy to note that nowadays, even hardwired cryptographic algorithms are not so safe. From another side, the self-reconfiguring platform is reported that enables an FPGA to dynamically reconfigure itself under the control of an embedded microprocessor. Hardware acceleration significantly increases the performance of embedded systems built on programmable logic. Allowing a FPGA-based MicroBlaze processor to self-select the coprocessors uses can help reduce area requirements and increase a system's versatility. The architecture proposed in this paper is an optimal hardware implementation algorithm and takes dynamic partially reconfigurable of FPGA. This implementation is good solution to preserve confidentiality and accessibility to the information in the numeric communication

arXiv.org e-Print Archive

CogPrints Cognitive Sciences Eprint Archive