406 research outputs found

Recommended from our members

Timing models for high-level synthesis

Author: Chaiyakul Viraphol
Gajski Daniel D.
Wu Allen C.H.
Publication venue: eScholarship, University of California
Publication date: 31/10/1991

In this paper, we describe a timing model for clock estimation during high-level synthesis. To obtain realistic timing estimates, the proposed model considers all delay elements, including datapath, control and wire delays, and several technology factors, such as layout architecture, technology mapping, buffer insertion and loading effects. The experimental results show that this model can provide much better estimates than previous models. The model is well suited for automatic and interactive synthesis as well as feedback-driven synthesis, where performance metrics must be rapidly and incrementally calculated.

eScholarship - University of California

Nanofabric Power Analysis: Biosequence Alignment Case of Study

Author: Amarù L.G.
Frache Stefano
Graziano Mariagrazia
Zamboni Maurizio
Publication venue: IEEE/ACM
Publication date: 01/01/2011

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Silicon compilation

Author: Dutt Nikil D.
Gajski Daniel D.
Pangrle Barry M.
Publication venue: eScholarship, University of California
Publication date: 01/01/1987

Silicon compilation is a term used for many different purposes. In this paper we define silicon compilation as a mapping from some higher level description into layout. We define the basic issues in structural and behavioral silicon compilation and some possible solutions to those issues. Finally, we define the concept of an intelligent silicon compiler in which the compiler evaluates the quality of the generated design and attempts to improve it if it is not satisfactory

eScholarship - University of California

Interleaving in Systolic-Arrays: a Throughput Breakthrough

Author: G. Causapruno
M. Graziano
M. Vacca
M. Zamboni
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2015

In past years, the most common way to improve computer performance was to increase the clock frequency. In recent years this approach has run into the limits of technology scaling, so computer architectures are shifting toward parallel computing to further improve circuit performance. GPU-based architectures are not the only ones gaining attention: Systolic Arrays are particularly well suited to certain classes of algorithms. An important point in favor of Systolic Arrays is that, due to the regularity of their circuit layout, they are appealing when applied to many emerging and very promising technologies, like Quantum-dot Cellular Automata and nanoarrays based on Silicon NanoWire or on Carbon nanotube Field Effect Transistors. In this work we present a systematic method to improve Systolic Array performance by exploiting Pipelining and Input Data Interleaving. We first tackle the problem from a theoretical point of view, and then apply the method to both CMOS technology and emerging technologies. On CMOS we demonstrate that it is possible to vastly improve the overall throughput of the circuit. By applying this technique to emerging technologies we show that it is possible to overcome some of their limitations, greatly improving the throughput and making a considerable step forward toward the post-CMOS era.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Optimized Surface Code Communication in Superconducting Quantum Computers

Author: Brown Kenneth R.
Chong Frederic T.
Franklin Diana
Gokhale Pranav
Holmes Adam
Javadi-Abhari Ali
Martonosi Margaret
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2017

Quantum computing (QC) is at the cusp of a revolution. Machines with 100 quantum bits (qubits) are anticipated to be operational by 2020 [googlemachine,gambetta2015building], and several-hundred-qubit machines are around the corner. Machines of this scale have the capacity to demonstrate quantum supremacy, the tipping point where QC is faster than the fastest classical alternative for a particular problem. Because error correction techniques will be central to QC and will be the most expensive component of quantum computation, choosing the lowest-overhead error correction scheme is critical to overall QC success. This paper evaluates two established quantum error correction codes, planar and double-defect surface codes, using a set of compilation, scheduling and network simulation tools. We consider scalable methods for optimizing both codes in the context of a full microarchitectural and compiler analysis. Contrary to previous predictions, we find that the simpler planar codes are sometimes more favorable for implementation on superconducting quantum computers, especially under conditions of high communication congestion.

Comment: 14 pages, 9 figures, The 50th Annual IEEE/ACM International Symposium on Microarchitectur

arXiv.org e-Print Archive

Princeton University Open Access Repository

Crossref

Development of an 8-Bit FPGA-Based Asynchronous RISC Pipelined Processor for Data Encryption

Author: Pang Wai Leong
Publication date: 01/09/2003

Microprocessors are widely used in various applications. One application is data security, where data are encrypted before and decrypted after transfer over a communication channel. Microprocessor designs can be categorized into two types: synchronous and asynchronous. An asynchronous processor may offer better speed because it is self-timed: a control circuit generates enable signals for all instruction executions based on request and acknowledgement signals. A synchronous design, by contrast, requires a global clock whose period must be long enough to accommodate the worst-case delay. In this work, an 8-bit asynchronous processor is designed based on a synchronous RISC pipelined processor architecture. The synchronous processor consists of three stages: instruction fetch, instruction decode and execution. The reduced instruction set computer (RISC) architecture is used to minimize the instruction set and to perform specific operations. To design the asynchronous processor, an asynchronous control circuit based on a handshake protocol is added to the synchronous design. Both the synchronous and asynchronous designs are implemented fully in VHDL. MAX+PLUS II is used as the tool for design and for simulation-based design verification. The UP1 education board, which contains the FLEX10K chip, is used to observe the hardware operation. The asynchronous processor was successfully designed, with higher million instructions per second (MIPS) and a higher operating frequency than the synchronous processor. The asynchronous processor achieves 10.772 MIPS and operates at a frequency of 11.16 MHz. The asynchronous design consumes 63% of the total logic cells in the FLEX10K chip, so the processor fits in the FLEX10K and leaves room for future expansion.

Universiti Putra Malaysia Institutional Repository

The 1991 3rd NASA Symposium on VLSI Design

Author: Maki Gary K.

Papers from the symposium are presented from the following sessions: (1) featured presentations 1; (2) very large scale integration (VLSI) circuit design; (3) VLSI architecture 1; (4) featured presentations 2; (5) neural networks; (6) VLSI architectures 2; (7) featured presentations 3; (8) verification 1; (9) analog design; (10) verification 2; (11) design innovations 1; (12) asynchronous design; and (13) design innovations 2

NASA Technical Reports Server

MILO: a microarchitecture and logic optimizer

Author: Gajski Daniel
Zanden Nels Vander
Publication venue: eScholarship, University of California
Publication date: 30/01/1988

In this report we discuss strengths and weaknesses of logic synthesis systems and describe a system for microarchitectural and logic optimization. Our system uses a set of algorithms for synthesizing SSI/MSI macros from parameterized microarchitecture components. In addition, it uses rules for optimizing both at the microarchitecture and logic level. The system increases designer productivity and requires less design knowledge and experience from circuit engineers

eScholarship - University of California

An Evolvable Combinational Unit for FPGAs

Author: Friedl Štěpán
Sekanina Lukáš
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 20/02/2012

A complete hardware implementation of an evolvable combinational unit for FPGAs is presented. The proposed combinational unit consisting of a virtual reconfigurable circuit and evolutionary algorithm was described in VHDL independently of a target platform, i.e. as a soft IP core, and realized in the COMBO6 card. In many cases the unit is able to evolve (i.e. to design) the required function automatically and autonomously, in a few seconds, only on the basis of interactions with an environment. A number of circuits were successfully evolved directly in the FPGA, in particular, 3-bit multipliers, adders, multiplexers and parity encoders. The evolvable unit was also tested in a simulated dynamic environment and used to design various circuits specified by randomly generated truth tables

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Preliminary Report on High-Performance Computational Structures for Robot Control

Author: Meyer David G.
Rahman Mahibur
Publication venue: Purdue University (bepress)
Publication date: 01/10/1987

In this report we present some initial results of our work completed thus far on Computational Structures for Robot Control. A SIMD architecture with a crossbar interprocessor network is discussed; for the computation of the inverse dynamics problem, it achieves the parallel processing execution time lower bound of o([a1n]), where a1 is a constant and n is the number of manipulator joints. A novel SIMD task scheduling algorithm that optimizes the parallel processing performance on the indicated architecture is also delineated. Simulations performed on this architecture show that a speedup factor of 3.4 is achieved over previous related work completed for the evaluation of the specified problem. Parallel processing of PUMA forward and inverse kinematics solutions is next investigated using a particular scheduling algorithm. In addition, a custom bit-serial array architecture is designed for the computation of the inverse dynamics problem within the bit-serial execution time lower bound of o(c1k + c2kn), where c1 and c2 are specified constants, k is the word length, and n is the number of manipulator joints. Finally, mapping of the Newton-Euler equations onto a fixed systolic array is investigated. A balanced architecture for the inverse dynamics problem, which achieves the systolic execution time lower bound for the specified problem, is depicted. Please note again that these results are only preliminary, and improvements to our algorithms and architectures are still being made.

Purdue E-Pubs