249,886 research outputs found
A low-power geometric mapping co-processor for high-speed graphics application
In this article we present a novel design of a low-power geometric mapping co-processor that can be used for high-performance graphics system. The processor can carry out any single or a combination of transformations belonging to affine transformation family ranging from 1-D to 3-D. It allows interactive operations which can be defined either by a user (allowing it to be a stand-alone geometric transformation processor) or by a host processor (allowing it to be a co-processor to accelerate certain graphics operations). It occupies a silicon area of 6 mm2 and consumes 40 mW power when synthesized with 0.25?m technology
MORA - an architecture and programming model for a resource efficient coarse grained reconfigurable processor
This paper presents an architecture and implementation details for MORA, a novel coarse grained reconfigurable processor for accelerating media processing applications. The MORA architecture involves a 2-D array of several such processors, to deliver low cost, high throughput performance in media processing applications. A distinguishing feature of the MORA architecture is the co-design of hardware architecture and low-level programming language throughout the design cycle. The implementation details for the single MORA processor, and benchmark evaluation using a cycle accurate simulator are presented
PGPG: An Automatic Generator of Pipeline Design for Programmable GRAPE Systems
We have developed PGPG (Pipeline Generator for Programmable GRAPE), a
software which generates the low-level design of the pipeline processor and
communication software for FPGA-based computing engines (FBCEs). An FBCE
typically consists of one or multiple FPGA (Field-Programmable Gate Array)
chips and local memory. Here, the term "Field-Programmable" means that one can
rewrite the logic implemented to the chip after the hardware is completed, and
therefore a single FBCE can be used for calculation of various functions, for
example pipeline processors for gravity, SPH interaction, or image processing.
The main problem with FBCEs is that the user need to develop the detailed
hardware design for the processor to be implemented to FPGA chips. In addition,
she or he has to write the control logic for the processor, communication and
data conversion library on the host processor, and application program which
uses the developed processor. These require detailed knowledge of hardware
design, a hardware description language such as VHDL, the operating system and
the application, and amount of human work is huge. A relatively simple design
would require 1 person-year or more. The PGPG software generates all necessary
design descriptions, except for the application software itself, from a
high-level design description of the pipeline processor in the PGPG language.
The PGPG language is a simple language, specialized to the description of
pipeline processors. Thus, the design of pipeline processor in PGPG language is
much easier than the traditional design. For real applications such as the
pipeline for gravitational interaction, the pipeline processor generated by
PGPG achieved the performance similar to that of hand-written code. In this
paper we present a detailed description of PGPG version 1.0.Comment: 24 pages, 6 figures, accepted PASJ 2005 July 2
Generic Pipelined Processor Modeling and High Performance Cycle-Accurate Simulator Generation
Detailed modeling of processors and high performance cycle-accurate
simulators are essential for today's hardware and software design. These
problems are challenging enough by themselves and have seen many previous
research efforts. Addressing both simultaneously is even more challenging, with
many existing approaches focusing on one over another. In this paper, we
propose the Reduced Colored Petri Net (RCPN) model that has two advantages:
first, it offers a very simple and intuitive way of modeling pipelined
processors; second, it can generate high performance cycle-accurate simulators.
RCPN benefits from all the useful features of Colored Petri Nets without
suffering from their exponential growth in complexity. RCPN processor models
are very intuitive since they are a mirror image of the processor pipeline
block diagram. Furthermore, in our experiments on the generated cycle-accurate
simulators for XScale and StrongArm processor models, we achieved an order of
magnitude (~15 times) speedup over the popular SimpleScalar ARM simulator.Comment: Submitted on behalf of EDAA (http://www.edaa.com/
Using fast and accurate simulation to explore hardware/software trade-offs in the multi-core era
Writing well-performing parallel programs is challenging in the multi-core processor era. In addition to achieving good per-thread performance, which in itself is a balancing act between instruction-level parallelism, pipeline effects and good memory performance, multi-threaded programs complicate matters even further. These programs require synchronization, and are affected by the interactions between threads through sharing of both processor resources and the cache hierarchy.
At the Intel Exascience Lab, we are developing an architectural simulator called Sniper for simulating future exascale-era multi-core processors. Its goal is twofold: Sniper should assist hardware designers to make design decisions, while simultaneously providing software designers with a tool to gain insight into the behavior of their algorithms and allow for optimization. By taking architectural features into account, our simulator can provide more insight into parallel programs than what can be obtained from existing performance analysis tools. This unique combination of hardware simulator and software performance analysis tool makes Sniper a useful tool for a simultaneous exploration of the hardware and software design space for future high-performance multi-core systems
Design of Asynchronous Processor
There has been a resurgence of interest in asynchronous design recently. The renewed interest in asynchronous design results from its potential to address the problem faced by the synchronous design methodology. In asynchronous
methodology, there is no global clock controlling the synchronization of a circuit; instead, the data communication between each functional unit is completed through local request-acknowledge handshake protocol. The growth in demand of high performance portable systems has accelerated asynchronous logic design technique which can offers better performance and lower power consumption especially in the development of the asynchronous processor for mobile and portable application. In this thesis, the design and verification of an 8-bit asynchronous pipelined
processor is presented. The developed asynchronous processor is based on Harvard architecture and uses Reduced Instruction Set Computer (RISC) instruction set architecture. 24 instructions are supported by the processor including register, memory, branch and jump operations. The processor has three-stage pipelining i.e.
fetch, decode and execution pipeline. Micropipelines framework with 2-phase signalling protocol and bundled-data approach is employed in designing complex and powerful asynchronous control circuits for the processor. Very High Speed Integrated Circuit Hardware Description Language (VHDL) is used to design and construct all parts of the asynchronous processor. Simulation, synthesis and verification of the processor are carried out using MAX +PLUS II software. The simulation results have demonstrated that the developed 8-bit asynchronous RISC processor is working correctly using current Field Programmable Gate Array (FPGA) technology. This processor employed 903 logic cells and has 6144 memory bits for instruction and data memory. Each of the processor subsystem can operates at different cycle time, thus enable an asynchronous processor achieving 11.95MHz average speed performance
- …