81,338 research outputs found

    Performance evaluation of multi-core multi-cluster architecture

    A multi-core cluster is a cluster composed of a number of nodes, where each node contains several processors, each with more than one core on a single chip. Cluster nodes are connected via an interconnection network. Multi-core processors are able to achieve higher performance without driving up power consumption and heat, which is the main concern with single-core processors. A general problem in the network arises from the fact that multiple messages can be in transit at the same time on the same network links. In this paper, the communication latencies of a multi-core multi-cluster architecture are investigated using simulation experiments and measurements under various working conditions.
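
    As an illustration of the contention problem described above, the following minimal Python sketch (not from the paper; the link bandwidth, message size, and arrival times are assumed illustrative values) models several messages sharing one link served FIFO and reports how queuing inflates each message's delivery latency.

    # Minimal sketch of link contention in a multi-core cluster network.
    # All parameters (bandwidth, message size, arrival times) are assumed
    # illustrative values, not figures from the paper.

    LINK_BANDWIDTH = 1e9      # bytes per second on the shared link (assumed)
    MESSAGE_SIZE = 64 * 1024  # bytes per message (assumed)

    def simulate_shared_link(arrival_times):
        """Serve messages FIFO over one link; return per-message latency."""
        transfer_time = MESSAGE_SIZE / LINK_BANDWIDTH
        link_free_at = 0.0
        latencies = []
        for t in sorted(arrival_times):
            start = max(t, link_free_at)        # wait if the link is busy
            finish = start + transfer_time
            latencies.append(finish - t)        # queuing delay + transfer time
            link_free_at = finish
        return latencies

    if __name__ == "__main__":
        # Eight nodes injecting messages at nearly the same instant.
        lat = simulate_shared_link([i * 1e-6 for i in range(8)])
        for i, l in enumerate(lat):
            print(f"message {i}: latency {l * 1e6:.1f} us")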

    Low power pipelined FFT processor architecture on FPGA

    A Fast Fourier Transform (FFT) processor is a hardware implementation of FFT algorithms for the Discrete Fourier Transform (DFT), which transforms a signal from the time domain to the frequency domain. This processor plays an important role in applications such as digital video broadcasting, wireless sensor networks and many other digital signal processing applications, which require a small-area, low-power processor. Designing a pipelined FFT processor on an FPGA speeds up the design process and adds flexibility. This paper provides a survey of three types of pipelined FFT architecture implemented on FPGA: radix-8, radix-4 single-path delay feedback (R4SDF) and radix-4 single-path delay commutator (R4SDC). Simulation is done via ModelSim and verification through Matlab, while implementation is done via Quartus on the Altera Cyclone IV FPGA board. The performance of these FFT processors is studied. The results show that the radix-8 pipelined FFT has higher power dissipation than R4SDF and R4SDC, whereas the R4SDC design has a smaller area than the rest. Overall, all the pipelined FFT processor designs function correctly.
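
    The Matlab verification step mentioned above can be approximated in software. The sketch below is an assumption-based illustration, not the paper's code: it uses NumPy's FFT as a stand-in for the hardware output, on an assumed radix-4-friendly 64-point signal, and checks it against a direct DFT, mirroring the time-domain-to-frequency-domain relation the processor implements.

    # Software check of the DFT/FFT relation used to verify the hardware output.
    # Signal length and contents are assumed illustrative values.
    import numpy as np

    N = 64  # a radix-4-friendly length (64 = 4**3), assumed for illustration
    x = np.random.randn(N) + 1j * np.random.randn(N)   # arbitrary test signal

    # Direct DFT: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)
    n = np.arange(N)
    W = np.exp(-2j * np.pi * np.outer(n, n) / N)
    X_direct = W @ x

    # Fast algorithm (what the pipelined hardware computes)
    X_fft = np.fft.fft(x)

    # Verification step analogous to comparing simulator output against Matlab
    print("max abs error:", np.max(np.abs(X_fft - X_direct)))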

    Design of Asynchronous Processor

    There has been a resurgence of interest in asynchronous design recently. This renewed interest results from its potential to address the problems faced by the synchronous design methodology. In the asynchronous methodology there is no global clock controlling the synchronization of a circuit; instead, data communication between functional units is completed through a local request-acknowledge handshake protocol. The growing demand for high-performance portable systems has accelerated the adoption of asynchronous logic design techniques, which can offer better performance and lower power consumption, especially in the development of asynchronous processors for mobile and portable applications. In this thesis, the design and verification of an 8-bit asynchronous pipelined processor is presented. The developed asynchronous processor is based on the Harvard architecture and uses a Reduced Instruction Set Computer (RISC) instruction set architecture. The processor supports 24 instructions, including register, memory, branch and jump operations, and has a three-stage pipeline: fetch, decode and execute. The Micropipelines framework, with a 2-phase signalling protocol and a bundled-data approach, is employed in designing the complex and powerful asynchronous control circuits for the processor. Very High Speed Integrated Circuit Hardware Description Language (VHDL) is used to design and construct all parts of the asynchronous processor, and simulation, synthesis and verification are carried out using the MAX+PLUS II software. The simulation results demonstrate that the developed 8-bit asynchronous RISC processor works correctly on current Field Programmable Gate Array (FPGA) technology. The processor uses 903 logic cells and 6144 memory bits for instruction and data memory. Each processor subsystem can operate at a different cycle time, enabling the asynchronous processor to achieve an average performance of 11.95 MHz.
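
    To illustrate the request-acknowledge handshaking described above, here is a behavioural Python sketch, not the thesis's VHDL, of a 2-phase, bundled-data transfer between two pipeline stages: a transition on req signals new data, and a matching transition on ack tells the sender the data has been consumed. The channel structure and names are assumptions made for illustration.

    # Behavioural sketch of a 2-phase (transition-signalling) bundled-data
    # handshake between two asynchronous pipeline stages. Names and timing
    # are illustrative assumptions, not the thesis's VHDL implementation.

    class Channel:
        def __init__(self):
            self.req = 0      # request wire: each transition means "new data"
            self.ack = 0      # acknowledge wire: each transition means "taken"
            self.data = None  # bundled data, valid when req toggles

        def send(self, value):
            assert self.req == self.ack, "previous transfer not yet acknowledged"
            self.data = value          # place data on the bundle first
            self.req ^= 1              # then toggle req (2-phase protocol)

        def receive(self):
            assert self.req != self.ack, "no pending request"
            value = self.data          # latch the bundled data
            self.ack ^= 1              # toggle ack to complete the handshake
            return value

    if __name__ == "__main__":
        ch = Channel()
        for word in ("FETCH", "DECODE", "EXECUTE"):
            ch.send(word)                     # sender stage
            print("received:", ch.receive())  # receiver stage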

    Pico Processor Using Verilog HDL

    © ASEE 2009. The Pico processor is a scaled-down RISC processor, hence the name "Pico". Pico processors form an integral part of a network, acting as co-processors to network processors. Network processors are in charge of various complex functions such as routing, packet switching, queuing, encryption, decryption, pattern matching and computation. Many Pico processors work in parallel with the network processor, which reduces computing time and improves performance (speed), in turn increasing the processing power of the network processor. One of the main uses of the Pico processor is to handle the computation part of the network processor's workload. Our project aims to further improve the performance of the network processor by increasing the processing speed of the Pico processor. We do this by altering the architecture of current Pico processors to accommodate a five-stage pipeline, which can increase instruction throughput by up to a factor of five. The five stages incorporated in our architecture are Instruction Fetch, Instruction Decode, Execute, Memory I/O and Write Back. The Pico processor is designed and simulated with ModelSim 6.2c, and its logic synthesis is performed using the Quartus II software. The simulation results demonstrate the correct functioning of the designed Pico processor, and significant performance enhancement has been observed.
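
    The "up to a factor of five" figure follows from standard pipeline arithmetic: with k stages and N instructions, an ideal pipeline needs roughly k + N - 1 cycles instead of k * N. The short Python sketch below is illustrative only; it assumes one instruction issued per cycle with no stalls or hazards, so it gives an upper bound for the IF/ID/EX/MEM/WB pipeline rather than a measured result.

    # Ideal pipeline speedup for the five-stage IF/ID/EX/MEM/WB organisation.
    # Assumes one instruction issued per cycle and no stalls or hazards,
    # so the numbers are an upper bound, not measured results.

    STAGES = 5  # IF, ID, EX, MEM, WB

    def speedup(num_instructions, stages=STAGES):
        unpipelined_cycles = stages * num_instructions
        pipelined_cycles = stages + num_instructions - 1
        return unpipelined_cycles / pipelined_cycles

    if __name__ == "__main__":
        for n in (1, 10, 100, 1000, 100000):
            print(f"{n:>6} instructions: speedup = {speedup(n):.2f}")
        # The ratio tends to 5 as the instruction count grows.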

    A cross-stack, network-centric architectural design for next-generation datacenters

    This thesis proposes a full-stack, cross-layer datacenter architecture based on in-network computing and near-memory processing paradigms. The proposed datacenter architecture is built atop two principles: (1) utilizing commodity, off-the-shelf hardware (i.e., processors, DRAM, and network devices) with minimal changes to their architecture, and (2) providing a standard interface for programmers to use the novel hardware. More specifically, the proposed architecture enables a smart network adapter to collectively compress and decompress the data exchanged between distributed DNN training nodes and to assist the operating system in performing aggressive processor power management. It also deploys specialized memory modules in the servers that are capable of general-purpose computation and provide network connectivity. This thesis unlocks the potential of hardware and operating system co-design in architecting application-transparent, near-data processing hardware that improves datacenter performance, energy efficiency, and scalability. We evaluate the proposed datacenter architecture using a combination of full-system simulation, FPGA prototyping, and real-system experiments.
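
    As a rough software analogue of the smart network adapter's compress/decompress role in distributed DNN training, the sketch below is a hypothetical illustration, not the thesis's mechanism: it applies top-k sparsification to a gradient tensor before it would be exchanged and rebuilds a dense tensor on the receiving side. The compression scheme, tensor size, and k are all assumptions.

    # Hypothetical illustration of compressing a gradient exchange between
    # DNN training nodes, in the spirit of the in-network computing idea.
    # The top-k scheme and sizes are assumptions, not the thesis's mechanism.
    import numpy as np

    def compress_topk(grad, k):
        """Keep only the k largest-magnitude entries (values + indices)."""
        idx = np.argpartition(np.abs(grad), -k)[-k:]
        return idx, grad[idx]

    def decompress_topk(idx, vals, size):
        """Rebuild a dense gradient with zeros in the dropped positions."""
        dense = np.zeros(size, dtype=vals.dtype)
        dense[idx] = vals
        return dense

    if __name__ == "__main__":
        grad = np.random.randn(1_000_000).astype(np.float32)  # assumed size
        idx, vals = compress_topk(grad, k=10_000)              # ~1% of entries
        restored = decompress_topk(idx, vals, grad.size)
        sent_bytes = idx.nbytes + vals.nbytes
        print(f"sent {sent_bytes} bytes instead of {grad.nbytes}")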