2,904 research outputs found

    A case study for NoC based homogeneous MPSoC architectures

    Get PDF
    The many-core design paradigm requires flexible and modular hardware and software components to provide the required scalability to next-generation on-chip multiprocessor architectures. A multidisciplinary approach is necessary to consider all the interactions between the different components of the design. In this paper, a complete design methodology that tackles at once the aspects of system level modeling, hardware architecture, and programming model has been successfully used for the implementation of a multiprocessor network-on-chip (NoC)-based system, the NoCRay graphic accelerator. The design, based on 16 processors, after prototyping with field-programmable gate array (FPGA), has been laid out in 90-nm technology. Post-layout results show very low power, area, as well as 500 MHz of clock frequency. Results show that an array of small and simple processors outperform a single high-end general purpose processo

    Micro-threading and FPGA implementation of a RISC microprocessor : a thesis presented in partial fulfilment of the requirements for the degree of Master of Science in Computer Science at Massey University, Palmerston North, New Zealand

    Get PDF
    Appendix E removed due to copyright restrictions. Articles are available in the print copy held in the libraryThis thesis is the outcome of research in two areas of computer technology: microprocessor and multi-processor architectures (specifically from the perspective of how differently they tolerate highly-latent and non-deterministic events), and hardware design of complex digital systems containing both datapath and control (particularly microprocessors). This thesis starts by pointing out that in order to achieve high processing speeds, current popular superscalar microprocessors (e.g. Intel Pentiums, Digital Alpha, etc) rely heavily on the technique of speculating the outcome of instruction flow in order to predict the behaviour of non-deterministic computing operations (as in loading operands from high-latency memory into the processor). This is fine only if the speculation is correct. But, what if it isn't? If the speculation fails, this would mean that the processor has to abandon its current decision (which now proved to be the wrong one) for the instruction flow path taken and to start all over again with the other path (the actual correct one). This is a waste of valuable processing time and hardware resources and a reduction of performance when speculation fails. Therefore, these processors can achieve high performance only when the majority of speculations are successful (being able to predict the right path). In an attempt to overcome the above shortcomings, the first part of this thesis is an investigation of the novel vector micro-threading architecture as an alternative approach to the current superscalar-based speculative microprocessor designs. Micro-threading is based on the not-so-novel multithreading technique, which avoids speculation altogether and instead, starts running a different thread of instructions while waiting for the non-determinism to be resolved. This utilizes the chip resources more efficiently without waste of any processing power. The rest of this thesis focuses on the baseline RISC processor platform, the MIPS R2000, which is reviewed first then partially synthesized from the RTL (Register Transfer Level) description using VHDL and then simulated and tested. This is conducted in order for future research to build upon and add the micro-threading architectural add-ons and modifications. Keywords: Micro-threading, Latency Tolerance, FPGA Synthesis, RISC Architecture, MIPS R2000 processor, VHDL

    A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems

    Full text link
    Recent technological advances have greatly improved the performance and features of embedded systems. With the number of just mobile devices now reaching nearly equal to the population of earth, embedded systems have truly become ubiquitous. These trends, however, have also made the task of managing their power consumption extremely challenging. In recent years, several techniques have been proposed to address this issue. In this paper, we survey the techniques for managing power consumption of embedded systems. We discuss the need of power management and provide a classification of the techniques on several important parameters to highlight their similarities and differences. This paper is intended to help the researchers and application-developers in gaining insights into the working of power management techniques and designing even more efficient high-performance embedded systems of tomorrow

    Interval simulation: raising the level of abstraction in architectural simulation

    Get PDF
    Detailed architectural simulators suffer from a long development cycle and extremely long evaluation times. This longstanding problem is further exacerbated in the multi-core processor era. Existing solutions address the simulation problem by either sampling the simulated instruction stream or by mapping the simulation models on FPGAs; these approaches achieve substantial simulation speedups while simulating performance in a cycle-accurate manner This paper proposes interval simulation which rakes a completely different approach: interval simulation raises the level of abstraction and replaces the core-level cycle-accurate simulation model by a mechanistic analytical model. The analytical model estimates core-level performance by analyzing intervals, or the timing between two miss events (branch mispredictions and TLB/cache misses); the miss events are determined through simulation of the memory hierarchy, cache coherence protocol, interconnection network and branch predictor By raising the level of abstraction, interval simulation reduces both development time and evaluation time. Our experimental results using the SPEC CPU2000 and PARSEC benchmark suites and the MS multi-core simulator show good accuracy up to eight cores (average error of 4.6% and max error of 11% for the multi-threaded full-system workloads), while achieving a one order of magnitude simulation speedup compared to cycle-accurate simulation. Moreover interval simulation is easy to implement: our implementation of the mechanistic analytical model incurs only one thousand lines of code. Its high accuracy, fast simulation speed and ease-of-use make interval simulation a useful complement to the architect's toolbox for exploring system-level and high-level micro-architecture trade-offs

    An Implementation of a Dual-Processor System on FPGA

    Get PDF
    In recent years, Field-Programmable Gate Arrays (FPGA) have evolved rapidly paving the way for a whole new range of computing paradigms. On the other hand, computer applications are evolving. There is a rising demand for a system that is general-purpose and yet has the processing abilities to accommodate current trends in application processing. This work proposes a design and implementation of a tightly-coupled FPGA-based dual-processor platform. We architect a platform that optimizes the utilization of FPGA resources and allows for the investigation of practical implementation issues such as cache design. The performance of the proposed prototype is then evaluated, as different configurations of a uniprocessor and a dual-processor system are studied and compared against each other and against published results for common industry-standard CPU platforms. The proposed implementation utilizes the Nios II 32-bit embedded soft-core processor architecture designed for the Altera Cyclone III family of FPGAs
    • …
    corecore