A case study for NoC based homogeneous MPSoC architectures
The many-core design paradigm requires flexible and modular hardware and software components to provide the required scalability to next-generation on-chip multiprocessor architectures. A multidisciplinary approach is necessary to consider all the interactions between the different components of the design. In this paper, a complete design methodology that tackles at once the aspects of system-level modeling, hardware architecture, and programming model has been successfully used for the implementation of a multiprocessor network-on-chip (NoC)-based system, the NoCRay graphic accelerator. The design, based on 16 processors, was prototyped on a field-programmable gate array (FPGA) and then laid out in 90-nm technology. Post-layout results show very low power and area, as well as a clock frequency of 500 MHz. Results show that an array of small and simple processors outperforms a single high-end general-purpose processor
Cycle Accurate Energy and Throughput Estimation for Data Cache
Resource optimization in energy-constrained real-time adaptive embedded systems depends heavily on accurate energy and throughput estimates of processor peripherals. Such applications require lightweight, accurate mathematical models to profile energy and timing requirements on the go. This paper presents enhanced mathematical models for data cache energy and throughput estimation. The energy and throughput models were found to be within 95% accuracy of the per-instruction energy model of a processor and a full system simulator's timing model, respectively. Furthermore, the possible application of these models in various scenarios is discussed in this paper
Network control for a multi-user transputer-based system.
A dissertation submitted to the Faculty of Engineering, University of the
Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of
Master of Science in Engineering
The MC2/64 system is a configurable multi-user transputer-based system which was
designed using a modular approach. The MC2/64 consists of MC2 Clusters which are
connected using a modified Clos network. The MC2 Clusters were designed and
realised as completely configurable modules using and extending an algorithm based on
Eulerian cycles through a requested graph. This dissertation discusses the configuration
algorithm and the extensions made to the algorithm for the MC2 Clusters.
The total MC2/64 system is not completely configurable as an MC2 Cluster releases only
a limited number of links for inter-cluster connections. This dissertation analyses the
configurability of MC2/64, but also presents algorithms which enhance the usability of
the system from the user's point of view.
The design and the implementation of the network control software are also submitted
as topics in this dissertation. The network control software must allow multiple users to
use the system, but without them influencing each other's transputer domains.
This dissertation therefore seeks to give an overview of network control problems and
the solutions implemented in current MC2/64 systems. The results of the research
done for this dissertation will hopefully aid in the design of future MC2 systems which
will provide South Africa with much needed, low cost, high performance computing
power.
Andrew Chakane 201
Performance and area evaluations of processor-based benchmarks on FPGA devices
Computing systems on SoCs have been a long-term research topic since FPGA technology emerged, owing to its re-programmable fabric, reconfigurable computing, and fast time to market. During the last decade, a uni-processor in an SoC has no longer been able to cope with the fast-growing market for complex applications such as mobile-phone audio and video encoding and image and network processing. Because the number of transistors on a silicon wafer keeps increasing, recent FPGAs and embedded systems are moving toward multi-processor-based designs to meet demanding performance requirements. The age of the MPSoC is therefore approaching. In addition, most embedded processors are soft-cores, because they are flexible and reconfigurable for specific software functions and make it easy to build homogeneous multi-processor systems for parallel programming. Moreover, behavioural synthesis tools are becoming much more powerful: they can generate datapaths of logic units from high-level algorithms (for example, C to HDL) and support a HW/SW partitioning methodology.
A range of embedded processors can be implemented on an FPGA-based prototype to integrate the CPUs on a programmable device. This research first presents the different types of computer architecture found in modern embedded processors and the types of software application they target (e.g. multi-threaded operations or complex functions) on FPGA-based SoCs. Second, it investigates their capability by executing a wide range of multimedia software codes (integer arithmetic only) on different processor-system models (uni-processor, multi-processor, or co-design). Finally, it compares the results in terms of benchmark performance and resource utilization within FPGAs. All the examined programs were written in standard C and executed on varying numbers of soft-core processors or hardware units to obtain execution times. However, the number of processors, their customizable configurations, and the hardware datapaths generated are limited by the resources of the target FPGA, and designers need to understand the FPGA-based trade-off considered here: speed versus area.
For the experiments, I divided the benchmarks into DLP and HLS categories, which are "data" intensive and "function" intensive respectively. The DLP programs will be executed on the LEON3 MP and LE1 CMP multi-processor systems, and the HLS programs on the LegUp co-design system, on target FPGAs. As a preliminary step, the performance of the soft-core processors will be examined by executing all the benchmarks. This thesis centres on execution time (or speed-up) and area breakdown on FPGA devices for the different programs
The use of field-programmable gate arrays for the hardware acceleration of design automation tasks
This paper investigates the possibility of using Field-Programmable Gate Arrays (FPGAs) as
reconfigurable co-processors for workstations to produce moderate speedups for most tasks
in the design process, resulting in a worthwhile overall design process speedup at low cost
and allowing algorithm upgrades with no hardware modification. The use of FPGAs as hardware
accelerators is reviewed and then achievable speedups are predicted for logic simulation
and VLSI design rule checking tasks for various FPGA co-processor arrangements