
314 research outputs found

Design Approach to Implementation Of Arbitration Algorithm In Shared Bus Architectures (MPSoC)

Author: Amutha R.
Shanthi D.
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 01/10/2011

Multiprocessor SoC (MPSoC) designs place more than one processor and a large amount of memory on the same chip. An SoC consists of hardware and software cores, multiple processors, embedded DRAM, and connectors between cores. A wide range of MPSoC architectures has been developed over the past decade. This paper surveys the history of the on-chip communication architectures used in MPSoC design, which are a primary factor in the overall performance of complex SoC designs. Some of the techniques that have driven the design of MPSoCs are discussed. Dynamically configurable communication architectures are found to improve system performance. Currently, on-chip interconnection networks are mostly implemented using shared buses, which are the most common medium. Arbitration plays a crucial role in determining the performance of a bus-based system, as it assigns the priorities with which processors are granted access to the shared communication resources. Conventional arbitration algorithms have drawbacks such as bus starvation and low system performance. The bus should give each component a flexible and maximal share of on-chip communication bandwidth and should improve the latency of access to the shared bus. The performance of the SoC is improved, with regard to latency, using the probabilistic round-robin algorithm. Thus, this paper analyses various bus arbitration issues related to the design of MPSoCs.
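The starvation problem mentioned in this abstract can be illustrated with a rotating-priority arbiter. The following is a minimal sketch of plain round-robin arbitration, not the probabilistic round-robin variant the paper evaluates (the abstract does not describe it); the class name `RoundRobinArbiter` and its interface are hypothetical.

```python
class RoundRobinArbiter:
    """Plain round-robin bus arbiter (illustrative, not the paper's algorithm)."""

    def __init__(self, num_masters):
        self.num_masters = num_masters
        # Start so that master 0 has the highest priority on the first cycle.
        self.last_grant = num_masters - 1

    def grant(self, requests):
        """requests: one bool per master. Returns the granted index or None."""
        # Scan masters starting just after the most recently granted one,
        # so a master that was just served drops to the lowest priority.
        for offset in range(1, self.num_masters + 1):
            candidate = (self.last_grant + offset) % self.num_masters
            if requests[candidate]:
                self.last_grant = candidate
                return candidate
        return None  # no master is requesting the bus


arbiter = RoundRobinArbiter(3)
# All three masters request continuously; each is served in turn.
grants = [arbiter.grant([True, True, True]) for _ in range(4)]
```

Because priority rotates after every grant, a continuously requesting master waits at most `num_masters - 1` grants, which is what prevents starvation under a fixed-priority scheme.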

International Institute for Science, Technology and Education (IISTE): E-Journals

Performance Evaluation of XY and XTRANC Routing Algorithm for Network on Chip and Implementation using DART Simulator

Author: Panda Manisha
Publication date: 01/05/2015

Today, Network on Chip (NoC) is one of the most efficient on-chip communication platforms for Systems on Chip, where a large number of computational and storage blocks are integrated on a single chip. NoCs are scalable and have tackled the shortcomings of SoCs. The first part of this project explains the basics of NoCs: why NoCs should be used, how to implement them, and the various blocks of an NoC. The next part deals with the implementation of the XY routing algorithm in 3×3 and 4×4 mesh network topologies. The throughput and latency curves for both topologies were obtained, and a thorough comparison was made by varying the number of virtual channels. The following part explains an improved routing algorithm for NoCs, the extended torus (XTRANC) routing algorithm. This algorithm is designed for inner torus mesh networks and provides better performance than the usual routing algorithms. It has been implemented using the CONNECT simulator. Finally, the DART simulator was explored, and two important components, the flitqueue and the traffic generator, were designed using this simulator.
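XY routing on a mesh is commonly defined as dimension-order routing: a packet first travels along the X axis until its column matches the destination, then along the Y axis. A minimal sketch under that assumption follows; the function name `xy_route` and the `(x, y)` coordinate convention are illustrative and not taken from the thesis.

```python
def xy_route(src, dst):
    """Return the list of (x, y) nodes visited from src to dst, X first, then Y."""
    x, y = src
    path = [(x, y)]
    # Phase 1: move along the X dimension until the column matches.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Phase 2: move along the Y dimension until the row matches.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


# Route across a 3x3 mesh from node (0, 0) to node (2, 1).
route = xy_route((0, 0), (2, 1))
```

Since every packet finishes its X moves before making any Y move, the route is deterministic and minimal on a mesh.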

ethesis@nitr

Implementation and Evaluation of an NoC Architecture for FPGAs

Author: Le Thuan
Publication venue: University of Windsor Leddy Library
Publication date: 01/01/2009

The Networks-on-Chip (NoC) approach to designing Systems-on-Chip (SoC) is currently emerging as an advanced concept for overcoming the scalability and efficiency problems of traditional bus-based systems. A great deal of theoretical research has been done in this area that provides good insight and shows promising results. There is a great need for research into hardware implementation of NoC-based systems to determine the feasibility of implementing various topologies and protocols, and to determine accurately what design tradeoffs are involved in NoC implementation. This thesis addresses the challenges of implementing an NoC-based system on FPGAs for running real benchmark applications. The NoC used a mesh topology and a circuit-switched communication protocol. An experimental framework was developed that allowed implementation of an NoC-based system from a high-level specification, using the Celoxica Handel-C hardware description language. Two test applications, charge-coupled device (CCD) and JPEG, were developed in Handel-C as benchmark applications. Both benchmarks are computationally expensive and require large quantities of data transfer, which tests the NoC system. Implementation results show that the NoC-based system gives superior area utilization and speed performance compared to the bus-based system running the same benchmarks.

Scholarship at UWindsor

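MPSoCBench: a framework for evaluating tools and methodologies for multiprocessor systems on chip (original title: MPSoCBench : um framework para avaliação de ferramentas e metodologias para sistemas multiprocessados em chip)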

Author: Garanhani Liana Dessandre Duenha, 1977-
Publication venue: [s.n.]
Publication date: 30/08/2018

Advisor: Rodolfo Jardim de Azevedo. Thesis (doctorate), Universidade Estadual de Campinas, Instituto de Computação.
Abstract: Recent design methodologies and tools aim at enhancing design productivity by providing a software development platform before the final Multiprocessor System on Chip (MPSoC) architecture details are defined. However, simulation can only be performed efficiently when using a modeling and simulation engine that supports system behavior description at a high abstraction level. The lack of MPSoC virtual platform prototyping that integrates both scalable hardware and software in order to create and evaluate new methodologies and tools motivated us to develop MPSoCBench, a scalable set of MPSoCs including four different ISAs (PowerPC, MIPS, SPARC, and ARM) organized in platforms with 1, 2, 4, 8, 16, 32, and 64 cores, cross-compilers, IPs, interconnections, 17 parallel versions of software from well-known benchmarks, and power consumption estimation for the main components (processors, routers, memory, and caches). An important demand in MPSoC designs is addressing energy consumption constraints as early as possible. Since processor performance comes with a high power cost, there is increasing interest in exploring the trade-off between power and performance, taking into account the target application domain. Dynamic Voltage and Frequency Scaling techniques adaptively scale the voltage and frequency levels of the CPU, allowing it to reach just enough performance to process the system workload while meeting throughput constraints, thereby reducing energy consumption.
To explore this wide design space for energy efficiency and performance, for both hardware and software components, we added MPSoCBench features to explore dynamic voltage and frequency scaling (DVFS) and evaluated three mechanisms based on dynamic energy estimation and CPU usage rate.
Doctorate. Computer Science. Doctor of Computer Science.
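The reason DVFS reduces energy can be seen from the standard first-order CMOS model, in which dynamic power is proportional to activity × capacitance × voltage² × frequency. The sketch below uses that textbook model only; it is not MPSoCBench's power estimation, and the capacitance, voltage, and frequency values are hypothetical.

```python
def dynamic_power(c_eff, voltage, freq_hz, activity=1.0):
    """First-order CMOS dynamic power: activity * C * V^2 * f (watts)."""
    return activity * c_eff * voltage ** 2 * freq_hz


# Hypothetical operating points: nominal vs. a scaled-down DVFS level.
nominal = dynamic_power(c_eff=1e-9, voltage=1.2, freq_hz=1.0e9)
scaled = dynamic_power(c_eff=1e-9, voltage=0.9, freq_hz=0.5e9)

# Power ratio between the two operating points.
power_ratio = scaled / nominal

# A fixed workload takes twice as long at half the frequency, so the
# dynamic energy ratio is the power ratio times the runtime ratio.
energy_ratio = power_ratio * (1.0e9 / 0.5e9)
```

Under this model the energy saving comes from the voltage term alone: halving the frequency at constant voltage would halve power but leave the dynamic energy of a fixed workload unchanged.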

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio da Producao Cientifica e Intelectual da Unicamp

Interconnect design for the edge computing system-on-chip

Author: Gimbitskii Aleksei
Publication date: 22/06/2022

Nowadays the majority of systems-on-chip are designed by placing various IP blocks such as CPUs, memories and accelerators on the same chip. With advances in silicon manufacturing technologies, it has become possible to place hundreds of CPU cores and other design blocks on the same chip. The communication system that transfers data between chip components largely affects overall chip performance, computational speed and response time to external events. First, this thesis studies the main on-chip interconnect design paradigms. According to the presented research, various architectures may be chosen for an interconnect design depending on the required complexity and number of subsystems. Shared and hybrid bus interconnects are among the oldest means of on-chip communication. They are efficient for small systems with no more than ten IP blocks. Crossbar or bus matrix interconnects can be used to build on-chip communication systems that efficiently interconnect dozens of system-on-chip modules. Networks-on-chip can provide a communication solution for large-scale chip designs with hundreds of IP blocks. The second part of this thesis focuses on the novel Ballast chip implementation and its interconnect design. Ballast is a heterogeneous multiprocessor chip designed for edge computing and general-purpose computing applications. In this thesis the Ballast interconnect was designed from scratch using a cascaded crossbar approach that connects three open-source AXI protocol bus matrices. The designed interconnect efficiently connects 6 bus masters with 9 slaves and provides up to 9.6 GB/s of bandwidth for the most productive CPU subsystem.

Trepo - Institutional Repository of Tampere University

An Application-Specific Design Methodology for On-chip Crossbar Generation

Author: Benini Luca
De Micheli Giovanni
Murali Srinivasan
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2007

Designing a power-efficient interconnection architecture for MultiProcessor Systems-on-Chips (MPSoCs) satisfying the application performance constraints is a nontrivial task. In order to meet the tight time-to-market constraints and to effectively handle the design complexity, it is essential to provide a computer-aided design tool support for automating this task. In this paper, we address the issue of “application-specific design of optimal crossbar architecture” satisfying the performance requirements of the application and optimal binding of the cores onto the crossbar resources. We present a simulation-based design approach that is based on the analysis of the actual traffic trace of the application, considering local variations in traffic rates, temporal overlap among traffic streams, and criticality of traffic streams. Our approach is physical design aware, where the wiring complexity of the crossbar architecture is also considered during the design process. This leads to detecting timing violations on the wires early in the design cycle and to having accurate estimates of the power consumption on the wires. We apply our methodology onto several MPSoC designs, and the synthesized crossbar platforms are validated for performance by cycle-accurate SystemC simulation of the designs. The crossbar matrix power consumption values are based on the synthesis of the register transfer level models of the designs, obtained using industry standard tools. The experimental case studies show large reduction in communication architecture power consumption (45.3% on average) and total wirelength (38% on average) for the MPSoC designs when compared with traditional design approaches. The synthesized crossbar designs also lead to large reduction in transaction latencies (up to 7×) when compared with the existing design approaches.

Infoscience - École polytechnique fédérale de Lausanne

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Design and Implementation of Benes/Clos On-Chip Interconnection Networks

Author: Jiang Yikun
Publication venue: Digital Scholarship@UNLV
Publication date: 01/08/2016

Networks-on-Chip (NoCs) have emerged as the key on-chip communication architecture for multiprocessor systems-on-chip and chip multiprocessors. Single-hop non-blocking networks have the advantage of providing uniform latency and throughput, which is important for cache-coherent NoC systems. Existing work shows that Benes networks have a much lower transistor count and smaller circuit area but longer delay than crossbars. To reduce the delay, we propose a Clos network built with larger switches. Using fewer than half the number of stages of the Benes network, the Clos network with 4x4 switches can significantly reduce the delay. This dissertation focuses on designing high-performance Benes/Clos on-chip interconnection networks and implementing the switch setting circuits for these networks. The major contributions are summarized below. The circuit designs of both Benes and Clos networks in different sizes are carried out for two implementations of the configurable switch: with NMOS transistors only and with full transmission gates (TGs). The layout and simulation results under 45nm technology show that TG-based Benes networks have much better delay and power performance than their NMOS-based counterparts, though TG-based designs need more transistor resources. Clos networks achieve on average 60% lower delay than Benes networks, with even smaller area and power consumption. Lee's switch setting algorithm is fully implemented in RTL and synthesized. We have refined the algorithm's data structures and the initialization and updating of relation values to make it suitable for hardware implementation. The simulation and synthesis results of the switch setting circuits for 4x4 to 64x64 Benes networks under 65nm technology confirm that the trend of the circuit's delay and area results is consistent with that of Lee's algorithm.
To the best of our knowledge, this is the first complete hardware implementation of the parallel switch setting algorithm that can handle all types of permutations, including partial ones. The results in this dissertation confirm that Benes/Clos networks are a promising solution for implementing on-chip interconnection networks.
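The area-versus-delay trade-off between Benes networks and crossbars follows from their standard structure: an N×N Benes network of 2x2 switches has 2·log2(N) − 1 stages of N/2 switches each, while an N×N crossbar has N² crosspoints but a single hop. The sketch below computes only these textbook counts; the function names are illustrative, and transistor counts or delays from the dissertation are not modeled.

```python
def benes_stages(n_ports):
    """Stages in an N x N Benes network of 2x2 switches: 2*log2(N) - 1."""
    if n_ports < 2 or n_ports & (n_ports - 1):
        raise ValueError("n_ports must be a power of two, at least 2")
    log2_n = n_ports.bit_length() - 1  # exact log2 for a power of two
    return 2 * log2_n - 1


def benes_switches(n_ports):
    """Total 2x2 switches: N/2 switches in each stage."""
    return (n_ports // 2) * benes_stages(n_ports)


def crossbar_crosspoints(n_ports):
    """Crosspoints in an N x N crossbar."""
    return n_ports * n_ports


# Compare the sizes covered by the dissertation's switch setting circuits.
table = {n: (benes_stages(n), benes_switches(n), crossbar_crosspoints(n))
         for n in (4, 8, 16, 32, 64)}
```

The switch count grows as N·log N against N² for the crossbar, while the stage count, and with it the path delay, grows with log N. That growth in stages is what the Clos design with larger switches aims to cut.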

University of Nevada, Las Vegas Repository

A Fuzzy Logic Reconfiguration Engine for Symmetric Chip Multiprocessors

Author: McDonald-Maier Klaus D
Qadri Muhammad Yasir
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 15/04/2010

Recent developments in reconfigurable multiprocessor systems on chip (MPSoC) have offered system designers a great amount of flexibility to exploit task concurrency with higher throughput and lower energy consumption. This paper presents a novel fuzzy logic reconfiguration engine (FLRE) for coarse-grain MPSoC reconfiguration that helps identify an optimum balance between the power and performance of the system. The FLRE is composed of two levels of abstraction layers. The system selects an optimal configuration of Level 1 / Level 2 cache size and associativity, processor operating frequency and voltage, and the number of cores, based on miss rate, energy, and throughput information at both the core and SoC level. An 8-core symmetric chip multiprocessor has been used to evaluate the proposed scheme. The results show an overall decrease in energy consumption with no more than a 30% decrease in throughput.

University of Essex Research Repository

Crossref

dReDBox: A Disaggregated Architectural Perspective for Data Centers

Author: Alachiotis Nikolaos
Andronikakis Andreas
Igoumenos Ioannis
Katrinis Kostas
Korakis Thanasis
Mishra Vaibhawa
Papadakis Orion
Pnevmatikatos Dionisios
Reale Andrea
Syrigos Ilias
Syrivelis Dimitris
Theodoropoulos Dimitris
Torrents Marti
Yuan Hui
Zervas George
Zyulkyarov Ferad
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2018

Data centers are currently constructed with fixed blocks (blades); the hard boundaries of this approach lead to suboptimal utilization of resources and increased energy requirements. The dReDBox (disaggregated Recursive Datacenter in a Box) project addresses the problem of fixed resource proportionality in next-generation, low-power data centers by proposing a paradigm shift toward finer resource allocation granularity, where the unit is the function block rather than the mainboard tray. This introduces various challenges at the system design level, requiring elastic hardware architectures, efficient software support and management, and programmable interconnect. Memory and hardware accelerators can be dynamically assigned to processing units to boost application performance, while high-speed, low-latency electrical and optical interconnect is a prerequisite for realizing the concept of data center disaggregation. This chapter presents the dReDBox hardware architecture and discusses design aspects of the software infrastructure for resource allocation and management. Furthermore, initial simulation and evaluation results for accessing remote, disaggregated memory are presented, employing benchmarks from the Splash-3 and the CloudSuite benchmark suites. This work was supported in part by EU H2020 ICT project dRedBox, contract #687632. Peer reviewed. Postprint (author's final draft).

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC