Search CORE

15 research outputs found

Класифікація та архітектурні особливості програмованих мультипроцесорних систем-на-кристалі

Author: Клименко І. А.
Publication venue: 'National Aviation University'
Publication date: 04/03/2012
Field of study

Provided general information on embedded multiprocessor systems-on-chip based on FPGA (FPGA-MPSoC). Completed a comprehensive analysis of the architectural features and provided Shih rock classification FPGA-MPSoC. Powered overview of recent research in the development of FPGA-MPSoC. A wide circle of such systems in order to study trends in architecture and all problems solvedПредоставлено общую информацию о встроенных мультипроцессорных систем-на-кристалле на базе ПЛИС (FPGA-MPSoC). Выполнено всесторонний анализ архитектурных особенностей и предоставлена широкая классификация FPGA-MPSoC. Приведены обзор последних исследований в области разработки FPGA-MPSoC. Представлен широкий круг таких систем с целью исследования всех тенденциях архитектуры и решаемых задачПредоставлено общую информацию о встроенных мультипроцессорных систем-на-кристалле на базе ПЛИС (FPGA-MPSoC). Выполнено всесторонний анализ архитектурных особенностей и предоставлена широкая классификация FPGA-MPSoC. Приведены обзор последних исследований в области разработки FPGA-MPSoC. Представлен широкий круг таких систем с целью исследования всех тенденциях архитектуры и решаемых зада

Наукові журнали Національного Авіаційного Університету

Evaluation and Design Space Exploration of a Time-Division Multiplexed NoC on FPGA for Image Analysis Applications

Author: Fresse Virginie
Houzet Dominique
Khalid Mohammed
Legrand Anne-Claire
Zhang Linlin
Publication venue
Publication date: 01/01/2009
Field of study

The aim of this paper is to present an adaptable Fat Tree NoC architecture for Field Programmable Gate Array (FPGA) designed for image analysis applications. Traditional NoCs (Network on Chip) are not optimal for dataflow applications with large amount of data. On the opposite, point to point communications are designed from the algorithm requirements but they are expensives in terms of resource and wire. We propose a dedicated communication architecture for image analysis algorithms. This communication mechanism is a generic NoC infrastructure dedicated to dataflow image processing applications, mixing circuit-switching and packet-switching communications. The complete architecture integrates two dedicated communication architectures and reusable IP blocks. Communications are based on the NoC concept to support the high bandwidth required for a large number and type of data

arXiv.org e-Print Archive

CiteSeerX

Crossref

HAL-UJM

Hal - Université Grenoble Alpes

Springer - Publisher Connector

Directory of Open Access Journals

Interconnect design for the edge computing system-on-chip

Author: Gimbitskii Aleksei
Publication venue
Publication date: 22/06/2022
Field of study

Nowadays the majority of system-on-chips are designed by placing various IP blocks such as CPUs, memories and accelerators on the same chip. With the advantage of silicon manufacturing technologies, it has become possible to place hundreds of CPU cores and other design blocks on the same chip. A communication system that transfers data between chip components largely affects overall chip performance, computational speed and response time for external events. Firstly, this thesis studies the main on-chip interconnect design paradigms. According to the presented research, various architectures may be chosen for an interconnect design depending on the required complexity and number of subsystems. The shared and hybrid bus interconnects are one of the oldest means of on-chip communication. They are efficient for small systems with no more than ten IP blocks. The crossbars or bus matrix interconnects can help to build on-chip communication systems which can efficiently interconnect dozens of system-on-chip modules. The networks-on-chip can provide a communication solution for large scale chip designs with hundreds of IP blocks. The second part of this thesis focuses on the novel Ballast chip implementation and its interconnect design. The Ballast is a heterogeneous multiprocessor chip designed for edge computing and general-purpose computing applications. In this thesis Ballast interconnect was designed from scratch by using a cascaded crossbar approach by connecting three open-sourced AXI protocol bus matrices. The designed interconnect allows to efficiently connect 6 bus masters with 9 slaves and provides up to 9,6 GB/s bandwidth for the most productive CPU subsystem

Trepo - Institutional Repository of Tampere University

Prosessori- ja system-on-chip-työkalujen yhteiskäyttö

Author: Lahti Sakari
Publication venue
Publication date: 04/06/2014
Field of study

Transport-triggered architecture (TTA) processors provide an efficient middle-ground in creating intellectual property (IP) components for system-on-chip (SoC) designs. Using TTAs, the design effort is greatly reduced compared to ASIC approach, and a more economic and efficient implementation is possible than when using a general purpose processor. This Thesis examines ways to accelerate the design flow when using TTA processors in SoC designs. The proposed ﬂows combine the use of the TTA-based Co-design Environment (TCE) tool set and Kactus2 IP-XACT design environment. The IP-XACT standard and the Kactus2 tool make it easy to integrate and conﬁgure IP components from multiple vendors, whereas the TCE tools provide a fast and efﬁcient path from C to VHDL. The Thesis presents three use cases for TTA: as a ready-made ﬁxed accelerator, a general purpose processor, and a tailored application-speciﬁc processor. Moreover, management of instance-speciﬁc data in IP-XACT is discussed. For each use case, the design ﬂows are presented in detail step-by-step, a case example is presented, and the design time spent on each step is evaluated. The flows contain between 15 and 18 steps and use between 8 and 12 different program tools from the studied tool sets. Provided that C source codes and IP-XACT library are available, a non-HW oriented engineer can implement an FPGA based multiprocessor product in less than 4 hours. Based on the results, further development suggestions for the TCE tools and Kactus2 are made

Trepo - Institutional Repository of Tampere University

Design and resource management of reconfigurable multiprocessors for data-parallel applications

Author: Wang Xiaofang
Publication venue: Digital Commons @ NJIT
Publication date: 31/01/2006
Field of study

FPGA (Field-Programmable Gate Array)-based custom reconfigurable computing machines have established themselves as low-cost and low-risk alternatives to ASIC (Application-Specific Integrated Circuit) implementations and general-purpose microprocessors in accelerating a wide range of computation-intensive applications. Most often they are Application Specific Programmable Circuiits (ASPCs), which are developer programmable instead of user programmable. The major disadvantages of ASPCs are minimal programmability, and significant time and energy overheads caused by required hardware reconfiguration when the problem size outnumbers the available reconfigurable resources; these problems are expected to become more serious with increases in the FPGA chip size. On the other hand, dominant high-performance computing systems, such as PC clusters and SMPs (Symmetric Multiprocessors), suffer from high communication latencies and/or scalability problems. This research introduces low-cost, user-programmable and reconfigurable MultiProcessor-on-a-Programmable-Chip (MPoPC) systems for high-performance, low-cost computing. It also proposes a relevant resource management framework that deals with performance, power consumption and energy issues. These semi-customized systems reduce significantly runtime device reconfiguration by employing userprogrammable processing elements that are reusable for different tasks in large, complex applications. For the sake of illustration, two different types of MPoPCs with hardware FPUs (floating-point units) are designed and implemented for credible performance evaluation and modeling: the coarse-grain MIMD (Multiple-Instruction, Multiple-Data) CG-MPoPC machine based on a processor IP (Intellectual Property) core and the mixed-mode (MIMD, SIMD or M-SIMD) variant-grain HERA (HEterogeneous Reconfigurable Architecture) machine. In addition to alleviating the above difficulties, MPoPCs can offer several performance and energy advantages to our data-parallel applications when compared to ASPCs; they are simpler and more scalable, and have less verification time and cost. Various common computation-intensive benchmark algorithms, such as matrix-matrix multiplication (MMM) and LU factorization, are studied and their parallel solutions are shown for the two MPoPCs. The performance is evaluated with large sparse real-world matrices primarily from power engineering. We expect even further performance gains on MPoPCs in the near future by employing ever improving FPGAs. The innovative nature of this work has the potential to guide research in this arising field of high-performance, low-cost reconfigurable computing. The largest advantage of reconfigurable logic lies in its large degree of hardware customization and reconfiguration which allows reusing the resources to match the computation and communication needs of applications. Therefore, a major effort in the presented design methodology for mixed-mode MPoPCs, like HERA, is devoted to effective resource management. A two-phase approach is applied. A mixed-mode weighted Task Flow Graph (w-TFG) is first constructed for any given application, where tasks are classified according to their most appropriate computing mode (e.g., SIMD or MIMD). At compile time, an architecture is customized and synthesized for the TFG using an Integer Linear Programming (ILP) formulation and a parameterized hardware component library. Various run-time scheduling schemes with different performanceenergy objectives are proposed. A system-level energy model for HERA, which is based on low-level implementation data and run-time statistics, is proposed to guide performance-energy trade-off decisions. A parallel power flow analysis technique based on Newton\u27s method is proposed and employed to verify the methodology

Digital Commons @ New Jersey Institute of Technology (NJIT)

Exploration d'architectures génériques sur FPGA pour des algorithmes d'imagerie multispectrale

Author: FRESSE Virginie
ROUSSEAU Frédéric
TAN Junyan
Publication venue
Publication date: 01/01/2012
Field of study

Les architectures multiprocesseur sur puce (MPSoC) basées sur les réseaux sur puce (NoC) constituent une des solutions les plus appropriées pour les applications embarquées temps réel de traitement du signal et de l image. De part l augmentation constante de la complexité de ces algorithmes et du type et de la taille des données manipulées, des architectures MPSoC sont nécessaires pour répondre aux contraintes de performance et de portabilité. Mais l exploration de l espace de conception de telles architectures devient très coûteuse en temps. En effet, il faut définir principalement le type et le nombre des coeurs de calcul, l architecture mémoire et le réseau de communication entre tous ces composants. La validation par simulation de haut niveau manque de précision, et la simulation de bas niveau est inadaptée au vu de la taille de l architecture. L émulation sur FPGA devient donc inévitable. Dans le domaine de l image, l imagerie spectrale est de plus en plus utilisée car elle permet de multiplier les intervalles spectraux, améliorant la définition de la lumière d une scène pour permettre un accès à des caractéristiques non visibles à l oeil nu. De nombreux paramètres modifient les caractéristiques de l algorithme, ce qui influence l architecture finale. L objectif de cette thèse est de proposer une méthode pour dimensionner au plus juste l architecture matérielle et logicielle d une application d imagerie multispectrale. La première étape est le dimensionnement du NoC en fonction du trafic sur le réseau. Le développement automatique d une plateforme d émulation sur mono ou multi FPGA facilite cette étape et détermine le positionnement des composants de calcul. Ensuite, le dimensionnement des composants de calcul et leurs fonctionnalités sont validés à l aide de plateformes de simulation existantes, avant la génération du modèle synthétisable sur FPGA. Le flot de conception est ouvert dans le sens qu il accepte différents NoC à condition d avoir le modèle source HDL de ce composant. De nombreux résultats mettent en avant les paramètres importants qui ont une influence sur les performances des architectures et du NoC en particulier. Plusieurs solutions sont décrites, commentées et critiquées. Ces travaux nous permettent de poser les premiers jalons d une plateforme d émulation complète MPSoC à base de NoCThe Multiprocessor-System-On-Chip (MPSoC) architectures based on the Network-On-Chip (NoC) communication are the one of the most appropriate solution for image and signal processing applications under real time constraints. Due to the ever increasing complexity of these algorithms, the types and sizes of the data manipulated, the MPSoC architectures are necessary to meet the constraints of performance and portability. However exploring the design space of such architecture is time consuming. Indeed, many parameters should be defined such as the type and the number of processing cores, the memory architecture and the communication network between all these components. Validation by high-level simulations has the lack of the precision. Low-level simulation is inadequate for such big size of the architecture. Therefore, the emulation on FPGA becomes inevitable. In image processing, spectral imaging is more and more used. This technology captures light from more frequencies than the human eye increasing the number of wavelengths. Invisible details can be extracted from a scene. The difference between all spectral imaging applications is the number of wavelengths and the precision. Many parameters affect the characteristics of the algorithm, having a huge impact on the final architecture. The objective of this thesis is to propose a method for sizing one of the most accurate hardware and software architecture for multispectral imaging application. The first step is the design of the NoC based on the network traffic. The automatic development of an emulation platform on a single FPGA or multi-FPGAs simplifies this step and determines the positioning of the computational components. Then, the design of computational components and their functions are validated using existing simulation platforms. The synthesizable model of the architecture on FPGA is then generated. The design flow is open. Several NoC structures can be inserted using the source model of this component. The set of results obtained points out the major parameters influencing the performances of architecture and the NoC itself. Several solutions are described and analyzed. These studies allow us to lay the groundwork for a complete MPSoC emulation platform based on NoCST ETIENNE-Bib. électronique (422189901) / SudocSudocFranceF

OpenGrey Repository

a PC based SDR platform with dynamic reconfiguration

Author: Bivona Francesco Paul
Camilo Alexander Valdemar
Publication venue: Digital WPI
Publication date: 12/04/2009
Field of study

The goal of this Major Qualifying Project is to provide the framework for integration of a Virtex series field programmable gate array (FPGA) into GNU Radio, allowing GNU Radio to have control over both FPGA and non-FPGA components of the pipeline. In this report, we address the following: our research into the which FPGA series would be most beneficial to our project, an outline of the evolution of our design over the course of the past 21 weeks, and a summary of the final outcomes in various subsets of project development

DigitalCommons@WPI

Approche efficace pour la conception des architectures multiprocesseurs sur puce électronique

Author: Elie Etienne
Publication venue
Publication date: 01/12/2010
Field of study

Les systèmes multiprocesseurs sur puce électronique (On-Chip Multiprocessor [OCM]) sont considérés comme les meilleures structures pour occuper l'espace disponible sur les circuits intégrés actuels. Dans nos travaux, nous nous intéressons à un modèle architectural, appelé architecture isométrique de systèmes multiprocesseurs sur puce, qui permet d'évaluer, de prédire et d'optimiser les systèmes OCM en misant sur une organisation efficace des nœuds (processeurs et mémoires), et à des méthodologies qui permettent d'utiliser efficacement ces architectures. Dans la première partie de la thèse, nous nous intéressons à la topologie du modèle et nous proposons une architecture qui permet d'utiliser efficacement et massivement les mémoires sur la puce. Les processeurs et les mémoires sont organisés selon une approche isométrique qui consiste à rapprocher les données des processus plutôt que d'optimiser les transferts entre les processeurs et les mémoires disposés de manière conventionnelle. L'architecture est un modèle maillé en trois dimensions. La disposition des unités sur ce modèle est inspirée de la structure cristalline du chlorure de sodium (NaCl), où chaque processeur peut accéder à six mémoires à la fois et où chaque mémoire peut communiquer avec autant de processeurs à la fois. Dans la deuxième partie de notre travail, nous nous intéressons à une méthodologie de décomposition où le nombre de nœuds du modèle est idéal et peut être déterminé à partir d'une spécification matricielle de l'application qui est traitée par le modèle proposé. Sachant que la performance d'un modèle dépend de la quantité de flot de données échangées entre ses unités, en l'occurrence leur nombre, et notre but étant de garantir une bonne performance de calcul en fonction de l'application traitée, nous proposons de trouver le nombre idéal de processeurs et de mémoires du système à construire. Aussi, considérons-nous la décomposition de la spécification du modèle à construire ou de l'application à traiter en fonction de l'équilibre de charge des unités. Nous proposons ainsi une approche de décomposition sur trois points : la transformation de la spécification ou de l'application en une matrice d'incidence dont les éléments sont les flots de données entre les processus et les données, une nouvelle méthodologie basée sur le problème de la formation des cellules (Cell Formation Problem [CFP]), et un équilibre de charge de processus dans les processeurs et de données dans les mémoires. Dans la troisième partie, toujours dans le souci de concevoir un système efficace et performant, nous nous intéressons à l'affectation des processeurs et des mémoires par une méthodologie en deux étapes. Dans un premier temps, nous affectons des unités aux nœuds du système, considéré ici comme un graphe non orienté, et dans un deuxième temps, nous affectons des valeurs aux arcs de ce graphe. Pour l'affectation, nous proposons une modélisation des applications décomposées en utilisant une approche matricielle et l'utilisation du problème d'affectation quadratique (Quadratic Assignment Problem [QAP]). Pour l'affectation de valeurs aux arcs, nous proposons une approche de perturbation graduelle, afin de chercher la meilleure combinaison du coût de l'affectation, ceci en respectant certains paramètres comme la température, la dissipation de chaleur, la consommation d'énergie et la surface occupée par la puce. Le but ultime de ce travail est de proposer aux architectes de systèmes multiprocesseurs sur puce une méthodologie non traditionnelle et un outil systématique et efficace d'aide à la conception dès la phase de la spécification fonctionnelle du système.On-Chip Multiprocessor (OCM) systems are considered to be the best structures to occupy the abundant space available on today integrated circuits (IC). In our thesis, we are interested on an architectural model, called Isometric on-Chip Multiprocessor Architecture (ICMA), that optimizes the OCM systems by focusing on an effective organization of cores (processors and memories) and on methodologies that optimize the use of these architectures. In the first part of this work, we study the topology of ICMA and propose an architecture that enables efficient and massive use of on-chip memories. ICMA organizes processors and memories in an isometric structure with the objective to get processed data close to the processors that use them rather than to optimize transfers between processors and memories, arranged in a conventional manner. ICMA is a mesh model in three dimensions. The organization of our architecture is inspired by the crystal structure of sodium chloride (NaCl), where each processor can access six different memories and where each memory can communicate with six processors at once. In the second part of our work, we focus on a methodology of decomposition. This methodology is used to find the optimal number of nodes for a given application or specification. The approach we use is to transform an application or a specification into an incidence matrix, where the entries of this matrix are the interactions between processors and memories as entries. In other words, knowing that the performance of a model depends on the intensity of the data flow exchanged between its units, namely their number, we aim to guarantee a good computing performance by finding the optimal number of processors and memories that are suitable for the application computation. We also consider the load balancing of the units of ICMA during the specification phase of the design. Our proposed decomposition is on three points: the transformation of the specification or application into an incidence matrix, a new methodology based on the Cell Formation Problem (CFP), and load balancing processes in the processors and data in memories. In the third part, we focus on the allocation of processor and memory by a two-step methodology. Initially, we allocate units to the nodes of the system structure, considered here as an undirected graph, and subsequently we assign values to the arcs of this graph. For the assignment, we propose modeling of the decomposed application using a matrix approach and the Quadratic Assignment Problem (QAP). For the assignment of the values to the arcs, we propose an approach of gradual changes of these values in order to seek the best combination of cost allocation, this under certain metric constraints such as temperature, heat dissipation, power consumption and surface occupied by the chip. The ultimate goal of this work is to propose a methodology for non-traditional, systematic and effective decision support design tools for multiprocessor system architects, from the phase of functional specification

Dépôt Institutionnel Numérique

Kirjastonhallinnan toteutus Kactus2 IP-XACT työkalussa

Author: Kamppi Antti
Publication venue
Publication date: 05/12/2012
Field of study

The size and complexity of embedded systems have grown at an accelerating pace over the last years. This causes demand to improve the productivity of the design process e.g. by enhancing the reusability of logic components, also called IP-blocks. Improving reusability requires use of new design tools and methods. IP-XACT is a XML based metadata standard, which describes IP-blocks in a tool, implementation and vendor neutral way. Previously there hasn’t been open source design tools supporting IP-XACT and the commercial tools are expensive, thus limiting the ability of small and middle-sized companies to use IP-XACT. This thesis presents an open source IP-XACT design tool called Kactus2. The scope of the thesis is the library management and IP-packaging modules, which enable automated management of IP-blocks. The thesis presents a few extensions to the standard, which expand the original scope of IP-XACT towards product management. The design and implementation of the library management and IP-packaging classes and the user interfaces are described. The implementation language was C++ and the used development framework was the open source version 4.8.3 of Qt. The development environment was Microsoft Visual Studio 2008 with the Qt add-in installed. Qt enables cross-platform development, which facilitated the release of Kactus2 for both Windows and Linux operating systems. The sizes of the presented modules in code lines are 7.500 for library management and 21.000 for IP-packaging. The corresponding class counts are 26 and 156. The code line count for whole Kactus2 tool is 103.000 lines. Library management contains two views of the library structure and a segment to define search options. Packaging module contains 28 editors for different elements of the metadata. The graphical user interface was designed to be easy to use, enabling users to adopt new design methods. Also, the tool contains a context based help system, which reacts to user’s actions giving advice related to the task on hand. The total download count for different Kactus2 versions is over 1.700.Sulautettujen järjestelmien koko ja monimutkaisuus ovat viime vuosina kasvaneet kiihtyvällä tahdilla. Siksi suunnittelun tuottavuutta täytyy tehostaa, johon on pyritty mm. käyttämällä uudelleenkäytettäviä logiikkakomponentteja. Uudelleenkäytön tehostaminen vaatii uusia suunnittelutyökaluja ja metodeja. IP-XACT on XML-pohjainen metadata standardi, jolla kuvataan uudelleenkäytettäviä logiikkakomponentteja, eli IP-lohkoja, työkalu- toteutus- ja toimittajaneutraalilla tavalla. Ongelmana IP-XACT:in yleistymisessä on ollut työkalujen tuki. Saatavilla ei ole aiemmin ollut vapaan lähdekoodin suunnittelutyökaluja ja kaupalliset vaihtoehdot ovat kalliita, mikä rajoittaa pienten ja keskisuurten yritysten mahdollisuuksia ottaa IP-XACT käyttöön. Tässä diplomityössä esitellään avoimen lähdekoodin Kactus2 työkalu IP-XACT-pohjaiseen suunnitteluun. Työn aiheena on työkalun kirjastonhallinta- ja IP-paketointimoduulit, joiden avulla IP-lohkoille voidaan luoda metadata-kuvaukset ja hallinnoida lohkoja automatisoidusti. Diplomityössä esitellään muutamia lisäyksiä, jotka laajentavat alkuperäistä standardia myös tuotetiedon hallintaan. Työssä sekä suunniteltiin että toteutettiin kirjastonhallinnan ja paketoinnin vaatimat luokat ja käyttöliittymänäkymät. Toteutuksessa käytettiin C++ ohjelmointikieltä ja ohjelmistokehyksenä käytettiin Qt:n avoimen lähdekoodin versiota 4.8.3. Kehitysympäristönä toimi Microsoftin Visual Studio 2008, johon oli asennettu Qt lisäosa. Qt mahdollistaa järjestelmäriippumattoman koodin kirjoittamisen, joten Kactus2 on julkaistu sekä Windows että Linux käyttöjärjestelmille. Esiteltyjen moduulien koot koodiriveinä ovat 7.500 kirjastonhallinta- ja 21.000 IP-paketointimoduulille. Vastaavat luokkien määrät ovat 26 ja 156. Koko Kactus2:n koodirivimäärä on 103.000 riviä. Kirjastonhallinta sisältää kaksi eri näkymää kirjaston rakenteesta, sekä oman osan kirjaston hakuehtojen määrittämiseen. Paketointimoduuli sisältää 28 eri editoria. Käyttöliittymästä on pyritty tekemään selkeä ja helppokäyttöinen, jotta käyttäjien olisi helppo omaksua uusia toimintatapoja. Lisäksi työkaluun on lisätty kontekstipohjainen opastusjärjestelmä, joka reagoi käyttäjän tekemisiin. Kokonaisuudessaan Kactus2:n eri versioita on ladattu yli 1.700 kertaa

Trepo - Institutional Repository of Tampere University

TUT DPub