32 research outputs found

    NFV Platforms: Taxonomy, Design Choices and Future Challenges

    Get PDF
    Due to the intrinsically inefficient service provisioning in traditional networks, Network Function Virtualization (NFV) keeps gaining attention from both industry and academia. By replacing purpose-built, expensive, proprietary network equipment with software network functions consolidated on commodity hardware, NFV envisions a shift towards a more agile and open service provisioning paradigm. During the last few years, a large number of NFV platforms have been implemented in production environments, where they typically face critical challenges in the development, deployment, and management of Virtual Network Functions (VNFs). Like any complex system, such platforms commonly consist of numerous software and hardware components and usually incorporate disparate design choices driven by distinct motivations or use cases. This broad collection of intricate alternatives makes it extremely arduous for network operators to make proper choices. Although numerous efforts have been devoted to investigating different aspects of NFV, none of them specifically focuses on NFV platforms or attempts to explore their design space. In this paper, we present a comprehensive survey of NFV platform design. Our study solely targets existing NFV platform implementations. We begin with a top-down architectural view of the standard reference NFV platform and present our taxonomy of existing NFV platforms based on the features they provide across a typical network function life cycle. We then thoroughly explore the design space and elaborate on the implementation choices each platform opts for. We also envision future challenges for NFV platform design in the coming 5G era. We believe that our study gives a detailed guideline for network operators and service providers to choose the most appropriate NFV platform based on their respective requirements, and it also provides guidance for implementing new NFV platforms.

    Many-objective optimization from an energy-efficiency perspective applied to 5G heterogeneous cellular networks using a small-cell switching framework

    Get PDF
    This Ph.D. dissertation addresses the study of a Many-Objective Optimization Problem (MaOP) to reduce inter-cell interference and power consumption in realistic Centralized, Collaborative, Cloud, and Clean Radio Access Networks (C-RANs). It uses the Cell Switch-Off (CSO) scheme to switch Remote Radio Units (RRUs) off and on, and the Coordinated Scheduling (CS) technique to allocate resource blocks smartly. The EF1-NSGA-III algorithm (a variation of NSGA-III that uses the first front to find the extreme points in the normalization procedure, extended in this thesis) is employed to solve the proposed MaOP, which is composed of four objective functions, four constraints, and two decision variables. The problem is also redefined with three objective functions to compare the performance of the NSGA-II and EF1-NSGA-III algorithms. The OpenAirInterface (OAI) platform is used to evaluate and validate the performance of an indoor coverage system, because most user equipment in next-generation cellular networks will be in indoor environments. OAI is the fastest-growing open-source 5G platform that implements 3GPP technology on general-purpose computers, fast Ethernet transport ports, and Commercial Off-The-Shelf (COTS) software-defined radio hardware. This document comprises five contributions. The first is a survey of testbeds, emulators, and simulators for 4G/5G cellular networks. The second is the extension of KanGAL's NSGA-II code to implement EF1-NSGA-III, adaptive EF1-NSGA-III (A-EF1-NSGA-III), and efficient adaptive EF1-NSGA-III (A²-EF1-NSGA-III); these implementations support up to 10 objective functions, handle real, integer, and binary decision variables, and manage many constraints, outperforming other works in terms of the Inverted Generational Distance (IGD) metric. The third contribution is the implementation of real-time emulation methodologies for C-RANs using a frequency-domain representation in OAI, which improves the average computation time 10-fold compared to the time domain, without using radio-frequency hardware and thus avoiding its uncertainties. The fourth is the implementation of the Coordinated Scheduling (CS) technique as a proof of concept to validate the advantages of the frequency-domain methodologies and to allocate resource blocks dynamically among RRUs. Finally, a many-objective optimization problem is defined and solved with evolutionary algorithms in which diversity is managed based on crowding distance and reference points to reduce the power consumption of C-RANs; the solutions obtained are used to control the scheduling task at the Radio Cloud Center (RCC) and to switch the RRUs.
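
    To make the described formulation concrete, the sketch below sets up a toy cell switch-off problem in Python with the two decision variables mentioned above (a per-RRU on/off switch and a resource-block share) and four illustrative objectives. The objective definitions, problem sizes, and the random search that stands in for EF1-NSGA-III are assumptions for illustration only, not the thesis's actual models or code.

# Toy many-objective cell switch-off (CSO) setup -- a minimal sketch only.
# The four objectives below are hypothetical stand-ins; a real study would use
# measured power, interference, and traffic models, and EF1-NSGA-III instead of
# the random search used here to illustrate the Pareto bookkeeping.
import random

N_RRU = 8  # number of Remote Radio Units (illustrative)

def evaluate(x_on, x_rb):
    """Return four objectives to minimize: power, an interference proxy,
    coverage shortfall, and resource-block imbalance."""
    power = float(sum(x_on))                                # each active RRU draws power
    interference = sum(a * b for a, b in zip(x_on, x_rb))   # active RRUs sharing RBs interfere
    shortfall = max(0.0, 1.0 - sum(r for a, r in zip(x_on, x_rb) if a) / N_RRU)
    imbalance = max(x_rb) - min(x_rb)                       # uneven RB allocation
    return power, interference, shortfall, imbalance

def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

front = []  # non-dominated objective vectors found so far
for _ in range(2000):
    x_on = [random.randint(0, 1) for _ in range(N_RRU)]     # decision variable 1: RRU on/off
    x_rb = [random.random() for _ in range(N_RRU)]          # decision variable 2: RB share
    f = evaluate(x_on, x_rb)
    if not any(dominates(g, f) for g in front):
        front = [g for g in front if not dominates(f, g)] + [f]

print(len(front), "non-dominated solutions kept")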

    Hardware-software architecture for reduced-latency execution of emerging telecommunications applications in data centers

    Get PDF
    The Information and Communications Technology industry is facing an increasing demand for ubiquitous wireless and Internet services, driven by an explosion of multimedia-rich mobile devices. It is estimated that, starting in 2020, the volume of mobile data traffic will double every year for several years. Consequently, capital expenditures increase significantly for systems built on current Radio Access Network technologies, which are essentially based on architectures with a fixed, non-reconfigurable structure using proprietary platforms and distributed network control and management mechanisms. Moreover, to ensure the required quality of service, subsystems are dimensioned with respect to peak demand; network expansion will therefore considerably impact operating expenditures. This thesis aims at developing an architecture, at both the hardware and software levels, suitable for a virtualized Baseband Processing Unit pool in a Cloud Radio Access Network, in order to support real-time processing on a General-Purpose Processor based platform. This raises two main challenges: scheduling tasks in real time, and executing them with reduced variance compared to existing General-Purpose Processor based platforms. Real-time tasks from the radio air interface in the Cloud Radio Access Network must be scheduled at a fine grain and completed within a given timeslot, so mechanisms for resource allocation and management in computing clusters must be revisited. The second challenge, obtaining behavior with reduced variability, involves two major concerns: computing time and communication delay. Computing-time variation is inherent to all general-purpose processors, and the communication infrastructure of existing computing clusters provides no support for low-variance communications. The proposed research centers on adaptive computing and network resource allocation and management in (heterogeneous) computing clusters: the algorithms for dynamic resource allocation and real-time task scheduling form the core functionality supported by the control plane. To meet the hard real-time constraints of this class of applications, a parallel Field-Programmable Gate Array (FPGA) based hardware implementation of the control plane is proposed.
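
    A minimal sketch of the scheduling side of this problem is given below, assuming a simple earliest-deadline-first policy dispatched over a fixed pool of worker cores. The task parameters, core count, and the 5 ms budget are hypothetical placeholders, and the proposed FPGA-based control plane is not modeled.

# Earliest-deadline-first dispatch over a core pool -- a minimal sketch, not the
# thesis architecture. All task parameters (release, deadline, duration) and the
# core count are hypothetical.
class Task:
    def __init__(self, name, release, deadline, duration):
        self.name, self.release = name, release
        self.deadline, self.duration = deadline, duration

def edf_schedule(tasks, n_cores):
    """Greedy EDF: assign each task, in deadline order, to the earliest-free core."""
    free_at = [0.0] * n_cores                 # time at which each core becomes idle
    plan = []
    for t in sorted(tasks, key=lambda t: (t.deadline, t.release)):
        core = min(range(n_cores), key=lambda i: free_at[i])
        start = max(free_at[core], t.release)
        finish = start + t.duration
        free_at[core] = finish
        plan.append((t.name, core, start, finish, finish <= t.deadline))
    return plan

# Hypothetical baseband subframe decomposed into 20 short tasks with a 5 ms budget.
tasks = [Task("blk%02d" % i, release=0.0, deadline=5.0, duration=0.4) for i in range(20)]
for name, core, start, finish, on_time in edf_schedule(tasks, n_cores=4):
    print("%s core%d %.1f-%.1f ms %s" % (name, core, start, finish, "ok" if on_time else "MISS"))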

    Efficient sharing mechanisms for virtualized multi-tenant heterogeneous networks

    Get PDF
    The explosion in data traffic, the physical resource constraints, and the insufficient financial incentives for deploying 5G networks stress the need for a paradigm shift in network upgrades. Typically, operators are also the service providers, charging end users low, flat tariffs regardless of the service enjoyed. Fine-scale management of network resources is needed, both to optimize costs and resource utilization and to enable new synergies among network owners and third parties. In particular, operators could open their networks to third parties by means of fine-scale sharing agreements over customized networks for enhanced service provision, in exchange for an adequate return on the investment needed to upgrade their infrastructures. The main objective of this thesis is to study the potential of fine-scale resource management and sharing mechanisms for enhancing service provision and for contributing to a sustainable road to 5G. More precisely, the state-of-the-art architectures and technologies for network programmability and scalability are studied, together with a novel paradigm for supporting service diversity and fine-scale sharing. We review the limits of conventional networks, extend existing standardization efforts, and define an enhanced architecture for enabling 5G network features (e.g., network-wide centralization and programmability). The potential of the proposed architecture is assessed in terms of flexible sharing and enhanced service provision, while the advantages of alternative business models are studied in terms of additional profits to the operators. We first study the data-rate improvement achievable by means of spectrum and infrastructure sharing among operators and evaluate the profit increase justified by a better service. We present a scheme based on coalitional game theory for assessing the capability of accommodating more service requests when a cooperative approach is adopted, and for studying the conditions under which sharing among coalitions of operators is beneficial. Results show that: i) collaboration can be beneficial even in the case of unbalanced cost redistribution within coalitions; ii) coalitions of equal-sized operators provide better profit opportunities and require lower tariffs. The second kind of sharing interaction that we consider is the one between operators and third-party service providers, in the form of fine-scale provision of customized portions of the network resources. We define a policy-based admission control mechanism, whose performance is compared with reference strategies. The proposed mechanism is based on auction theory and computes the optimal admission policy at reduced complexity for different traffic loads and allocation frequencies. Because next-generation services include delay-critical services, we compare the admission-control performance of conventional approaches with that of the proposed one, which proves to offer near-real-time service provision and reduced complexity. Besides, it guarantees high revenues and low expenditures in exchange for negligible losses in terms of fairness towards service providers. To conclude, we study the case where adaptable timescales are adopted for the policy-based admission control, in order to promptly guarantee service requirements over traffic fluctuations.
To reduce complexity, we consider the offline pre-computation of admission strategies with respect to reference network conditions, and then study their extension to unexplored conditions by means of computationally efficient methodologies. Performance is compared for different admission strategies by means of a proof of concept on real network traces. Results show that the proposed strategy provides a tradeoff between complexity and performance with respect to the reference strategies, while reducing resource utilization and the requirements on network awareness.
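
    To make the admission-control idea concrete, the sketch below implements a deliberately simplified, auction-flavored policy: tenant requests declare a resource demand and a bid, and requests are admitted greedily by bid per unit of demand until the shared capacity is exhausted. The request format, capacity value, and greedy rule are illustrative assumptions, not the thesis's optimal auction-based mechanism.

# Greedy, auction-flavored admission control -- a simplified sketch, not the
# thesis's optimal policy. Requests and capacity are hypothetical.
def admit(requests, capacity):
    """requests: list of (tenant, demand, bid). Admit by bid density until full."""
    admitted, revenue, used = [], 0.0, 0.0
    for tenant, demand, bid in sorted(requests, key=lambda r: r[2] / r[1], reverse=True):
        if used + demand <= capacity:          # admit only if the requested slice still fits
            admitted.append(tenant)
            used += demand
            revenue += bid
    return admitted, revenue, used

requests = [("SP-A", 30, 12.0), ("SP-B", 50, 15.0), ("SP-C", 20, 10.0), ("SP-D", 40, 9.0)]
print(admit(requests, capacity=100))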

    Reconfigurable network systems and software-defined networking

    Get PDF
    Modern high-speed networks have evolved from relatively static networks to highly adaptive networks facilitating dynamic reconfiguration. This evolution has influenced all levels of network design and management, introducing increased programmability and configuration flexibility, from the lowest level of physical hardware interfaces to the highest level of network management by software. A key representative of this evolution is the emergence of software-defined networking (SDN). In this paper, we review the current state of the art in reconfigurable network systems, covering hardware reconfiguration, SDN, and the interplay between them. We take a top-down approach, starting with a tutorial on software-defined networks. We then discuss programming languages as the linking element between different levels of software and hardware in the network. We review electronic switching systems, highlighting programmability and reconfiguration aspects, and describe the trends in reconfigurable network elements. Finally, we describe the state of the art in the integration of photonic transceiver and switching elements with electronic technologies, and consider the implications for SDN and reconfigurable network systems. This work was jointly supported by the UK's Engineering and Physical Sciences Research Council (EPSRC) Internet Project EP/H040536/1, an EPSRC Research Fellowship grant to Philip Watts (EP/I004157/2), and DARPA and AFRL under contract FA8750-11-C-0249.

    CPU Utilization Improvement of Multiple-Core Processors Through Cache Management and Task Scheduling

    Get PDF
    Nowadays, multiprocessor and multicore architectures are widely used in data centers, providing the performance required for a variety of workloads such as the Cloud Radio Access Network (C-RAN). Wireless baseband signal processing for the 4G and 5G standards comprises a set of tasks that must be executed within a specific time slot. For instance, the uplink stack of one virtualized 4G base station was decomposed into more than 1000 tasks executable within 5 ms. In 5G, the target latency for ultra-low-latency use cases is 1 ms in an end-to-end scenario, while the computational complexity is expected to be one to two orders of magnitude higher than that of 4G; the challenge is to meet these targets in terms of processing complexity. To guarantee a given processing time in mainstream computer clusters, it is crucial to characterize processing-time variability together with the features of the memory model. Moreover, task scheduling on multicore systems is still an open issue; the problem must be analyzed in order to fully utilize the processing capacity and to achieve low processing latency. To tackle the inefficient utilization of CPU cores, this thesis proposes a queueing-based, data-driven task scheduling scheme that focuses on local parallel computing. It introduces multi-queue management for dynamic task scheduling, targeting 100% utilization of the local CPU cores given sufficient input tasks. Several simulations verify the proposed task scheduling scheme, and the reported results confirm its viability and efficiency. In addition, cache memory usage is one of the primary sources of execution-time variability, and inefficient cache management proves problematic in systems where the Worst-Case Execution Time (WCET) is of concern. An efficient approach must take task scheduling and cache management into account simultaneously. Optimal cache management requires considering the priorities associated with all tasks; knowledge of these priorities is essential for detecting and avoiding system bottlenecks and allows adequate resources to be allocated to critical tasks. The work starts with the introduction of a simple, scalable, and configurable test method called an Array of Counters, whose purpose is to characterize the processing-time variations of multicore architectures; the technique helps find system bottlenecks and thereby supports a more optimized and enhanced cache-management algorithm.
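
    The sketch below illustrates the flavor of multi-queue management described above: each core owns a local task queue and, when its queue runs dry, steals work from the fullest queue so that local cores stay busy. The queue layout, stealing rule, and synthetic task durations are assumptions for illustration, not the thesis's scheduler.

# Per-core task queues with work stealing -- a minimal sketch of multi-queue
# scheduling aimed at keeping local cores busy; not the thesis implementation.
from collections import deque
import itertools
import random

def run(n_cores, durations):
    queues = [deque() for _ in range(n_cores)]
    for i, d in enumerate(durations):             # round-robin initial placement
        queues[i % n_cores].append(d)
    busy = [0.0] * n_cores                        # accumulated busy time per core
    turn = itertools.cycle(range(n_cores))
    while any(queues):
        c = next(turn)
        if not queues[c]:                         # local queue empty: steal from the fullest
            donor = max(range(n_cores), key=lambda i: len(queues[i]))
            queues[c].append(queues[donor].pop())
        busy[c] += queues[c].popleft()            # "execute" one task for its duration
    return busy

random.seed(0)
durations = [random.uniform(0.1, 1.0) for _ in range(64)]
busy = run(4, durations)
print(["%.1f" % b for b in busy], "spread: %.2f" % (max(busy) - min(busy)))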

    P-SOCRATES: A Parallel Software Framework for Time-Critical Many-Core Systems

    Full text link
    The advent of next-generation many-core embedded platforms has the chance of intercepting a converging need for predictable high performance coming from both the High-Performance Computing (HPC) and Embedded Computing (EC) domains. On one side, new kinds of HPC applications are being required by markets needing huge amounts of information to be processed within a bounded amount of time. On the other side, EC systems are increasingly concerned with providing higher performance in real time, challenging the performance capabilities of current architectures. This converging demand, however, raises the problem of how to guarantee timing requirements in the presence of parallel execution. This paper presents the approach of project P-SOCRATES for the design of an integrated framework for the execution of workload-intensive applications with real-time requirements on top of next-generation commercial-off-the-shelf (COTS) platforms based on many-core accelerated architectures. The time-criticality and parallelisation challenges are addressed by merging techniques from both the HPC and EC domains, identifying the main sources of indeterminism, and proposing efficient mapping and scheduling algorithms, along with the associated timing and schedulability analysis, to guarantee the real-time and performance requirements of the applications.
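
    As one small, concrete example of the kind of timing analysis such a framework relies on, the sketch below applies the classic Liu and Layland utilization bound as a sufficient (not necessary) schedulability test for fixed-priority, rate-monotonic tasks on a single core. The task set is hypothetical, and this is not P-SOCRATES code.

# Rate-monotonic schedulability check via the Liu-Layland utilization bound --
# an illustrative sketch with a hypothetical task set.
def rm_schedulable(tasks):
    """tasks: list of (execution_time, period). Sufficient, not necessary, test."""
    n = len(tasks)
    utilization = sum(c / t for c, t in tasks)    # total CPU demand of the task set
    bound = n * (2 ** (1.0 / n) - 1)              # Liu-Layland bound for n tasks
    return utilization, bound, utilization <= bound

u, b, ok = rm_schedulable([(1.0, 5.0), (2.0, 10.0), (3.0, 20.0)])
print("U = %.3f, bound = %.3f, schedulable by this test: %s" % (u, b, ok))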

    RNN-Based Radio Resource Management on Multicore RISC-V Accelerator Architectures

    Get PDF
    Radio resource management (RRM) is critical in 5G mobile communications due to its ubiquity on every radio device and its low-latency constraints. The rapidly evolving RRM algorithms with low-latency requirements, combined with dense and massive 5G base-station deployment, call for an on-the-edge RRM acceleration system with a tradeoff between flexibility, efficiency, and cost, making application-specific instruction-set processors (ASIPs) an optimal choice. In this work, we start from a baseline, simple RISC-V core and introduce instruction extensions coupled with software optimizations for maximizing the throughput of a selected set of recently proposed RRM algorithms based on models using multilayer perceptrons (MLPs) and recurrent neural networks (RNNs). Furthermore, we scale from a single-ASIP to a multi-ASIP acceleration system to further improve RRM throughput. For the single-ASIP system, we demonstrate an energy efficiency of 218 GMAC/s/W and a throughput of 566 MMAC/s, corresponding to improvements of 10x and 10.6x, respectively, over the single-core system with a baseline RV32IMC core. For the multi-ASIP system, we analyze the dependency of the parallel speedup on the input and output feature map (FM) size for fully connected and LSTM layers, achieving up to a 10.2x speedup with 16 cores over a single extended RI5CY core for single LSTM layers and a 13.8x speedup for a single fully connected layer. On the full RRM benchmark suite, we achieve average overall speedups of 16.4x, 25.2x, 31.9x, and 38.8x on two, four, eight, and 16 cores, respectively, compared to our single-core RV32IMC baseline implementation.
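
    To show why the parallel speedup depends on the feature-map size, the sketch below counts the multiply-accumulate (MAC) operations of a fully connected layer and of a single LSTM step, and applies a simple Amdahl-style scaling estimate. The layer sizes, per-core MAC rate, and parallel fraction are hypothetical and are not the figures reported in the paper.

# MAC counts for MLP/LSTM layers and a naive multicore throughput estimate --
# a back-of-the-envelope sketch; all numbers below are hypothetical.
def fc_macs(n_in, n_out):
    return n_in * n_out                           # one MAC per weight

def lstm_macs(n_in, n_hidden):
    # Four gates, each with an input and a recurrent matrix-vector product.
    return 4 * (n_in * n_hidden + n_hidden * n_hidden)

def throughput(macs, n_cores, core_macs_per_s, parallel_fraction=0.95):
    """Amdahl-style estimate: only the parallel fraction of the work scales with cores."""
    time = ((1 - parallel_fraction) + parallel_fraction / n_cores) * macs / core_macs_per_s
    return macs / time

layer = lstm_macs(n_in=64, n_hidden=64)
for cores in (1, 2, 4, 8, 16):
    print("%2d cores: %5.0f MMAC/s" % (cores, throughput(layer, cores, 50e6) / 1e6))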