497 research outputs found

    Modelling, Dimensioning and Optimization of 5G Communication Networks, Resources and Services

    This reprint collects state-of-the-art research contributions that address challenges in the design, dimensioning and optimization of emerging 5G networks. The design, dimensioning and optimization of communication network resources and services have been an inseparable part of telecom network development. Such networks must convey a large volume of traffic and serve traffic streams with highly differentiated requirements in terms of bit rate, service time, quality of service and quality of experience. A communication infrastructure of this kind presents many important challenges, such as the study of the necessary multi-layer cooperation, new protocols, performance evaluation of different network parts, low-layer network design, network management and security issues, and new technologies in general, all of which are discussed in this book.

    Co-designing reliability and performance for datacenter memory

    Memory is one of the key components that affect the reliability and performance of datacenter servers. Memory in today’s servers is organized and shared in several ways to provide the most performant and efficient access to data. Examples include cache hierarchies in multi-core chips to reduce access latency, non-uniform memory access (NUMA) in multi-socket servers to improve scalability, and disaggregation to increase memory capacity. In all these organizations, hardware coherence protocols are used to maintain the consistency of the shared memory and implicitly move data to the requesting cores. This thesis aims to provide fault tolerance against newer models of failure in the organization of memory in datacenter servers. While designing for improved reliability, it explores solutions that can also enhance application performance. The solutions build on modern coherence protocols to achieve these properties. First, we observe that DRAM memory system failure rates have increased, demanding stronger forms of memory reliability. To combat this, the thesis proposes Dvé, a hardware-driven replication mechanism in which data blocks are replicated across two different memory controllers in a cache-coherent NUMA system. Data blocks are accompanied by a code with strong error detection capabilities so that when an error is detected, correction is performed using the replica. Dvé’s organization offers two independent points of access to data, which enables (a) strong error correction that can recover from a range of faults affecting any of the components in the memory and (b) higher performance by providing another, nearer point of memory access. Dvé’s coherent replication keeps the replicas in sync for reliability and also provides coherent access to read replicas during fault-free operation for improved performance. Dvé can flexibly provide these benefits on demand at runtime.
Next, we observe that the coherence protocol itself needs to be hardened against failures. Memory in datacenter servers is being disaggregated from the compute servers into dedicated memory servers, driven by standards like CXL. CXL specifies the coherence protocol semantics for compute servers to access and cache data from a shared region in the disaggregated memory. However, the CXL specification lacks the level of fault tolerance necessary to operate at an inter-server scale within the datacenter. Compute servers in the datacenter can fail or become unresponsive, so it is important that the coherence protocol remains available in the presence of such failures. The thesis proposes Āpta, a CXL-based, shared disaggregated memory system that keeps cached data consistent without compromising availability in the face of compute server failures. Āpta architects a high-performance, fault-tolerant, object-granular memory server that significantly improves performance for stateless function-as-a-service (FaaS) datacenter applications.
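The detect-then-correct-from-replica idea described for Dvé can be illustrated with a small software model. This is only a sketch under stated assumptions, not the thesis's hardware design: the class `ReplicatedMemory`, its methods, and the use of CRC32 as the detection code are all hypothetical stand-ins, since the abstract does not name the code it uses.

```python
import zlib


class ReplicatedMemory:
    """Toy model: every block is stored twice, each copy with a checksum."""

    def __init__(self):
        # Two dicts stand in for two memory controllers (an assumption).
        self.replicas = [{}, {}]

    def write(self, addr, data):
        # Keep both replicas in sync on every write.
        for store in self.replicas:
            store[addr] = (bytes(data), zlib.crc32(data))

    def _valid(self, idx, addr):
        data, checksum = self.replicas[idx][addr]
        return zlib.crc32(data) == checksum

    def read(self, addr, nearest=0):
        # Try the nearer replica first; fall back to the other on an error.
        other = 1 - nearest
        if self._valid(nearest, addr):
            return self.replicas[nearest][addr][0]
        if self._valid(other, addr):
            # Error detected: repair the bad copy from the good one.
            self.replicas[nearest][addr] = self.replicas[other][addr]
            return self.replicas[other][addr][0]
        raise IOError(f"block {addr:#x} is corrupted in both replicas")

    def inject_fault(self, idx, addr):
        # Flip one bit of the stored data without updating the checksum.
        # Assumes the stored block is non-empty.
        data, checksum = self.replicas[idx][addr]
        self.replicas[idx][addr] = (bytes([data[0] ^ 0x01]) + data[1:], checksum)
```

In this model a read served by the nearer replica costs one checksum verification, and a fault in either copy is repaired transparently as long as the other copy is intact.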

    AN EFFICIENT INTERFERENCE AVOIDANCE SCHEME FOR DEVICE-TO-DEVICE ENABLED FIFTH GENERATION NARROWBAND INTERNET OF THINGS NETWORKS

    Narrowband Internet of Things (NB-IoT) is a low-power wide-area (LPWA) technology built on long-term evolution (LTE) functionalities and standardized by the 3rd Generation Partnership Project (3GPP). Because it supports massive machine-type communication (mMTC) and a range of IoT use cases with rigorous requirements for connection, energy efficiency, reachability, reliability, and latency, NB-IoT has attracted the research community. However, as the capacity needs of various IoT use cases expand, the numerous functionalities of the LTE evolved packet core (EPC) system may become overburdened and suboptimal. Several research efforts are in progress to address these challenges. This thesis therefore gives an overview of these efforts, with a specific focus on the optimized architecture of the LTE EPC functionalities, the 5G architectural design for NB-IoT integration, the enabling technologies necessary for 5G NB-IoT, 5G new radio (NR) coexistence with NB-IoT, and feasible architectural deployment schemes of NB-IoT with cellular networks. The thesis also presents cloud-assisted relay with backscatter communication as part of a detailed study of the technical performance attributes and channel communication characteristics of the physical (PHY) and medium access control (MAC) layers of NB-IoT, with a focus on 5G. The numerous drawbacks that come with simulating these systems are explored. The enabling market for NB-IoT, the benefits for a few use cases, and the potential critical challenges associated with their deployment are all highlighted. Fortunately, the cyclic prefix orthogonal frequency division multiplexing (CP-OFDM) based waveform adopted by 3GPP NR for enhanced mobile broadband (eMBB) services does not prohibit the use of other waveforms in other services, such as the NB-IoT service for mMTC.
The coexistence of 5G NR and NB-IoT must therefore be manageably orthogonal (or quasi-orthogonal) to minimize the mutual interference that limits the degrees of freedom in the waveform's overall design. Coexistence with NB-IoT is believed to improve network capacity, expand the coverage of the user data rate, and make communication more robust through frequency reuse, but it also introduces a new interference challenge, distinct from that of the legacy network. Interference may make channel estimation difficult for NB-IoT devices, limiting user performance and spectral efficiency. Existing interference mitigation solutions either add to the network's overhead, computational complexity and delay, or are hampered by low data rate and coverage. These algorithms are unsuitable for an NB-IoT network owing to its low-complexity nature. A D2D communication based interference-control technique is therefore an effective strategy for addressing this problem. This thesis uses D2D communication to relieve the network bottleneck in dense, interference-prone 5G NB-IoT networks. For D2D-enabled 5G NB-IoT systems, the thesis presents an interference-avoidance resource allocation that takes the less favoured cell-edge NUEs into account. To reduce the algorithm's computational complexity and the interference power, the system divides the optimization problem into three sub-problems. First, in an orthogonal deployment technique using channel state information (CSI), the channel gain factor is leveraged to select a probable reuse channel with higher QoS control. Second, a bisection search is used to find the power control that maximizes the network sum rate. Third, the Hungarian algorithm is used to build a maximum bipartite matching that chooses the optimal pairing pattern between the set of NUEs and the D2D pairs.
According to the numerical results, the proposed approach improves the D2D sum rate and overall network SINR of the 5G NB-IoT system. The D2D pair's performance is affected by its maximum power constraint, its location, the Pico-base station (PBS) cell radius, the number of potential reuse channels, and the cluster distance. In simulation, the proposed scheme achieves SINR 28.35%, 31.33%, and 39% higher than the ARSAD, DCORA, and RRA algorithms, respectively, when the number of NUEs is twice the number of D2D pairs, and 2.52%, 14.80%, and 39.89% higher than ARSAD, RRA, and DCORA, respectively, when the numbers of NUEs and D2D pairs are equal. The D2D sum rate is 9.23%, 11.26%, and 13.92% higher than that of ARSAD, DCORA, and RRA, respectively, when the number of NUEs is twice the number of D2D pairs, and 1.18%, 4.64% and 15.93% higher than that of ARSAD, RRA and DCORA, respectively, with an equal number of NUEs and D2D pairs. The results demonstrate the efficacy of the proposed scheme. The thesis also addresses the case in which the cell-edge NUE's QoS is exposed to challenges such as long-distance transmission, delays, low bandwidth utilization, and high system overhead, all of which affect 5G NB-IoT network performance. In this case, most cell-edge NUEs boost their transmit power to maximize network throughput. Integrating a cooperative D2D relaying technique into 5G NB-IoT heterogeneous network (HetNet) uplink spectrum sharing increases the system's spectral efficiency but also its interference power, which further degrades the network. Using a max-max SINR (Max-SINR) approach, the thesis proposes an interference-aware D2D relaying strategy that improves 5G NB-IoT QoS for a cell-edge NUE in order to achieve optimum system performance. The Lagrangian-dual technique is used to optimize the transmit power from the cell-edge NUE to the relay subject to the average interference power constraint, while the link from the relay to the NB-IoT base station (NBS) uses a fixed transmit power.
To choose an optimal D2D relay node, the channel-to-interference plus noise ratio (CINR) of all available D2D relays is used to maximize the minimum cell-edge NUE's data rate while ensuring that the cellular NUEs' QoS requirements are satisfied. The other relay selection strategies studied include best harmonic mean, best-worst, half-duplex relay selection, and a D2D communication scheme. The simulation results reveal that, owing to the high channel gain between the two communicating devices, the Max-SINR selection scheme outperforms all other selection schemes except the D2D communication scheme. The proposed algorithm achieves 21.27% SINR performance, which is nearly identical to the half-duplex scheme, but outperforms the best-worst and harmonic selection techniques by 81.27% and 40.29%, respectively. As the number of D2D relays increases, the capacity increases by 14.10% and 47.19% over the harmonic and half-duplex techniques, respectively. Finally, the thesis presents future research on interference control, together with open research directions on PHY and MAC properties and the SWOT (Strengths, Weaknesses, Opportunities, and Threats) analysis presented in Chapter 2, to encourage further study on 5G NB-IoT.
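Two of the three sub-problems named in the NB-IoT abstract above, bisection search for power control and Hungarian matching of NUEs to D2D pairs, can be sketched in a few lines. The abstract does not give the objective functions or channel model, so everything here is an assumption made for illustration: the bisection solves a simpler monotone problem (the largest D2D power that keeps a cellular user's SINR above a threshold) rather than the thesis's sum-rate objective, and the names `max_power_bisection`, `hungarian_min_cost`, `best_pairing` and the `rate` matrix are hypothetical.

```python
import math


def max_power_bisection(sinr_of, sinr_min, p_max, tol=1e-9):
    """Largest power in [0, p_max] with sinr_of(power) >= sinr_min.

    Assumes sinr_of is monotonically decreasing in the D2D power.
    """
    if sinr_of(p_max) >= sinr_min:
        return p_max
    if sinr_of(0.0) < sinr_min:
        return 0.0  # infeasible even when the D2D pair is silent
    lo, hi = 0.0, p_max
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if sinr_of(mid) >= sinr_min:
            lo = mid
        else:
            hi = mid
    return lo


def hungarian_min_cost(cost):
    """Hungarian algorithm, O(n^2 * m); returns the column chosen per row.

    cost is an n x m matrix with n <= m; the total cost is minimized.
    """
    n, m = len(cost), len(cost[0])
    assert n <= m, "needs at least as many columns as rows"
    u = [0.0] * (n + 1)          # row potentials
    v = [0.0] * (m + 1)          # column potentials
    p = [0] * (m + 1)            # p[j] = row matched to column j (1-based)
    way = [0] * (m + 1)          # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [math.inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], math.inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0 != 0:           # flip the matching along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            match[p[j] - 1] = j - 1
    return match


def best_pairing(rate):
    """Pair each row (NUE) with a column (D2D pair) to maximize total rate."""
    match = hungarian_min_cost([[-r for r in row] for row in rate])
    return match, sum(rate[i][j] for i, j in enumerate(match))
```

For example, `best_pairing(rate)` takes a matrix in which `rate[i][j]` is the rate obtained when NUE `i` shares its channel with D2D pair `j`, and returns the chosen pair for each NUE along with the total rate. Negating the matrix turns the minimizing Hungarian routine into a maximizer.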

    Exploiting data locality in cache-coherent NUMA systems

    The end of Dennard scaling has caused a stagnation of the clock frequency in computers. To overcome this issue, in the last two decades vendors have been integrating larger numbers of processing elements in their systems by interconnecting many nodes, including multiple chips in each node, and increasing the number of cores in each chip. The speed of main memory has not evolved at the same rate as that of processors: it is much slower, and there is a need to provide more total bandwidth to the processors, especially as the number of cores and chips increases. While keeping a shared address space in which all processors can access the whole memory, solutions have come from integrating more memories: by using newer technologies like high-bandwidth memories (HBM) and non-volatile memories (NVM), by giving groups of cores (sockets, for example) faster access to some subset of the DRAM, or by combining several of these solutions. This has caused heterogeneity in the access speed to main memory, depending on the CPU requesting access to a memory address and the actual physical location of that address, which results in non-uniform memory access (NUMA) behaviours. Moreover, many of these systems are cache-coherent (ccNUMA), meaning that changes made to memory by one CPU must be visible to the other CPUs and transparent to the programmer. These NUMA behaviours reduce the performance of applications and can pose a challenge to programmers. To tackle this issue, this thesis proposes solutions, at the software and hardware levels, to improve data locality in NUMA systems and, therefore, the performance of applications on these systems. The first contribution shows how considering hardware prefetching simultaneously with thread and data placement in NUMA systems can find configurations with better performance than considering these aspects separately.
The performance results, combined with performance counters, are then used to build a performance model that predicts, both offline and online, the best configuration for new applications not in the model. The evaluation is done using two different high-performance NUMA systems, and the performance counters collected on one machine are used to predict the best configurations on the other machine. The second contribution builds on the idea that prefetching can have a strong effect in NUMA systems and proposes a NUMA-aware hardware prefetching scheme. This scheme is generic and can be applied to multiple hardware prefetchers at a low hardware cost while giving very good results. The evaluation is done using a cycle-accurate architectural simulator and provides detailed results on performance, data transfer reduction and energy costs. Finally, the third and last contribution consists of scheduling algorithms for task-based programming models. These programming models help improve the programmability of applications in parallel systems and also provide useful information to the underlying runtime system. This information is used to build a task dependency graph (TDG), a directed acyclic graph that models the application, in which the nodes are sequential pieces of code known as tasks and the edges are the data dependencies between tasks. The proposed scheduling algorithms use graph partitioning techniques and provide a schedule for the tasks in the TDG that minimises the data transfers between the different NUMA regions of the system. The results have been evaluated on real ccNUMA systems with multiple NUMA regions.
Postprint (published version)
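The task dependency graph (TDG) scheduling idea in the NUMA thesis above can be made concrete with a small sketch. The abstract says only that graph partitioning techniques are used, so the code below swaps in a much simpler greedy, affinity-based placement as a stand-in and should not be read as the thesis's algorithm. The function names, the edge weights (bytes transferred along a dependency), and the per-region `capacity` limit are all assumptions.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def cut_bytes(edges, placement):
    """Bytes that cross NUMA regions: weight of edges whose ends differ."""
    return sum(w for (u, v), w in edges.items() if placement[u] != placement[v])


def greedy_place(tasks, edges, n_regions, capacity):
    """Place tasks in dependency order, each near most of its input data.

    edges maps (producer, consumer) -> bytes. capacity caps the tasks per
    region; it must satisfy n_regions * capacity >= len(tasks).
    """
    preds = {t: {} for t in tasks}
    for (u, v), w in edges.items():
        preds[v][u] = w
    # TopologicalSorter expects node -> set of predecessors.
    order = TopologicalSorter({t: set(preds[t]) for t in tasks}).static_order()
    load = [0] * n_regions
    placement = {}
    for t in order:
        # Bytes of input data already resident in each region.
        affinity = [0] * n_regions
        for u, w in preds[t].items():
            affinity[placement[u]] += w
        candidates = [r for r in range(n_regions) if load[r] < capacity]
        # Prefer the region holding the most input; break ties by lower load.
        best = max(candidates, key=lambda r: (affinity[r], -load[r]))
        placement[t] = best
        load[best] += 1
    return placement
```

A real partitioner would optimize the cut globally instead of task by task. The greedy version is enough to show the objective: `cut_bytes` is the quantity a NUMA-aware scheduler tries to keep small while balancing load across regions.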

    5G: 2020 and Beyond

    The emergence of 5G will usher society into a new communication era. 5G will differ significantly from the previous communication generations (4G, 3G, ...), especially in terms of architecture and operation. This book discusses the various aspects of the architecture and operation, the possible challenges, and mechanisms to overcome them. Further, it supports users' interaction through communication devices relying on Human Bond Communication and COmmunication-NAvigation-SENsing-SErvices (CONASENSE). Topics broadly covered in this book are:
    • Wireless Innovative System for Dynamically Operating Mega Communications (WISDOM)
    • Millimeter Waves and Spectrum Management
    • Cyber Security
    • Device-to-Device Communication

    Performance Optimization of Many-core Systems by Exploiting Task Migration and Dark Core Allocation

    Task migration is an effective scheme often adopted for performance tuning in many-core processors: it gives "hot" tasks the opportunity to be migrated to a "cool" core that has a lower temperature. When a task needs to migrate from one processor core to another, the migration can follow one of numerous modes, defined by the migration paths taken and/or the destinations of the migration. Selecting the right migration mode for a task has always been difficult, and it becomes more challenging when dark cores can be called back to service (reactivated), which introduces additional task migration modes. Previous works have demonstrated that dark cores can be placed near the active cores to reduce power density so that the active cores can run at higher voltage/frequency levels for higher performance. However, the existing task migration schemes neither consider the impact of dark cores on each application's performance nor exploit the performance trade-off between different migration modes. Unlike the existing schemes, this paper proposes a runtime task migration algorithm that takes both migration modes and dark cores into consideration simultaneously. The algorithm has two major steps. In the first step, for a specific migration mode tied to an application whose tasks need to be migrated, the number of dark cores is determined so that the overall performance is maximized. The second step finds an appropriate core region and its location for each application to optimize communication latency and computation performance; during this step, the focus is on reducing the fragmentation of the free core regions that results from task migration. Experimental results confirm that our approach achieves over 50% reduction in total response time compared to recently proposed thermal-aware runtime task migration approaches.

    Toward a Bio-Inspired System Architecting Framework: Simulation of the Integration of Autonomous Bus Fleets & Alternative Fuel Infrastructures in Closed Sociotechnical Environments

    Cities are set to become highly interconnected and coordinated environments composed of emerging technologies meant to alleviate or resolve some of the daunting issues of the 21st century, such as rapid urbanization, resource scarcity, and excessive population demand in urban centers. These cybernetically-enabled built environments are expected to solve these complex problems through technologies that incorporate sensors and other means of data collection to fuse and understand the large volumes of data and information generated by other technologies and by the human population. Many of these technologies will be pivotal assets in supporting and managing capabilities in city sectors ranging from energy to healthcare. Among these sectors, transportation has received a significant amount of attention over the past decade because of the flood of new technological growth, with extensive research, development, and even implementation of emerging technologies such as autonomous vehicles (AVs), the Internet of Things (IoT), alternative fueling sources, clean propulsion technologies, cloud/edge computing, and many others. Within the current body of knowledge, it is fairly well known how many of these emerging technologies will perform in isolation as stand-alone entities, but little is known about their performance when integrated into a transportation system alongside other emerging technologies and humans within the system organization. This merging of new-age technologies and humans can make next-generation transportation systems extremely complex to analyze and understand.
Additionally, with new and alternative forms of technology expected in the near future, the set of technologies in use, especially in the smart city context, will be a continuously expanding array whose capabilities increase with technological advancement, which can change the performance of a given system architecture. Therefore, the objective of this research is to understand the system architecture implications of integrating different alternative fueling infrastructures with autonomous bus (AB) fleets in the transportation system within a closed sociotechnical environment. Understanding these implications could provide performance-based input into a more sophisticated approach or framework, which is proposed as future work of this research.

    3rd International Conference on Sustainable Futures: Environmental, Technological, Social and Economic Matters (ICSF 2022), 24-27 May 2022, Kryvyi Rih, Ukraine

    Proceedings of the 3rd International Conference on Sustainable Futures: Environmental, Technological, Social and Economic Matters (ICSF 2022), 24-27 May 2022, Kryvyi Rih, Ukraine.

    LIPIcs, Volume 244, ESA 2022, Complete Volume

    LIPIcs, Volume 244, ESA 2022, Complete Volume