    Exploration of the scalability of SIMD processing for software defined radio

    Software defined radio (SDR) describes a signal processing system for wireless communications in which major parts of the physical-layer processing are performed in software. SDR systems are more flexible and have lower development costs than traditional systems based on application-specific integrated circuits (ASICs). Yet SDR requires programmable processor architectures that can meet the throughput and energy efficiency requirements of current third-generation (3G) and future fourth-generation (4G) wireless standards for mobile devices. Single instruction, multiple data (SIMD) processors operate on long data vectors in parallel data lanes and can achieve a good ratio of computing power to energy consumption. Hence, SIMD processors could be the basis of future SDR systems. However, SIMD processors achieve high efficiency only if all parallel data lanes can be utilized. This thesis investigates the scalability of SIMD processing for algorithms required in 4G wireless systems, i.e., how performance and energy consumption scale with increasing SIMD vector length. The basis of the exploration is a scalable SIMD processor architecture that also supports long instruction word (LIW) execution and can be configured with four different permutation networks for vector element permutations. Radix-2 and mixed-radix fast Fourier transform (FFT) algorithms, sphere decoding for multiple-input, multiple-output (MIMO) systems, and the decoding of quasi-cyclic low-density parity-check (LDPC) codes have been examined, as these are key algorithms for 4G wireless systems. The results show that the performance of all algorithms scales with the SIMD vector length, but with different constraints on the ratios between algorithm and architecture parameters.
The radix-2 FFT algorithm achieves close-to-linear speedups if the FFT size is at least twice the SIMD vector length, whereas the mixed-radix FFT algorithm requires the FFT size to be a multiple of the squared SIMD width. The performance of the implemented sphere decoding algorithm scales linearly with the SIMD vector length. The scalability of LDPC decoding is determined by the expansion factor of the quasi-cyclic code. For all considered algorithms, wider SIMD processors offer better performance and also require less energy than processors with a shorter vector length. The results for different permutation networks show that a simple permutation network is sufficient for most applications.
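The radix-2 condition above can be illustrated with a simple counting argument: every radix-2 stage of an N-point FFT consists of N/2 independent butterflies. The sketch below assumes one butterfly per SIMD lane per vector instruction and ignores permutation and memory costs (which the thesis does model); the function name `radix2_lane_utilization` is hypothetical. Under this simplified model, with power-of-two sizes and widths, lane utilization reaches 1.0 once N/2 ≥ W, i.e. N ≥ 2W.

```python
import math


def radix2_lane_utilization(fft_size: int, simd_width: int) -> float:
    """Fraction of SIMD lanes doing useful work in one radix-2 stage.

    Simplified model (an assumption, not the thesis's cost model): each stage
    has fft_size // 2 independent butterflies, and each vector instruction
    processes one butterfly per lane.
    """
    if fft_size < 2 or fft_size & (fft_size - 1):
        raise ValueError("fft_size must be a power of two >= 2")
    if simd_width < 1:
        raise ValueError("simd_width must be positive")
    butterflies = fft_size // 2
    vector_ops = math.ceil(butterflies / simd_width)  # instructions issued per stage
    return butterflies / (vector_ops * simd_width)


# Lane utilization for a 16-lane SIMD unit and growing FFT sizes.
for n in (8, 16, 32, 64):
    print(n, radix2_lane_utilization(n, simd_width=16))
# prints: 8 0.25 / 16 0.5 / 32 1.0 / 64 1.0 (one pair per line)
```

With W = 16, full utilization starts at N = 32 = 2W, matching the "at least twice the SIMD vector length" condition; smaller transforms leave lanes idle in every stage.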

    Optimization of many-objective problems from an energy-efficiency perspective applied to 5G heterogeneous cellular networks using a small-cell switching framework

    This Ph.D. dissertation studies a Many-Objective Optimization Problem (MaOP) formulated to reduce inter-cell interference and power consumption in realistic Centralized, Collaborative, Cloud, and Clean Radio Access Networks (C-RANs). It uses the Cell Switch-Off (CSO) scheme to switch Remote Radio Units (RRUs) off and on, and the Coordinated Scheduling (CS) technique to allocate resource blocks. The proposed MaOP, composed of four objective functions, four constraints, and two decision variables, is solved with the EF1-NSGA-III algorithm, a variation of NSGA-III extended in this thesis that uses the first Pareto front to find the extreme points in the normalization procedure. The problem is also redefined with three objective functions to compare the performance of the NSGA-II and EF1-NSGA-III algorithms. The OpenAirInterface (OAI) platform is used to evaluate and validate the performance of an indoor coverage system, because most user equipment of next-generation cellular networks will be in indoor environments. OAI is the fastest-growing open-source 5G platform; it implements 3GPP technology on general-purpose computers with fast Ethernet transport ports and Commercial-Off-The-Shelf (COTS) software-defined radio hardware. The dissertation makes five contributions. The first is a survey of testbeds, emulators, and simulators for 4G/5G cellular networks. The second extends KanGAL's NSGA-II code to implement EF1-NSGA-III, adaptive EF1-NSGA-III (A-EF1-NSGA-III), and efficient adaptive EF1-NSGA-III (A²-EF1-NSGA-III). These implementations support up to 10 objective functions, real, integer, and binary decision variables, and multiple constraints, and they outperform other works in terms of the Inverted Generational Distance (IGD) metric.
The third contribution is the implementation of real-time emulation methodologies for C-RANs using a frequency-domain representation in OAI. Compared with the time-domain approach, it improves the average computation time 10-fold without using Radio Frequency (RF) hardware, thereby avoiding that hardware's uncertainties. The fourth is the implementation of the Coordinated Scheduling (CS) technique as a proof of concept, which validates the advantages of the frequency-domain methodologies and allocates resource blocks dynamically among RRUs. Finally, a many-objective optimization problem is defined and solved with evolutionary algorithms in which diversity is managed through crowding distance and reference points, in order to reduce the power consumption of C-RANs. The solutions obtained are used to control the scheduling task at the Radio Cloud Center (RCC) and to switch RRUs on and off.
Personal project: Next-generation cellular networks. Doctorate.
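The IGD metric used above to compare the algorithms can be computed as follows. This is a minimal sketch of the common arithmetic-mean form of IGD (some papers use a p-norm variant instead); it assumes both fronts are given as sequences of objective vectors, and the function name `igd` is illustrative, not taken from KanGAL's code. Lower values mean the obtained set lies closer to, and covers more of, the reference Pareto front.

```python
import math
from typing import Sequence

Point = Sequence[float]


def igd(reference_front: Sequence[Point], obtained: Sequence[Point]) -> float:
    """Inverted Generational Distance (arithmetic-mean form).

    For every point on the reference (true) Pareto front, take the Euclidean
    distance to the nearest obtained solution, then average these distances.
    """
    if not reference_front or not obtained:
        raise ValueError("both point sets must be non-empty")
    total = 0.0
    for r in reference_front:
        total += min(math.dist(r, s) for s in obtained)
    return total / len(reference_front)


# Two-objective example: the obtained set misses the middle of the front.
reference = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
found = [(0.0, 1.0), (1.0, 0.0)]
print(round(igd(reference, found), 4))  # prints 0.2357
```

Because IGD averages over the reference front, a solution set that converges well but leaves gaps (as in the example) is still penalized, which is why it is a common choice for comparing many-objective algorithms.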

    State of the art baseband DSP platforms for Software Defined Radio: A survey

    Software Defined Radio (SDR) is an innovative approach that is becoming an increasingly promising technology for future mobile handsets. Universities and industry have introduced several embedded-system proposals to support SDR applications. This article presents an overview of current platforms and analyzes the related architectural choices, the current issues in SDR, and potential future trends.
Peer reviewed

    Hardware-Aware Algorithm Designs for Efficient Parallel and Distributed Processing

    The introduction and widespread adoption of the Internet of Things, together with emerging industrial applications, bring new requirements for data processing. Specifically, the need for timely processing of data that arrives at high rates challenges the traditional cloud computing paradigm, in which data collected at various sources is sent to the cloud for processing. To address this challenge, processing algorithms and infrastructure are distributed from the cloud to multiple tiers of computing, closer to the data sources. This creates a wide range of devices on which algorithms must be deployed and to which software designs must adapt.
    In this thesis, we investigate how hardware-aware algorithm designs on a variety of platforms lead to implementations that efficiently utilize the underlying resources. We design, implement, and evaluate new techniques for representative applications spanning the whole spectrum of devices, from resource-constrained sensors in the field to highly parallel servers. At each tier of processing capability, we identify key architectural features that are relevant to applications and propose designs that exploit these features to achieve high-rate, timely, and energy-efficient processing.
    In the first part of the thesis, we focus on high-end servers and use two main approaches to achieve high-throughput processing: vectorization and thread parallelism. We apply vectorization to pattern matching algorithms used in security applications. We show that rethinking algorithm design to better utilize the resources of the target platform, such as vector processing units, can bring significant speedups in processing throughput. We then show how thread-aware data distribution and proper inter-thread synchronization enable scalability, especially for high-rate network traffic monitoring. We design a parallelization scheme for sketch-based algorithms that summarize traffic information, which allows them to handle incoming data at high rates and to answer queries on that data efficiently, without overheads.
    In the second part of the thesis, we target the intermediate tier of computing devices and focus on typical examples of the hardware found there. We show how single-board computers with embedded accelerators can handle the computationally heavy parts of applications, and demonstrate this specifically for pattern matching in security-related processing. We further identify key hardware features that affect the performance of pattern matching algorithms on such devices, present a co-evaluation framework to compare algorithms, and design a new algorithm that efficiently utilizes these hardware features.
    In the last part of the thesis, we shift the focus to the low-power, resource-constrained tier of processing devices. We target wireless sensor networks and study distributed data processing algorithms in which processing happens on the same devices that generate the data. Specifically, we focus on a continuous monitoring algorithm (geometric monitoring) that aims to minimize communication between nodes. By deploying this algorithm in realistic environments, we demonstrate that the interplay between the network protocol and the application plays an important role at this tier of devices. Based on that observation, we co-design a continuous monitoring application with a modern network stack and further augment it with an in-network aggregation technique. In this way, we show that awareness of the underlying network stack is important for realizing the full potential of the continuous monitoring algorithm.
    The techniques and solutions presented in this thesis contribute to better utilization of hardware characteristics across a wide spectrum of platforms. We apply these techniques to problems that are representative of current and upcoming applications, and provide an outlook on emerging possibilities that can build on the results of the thesis.
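The abstract does not say which sketch is parallelized or how, so the following is only a minimal sketch under stated assumptions: it uses a Count-Min sketch, one common sketch for per-flow traffic counts, and a thread-local-then-merge pattern, one way to keep inter-thread synchronization off the update path. The class and function names (`CountMinSketch`, `count_in_parallel`) are hypothetical. Count-Min sketches with identical dimensions and hash functions merge by element-wise addition, so the merged result equals what a single thread would have built. In CPython the global interpreter lock prevents real speedup here; the code illustrates the data distribution, not performance.

```python
import hashlib
import threading
from typing import List


class CountMinSketch:
    """depth rows of width counters; estimates never undercount the true value."""

    def __init__(self, width: int = 256, depth: int = 4) -> None:
        self.width, self.depth = width, depth
        self.rows: List[List[int]] = [[0] * width for _ in range(depth)]

    def _index(self, row: int, key: str) -> int:
        # A separate hash per row, derived by prefixing the row number (deterministic).
        digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, key: str, count: int = 1) -> None:
        for r in range(self.depth):
            self.rows[r][self._index(r, key)] += count

    def estimate(self, key: str) -> int:
        # The minimum over rows is the least collision-inflated counter.
        return min(self.rows[r][self._index(r, key)] for r in range(self.depth))

    def merge(self, other: "CountMinSketch") -> None:
        # Same shape and same hash functions: merging is element-wise addition.
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketch shapes differ")
        for mine, theirs in zip(self.rows, other.rows):
            for i, v in enumerate(theirs):
                mine[i] += v


def count_in_parallel(flows: List[str], n_threads: int = 4) -> CountMinSketch:
    """Each thread summarizes its own shard into a private sketch (no locks on
    the update path); private sketches are merged once before queries."""
    shards = [flows[i::n_threads] for i in range(n_threads)]
    local = [CountMinSketch() for _ in range(n_threads)]

    def worker(sketch: CountMinSketch, shard: List[str]) -> None:
        for key in shard:
            sketch.add(key)

    threads = [threading.Thread(target=worker, args=(s, sh)) for s, sh in zip(local, shards)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    merged = CountMinSketch()
    for s in local:
        merged.merge(s)
    return merged


traffic = ["10.0.0.1"] * 1000 + ["10.0.0.2"] * 10
sketch = count_in_parallel(traffic)
print(sketch.estimate("10.0.0.1"))  # at least 1000, since Count-Min never undercounts
```

Sharding by position keeps the example simple; a real traffic monitor would more likely shard by flow key so that each thread owns a disjoint set of flows.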

    Design and Implementation of Software Defined Radios on a Homogeneous Multi-Processor Architecture

    In the wireless communications domain, multi-mode and multi-standard platforms are increasingly becoming the central focus of system architects. Mobile terminal users demand ever more mobility and throughput, pushing towards fully integrated radio systems able to support different communication protocols running concurrently on the same platform. A new concept of radio system has been introduced to meet these expectations: flexible radio platforms have become an indispensable requirement, today and in the future. This thesis deals with issues related to the design of flexible radio platforms, in which flexibility is achieved through the concept of software defined radio (SDR). The research focuses on homogeneous multi-processor (MP) architectures as a feasible way to implement SDR platforms efficiently. Platforms based on MP architectures can deliver high performance together with a high degree of flexibility. Moreover, homogeneous MP platforms reduce design and verification costs and provide high scalability in terms of both software and hardware. However, homogeneous MP architectures offer lower computational efficiency than heterogeneous solutions. The thesis is divided into two parts: the first concerns the implementation of a reference platform, while the second presents the design and implementation of flexible, high-performance, power- and energy-efficient algorithms for wireless communications. The proposed reference platform, Ninesilica, is a homogeneous MP architecture composed of a 3x3 mesh of processing nodes (PNs) interconnected by a hierarchical Network-on-Chip (NoC). Each PN hosts a processor core as its Processing Element (PE). To improve the computational efficiency of the platform, several power- and energy-saving techniques were investigated.
In the design, implementation, and mapping of the algorithms, the following constraints were considered: energy and power efficiency, high scalability of the platform, portability of the solutions across similar platforms, and parallelization efficiency. Ninesilica, together with the proposed algorithm implementations, showed that homogeneous MP architectures are highly scalable platforms in terms of both hardware and software. Furthermore, Ninesilica demonstrated that homogeneous MPs can achieve high parallelization efficiency as well as high energy and power savings, meeting the requirements of SDRs and enabling cognitive radios. Ninesilica can be used as a stand-alone block or as an elementary building block for clustered many-core architectures. Moreover, the results obtained in terms of parallelization efficiency and power and energy efficiency are independent of the type of PE used, ensuring their portability to similar architectures based on a different type of processing element.
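Parallelization efficiency, one of the constraints listed above, is commonly defined as speedup divided by the number of processing elements, E = T1 / (p · Tp), where T1 is the execution time on one PE and Tp the time on p PEs. The helpers below are a minimal sketch of that standard definition; the function names are illustrative, and the timings in the usage lines are hypothetical, not Ninesilica measurements. Only the PE count (nine, from the 3x3 mesh) comes from the abstract.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """S = T1 / Tp."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("execution times must be positive")
    return t_serial / t_parallel


def parallel_efficiency(t_serial: float, t_parallel: float, n_pes: int) -> float:
    """E = S / p = T1 / (p * Tp); 1.0 means perfectly linear speedup on p PEs."""
    if n_pes < 1:
        raise ValueError("need at least one processing element")
    return speedup(t_serial, t_parallel) / n_pes


# Hypothetical timings for one kernel on a 3x3 mesh (9 PEs), in milliseconds.
t1, t9 = 90.0, 12.5
print(f"speedup {speedup(t1, t9):.1f}, efficiency {parallel_efficiency(t1, t9, 9):.2f}")
# prints: speedup 7.2, efficiency 0.80
```

An efficiency close to 1.0 on all nine nodes is what "high parallelization efficiency" means in practice; values well below 1.0 point to communication or load-imbalance overheads on the NoC.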

    Fast algorithm for real-time rings reconstruction

    The GAP project is dedicated to studying the application of GPUs in several contexts in which real-time response is important for decision making. The definition of real time depends on the application under study, ranging from response times of μs up to several hours for very computing-intensive tasks. At this conference, we presented our work on low-level triggers [1] [2] and high-level triggers [3] in high energy physics experiments, as well as specific applications for nuclear magnetic resonance (NMR) [4] [5] and cone-beam CT [6]. Apart from the study of dedicated solutions to decrease the latency due to data transport and preparation, the computing algorithms play an essential role in any GPU application. In this contribution, we present an original algorithm developed for trigger applications to accelerate ring reconstruction in RICH (Ring Imaging Cherenkov) detectors when seeds for the reconstruction cannot be obtained from external trackers.
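The abstract does not describe the new algorithm itself, so the following does not reproduce it. As a reference point for what trackless ring reconstruction has to solve, it sketches a standard single-ring fit, the Kåsa algebraic circle fit, which estimates a ring's center and radius from hit positions alone, without a seed from an external tracker. It assumes the hits have already been assigned to one ring; separating several overlapping rings in one event, which a trigger algorithm must also handle, is not addressed. The function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def _solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate (near-singular) hit pattern")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ring(hits: Sequence[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Kasa fit: least squares for x^2 + y^2 + D*x + E*y + F = 0.

    Returns (cx, cy, radius) with cx = -D/2, cy = -E/2, r^2 = cx^2 + cy^2 - F.
    """
    if len(hits) < 3:
        raise ValueError("need at least 3 hits")
    ata = [[0.0] * 3 for _ in range(3)]  # normal matrix A^T A
    atb = [0.0] * 3                      # right-hand side A^T b
    for x, y in hits:
        row = (x, y, 1.0)
        rhs = -(x * x + y * y)
        for i in range(3):
            atb[i] += row[i] * rhs
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    d, e, f = _solve3(ata, atb)
    cx, cy = -d / 2.0, -e / 2.0
    return cx, cy, math.sqrt(max(0.0, cx * cx + cy * cy - f))


# Twelve hits on a ring centered at (1, -2) with radius 5.
hits = [(1 + 5 * math.cos(2 * math.pi * k / 12), -2 + 5 * math.sin(2 * math.pi * k / 12))
        for k in range(12)]
cx, cy, r = fit_ring(hits)
print(f"{cx:.3f} {cy:.3f} {r:.3f}")  # prints 1.000 -2.000 5.000
```

The fit reduces to a fixed-size linear solve per ring, which is why algebraic fits of this kind are attractive for GPU triggers; the Kåsa fit is known to be biased toward smaller radii when hits cover only a short arc.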