266 research outputs found

    Parallel and Distributed Computing

    Get PDF
    The 14 chapters presented in this book cover a wide variety of representative works ranging from hardware design to application development. Particularly, the topics that are addressed are programmable and reconfigurable devices and systems, dependability of GPUs (General Purpose Units), network topologies, cache coherence protocols, resource allocation, scheduling algorithms, peertopeer networks, largescale network simulation, and parallel routines and algorithms. In this way, the articles included in this book constitute an excellent reference for engineers and researchers who have particular interests in each of these topics in parallel and distributed computing

    Design synthesis for dynamically reconfigurable logic systems

    Get PDF
    Dynamic reconfiguration of logic circuits has been a research problem for over four decades. While applications using logic reconfiguration in practical scenarios have been demonstrated, the design of these systems has proved to be a difficult process demanding the skills of an experienced reconfigurable logic design expert. This thesis proposes an automatic synthesis method which relieves designers of some of the difficulties associated with designing partially dynamically reconfigurable systems. A new design abstraction model for reconfigurable systems is proposed in order to support design exploration using the presented method. Given an input behavioural model, a technology server and a set of design constraints, the method will generate a reconfigurable design solution in the form of a 3D floorplan and a configuration schedule. The approach makes use of genetic algorithms. It facilitates global optimisation to accommodate multiple design objectives common in reconfigurable system design, while making realistic estimates of configuration overheads and of the potential for resource sharing between configurations. A set of custom evolutionary operators has been developed to cope with a multiple-objective search space. Furthermore, the application of a simulation technique verifying the lll results of such an automatic exploration is outlined in the thesis. The qualities of the proposed method are evaluated using a set of benchmark designs taking data from a real reconfigurable logic technology. Finally, some extensions to the proposed method and possible research directions are discussed

    Characterisation of a reconfigurable free space optical interconnect system for parallel computing applications and experimental validation using rapid prototyping technology

    Get PDF
    Free-space optical interconnects (FSOIs) are widely seen as a potential solution to present and future bandwidth bottlenecks for parallel processing applications. This thesis will be focused on the study of a particular FSOI system called Optical Highway (OH). The OH is a polarised beam routing system which uses Polarising Beam Splitters and Liquid Crystals (PBS/LC) assemblies to perform reconfigurable interconnection networks. The properties of the OH make it suitable for implementing different passive static networks. A technology known as Rapid Prototyping (RP) will be employed for the first time in order to create optomechanical structures at low cost and low production times. Off-theshelf optical components will also be characterised in order to implement the OH. Additionally, properties such as reconfigurability, scalability, tolerance to misalignment and polarisation losses will be analysed. The OH will be modelled at three levels: node, optical stage and architecture. Different designs will be proposed and a particular architecture, Optimised Cut-Through Ring (OCTR), will be experimentally implemented. Finally, based on this architecture, a new set of properties will be defined in order to optimise the efficiency of the optical channels

    Principles of Neuromorphic Photonics

    Full text link
    In an age overrun with information, the ability to process reams of data has become crucial. The demand for data will continue to grow as smart gadgets multiply and become increasingly integrated into our daily lives. Next-generation industries in artificial intelligence services and high-performance computing are so far supported by microelectronic platforms. These data-intensive enterprises rely on continual improvements in hardware. Their prospects are running up against a stark reality: conventional one-size-fits-all solutions offered by digital electronics can no longer satisfy this need, as Moore's law (exponential hardware scaling), interconnection density, and the von Neumann architecture reach their limits. With its superior speed and reconfigurability, analog photonics can provide some relief to these problems; however, complex applications of analog photonics have remained largely unexplored due to the absence of a robust photonic integration industry. Recently, the landscape for commercially-manufacturable photonic chips has been changing rapidly and now promises to achieve economies of scale previously enjoyed solely by microelectronics. The scientific community has set out to build bridges between the domains of photonic device physics and neural networks, giving rise to the field of \emph{neuromorphic photonics}. This article reviews the recent progress in integrated neuromorphic photonics. We provide an overview of neuromorphic computing, discuss the associated technology (microelectronic and photonic) platforms and compare their metric performance. We discuss photonic neural network approaches and challenges for integrated neuromorphic photonic processors while providing an in-depth description of photonic neurons and a candidate interconnection architecture. We conclude with a future outlook of neuro-inspired photonic processing.Comment: 28 pages, 19 figure

    Cross-Layer Design of Highly Scalable and Energy-Efficient AI Accelerator Systems Using Photonic Integrated Circuits

    Get PDF
    Artificial Intelligence (AI) has experienced remarkable success in recent years, solving complex computational problems across various domains, including computer vision, natural language processing, and pattern recognition. Much of this success can be attributed to the advancements in deep learning algorithms and models, particularly Artificial Neural Networks (ANNs). In recent times, deep ANNs have achieved unprecedented levels of accuracy, surpassing human capabilities in some cases. However, these deep ANN models come at a significant computational cost, with billions to trillions of parameters. Recent trends indicate that the number of parameters per ANN model will continue to grow exponentially in the foreseeable future. To meet the escalating computational demands of ANN models, the hardware accelerators used for processing ANNs must offer lower latency and higher energy efficiency. Unfortunately, traditional electronic implementations of ANN hardware accelerators, including CPUs, Graphics Processing Units (GPUs), Application-Specific Integrated Circuits (ASICs), and Field Programmable Gate Arrays (FPGAs), have fallen short of meeting the latency and energy efficiency requirements for processing deep ANN models. Furthermore, the interconnection network subsystems in these electronic accelerator systems, designed to facilitate large-scale data transfers between processing cores and memory/control units within the accelerator systems, have become bottlenecks that hinder the throughput, latency, and energy efficiency of deep ANN model processing. Fortunately, Photonic Integrated Circuits (PICs)-based accelerator systems, featuring photonic network subsystems are promising alternatives to conventional electronic accelerators. PIC-based accelerator systems operate in the optical domain, delivering processing at the speed of light with ultra-low latency, minimal dynamic energy consumption, and high throughput. These advantages stem from the wavelength division multiplexing capabilities and the absence of distance-dependent impedance in PICs. Furthermore, these characteristics enable the implementation of high-performance photonic network subsystems within PIC-based accelerator systems. Additionally, PIC-based accelerator systems offer inherent optical nonlinearities. Despite these numerous advantages over electronic accelerators, PIC-based systems still encounter several challenges due to limited optical power budget, susceptibility to crosstalk and other sources of noise caused by the analog operation, high area consumption, and restricted functional flexibility of PICs. These challenges manifest in various ways. (i) The existence of a significant trade-off between the achievable processing core size and the supported bit precision that impedes the scalability of processing cores. (ii) The limited reconfigurability, in terms of supported computing size and precision, makes them less adaptable to modern ANN models with diverse computational and precision demands. (iii) The reliance on electronic adder networks for accumulation diminishes the latency and energy consumption benefits of PIC-based accelerator systems due to frequent analog-to-digital conversions and memory accesses involved in accumulations. My research has contributed several solutions that overcome a multitude of these challenges and improve the throughput, energy efficiency, and flexibility of PIC-based AI accelerator systems. I identified and analyzed factors that affect the scalability and reconfigurability of PIC-based AI accelerator systems. I proposed several novel PIC-based accelerator architectures with enhancements at the circuit level, architecture level, and system level to improve scalability, reconfigurability, and functional flexibility. At the circuit level, these enhancements serve to decrease optical signal losses, reduce control complexity, enable adaptability for various ANN processing tasks, and lower power and area consumption. The architecture-level improvements mitigate crosstalk noise, facilitate functional reconfigurability, enable in-situ and flexible spatio-temporal accumulation, and provide flexible support for different dataflows. The system-level enhancements involve the integration of stochastic computing with PIC-based accelerators to break the inherent trade-off between scalability and supported bit precision. Additionally, applying stochastic computing enhances the flexibility of PIC-based accelerators, allowing them to support mixed-precision ANN models. These cross-layer enhancements collectively contribute to the design of PIC-based AI accelerator systems, resulting in improved throughput, energy efficiency, scalability, and reconfigurability

    Photonic logic-gates: boosting all-optical header processing in future packet-switched networks

    Full text link
    Las redes ópticas de paquetes se han convertido en los últimos años en uno de los temas de vanguardia en el campo de las tecnologías de comunicaciones. El procesado de cabeceras es una de las funciones más importantes que se llevan a cabo en nodos intermedios, donde un paquete debe ser encaminado a su destino correspondiente. El uso de tecnología completamente óptica para las funciones de encaminamiento y reconocimiento de cabeceras reduce el retardo de procesado respecto al procesado eléctrico, disminuyendo de ese modo la latencia en el enlace de comunicaciones. Existen diferentes métodos de procesado de datos para implementar el reconocimiento de cabeceras. El objetivo de este trabajo es la propuesta de una nueva arquitectura para el procesado de cabeceras basado en el uso de puertas lógicas completamente ópticas. Estas arquitecturas tienen como elemento clave el interferómetro Mach-Zehnder basado en el amplificador óptico de semiconductor (SOA-MZI), y utilizan el efecto no lineal de modulación cruzada de fase (XPM) en los SOAs para realizar dicha funcionalidad. La estructura SOA-MZI con XPM es una de las alternativas más atractivas debido a las numerosas ventajas que presenta, como por ejemplo los requisitos de baja energía para las señales de entrada, su diseño compacto, una elevada relación de extinción (ER), regeneración de la señal y el bajo nivel de chirp que introducen. Este trabajo se ha centrado en la implementación de la funcionalidad lógica XOR. Mediante esta función se pueden realizar diversas funcionalidades en las redes ópticas. Se proponen dos esquemas para el reconocimiento de cabeceras basados en el uso de la puerta XOR. El primer esquema utiliza puertas en cascada. El segundo esquema presenta una arquitectura muy escalable, y se basa en el uso de un bucle de realimentación implementado a la salida de la puerta. Asimismo, también se presentan algunas aplicaciones del procesado de cabeceras para el encaminamiento de paquetes basadas en el uso dMartínez Canet, JM. (2006). Photonic logic-gates: boosting all-optical header processing in future packet-switched networks [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/1874Palanci

    Advances in Solid State Circuit Technologies

    Get PDF
    This book brings together contributions from experts in the fields to describe the current status of important topics in solid-state circuit technologies. It consists of 20 chapters which are grouped under the following categories: general information, circuits and devices, materials, and characterization techniques. These chapters have been written by renowned experts in the respective fields making this book valuable to the integrated circuits and materials science communities. It is intended for a diverse readership including electrical engineers and material scientists in the industry and academic institutions. Readers will be able to familiarize themselves with the latest technologies in the various fields

    Brain fame:From FPGA to heterogeneous acceleration of brain simulations

    Get PDF
    Among the various methods in neuroscience for understanding brain function, in-silico simulations have been gaining popularity. Advances in neuroscience and engineering led to the creation of mathematical models of networks that do not simply mimic biological behaviour in an abstract fashion but emulate its in significant detail, even to the level of its biophysical properties. Such an example is the Spiking Neural Network (SNN) that can model a variety of additional behavioural features, like encoding data and adapting according to a spike train`s amplitude, frequency and general precise pattern of arrival of spiking events on a neuron. As a result, SNNs have higher explanatory power than their predecessors, thus brain simulations based on SNNs become an attractive topic to explore. In-silico simulations of SNNs can have beneficial results not only for neuroscience research but breakthroughs can also potentially benefit medical, computing and A.I. research. SNNs, though, computationally depending workloads that traditional computing might not be able to cover. Thus, the use of High Performance Computing (HPC) platforms in this application domain becomes desirable. This dissertation explores the topic of HPC-based in-silico brain simulations. Initially, the effort focuses on custom hardware accelerators, due to their potential in providing real-time performance alongside support for large-scale non-real-time experiments and specifically Field Programmable Gate Arrays (FPGAs). The nature of FPGA-based accelerators provides specific benefits against other similar paradigms like Application Specific Integrated Circuit (ASIC) designs.Firstly, we explore the general characteristics of typical SNNs model types to identify their computational requirements in relation to their explanatory strength. We also identify major design characteristics in model development that can directly affect its performance and behaviour when ported to an HPC platform. Subsequently, a detailed literature review is made on FPGA-based SNN implementations. The HPC porting effort begins with the implementation of an extended-Hodgkin-Huxley model of the Inferior-olivary nucleus featuring advanced connectivity. The model is quite demanding and complex enough to act as a realistic benchmark for HPC implementations, while also being scientifically relevant in its own right. FPGA development shows promising performance results not only when doing custom designs but also using High-level synthesis (HLS) toolflows that significantly reduce development time. FPGAs have proven suitable for small-scale embedded-HPC uses as well. The various efforts, though, reveal a very specific weakness of FPGA development that has less to do with the silicon itself and more with its programming environment. The FPGA tools are very inaccessible to non-experts, thus any acceleration effort would require the engineer (and the FPGA development time) to be in the critical path of the research process. An important question to be answered is how the FPGA platform would compare to other popular software-based HPC solutions such as GPU- and CPU-based platforms. A detailed comparison of the best FPGA implementation with GPU and manycore-CPU ports of the same benchmark is conducted. The comparison and evaluation shows that, when it comes to real-time performance, FPGAs have a clear advantage. But for non-real-time, large scale simulations, there is no single platform that can optimally support the complete range of experiments that could be conducted with the inferior olive model. The comparison makes a clear case for BrainFrame, a platform that supports heterogeneous HPC substrates. This dissertation, thus, concludes with the proposal of the BrainFrame system. The proof-of-concept design supports standard and extended Hodgkin-Huxley models, , such as the original inferior-olive model. The system integrates a GPU-, CPU- and FPGA-based HPC back-end while also using a standard neuroscientific language front-end (PyNN) that can score best-in-class performance, alleviate some of the development hurdles and make it far more user-friendly for the typical model developer. Additionally, the multi-node potential of the platform is being explored. BrainFrame provides both a powerful heterogeneous platform for acceleration and also a front-end familiar to the neuroscientist
    corecore