3,990 research outputs found

    Empowering parallel computing with field programmable gate arrays

    Get PDF
    After more than 30 years, reconfigurable computing has grown from a concept to a mature field of science and technology. The cornerstone of this evolution is the field programmable gate array, a building block enabling the configuration of a custom hardware architecture. The departure from static von Neumannlike architectures opens the way to eliminate the instruction overhead and to optimize the execution speed and power consumption. FPGAs now live in a growing ecosystem of development tools, enabling software programmers to map algorithms directly onto hardware. Applications abound in many directions, including data centers, IoT, AI, image processing and space exploration. The increasing success of FPGAs is largely due to an improved toolchain with solid high-level synthesis support as well as a better integration with processor and memory systems. On the other hand, long compile times and complex design exploration remain areas for improvement. In this paper we address the evolution of FPGAs towards advanced multi-functional accelerators, discuss different programming models and their HLS language implementations, as well as high-performance tuning of FPGAs integrated into a heterogeneous platform. We pinpoint fallacies and pitfalls, and identify opportunities for language enhancements and architectural refinements

    Principles of Neuromorphic Photonics

    Full text link
    In an age overrun with information, the ability to process reams of data has become crucial. The demand for data will continue to grow as smart gadgets multiply and become increasingly integrated into our daily lives. Next-generation industries in artificial intelligence services and high-performance computing are so far supported by microelectronic platforms. These data-intensive enterprises rely on continual improvements in hardware. Their prospects are running up against a stark reality: conventional one-size-fits-all solutions offered by digital electronics can no longer satisfy this need, as Moore's law (exponential hardware scaling), interconnection density, and the von Neumann architecture reach their limits. With its superior speed and reconfigurability, analog photonics can provide some relief to these problems; however, complex applications of analog photonics have remained largely unexplored due to the absence of a robust photonic integration industry. Recently, the landscape for commercially-manufacturable photonic chips has been changing rapidly and now promises to achieve economies of scale previously enjoyed solely by microelectronics. The scientific community has set out to build bridges between the domains of photonic device physics and neural networks, giving rise to the field of \emph{neuromorphic photonics}. This article reviews the recent progress in integrated neuromorphic photonics. We provide an overview of neuromorphic computing, discuss the associated technology (microelectronic and photonic) platforms and compare their metric performance. We discuss photonic neural network approaches and challenges for integrated neuromorphic photonic processors while providing an in-depth description of photonic neurons and a candidate interconnection architecture. We conclude with a future outlook of neuro-inspired photonic processing.Comment: 28 pages, 19 figure

    A review of High Performance Computing foundations for scientists

    Full text link
    The increase of existing computational capabilities has made simulation emerge as a third discipline of Science, lying midway between experimental and purely theoretical branches [1, 2]. Simulation enables the evaluation of quantities which otherwise would not be accessible, helps to improve experiments and provides new insights on systems which are analysed [3-6]. Knowing the fundamentals of computation can be very useful for scientists, for it can help them to improve the performance of their theoretical models and simulations. This review includes some technical essentials that can be useful to this end, and it is devised as a complement for researchers whose education is focused on scientific issues and not on technological respects. In this document we attempt to discuss the fundamentals of High Performance Computing (HPC) [7] in a way which is easy to understand without much previous background. We sketch the way standard computers and supercomputers work, as well as discuss distributed computing and discuss essential aspects to take into account when running scientific calculations in computers.Comment: 33 page

    New Logic-In-Memory Paradigms: An Architectural and Technological Perspective

    Get PDF
    Processing systems are in continuous evolution thanks to the constant technological advancement and architectural progress. Over the years, computing systems have become more and more powerful, providing support for applications, such as Machine Learning, that require high computational power. However, the growing complexity of modern computing units and applications has had a strong impact on power consumption. In addition, the memory plays a key role on the overall power consumption of the system, especially when considering data-intensive applications. These applications, in fact, require a lot of data movement between the memory and the computing unit. The consequence is twofold: Memory accesses are expensive in terms of energy and a lot of time is wasted in accessing the memory, rather than processing, because of the performance gap that exists between memories and processing units. This gap is known as the memory wall or the von Neumann bottleneck and is due to the different rate of progress between complementary metal-oxide semiconductor (CMOS) technology and memories. However, CMOS scaling is also reaching a limit where it would not be possible to make further progress. This work addresses all these problems from an architectural and technological point of view by: (1) Proposing a novel Configurable Logic-in-Memory Architecture that exploits the in-memory computing paradigm to reduce the memory wall problem while also providing high performance thanks to its flexibility and parallelism; (2) exploring a non-CMOS technology as possible candidate technology for the Logic-in-Memory paradigm

    Design of testbed and emulation tools

    Get PDF
    The research summarized was concerned with the design of testbed and emulation tools suitable to assist in projecting, with reasonable accuracy, the expected performance of highly concurrent computing systems on large, complete applications. Such testbed and emulation tools are intended for the eventual use of those exploring new concurrent system architectures and organizations, either as users or as designers of such systems. While a range of alternatives was considered, a software based set of hierarchical tools was chosen to provide maximum flexibility, to ease in moving to new computers as technology improves and to take advantage of the inherent reliability and availability of commercially available computing systems

    Programming MPSoC platforms: Road works ahead

    Get PDF
    This paper summarizes a special session on multicore/multi-processor system-on-chip (MPSoC) programming challenges. The current trend towards MPSoC platforms in most computing domains does not only mean a radical change in computer architecture. Even more important from a SW developer´s viewpoint, at the same time the classical sequential von Neumann programming model needs to be overcome. Efficient utilization of the MPSoC HW resources demands for radically new models and corresponding SW development tools, capable of exploiting the available parallelism and guaranteeing bug-free parallel SW. While several standards are established in the high-performance computing domain (e.g. OpenMP), it is clear that more innovations are required for successful\ud deployment of heterogeneous embedded MPSoC. On the other hand, at least for coming years, the freedom for disruptive programming technologies is limited by the huge amount of certified sequential code that demands for a more pragmatic, gradual tool and code replacement strategy

    Globally asynchronous locally synchronous configurable array architecture for algorithm embeddings

    Get PDF

    Content addressable memory: design and usage for general purpose computing

    Get PDF
    corecore