2,707 research outputs found

    FASTCUDA: Open Source FPGA Accelerator & Hardware-Software Codesign Toolset for CUDA Kernels

    Get PDF
    Using FPGAs as hardware accelerators that communicate with a central CPU is becoming a common practice in the embedded design world but there is no standard methodology and toolset to facilitate this path yet. On the other hand, languages such as CUDA and OpenCL provide standard development environments for Graphical Processing Unit (GPU) programming. FASTCUDA is a platform that provides the necessary software toolset, hardware architecture, and design methodology to efficiently adapt the CUDA approach into a new FPGA design flow. With FASTCUDA, the CUDA kernels of a CUDA-based application are partitioned into two groups with minimal user intervention: those that are compiled and executed in parallel software, and those that are synthesized and implemented in hardware. A modern low power FPGA can provide the processing power (via numerous embedded micro-CPUs) and the logic capacity for both the software and hardware implementations of the CUDA kernels. This paper describes the system requirements and the architectural decisions behind the FASTCUDA approach

    Transparent Dynamic reconfiguration for CORBA

    Get PDF
    Distributed systems with high availability requirements have to support some form of dynamic reconfiguration. This means that they must provide the ability to be maintained or upgraded without being taken off-line. Building a distributed system that allows dynamic reconfiguration is very intrusive to the overall design of the system, and generally requires special skills from both the client and server side application developers. There is an opportunity to provide support for dynamic reconfiguration at the object middleware level of distributed systems, and create a dynamic reconfiguration transparency to application developers. We propose a Dynamic Reconfiguration Service for CORBA that allows the reconfiguration of a running system with maximum transparency for both client and server side developers. We describe the architecture, a prototype implementation, and some preliminary test result

    High-Rate Space Coding for Reconfigurable 2x2 Millimeter-Wave MIMO Systems

    Full text link
    Millimeter-wave links are of a line-of-sight nature. Hence, multiple-input multiple-output (MIMO) systems operating in the millimeter-wave band may not achieve full spatial diversity or multiplexing. In this paper, we utilize reconfigurable antennas and the high antenna directivity in the millimeter-wave band to propose a rate-two space coding design for 2x2 MIMO systems. The proposed scheme can be decoded with a low complexity maximum-likelihood detector at the receiver and yet it can enhance the bit-error-rate performance of millimeter-wave systems compared to traditional spatial multiplexing schemes, such as the Vertical Bell Laboratories Layered Space-Time Architecture (VBLAST). Using numerical simulations, we demonstrate the efficiency of the proposed code and show its superiority compared to existing rate-two space-time block codes

    Cycle-accurate evaluation of reconfigurable photonic networks-on-chip

    Get PDF
    There is little doubt that the most important limiting factors of the performance of next-generation Chip Multiprocessors (CMPs) will be the power efficiency and the available communication speed between cores. Photonic Networks-on-Chip (NoCs) have been suggested as a viable route to relieve the off- and on-chip interconnection bottleneck. Low-loss integrated optical waveguides can transport very high-speed data signals over longer distances as compared to on-chip electrical signaling. In addition, with the development of silicon microrings, photonic switches can be integrated to route signals in a data-transparent way. Although several photonic NoC proposals exist, their use is often limited to the communication of large data messages due to a relatively long set-up time of the photonic channels. In this work, we evaluate a reconfigurable photonic NoC in which the topology is adapted automatically (on a microsecond scale) to the evolving traffic situation by use of silicon microrings. To evaluate this system's performance, the proposed architecture has been implemented in a detailed full-system cycle-accurate simulator which is capable of generating realistic workloads and traffic patterns. In addition, a model was developed to estimate the power consumption of the full interconnection network which was compared with other photonic and electrical NoC solutions. We find that our proposed network architecture significantly lowers the average memory access latency (35% reduction) while only generating a modest increase in power consumption (20%), compared to a conventional concentrated mesh electrical signaling approach. When comparing our solution to high-speed circuit-switched photonic NoCs, long photonic channel set-up times can be tolerated which makes our approach directly applicable to current shared-memory CMPs
    corecore