1,632 research outputs found

    Transformations of High-Level Synthesis Codes for High-Performance Computing

    Specialized hardware architectures promise a major step in performance and energy efficiency over the traditional load/store devices currently employed in large scale computing systems. The adoption of high-level synthesis (HLS) from languages such as C/C++ and OpenCL has greatly increased programmer productivity when designing for such platforms. While this has enabled a wider audience to target specialized hardware, the optimization principles known from traditional software design are no longer sufficient to implement high-performance codes. Fast and efficient codes for reconfigurable platforms are thus still challenging to design. To alleviate this, we present a set of optimizing transformations for HLS, targeting scalable and efficient architectures for high-performance computing (HPC) applications. Our work provides a toolbox for developers, where we systematically identify classes of transformations, the characteristics of their effect on the HLS code and the resulting hardware (e.g., increases data reuse or resource consumption), and the objectives that each transformation can target (e.g., resolve interface contention, or increase parallelism). We show how these can be used to efficiently exploit pipelining, on-chip distributed fast memory, and on-chip streaming dataflow, allowing for massively parallel architectures. To quantify the effect of our transformations, we use them to optimize a set of throughput-oriented FPGA kernels, demonstrating that our enhancements are sufficient to scale up parallelism within the hardware constraints. With the transformations covered, we hope to establish a common framework for performance engineers, compiler developers, and hardware developers, to tap into the performance potential offered by specialized hardware architectures using HLS

    A High Performance Fuzzy Logic Architecture for UAV Decision Making

    The majority of Unmanned Aerial Vehicles (UAVs) in operation today are not truly autonomous, but are instead reliant on a remote human pilot. A high degree of autonomy can provide many advantages in terms of cost, operational resources and safety. However, one of the challenges involved in achieving autonomy is that of replicating the reasoning and decision making capabilities of a human pilot. One candidate method for providing this decision making capability is fuzzy logic. In this role, the fuzzy system must satisfy real-time constraints, process large quantities of data and relate to large knowledge bases. Consequently, there is a need for a generic, high performance fuzzy computation platform for UAV applications. Based on Lees' [1] original work, a high performance fuzzy processing architecture, implemented in Field Programmable Gate Arrays (FPGAs), has been developed and is shown to outclass the performance of existing fuzzy processors

    Fusion and Perspective Correction of Multiple Networked Video Sensors

    A network of adaptive processing elements has been developed that transforms and fuses video captured from multiple sensors. Unlike systems that rely on end-systems to process data, this system distributes the computation throughout the network in order to reduce overall network bandwidth. The network architecture is scalable because it uses a hierarchy of processing engines to perform signal processing. Nodes within the network can be dynamically reprogrammed in order to compose video from multiple sources, digitally transform camera perspectives, and adapt the video format to meet the needs of specific applications. A prototype has been developed using reconfigurable hardware that collects and processes real-time, streaming video of an urban environment. Multiple video cameras gather data from different perspectives and fuse that data into a unified, top-down view. The hardware exploits both the spatial and temporal parallelism of the video streams and the regular processing when applying the transforms. Reconfigurable hardware allows for the functions at nodes to be reprogrammed for dynamic changes in topology. Hardware-based video processors also consume less power than high frequency software-based solutions. Performance and scalability are compared to a distributed software-based implementation. The reconfigurable hardware design is coded in VHDL and prototyped using Washington University's Field Programmable Port Extender (FPX) platform. The transform engine circuit utilizes approximately 34 percent of the resources of a Xilinx Virtex 2000E FPGA, and can be clocked at frequencies up to 48 MHz. The composition engine circuit utilizes approximately 39 percent of the resources of a Xilinx Virtex 2000E FPGA, and can be clocked at frequencies up to 45 MHz

    Field Programmable Gate Arrays (FPGAs) II

    This Edited Volume Field Programmable Gate Arrays (FPGAs) II is a collection of reviewed and relevant research chapters, offering a comprehensive overview of recent developments in the field of Computer and Information Science. The book comprises single chapters authored by various researchers and edited by an expert active in the Computer and Information Science research area. Each chapter is complete in itself, but all are united under a common research study topic. This publication aims at providing a thorough overview of the latest research efforts by international authors on Computer and Information Science, and at opening new possible research paths for further novel developments

    Process Development for the Fabrication of Spheroidal Microdevice Packages Utilizing MEMS Technologies

    Sub-mm³ spherical microrobots are being researched as a path towards reconfigurable wireless networks and programmable matter. The microrobot design requires a spheroidal microdevice package compatible with solar energy collection, wireless sensing, and electrostatic actuation mechanisms to be developed. Throughout this research, a variety of MEMS fabrication techniques were evaluated with regard to their applicability to the packaging process. SF6-based plasma was determined to be a preferable alternative to wet HNA etching when producing repeatable bulk isotropic etches in silicon. The effect of silicon crystal orientation on etch variance and anisotropy was also investigated. HNA polishing was demonstrated as an effective method of reducing undercutting, surface roughness, and anisotropy. MATLAB image processing routines were developed and incorporated into etch analysis, providing an efficient method of data collection. A method of performing sophisticated wafer alignment and photolithography processes by leveraging existing cleanroom devices was proposed. This research established a path forward for an advanced packaging scheme designed to move microelectronics packages away from the planar circuit board configurations of the past and into the autonomous architectures of the future. The proposed design is applicable to a wide variety of microelectronics applications while meeting the requirements of the sub-mm³ spherical microrobot system

    Lunar Applications in Reconfigurable Computing

    NASA's Constellation Program is developing a lunar surface outpost in which reconfigurable computing will play a significant role. Reconfigurable systems provide a number of benefits over conventional software-based implementations including performance and power efficiency, while the use of standardized reconfigurable hardware provides opportunities to reduce logistical overhead. The current vision for the lunar surface architecture includes habitation, mobility, and communications systems, each of which greatly benefits from reconfigurable hardware in applications including video processing, natural feature recognition, data formatting, IP offload processing, and embedded control systems. In deploying reprogrammable hardware, considerations similar to those of software systems must be managed. There needs to be a mechanism for discovery enabling applications to locate and utilize the available resources. Also, application interfaces are needed to provide for both configuring the resources and transferring data between the application and the reconfigurable hardware. Each of these topics is explored in the context of deploying reconfigurable resources as an integral aspect of the lunar exploration architecture

    AutonomROS: A ReconROS-based Autonomous Driving Unit

    Autonomous driving has become an important research area in recent years, and the corresponding system creates an enormous demand for computations. Heterogeneous computing platforms such as systems-on-chip that combine CPUs with reprogrammable hardware offer both computational performance and flexibility and are thus interesting targets for autonomous driving architectures. The de-facto software architecture standard in robotics, including autonomous driving systems, is ROS 2. ReconROS is a framework for creating robotics applications that extends ROS 2 with the possibility of mapping compute-intense functions to hardware. This paper presents AutonomROS, an autonomous driving unit based on the ReconROS framework. AutonomROS serves as a blueprint for a larger robotics application developed with ReconROS and demonstrates its suitability and extendability. The application integrates the ROS 2 package Navigation 2 with custom-developed software and hardware-accelerated functions for point cloud generation, obstacle detection, and lane detection. In addition, we detail a new communication middleware for shared memory communication between software and hardware functions. We evaluate AutonomROS and show the advantage of hardware acceleration and the new communication middleware for improving turnaround times, achievable frame rates, and, most importantly, reducing CPU load