1,856 research outputs found

    Pipelining Saturated Accumulation

    Get PDF
    Aggressive pipelining and spatial parallelism allow integrated circuits (e.g., custom VLSI, ASICs, and FPGAs) to achieve high throughput on many Digital Signal Processing applications. However, cyclic data dependencies in the computation can limit parallelism and reduce the efficiency and speed of an implementation. Saturated accumulation is an important example where such a cycle limits the throughput of signal processing applications. We show how to reformulate saturated addition as an associative operation so that we can use a parallel-prefix calculation to perform saturated accumulation at any data rate supported by the device. This allows us, for example, to design a 16-bit saturated accumulator which can operate at 280 MHz on a Xilinx Spartan-3(XC3S-5000-4) FPGA, the maximum frequency supported by the component's DCM

    Tracks of experience: curated routes in space

    Get PDF

    Ono: an open platform for social robotics

    Get PDF
    In recent times, the focal point of research in robotics has shifted from industrial ro- bots toward robots that interact with humans in an intuitive and safe manner. This evolution has resulted in the subfield of social robotics, which pertains to robots that function in a human environment and that can communicate with humans in an int- uitive way, e.g. with facial expressions. Social robots have the potential to impact many different aspects of our lives, but one particularly promising application is the use of robots in therapy, such as the treatment of children with autism. Unfortunately, many of the existing social robots are neither suited for practical use in therapy nor for large scale studies, mainly because they are expensive, one-of-a-kind robots that are hard to modify to suit a specific need. We created Ono, a social robotics platform, to tackle these issues. Ono is composed entirely from off-the-shelf components and cheap materials, and can be built at a local FabLab at the fraction of the cost of other robots. Ono is also entirely open source and the modular design further encourages modification and reuse of parts of the platform

    Towards Lattice Quantum Chromodynamics on FPGA devices

    Get PDF
    In this paper we describe a single-node, double precision Field Programmable Gate Array (FPGA) implementation of the Conjugate Gradient algorithm in the context of Lattice Quantum Chromodynamics. As a benchmark of our proposal we invert numerically the Dirac-Wilson operator on a 4-dimensional grid on three Xilinx hardware solutions: Zynq Ultrascale+ evaluation board, the Alveo U250 accelerator and the largest device available on the market, the VU13P device. In our implementation we separate software/hardware parts in such a way that the entire multiplication by the Dirac operator is performed in hardware, and the rest of the algorithm runs on the host. We find out that the FPGA implementation can offer a performance comparable with that obtained using current CPU or Intel's many core Xeon Phi accelerators. A possible multiple node FPGA-based system is discussed and we argue that power-efficient High Performance Computing (HPC) systems can be implemented using FPGA devices only.Comment: 17 pages, 4 figure

    Mixing multi-core CPUs and GPUs for scientific simulation software

    Get PDF
    Recent technological and economic developments have led to widespread availability of multi-core CPUs and specialist accelerator processors such as graphical processing units (GPUs). The accelerated computational performance possible from these devices can be very high for some applications paradigms. Software languages and systems such as NVIDIA's CUDA and Khronos consortium's open compute language (OpenCL) support a number of individual parallel application programming paradigms. To scale up the performance of some complex systems simulations, a hybrid of multi-core CPUs for coarse-grained parallelism and very many core GPUs for data parallelism is necessary. We describe our use of hybrid applica- tions using threading approaches and multi-core CPUs to control independent GPU devices. We present speed-up data and discuss multi-threading software issues for the applications level programmer and o er some suggested areas for language development and integration between coarse-grained and ne-grained multi-thread systems. We discuss results from three common simulation algorithmic areas including: partial di erential equations; graph cluster metric calculations and random number generation. We report on programming experiences and selected performance for these algorithms on: single and multiple GPUs; multi-core CPUs; a CellBE; and using OpenCL. We discuss programmer usability issues and the outlook and trends in multi-core programming for scienti c applications developers

    An Improved Public Unclonable Function Design for Xilinx FPGAs for Hardware Security Applications

    Get PDF
    In the modern era we are moving towards completely connecting many useful electronic devices to each other through internet. There is a great need for secure electronic devices and systems. A lot of money is being invested in protecting the electronic devices and systems from hacking and other forms of malicious attacks. Physical Unclonable Function (PUF) is a low-cost hardware scheme that provides affordable security for electronic devices and systems. This thesis proposes an improved PUF design for Xilinx FPGAs and evaluates and compares its performance and reliability compared to existing PUF designs. Furthermore, the utility of the proposed PUF was demonstrated by using it for hardware Intellectual Property (IP) core licensing and authentication. Hardware Trojan can be used to provide evaluation copy of IP cores for a limited time. After that it disables the functionality of the IP core. A finite state machine (FSM) based hardware trojan was integrated with a binary divider IP core and evaluated for licensing and authentication applications. The proposed PUF was used in the design of hardware trojan. Obfuscation metric measures the effectiveness of hardware trojan. A moderately good obfuscation level was achieved for our hardware trojan

    The Megaprocessor as an Educational Tool Making the Abstract Concrete

    Get PDF
    Computer architecture courses can be difficult for students to engage with and learn from. This is because, unlike most core courses for a computer science student, learning architecture is an abstract process. To address this, universities have implemented methods for teaching course material other than purely descriptive methods. This typically means using simulations to model some aspect of a CPU or FPGA (fieldprogrammable gate array) boards for hands-on experimentation in CPU design. However, there are issues with these tools. Simulations can only cover a few topics well, are prone to being abandoned, and introduce additional abstraction layers. FPGAs, while great for advanced topics and long class projects, are often best suited for senior and graduate level students. Both methods are useful, but neither offers a tangible learning experience, which is what the Megaprocessor can provide. The Megaprocessor is a room sized, general-purpose computer made from discrete components, whose architecture is comprised of primitive logic gates with LEDs on every input and output. The entire circuitry of the Megaprocessor is transparent to the users, with its entire state visible and unabstracted. Because of these properties, it is a great learning mechanism for computer architecture education. The Megaprocessor is a tool for hands on and project-based learning that can be used to span the learning gap between simulations and FPGAs
    • …
    corecore