23,353 research outputs found

    Special purpose parallel computer architecture for real-time control and simulation in robotic applications

    Get PDF
    This is a real-time robotic controller and simulator which is a MIMD-SIMD parallel architecture for interfacing with an external host computer and providing a high degree of parallelism in computations for robotic control and simulation. It includes a host processor for receiving instructions from the external host computer and for transmitting answers to the external host computer. There are a plurality of SIMD microprocessors, each SIMD processor being a SIMD parallel processor capable of exploiting fine grain parallelism and further being able to operate asynchronously to form a MIMD architecture. Each SIMD processor comprises a SIMD architecture capable of performing two matrix-vector operations in parallel while fully exploiting parallelism in each operation. There is a system bus connecting the host processor to the plurality of SIMD microprocessors and a common clock providing a continuous sequence of clock pulses. There is also a ring structure interconnecting the plurality of SIMD microprocessors and connected to the clock for providing the clock pulses to the SIMD microprocessors and for providing a path for the flow of data and instructions between the SIMD microprocessors. The host processor includes logic for controlling the RRCS by interpreting instructions sent by the external host computer, decomposing the instructions into a series of computations to be performed by the SIMD microprocessors, using the system bus to distribute associated data among the SIMD microprocessors, and initiating activity of the SIMD microprocessors to perform the computations on the data by procedure call

    Optimising Simulation Data Structures for the Xeon Phi

    Get PDF
    In this paper, we propose a lock-free architecture to accelerate logic gate circuit simulation using SIMD multi-core machines. We evaluate its performance on different test circuits simulated on the Intel Xeon Phi and 2 other machines. Comparisons are presented of this software/hardware combination with reported performances of GPU and other multi-core simulation platforms. Comparisons are also given between the lock free architecture and a leading commercial simulator running on the same Intel hardware

    A review of High Performance Computing foundations for scientists

    Full text link
    The increase of existing computational capabilities has made simulation emerge as a third discipline of Science, lying midway between experimental and purely theoretical branches [1, 2]. Simulation enables the evaluation of quantities which otherwise would not be accessible, helps to improve experiments and provides new insights on systems which are analysed [3-6]. Knowing the fundamentals of computation can be very useful for scientists, for it can help them to improve the performance of their theoretical models and simulations. This review includes some technical essentials that can be useful to this end, and it is devised as a complement for researchers whose education is focused on scientific issues and not on technological respects. In this document we attempt to discuss the fundamentals of High Performance Computing (HPC) [7] in a way which is easy to understand without much previous background. We sketch the way standard computers and supercomputers work, as well as discuss distributed computing and discuss essential aspects to take into account when running scientific calculations in computers.Comment: 33 page

    Empowering parallel computing with field programmable gate arrays

    Get PDF
    After more than 30 years, reconfigurable computing has grown from a concept to a mature field of science and technology. The cornerstone of this evolution is the field programmable gate array, a building block enabling the configuration of a custom hardware architecture. The departure from static von Neumannlike architectures opens the way to eliminate the instruction overhead and to optimize the execution speed and power consumption. FPGAs now live in a growing ecosystem of development tools, enabling software programmers to map algorithms directly onto hardware. Applications abound in many directions, including data centers, IoT, AI, image processing and space exploration. The increasing success of FPGAs is largely due to an improved toolchain with solid high-level synthesis support as well as a better integration with processor and memory systems. On the other hand, long compile times and complex design exploration remain areas for improvement. In this paper we address the evolution of FPGAs towards advanced multi-functional accelerators, discuss different programming models and their HLS language implementations, as well as high-performance tuning of FPGAs integrated into a heterogeneous platform. We pinpoint fallacies and pitfalls, and identify opportunities for language enhancements and architectural refinements

    A Reconfigurable Vector Instruction Processor for Accelerating a Convection Parametrization Model on FPGAs

    Full text link
    High Performance Computing (HPC) platforms allow scientists to model computationally intensive algorithms. HPC clusters increasingly use General-Purpose Graphics Processing Units (GPGPUs) as accelerators; FPGAs provide an attractive alternative to GPGPUs for use as co-processors, but they are still far from being mainstream due to a number of challenges faced when using FPGA-based platforms. Our research aims to make FPGA-based high performance computing more accessible to the scientific community. In this work we present the results of investigating the acceleration of a particular atmospheric model, Flexpart, on FPGAs. We focus on accelerating the most computationally intensive kernel from this model. The key contribution of our work is the architectural exploration we undertook to arrive at a solution that best exploits the parallelism available in the legacy code, and is also convenient to program, so that eventually the compilation of high-level legacy code to our architecture can be fully automated. We present the three different types of architecture, comparing their resource utilization and performance, and propose that an architecture where there are a number of computational cores, each built along the lines of a vector instruction processor, works best in this particular scenario, and is a promising candidate for a generic FPGA-based platform for scientific computation. We also present the results of experiments done with various configuration parameters of the proposed architecture, to show its utility in adapting to a range of scientific applications.Comment: This is an extended pre-print version of work that was presented at the international symposium on Highly Efficient Accelerators and Reconfigurable Technologies (HEART2014), Sendai, Japan, June 911, 201

    Ianus: an Adpative FPGA Computer

    Full text link
    Dedicated machines designed for specific computational algorithms can outperform conventional computers by several orders of magnitude. In this note we describe {\it Ianus}, a new generation FPGA based machine and its basic features: hardware integration and wide reprogrammability. Our goal is to build a machine that can fully exploit the performance potential of new generation FPGA devices. We also plan a software platform which simplifies its programming, in order to extend its intended range of application to a wide class of interesting and computationally demanding problems. The decision to develop a dedicated processor is a complex one, involving careful assessment of its performance lead, during its expected lifetime, over traditional computers, taking into account their performance increase, as predicted by Moore's law. We discuss this point in detail

    An Automated Design-flow for FPGA-based Sequential Simulation

    Get PDF
    In this paper we describe the automated design flow that will transform and map a given homogeneous or heterogeneous hardware design into an FPGA that performs a cycle accurate simulation. The flow replaces the required manually performed transformation and can be embedded in existing standard synthesis flows. Compared to the earlier manually translated designs, this automated flow resulted in a reduced number of FPGA hardware resources and higher simulation frequencies. The implementation of the complete design flow is work in progress.\u

    Janus II: a new generation application-driven computer for spin-system simulations

    Get PDF
    This paper describes the architecture, the development and the implementation of Janus II, a new generation application-driven number cruncher optimized for Monte Carlo simulations of spin systems (mainly spin glasses). This domain of computational physics is a recognized grand challenge of high-performance computing: the resources necessary to study in detail theoretical models that can make contact with experimental data are by far beyond those available using commodity computer systems. On the other hand, several specific features of the associated algorithms suggest that unconventional computer architectures, which can be implemented with available electronics technologies, may lead to order of magnitude increases in performance, reducing to acceptable values on human scales the time needed to carry out simulation campaigns that would take centuries on commercially available machines. Janus II is one such machine, recently developed and commissioned, that builds upon and improves on the successful JANUS machine, which has been used for physics since 2008 and is still in operation today. This paper describes in detail the motivations behind the project, the computational requirements, the architecture and the implementation of this new machine and compares its expected performances with those of currently available commercial systems.Comment: 28 pages, 6 figure

    Computers for real time flight simulation: A market survey

    Get PDF
    An extensive computer market survey was made to determine those available systems suitable for current and future flight simulation studies at Ames Research Center. The primary requirement is for the computation of relatively high frequency content (5 Hz) math models representing powered lift flight vehicles. The Rotor Systems Research Aircraft (RSRA) was used as a benchmark vehicle for computation comparison studies. The general nature of helicopter simulations and a description of the benchmark model are presented, and some of the sources of simulation difficulties are examined. A description of various applicable computer architectures is presented, along with detailed discussions of leading candidate systems and comparisons between them
    corecore