23,088 research outputs found

    FpSynt: a fixed-point datapath synthesis tool for embedded systems

    Full text link
    Digital mobile systems must function with low power, small size and weight, and low cost. High-performance desktop microprocessors, with built-in floating point hardware, are not suitable in these cases. For embedded systems, it can be advantageous to implement these calculations with fixed point arithmetic instead. We present an automated fixed-point data path synthesis tool FpSynt for designing embedded applications in fixed-point domain with sufficient accuracy for most applications. FpSynt is available under the GNU General Public License from the following GitHub repository: http://github.com/izhbannikov/FPSYN

    Allocating the chains of consecutive additions for optimal fixed-point data path synthesis

    Full text link
    Minimization of computational errors in the fixed-point data path is often difficult task. Many signal processing algorithms use chains of consecutive additions. The analyzing technique that can be applied to fixed-point data path synthesis has been proposed. This technique takes advantage of allocating the chains of consecutive additions in order to predict growing width of the data path and minimize the design complexity and computational errors

    Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS

    Full text link
    GROMACS is a widely used package for biomolecular simulation, and over the last two decades it has evolved from small-scale efficiency to advanced heterogeneous acceleration and multi-level parallelism targeting some of the largest supercomputers in the world. Here, we describe some of the ways we have been able to realize this through the use of parallelization on all levels, combined with a constant focus on absolute performance. Release 4.6 of GROMACS uses SIMD acceleration on a wide range of architectures, GPU offloading acceleration, and both OpenMP and MPI parallelism within and between nodes, respectively. The recent work on acceleration made it necessary to revisit the fundamental algorithms of molecular simulation, including the concept of neighborsearching, and we discuss the present and future challenges we see for exascale simulation - in particular a very fine-grained task parallelism. We also discuss the software management, code peer review and continuous integration testing required for a project of this complexity.Comment: EASC 2014 conference proceedin

    The future of computing beyond Moore's Law.

    Get PDF
    Moore's Law is a techno-economic model that has enabled the information technology industry to double the performance and functionality of digital electronics roughly every 2 years within a fixed cost, power and area. Advances in silicon lithography have enabled this exponential miniaturization of electronics, but, as transistors reach atomic scale and fabrication costs continue to rise, the classical technological driver that has underpinned Moore's Law for 50 years is failing and is anticipated to flatten by 2025. This article provides an updated view of what a post-exascale system will look like and the challenges ahead, based on our most recent understanding of technology roadmaps. It also discusses the tapering of historical improvements, and how it affects options available to continue scaling of successors to the first exascale machine. Lastly, this article covers the many different opportunities and strategies available to continue computing performance improvements in the absence of historical technology drivers. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'

    Software verification plan for GCS

    Get PDF
    This verification plan is written as part of an experiment designed to study the fundamental characteristics of the software failure process. The experiment will be conducted using several implementations of software that were produced according to industry-standard guidelines, namely the Radio Technical Commission for Aeronautics RTCA/DO-178A guidelines, Software Consideration in Airborne Systems and Equipment Certification, for the development of flight software. This plan fulfills the DO-178A requirements for providing instructions on the testing of each implementation of software. The plan details the verification activities to be performed at each phase in the development process, contains a step by step description of the testing procedures, and discusses all of the tools used throughout the verification process

    Adapting the interior point method for the solution of linear programs on high performance computers

    Get PDF
    In this paper we describe a unified algorithmic framework for the interior point method (IPM) of solving Linear Programs (LPs) which allows us to adapt it over a range of high performance computer architectures. We set out the reasons as to why IPM makes better use of high performance computer architecture than the sparse simplex method. In the inner iteration of the IPM a search direction is computed using Newton or higher order methods. Computationally this involves solving a sparse symmetric positive definite (SSPD) system of equations. The choice of direct and indirect methods for the solution of this system and the design of data structures to take advantage of coarse grain parallel and massively parallel computer architectures are considered in detail. Finally, we present experimental results of solving NETLIB test problems on examples of these architectures and put forward arguments as to why integration of the system within sparse simplex is beneficial

    Towards Lattice Quantum Chromodynamics on FPGA devices

    Get PDF
    In this paper we describe a single-node, double precision Field Programmable Gate Array (FPGA) implementation of the Conjugate Gradient algorithm in the context of Lattice Quantum Chromodynamics. As a benchmark of our proposal we invert numerically the Dirac-Wilson operator on a 4-dimensional grid on three Xilinx hardware solutions: Zynq Ultrascale+ evaluation board, the Alveo U250 accelerator and the largest device available on the market, the VU13P device. In our implementation we separate software/hardware parts in such a way that the entire multiplication by the Dirac operator is performed in hardware, and the rest of the algorithm runs on the host. We find out that the FPGA implementation can offer a performance comparable with that obtained using current CPU or Intel's many core Xeon Phi accelerators. A possible multiple node FPGA-based system is discussed and we argue that power-efficient High Performance Computing (HPC) systems can be implemented using FPGA devices only.Comment: 17 pages, 4 figure
    • …
    corecore