
    Accurate and efficient explicit approximations of the Colebrook flow friction equation based on the Wright omega-function

    The Colebrook equation is a popular model for estimating friction loss coefficients in water and gas pipes. The model is implicit in the unknown flow friction factor, f. To date, the flow friction factor, f, can be extracted from the logarithmic form analytically only in terms of the Lambert W-function. The purpose of this study is to find an accurate and computationally efficient solution based on the shifted Lambert W-function, also known as the Wright omega-function. The Wright omega-function is more suitable because it avoids overflow errors: it replaces the fast-growing term y = W(e^x) of the Lambert W-function with series expansions that can be evaluated on computers without causing overflow run-time errors. Although the Colebrook equation transformed through the Lambert W-function is identical to the original expression in terms of accuracy, any further evaluation of the Lambert W-function can only be approximate. Very accurate explicit approximations of the Colebrook equation that contain only one or two logarithms are shown. The final result is an accurate explicit approximation of the Colebrook equation with a relative error of no more than 0.0096%. The presented approximations are in a form suitable for everyday engineering use, and are both accurate and computationally efficient.
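    The paper's one- and two-logarithm approximations are not reproduced here, but the underlying transformation is standard and easy to check numerically. A minimal Python sketch, assuming SciPy's wrightomega (the function name and sample inputs below are illustrative): with x = 1/sqrt(f), a = 2.51/Re, b = (eps/D)/3.71 and c = ln(10)/(2a), the Colebrook equation rearranges to x = (omega(ln c + c*b)/c - b)/a, and evaluating omega(ln c + c*b) instead of W(c*e^(c*b)) avoids forming the overflowing exponential.

```python
import numpy as np
from scipy.special import wrightomega

def colebrook_friction_factor(Re, eps_over_D):
    """Colebrook friction factor via the Wright omega-function.

    Colebrook: 1/sqrt(f) = -2*log10(2.51/(Re*sqrt(f)) + (eps/D)/3.71).
    With x = 1/sqrt(f), a = 2.51/Re, b = (eps/D)/3.71, c = ln(10)/(2a):
        x = (omega(ln(c) + c*b)/c - b) / a,
    where omega(y) = W(e^y) is evaluated without forming e^y, so large
    Reynolds numbers do not trigger overflow run-time errors.
    """
    a = 2.51 / Re
    b = eps_over_D / 3.71
    c = np.log(10.0) / (2.0 * a)
    x = (wrightomega(np.log(c) + c * b) / c - b) / a
    return 1.0 / x**2

# Example: Re = 1e5, eps/D = 1e-4 gives f close to 0.0185.
print(colebrook_friction_factor(1e5, 1e-4))
```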

    Secure Numerical and Logical Multi Party Operations

    We derive algorithms for efficient secure numerical and logical operations using a recently introduced scheme for secure multi-party computation [sch15] in the semi-honest model, ensuring statistical or perfect security. To derive our algorithms for trigonometric functions, we use basic mathematical laws in combination with properties of the additive encryption scheme in a novel way. For division and logarithm we use a new approach to compute a Taylor series at a fixed point for all numbers. All our logical operations, such as comparisons and large fan-in AND gates, are perfectly secure. Our empirical evaluation yields speed-ups of more than a factor of 100 for the evaluated operations compared to the state of the art.
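    The additive homomorphism that such schemes build on fits in a few lines. A toy sketch of additive secret sharing over a public prime modulus; this illustrates only the general principle, not the scheme of [sch15] or the paper's derived operations:

```python
import secrets

P = 2**61 - 1  # public prime modulus (toy choice)

def share(x, n=3):
    """Split secret x into n additive shares that sum to x mod P."""
    shares = [secrets.randbelow(P) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

# Addition is "free": each party adds its local shares, no interaction.
a, b = 12345, 67890
sa, sb = share(a), share(b)
sc = [(x + y) % P for x, y in zip(sa, sb)]
assert reconstruct(sc) == (a + b) % P
```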

    PI-BA Bundle Adjustment Acceleration on Embedded FPGAs with Co-observation Optimization

    Bundle adjustment (BA) is a fundamental optimization technique used in many crucial applications, including 3D scene reconstruction, robotic localization, camera calibration, autonomous driving, space exploration, and street view map generation. Essentially, BA is a joint non-linear optimization problem, and one which can consume a significant amount of time and power, especially for large problems. Previous approaches to optimizing BA performance rely heavily on parallel processing or distributed computing, which trade higher power consumption for higher performance. In this paper we propose π-BA, the first hardware-software co-designed BA engine on an embedded FPGA-SoC that exploits custom hardware for higher performance and power efficiency. Specifically, based on our key observation that not all points appear on all images in a BA problem, we designed and implemented a Co-Observation Optimization technique to accelerate BA operations with optimized usage of memory and computation resources. Experimental results confirm that π-BA outperforms existing software implementations in terms of performance and power consumption.
    Comment: in Proceedings of IEEE FCCM 201
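    The co-observation idea rests on the visibility structure of BA: each 2D observation ties one camera to one point, and most (camera, point) pairs never occur. A hedged sketch of a residual evaluation that iterates only over observed pairs; the data layout and names are illustrative, not the paper's hardware design:

```python
import numpy as np

def reprojection_residuals(cameras, points, observations, project):
    """Accumulate residuals only over (camera, point) pairs that were
    actually observed, instead of the dense camera x point cross product.

    observations: dict mapping (cam_id, pt_id) -> observed 2D pixel
    project:      callable (camera, 3D point) -> predicted 2D pixel
    """
    residuals = []
    for (cam_id, pt_id), uv_obs in observations.items():
        uv_pred = project(cameras[cam_id], points[pt_id])
        residuals.append(uv_pred - uv_obs)
    return np.concatenate(residuals)
```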

    Optimizing Lossy Compression Rate-Distortion from Automatic Online Selection between SZ and ZFP

    With ever-increasing volumes of scientific data produced by HPC applications, significantly reducing data size is critical because of the limited capacity of storage space and potential bottlenecks on I/O or networks when writing/reading or transferring data. SZ and ZFP are the two leading lossy compressors available to compress scientific data sets. However, their performance is not consistent across different data sets and across different fields of some data sets: for some fields SZ provides better compression performance, while other fields are better compressed with ZFP. This situation raises the need for an automatic online (during compression) selection between SZ and ZFP, with minimal overhead. In this paper, the automatic selection optimizes the rate-distortion, an important statistical quality metric based on the signal-to-noise ratio. To optimize for rate-distortion, we investigate the principles of SZ and ZFP. We then propose an efficient online, low-overhead selection algorithm that predicts the compression quality accurately for the two compressors in early processing stages and selects the best-fit compressor for each data field. We implement the selection algorithm into an open-source library, and we evaluate the effectiveness of our proposed solution against plain SZ and ZFP in a parallel environment with 1,024 cores. Evaluation results on three data sets representing about 100 fields show that our selection algorithm improves the compression ratio by up to 70% with the same level of data distortion because of very accurate selection (around 99%) of the best-fit compressor, with little overhead (less than 7% in the experiments).
    Comment: 14 pages, 9 figures, first revision
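    As a reference for what the selector must approximate, one can run both compressors and compare distortion per bit. A sketch assuming hypothetical compress/decompress wrappers for SZ and ZFP (the paper's contribution is predicting this winner online, with low overhead, instead of running both):

```python
import numpy as np

def psnr(original, decompressed):
    """Peak signal-to-noise ratio in dB (higher is better)."""
    mse = np.mean((original - decompressed) ** 2)
    rng = float(original.max() - original.min())
    return 20.0 * np.log10(rng) - 10.0 * np.log10(mse)

def select_codec(field, codecs, error_bound):
    """Brute-force reference selector: best PSNR per bit wins.

    codecs: dict name -> (compress, decompress); hypothetical wrappers
            around SZ and ZFP taking an absolute error bound.
    """
    best_name, best_quality = None, -np.inf
    for name, (compress, decompress) in codecs.items():
        blob = compress(field, error_bound)
        bits_per_value = 8.0 * len(blob) / field.size
        quality = psnr(field, decompress(blob)) / bits_per_value
        if quality > best_quality:
            best_name, best_quality = name, quality
    return best_name
```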

    4.45 Pflops Astrophysical N-Body Simulation on K computer -- The Gravitational Trillion-Body Problem

    As an entry for the 2012 Gordon Bell performance prize, we report performance results of astrophysical N-body simulations of one trillion particles performed on the full system of K computer. This is the first gravitational trillion-body simulation in the world. We describe the scientific motivation, the numerical algorithm, the parallelization strategy, and the performance analysis. Unlike many previous Gordon Bell prize winners that used the tree algorithm for astrophysical N-body simulations, we used the hybrid TreePM method, in which the short-range force is calculated by the tree algorithm and the long-range force is solved by the particle-mesh algorithm, for a similar level of accuracy. We developed a highly tuned gravity kernel for short-range forces and a novel communication algorithm for long-range forces. The average performances on 24,576 and 82,944 nodes of K computer are 1.53 and 4.45 Pflops, respectively, which correspond to 49% and 42% of the peak speed.
    Comment: 10 pages, 6 figures, Proceedings of Supercomputing 2012 (http://sc12.supercomputing.org/), Gordon Bell Prize Winner. Additional information is at http://www.ccs.tsukuba.ac.jp/CCS/eng/gbp201
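    In TreePM codes the Newtonian interaction is split into a tree-evaluated short-range part and a mesh-solved long-range remainder. A sketch of a common Ewald-style Gaussian split (the form used, for example, in GADGET-2; the kernel tuned for K computer may differ in detail):

```python
import numpy as np
from scipy.special import erfc

def short_range_factor(r, r_s):
    """Damping applied to the Newtonian 1/r^2 force so the tree carries
    only the short-range part; the smooth remainder goes to the mesh."""
    x = r / (2.0 * r_s)
    return erfc(x) + (r / (r_s * np.sqrt(np.pi))) * np.exp(-x**2)

def short_range_force(G, m1, m2, r, r_s):
    """Magnitude of the tree-evaluated short-range pairwise force."""
    return G * m1 * m2 / r**2 * short_range_factor(r, r_s)

# The factor decays to ~0 beyond a few splitting scales r_s, so the
# tree walk can be cut off there.
print(short_range_factor(np.array([0.5, 2.0, 6.0]), 1.0))
```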

    Machine learning for ultrafast X-ray diffraction patterns on large-scale GPU clusters

    The classical method of determining the atomic structure of complex molecules by analyzing diffraction patterns is currently undergoing rapid development. Modern techniques for producing extremely bright and coherent X-ray lasers allow a stream of particles to be intercepted and hit by an ultrashort, high-energy X-ray beam. Through machine learning methods, the data thus collected can be transformed into a three-dimensional volumetric intensity map of the particle itself. The computational complexity associated with this problem is so high that clusters of data-parallel accelerators are required. We have implemented a distributed and highly efficient algorithm for inversion of large collections of diffraction patterns, targeting clusters of hundreds of GPUs. With the enormous amount of diffraction data expected in the foreseeable future, this is the scale required to approach real-time processing of data at the beam site. Using both real and synthetic data, we examine the scaling properties of the application and discuss the overall computational viability of this exciting and novel imaging technique.
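    Reduced to its skeleton, the distribution pattern is: shard the diffraction patterns across workers, accumulate partial intensity volumes, and combine them globally. A toy sketch assuming mpi4py, with random stand-in data and a placeholder per-pattern update (the real algorithm orients each pattern before insertion and runs the updates on GPUs):

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

N = 64                                  # toy volume resolution
rng = np.random.default_rng(rank)
patterns = rng.random((100, N, N))      # stand-in for this rank's patterns

# Each rank accumulates its patterns into a partial intensity volume.
local_map = np.zeros((N, N, N))
for p in patterns:
    local_map += p[None, :, :]          # placeholder for oriented insertion

# Sum the partial volumes across all ranks.
global_map = np.empty_like(local_map)
comm.Allreduce(local_map, global_map, op=MPI.SUM)
```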

    Constrained LQR for Low-Precision Data Representation

    Performing computations with a low-bit number representation results in a faster implementation that uses less silicon, and hence allows an algorithm to be implemented on smaller and cheaper processors without loss of performance. We propose a novel formulation to efficiently exploit the low (or non-standard) precision number representation of some computer architectures when computing the solution to constrained LQR problems, such as those that arise in predictive control. The main idea is to include suitably-defined decision variables in the quadratic program, in addition to the states and the inputs, to allow for smaller roundoff errors in the solver. This enables one to trade off the number of bits used for data representation against speed and/or hardware resources, so that smaller numerical errors can be achieved for the same number of bits (same silicon area). Because of data dependencies, the algorithm complexity, in terms of computation time and hardware resources, does not necessarily increase despite the larger number of decision variables. Examples show that a 10-fold reduction in hardware resources is possible compared to using double-precision floating point, without loss of closed-loop performance.
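    For reference, the standard sparse (non-condensed) constrained-LQR QP already keeps states and inputs as separate decision variables; the paper augments it with further suitably-defined variables to shrink roundoff at low precision, and that augmentation is not reproduced here. A toy CVXPY sketch with an illustrative double-integrator model:

```python
import cvxpy as cp
import numpy as np

# Toy double-integrator dynamics, weights, horizon, initial state.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q, R = np.eye(2), 0.1 * np.eye(1)
T, x0 = 20, np.array([1.0, 0.0])

# States and inputs both appear as decision variables (sparse form).
x = cp.Variable((2, T + 1))
u = cp.Variable((1, T))
cost = sum(cp.quad_form(x[:, k], Q) + cp.quad_form(u[:, k], R)
           for k in range(T))
constraints = [x[:, 0] == x0, cp.abs(u) <= 0.5]
constraints += [x[:, k + 1] == A @ x[:, k] + B @ u[:, k] for k in range(T)]
cp.Problem(cp.Minimize(cost), constraints).solve()
print(u.value[:, 0])  # first control move, as applied in predictive control
```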

    Scan matching by cross-correlation and differential evolution

    Scan matching is an important task, solved in the context of many high-level problems including pose estimation, indoor localization, simultaneous localization and mapping, and others. Methods that are accurate and adaptive, and at the same time computationally efficient, are required to enable location-based services on autonomous mobile devices. Such devices usually have a wide range of high-resolution sensors but only limited processing power and a constrained energy supply. This work introduces a novel high-level scan matching strategy that uses a combination of two advanced algorithms recently used in this field: cross-correlation and differential evolution. The cross-correlation between two laser range scans is used as an efficient measure of scan alignment, and the differential evolution algorithm is used to search for the parameters of a transformation that aligns the scans. The proposed method was experimentally validated and showed a good ability to match laser range scans taken shortly after each other and an excellent ability to match laser range scans taken with longer time intervals between them.
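    The two ingredients compose naturally: a grid-based cross-correlation score measures alignment, and differential evolution searches the rigid-transform parameters globally. A hedged sketch assuming SciPy's differential_evolution, an occupancy grid for the reference scan, and illustrative search bounds (the paper's exact correlation measure and parameterization may differ):

```python
import numpy as np
from scipy.optimize import differential_evolution

def transform(scan, dx, dy, dtheta):
    """Apply a rigid 2D transform to an (N, 2) array of scan points."""
    c, s = np.cos(dtheta), np.sin(dtheta)
    return scan @ np.array([[c, -s], [s, c]]).T + np.array([dx, dy])

def alignment_score(ref_grid, scan, params, res=0.05, origin=(-10.0, -10.0)):
    """Cross-correlation-style score: sum of reference-grid occupancy
    values at the cells hit by the transformed scan (higher is better)."""
    pts = transform(scan, *params)
    ij = np.floor((pts - origin) / res).astype(int)
    ok = (ij >= 0).all(axis=1) & (ij < np.array(ref_grid.shape)).all(axis=1)
    return ref_grid[ij[ok, 0], ij[ok, 1]].sum()

def match(ref_grid, scan):
    """Search (dx, dy, dtheta) globally with differential evolution."""
    bounds = [(-1.0, 1.0), (-1.0, 1.0), (-np.pi / 4, np.pi / 4)]
    result = differential_evolution(
        lambda p: -alignment_score(ref_grid, scan, p), bounds, seed=0)
    return result.x
```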