
    Performance Analysis of BigDecimal Arithmetic Operation in Java

    The Java programming language provides binary floating-point primitive data types such as float and double to represent decimal numbers. However, these data types cannot represent decimal numbers with complete accuracy, which may cause precision errors during calculations. To achieve better precision, Java provides the BigDecimal class. Unlike float and double, which use approximation, this class can represent the exact value of a decimal number. However, it comes with a drawback: BigDecimal is treated as an object and requires additional CPU and memory resources to operate on. In this paper, statistical data are presented on the performance impact of using BigDecimal compared to the double data type. As test cases, common mathematical processes were used, such as calculating a mean value, sorting, and multiplying matrices.
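
    To make the trade-off concrete, here is a minimal Java sketch (our own illustration, not code from the paper) contrasting double arithmetic, which accumulates a representation error, with BigDecimal, which keeps the exact decimal value at the cost of object allocation and slower arithmetic:

        import java.math.BigDecimal;

        public class PrecisionDemo {
            public static void main(String[] args) {
                // double uses binary floating point, so 0.1 and 0.2 have no exact representation
                double approx = 0.1 + 0.2;
                System.out.println(approx);   // prints 0.30000000000000004

                // BigDecimal built from string literals represents the decimal values exactly
                BigDecimal exact = new BigDecimal("0.1").add(new BigDecimal("0.2"));
                System.out.println(exact);    // prints 0.3
            }
        }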

    Approximate Computing Survey, Part I: Terminology and Software & Hardware Approximation Techniques

    The rapid growth of demanding applications in domains applying multimedia processing and machine learning has marked a new era for edge and cloud computing. These applications involve massive data and compute-intensive tasks, and thus, typical computing paradigms in embedded systems and data centers are stressed to meet the worldwide demand for high performance. Concurrently, the landscape of the semiconductor field over the last 15 years has established power as a first-class design concern. As a result, the computing systems community is forced to find alternative design approaches to facilitate high-performance and/or power-efficient computing. Among the examined solutions, Approximate Computing has attracted ever-increasing interest, with research works applying approximations across the entire traditional computing stack, i.e., at the software, hardware, and architectural levels. Over the last decade, a plethora of approximation techniques has appeared in software (programs, frameworks, compilers, runtimes, languages), hardware (circuits, accelerators), and architectures (processors, memories). The current article is Part I of our comprehensive survey on Approximate Computing: it reviews its motivation, terminology, and principles, and it classifies and presents the technical details of the state-of-the-art software and hardware approximation techniques. Comment: Under Review at ACM Computing Survey
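
    A representative software-level approximation technique covered by this family of work is loop perforation: skipping a fraction of loop iterations to trade accuracy for speed and energy. The Java sketch below is a generic illustration under our own assumptions, not an example taken from the survey:

        public class LoopPerforation {
            // Exact mean over all samples.
            static double exactMean(double[] x) {
                double sum = 0.0;
                for (int i = 0; i < x.length; i++) sum += x[i];
                return sum / x.length;
            }

            // Perforated mean: visit only every 'stride'-th sample,
            // doing roughly 1/stride of the work at the cost of some accuracy.
            static double perforatedMean(double[] x, int stride) {
                double sum = 0.0;
                int count = 0;
                for (int i = 0; i < x.length; i += stride) {
                    sum += x[i];
                    count++;
                }
                return sum / count;
            }

            public static void main(String[] args) {
                double[] x = new double[1_000_000];
                for (int i = 0; i < x.length; i++) x[i] = Math.sin(0.001 * i);
                System.out.println("exact      = " + exactMean(x));
                System.out.println("perforated = " + perforatedMean(x, 4));
            }
        }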

    Combining learning and optimization for transprecision computing

    The growing demands of the worldwide IT infrastructure stress the need for reduced power consumption, which is addressed in so-called transprecision computing by improving energy efficiency at the expense of precision. For example, reducing the number of bits for some floating-point operations leads to higher efficiency, but also to a non-linear decrease in computation accuracy. Depending on the application, small errors can be tolerated, thus allowing the precision of the computation to be fine-tuned. Finding the optimal precision for all variables with respect to an error bound is a complex task, which is tackled in the literature via heuristics. In this paper, we report on a first attempt to address the problem by combining a Mathematical Programming (MP) model and a Machine Learning (ML) model, following the Empirical Model Learning methodology. The ML model learns the relation between the variables' precision and the output error; this information is then embedded in the MP model, which focuses on minimizing the number of bits. An additional refinement phase is then added to improve the quality of the solution. The experimental results demonstrate an average speedup of 6.5% and a 3% increase in solution quality compared to the state-of-the-art. In addition, experiments on a hardware platform capable of mixed-precision arithmetic (PULPissimo) show the benefits of the proposed approach, with energy savings of around 40% compared to fixed precision.
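
    The paper replaces hand-tuned heuristics with an Empirical Model Learning pipeline (an ML error model embedded in an MP bit-minimization model). As background for the problem being solved, the following Java sketch (our own simplification, not the paper's algorithm) emulates reduced mantissa precision in software and greedily shrinks the bit width of the inputs while a user-given error bound still holds:

        public class PrecisionTuning {
            // Round 'value' to 'bits' significant binary digits: a crude
            // software emulation of a reduced floating-point format.
            static double reduce(double value, int bits) {
                if (value == 0.0 || !Double.isFinite(value)) return value;
                double scale = Math.scalb(1.0, bits - 1 - Math.getExponent(value));
                return Math.rint(value * scale) / scale;
            }

            public static void main(String[] args) {
                double x = Math.PI, y = Math.E;
                double exact = x * y + Math.sqrt(x);
                double errorBound = 1e-3;

                // Greedy search: lower the precision while the error bound holds.
                int bits = 53;                        // full double precision
                while (bits > 2) {
                    double approx = reduce(x, bits - 1) * reduce(y, bits - 1)
                                  + Math.sqrt(reduce(x, bits - 1));
                    if (Math.abs(approx - exact) > errorBound) break;
                    bits--;
                }
                System.out.println("smallest mantissa meeting the bound: " + bits + " bits");
            }
        }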

    ARM Wrestling with Big Data: A Study of Commodity ARM64 Server for Big Data Workloads

    ARM processors have dominated the mobile device market in the last decade due to their favorable computing-to-energy ratio. In this age of Cloud data centers and Big Data analytics, the focus is increasingly on power-efficient processing, rather than just high-throughput computing. One of the first commodity server-grade ARM processors is the recent AMD A1100-series processor, based on the 64-bit ARM Cortex A57 architecture. In this paper, we study the performance and energy efficiency of a server based on this ARM64 CPU, relative to a comparable server running an AMD Opteron 3300-series x64 CPU, for Big Data workloads. Specifically, we study these for Intel's HiBench suite of web, query and machine learning benchmarks on Apache Hadoop v2.7 in a pseudo-distributed setup, for data sizes up to 20 GB files, 5M web pages and 500M tuples. Our results show that the ARM64 server's runtime performance is comparable to the x64 server for integer-based workloads like Sort and Hive queries, and only lags behind for floating-point-intensive benchmarks like PageRank, when they do not exploit data parallelism adequately. We also see that the ARM64 server takes one-third the energy, and has an Energy Delay Product (EDP) that is 50-71% lower than the x64 server. These results hold promise for ARM64 data centers hosting Big Data workloads to reduce their operational costs, while opening up opportunities for further analysis. Comment: Accepted for publication in the Proceedings of the 24th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC), 201
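
    For reference, the Energy Delay Product reported above is the standard metric (general background, not a definition quoted from the paper):

        EDP = E * T = P * T^2

    where E is the energy consumed by the workload, T its execution time (delay), and P the average power draw. With comparable runtimes, a server drawing roughly one-third the energy therefore sees a correspondingly lower EDP.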

    Approximate Computing Survey, Part II: Application-Specific & Architectural Approximation Techniques and Applications

    The challenging deployment of compute-intensive applications from domains such as Artificial Intelligence (AI) and Digital Signal Processing (DSP) forces the community of computing systems to explore new design approaches. Approximate Computing appears as an emerging solution, allowing the quality of results to be tuned in the design of a system in order to improve energy efficiency and/or performance. This radical paradigm shift has attracted interest from both academia and industry, resulting in significant research on approximation techniques and methodologies at different design layers (from system down to integrated circuits). Motivated by the wide appeal of Approximate Computing over the last 10 years, we conduct a two-part survey to cover key aspects (e.g., terminology and applications) and review the state-of-the-art approximation techniques from all layers of the traditional computing stack. In Part II of our survey, we classify and present the technical details of application-specific and architectural approximation techniques, which both target the design of resource-efficient processors/accelerators & systems. Moreover, we present a detailed analysis of the application spectrum of Approximate Computing and discuss open challenges and future directions. Comment: Under Review at ACM Computing Survey

    Improving the Performance of Iterative Applications Through Interleaved Execution of Approximate CUDA Kernels

    Approximate computing techniques, particularly those involving reduced and mixed precision, are widely studied in the literature to accelerate applications and reduce energy consumption. Although many researchers analyze the performance, accuracy loss, and energy consumption of a wide range of application domains, few evaluate approximate computing techniques in iterative applications. These applications rely on the results of the computations of previous iterations to perform subsequent iterations, making them sensitive to precision errors that can propagate and magnify throughout the execution. Additionally, monitoring the accuracy loss of the execution on large datasets is challenging: calculating accuracy loss at runtime is computationally expensive and becomes infeasible in applications with a considerable volume of data. This thesis presents a methodology for generating interleaved execution configurations of multiple kernel versions for iterative applications on GPUs. The methodology involves sampling the accuracy-loss profile, extracting performance and accuracy-loss statistics, and generating offline interleaved execution configurations of kernel versions for different thresholds of accuracy loss. The experiments conducted on three iterative physical-simulation applications over three-dimensional data domains demonstrated the capability of the methodology to extract performance and accuracy-loss statistics and to generate interleaved execution configurations of kernel versions with speedups of up to 2 and reductions in energy consumption of up to 60%. For future work, we suggest studying different optimization strategies for generating interleaved execution configurations of kernel versions, such as using neural networks and machine learning.
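
    The core idea, choosing offline an interleaving of fast approximate and precise kernel versions for a given accuracy-loss threshold and replaying it at run time, can be sketched as follows. This is a hypothetical illustration written in Java rather than CUDA, with invented kernel bodies, not code from the thesis:

        import java.util.List;

        public class InterleavedExecution {
            interface Kernel { void run(double[] grid); }

            // Precise version of one simulation step: full neighbour average.
            static final Kernel PRECISE = grid -> {
                for (int i = 1; i < grid.length - 1; i++)
                    grid[i] = (grid[i - 1] + grid[i] + grid[i + 1]) / 3.0;
            };

            // Approximate version: updates every other point (a perforated variant).
            static final Kernel APPROX = grid -> {
                for (int i = 1; i < grid.length - 1; i += 2)
                    grid[i] = (grid[i - 1] + grid[i] + grid[i + 1]) / 3.0;
            };

            public static void main(String[] args) {
                double[] grid = new double[1 << 20];
                grid[grid.length / 2] = 1.0;          // simple initial condition

                // Offline-generated configuration for one accuracy-loss threshold:
                // three approximate steps per precise step, so the error introduced
                // by the approximation is periodically damped.
                List<Kernel> schedule = List.of(APPROX, APPROX, APPROX, PRECISE);

                for (int iter = 0; iter < 1000; iter++) {
                    schedule.get(iter % schedule.size()).run(grid);
                }
                System.out.println("center value after 1000 iterations: " + grid[grid.length / 2]);
            }
        }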

    Mixed Precision Tuning with Salsa

    Precision tuning consists of finding the smallest floating-point formats enabling a program to compute some results with a given accuracy requirement. In mixed precision, this problem is highly combinatorial, since any value may have its own format. Precision tuning has given rise to the development of several tools that aim at guaranteeing a desired precision on the outputs of programs performing floating-point computations, by minimizing the initial, over-estimated precision of the inputs and intermediary results. In this article, we present an extension of our tool for numerical accuracy, Salsa, which performs precision tuning. Originally, Salsa is a program transformation tool based on static analysis which improves the accuracy of floating-point computations. We have extended Salsa with a precision tuning static analysis. We present experimental results showing the efficiency of this new feature, as well as the additional gains obtained by performing Salsa's program transformation before the precision tuning analysis. We evaluate our tool on a set of programs coming from various domains such as embedded systems and numerical analysis.