407 research outputs found

    Verified lifting of stencil computations

    Get PDF
    This paper demonstrates a novel combination of program synthesis and verification to lift stencil computations from low-level Fortran code to a high-level summary expressed using a predicate language. The technique is sound and mostly automated, and leverages counter-example guided inductive synthesis (CEGIS) to find provably correct translations. Lifting existing code to a high-performance description language has a number of benefits, including maintainability and performance portability. For example, our experiments show that the lifted summaries can enable domain specific compilers to do a better job of parallelization as compared to an off-the-shelf compiler working on the original code, and can even support fully automatic migration to hardware accelerators such as GPUs. We have implemented verified lifting in a system called STNG and have evaluated it using microbenchmarks, mini-apps, and real-world applications. We demonstrate the benefits of verified lifting by first automatically summarizing Fortran source code into a high-level predicate language, and subsequently translating the lifted summaries into Halide, with the translated code achieving median performance speedups of 4.1X and up to 24X for non-trivial stencils as compared to the original implementation.United States. Department of Energy. Office of Science (Award DE-SC0008923)United States. Department of Energy. Office of Science (Award DE-SC0005288

    Systematic analysis of the cache behavior of irregular codes

    Get PDF
    [Resumen] El rendimiento de las jerarquías de memoria, en las cuales la caché juega un papel fundamental, es crítico en los computadores de proposito general actuales y en los sistemas embebidos, debido al creciente problema del cuello de botella del sistema de memoria. Desafortunadamente, el comportamiento de la caché es muy inestable y difícil de predecir. Esto es especialmente cierto en presencia de patrones de acceso irregulares, los cuales exhiben poca localidad. Tales patrones son muy comunes por ejemplo en aplicaciones en las cuales algunas referencias están afectadas por sentencias condicionales o en las que el almacenamiento comprimido de matrices dispersas da lugar a la aparición de indirecciones. SIn embargo, el comportamiento caché en presencia de patrones de acceso irregulares no ha sido estudiado ampliamente. En esta tesis presentamos extensiones de una técnica de modelado analítico sistemático basadas en PMEs (Ecuaciones probabilísticas de fallos) que permiten el análisis automático del comportamiento caché para códigos que incluyen sentencias condicionales cuyo valor de verdad puede no ser determinable en tiempo de compilación y códigos con referencias irregulares debidas a indirecciones, respectivamente. El modelo genera predicciones muy precisar a pesar de la irregularidad y tiene un bajo coste computacional siendo el primer modelo que reune estas dos características capaz de analizar automáticamente esta clase de códigos. Estas propiedades convierten al modelo en adecuado para servir de guía en optimizaciones del compilador. La extensión del modelo para códigos irregulares con indirecciones ha sido integrada en el compilador XARK, un compilador orientado al reconocimiento automático de kernels en aplicaciones científicas. Mostramos como explotar las potentes capacidades de extracción de información de este compilador para permitir el modelado automático de códigos científicos basados en bucles

    Mira: A Framework for Static Performance Analysis

    Full text link
    The performance model of an application can pro- vide understanding about its runtime behavior on particular hardware. Such information can be analyzed by developers for performance tuning. However, model building and analyzing is frequently ignored during software development until perfor- mance problems arise because they require significant expertise and can involve many time-consuming application runs. In this paper, we propose a fast, accurate, flexible and user-friendly tool, Mira, for generating performance models by applying static program analysis, targeting scientific applications running on supercomputers. We parse both the source code and binary to estimate performance attributes with better accuracy than considering just source or just binary code. Because our analysis is static, the target program does not need to be executed on the target architecture, which enables users to perform analysis on available machines instead of conducting expensive exper- iments on potentially expensive resources. Moreover, statically generated models enable performance prediction on non-existent or unavailable architectures. In addition to flexibility, because model generation time is significantly reduced compared to dynamic analysis approaches, our method is suitable for rapid application performance analysis and improvement. We present several scientific application validation results to demonstrate the current capabilities of our approach on small benchmarks and a mini application

    Proceedings of the 3rd Workshop on Domain-Specific Language Design and Implementation (DSLDI 2015)

    Full text link
    The goal of the DSLDI workshop is to bring together researchers and practitioners interested in sharing ideas on how DSLs should be designed, implemented, supported by tools, and applied in realistic application contexts. We are both interested in discovering how already known domains such as graph processing or machine learning can be best supported by DSLs, but also in exploring new domains that could be targeted by DSLs. More generally, we are interested in building a community that can drive forward the development of modern DSLs. These informal post-proceedings contain the submitted talk abstracts to the 3rd DSLDI workshop (DSLDI'15), and a summary of the panel discussion on Language Composition
    corecore