24 research outputs found

    Tecnologia adaptativa aplicada à otimização de código em compiladores (Adaptive technology applied to code optimization in compilers)

    Get PDF
    The programming memory space of embedded microcontroller-based systems is usually limited. Although compilers nowadays apply optimizing transformations to embedded software, the lack of memory space can become a critical problem for the designer as new features and corrections are introduced into the original software. In contrast, the workstations hosting development systems for embedded applications are faster and have much more memory. Given this scenario, we have developed a peephole optimizer that explores an adaptive technique: it requires more memory and execution time, but it achieves a better compression ratio of the object code than a conventional peephole optimizer. The introduction of an adaptive action enables the algorithm to self-modify its behavior in response to a specific input condition and to search, among the many possible sequences resulting from the superposition of two or more equally applicable optimization rules, for the sequence of optimization rules that best optimizes the object code. Topic: Theory (TEOR). Red de Universidades con Carreras en Informática (RedUNCI).
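
    The abstract above describes the approach only at a high level; the following is a minimal C++ sketch, not the thesis implementation, of the rule-ordering search it alludes to: when two or more peephole rules match at the same position, the optimizer branches, tries every choice, and keeps the shortest object code found. The toy instruction strings, the two example rules, and all function names are hypothetical.

    // Minimal sketch: exhaustive search over peephole rule orderings.
    // Assumes every rule strictly shortens or changes the matched window, so the search terminates.
    #include <cstddef>
    #include <functional>
    #include <iostream>
    #include <string>
    #include <vector>

    using Code = std::vector<std::string>;  // one assembled instruction per entry

    struct Rule {
        // If the rule matches `code` at `pos`, write the rewritten program to `out` and return true.
        std::function<bool(const Code&, std::size_t, Code&)> apply;
    };

    // Explore rule orderings from position `pos`, returning the shortest result found.
    Code optimize(const Code& code, const std::vector<Rule>& rules, std::size_t pos = 0) {
        if (pos >= code.size()) return code;
        Code best = optimize(code, rules, pos + 1);        // option: apply no rule here
        for (const Rule& r : rules) {
            Code rewritten;
            if (r.apply(code, pos, rewritten)) {
                // Adaptive step: re-match at the same position on the rewritten code,
                // so rules exposed by this rewrite are also considered.
                Code candidate = optimize(rewritten, rules, pos);
                if (candidate.size() < best.size()) best = candidate;
            }
        }
        return best;
    }

    int main() {
        // Toy rules: delete a push immediately followed by a pop of the same register,
        // and fold "load r0, #0" into "clear r0".
        std::vector<Rule> rules = {
            {[](const Code& c, std::size_t i, Code& out) {
                if (i + 1 < c.size() && c[i] == "push r0" && c[i + 1] == "pop r0") {
                    out = c; out.erase(out.begin() + i, out.begin() + i + 2); return true;
                }
                return false;
            }},
            {[](const Code& c, std::size_t i, Code& out) {
                if (c[i] == "load r0, #0") { out = c; out[i] = "clear r0"; return true; }
                return false;
            }},
        };
        Code prog = {"load r0, #0", "push r0", "pop r0", "add r1, r0"};
        for (const std::string& insn : optimize(prog, rules)) std::cout << insn << '\n';
    }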

    Cautiously Optimistic Program Analyses for Secure and Reliable Software

    Full text link
    Modern computer systems still have various security and reliability vulnerabilities. Well-known dynamic analysis solutions can mitigate them using runtime monitors that serve as lifeguards. But the additional work of enforcing these security and safety properties incurs exorbitant performance costs, and such tools are rarely used in practice. Our work addresses this problem by constructing a novel technique: Cautiously Optimistic Program Analysis (COPA). COPA is optimistic: it infers likely program invariants from dynamic observations and assumes them in its static reasoning to precisely identify and elide wasteful runtime monitors. The resulting system is fast, but it also ensures soundness by recovering to a conservatively optimized analysis on the rare occasions when a likely invariant fails at runtime. COPA is also cautious: by carefully restricting optimizations to only safe elisions, the recovery is greatly simplified. It avoids unbounded rollbacks upon recovery, thereby enabling analysis of live production software. We demonstrate the effectiveness of Cautiously Optimistic Program Analyses in three areas. Information-Flow Tracking (IFT) can help prevent security breaches and information leaks, but it is rarely used in practice due to its high performance overhead (>500% for web/email servers). COPA dramatically reduces this cost by eliding wasteful IFT monitors, making it practical (9% overhead, 4x speedup). Automatic Garbage Collection (GC) in managed languages (e.g., Java) simplifies programming tasks while ensuring memory safety. However, there is no correct GC for weakly-typed languages (e.g., C/C++), and manual memory management is prone to errors that have been exploited in high-profile attacks. We develop the first sound GC for C/C++ and use COPA to optimize its performance (16% overhead). Sequential Consistency (SC) provides intuitive semantics for concurrent programs and simplifies reasoning about their correctness. However, ensuring SC behavior on commodity hardware remains expensive. We use COPA to ensure SC for Java at the language level efficiently, significantly reducing its cost (from 24% down to 5% on x86). COPA provides a way to realize strong software security, reliability, and semantic guarantees at practical costs. PhD thesis, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/170027/1/subarno_1.pd
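
    COPA itself is a whole analysis framework; the sketch below only illustrates the cautiously optimistic pattern described above in plain C++: a cheap guard checks a likely invariant ("input from this source is never tainted"), the fast path runs with its information-flow monitors elided, and the rare guard failure recovers to the fully monitored path. The Tainted type, the sanitize_* functions, and the policy message are hypothetical and not part of COPA.

    // Minimal sketch of guard-then-elide with recovery; not the COPA implementation.
    #include <iostream>
    #include <string>

    template <typename T>
    struct Tainted {              // a value paired with an information-flow label
        T value;
        bool tainted;
    };

    // Fast path: monitors elided because the input is assumed clean.
    std::string sanitize_unmonitored(const std::string& s) { return s; }

    // Slow path: propagate the label through every operation (conservative analysis).
    Tainted<std::string> sanitize_monitored(const Tainted<std::string>& s) {
        return {s.value, s.tainted};      // the label flows to the result
    }

    std::string handle(const Tainted<std::string>& input) {
        if (!input.tainted) {                            // guard for the likely invariant
            return sanitize_unmonitored(input.value);    // elided monitors: fast
        }
        Tainted<std::string> out = sanitize_monitored(input);   // recovery: full tracking
        if (out.tainted) {
            std::cerr << "policy: refusing to emit tainted data\n";
            return "";
        }
        return out.value;
    }

    int main() {
        std::cout << handle({"trusted.cfg", false}) << '\n';  // common case: fast path
        std::cout << handle({"evil input", true}) << '\n';    // rare case: monitored path
    }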

    Effective memory management for mobile environments

    Get PDF
    Smartphones, tablets, and other mobile devices exhibit vastly different constraints compared to regular or classic computing environments like desktops, laptops, or servers. Mobile devices run dozens of so-called “apps” hosted by independent virtual machines (VMs). All these VMs run concurrently, and each VM deploys purely local heuristics to organize resources like memory, performance, and power. Such a design causes conflicts across all layers of the software stack, calling for the evaluation of VMs and of optimization techniques specific to mobile frameworks. In this dissertation, we study the design of managed runtime systems for mobile platforms. More specifically, we deepen the understanding of the interactions between garbage collection (GC) and the system layers. We develop tools to monitor the memory behavior of Android-based apps and to characterize GC performance, leading to the development of new memory-management techniques that address energy constraints, time performance, and responsiveness. We implement a GC-aware frequency-scaling governor for Android devices. We also explore the tradeoffs of power and performance in vivo for a range of realistic GC variants, with established benchmarks and real applications running on Android virtual machines. We control for variation due to dynamic voltage and frequency scaling (DVFS), just-in-time (JIT) compilation, and the established dimensions of heap memory size and concurrency. Finally, we provision GC as a global service that collects statistics from all running VMs and then makes an informed decision that optimizes across all of them (not just locally) and across all layers of the stack. Our evaluation illustrates the power of such a central coordination service and garbage collection mechanism in improving memory utilization, throughput, and adaptability to user activities. In fact, our techniques aim at a sweet spot where total on-chip energy is reduced (20–30%) with minimal impact on throughput and responsiveness (5–10%). The simplicity and efficacy of our approach reach well beyond the usual optimization techniques
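
    The dissertation's governor and its cross-VM coordination service are described above only in outline; the sketch below shows, under assumed phase names, thresholds, and frequency values, the kind of per-phase decision a GC-aware frequency-scaling governor could make (run slower while the collector is memory-bound, full speed for the mutator). The set_cpu_khz() hook is a stand-in for a cpufreq-style interface, not a real API, and the cross-VM aggregation step is omitted.

    // Minimal sketch of a GC-aware frequency decision; all numbers are assumptions.
    #include <cstdint>
    #include <iostream>

    enum class VmPhase { Mutator, ConcurrentGc, StopTheWorldGc };

    std::uint32_t pick_frequency_khz(VmPhase phase, double allocation_rate_mb_s) {
        switch (phase) {
            case VmPhase::ConcurrentGc:   return 1'000'000;  // memory-bound: save energy
            case VmPhase::StopTheWorldGc: return 1'400'000;  // finish the pause quickly
            case VmPhase::Mutator:
                // Allocation-heavy mutators will trigger GC soon; a mid frequency suffices.
                return allocation_rate_mb_s > 50.0 ? 1'800'000 : 2'200'000;
        }
        return 2'200'000;
    }

    void set_cpu_khz(std::uint32_t khz) {                    // stand-in for the DVFS interface
        std::cout << "governor -> " << khz << " kHz\n";
    }

    int main() {
        set_cpu_khz(pick_frequency_khz(VmPhase::Mutator, 10.0));
        set_cpu_khz(pick_frequency_khz(VmPhase::ConcurrentGc, 10.0));
    }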

    Heap Data Allocation to Scratch-Pad Memory in Embedded Systems

    Get PDF
    This thesis presents the first-ever compile-time method for allocating a portion of a program's dynamic data to scratch-pad memory. A scratch-pad is a fast, directly addressed, compiler-managed SRAM that replaces the hardware-managed cache. It is motivated by its better real-time guarantees compared to a cache and by its significantly lower overheads in access time, energy consumption, area, and overall runtime. Dynamic data refers to all objects allocated at run-time in a program, as opposed to static data objects, which are allocated at compile-time. Existing compiler methods for allocating data to scratch-pad are able to place only code, global, and stack data (static data) in scratch-pad memory; heap and recursive-function objects (dynamic data) are allocated entirely in DRAM, resulting in poor performance for these dynamic data types. Runtime methods based on software caching can place data in scratch-pad, but because of the high overheads of software address translation, they have not been successful, especially for dynamic data. In this thesis we present a dynamic yet compiler-directed allocation method for dynamic data that, for the first time, (i) is able to place a portion of the dynamic data in scratch-pad; (ii) has no software-caching tags; (iii) requires no extra per-access address translation at run-time; and (iv) is able to move dynamic data back and forth between scratch-pad and DRAM to better track the program's locality characteristics. With our method, code, global, stack, and heap variables can share the same scratch-pad. When compared to placing all dynamic data in DRAM and only static data in scratch-pad, our results show that our method reduces the average runtime of our benchmarks by 22.3% and the average power consumption by 26.7%, for the same scratch-pad size fixed at 5% of total data size. Significant savings in runtime and energy across a large number of benchmarks were also observed when compared against cache memory organizations, showing our method's success under constrained SRAM sizes when dealing with dynamic data. Lastly, our method minimizes the profile-dependence issues that plague all similar allocation methods through careful analysis of static and dynamic profile information
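
    As a concrete illustration of points (i)-(iii) above, here is a minimal C++ sketch, not the thesis algorithm, of a compiler-directed heap allocator: allocations from profiled hot sites are served from a fixed scratch-pad region through ordinary pointers (no tags, no per-access translation), and everything else falls back to DRAM. The region size, the hot-site set, and all names are hypothetical placeholders; the thesis method additionally migrates data between scratch-pad and DRAM, which this sketch omits.

    // Minimal sketch of compiler-directed heap placement in a scratch-pad region.
    #include <cstddef>
    #include <cstdint>
    #include <cstdlib>
    #include <iostream>
    #include <unordered_set>

    constexpr std::size_t SPM_SIZE = 4 * 1024;          // assumed 4 KiB scratch-pad
    alignas(16) static std::uint8_t spm[SPM_SIZE];      // stand-in for the on-chip SRAM
    static std::size_t spm_top = 0;                     // bump pointer into the scratch-pad

    // Allocation sites the (offline) profile marked as hot; filled in by the compiler.
    static const std::unordered_set<int> hot_sites = {1, 3};

    void* spm_malloc(std::size_t bytes, int site_id) {
        bytes = (bytes + 15) & ~static_cast<std::size_t>(15);   // keep allocations aligned
        if (hot_sites.count(site_id) && spm_top + bytes <= SPM_SIZE) {
            void* p = &spm[spm_top];                    // plain pointer: no later translation
            spm_top += bytes;
            return p;
        }
        return std::malloc(bytes);                      // cold site or scratch-pad full: DRAM
    }

    int main() {
        int* hot  = static_cast<int*>(spm_malloc(64 * sizeof(int), /*site=*/1));
        int* cold = static_cast<int*>(spm_malloc(64 * sizeof(int), /*site=*/2));
        hot[0] = cold[0] = 42;                          // both accessed through plain pointers
        std::cout << hot[0] + cold[0] << '\n';
        std::free(cold);                                // only the DRAM object came from malloc
    }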

    Language Support for Programming High-Performance Code

    Get PDF
    Nowadays, the computing landscape is becoming increasingly heterogeneous, and this trend currently shows no signs of turning around. In particular, hardware becomes more and more specialized and exhibits different forms of parallelism. For performance-critical code it is indispensable to address hardware-specific peculiarities. Because of the halting problem, however, it is unrealistic to assume that a program implemented in a general-purpose programming language can be fully automatically compiled to such specialized hardware while still delivering peak performance. One form of parallelism is single instruction, multiple data (SIMD). Part I of this thesis presents Sierra: an extension for C++ that facilitates portable and effective SIMD programming. Part II discusses AnyDSL. This framework makes it possible to embed a so-called domain-specific language (DSL) into a host language. On the one hand, a DSL offers the application developer a convenient interface; on the other hand, a DSL can perform domain-specific optimizations and effectively map DSL constructs to various architectures. In order to implement a DSL, one usually has to write or modify a compiler. With AnyDSL, though, the DSL constructs are implemented directly in the host language while a partial evaluator removes any abstractions that are required in the implementation of the DSL
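
    Sierra's actual syntax extends C++ itself (with vector types and vectorized control flow), so the following is only a plain-C++ sketch of the underlying SIMD idea: a fixed-width value holds one element per lane, and a divergent if/else becomes a per-lane mask fed to a select, which is how SIMD hardware executes it. The lane width, helper names, and example values are hypothetical.

    // Minimal sketch of mask-based SIMD control flow; not Sierra code.
    #include <array>
    #include <iostream>

    constexpr int Lanes = 4;                       // assumed SIMD width
    using VFloat = std::array<float, Lanes>;
    using VMask  = std::array<bool, Lanes>;

    VFloat vadd(VFloat a, VFloat b) { for (int i = 0; i < Lanes; ++i) a[i] += b[i]; return a; }
    VMask  vlt (VFloat a, VFloat b) { VMask m{}; for (int i = 0; i < Lanes; ++i) m[i] = a[i] < b[i]; return m; }
    VFloat vselect(VMask m, VFloat t, VFloat f) {  // per-lane "if": keep t where the mask is set
        for (int i = 0; i < Lanes; ++i) if (!m[i]) t[i] = f[i];
        return t;
    }

    int main() {
        VFloat x     = {1.f, 5.f, 2.f, 8.f};
        VFloat limit = {4.f, 4.f, 4.f, 4.f};
        VFloat ten   = {10.f, 10.f, 10.f, 10.f};
        VFloat zero  = {0.f, 0.f, 0.f, 0.f};
        // Scalar pseudocode: if (x < limit) x += 10; else x = 0;  -- executed per lane:
        VMask take_then = vlt(x, limit);
        VFloat result = vselect(take_then, vadd(x, ten), zero);
        for (float v : result) std::cout << v << ' ';
        std::cout << '\n';
    }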

    Verifying Information Flow Control Libraries

    Get PDF
    Information Flow Control (IFC) is a principled approach to protecting the confidentiality and integrity of data in software systems. Intuitively, IFC systems associate data with security labels that track and restrict flows of information throughout a program in order to enforce security. Most IFC techniques require developers to use specific programming languages and tools that demand substantial effort to develop or adopt. To avoid redundant work and lower the threshold for adopting secure languages, IFC has been embedded in general-purpose languages through software libraries that promote security-by-construction with their API. This thesis makes several contributions to state-of-the-art static (MAC) and dynamic (LIO) IFC libraries in three areas: expressive power, theoretical IFC foundations, and protection against covert channels. Firstly, the thesis gives a functor algebraic structure to sensitive data, so that it can be processed through classic functional programming patterns that do not incur security checks. It then establishes the formal security guarantees of MAC, using the standard proof technique of term erasure enriched with two-step erasure, a novel idea that simplifies reasoning about advanced programming features such as exceptions, mutable references, and concurrency. Secondly, the thesis demonstrates that the lightweight but coarse-grained enforcement of dynamic IFC libraries (e.g., LIO) can be as precise and permissive as the fine-grained but heavyweight approach of fully-fledged IFC languages. Lastly, the thesis contributes to the design of secure runtime systems that protect IFC libraries, and IFC languages as well, against internal- and external-timing covert channels that leak information through certain runtime-system resources and features, such as lazy evaluation and parallelism. The results of this thesis are supported by extensive machine-checked proof scripts, consisting of 12,000 lines of code developed in the Agda proof assistant
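
    MAC and LIO are Haskell libraries, so the following C++ sketch only renders the functor idea mentioned above, with hypothetical names: a Labeled<T> pairs a value with its security label, and fmap applies a pure function to the payload while leaving the label untouched, which is exactly why this operation needs no runtime security check.

    // Minimal sketch of a functor structure over labeled data; not the MAC/LIO API.
    #include <iostream>
    #include <string>

    enum class Label { Public, Secret };

    template <typename T>
    class Labeled {
    public:
        Labeled(Label l, T v) : label_(l), value_(std::move(v)) {}
        Label label() const { return label_; }

        // Functor map: the payload is transformed, the label is not, so the flow
        // "value -> f(value)" cannot lower a secret to public and needs no check.
        template <typename F>
        auto fmap(F f) const {
            return Labeled<decltype(f(value_))>(label_, f(value_));
        }

        // Reading the payload out, by contrast, is where an IFC library would check
        // the label against the current security context (omitted in this sketch).
    private:
        Label label_;
        T value_;
    };

    int main() {
        Labeled<std::string> secret(Label::Secret, "hunter2");
        auto len = secret.fmap([](const std::string& s) { return s.size(); });
        std::cout << "label preserved: " << (len.label() == Label::Secret) << '\n';
    }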

    Enhanced applicability of loop transformations

    Get PDF

    Design and evaluation of a Thread-Level Speculation runtime library

    Get PDF
    In the coming years, machines with hundreds or even thousands of processors are likely to become commonplace. To take advantage of these machines, and given the difficulty of parallel programming, it would be desirable to have compilation or runtime systems that extract all the parallelism possible from existing applications. Many parallelization techniques have been proposed in recent years; however, most of them focus on simple codes, that is, codes without dependences between their instructions. Speculative parallelization emerges as a solution for complex codes, making it possible to execute any kind of code, with or without dependences. This technique optimistically assumes that the parallel execution of any kind of code will not lead to errors and therefore needs a mechanism to detect any collision. To this end, a monitor constantly checks that the execution is not erroneous, ensuring that the results obtained in parallel are consistent with those of a sequential execution. If the execution turns out to be erroneous, the offending threads are stopped and restarted to guarantee that execution follows the sequential semantics. Our contributions in this field include (1) a new, easy-to-use speculative-execution runtime library; (2) new proposals that significantly reduce the number of accesses required by speculative operations, together with guidelines for reducing the memory used; (3) proposals to improve scheduling methods, focused on the dynamic management of the blocks of iterations used in speculative executions; (4) a hybrid solution that uses transactional memory to implement the critical sections of a speculative parallelization library; and (5) an analysis of speculative techniques on one of the most advanced devices of the moment, the Intel Xeon Phi coprocessor. As our results show, speculative parallelization is an active research field, and this technique achieves performance improvements in a large number of applications. We hope that this work will help ease the adoption of speculative solutions in commercial compilers and/or shared-memory parallel programming models. Departamento de Informática (Arquitectura y Tecnología de Computadores, Ciencias de la Computación e Inteligencia Artificial, Lenguajes y Sistemas Informáticos)
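
    To make the squash-and-retry mechanism concrete, the following is a single-threaded C++ simulation of the bookkeeping a thread-level speculation runtime performs, not the library described above: each chunk of iterations buffers its stores, records the versions of the locations it reads, and at in-order commit time a stale read forces the chunk to be squashed and re-executed. Chunk size, the loop body, and all names are illustrative only.

    // Minimal sketch: version-based conflict detection with buffered speculative stores.
    #include <cstddef>
    #include <iostream>
    #include <utility>
    #include <vector>

    struct Access { std::size_t index; long version_seen; };

    struct ChunkLog {                 // what each speculative chunk records
        std::vector<Access> reads;
        std::vector<std::pair<std::size_t, int>> writes;   // (index, value) kept private
    };

    int main() {
        std::vector<int>  data    = {1, 2, 3, 4, 5, 6, 7, 8};
        std::vector<long> version(data.size(), 0);          // bumped on every committed write

        auto run_chunk = [&](std::size_t begin, std::size_t end) {
            ChunkLog log;
            auto spec_load = [&](std::size_t idx) {
                for (auto it = log.writes.rbegin(); it != log.writes.rend(); ++it)
                    if (it->first == idx) return it->second; // forward our own buffered store
                log.reads.push_back({idx, version[idx]});    // remember what we depend on
                return data[idx];
            };
            for (std::size_t i = begin; i < end; ++i) {
                std::size_t src = (i == 0) ? 0 : i - 1;      // loop-carried dependence
                log.writes.push_back({i, spec_load(i) + spec_load(src)});  // buffered store
            }
            return log;
        };

        auto commit = [&](const ChunkLog& log) {
            for (const Access& a : log.reads)
                if (version[a.index] != a.version_seen) return false;   // stale read: squash
            for (const auto& w : log.writes) { data[w.first] = w.second; ++version[w.first]; }
            return true;
        };

        const std::size_t chunk = 4;
        // Chunks may run out of order / in parallel; here chunk 1 runs before chunk 0.
        ChunkLog second = run_chunk(chunk, 2 * chunk);
        ChunkLog first  = run_chunk(0, chunk);
        commit(first);                                       // commits always happen in order
        if (!commit(second)) {                               // its reads are stale: re-execute
            std::cout << "chunk 1 squashed, re-executing\n";
            commit(run_chunk(chunk, 2 * chunk));
        }
        for (int x : data) std::cout << x << ' ';
        std::cout << '\n';
    }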