
    OSCAR. A Noise Injection Framework for Testing Concurrent Software

    “Moore’s Law” is a well-known phenomenon in computer science that describes the yearly increase in the transistor density of processors. Although it has held for the last 57 years, thermal limits on how far a processor’s core frequencies can be raised have placed physical constraints on performance scaling. The industry has since shifted towards multicore architectures, which offer better and more scalable performance, while in turn forcing programmers to adopt the concurrent programming paradigm when designing new software if they wish to make use of this added performance. This paradigm comes with the unfortunate downside of a plethora of additional errors in programs, stemming directly from the (poor) use of concurrency techniques. Furthermore, concurrent programs are notoriously hard to design and to verify for correctness, and researchers continuously develop new, more effective and efficient methods of doing so. Noise injection, the theme of this dissertation, is one such method. It relies on the “probe effect”: the observable shift in the behaviour of concurrent programs upon the introduction of noise into their routines. The abandonment of ConTest, a popular proprietary, closed-source noise injection framework for testing concurrent software written in the Java programming language, has left a void in the availability of noise injection frameworks for this language. To mitigate this void, this dissertation proposes OSCAR, a novel open-source noise injection framework for Java that relies on static bytecode instrumentation to inject noise. OSCAR provides a free and well-documented noise injection tool for research, pedagogical and industry use. Additionally, we propose a novel taxonomy for categorizing new and existing noise injection heuristics, together with a new method for generating and analysing concurrent software traces based on string comparison metrics. After noising programs from the IBM Concurrent Benchmark with different heuristics, we observed that OSCAR is highly effective at increasing coverage of the interleaving space, and that the different heuristics offer diverse trade-offs between the cost and the benefit (time/coverage) of the noise injection process.
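To make the probe effect concrete, the sketch below hand-writes the kind of perturbation a noise injector introduces automatically. It is a hypothetical illustration, not OSCAR's API: OSCAR instruments compiled bytecode statically, whereas here the noise call is placed in the source by hand before a racy shared-memory access.

```java
import java.util.Random;

/** Minimal illustration of the probe effect exploited by noise injection:
 *  perturbing thread timing around shared-memory accesses makes rare
 *  interleavings (and the bugs hiding in them) far more likely to surface.
 *  A real injector such as OSCAR inserts such calls into the bytecode. */
public class NoiseDemo {
    private static final Random RNG = new Random();
    private static int counter = 0;               // shared, unsynchronized

    /** A simple noise heuristic: randomly yield or sleep a few ms. */
    static void noise() {
        try {
            if (RNG.nextBoolean()) Thread.yield();
            else Thread.sleep(RNG.nextInt(3));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 1000; i++) {
                noise();                  // injected before the racy access
                counter++;                // read-modify-write data race
            }
        };
        Thread a = new Thread(task), b = new Thread(task);
        a.start(); b.start(); a.join(); b.join();
        // With noise, lost updates (counter < 2000) appear far more often.
        System.out.println("counter = " + counter);
    }
}
```

Running this with and without the noise() call shows lost updates becoming far more frequent under noise, which is exactly the increased interleaving-space coverage the dissertation measures.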

    Deductive Verification of Concurrent Programs and its Application to Secure Information Flow for Java

    Formal verification of concurrent programs still poses a major challenge in computer science. Our approach is an adaptation of the modular rely/guarantee methodology in dynamic logic. Besides functional properties, we investigate language-based security. Our verification approach extends naturally to multi-threaded Java, and we present an implementation in the KeY verification system. We propose natural extensions to JML covering both confidentiality properties and multi-threaded programs.
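For orientation, a standard JML method contract on sequential Java looks like the sketch below; this is a minimal, hypothetical example of the core requires/ensures clauses, while the rely/guarantee and confidentiality extensions proposed in the thesis go beyond them.

```java
/** A minimal JML-annotated class, checkable by tools such as KeY.
 *  The class and method are hypothetical examples, not from the thesis. */
public class Account {
    private /*@ spec_public @*/ int balance;

    //@ requires amount > 0;
    //@ ensures balance == \old(balance) + amount;
    //@ ensures \result == balance;
    public int deposit(int amount) {
        balance += amount;
        return balance;
    }
}
```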

    Parallel Markov Chain Monte Carlo

    The increasing availability of multi-core and multi-processor architectures provides new opportunities for improving the performance of many computer simulations. Markov Chain Monte Carlo (MCMC) simulations are widely used for approximate counting problems, Bayesian inference and as a means of estimating very high-dimensional integrals. As such, MCMC has found a wide variety of applications in fields including computational biology and physics, financial econometrics, machine learning and image processing. This thesis presents a number of new methods for reducing the runtime of Markov Chain Monte Carlo simulations by using SMP machines and/or clusters. Two of the methods speculatively perform iterations in parallel, reducing the runtime of MCMC programs whilst producing statistically identical results to conventional sequential implementations. The other methods apply only to problem domains that can be represented as an image, and involve various means of dividing the image into subimages that can be processed with some degree of independence. Where possible, the thesis includes a theoretical analysis of the reduction in runtime that may be achieved using our techniques under perfect conditions, and in all cases the methods are tested and compared on a selection of multi-core and multi-processor architectures. A framework is provided to allow easy construction of MCMC applications that implement these parallelisation methods.
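The speculative idea can be sketched as follows (a hypothetical illustration, not the thesis's framework): in Metropolis-Hastings, a rejected proposal leaves the chain at its current state, so several proposals drawn from that state can have their expensive likelihood evaluations run in parallel, with the accept/reject decisions replayed sequentially and speculative work discarded after the first acceptance. The resulting chain is statistically identical to a sequential one.

```java
import java.util.*;
import java.util.concurrent.*;

/** Hypothetical sketch of speculative Metropolis-Hastings: likelihoods of
 *  several proposals drawn from the current state are evaluated in parallel;
 *  accept/reject decisions are replayed sequentially, so the chain matches
 *  a sequential sampler statistically. */
public class SpeculativeMH {
    // Stand-in log target density (standard normal); expensive in practice.
    static double logDensity(double x) { return -0.5 * x * x; }

    public static double[] run(int steps, int lookahead) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(lookahead);
        Random rng = new Random(42);
        double[] chain = new double[steps];
        double x = 0.0, logPx = logDensity(x);
        int t = 0;
        while (t < steps) {
            // All 'lookahead' proposals are drawn from the current state x;
            // proposal k is the live candidate if proposals 0..k-1 are rejected.
            double[] props = new double[lookahead];
            double[] us = new double[lookahead];
            List<Future<Double>> logPy = new ArrayList<>();
            for (int k = 0; k < lookahead; k++) {
                props[k] = x + rng.nextGaussian();   // symmetric random walk
                us[k] = rng.nextDouble();            // pre-drawn uniforms
                final double y = props[k];
                logPy.add(pool.submit(() -> logDensity(y))); // parallel part
            }
            // Sequential replay: stop at the first acceptance, discarding
            // the remaining (now invalid) speculative evaluations.
            for (int k = 0; k < lookahead && t < steps; k++) {
                double lpy = logPy.get(k).get();
                if (Math.log(us[k]) < lpy - logPx) { // accept
                    x = props[k];
                    logPx = lpy;
                    chain[t++] = x;
                    break;
                }
                chain[t++] = x;                      // reject: chain stays put
            }
        }
        pool.shutdown();
        return chain;
    }
}
```

The expected speedup of such a scheme grows with the rejection rate, since each rejection lets the sampler consume another precomputed likelihood instead of waiting on a fresh evaluation.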


    Data complexity in supervised learning: A far-reaching implication

    This thesis takes a close view of data complexity and its role in shaping the behaviour of machine learning techniques in supervised learning, and explores the generation of synthetic data sets through complexity estimates. The work has been built upon four principles which have naturally followed one another. (1) A critique of the current methodologies used by the machine learning community to evaluate the performance of new learners unleashes (2) an interest in alternative estimates based on the analysis of data complexity and its study. However, both the early stage of the complexity measures and the limited availability of real-world problems for testing inspire (3) the generation of synthetic problems, which becomes the backbone of this thesis, and (4) the proposal of artificial benchmarks resembling real-world problems. The ultimate goal of this research is, in the long run, to provide practitioners (1) with guidelines for choosing the most suitable learner for a given problem and (2) with a collection of benchmarks to either assess the performance of learners or test their limitations.

    Challenges and applications of assembly level software model checking

    This thesis addresses the application of a formal method called model checking to the domain of software verification. Here, exploration algorithms are used to search for errors in a program. In contrast to the majority of other approaches, we claim that the search should be applied to the actual source code of the program rather than to some formal model. Several challenges must be overcome to build such a model checker. First, the tool must be capable of handling the full semantics of the underlying programming language; this implies a considerable amount of additional work unless the interpretation of the program is done by some existing infrastructure. The second challenge lies in the increased memory required to memorize entire program configurations, which further aggravates the problem of large state spaces that every model checker faces. As a remedy to the first problem, the thesis proposes using an existing virtual machine to interpret the program. This takes the burden off the developer, who can fully concentrate on the model checking algorithms. To address the problem of large program states, we call attention to the fact that most transitions in a program change only small fractions of the entire program state. Based on this observation, we devise an incremental storing of states which considerably lowers the memory requirements of program exploration. To further alleviate the per-state memory requirement, we apply state reconstruction, where states are no longer memorized explicitly but through their generating paths. Another problem that results from the large state description of a program lies in the computational effort of hashing, which is exceptionally high for this approach. Based on the same observation as used for the incremental storing of states, we devise an incremental hash function which only needs to process the changed parts of the program’s state. Due to the dynamic nature of computer programs, this is not a trivial task and constitutes a considerable part of the overall thesis. Moreover, the thesis addresses a more general problem of model checking: state explosion, i.e. the number of reachable states grows exponentially in the number of state components. To minimize the number of states to be memorized, the thesis concentrates on the use of heuristic search. It turns out that only a fraction of all reachable states needs to be visited to find a specific error in the program, and heuristics can greatly help to direct the search towards the error state. As another effective way to reduce the number of memorized states, the thesis proposes a technique that skips intermediate states that do not affect shared resources of the program. By merging several consecutive state transitions into a single transition, the technique may considerably truncate the search tree. The proposed approach is realized in StEAM, a model checker for concurrent C++ programs developed in the course of the thesis. Building on an existing virtual machine, the tool provides a set of blind and directed search algorithms for the detection of errors in the actual C++ implementation of a program. StEAM implements all of the aforesaid techniques, whose effectiveness is experimentally evaluated at the end of the thesis. Moreover, we exploit the relation between model checking and planning. The claim is that the two fields of research have great similarities and that technical advances in one field can easily carry over to the other. This claim is supported by a case study in which StEAM is used as a planner for concurrent multi-agent systems. The thesis also contains a user manual for StEAM and technical details that facilitate understanding of the engineering process of the tool.
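The incremental hashing idea can be illustrated with a small sketch (hypothetical, not StEAM's implementation): if a state is viewed as a vector of machine words and hashed as a fixed random linear combination of those words, then a transition that writes k words updates the hash in O(k) rather than rehashing the whole state.

```java
import java.util.Random;

/** Hypothetical sketch of an incremental hash over a program state vector.
 *  hash = sum_i coeff[i] * state[i]  (mod 2^64, via long overflow).
 *  A transition that writes k words updates the hash in O(k), not O(n). */
public class IncrementalStateHash {
    private final long[] coeff;   // fixed random multipliers, one per slot
    private final long[] state;   // the "program state": a vector of words
    private long hash;            // maintained incrementally

    public IncrementalStateHash(int slots, long seed) {
        Random rng = new Random(seed);
        coeff = new long[slots];
        state = new long[slots];
        for (int i = 0; i < slots; i++) coeff[i] = rng.nextLong() | 1L;
        hash = 0;                 // the all-zero state hashes to 0
    }

    /** A transition writing one slot: O(1) hash maintenance. */
    public void write(int slot, long value) {
        hash += coeff[slot] * (value - state[slot]); // remove old, add new
        state[slot] = value;
    }

    public long hash() { return hash; }
}
```

This mirrors the observation the thesis builds on: since most transitions alter only a small fraction of the state, both storage and hashing can be made proportional to the change rather than to the state size.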

    Software-Oriented Distributed Shared Cache Management for Chip Multiprocessors

    This thesis proposes a software-oriented distributed shared cache management approach for chip multiprocessors (CMPs). Unlike hardware-based schemes, our approach offloads the cache management task to a trace analysis phase, allowing flexible management strategies. For single-threaded programs, a static 2D page coloring scheme is proposed that utilizes oracle trace information to derive an optimal data placement schema for a program. In addition, a dynamic 2D page coloring scheme is proposed as a practical solution that tries to approach the performance of the static scheme. The evaluation results show that the static scheme achieves a 44.7% performance improvement over the conventional shared cache scheme on average, while the dynamic scheme performs 32.3% better than the shared cache scheme. For latency-oriented multithreaded programs, a pattern recognition algorithm based on the K-means clustering method is introduced. The algorithm tries to identify data access patterns that can be utilized to guide the placement of private data and the replication of shared data. The experimental results show that data placement and replication based on these access patterns lead to a 19% performance improvement over the shared cache scheme. The reduced remote cache accesses and aggregate cache miss rate result in much lower bandwidth requirements for the on-chip network and the off-chip main memory bus. Lastly, for throughput-oriented multithreaded programs, we propose a hint-guided data replication scheme that identifies memory instructions of a target program that access data with a high reuse property. The derived hints are then used to guide data replication at run time. By balancing the amount of data replication against local cache pressure, the proposed scheme has the potential to achieve performance comparable to the best existing hardware-based schemes. Our proposed software-oriented shared cache management approach is an effective way to manage program performance on CMPs, and provides an alternative direction for research on the distributed cache management problem. Given the known difficulties (e.g., scalability and design complexity) faced by hardware-based schemes, this software-oriented approach may receive serious consideration from researchers in the future. In this perspective, the thesis provides valuable contributions to the computer architecture research community.
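As a rough sketch of the mechanism underlying page coloring (with illustrative constants, not the thesis's actual schema): the cache-set index of a physical address extends above the page-offset bits, and those overlapping bits form the page's "color", so controlling which colors of physical pages back a thread's data controls where in the shared cache that data lands.

```java
/** Hypothetical sketch of page-color extraction for a shared cache.
 *  With 4 KiB pages and set-index bits extending above the page offset,
 *  the overlapping bits form the page "color": pages of the same color
 *  compete for the same cache region, so an allocator can place data by
 *  choosing page colors. All constants are illustrative assumptions. */
public class PageColor {
    static final int PAGE_SHIFT = 12;   // 4 KiB pages
    static final int NUM_COLORS = 64;   // assumed: (cache size / ways) / page size

    static int colorOf(long physAddr) {
        return (int) ((physAddr >>> PAGE_SHIFT) & (NUM_COLORS - 1));
    }

    public static void main(String[] args) {
        long a = 0x0012_3000L, b = 0x0052_3000L;
        // Same color => the two pages contend for the same cache sets.
        System.out.printf("color(a)=%d color(b)=%d%n", colorOf(a), colorOf(b));
    }
}
```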