
    srcSlice: very efficient and scalable forward static slicing

    A highly efficient, lightweight forward static slicing approach is presented and evaluated. The approach does not compute the program/system dependence graph; instead, dependence and control information is computed as needed while computing the slice on a variable. The result is a list of line numbers, dependent variables, aliases, and function calls that are part of the slice for every variable (both local and global) in the entire system. The method is implemented as a tool, called srcSlice, on top of srcML, an XML representation of source code. The approach is highly scalable and can generate the slices for all variables of the Linux kernel in approximately 20 minutes on a typical desktop. Benchmark results are compared with the CodeSurfer slicing tool from GrammaTech Inc., and the approach compares well with regard to the accuracy of slices.
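
    To make the idea concrete, here is a minimal Python sketch of forward slicing over a toy program representation, where each statement is a (line, defined variables, used variables) triple. It illustrates the general technique only; srcSlice itself operates on srcML's XML view of C/C++ source, and all names below are invented for the example.

        # A minimal sketch of lightweight forward slicing over a toy program
        # representation; dependence information is gathered on the fly
        # rather than from a precomputed dependence graph.

        def forward_slice(statements, seed_var):
            """Collect the lines and variables transitively affected by seed_var."""
            affected = {seed_var}          # variables influenced by the seed
            slice_lines = set()
            changed = True
            while changed:                 # iterate until no new dependences appear
                changed = False
                for line, defs, uses in statements:
                    if affected & set(uses):
                        slice_lines.add(line)
                        for d in defs:     # anything defined here is now affected
                            if d not in affected:
                                affected.add(d)
                                changed = True
            return sorted(slice_lines), affected

        # Toy program:
        # 1: a = input()
        # 2: b = a + 1
        # 3: c = b * 2
        # 4: d = 5
        stmts = [(1, ["a"], []), (2, ["b"], ["a"]), (3, ["c"], ["b"]), (4, ["d"], [])]
        print(forward_slice(stmts, "a"))   # ([2, 3], {'a', 'b', 'c'})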

    Tool-supported identification of functional concerns in object-oriented code

    Concern identification aims to find the implementation of a functional concern in existing source code. In this work, concerns are described, using the Hierarchic Concern Model, as gray boxes containing subconcerns, inputs, and outputs. The inputs and outputs are used as concern seeds to identify data-oriented abstractions of concern implementations, called concern skeletons. The identification approach is based on context-free language reachability and is supported by a tool called CoDEx.
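
    As a rough illustration of the seed-based identification idea, the Python sketch below collects the program elements reachable from concern seeds along data-flow edges. CoDEx uses context-free language reachability, which additionally matches call and return edges; the plain graph reachability here omits that refinement, and all names are hypothetical.

        # A simplified sketch: starting from concern seeds (inputs/outputs),
        # collect everything reachable along data-flow edges to approximate
        # a concern skeleton. Plain BFS stands in for CFL-reachability.

        from collections import deque

        def concern_skeleton(dataflow_edges, seeds):
            """Elements reachable from the seeds along data-flow edges."""
            graph = {}
            for src, dst in dataflow_edges:
                graph.setdefault(src, []).append(dst)
            seen, work = set(seeds), deque(seeds)
            while work:
                node = work.popleft()
                for nxt in graph.get(node, []):
                    if nxt not in seen:
                        seen.add(nxt)
                        work.append(nxt)
            return seen

        edges = [("input", "parse"), ("parse", "validate"),
                 ("validate", "output"), ("logger", "file")]
        print(concern_skeleton(edges, {"input"}))
        # {'input', 'parse', 'validate', 'output'}  (logger is outside the concern)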

    Object Histories in Java

    Developers are often faced with the task of implementing new features or diagnosing problems in large software systems. Convoluted control and data flows in large object-oriented software systems, however, make even simple tasks extremely difficult, time-consuming, and frustrating. Specifically, Java programs manipulate objects by adding and removing them from collections and by putting and getting them from other objects' fields. Complex object histories hinder program understanding by forcing software maintainers to track the provenance of objects through their past histories when diagnosing software faults. In this thesis, we present a novel approach that answers queries about the evolution of objects throughout their lifetime in a program. On-demand answers to object history queries aid the maintenance of large software systems by allowing developers to pinpoint relevant details quickly. We describe an event-based, flow-insensitive, interprocedural program analysis technique for computing object histories and answering history queries. Our analysis technique identifies all relevant events affecting an object and uses pointer analysis to filter out irrelevant events. It uses prior knowledge of the meanings of methods in the Java collection classes to improve the quality of the histories. We present the details of our technique and experimental results that highlight the utility of object histories in common programming tasks.
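
    The sketch below illustrates, in Python, the event-and-query shape of object histories: operations affecting an object are recorded as events, and a history query filters them down to one object of interest. Note that the thesis computes such histories statically, using pointer analysis to discard irrelevant events, whereas this sketch records events at run time; the class and method names are invented for illustration.

        # A minimal sketch of the event/query view of object histories.
        # Real static analysis would derive these events from code, not
        # from execution.

        from dataclasses import dataclass

        @dataclass(frozen=True)
        class Event:
            kind: str       # e.g. "collection.add", "collection.remove"
            target: int     # identity of the object the event happened to
            site: str       # source location, for reporting

        class HistoryAnalysis:
            def __init__(self):
                self.events = []

            def record(self, kind, obj, site):
                self.events.append(Event(kind, id(obj), site))

            def history_of(self, obj):
                """Answer a history query: all events affecting obj, in order."""
                return [e for e in self.events if e.target == id(obj)]

        # Usage: trace one object through two collections.
        h = HistoryAnalysis()
        order = object()
        h.record("collection.add", order, "Queue.java:12")
        h.record("collection.remove", order, "Queue.java:40")
        h.record("collection.add", order, "Archive.java:7")
        for e in h.history_of(order):
            print(e.kind, "at", e.site)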

    Heap Abstractions for Static Analysis

    Heap data is potentially unbounded and seemingly arbitrary. As a consequence, unlike stack and static memory, heap memory cannot be abstracted directly in terms of a fixed set of source variable names appearing in the program being analysed. This makes it an interesting topic of study, and there is an abundance of literature employing heap abstractions. Although most studies have addressed similar concerns, their formulations and formalisms often seem dissimilar and sometimes even unrelated. Thus, the insights gained in one description of heap abstraction may not directly carry over to some other description. This survey is a result of our quest for a unifying theme in the existing descriptions of heap abstractions. In particular, our interest lies in the abstractions and not in the algorithms that construct them. In our search for a unified theme, we view a heap abstraction as consisting of two features: a heap model to represent the heap memory and a summarization technique for bounding the heap representation. We classify the models as storeless, store-based, and hybrid. We describe various summarization techniques based on k-limiting, allocation sites, patterns, variables, other generic instrumentation predicates, and higher-order logics. This approach allows us to compare the insights of a large number of seemingly dissimilar heap abstractions and also paves the way for creating new abstractions by mixing and matching models and summarization techniques.
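
    One of the summarization techniques the survey names, allocation-site-based summarization, can be sketched in a few lines of Python: all concrete objects created at one allocation site collapse into a single abstract summary node, so the abstract heap stays finite even when the concrete heap does not. The representation below is illustrative, not the survey's formalism.

        # Collapse unboundedly many concrete objects into one abstract
        # summary node per allocation site.

        from collections import defaultdict

        def abstract_heap(allocations):
            """Map each allocation site to one summary node (with a merge count)."""
            summary = defaultdict(int)
            for site, _obj in allocations:
                summary[site] += 1      # every object from this site merges
                                        # into the same abstract node; the
                                        # count just shows how much merged
            return dict(summary)

        # Three nodes allocated by a loop at one source line become a single
        # abstract summary node.
        allocs = [("list.py:3", object()) for _ in range(3)] + [("list.py:9", object())]
        print(abstract_heap(allocs))    # {'list.py:3': 3, 'list.py:9': 1}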

    Coherent Dependence Cluster

    This thesis introduces coherent dependence clusters and shows their relevance to areas of software engineering such as program comprehension and maintenance. All statements in a coherent dependence cluster depend upon the same set of statements and affect the same set of statements; a coherent cluster's statements have 'coherent' shared backward and forward dependence. We introduce an approximation to efficiently locate coherent clusters and show that its precision significantly improves over previous approximations. Our empirical study also finds that, despite their tight coherence constraints, coherent dependence clusters are to be found in abundance in production code. Studying patterns of clustering in several open-source and industrial programs reveals that most contain multiple significant coherent clusters. A series of case studies reveals that large clusters map to logical functionality and program structure. Cluster visualisation also reveals subtle deficiencies of program structure and identifies potential candidates for refactoring efforts. Supplementary studies of inter-cluster dependence are presented, where identification of coherent clusters can help in deriving a hierarchical system decomposition for reverse engineering purposes. Furthermore, studies of program faults find no link between the existence of coherent clusters and software bugs. Rather, a longitudinal study of several systems finds that coherent clusters represent the core architecture of programs during system evolution. Due to the inherent conservativeness of static analysis, it is possible for unreachable code and code implementing cross-cutting concerns, such as error handling and debugging, to link clusters together. This thesis studies their effect on dependence clusters by using coverage information to remove unexecuted and rarely executed code. Empirical evaluation reveals that this code reduction yields smaller slices and clusters.
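
    The defining property above translates directly into a grouping criterion: statements belong to the same coherent cluster exactly when they share both their backward and forward slice sets. A minimal Python sketch follows, with slices given as precomputed sets; computing those slices is the expensive part in practice, and the data here is invented.

        # Group statements by their (backward slice, forward slice) pair,
        # which is precisely the coherent-cluster membership condition.

        from collections import defaultdict

        def coherent_clusters(backward, forward):
            """Partition statements by shared backward and forward slices."""
            clusters = defaultdict(set)
            for stmt in backward:
                key = (frozenset(backward[stmt]), frozenset(forward[stmt]))
                clusters[key].add(stmt)
            return list(clusters.values())

        # Statements 1 and 2 agree on both slices, so they form one cluster.
        bwd = {1: {1, 2}, 2: {1, 2}, 3: {3}}
        fwd = {1: {4}, 2: {4}, 3: {5}}
        print(coherent_clusters(bwd, fwd))   # [{1, 2}, {3}]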

    An investigation into the unsoundness of static program analysis : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Palmerston North, New Zealand

    Static program analysis is widely used in many software applications, such as security analysis, compiler optimisation, program verification, and code refactoring. In contrast to dynamic analysis, static analysis can perform a full program analysis without the need to run the program under analysis. While it provides full program coverage, one of the main issues with static analysis is imprecision, i.e., the potential of reporting false positives due to overestimating actual program behaviours. For many years, research in static program analysis has focused on reducing such imprecision while improving scalability. However, static program analysis may also miss some critical parts of the program, resulting in program behaviours not being reported. A typical example of this is the case of dynamic language features, where certain behaviours are hard to model due to their dynamic nature. The term "unsoundness" has been used to describe those missed program behaviours. Compared to static analysis, dynamic analysis has the advantage of obtaining precise results, as it only captures what has been executed at run time. However, dynamic analysis is also limited to the observed program executions. This thesis investigates the unsoundness issue in static program analysis. We first investigate causes of unsoundness in terms of Java dynamic language features and identify potential usage patterns of such features. We then report the results of a number of empirical experiments we conducted in order to identify and categorise the sources of unsoundness in state-of-the-art static analysis frameworks. Finally, we quantify and measure the level of unsoundness in static analysis in the presence of dynamic language features. The models developed in this thesis can be used by static analysis frameworks and tools to boost the soundness of those frameworks and tools.
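
    The thesis targets Java dynamic language features such as reflection; the Python sketch below shows the same source of unsoundness using getattr. A call graph built only from explicit call syntax misses the reflective call, so the behaviour goes unreported, which is exactly the kind of missed behaviour the term "unsoundness" describes. The example is illustrative, not taken from the thesis.

        # A dynamic-feature call that a naive, syntax-only static call
        # graph cannot see.

        class Account:
            def close(self):
                print("account closed")     # reachable, but only reflectively

        def run(action_name: str):
            account = Account()
            method = getattr(account, action_name)  # resolved at run time
            method()                                # invisible to a naive
                                                    # syntax-only call graph

        run("close")   # prints "account closed"; a static tool that cannot
                       # model getattr would report Account.close as dead code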

    Infrared: A Meta Bug Detector

    Recent breakthroughs in deep learning methods have sparked a wave of interest in learning-based bug detectors. Compared to traditional static analysis tools, these bug detectors are learned directly from data and are thus easier to create. On the other hand, they are difficult to train, requiring a large amount of data that is not readily available. In this paper, we propose a new approach, called meta bug detection, which offers three crucial advantages over existing learning-based bug detectors: it is bug-type generic (i.e., capable of catching types of bugs that are entirely unobserved during training), self-explainable (i.e., capable of explaining its own predictions without any external interpretability method), and sample efficient (i.e., requiring substantially less training data than standard bug detectors). Our extensive evaluation shows that our meta bug detector (MBD) is effective in catching a variety of bugs, including null pointer dereferences, array index out-of-bounds errors, file handle leaks, and even data races in concurrent programs; in the process, MBD also significantly outperforms several noteworthy baselines, including Facebook Infer, a prominent static analysis tool, and FICS, the latest anomaly detection method.

    Verification, slicing, and visualization of programs with contracts

    Doctoral thesis in Informatics (specialisation in Computer Science). As a specification carries relevant information concerning the behaviour of a program, why not exploit this fact to slice the program in a semantic sense, aiming at optimizing it or easing its verification? It was this idea that Comuzzi introduced in 1996 with the notion of postcondition-based slicing: slicing a program using the information contained in the postcondition (the condition Q that is guaranteed to hold at the exit of the program). After him, several advances were made and different extensions were proposed, bridging the two areas of Program Verification and Program Slicing: specifically, precondition-based slicing and specification-based slicing. The work reported in this Ph.D. dissertation explores further relations between these two areas, aiming at discovering mutual benefits. A deep study of specification-based slicing has shown that the original algorithm is not efficient and does not produce minimal slices. In this dissertation, traditional specification-based slicing algorithms are revisited and improved (their formalization is proposed under the name of assertion-based slicing) in a new framework that is appropriate for reasoning about imperative programs annotated with contracts and loop invariants. In the same theoretical framework, the semantic slicing algorithms are extended to work at the program level through a new concept called contract-based slicing. Contract-based slicing, constituting another contribution of this work, allows for the study of a program at an interprocedural level, enabling optimizations in the context of code reuse. Motivated by the lack of tools showing that the proposed algorithms work in practice, a tool (GamaSlicer) was also developed; it implements all the existing semantic slicing algorithms, in addition to the ones introduced in this dissertation. This third contribution is based on generic graph visualization and animation algorithms that were adapted to work with verification and slice graphs, two specific cases of labelled control flow graphs. Funding: Fundação para a Ciência e a Tecnologia (FCT) doctoral grant SFRH/BD/33231/2007; RESCUE project (FCT contract PTDC/EIA/65862/2006); CROSS project (FCT contract PTDC/EIACCO/108995/2008).
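
    A toy worked example of the postcondition-based slicing idea credited to Comuzzi above: given the postcondition Q: result == x + 1, statements that cannot affect Q may be sliced away. The Python version below is a minimal illustration of the idea and is not GamaSlicer's algorithm.

        # Slicing with respect to the postcondition Q: result == x + 1.

        def original(x):
            log = f"called with {x}"   # irrelevant to Q: cannot affect result
            y = x + 1
            log += " done"             # also irrelevant to Q
            return y

        def sliced(x):                 # the slice with respect to Q
            y = x + 1
            return y

        assert original(3) == sliced(3) == 4   # both satisfy Q: result == x + 1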

    Program Transformations in Magnolia

    We explore program transformations in the context of the Magnolia programming language. We discuss research on and implementations of transformation techniques, scenarios for putting them to use in Magnolia, interfacing with transformations, and the potential workflows and tooling that this approach to programming enables. Master's thesis in Informatics (INF39).