11 research outputs found

    Hydrogen: A Framework for Analyzing Software Revision Histories

    Hydrogen is a framework for analyzing software revision histories, with applications such as verifying bug fixes and identifying changes that cause bugs. The framework uses a graph representation of the multiple versions of a program in a software revision history, called a multi-version interprocedural control flow graph (MVICFG). The MVICFG integrates the control flow for multiple versions of a program into a single graph and provides a convenient way to represent semantic (i.e. control flow) change in a program. The MVICFG can also reduce the storage demands for representing the control flow for multiple versions of a program. Hydrogen implements an algorithm that uses data mined from source code repositories to construct the MVICFG. The MVICFG is then analyzed using demand-driven analysis for patch verification across multiple releases of software.
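The idea of folding several versions into one graph can be illustrated with a minimal sketch. This is not Hydrogen's construction algorithm: the class `MultiVersionGraph`, its methods, and the node names are hypothetical, and the graph is a toy in which each edge simply records the set of versions that contain it.

```python
from collections import defaultdict


class MultiVersionGraph:
    """Toy multi-version control flow graph (hypothetical, for illustration).

    Every edge is stored once and annotated with the versions it belongs to.
    """

    def __init__(self):
        # Maps (source node, destination node) -> set of version ids.
        self.edge_versions = defaultdict(set)

    def add_version(self, version, edges):
        """Merge the control flow edges of one program version into the graph."""
        for src, dst in edges:
            self.edge_versions[(src, dst)].add(version)

    def edges_in(self, version):
        """Return the edges that belong to the given version."""
        return {e for e, vs in self.edge_versions.items() if version in vs}

    def changed_between(self, old, new):
        """Return (added, removed) edges when moving from `old` to `new`."""
        old_edges, new_edges = self.edges_in(old), self.edges_in(new)
        return new_edges - old_edges, old_edges - new_edges

    def stored_edge_count(self):
        """Number of distinct edges stored, shared edges counted once."""
        return len(self.edge_versions)


g = MultiVersionGraph()
# Version 1: the check flows straight to the use.
g.add_version(1, [("entry", "check"), ("check", "use"), ("use", "exit")])
# Version 2: a patch inserts a guard between the check and the use.
g.add_version(2, [("entry", "check"), ("check", "guard"),
                  ("guard", "use"), ("use", "exit")])

added, removed = g.changed_between(1, 2)
print(sorted(added), sorted(removed), g.stored_edge_count())
```

In this toy, the two versions have seven edges between them but only five are stored, which is the storage saving the abstract alludes to, and the control flow change falls out as a set difference on the version annotations.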

    Flexeme: Untangling Commits Using Lexical Flows


    Improving Software Project Health Using Machine Learning

    In recent years, systems that would previously live on different platforms have been integrated under a single umbrella. The increased use of GitHub, which offers pull-requests, issue tracking and version history, and its integration with other solutions such as Gerrit or Travis, as well as the response from competitors, created development environments that favour agile methodologies by increasingly automating non-coding tasks: automated build systems, automated issue triaging, etc. In essence, source-code hosting platforms shifted to continuous integration/continuous delivery (CI/CD) as a service. This facilitated a shift in development paradigms: adherents of agile methodology can now adopt a CI/CD infrastructure more easily. This has also created large, publicly accessible sources of source code together with related project artefacts: GHTorrent and similar datasets now offer programmatic access to the whole of GitHub. Project health encompasses traceability, documentation, adherence to coding conventions, tasks that reduce maintenance costs and increase accountability, but may not directly impact features. Overfocus on health can slow velocity (new feature delivery), so the Agile Manifesto suggests developers should travel light, forgoing tasks focused on project health in favour of higher feature velocity. Obviously, injudiciously following this suggestion can undermine a project's chances for success. Simultaneously, this shift to CI/CD has allowed the proliferation of Natural Language or Natural Language and Formal Language textual artefacts that are programmatically accessible: GitHub and its competitors allow API access to their infrastructure to enable the creation of CI/CD bots. This suggests that approaches from Natural Language Processing and Machine Learning are now feasible and indeed desirable. This thesis aims to (semi-)automate tasks for this new paradigm and its attendant infrastructure by bringing to the foreground the relevant NLP and ML techniques.
Under this umbrella, I focus on three synergistic tasks from this domain: (1) improving the issue-pull-request traceability, which can aid existing systems to automatically curate the issue backlog as pull-requests are merged; (2) untangling commits in a version history, which can aid the aforementioned traceability task as well as improve the usability of determining a fault-introducing commit, or cherry-picking via tools such as git bisect; (3) mixed-text parsing, to allow better API mining and open new avenues for project-specific code-recommendation tools.

    Finding Differences in Privilege Protection and their Origin in Role-Based Access Control Implementations

    Web applications are commonplace, and have security needs. One of these is access control. Access control enforces a security policy that allows and restricts access to information and operations. Web applications often use Role-Based Access Control (RBAC) to restrict operations and protect security-sensitive information and resources. RBAC allows developers to assign users to various roles, and assign privileges to the roles. Web applications undergo maintenance and evolution. Their security may be affected by source code changes between releases. Because these changes may impact security in unexpected ways, developers need to revalidate their RBAC implementation to prevent regressions and vulnerabilities. This may be resource-intensive. This task is complicated by the fact that the code change and its security impact may be distant (e.g. in different functions or files). To address this issue, we propose static program analyses of definite privilege protection. We automatically generate privilege protection models from the source code using Pattern Traversal Flow Analysis (PTFA). Using differences between versions and PTFA models, we determine privilege-level security impacts of code changes using definite protection differences (DPDs) and apply a set-theoretic classification to them. We also compute explanatory counter-examples for DPDs in PTFA models. In addition, we shorten them using graph transformations in order to facilitate their understanding. We define protection-impacting changes (PICs), code changed during evolution that impacts privilege protection. We do so using graph reachability and differencing of two versions' PTFA models. We also identify a superset of source code changes that contain root causes of DPDs by reverting these changes.
We survey the distribution of DPDs and their classification over 147 release pairs of WordPress, spanning from 2.0 to 4.5.1. We found that code changes caused no DPDs in 82 (56%) release pairs. The remaining 65 (44%) release pairs are security-affected. For these release pairs, only 0.30% of code is affected by DPDs on average. We also found that the most common change categories are complete gains (approximately 41%), complete losses (approximately 18%) and substitution (approximately 20%).

    Change-Based Approaches for Static Taint Analyses

    In the past few years, many security problems have been discovered in all kinds of software. Ill-intentioned people exploited some of these vulnerabilities and successfully stole information about people and companies. The monetary cost of these vulnerabilities is real, and for this reason, many developers are trying to find these vulnerabilities before hackers do. One method to find vulnerabilities in an application before it is published is to use a static analysis tool. Taint analysis is one static analysis technique that is closely related to security. This approach simulates how data is propagated inside the application with the goal of finding locations that could either leak sensitive information or damage the integrity of the system. These analyses can take hours, depending on the tool used and the size of the codebase being analyzed. On top of taking considerable time, these analyses tend to be repeated over and over during development. The computation of taint is exhaustive and usually redone from scratch on every execution, even if the codebase has stayed nearly the same since the last analysis. This is why our work mainly focuses on how we can take advantage of the incremental nature of software development to accelerate the computation of taint. We propose new techniques to update the taint information from the changes in the source code between two versions of a given software. Our approaches are granular to the lines of code and succeed at greatly reducing the time required to find potential vulnerabilities in the projects that we analyzed.
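Any change-based analysis at line granularity first needs to know which lines differ between two versions. The sketch below shows only that preliminary step, using Python's standard `difflib`; it is not the thesis's taint-update technique. The function name `changed_lines` and the sample snippets (`source`, `sanitize`, `sink`) are hypothetical.

```python
import difflib


def changed_lines(old_src, new_src):
    """Return 1-based line numbers in new_src that were added or modified.

    Pure deletions contribute no line numbers, since the deleted lines no
    longer exist in the new version.
    """
    old = old_src.splitlines()
    new = new_src.splitlines()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    changed = set()
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # j1:j2 is the affected slice of the new version (0-based).
            changed.update(range(j1 + 1, j2 + 1))
    return changed


old_src = "a = source()\nb = a\nsink(b)"
new_src = "a = source()\nb = sanitize(a)\nsink(b)"
print(sorted(changed_lines(old_src, new_src)))  # prints [2]
```

An incremental analysis would then restrict recomputation to results that depend on the reported lines, rather than reanalyzing the whole codebase.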

    PASDA: A Partition-based Semantic Differencing Approach with Best Effort Classification of Undecided Cases

    Equivalence checking is used to verify whether two programs produce equivalent outputs when given equivalent inputs. Research in this field has mainly focused on improving equivalence checking accuracy and runtime performance. However, for program pairs that cannot be proven to be either equivalent or non-equivalent, existing approaches only report a classification result of "unknown", which provides no information regarding the programs' non-/equivalence. In this paper, we introduce PASDA, our partition-based semantic differencing approach with best effort classification of undecided cases. While PASDA aims to formally prove non-/equivalence of analyzed program pairs using a variant of differential symbolic execution, its main novelty lies in its handling of cases for which no formal non-/equivalence proof can be found. For such cases, PASDA provides a best effort equivalence classification based on a set of classification heuristics. We evaluated PASDA with an existing benchmark consisting of 141 non-/equivalent program pairs. PASDA correctly classified 61-74% of these cases at timeouts from 10 seconds to 3600 seconds. Thus, PASDA achieved equivalence checking accuracies that are 3-7% higher than the best results achieved by three existing tools. Furthermore, PASDA's best effort classifications were correct for 70-75% of equivalent and 55-85% of non-equivalent cases across the different timeouts.

    Shadow symbolic execution for testing software patches

    While developers are aware of the importance of comprehensively testing patches, the large effort involved in coming up with relevant test cases means that such testing rarely happens in practice. Furthermore, even when test cases are written to cover the patch, they often exercise the same behaviour in the old and the new version of the code. In this article, we present a symbolic execution-based technique that is designed to generate test inputs that cover the new program behaviours introduced by a patch. The technique works by executing both the old and the new version in the same symbolic execution instance, with the old version shadowing the new one. During this combined shadow execution, whenever a branch point is reached where the old and the new version diverge, we generate a test case exercising the divergence and comprehensively test the new behaviours of the new version. We evaluate our technique on the Coreutils patches from the CoREBench suite of regression bugs, and show that it is able to generate test inputs that exercise newly added behaviours and expose some of the regression bugs
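The notion of an input on which the old and new versions diverge can be made concrete with a small sketch. It swaps symbolic execution for brute-force enumeration of concrete inputs, so it illustrates only the goal of the technique and not the technique itself. The functions `old_version`, `new_version`, and `diverging_inputs` are hypothetical.

```python
def old_version(x):
    """Behaviour before the patch (hypothetical example)."""
    return "small" if x < 10 else "large"


def new_version(x):
    """Behaviour after the patch: the boundary comparison changed."""
    return "small" if x <= 10 else "large"


def diverging_inputs(old, new, candidates):
    """Return the candidate inputs on which the two versions disagree."""
    return [x for x in candidates if old(x) != new(x)]


print(diverging_inputs(old_version, new_version, range(20)))  # prints [10]
```

A test suite that never tries the boundary value would exercise identical behaviour in both versions, which is the gap the abstract describes.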

    Hybrid Differential Software Testing

    Differential software testing is important for software quality assurance as it aims to automatically generate test inputs that reveal behavioral differences in software. The concrete analysis procedure depends on the targeted result: differential testing can reveal divergences between two execution paths (1) of different program versions or (2) within the same program. The first analysis type executes different program versions with the same input, while the second type executes the same program with different inputs. Therefore, detecting regression bugs in software evolution, analyzing side-channels in programs, maximizing the execution cost of a program over multiple executions, and evaluating the robustness of neural networks are instances of differential software analysis with the goal of generating diverging executions of program paths. The key challenge of differential software testing is to simultaneously reason about multiple program paths, often across program variants, in an efficient way. Existing work in differential testing is often not (specifically) directed to reveal a different behavior or is limited to a subset of the search space. This PhD thesis proposes the concept of Hybrid Differential Software Testing (HyDiff) as a hybrid analysis technique to generate difference-revealing inputs. HyDiff consists of two components that operate in a parallel setup: (1) a search-based technique that inexpensively generates inputs and (2) a systematic exploration technique to also exercise deeper program behaviors. HyDiff's search-based component uses differential fuzzing directed by differential heuristics. HyDiff's systematic exploration component is based on differential dynamic symbolic execution, which can incorporate concrete inputs in its analysis. HyDiff is evaluated experimentally with applications specific to differential testing. The results show that HyDiff is effective in all considered categories and outperforms its components in isolation.

    Anales del XIII Congreso Argentino de Ciencias de la Computación (CACIC)

    Contents: Computer architectures; Embedded systems; Service-oriented architectures (SOA); Communication networks; Heterogeneous networks; Advanced networks; Wireless networks; Mobile networks; Active networks; Network and service administration and monitoring; Quality of Service (QoS, SLAs); Computer security and authentication, privacy; Infrastructure for digital signatures and digital certificates; Vulnerability analysis and detection; Operating systems; P2P systems; Middleware; Grid infrastructure; Integration services (Web Services or .Net). Red de Universidades con Carreras en Informática (RedUNCI)