
    Formalizing the SSA-based Compiler for Verified Advanced Program Transformations

    Compilers are not always correct due to the complexity of language semantics and transformation algorithms, the trade-offs between compilation speed and verifiability, etc. Bugs in compilers can undermine source-level verification efforts (such as type systems, static analysis, and formal proofs) and produce target programs whose meaning differs from that of the source programs. Researchers have used mechanized proof tools to implement verified compilers that are guaranteed to preserve program semantics and have proved to be more robust than ad-hoc, non-verified compilers. The goal of the dissertation is to take a step toward verifying an industrial-strength modern compiler--LLVM, which has a typed, SSA-based, general-purpose intermediate representation, therefore allowing more advanced program transformations than existing approaches. The dissertation formally defines the sequential semantics of the LLVM intermediate representation with its type system, SSA properties, memory model, and operational semantics. To design and reason about program transformations in the LLVM IR, we provide tools for interacting with the LLVM infrastructure and metatheory for SSA properties, memory safety, dynamic semantics, and control-flow graphs. Based on the tools and metatheory, the dissertation implements verified and extractable applications for LLVM that include an interpreter for the LLVM IR, a transformation for enforcing memory safety, translation validators for local optimizations, and a verified SSA-construction transformation. This dissertation shows that formal models of SSA-based compiler intermediate representations can be used to verify low-level program transformations, thereby enabling the construction of high-assurance compiler passes.
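    As a small, hedged illustration of the SSA well-formedness properties the abstract refers to (the dissertation's metatheory is mechanized in a proof assistant, not sketched in Python), the following checks two such properties over a toy three-address IR: every virtual register has exactly one definition, and every use is preceded by a definition. The IR encoding and register names are invented for illustration.

        from collections import Counter

        # Each instruction is (destination, opcode, operands); destination None means "no result".
        toy_block = [
            ("%1", "add", ("%a", "%b")),
            ("%2", "mul", ("%1", "%1")),
            (None, "store", ("%2", "%p")),
        ]

        def single_definition(instrs):
            # SSA property 1: no virtual register is defined more than once.
            defs = Counter(dst for dst, _, _ in instrs if dst is not None)
            return all(count == 1 for count in defs.values())

        def uses_defined(instrs, params=("%a", "%b", "%p")):
            # SSA property 2 (straight-line case): every use is preceded by its definition.
            defined = set(params)
            for dst, _, ops in instrs:
                if any(op.startswith("%") and op not in defined for op in ops):
                    return False
                if dst is not None:
                    defined.add(dst)
            return True

        assert single_definition(toy_block) and uses_defined(toy_block)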

    Automated Analysis of ARM Binaries using the Low-Level Virtual Machine Compiler Framework

    Binary program analysis is a critical capability for offensive and defensive operations in Cyberspace. However, many current techniques are ineffective or time-consuming and few tools can analyze code compiled for embedded processors such as those used in network interface cards, control systems and mobile phones. This research designs and implements a binary analysis system, called the Architecture-independent Binary Abstracting Code Analysis System (ABACAS), which reverses the normal program compilation process, lifting binary machine code to the Low-Level Virtual Machine (LLVM) compiler's intermediate representation, thereby enabling existing security-related analyses to be applied to binary programs. The prototype targets ARM binaries but can be extended to support other architectures. Several programs are translated from ARM binaries and analyzed with existing analysis tools. Programs lifted from ARM binaries are an average of 3.73 times larger than the same programs compiled from a high-level language (HLL). Analysis results are equivalent regardless of whether the HLL source or ARM binary version of the program is submitted to the system, confirming the hypothesis that LLVM is effective for binary analysis.
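    To make the lifting step concrete, here is a minimal, hypothetical Python sketch of the idea of translating decoded ARM instructions into LLVM-IR-like text so that IR-level analyses can run on binary code; the decoder output format and the two handled opcodes are invented for illustration and are not ABACAS's actual implementation.

        def lift(insn):
            # Map one decoded ARM instruction to an LLVM-IR-like statement.
            op = insn["op"]
            if op == "ADD":   # ADD rd, rn, rm  ->  %rd = add i32 %rn, %rm
                return f"%{insn['rd']} = add i32 %{insn['rn']}, %{insn['rm']}"
            if op == "LDR":   # LDR rd, [rn]    ->  %rd = load i32, i32* %rn
                return f"%{insn['rd']} = load i32, i32* %{insn['rn']}"
            raise NotImplementedError(op)

        decoded = [
            {"op": "ADD", "rd": "r0", "rn": "r1", "rm": "r2"},
            {"op": "LDR", "rd": "r3", "rn": "r0"},
        ]
        print("\n".join(lift(i) for i in decoded))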

    Synthesis and Verification of Digital Circuits using Functional Simulation and Boolean Satisfiability.

    The semiconductor industry has long relied on the steady trend of transistor scaling, that is, the shrinking of the dimensions of silicon transistor devices, as a way to improve the cost and performance of electronic devices. However, several design challenges have emerged as transistors have become smaller. For instance, wires are not scaling as fast as transistors, and delay associated with wires is becoming more significant. Moreover, in the design flow for integrated circuits, accurate modeling of wire-related delay is available only toward the end of the design process, when the physical placement of logic units is known. Consequently, one can only know whether timing performance objectives are satisfied, i.e., if timing closure is achieved, after several design optimizations. Unless timing closure is achieved, time-consuming design-flow iterations are required. Given the challenges arising from increasingly complex designs, failing to quickly achieve timing closure threatens the ability of designers to produce high-performance chips that can match continually growing consumer demands. In this dissertation, we introduce powerful constraint-guided synthesis optimizations that take into account upcoming timing closure challenges and eliminate expensive design iterations. In particular, we use logic simulation to approximate the behavior of increasingly complex designs, leveraging a recently proposed concept called bit signatures, which allows us to represent a large fraction of a complex circuit's behavior in a compact data structure. By manipulating these signatures, we can efficiently discover a greater set of valid logic transformations than was previously possible and, as a result, enhance timing optimization. Based on the abstractions enabled through signatures, we propose a comprehensive suite of novel techniques: (1) a fast computation of circuit don't-cares that increases restructuring opportunities, (2) a verification methodology to prove the correctness of speculative optimizations that efficiently utilizes the computational power of modern multi-core systems, and (3) a physical synthesis strategy using signatures that re-implements sections of a critical path while minimizing perturbations to the existing placement. Our results indicate that logic simulation is effective in approximating the behavior of complex designs and enables a broader family of optimizations than previous synthesis approaches.
    Ph.D., Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/61793/1/splaza_1.pd
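    As a hedged illustration of the bit-signature idea described above (not the dissertation's implementation), the sketch below simulates random input vectors through a tiny combinational circuit, packs each node's simulation outputs into an integer signature, and uses equal signatures to flag candidate equivalent nodes; in a real flow, a SAT solver would then prove or refute each candidate. The circuit and node names are invented.

        import random

        # name -> None for a primary input, or (function, operand names) for a gate.
        circuit = {
            "a": None, "b": None, "c": None,
            "n1": (lambda x, y: x & y, ("a", "b")),
            "n2": (lambda x, y: x & y, ("b", "a")),   # different structure, same function as n1
            "n3": (lambda x, y: x | y, ("n1", "c")),
        }

        def signatures(circuit, num_vectors=64, seed=0):
            rng = random.Random(seed)
            sigs = {name: 0 for name in circuit}
            for _ in range(num_vectors):
                values = {}
                for name, node in circuit.items():     # inputs are listed before their fanouts
                    if node is None:
                        values[name] = rng.randint(0, 1)
                    else:
                        fn, ops = node
                        values[name] = fn(*(values[o] for o in ops))
                for name in circuit:                   # append this vector's bit to each signature
                    sigs[name] = (sigs[name] << 1) | values[name]
            return sigs

        sigs = signatures(circuit)
        print(sigs["n1"] == sigs["n2"])                # True: n1 and n2 are candidate equivalences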

    Waddle - Always-canonical Intermediate Representation

    Program transformations that are able to rely on the presence of canonical properties of the program undergoing optimization can be written to be more robust and efficient than an equivalent but generalized transformation that also handles non-canonical programs. If a canonical property is required but was broken by an earlier transformation, it must be rebuilt (often from scratch). This additional work can be a dominating factor in compilation time when many transformations are applied over large programs. This dissertation introduces a methodology for constructing program transformations so that the program remains in an always-canonical form as it is mutated, making only local changes to restore broken properties.
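    A minimal sketch of the "repair locally" idea, under assumptions not taken from the dissertation: suppose the canonical property is "no critical edges" (an edge from a block with several successors to a block with several predecessors). Instead of re-running a whole-CFG canonicalization pass after every edit, the mutation itself splits the one edge it would break.

        # Toy CFG: block name -> list of successor names.
        cfg = {"entry": ["a", "b"], "a": ["join"], "b": ["join"], "join": []}

        def preds(cfg, node):
            return [n for n, succs in cfg.items() if node in succs]

        def split_if_critical(cfg, src, dst):
            # Restore the canonical property locally by inserting a new block on the edge.
            if len(cfg[src]) > 1 and len(preds(cfg, dst)) > 1:
                mid = f"{src}_{dst}_split"
                cfg[mid] = [dst]
                cfg[src] = [mid if s == dst else s for s in cfg[src]]

        def add_edge(cfg, src, dst):
            cfg[src].append(dst)
            split_if_critical(cfg, src, dst)   # the edit keeps the CFG canonical as it mutates

        add_edge(cfg, "entry", "join")         # would create a critical edge; it is split immediately
        print(cfg)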

    Resource-constrained project scheduling.

    Abstract: Resource-constrained project scheduling involves the scheduling of project activities subject to precedence and resource constraints in order to meet the objective(s) in the best possible way. The area covers a wide variety of problem types. The objective of this paper is to provide a survey of what we believe are important recent developments in the area. Our main focus is on the recent progress made in, and the encouraging computational experience gained with, optimal solution procedures for the basic resource-constrained project scheduling problem (RCPSP) and important extensions. The RCPSP involves scheduling a project to minimize its duration subject to zero-lag finish-start precedence constraints of the PERT/CPM type and constant availability constraints on the required set of renewable resources. We discuss recent striking advances in dealing with this problem using a new depth-first branch-and-bound procedure, elaborating on the effective and efficient branching scheme, bounding calculations and dominance rules, and discuss the potential of using truncated branch-and-bound. We derive a set of conclusions from the research on optimal solution procedures for the basic RCPSP and subsequently illustrate how effective and efficient branching rules and several of the strong dominance and bounding arguments can be extended to a rich and realistic variety of related problems. The preemptive resource-constrained project scheduling problem (PRCPSP) relaxes the nonpreemption condition of the RCPSP, thus allowing activities to be interrupted at integer points in time and resumed later without additional penalty cost. The generalized resource-constrained project scheduling problem (GRCPSP) extends the RCPSP to the case of precedence-diagramming types of precedence constraints (minimal finish-start, start-start, start-finish and finish-finish precedence relations), activity ready times, deadlines and variable resource availabilities. The resource-constrained project scheduling problem with generalized precedence relations (RCPSP-GPR) allows for start-start, finish-start and finish-finish constraints with minimal and maximal time lags. The MAX-NPV problem aims at scheduling project activities in order to maximize the net present value of the project in the absence of resource constraints. The resource-constrained project scheduling problem with discounted cash flows (RCPSP-DC) aims at the same non-regular objective in the presence of resource constraints. The resource availability cost problem (RACP) aims at determining the cheapest resource availability amounts for which a feasible solution exists that does not violate the project deadline. In the discrete time/cost trade-off problem (DTCTP) the duration of an activity is a discrete, non-increasing function of the amount of a single nonrenewable resource committed to it. In the discrete time/resource trade-off problem (DTRTP) the duration of an activity is a discrete, non-increasing function of the amount of a single renewable resource. Each activity must then be scheduled in one of its possible execution modes. In addition to time/resource trade-offs, the multi-mode project scheduling problem (MRCPSP) allows for resource/resource trade-offs and constraints on renewable, nonrenewable and doubly-constrained resources. We report on recent computational results and end with overall conclusions and suggestions for future research.
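    To make the basic RCPSP's constraints concrete, here is a small Python sketch of a serial schedule-generation heuristic over a single renewable resource; it only illustrates the precedence and resource-feasibility conditions that the survey's exact branch-and-bound procedures enforce, and the activity data are invented.

        durations = {"A": 3, "B": 2, "C": 4, "D": 2}
        demand    = {"A": 2, "B": 1, "C": 2, "D": 1}                 # units of the single renewable resource
        preds     = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
        capacity  = 3

        def serial_sgs(order=("A", "B", "C", "D")):
            start, usage = {}, {}                                    # usage[t] = resource units busy in period t
            for job in order:
                # Earliest start respecting zero-lag finish-start precedence constraints.
                t = max((start[p] + durations[p] for p in preds[job]), default=0)
                # Shift right until the resource constraint holds over the whole duration.
                while any(usage.get(t + k, 0) + demand[job] > capacity for k in range(durations[job])):
                    t += 1
                start[job] = t
                for k in range(durations[job]):
                    usage[t + k] = usage.get(t + k, 0) + demand[job]
            return start

        schedule = serial_sgs()
        print(schedule, "makespan =", max(schedule[j] + durations[j] for j in schedule))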

    Automated Security Analysis of Web Application Technologies

    The Web today is a complex universe of pages and applications teeming with interactive content that we use for commercial and social purposes. Accordingly, the security of Web applications has become a concern of utmost importance. Devising automated methods to help developers spot security flaws and thereby make the Web safer is a challenging but vital area of research. In this thesis, we leverage static analysis methods to automatically discover vulnerabilities in programs written in JavaScript or PHP. While JavaScript is the number one language fueling the client-side logic of virtually every Web application, PHP is the most widespread language on the server side. In the first part, we use a series of program transformations and information flow analysis to examine the JavaScript Helios voting client. Helios is a state-of-the-art voting system that has been exhaustively analyzed by the security community on a conceptual level and whose implementation is claimed to be highly secure. We expose two severe and so far undiscovered vulnerabilities. In the second part, we present a framework allowing developers to analyze PHP code for vulnerabilities that can be freely modeled. To do so, we build so-called code property graphs for PHP and import them into a graph database. Vulnerabilities can then be modeled as appropriate database queries. We show how to model common vulnerabilities and evaluate our framework in a large-scale study, spotting hundreds of vulnerabilities.
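    As a hedged sketch of the second part's core idea (modeling a vulnerability as a graph query), the example below builds a tiny data-flow graph and queries it for paths from attacker-controlled sources to a sensitive sink that bypass a sanitizer. The thesis imports code property graphs into a graph database and queries them there; networkx and the node names merely stand in for that setup.

        import networkx as nx

        g = nx.DiGraph()
        g.add_edge("$_GET['id']", "$id")          # attacker-controlled input flows into $id
        g.add_edge("$id", "mysql_query")          # ... and reaches a SQL sink
        g.add_edge("$_GET['name']", "escape")     # a second flow passes through a sanitizer
        g.add_edge("escape", "mysql_query")

        sources = ["$_GET['id']", "$_GET['name']"]
        sink, sanitizers = "mysql_query", {"escape"}

        for src in sources:
            for path in nx.all_simple_paths(g, src, sink):
                if not sanitizers.intersection(path):
                    print("potential SQL injection:", " -> ".join(path))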

    Doctor of Philosophy

    Compilers are indispensable tools to developers. We expect them to be correct. However, compiler correctness is very hard to reason about. This can be partly explained by the daunting complexity of compilers. In this dissertation, I will explain how we constructed a random program generator, Csmith, and used it to find hundreds of bugs in widely used open-source compilers such as the GNU Compiler Collection (GCC) and the LLVM Compiler Infrastructure (LLVM). The success of Csmith depends on its ability to be expressive and unambiguous at the same time. Csmith is composed of a code generator and a GTAV (Generation-Time Analysis and Validation) engine. They work interactively to produce expressive yet unambiguous random programs. The expressiveness of Csmith is attributed to the code generator, while the unambiguity is assured by GTAV. GTAV performs program analyses, such as points-to analysis and effect analysis, efficiently to avoid ambiguities caused by undefined or unspecified behaviors. During our 4.25 years of testing, Csmith found over 450 bugs in GCC and LLVM. We analyzed the bugs by putting them into different categories, studying the root causes, finding their locations in the compilers' source code, and evaluating their importance. We believe the analysis results are useful to future random testers as well as compiler writers and users.
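    As a hedged illustration of the checksum-based differential-testing workflow the abstract describes (not the project's actual test harness), the sketch below generates one Csmith program, compiles it with two compilers, and compares the checksums the program prints; the csmith binary, the include path for its runtime headers, and the compiler list are assumptions about the local setup.

        import os
        import subprocess
        import tempfile

        COMPILERS = ["gcc", "clang"]                    # compilers under test, assumed on PATH

        def differential_test(csmith="csmith", csmith_inc="/usr/include/csmith"):
            # Generate a random C program; Csmith writes it to stdout.
            src = subprocess.run([csmith], capture_output=True, text=True, check=True).stdout
            with tempfile.TemporaryDirectory() as d:
                c_file = os.path.join(d, "prog.c")
                with open(c_file, "w") as f:
                    f.write(src)
                outputs = {}
                for cc in COMPILERS:
                    exe = os.path.join(d, f"prog_{cc}")
                    subprocess.run([cc, "-O2", f"-I{csmith_inc}", c_file, "-o", exe], check=True)
                    # The generated program prints a checksum of its global state.
                    # May raise TimeoutExpired for (rare) non-terminating programs.
                    run = subprocess.run([exe], capture_output=True, text=True, timeout=10)
                    outputs[cc] = run.stdout
                # Disagreeing checksums indicate a miscompilation in at least one compiler.
                return len(set(outputs.values())) == 1, outputs

        print(differential_test())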

    Application of Clustering Techniques to the Group Decision-Making Context

    Nowadays, decisions made by executives and managers are primarily made in a group. Group decision-making is thus a process in which a group of people, called participants, work together to analyze a set of variables, considering and evaluating a set of alternatives in order to select one or more solutions. There are many problems associated with group decision-making, namely when the participants cannot meet for any reason, ranging from schedule incompatibility to being in different countries with different time zones. To support this process, Group Decision Support Systems (GDSS) evolved into what we today call web-based GDSS. In a GDSS, argumentation is ideal, since it makes it easier to use justifications and explanations in interactions between decision-makers so they can support their opinions. Aspect Based Sentiment Analysis (ABSA) is a subfield of Argument Mining closely related to Natural Language Processing. It aims to classify opinions at the aspect level and identify the elements of an opinion. Applying ABSA techniques to the group decision-making context results in the automatic identification of, for example, alternatives and criteria. This automatic identification is essential to reduce the time decision-makers need to get set up in a Group Decision Support System and to offer them various insights into the discussions in which they participate. One such insight is the set of arguments used by the decision-makers about an alternative. This dissertation therefore proposes a methodology that uses an unsupervised technique, clustering, to segment the participants of a discussion based on the arguments they used, so that knowledge can be produced from the information currently in the GDSS. The methodology can be hosted in a web service that follows a micro-service architecture and combines data preprocessing and intra-sentence segmentation with clustering to achieve the objectives of the dissertation. Word embedding is needed to transform the natural language text into vectors usable by the clustering techniques, and dimensionality reduction techniques were also tested to improve the results. Keeping the preprocessing steps fixed while varying the clustering techniques, word embedders, and dimensionality reduction techniques yielded the best approach: the KMeans++ clustering technique with SBERT as the word embedder and UMAP for dimensionality reduction, reducing the number of dimensions to 2. This approach achieved a Silhouette Score of 0.63 with 8 clusters on the baseball dataset, which produced good clusters based on manual review and word clouds, and a Silhouette Score of 0.59 with 16 clusters on the car-brand dataset, which we used to validate the approach.
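    As a hedged illustration of the pipeline the abstract describes (SBERT embeddings, UMAP reduction to two dimensions, KMeans++ clustering, silhouette scoring), here is a minimal Python sketch; the model name, parameter values, and example arguments are assumptions for illustration, not the dissertation's exact configuration.

        from sentence_transformers import SentenceTransformer   # SBERT embeddings
        from sklearn.cluster import KMeans                       # k-means with k-means++ seeding
        from sklearn.metrics import silhouette_score
        import umap                                              # umap-learn

        # Hypothetical argument texts extracted from a GDSS discussion.
        arguments = [
            "Alternative A is cheaper to maintain.",
            "Alternative A has lower maintenance costs.",
            "Alternative B is faster to deploy.",
            "Deployment of B takes less time.",
        ]

        # 1) Embed each argument with a (hypothetical) SBERT model.
        embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(arguments)

        # 2) Reduce to two dimensions with UMAP, as in the best-performing approach.
        reduced = umap.UMAP(n_components=2, n_neighbors=2, random_state=42).fit_transform(embeddings)

        # 3) Cluster with k-means++ seeding (k kept small for the toy data).
        labels = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=42).fit_predict(reduced)

        # 4) Evaluate the clustering with the silhouette score.
        print(labels, silhouette_score(reduced, labels))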