    Protecting Systems From Exploits Using Language-Theoretic Security

    Any computer program that processes input from a user or the network must validate that input. Input-handling vulnerabilities occur in programs when the software component responsible for filtering malicious input---the parser---does not perform validation adequately. Consequently, parsers are among the most targeted components, since they defend the rest of the program from malicious input. This thesis adopts the Language-Theoretic Security (LangSec) principle to understand what tools and research are needed to prevent exploits that target parsers. LangSec proposes specifying the syntactic structure of the input format as a formal grammar. We then build a recognizer for this formal grammar to validate any input before the rest of the program acts on it. To ensure that these recognizers faithfully represent the data format, programmers often rely on parser generators or parser combinator libraries to build the parsers. This thesis advances several sub-fields of LangSec by proposing new techniques to find bugs in implementations, novel categorizations of vulnerabilities, and new parsing algorithms and tools to handle practical data formats. To this end, the thesis comprises five parts that tackle various tenets of LangSec. First, I categorize input-handling vulnerabilities and exploits using two frameworks: the mismorphisms framework, which helps us reason about the root causes leading to various vulnerabilities, and a categorization framework built from LangSec anti-patterns, such as parser differentials and insufficient input validation. We also built a catalog of more than 30 popular vulnerabilities to demonstrate the categorization frameworks. Second, I built parsers for various Internet of Things and power grid network protocols and for the iccMAX file format using parser combinator libraries. The parsers I built for power grid protocols were deployed and tested on power grid substation networks as an intrusion detection tool; the parser I built for the iccMAX file format led to several corrections and modifications to the iccMAX specifications and reference implementations. Third, I present SPARTA, a novel tool I built that generates Rust code to type-check Portable Document Format (PDF) files. The type checker I helped build strictly enforces the constraints in the PDF specification to find deviations. Our checker has contributed to at least four significant clarifications and corrections to the PDF 2.0 specification and to various open-source PDF tools. In addition to the checker, we built a practical tool, PDFFixer, to dynamically patch type errors in PDF files. Fourth, I present ParseSmith, a tool for building verified parsers for real-world data formats. Most parsing tools available for data formats are insufficient to handle practical formats or have not been verified for correctness. I built a verified parsing tool in Dafny that draws on ideas from attribute grammars, data-dependent grammars, and parsing expression grammars to tackle constructs commonly seen in network formats, and I prove that our parsers run in linear time and always terminate for well-formed grammars. Finally, I provide the earliest systematic comparison of various data description languages (DDLs) and their parser generation tools. DDLs are used to describe and parse commonly used data formats, such as image formats. I conducted an expert-elicitation qualitative study to derive metrics for comparing the DDLs, and I systematically compare them using the sample data descriptions shipped with each DDL, checking for correctness and resilience.
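
    As a concrete picture of the LangSec discipline described above, here is a minimal Rust sketch using only the standard library: a recognizer for a toy digit grammar, built combinator-style, that must accept the entire input before the rest of the program acts on it. The grammar, combinator, and function names are hypothetical illustrations, not the thesis's actual tools:

        // A parser consumes a prefix of the input and returns the remainder.
        type Parser = fn(&[u8]) -> Option<&[u8]>;

        // Recognize a single ASCII digit.
        fn digit(input: &[u8]) -> Option<&[u8]> {
            match input.split_first() {
                Some((b, rest)) if b.is_ascii_digit() => Some(rest),
                _ => None,
            }
        }

        // Combinator: apply `p` one or more times.
        fn many1<'a>(p: Parser, input: &'a [u8]) -> Option<&'a [u8]> {
            let mut rest = p(input)?;
            while let Some(r) = p(rest) {
                rest = r;
            }
            Some(rest)
        }

        // LangSec discipline: input is valid only if the grammar derives
        // the *entire* message, not merely a prefix.
        fn recognize(input: &[u8]) -> bool {
            matches!(many1(digit, input), Some(rest) if rest.is_empty())
        }

        fn handle_message(input: &[u8]) {
            if recognize(input) {
                println!("processing {} validated bytes", input.len()); // safe to act on
            } else {
                eprintln!("rejected malformed input"); // nothing downstream sees it
            }
        }

        fn main() {
            handle_message(b"12345"); // accepted
            handle_message(b"12a45"); // rejected before any processing
        }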

    Recovering Grammar Relationships for the Java Language Specification

    Grammar convergence is a method that helps in discovering relationships between different grammars of the same language or of different language versions. The key element of the method is the operational, transformation-based representation of those relationships. Grammars given as input for convergence are transformed until they are structurally equal. The transformations are composed from primitive operators; properties of these operators and of the composed chains provide quantitative and qualitative insight into the relationships between the grammars at hand. We describe a refined method for grammar convergence, and we use it in a major study in which we recover the relationships between all the grammars that occur in the different versions of the Java Language Specification (JLS). The relationships are represented as grammar transformation chains that capture all accidental or intended differences between the JLS grammars. The method is mechanized and driven by nominal and structural differences between pairs of grammars that are subject to asymmetric, binary convergence steps. We present the underlying operator suite for grammar transformation in detail, and we illustrate the suite with many examples of transformations on the JLS grammars. We also describe the extraction effort that was needed to make the JLS grammars amenable to automated processing. We include substantial metadata about the convergence process for the JLS so that the effort becomes reproducible and transparent.
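
    The transformation-based idea can be pictured with a toy sketch in Rust: grammars as sets of productions and a single primitive operator that renames a nonterminal, applied until two grammars become structurally equal. The representation and the operator are deliberately minimal, invented stand-ins for the paper's much richer operator suite:

        use std::collections::BTreeSet;

        // A production: nonterminal -> sequence of symbols.
        type Production = (String, Vec<String>);
        type Grammar = BTreeSet<Production>;

        // Primitive operator: rename nonterminal `from` to `to` everywhere.
        fn rename_n(g: &Grammar, from: &str, to: &str) -> Grammar {
            g.iter()
                .map(|(lhs, rhs)| {
                    let lhs = if lhs == from { to.to_string() } else { lhs.clone() };
                    let rhs: Vec<String> = rhs
                        .iter()
                        .map(|s| if s == from { to.to_string() } else { s.clone() })
                        .collect();
                    (lhs, rhs)
                })
                .collect()
        }

        fn main() {
            // Two grammars for the same tiny language, differing only in naming.
            let g1: Grammar = [("Expr".into(), vec!["Term".into()])].into_iter().collect();
            let g2: Grammar = [("Expression".into(), vec!["Term".into()])].into_iter().collect();

            // A one-step transformation chain converges g2 onto g1.
            assert_eq!(g1, rename_n(&g2, "Expression", "Expr"));
            println!("grammars converged after one renaming step");
        }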

    SSWAP: A Simple Semantic Web Architecture and Protocol for semantic web services

    Background: SSWAP (Simple Semantic Web Architecture and Protocol; pronounced "swap") is an architecture, protocol, and platform for using reasoning to semantically integrate heterogeneous, disparate data and services on the web. SSWAP was developed as a hybrid semantic web services technology to overcome limitations found in both pure web service technologies and pure semantic web technologies.

    Results: There are currently over 2400 resources published in SSWAP. Approximately two dozen are custom-written services for QTL (Quantitative Trait Loci) and mapping data for legumes and grasses (grains); the remainder are wrappers to Nucleic Acids Research Database and Web Server entries. As an architecture, SSWAP establishes how clients (users of data, services, and ontologies), providers (suppliers of data, services, and ontologies), and discovery servers (semantic search engines) interact to allow for the description, querying, discovery, invocation, and response of semantic web services. As a protocol, SSWAP provides the vocabulary and semantics that allow clients, providers, and discovery servers to engage in semantic web services. The protocol is based on the W3C-sanctioned first-order description logic language OWL DL. As an open-source platform, a discovery server running at http://sswap.info (as in to "swap info") uses the description logic reasoner Pellet to integrate semantic resources. The platform hosts an interactive guide to the protocol at http://sswap.info/protocol.jsp, developer tools at http://sswap.info/developer.jsp, and a portal to third-party ontologies at http://sswapmeet.sswap.info (a "swap meet").

    Conclusion: SSWAP addresses the three basic requirements of a semantic web services architecture (i.e., a common syntax, shared semantics, and semantic discovery) while addressing three technology limitations common in distributed service systems: (i) the fatal mutability of traditional interfaces, (ii) the rigidity and fragility of static subsumption hierarchies, and (iii) the confounding of content, structure, and presentation. SSWAP is novel in establishing the concept of a canonical yet mutable OWL DL graph that allows data and service providers to describe their resources, discovery servers to offer semantically rich search engines, clients to discover and invoke those resources, and providers to respond with semantically tagged data. SSWAP allows a mix-and-match of terms from both new and legacy third-party ontologies in these graphs.
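
    The describe/mutate/respond cycle above can be pictured with a small client sketch. This is a hypothetical illustration, assuming the Rust reqwest crate (blocking feature); the provider URL and the "?queryTrait" placeholder are invented, and a real SSWAP exchange carries full OWL DL graphs rather than the string substitution shown here:

        // Sketch of one SSWAP round trip: GET the provider's canonical
        // description graph, fill in input data, POST the mutated graph back.
        fn main() -> Result<(), Box<dyn std::error::Error>> {
            let resource = "http://example.org/sswap/qtl-lookup"; // hypothetical provider

            // Step 1: dereference the resource to obtain its description graph.
            let description = reqwest::blocking::get(resource)?.text()?;

            // Step 2: the client mutates the graph, filling in its input data
            // (shown as naive string substitution for brevity).
            let request_graph = description.replace("?queryTrait", "\"plant height\"");

            // Step 3: post the mutated graph back; the provider responds with
            // the same graph, now carrying semantically tagged result data.
            let response = reqwest::blocking::Client::new()
                .post(resource)
                .header("Content-Type", "text/turtle")
                .body(request_graph)
                .send()?
                .text()?;
            println!("{response}");
            Ok(())
        }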

    Proceedings of Monterey Workshop 2001 Engineering Automation for Software Intensive System Integration

    The 2001 Monterey Workshop on Engineering Automation for Software Intensive System Integration was sponsored by the Office of Naval Research, the Air Force Office of Scientific Research, the Army Research Office, and the Defense Advanced Research Projects Agency. It is our pleasure to thank the workshop advisors and sponsors for their vision of a principled engineering solution for software and for their tireless, years-long effort in supporting the series of workshops that brings everyone together. This workshop, held at the Monterey Beach Hotel in Monterey, California, during June 18-22, 2001, is the 8th in a series of international workshops. The general theme of the series has been to present and discuss research that aims at increasing the practical impact of formal methods for software and systems engineering. The particular focus of this workshop was "Engineering Automation for Software Intensive System Integration". Previous workshops have focused on issues including "Real-time & Concurrent Systems", "Software Merging and Slicing", "Software Evolution", "Software Architecture", "Requirements Targeting Software", and "Modeling Software System Structures in a fastly moving scenario". Approved for public release; distribution unlimited.

    An interoperability framework for security policy languages

    A thesis submitted to the University of Bedfordshire in partial fulfilment of the requirements for the degree of Doctor of Philosophy.

    Security policies are widely used across the IT industry to secure environments. Firewalls, routers, enterprise applications, and even operating systems like Windows and Unix all use security policies to some extent to secure certain components. To automate the enforcement of security policies, security policy languages have been introduced. Security policy languages, like many other classes of computer software, have been revolutionised during the last decade: a number of them have been introduced in industry to tackle specific business requirements, and each has itself evolved and been enhanced over the last few years. Even so, a brief survey of security policy languages shows that the industry suffers from the lack of an interoperability framework for them. Such a framework would facilitate the management of security policies at an abstract level. To achieve that goal, the framework utilises an abstract security policy language that is independent of existing security policy languages yet capable of expressing policies written in those languages. Using an interoperability framework for security policy languages as described above brings major benefits in two categories, short-term and long-term. In the short term, industry, and in particular multi-dimensional organisations that use multiple domains for different purposes, would lower their security-related costs by centrally managing security policies that are stretched across their environment and often managed locally. In the long term, usage of an abstract security policy language that is independent of any existing security policy language gradually paves the way for standardising security policy languages, a goal that seems unreachable at this moment. Taking the above into account, the aim of this research is to introduce and develop a novel interoperability framework for security policy languages. Such a framework would allow multi-dimensional organisations to use an abstract policy language to orchestrate all security policies from a single point, from which they could then be propagated across the environment. In addition, it would let security administrators learn and use only one common abstract language to describe and model their environment(s).
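
    As a concrete picture of the abstract-language idea, here is a minimal sketch in Rust with invented names: a single language-independent rule rendered into two concrete policy syntaxes (an iptables-like rule and an XACML-like element). The abstract syntax and translators are hypothetical, not the thesis's actual language:

        // Abstract, language-independent policy rule (hypothetical syntax).
        #[derive(Debug, Clone)]
        enum Action { Allow, Deny }

        #[derive(Debug, Clone)]
        struct AbstractRule {
            subject: String,  // who or what the rule applies to
            resource: String, // what is being accessed
            action: Action,
        }

        // One translator per target policy language.
        fn to_firewall(rule: &AbstractRule) -> String {
            let verb = match rule.action { Action::Allow => "ACCEPT", Action::Deny => "DROP" };
            format!("-A INPUT -s {} -d {} -j {}", rule.subject, rule.resource, verb)
        }

        fn to_xacml_like(rule: &AbstractRule) -> String {
            let effect = match rule.action { Action::Allow => "Permit", Action::Deny => "Deny" };
            format!("<Rule Effect=\"{}\"><Subject>{}</Subject><Resource>{}</Resource></Rule>",
                    effect, rule.subject, rule.resource)
        }

        fn main() {
            // One abstract policy, propagated to two concrete domains.
            let rule = AbstractRule {
                subject: "10.0.0.5".into(),
                resource: "10.0.0.9".into(),
                action: Action::Deny,
            };
            println!("{}", to_firewall(&rule));
            println!("{}", to_xacml_like(&rule));
        }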

    Proceedings

    Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories. Editors: Markus Dickinson, Kaili Müürisep and Marco Passarotti. NEALT Proceedings Series, Vol. 9 (2010), 268 pages. © 2010 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/15891

    Reverse Engineering Heterogeneous Applications

    Nowadays, a large majority of software systems are built using various technologies that in turn rely on different languages (e.g., Java, XML, SQL). We call such systems heterogeneous applications (HAs); by contrast, we call software systems written in a single language homogeneous applications. In HAs, the information regarding the structure and the behaviour of the system is spread across various components and languages, and the interactions between different application elements can be hidden. In this context, applying existing reverse engineering and quality assurance techniques developed for homogeneous applications is not enough: these techniques were created to measure quality or provide information about one aspect of the system, and they cannot grasp the complexity of HAs. In this dissertation we present our approach to supporting the analysis and evolution of HAs, based on (1) a unified first-class description of HAs and (2) a meta-model that reifies the concepts of horizontal and vertical dependencies between application elements at different levels of abstraction. We implemented our approach in two tools, MooseEE and Carrack. The former is an extension of the Moose platform for software and data analysis and contains our unified meta-model for HAs; the latter is an engine that infers derived dependencies to support the analysis of associations among the heterogeneous elements composing an HA. We validate our approach and tools through case studies on industrial and open-source Java Enterprise Applications (JEAs), which demonstrate how we can handle the complexity of such applications and solve problems deriving from their heterogeneous nature.
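
    To make the dependency meta-model concrete, here is a small hypothetical sketch in Rust (names and structure invented, not MooseEE's or Carrack's actual APIs): elements from different technologies are linked by explicit cross-language dependencies, and indirect dependencies are derived by transitivity, in the spirit of Carrack:

        use std::collections::HashMap;

        #[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
        struct ElementId(u32);

        struct Element {
            name: &'static str,
            technology: &'static str, // e.g. "Java", "SQL", "XML"
        }

        fn main() {
            // Elements of one heterogeneous application, across technologies.
            let elements: HashMap<ElementId, Element> = HashMap::from([
                (ElementId(0), Element { name: "OrderService.java", technology: "Java" }),
                (ElementId(1), Element { name: "persistence.xml", technology: "XML" }),
                (ElementId(2), Element { name: "ORDERS (table)", technology: "SQL" }),
            ]);

            // Direct dependencies that cross language boundaries.
            let deps = [(ElementId(0), ElementId(1)), (ElementId(1), ElementId(2))];

            // Derive indirect dependencies by transitivity: the Java class
            // depends on the SQL table even though no single artifact says so.
            for (a, b) in &deps {
                for (c, d) in &deps {
                    if b == c {
                        println!("derived: {} ({}) -> {} ({})",
                                 elements[a].name, elements[a].technology,
                                 elements[d].name, elements[d].technology);
                    }
                }
            }
        }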

    SEON: a pyramid of ontologies for software evolution and its applications

    The Semantic Web provides a standardized, well-established framework to define and work with ontologies, and it is especially apt for machine processing. However, researchers in the field of software evolution have not yet taken full advantage of it. In this paper, we address the potential of representing software evolution knowledge with ontologies and Semantic Web technology, such as Linked Data and automated reasoning. We present Seon, a pyramid of ontologies for software evolution, which describes stakeholders, their activities, the artifacts they create, and the relations among all of them. We show the use of evolution-specific ontologies for establishing a shared taxonomy of software analysis services, for defining extensible meta-models, for explicitly describing relationships among artifacts, and for linking data such as code structures, issues (change requests), bugs, and basically any changes made to a system over time. For validation, we discuss three different approaches, backed by Seon, that enable semantically enriched software evolution analysis. These techniques have been fully implemented as tools and cover software analysis with web services, a natural language query interface for developers, and large-scale software visualization.
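
    Since Seon exposes evolution knowledge as Linked Data, one natural access path is a SPARQL query. The sketch below embeds an illustrative query in Rust; the namespace URL and property names are assumptions, not Seon's published vocabulary:

        fn main() {
            // Illustrative SPARQL over Seon-style linked evolution data.
            let query = r#"
        PREFIX seon: <http://example.org/seon#>    # hypothetical namespace
        SELECT ?developer ?bug WHERE {
            ?change seon:madeBy  ?developer .      # who committed the change
            ?change seon:fixes   ?bug .            # the linked bug report
            ?change seon:touches <http://example.org/repo/Parser.java> .
        }
        "#;
            // A real client would send this to a SPARQL endpoint exposing the
            // ontology-described repository data; here we only print it.
            println!("{query}");
        }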