10 research outputs found
Security Applications of Formal Language Theory
We present an approach to improving the security of complex, composed systems based on formal language theory, and show how this approach leads to advances in input validation, security modeling, attack surface reduction, and ultimately, software design and programming methodology. We cite examples based on real-world security flaws in common protocols representing different classes of protocol complexity. We also introduce a formalization of an exploit development technique, the parse tree differential attack, made possible by our conception of the role of formal grammars in security. These insights make possible future advances in software auditing techniques applicable to static and dynamic binary analysis, fuzzing, and general reverse-engineering and exploit development.
Our work provides a foundation for verifying critical implementation components with considerably less burden to developers than is offered by the current state of the art. It additionally offers a rich basis for further exploration in the areas of offensive analysis and, conversely, automated defense tools and techniques.
This report is divided into two parts. In Part I we address the formalisms and their applications; in Part II we discuss the general implications and recommendations for protocol and software design that follow from our formal analysis.
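A minimal sketch of the idea behind a parse tree differential, under stated assumptions: the one-byte length-prefixed format and both parser functions below are hypothetical and are not taken from the report. The sketch only illustrates how two implementations of the "same" format can disagree about a single input.

```python
# Hypothetical toy format: one length byte followed by a payload.

def parse_trusting_length(data: bytes) -> bytes:
    """Parser A: returns exactly as many payload bytes as the length byte claims."""
    length = data[0]
    return data[1:1 + length]


def parse_ignoring_length(data: bytes) -> bytes:
    """Parser B: skips the length byte and treats everything after it as payload."""
    return data[1:]


def differential(data: bytes) -> bool:
    """True when the two parsers disagree about what the message contains."""
    return parse_trusting_length(data) != parse_ignoring_length(data)


well_formed = bytes([2]) + b"ok"    # length byte matches the payload
crafted = bytes([2]) + b"ok;rm"     # length byte understates the payload
```

Here `differential(well_formed)` is false and `differential(crafted)` is true: one parser sees `ok`, the other sees `ok;rm`. An input on which two components disagree in this way is the kind of discrepancy the abstract's attack exploits.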
A Grammatical Inference Approach to Language-Based Anomaly Detection in XML
False-positives are a problem in anomaly-based intrusion detection systems.
To counter this issue, we discuss anomaly detection for the eXtensible Markup
Language (XML) in a language-theoretic view. We argue that many XML-based
attacks target the syntactic level, i.e. the tree structure or element content,
and syntax validation of XML documents reduces the attack surface. XML offers
so-called schemas for validation, but in the real world, schemas are often
unavailable, ignored or too general. In this work-in-progress paper we describe
a grammatical inference approach to learn an automaton from example XML
documents for detecting documents with anomalous syntax.
We discuss properties and expressiveness of XML to understand limits of
learnability. Our contributions are an XML Schema compatible lexical datatype
system to abstract content in XML and an algorithm to learn visibly pushdown
automata (VPA) directly from a set of examples. The proposed algorithm does not
require the tree representation of XML, so it can process large documents or
streams. The resulting deterministic VPA then allows stream validation of
documents to recognize deviations in the underlying tree structure or
datatypes.
Comment: Paper accepted at First Int. Workshop on Emerging Cyberthreats and
Countermeasures ECTCM 201
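A much-simplified sketch of stream validation against structure learned from examples. This is not the paper's VPA learning algorithm or its datatype system: it only records which (parent, child) element pairs occur in the example documents and flags any pair it has not seen. The names `learn_transitions`, `is_anomalous`, and `ROOT` are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET

ROOT = "#root"  # hypothetical marker for "no enclosing element"


def learn_transitions(examples):
    """Collect every (parent tag, child tag) pair seen in the example documents."""
    allowed = set()
    for doc in examples:
        stack = [ROOT]
        for event, elem in ET.iterparse(io.BytesIO(doc), events=("start", "end")):
            if event == "start":
                allowed.add((stack[-1], elem.tag))
                stack.append(elem.tag)
            else:
                stack.pop()
                elem.clear()  # drop already-processed content to limit memory use
    return allowed


def is_anomalous(doc, allowed):
    """Flag a document that is malformed or contains an unseen (parent, child) pair."""
    stack = [ROOT]
    try:
        for event, elem in ET.iterparse(io.BytesIO(doc), events=("start", "end")):
            if event == "start":
                if (stack[-1], elem.tag) not in allowed:
                    return True
                stack.append(elem.tag)
            else:
                stack.pop()
                elem.clear()
    except ET.ParseError:
        return True
    return False


allowed = learn_transitions([b"<order><item/><item/></order>"])
```

The explicit stack mirrors the pushdown behaviour mentioned in the abstract: a start tag pushes, an end tag pops, and the decision at each step depends only on the current event and the top of the stack.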
Speaking the Local Dialect: Exploiting differences between IEEE 802.15.4 Receivers with Commodity Radios for fingerprinting, targeted attacks, and WIDS evasion
Producing IEEE 802.15.4 PHY-frames reliably accepted by some digital radio receivers, but rejected by others---depending on the receiver chip's make and model---has strong implications for wireless security. Attackers could target specific receivers by crafting shaped charges, attack frames that appear valid to the intended target and are ignored by all other recipients. By transmitting in the unique, slightly non-compliant dialect of the intended receivers, attackers would be able to create entire communication streams invisible to others, including wireless intrusion detection and prevention systems (WIDS/WIPS).
These scenarios are no longer theoretic. We present methods of producing such IEEE 802.15.4 frames with commodity digital radio chips widely used in building inexpensive 802.15.4-conformant devices. Typically, PHY-layer fingerprinting requires software-defined radios that cost orders of magnitude more than the chips they fingerprint; however, our methods do not require a software-defined radio and use the same inexpensive chips.
Knowledge of such differences, and the ability to fingerprint them, is crucial for defenders. We investigate new methods of fingerprinting IEEE 802.15.4 devices by exploring techniques to differentiate between multiple 802.15.4-conformant radio-hardware manufacturers and firmware distributions. Further, we point out the implications of these results for WIDS, both with respect to WIDS evasion techniques and countering such evasion.
Automatic Generation of Input Grammars Using Symbolic Execution
Invalid input often leads to unexpected behavior in a program and is behind a plethora of known and unknown vulnerabilities. To prevent improper input from being processed, the input needs to be validated before the rest of the program executes. Formal language theory facilitates the definition and recognition of proper inputs. We focus on the problem of defining valid input after the program has already been written. We construct a parser that infers the structure of inputs that avoid vulnerabilities, while existing work focuses on inferring the structure of input the program anticipates. We present a tool that constructs an input language, given the program as input, using symbolic execution on symbolic arguments. This differs from existing work, which tracks the execution of concrete inputs to infer a grammar. We test our tool on programs with known vulnerabilities, including programs in the GNU Coreutils library, and we demonstrate how the parser catches known invalid inputs. We conclude that the synthesis of the complete parser cannot be entirely automated due to limitations of symbolic execution tools and issues of computability. A more comprehensive parser must additionally be informed by examples and counterexamples of the input language.
Parsing Protocol Standards to Parse Standard Protocols
Internet protocol standards have been slow to adopt formal protocol description languages and methodologies, and are still largely written as English prose. This makes it hard to check them for correctness, or to automatically derive implementations from standards. Reasons for this are both technical and social. Some methodologies effectively describe complex communication patterns, but cannot model protocol data. Others are unnecessarily tied to particular description formats, or use unfamiliar concepts and terminology, and don't address usability by standards developers.
We assess the viability of existing approaches to modelling and parsing protocol data, and identify missing features needed to represent emerging protocols. We present a typed protocol representation that can describe: (i) the format of protocol data, including data-dependent formats; (ii) contextual information needed to maintain parser state, where correct parsing may depend on out-of-band information or prior packets; and (iii) transformations and helper functions needed for multi-stage parsing. We discuss social barriers to adoption, and describe a set of principles to encourage use of formal languages within the Internet standards process. We show how to integrate our approach with the existing standards process, using QUIC as an example.
Bridging the Gap Between Intent and Outcome: Knowledge, Tools & Principles for Security-Minded Decision-Making
Well-intentioned decisions---even ones intended to improve aggregate security---may inadvertently jeopardize security objectives. Adopting a stringent password composition policy ostensibly yields high-entropy passwords; however, such policies often drive users to reuse or write down passwords. Replacing URLs in emails with safe URLs that navigate through a gatekeeper service that vets them before granting user access may reduce user exposure to malware; however, it may backfire by reducing the user's ability to parse the URL or by giving the user a false sense of security if user expectations misalign with the security checks delivered by the vetting process. A short timeout threshold may ensure the user is promptly logged out when the system detects they are away; however, if an infuriated user copes by inserting a USB stick in their computer to emulate mouse movements, then not only will the detection mechanism fail but the insertion of the USB stick may present a new attack surface. These examples highlight the disconnect between decision-maker intentions and decision outcomes. Our focus is on bridging this gap. This thesis explores six projects bound together by the core objective of empowering people to make decisions that achieve their security and privacy objectives. First, we use grounded theory to examine Amazon reviews of password logbooks and to obtain valuable insights into users' password management beliefs, motivations, and behaviors. Second, we present a discrete-event simulation we built to assess the efficacy of password policies. Third, we explore the idea of supplementing language-theoretic security with human-computability boundaries. Fourth, we conduct an eye-tracking study to understand users' visual processes while parsing and classifying URLs. Fifth, we discuss preliminary findings from a study conducted on Amazon Mechanical Turk to examine why users fall for unsafe URLs. And sixth, we develop a logic-based representation of mismorphisms, which allows us to express the root causes of security problems. Each project demonstrates a key technique that can help in bridging the gap between intent and outcome.
Enabling Graceful Software Upgrades in Distributed Systems
Today we depend on software systems, and the devices that run them, for almost every task in day-to-day life. Software upgrades are released whenever necessary to accommodate users' ever-changing needs. The devices that run these systems may have different configurations and may run different versions of the software. It is not possible for everybody to keep running the same version of the software, nor can the billions of people using a software system all update at the same time or even on the same day. It has therefore become necessary to design systems that maintain uninterrupted communication among devices irrespective of software versions and device configurations. One of the key characteristics needed to achieve uninterrupted communication among software systems is interoperability. In this thesis, we study techniques to achieve graceful software upgrades in distributed systems by implementing interoperability techniques. We also observe that multiple software systems, such as Bitcoin, have failed to facilitate communication between versions.
This thesis identifies the necessary elements to support graceful upgrade, i.e., the principles that are recommended for an upgrade to avoid interruption of users' work or data. These elements were discovered by closely studying multiple widely popular distributed technologies that successfully achieved graceful upgrades. This thesis also analyzes the Bitcoin protocol and identifies the factors in its design that undermine the possibility of graceful upgrades in this cryptocurrency. A coding experiment was conducted to illustrate the efficacy of these principles and demonstrate graceful communications across three different versions of the same software.
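One common interoperability technique can be sketched as follows. It is an illustration only, not the thesis's coding experiment: the JSON message format, the field names, and the `decode` helper are all hypothetical. The receiver fills in defaults for fields an older sender omits and ignores fields a newer sender adds, so peers running different versions can still exchange messages.

```python
import json

# Hypothetical fields understood by this (middle) version, with their defaults.
KNOWN_FIELDS = {"version": 1, "sender": "", "body": ""}


def decode(raw: str) -> dict:
    """Decode a message from any version: default missing fields, drop unknown ones."""
    incoming = json.loads(raw)
    message = dict(KNOWN_FIELDS)          # start from the defaults
    for key in KNOWN_FIELDS:
        if key in incoming:
            message[key] = incoming[key]  # keep only fields this version knows
    return message


# An older sender omits "body"; a newer sender adds a "priority" field.
from_older = json.dumps({"version": 1, "sender": "a"})
from_newer = json.dumps({"version": 3, "sender": "b", "body": "hi", "priority": 5})

decoded_older = decode(from_older)
decoded_newer = decode(from_newer)
```

Neither message causes a failure: the older one is completed with an empty `body`, and the newer one is read without its `priority` field.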