12 research outputs found

    Grammar Generation and Optimization from Multiple Inputs

    Get PDF
    Human being uses multiple modes like speech, text, facial expression, hand gesture, showing picture etc. for communication in between them. The use of this ways for communication makes human communication more simple and fast. In previous years several techniques are used to bring the human computer interaction more closely. It costs more for development and maintenance of Multimodal grammar in integrating and understanding input in multimodal interfaces i.e. using multiple input ways. This leads to improve and investigate more robust algorithm. The proposed system generates the grammar from multiple inputs called as multimodal grammar and evaluates grammar description length. Furthermore, to optimize the multimodal grammar proposed system uses learning operators which improves grammar description DOI: 10.17762/ijritcc2321-8169.15016

    Search-Based Evolution of XML Schemas

    Get PDF
    The use of schemas makes an XML-based application more reliable, since they contribute to avoid failures by defining the specific format for the data that the application manipulates. In practice, when an application evolves, new requirements for the data may be established, raising the need of schema evolution. In some cases the generation of a schema is necessary, if such schema does not exist. To reduce maintenance and reengineering costs, automatic evolution of schemas is very desirable. However, there are no algorithms to satisfactorily solve the problem. To help in this task, this paper introduces a search-based approach that explores the correspondence between schemas and context-free grammars. The approach is supported by a tool, named EXS. Our tool implements algorithms of grammatical inference based on LL(1) Parsing. If a grammar (that corresponds to a schema) is given and a new word (XML document) is provided, the EXS system infers the new grammar that: i) continues to generate the same words as before and ii) generates the new word, by modifying the original grammar. If no initial grammar is available, EXS is also capable of generating a grammar from scratch from a set of samples

    Inductive Synthesis of Cover-Grammars with the Help of Ant Colony Optimization

    Get PDF
    A cover-grammar of a finite language is a context-free grammar that accepts all words in the language and possibly other words that are longer than any word in the language. In this paper, we describe an efficient algorithm aided by Ant Colony System that, for a given finite language, synthesizes (constructs) a small cover-grammar of the language. We also check its ability to solve a grammatical inference task through the series of experiments

    SEQUIN: a grammar inference framework for analyzing malicious system behavior

    Get PDF
    Open access articleTargeted attacks on IT systems are a rising threat to the confidentiality of sensitive data and the availability of critical systems. The emergence of Advanced Persistent Threats (APTs) made it paramount to fully understand the particulars of such attacks in order to improve or devise effective defense mechanisms. Grammar inference paired with visual analytics (VA) techniques offers a powerful foundation for the automated extraction of behavioral patterns from sequential event traces. To facilitate the interpretation and analysis of APTs, we present SEQUIN, a grammar inference system based on the Sequitur compression algorithm that constructs a context-free grammar (CFG) from string-based input data. In addition to recursive rule extraction, we expanded the procedure through automated assessment routines capable of dealing with multiple input sources and types. This automated assessment enables the accurate identification of interesting frequent or anomalous patterns in sequential corpora of arbitrary quantity and origin. On the formal side, we extended the CFG with attributes that help describe the extracted (malicious) actions. Discovery-focused pattern visualization of the output is provided by our dedicated KAMAS VA prototype

    Protecting Systems From Exploits Using Language-Theoretic Security

    Get PDF
    Any computer program processing input from the user or network must validate the input. Input-handling vulnerabilities occur in programs when the software component responsible for filtering malicious input---the parser---does not perform validation adequately. Consequently, parsers are among the most targeted components since they defend the rest of the program from malicious input. This thesis adopts the Language-Theoretic Security (LangSec) principle to understand what tools and research are needed to prevent exploits that target parsers. LangSec proposes specifying the syntactic structure of the input format as a formal grammar. We then build a recognizer for this formal grammar to validate any input before the rest of the program acts on it. To ensure that these recognizers represent the data format, programmers often rely on parser generators or parser combinators tools to build the parsers. This thesis propels several sub-fields in LangSec by proposing new techniques to find bugs in implementations, novel categorizations of vulnerabilities, and new parsing algorithms and tools to handle practical data formats. To this end, this thesis comprises five parts that tackle various tenets of LangSec. First, I categorize various input-handling vulnerabilities and exploits using two frameworks. First, I use the mismorphisms framework to reason about vulnerabilities. This framework helps us reason about the root causes leading to various vulnerabilities. Next, we built a categorization framework using various LangSec anti-patterns, such as parser differentials and insufficient input validation. Finally, we built a catalog of more than 30 popular vulnerabilities to demonstrate the categorization frameworks. Second, I built parsers for various Internet of Things and power grid network protocols and the iccMAX file format using parser combinator libraries. The parsers I built for power grid protocols were deployed and tested on power grid substation networks as an intrusion detection tool. The parser I built for the iccMAX file format led to several corrections and modifications to the iccMAX specifications and reference implementations. Third, I present SPARTA, a novel tool I built that generates Rust code that type checks Portable Data Format (PDF) files. The type checker I helped build strictly enforces the constraints in the PDF specification to find deviations. Our checker has contributed to at least four significant clarifications and corrections to the PDF 2.0 specification and various open-source PDF tools. In addition to our checker, we also built a practical tool, PDFFixer, to dynamically patch type errors in PDF files. Fourth, I present ParseSmith, a tool to build verified parsers for real-world data formats. Most parsing tools available for data formats are insufficient to handle practical formats or have not been verified for their correctness. I built a verified parsing tool in Dafny that builds on ideas from attribute grammars, data-dependent grammars, and parsing expression grammars to tackle various constructs commonly seen in network formats. I prove that our parsers run in linear time and always terminate for well-formed grammars. Finally, I provide the earliest systematic comparison of various data description languages (DDLs) and their parser generation tools. DDLs are used to describe and parse commonly used data formats, such as image formats. Next, I conducted an expert elicitation qualitative study to derive various metrics that I use to compare the DDLs. I also systematically compare these DDLs based on sample data descriptions available with the DDLs---checking for correctness and resilience

    Deep Understanding of Technical Documents : Automated Generation of Pseudocode from Digital Diagrams & Analysis/Synthesis of Mathematical Formulas

    Get PDF
    The technical document is an entity that consists of several essential and interconnected parts, often referred to as modalities. Despite the extensive attention that certain parts have already received, per say the textual information, there are several aspects that severely under researched. Two such modalities are the utility of diagram images and the deep automated understanding of mathematical formulas. Inspired by existing holistic approaches to the deep understanding of technical documents, we develop a novel formal scheme for the modelling of digital diagram images. This extends to a generative framework that allows for the creation of artificial images and their annotation. We contribute on the field with the creation of a novel synthetic dataset and its generation mechanism. We propose the conversion of the pseudocode generation problem to an image captioning task and provide a family of techniques based on adaptive image partitioning. We address the mathematical formulas’ semantic understanding by conducting an evaluating survey on the field, published in May 2021. We then propose a formal synthesis framework that utilized formula graphs as metadata, reaching for novel valuable formulas. The synthesis framework is validated by a deep geometric learning mechanism, that outsources formula data to simulate the missing a priori knowledge. We close with the proof of concept, the description of the overall pipeline and our future aims

    Advanced Threat Intelligence: Interpretation of Anomalous Behavior in Ubiquitous Kernel Processes

    Get PDF
    Targeted attacks on digital infrastructures are a rising threat against the confidentiality, integrity, and availability of both IT systems and sensitive data. With the emergence of advanced persistent threats (APTs), identifying and understanding such attacks has become an increasingly difficult task. Current signature-based systems are heavily reliant on fixed patterns that struggle with unknown or evasive applications, while behavior-based solutions usually leave most of the interpretative work to a human analyst. This thesis presents a multi-stage system able to detect and classify anomalous behavior within a user session by observing and analyzing ubiquitous kernel processes. Application candidates suitable for monitoring are initially selected through an adapted sentiment mining process using a score based on the log likelihood ratio (LLR). For transparent anomaly detection within a corpus of associated events, the author utilizes star structures, a bipartite representation designed to approximate the edit distance between graphs. Templates describing nominal behavior are generated automatically and are used for the computation of both an anomaly score and a report containing all deviating events. The extracted anomalies are classified using the Random Forest (RF) and Support Vector Machine (SVM) algorithms. Ultimately, the newly labeled patterns are mapped to a dedicated APT attacker–defender model that considers objectives, actions, actors, as well as assets, thereby bridging the gap between attack indicators and detailed threat semantics. This enables both risk assessment and decision support for mitigating targeted attacks. Results show that the prototype system is capable of identifying 99.8% of all star structure anomalies as benign or malicious. In multi-class scenarios that seek to associate each anomaly with a distinct attack pattern belonging to a particular APT stage we achieve a solid accuracy of 95.7%. Furthermore, we demonstrate that 88.3% of observed attacks could be identified by analyzing and classifying a single ubiquitous Windows process for a mere 10 seconds, thereby eliminating the necessity to monitor each and every (unknown) application running on a system. With its semantic take on threat detection and classification, the proposed system offers a formal as well as technical solution to an information security challenge of great significance.The financial support by the Christian Doppler Research Association, the Austrian Federal Ministry for Digital and Economic Affairs, and the National Foundation for Research, Technology and Development is gratefully acknowledged

    Synthesizing Context Free Grammars from Sample Strings Based on Inductive CYK Algorithm

    No full text
    corecore