5,060 research outputs found

    Synthesizing Program Input Grammars

    We present an algorithm for synthesizing a context-free grammar encoding the language of valid program inputs from a set of input examples and blackbox access to the program. Our algorithm addresses shortcomings of existing grammar inference algorithms, which both severely overgeneralize and are prohibitively slow. Our implementation, GLADE, leverages the grammar synthesized by our algorithm to fuzz test programs with structured inputs. We show that GLADE substantially increases the incremental coverage on valid inputs compared to two baseline fuzzers.

    An integrated architecture for shallow and deep processing

    We present an architecture for integrating shallow and deep NLP components, aimed at the flexible combination of different language technologies for a range of practical current and future applications. In particular, we describe the integration of a high-level HPSG parsing system with different high-performance shallow components, ranging from named entity recognition to chunk parsing and shallow clause recognition. The NLP components enrich a representation of natural language text with layers of new XML meta-information using a single shared data structure, called the text chart. We describe details of the integration methods, and show how information extraction and language checking applications for real-world German text benefit from a deep grammatical analysis.

    Algorithmic Programming Language Identification

    Motivated by the amount of code that goes unidentified on the web, we introduce a practical method for algorithmically identifying the programming language of source code. Our work is based on supervised learning and intelligent statistical features. We also explored, but abandoned, a grammatical approach. In testing, our implementation greatly outperforms that of an existing tool that relies on a Bayesian classifier. Code is written in Python and available under an MIT license. Comment: 11 pages. Code: https://github.com/simon-weber/Programming-Language-Identificatio

    XLOP (XML Language-Oriented Processing)

    In this work we have developed an environment for processing XML documents with attribute grammars, called XLOP (XML Language-Oriented Processing). XLOP provides a specification language that makes it possible to describe XML processing applications as attribute grammars. The semantic functions used in these grammars are supplied as methods of Java classes. The environment provides a generator that translates attribute grammars into CUP-based implementations (CUP is a Java tool for building bottom-up parsers/translators). XLOP supports an on-line attribute evaluation model (i.e., attribute evaluation is interleaved with document parsing). The environment also allows the CUP implementations to be optimized by computing markers (new non-terminals defined by empty syntax rules). These markers can hold inherited attributes, and their syntax rules can trigger the evaluation of semantic equations. In addition, under certain reasonable assumptions, XLOP optimizes the propagation of inherited attributes through chains generated by left-recursive rules, enabling direct reference to the value placed at the beginning of the chain. In many cases, this makes it possible to process documents with an amount of memory that does not depend on the document width. To test the feasibility of XLOP for developing XML applications, we have used it to build a non-trivial application in the e-Learning domain. The application, which is called , supports the generation of interactive tutorials described as XML documents.

    Technological Spaces: An Initial Appraisal

    In this paper, we propose a high-level view of technological spaces (TS) and the relations among these spaces. A technological space is a working context with a set of associated concepts, body of knowledge, tools, required skills, and possibilities. It is often associated with a given user community with shared know-how, educational support, common literature and even regular workshop and conference meetings. Although it is difficult to give a precise definition, some TSs can be easily identified, e.g. the XML TS, the DBMS TS, the abstract syntax TS, the meta-model (OMG/MDA) TS, etc. The purpose of our work is not to define an abstract theory of technological spaces, but to figure out how to work more efficiently by using the best possibilities of each technology. To do so, we need a basic understanding of the similarities and differences between various TSs, and also of the possible operational bridges that will allow transferring the results obtained in one TS to other TSs. We hope that the presented industrial vision may help us put forward the idea that there could be more cooperation than competition among alternative technologies. Furthermore, as the spectrum of such available technologies is rapidly broadening, offering clear guidelines for choosing practical solutions to engineering problems is becoming a must, not only for teachers but for project leaders as well.

    Recovering Grammar Relationships for the Java Language Specification

    Grammar convergence is a method that helps discover relationships between different grammars of the same language or of different language versions. The key element of the method is the operational, transformation-based representation of those relationships. Given input grammars for convergence, they are transformed until they are structurally equal. The transformations are composed from primitive operators; properties of these operators and the composed chains provide quantitative and qualitative insight into the relationships between the grammars at hand. We describe a refined method for grammar convergence, and we use it in a major study, where we recover the relationships between all the grammars that occur in the different versions of the Java Language Specification (JLS). The relationships are represented as grammar transformation chains that capture all accidental or intended differences between the JLS grammars. This method is mechanized and driven by nominal and structural differences between pairs of grammars that are subject to asymmetric, binary convergence steps. We present the underlying operator suite for grammar transformation in detail, and we illustrate the suite with many examples of transformations on the JLS grammars. We also describe the extraction effort, which was needed to make the JLS grammars amenable to automated processing. We include substantial metadata about the convergence process for the JLS so that the effort becomes reproducible and transparent.

    Multimodal agent interfaces and system architectures for health and fitness companions

    Multimodal conversational spoken dialogues using physical and virtual agents provide a potential interface to motivate and support users in the domain of health and fitness. In this paper we present how such multimodal conversational Companions can be implemented to support their owners in various pervasive and mobile settings. In particular, we focus on different forms of multimodality and system architectures for such interfaces.

    Improved genetic algorithm for the context-free grammatical inference

    Inductive learning of formal languages, often called grammatical inference, is an active area in machine learning and computational learning theory. By learning a language we mean finding the grammar of the language when some positive examples (words from the language) and negative examples (words that are not in the language) are given. Learning mechanisms use the natural language learning model: people master a language used by their environment by analysing positive and negative examples. The problem of inferring context-free languages (CFG) has both theoretical and practical motivations. Practical applications include pattern recognition (for example, finding DTD or XML schemas for XML documents) and speech recognition (the ability to infer context-free grammars for natural languages would enable speech recognition to modify its internal grammar on the fly). There have been several attempts to find effective learning methods for context-free languages (for example [1,2,3,4,5]). In particular, Y. Sakakibara [3] introduced an interesting method of finding a context-free grammar in Chomsky normal form with a minimal set of nonterminals. He used a tabular representation similar to the parse table used in the CYK algorithm, together with genetic algorithms. In this paper we present several adjustments to the algorithm suggested by Sakakibara. The adjustments mainly concern the genetic algorithms used and are as follows:
    – we introduce a method of creating the initial population which makes use of characteristic features of context-free grammars,
    – new genetic operations are used (mutation with a path added, ‘die process’, ‘war/disease process’),
    – a different definition of the fitness function,
    – an effective compression of the structure of an individual in the population is suggested.
    These changes speed up the process of grammar generation and, what is more, make it possible to infer richer grammars than those considered in [3].
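As a rough illustration of the ingredients named in this abstract, the sketch below fills a CYK-style table for a grammar in Chomsky normal form and scores the grammar against positive and negative examples. It is not the algorithm of Sakakibara or of this paper: the toy grammar, the names `cyk_accepts` and `fitness`, and the scoring rule (fraction of examples classified correctly) are all assumptions made for the example.

```python
from itertools import product

# Hypothetical CNF grammar for the language a^n b^n (n >= 1):
#   S -> A B | A T,   T -> S B,   A -> 'a',   B -> 'b'
UNARY = {"a": {"A"}, "b": {"B"}}
BINARY = {("A", "B"): {"S"}, ("A", "T"): {"S"}, ("S", "B"): {"T"}}


def cyk_accepts(word, unary=UNARY, binary=BINARY, start="S"):
    """Return True if the CNF grammar derives `word` (standard CYK)."""
    n = len(word)
    if n == 0:
        return False  # this sketch has no rule for the empty word
    # table[i][k] holds the nonterminals deriving word[i : i + k + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, symbol in enumerate(word):
        table[i][0] = set(unary.get(symbol, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for pair in product(left, right):
                    table[i][length - 1] |= binary.get(pair, set())
    return start in table[0][n - 1]


def fitness(positive, negative, **grammar):
    """Assumed fitness: fraction of examples the grammar classifies correctly."""
    correct = sum(cyk_accepts(w, **grammar) for w in positive)
    correct += sum(not cyk_accepts(w, **grammar) for w in negative)
    return correct / (len(positive) + len(negative))
```

In a genetic search, a function like `fitness` would be evaluated for each candidate grammar in the population; how candidates are encoded, mutated and compressed is specific to the paper and is not reproduced here.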

    A Grammatical Inference Approach to Language-Based Anomaly Detection in XML

    False positives are a problem in anomaly-based intrusion detection systems. To counter this issue, we discuss anomaly detection for the eXtensible Markup Language (XML) from a language-theoretic view. We argue that many XML-based attacks target the syntactic level, i.e. the tree structure or element content, and that syntax validation of XML documents reduces the attack surface. XML offers so-called schemas for validation, but in the real world, schemas are often unavailable, ignored or too general. In this work-in-progress paper we describe a grammatical inference approach to learn an automaton from example XML documents for detecting documents with anomalous syntax. We discuss properties and expressiveness of XML to understand limits of learnability. Our contributions are an XML Schema compatible lexical datatype system to abstract content in XML and an algorithm to learn visibly pushdown automata (VPA) directly from a set of examples. The proposed algorithm does not require the tree representation of XML, so it can process large documents or streams. The resulting deterministic VPA then allows stream validation of documents to recognize deviations in the underlying tree structure or datatypes. Comment: Paper accepted at First Int. Workshop on Emerging Cyberthreats and Countermeasures ECTCM 201
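To make the idea of stream validation with a pushdown stack concrete, here is a minimal sketch. It only shows the validation side, with a hand-written table instead of a learned automaton, and it simplifies a VPA by using the element on top of the stack as the state. The event format, the `ALLOWED_CHILDREN` table and the function name `validate_stream` are assumptions for illustration, not the paper's data structures.

```python
# Hypothetical, hand-written transition table (the paper learns its automaton
# from example documents). Key: currently open element, None = document level.
ALLOWED_CHILDREN = {
    None: {"order"},
    "order": {"item"},
    "item": set(),
}


def validate_stream(events, allowed=ALLOWED_CHILDREN):
    """Check a stream of ("open" | "close", name) events without building a tree.

    Opening tags act as call symbols (push), closing tags as return symbols (pop).
    """
    stack = []
    seen_root = False
    for kind, name in events:
        if kind == "open":
            parent = stack[-1] if stack else None
            if parent is None and seen_root:
                return False  # a second root element is a deviation
            if name not in allowed.get(parent, set()):
                return False  # element not expected in this context
            stack.append(name)
            seen_root = True
        elif kind == "close":
            if not stack or stack.pop() != name:
                return False  # closing tag does not match the open element
        else:
            return False  # unknown event kind
    return seen_root and not stack
```

Because only the stack of open elements is kept, memory grows with nesting depth rather than with document size, which is what makes this style of check usable on large documents or streams.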