24,815 research outputs found

    Separable graphs, planar graphs and web grammars

    Get PDF
    This paper is concerned with the class of “web grammars,≓ introduced by Pfaltz and Rosenfeld, whose languages are sets of labelled graphs. A slightly modified definition of web grammar is given, in which the rewriting rules can have an applicability condition, and it is proved that, in general, this extension does not increase the generative power of the grammar. This extension is useful, however, for otherwise it is not possible to incorporate negative contextual conditions into the rules, since the context of a given vertex can be unbounded. A number of web grammars are presented which define interesting classes of graphs, including unseparable graphs, unseparable planar graphs and planar graphs. All the grammars in this paper use “normal embeddings≓ in which the connections between the web that is written and the host web are conserved, so that any rewriting rule affects the web only locally

    DCU-Paris13 systems for the SANCL 2012 shared task

    Get PDF
    The DCU-Paris13 team submitted three systems to the SANCL 2012 shared task on parsing English web text. The first submission, the highest ranked constituency parsing system, uses a combination of PCFG-LA product grammar parsing and self-training. In the second submission, also a constituency parsing system, the n-best lists of various parsing models are combined using an approximate sentence-level product model. The third system, the highest ranked system in the dependency parsing track, uses voting over dependency arcs to combine the output of three constituency parsing systems which have been converted to dependency trees. All systems make use of a data-normalisation component, a parser accuracy predictor and a genre classifier

    Graph Transformations and Game Theory: A Generative Mechanism for Network Formation

    Get PDF
    Many systems can be described in terms of networks with characteristic structural properties. To better understand the formation and the dynamics of complex networks one can develop generative models. We propose here a generative model (named dynamic spatial game) that combines graph transformations and game theory. The idea is that a complex network is obtained by a sequence of node-based transformations determined by the interactions of nodes present in the network. We model the node-based transformations by using graph grammars and the interactions between the nodes by using game theory. We illustrate dynamic spatial games on a couple of examples: the role of cooperation in tissue formation and tumor development and the emergence of patterns during the formation of ecological networks

    Comparing the use of edited and unedited text in parser self-training

    Get PDF
    We compare the use of edited text in the form of newswire and unedited text in the form of discussion forum posts as sources for training material in a self-training experiment involving the Brown reranking parser and a test set of sentences from an online sports discussion forum. We find that grammars induced from the two automatically parsed corpora achieve similar Parseval f-scores, with the grammars induced from the discussion forum material being slightly superior. An error analysis reveals that the two types of grammars do behave differently

    Algorithmic Programming Language Identification

    Full text link
    Motivated by the amount of code that goes unidentified on the web, we introduce a practical method for algorithmically identifying the programming language of source code. Our work is based on supervised learning and intelligent statistical features. We also explored, but abandoned, a grammatical approach. In testing, our implementation greatly outperforms that of an existing tool that relies on a Bayesian classifier. Code is written in Python and available under an MIT license.Comment: 11 pages. Code: https://github.com/simon-weber/Programming-Language-Identificatio

    Unsupervised Extraction of Representative Concepts from Scientific Literature

    Full text link
    This paper studies the automated categorization and extraction of scientific concepts from titles of scientific articles, in order to gain a deeper understanding of their key contributions and facilitate the construction of a generic academic knowledgebase. Towards this goal, we propose an unsupervised, domain-independent, and scalable two-phase algorithm to type and extract key concept mentions into aspects of interest (e.g., Techniques, Applications, etc.). In the first phase of our algorithm we propose PhraseType, a probabilistic generative model which exploits textual features and limited POS tags to broadly segment text snippets into aspect-typed phrases. We extend this model to simultaneously learn aspect-specific features and identify academic domains in multi-domain corpora, since the two tasks mutually enhance each other. In the second phase, we propose an approach based on adaptor grammars to extract fine grained concept mentions from the aspect-typed phrases without the need for any external resources or human effort, in a purely data-driven manner. We apply our technique to study literature from diverse scientific domains and show significant gains over state-of-the-art concept extraction techniques. We also present a qualitative analysis of the results obtained.Comment: Published as a conference paper at CIKM 201

    Multiple hierarchies : new aspects of an old solution

    Get PDF
    In this paper, we present the Multiple Annotation approach, which solves two problems: the problem of annotating overlapping structures, and the problem that occurs when documents should be annotated according to different, possibly heterogeneous tag sets. This approach has many advantages: it is based on XML, the modeling of alternative annotations is possible, each level can be viewed separately, and new levels can be added at any time. The files can be regarded as an interrelated unit, with the text serving as the implicit link. Two representations of the information contained in the multiple files (one in Prolog and one in XML) are described. These representations serve as a base for several applications
    • 

    corecore