2,718 research outputs found

    Recovering Grammar Relationships for the Java Language Specification

    Get PDF
    Grammar convergence is a method that helps discovering relationships between different grammars of the same language or different language versions. The key element of the method is the operational, transformation-based representation of those relationships. Given input grammars for convergence, they are transformed until they are structurally equal. The transformations are composed from primitive operators; properties of these operators and the composed chains provide quantitative and qualitative insight into the relationships between the grammars at hand. We describe a refined method for grammar convergence, and we use it in a major study, where we recover the relationships between all the grammars that occur in the different versions of the Java Language Specification (JLS). The relationships are represented as grammar transformation chains that capture all accidental or intended differences between the JLS grammars. This method is mechanized and driven by nominal and structural differences between pairs of grammars that are subject to asymmetric, binary convergence steps. We present the underlying operator suite for grammar transformation in detail, and we illustrate the suite with many examples of transformations on the JLS grammars. We also describe the extraction effort, which was needed to make the JLS grammars amenable to automated processing. We include substantial metadata about the convergence process for the JLS so that the effort becomes reproducible and transparent

    Towards digital library service integration

    Get PDF
    Digital Library Service Integration (DLSI) aims to provide a systematic approach in integrating the services and collections of National Science and Digital Library. The National Science and Digital Library collections can share the services among themselves in a totally integrated environinent. Collections as such will require no change to plug into the DLSI architecture. Collections will keep using the services of NSDL in the similar manner as before. These services will in turn pass few parameters to the services of DLSI. With the help of these parameters, wrappers will fetch the details and priority of the users. These wrappers will be using the services of Search and Discovery module, Metadata Management services, and Access Management services. Users will see a totally integrated environment. They will see their digital library system just as before. In addition to that, they will find some extra link anchors on the document. These links serve to provide the supplemental information or arrange the information in the user preferred way. For this matter, the DLSI maintains basic user\u27s information and preferences. Other contributions include incorporating collaborative filtering for customizing large sets of links, and advance lexical analysis tool to identify the objects of interest in a document

    Polyglot: Distributed Word Representations for Multilingual NLP

    Full text link
    Distributed word representations (word embeddings) have recently contributed to competitive performance in language modeling and several NLP tasks. In this work, we train word embeddings for more than 100 languages using their corresponding Wikipedias. We quantitatively demonstrate the utility of our word embeddings by using them as the sole features for training a part of speech tagger for a subset of these languages. We find their performance to be competitive with near state-of-art methods in English, Danish and Swedish. Moreover, we investigate the semantic features captured by these embeddings through the proximity of word groupings. We will release these embeddings publicly to help researchers in the development and enhancement of multilingual applications.Comment: 10 pages, 2 figures, Proceedings of Conference on Computational Natural Language Learning CoNLL'201

    GATE -- an Environment to Support Research and Development in Natural Language Engineering

    Get PDF
    We describe a software environment to support research and development in natural language (NL) engineering. This environment -- GATE (General Architecture for Text Engineering) -- aims to advance research in the area of machine processing of natural languages by providing a software infrastructure on top of which heterogeneous NL component modules may be evaluated and refined individually or may be combined into larger application systems. Thus, GATE aims to support both researchers and developers working on component technologies (e.g. parsing, tagging, morphological analysis) and those working on developing end-user applications (e.g. information extraction, text summarisation, document generation, machine translation, and second language learning). GATE will promote reuse of component technology, permit specialisation and collaboration in large-scale projects, and allow for the comparison and evaluation of alternative technologies. The first release of GATE is now available

    Constraint Logic Programming for Natural Language Processing

    Full text link
    This paper proposes an evaluation of the adequacy of the constraint logic programming paradigm for natural language processing. Theoretical aspects of this question have been discussed in several works. We adopt here a pragmatic point of view and our argumentation relies on concrete solutions. Using actual contraints (in the CLP sense) is neither easy nor direct. However, CLP can improve parsing techniques in several aspects such as concision, control, efficiency or direct representation of linguistic formalism. This discussion is illustrated by several examples and the presentation of an HPSG parser.Comment: 15 pages, uuencoded and compressed postscript to appear in Proceedings of the 5th Int. Workshop on Natural Language Understanding and Logic Programming. Lisbon, Portugal. 199
    • …
    corecore