Search CORE

4,497 research outputs found

Recovering Grammar Relationships for the Java Language Specification

Author: A. Dubey
C. A. R. Hoare
D. A. Thomas
D. Barnard
E. Bouwers
H. H. Do
M. Di Penta
R. Lämmel
Ralf Lämmel
T. Dean
Vadim Zaytsev
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 24/08/2010
Field of study

Grammar convergence is a method that helps discovering relationships between different grammars of the same language or different language versions. The key element of the method is the operational, transformation-based representation of those relationships. Given input grammars for convergence, they are transformed until they are structurally equal. The transformations are composed from primitive operators; properties of these operators and the composed chains provide quantitative and qualitative insight into the relationships between the grammars at hand. We describe a refined method for grammar convergence, and we use it in a major study, where we recover the relationships between all the grammars that occur in the different versions of the Java Language Specification (JLS). The relationships are represented as grammar transformation chains that capture all accidental or intended differences between the JLS grammars. This method is mechanized and driven by nominal and structural differences between pairs of grammars that are subject to asymmetric, binary convergence steps. We present the underlying operator suite for grammar transformation in detail, and we illustrate the suite with many examples of transformations on the JLS grammars. We also describe the extraction effort, which was needed to make the JLS grammars amenable to automated processing. We include substantial metadata about the convergence process for the JLS so that the effort becomes reproducible and transparent

arXiv.org e-Print Archive

CiteSeerX

Crossref

VU Research Portal

CWI's Institutional Repository

INRIA a CCSD electronic archive server

TRX: A Formally Verified Parser Interpreter

Author: Adam Koprowski
Andrew Gordon
Gerald J. Sussman and Guy L. Steele Jr
Graham Hutton
Henri Binsztok
Luís Cruz-Filipe and Pierre Letouzey
Roman R. Redziejowski
Xavier Leroy
Publication venue: 'Logical Methods in Computer Science e.V.'
Publication date: 01/01/2011
Field of study

Parsing is an important problem in computer science and yet surprisingly little attention has been devoted to its formal verification. In this paper, we present TRX: a parser interpreter formally developed in the proof assistant Coq, capable of producing formally correct parsers. We are using parsing expression grammars (PEGs), a formalism essentially representing recursive descent parsing, which we consider an attractive alternative to context-free grammars (CFGs). From this formalization we can extract a parser for an arbitrary PEG grammar with the warranty of total correctness, i.e., the resulting parser is terminating and correct with respect to its grammar and the semantics of PEGs; both properties formally proven in Coq.Comment: 26 pages, LMC

arXiv.org e-Print Archive

CiteSeerX

Crossref

Wrapper Maintenance: A Machine Learning Approach

Author: Knoblock C. A.
Lerman K.
Minton S. N.
Publication venue: 'AI Access Foundation'
Publication date: 23/06/2011
Field of study

The proliferation of online information sources has led to an increased use of wrappers for extracting data from Web sources. While most of the previous research has focused on quick and efficient generation of wrappers, the development of tools for wrapper maintenance has received less attention. This is an important research problem because Web sources often change in ways that prevent the wrappers from extracting data correctly. We present an efficient algorithm that learns structural information about data from positive examples alone. We describe how this information can be used for two wrapper maintenance applications: wrapper verification and reinduction. The wrapper verification system detects when a wrapper is not extracting correct data, usually because the Web source has changed its format. The reinduction algorithm automatically recovers from changes in the Web source by identifying data on Web pages so that a new wrapper may be generated for this source. To validate our approach, we monitored 27 wrappers over a period of a year. The verification algorithm correctly discovered 35 of the 37 wrapper changes, and made 16 mistakes, resulting in precision of 0.73 and recall of 0.95. We validated the reinduction algorithm on ten Web sources. We were able to successfully reinduce the wrappers, obtaining precision and recall values of 0.90 and 0.80 on the data extraction task

arXiv.org e-Print Archive

Crossref

Treebank-based acquisition of wide-coverage, probabilistic LFG resources: project overview, results and evaluation

Author: Burke Michael
Cahill Aoife
O'Donovan Ruth
van Genabith Josef
Way Andy
Publication venue
Publication date: 01/01/2004
Field of study

This paper presents an overview of a project to acquire wide-coverage, probabilistic Lexical-Functional Grammar (LFG) resources from treebanks. Our approach is based on an automatic annotation algorithm that annotates “raw” treebank trees with LFG f-structure information approximating to basic predicate-argument/dependency structure. From the f-structure-annotated treebank we extract probabilistic unification grammar resources. We present the annotation algorithm, the extraction of lexical information and the acquisition of wide-coverage and robust PCFG-based LFG approximations including long-distance dependency resolution. We show how the methodology can be applied to multilingual, treebank-based unification grammar acquisition. Finally we show how simple (quasi-)logical forms can be derived automatically from the f-structures generated for the treebank trees

CiteSeerX

Irish Universities

DCU Online Research Access Service

Automatic acquisition of LFG resources for German - as good as it gets

Author: Rehbein Ines
van Genabith Josef
Publication venue: CSLI Publications
Publication date: 01/01/2009
Field of study

We present data-driven methods for the acquisition of LFG resources from two German treebanks. We discuss problems specific to semi-free word order languages as well as problems arising fromthe data structures determined by the design of the different treebanks. We compare two ways of encoding semi-free word order, as done in the two German treebanks, and argue that the design of the TiGer treebank is more adequate for the acquisition of LFG resources. Furthermore, we describe an architecture for LFG grammar acquisition for German, based on the two German treebanks, and compare our results with a hand-crafted German LFG grammar

CiteSeerX

Irish Universities

DCU Online Research Access Service

Handling non-compositionality in multilingual CNLs

Author: Enache Ramona
Kolachina Prasanth
Listenmaa Inari
Publication venue
Publication date: 01/01/2014
Field of study

In this paper, we describe methods for handling multilingual non-compositional constructions in the framework of GF. We specifically look at methods to detect and extract non-compositional phrases from parallel texts and propose methods to handle such constructions in GF grammars. We expect that the methods to handle non-compositional constructions will enrich CNLs by providing more flexibility in the design of controlled languages. We look at two specific use cases of non-compositional constructions: a general-purpose method to detect and extract multilingual multiword expressions and a procedure to identify nominal compounds in German. We evaluate our procedure for multiword expressions by performing a qualitative analysis of the results. For the experiments on nominal compounds, we incorporate the detected compounds in a full SMT pipeline and evaluate the impact of our method in machine translation process.Comment: CNL workshop in COLING 201

arXiv.org e-Print Archive

Crossref

Ontology Driven Web Extraction from Semi-structured and Unstructured Data for B2B Market Analysis

Author: Darlington John
Imtiaz Hazzaz
Zuo Landong
Publication venue
Publication date: 01/09/2009
Field of study

The Market Blended Insight project1 has the objective of improving the UK business to business marketing performance using the semantic web technologies. In this project, we are implementing an ontology driven web extraction and translation framework to supplement our backend triple store of UK companies, people and geographical information. It deals with both the semi-structured data and the unstructured text on the web, to annotate and then translate the extracted data according to the backend schema

Southampton (e-Prints Soton)

Connectionist natural language parsing

Author: Berg
Callan
Christiansen
Christiansen
Cleeremans
Cottrell
Cottrell
Dominic Palmer-Brown
Elman
Fanty
Fodor
Frazier
Friederici
Giles
Greene
Hadley
Heather M. Powell
Ho
Howells
Jonathan A. Tepper
Kwansy
Lane
Lawrence
MacDonald
Marcus
Martelli
Mayberry
McDonald
Miikkulainen
Miikkulainen
Moisl
Pearlmutter
Pollack
Rayner
Reilly
Rodriguez
Santos
Sells
Selman
Servan-Schreiber
Sharkey
Sharkey
St. John
Stevenson
Stowe
Tanenhaus
Taraban
Tepper
Tepper
Waltz
Wermter
Wiles
Zeng
Publication venue: 'Elsevier BV'
Publication date: 01/01/2002
Field of study

The key developments of two decades of connectionist parsing are reviewed. Connectionist parsers are assessed according to their ability to learn to represent syntactic structures from examples automatically, without being presented with symbolic grammar rules. This review also considers the extent to which connectionist parsers offer computational models of human sentence processing and provide plausible accounts of psycholinguistic data. In considering these issues, special attention is paid to the level of realism, the nature of the modularity, and the type of processing that is to be found in a wide range of parsers

Crossref

Nottingham Trent Institutional Repository (IRep)

Recommended from our members

Proceedings of QG2010: The Third Workshop on Question Generation

Author: Boyer Kristy Elizabeth
Piwek Paul
Publication venue: questiongeneration.org
Publication date: 18/06/2010
Field of study

These are the peer-reviewed proceedings of "QG2010, The Third Workshop on Question Generation". The workshop included a special track for "QGSTEC2010: The First Question Generation Shared Task and Evaluation Challenge". QG2010 was held as part of The Tenth International Conference on Intelligent Tutoring Systems (ITS2010)

Open Research Online (The Open University)