
    GALENA: tabular DCG parsing for natural languages

    We present a definite-clause-based parsing environment for natural languages, whose operational model is the dynamic interpretation of logical push-down automata. We briefly explain our design decisions in terms of a set of properties that practical natural language processing systems should incorporate. The aim is to show both the advantages and the drawbacks of our approach. Funding: Government of Spain (España. Gobierno), HF96-36; Xunta de Galicia, XUGA10505B96; Xunta de Galicia, XUGA20402B9.

    Parallel parsing made practical

    The property of local parsability allows inputs to be parsed by inspecting only a bounded-length string around the current token. This in turn enables the construction of a scalable, data-parallel parsing algorithm, which is presented in this work. Such an algorithm lends itself to automatic generation by a parser generator tool, which we implemented and also present here. Furthermore, to complete the framework for parallel input analysis, a parallel scanner can also be combined with the parser. To demonstrate the practicality of a parallel lexing and parsing approach, we report the results of adapting JSON and Lua to a form suitable for parallel parsing (i.e., an operator-precedence grammar) through simple grammar changes and scanning transformations. The approach is validated with performance figures from both high-performance and embedded multicore platforms, obtained by analyzing real-world inputs as a test bench. The results show that our approach matches or outperforms production-grade LR parsers in sequential execution, and achieves significant speedups and good scaling on multicore machines. The work concludes with a broad and critical survey of past work on parallel parsing and of future directions for integration with semantic analysis and incremental parsing.
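Local parsability can be illustrated on a toy operator-precedence grammar. The sketch below assumes a hypothetical precedence table for arithmetic expressions over `n`, `+`, `*`, and parentheses, with `#` as the end marker; it is not the grammar, algorithm, or generator tool described in the work. Because the precedence relation between two adjacent terminals depends only on that pair, the input can be cut into chunks whose relations are computed independently, with one token of lookahead at each chunk boundary. The sketch covers only this first step, not the subsequent parallel handle reduction.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical operator-precedence table for expressions over n, +, *, ( and ),
# with '#' as the end marker. '<' yields precedence, '=' equal, '>' takes precedence.
LT, EQ, GT = "<", "=", ">"
TABLE = {}
for a in "+*(#":
    TABLE[(a, "n")] = LT  # an operand always opens a new handle
    TABLE[(a, "(")] = LT
for b in "+*)#":
    TABLE[("n", b)] = GT  # an operand always closes its handle
    TABLE[(")", b)] = GT
TABLE.update({
    ("+", "+"): GT, ("*", "*"): GT,  # left associativity
    ("+", "*"): LT, ("*", "+"): GT,  # '*' binds tighter than '+'
    ("+", ")"): GT, ("*", ")"): GT, ("+", "#"): GT, ("*", "#"): GT,
    ("(", "+"): LT, ("(", "*"): LT, ("(", ")"): EQ,
    ("#", "+"): LT, ("#", "*"): LT,
})


def relations(tokens, start, end):
    """Relation between tokens[i] and tokens[i + 1] for start <= i < end.
    Each relation depends only on two adjacent tokens; None means a syntax error."""
    return [TABLE.get((tokens[i], tokens[i + 1])) for i in range(start, end)]


def parallel_relations(tokens, n_chunks=4):
    """Split the adjacent-pair positions into chunks and process them independently.
    The last pair of a chunk reads the first token of the next chunk (one token of lookahead)."""
    pairs = len(tokens) - 1
    size = max(1, -(-pairs // n_chunks))  # ceiling division
    bounds = [(s, min(s + size, pairs)) for s in range(0, pairs, size)]
    with ThreadPoolExecutor() as pool:
        parts = pool.map(lambda b: relations(tokens, *b), bounds)  # map preserves order
    return [rel for part in parts for rel in part]


tokens = ["#"] + list("n+n*(n+n)") + ["#"]
print(parallel_relations(tokens, n_chunks=3))
# ['<', '>', '<', '>', '<', '<', '>', '<', '>', '>']
```

Python threads are used here only to show that the chunks do not depend on each other; because of the global interpreter lock, an actual speedup would require processes or native threads.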

    Schema-Driven Information Extraction from Heterogeneous Tables

    In this paper, we explore whether large language models can support cost-efficient information extraction from tables. We introduce schema-driven information extraction, a new task that transforms tabular data into structured records following a human-authored schema. To assess the capabilities of various LLMs on this task, we develop a benchmark composed of tables from four diverse domains: machine learning papers, chemistry literature, materials science journals, and webpages. Alongside the benchmark, we present an extraction method based on instruction-tuned LLMs. Our approach shows competitive performance without task-specific labels, achieving F1 scores ranging from 74.2 to 96.1 while remaining highly cost-efficient. Moreover, we validate the possibility of distilling compact table-extraction models to reduce API reliance, as well as of extracting from image tables using multimodal models. By developing a benchmark and demonstrating the feasibility of this task using proprietary models, we aim to support future work on open-source schema-driven IE models.

    An Estelle compiler

    The increasing development and use of computer networks has made it necessary to define international standards. Central to the standardization efforts is the concept of a Formal Description Technique (FDT), which provides a medium for defining communication protocols and services. This document describes the design and implementation of one of the few existing compilers for one such FDT, the language "Estelle" ([ISO85], [ISO86], [ISO87]).

    An overview of decision table literature 1982-1995.

    This report gives an overview of the literature on decision tables over the past 15 years. As far as possible, each reference is accompanied by an author-supplied abstract, a number of keywords, and a classification. In some cases, our own comments are added; their purpose is to show where, how, and why decision tables are used. The literature is classified according to application area, theoretical versus practical character, year of publication, country of origin (not necessarily the country of publication), and the language of the document. After a description of the scope of the interview, classification results and the classification by topic are presented. The main body of the paper is the ordered list of publications with abstract, classification, and comments.

    An Interactive Reverse Engineering Environment for Large-Scale C++ Code

    Automatic extraction of subcategorization frames for Italian

    Subcategorization is a kind of knowledge that can be considered crucial in several NLP tasks, such as information extraction or parsing, but collecting very large resources that include subcategorization representations is difficult and time-consuming. Previous experiments show that automatic extraction can be a practical and reliable way to acquire this kind of knowledge. This paper investigates the relationship between subcategorization frame extraction and the nature of the data from which the frames are extracted, e.g., how much the task is influenced by the richness or poverty of the annotation. We therefore present experiments that apply statistical subcategorization extraction methods known in the literature to an Italian treebank that exploits a rich set of dependency relations, which can be annotated at different degrees of specificity. Benefiting from the availability of relation sets with different granularity, we evaluate our results against previous work from a cross-linguistic perspective.
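A common family of statistical extraction methods collects candidate frames per verb from a parsed corpus and keeps only frames whose relative frequency passes a threshold. The sketch below assumes that kind of filter and uses hypothetical dependency labels (`SUBJ`, `OBJ`, `INDOBJ`, `COMPL-LOC`, `COMPL-TEMP`); these are not the treebank's actual relation set, and the function is not one of the specific methods evaluated in the paper. The optional `coarsen` argument mimics annotating relations at a lower degree of specificity, which shows how granularity can change which frames survive.

```python
from collections import Counter, defaultdict


def extract_frames(observations, threshold=0.1, coarsen=None):
    """Build a subcategorization lexicon from (verb, dependent_labels) observations.

    A frame is the sorted tuple of a verb's dependent labels; frames whose relative
    frequency for that verb falls below `threshold` are discarded. `coarsen`, if given,
    maps each fine-grained label to a less specific one before frames are counted."""
    counts = defaultdict(Counter)
    for verb, labels in observations:
        if coarsen is not None:
            labels = [coarsen(label) for label in labels]
        counts[verb][tuple(sorted(labels))] += 1

    lexicon = {}
    for verb, frames in counts.items():
        total = sum(frames.values())
        lexicon[verb] = {
            frame: count / total
            for frame, count in frames.items()
            if count / total >= threshold
        }
    return lexicon


# Hypothetical observations read off a dependency treebank (labels are illustrative).
observations = [
    ("dare", ("SUBJ", "OBJ", "INDOBJ")),
    ("dare", ("SUBJ", "OBJ", "INDOBJ")),
    ("dare", ("SUBJ", "OBJ")),
    ("andare", ("SUBJ", "COMPL-LOC")),
    ("andare", ("SUBJ", "COMPL-TEMP")),
]

fine = extract_frames(observations, threshold=0.6)
coarse = extract_frames(observations, threshold=0.6, coarsen=lambda label: label.split("-")[0])
print(fine["andare"])    # {}
print(coarse["andare"])  # {('COMPL', 'SUBJ'): 1.0}
```

With fine-grained labels, the two observations of "andare" form two frames of relative frequency 0.5 each, and both fall below the threshold; coarsening merges them into a single frame that survives.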

    Evaluating Parsers with Dependency Constraints

    Many syntactic parsers now score over 90% on English in-domain evaluation, but the remaining errors have been challenging to address and difficult to quantify. Standard parsing metrics provide a consistent basis for comparison between parsers, but do not illuminate what errors remain to be addressed. This thesis develops a constraint-based evaluation for dependency and Combinatory Categorial Grammar (CCG) parsers to address this deficiency. We examine the constrained and cascading impact, representing the direct and indirect effects of errors on parsing accuracy. This distinguishes errors that are the underlying source of problems in parses from those that are a consequence of those problems. Kummerfeld et al. (2012) propose a static post-parsing analysis to categorise groups of errors into abstract classes, but this cannot account for cascading changes resulting from repairing errors, or limitations which may prevent the parser from applying a repair. In contrast, our technique is based on enforcing the presence of certain dependencies during parsing, whilst allowing the parser to choose the remainder of the analysis according to its grammar and model. We draw constraints for this process from gold-standard annotated corpora, grouping them into abstract error classes such as NP attachment, PP attachment, and clause attachment. By applying constraints from each error class in turn, we can examine how parsers respond when forced to correctly analyse each class. We show how to apply dependency constraints in three parsers: the graph-based MSTParser (McDonald and Pereira, 2006) and the transition-based ZPar (Zhang and Clark, 2011b) dependency parsers, and the C&C CCG parser (Clark and Curran, 2007b). Each is widely used and influential in the field, and each generates some form of predicate-argument dependencies. We compare the parsers, identifying common sources of error, and differences in the distribution of errors between constrained and cascaded impact.
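In a graph-based parser, one simple way to enforce a gold dependency while leaving the rest of the analysis to the model is to disallow every competing head for the constrained word before decoding. The sketch assumes a hypothetical arc-score matrix and replaces a real maximum-spanning-tree decoder (such as Chu-Liu/Edmonds) with greedy per-word head selection, so its output is not guaranteed to be a tree; it illustrates the idea of constrained decoding and is not MSTParser's code.

```python
import math


def apply_constraints(scores, constraints):
    """Return a copy of scores[h][d] (index 0 = ROOT) in which, for every constrained
    dependent d with required head h, all other heads of d are set to -inf."""
    masked = [row[:] for row in scores]
    for dep, head in constraints.items():
        for h in range(len(masked)):
            if h != head:
                masked[h][dep] = -math.inf
    return masked


def greedy_heads(scores):
    """Choose the best-scoring head for each word independently.
    Simplification: a real graph-based parser would run an MST decoder here."""
    n = len(scores)
    heads = [None]  # ROOT has no head
    for d in range(1, n):
        heads.append(max((h for h in range(n) if h != d), key=lambda h: scores[h][d]))
    return heads


# ROOT saw man with telescope: the model prefers attaching "with" (3) to "man" (2).
n = 5
scores = [[0.0] * n for _ in range(n)]
scores[0][1], scores[1][2], scores[3][4] = 10.0, 8.0, 9.0
scores[2][3], scores[1][3] = 6.0, 5.0

print(greedy_heads(scores))                             # [None, 0, 1, 2, 3]
print(greedy_heads(apply_constraints(scores, {3: 1})))  # [None, 0, 1, 1, 3]
```

Only the constrained word's head changes here; in a full parser, the forced arc can also shift other attachments, which is the cascading impact the evaluation measures.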
Our work allows us to contrast the implementations of each parser and how they respond to constraint application. Using our analysis, we experiment with new features for dependency parsing, which encode the frequency of proposed arcs in large-scale corpora derived from scanned books. These features are inspired by, and extend, the work of Bansal and Klein (2011). We target these features at the most notable errors, and show how they address some, but not all, of the difficult attachments across newswire and web text. CCG parsing is particularly challenging, as different derivations do not always generate different dependencies. We develop dependency hashing to address semantically redundant parses in n-best CCG parsing, and demonstrate its necessity and effectiveness. Dependency hashing substantially improves the diversity of n-best CCG parses, and improves a CCG reranker when used for creating training and test data. We show the intricacies of applying constraints to C&C, and describe instances where applying constraints causes the parser to produce a worse analysis. These results illustrate how algorithms that are relatively straightforward for constituency and dependency parsers are non-trivial to implement in CCG. This work has explored dependencies as constraints in dependency and CCG parsing. We have shown how dependency hashing can efficiently eliminate semantically redundant CCG n-best parses, and presented a new evaluation framework based on enforcing the presence of dependencies in the output of the parser. By otherwise allowing the parser to proceed as it would have, we avoid the assumptions inherent in other work. We hope this work will provide insights into the remaining errors in parsing and help target efforts to address those errors, creating better syntactic analysis for downstream applications.
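Dependency hashing treats two n-best derivations as redundant when they yield the same dependencies. A minimal sketch, assuming each parse arrives as a (score, derivation, dependencies) triple in descending score order and that dependencies are (head, label, dependent) tuples with hypothetical labels; it is not the C&C implementation. Using a `frozenset` of the dependencies as the set key makes the check order-insensitive, and because Python compares set members for equality after hashing, a hash collision cannot wrongly discard a distinct parse.

```python
def dedupe_nbest(parses):
    """Keep the highest-scoring derivation for each distinct set of dependencies.

    `parses` is a list of (score, derivation, dependencies) triples sorted by
    descending score; dependencies are (head, label, dependent) tuples."""
    seen = set()
    unique = []
    for score, derivation, deps in parses:
        key = frozenset(deps)  # order-insensitive, hashable dependency set
        if key in seen:
            continue  # semantically redundant: same dependencies as a better parse
        seen.add(key)
        unique.append((score, derivation, deps))
    return unique


# Hypothetical n-best list: the first two derivations differ only in how they
# are built (spurious ambiguity) and yield identical dependencies.
deps = [("likes", "subj", "John"), ("likes", "obj", "Mary")]
nbest = [
    (-1.0, "normal form", deps),
    (-1.2, "type-raising + composition", list(reversed(deps))),
    (-3.0, "subject/object swapped", [("likes", "subj", "Mary"), ("likes", "obj", "John")]),
]
print([d for _, d, _ in dedupe_nbest(nbest)])  # ['normal form', 'subject/object swapped']
```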

    A new approach to specifying and handling exceptions

    Technical report. An operation generally exhibits different patterns of behavior over different parts of its domain. Depending upon the context, such behavior may be conceived of either as "normal" or as an "exception." Thus, the behavior of an operation is quite naturally characterized by the set of partial operations that characterize its (different) behavior on its subdomains, and exceptions essentially serve to extend (modify) the normal behavior of an operation. In this milieu, we consider the issues of specifying and "handling" exceptional conditions that might occur during the execution of an operation. We argue that one of the important features an exception handling mechanism must possess is the ability to alter the behavior of an operation over part of its domain, i.e., to incrementally modify the semantics of the partial operations that characterize an operation. Surprisingly, this requirement arises out of a pragmatic consideration: that of providing for the existence of a library of subroutines. We propose a general mechanism for specifying exceptions and their handlers that does not compromise the possibility of efficient implementations. Examples of the application of the method are presented in the context of abstract (algebraic) data type specifications, using skeletal specifications of a Stack and an error-correcting parser for a context-free grammar. The major advantage of the proposed mechanism over conventional approaches is that it completely avoids the problem of "dynamic context propagation."
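One way to read the idea of an operation as a set of partial operations is sketched below: an operation is an ordered list of (subdomain guard, behavior) cases, and a client can replace the behavior on one subdomain without touching the library definition. This is an illustrative interpretation, not the report's notation or mechanism; the `Operation` class, its `override` method, and the tuple-based stack are hypothetical. Because the replacement is bound to the new operation when it is defined, rather than looked up along the dynamic call chain, no handler context has to propagate through callers, which is one reading of the report's point about dynamic context propagation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple

Guard = Callable[[Any], bool]
Behavior = Callable[[Any], Any]


@dataclass(frozen=True)
class Operation:
    """An operation given as ordered partial operations: (subdomain guard, behavior)."""
    cases: Tuple[Tuple[Guard, Behavior], ...]

    def __call__(self, arg):
        for guard, behavior in self.cases:
            if guard(arg):
                return behavior(arg)
        raise ValueError("argument outside the operation's domain")

    def override(self, guard, behavior):
        """New operation with `behavior` on the subdomain `guard`; self is unchanged."""
        return Operation(((guard, behavior),) + self.cases)


def is_empty(stack):
    return len(stack) == 0


def empty_pop(stack):
    raise IndexError("pop from empty stack")


# Library definition. A stack is an immutable tuple whose last element is the top.
# Normal case: non-empty stack. Exceptional case: empty stack.
pop = Operation((
    (lambda s: not is_empty(s), lambda s: (s[-1], s[:-1])),
    (is_empty, empty_pop),
))

# A client modifies only the empty-stack subdomain; the library's pop is untouched.
safe_pop = pop.override(is_empty, lambda s: (None, s))

print(pop((1, 2)))   # (2, (1,))
print(safe_pop(()))  # (None, ())
```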