Overview of the SPMRL 2013 shared task: cross-framework evaluation of parsing morphologically rich languages
This paper reports on the first shared task on statistical parsing of morphologically rich languages (MRLs). The task features data sets from nine languages, each available both in constituency and dependency annotation. We report on the preparation of the data sets, on the proposed parsing scenarios, and on the evaluation metrics for parsing MRLs given different representation types. We present and analyze parsing results obtained by the task participants, and then provide an analysis and comparison of the parsers across languages and frameworks, reported for gold input as well as more realistic parsing scenarios
PARSEC: A Constraint-Based Parser for Spoken Language Processing
PARSEC (1), a text-based and spoken language processing framework based on the Constraint Dependency Grammar (CDG) developed by Maruyama [26,27], is discussed. The scope of CDG is expanded to allow for the analysis of sentences containing lexically ambiguous words, to allow feature analysis in constraints, and to efficiently process multiple sentence candidates that are likely to arise in spoken language processing. The benefits of the CDG parsing approach are summarized. Additionally, the development of CDG grammars using PARSEC grammar writing tools and the implementation of the PARSEC parser for word graphs are discussed. (1) Parallel ARchitecture Sentence Constraine
Darstellung und stochastische Auflösung von Ambiguität in constraint-basiertem Parsing (Representation and Stochastic Resolution of Ambiguity in Constraint-Based Parsing)
This thesis investigates two complementary approaches to coping with ambiguity in natural language processing. It first presents methods that allow many competing interpretations to be stored compactly in one shared data structure. It then suggests approaches to scoring the different interpretations using stochastic models. This leads to the problem of estimating the probabilities of rare events that were not observed in the training data, for which novel methods are proposed.
Overview of the SPMRL 2013 Shared Task: A Cross-Framework Evaluation of Parsing Morphologically Rich Languages
This paper reports on the first shared task on statistical parsing of morphologically rich languages (MRLs). The task features data sets from nine languages, each available both in constituency and dependency annotation. We report on the preparation of the data sets, on the proposed parsing scenarios, and on the evaluation metrics for parsing MRLs given different representation types. We present and analyze parsing results obtained by the task participants, and then provide an analysis and comparison of the parsers across languages and frameworks, reported for gold input as well as more realistic parsing scenarios
Formalizing graphical notations
The thesis describes research into graphical notations for software engineering, with a principal interest in ways of formalizing them. The research seeks to provide a theoretical basis that will help in designing both notations and the software tools that process them.
The work starts from a survey of literature on notation, followed by a review of techniques for formal description and for computational handling of notations. The survey concentrates on collecting views of the benefits and the problems attending notation use in software development; the review covers picture description languages, grammars and tools such as generic editors and visual programming environments. The main problem of notation is found to be a lack of any coherent, rigorous description methods. The current approaches to this problem are analysed as lacking in consensus on syntax specification and also lacking a clear focus on a defined concept of notated expression.
To address these deficiencies, the thesis embarks upon an exploration of semiotic, linguistic and logical theory; this culminates in a proposed formalization of semiosis in notations, using categorial model theory as a mathematical foundation. An argument about the structure of sign systems leads to an analysis of notation into a layered system of tractable theories, spanning the gap between expressive pictorial medium and subject domain. This notion of 'tectonic' theory aims to treat both diagrams and formulae together.
The research gives details of how syntactic structure can be sketched in a mathematical sense, with examples applying to software development diagrams, offering a new solution to the problem of notation specification. Based on these methods, the thesis discusses directions for resolving the harder problems of supporting notation design, processing and computer-aided generic editing. A number of future research areas are thereby opened up. For practical trial of the ideas, the work proceeds to the development and partial implementation of a system to aid the design of notations and editors. Finally the thesis is evaluated as a contribution to theory in an area which has not attracted a standard approach
Visual language representation for use case evolution and traceability
The primary goal of this research is to assist non-technical stakeholders involved in requirements engineering with a comprehensible method for managing changing requirements within a specific domain. An important part of managing evolving requirements over time is to maintain a temporal ordering of the changes and to support traceability of the modifications. This research defines a semi-formal syntactical and semantic definition of such a method using a visual language, RE/TRAC (Requirements Evolution with Traceability), and a supporting formal semantic notation RE/TRAC-SEM. RE/TRAC-SEM is an ontological specification employing a combination of models, including verbal definitions, set theory and a string language specification RE/TRAC-CF. The language RE/TRAC-CF enables the separation of the syntactical description of the visual language from the semantic meaning of the model, permitting varying target representations and taking advantage of existing efficient parsing algorithms for context-free grammars. As an application of the RE/TRAC representation, this research depicts the hierarchical step-wise refinement of UML use case diagrams to demonstrate evolving system requirements. In the current arena of software development, where systems are described using platform independent models (PIMs) which emphasize the front-end design process, requirements and design documents, including the use cases, have become the primary artifacts of the system. Therefore the management of requirements’ evolution has become even more critical in the creation and maintenance of systems
Object-oriented engineering of visual languages
Visual languages are notations that employ graphics (icons, diagrams) to present information in a two or more dimensional space. This work focuses on diagrammatic visual languages, as found in software engineering, and their computer implementations. Implementation means the development of processors to automatically analyze diagrams and the development of graphical editors for constructing the diagrams. We propose a rigorous implementation technique that uses a formal grammar to specify the syntax of a visual language and that uses parsing to automatically analyze the visual sentences generated by the grammar. The theoretical contributions of our work are an original treatment of error handling (error detection, reporting, and recovery) in off-line visual language parsing, and the source-to-source translation of visual languages. We have also substantially extended an existing grammatical model for multidimensional languages, called atomic relational grammars. We have added support for meta-language expressions that denote optional and repetitive right-hand-side elements. We hav
Formalising Graphical Service Descriptions using SDL
It is convenient to describe telecomms services using a graphical notation that is accessible to non-specialists. However, the notation should also have a formal interpretation for rigorous analysis. CRESS (Chisel Representation Employing Systematic Specification) has been developed for this purpose. A brief overview of CRESS is given. It is explained how features (additional services) can be defined in a modular fashion, and automatically combined with a base service. Brief case studies illustrate how the approach has been used to describe services in the IN (Intelligent Network), SIP (Session Initiation Protocol), and IVR (Interactive Voice Response). Finally, it is shown how CRESS diagrams are translated into SDL for automated simulation, validation and implementation