Search CORE

11 research outputs found

Formal Language Model for Detecting Ambiguity in SGML

Author: Matzen Richard Walter
Publication date: 01/12/1993
Field of study

Computer Science

SHAREOK repository

Attribute grammars for scalable query processing on XML streams

Author: Koch Christoph
Scherzinger Stefanie
Publication venue: Springer Science and Business Media LLC
Publication date: 14/06/2011

Infoscience - École polytechnique fédérale de Lausanne

Optimization and Parallelization of RegEx Based Information Extraction

Author: Doleschal Johannes
Publication date: 01/01/2021

EPub Bayreuth

Deciding determinism of caterpillar expressions

Author: Salomaa Kai
Yu Sheng
Zan Jinfeng
Publication venue: Elsevier B.V.
Publication date: 01/09/2009

Caterpillar expressions have been introduced by Brüggemann-Klein and Wood for applications in markup languages. Caterpillar expressions provide a convenient formalism for specifying the operation of tree-walking automata on unranked trees. Here we give a formal definition of determinism of caterpillar expressions that is based on the language of instruction sequences defined by the expression. We show that determinism of caterpillar expressions can be decided in polynomial time.

Elsevier - Publisher Connector

Efficient Testing and Matching of Deterministic Regular Expressions

Author: Groz Benoît
Maneth Sebastian
Publication venue: Elsevier BV
Publication date: 01/11/2017

International audience

HAL-CentraleSupelec

Crossref

Edinburgh Research Explorer

HAL-Rennes 1

Computer Science's Digest Volume 1

Author: Castillo Juan C.
Guerrero Jairo
Insuasti Jesús
Publication venue: Editorial Universitaria
Publication date: 01/01/2015

This series is dedicated to the students of the Systems Department, to give them reading material related to computer science in a second language. This book covers Introduction to Computer Science, Computer Communications, Networking, and Web Applications.

Crossref

Universidad de Nariño (Narnia)

Unambiguity of Extended Regular Expressions in SGML Document Grammars

Author: Brüggemann-Klein Anne
Publication venue: Springer-Verlag
Publication date: 01/01/1993

In the Standard Generalized Markup Language (SGML), document types are defined by context-free grammars in an extended Backus-Naur form. The right-hand side of a production is called a content model. Content models are extended regular expressions that have to be unambiguous in the sense that "an element ... that occurs in the document instance must be able to satisfy only one primitive content token without looking ahead in the document instance." In this paper, we present a linear-time algorithm that decides whether a given content model is unambiguous. A similar result has previously been obtained not for content models but for the smaller class of standard regular expressions. It relies on the fact that the languages of marked regular expressions are local, a property that does not hold any more for content models that contain the new &-operator. Therefore, it is necessary to develop new techniques for content models. Besides solving an interesting problem in formal ..

CiteSeerX

Crossref

Woods Hole Open Access Server

Harvard University - DASH

Directory of Open Access Journals

Towards Entity Status

Author: Wolters Maria Klara
Publication venue: Universitäts- und Landesbibliothek Bonn

Discourse entities are an important construct in computational linguistics. They introduce an additional level of representation between referring expressions and that which they refer to: the level of mental representation. In this thesis, I first explore some semiotic and communication-theoretic aspects of discourse entities. Then, I develop the concept of "entity status". Entity status is a meta-variable that combines two dimensions of information about a discourse entity: structural information about the role that the entity plays in the discourse, and management information about how the entity is created, accessed, and updated. Finally, the concept is applied to two case studies: the first focuses on the choice of referring expressions in radio news, while the second looks at the conditions under which a discourse entity can be mentioned as a pronoun.

bonndoc – Der Publikationsserver der Universität Bonn

Open-source resources and standards for Arabic word structure analysis: Fine grained morphological analysis of Arabic text corpora

Author: Sawalha Majdi Shaker Salem
Publication venue: University of Leeds
Publication date: 01/01/2011

Morphological analyzers are preprocessors for text analysis. Many Text Analytics applications need them to perform their tasks. The aim of this thesis is to develop standards, tools and resources that widen the scope of Arabic word structure analysis, particularly morphological analysis, to process Arabic text corpora of different domains, formats and genres, of both vowelized and non-vowelized text. We want to morphologically tag our Arabic Corpus, but evaluation of existing morphological analyzers has highlighted shortcomings and shown that more research is required. Tag assignment is significantly more complex for Arabic than for many languages. The morphological analyzer should add the appropriate linguistic information to each part or morpheme of the word (proclitic, prefix, stem, suffix and enclitic); in effect, instead of a tag for a word, we need a subtag for each part. Very fine-grained distinctions may cause problems for automatic morphosyntactic analysis, particularly for probabilistic taggers which require training data, if some words can change grammatical tag depending on function and context; on the other hand, fine-grained distinctions may actually help to disambiguate other words in the local context. The SALMA – Tagger is a fine-grained morphological analyzer which mainly depends on linguistic information extracted from traditional Arabic grammar books and on prior-knowledge broad-coverage lexical resources: the SALMA – ABCLexicon. More fine-grained tag sets may be more appropriate for some tasks. The SALMA – Tag Set is a theory standard for encoding, which captures long-established traditional fine-grained morphological features of Arabic, in a notation format intended to be compact yet transparent. The SALMA – Tagger has been used to lemmatize the 176-million-word Arabic Internet Corpus. It has been proposed as a language-engineering toolkit for Arabic lexicography and for phonetically annotating the Qur’an with syllable and primary stress information, as well as fine-grained morphological tagging.

White Rose E-theses Online

OpenGrey Repository