2,110 research outputs found
Annotation and representation of a diachronic corpus of Spanish
In this article we describe two different strategies for the automatic tagging of a Spanish diachronic corpus involving the adaptation of existing NLP tools developed for modern Spanish. In the initial approach we follow a state-of-the-art strategy, which consists on standardizing the spelling and the lexicon. This approach boosts POS-tagging accuracy to 90, which represents a raw improvement of over 20% with respect to the results obtained without any pre-processing. In order to enable non-expert users in NLP to use this new resource, the corpus has been integrated into IAC (Corpora Interface Access). We discuss the shortcomings of the initial approach and propose a new one, which does not consist in adapting the source texts to the tagger, but rather in modifying the tagger for the direct treatment of the old variants.This second strategy addresses some important shortcomings in the previous approach and is likely to be useful not only in the creation of diachronic linguistic resources but also for the treatment of dialectal or non-standard variants of
synchronic languages as well.Peer ReviewedPostprint (published version
Improving Data Driven Part-of-Speech Tagging by Morphologic Knowledge Induction
We present a Markov part-of-speech tagger for which the P (w|t) emission probabilities of word w given tag t are replaced by a linear interpolation of tag emission probabilities given a list of representations of w. As word representations, string su#xes of w are cut o# at the local maxima of the Normalized Backward Successor Variety. This procedure allows for the derivation of linguistically meaningful string suffixes that may relate to certain POS labels. Since no linguistic knowledge is needed, the procedure is language independent. Basic Markov model part-of-speech taggers are significantly outperformed by our model
An automatic part-of-speech tagger for Middle Low German
Syntactically annotated corpora are highly important for enabling large-scale diachronic and diatopic language research. Such corpora have recently been developed for a variety of historical languages, or are still under development. One of those under development is the fully tagged and parsed Corpus of Historical Low German (CHLG), which is aimed at facilitating research into the highly under-researched diachronic syntax of Low German. The present paper reports on a crucial step in creating the corpus, viz. the creation of a part-of-speech tagger for Middle Low German (MLG). Having been transmitted in several non-standardised written varieties, MLG poses a challenge to standard POS taggers, which usually rely on normalized spelling. We outline the major issues faced in the creation of the tagger and present our solutions to them
An Experimental Digital Library Platform - A Demonstrator Prototype for the DigLib Project at SICS
Within the framework of the Digital Library project at SICS, this thesis describes the implementation of a demonstrator prototype of a digital library (DigLib); an experimental platform integrating several functions in one common interface. It includes descriptions of the structure and formats of the digital library collection, the tailoring of the search engine Dienst, the construction of a keyword extraction tool, and the design and development of the interface. The platform was realised through sicsDAIS, an agent interaction and presentation system, and is to be used for testing and evaluating various tools for information seeking. The platform supports various user interaction strategies by providing: search in bibliographic records (Dienst); an index of keywords (the Keyword Extraction Function (KEF)); and browsing through the hierarchical structure of the collection. KEF was developed for this thesis work, and extracts and presents keywords from Swedish documents. Although based on a comparatively simple algorithm, KEF contributes by supplying a long-felt want in the area of Information Retrieval. Evaluations of the tasks and the interface still remain to be done, but the digital library is very much up and running. By implementing the platform through sicsDAIS, DigLib can deploy additional tools and search engines without interfering with already running modules. If wanted, agents providing other services than SICS can supply, can be plugged in
Marrying Universal Dependencies and Universal Morphology
The Universal Dependencies (UD) and Universal Morphology (UniMorph) projects
each present schemata for annotating the morphosyntactic details of language.
Each project also provides corpora of annotated text in many languages - UD at
the token level and UniMorph at the type level. As each corpus is built by
different annotators, language-specific decisions hinder the goal of universal
schemata. With compatibility of tags, each project's annotations could be used
to validate the other's. Additionally, the availability of both type- and
token-level resources would be a boon to tasks such as parsing and homograph
disambiguation. To ease this interoperability, we present a deterministic
mapping from Universal Dependencies v2 features into the UniMorph schema. We
validate our approach by lookup in the UniMorph corpora and find a
macro-average of 64.13% recall. We also note incompatibilities due to paucity
of data on either side. Finally, we present a critical evaluation of the
foundations, strengths, and weaknesses of the two annotation projects.Comment: UDW1
Towards Universal Semantic Tagging
The paper proposes the task of universal semantic tagging---tagging word
tokens with language-neutral, semantically informative tags. We argue that the
task, with its independent nature, contributes to better semantic analysis for
wide-coverage multilingual text. We present the initial version of the semantic
tagset and show that (a) the tags provide semantically fine-grained
information, and (b) they are suitable for cross-lingual semantic parsing. An
application of the semantic tagging in the Parallel Meaning Bank supports both
of these points as the tags contribute to formal lexical semantics and their
cross-lingual projection. As a part of the application, we annotate a small
corpus with the semantic tags and present new baseline result for universal
semantic tagging.Comment: 9 pages, International Conference on Computational Semantics (IWCS
Measurement of pi^0 photoproduction on the proton at MAMI C
Differential cross sections for the gamma p -> pi^0 p reaction have been
measured with the A2 tagged-photon facilities at the Mainz Microtron, MAMI C,
up to the center-of-mass energy W=1.9 GeV. The new results, obtained with a
fine energy and angular binning, increase the existing quantity of pi^0
photoproduction data by ~47%. Owing to the unprecedented statistical accuracy
and the full angular coverage, the results are sensitive to high partial-wave
amplitudes. This is demonstrated by the decomposition of the differential cross
sections in terms of Legendre polynomials and by further comparison to model
predictions. A new solution of the SAID partial-wave analysis obtained after
adding the new data into the fit is presented.Comment: 13 pages, 12 figures, 1 tabl
SDN Access Control for the Masses
The evolution of Software-Defined Networking (SDN) has so far been
predominantly geared towards defining and refining the abstractions on the
forwarding and control planes. However, despite a maturing south-bound
interface and a range of proposed network operating systems, the network
management application layer is yet to be specified and standardized. It has
currently poorly defined access control mechanisms that could be exposed to
network applications. Available mechanisms allow only rudimentary control and
lack procedures to partition resource access across multiple dimensions.
We address this by extending the SDN north-bound interface to provide control
over shared resources to key stakeholders of network infrastructure: network
providers, operators and application developers. We introduce a taxonomy of SDN
access models, describe a comprehensive design for SDN access control and
implement the proposed solution as an extension of the ONOS network controller
intent framework
- …