Decreasing lexical data sparsity in statistical syntactic parsing - experiments with named entities
In this paper we present preliminary experiments that aim to reduce lexical data sparsity in statistical parsing by exploiting information about named entities. Words in the
WSJ corpus are mapped to named entity clusters and a latent variable constituency parser is trained and tested on the transformed corpus. We explore two different methods for
mapping words to entities, and look at the effect of mapping various subsets of named entity types. Thus far, results show no improvement in parsing accuracy over the best baseline score; we identify possible problems and outline suggestions for future directions.
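The word-to-entity mapping described above can be pictured as a simple preprocessing pass over the corpus. A minimal sketch, assuming a toy token-to-entity-type lexicon (the lexicon and function name are illustrative, not the paper's actual implementation):

```python
# Hypothetical sketch: collapse named-entity tokens into entity-cluster
# symbols before parser training. The toy lexicon below stands in for a
# real named-entity annotation of the WSJ corpus.
ENTITY_TYPES = {"IBM": "ORG", "London": "LOC", "Monday": "DATE"}

def map_to_entity_clusters(tokens):
    """Replace each word known to be a named entity with its type symbol,
    leaving all other words unchanged."""
    return [ENTITY_TYPES.get(tok, tok) for tok in tokens]

print(map_to_entity_clusters(["IBM", "opened", "in", "London", "on", "Monday"]))
# -> ['ORG', 'opened', 'in', 'LOC', 'on', 'DATE']
```

The transformed corpus has far fewer distinct word types, which is the intended reduction in lexical sparsity.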
Cross-lingual Word Clusters for Direct Transfer of Linguistic Structure
It has been established that incorporating word cluster features derived from large unlabeled corpora can significantly improve prediction of linguistic structure. While previous work has focused primarily on English, we extend these results to other languages along two dimensions. First, we show that these results hold true for a number of languages across families. Second, and more interestingly, we provide an algorithm for inducing cross-lingual clusters and we show that features derived from these clusters significantly improve the accuracy of cross-lingual structure prediction. Specifically, we show that by augmenting direct-transfer systems with cross-lingual cluster features, the relative error of delexicalized dependency parsers, trained on English treebanks and transferred to foreign languages, can be reduced by up to 13%. When applying the same method to direct transfer of named-entity recognizers, we observe relative improvements of up to 26%.
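The idea of augmenting a delexicalized model with cross-lingual cluster features can be sketched as follows. This is a minimal illustration, assuming a hypothetical cluster table in which translation-equivalent words across languages share a cluster ID (the table, IDs, and feature names are invented for the example, not the paper's actual algorithm):

```python
# Hypothetical sketch: delexicalized (POS-only) features, augmented with a
# cross-lingual cluster ID when the word is covered by the cluster table.
# Words in different languages that behave alike share a cluster, so a model
# trained on English features can fire the same features on Spanish input.
CLUSTERS = {"bank": "c17", "banco": "c17", "river": "c08", "rio": "c08"}

def token_features(word, pos):
    """Build a feature dict with no lexical identity: only the POS tag and,
    if available, the language-neutral cluster ID."""
    feats = {"pos": pos}
    cluster = CLUSTERS.get(word.lower())
    if cluster is not None:
        feats["cluster"] = cluster
    return feats

print(token_features("banco", "NOUN"))
# -> {'pos': 'NOUN', 'cluster': 'c17'}
```

Because the feature dict never contains the word form itself, the trained model transfers directly to any language covered by the cluster induction.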
An integrated architecture for shallow and deep processing
We present an architecture for the integration of shallow and deep NLP components which is aimed at flexible combination of different language technologies for a range of practical current and future applications. In particular, we describe the integration of a high-level HPSG parsing system with different high-performance shallow components, ranging from named entity recognition to chunk parsing and shallow clause recognition. The NLP components enrich a representation of natural language text with layers of new XML meta-information using a single shared data structure, called the text chart. We describe details of the integration methods, and show how information extraction and language checking applications for real-world German text benefit from a deep grammatical analysis.
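The shared "text chart" can be pictured as a standoff-annotation structure: each component writes its own layer of labeled character spans over the same underlying text. A minimal sketch, assuming an invented `TextChart` class (the class, layer names, and spans are illustrative, not the system's actual XML representation):

```python
# Hypothetical sketch of a text-chart-style shared data structure: every
# NLP component adds a standoff annotation layer over the same text, so
# deep and shallow analyses coexist without modifying each other.
class TextChart:
    def __init__(self, text):
        self.text = text
        self.layers = {}  # layer name -> list of (start, end, label) spans

    def add(self, layer, start, end, label):
        """Record one labeled character span in the named layer."""
        self.layers.setdefault(layer, []).append((start, end, label))

chart = TextChart("Angela Merkel besucht Berlin")
chart.add("ner", 0, 13, "PER")    # named entity recognizer output
chart.add("chunk", 0, 13, "NP")   # chunk parser output
print(sorted(chart.layers))
# -> ['chunk', 'ner']
```

A deep parser can then consult the shallow layers (e.g. to pre-bracket named entities) while adding its own analysis as a further layer.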
Towards Universal Semantic Tagging
The paper proposes the task of universal semantic tagging: tagging word tokens with language-neutral, semantically informative tags. We argue that the task, with its independent nature, contributes to better semantic analysis for wide-coverage multilingual text. We present the initial version of the semantic tagset and show that (a) the tags provide semantically fine-grained information, and (b) they are suitable for cross-lingual semantic parsing. An application of the semantic tagging in the Parallel Meaning Bank supports both of these points, as the tags contribute to formal lexical semantics and their cross-lingual projection. As part of the application, we annotate a small corpus with the semantic tags and present a new baseline result for universal semantic tagging.
Comment: 9 pages, International Conference on Computational Semantics (IWCS) …