327 research outputs found

An Evaluation Exercise for Romanian Word Sense Disambiguation

Author: Chklovski Timothy A. (Timothy Anatolievich), 1977-
Hristea Florentina T.
Mihalcea Rada, 1974-
Nastase Vivi
Tatar Doina
Tufis Dan
Publication date: 01/07/2004

This paper discusses an evaluation exercise for Romanian word sense disambiguation.

UNT Digital Library

Sometimes less is more : Romanian word sense disambiguation revisited

Author: Dinu Georgiana
Kübler Sandra
Publication date: 01/01/2007

Recent approaches to Word Sense Disambiguation (WSD) generally fall into two classes: (1) information-intensive approaches and (2) information-poor approaches. Our hypothesis is that for memory-based learning (MBL), a reduced amount of data is more beneficial than the full range of features used in the past. Our experiments show that MBL combined with a restricted set of features and a feature selection method that minimizes the feature set leads to competitive results, outperforming all systems that participated in the SENSEVAL-3 competition on the Romanian data. Thus, with this specific method, a tightly controlled feature set improves the accuracy of the classifier, reaching 74.0% in the fine-grained and 78.7% in the coarse-grained evaluation.

Hochschulschriftenserver - Universität Frankfurt am Main

Overview of the CLEF 2008 Multilingual Question Answering Track

Author: Alegria Iñaki
Forascu Corina
Forner Pamela
Moreau Nicolas
Osenova Petya
Peñas Anselmo
Prokopidis Prokopis
Rocha Paulo
Sacaleanu Bogdan
Sang Erik Tjong Kim
Sutcliffe Richard
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2009

Crossref

Repositório Comum

Towards a machine-learning architecture for lexical functional grammar parsing

Author: Chrupała Grzegorz
Publication venue: Dublin City University. School of Computing
Publication date: 01/11/2008

Data-driven grammar induction aims at producing wide-coverage grammars of human languages. Initial efforts in this field produced relatively shallow linguistic representations such as phrase-structure trees, which only encode constituent structure. Recent work on inducing deep grammars from treebanks addresses this shortcoming by also recovering non-local dependencies and grammatical relations. My aim is to investigate the issues arising when adapting an existing Lexical Functional Grammar (LFG) induction method to a new language and treebank, and find solutions which will generalize robustly across multiple languages. The research hypothesis is that by exploiting machine-learning algorithms to learn morphological features, lemmatization classes and grammatical functions from treebanks we can reduce the amount of manual specification and improve robustness, accuracy and domain- and language-independence for LFG parsing systems. Function labels can often be relatively straightforwardly mapped to LFG grammatical functions. Learning them reliably permits grammar induction to depend less on language-specific LFG annotation rules. I therefore propose ways to improve acquisition of function labels from treebanks and translate those improvements into better-quality f-structure parsing. In a lexicalized grammatical formalism such as LFG a large amount of syntactically relevant information comes from lexical entries. It is, therefore, important to be able to perform morphological analysis in an accurate and robust way for morphologically rich languages. I propose a fully data-driven supervised method to simultaneously lemmatize and morphologically analyze text and obtain competitive or improved results on a range of typologically diverse languages.

Irish Universities

DCU Online Research Access Service

Multilingual Part-of-Speech Tagging: Two Unsupervised Approaches

Author: Benjamin Snyder
Jacob Eisenstein
Regina Barzilay
Tahira Naseem
Publication venue: 'AI Access Foundation'
Publication date: 15/01/2014

We demonstrate the effectiveness of multilingual learning for unsupervised part-of-speech tagging. The central assumption of our work is that by combining cues from multiple languages, the structure of each becomes more apparent. We consider two ways of applying this intuition to the problem of unsupervised part-of-speech tagging: a model that directly merges tag structures for a pair of languages into a single sequence and a second model which instead incorporates multilingual context using latent variables. Both approaches are formulated as hierarchical Bayesian models, using Markov Chain Monte Carlo sampling techniques for inference. Our results demonstrate that by incorporating multilingual evidence we can achieve impressive performance gains across a range of scenarios. We also found that performance improves steadily as the number of available languages increases.

arXiv.org e-Print Archive

CiteSeerX

Language technologies for a multilingual Europe

Publication venue: Language Science Press
Publication date: 01/04/2020

This volume of the series “Translation and Multilingual Natural Language Processing” includes most of the papers presented at the Workshop “Language Technology for a Multilingual Europe”, held at the University of Hamburg on September 27, 2011 in the framework of the conference GSCL 2011 with the topic “Multilingual Resources and Multilingual Applications”, along with several additional contributions. In addition to an overview article on Machine Translation and two contributions on the European initiatives META-NET and Multilingual Web, the volume includes six full research articles. Our intention with this workshop was to bring together various groups concerned with the umbrella topics of multilingualism and language technology, especially multilingual technologies. This encompassed, on the one hand, representatives from research and development in the field of language technologies, and, on the other hand, users from diverse areas such as industry, administration and funding agencies, among others. The Workshop “Language Technology for a Multilingual Europe” was co-organised by the two GSCL working groups “Text Technology” and “Machine Translation” (http://gscl.info) as well as by META-NET (http://www.meta-net.eu).

Directory of Open Access Books (DOAB)

Abstract syntax as interlingua: Scaling up the grammatical framework from controlled languages to robust pipelines

Author: Angelov Krasimir
Gruzitis N.
Kolachina Prasanth
Ranta Aarne
Publication venue: 'MIT Press - Journals'
Publication date: 01/01/2020

Abstract syntax is an interlingual representation used in compilers. Grammatical Framework (GF) applies the abstract syntax idea to natural languages. The development of GF started in 1998, first as a tool for controlled language implementations, where it has gained an established position in both academic and commercial projects. GF provides grammar resources for over 40 languages, enabling accurate generation and translation, as well as grammar engineering tools and components for mobile and Web applications. On the research side, the focus in the last ten years has been on scaling up GF to wide-coverage language processing. The concept of abstract syntax offers a unified view on many other approaches: Universal Dependencies, WordNets, FrameNets, Construction Grammars, and Abstract Meaning Representations. This makes it possible for GF to utilize data from the other approaches and to build robust pipelines. In return, GF can contribute to data-driven approaches by methods to transfer resources from one language to others, to augment data by rule-based generation, to check the consistency of hand-annotated corpora, and to pipe analyses into high-precision semantic back ends. This article gives an overview of the use of abstract syntax as interlingua through both established and emerging NLP applications involving GF.

Chalmers Research

D7.1. Criteria for evaluation of resources, technology and integration.

Author: Arranz Victoria
Bel Nuria
Caselli Tommaso
Hamon Olivier
Papavassiliou Vassilis
Poch Riera Marc
Quochi Valeria
Rimell Laura
Strik Lievers Francesca
Thurmair Gregor
Toral Antonio

This deliverable defines how evaluation is carried out at each integration cycle in the PANACEA project. As PANACEA aims at producing large-scale resources, evaluation becomes a critical and challenging issue. It is critical because it is important to assess the quality of the results that should be delivered to users. It is challenging because we are exploring rather new areas, and doing so through a technical platform: some new methodologies will have to be explored and old ones adapted.

PUblication MAnagement