523 research outputs found

Natural Language Processing at the School of Information Studies for Africa

Author: Eriksson Gunnar
Fourla Athanassia
Gambäck Björn
Publication date: 01/01/2005

The lack of people trained in computational linguistic methods is a severe obstacle to making the Internet and computers accessible to people all over the world in their own languages. The paper discusses the experiences of designing and teaching an introductory course in Natural Language Processing to graduate computer science students at Addis Ababa University, Ethiopia, in order to initiate the education of computational linguists in the Horn of Africa region.

Crossref

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Development of tag sets for part-of-speech tagging

Author: Atwell ES
Publication venue: Walter de Gruyter GmbH
Publication date: 01/12/2008

This article discusses tag sets used when PoS-tagging a corpus, that is, enriching a corpus by adding a part-of-speech tag to each word. This requires a tag-set, a list of grammatical category labels; a tagging scheme, practical definitions of each tag or label, showing words and contexts where each tag applies; and a tagger, a program for assigning a tag to each word in the corpus, implementing the tag-set and tagging-scheme in a tag-assignment algorithm. We start by reviewing tag-sets developed for English corpora in section 1, since English was the first language studied by corpus linguists. Pioneering corpus linguists thought that their English corpora could be more useful research resources if each word was annotated with a Part-of-Speech label or tag. Traditional English grammars generally provide 8 basic parts of speech, derived from Latin grammar. However, most tag-set developers wanted to capture finer grammatical distinctions, leading to larger tag-sets. PoS-tagged English corpora have been used in a wide range of applications. Section 2 examines criteria used in development of English corpus Part-of-Speech tag sets: mnemonic tag names; underlying linguistic theory; classification by form or function; analysis of idiosyncratic words; categorization problems; tokenisation issues: defining what counts as a word; multi-word lexical items; target user and/or application; availability and/or adaptability of tagger software; adherence to standards; variations in genre, register, or type of language; and degree of delicacy of the tag-set. To illustrate these issues, section 3 outlines a range of examples of tag set developments for different languages, and discusses how these criteria apply. First we consider tag-sets for an online Part-of-Speech tagging service for English; then design of a tag-set for another language from the same broad Indo-European language family, Urdu; then for a non-Indo-European language with a highly inflexional grammar, Arabic; then for a contrasting non-Indo-European language with isolating grammar, Malay. Finally, we present some conclusions in section 4, and references in section 5.

White Rose Research Online

Focus to emphasize tone analysis for prosodic generation

Author: Cercone Nick
Keselj Vlado
Narupiyakul Lalita
Sirinaovakul Booncharoen
Publication venue: Elsevier Ltd.
Publication date: 30/04/2008

Emphasizing the prosody of a sentence at its focus part when producing a speaker’s utterance can improve the recognition rate for hearers and reduce ambiguity. Our objective is to address this challenge by analysing the concept of foci in speech utterances and the relationship between focus, speaker’s intention and prosody. Our investigation is aimed at understanding and modelling how a speaker’s utterances are influenced by the speaker’s intentions. The relationship between speaker’s intentions and focus information is used to consider which parts of the sentence serve as the focus parts. We propose using the Focus to Emphasize Tone (FET) analysis, which includes: (i) generating the constraints for foci, speaker’s intention and prosodic features, (ii) defining the intonation patterns, and (iii) labelling a set of prosodic marks for a sentence. We also design the FET structure to support our analysis and to contain focus, speaker’s intention and prosodic components. An implementation of the system is described and the evaluation results on the CMU Communicator (CMU–COM) dataset are presented.

Elsevier - Publisher Connector

Recent advances in Janus: a speech translation system

Author: Coccaro Noah
Eisele Andreas
Mcnair A.
Rogina Ivica
Sloboda Tilo
Waibel Alex
Woszczyna Monika
Publication date: 02/08/2007

KITopen

Macro Grammars and Holistic Triggering for Efficient Semantic Parsing

Author: Liang Percy
Pasupat Panupong
Zhang Yuchen
Publication date: 01/01/2017

To learn a semantic parser from denotations, a learning algorithm must search over a combinatorially large space of logical forms for ones consistent with the annotated denotations. We propose a new online learning algorithm that searches faster as training progresses. The two key ideas are using macro grammars to cache the abstract patterns of useful logical forms found thus far, and holistic triggering to efficiently retrieve the most relevant patterns based on sentence similarity. On the WikiTableQuestions dataset, we first expand the search space of an existing model to improve the state-of-the-art accuracy from 38.7% to 42.7%, and then use macro grammars and holistic triggering to achieve an 11x speedup and an accuracy of 43.7%. Comment: EMNLP 201

arXiv.org e-Print Archive

Crossref

High efficiency realization for a wide-coverage unification grammar

Author: D. Flickinger
J. Phillips
S. Oepen
S. Shieber
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2005

We give a detailed account of an algorithm for efficient tactical generation from underspecified logical-form semantics, using a wide-coverage grammar and a corpus of real-world target utterances. Some earlier claims about chart realization are critically reviewed and corrected in the light of a series of practical experiments. As well as a set of algorithmic refinements, we present two novel techniques: the integration of subsumption-based local ambiguity factoring, and a procedure to selectively unpack the generation forest according to a probability distribution given by a conditional, discriminative model.

CiteSeerX

Crossref

Sussex Research Online

Example-based machine translation of the Basque language

Author: Groves Declan
Sarasola Kepa
Stroppa Nicolas
Way Andy
Publication date: 01/01/2006

Basque is both a minority and a highly inflected language with free order of sentence constituents. Machine Translation of Basque is thus both a real need and a test bed for MT techniques. In this paper, we present a modular Data-Driven MT system which includes different chunkers as well as chunk aligners which can deal with the free order of sentence constituents of Basque. We conducted Basque to English translation experiments, evaluated on a large corpus (270,000 sentence pairs). The experimental results show that our system significantly outperforms state-of-the-art approaches according to several common automatic evaluation metrics.

CiteSeerX

DCU Online Research Access Service

Edinburgh's Statistical Machine Translation Systems for WMT16

Author: Bojar Ondrej
Haddow Barry
Huck Matthias
Nadejde Maria
Sennrich Rico
Williams Philip
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2016

This paper describes the University of Edinburgh’s phrase-based and syntax-based submissions to the shared translation tasks of the ACL 2016 First Conference on Machine Translation (WMT16). We submitted five phrase-based and five syntax-based systems for the news task, plus one phrase-based system for the biomedical task.

Crossref

Edinburgh Research Explorer

Publikationsserver der RWTH Aachen University

Biblio at Institute of Formal and Applied Linguistics