
    University of Sheffield TREC-8 Q & A System

    The system entered by the University of Sheffield in the question answering track of TREC-8 is the result of coupling two existing technologies: information retrieval (IR) and information extraction (IE). In essence the approach is this: the IR system treats the question as a query and returns a set of top-ranked documents or passages; the IE system uses NLP techniques to parse the question, analyse the top-ranked documents or passages returned by the IR system, and instantiate a query variable in the semantic representation of the question against the semantic representation of the analysed documents or passages. Thus, while the IE system by no means attempts “full text understanding”, this approach is a relatively deep one which attempts to work with meaning representations. Since the information retrieval systems we used were not our own (AT&T and UMass) and were used more or less “off the shelf”, this paper concentrates on describing the modifications made to our existing information extraction system to allow it to participate in the Q & A task.
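
    As a rough illustration of the coupled IR/IE approach described above, the following self-contained Python sketch retrieves passages by word overlap, parses a question into a toy semantic form with an unbound query variable, and instantiates that variable against facts extracted from the retrieved passages. All names, the retrieval scoring, and the toy semantics are illustrative assumptions, not the Sheffield system's actual components.

    # Toy IR step: rank passages by word overlap with the question.
    def retrieve_passages(question, corpus, k=2):
        q_words = set(question.lower().split())
        scored = [(len(q_words & set(p.lower().split())), p) for p in corpus]
        return [p for score, p in sorted(scored, reverse=True)[:k] if score > 0]

    # Toy IE step: map a "Who wrote X?" question to a semantic form with
    # an unbound query variable ?x in the answer slot.
    def parse_question(question):
        if question.lower().startswith("who wrote "):
            title = question[len("who wrote "):].rstrip("?").strip()
            return {"pred": "wrote", "agent": "?x", "work": title}
        return None

    # Toy IE step: extract "AUTHOR wrote WORK" facts from a passage.
    def parse_passage(passage):
        facts = []
        for sent in passage.split("."):
            parts = sent.strip().split(" wrote ")
            if len(parts) == 2:
                facts.append({"pred": "wrote", "agent": parts[0], "work": parts[1]})
        return facts

    # Instantiate the question's query variable against passage facts.
    def answer(question, corpus):
        q_sem = parse_question(question)
        for passage in retrieve_passages(question, corpus):
            for fact in parse_passage(passage):
                if fact["pred"] == q_sem["pred"] and fact["work"] == q_sem["work"]:
                    return fact["agent"]
        return None

    corpus = ["Harper Lee wrote To Kill a Mockingbird.",
              "Moby-Dick is a novel about a whale."]
    print(answer("Who wrote To Kill a Mockingbird?", corpus))  # Harper Lee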

    Compacting the Penn Treebank Grammar

    Treebanks, such as the Penn Treebank (PTB), offer a simple approach to obtaining a broad-coverage grammar: one can simply read the grammar off the parse trees in the treebank. While such a grammar is easy to obtain, a square-root rate of growth of the rule set with corpus size suggests that the derived grammar is far from complete and that much more treebanked text would be required to obtain a complete grammar, if one exists at some limit. However, we offer an alternative explanation in terms of the underspecification of structures within the treebank. This hypothesis is explored by applying an algorithm to compact the derived grammar by eliminating redundant rules (rules whose right-hand sides can be parsed by other rules). The size of the resulting compacted grammar, which is significantly less than that of the full treebank grammar, is shown to approach a limit. However, such a compacted grammar does not yield very good performance figures. A version of the compaction algorithm taking rule probabilities into account is proposed, which is argued to be more linguistically motivated. Combined with simple thresholding, this method can be used to give a 58% reduction in grammar size without significant change in parsing performance, and can produce a 69% reduction with some gain in recall, but a loss in precision.
    Comment: 5 pages, 2 figures
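
    The core of the rule-parsing idea above can be sketched in a few lines of Python: a rule is redundant when its right-hand side can be parsed, from its own left-hand side, using only the remaining rules. This toy top-down search over category sequences (with a guard against left-recursive loops) illustrates the non-probabilistic variant only; it is not the paper's implementation.

    def derives(grammar, cat, symbols):
        """True if `cat` can derive the category sequence `symbols`
        using `grammar`, a collection of (lhs, rhs-tuple) rules."""
        in_progress = set()  # guard against (left-)recursive loops

        def covers(rhs, syms):
            # Match the rule body `rhs` against `syms`, splitting the
            # sequence among the rhs categories.
            if not rhs:
                return not syms
            head, rest = rhs[0], rhs[1:]
            return any(expand(head, syms[:i]) and covers(rest, syms[i:])
                       for i in range(1, len(syms) + 1))

        def expand(c, syms):
            if syms == (c,):          # a category trivially derives itself
                return True
            key = (c, syms)
            if key in in_progress:    # cut loops such as NP -> NP PP
                return False
            in_progress.add(key)
            try:
                return any(covers(rhs, syms)
                           for lhs, rhs in grammar if lhs == c)
            finally:
                in_progress.discard(key)

        return expand(cat, tuple(symbols))

    def compact(grammar):
        """Drop each rule whose RHS the *other* rules can already parse."""
        kept = set(grammar)
        for rule in sorted(grammar, key=lambda r: -len(r[1])):  # longest first
            others = kept - {rule}
            if derives(others, rule[0], rule[1]):
                kept = others  # redundant: its RHS is parseable without it
        return kept

    rules = {("NP", ("DT", "NN")),
             ("NP", ("NP", "PP")),
             ("PP", ("P", "NP")),
             ("NP", ("DT", "NN", "PP"))}  # parseable via the rules above
    print(sorted(compact(rules)))         # the flat NP rule is eliminated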

    Implementation of liquid culture for tuberculosis diagnosis in a remote setting: lessons learned.

    Although sputum smear microscopy is the primary method for tuberculosis (TB) diagnosis in low-resource settings, it has low sensitivity. The World Health Organization recommends the use of liquid culture techniques for TB diagnosis and drug susceptibility testing in low- and middle-income countries. An evaluation of samples from southern Sudan found that culture was able to detect cases of active pulmonary TB and extra-pulmonary TB missed by conventional smear microscopy. However, the long delays involved in obtaining culture results meant that they were usually not clinically useful, and high rates of non-tuberculous mycobacteria isolation made interpretation of results difficult. Improvements in diagnostic capacity and rapid speciation facilities, either on-site or through a local reference laboratory, are crucial.

    Evaluating two methods for Treebank grammar compaction

    Treebanks, such as the Penn Treebank, provide a basis for the automatic creation of broad coverage grammars. In the simplest case, rules can be ‘read off’ the parse annotations of the corpus, producing either a simple or probabilistic context-free grammar. Such grammars, however, can be very large, raising the computational cost of subsequent parsing under the grammar. In this paper, we explore ways in which a treebank grammar can be reduced in size, or ‘compacted’, using two kinds of technique: (i) thresholding of rules by their number of occurrences; and (ii) a method of rule-parsing, which has both probabilistic and non-probabilistic variants. Our results show that by a combined use of these two techniques, a probabilistic context-free grammar can be reduced in size by 62% without any loss in parsing performance, and by 71% to give a gain in recall, but some loss in precision.
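
    The occurrence-count thresholding mentioned in (i) is straightforward to sketch. In the toy below (an illustrative assumption: rule occurrences arrive as one (LHS, RHS) pair per tree node read off the treebank), rules seen fewer than a threshold number of times are dropped, and relative-frequency probabilities are renormalised over the survivors per left-hand side.

    from collections import Counter

    def threshold_grammar(rule_occurrences, threshold=2):
        """Keep rules seen at least `threshold` times, then renormalise
        relative-frequency probabilities over the survivors per LHS."""
        counts = Counter(rule_occurrences)
        kept = {rule: n for rule, n in counts.items() if n >= threshold}
        lhs_totals = Counter()
        for (lhs, _), n in kept.items():
            lhs_totals[lhs] += n
        return {rule: n / lhs_totals[rule[0]] for rule, n in kept.items()}

    # One (LHS, RHS) pair per rule occurrence read off the treebank.
    occurrences = ([("NP", ("DT", "NN"))] * 5
                   + [("NP", ("DT", "JJ", "NN"))] * 3
                   + [("NP", ("DT", "JJ", "JJ", "NN"))])   # a singleton rule
    for rule, p in sorted(threshold_grammar(occurrences).items()):
        print(rule, round(p, 3))   # singleton dropped; NP probs sum to 1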

    Experiments in Structure-Preserving Grammar Compaction

    Structure-preserving grammar compaction (SPC) is a simple CFG compaction technique originally described in (van Genabith et al., 1999a, 1999b). It works by generalising category labels and, in so doing, plugs holes in the grammar. To date the method has been tested on small corpora only. In the present research we apply SPC to a large grammar extracted from the Penn Treebank and examine its effects on treebank grammar rule-set size and on rule accession rates (as an indicator of grammar completeness).

    1 Introduction

    Treebanks and resources compiled from treebanks are potentially very useful in NLP. Grammars extracted from treebanks (so-called treebank grammars; Charniak, 1996) can form the basis of large-coverage NLP systems. Such treebank grammars, however, can suffer from several shortcomings: they commonly feature a large number of flat, highly specific rules that may be rarely used, with ensuing processing costs (load) under the grammar.
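
    A minimal sketch of the label-generalisation step follows, under the assumption that generalisation amounts to stripping PTB-style functional tags and coindices from category labels; the actual SPC mappings in van Genabith et al. may differ.

    def generalise(label):
        """Strip PTB-style functional tags and coindices: NP-SBJ-1 -> NP."""
        return label.split("-")[0].split("=")[0]

    def compact_by_generalisation(rules):
        """Map every category through `generalise`; previously distinct
        rules collapse into one, shrinking the grammar while leaving the
        shape of the trees untouched."""
        return {(generalise(lhs), tuple(generalise(c) for c in rhs))
                for lhs, rhs in rules}

    rules = {("S", ("NP-SBJ", "VP")),
             ("S", ("NP-SBJ-1", "VP")),
             ("NP-TMP", ("NN",)),
             ("NP-SBJ", ("NN",))}
    print(sorted(compact_by_generalisation(rules)))   # 4 rules become 2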

    Rural social organization in Dent County, Missouri

    Also available online. Digitized 2007 AES.

    Use of Transformation-Based Learning in Annotation Pipeline of Igbo, an African Language

    The accuracy of an annotated corpus can be increased through evaluation and revision of the annotation scheme, and through adjudication of the disagreements found. In this paper, we describe a novel process that has been applied to improve a part-of-speech (POS) tagged corpus for the African language Igbo. An inter-annotation agreement (IAA) exercise was undertaken to iteratively revise the tagset used in the creation of the initial tagged corpus, with the aim of refining the tagset and maximizing annotator performance. The tagset revisions and other corrections were efficiently propagated to the overall corpus in a semi-automated manner, using transformation-based learning (TBL) to identify candidates for correction and to propose possible tag corrections. The affected word-tag pairs in the corpus were inspected to ensure a high-quality end product with an accuracy that would not be achieved through a purely automated process. The results show that the tagging accuracy increases from 88% to 94%. The tagged corpus is potentially re-usable for other dialects of the language.
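
    The following toy sketch shows the general shape of using TBL-style transformations to surface correction candidates: a "retag X as Y when the previous tag is Z" rule is learned from a small hand-corrected sample and then used to flag, rather than silently change, matching corpus positions. The Igbo words, tags, and the single rule template are illustrative assumptions, not the paper's data or tagset.

    def find_best_rule(tagged, gold):
        """Pick the (from_tag, to_tag, prev_tag) transformation that fixes
        the most errors on the hand-corrected sample."""
        scores = {}
        for i in range(1, len(tagged)):
            (_, t), (_, g) = tagged[i], gold[i]
            if t != g:
                rule = (t, g, tagged[i - 1][1])
                scores[rule] = scores.get(rule, 0) + 1
        return max(scores, key=scores.get) if scores else None

    def candidates(corpus, rule):
        """Positions in the full corpus where the rule proposes a retag;
        these go to a human adjudicator, not an automatic rewrite."""
        frm, to, prev = rule
        return [(i, corpus[i][0], frm, to)
                for i in range(1, len(corpus))
                if corpus[i][1] == frm and corpus[i - 1][1] == prev]

    sample_tagged = [("o", "PRN"), ("ga", "AUX"), ("eje", "VRB"),
                     ("ahia", "VRB")]          # last tag is wrong
    sample_gold   = [("o", "PRN"), ("ga", "AUX"), ("eje", "VRB"),
                     ("ahia", "NNC")]          # human-corrected
    rule = find_best_rule(sample_tagged, sample_gold)
    print(rule)                                # ('VRB', 'NNC', 'VRB')
    print(candidates(sample_tagged, rule))     # flagged for inspection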

    Lexical Disambiguation of Igbo using Diacritic Restoration

    Properly written texts in Igbo, a low-resource African language, are rich in both orthographic and tonal diacritics. Diacritics are essential in capturing distinctions in the pronunciation and meaning of words, as well as in lexical disambiguation. Unfortunately, most electronic texts in diacritic languages are written without diacritics. This makes diacritic restoration a necessary step in corpus building and language processing tasks for languages with diacritics. In our previous work, we built some n-gram models with simple smoothing techniques based on a closed-world assumption. However, as a classification task, diacritic restoration is well suited to machine learning and should generalise better with it. This paper therefore presents a more standard approach to the task, which involves the application of machine learning algorithms.
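
    As a sketch of this classification framing, the toy example below restores the diacritics of an ambiguous bare form from neighbouring-word features using scikit-learn; the training sentences, the glosses, and the model choice are illustrative assumptions, not the paper's setup.

    from sklearn.feature_extraction import DictVectorizer
    from sklearn.linear_model import LogisticRegression

    def context_features(words, i):
        """Features for position i: the bare form plus its neighbours."""
        return {"form": words[i],
                "prev": words[i - 1] if i > 0 else "<s>",
                "next": words[i + 1] if i + 1 < len(words) else "</s>"}

    # Toy training data: (sentence of bare forms, index, diacritised label).
    train = [(["o", "na", "akwa", "akwa"], 2, "ákwá"),    # roughly 'cry'
             (["nwa", "na", "akwa", "akwa"], 2, "ákwá"),
             (["o", "na", "akwa", "akwa"], 3, "àkwà"),    # roughly 'bed'
             (["akwa", "ya", "di", "ocha"], 0, "àkwà")]

    X = [context_features(ws, i) for ws, i, _ in train]
    y = [label for _, _, label in train]

    vec = DictVectorizer()
    clf = LogisticRegression().fit(vec.fit_transform(X), y)

    test = context_features(["o", "na", "akwa", "akwa"], 2)
    print(clf.predict(vec.transform([test]))[0])   # expected: ákwá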

    Part-of-speech Tagset and Corpus Development for Igbo, an African Language

    This project aims to develop linguistic resources to support NLP research on the Igbo language. The starting point is the development of a new part-of-speech tagging scheme based on the EAGLES tagset guidelines, adapted to incorporate additional language-internal features. The tags are currently being used in a part-of-speech annotation task for the development of a POS-tagged Igbo corpus. The proposed tagset has 59 tags.