Evaluating two methods for Treebank grammar compaction
Treebanks, such as the Penn Treebank, provide a basis for the automatic creation of broad coverage grammars. In the simplest case, rules can simply be ‘read off’ the parse-annotations of the corpus, producing either a simple or probabilistic context-free grammar. Such grammars, however, can be very large, presenting problems for the subsequent computational costs of parsing under the grammar.
In this paper, we explore ways by which a treebank grammar can be reduced in size or ‘compacted’, which involve the use of two kinds of technique: (i) thresholding of rules by their number of occurrences; and (ii) a method of rule-parsing, which has both probabilistic and non-probabilistic variants. Our results show that by a combined use of these two techniques, a probabilistic context-free grammar can be reduced in size by 62% without any loss in parsing performance, and by 71% to give a gain in recall, but some loss in precision.
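As a rough illustration of the two starting points named in this abstract (reading rules off parse annotations, then thresholding them by occurrence count), the following is a minimal sketch. The toy trees, the function names, and the choice to skip lexical rules and to renormalise probabilities after thresholding are all assumptions made for the example, not details taken from the paper.

```python
from collections import Counter

def parse_tree(s):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return node()

def read_off_rules(tree, counts):
    """Count phrasal rules LHS -> RHS in a tree (lexical rules are skipped)."""
    label, children = tree
    if all(isinstance(c, str) for c in children):
        return  # preterminal -> word; ignored in this sketch (assumption)
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    counts[(label, rhs)] += 1
    for c in children:
        if isinstance(c, tuple):
            read_off_rules(c, counts)

def threshold(counts, min_count):
    """Keep only rules seen at least min_count times."""
    return Counter({rule: n for rule, n in counts.items() if n >= min_count})

def to_pcfg(counts):
    """Relative-frequency estimate: P(LHS -> RHS) = count(rule) / count(LHS)."""
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

# Hypothetical three-sentence "treebank".
toy_trees = [
    "(S (NP (DT the) (NN dog)) (VP (VBD barked)))",
    "(S (NP (DT a) (NN cat)) (VP (VBD slept)))",
    "(S (NP (DT the) (JJ old) (NN dog)) (VP (VBD slept)))",
]

counts = Counter()
for t in toy_trees:
    read_off_rules(parse_tree(t), counts)

kept = threshold(counts, min_count=2)  # drops NP -> DT JJ NN (seen once)
pcfg = to_pcfg(kept)
```

In this toy data the rule `NP -> DT JJ NN` occurs only once and is removed by a threshold of 2, which shows the trade-off the abstract describes: the grammar shrinks, but low-frequency constructions lose coverage.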
University of Sheffield TREC-8 Q & A System
The system entered by the University of Sheffield in the question answering track of TREC-8 is the result of coupling two existing technologies - information retrieval (IR) and information extraction (IE). In essence the approach is this: the IR system treats the question as a query and returns a set of top ranked documents or passages; the IE system uses NLP techniques to parse the question, analyse the top ranked documents or passages returned by the IR system, and instantiate a query variable in the semantic representation of the question against the semantic representation of the analysed documents or passages. Thus, while the IE system by no means attempts “full text understanding”, this approach is a relatively deep approach which attempts to work with meaning representations.
Since the information retrieval systems we used were not our own (AT&T and UMass) and were used more or less “off the shelf”, this paper concentrates on describing the modifications made to our existing information extraction system to allow it to participate in the Q & A task.
Compacting the Penn Treebank Grammar
Treebanks, such as the Penn Treebank (PTB), offer a simple approach to
obtaining a broad coverage grammar: one can simply read the grammar off the
parse trees in the treebank. While such a grammar is easy to obtain, a
square-root rate of growth of the rule set with corpus size suggests that the
derived grammar is far from complete and that much more treebanked text would
be required to obtain a complete grammar, if one exists at some limit. However,
we offer an alternative explanation in terms of the underspecification of
structures within the treebank. This hypothesis is explored by applying an
algorithm to compact the derived grammar by eliminating redundant rules --
rules whose right hand sides can be parsed by other rules. The size of the
resulting compacted grammar, which is significantly less than that of the full
treebank grammar, is shown to approach a limit. However, such a compacted
grammar does not yield very good performance figures. A version of the
compaction algorithm taking rule probabilities into account is proposed, which
is argued to be more linguistically motivated. Combined with simple
thresholding, this method can be used to give a 58% reduction in grammar size
without significant change in parsing performance, and can produce a 69%
reduction with some gain in recall, but a loss in precision. Comment: 5 pages, 2 figures
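The non-probabilistic compaction idea described above, eliminating rules whose right hand sides can be parsed by other rules, can be sketched as follows. This is one plausible reading under stated assumptions: the rule representation, the bottom-up chart recogniser, the one-pass processing order, and the toy grammar are all illustrative choices, and the sketch assumes there are no empty right hand sides.

```python
def derivable(lhs, seq, rules):
    """True if lhs derives the symbol sequence seq using only `rules`.

    A bottom-up chart recogniser over category symbols; a fixpoint loop
    per cell handles chains of unary rules.
    """
    n = len(seq)
    chart = {(i, i + 1): {seq[i]} for i in range(n)}

    def match(rhs, i, j):
        # Can the symbols in rhs be laid end to end over the span i..j?
        if len(rhs) == 1:
            return rhs[0] in chart.get((i, j), set())
        # Leave at least one position for each remaining rhs symbol.
        for k in range(i + 1, j - len(rhs) + 2):
            if rhs[0] in chart.get((i, k), set()) and match(rhs[1:], k, j):
                return True
        return False

    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = chart.setdefault((i, j), set())
            changed = True
            while changed:
                changed = False
                for rule_lhs, rhs in rules:
                    if rule_lhs not in cell and match(rhs, i, j):
                        cell.add(rule_lhs)
                        changed = True
    return lhs in chart[(0, n)]

def compact(rules):
    """Drop each rule whose RHS the remaining rules can parse to its LHS."""
    kept = list(rules)
    for rule in list(rules):
        others = [r for r in kept if r != rule]
        lhs, rhs = rule
        if derivable(lhs, rhs, others):
            kept = others  # rule is redundant; leave it out
    return kept

# Hypothetical toy grammar: NP -> DT NN PP is redundant because
# NP -> NP PP and NP -> DT NN together cover the same sequence.
toy_rules = [
    ("NP", ("DT", "NN")),
    ("NP", ("NP", "PP")),
    ("PP", ("IN", "NP")),
    ("NP", ("DT", "NN", "PP")),
    ("S", ("NP", "VP")),
]

compacted = compact(toy_rules)
```

Here the flat rule `NP -> DT NN PP` is removed because its right hand side can be re-parsed into the nested structure `[NP [NP DT NN] PP]`. The probabilistic variant mentioned in the abstract would additionally compare rule probabilities before removing a rule; that step is not shown.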
Implementation of liquid culture for tuberculosis diagnosis in a remote setting: lessons learned.
Although sputum smear microscopy is the primary method for tuberculosis (TB) diagnosis in low-resource settings, it has low sensitivity. The World Health Organization recommends the use of liquid culture techniques for TB diagnosis and drug susceptibility testing in low- and middle-income countries. An evaluation of samples from southern Sudan found that culture was able to detect cases of active pulmonary TB and extra-pulmonary TB missed by conventional smear microscopy. However, the long delays involved in obtaining culture results meant that they were usually not clinically useful, and high rates of non-tuberculous mycobacteria isolation made interpretation of results difficult. Improvements in diagnostic capacity and rapid speciation facilities, either on-site or through a local reference laboratory, are crucial.
Experiments in Structure-Preserving Grammar Compaction
Structure preserving grammar compaction (SPC) is a simple CFG compaction technique originally described in (van Genabith et al., 1999a, 1999b). It works by generalising category labels and in so doing plugs holes in the grammar. To date the method has been tested on small corpora only. In the present research we apply SPC to a large grammar extracted from the Penn Treebank and examine its effects on treebank grammar size and on rule accession rates (as an indicator of grammar completeness).
1 Introduction
Treebanks and resources compiled from treebanks are potentially very useful in NLP. Grammars extracted from treebanks, so-called treebank grammars (Charniak, 1996), can form the basis of large coverage NLP systems. Such treebank grammars, however, can suffer from several shortcomings: they commonly feature a large number of flat, highly specific rules that may be rarely used, with ensuing costs for processing (load) under the grammar.
Rural social organization in Dent County, Missouri
Also available online. Digitized 2007 AES