12 research outputs found
Design and evaluation of the linguistic basis of an automatic F-structure annotation algorithm for the Penn-II treebank
In this thesis, we describe the design and evaluation of the linguistic basis of an automatic f-structure annotation algorithm for the Wall Street Journal (WSJ) section of the Penn-II Treebank, which consists of more than 1,000,000 words, tagged for part-of-speech information, in about 50,000 sentences and trees. We discuss the background and some of the main principles of Lexical-Functional Grammar (LFG), the theory of language we use in our application to represent the predicate-argument-modifier structure of a sentence.
We then present the guidelines for the tagging of the Penn-II Treebank, followed by a description of how the linguistics of the Penn-II Treebank relate to LFG. The automatic annotation of such Treebank grammars is difficult as annotation rules often need to identify sub-sequences in the right-hand-sides of (often) flat Treebank rules as they explicitly encode head, complement and modifier relations. The algorithm we have developed is designed to handle these flat grammar rules. We describe the methodology used to encode the linguistic generalisations needed to annotate Treebank resources with LFG f-structure information, which, unlike previous approaches to this problem, scales up to the size of the WSJ section of the Penn-II Treebank.
We then present and assess a number of automatic evaluation methodologies that measure the effectiveness of the techniques we have developed. We first employ a quantitative evaluation, which measures the coverage of our annotation algorithm with respect to rule types and tokens, and calculates the degree of fragmentation of the automatically generated f-structures. Secondly, we present a qualitative evaluation, which measures the quality of the f-structures produced against a manually constructed “gold standard” set of f-structures. Finally, we summarise our work to date, and outline possibilities for further work.
Evaluating automatic F-structure annotation for the Penn-II treebank
Methodologies have been developed (van Genabith et al., 1999a,b; Sadler et al., 2000; Frank, 2000; van Genabith et al., 2001; Frank et al., 2002) for automatically annotating treebank resources with Lexical-Functional Grammar (LFG: Kaplan and Bresnan, 1982) f-structure information. Until recently, however, most of this work on automatic annotation has been applied only to limited data sets, so while it may have shown 'proof of concept', it has not been demonstrated that the techniques developed scale up to much larger data sets (Liakata and Pulman, 2002). More recent work (Cahill et al., 2002a,b) has presented efforts in evolving and scaling techniques established in these previous papers to the full Penn-II Treebank (Marcus et al., 1994). In this paper, we present and assess a number of quantitative and qualitative evaluation methodologies which provide insights into the effectiveness of the techniques developed to derive automatically a set of f-structures for the more than 1,000,000 words and 49,000 sentences of Penn-II.
Treebank-based multilingual unification-grammar development
Broad-coverage, deep unification grammar development is time-consuming and costly. This problem can be exacerbated in multilingual grammar development scenarios. Recently, Cahill et al. (2002) presented a treebank-based methodology to semi-automatically create broad-coverage, deep, unification grammar resources for English. In this paper we present a project which adapts this model to a multilingual grammar development scenario to obtain robust, wide-coverage, probabilistic Lexical-Functional Grammars (LFGs) for English and German via automatic f-structure annotation algorithms based on the Penn-II and TIGER treebanks. We outline the method used to extract a probabilistic LFG from the TIGER treebank and report on the quality of the f-structures produced. We achieve an f-score of 66.23 on the evaluation of 100 random sentences against a manually constructed gold standard.
Parsing with PCFGs and automatic f-structure annotation
The development of large coverage, rich unification- (constraint-) based grammar resources is very time-consuming and expensive, and requires considerable linguistic expertise. In this paper we report initial results on a new methodology that attempts to partially automate the development of substantial parts of large coverage, rich unification- (constraint-) based grammar resources. The method is based on a treebank resource (in our case Penn-II) and an automatic f-structure annotation algorithm that annotates treebank trees with proto-f-structure information. Based on these, we present two parsing architectures: in our pipeline architecture we first extract a PCFG from the treebank following the method of Charniak (1996), use the PCFG to parse new text, automatically annotate the resulting trees with our f-structure annotation algorithm and generate proto-f-structures. By contrast, in the integrated architecture we first automatically annotate the treebank trees with f-structure information and then extract an annotated PCFG (A-PCFG) from the treebank. We then use the A-PCFG to parse new text to generate proto-f-structures. Currently our best parsers achieve more than 81% f-score on the 2400 trees in section 23 of the Penn-II treebank and more than 60% f-score on gold-standard proto-f-structures for 105 randomly selected trees from section 23.
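The PCFG extraction step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common relative-frequency estimate (each rule's count divided by the count of its left-hand side), which may differ in detail from the authors' setup. The two-tree toy treebank and the names `parse_tree`, `rules` and `extract_pcfg` are hypothetical and do not come from the paper.

```python
from collections import Counter

def parse_tree(s):
    """Parse a Penn-style bracketed string into (label, children) tuples."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())       # nested constituent
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return node()

def rules(tree):
    """Yield one (lhs, rhs) pair for every local tree, top-down."""
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    yield (label, rhs)
    for c in children:
        if isinstance(c, tuple):
            yield from rules(c)

def extract_pcfg(trees):
    """Relative-frequency estimate: count(lhs -> rhs) / count(lhs)."""
    rule_counts = Counter(r for t in trees for r in rules(t))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {r: n / lhs_counts[r[0]] for r, n in rule_counts.items()}

# Hypothetical toy treebank, for illustration only.
TOY_TREEBANK = [
    "(S (NP (NNP John)) (VP (VBZ sleeps)))",
    "(S (NP (DT the) (NN dog)) (VP (VBZ sleeps)))",
]
pcfg = extract_pcfg([parse_tree(t) for t in TOY_TREEBANK])
```

Under the same assumption, the integrated architecture would apply this estimation to trees whose node labels already carry f-structure annotations, so that each annotated category counts as a distinct symbol.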
Quasi-logical forms from f-structures for the Penn treebank
In this paper we show how the trees in the Penn treebank can be associated automatically with simple quasi-logical forms. Our approach is based on combining two independent strands of work: the first is the observation that there is a close correspondence between quasi-logical forms and LFG f-structures [van Genabith and Crouch, 1996]; the second is the development of an automatic f-structure annotation algorithm for the Penn treebank [Cahill et al., 2002a; Cahill et al., 2002b]. We compare our approach with that of [Liakata and Pulman, 2002].
Evaluating automatic LFG f-structure annotation for the Penn-II treebank
Lexical-Functional Grammar (LFG: Kaplan and Bresnan, 1982; Bresnan, 2001; Dalrymple, 2001) f-structures represent abstract syntactic information approximating to basic predicate-argument-modifier (dependency) structure or simple logical form (van Genabith and Crouch, 1996; Cahill et al., 2003a). A number of methods have been developed (van Genabith et al., 1999a,b, 2001; Frank, 2000; Sadler et al., 2000; Frank et al., 2003) for automatically annotating treebank resources with LFG f-structure information. Until recently, however, most of this work on automatic f-structure annotation has been applied only to limited data sets, so while it may have shown 'proof of concept', it has not yet demonstrated that the techniques developed scale up to much larger data sets. More recent work (Cahill et al., 2002a,b) has presented efforts in evolving and scaling techniques established in these previous papers to the full Penn-II Treebank (Marcus et al., 1994). In this paper, we present a number of quantitative and qualitative evaluation experiments which provide insights into the effectiveness of the techniques developed to automatically derive a set of f-structures for the more than 1,000,000 words and 49,000 sentences of Penn-II. Currently we obtain 94.85% Precision, 95.4% Recall and 95.09% F-Score for preds-only f-structures against a manually encoded gold standard.
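For readers unfamiliar with the metrics reported here, the following sketch shows how precision, recall and F-score are commonly computed when both the generated and the gold-standard structures are flattened into sets of items. The abstract does not state the representation used; the (relation, head, dependent) triples, the example data and the function name `precision_recall_f` are assumptions made for illustration, and the sketch is not claimed to reproduce the figures above.

```python
def precision_recall_f(gold, test):
    """Compare two collections of items (here: dependency-style triples)."""
    gold, test = set(gold), set(test)
    correct = len(gold & test)                       # items found in both sets
    precision = correct / len(test) if test else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score

# Hypothetical triples for one sentence, for illustration only.
gold_triples = {
    ("subj", "sleep", "john"),
    ("tense", "sleep", "pres"),
    ("adjunct", "sleep", "soundly"),
    ("num", "john", "sg"),
}
test_triples = {
    ("subj", "sleep", "john"),
    ("tense", "sleep", "pres"),
    ("num", "john", "sg"),
    ("obj", "sleep", "soundly"),   # wrong relation
    ("pers", "john", "3"),         # not in the gold standard
}
p, r, f = precision_recall_f(gold_triples, test_triples)
```

Three of the five generated triples are in the gold standard, and three of the four gold triples are recovered, so precision is 3/5 and recall is 3/4.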
Parsing with PCFGs and Automatic F-Structure Annotation
The development of large coverage, rich unification- (constraint-) based grammar resources is very time-consuming and expensive, and requires considerable linguistic expertise. In this paper we report initial results on a new methodology that attempts to partially automate the development of substantial parts of large coverage, rich unification- (constraint-) based grammar resources. The method is based on a treebank resource (in our case Penn-II) and an automatic f-structure annotation algorithm that annotates treebank trees with proto-f-structure information. Based on these, we present two parsing architectures: in our pipeline architecture we first extract a PCFG from the treebank following the method of [Charniak, 1993; Charniak, 1996], use the PCFG to parse new text, automatically annotate the resulting trees with our f-structure annotation algorithm and generate proto-f-structures. By contrast, in the integrated architecture we first automatically annotate the treebank trees with f-structure information and then extract an annotated PCFG (A-PCFG) from the treebank. We then use the A-PCFG to parse new text to generate proto-f-structures. Currently our best parsers achieve more than 81% f-score on the 2400 trees in section 23 of the Penn-II treebank and more than 60% f-score on gold-standard proto-f-structures for 105 randomly selected trees from section 23.
“What effect do safety culture interventions have on health care workers in hospital settings?” A systematic review of the international literature [version 2; peer review: 2 approved, 1 approved with reservations]
Introduction: Interventions designed to improve safety culture in hospitals foster organisational environments that prevent patient safety events and support organisational and staff learning when events do occur. A safety culture supports the required health workforce behaviours and norms that enable safe patient care, and the well-being of patients and staff. The impact of safety culture interventions on staff perceptions of safety culture and patient outcomes has been established. To date, however, there is no common understanding of what staff outcomes are associated with interventions to improve safety culture and what staff outcomes should be measured. Objectives: The study seeks to examine the effect of safety culture interventions on staff in hospital settings, globally. Methods and Analysis: A mixed methods systematic review will be conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Searches will be conducted using the electronic databases of MEDLINE, EMBASE, CINAHL, Health Business Elite, and Scopus. Returns will be screened in Covidence according to inclusion and exclusion criteria. The mixed-methods appraisal tool (MMAT) will be used as a quality assessment tool. The Cochrane Collaboration’s tool for assessing risk of bias in randomised trials and non-randomised studies of interventions will be employed to assess bias. Synthesis will follow the Joanna Briggs Institute methodological guidance for mixed methods reviews, which recommends a convergent approach to synthesis and integration. Discussion: This systematic review will contribute to the international evidence on how interventions to improve safety culture may support staff outcomes and how such interventions may be appropriately designed and implemented.