
    Evaluating two methods for Treebank grammar compaction

    Treebanks, such as the Penn Treebank, provide a basis for the automatic creation of broad coverage grammars. In the simplest case, rules can be ‘read off’ the parse-annotations of the corpus, producing either a simple or probabilistic context-free grammar. Such grammars, however, can be very large, which raises the computational cost of parsing under the grammar. In this paper, we explore ways by which a treebank grammar can be reduced in size or ‘compacted’, using two kinds of technique: (i) thresholding of rules by their number of occurrences; and (ii) a method of rule-parsing, which has both probabilistic and non-probabilistic variants. Our results show that by a combined use of these two techniques, a probabilistic context-free grammar can be reduced in size by 62% without any loss in parsing performance, and by 71% with a gain in recall but some loss in precision.
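
    The first technique, reading rules off parse annotations and thresholding them by occurrence count, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the bracketed tree format, the helper names (`parse_tree`, `read_off_rules`, `threshold`, `to_pcfg`) and the decision to skip lexical rules are all hypothetical, and the rule-parsing method is not covered.

```python
from collections import Counter


def parse_tree(s):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])   # a word (leaf)
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return node()


def read_off_rules(tree, counts):
    """Count each phrasal rule LHS -> RHS occurring in the tree."""
    label, children = tree
    # Assumption: preterminal -> word rules are left out of the grammar.
    if all(isinstance(c, str) for c in children):
        return
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    counts[(label, rhs)] += 1
    for c in children:
        if isinstance(c, tuple):
            read_off_rules(c, counts)


def threshold(counts, min_count):
    """Keep only rules seen at least min_count times."""
    return {rule: n for rule, n in counts.items() if n >= min_count}


def to_pcfg(counts):
    """Relative-frequency estimate: P(LHS -> RHS) = count / total for LHS."""
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}
```

    Calling `to_pcfg` on the output of `threshold` renormalises the probabilities over the surviving rules. Whether the paper renormalises this way is not stated in the abstract.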

    Added value of bleach sedimentation microscopy for diagnosis of tuberculosis: a cost-effectiveness study.

    SETTING: Bleach sedimentation is a method used to increase the diagnostic yield of sputum microscopy for countries with a high prevalence of human immunodeficiency virus (HIV) infection and limited resources. OBJECTIVES: To compare the relative cost-effectiveness of different microscopy approaches in diagnosing tuberculosis (TB) in Kenya. METHODS: An analytical decision tree model including cost and effectiveness measures of 10 combinations of direct (D) and overnight bleach (B) sedimentation microscopy was constructed. Data were drawn from the evaluation of the bleach sedimentation method on two specimens (first on the spot [1] and second morning [2]) from 644 TB suspects in a peripheral health clinic. Incremental cost per smear-positive detected case was measured. Costs included human resources and materials using a micro-costing evaluation. RESULTS: All bleach-based microscopy approaches detected significantly more cases (between 23.3% for B1 and 25.9% for B1+B2) than the conventional D1+D2 approach (21.0%). Cost per tested case ranged from €2.7 for B1 to €4.5 for B1+D2+B2. B1 and B1+B2 were the most cost-effective approaches. D1+B2 and D1+B1 were good alternatives to avoid using approaches exclusively based on bleach sedimentation microscopy. CONCLUSIONS: Among several effective microscopy approaches used, including sodium hypochlorite sedimentation, only some resulted in a limited increase in the laboratory workload and would be most suitable for programmatic implementation.

    Joining up health and bioinformatics: e-science meets e-health

    CLEF (Co-operative Clinical e-Science Framework) is an MRC-sponsored project in the e-Science programme that aims to establish methodologies and a technical infrastructure for the next generation of integrated clinical and bioscience research. It is developing methods for managing and using pseudonymised repositories of the long-term patient histories which can be linked to genetic, genomic information or used to support patient care. CLEF concentrates on removing key barriers to managing such repositories: ethical issues, information capture, integration of disparate sources into coherent ‘chronicles’ of events, user-oriented mechanisms for querying and displaying the information, and compiling the required knowledge resources. This paper describes the overall information flow and technical approach designed to meet these aims within a Grid framework.

    The SENSEI Overview of Newspaper Readers’ Comments

    Automatic summarization of reader comments in on-line news is a challenging but clearly useful task. Work to date has produced extractive summaries using well-known techniques from other areas of NLP. But do users really want these, and do they support users in realistic tasks? We specify an alternative summary type for reader comments, based on the notions of issues and viewpoints, and demonstrate our user interface to present it. An evaluation to assess how well summarization systems support users in time-limited tasks (identifying issues and characterizing opinions) gives good results for this prototype.

    Simultaneous identification of GSTP1 Ile105→Val105 and Ala114→Val114 substitutions using an amplification refractory mutation system-polymerase chain reaction assay: studies in patients with asthma

    BACKGROUND: The glutathione S-transferase (GST) enzyme GSTP1 utilizes byproducts of oxidative stress. We previously showed that alleles of GSTP1 that encode the Ile105→Val105 substitution are associated with the asthma phenotypes of atopy and bronchial hyperresponsiveness (BHR). However, a further polymorphic site (Ala114→Val114) has been identified that results in the following alleles: GSTP1*A (wild-type Ile105→Ala114), GSTP1*B (Val105→Ala114), GSTP1*C (Val105→Val114) and GSTP1*D (Ile105→Val114). METHODS: Because full identification of GSTP1 alleles may identify stronger links with asthma phenotypes, we describe an amplification refractory mutation system (ARMS) assay that allows identification of all genotypes. We explored whether the GSTP1 substitutions influence susceptibility to asthma, atopy and BHR. RESULTS: Among 191 atopic nonasthmatic, atopic asthmatic and nonatopic nonasthmatic individuals, none had the BD, CD, or DD genotypes. GSTP1 BC was significantly associated with reduced risk for atopy (P = 0.031). Compared with AA, trend test analysis identified a significant decrease in the frequency of GSTP1 BC with increasing severity of BHR (P = 0.031). Similarly, the frequency of GSTP1 AA increased with increasing BHR. CONCLUSION: These data suggest that GSTP1*B and possibly GSTP1*C are protective against asthma and related phenotypes.

    Automatic Label Generation for News Comment Clusters

    We present a supervised approach to automatically labelling topic clusters of reader comments to online news. We use a feature set that includes both features capturing properties local to the cluster and features that capture aspects from the news article and from comments outside the cluster. We evaluate the approach in an automatic and a manual, task-based setting. Both evaluations show the approach to outperform a baseline method, which uses tf*idf to select comment-internal terms for use as topic labels. We illustrate how cluster labels can be used to generate cluster summaries and present two alternative summary formats: a pie chart summary and an abstractive summary.
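
    The tf*idf baseline mentioned in this abstract can be illustrated with a small sketch. The abstract says only that tf*idf selects comment-internal terms as labels, so everything else here is an assumption: each cluster is treated as one document, idf is computed across clusters as log(N / df), tokenization is a simple lowercase letter match, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters (an assumed, very simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_labels(clusters, top_k=2):
    """Return the top_k highest-scoring terms for each cluster.

    clusters: list of clusters, each a list of comment strings.
    """
    # Term frequency: raw counts over all comments in a cluster.
    term_counts = [
        Counter(tok for comment in cluster for tok in tokenize(comment))
        for cluster in clusters
    ]
    # Document frequency: number of clusters containing each term.
    df = Counter()
    for counts in term_counts:
        df.update(counts.keys())

    n = len(clusters)
    labels = []
    for counts in term_counts:
        scores = {t: tf * math.log(n / df[t]) for t, tf in counts.items()}
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        labels.append(ranked[:top_k])
    return labels
```

    Terms that occur in every cluster receive an idf of zero under this weighting, so they are never preferred over cluster-specific terms.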


    The SENSEI Annotated Corpus: Human Summaries of Reader Comment Conversations in On-line News

    Researchers are beginning to explore how to generate summaries of extended argumentative conversations in social media, such as those found in reader comments in on-line news. To date, however, there has been little discussion of what these summaries should be like and a lack of human-authored exemplars, quite likely because writing summaries of this kind of interchange is so difficult. In this paper we propose one type of reader comment summary – the conversation overview summary – that aims to capture the key argumentative content of a reader comment conversation. We describe a method we have developed to support humans in authoring conversation overview summaries and present a publicly available corpus – the first of its kind – of news articles plus comment sets, each multiply annotated, according to our method, with conversation overview summaries.

    What's the issue here?: Task-based evaluation of reader comment summarization systems

    Automatic summarization of reader comments in on-line news is an extremely challenging task and a capability for which there is a clear need. Work to date has focussed on producing extractive summaries using well-known techniques imported from other areas of language processing. But are extractive summaries of comments what users really want? Do they support users in performing the sorts of tasks they are likely to want to perform with reader comments? In this paper we address these questions by doing three things. First, we offer a specification of one possible summary type for reader comment, based on an analysis of reader comment in terms of issues and viewpoints. Second, we define a task-based evaluation framework for reader comment summarization that allows summarization systems to be assessed in terms of how well they support users in a time-limited task of identifying issues and characterising opinion on issues in comments. Third, we describe a pilot evaluation in which we used the task-based evaluation framework to evaluate a prototype reader comment clustering and summarization system, demonstrating the viability of the evaluation framework and illustrating the sorts of insight such an evaluation affords.

    CO2 storage monitoring: leakage detection and measurement in subsurface volumes from 3D seismic data at Sleipner

    Demonstrating secure containment is a key plank of CO2 storage monitoring. Here we use the time-lapse 3D seismic surveys at the Sleipner CO2 storage site to assess their ability to provide robust and uniform three-dimensional spatial surveillance of the Storage Complex and provide a quantitative leakage detection tool. We develop a spatial-spectral methodology to determine the actual detection limits of the datasets which takes into account both the reflectivity of a thin CO2 layer and also its lateral extent. Using a tuning relationship to convert reflectivity to layer thickness, preliminary analysis indicates that, at the top of the Utsira reservoir, CO2 accumulations with pore volumes greater than about 3000 m³ should be robustly detectable for layer thicknesses greater than one metre, which will generally be the case. Making the conservative assumption of full CO2 saturation, this pore volume corresponds to a CO2 mass detection threshold of around 2100 tonnes. Within the overburden, at shallower depths, CO2 becomes progressively more reflective, less dense, and correspondingly more detectable, as it passes from the dense phase into a gaseous state. Our preliminary analysis indicates that the detection threshold falls to around 950 tonnes of CO2 at 590 m depth, and to around 315 tonnes at 490 m depth, where repeatability noise levels are particularly low. Detection capability can be equated to the maximum allowable leakage rate consistent with a storage site meeting its greenhouse gas emissions mitigation objective. A number of studies have suggested that leakage rates around 0.01% per year or less would ensure effective mitigation performance. So for a hypothetical large-scale storage project, the detection capability of the Sleipner seismics would far exceed that required to demonstrate the effective mitigation leakage limit. More generally it is likely that well-designed 3D seismic monitoring systems will have robust 3D detection capability significantly superior to what is required to prove greenhouse gas mitigation efficacy.
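
    The arithmetic behind these figures can be checked with a short sketch. The pore volume (3000 m³), mass threshold (2100 tonnes) and leakage rate (0.01% per year) come from the abstract; the function names and the 100 Mt stored mass are hypothetical, chosen only to illustrate the comparison, and the relation mass = pore volume × saturation × density is an assumed simplification.

```python
def implied_density_kg_per_m3(mass_tonnes, pore_volume_m3, saturation=1.0):
    """CO2 density implied by mass = pore volume * saturation * density."""
    return mass_tonnes * 1000.0 / (pore_volume_m3 * saturation)


def allowable_leak_tonnes_per_year(stored_tonnes, rate_percent_per_year=0.01):
    """Annual leakage allowed by a percentage-per-year leakage limit."""
    return stored_tonnes * rate_percent_per_year / 100.0


# Figures quoted in the abstract for the top of the Utsira reservoir,
# under the abstract's assumption of full CO2 saturation.
density = implied_density_kg_per_m3(mass_tonnes=2100, pore_volume_m3=3000)

# Hypothetical project size (NOT from the source): 100 Mt of stored CO2.
stored_tonnes = 100e6
leak_limit = allowable_leak_tonnes_per_year(stored_tonnes)

print(f"implied CO2 density: {density:.0f} kg/m^3")
print(f"allowable leakage at 0.01%/yr: {leak_limit:.0f} t/yr")
print(f"ratio of leak limit to 2100 t threshold: {leak_limit / 2100:.1f}")
```

    Under these assumptions the quoted threshold implies a CO2 density of about 700 kg/m³, and the annual leakage allowed for the hypothetical project is several times the 2100-tonne detection threshold, which is consistent with the abstract's conclusion.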