    Evaluating two methods for Treebank grammar compaction

    Treebanks, such as the Penn Treebank, provide a basis for the automatic creation of broad-coverage grammars. In the simplest case, rules can be ‘read off’ the parse annotations of the corpus, producing either a simple or a probabilistic context-free grammar. Such grammars, however, can be very large, which raises the computational cost of parsing with the grammar. In this paper, we explore ways in which a treebank grammar can be reduced in size, or ‘compacted’, using two kinds of technique: (i) thresholding of rules by their number of occurrences; and (ii) a method of rule-parsing, which has both probabilistic and non-probabilistic variants. Our results show that by a combined use of these two techniques, a probabilistic context-free grammar can be reduced in size by 62% without any loss in parsing performance, and by 71% to give a gain in recall, but some loss in precision.
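
    As a rough illustration of technique (i), the sketch below thresholds context-free rules by corpus frequency and renormalises the surviving counts into rule probabilities. The rule inventory, counts, and threshold value are invented for the example; the paper's rule-parsing method (technique (ii)) is not shown.

        from collections import Counter, defaultdict

        # Hypothetical rule counts as 'read off' a treebank: (LHS, RHS) -> frequency.
        rule_counts = Counter({
            ("NP", ("DT", "NN")): 980,
            ("NP", ("NP", "PP")): 410,
            ("NP", ("DT", "JJ", "JJ", "NN", "NN")): 2,   # rare; a thresholding candidate
            ("VP", ("VBD", "NP")): 530,
            ("VP", ("VBD", "NP", "PP", "PP", "PP")): 1,  # rare; a thresholding candidate
        })

        def threshold_grammar(rule_counts, min_count):
            """Keep rules seen at least min_count times, then renormalise the
            surviving counts into conditional probabilities P(RHS | LHS)."""
            kept = {rule: c for rule, c in rule_counts.items() if c >= min_count}
            lhs_totals = defaultdict(int)
            for (lhs, _rhs), c in kept.items():
                lhs_totals[lhs] += c
            return {rule: c / lhs_totals[rule[0]] for rule, c in kept.items()}

        pcfg = threshold_grammar(rule_counts, min_count=5)
        for (lhs, rhs), prob in sorted(pcfg.items()):
            print(f"{lhs} -> {' '.join(rhs)}  [{prob:.3f}]")

    Thresholding of this kind shrinks a treebank grammar substantially because rule frequencies are highly skewed, and renormalising keeps each left-hand side's rule probabilities summing to one.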

    University of Sheffield TREC-8 Q & A System

    The system entered by the University of Sheffield in the question answering track of TREC-8 is the result of coupling two existing technologies – information retrieval (IR) and information extraction (IE). In essence the approach is this: the IR system treats the question as a query and returns a set of top-ranked documents or passages; the IE system uses NLP techniques to parse the question, analyse the top-ranked documents or passages returned by the IR system, and instantiate a query variable in the semantic representation of the question against the semantic representation of the analysed documents or passages. Thus, while the IE system by no means attempts “full text understanding”, this is a relatively deep approach which attempts to work with meaning representations. Since the information retrieval systems we used were not our own (AT&T and UMass) and were used more or less “off the shelf”, this paper concentrates on describing the modifications made to our existing information extraction system to allow it to participate in the Q & A task.
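
    A minimal sketch of this IR-then-IE coupling, assuming a toy term-overlap ranker and a stubbed extraction step: the actual system used the AT&T and UMass retrieval engines and Sheffield's IE system, and its extraction stage matched semantic representations rather than returning passages verbatim.

        def retrieve(question, corpus, k=3):
            """IR stage: treat the question as a bag-of-words query and rank
            passages by term overlap (toy stand-in for the real IR engines)."""
            q_terms = set(question.lower().split())
            scored = [(len(q_terms & set(p.lower().split())), p) for p in corpus]
            return [p for score, p in sorted(scored, reverse=True)[:k] if score > 0]

        def extract_answer(question, passage):
            """IE stage (stub): the real system parses question and passage into
            semantic representations and instantiates the question's query
            variable against them; here the passage itself stands in as the answer."""
            return passage

        corpus = ["TREC-8 ran a question answering track.",
                  "The IE system uses NLP techniques to parse the question."]
        question = "What does the IE system use to parse the question?"
        for passage in retrieve(question, corpus):
            print(extract_answer(question, passage))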

    Joining up health and bioinformatics: e-science meets e-health

    CLEF (Co-operative Clinical e-Science Framework) is an MRC-sponsored project in the e-Science programme that aims to establish methodologies and a technical infrastructure for the next generation of integrated clinical and bioscience research. It is developing methods for managing and using pseudonymised repositories of the long-term patient histories which can be linked to genetic and genomic information or used to support patient care. CLEF concentrates on removing key barriers to managing such repositories: ethical issues, information capture, integration of disparate sources into coherent ‘chronicles’ of events, user-oriented mechanisms for querying and displaying the information, and compiling the required knowledge resources. This paper describes the overall information flow and technical approach designed to meet these aims within a Grid framework.

    The SENSEI Annotated Corpus: Human Summaries of Reader Comment Conversations in On-line News

    Researchers are beginning to explore how to generate summaries of extended argumentative conversations in social media, such as those found in reader comments in on-line news. To date, however, there has been little discussion of what these summaries should be like and a lack of human-authored exemplars, quite likely because writing summaries of this kind of interchange is so difficult. In this paper we propose one type of reader comment summary – the conversation overview summary – that aims to capture the key argumentative content of a reader comment conversation. We describe a method we have developed to support humans in authoring conversation overview summaries and present a publicly available corpus – the first of its kind – of news articles plus comment sets, each multiply annotated, according to our method, with conversation overview summaries.

    The SENSEI Overview of Newspaper Readers’ Comments

    Automatic summarization of reader comments in on-line news is a challenging but clearly useful task. Work to date has produced extractive summaries using well-known techniques from other areas of NLP. But do users really want these, and do they support users in realistic tasks? We specify an alternative summary type for reader comments, based on the notions of issues and viewpoints, and demonstrate our user interface for presenting it. An evaluation assessing how well summarization systems support users in time-limited tasks (identifying issues and characterizing opinions) gives good results for this prototype.

    Automatic Label Generation for News Comment Clusters

    We present a supervised approach to automatically labelling topic clusters of reader comments to online news. We use a feature set that includes both features capturing properties local to the cluster and features that capture aspects from the news article and from comments outside the cluster. We evaluate the approach in an automatic and a manual, task-based setting. Both evaluations show the approach to outperform a baseline method, which uses tf*idf to select comment-internal terms for use as topic labels. We illustrate how cluster labels can be used to generate cluster summaries and present two alternative summary formats: a pie chart summary and an abstractive summary.
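
    The tf*idf baseline mentioned above can be sketched roughly as follows, treating each cluster as a single document and picking its top-weighted terms as the label; the comment clusters here are invented for the example, and the paper's actual system replaces this term selection with a trained, feature-based labeller.

        from sklearn.feature_extraction.text import TfidfVectorizer

        # Toy comment clusters; each cluster becomes one 'document'.
        clusters = [
            ["The council should fix the potholes first.",
             "Potholes ruined my tyres twice this year."],
            ["Ticket prices keep rising while trains get slower.",
             "Another fare hike and still no seats."],
        ]
        docs = [" ".join(comments) for comments in clusters]

        vectorizer = TfidfVectorizer(stop_words="english")
        tfidf = vectorizer.fit_transform(docs)
        terms = vectorizer.get_feature_names_out()

        for i, _ in enumerate(clusters):
            weights = tfidf[i].toarray().ravel()
            top = weights.argsort()[::-1][:3]  # three best comment-internal terms
            print(f"cluster {i} label terms: {[terms[j] for j in top]}")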

    CO2 storage monitoring: leakage detection and measurement in subsurface volumes from 3D seismic data at Sleipner

    Demonstrating secure containment is a key plank of CO2 storage monitoring. Here we use the time-lapse 3D seismic surveys at the Sleipner CO2 storage site to assess their ability to provide robust and uniform three-dimensional spatial surveillance of the Storage Complex and to provide a quantitative leakage detection tool. We develop a spatial-spectral methodology to determine the actual detection limits of the datasets which takes into account both the reflectivity of a thin CO2 layer and its lateral extent. Using a tuning relationship to convert reflectivity to layer thickness, preliminary analysis indicates that, at the top of the Utsira reservoir, CO2 accumulations with pore volumes greater than about 3000 m3 should be robustly detectable for layer thicknesses greater than one metre, which will generally be the case. Making the conservative assumption of full CO2 saturation, this pore volume corresponds to a CO2 mass detection threshold of around 2100 tonnes. Within the overburden, at shallower depths, CO2 becomes progressively more reflective, less dense, and correspondingly more detectable, as it passes from the dense phase into a gaseous state. Our preliminary analysis indicates that the detection threshold falls to around 950 tonnes of CO2 at 590 m depth, and to around 315 tonnes at 490 m depth, where repeatability noise levels are particularly low. Detection capability can be equated to the maximum allowable leakage rate consistent with a storage site meeting its greenhouse gas emissions mitigation objective. A number of studies have suggested that leakage rates of around 0.01% per year or less would ensure effective mitigation performance. So for a hypothetical large-scale storage project, the detection capability of the Sleipner seismics would far exceed that required to demonstrate the effective mitigation leakage limit. More generally, it is likely that well-designed 3D seismic monitoring systems will have robust 3D detection capability significantly superior to what is required to prove greenhouse gas mitigation efficacy.
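
    The quoted mass threshold follows from the pore volume by a one-line conversion. As a sanity check, assuming full saturation and a dense-phase CO2 density of roughly 700 kg/m^3 (a value inferred here from the quoted figures, not stated in the abstract):

        m = \rho_{\mathrm{CO_2}} \, S \, V_{\mathrm{pore}}
          \approx 700\ \mathrm{kg/m^3} \times 1.0 \times 3000\ \mathrm{m^3}
          = 2.1 \times 10^6\ \mathrm{kg} \approx 2100\ \mathrm{tonnes}

    The shallower thresholds are smaller despite the lower gas-phase density because, as the abstract notes, the much higher reflectivity of gaseous CO2 makes thinner, smaller accumulations visible above the noise.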

    A Graph-Based Approach to Topic Clustering for Online Comments to News

    This paper investigates graph-based approaches to labelled topic clustering of reader comments in online news. For graph-based clustering we propose a linear regression model of similarity between the graph nodes (comments), based on similarity features and weights trained using automatically derived training data. To label the clusters, our graph-based approach makes use of DBPedia to abstract topics extracted from the clusters. We evaluate the clustering approach against gold-standard data created by human annotators and compare its results against LDA – currently reported as the best method for the news comment clustering task. Evaluation of cluster labelling is set up as a retrieval task, in which human annotators are asked to identify the best cluster given a cluster label. Our clustering approach significantly outperforms the LDA baseline, and our evaluation of abstract cluster labels shows that graph-based approaches are a promising method of creating labelled clusters of news comments, although we still find cases where the automatically generated abstractive labels are insufficient to allow humans to correctly associate a label with its cluster.
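
    A rough sketch of the clustering step as described: pairwise comment similarities are scored as a weighted, regression-style combination of features, sufficiently similar pairs become graph edges, and connected components form the clusters. The features, weights, and threshold below are invented for illustration, and the DBPedia-based labelling step is not shown.

        import itertools
        import re

        import networkx as nx

        comments = ["Fares are too high.",
                    "Train fares keep going up.",
                    "The new stadium looks great.",
                    "Great stadium design."]

        def tokens(s):
            return set(re.findall(r"\w+", s.lower()))

        def features(a, b):
            """Toy similarity features; the paper trains regression weights
            over a richer feature set on automatically derived data."""
            ta, tb = tokens(a), tokens(b)
            jaccard = len(ta & tb) / max(len(ta | tb), 1)
            length_ratio = min(len(ta), len(tb)) / max(len(ta), len(tb))
            return [jaccard, length_ratio]

        WEIGHTS = [0.9, 0.1]   # hypothetical trained regression weights
        THRESHOLD = 0.15       # hypothetical edge-inclusion threshold

        graph = nx.Graph()
        graph.add_nodes_from(range(len(comments)))
        for i, j in itertools.combinations(range(len(comments)), 2):
            score = sum(w * f for w, f in zip(WEIGHTS, features(comments[i], comments[j])))
            if score >= THRESHOLD:
                graph.add_edge(i, j)

        for cluster in nx.connected_components(graph):
            print([comments[i] for i in sorted(cluster)])

    On this toy input the fare comments and the stadium comments end up in separate components; the real system's regression weights are learned rather than hand-set.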