Evaluating two methods for Treebank grammar compaction
Treebanks, such as the Penn Treebank, provide a basis for the automatic creation of broad coverage grammars. In the simplest case, rules can be "read off" the parse-annotations of the corpus, producing either a simple or a probabilistic context-free grammar. Such grammars, however, can be very large, presenting problems for the subsequent computational costs of parsing under the grammar.
In this paper, we explore ways by which a treebank grammar can be reduced in size, or "compacted", involving two kinds of technique: (i) thresholding of rules by their number of occurrences; and (ii) a method of rule-parsing, which has both probabilistic and non-probabilistic variants. Our results show that by a combined use of these two techniques, a probabilistic context-free grammar can be reduced in size by 62% without any loss in parsing performance, and by 71% to give a gain in recall, but at some loss in precision.
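The first compaction technique, frequency thresholding, can be sketched in a few lines. The data structures, the example threshold, and the renormalisation step are illustrative assumptions, not details taken from the paper:

```python
from collections import Counter

def threshold_grammar(rule_counts, min_count):
    """Drop rules seen fewer than min_count times in the treebank,
    then renormalise rule probabilities per left-hand-side category."""
    kept = {rule: c for rule, c in rule_counts.items() if c >= min_count}
    lhs_totals = Counter()
    for (lhs, _rhs), c in kept.items():
        lhs_totals[lhs] += c
    return {(lhs, rhs): c / lhs_totals[lhs]
            for (lhs, rhs), c in kept.items()}

# Toy rule counts read off a treebank: (LHS, RHS) -> occurrences
counts = {
    ("NP", ("DT", "NN")): 50,
    ("NP", ("NP", "PP")): 30,
    ("NP", ("DT", "JJ", "JJ", "NN")): 1,  # rare rule, pruned below
}
pcfg = threshold_grammar(counts, min_count=2)
```

Renormalising after pruning keeps the surviving rules a proper probability distribution per left-hand side, so the compacted grammar can still be used for probabilistic parsing.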
University of Sheffield TREC-8 Q & A System
The system entered by the University of Sheffield in the question answering track of TREC-8 is the result of coupling two existing technologies - information retrieval (IR) and information extraction (IE). In essence the approach is this: the IR system treats the question as a query and returns a set of top ranked documents or passages; the IE system uses NLP techniques to parse the question, analyse the top ranked documents or passages returned by the IR system, and instantiate a query variable in the semantic representation of the question against the semantic representation of the analysed documents or passages. Thus, while the IE system by no means attempts "full text understanding", this approach is a relatively deep one that attempts to work with meaning representations.
Since the information retrieval systems we used were not our own (AT&T and UMass) and were used more or less "off the shelf", this paper concentrates on describing the modifications made to our existing information extraction system to allow it to participate in the Q & A task.
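The two-stage pipeline described above can be caricatured in a few lines; the bag-of-words ranking and the tuple-based "semantic representation" below are deliberately naive stand-ins for the real IR and IE components, not the actual Sheffield system:

```python
def rank_passages(question, passages):
    """IR stage: treat the question as a query and rank passages
    by simple word overlap (a stand-in for a real IR engine)."""
    q_words = set(question.lower().split())
    return sorted(passages,
                  key=lambda p: len(q_words & set(p.lower().split())),
                  reverse=True)

def instantiate(query, facts):
    """IE stage: bind the query's open variable (None) against the
    first extracted fact whose fixed slots all match."""
    for fact in facts:
        if all(q is None or q == f for q, f in zip(query, fact)):
            return fact[query.index(None)]
    return None

# "Who wrote Hamlet?" as a predicate with one open slot:
answer = instantiate(("wrote", None, "Hamlet"),
                     [("wrote", "Dickens", "Bleak House"),
                      ("wrote", "Shakespeare", "Hamlet")])  # -> "Shakespeare"
```

The point of the sketch is the division of labour: IR narrows the search space cheaply, and the (much more expensive) IE matching only runs over the top-ranked passages.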
Joining up health and bioinformatics: e-science meets e-health
CLEF (Co-operative Clinical e-Science Framework) is an MRC-sponsored project in the e-Science programme that aims to establish methodologies and a technical infrastructure for the next generation of integrated clinical and bioscience research. It is developing methods for managing and using pseudonymised repositories of long-term patient histories which can be linked to genetic and genomic information, or used to support patient care. CLEF concentrates on removing key barriers to managing such repositories: ethical issues, information capture, integration of disparate sources into coherent "chronicles" of events, user-oriented mechanisms for querying and displaying the information, and compiling the required knowledge resources. This paper describes the overall information flow and technical approach designed to meet these aims within a Grid framework.
The SENSEI Annotated Corpus: Human Summaries of Reader Comment Conversations in On-line News
Researchers are beginning to explore how to generate summaries of extended argumentative conversations in social media, such as those found in reader comments in on-line news. To date, however, there has been little discussion of what these summaries should be like, and a lack of human-authored exemplars, quite likely because writing summaries of this kind of interchange is so difficult. In this paper we propose one type of reader comment summary, the conversation overview summary, that aims to capture the key argumentative content of a reader comment conversation. We describe a method we have developed to support humans in authoring conversation overview summaries and present a publicly available corpus, the first of its kind, of news articles plus comment sets, each multiply annotated, according to our method, with conversation overview summaries.
The SENSEI Overview of Newspaper Readers' Comments
Automatic summarization of reader comments in on-line news is a challenging but clearly useful task. Work to date has produced extractive summaries using well-known techniques from other areas of NLP. But do users really want these, and do they support users in realistic tasks? We specify an alternative summary type for reader comments, based on the notions of issues and viewpoints, and demonstrate our user interface to present it. An evaluation to assess how well summarization systems support users in time-limited tasks (identifying issues and characterizing opinions) gives good results for this prototype.
Automatic Label Generation for News Comment Clusters
We present a supervised approach to automatically labelling topic clusters of reader comments to online news. We use a feature set that includes both features capturing properties local to the cluster and features that capture aspects from the news article and from comments outside the cluster. We evaluate the approach in an automatic and a manual, task-based setting. Both evaluations show the approach to outperform a baseline method, which uses tf*idf to select comment-internal terms for use as topic labels. We illustrate how cluster labels can be used to generate cluster summaries and present two alternative summary formats: a pie chart summary and an abstractive summary.
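The tf*idf baseline mentioned above can be sketched roughly as follows; the exact weighting scheme and tokenisation used in the paper are not specified, so this is an illustrative variant:

```python
import math
from collections import Counter

def tfidf_labels(cluster_comments, all_comments, k=3):
    """Baseline labelling: pick the k comment-internal terms with the
    highest tf*idf, where tf is counted over the cluster and document
    frequency over the whole comment set."""
    n_docs = len(all_comments)
    df = Counter()
    for doc in all_comments:
        df.update(set(doc.lower().split()))
    tf = Counter(w for doc in cluster_comments for w in doc.lower().split())
    scores = {w: c * math.log(n_docs / df[w]) for w, c in tf.items()}
    return [w for w, _ in sorted(scores.items(), key=lambda x: -x[1])[:k]]

cluster = ["economy taxes economy", "economy growth"]
corpus = cluster + ["football match", "football score"]
labels = tfidf_labels(cluster, corpus, k=1)  # -> ["economy"]
```

Because such a baseline can only promote terms that literally occur inside the cluster, it cannot produce the more abstract labels that the supervised approach (drawing on the article and on outside comments) is designed to recover.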
CO2 storage monitoring: leakage detection and measurement in subsurface volumes from 3D seismic data at Sleipner
Demonstrating secure containment is a key plank of CO2 storage monitoring. Here we use the time-lapse 3D seismic surveys at the Sleipner CO2 storage site to assess their ability to provide robust and uniform three-dimensional spatial surveillance of the Storage Complex and provide a quantitative leakage detection tool. We develop a spatial-spectral methodology to determine the actual detection limits of the datasets, which takes into account both the reflectivity of a thin CO2 layer and its lateral extent. Using a tuning relationship to convert reflectivity to layer thickness, preliminary analysis indicates that, at the top of the Utsira reservoir, CO2 accumulations with pore volumes greater than about 3000 m3 should be robustly detectable for layer thicknesses greater than one metre, which will generally be the case. Making the conservative assumption of full CO2 saturation, this pore volume corresponds to a CO2 mass detection threshold of around 2100 tonnes. Within the overburden, at shallower depths, CO2 becomes progressively more reflective, less dense, and correspondingly more detectable, as it passes from the dense phase into a gaseous state. Our preliminary analysis indicates that the detection threshold falls to around 950 tonnes of CO2 at 590 m depth, and to around 315 tonnes at 490 m depth, where repeatability noise levels are particularly low. Detection capability can be equated to the maximum allowable leakage rate consistent with a storage site meeting its greenhouse gas emissions mitigation objective. A number of studies have suggested that leakage rates around 0.01% per year or less would ensure effective mitigation performance. So for a hypothetical large-scale storage project, the detection capability of the Sleipner seismics would far exceed that required to demonstrate the effective mitigation leakage limit.
More generally, it is likely that well-designed 3D seismic monitoring systems will have robust 3D detection capability significantly superior to what is required to prove greenhouse gas mitigation efficacy.
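The quoted figures are mutually consistent, which is easy to check with a little arithmetic. The CO2 density implied below follows from the abstract's own numbers, but the 20 Mt project size used for the leakage-rate comparison is our illustrative assumption, not a value stated in the abstract:

```python
# Implied CO2 density at the top of the Utsira reservoir:
pore_volume_m3 = 3000     # detectable accumulation (pore volume, full saturation)
mass_threshold_t = 2100   # quoted mass detection threshold
implied_density = mass_threshold_t / pore_volume_m3  # -> 0.7 t/m^3
# 0.7 t/m^3 (700 kg/m^3) is a plausible dense-phase CO2 density at
# reservoir depths, consistent with the full-saturation assumption.

# Comparing detection thresholds with an effective-mitigation leakage
# limit, for a hypothetical 20 Mt stored inventory (illustrative figure):
stored_mass_t = 20e6
allowed_rate_t_per_yr = stored_mass_t * 0.0001  # 0.01% per year -> 2000 t/yr
# The shallow thresholds quoted above (950 t and 315 t) sit well below
# one year's allowable leakage, so seepage would be detected long before
# the mitigation limit was breached.
```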
A Graph-Based Approach to Topic Clustering for Online Comments to News
This paper investigates graph-based approaches to labeled topic clustering of reader comments in online news. For graph-based clustering we propose a linear regression model of similarity between the graph nodes (comments), based on similarity features and weights trained using automatically derived training data. To label the clusters, our graph-based approach makes use of DBpedia to abstract topics extracted from the clusters. We evaluate the clustering approach against gold standard data created by human annotators and compare its results against LDA, currently reported as the best method for the news comment clustering task. Evaluation of cluster labelling is set up as a retrieval task, where human annotators are asked to identify the best cluster given a cluster label. Our clustering approach significantly outperforms the LDA baseline, and our evaluation of abstract cluster labels shows that graph-based approaches are a promising method of creating labeled clusters of news comments, although we still find cases where the automatically generated abstractive labels are insufficient to allow humans to correctly associate a label with its cluster.
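In outline, the graph-based clustering step works like this: a linear model scores each comment pair, pairs scoring above a threshold become graph edges, and connected components form the clusters. The single word-overlap feature and the hand-set weight and threshold below are placeholders for the trained feature set and weights described above:

```python
def jaccard(a, b):
    """Word-overlap similarity feature between two comments."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def cluster_comments(comments, features, weights, threshold):
    """Link comment pairs whose linear similarity score clears the
    threshold; return clusters as connected components (union-find)."""
    parent = list(range(len(comments)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i
    for i in range(len(comments)):
        for j in range(i + 1, len(comments)):
            score = sum(w * f(comments[i], comments[j])
                        for w, f in zip(weights, features))
            if score >= threshold:
                parent[find(i)] = find(j)   # merge the two components
    clusters = {}
    for i in range(len(comments)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

comments = ["the tax rise is unfair",
            "a tax rise hits savers",
            "lovely weather today"]
clusters = cluster_comments(comments, [jaccard], [1.0], threshold=0.25)
# -> two clusters: the two tax comments together, the weather comment alone
```

In the paper the feature weights are fitted by linear regression on automatically derived training pairs, rather than set by hand as here.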