161 research outputs found
Evaluating two methods for Treebank grammar compaction
Treebanks, such as the Penn Treebank, provide a basis for the automatic creation of broad coverage grammars. In the simplest case, rules can simply be "read off" the parse-annotations of the corpus, producing either a simple or probabilistic context-free grammar. Such grammars, however, can be very large, presenting problems for the subsequent computational costs of parsing under the grammar.
In this paper, we explore ways by which a treebank grammar can be reduced in size or "compacted", which involve the use of two kinds of technique: (i) thresholding of rules by their number of occurrences; and (ii) a method of rule-parsing, which has both probabilistic and non-probabilistic variants. Our results show that by a combined use of these two techniques, a probabilistic context-free grammar can be reduced in size by 62% without any loss in parsing performance, and by 71% to give a gain in recall, but some loss in precision
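Technique (i) above can be illustrated with a minimal sketch. It assumes trees encoded as nested tuples, a toy three-sentence corpus, and hypothetical helper names (`read_off_rules`, `threshold_grammar`); none of these come from the paper, and the rule-parsing technique (ii) is not shown.

```python
from collections import Counter

# Toy corpus (an assumption for illustration): each tree is (label, child, ...),
# and a preterminal is (tag, word).
TREES = [
    ("S", ("NP", ("DT", "the"), ("NN", "dog")), ("VP", ("VBD", "barked"))),
    ("S", ("NP", ("DT", "a"), ("NN", "cat")), ("VP", ("VBD", "slept"))),
    ("S", ("NP", ("NN", "rain")), ("VP", ("VBD", "fell"))),
]

def read_off_rules(tree):
    """Yield (lhs, rhs) phrase-structure rules from one tree, skipping lexical rules."""
    label, *children = tree
    if all(isinstance(child, str) for child in children):
        return  # preterminal over a word: no phrase-structure rule here
    yield (label, tuple(child[0] for child in children))
    for child in children:
        yield from read_off_rules(child)

def threshold_grammar(trees, min_count):
    """Keep rules seen at least min_count times and renormalise per left-hand side."""
    counts = Counter(rule for tree in trees for rule in read_off_rules(tree))
    kept = {rule: n for rule, n in counts.items() if n >= min_count}
    lhs_totals = Counter()
    for (lhs, _), n in kept.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in kept.items()}

grammar = threshold_grammar(TREES, min_count=2)
```

With `min_count=2` the single-occurrence rule NP -> NN is dropped and the remaining NP probability mass is renormalised over NP -> DT NN.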
The Crimean Solar Maximum Year Workshop, selected reports
Problems associated with the transport of energy and acceleration of charged particles in solar flares are considered. Existing theories are compared with observation with a view to either discriminating between rival theories (such as whether hard X-rays are emitted by thermal or nonthermal bremsstrahlung), constraining existing theories (such as deduction of the number of nonthermal electrons present from spectroscopic diagnostics in the soft X-ray part of the spectrum), or suggesting theories (such as attempting to explain the observed spatial structure of microwave emission relative to alpha)
Ephemeral active regions and coronal bright points: A solar maximum Mission 2 guest investigator study
A dominant association of coronal bright points (as seen in He I wavelength 10830) was confirmed with the approach and subsequent disappearance of opposite polarity magnetic network. While coronal bright points do occur with ephemeral regions, this association is a factor of 2 to 4 less than with sites of disappearing magnetic flux. The intensity variations seen in He I wavelength 10830 are intermittent and often rapid, varying over the 3 minute time resolution of the data; their bright point counterparts in the C IV wavelength 1548 and 20 cm wavelength show similar, though not always coincident time variations. Ejecta are associated with about 1/3 of the dark points and are evident in the C IV and H alpha data. These results support the idea that the anti-correlation of X-ray bright points with the solar cycle can be explained by the correlation of these coronal emission structures with sites of cancelling flux, indicating that, in some cases, the process of magnetic flux removal results in the release of energy. That the intensity variations are rapid and variable suggests that this process works intermittently
Joining up health and bioinformatics: e-science meets e-health
CLEF (Co-operative Clinical e-Science Framework) is an MRC sponsored project in the e-Science programme that aims to establish methodologies and a technical infrastructure for the next generation of integrated clinical and bioscience research. It is developing methods for managing and using pseudonymised repositories of the long-term patient histories which can be linked to genetic, genomic information or used to support patient care. CLEF concentrates on removing key barriers to managing such repositories: ethical issues, information capture, integration of disparate sources into coherent "chronicles" of events, user-oriented mechanisms for querying and displaying the information, and compiling the required knowledge resources. This paper describes the overall information flow and technical approach designed to meet these aims within a Grid framework
The SENSEI Overview of Newspaper Readers' Comments
Automatic summarization of reader comments in on-line news is a challenging but clearly useful task. Work to date has produced extractive summaries using well-known techniques from other areas of NLP. But do users really want these, and do they support users in realistic tasks? We specify an alternative summary type for reader comments, based on the notions of issues and viewpoints, and demonstrate our user interface to present it. An evaluation to assess how well summarization systems support users in time-limited tasks (identifying issues and characterizing opinions) gives good results for this prototype
The SENSEI Annotated Corpus: Human Summaries of Reader Comment Conversations in On-line News
Researchers are beginning to explore how to generate summaries of extended argumentative conversations in social media, such as those found in reader comments in on-line news. To date, however, there has been little discussion of what these summaries should be like and a lack of human-authored exemplars, quite likely because writing summaries of this kind of interchange is so difficult. In this paper we propose one type of reader comment summary, the conversation overview summary, that aims to capture the key argumentative content of a reader comment conversation. We describe a method we have developed to support humans in authoring conversation overview summaries and present a publicly available corpus, the first of its kind, of news articles plus comment sets, each multiply annotated, according to our method, with conversation overview summaries
Automatic Label Generation for News Comment Clusters
We present a supervised approach to automatically labelling topic clusters of reader comments to online news. We use a feature set that includes both features capturing properties local to the cluster and features that capture aspects from the news article and from comments outside the cluster. We evaluate the approach in an automatic and a manual, task-based setting. Both evaluations show the approach to outperform a baseline method, which uses tf*idf to select comment-internal terms for use as topic labels. We illustrate how cluster labels can be used to generate cluster summaries and present two alternative summary formats: a pie chart summary and an abstractive summary
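The tf*idf baseline mentioned in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: each cluster is treated as one document, idf is the plain log(N/df), tokenisation is whitespace splitting, and the function name `tfidf_labels` and the toy clusters are hypothetical.

```python
import math
from collections import Counter

# Toy comment clusters (an assumption for illustration).
CLUSTERS = [
    ["tax rises hurt families", "the tax plan is unfair"],
    ["the school budget was cut", "school funding matters"],
]

def tfidf_labels(clusters, k=2):
    """Return the k highest-scoring tf*idf terms of each cluster as its labels."""
    # Term frequencies per cluster, pooling all comments in the cluster.
    docs = [Counter(w for comment in cluster for w in comment.lower().split())
            for cluster in clusters]
    n = len(docs)
    # Document frequency: in how many clusters each term occurs.
    df = Counter(term for doc in docs for term in doc)
    labels = []
    for doc in docs:
        scores = {term: tf * math.log(n / df[term]) for term, tf in doc.items()}
        # Sort by descending score, breaking ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        labels.append(ranked[:k])
    return labels

labels = tfidf_labels(CLUSTERS, k=1)
```

Terms that occur in every cluster, such as "the" here, receive an idf of zero and so never outrank cluster-specific terms.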
A Graph-Based Approach to Topic Clustering for Online Comments to News
This paper investigates graph-based approaches to labeled topic clustering of reader comments in online news. For graph-based clustering we propose a linear regression model of similarity between the graph nodes (comments) based on similarity features and weights trained using automatically derived training data. To label the clusters our graph-based approach makes use of DBPedia to abstract topics extracted from the clusters. We evaluate the clustering approach against gold standard data created by human annotators and compare its results against LDA, currently reported as the best method for the news comment clustering task. Evaluation of cluster labelling is set up as a retrieval task, where human annotators are asked to identify the best cluster given a cluster label. Our clustering approach significantly outperforms the LDA baseline and our evaluation of abstract cluster labels shows that graph-based approaches are a promising method of creating labeled clusters of news comments, although we still find cases where the automatically generated abstractive labels are insufficient to allow humans to correctly associate a label with its cluster
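The idea of scoring comment pairs with a weighted linear combination of similarity features and then clustering the resulting graph can be sketched as below. Everything specific here is an assumption: the two features (word and character-trigram Jaccard), the hand-set weights (the abstract describes trained weights), the edge threshold, and the use of connected components as a simple stand-in for whatever graph clustering algorithm the paper uses. The DBPedia-based labelling step is not shown.

```python
def word_jaccard(a, b):
    """Jaccard overlap of the lower-cased word sets of two comments."""
    x, y = set(a.lower().split()), set(b.lower().split())
    return len(x & y) / len(x | y) if x | y else 0.0

def trigram_jaccard(a, b):
    """Jaccard overlap of the character trigram sets of two comments."""
    x = {a[i:i + 3] for i in range(len(a) - 2)}
    y = {b[i:i + 3] for i in range(len(b) - 2)}
    return len(x & y) / len(x | y) if x | y else 0.0

FEATURES = (word_jaccard, trigram_jaccard)  # hypothetical feature set

def similarity(a, b, weights, bias=0.0):
    """Linear model: bias plus the weighted sum of the similarity features."""
    return bias + sum(w * f(a, b) for w, f in zip(weights, FEATURES))

def cluster(comments, weights, threshold):
    """Link comments whose similarity reaches the threshold; return components."""
    parent = list(range(len(comments)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(comments)):
        for j in range(i + 1, len(comments)):
            if similarity(comments[i], comments[j], weights) >= threshold:
                parent[find(j)] = find(i)
    groups = {}
    for i in range(len(comments)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

COMMENTS = [
    "tax rises hurt families",
    "tax rises hurt pensioners",
    "school budget cut",
    "school budget cut again",
]
clusters = cluster(COMMENTS, weights=(0.5, 0.5), threshold=0.3)
```

In a fuller version the weights would be fitted by regression on labelled comment pairs rather than set by hand.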