Machine Learning of Generic and User-Focused Summarization
A key problem in text summarization is finding a salience function which
determines what information in the source should be included in the summary.
This paper describes the use of machine learning on a training corpus of
documents and their abstracts to discover salience functions which describe
what combination of features is optimal for a given summarization task. The
method addresses both "generic" and user-focused summaries. Comment: In Proceedings of the Fifteenth National Conference on AI (AAAI-98),
p. 821-82
Animation Motion in NarrativeML
This paper describes qualitative spatial representations relevant to cartoon motion incorporated into NarrativeML, an annotation scheme intended to capture some of the core aspects of narrative. These representations are motivated by linguistic distinctions drawn from cross-linguistic studies. Motion is modeled in terms of transitions in spatial configurations, using an expressive dynamic logic with the manner and path of motion being derived from a few basic primitives. The manner is elaborated to represent properties of motion that bear on character affect. Such representations can potentially be used to support cartoon narrative summarization and question-answering. The paper discusses annotation challenges, and the use of computer vision to help in annotation. Work is underway on annotating a cartoon corpus in terms of this scheme.
Chronoscopes: A theory of underspecified temporal representations
Representing and reasoning about time and events is a fundamental aspect of our cognitive abilities and is intrinsic to how we construe the structure of our personal and historical lives and recall past experiences. This talk describes an abstract device called a Chronoscope that allows a temporal representation (a set of events and their temporal relations) to be viewed based on temporal abstractions. The temporal representation is augmented with abstract events called episodes that stand for discourse segments. The temporal abstractions allow one to collapse temporal relations, or view the representation at different time granularities (hour, day, month, year, etc.), with corresponding changes in event characterization and temporal relations at those granularities. A temporal representation can also be filtered to specify temporal trajectories of particular participants. Trajectories, in turn, can be intersected at various levels of granularity. Chronoscopes can be used to compare temporal representations (e.g., for aggregation, summarization, or evaluation purposes), as well as to help in the visualization of temporal narrative.
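To illustrate how a temporal relation can change when a representation is viewed at a coarser granularity, here is a minimal Python sketch. It is not the Chronoscope device itself: it assumes events are single timestamps rather than intervals or episodes, and the names GRANULARITY_FIELDS, at_granularity, and relation are hypothetical, introduced only for this example.

```python
from datetime import datetime

# Hypothetical mapping: how many leading timestamp fields
# (year, month, day, hour) each granularity keeps.
GRANULARITY_FIELDS = {"year": 1, "month": 2, "day": 3, "hour": 4}


def at_granularity(ts, granularity):
    """Truncate a timestamp to the given granularity (assumed point events)."""
    n = GRANULARITY_FIELDS[granularity]
    return ts.timetuple()[:n]


def relation(a, b, granularity):
    """Return the temporal relation between two events at a granularity."""
    ka = at_granularity(a, granularity)
    kb = at_granularity(b, granularity)
    if ka < kb:
        return "before"
    if ka > kb:
        return "after"
    # The two events can no longer be told apart at this granularity.
    return "same " + granularity


breakfast = datetime(2020, 3, 1, 9)
dinner = datetime(2020, 3, 1, 17)
print(relation(breakfast, dinner, "hour"))  # before
print(relation(breakfast, dinner, "day"))   # same day
```

In this sketch the "before" relation that holds at hour granularity collapses once the two events are viewed at day granularity, which is one simple reading of the collapsing behaviour the abstract describes.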
The Creeping Virtuality of Place
Places are inherently dynamic. They also mediate between entities and events of significance to us, and space. They reflect a network of associations, involving landmarks deemed salient for various reasons. These are all properties assigned to a place by a speaker, and may or may not correspond to the properties assigned to a place by any other speaker. As a result, places have a subjective quality. These properties of dynamicity and subjectivity present interesting challenges when producing mashups that align different data sources. I propose addressing this by assuming that entities, following Hornsby & Egenhofer (2000), have histories, namely sequences of time intervals when they are predicated to exist. Places are entities with spatial properties that include topological relationships to other places, represented in terms of RCC-8 or the 9-intersection calculus, as well as distance and orientation relations. This spatio-temporal integration can draw on existing annotation schemes for space and time in natural language, but it leaves some open issues related to the representation of subjectivity.
Machine Learning of User Profiles: Representational Issues
As more information becomes available electronically, tools for finding
information of interest to users become increasingly important. The goal of
the research described here is to build a system for generating comprehensible
user profiles that accurately capture user interest with minimum user
interaction. The research described here focuses on the importance of a
suitable generalization hierarchy and representation for learning profiles
which are predictively accurate and comprehensible. In our experiments we
evaluated both traditional features based on weighted term vectors and
subject features corresponding to categories which could be drawn from a
thesaurus. Our experiments, conducted in the context of a content-based
profiling system for on-line newspapers on the World Wide Web (the IDD News
Browser), demonstrate the importance of a generalization hierarchy and the
promise of combining natural language processing techniques with machine
learning (ML) to address an information retrieval (IR) problem. Comment: 6 page
Learning to match names across languages
We report on research on matching names in different scripts across languages. We explore two trainable approaches based on comparing pronunciations. The first, a cross-lingual approach, uses an automatic name-matching program that exploits rules based on phonological comparisons of the two languages carried out by humans. The second, a monolingual approach, relies only on automatic comparison of the phonological representations of each pair. Alignments produced by each approach are fed to a machine learning algorithm. Results show that the monolingual approach yields machine-learning-based comparison of person names in English and Chinese with an F-measure of over 97.0.
How to Evaluate your Question Answering System Every Day and Still Get Real Work Done
In this paper, we report on Qaviar, an experimental automated evaluation
system for question answering applications. The goal of our research was to
find an automatically calculated measure that correlates well with human
judges' assessment of answer correctness in the context of question answering
tasks. Qaviar judges the response by computing recall against the stemmed
content words in the human-generated answer key. It counts the answer as correct
if its recall exceeds a given threshold. We determined that the answer
correctness predicted by Qaviar agreed with the human judges 93% to 95% of the time.
41 question-answering systems were ranked by both Qaviar and human assessors,
and these rankings correlated with a Kendall's Tau measure of 0.920, compared
to a correlation of 0.956 between human assessors on the same data. Comment: 6 pages, 3 figures, to appear in Proceedings of the Second
International Conference on Language Resources and Evaluation (LREC 2000
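The recall-against-answer-key judgment described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not Qaviar's implementation: the stopword list, the suffix-stripping crude_stem function (a stand-in for a real stemmer), the function names, and the default threshold of 0.5 are all hypothetical, since the abstract does not specify them.

```python
import re

# Hypothetical, deliberately tiny stopword list; not taken from the paper.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "at", "to",
             "is", "was", "and", "by"}


def crude_stem(word):
    """Stand-in for a real stemmer: strip a few common English suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def content_stems(text):
    """Lowercase, tokenize, drop stopwords, and stem the remaining words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {crude_stem(t) for t in tokens if t not in STOPWORDS}


def recall(response, answer_key):
    """Fraction of the answer key's stemmed content words found in the response."""
    key = content_stems(answer_key)
    if not key:
        return 0.0
    return len(key & content_stems(response)) / len(key)


def judge(response, answer_key, threshold=0.5):
    """Count the response correct if its recall exceeds the (assumed) threshold."""
    return recall(response, answer_key) > threshold


print(judge("The president was Abraham Lincoln", "Abraham Lincoln"))  # True
print(judge("George Washington", "Abraham Lincoln"))                  # False
```

Because only recall is measured, a long response that happens to contain the key words is judged correct regardless of what else it says, so the choice of threshold and of what counts as a content word matters in practice.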
Protein Name Tagging Guidelines: Lessons Learned
Interest in information extraction from the biomedical literature is motivated by the
need to speed up the creation of structured databases representing the latest scientific
knowledge about specific objects, such as proteins and genes. This paper addresses
the lack of a standard definition of the protein name tagging problem. We
describe the lessons learned in developing a set of guidelines and present the first set
of inter-coder results, viewed as an upper bound on system performance. Problems
coders face include: (a) the ambiguity of names that can refer to either genes or
proteins; (b) the difficulty of getting the exact extents of long protein names; and
(c) the complexity of the guidelines. These problems have been addressed in two ways:
(a) defining the tagging targets as protein named entities used in the literature to
describe proteins or protein-associated or -related objects, such as domains, pathways,
expression or genes, and (b) using two types of tags, protein tags and long-form tags,
with the latter being used to optionally extend the boundaries of the protein tag
when the name boundary is difficult to determine. Inter-coder consistency across
three annotators on protein tags on 300 MEDLINE abstracts is 0.868 F-measure.
The guidelines and annotated datasets, along with automatic tools, are available for
research use.