12,587 research outputs found
Dating Texts without Explicit Temporal Cues
This paper tackles temporal resolution of documents, such as determining when
a document is about or when it was written, based only on its text. We apply
techniques from information retrieval that predict dates via language models
over a discretized timeline. Unlike most previous works, we rely {\it solely}
on temporal cues implicit in the text. We consider both document-likelihood and
divergence based techniques and several smoothing methods for both of them. Our
best model predicts the mid-point of individuals' lives with a median of 22 and
mean error of 36 years for Wikipedia biographies from 3800 B.C. to the present
day. We also show that this approach works well when training on such
biographies and predicting dates both for non-biographical Wikipedia pages
about specific years (500 B.C. to 2010 A.D.) and for publication dates of short
stories (1798 to 2008). Together, our work shows that, even in absence of
temporal extraction resources, it is possible to achieve remarkable temporal
locality across a diverse set of texts
Recommended from our members
Lexical patterns, features and knowledge resources for coreference resolution in clinical notes
Generation of entity coreference chains provides a means to extract linked narrative events from clinical notes, but despite being a well-researched topic in natural language processing, general- purpose coreference tools perform poorly on clinical texts. This paper presents a knowledge-centric and pattern-based approach to resolving coreference across a wide variety of clinical records comprising discharge summaries, progress notes, pathology, radiology and surgical reports from two corpora (Ontology Development and Information Extraction (ODIE) and i2b2/VA). In addition, a method for generating coreference chains using progressively pruned linked lists is demonstrated that reduces the search space and facilitates evaluation by a number of metrics. Independent evaluation results show an F-measure for each corpus of 79.2% and 87.5%, respectively, which offers performance at least as good as human annotators, greatly increased performance over general- purpose tools, and improvement on previously reported clinical coreference systems. The system uses a number of open-source components that are available to download
- …