A Framework for Text Processing and Supporting Access to Collections of Digitized Historical Newspapers

By Robert B. Allen, Andrea J. Copeland, Palakorn Achananuparp, and Ki Jung Lee

Abstract

Large quantities of historical newspapers are being digitized and OCR'd. We describe a framework for processing the OCR'd text to identify articles and extract metadata for them. We describe the article schema and provide examples of features that facilitate automatic indexing of the articles. For this processing, we employ lexical semantics, structural models, and community content. Furthermore, we describe visualization and summarization techniques that can be used to present the extracted events.

Topics: text processing, historical newspapers, digitization
Year: 2007
OAI identifier: oai:scholarworks.iupui.edu:1805/4552
Provided by: IUPUIScholarWorks
