A Framework for Text Processing and Supporting Access to Collections of Digitized Historical Newspapers

Allen, Robert B; Copeland, Andrea J.; Achananuparp, Palakorn; Lee, Ki Jung

oai:scholarworks.iupui.edu:1805/4552

Publication date: 1 January 2007
Abstract

Large quantities of historical newspapers are being digitized and OCR'd. We describe a framework for processing the OCR'd text to identify articles and extract metadata for them. We describe the article schema and provide examples of features that facilitate their automatic indexing. For this processing, we employ lexical semantics, structural models, and community content. Furthermore, we describe visualization and summarization techniques that can be used to present the extracted events.

Last updated on 08/11/2016.

This paper was published in IUPUIScholarWorks.