Historical Document Processing: A Survey of Techniques, Tools, and Trends
Historical Document Processing is the process of digitizing written material
from the past for future use by historians and other scholars. It incorporates
algorithms and software tools from various subfields of computer science,
including computer vision, document analysis and recognition, natural language
processing, and machine learning, to convert images of ancient manuscripts,
letters, diaries, and early printed texts automatically into a digital format
usable in data mining and information retrieval systems. Within the past twenty
years, as libraries, museums, and other cultural heritage institutions have
scanned an increasing volume of their historical document archives, the need to
transcribe the full text from these collections has become acute. Since
Historical Document Processing encompasses multiple sub-domains of computer
science, knowledge relevant to its purpose is scattered across numerous
journals and conference proceedings. This paper surveys the major phases,
standard algorithms, tools, and datasets of Historical Document Processing,
discusses the results of a literature review, and suggests directions for
further research.
Comment: 30 pages, 6 figures