
30 research outputs found

Impact Analysis of OCR Quality on Research Tasks in Digital Archives

Author: A Acerbi
A Bingham
B Nicholson
CD Brown
DJ Cohen
E Mittendorf
HI Xie
K Taghva
N Fuhr
S Tanner
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Humanities scholars increasingly rely on digital archives for their research instead of time-consuming visits to physical archives. This shift in research method has the hidden cost of working with digitally processed historical documents: how much trust can a scholar place in noisy representations of source texts? In a series of interviews with historians about their use of digital archives, we found that scholars are aware that optical character recognition (OCR) errors may bias their results. They were, however, unable to quantify this bias or to indicate what information they would need to estimate it. Such an estimate would be important for assessing whether the results are publishable. Based on the interviews and a literature study, we provide a classification of scholarly research tasks that accounts for their susceptibility to specific OCR-induced biases and the data required for uncertainty estimations. We conducted a use case study on a national newspaper archive with example research tasks. From this we learned what data is typically available in digital archives and how it could be used to reduce and/or assess the uncertainty in result sets. We conclude that the current state of knowledge on the users’ side as well as on the tool makers’ and data providers’ side is insufficient and needs to be improved.

Crossref

VU Research Portal

CWI's Institutional Repository

Autotag: A tool for creating structured document collections from printed materials

Author: D. S. Connelly
G. Salton
I. A. Macleod
K. L. Kwok
K. Taghva
M. A. Hearst
M. Fuller
S. C. Deerwester
U. Hahn
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Evaluation of model-based retrieval effectiveness with OCR text

Author: Allen Condit
CALLAN J P
CROFT W. B.
DEERWESTER S. C.
HARMAN D.
Julie Borsack
Kazem Taghva
IMPEDOVO S.
NAGY G.
NARTKER T. A.
RAU L. F.
ROCCHIO J. J.
SALTON
TAGHVA K.
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Adaptive text correction with Web-crawled domain-dependent dictionaries

Author: Angell R. C.
Chelba C.
Christoph Ringlstetter
Clarkson P.
Cucerzan S.
Gaizauskas R.
Grefenstette G.
Hoch R.
Klaus U. Schulz
Oh A. H.
Ostendorf M.
Schmid H.
Stoyan Mihov
Strohmaier C.
Taghva K.
Weigel A.
Williams H.
Witten I. H.
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Recommended from our members

A22 STRATEGIES TO ADDRESS QUALITY IMPROVEMENT: Impact Of Spontaneous Breathing Trials In A Hybrid, Academic And Private Acute Care Hospital

Author: Ashley D
Campo R
Carrington S
Falise J
Ferreira T
Hombreiro P
Hunt I
Jacobskind R
Kwasnik A
Latibeaudiere R
Lupe L
Mendes E S
Miller S A
Rico R
Sarmento B
Scialla T
Taghva S
Publication venue: 'American Thoracic Society'
Publication date: 01/01/2015

University of Miami: Scholarship Miami

Ring-chain transformation of 4-aroyl-5-phenylamino-2,3-dihydrothiophene-2,3-diones: Facile and efficient synthesis of novel pyrrolo[2,3- c

Author: Bondock S
Fatahala SS
Haider N
Hassan Kabirifard
Kabirifard H
Kolar P
Minetto G
Moloudi M
Onal A
Pardis Hafez Taghva
Yoshida N
Publication venue: 'Informa UK Limited'

Crossref

An automatic mark-up approach for structured document retrieval in engineering design

Author: A Lowe
C Friedman
C. A. McMahon
CA McMahon
K Taghva
M Gardoni
M. J. Darlington
P. J. Wild
R Feldman
S Akhtar
S Liu
S. J. Culley
S. Liu
Publication venue: 'Springer Science and Business Media LLC'

Crossref