57,510 research outputs found
Text Line Segmentation of Historical Documents: a Survey
There is a huge amount of historical documents in libraries and in various
National Archives that have not been exploited electronically. Although
automatic reading of complete pages remains, in most cases, a long-term
objective, tasks such as word spotting, text/image alignment, authentication
and extraction of specific fields are in use today. For all these tasks, a
major step is document segmentation into text lines. Because of the low quality
and the complexity of these documents (background noise, artifacts due to
aging, interfering lines),automatic text line segmentation remains an open
research field. The objective of this paper is to present a survey of existing
methods, developed during the last decade, and dedicated to documents of
historical interest.Comment: 25 pages, submitted version, To appear in International Journal on
Document Analysis and Recognition, On line version available at
http://www.springerlink.com/content/k2813176280456k3
Competitive-Ratio Approximation Schemes for Minimizing the Makespan in the Online-List Model
We consider online scheduling on multiple machines for jobs arriving
one-by-one with the objective of minimizing the makespan. For any number of
identical parallel or uniformly related machines, we provide a
competitive-ratio approximation scheme that computes an online algorithm whose
competitive ratio is arbitrarily close to the best possible competitive ratio.
We also determine this value up to any desired accuracy. This is the first
application of competitive-ratio approximation schemes in the online-list
model. The result proves the applicability of the concept in different online
models. We expect that it fosters further research on other online problems
Macroscale multimodal imaging reveals ancient painting production technology and the vogue in Greco-Roman Egypt.
Macroscale multimodal chemical imaging combining hyperspectral diffuse reflectance (400-2500 nm), luminescence (400-1000 nm), and X-ray fluorescence (XRF, 2 to 25 keV) data, is uniquely equipped for noninvasive characterization of heterogeneous complex systems such as paintings. Here we present the first application of multimodal chemical imaging to analyze the production technology of an 1,800-year-old painting and one of the oldest surviving encaustic ("burned in") paintings in the world. Co-registration of the data cubes from these three hyperspectral imaging modalities enabled the comparison of reflectance, luminescence, and XRF spectra at each pixel in the image for the entire painting. By comparing the molecular and elemental spectral signatures at each pixel, this fusion of the data allowed for a more thorough identification and mapping of the painting's constituent organic and inorganic materials, revealing key information on the selection of raw materials, production sequence and the fashion aesthetics and chemical arts practiced in Egypt in the second century AD
Automatic Translating Between Ancient Chinese and Contemporary Chinese with Limited Aligned Corpora
The Chinese language has evolved a lot during the long-term development.
Therefore, native speakers now have trouble in reading sentences written in
ancient Chinese. In this paper, we propose to build an end-to-end neural model
to automatically translate between ancient and contemporary Chinese. However,
the existing ancient-contemporary Chinese parallel corpora are not aligned at
the sentence level and sentence-aligned corpora are limited, which makes it
difficult to train the model. To build the sentence level parallel training
data for the model, we propose an unsupervised algorithm that constructs
sentence-aligned ancient-contemporary pairs by using the fact that the aligned
sentence pair shares many of the tokens. Based on the aligned corpus, we
propose an end-to-end neural model with copying mechanism and local attention
to translate between ancient and contemporary Chinese. Experiments show that
the proposed unsupervised algorithm achieves 99.4% F1 score for sentence
alignment, and the translation model achieves 26.95 BLEU from ancient to
contemporary, and 36.34 BLEU from contemporary to ancient.Comment: Acceptted by NLPCC 201
Image and interpretation using artificial intelligence to read ancient Roman texts
The ink and stylus tablets discovered at the Roman Fort of Vindolanda are a unique resource for scholars of ancient history. However, the stylus tablets have proved particularly difficult to read. This paper describes a system that assists expert papyrologists in the interpretation of the Vindolanda writing tablets. A model-based approach is taken that relies on models of the written form of characters, and statistical modelling of language, to produce plausible interpretations of the documents. Fusion of the contributions from the language, character, and image feature models is achieved by utilizing the GRAVA agent architecture that uses Minimum Description Length as the basis for information fusion across semantic levels. A system is developed that reads in image data and outputs plausible interpretations of the Vindolanda tablets
How Ordinary Elimination Became Gaussian Elimination
Newton, in notes that he would rather not have seen published, described a
process for solving simultaneous equations that later authors applied
specifically to linear equations. This method that Euler did not recommend,
that Legendre called "ordinary," and that Gauss called "common" - is now named
after Gauss: "Gaussian" elimination. Gauss's name became associated with
elimination through the adoption, by professional computers, of a specialized
notation that Gauss devised for his own least squares calculations. The
notation allowed elimination to be viewed as a sequence of arithmetic
operations that were repeatedly optimized for hand computing and eventually
were described by matrices.Comment: 56 pages, 21 figures, 1 tabl
- …