
    Text Line Segmentation of Historical Documents: a Survey

    Libraries and various national archives hold a huge number of historical documents that have not been exploited electronically. Although automatic reading of complete pages remains, in most cases, a long-term objective, tasks such as word spotting, text/image alignment, authentication, and extraction of specific fields are in use today. For all of these tasks, a major step is segmenting the document into text lines. Because of the low quality and complexity of these documents (background noise, artifacts due to aging, interfering lines), automatic text line segmentation remains an open research field. This paper surveys existing methods, developed during the last decade, that are dedicated to documents of historical interest. Comment: 25 pages, submitted version. To appear in International Journal on Document Analysis and Recognition. Online version available at http://www.springerlink.com/content/k2813176280456k3
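
Projection profiles are one of the classic approaches to text line segmentation. The sketch below is a minimal illustration, not any specific surveyed method: it assumes an already binarized page given as a list of rows (1 = ink, 0 = background) and a hypothetical `min_ink` threshold, and it splits the page wherever a row's ink count falls below that threshold.

```python
from typing import List, Optional, Tuple


def horizontal_projection(image: List[List[int]]) -> List[int]:
    """Count ink pixels (value 1) in each row of a binarized page."""
    return [sum(row) for row in image]


def segment_lines(image: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (start_row, end_row) pairs, end exclusive, for each run of rows
    whose ink count is at least min_ink. Rows below the threshold are treated
    as gaps between text lines."""
    profile = horizontal_projection(image)
    lines: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for y, count in enumerate(profile):
        if count >= min_ink and start is None:
            start = y  # entering a text line
        elif count < min_ink and start is not None:
            lines.append((start, y))  # leaving a text line
            start = None
    if start is not None:
        lines.append((start, len(profile)))  # last line touches the bottom edge
    return lines


if __name__ == "__main__":
    page = [
        [0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
        [1, 0, 1, 1],
    ]
    print(segment_lines(page))  # [(1, 3), (4, 5)]
```

On historical documents this simple rule breaks down on skewed, curved, or touching lines, since a single blank row no longer separates neighbouring lines; that is the difficulty the surveyed methods address.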

    Competitive-Ratio Approximation Schemes for Minimizing the Makespan in the Online-List Model

    We consider online scheduling on multiple machines for jobs arriving one by one, with the objective of minimizing the makespan. For any number of identical parallel or uniformly related machines, we provide a competitive-ratio approximation scheme that computes an online algorithm whose competitive ratio is arbitrarily close to the best possible competitive ratio. We also determine this value to any desired accuracy. This is the first application of competitive-ratio approximation schemes in the online-list model, and the result shows that the concept applies across different online models. We expect it to foster further research on other online problems.
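
To make the online-list model concrete: jobs arrive one at a time, and each must be assigned irrevocably before the next one is seen. The sketch below uses Graham's classic greedy list-scheduling rule on identical machines (assign each job to the currently least-loaded machine), which is known to be (2 - 1/m)-competitive. It is a standard baseline for this model, not the approximation scheme described in the paper.

```python
import heapq
from typing import List


def list_schedule(jobs: List[float], m: int) -> float:
    """Greedy online list scheduling on m identical machines.

    Each job is placed, on arrival and irrevocably, on the machine with the
    smallest current load. Returns the resulting makespan (maximum load)."""
    if m <= 0:
        raise ValueError("need at least one machine")
    loads = [0.0] * m  # a list of equal values is already a valid min-heap
    for p in jobs:  # jobs are revealed one by one, in list order
        least = heapq.heappop(loads)
        heapq.heappush(loads, least + p)
    return max(loads)


if __name__ == "__main__":
    # Greedy gives makespan 8 here; an offline optimum of 7 exists
    # ({6}, {4, 2}, {3, 2, 2}), illustrating the price of deciding online.
    print(list_schedule([2, 3, 4, 6, 2, 2], 3))  # 8.0
```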

    Macroscale multimodal imaging reveals ancient painting production technology and the vogue in Greco-Roman Egypt.

    Macroscale multimodal chemical imaging, combining hyperspectral diffuse reflectance (400-2500 nm), luminescence (400-1000 nm), and X-ray fluorescence (XRF, 2 to 25 keV) data, is uniquely equipped for noninvasive characterization of heterogeneous complex systems such as paintings. Here we present the first application of multimodal chemical imaging to analyze the production technology of an 1,800-year-old painting, one of the oldest surviving encaustic ("burned in") paintings in the world. Co-registering the data cubes from these three hyperspectral imaging modalities enabled comparison of the reflectance, luminescence, and XRF spectra at each pixel across the entire painting. By comparing the molecular and elemental spectral signatures at each pixel, this data fusion allowed a more thorough identification and mapping of the painting's constituent organic and inorganic materials, revealing key information on the selection of raw materials, the production sequence, and the fashion aesthetics and chemical arts practiced in Egypt in the second century AD.
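
The abstract does not spell out how the co-registered cubes were fused. As a rough, hypothetical illustration of per-pixel comparison, the sketch below assumes the three cubes are already registered to the same pixel grid (stored as `cube[row][col][band]`), min-max normalizes each pixel's spectrum within each modality, and concatenates the results into one feature vector per pixel. The normalization choice and the function names are assumptions, not the authors' pipeline.

```python
from typing import List

Cube = List[List[List[float]]]  # cube[row][col][band]


def normalize(spectrum: List[float]) -> List[float]:
    """Min-max scale one spectrum to [0, 1] so modalities measured in
    different units (reflectance, luminescence, XRF counts) are comparable."""
    lo, hi = min(spectrum), max(spectrum)
    if hi == lo:
        return [0.0 for _ in spectrum]
    return [(v - lo) / (hi - lo) for v in spectrum]


def fuse_pixels(*cubes: Cube) -> Cube:
    """Concatenate the normalized spectra of co-registered cubes pixel by pixel.

    All cubes must share the same rows x cols grid (i.e. be co-registered);
    the number of bands may differ between modalities."""
    rows, cols = len(cubes[0]), len(cubes[0][0])
    for c in cubes:
        if len(c) != rows or any(len(r) != cols for r in c):
            raise ValueError("cubes are not co-registered to the same grid")
    fused: Cube = []
    for y in range(rows):
        fused_row = []
        for x in range(cols):
            vec: List[float] = []
            for c in cubes:
                vec.extend(normalize(c[y][x]))  # one modality's block
            fused_row.append(vec)
        fused.append(fused_row)
    return fused
```

Each fused vector keeps the modalities in fixed blocks, so a pixel's reflectance, luminescence, and XRF signatures can be inspected side by side or fed to a clustering or classification step.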

    Automatic Translating Between Ancient Chinese and Contemporary Chinese with Limited Aligned Corpora

    The Chinese language has evolved considerably over its long history, so native speakers now have trouble reading sentences written in ancient Chinese. In this paper, we propose to build an end-to-end neural model that automatically translates between ancient and contemporary Chinese. However, existing ancient-contemporary Chinese parallel corpora are not aligned at the sentence level, and sentence-aligned corpora are limited, which makes the model difficult to train. To build sentence-level parallel training data for the model, we propose an unsupervised algorithm that constructs sentence-aligned ancient-contemporary pairs by exploiting the fact that an aligned sentence pair shares many tokens. Based on the aligned corpus, we propose an end-to-end neural model with a copying mechanism and local attention to translate between ancient and contemporary Chinese. Experiments show that the proposed unsupervised algorithm achieves a 99.4% F1 score for sentence alignment, and the translation model achieves 26.95 BLEU from ancient to contemporary and 36.34 BLEU from contemporary to ancient. Comment: Accepted by NLPCC 201
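
The abstract says only that the unsupervised aligner exploits the tokens an aligned pair shares. The sketch below is a hypothetical stand-in, not the paper's algorithm: it scores sentence pairs by character-level Dice overlap (treating each Chinese character as a token) and uses dynamic programming to find the monotonic one-to-one alignment with the highest total score, allowing sentences on either side to be skipped. The sample sentences are a toy example.

```python
from collections import Counter
from typing import List, Optional, Tuple


def overlap(a: str, b: str) -> float:
    """Dice-style score: 2 * shared characters (with multiplicity) / total length."""
    if not a or not b:
        return 0.0
    shared = sum((Counter(a) & Counter(b)).values())
    return 2.0 * shared / (len(a) + len(b))


def align(ancient: List[str], modern: List[str]) -> List[Tuple[int, int]]:
    """Monotonic one-to-one sentence alignment maximizing total overlap.

    best[i][j] is the best total score using ancient[:i] and modern[:j];
    a sentence on either side may be left unaligned (skipped)."""
    n, m = len(ancient), len(modern)
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[str]]] = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Skips are listed first so that, on ties, a zero-overlap pair is
            # skipped rather than aligned (max() keeps the first maximum).
            options = [
                (best[i - 1][j], "skip_ancient"),
                (best[i][j - 1], "skip_modern"),
                (best[i - 1][j - 1] + overlap(ancient[i - 1], modern[j - 1]), "match"),
            ]
            best[i][j], back[i][j] = max(options, key=lambda t: t[0])
    # Trace back from the bottom-right cell to recover the aligned pairs.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        move = back[i][j]
        if move == "match":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == "skip_ancient":
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


if __name__ == "__main__":
    ancient = ["学而时习之", "不亦说乎"]
    modern = ["学习并且时常温习它", "不也是很愉快吗"]
    print(align(ancient, modern))  # [(0, 0), (1, 1)]
```

Character overlap works as a signal here because contemporary translations of ancient Chinese tend to reuse many of the original characters, which is the property the abstract relies on.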

    Image and interpretation: using artificial intelligence to read ancient Roman texts

    The ink and stylus tablets discovered at the Roman fort of Vindolanda are a unique resource for scholars of ancient history. However, the stylus tablets have proved particularly difficult to read. This paper describes a system that assists expert papyrologists in interpreting the Vindolanda writing tablets. It takes a model-based approach that relies on models of the written form of characters, together with statistical modelling of language, to produce plausible interpretations of the documents. The contributions of the language, character, and image feature models are fused using the GRAVA agent architecture, which uses Minimum Description Length as the basis for information fusion across semantic levels. The resulting system reads in image data and outputs plausible interpretations of the Vindolanda tablets.
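
The Minimum Description Length principle prefers the interpretation that can be encoded in the fewest bits overall. The toy sketch below is not GRAVA itself; it only shows how two models can be fused under MDL: a reading's cost is -log2 of its language-model probability plus -log2 of each observed glyph's likelihood under the proposed character. The models, probabilities, and function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def description_length(reading: str,
                       lang_model: Dict[str, float],
                       glyph_likelihoods: Sequence[Dict[str, float]]) -> float:
    """Total code length in bits: -log2 P(reading) from a language model plus
    -log2 P(observed glyph | character) for each character position.
    Returns infinity for readings that either model rules out."""
    p = lang_model.get(reading, 0.0)
    if p <= 0.0 or len(reading) != len(glyph_likelihoods):
        return math.inf
    bits = -math.log2(p)
    for ch, likelihood in zip(reading, glyph_likelihoods):
        q = likelihood.get(ch, 0.0)
        if q <= 0.0:
            return math.inf
        bits += -math.log2(q)
    return bits


def best_reading(candidates: List[str],
                 lang_model: Dict[str, float],
                 glyph_likelihoods: Sequence[Dict[str, float]]) -> str:
    """MDL choice: the candidate with the shortest total description length."""
    return min(candidates,
               key=lambda r: description_length(r, lang_model, glyph_likelihoods))


if __name__ == "__main__":
    # Toy models: the middle glyph looks more like "x" than "v",
    # but the language model strongly prefers "ave".
    lm = {"ave": 0.5, "axe": 0.125}
    glyphs = [{"a": 1.0}, {"v": 0.25, "x": 0.5}, {"e": 1.0}]
    print(description_length("ave", lm, glyphs))  # 3.0
    print(description_length("axe", lm, glyphs))  # 4.0
    print(best_reading(["ave", "axe"], lm, glyphs))  # ave
```

Because both costs are measured in bits, evidence from different levels (words versus glyphs) can be added directly, which is the appeal of MDL as a common currency for fusion.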

    How Ordinary Elimination Became Gaussian Elimination

    Newton, in notes that he would rather not have seen published, described a process for solving simultaneous equations that later authors applied specifically to linear equations. This method, which Euler did not recommend, which Legendre called "ordinary," and which Gauss called "common," is now named after Gauss: "Gaussian" elimination. Gauss's name became associated with elimination through the adoption, by professional computers, of a specialized notation that Gauss devised for his own least-squares calculations. The notation allowed elimination to be viewed as a sequence of arithmetic operations that were repeatedly optimized for hand computing and eventually were described by matrices. Comment: 56 pages, 21 figures, 1 table
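
In modern terms, elimination is a sequence of row operations that reduces a linear system to triangular form, followed by back substitution. The sketch below is a present-day formulation, not Newton's or Gauss's notation: it uses exact `fractions.Fraction` arithmetic and swaps in a row with a nonzero pivot when needed, both conveniences chosen here for illustration.

```python
from fractions import Fraction
from typing import List, Sequence


def gaussian_elimination(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[Fraction]:
    """Solve a x = b by forward elimination and back substitution.

    Uses exact rational arithmetic and swaps in a row with a nonzero pivot
    when needed. Raises ValueError if the matrix is singular."""
    n = len(a)
    # Augmented matrix [A | b] in exact fractions.
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for k in range(n):
        # Find a row at or below k with a nonzero entry in column k.
        pivot = next((r for r in range(k, n) if m[r][k] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        m[k], m[pivot] = m[pivot], m[k]
        # Eliminate column k from every row below the pivot row.
        for r in range(k + 1, n):
            factor = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= factor * m[k][c]
    # Back substitution on the upper-triangular system.
    x = [Fraction(0)] * n
    for k in range(n - 1, -1, -1):
        s = sum(m[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (m[k][n] - s) / m[k][k]
    return x


if __name__ == "__main__":
    # 2x + y = 5, x + 3y = 10
    x = gaussian_elimination([[2, 1], [1, 3]], [5, 10])
    print([str(v) for v in x])  # ['1', '3']
```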