
    Learning to Read by Spelling: Towards Unsupervised Text Recognition

    This work presents a method for visual text recognition without using any paired supervisory data. We formulate the text recognition task as one of aligning the conditional distribution of strings predicted from given text images with lexically valid strings sampled from target corpora. This enables fully automated, unsupervised learning from just line-level text images and unpaired text-string samples, obviating the need for large aligned datasets. We present a detailed analysis of various aspects of the proposed method, namely: (1) the impact of the length of training sequences on convergence, (2) the relation between character frequencies and the order in which they are learnt, (3) the generalisation ability of our recognition network to inputs of arbitrary lengths, and (4) the impact of varying the text corpus on recognition accuracy. Finally, we demonstrate excellent text recognition accuracy on both synthetically generated text images and scanned images of real printed books, using no labelled training examples.
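    The following is a minimal, hypothetical sketch of the distribution-alignment idea described in the abstract, assuming a PyTorch setup: a recognizer maps line images to per-character probability distributions, and an adversarial critic pushes those predictions towards strings sampled from an unpaired corpus. The architecture, module names, vocabulary size, and training loop are illustrative assumptions, not the authors' implementation.

```python
# Sketch only (assumption: PyTorch). Aligns the distribution of strings predicted
# from images with strings sampled from an unpaired text corpus via an adversarial
# critic. Shapes, sizes, and names are illustrative placeholders.
import torch
import torch.nn as nn

VOCAB = 27          # e.g. a-z plus a blank/space symbol (assumed)
MAX_LEN = 24        # fixed maximum string length for this sketch

class Recognizer(nn.Module):
    """Maps a line image to a sequence of per-character distributions."""
    def __init__(self):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, MAX_LEN)),
        )
        self.head = nn.Conv2d(32, VOCAB, 1)

    def forward(self, images):                      # images: (B, 1, H, W)
        feats = self.head(self.cnn(images))          # (B, VOCAB, 1, MAX_LEN)
        logits = feats.squeeze(2).transpose(1, 2)    # (B, MAX_LEN, VOCAB)
        return torch.softmax(logits, dim=-1)         # soft character distributions

class Critic(nn.Module):
    """Scores whether a (soft) character sequence looks like real corpus text."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(MAX_LEN * VOCAB, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, seqs):                         # seqs: (B, MAX_LEN, VOCAB)
        return self.net(seqs.flatten(1))

recognizer, critic = Recognizer(), Critic()
opt_r = torch.optim.Adam(recognizer.parameters(), lr=1e-4)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(images, corpus_strings_onehot):
    # 1) Critic learns to separate corpus text from recognizer output.
    pred = recognizer(images).detach()
    loss_c = bce(critic(corpus_strings_onehot), torch.ones(len(images), 1)) + \
             bce(critic(pred), torch.zeros(len(images), 1))
    opt_c.zero_grad(); loss_c.backward(); opt_c.step()
    # 2) Recognizer is updated so its predictions fool the critic, i.e. its
    #    output distribution aligns with lexically valid corpus strings.
    pred = recognizer(images)
    loss_r = bce(critic(pred), torch.ones(len(images), 1))
    opt_r.zero_grad(); loss_r.backward(); opt_r.step()
```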

    Recognizing Degraded Handwritten Characters

    In this paper, Slavonic manuscripts from the 11th century written in Glagolitic script are investigated. State-of-the-art optical character recognition methods produce poor results for degraded handwritten document images, largely due to a lack of suitable results from basic pre-processing steps such as binarization and image segmentation. Therefore, a new, binarization-free approach is presented that is independent of pre-processing deficiencies. It additionally incorporates local information in order to recognize fragmented or faded characters as well. The proposed algorithm consists of two steps: character classification and character localization. First, scale-invariant feature transform (SIFT) features are extracted and classified using support vector machines. On this basis, interest points are clustered according to their spatial information. Characters are then localized and eventually recognized by a weighted voting scheme over the pre-classified local descriptors. Preliminary results show that the proposed system can handle highly degraded manuscript images with background noise, e.g. stains, tears, and faded characters.
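    Below is a hypothetical sketch of the two-step pipeline summarized above, using OpenCV and scikit-learn: SIFT descriptors are classified with a support vector machine, keypoints are clustered spatially, and each cluster is assigned a character by a weighted vote of its pre-classified descriptors. The function names, the choice of DBSCAN for spatial clustering, and all parameter values are assumptions for illustration, not the paper's code.

```python
# Illustrative sketch (not the authors' implementation) of: SIFT + SVM
# classification, spatial clustering of interest points, weighted voting.
# Assumes opencv-python and scikit-learn; training data are placeholders.
import cv2
import numpy as np
from sklearn.svm import SVC
from sklearn.cluster import DBSCAN

sift = cv2.SIFT_create()

def local_descriptors(gray_img):
    """Binarization-free feature extraction: SIFT keypoints + descriptors."""
    keypoints, descriptors = sift.detectAndCompute(gray_img, None)
    coords = np.array([kp.pt for kp in keypoints]) if keypoints else np.empty((0, 2))
    return coords, descriptors

def fit_classifier(train_descs, train_labels):
    """train_descs: (N, 128) SIFT descriptors from labelled glyph patches,
    train_labels: (N,) character labels (placeholder inputs)."""
    clf = SVC(kernel="rbf", probability=True)   # probabilities serve as vote weights
    clf.fit(train_descs, train_labels)
    return clf

def recognize(gray_img, clf, eps=20.0):
    """Classify local descriptors, cluster keypoints, vote per cluster."""
    coords, descs = local_descriptors(gray_img)
    if descs is None or len(descs) == 0:
        return []
    probs = clf.predict_proba(descs)              # per-descriptor class scores
    clusters = DBSCAN(eps=eps, min_samples=3).fit_predict(coords)
    results = []
    for c in set(clusters):
        if c == -1:                               # noise points: skip
            continue
        member = clusters == c
        vote = probs[member].sum(axis=0)          # weighted vote of local descriptors
        label = clf.classes_[int(np.argmax(vote))]
        centre = coords[member].mean(axis=0)      # rough character location
        results.append((label, tuple(centre)))
    return results
```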

    Readability Enhancement and Palimpsest Decipherment of Historical Manuscripts

    This paper presents image acquisition and readability enhancement techniques for historical manuscripts developed in the interdisciplinary project “The Enigma of the Sinaitic Glagolitic Tradition” (Sinai II Project). We are mainly dealing with parchment documents originating from the 10th to the 12th centuries from St. Catherine’s Monastery on Mount Sinai. Their contents are being analyzed, fully or partly transcribed, and edited in the course of the project. Other manuscripts are also taken into consideration for comparison. The main challenge derives from the fact that some of the manuscripts are in bad condition due to various kinds of damage, e.g. mold, washed-out or faded text, or contain palimpsest (overwritten) parts. Therefore, the manuscripts investigated are imaged with a portable multispectral imaging system. This non-invasive conservation technique has proven extremely useful for the examination and reconstruction of vanished text areas and erased or washed-off palimpsest texts. Compared to regular white light, illumination with specific wavelengths highlights particular details of the documents, i.e. the writing and writing material, ruling, and underwritten text. In order to further enhance the contrast of the degraded writings, several Blind Source Separation techniques are applied to the multispectral images, including Principal Component Analysis (PCA), Independent Component Analysis (ICA), and others. Furthermore, this paper reports on other recent developments in the Sinai II Project, i.e. Document Image Dewarping, Automatic Layout Analysis, the recent result of another related project (the image processing tool Paleo Toolbar), and the launch of the series Glagolitica Sinaitica.
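    As an illustration of the Blind Source Separation step mentioned above, the following sketch stacks co-registered multispectral bands pixel-wise and unmixes them with PCA and FastICA from scikit-learn. The band wavelengths, file names, and component counts are placeholder assumptions, not the project's actual processing chain.

```python
# Minimal sketch: treat each multispectral band as one observed mixture and
# unmix with PCA / ICA (Blind Source Separation). Assumes scikit-image and
# scikit-learn; the band file names below are placeholders.
import numpy as np
from skimage import io
from sklearn.decomposition import PCA, FastICA

# Load co-registered multispectral bands of one manuscript page (placeholders).
band_files = ["band_365nm.png", "band_450nm.png", "band_650nm.png", "band_940nm.png"]
bands = [io.imread(f, as_gray=True).astype(np.float64) for f in band_files]
h, w = bands[0].shape

# Stack bands as observations: one row per pixel, one column per wavelength.
X = np.stack([b.ravel() for b in bands], axis=1)          # (h*w, n_bands)

# PCA: decorrelated components; later components often isolate faint writing.
pca_components = PCA(n_components=len(bands)).fit_transform(X)

# ICA: statistically independent sources, e.g. parchment vs. upper vs. lower text.
ica_components = FastICA(n_components=len(bands), max_iter=1000).fit_transform(X)

# Reshape each component back into an image for visual inspection.
pca_images = [pca_components[:, i].reshape(h, w) for i in range(len(bands))]
ica_images = [ica_components[:, i].reshape(h, w) for i in range(len(bands))]
```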

    A Study in Authenticity: Admissible Concealed Indicators of Authority and Other Features of Forgeries: A Case Study on Clement of Alexandria, Letter to Theodore, and the Longer Gospel of Mark

    A standard approach in historically minded disciplines to documents and other artefacts that have become suspect is to concentrate on their dissimilarities with known genuine artefacts. While such an approach works reasonably well with relatively poor forgeries, more skilfully done counterfeits have tended to divide expert opinions, demanding protracted scholarly attention. As there has not been a widespread scholarly consensus on a constrained set of criteria for detecting forgeries, a pragmatic maximum for such dissimilarities—as there are potentially an infinite number of differences that can be enumerated between any two artefacts—has been impossible to set. Thus, rather than relying on a philosophically robust critical framework, scholars have been accustomed to approaching the matter on a largely case-by-case basis, with a handful of loosely formulated rules for guidance. In response to these shortcomings, this dissertation argues that a key characteristic of inquiry in historically minded disciplines should be the ability to distinguish between knowledge-claims that are epistemically warranted—i.e., that can be asserted post hoc from the material reality they have become embedded in with reference to some sort of rigorous methodological framework—and knowledge-claims that are not. An ancient letter by Clement of Alexandria (ca. 150–215 CE) to Theodore, in which two passages from the Longer Gospel of Mark (also known as the Secret Gospel of Mark) are quoted, has long been suspected of having been forged by Morton Smith (1915–1991), its putative discoverer. The bulk of this dissertation consists of four articles that each use a different methodological approach. The first, a discourse analysis of the scholarly debate over the letter’s authenticity, illuminates the reasons behind its odd character and troubled history. Second, archival research unearths how data points have become corrupted through unintended additions in digital-image processing (a phenomenon labelled line screen distortion here). Third, a quantitative study of the handwriting in Clement’s Letter to Theodore shows the inadequacy of unwittingly applying palaeographic standards in cases of suspected deception, compared to the standards adhered to in forensic studies. Additionally, Smith’s conduct as an academic manuscript hunter is found to have been consistent with the standard practices of that profession. Finally, a study of the conceptual distinctions and framing of historical explanations in contemporary forgery discourse reveals the power of the methodological approach of WWFD (What Would a Forger Do?), which has recently been used in three varieties (unconcealed, concealed, and hyperactive) to construe suspected documents as potential forgeries—despite its disregard of justificatory grounding in favour of coming up with free-form, first-person narratives in which the conceivable functions as its own justification. Together, the four articles illustrate the pitfalls of scholarly discourse on forgeries, especially that surrounding Clement’s Letter to Theodore. The solution to the poor argumentation that has characterized the scholarly study of forgeries is suggested to be an exercise in demarcation: to decide (in the abstract) which features should be acceptable as evidence either for or against the ascription of the status of forgery to an historical artefact.
Implied within this suggestion is the notion of constraint, i.e., a constrained criterion is one that cannot be employed to back up both an argument and its counter-argument. A topical case study—a first step on the road to creating a rigorous standard for constrained criteria in determining counterfeits—is the alternative narrative of an imagined creation of Clement’s Letter to Theodore by Smith around the time of its reported discovery (1958). Concealed indicators of authority, i.e. the deliberate concealment of authorial details within the forged artefact by the forger, are established as a staple of the literary strategy of mystification, and their post hoc construction as acceptable evidence of authorship is argued to follow three criteria: 1) that the beginning of the act of decipherment of a concealed indicator of authority has to have been preceded by a literary primer that is unambiguous to a high degree, 2) that, following the prompting of the literary primer, the act of deciphering a concealed indicator of authority has to have adhered to a technique or method that is unambiguous to a high degree, and 3) that, following the prompting of the literary primer and the act of decipherment, both of which must have been practiced in an unambiguous manner to a high degree, the plain-text solution to the concealed indicator of authority must likewise be unambiguous to a high degree.

This dissertation examines the letter of Clement of Alexandria (ca. 150–215 CE) to Theodore, which contains passages known as the Secret Gospel of Mark. In these passages, which are not included in the canonical New Testament, Jesus among other things raises a young man from the dead and teaches him the mystery of the kingdom of God. Clement’s letter testifies more broadly to the diversity of early Christianity, but it has also been suspected of being a forgery. No generally accepted method has been found for identifying historical forgeries. Historians have had to assess suspected forgeries case by case, and skilfully executed forgeries often lead to long and heated debate. The core of the dissertation consists of four articles that examine Clement’s letter from different perspectives and also describe, more generally, the pitfalls involved in exposing historical forgeries. The first article uses discourse analysis to describe the exchange over the forgery claims, which is characterized by scholars talking past and over one another. The second and third articles analyse the handwriting of Clement’s letter. They reveal that digital image processing has unintentionally altered details of the handwriting. Comparative material shows that the handwriting of Clement’s letter does not exhibit “forger’s tremor” or other common signs of forgery. The fourth article examines and problematizes the way scholars justify forgery claims by creating imaginary narratives that explain how the details of the forgery came into being. The dissertation proposes that a robust scholarly framework must be developed for exposing historical forgeries. The summary chapter takes a first step in this direction by examining how the question of authenticity has been approached, for instance, in literary studies. It analyses the practice, typical of mystification (a literary genre), of hiding “concealed indicators of authority” in forgeries. On the basis of this analysis, it is concluded that scholars have previously been prone to developing wild forgery theories on the basis of various imagined clues and ciphers. To avoid this kind of “cryptanalytic hyperactivity,” criteria are needed for the use of “concealed indicators of authority.” According to the proposed criteria, only those “concealed indicators of authority” can be accepted as genuine 1) whose existence is pointed to unambiguously, 2) whose decipherment follows an unambiguous method, and 3) which name the author in an unambiguous manner.

    Tracing: A Graphical-Digital Method for Restoring Damaged Manuscripts

    Different kinds of graphical properties of manuscripts, such as layout, marginalia, handwriting or text decorations, are crucial for their palaeographic and philological analysis. These properties help to locate the manuscript in time and space, as well as enhance the philological analysis of the text. However, in the case of ancient historical documents, this can be considerably impeded by various kinds of damage such as deterioration, erasure, mould, fading, staining or overwriting, to name just a few. The aim of this paper is to provide a new and handy method for digital reconstruction, referred to as Tracing, that allows quite accurate reconstruction of the original graphical appearance of a damaged manuscript without requiring considerable technical expertise. Tracing is a non-invasive method that crucially relies on high-resolution digital images of the manuscript. Its application is illustrated here on the basis of the palimpsested manuscript Vaticanus graecus 73. Tracing was employed in order to restore the earlier, underlying text layer (scriptio inferior) on 12 folios, or 24 pages. The results are high-quality images of the reconstructed manuscript pages that faithfully render the graphical properties of the original. These images may immediately be used for palaeographical and philological analyses.

    Kirja-arvosteluja — Book reviews


    Advanced Techniques for the Decipherment of Ancient Scripts

    This contribution explores modern and traditional approaches to the decipherment of ancient writing systems. It surveys methods used by paleographers and epigraphers and state-of-the-art applications of computational linguistics, such as models based on neural networks. It frames the contextual problems scholars encounter in dealing with ancient codes, the situations and preconditions of the unknown codes, their idiosyncrasies and peculiarities, and the potential solutions afforded by both traditional and novel methods of investigation.