
    Text Line Segmentation of Historical Documents: a Survey

    A huge number of historical documents in libraries and in various national archives have not been exploited electronically. Although automatic reading of complete pages remains, in most cases, a long-term objective, tasks such as word spotting, text/image alignment, authentication and extraction of specific fields are in use today. For all these tasks, a major step is document segmentation into text lines. Because of the low quality and the complexity of these documents (background noise, artifacts due to aging, interfering lines), automatic text line segmentation remains an open research field. The objective of this paper is to present a survey of existing methods, developed during the last decade, and dedicated to documents of historical interest.
    Comment: 25 pages, submitted version. To appear in International Journal on Document Analysis and Recognition. Online version available at http://www.springerlink.com/content/k2813176280456k3

    Beyond writing: The development of literacy in the Ancient Near East

    Previous discussions of the origins of writing in the Ancient Near East have not incorporated the neuroscience of literacy. That research suggests that when southern Mesopotamians wrote marks on clay in the late fourth millennium, they inadvertently reorganized their neural activity. This reorganization was a factor in manipulating the writing system to reflect language, yielding literacy through a combination of neurofunctional change and increased script fidelity to language. Such a development appears to take place only with a sufficient demand for writing and reading, such as that posed by a state-level bureaucracy; the use of a material with suitable characteristics; and the production of marks that are conventionalized, handwritten, simple, and non-numerical. From the perspective of Material Engagement Theory, writing and reading represent the interactivity of bodies, materiality, and brains: movements of hands, arms, and eyes; clay and the implements used to mark it and form characters; and vision, motor planning, object recognition, and language. Literacy is a cognitive change that emerges from and depends upon the nexus of interactivity of the components.

    Multi-Character Field Recognition for Arabic and Chinese Handwriting

    Two methods, Symbolic Indirect Correlation (SIC) and Style Constrained Classification (SCC), are proposed for recognizing handwritten Arabic and Chinese words and phrases. SIC reassembles variable-length segments of an unknown query that match similar segments of labeled reference words. Recognition is based on the correspondence between the order of the feature vectors and of the lexical transcript in both the query and the references. SIC implicitly incorporates language context in the form of letter n-grams. SCC is based on the notion that the style (distortion or noise) of a character is a good predictor of the distortions arising in other characters, even of a different class, from the same source. It is adaptive in the sense that with a long-enough field, its accuracy converges to that of a style-specific classifier trained on the writer of the unknown query. Neither SIC nor SCC requires the query words to appear among the references.
