74,449 research outputs found

    Text Line Segmentation of Historical Documents: a Survey

    Full text link
    There is a huge amount of historical documents in libraries and in various National Archives that have not been exploited electronically. Although automatic reading of complete pages remains, in most cases, a long-term objective, tasks such as word spotting, text/image alignment, authentication and extraction of specific fields are in use today. For all these tasks, a major step is document segmentation into text lines. Because of the low quality and the complexity of these documents (background noise, artifacts due to aging, interfering lines),automatic text line segmentation remains an open research field. The objective of this paper is to present a survey of existing methods, developed during the last decade, and dedicated to documents of historical interest.Comment: 25 pages, submitted version, To appear in International Journal on Document Analysis and Recognition, On line version available at http://www.springerlink.com/content/k2813176280456k3

    A Query Integrator and Manager for the Query Web

    Get PDF
    We introduce two concepts: the Query Web as a layer of interconnected queries over the document web and the semantic web, and a Query Web Integrator and Manager (QI) that enables the Query Web to evolve. QI permits users to write, save and reuse queries over any web accessible source, including other queries saved in other installations of QI. The saved queries may be in any language (e.g. SPARQL, XQuery); the only condition for interconnection is that the queries return their results in some form of XML. This condition allows queries to chain off each other, and to be written in whatever language is appropriate for the task. We illustrate the potential use of QI for several biomedical use cases, including ontology view generation using a combination of graph-based and logical approaches, value set generation for clinical data management, image annotation using terminology obtained from an ontology web service, ontology-driven brain imaging data integration, small-scale clinical data integration, and wider-scale clinical data integration. Such use cases illustrate the current range of applications of QI and lead us to speculate about the potential evolution from smaller groups of interconnected queries into a larger query network that layers over the document and semantic web. The resulting Query Web could greatly aid researchers and others who now have to manually navigate through multiple information sources in order to answer specific questions

    Penyelenggaraan struktur penahan cerun rock shed: langkah mitigasi runtuhan tanah di Simpang Pulai - Blue Valley, Perak

    Get PDF
    Industri pembinaan merupakan industri yang sangat mencabar bukan sahaja di Malaysia malah di seluruh dunia yang merangkumi skop 3D dirty, difficult and dangerous. Industri ini juga meruapakan antara penyumbang terbesar KDNK iaitu sebanyak 7.4 peratus pada tahun 2016, walaupun industri ini antara penyumbang terbesar dari aspek keselamatan iaitu kemalangan (CIDB, 2017). Justeru itu, pihak yang bertanggungjawab seharusnya memandang serius mengenai masalah-masalah yang dihadapi supaya industri ini mampu bersaing di peringkat antarabangsa

    Mapping and Displaying Structural Transformations between XML and PDF

    Get PDF
    Documents are often marked up in XML-based tagsets to delineate major structural components such as headings, paragraphs, figure captions and so on, without much regard to their eventual displayed appearance. And yet these same abstract documents, after many transformations and 'typesetting' processes, often emerge in the popular format of Adobe PDF, either for dissemination or archiving. Until recently PDF has been a totally display-based document representation, relying on the underlying PostScript semantics of PDF. Early versions of PDF had no mechanism for retaining any form of abstract document structure but recent releases have now introduced an internal structure tree to create the so called 'Tagged PDF'. This paper describes the development of a plugin for Adobe Acrobat which creates a two-window display. In one window is shown an XML document original and in the other its Tagged PDF counterpart is seen, with an internal structure tree that, in some sense, matches the one seen in XML. If a component is highlighted in either window then the corresponding structured item, with any attendant text, is also highlighted in the other window. Important applications of correctly Tagged PDF include making PDF documents reflow intelligently on small screen devices and enabling them to be read out in correct reading order, via speech synthesiser software, for the visually impaired. By tracing structure transformation from source document to destination one can implement the repair of damaged PDF structure or the adaptation of an existing structure tree to an incrementally updated document

    Applying semantic web technologies to knowledge sharing in aerospace engineering

    Get PDF
    This paper details an integrated methodology to optimise Knowledge reuse and sharing, illustrated with a use case in the aeronautics domain. It uses Ontologies as a central modelling strategy for the Capture of Knowledge from legacy docu-ments via automated means, or directly in systems interfacing with Knowledge workers, via user-defined, web-based forms. The domain ontologies used for Knowledge Capture also guide the retrieval of the Knowledge extracted from the data using a Semantic Search System that provides support for multiple modalities during search. This approach has been applied and evaluated successfully within the aerospace domain, and is currently being extended for use in other domains on an increasingly large scale
    • …
    corecore