74,464 research outputs found
Recommended from our members
Use of colour for hand-filled form analysis and recognition
Colour information in form analysis is currently under utilised. As technology has advanced and computing costs have reduced, the processing of forms in colour has now become practicable. This paper describes a novel colour-based approach to the extraction of filled data from colour form images. Images are first quantised to reduce the colour complexity and data is extracted by examining the colour characteristics of the images. The improved performance of the proposed method has been verified by comparing the processing time, recognition rate, extraction precision and recall rate to that of an equivalent black and white system
Text Line Segmentation of Historical Documents: a Survey
There is a huge amount of historical documents in libraries and in various
National Archives that have not been exploited electronically. Although
automatic reading of complete pages remains, in most cases, a long-term
objective, tasks such as word spotting, text/image alignment, authentication
and extraction of specific fields are in use today. For all these tasks, a
major step is document segmentation into text lines. Because of the low quality
and the complexity of these documents (background noise, artifacts due to
aging, interfering lines),automatic text line segmentation remains an open
research field. The objective of this paper is to present a survey of existing
methods, developed during the last decade, and dedicated to documents of
historical interest.Comment: 25 pages, submitted version, To appear in International Journal on
Document Analysis and Recognition, On line version available at
http://www.springerlink.com/content/k2813176280456k3
A Query Integrator and Manager for the Query Web
We introduce two concepts: the Query Web as a layer of interconnected queries over the document web and the semantic web, and a Query Web Integrator and Manager (QI) that enables the Query Web to evolve. QI permits users to write, save and reuse queries over any web accessible source, including other queries saved in other installations of QI. The saved queries may be in any language (e.g. SPARQL, XQuery); the only condition for interconnection is that the queries return their results in some form of XML. This condition allows queries to chain off each other, and to be written in whatever language is appropriate for the task. We illustrate the potential use of QI for several biomedical use cases, including ontology view generation using a combination of graph-based and logical approaches, value set generation for clinical data management, image annotation using terminology obtained from an ontology web service, ontology-driven brain imaging data integration, small-scale clinical data integration, and wider-scale clinical data integration. Such use cases illustrate the current range of applications of QI and lead us to speculate about the potential evolution from smaller groups of interconnected queries into a larger query network that layers over the document and semantic web. The resulting Query Web could greatly aid researchers and others who now have to manually navigate through multiple information sources in order to answer specific questions
Penyelenggaraan struktur penahan cerun rock shed: langkah mitigasi runtuhan tanah di Simpang Pulai - Blue Valley, Perak
Industri pembinaan merupakan industri yang sangat mencabar bukan sahaja di Malaysia malah di seluruh dunia yang merangkumi skop 3D dirty, difficult and dangerous. Industri ini juga meruapakan antara penyumbang terbesar KDNK iaitu sebanyak 7.4 peratus pada tahun 2016, walaupun industri ini antara penyumbang terbesar dari aspek keselamatan iaitu kemalangan (CIDB, 2017). Justeru itu, pihak yang bertanggungjawab seharusnya memandang serius mengenai masalah-masalah yang dihadapi supaya industri ini mampu bersaing di peringkat antarabangsa
Mapping and Displaying Structural Transformations between XML and PDF
Documents are often marked up in XML-based tagsets to delineate major structural components such as headings, paragraphs, figure captions and so on, without much regard to their eventual displayed appearance. And yet these same abstract documents, after many transformations and 'typesetting' processes, often emerge in the popular format of Adobe PDF, either for dissemination or archiving.
Until recently PDF has been a totally display-based document representation, relying on the underlying PostScript semantics of PDF. Early versions of PDF had no mechanism for retaining any form of abstract document structure but recent releases have now introduced an internal structure tree to create the so called 'Tagged PDF'.
This paper describes the development of a plugin for Adobe Acrobat which creates a two-window display. In one window is shown an XML document original and in the other its Tagged PDF counterpart is seen, with an internal structure tree that, in some sense, matches the one seen in XML. If a component is highlighted in either window then the corresponding structured item, with any attendant text, is also highlighted in the other window.
Important applications of correctly Tagged PDF include making PDF documents reflow intelligently on small screen devices and enabling them to be read out in correct reading order, via speech synthesiser software, for the visually impaired. By tracing structure transformation from source document to destination one can implement the repair of damaged PDF structure or the adaptation of an existing structure tree to an incrementally updated document
Applying semantic web technologies to knowledge sharing in aerospace engineering
This paper details an integrated methodology to optimise Knowledge reuse and sharing, illustrated with a use case in the aeronautics domain. It uses Ontologies as a central modelling strategy for the Capture of Knowledge from legacy docu-ments via automated means, or directly in systems interfacing with Knowledge workers, via user-defined, web-based forms. The domain ontologies used for Knowledge Capture also guide the retrieval of the Knowledge extracted from the data using a Semantic Search System that provides support for multiple modalities during search. This approach has been applied and evaluated successfully within the aerospace domain, and is currently being extended for use in other domains on an increasingly large scale
- …