6 research outputs found

    Normalization of Image Display and Layout Using an Adaptive Splitter Algorithm

    When a large RGB document image is displayed on a monitor through the World Wide Web, its resolution is degraded and reading and writing the data takes a long time. This research develops a system that automatically reads the data and splits the image using an adaptive hyperdocument approach. The resulting image tiles can be accessed quickly over the web, because the data are stored in formats such as HTML and SGML/XML. Although web access to documents in these formats has grown rapidly, only little work has been done on converting documents into hyperdocuments, and most of the processed images contain both text and graphical objects. The adaptive method converts complex multi-column document images into HTML documents, and an adaptive normalization step generates a structured table of contents page, based on an analysis of the logical structure of the document image. Experiments with a variety of multi-column document images, using splitter sizes from 8x8 and 16x16 up to 200x200, show that the proposed method produces HTML documents whose visual layout matches that of the original document image, displays them in 0.1 to 0.17 seconds on average, and also produces a structured table of contents whose hierarchically ordered section headings are hyperlinked to the content.
    Keywords: image splitter, hyperdocument, multi-column documents, document conversion, splitter zoom, logical document structure analysis
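    The abstract does not include an implementation. As a rough sketch of the kind of block splitting it describes (cutting an RGB image into fixed-size tiles such as 8x8 or 16x16 and reassembling them in an HTML layout), the Python fragment below uses Pillow; the function names and the HTML table layout are illustrative assumptions, not the authors' system.

    # Minimal sketch: split an RGB image into fixed-size tiles and emit an HTML
    # table that reassembles them. Tile sizes mirror those in the experiments
    # (8x8, 16x16, ..., 200x200); names and layout are illustrative assumptions.
    from PIL import Image

    def split_image(path, tile_size=16):
        """Split an RGB image into tile_size x tile_size blocks (edge tiles may be smaller)."""
        img = Image.open(path).convert("RGB")
        width, height = img.size
        tiles, columns = [], (width + tile_size - 1) // tile_size
        for top in range(0, height, tile_size):
            for left in range(0, width, tile_size):
                box = (left, top, min(left + tile_size, width), min(top + tile_size, height))
                tiles.append(img.crop(box))
        return tiles, columns

    def tiles_to_html(n_tiles, columns, prefix="tile"):
        """Emit an HTML table whose cells reference the saved tile images in reading order."""
        rows = []
        for start in range(0, n_tiles, columns):
            cells = "".join(f'<td><img src="{prefix}_{start + j}.png"></td>'
                            for j in range(min(columns, n_tiles - start)))
            rows.append(f"<tr>{cells}</tr>")
        return "<table>" + "".join(rows) + "</table>"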

    Eyes Wide Open: an interactive learning method for the design of rule-based systems

    We present in this paper a new general method, the Eyes Wide Open (EWO) method, for the design of rule-based document recognition systems. Our contribution is to introduce a learning procedure, based on machine learning techniques and carried out in interaction with the user, into the design of the recognition system. Therefore, and unlike many approaches that are designed entirely by hand, ours can easily adapt to a new type of document while taking advantage of the expressiveness of rule-based systems and their ability to convey the hierarchical structure of a document. The EWO method is independent of any existing recognition system. An automatic analysis of an annotated corpus, guided by the user, helps adapt the recognition system to a new kind of document; the user then gives meaning to the automatically extracted information. In this paper, we validate EWO by producing two rule-based systems: one for the Maurdor international competition, on a heterogeneous corpus containing handwritten and printed documents written in different languages, and another for the RIMES competition corpus, a homogeneous corpus of French handwritten business letters. On the RIMES corpus, our method allows an assisted design of a grammatical description that gives better results than all the previously proposed statistical systems.
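    The abstract does not detail the learning procedure, but the general idea of letting statistics extracted from an annotated corpus suggest conditions that a user can then turn into rules can be illustrated roughly as follows. The layout features, the toy corpus and the use of a shallow decision tree are assumptions for illustration only, not the EWO method itself.

    # Rough illustration: learn candidate thresholds from an annotated corpus so
    # that a user can review them as conditions for a rule-based description.
    # Features, data and the decision tree are illustrative assumptions.
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Hypothetical annotated corpus: simple layout features per text block.
    X = [[32, 0], [30, 0], [14, 10], [13, 12], [15, 11], [12, 40]]  # [height_px, indent_px]
    y = ["title", "title", "paragraph", "paragraph", "paragraph", "marginal_note"]

    tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

    # The learned split thresholds are shown to the user, who decides whether they
    # make sense as conditions in the grammatical description of the documents.
    print(export_text(tree, feature_names=["height_px", "indent_px"]))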

    Graphical tools for ground truth generation in HTR tasks

    This report covers the development of several graphical tools for ground truth generation in HTR tasks, specifically for layout analysis, line segmentation and transcription, as well as one ad hoc tool needed for point classification in an implemented line size normalization method. It shows the design process behind the tools, giving an overview of their internal structure through class diagrams. It also explains the mentioned phases of HTR with the aim of clarifying each tool's context and utility. Finally, the report closes with brief conclusions and considerations about the future of the tools. Martínez Vargas, J. (2014). Graphical tools for ground truth generation in HTR tasks. http://hdl.handle.net/10251/36156

    Semantic framework for regulatory compliance support

    Regulatory Compliance Management (RCM) is a management process that an organisation implements to conform to regulatory guidelines. Two processes that contribute towards automating RCM are: (i) extraction of meaningful entities from the regulatory text and (ii) mapping regulatory guidelines to organisational processes. These processes help in keeping RCM up to date with changes in regulatory guidelines. The update process is still manual, since there has been comparatively little research in this direction. Semantic Web technologies are potential candidates for making the update process automatic. There are stand-alone frameworks that use Semantic Web techniques such as Information Extraction, Ontology Population, Similarity computation and Ontology Mapping; however, the integration of these approaches into semantic compliance management has not yet been explored. Considering these two processes as crucial constituents, the aim of this thesis is to automate the processes of RCM. It proposes a framework called RegCMantic. The proposed framework is designed and developed in two main phases. The first part of the framework extracts the regulatory entities from regulatory guidelines; extracting meaningful entities helps in relating the regulatory guidelines to organisational processes. The framework identifies the document components and extracts the entities from them using four components: (i) a parser, (ii) definition terms, (iii) ontological concepts and (iv) rules. The parser breaks a sentence down into useful segments, and extraction is carried out on these segments using the definition terms, ontological concepts and rules. The extracted entities are core-entities, such as subject, action and obligation, and aux-entities, such as time, place, purpose, procedure and condition. The second part of the framework relates the regulatory guidelines to organisational processes. It uses a mapping algorithm that considers three types of entities in the regulatory domain and two types of entities in the process domain. In the regulatory domain, the considered entities are the regulation topic, core-entities and aux-entities, whereas in the process domain they are subject and action. Using these entities, it computes an aggregate of three similarity scores: a topic-score, a core-score and an aux-score. The aggregate similarity score determines whether a regulatory guideline is related to an organisational process. The RegCMantic framework is validated through the development of a prototype system, which implements a case study involving regulatory guidelines governing the pharmaceutical industry in the UK. The evaluation of the case-study results has shown improved accuracy in extracting the regulatory entities and in relating regulatory guidelines to organisational processes. This research contributes to extracting meaningful entities from regulatory guidelines, which are provided as unstructured text, and to mapping regulatory guidelines to organisational processes semantically.
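    The abstract does not give the aggregation formula. As a minimal sketch of the idea of combining topic-, core- and aux-similarity scores into a single decision, the fragment below uses a crude word-overlap similarity, hand-picked weights and an arbitrary threshold; all of these are assumptions, not the scoring defined in RegCMantic.

    # Minimal sketch: aggregate topic-, core- and aux-similarity scores to decide
    # whether a regulatory guideline relates to an organisational process.
    # The similarity measure, weights and threshold are illustrative assumptions.

    def word_overlap(a: str, b: str) -> float:
        """Crude Jaccard similarity between the word sets of two short texts."""
        sa, sb = set(a.lower().split()), set(b.lower().split())
        return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

    def aggregate_score(guideline: dict, process: dict,
                        w_topic=0.3, w_core=0.5, w_aux=0.2) -> float:
        topic_score = word_overlap(guideline["topic"], process["subject"])
        core_score = word_overlap(guideline["core"], process["subject"] + " " + process["action"])
        aux_score = word_overlap(guideline["aux"], process["action"])
        return w_topic * topic_score + w_core * core_score + w_aux * aux_score

    guideline = {"topic": "batch record keeping",
                 "core": "the manufacturer must retain batch records",
                 "aux": "for five years after product release"}
    process = {"subject": "quality assurance team", "action": "retain batch records"}

    related = aggregate_score(guideline, process) >= 0.25  # illustrative threshold
    print(round(aggregate_score(guideline, process), 3), related)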

    Contextual and assisted interpretation of digitized archival fonds (application to 18th-century sales registers)

    Fonds, also called historical document collections, are large amounts of digitized documents that are difficult to interpret automatically: usual approaches require a great deal of design effort, yet do not prevent the production of errors that have to be corrected after processing. To cope with those limitations, our work aims at improving the interpretation process by making use of information extracted from the fonds, or provided by human operators, while keeping page-by-page processing. We propose a targeted extension of the page description that makes it possible to systematically set up information exchanges between the interpretation process and its environment. A global iterative mechanism manages the progressive supply of contextual information to this process, which improves the interpretation. Experiments with these new tools on documents from the 18th century showed that our proposals were easy to integrate into an existing system, that its design remained simple, and that the required manual corrections could be reduced.
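    The abstract only outlines the iterative mechanism. A minimal sketch of the general idea, re-running a page-by-page interpretation while progressively feeding back contextual information gathered from the whole fonds or from human operators, is given below; the interfaces, the shared context dictionary and the confidence-based stopping rule are assumptions, not the system described in the thesis.

    # Minimal sketch: iterative, context-fed page-by-page interpretation of a fonds.
    # The interpret_page interface, the shared context and the stopping rule are
    # illustrative assumptions, not the thesis's actual mechanism.
    from typing import Callable, Dict, List

    def interpret_fonds(pages: List[str],
                        interpret_page: Callable[[str, Dict], Dict],
                        max_iterations: int = 3) -> List[Dict]:
        """Interpret each page, feeding information gathered so far back as context."""
        context: Dict = {}            # e.g. recurring column positions, known seller names
        results: List[Dict] = []
        for _ in range(max_iterations):
            results = []
            for page in pages:
                result = interpret_page(page, context)               # page-by-page processing
                results.append(result)
                context.update(result.get("extracted_context", {}))  # enrich shared context
            if all(r.get("confident", False) for r in results):
                break                 # stop once every page is interpreted with confidence
        return results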