    Character Recognition System using Radial Features

    Extracting text from document images has many applications in offices, where much of the work involves document-related data entry. A common example is found in public and college libraries, where books are catalogued by manually typing the title along with other details such as the author's name and other attributes. This process can be made much easier with a suitable algorithm or application software that extracts the text from the book's cover and other pages, reducing the user's manual typing. The user's job is then reduced to arranging and formatting the extracted material, such as the book title.

    Reducing human effort in engineering drawing validation

    Oil and gas facilities are very large, complex industrial structures that are documented in thousands of printed sheets. In recent years there has been a tendency to migrate these paper sheets to a digital environment, with the ultimate goal of regenerating the original computer-aided design (CAD) projects, which are useful for visualising and analysing these facilities through diverse computer applications. Traditionally, this was done manually by re-sketching each page in a CAD application. However, some applications have appeared that generate the CAD document automatically from the paper sheets. In this latter case, the final document is always verified by an engineer, because the process must be error-free. Since the engineer's involvement is unavoidable, we present a new method to reduce the engineer's working time. This is done by highlighting the digitised components in the CAD document that the automatic method may have identified incorrectly, so that the engineer only needs to inspect those components. The experiments show that our method reduces human effort by approximately 40% while keeping the process error-free.
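    The abstract does not say how the method decides which components to highlight. A minimal sketch, assuming each digitised component carries a recognition confidence score (a hypothetical field) and that anything below a hypothetical threshold is flagged, shows how such a filter translates into a measured reduction of review effort. The error-free guarantee only holds if every misrecognised component falls below the threshold, so in practice the threshold would have to be calibrated on data.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """A digitised CAD component; `confidence` is a hypothetical recognition score."""
    label: str
    confidence: float  # assumed to lie in [0, 1]


def components_to_review(components, threshold=0.9):
    """Return the components whose confidence falls below the (hypothetical) threshold.

    Only these would be highlighted for the engineer; the rest are assumed correct.
    """
    return [c for c in components if c.confidence < threshold]


def effort_reduction(components, flagged):
    """Fraction of components the engineer no longer has to inspect."""
    if not components:
        return 0.0
    return 1.0 - len(flagged) / len(components)
```

    With five made-up components of which two score below 0.9, the engineer would inspect two of five, a 60% reduction; these values are illustrative only and are not the paper's results.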

    Ontologies and representation spaces for sketch map interpretation

    In this paper, we present a systematic approach to sketch map interpretation. The method decomposes the elements of a sketch map into a hierarchy of categories, from the material sketch map level to the non-material representational sketch map level, and then interprets the sketch map using the five formal representation spaces that we develop. These spaces (set, graph, metric and Euclidean) provide a tiered formal representation based on standard mathematical structures. We take the view that a sketch map bears information about the physical world and systematises this using extensions of existing formal ontologies. The motivation for this work is the partially automatic extraction and integration of information from sketch maps. We propose a set of ontologies and methods as a first step in the direction of a formalisation of partially automatic extraction and integration of sketch map content. We also see this work as a contribution to spatial cognition, where researchers externalise spatial knowledge using sketch mapping. The paper concludes by working through an example that demonstrates the sketch map interpretation at different levels using the underlying method

    Challenges for the Engineering Drawing Lehigh Steel Collection

    The Lehigh Steel Collection (LSC) is an extremely large, heterogeneous set of documents dating from the 1960s through the 1990s. It was retrieved by Lehigh University after it acquired research facilities from Bethlehem Steel, a now-bankrupt company that was once the second-largest steel producer and the largest shipbuilder in the United States. The documents describe the research and development activities conducted on site and comprise a very wide range of technical documentation, handwritten notes and memos, annotated printed documents, etc. This paper addresses only part of this collection: the approximately 4,000 engineering drawings and blueprints that were retrieved. The main challenge is that these documents come in different sizes and shapes, in widely varying states of conservation and degradation, and, more importantly, in bulk and without ground truth. Making them available to the research community through digitization is a step in the right direction; the question now is what to do with them. This paper lays down some first basic stepping stones toward enhancing the documents' metadata and annotations.

    Information extraction in historical and handwritten documents

    This project's objective is, starting from a set of document images as input, to carry out a series of procedures on the images in order to obtain a machine-learning prediction model that can classify whether the elements appearing in these documents are handwritten text, printed text, or not text at all. To do so, several programs will be developed and used: on the one hand, to isolate the text elements in the images, extract information from them, and build an adjacency matrix that relates them; on the other, to use this data to train a prediction model based on a Structured Support Vector Machine. Finally, to check the effectiveness of the model, multiple tests will be run varying the operating modes the algorithm allows, in order to determine under which conditions it performs best, and the results of those tests will be analysed.
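    The project relates the isolated text elements through an adjacency matrix before training the structured model, but the adjacency criterion is not given. The sketch below assumes, hypothetically, that each element is an axis-aligned bounding box (x0, y0, x1, y1) and that two elements are adjacent when the gap between their boxes is at most a hypothetical `max_gap` in pixels. The resulting symmetric 0/1 matrix is the kind of pairwise structure a structured SVM can use as graph edges; the training step itself is not shown.

```python
import numpy as np


def box_gap(a, b):
    """Euclidean gap between boxes (x0, y0, x1, y1); 0.0 if they touch or overlap."""
    dx = max(0.0, max(a[0], b[0]) - min(a[2], b[2]))
    dy = max(0.0, max(a[1], b[1]) - min(a[3], b[3]))
    return float(np.hypot(dx, dy))


def adjacency_matrix(boxes, max_gap=20.0):
    """Symmetric 0/1 matrix linking elements whose boxes lie within max_gap (hypothetical rule)."""
    n = len(boxes)
    adj = np.zeros((n, n), dtype=np.int8)
    for i in range(n):
        for j in range(i + 1, n):
            if box_gap(boxes[i], boxes[j]) <= max_gap:
                adj[i, j] = adj[j, i] = 1
    return adj
```

    A distance rule keeps the graph sparse and local, which is usually what pairwise models need; other criteria (reading order, alignment, k nearest neighbours) would fit the same matrix shape.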

    Text/graphic separation using a sparse representation with multi-learned dictionaries

    In this paper, we propose a new approach to extract text regions from graphical documents. We first empirically construct two sequences of learned dictionaries, one for the text parts and one for the graphical parts. We then compute the sparse representations of non-overlapping document patches of various sizes over these learned dictionaries. Based on these representations, each patch is classified as text or graphics by comparing its reconstruction errors. Same-sized patches in one category are then merged to define the corresponding text or graphic layers, which are combined to create the final text/graphic layer. Finally, in a post-processing step, text regions are further filtered using learned thresholds.
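    The classification step described above (sparse-code a patch in each dictionary and assign it to the category with the smaller reconstruction error) can be sketched as follows. This is an illustration under assumptions: sparse codes are computed with Orthogonal Matching Pursuit (OMP), which may differ from the solver used in the paper; dictionary atoms are unit-norm columns; and the dictionaries are passed in rather than learned. Only one patch size is handled, so the multi-size layer merging and threshold-based post-processing are not shown.

```python
import numpy as np


def omp(D, x, n_nonzero, tol=1e-10):
    """Orthogonal Matching Pursuit: greedy sparse code of x over dictionary D.

    D has one unit-norm atom per column; at most n_nonzero atoms are selected.
    """
    residual = x.astype(float).copy()
    support = []
    sol = np.zeros(0)
    for _ in range(n_nonzero):
        if np.linalg.norm(residual) < tol:
            break  # x is already reconstructed
        # Pick the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break  # all correlations are ~0, so no new atom helps
        support.append(k)
        # Re-fit all selected coefficients jointly (the "orthogonal" step).
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coef = np.zeros(D.shape[1])
    coef[support] = sol
    return coef


def reconstruction_error(D, x, n_nonzero):
    """Euclidean norm of the residual after sparse coding x over D."""
    return float(np.linalg.norm(x - D @ omp(D, x, n_nonzero)))


def classify_patch(x, D_text, D_graphic, n_nonzero=5):
    """Label a flattened patch by the dictionary that reconstructs it better."""
    err_text = reconstruction_error(D_text, x, n_nonzero)
    err_graphic = reconstruction_error(D_graphic, x, n_nonzero)
    return "text" if err_text < err_graphic else "graphic"
```

    In a real pipeline `D_text` and `D_graphic` would come from a dictionary-learning step. With toy dictionaries built from disjoint sets of standard-basis vectors, a patch supported only on the text atoms is reconstructed exactly by `D_text` and is labelled text.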

    New trends on digitisation of complex engineering drawings

    Engineering drawings are commonly used across industries such as oil and gas and mechanical engineering. Digitising these drawings is becoming increasingly important, mainly because legacy drawings and documents may provide a rich source of information for industry. Analysing them often requires applying a set of digital image processing methods to detect and classify symbols and other components. Despite recent significant advances in image processing, and in deep neural networks in particular, the automatic analysis and processing of engineering drawings is still far from complete. This paper presents a general framework for digitising complex engineering drawings, together with a thorough and critical review of the relevant literature, methods, and algorithms in machine learning and machine vision. A real-life industrial scenario is discussed in detail, showing how to contextualise the digitised information from one specific type of drawing, the piping and instrumentation diagram. Finally, the paper discusses how new trends in machine vision, such as deep learning, could be applied to this domain, and concludes with suggestions for future research directions.