552 research outputs found

    A Convolutional Neural Network-based Patent Image Retrieval Method for Design Ideation

    Full text link
    The patent database is often used in searches of inspirational stimuli for innovative design opportunities because of its large size, extensive variety and rich design information in patent documents. However, most patent mining research only focuses on textual information and ignores visual information. Herein, we propose a convolutional neural network (CNN)-based patent image retrieval method. The core of this approach is a novel neural network architecture named Dual-VGG that is aimed to accomplish two tasks: visual material type prediction and international patent classification (IPC) class label prediction. In turn, the trained neural network provides the deep features in the image embedding vectors that can be utilized for patent image retrieval and visual mapping. The accuracy of both training tasks and patent image embedding space are evaluated to show the performance of our model. This approach is also illustrated in a case study of robot arm design retrieval. Compared to traditional keyword-based searching and Google image searching, the proposed method discovers more useful visual information for engineering design.Comment: 11 pages, 11 figure

    Textual summarisation of flowcharts in patent drawings for CLEF-IP 2012

    Get PDF
    International audienceThe CLEF-IP 2012 track included the Flowchart Recognition task, an image-based task where the goal was to process binary images of flowcharts taken from patent draw- ings to produce summaries containing information about their structure. The textual summaries include information about the flowchart title, the box-node shapes, the con- necting edge types, text describing flowchart content and the structural relationships between nodes and edges. An algorithm designed for this task and characterised by the following method steps is presented: * Text-graphic segmentation based on connected-component clustering; * Line segment bridging with an adaptive, oriented filter; * Box shape classification using a stretch-invariant transform to extract features based on shape-specific symmetry; * Text object recognition using a noisy channel model to enhance the results of a commercial OCR package. Performance evaluation results for the CLEF-IP 2012 Flowchart Recognition task are not yet available so the performance of the algorithm has been measured by com- paring algorithm output with object-level ground-truth values. An average F-score was calculated by combining node classification and edge detection (ignoring edge di- rectivity). Using this measure, a third of all drawings were recognized without error (average F-score=1.00) and 75% show an F-score of 0.78 or better. The most impor- tant failure modes of the algorithm have been identified as text-graphic segmentation, line-segment bridging and edge directivity classification. The text object recognition module of the algorithm has been independently eval- uated. Two different state-of-the-art OCR software packages were compared and a post-correction method was applied to their output. Post-correction yields an im- provement of 9% in OCR accuracy and a 26% reduction in the word error rate

    Topic identification using filtering and rule generation algorithm for textual document

    Get PDF
    Information stored digitally in text documents are seldom arranged according to specific topics. The necessity to read whole documents is time-consuming and decreases the interest for searching information. Most existing topic identification methods depend on occurrence of terms in the text. However, not all frequent occurrence terms are relevant. The term extraction phase in topic identification method has resulted in extracted terms that might have similar meaning which is known as synonymy problem. Filtering and rule generation algorithms are introduced in this study to identify topic in textual documents. The proposed filtering algorithm (PFA) will extract the most relevant terms from text and solve synonym roblem amongst the extracted terms. The rule generation algorithm (TopId) is proposed to identify topic for each verse based on the extracted terms. The PFA will process and filter each sentence based on nouns and predefined keywords to produce suitable terms for the topic. Rules are then generated from the extracted terms using the rule-based classifier. An experimental design was performed on 224 English translated Quran verses which are related to female issues. Topics identified by both TopId and Rough Set technique were compared and later verified by experts. PFA has successfully extracted more relevant terms compared to other filtering techniques. TopId has identified topics that are closer to the topics from experts with an accuracy of 70%. The proposed algorithms were able to extract relevant terms without losing important terms and identify topic in the verse

    Towards recovering architectural information from images of architectural diagrams

    Get PDF
    The architecture of a software system is often described with diagrams embedded in the documentation. However, these diagrams are normally stored and shared as images, losing track of model-level architectural information and refraining software engineers from working on the architectural model later on. In this context, tools able to extract architectural information from images can be of great help. In this article, we present a framework called IMEAV for processing architectural diagrams (based on speci c viewtypes) and recovering information from them. We have instantiated our framework to analyze \module views" and evaluated this prototype with an image dataset. Results have been encouraging, showing a good accuracy for recognizing modules, relations and textual features.Sociedad Argentina de Informática e Investigación Operativa (SADIO

    Towards recovering architectural information from images of architectural diagrams

    Get PDF
    The architecture of a software system is often described with diagrams embedded in the documentation. However, these diagrams are normally stored and shared as images, losing track of model-level architectural information and refraining software engineers from working on the architectural model later on. In this context, tools able to extract architectural information from images can be of great help. In this article, we present a framework called IMEAV for processing architectural diagrams (based on speci c viewtypes) and recovering information from them. We have instantiated our framework to analyze \module views" and evaluated this prototype with an image dataset. Results have been encouraging, showing a good accuracy for recognizing modules, relations and textual features.Sociedad Argentina de Informática e Investigación Operativa (SADIO

    Structure Diagram Recognition in Financial Announcements

    Full text link
    Accurately extracting structured data from structure diagrams in financial announcements is of great practical importance for building financial knowledge graphs and further improving the efficiency of various financial applications. First, we proposed a new method for recognizing structure diagrams in financial announcements, which can better detect and extract different types of connecting lines, including straight lines, curves, and polylines of different orientations and angles. Second, we developed a two-stage method to efficiently generate the industry's first benchmark of structure diagrams from Chinese financial announcements, where a large number of diagrams were synthesized and annotated using an automated tool to train a preliminary recognition model with fairly good performance, and then a high-quality benchmark can be obtained by automatically annotating the real-world structure diagrams using the preliminary model and then making few manual corrections. Finally, we experimentally verified the significant performance advantage of our structure diagram recognition method over previous methods

    Towards recovering architectural information from images of architectural diagrams

    Get PDF
    The architecture of a software system is often described with diagrams embedded in the documentation. However, these diagrams are normally stored and shared as images, losing track of model-level architectural information and refraining software engineers from working on the architectural model later on. In this context, tools able to extract architectural information from images can be of great help. In this article, we present a framework called IMEAV for processing architectural diagrams (based on speci c viewtypes) and recovering information from them. We have instantiated our framework to analyze \\module views and evaluated this prototype with an image dataset. Results have been encouraging, showing a good accuracy for recognizing modules, relations and textual features.XV Simposio Argentino de Ingeniería de Softwar

    Information retrieval and text mining technologies for chemistry

    Get PDF
    Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.A.V. and M.K. acknowledge funding from the European Community’s Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain). This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), and FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo Garciá -Yoldi for useful feedback and discussions during the preparation of the manuscript.info:eu-repo/semantics/publishedVersio

    Classification & prediction methods and their application

    Get PDF
    corecore