A Multimodal Approach for Semantic Patent Image Retrieval
Patent images such as technical drawings contain valuable information and are frequently used by experts to compare patents. However, current approaches to patent information retrieval focus largely on textual information. Consequently, we review previous work on patent retrieval with a focus on illustrations in figures. In this paper, we report on work in progress on a novel approach to patent image retrieval that uses deep multimodal features. Scene text spotting and optical character recognition are employed to extract numerals from an image and subsequently identify references to the corresponding sentences in the patent document. Furthermore, we use a state-of-the-art neural CLIP model to extract structural features from illustrations and additionally derive textual features from the related patent text using a sentence transformer model. To fuse our multimodal features for similarity search, we apply re-ranking according to averaged or maximum scores. In our experiments, we compare the impact of different modalities on the task of similarity search for patent images. The experimental results suggest that patent image retrieval can be successfully performed using the proposed feature sets, while the best results are achieved when combining the features of both modalities.
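The re-ranking step mentioned in the abstract (fusing by averaged or maximum scores) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `fuse_scores`, the dictionary layout, the example scores, and the choice to keep only candidates scored by both modalities are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def fuse_scores(
    image_scores: Dict[str, float],
    text_scores: Dict[str, float],
    mode: str = "avg",
) -> List[Tuple[str, float]]:
    """Re-rank candidates by fusing two per-modality similarity scores.

    Hypothetical helper: only candidates present in both score
    dictionaries are kept (an assumption, not stated in the abstract).
    """
    if mode not in ("avg", "max"):
        raise ValueError("mode must be 'avg' or 'max'")
    fused = {}
    for cand in image_scores.keys() & text_scores.keys():
        a, b = image_scores[cand], text_scores[cand]
        fused[cand] = (a + b) / 2 if mode == "avg" else max(a, b)
    # Highest fused score first; ties broken by candidate id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up similarity scores for three candidate figures.
image_scores = {"fig_a": 0.9, "fig_b": 0.4, "fig_c": 0.6}
text_scores = {"fig_a": 0.2, "fig_b": 0.8, "fig_c": 0.7}

avg_ranking = [cand for cand, _ in fuse_scores(image_scores, text_scores, "avg")]
max_ranking = [cand for cand, _ in fuse_scores(image_scores, text_scores, "max")]
```

With these made-up scores the two fusion modes produce different orderings, which is why comparing them experimentally is meaningful: averaging rewards candidates that are moderately similar in both modalities, while the maximum rewards a strong match in either one.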
A Survey on Domain-Specific Entity Linking with Heterogeneous Information Networks
Entity linking is an information extraction task that links entity mentions in a text collection to their corresponding entries in a knowledge base; it assigns a unique identity to entities such as locations, individuals, and companies. A knowledge base (KB) is used to optimize the collection, organization, and retrieval of information. Heterogeneous information networks (HINs) comprise interlinked objects of multiple types connected by various types of relationships; increasingly popular examples include bibliographic networks, social media networks, and typical relational database data. In a HIN, data objects are interconnected through various relations. Entity linking determines which entities in an existing HIN correspond to mentions in unstructured web text. This task is important and challenging because of ambiguity and limited existing knowledge. Some HINs can be considered domain-specific KBs. Current entity linking (EL) systems target corpora that contain heterogeneous web information and perform sub-optimally on domain-specific corpora. EL systems link to one or more general or domain-specific knowledge bases, such as DBpedia, Wikipedia, Freebase, IMDB, YAGO, WordNet, and MKB. This paper presents a survey on domain-specific entity linking with HINs. The survey provides an in-depth treatment of HINs, including datasets, types, and examples, along with related concepts.
Component Segmentation of Engineering Drawings Using Graph Convolutional Networks
We present a data-driven framework to automate the vectorization and machine
interpretation of 2D engineering part drawings. In industrial settings, most
manufacturing engineers still rely on manual reads to identify the topological
and manufacturing requirements from drawings submitted by designers. The
interpretation process is laborious and time-consuming, which severely inhibits
the efficiency of part quotation and manufacturing tasks. While recent advances
in image-based computer vision methods have demonstrated great potential in
interpreting natural images through semantic segmentation approaches, the
application of such methods in parsing engineering technical drawings into
semantically accurate components remains a significant challenge. The severe
pixel sparsity in engineering drawings also restricts the effective
featurization of image-based data-driven methods. To overcome these challenges,
we propose a deep learning based framework that predicts the semantic type of
each vectorized component. Taking a raster image as input, we vectorize all
components through thinning, stroke tracing, and cubic Bézier fitting. Then a
graph of such components is generated based on the connectivity between the
components. Finally, a graph convolutional neural network is trained on this
graph data to identify the semantic type of each component. We test our
framework in the context of semantic segmentation of text, dimension, and
contour components in engineering drawings. Results show that our method yields
the best performance compared to recent image-based and graph-based segmentation
methods.
Comment: Preprint accepted to Computers in Industr
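The graph-construction and graph-convolution steps described in this abstract can be sketched roughly as follows. This is an illustrative toy, not the paper's pipeline: the connectivity rule (two components are linked if they share an endpoint), the helper names `build_adjacency` and `propagate`, and the example data are assumptions. The propagation shown is the standard symmetrically normalized neighborhood averaging used in graph convolutional networks, with the learned weight matrix and nonlinearity omitted.

```python
import math
from itertools import combinations
from typing import List, Set, Tuple

Point = Tuple[int, int]


def build_adjacency(components: List[Set[Point]]) -> List[List[float]]:
    """Link two vectorized components if they share an endpoint (assumed rule)."""
    n = len(components)
    adj = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if components[i] & components[j]:
            adj[i][j] = adj[j][i] = 1.0
    return adj


def propagate(adj: List[List[float]], feats: List[List[float]]) -> List[List[float]]:
    """One propagation step: D^{-1/2} (A + I) D^{-1/2} H, without weights."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own feature.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    dim = len(feats[0])
    out = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if a_hat[i][j]:
                w = a_hat[i][j] / math.sqrt(deg[i] * deg[j])
                for k in range(dim):
                    out[i][k] += w * feats[j][k]
    return out


# Three hypothetical strokes, each given by its two endpoints.
components = [
    {(0, 0), (1, 0)},  # touches the second stroke at (1, 0)
    {(1, 0), (1, 1)},
    {(5, 5), (6, 5)},  # isolated stroke
]
adjacency = build_adjacency(components)
smoothed = propagate(adjacency, [[1.0], [0.0], [2.0]])
```

In a trained network, each propagation step would be followed by a learned linear map and a nonlinearity, and the final layer would output a class score per component (for example text, dimension, or contour).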