57 research outputs found

    Product graph-based higher order contextual similarities for inexact subgraph matching

    This is the author accepted manuscript. The final version is available from Elsevier via the DOI in this record. Many algorithms formulate graph matching as the optimization of an objective function built from pairwise measurements between the nodes and edges of the two graphs to be matched. Pairwise measurements usually consider local attributes but disregard the contextual information carried by the graph structure. We address this issue by proposing contextual similarities between pairs of nodes. This is done by considering the tensor product graph (TPG) of the two graphs to be matched, where each node is an ordered pair of nodes of the operand graphs. Contextual similarities between a pair of nodes are computed by accumulating weighted walks (normalized pairwise similarities) terminating at the corresponding paired node in the TPG. Once the contextual similarities are obtained, we formulate subgraph matching as a node and edge selection problem in the TPG. We use the contextual similarities to construct an objective function and optimize it with a linear programming approach. Because the random walk formulation through the TPG takes higher order information into account, it yields more reliable similarities and better discrimination among the nodes and edges. Experimental results on synthetic as well as real benchmarks show that higher order contextual similarities increase discriminating power and allow one to find approximate solutions to the subgraph matching problem. European Union Horizon 202
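The walk accumulation described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the decay factor `lam`, the number of `steps`, and the use of plain weighted adjacency matrices as pairwise similarities are all assumptions made for illustration. The TPG is built as the Kronecker product of the two adjacency matrices, rows are normalized, and walks ending at each paired node are summed with a geometric decay.

```python
def kron(a, b):
    """Kronecker product of two square matrices given as lists of lists.

    Pair (i, j) of the operand graphs maps to row/column index i * len(b) + j.
    """
    n, m = len(a), len(b)
    return [
        [a[i][k] * b[j][l] for k in range(n) for l in range(m)]
        for i in range(n)
        for j in range(m)
    ]


def row_normalize(w):
    """Divide each row by its sum; rows that sum to zero are left unchanged."""
    out = []
    for row in w:
        total = sum(row)
        out.append([x / total for x in row] if total > 0 else list(row))
    return out


def contextual_similarities(a1, a2, lam=0.5, steps=3):
    """Accumulate decayed walks of length 1..steps ending at each TPG node.

    lam and steps are hypothetical parameters, not taken from the paper.
    """
    w = row_normalize(kron(a1, a2))
    size = len(w)
    walks = [1.0] * size   # one walk of length 0 starts at every paired node
    scores = [0.0] * size
    weight = 1.0
    for _ in range(steps):
        # Extend every walk by one edge; entry j collects walks ending at j.
        walks = [sum(walks[i] * w[i][j] for i in range(size)) for j in range(size)]
        weight *= lam
        scores = [s + weight * v for s, v in zip(scores, walks)]
    return scores


# Toy operands: a 3-node path and a single edge (assumed example data).
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
edge = [[0, 1], [1, 0]]
scores = contextual_similarities(path, edge)
```

In this toy case the pairs that involve the middle node of the path receive higher scores than the pairs that involve its end nodes, which is the kind of structural discrimination the abstract refers to. The subsequent linear programming step is not sketched here.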

    Hierarchical stochastic graphlet embedding for graph-based pattern recognition

    This is the final version. Available on open access from Springer via the DOI in this record. Despite being very successful within the pattern recognition and machine learning community, graph-based methods are often unusable with many machine learning tools. This is because most mathematical operations are not defined in the graph domain. Graph embedding has been proposed as a way to tackle these difficulties: it maps graphs to a vector space and makes standard machine learning techniques applicable to them. However, it is well known that graph embedding techniques usually suffer from a loss of structural information. In this paper, given a graph, we consider its hierarchical structure for mapping it into a vector space. The hierarchical structure is constructed by topologically clustering the graph nodes and considering each cluster as a node in the upper hierarchical level. Once the hierarchical structure of the graph is constructed, we consider various configurations of its parts and use stochastic graphlet embedding (SGE) to map them into a vector space. Broadly speaking, SGE produces a distribution of uniformly sampled low to high order graphlets as a way to embed graphs into the vector space. The coarse-to-fine structure of the graph hierarchy and the statistics obtained from the distribution of low to high order stochastic graphlets complement each other and capture important structural information with varied contexts. Together, these two techniques substantially reduce the information loss usually involved in graph embedding and yield a more robust vector space embedding of graphs. This has been corroborated through a detailed experimental evaluation on various benchmark graph datasets, where we outperform the state-of-the-art methods. European Union Horizon 2020; Ministerio de Educación, Cultura y Deporte, Spain; Generalitat de Cataluny

    Table Detection in Invoice Documents by Graph Neural Networks

    This is the author accepted manuscript. The final version is available from IEEE via the DOI in this record. Tabular structures in documents offer a complementary dimension to the raw textual data, representing logical or quantitative relationships among pieces of information. In digital mail room applications, where a large number of administrative documents must be processed with reasonable accuracy, the detection and interpretation of tables is crucial. Table recognition has gained interest in document image analysis, in particular for unconstrained formats (absence of rule lines, unknown number of rows and columns). In this work, we propose a graph-based approach for detecting tables in document images. Instead of using the raw content (recognized text), we make use of the location, context and content type; the approach is therefore purely one of structure perception and does not depend on the language or the quality of the text reading. Our framework makes use of Graph Neural Networks (GNNs) to describe the local repetitive structural information of tables in invoice documents. Our proposed model has been experimentally validated on two invoice datasets and achieved encouraging results. Additionally, due to the scarcity of benchmark datasets for this task, we have contributed to the community a novel dataset derived from the RVL-CDIP invoice data. It will be publicly released to facilitate future research. European Unio

    Chapter 2. The Baix Llobregat (BALL) Demographic Database, between Historical Demography and Computer Vision (nineteenth–twentieth centuries)

    The main aims of this book are to compare source materials, databases and research results, and to create new opportunities for collaboration in the field of social and population history in the East and the West. All the contributions are based on nominative source material, mainly censuses and vital records, which have been preserved, scanned and transcribed into databases in order to be used for cross-sectional and longitudinal research. The chapters in the first part of this book mostly focus on the construction of nominative databases in Germany, Spain and Romania. The chapters in the second and third parts are case studies on the relationship between marriage and fertility; mortality and fertility; marriage behavior and religion; urban mortality; migration, etc., based on the Russian, Austrian, Estonian, Hungarian and Norwegian databases.

    Runner’s Profile and Propensity to Sports Injury

    This study evaluates the relationships between sociodemographic profile, accident rate and accident propensity among participants in three sporting events: Zurich Marató de Barcelona, Cros de Muntanya Can Caralleu and Marató Borredà-Xtrail. An adaptation of the sports accident propensity questionnaire (PAD-22) by Latorre and Pantoja (2013) was administered to a total of 237 runners. The main results show that runners tend to be mostly men, aged 30 to 46, salaried, with post-compulsory education and previous experience in long-distance events, and that they train an average of 4 times and more than 7 hours per week; and that asphalt marathon runners overestimate their Perceived Competence and show higher degrees of Competitiveness than trail runners. This work is part of the research project with code 2014 PINEF 00006 and was carried out with the support of the predoctoral grant programme of the Instituto Nacional de Educación Física de Cataluña (PINEFC-2015). We thank INEFC for its support in carrying out this study, since without its backing the work could not have been carried out under the same conditions.

    Data Centric Domain Adaptation for Historical Text with OCR Errors

    We propose new methods for in-domain and cross-domain Named Entity Recognition (NER) on historical data for Dutch and French. For the cross-domain case, we address domain shift by integrating unsupervised in-domain data via contextualized string embeddings, and we address OCR errors by injecting synthetic OCR errors into the source domain, a form of data-centric domain adaptation. We propose a general approach to imitate OCR errors in arbitrary input data. Our cross-domain as well as our in-domain results outperform several strong baselines and establish state-of-the-art results. We publish preprocessed versions of the French and Dutch Europeana NER corpora.
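As a rough illustration of what injecting synthetic OCR errors into clean text can look like, here is a minimal sketch. It is not the method from the paper: the confusion table `CONFUSIONS`, the function name `inject_ocr_noise`, and the per-character `rate` parameter are hypothetical, chosen only to show the general idea of substituting characters that OCR engines commonly confuse.

```python
import random

# Hypothetical character confusions, chosen for illustration only.
CONFUSIONS = {"l": "1", "O": "0", "e": "c", "m": "rn", "S": "5"}


def inject_ocr_noise(text, rate, seed=0):
    """Replace confusable characters with a look-alike with probability `rate`.

    A fixed seed keeps the corruption reproducible across runs.
    """
    rng = random.Random(seed)
    out = []
    for ch in text:
        if ch in CONFUSIONS and rng.random() < rate:
            out.append(CONFUSIONS[ch])
        else:
            out.append(ch)
    return "".join(out)


# Corrupt every confusable character in a clean source-domain token.
noisy = inject_ocr_noise("Oslo", rate=1.0)
```

A realistic setup would estimate the confusion table from aligned OCR output and ground truth rather than hard-coding it, and would corrupt only the source-domain training data so that the labels remain unchanged.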

    Subgraph spotting in graph representations of comic book images

    This is the author accepted manuscript. The final version is available from Elsevier via the DOI in this record. Graph-based representations are among the most powerful data structures for extracting, representing and preserving the structural information of underlying data. Subgraph spotting is an interesting research problem, especially for studying content-based image retrieval (CBIR) and query by example (QBE) in image databases on the basis of structural information. In this paper we address the lack of freely available ground-truthed datasets for subgraph spotting and present a new dataset for subgraph spotting in graph representations of comic book images (SSGCI), together with its ground truth and evaluation protocol. Experimental results of two state-of-the-art subgraph spotting methods are presented on the new SSGCI dataset. University of La Rochelle (France

    Heuristics-based detection to improve text/graphics segmentation in complex engineering drawings.

    The demand for digitisation of complex engineering drawings is becoming increasingly important for industry, given the pressure to improve the efficiency and time effectiveness of operational processes. There have been numerous attempts to solve this problem, either by proposing a general form of document interpretation or by establishing an application-dependent framework. Moreover, text/graphics segmentation has been presented as a particular way of addressing the document digitisation problem, with the main aim of splitting text and graphics into different layers. Given the challenging characteristics of complex engineering drawings, this paper presents a novel sequential heuristics-based methodology aimed at localising and detecting the most representative symbols of the drawing. This enables the subsequent application of a text/graphics segmentation method in a more effective form. The experimental framework is composed of two parts: first we show the performance of the symbol detection system, and then we present an evaluation of three different state-of-the-art text/graphics segmentation techniques to find text in the remaining image.

    Graphic Recognition: The Concept Lattice Approach


    On Influence of Line Segmentation in Efficient Word Segmentation in Old Manuscripts

    The objective of this work is to show the importance of good line segmentation for obtaining better results in the segmentation of words in historical documents. We have used the approach developed by Manmatha and Rothfeder [1] to segment words in old handwritten documents. In their work, the lines of the documents are extracted using projections. In this work, we have developed an approach that segments lines more effectively. The new line segmentation algorithm handles skewed, touching and noisy lines, so it significantly improves word segmentation. Experiments using Spanish documents from the Marriages Database of the Barcelona Cathedral show that this approach reduces the error rate by more than 20%.
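For readers unfamiliar with projection-based line extraction, the baseline idea mentioned in this abstract can be sketched as follows. This illustrates only the generic horizontal projection profile, not the improved algorithm proposed in the paper, which the abstract does not specify. The function name `segment_lines`, the binary image encoding (1 for ink, 0 for background), and the `min_ink` threshold are assumptions.

```python
def segment_lines(image, min_ink=1):
    """Return (first_row, last_row) ranges of text lines in a binary image.

    image is a list of rows, each a list of 0/1 pixels with 1 meaning ink.
    A row belongs to a text line when its ink count reaches min_ink.
    """
    profile = [sum(row) for row in image]  # horizontal projection profile
    lines, start = [], None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # a new text line begins
        elif ink < min_ink and start is not None:
            lines.append((start, y - 1))   # the current line ended on the row above
            start = None
    if start is not None:                  # a line touches the bottom edge
        lines.append((start, len(profile) - 1))
    return lines


# Tiny assumed example: two text lines separated by one blank row.
page = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 0],
    [1, 0, 0],
]
line_ranges = segment_lines(page)
```

A plain profile like this merges lines whenever no row between them is free of ink, which is what happens with skewed or touching lines and is the failure case the abstract says the new algorithm handles.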