8 research outputs found

    Leveraging graph dimensions in online graph search

    Graphs have been widely used due to their expressive power to model complicated relationships. However, given a graph database DG = {g1, g2, ···, gn}, it is challenging to process graph queries, since a basic graph query usually involves costly graph operations such as maximum common subgraph and graph edit distance computation, which are NP-hard. In this paper, we study a novel DS-preserved mapping which maps graphs in a graph database DG onto a multidimensional space MG under a structural dimension M using a mapping function φ(). The DS-preserved mapping preserves two things: distance and structure. Distance-preserving means that any two graphs gi and gj in DG must map to two data objects φ(gi) and φ(gj) in MG such that the distance d(φ(gi), φ(gj)) between φ(gi) and φ(gj) in MG approximates the graph dissimilarity δ(gi, gj) in DG. Structure-preserving further means that, for a given unseen query graph q, the distance between q and any graph gi in DG needs to be preserved such that δ(q, gi) ≈ d(φ(q), φ(gi)). We discuss the rationale for using graph dimension M for online graph processing, and show how to identify a small set of subgraphs to form M efficiently. We propose an iterative algorithm, DSPM, to compute the graph dimension, and discuss its optimization techniques. We also give an approximate algorithm, DSPMap, to handle a large graph database. We conduct extensive performance studies on both real and synthetic datasets to evaluate the top-k similarity query, which finds the top-k graphs in DG most similar to a query graph, and show the effectiveness and efficiency of our approaches. © 2014 VLDB
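
The mapping idea in this abstract can be illustrated with a minimal sketch. It is not the DSPM algorithm: it assumes each graph is just a set of labelled edges, that the dimension M is a hand-picked list of single-edge features rather than selected subgraphs, and that d is Euclidean distance. The names `phi`, `distance`, and `top_k` are hypothetical.

```python
from math import sqrt

# Assumption: a graph is a frozenset of labelled edges such as ("C", "O").
# Assumption: the dimension M is a list of single-edge features, not mined subgraphs.

def phi(graph, dimension):
    """Map a graph to a binary vector: 1 where the feature occurs in the graph."""
    return [1 if feature in graph else 0 for feature in dimension]

def distance(x, y):
    """Euclidean distance between two mapped graphs."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def top_k(query, database, dimension, k):
    """Return the names of the k database graphs closest to the query in the mapped space."""
    q_vec = phi(query, dimension)
    scored = [(distance(q_vec, phi(g, dimension)), name) for name, g in database.items()]
    return [name for _, name in sorted(scored)[:k]]

dimension = [("C", "C"), ("C", "O"), ("C", "N")]
database = {
    "g1": frozenset({("C", "C"), ("C", "O")}),
    "g2": frozenset({("C", "N")}),
    "g3": frozenset({("C", "C")}),
}
query = frozenset({("C", "C"), ("C", "O")})
print(top_k(query, database, dimension, k=2))
```

Once the graphs are mapped, the top-k query needs only vector distances, which is where the saving over repeated graph edit distance computations would come from.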

    Unsupervised Graph-based Rank Aggregation for Improved Retrieval

    This paper presents a robust and comprehensive graph-based rank aggregation approach, used to combine results of isolated ranker models in retrieval tasks. The method follows an unsupervised scheme, which is independent of how the isolated ranks are formulated. Our approach is able to combine arbitrary models, defined in terms of different ranking criteria, such as those based on textual, image, or hybrid content representations. We reformulate the ad-hoc retrieval problem as document retrieval based on fusion graphs, which we propose as a new unified representation model capable of merging multiple ranks and expressing inter-relationships of retrieval results automatically. By doing so, we claim that the retrieval system can benefit from learning the manifold structure of datasets, thus leading to more effective results. Another contribution is that our graph-based aggregation formulation, unlike existing approaches, allows for encapsulating contextual information encoded from multiple ranks, which can be directly used for ranking, without further computations and post-processing steps over the graphs. Based on the graphs, a novel similarity retrieval score is formulated using an efficient computation of minimum common subgraphs. Finally, another benefit over existing approaches is the absence of hyperparameters. A comprehensive experimental evaluation was conducted on diverse well-known public datasets composed of textual, image, and multimodal documents. The experiments demonstrate that our method reaches top performance, yielding better effectiveness scores than state-of-the-art baseline methods and large gains over the rankers being fused, thus demonstrating the capability of the proposal to represent queries through a unified graph-based model of rank fusions.
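
To make the notion of unsupervised rank aggregation concrete, here is a minimal sketch of a well-known baseline, reciprocal rank fusion. It is explicitly not the fusion-graph method described in this abstract, and unlike that method it has a hyperparameter (`k`, set here to the commonly used value 60). The function name is hypothetical.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists: each document scores sum(1 / (k + rank)) over the lists."""
    scores = defaultdict(float)
    for ranking in rankings:
        for position, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + position)
    # Highest fused score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))

# Three isolated rankers (for example textual, visual, and hybrid) for one query.
rankings = [
    ["a", "b", "c"],
    ["b", "a", "d"],
    ["b", "c", "a"],
]
print(reciprocal_rank_fusion(rankings))
```

The sketch treats each ranked list independently of how it was produced, which matches the unsupervised setting described above. It does not model the inter-relationships between results that the fusion graphs are said to capture.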

    Indexing query graphs to speedup graph query processing

    Subgraph/supergraph queries, although central to graph analytics, are costly as they entail the NP-Complete problem of subgraph isomorphism. We present a fresh solution, the novel principle of which is to acquire and utilize knowledge from the results of previously executed queries. Our approach, iGQ, encompasses two component subindexes to identify whether a new query is a subgraph/supergraph of previously executed queries, and stores related key information. iGQ comes with novel query processing and index space management algorithms, including graph replacement policies. The end result is a system that leads to a significant reduction in the number of required subgraph isomorphism tests and to speedups in query processing time. iGQ can be incorporated into any sub/supergraph query processing method and help improve performance. In fact, it is the only contribution that can significantly speed up both subgraph and supergraph query processing. We establish the principles of iGQ and formally prove its correctness. We have implemented iGQ and have incorporated it within three popular, recent state-of-the-art index-based graph query processing solutions. We evaluated its performance using real-world and synthetic graph datasets with different characteristics, and a number of query workloads, showcasing its benefits.
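
The principle of reusing earlier query results can be sketched as follows for subgraph queries. This is a simplified illustration, not the iGQ implementation: graphs are assumed to be sets of labelled edges, and `contains` uses edge-set containment as a stand-in for a real subgraph isomorphism test. All names are hypothetical.

```python
def contains(big, small):
    """Stand-in for a subgraph isomorphism test (assumption: edge-set containment)."""
    return small <= big

def answer_subgraph_query(query, dataset, cache):
    """Return (ids of dataset graphs containing `query`, number of tests run)."""
    candidates = set(dataset)
    known_answers = set()
    for old_query, old_answer in cache:
        if contains(query, old_query):
            # old_query is a subgraph of query: every answer to query
            # must already be among the answers to old_query.
            candidates &= old_answer
        if contains(old_query, query):
            # query is a subgraph of old_query: every answer to old_query
            # contains query as well, so no test is needed for it.
            known_answers |= old_answer
    tests = 0
    result = set(known_answers)
    for gid in candidates - known_answers:
        tests += 1
        if contains(dataset[gid], query):
            result.add(gid)
    cache.append((frozenset(query), frozenset(result)))
    return result, tests

ab, bc, cd = ("a", "b"), ("b", "c"), ("c", "d")
dataset = {1: {ab, bc, cd}, 2: {ab, bc}, 3: {ab}, 4: {cd}}
cache = []
print(answer_subgraph_query({ab}, dataset, cache))      # no cached knowledge yet
print(answer_subgraph_query({ab, bc}, dataset, cache))  # candidates pruned by the first query
print(answer_subgraph_query({bc}, dataset, cache))      # answers reused from the second query
```

In a real system the comparisons between the new query and cached queries are themselves subgraph isomorphism checks, so the benefit depends on cached queries being far fewer or smaller than the dataset graphs they help to skip.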

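    Bag of Textual Graphs: an accurate, efficient, and general-purpose graph-based text representation model (original title: Sacola de grafos textuais: um modelo de representação de textos baseado em grafos, preciso, eficiente e de propósito geral)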

    Advisor: Ricardo da Silva Torres. Master's dissertation, Universidade Estadual de Campinas, Instituto de Computação. Degree: Master's. Area: Computer Science. Title awarded: Master in Computer Science.
    Abstract: Text representation models are the fundamental basis for Information Retrieval and Text Mining tasks. Although different text representation models have been proposed, they are not at the same time efficient, accurate, and flexible enough to be used in a variety of applications. Here we present Bag of Textual Graphs, a text representation model that addresses these three requirements by combining a graph-based representation model with a generic framework for graph-to-vector synthesis. We evaluate our method in experiments on four well-known text collections: Reuters-21578, 20-newsgroups, 4-universities, and K-series. Experimental results demonstrate that our model is generic enough to handle different collections and is more efficient than widely used state-of-the-art methods in text classification and retrieval tasks, without loss of accuracy.

    Graph-based rank aggregation (original title: Agregação de ranks baseada em grafos)

    Advisor: Ricardo da Silva Torres. Doctoral thesis, Universidade Estadual de Campinas, Instituto de Computação. Degree: Doctorate. Area: Computer Science. Title awarded: Doctor in Computer Science.
    Abstract: In this work, we introduce a robust graph-based rank aggregation approach, capable of combining results of isolated ranker models in retrieval tasks. The method follows an unsupervised scheme, which is independent of how the isolated ranks are formulated. Our approach is able to incorporate heterogeneous models, defined in terms of different ranking criteria, such as those based on textual, image, or hybrid content representations. We reformulate the ad-hoc retrieval problem as graph-based retrieval based on fusion graphs, which we propose as a new unified representation model capable of merging multiple ranks and expressing inter-relationships of retrieval results automatically. By doing so, we show that the retrieval system can benefit from learning the manifold structure of datasets, thus leading to more effective results. Our graph-based aggregation formulation, unlike existing approaches, allows for encapsulating contextual information encoded from multiple ranks, which can be directly used for ranking. The experiments demonstrate that our method reaches top performance, yielding better effectiveness scores than state-of-the-art baseline methods and large gains over the rankers being fused.
    Another contribution is the extension of the fusion graph solution for efficient rank aggregation. Although previous works are promising with respect to effectiveness, they usually overlook efficiency aspects. We propose an innovative rank aggregation function that is unsupervised, intrinsically multimodal, and targeted at fast retrieval and top effectiveness. We introduce the concepts of embedding and indexing graph-based rank-aggregation representation models, and their application to search tasks. Embedding formulations are also proposed for graph-based rank representations. We introduce the concept of fusion vectors, a late-fusion representation of objects based on ranks, from which an intrinsically rank-aggregation-based retrieval model is defined. Next, we present an approach for fast retrieval based on fusion vectors, thus promoting an efficient rank aggregation system. Our method presents top effectiveness among state-of-the-art related work, while adding an efficiency perspective not yet covered. Consistent speedups are achieved against recent baselines on all datasets considered.
    Derived from the fusion graphs and fusion vectors, we propose rank-based representation models for general prediction problems. The concepts of fusion graphs and fusion vectors are extended to prediction scenarios, where they can be used to build an estimator model to determine whether an input object (even a multimodal one) belongs to a class or not. Experiments on multimodal classification tasks, such as flood detection, show that the proposed solution is highly effective for different detection scenarios involving textual, visual, and multimodal features, yielding better detection results than several state-of-the-art methods. Finally, we investigate the adoption of learning approaches to help optimize the creation of rank-based representation models, in order to maximize their discriminative power and efficiency in prediction and search tasks.

    Optimizing graph query performance by indexing and caching

    Subgraph/supergraph queries, though central to graph analytics, are costly as they entail the NP-Complete problem of subgraph isomorphism. To expedite graph query processing, the community has contributed a wealth of approaches that fall into two categories: heuristic subgraph isomorphism (SI) methods and algorithms following the “filter-then-verify” (FTV) paradigm. However, both have performance limitations, and a significant drawback of current studies is that they throw away the results obtained when executing previous graph queries. To this end, this work presents a fresh solution named iGQ, the principle of which is to acquire and utilize knowledge from the results of previously executed queries. iGQ encompasses two component subindexes to identify whether a new query is a subgraph or supergraph of previously executed queries, so that the stored knowledge can be used to accelerate the execution of the new query by reducing the number of subgraph isomorphism tests to be performed. The correctness of iGQ is assured by formal proof. Moreover, iGQ can be used for both subgraph and supergraph query processing, bridging the two separate research threads in the community. Using caches to accelerate query processing has been prevalent in data management systems. In the realm of graph-structured queries, however, little work has been done. Meanwhile, modern big data applications are emerging and demanding high-performance graph query processing. Therefore, this thesis puts forth a full-fledged graph caching system, named GraphCache, for graph queries. From the ground up, GraphCache is designed as a semantic graph cache that can harness both subgraph and supergraph cache hits, expanding on traditional hits, which are confined to exact matches. 
GraphCache features well-defined subsystems and interfaces, allowing for the flexibility of plugging in any general subgraph/supergraph query solution, be it an FTV algorithm or an SI method. Furthermore, GraphCache incorporates iGQ as its query processing engine, where previously issued queries are leveraged to expedite graph query processing. With the continuous arrival of queries and finite memory space, GraphCache requires mechanisms to manage the space effectively, which in turn raises the problem of cache replacement. However, none of the existing replacement policies were developed specifically for graph caches. This work hence proposes a number of graph-query-aware strategies with different trade-offs and emphasizes a novel hybrid replacement policy with competitive performance. Following the established research in the literature, GraphCache handles graph queries against a static dataset, i.e., all graphs in the underlying dataset remain untouched during the continual arrival and execution of queries. However, in real-world applications, the graph dataset naturally evolves over time. This poses a significant challenge for the current graph caching technique and hence gives rise to the need for advanced systems that are capable of accelerating subgraph/supergraph queries against dynamic datasets. To address the problem, this work contributes an upgraded graph caching system, GraphCache+, with newly added subsystems and components that deal with the consistency of the graph cache. GraphCache+ is characterized by its two cache models, which represent different designs for ensuring graph cache consistency, as well as by novel logic for accelerating subgraph and supergraph query processing, with formal proof of correctness. 
Additionally, this work includes comprehensive performance evaluations of GraphCache/GraphCache+ with over 6 million queries against both real-world and synthetic datasets with different characteristics, revealing a number of non-trivial lessons. Overall, this work contributes to the community from three perspectives: it provides a fresh idea to expedite graph query processing, applicable to both SI methods and FTV algorithms; it presents GraphCache, to the best of our knowledge the first full-fledged graph caching system for general subgraph/supergraph queries; and it explores the topic of graph cache consistency, putting forth a systematic solution, GraphCache+.

    Leveraging Graph Dimensions in Online Graph Search

    Graphs have been widely used due to their expressive power to model complicated relationships. However, given a graph database DG = {g1, g2, ···, gn}, it is challenging to process graph queries, since a basic graph query usually involves costly graph operations such as maximum common subgraph and graph edit distance computation, which are NP-hard. In this paper, we study a novel DS-preserved mapping which maps graphs in a graph database DG onto a multidimensional space MG under a structural dimension M using a mapping function φ(). The DS-preserved mapping preserves two things: distance and structure. Distance-preserving means that any two graphs gi and gj in DG must map to two data objects φ(gi) and φ(gj) in MG such that the distance d(φ(gi), φ(gj)) between φ(gi) and φ(gj) in MG approximates the graph dissimilarity δ(gi, gj) in DG. Structure-preserving further means that, for a given unseen query graph q, the distance between q and any graph gi in DG needs to be preserved such that δ(q, gi) ≈ d(φ(q), φ(gi)). We discuss the rationale for using graph dimension M for online graph processing, and show how to identify a small set of subgraphs to form M efficiently. We propose an iterative algorithm, DSPM, to compute the graph dimension, and discuss its optimization techniques. We also give an approximate algorithm, DSPMap, to handle a large graph database. We conduct extensive performance studies on both real and synthetic datasets to evaluate the top-k similarity query, which finds the top-k graphs in DG most similar to a query graph, and show the effectiveness and efficiency of our approaches.