42 research outputs found

    Using similarity of graphs in evaluation of designs

    This paper deals with evaluating designs on the basis of their internal structures, represented in the form of graphs. A set of graphs representing solutions of similar design tasks is searched for frequently occurring subgraphs. On the basis of the results of the search, the quality of new solutions is evaluated. Moreover, the common subgraphs found are considered to be design patterns characterizing the solutions of a given design task. The paper presents the generic concept of such an approach and illustrates it with a small example of floor layout design.

    When Hashes Met Wedges: A Distributed Algorithm for Finding High Similarity Vectors

    Finding similar user pairs is a fundamental task in social networks, with numerous applications in ranking and personalization tasks such as link prediction and tie strength detection. A common manifestation of user similarity is based upon network structure: each user is represented by a vector that represents the user's network connections, where pairwise cosine similarity among these vectors defines user similarity. The predominant task for user similarity applications is to discover all similar pairs that have a pairwise cosine similarity value larger than a given threshold $\tau$. In contrast to previous work where $\tau$ is assumed to be quite close to 1, we focus on recommendation applications where $\tau$ is small, but still meaningful. The all pairs cosine similarity problem is computationally challenging on networks with billions of edges, and especially so for settings with small $\tau$. To the best of our knowledge, there is no practical solution for computing all user pairs with, say, $\tau = 0.2$ on large social networks, even using the power of distributed algorithms. Our work directly addresses this challenge by introducing a new algorithm, WHIMP, that solves this problem efficiently in the MapReduce model. The key insight in WHIMP is to combine the "wedge-sampling" approach of Cohen-Lewis for approximate matrix multiplication with the SimHash random projection techniques of Charikar. We provide a theoretical analysis of WHIMP, proving that it has near optimal communication costs while maintaining computation cost comparable with the state of the art. We also empirically demonstrate WHIMP's scalability by computing all highly similar pairs on four massive data sets, and show that it accurately finds high similarity pairs. In particular, we note that WHIMP successfully processes the entire Twitter network, which has tens of billions of edges.
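
    The SimHash ingredient mentioned in this abstract can be illustrated with a minimal sketch. This is not the WHIMP algorithm: it omits wedge sampling and the MapReduce setting entirely, and only shows how sign bits of random projections estimate the cosine similarity of two connection vectors. The function names, the vectors `alice` and `bob`, and the choice of 4096 bits are assumptions made for illustration.

```python
import math
import random


def random_hyperplanes(dim, bits, seed=0):
    """Draw `bits` random Gaussian hyperplane normals of dimension `dim`."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def simhash_signature(vec, planes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return [1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
            for plane in planes]


def estimated_cosine(sig_a, sig_b):
    """The fraction of differing bits estimates angle / pi between the vectors."""
    differing = sum(a != b for a, b in zip(sig_a, sig_b))
    angle = math.pi * differing / len(sig_a)
    return math.cos(angle)


def exact_cosine(u, v):
    """Reference value computed directly from the vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical 0/1 connection vectors for two users (illustrative data only).
alice = [1, 1, 0, 1, 0, 0, 1, 0]
bob = [1, 0, 0, 1, 1, 0, 1, 0]

planes = random_hyperplanes(dim=len(alice), bits=4096)
estimate = estimated_cosine(simhash_signature(alice, planes),
                            simhash_signature(bob, planes))
print("exact:", exact_cosine(alice, bob), "estimated:", estimate)
```

    A pair would then be reported as similar when the estimate exceeds the threshold $\tau$. More signature bits tighten the estimate at the cost of more computation per vector.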

    GraphFind: enhancing graph searching by low support data mining techniques

    Background: Biomedical and chemical databases are large and rapidly growing in size. Graphs naturally model such kinds of data. To fully exploit the wealth of information in these graph databases, a key role is played by systems that search for all exact or approximate occurrences of a query graph. To deal efficiently with graph searching, advanced methods for indexing, representation and matching of graphs have been proposed. Results: This paper presents GraphFind. The system implements efficient graph searching algorithms together with advanced filtering techniques that allow approximate search. It allows users to select candidate subgraphs rather than entire graphs. It implements an effective data storage based also on low-support data mining. Conclusions: GraphFind is compared with Frowns, GraphGrep and gIndex. Experiments show that GraphFind outperforms the compared systems on a very large collection of small graphs. The proposed low-support mining technique, which applies to any searching system, also allows a significant index space reduction.

    Efficient Subgraph Matching on Billion Node Graphs

    The ability to handle large scale graph data is crucial to an increasing number of applications. Much work has been dedicated to supporting basic graph operations such as subgraph matching, reachability, regular expression matching, etc. In many cases, graph indices are employed to speed up query processing. Typically, most indices require either super-linear indexing time or super-linear indexing space. Unfortunately, for very large graphs, super-linear approaches are almost always infeasible. In this paper, we study the problem of subgraph matching on billion-node graphs. We present a novel algorithm that supports efficient subgraph matching for graphs deployed on a distributed memory store. Instead of relying on super-linear indices, we use efficient graph exploration and massive parallel computing for query processing. Our experimental results demonstrate the feasibility of performing subgraph matching on web-scale graph data. Comment: VLDB201

    Comparing and Fusing Terrain Network Information

    Terrain networks (or complex networks) are a type of relational information that is encountered in many fields. In order to properly answer questions pertaining to the comparison or to the merging of such networks, a method that takes into account the underlying structure of graphs is proposed. The effectiveness of the method is illustrated using real linguistic data networks and artificial networks, in particular

    Efficient Subgraph Similarity Search on Large Probabilistic Graph Databases

    Many studies have been conducted on seeking efficient solutions for subgraph similarity search over certain (deterministic) graphs due to its wide application in many fields, including bioinformatics, social network analysis, and Resource Description Framework (RDF) data management. All these works assume that the underlying data are certain. However, in reality, graphs are often noisy and uncertain due to various factors, such as errors in data extraction, inconsistencies in data integration, and privacy-preserving purposes. Therefore, in this paper, we study subgraph similarity search on large probabilistic graph databases. Unlike previous works, which assume that edges in an uncertain graph are independent of each other, we study uncertain graphs where edges' occurrences are correlated. We formally prove that subgraph similarity search over probabilistic graphs is #P-complete; thus, we employ a filter-and-verify framework to speed up the search. In the filtering phase, we develop tight lower and upper bounds of subgraph similarity probability based on a probabilistic matrix index, PMI. PMI is composed of discriminative subgraph features associated with tight lower and upper bounds of subgraph isomorphism probability. Based on PMI, we can sort out a large number of probabilistic graphs and maximize the pruning capability. During the verification phase, we develop an efficient sampling algorithm to validate the remaining candidates. The efficiency of our proposed solutions has been verified through extensive experiments. Comment: VLDB201
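
    The filter-and-verify idea in this abstract can be sketched on a toy problem. The sketch below is not the paper's method: it assumes independent edges (the paper studies correlated edges), replaces the PMI index with a plain union bound, and uses "contains a triangle" as a stand-in query instead of subgraph similarity. All function names and the example graphs are hypothetical.

```python
import itertools
import random


def make_graph(*edges):
    """Build an uncertain graph as {frozenset({u, v}): existence probability}."""
    return {frozenset((u, v)): p for u, v, p in edges}


def candidate_triangles(edge_prob):
    """List every vertex triple whose three edges all appear in the graph."""
    nodes = sorted({n for edge in edge_prob for n in edge})
    found = []
    for a, b, c in itertools.combinations(nodes, 3):
        edges = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        if all(e in edge_prob for e in edges):
            found.append(edges)
    return found


def upper_bound(edge_prob):
    """Filter step: union bound on P(at least one triangle exists)."""
    total = 0.0
    for edges in candidate_triangles(edge_prob):
        p = 1.0
        for e in edges:
            p *= edge_prob[e]  # independence assumption (not the paper's model)
        total += p
    return min(1.0, total)


def sampled_probability(edge_prob, samples, rng):
    """Verify step: Monte Carlo estimate over sampled possible worlds."""
    triangles = candidate_triangles(edge_prob)
    hits = 0
    for _ in range(samples):
        world = {e for e, p in edge_prob.items() if rng.random() < p}
        if any(all(e in world for e in tri) for tri in triangles):
            hits += 1
    return hits / samples


def filter_and_verify(graphs, threshold, samples=2000, seed=0):
    """Return names of graphs whose estimated match probability >= threshold."""
    rng = random.Random(seed)
    answers = []
    for name, edge_prob in graphs.items():
        if upper_bound(edge_prob) < threshold:
            continue  # pruned cheaply, no sampling needed
        if sampled_probability(edge_prob, samples, rng) >= threshold:
            answers.append(name)
    return answers


# Illustrative database of three small uncertain graphs.
database = {
    "g1": make_graph(("a", "b", 0.9), ("b", "c", 0.9), ("a", "c", 0.9)),
    "g2": make_graph(("a", "b", 0.3), ("b", "c", 0.3), ("a", "c", 0.3)),
    "g3": make_graph(("a", "b", 0.9), ("b", "c", 0.9)),
}
matches = filter_and_verify(database, threshold=0.5)
print(matches)
```

    The design point is that the bound is cheap to compute, so the expensive sampling only runs on graphs that survive the filter. Here `g2` and `g3` are pruned without sampling.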

    An Approach for Keyword Searching in Uncertain Graph Data

    Keyword searching is generally used for retrieving relevant data from a database: for an input query, the related data is retrieved. However, searching for keywords on an uncertain graph is a tedious task. In this paper, a keyword searching technique over uncertain graphs is introduced. The keyword routing method is used to route the keywords to relevant sources. This approach includes two methods. The keyword relationship graph deduces the relationship between keywords and the elements mentioning them. The scoring mechanism computes the score of keywords at each level, which reduces the ambiguity. The result includes the subtree of the entire graph that contains all keywords of the input query with a high score and, in addition, retrieves the most relevant data. Effective results are derived from the employed method.