34 research outputs found

    RoleSim* : scaling axiomatic role-based similarity ranking on large graphs

    RoleSim and SimRank are among the popular graph-theoretic similarity measures, with many applications in, e.g., web search, collaborative filtering, and sociometry. While RoleSim addresses the automorphic (role) equivalence of pairwise similarity, which SimRank lacks, it ignores the neighboring similarity information outside the automorphically equivalent set. Consequently, two pairs of nodes that are not automorphically equivalent by nature cannot be well distinguished by RoleSim if the averages of their neighboring similarities over the automorphically equivalent set are the same. To alleviate this problem: 1) We propose a novel similarity model, namely RoleSim*, which accurately evaluates pairwise role similarities in a more comprehensive manner. RoleSim* not only guarantees the automorphic equivalence that SimRank lacks, but also takes into account the neighboring similarity information outside the automorphically equivalent sets that is overlooked by RoleSim. 2) We prove the existence and uniqueness of the RoleSim* solution, and show its three axiomatic properties (i.e., symmetry, boundedness, and non-increasing monotonicity). 3) We provide a concise bound for iteratively computing the RoleSim* formula, and estimate the number of iterations required to attain a desired accuracy. 4) We induce a distance metric based on RoleSim* similarity, and show that the RoleSim* metric fulfills the triangular inequality, which implies the sum-transitivity of its similarity scores. 5) We present a threshold-based RoleSim* model that further reduces the computational time with a provable accuracy guarantee. 6) We propose a single-source RoleSim* model, which scales well for sizable graphs. 7) We also devise methods to scale RoleSim*-based search by combining its triangular inequality property with partitioning techniques.
Our experimental results on real datasets demonstrate that RoleSim* achieves higher accuracy than its competitors while scaling well on sizable graphs with billions of edges.
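
The abstract above does not state the RoleSim* formula, so the following is only a minimal sketch of the kind of fixed-point iteration it refers to, using the standard SimRank recurrence instead (not RoleSim or RoleSim*). The function name `simrank`, the `in_neighbors` dictionary layout, and the default `decay` and `iterations` values are assumptions made for illustration.

```python
from typing import Dict, Hashable, List


def simrank(
    in_neighbors: Dict[Hashable, List[Hashable]],
    decay: float = 0.8,      # assumed decay factor C in (0, 1)
    iterations: int = 10,    # assumed fixed iteration count
) -> Dict[Hashable, Dict[Hashable, float]]:
    """Naive all-pairs SimRank (NOT RoleSim*), for illustration only.

    `in_neighbors` maps every node to the list of its in-neighbors;
    every node that appears as a neighbor must also appear as a key.
    """
    nodes = list(in_neighbors)
    # Iteration 0: a node is fully similar to itself and to nothing else.
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(iterations):
        new_sim = {a: {} for a in nodes}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new_sim[a][b] = 1.0
                    continue
                ia, ib = in_neighbors[a], in_neighbors[b]
                if not ia or not ib:
                    # By definition, the score is 0 if either node has no in-neighbors.
                    new_sim[a][b] = 0.0
                    continue
                # Average similarity over all pairs of in-neighbors, damped by `decay`.
                total = sum(sim[x][y] for x in ia for y in ib)
                new_sim[a][b] = decay * total / (len(ia) * len(ib))
        sim = new_sim
    return sim
```

This naive version visits every node pair in every iteration, which is exactly the cost that threshold-based and single-source variants such as those listed in the abstract aim to reduce.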

    Neural IR Meets Graph Embedding: A Ranking Model for Product Search

    Neural models for information retrieval have recently become increasingly popular. They provide effective approaches for product search due to their competitive advantages in semantic matching. However, it is challenging to use graph-based features in these neural approaches, even though such features have proved very useful in the IR literature. In this paper, we leverage recent advances in graph embedding techniques to enable neural retrieval models to exploit graph-structured data for automatic feature extraction. The proposed approach not only helps to overcome the long-tail problem of click-through data, but also incorporates external heterogeneous information to improve search results. Extensive experiments on a real-world e-commerce dataset demonstrate significant improvement achieved by our proposed approach over multiple strong baselines, both as an individual retrieval model and as a feature used in learning-to-rank frameworks. Comment: A preliminary version of the work to appear in TheWebConf'19 (formerly, WWW'19).

    Dynamic network analytics for recommending scientific collaborators

    Collaboration is one of the most important contributors to scientific advancement and a crucial aspect of an academic’s career. However, the explosion in academic publications has, for some time, been making it more challenging to find suitable research partners. Recommendation approaches to help academics find potential collaborators are not new, but the existing methods operate on static data, which can render many suggestions less useful or out of date. The approach presented in this paper simulates a dynamic network from static data to gain further insights into the changing research interests, activities, and co-authorships of scholars in a field, all of which can improve the quality of the recommendations produced. Following a detailed explanation of the entire framework, from data collection through to recommendation modelling, we provide a case study on the field of information science to demonstrate the reliability of the proposed method. The results provide empirical insights to support decision-making by relevant stakeholders, e.g., scientific funding agencies, research institutions, and individual researchers in the field.

    Cybersecurity knowledge graphs

    Cybersecurity knowledge graphs, which represent cyber-knowledge with a graph-based data model, provide holistic approaches for processing massive volumes of complex cybersecurity data derived from diverse sources. They can help security analysts obtain cyberthreat intelligence, achieve a high level of cyber-situational awareness, discover new cyber-knowledge, visualize networks, data flow, and attack paths, and understand data correlations by aggregating and fusing data. This paper reviews the most prominent graph-based data models used in this domain, along with knowledge organization systems that define concepts and properties utilized in formal cyber-knowledge representation for both background knowledge and specific expert knowledge about an actual system or attack. It also discusses how cybersecurity knowledge graphs enable machine learning and facilitate automated reasoning over cyber-knowledge.

    Clustering sequence graphs

    In application domains ranging from social networks to e-commerce, it is important to cluster users with respect to both their relationships (e.g., friendship or trust) and their actions (e.g., visited locations or rated products). Motivated by these applications, we introduce here the task of clustering the nodes of a sequence graph, i.e., a graph whose nodes are labeled with strings (e.g., sequences of users’ visited locations or rated products). Neither string clustering algorithms nor graph clustering algorithms are suited to this task, as they do not consider the structure of the strings and the graph simultaneously. Moreover, attributed graph clustering algorithms generally construct poor solutions because they need to represent a string as a vector of attributes, which inevitably loses information and may harm clustering quality. We thus introduce the problem of clustering a sequence graph. We first propose two pairwise distance measures for sequence graphs, one based on edit distance and shortest path distance and another based on SimRank. We then formalize the problem under each measure and show that it is NP-hard. In addition, we design a polynomial-time 2-approximation algorithm, as well as a heuristic for the problem. Experiments using real datasets and a case study demonstrate the effectiveness and efficiency of our methods.
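
The first distance measure mentioned above draws on edit distance between node labels and shortest path distance between nodes. The abstract does not say how the two are combined, so this sketch only computes the two ingredients separately; the function names and the adjacency-dictionary layout are assumptions made for illustration, and an unweighted graph is assumed.

```python
from collections import deque
from typing import Dict, Hashable, List, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (unit-cost insert/delete/substitute)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of `a`
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # delete ch_a
                            curr[j - 1] + 1,      # insert ch_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def shortest_path_length(
    adjacency: Dict[Hashable, List[Hashable]],
    source: Hashable,
    target: Hashable,
) -> Optional[int]:
    """Hop count of a shortest path via breadth-first search; None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None
```

For a sequence graph, `edit_distance` would be applied to the strings labeling two nodes and `shortest_path_length` to the nodes themselves; how to weight or normalize the two values is left open here because the abstract does not specify it.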

    Semantic Search Using a Similarity Graph

    Given a set of documents and an input query expressed in natural language, the problem of document search is to retrieve the most relevant documents. Unlike most existing systems, which perform document search based on keyword matching, we propose a search method that considers the meaning of the words in the query and the document. As a result, our algorithm can return documents that have no words in common with the input query as long as the documents are relevant. For example, a document that contains the words “Ford”, “Chrysler” and “General Motors” multiple times is surely relevant for the query “car” even if the word “car” does not appear in the document. Our semantic search algorithm is based on a similarity graph that contains the degree of semantic similarity between terms, where a term can be a word or a phrase. We experimentally validate our algorithm on the Cranfield benchmark, which contains 1400 documents and 225 natural language queries. The benchmark also contains the relevant documents for every query as determined by human judgment. We show that our semantic search algorithm produces a higher mean average precision (MAP) score than a keyword matching algorithm. This shows that our approach can improve the quality of the results because the meaning of the words and phrases in the documents and the queries is taken into account.
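
For readers unfamiliar with the evaluation metric named above, here is a minimal sketch of how mean average precision (MAP) is commonly computed from ranked result lists and human relevance judgments. The function names and input layout are assumptions made for illustration, not the authors' evaluation code, and each ranked list is assumed to contain no duplicate documents.

```python
from typing import Hashable, List, Sequence, Set, Tuple


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Average of precision@k over the ranks k at which a relevant document appears.

    Relevant documents that are never retrieved contribute zero, because the
    sum is divided by the total number of relevant documents.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: List[Tuple[Sequence[Hashable], Set[Hashable]]],
) -> float:
    """Mean of average precision over (ranked list, relevant set) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)
```

With a benchmark such as Cranfield, `runs` would hold one pair per query: the system's ranked document IDs and the set of IDs judged relevant for that query.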