223 research outputs found

    Modeling Dynamic Heterogeneous Graph and Node Importance for Future Citation Prediction

    Full text link
    Accurate citation count prediction of newly published papers could help editors and readers rapidly figure out the influential papers in the future. Though many approaches are proposed to predict a paper's future citation, most ignore the dynamic heterogeneous graph structure or node importance in academic networks. To cope with this problem, we propose a Dynamic heterogeneous Graph and Node Importance network (DGNI) learning framework, which fully leverages the dynamic heterogeneous graph and node importance information to predict future citation trends of newly published papers. First, a dynamic heterogeneous network embedding module is provided to capture the dynamic evolutionary trends of the whole academic network. Then, a node importance embedding module is proposed to capture the global consistency relationship to figure out each paper's node importance. Finally, the dynamic evolutionary trend embeddings and node importance embeddings calculated above are combined to jointly predict the future citation counts of each paper, by a log-normal distribution model according to multi-faced paper node representations. Extensive experiments on two large-scale datasets demonstrate that our model significantly improves all indicators compared to the SOTA models.Comment: Accepted by CIKM'202

    Clustering and Community Detection in Directed Networks: A Survey

    Full text link
    Networks (or graphs) appear as dominant structures in diverse domains, including sociology, biology, neuroscience and computer science. In most of the aforementioned cases graphs are directed - in the sense that there is directionality on the edges, making the semantics of the edges non symmetric. An interesting feature that real networks present is the clustering or community structure property, under which the graph topology is organized into modules commonly called communities or clusters. The essence here is that nodes of the same community are highly similar while on the contrary, nodes across communities present low similarity. Revealing the underlying community structure of directed complex networks has become a crucial and interdisciplinary topic with a plethora of applications. Therefore, naturally there is a recent wealth of research production in the area of mining directed graphs - with clustering being the primary method and tool for community detection and evaluation. The goal of this paper is to offer an in-depth review of the methods presented so far for clustering directed networks along with the relevant necessary methodological background and also related applications. The survey commences by offering a concise review of the fundamental concepts and methodological base on which graph clustering algorithms capitalize on. Then we present the relevant work along two orthogonal classifications. The first one is mostly concerned with the methodological principles of the clustering algorithms, while the second one approaches the methods from the viewpoint regarding the properties of a good cluster in a directed network. Further, we present methods and metrics for evaluating graph clustering results, demonstrate interesting application domains and provide promising future research directions.Comment: 86 pages, 17 figures. Physics Reports Journal (To Appear

    Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs

    Full text link
    Laplacian mixture models identify overlapping regions of influence in unlabeled graph and network data in a scalable and computationally efficient way, yielding useful low-dimensional representations. By combining Laplacian eigenspace and finite mixture modeling methods, they provide probabilistic or fuzzy dimensionality reductions or domain decompositions for a variety of input data types, including mixture distributions, feature vectors, and graphs or networks. Provable optimal recovery using the algorithm is analytically shown for a nontrivial class of cluster graphs. Heuristic approximations for scalable high-performance implementations are described and empirically tested. Connections to PageRank and community detection in network analysis demonstrate the wide applicability of this approach. The origins of fuzzy spectral methods, beginning with generalized heat or diffusion equations in physics, are reviewed and summarized. Comparisons to other dimensionality reduction and clustering methods for challenging unsupervised machine learning problems are also discussed.Comment: 13 figures, 35 reference

    Recommending on graphs: a comprehensive review from a data perspective

    Full text link
    Recent advances in graph-based learning approaches have demonstrated their effectiveness in modelling users' preferences and items' characteristics for Recommender Systems (RSS). Most of the data in RSS can be organized into graphs where various objects (e.g., users, items, and attributes) are explicitly or implicitly connected and influence each other via various relations. Such a graph-based organization brings benefits to exploiting potential properties in graph learning (e.g., random walk and network embedding) techniques to enrich the representations of the user and item nodes, which is an essential factor for successful recommendations. In this paper, we provide a comprehensive survey of Graph Learning-based Recommender Systems (GLRSs). Specifically, we start from a data-driven perspective to systematically categorize various graphs in GLRSs and analyze their characteristics. Then, we discuss the state-of-the-art frameworks with a focus on the graph learning module and how they address practical recommendation challenges such as scalability, fairness, diversity, explainability and so on. Finally, we share some potential research directions in this rapidly growing area.Comment: Accepted by UMUA

    A Network Science and Document Similarity based Hybrid Job Recommendation System

    Get PDF
    Tööde soovitussüsteemid kasutavad erinevaid andmeallikaid lõppkasutajale parema sisu tagamiseks. Hästi toimiva soovitussüsteemi arendamine nõuab keerulisi hübriidseid lähenemisi sarnasuse kujutamisele põhinedes töökuulutuste ja resümeede sisudele ja nendevahelistele interaktsioonidele. Antud töö tulemina arendati efektiivne võrgul baseeruv töökohtade soovitussüsteem, mis kasutab Personalized PageRank algoritmi töökohtade järjestamiseks põhinedes tööotsija resümee ja töökuulutuse kui tekstiliste dokumentide sarnasustele ning eelnevatele kasutaja ja töökuulutuste vahelistele interaktsioonidele.Meie lähenemine saavutas 50%-lise saagise ja tekitas online A/B testi jooksul rohkem kandideerimisi kui eelmised algoritmid.Job recommendation systems mainly use different sources of data in order to give the better content for the end user. Developing the well-performing system requires complex hybrid approaches of representing similarity based on the content of job postings and resumes as well as interactions between them. We develop an efficient hybrid network-based job recommendation system which uses Personalized PageRank algorithm in order to rank vacancies for the users based on the similarity between resumes and job posts as textual documents, along with previous interactions of users with vacancies. Our approach achieved the recall of 50% and generated more applies for the jobs during the online A/B test than previous algorithms

    Personalized PageRank on Evolving Graphs with an Incremental Index-Update Scheme

    Full text link
    {\em Personalized PageRank (PPR)} stands as a fundamental proximity measure in graph mining. Since computing an exact SSPPR query answer is prohibitive, most existing solutions turn to approximate queries with guarantees. The state-of-the-art solutions for approximate SSPPR queries are index-based and mainly focus on static graphs, while real-world graphs are usually dynamically changing. However, existing index-update schemes can not achieve a sub-linear update time. Motivated by this, we present an efficient indexing scheme to maintain indexed random walks in expected O(1)O(1) time after each graph update. To reduce the space consumption, we further propose a new sampling scheme to remove the auxiliary data structure for vertices while still supporting O(1)O(1) index update cost on evolving graphs. Extensive experiments show that our update scheme achieves orders of magnitude speed-up on update performance over existing index-based dynamic schemes without sacrificing the query efficiency
    corecore