163 research outputs found

    Introducing ScaleGraph: an X10 library for billion scale graph analytics

    Link prediction in large directed graphs

    The first chapter introduces an approach to machine learning (ML) where data is understood as a network of connected entities. This strategy seeks inter-entity information for knowledge discovery, in contrast with traditional intra-entity approaches based on instances and their features. We discuss the importance of this connectivist ML (which we refer to as graph mining) in the current context, where large, topology-based data sets have become available. The chapter ends by introducing the Link Prediction (LP) problem, together with its current computational and performance limitations. The second chapter discusses early contributions to graph mining and introduces problems frequently tackled through this paradigm. The chapter then focuses on the state of the art of LP. It presents three different approaches to the problem of finding links in a relational set, and argues for the importance of the most computationally scalable one: similarity-based algorithms. It categorizes similarity-based algorithms into three types of LP scores. For the most scalable type, local similarity-based algorithms, the chapter identifies and formally describes the most competitive proposals according to the bibliography. Chapter three analyses the LP problem, partly as a classic binary classification problem. Graph properties such as directionality, weights and time are discussed in the context of LP. A formal time and space complexity analysis of similarity-based LP scores follows. The chapter ends with a study of the class imbalance found in LP problems. In chapter four a novel similarity-based LP score is introduced. The chapter first elaborates on the importance of hierarchies for representing knowledge through directed graphs. Several modifications to the proposed score are also defined. The chapter also presents modified versions of the most competitive undirected LP scores, adapted to directed graphs. 
The evaluation methodologies of LP are analyzed in the fifth chapter. It starts by discussing the problem of evaluating domains with a huge class imbalance, identifying the most appropriate methodologies for it. A modification of the most appropriate evaluation methodology according to the bibliography is presented, with the goal of focusing on relevant predictions. A discussion on the faithful estimation of the precision of predictors follows. Chapter six describes the graphs used for score evaluation, as well as how the data was transformed into directed graphs. The reasons for choosing these particular domains are given, with special attention to webgraphs and their well-known relation with hierarchies. The most basic properties of each resulting graph are shown. The tests performed are presented in chapter seven. The three most competitive LP scores currently available are compared with one another, and against the proposed versions of those same scores for directed graphs. Our proposed score and its modifications are tested against the scores that obtained the best results in the previous tests. The case of LP in webgraphs is considered separately, testing six different webgraphs. The chapter ends with a discussion on the limitations of this formal analysis, showing examples of the predictions obtained. Chapter eight covers the computational aspects of the work done. It starts with a discussion on the importance of memory management for determining the computational cost of LP algorithms. A proposal on how to reduce this cost through precision reduction is presented. A section on the parallelization of the code follows, which includes two different implementations: one on a graph-specific programming model (Pregel) and one on a generic programming model (OpenMP). The chapter ends with a specification of the computational resources used for the tests. The conclusions of this thesis proposal are presented in chapter nine. 
Chapter ten contains several future lines of work.
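
The abstract above does not name the local similarity-based scores it evaluates. As a minimal sketch only, the following Python uses the well-known common-neighbours score as a stand-in and shows one possible way to make it direction-aware. The function names, the directed variant, and the toy graph are illustrative assumptions, not the thesis's proposed score.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Return successor and predecessor sets for a directed edge list."""
    succ, pred = defaultdict(set), defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)
    return succ, pred


def common_neighbours(succ, pred, u, v):
    """Undirected common-neighbours score: edge direction is ignored."""
    nu = succ[u] | pred[u]
    nv = succ[v] | pred[v]
    return len((nu & nv) - {u, v})


def directed_common_neighbours(succ, pred, u, v):
    """Hypothetical directed variant: count nodes z with u -> z -> v."""
    return len(succ[u] & pred[v])


# Toy directed graph (assumed data, for illustration only).
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "a")]
succ, pred = build_adjacency(edges)

undirected_ad = common_neighbours(succ, pred, "a", "d")
directed_ad = directed_common_neighbours(succ, pred, "a", "d")
directed_da = directed_common_neighbours(succ, pred, "d", "a")
```

The undirected score is symmetric in its two arguments, while the directed variant distinguishes the candidate edge a -> d from d -> a. That asymmetry is the kind of information a directed adaptation can exploit.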

    Accelerating Sampling and Aggregation Operations in GNN Frameworks with GPU Initiated Direct Storage Accesses

    Graph Neural Networks (GNNs) are emerging as a powerful tool for learning from graph-structured data and performing sophisticated inference tasks in various application domains. Although GNNs have been shown to be effective on modest-sized graphs, training them on large-scale graphs remains a significant challenge due to the lack of efficient data access and data movement methods. Existing frameworks for training GNNs use CPUs for graph sampling and feature aggregation, while the training and updating of model weights are executed on GPUs. However, our in-depth profiling shows that the CPUs cannot achieve the throughput required to saturate GNN model training, causing gross under-utilization of expensive GPU resources. Furthermore, when the graph and its embeddings do not fit in CPU memory, the overhead introduced by the operating system, for example for handling page faults, falls on the critical path of execution. To address these issues, we propose the GPU Initiated Direct Storage Access (GIDS) dataloader, which enables GPU-oriented GNN training for large-scale graphs while efficiently utilizing all hardware resources, such as CPU memory, storage, and GPU memory, with a hybrid data placement strategy. By enabling GPU threads to fetch feature vectors directly from storage, the GIDS dataloader solves the memory capacity problem for GPU-oriented GNN training. Moreover, the GIDS dataloader leverages GPU parallelism to tolerate storage latency and eliminates expensive page-fault overhead. Doing so enables us to design novel optimizations for exploiting locality and increasing effective bandwidth for GNN training. Our evaluation using a single GPU on terabyte-scale GNN datasets shows that the GIDS dataloader accelerates the overall DGL GNN training pipeline by up to 392X when compared to the current state-of-the-art DGL dataloader. Comment: Under Submission. Source code: https://github.com/jeongminpark417/GID

    Efficient Large Language Models Fine-Tuning On Graphs

    Learning from Text-Attributed Graphs (TAGs) has attracted significant attention due to its wide range of real-world applications. The rapid evolution of large language models (LLMs) has revolutionized the way we process textual data, which indicates a strong potential to replace the shallow text embeddings generally used in Graph Neural Networks (GNNs). However, we find that existing LLM approaches that exploit text information in graphs suffer from poor computation and data efficiency. In this work, we introduce a novel and efficient approach for the end-to-end fine-tuning of LLMs on TAGs, named LEADING. The proposed approach maintains computation cost and memory overhead comparable to the graph-less fine-tuning of LLMs. Moreover, it effectively transfers the rich knowledge in LLMs to downstream graph learning tasks with limited labeled data in semi-supervised learning. Its superior computation and data efficiency are demonstrated through comprehensive experiments, offering a promising solution for a wide range of LLMs and graph learning tasks on TAGs.

    Rethinking Efficiency and Redundancy in Training Large-scale Graphs

    Large-scale graphs are ubiquitous in real-world scenarios and can be used to train Graph Neural Networks (GNNs) that generate representations for downstream tasks. Given the abundant information and complex topology of a large-scale graph, we argue that redundancy exists in such graphs and degrades training efficiency. Unfortunately, limited model scalability severely restricts the efficiency of training on large-scale graphs with vanilla GNNs. Despite recent advances in sampling-based training methods, sampling-based GNNs generally overlook the redundancy issue, and it still takes an intolerable amount of time to train these models on large-scale graphs. We therefore propose to drop redundancy and improve the efficiency of training on large-scale graphs with GNNs by rethinking the inherent characteristics of a graph. In this paper, we propose a once-for-all method, termed DropReef, to drop the redundancy in large-scale graphs. Specifically, we first conduct preliminary experiments to explore potential redundancy in large-scale graphs. Next, we present a metric to quantify the neighbor heterophily of all nodes in a graph. Based on both experimental and theoretical analysis, we reveal the redundancy in a large-scale graph, i.e., nodes with high neighbor heterophily and a great number of neighbors. Then, we propose DropReef to detect and drop the redundancy in large-scale graphs once and for all, helping reduce the training time without sacrificing model accuracy. To demonstrate the effectiveness of DropReef, we apply it to recent state-of-the-art sampling-based GNNs for training on large-scale graphs, owing to the high precision of such models. With DropReef, the training efficiency of these models is greatly improved. DropReef is highly compatible and is performed offline, benefiting present and future state-of-the-art sampling-based GNNs to a significant extent. Comment: 11 Page
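
The abstract does not give the formula for its neighbor heterophily metric. As a rough sketch only, the Python below assumes a common label-based definition (the fraction of a node's neighbours whose label differs from its own) and flags nodes that combine high heterophily with many neighbours. The function names, thresholds, and toy graph are hypothetical and should not be read as DropReef's actual procedure.

```python
def neighbour_heterophily(adj, labels):
    """Fraction of each node's neighbours whose label differs from its own.

    This label-based definition is an assumption made for illustration.
    """
    scores = {}
    for node, neighbours in adj.items():
        if not neighbours:
            scores[node] = 0.0
            continue
        different = sum(1 for n in neighbours if labels[n] != labels[node])
        scores[node] = different / len(neighbours)
    return scores


def redundancy_candidates(adj, labels, min_heterophily, min_degree):
    """Nodes with both high heterophily and many neighbours (hypothetical thresholds)."""
    scores = neighbour_heterophily(adj, labels)
    return sorted(
        node
        for node, score in scores.items()
        if score >= min_heterophily and len(adj[node]) >= min_degree
    )


# Toy star graph (assumed data): node 0 is a hub with mostly different-label neighbours.
adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
labels = {0: "x", 1: "y", 2: "y", 3: "y", 4: "x"}

scores = neighbour_heterophily(adj, labels)
candidates = redundancy_candidates(adj, labels, min_heterophily=0.7, min_degree=3)
```

Because such a filter depends only on the graph and its labels, it can run once, offline, before any training job. This matches the "once-for-all" framing in the abstract.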

    GraphM: an efficient storage system for high throughput of concurrent graph processing

    With the rapidly growing demand for graph processing in the real world, a large number of iterative graph processing jobs run concurrently on the same underlying graph. However, the storage engines of existing graph processing frameworks are mainly designed for running an individual job. Our studies show that they are inefficient when running concurrent jobs due to redundant data storage and access overhead. To cope with this issue, we develop an efficient storage system called GraphM. It can be integrated into existing graph processing systems to efficiently support concurrent iterative graph processing jobs for higher throughput by fully exploiting the similarities of the data accesses between these concurrent jobs. GraphM regularizes the traversal order of the graph partitions for concurrent graph processing jobs by streaming the partitions into main memory and the Last-Level Cache (LLC) in a common order, and then processes the related jobs concurrently with a novel fine-grained synchronization. In this way, the concurrent jobs share the same graph structure data in the LLC/memory as well as the data accesses to the graph, amortizing the storage consumption and the data access overhead. To demonstrate the efficiency of GraphM, we plug it into state-of-the-art graph processing systems, including GridGraph, GraphChi, PowerGraph, and Chaos. Experimental results show that GraphM improves the throughput by 1.73~13 times.

    Computing methods for parallel processing and analysis on complex networks

    Solving some problems today requires modeling complex systems in order to simulate and understand their behavior. A good example of such a complex system is the Facebook social network, which represents people and their relationships. Another example is the Internet, composed of a vast number of servers, computers, modems and routers. Every field of science (physics, economics, politics, and so on) has complex systems, which are complex because of the large volume of data required to represent them and the fast changes in their structure. Analyzing the behavior of these complex systems is important for creating simulations or discovering their dynamics, with the main goal of understanding how they work. Some complex systems cannot be easily modeled; we can begin by analyzing their structure, which is possible by creating a network model that maps the problem's entities and the relations between them. Some popular analyses of the structure of a network are: • Community detection: discovering how the entities are grouped • Identifying the most important entities: measuring a node's influence over the network • Features of the whole network, such as the diameter, the number of triangles, the clustering coefficient, and the shortest path between two entities. Multiple algorithms have been created to perform these analyses on the network model, although when executed on a single machine they take a long time to complete or may not run at all because of the machine's limited resources. As more demanding applications of these analyses have appeared, several parallel programming models and different kinds of hardware architecture have been created to deal with the large data input, reduce execution time, save power, and improve the efficiency of the computation on each machine, while also keeping the application requirements in mind. 
Parallelizing these algorithms is a challenge because: • We need to analyze data dependences to implement a parallel version of the algorithm, always keeping in mind the scalability and performance of the code. • We need to create an implementation of the algorithm for one parallel programming model, such as MapReduce (Apache Hadoop), RDD (Apache Spark) or Pregel (Apache Giraph), which are oriented to big data, or HPC models such as MPI + OpenMP, OmpSs or CUDA. • The input data must be distributed over the processing platform for each node, or offloaded to accelerators such as GPUs or FPGAs. • Storing the input data and the results of the processing requires techniques such as distributed file systems (HDFS), distributed NoSQL databases (object databases, graph databases, document databases) or traditional relational databases (Oracle, SQL Server). In this Master's thesis, we implemented graph processing using Apache big data tools, testing mainly on MareNostrum III and the Amazon cloud, for some community detection algorithms using SNAP graphs with ground-truth communities, and compared their parallel execution time and scalability.
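
Two of the whole-network features listed in the abstract above, the number of triangles and the clustering coefficient, can be computed on a single machine with a few lines of Python. This is a minimal sequential sketch on an assumed toy edge list, using the standard local clustering coefficient 2T/(k(k-1)). It is not the thesis's parallel implementation, and the function names are illustrative.

```python
from itertools import combinations


def to_adjacency(edges):
    """Build an undirected adjacency map (sets of neighbours), skipping self-loops."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangles_per_node(adj):
    """Count, for each node, the pairs of its neighbours that are themselves connected."""
    return {
        node: sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        for node, nbrs in adj.items()
    }


def local_clustering(adj):
    """Local clustering coefficient 2T / (k(k-1)); defined as 0 for degree below 2."""
    tri = triangles_per_node(adj)
    coeffs = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        coeffs[node] = 0.0 if k < 2 else 2 * tri[node] / (k * (k - 1))
    return coeffs


# Toy graph (assumed data): one triangle 1-2-3 plus a pendant node 4.
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
adj = to_adjacency(edges)

# Each triangle is counted once at each of its three corners.
total_triangles = sum(triangles_per_node(adj).values()) // 3
coeffs = local_clustering(adj)
```

The per-node counts depend only on a node's own neighbourhood, which is one reason these measures are natural candidates for the vertex-centric and data-parallel models named in the abstract.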