9 research outputs found

    Peregrine: A Pattern-Aware Graph Mining System

    Graph mining workloads aim to extract structural properties of a graph by exploring its subgraph structures. General purpose graph mining systems provide a generic runtime to explore subgraph structures of interest with the help of user-defined functions that guide the overall exploration process. However, the state-of-the-art graph mining systems remain largely oblivious to the shape (or pattern) of the subgraphs that they mine. This causes them to: (a) explore unnecessary subgraphs; (b) perform expensive computations on the explored subgraphs; and, (c) hold intermediate partial subgraphs in memory; all of which affect their overall performance. Furthermore, their programming models are often tied to their underlying exploration strategies, which makes it difficult for domain users to express complex mining tasks. In this paper, we develop Peregrine, a pattern-aware graph mining system that directly explores the subgraphs of interest while avoiding exploration of unnecessary subgraphs, and simultaneously bypassing expensive computations throughout the mining process. We design a pattern-based programming model that treats "graph patterns" as first class constructs and enables Peregrine to extract the semantics of patterns, which it uses to guide its exploration. Our evaluation shows that Peregrine outperforms state-of-the-art distributed and single machine graph mining systems, and scales to complex mining tasks on larger graphs, while retaining simplicity and expressivity with its "pattern-first" programming approach. Comment: This is the full version of the paper appearing in the European Conference on Computer Systems (EuroSys), 202

    GraphM : an efficient storage system for high throughput of concurrent graph processing

    With the rapidly growing demand for graph processing in the real world, a large number of iterative graph processing jobs run concurrently on the same underlying graph. However, the storage engines of existing graph processing frameworks are mainly designed for running an individual job. Our studies show that they are inefficient when running concurrent jobs due to redundant data storage and access overhead. To cope with this issue, we develop an efficient storage system, called GraphM. It can be integrated into existing graph processing systems to efficiently support concurrent iterative graph processing jobs for higher throughput by fully exploiting the similarities of the data accesses between these concurrent jobs. GraphM regularizes the traversing order of the graph partitions for concurrent graph processing jobs by streaming the partitions into the main memory and the Last-Level Cache (LLC) in a common order, and then processes the related jobs concurrently with a novel fine-grained synchronization. In this way, the concurrent jobs share the same graph structure data in the LLC/memory and also the data accesses to the graph, so as to amortize the storage consumption and the data access overhead. To demonstrate the efficiency of GraphM, we plug it into state-of-the-art graph processing systems, including GridGraph, GraphChi, PowerGraph, and Chaos. Experimental results show that GraphM improves the throughput by 1.73~13 times

    Exploiting Asynchrony for Performance and Fault Tolerance in Distributed Graph Processing

    While various iterative graph algorithms can be expressed via asynchronous parallelism, a lack of proper understanding of it limits the performance benefits that can be achieved via informed relaxations. In this thesis, we capture the algorithmic intricacies and execution semantics that enable us to improve asynchronous processing and allow us to reason about the semantics of asynchronous execution while leveraging its benefits. To this end, we specify the asynchronous processing model in a distributed setting by identifying key properties of read-write dependences and ordering of reads that expose the set of legal executions of an asynchronous program. We then develop techniques to exploit the availability of multiple legal executions by choosing faster executions that reduce communication and computation while processing static and dynamic graphs. For static graphs, we first develop a relaxed consistency protocol to allow the use of stale values during processing in order to eliminate long latency communication operations by up to 58%, hence accelerating the overall processing by a factor of 2. Then, to efficiently handle machine failures, we present a light-weight confined recovery strategy that quickly constructs an alternate execution state that may be different from any previously encountered program state, but is nevertheless a legal state that guarantees correct asynchronous semantics upon resumption of execution. Our confined recovery strategy enables the processing to finish 1.5-3.2x faster compared to the traditional recovery mechanism when failures impact 1-6 machines of a 16 machine cluster. We further design techniques based on computation reordering and incremental computation to amortize the computation and communication costs incurred in processing evolving graphs, hence accelerating their processing by up to 10x.
    To process streaming graphs, we develop a dynamic dependence based incremental processing technique that identifies the minimal set of computations required to calculate the change in results that reflects the mutation in graph structure. We show that this technique not only produces correct results, but also improves processing by 8.5-23.7x. Finally, we demonstrate the efficacy of asynchrony beyond the distributed setting by leveraging it to design dynamic partitions that eliminate wasteful disk I/O involved in out-of-core graph processing by 25-76%

    CoRAL


    A Comparative Study to Predict Bearing Degradation Using Discrete Wavelet Transform (DWT), Tabular Generative Adversarial Networks (TGAN) and Machine Learning Models

    Prognostics and health management (PHM) is a framework to identify damage prior to its occurrence, which leads to the reduction of both maintenance costs and safety hazards. Based on the data collected in condition monitoring, the degradation of the part is predicted. Studies show that most failures are caused by faults in rolling element bearings, which highlights that a bearing is one of the most important mechanical components of any machine. Thus, it becomes important to monitor bearing degradation to make sure that it is utilized properly. Generally, machine learning (ML) or deep learning (DL) techniques are utilized to predict bearing degradation using a data-driven approach, where signals are captured from the machine. A large amount of data is needed to apply either ML or DL techniques, but it is difficult to collect that amount of data directly from any machine. In this study, health assessment is carried out using the correlation coefficient to divide the bearing life into two degradation stages. The raw signal is processed using discrete wavelet transform (DWT), where mutual information (MI) is used to rank and select the base wavelet, after which tabular generative adversarial networks (TGAN) are used to generate the artificial coefficients. Statistical features are calculated from the real data (DWT coefficients) and the artificial data (generated from TGAN). The constructed feature vector is then used as an input to train machine learning models, namely ensemble bagged tree (EBT) and Gaussian process regression with the squared exponential kernel function (SEGPR), to estimate bearing degradation conditions. Both machine learning models were validated on the publicly available experimental data of FEMTO bearing. The results showed that the developed EBT and SEGPR models accurately predicted the bearing degradation conditions with the average lowest RMSE value of 0.0045 and MAE value of 0.0037

    Abstracts of National Conference on Research and Developments in Material Processing, Modelling and Characterization 2020

    This book presents the abstracts of the papers presented to the Online National Conference on Research and Developments in Material Processing, Modelling and Characterization 2020 (RDMPMC-2020) held on 26th and 27th August 2020 organized by the Department of Metallurgical and Materials Science in Association with the Department of Production and Industrial Engineering, National Institute of Technology Jamshedpur, Jharkhand, India.
    Conference Title: National Conference on Research and Developments in Material Processing, Modelling and Characterization 2020
    Conference Acronym: RDMPMC-2020
    Conference Date: 26–27 August 2020
    Conference Location: Online (Virtual Mode)
    Conference Organizer: Department of Metallurgical and Materials Engineering, National Institute of Technology Jamshedpur
    Co-organizer: Department of Production and Industrial Engineering, National Institute of Technology Jamshedpur, Jharkhand, India
    Conference Sponsor: TEQIP-