674 research outputs found

    DDSL: Efficient Subgraph Listing on Distributed and Dynamic Graphs

    Full text link
    Subgraph listing is a fundamental problem in graph theory and has wide applications in areas like sociology, chemistry, and social networks. Modern graphs can usually be large-scale as well as highly dynamic, which challenges the efficiency of existing subgraph listing algorithms. Recent works have shown the benefits of partitioning and processing big graphs in a distributed system, however, there is only few work targets subgraph listing on dynamic graphs in a distributed environment. In this paper, we propose an efficient approach, called Distributed and Dynamic Subgraph Listing (DDSL), which can incrementally update the results instead of running from scratch. DDSL follows a general distributed join framework. In this framework, we use a Neighbor-Preserved storage for data graphs, which takes bounded extra space and supports dynamic updating. After that, we propose a comprehensive cost model to estimate the I/O cost of listing subgraphs. Then based on this cost model, we develop an algorithm to find the optimal join tree for a given pattern. To handle dynamic graphs, we propose an efficient left-deep join algorithm to incrementally update the join results. Extensive experiments are conducted on real-world datasets. The results show that DDSL outperforms existing methods in dealing with both static dynamic graphs in terms of the responding time

    A Comparison of Big Data Frameworks on a Layered Dataflow Model

    Get PDF
    In the world of Big Data analytics, there is a series of tools aiming at simplifying programming applications to be executed on clusters. Although each tool claims to provide better programming, data and execution models, for which only informal (and often confusing) semantics is generally provided, all share a common underlying model, namely, the Dataflow model. The Dataflow model we propose shows how various tools share the same expressiveness at different levels of abstraction. The contribution of this work is twofold: first, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm), thus making it easier to understand high-level data-processing applications written in such frameworks. Second, we provide a layered model that can represent tools and applications following the Dataflow paradigm and we show how the analyzed tools fit in each level.Comment: 19 pages, 6 figures, 2 tables, In Proc. of the 9th Intl Symposium on High-Level Parallel Programming and Applications (HLPP), July 4-5 2016, Muenster, German

    Efficient classification using parallel and scalable compressed model and Its application on intrusion detection

    Full text link
    In order to achieve high efficiency of classification in intrusion detection, a compressed model is proposed in this paper which combines horizontal compression with vertical compression. OneR is utilized as horizontal com-pression for attribute reduction, and affinity propagation is employed as vertical compression to select small representative exemplars from large training data. As to be able to computationally compress the larger volume of training data with scalability, MapReduce based parallelization approach is then implemented and evaluated for each step of the model compression process abovementioned, on which common but efficient classification methods can be directly used. Experimental application study on two publicly available datasets of intrusion detection, KDD99 and CMDC2012, demonstrates that the classification using the compressed model proposed can effectively speed up the detection procedure at up to 184 times, most importantly at the cost of a minimal accuracy difference with less than 1% on average

    Algorithmic skeleton framework for the orchestration of GPU computations

    Get PDF
    Dissertação para obtenção do Grau de Mestre em Engenharia InformáticaThe Graphics Processing Unit (GPU) is gaining popularity as a co-processor to the Central Processing Unit (CPU), due to its ability to surpass the latter’s performance in certain application fields. Nonetheless, harnessing the GPU’s capabilities is a non-trivial exercise that requires good knowledge of parallel programming. Thus, providing ways to extract such computational power has become an emerging research topic. In this context, there have been several proposals in the field of GPGPU (Generalpurpose Computation on Graphics Processing Unit) development. However, most of these still offer a low-level abstraction of the GPU computing model, forcing the developer to adapt application computations in accordance with the SPMD model, as well as to orchestrate the low-level details of the execution. On the other hand, the higher-level approaches have limitations that prevent the full exploitation of GPUs when the purpose goes beyond the simple offloading of a kernel. To this extent, our proposal builds on the recent trend of applying the notion of algorithmic patterns (skeletons) to GPU computing. We propose Marrow, a high-level algorithmic skeleton framework that expands the set of skeletons currently available in this field. Marrow’s skeletons orchestrate the execution of OpenCL computations and introduce optimizations that overlap communication and computation, thus conjoining programming simplicity with performance gains in many application scenarios. Additionally, these skeletons can be combined (nested) to create more complex applications. We evaluated the proposed constructs by confronting them against the comparable skeleton libraries for GPGPU, as well as against hand-tuned OpenCL programs. The results are favourable, indicating that Marrow’s skeletons are both flexible and efficient in the context of GPU computing.FCT-MCTES - financing the equipmen
    • …
    corecore