5,235 research outputs found

    Shared Memory Parallel Subgraph Enumeration

    Full text link
    The subgraph enumeration problem asks us to find all subgraphs of a target graph that are isomorphic to a given pattern graph. Determining whether even one such isomorphic subgraph exists is NP-complete---and therefore finding all such subgraphs (if they exist) is a time-consuming task. Subgraph enumeration has applications in many fields, including biochemistry and social networks, and interestingly the fastest algorithms for solving the problem for biochemical inputs are sequential. Since they depend on depth-first tree traversal, an efficient parallelization is far from trivial. Nevertheless, since important applications produce data sets with increasing difficulty, parallelism seems beneficial. We thus present here a shared-memory parallelization of the state-of-the-art subgraph enumeration algorithms RI and RI-DS (a variant of RI for dense graphs) by Bonnici et al. [BMC Bioinformatics, 2013]. Our strategy uses work stealing and our implementation demonstrates a significant speedup on real-world biochemical data---despite a highly irregular data access pattern. We also improve RI-DS by pruning the search space better; this further improves the empirical running times compared to the already highly tuned RI-DS.Comment: 18 pages, 12 figures, To appear at the 7th IEEE Workshop on Parallel / Distributed Computing and Optimization (PDCO 2017

    What does fault tolerant Deep Learning need from MPI?

    Full text link
    Deep Learning (DL) algorithms have become the de facto Machine Learning (ML) algorithm for large scale data analysis. DL algorithms are computationally expensive - even distributed DL implementations which use MPI require days of training (model learning) time on commonly studied datasets. Long running DL applications become susceptible to faults - requiring development of a fault tolerant system infrastructure, in addition to fault tolerant DL algorithms. This raises an important question: What is needed from MPI for de- signing fault tolerant DL implementations? In this paper, we address this problem for permanent faults. We motivate the need for a fault tolerant MPI specification by an in-depth consideration of recent innovations in DL algorithms and their properties, which drive the need for specific fault tolerance features. We present an in-depth discussion on the suitability of different parallelism types (model, data and hybrid); a need (or lack thereof) for check-pointing of any critical data structures; and most importantly, consideration for several fault tolerance proposals (user-level fault mitigation (ULFM), Reinit) in MPI and their applicability to fault tolerant DL implementations. We leverage a distributed memory implementation of Caffe, currently available under the Machine Learning Toolkit for Extreme Scale (MaTEx). We implement our approaches by ex- tending MaTEx-Caffe for using ULFM-based implementation. Our evaluation using the ImageNet dataset and AlexNet, and GoogLeNet neural network topologies demonstrates the effectiveness of the proposed fault tolerant DL implementation using OpenMPI based ULFM

    Parallel Frequent Item Set Mining with Selective Item Replication

    Get PDF
    Cataloged from PDF version of article.We introduce a transaction database distribution scheme that divides the frequent item set mining task in a top-down fashion. Our method operates on a graph where vertices correspond to frequent items and edges correspond to frequent item sets of size two. We show that partitioning this graph by a vertex separator is sufficient to decide a distribution of the items such that the subdatabases determined by the item distribution can be mined independently. This distribution entails an amount of data replication, which may be reduced by setting appropriate weights to vertices. The data distribution scheme is used in the design of two new parallel frequent item set mining algorithms. Both algorithms replicate the items that correspond to the separator. NoClique replicates the work induced by the separator and NoClique2 computes the same work collectively. Computational load balancing and minimization of redundant or collective work may be achieved by assigning appropriate load estimates to vertices. The experiments show favorable speedups on a system with small-to-medium number of processors for synthetic and real-world databases

    PARSECSs: Evaluating the impact of task parallelism in the PARSEC benchmark suite

    Get PDF
    In this work, we show how parallel applications can be implemented efficiently using task parallelism. We also evaluate the benefits of such parallel paradigm with respect to other approaches. We use the PARSEC benchmark suite as our test bed, which includes applications representative of a wide range of domains from HPC to desktop and server applications. We adopt different parallelization techniques, tailored to the needs of each application, to fully exploit the task-based model. Our evaluation shows that task parallelism achieves better performance than thread-based parallelization models, such as Pthreads. Our experimental results show that we can obtain scalability improvements up to 42% on a 16-core system and code size reductions up to 81%. Such reductions are achieved by removing from the source code application specific schedulers or thread pooling systems and transferring these responsibilities to the runtime system software.This work has been partially supported by the European Research Council under the European Union 7th FP, ERC Grant Agreement number 321253, by the Spanish Ministry of Science and Innovation under grant TIN2012-34557, by the Severo Ochoa Program, awarded by the Spanish Government, under grant SEV-2011-00067 and by the HiPEAC Network of Excellence. M. Moreto has been partially supported by the Ministry of Economy and Competitiveness under Juan de la Cierva post-doctoral fellowship number JCI-2012-15047, and M. Casas is supported by the Secretary for Universities and Research of the Ministry of Economy and Knowledge of the Government of Catalonia and the Co-fund programme of the Marie Curie Actions of the 7th R&D Framework Programme of the European Union (Contract 2013 BP B 00243). Finally, the authors are grateful to the reviewers for their valuable comments, to the people from the Programming Models Group at BSC for their technical support, to the RoMoL team, and to Xavier Teruel, Roger Ferrer and Paul Caheny for their help in this work.Peer ReviewedPostprint (author's final draft
    corecore